Tentative Schedule

Technical Program

		Workshops	Main Conference
		Tuesday	Wednesday		Thursday		Friday
07	00
07	30	Registration Open
08	00	Registration Open
08	30	Workshops	Keynote 1 : Dr. Derek Chiou, Microsoft and The University of Texas at Austin		Keynote 2 : Dr. David Richards, Lawrence Livermore National Laboratory		Keynote 3 : Dr. Kate Keahey, Argonne National Laboratory
09	00
09	30		Session 01 : Best Papers 1		Session 08 : Best Papers 2		Session 13 : Scheduling	Session 14 : Performance Profiling
10	00		Session 01 : Best Papers 1		Session 08 : Best Papers 2		Session 13 : Scheduling	Session 14 : Performance Profiling
10	30	Coffee Break
11	00	Workshops	Session 02 : Algorithms	Session 03 : Big Data and HPC	Session 09 : High-Performance Big Data Analytics	Session 10 : Virtualisation	Session 15 : Leveraging Accelerated Systems	Session 16 : Fault Tolerance
11	30
12	00
12	30	Lunch
13	00
13	30
14	00	Workshops	Session 04 : Performance and Energy Modeling and Analysis	Session 05 : Resource Management and Runtime Systems	Extended Lunch		Session 17 : Numerical Methods and Libraries	Session 18 : Programming and System Software
14	30
15	00
15	30	Coffee Break
16	00	Workshops	Session 06 : Memory and Networks	Session 07 : Virtualisation and I/O	Session 11 : Emerging Architectures and Parallel Programming	Session 12 : Data Storage and Processing	Session 19 : Algorithms and Tools for I/O and Big Data Management	Session 20 : Silent Data Corruption
16	30
17	00
17	30
18	00		Poster Reception		Banquet
18	30
19	00
19	30

Room Assignments :

Workshops

Rev-A : Waianae Room
FTS : Honolulu Room
HPCMASPA : Kahuku Room
WRAp : Oahu Room
DOE/MEXT : Waialua Room

Conference Rooms

Lanai

Honolulu

Kahuku

Floor Plan

Keynote 1 : Specializing Data Centers using Reconfigurable Logic

Speaker : Dr. Derek Chiou, Microsoft and The University of Texas at Austin

Chair : Rich Vuduc

Abstract : Introducing reconfigurable logic into data center servers provides both the benefits of specialized hardware and the convenience of homogeneous hardware. Placing an FPGA in the network path as well as attached to the server via PCIe enables an FPGA-centric computational model, in contrast to the CPU-centric computational model that pervades computing today. In an FPGA-centric model, the FPGA is the first to process each packet and only passes the packets it cannot handle to the CPU that acts as a complexity offload engine. Microsoft has deployed such an architecture throughout its cloud and implements a wide range of capabilities, including deep neural networks and software defined networking acceleration, on it. I will describe Microsoft’s Configurable Cloud, some cases of how it is used, and the resulting performance.

Bio : Derek Chiou is a Partner Architect at Microsoft where he leads the Azure Cloud Silicon team working on FPGAs and ASICs for data center applications and infrastructure and a researcher in the Electrical and Computer Engineering Department at The University of Texas at Austin. His research areas are FPGA acceleration, high-performance computer simulation, rapid system design, computer architecture, parallel computing, Internet router architecture, and network processors. Before going to UT, Dr. Chiou was a system architect and lead the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M. and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Keynote 2 : What is a Proxy App and Why Should I Care?

Speaker : Dr. David Richards, Lawrence Livermore National Laboratory

Chair : Todd Gamblin

Abstract : Proxy apps are small, nimble codes that can be used to represent larger applications in situations where it would be difficult or impossible to use the real application. Proxies are used to design and test new computer architectures, software technologies, and programming techniques, and will play a role in selection process of the exascale systems that will be purchased by the Department of Energy (DOE). This talk will discuss the role of proxy apps in DOE’s Exascale Computing Program (ECP) and provide examples of how proxies can be used (and misused) in co-design with vendors, research, and development. We will also describe the ECP Proxy App Project and show how that project can benefit anyone who wants to create or use proxy apps.

Keynote 3 : At the Crossroads of HPC and Big Data

Speaker : Dr. Kate Keahey, Argonne National Laboratory

Chair : Gabriel Antoniu

Abstract : Experiments, data, and computation have always been inextricably linked and are even more so today. Large experimental instruments, equipped with millions of sensors, and producing hundreds of terabytes of data per experiment will be used more efficiently if extended with a computational facility providing the scientist with ongoing insight into data. This relationship is becoming stronger as recently these sensors have left the lab and started multiplying at large: inexpensive and increasingly sophisticated sensor devices now allow scientists to instrument forests, oceans or cities turning our planet into an “instrument at large” and providing unprecedented opportunities in geophysical, environmental, and social sciences. All this is creating demand to process more data, faster, and produce results in a more timely fashion. This presentation will describe how emergent technology is creating potential for new avenues of exploration and how this potential is translated into new scientific applications – but also new infrastructure requirements and new ideas on how computing can support science. I will give examples of different approaches explored by various scientific application groups and discuss ideas on what we can do to catalyze change in tools and infrastructure – from specific solutions to changes in experimental approach – to support new modes of usage.

Full papers have a 30 minute-slot (25-minute talk + questions)
Short papers have a 15 minute-slot (10-minute talk + questions)

Sessions :

Session 1: Best Papers 1

Chair : Gabriel Antoniu

Tahsin Reza, Christine Klymko, Geoffrey Sanders, Roger Pearce and Matei Ripeanu. Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs
Sourav Chakraborty, Hari Subramoni and Dhabaleswar Panda. Contention-Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems

Session 2: Algorithms

Chair : Alexandru Costan

Willian Barreiros Jr., George Teodoro, Tahsin Kurc, Jun Kong, Alba Cristina M. A. Melo and Joel Saltz. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems
Xiaohui Duan, Kai Xu, Yuandong Chan, Christian Hundt, Bertil Schmidt, Pavan Balaji and Weiguo Liu. S-Aligner: Ultrascalable read mapping on Sunway Taihu Light
Bangtian Liu, Chengyao Wen, Anand D. Sarwate and Maryam Mehri Dehnavi. A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Session 3: Big Data and HPC

Chair : Osamu Tatebe

Tao Lu, Eric Suchyta, Dave Pugmire, Jong Choi, Scott Klasky, Qing Liu, Norbert Podhorszki, Mark Ainsworth and Matthew Wolf. Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC Storage
Francois Tessier, Venkatram Vishwanath and Emmanuel Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers
(Short paper) Pierre Matri, Yevhen Alforov, Álvaro Brandon, Michael Kuhn, Philip Carns and Thomas Ludwig. Could Blobs Enable Storage-Based Convergence Between HPC and Big Data?
(Short paper) Orcun Yildiz, Amelie Chi Zhou and Shadi Ibrahim. Eley: On the Effectiveness of Burst Buffers for Big Data Processing in HPC systems

Session 4: Performance and Energy Modeling and Analysis

Chair : Olga Pearce

Kewen Meng and Boyana Norris. Mira: A Framework for Static Performance Analysis
Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie and Martin Quinson. Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node
An Huynh and Kenjiro Taura. Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems

Session 5: Resource Management and Runtime Systems

Chair : Dong Li

Harald Servat, Antonio J. Peña, Germán Llort, Estanislao Mercadal, Hans-Christian Hoppe and Jesus Labarta. Automating the Application Data Placement in Hybrid Memory Systems
Mohammadreza Hoseinyfarahabady, Albert Zomaya and Zahir Tari. Towards QoS- Contention- Aware Resource Provisioning in Streaming Processing Engine
(Short paper) Xiang Ni, Nikhil Jain, Kavitha Chandrasekar and Laxmikant Kale. Runtime Techniques for Programming with Fast and Slow Memory

Session 6: Memory and Networks

Chair : Bronis R. de Supinski

Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang and Onur Mutlu. Utility-Based Hybrid Memory Management
Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems
Michihiro Koibuchi, Tomohiro Totoki, Hiroki Matsutani, Hideharu Amano, Fabien Chaix, Ikki Fujiwara and Henri Casanova. A Case for Uni-Directional Network Topologies in Large-Scale Clusters
(Short paper) Mauro Ianni, Alessandro Pellegrini and Francesco Quaglia. A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines

Session 7: Visualization and I/O

Chair : Toni Cortes

Jianping Li, Misbah Mubarak, Kwan-Liu Ma, Robert Ross and Christopher Carothers. Visual Analytics Techniques for Exploring the Design Space of Large-Scale High-Radix Networks
Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Li, Nikhil Jain, Shane Snyder, Robert Ross, Abhinav Bhatele, Chris Carothers and Kwan-Liu Ma. Quantifying I/O and Communication Traffic Interference on Burst Buffer Equipped Dragonfly Networks
Shaomeng Li, Sudhanshu Sane, Leigh Orf, Pablo Mininni, John Clyne and Hank Child. Spatiotemporal Wavelet Compression for Visualization of Scientific Simulation Data
(Short paper) Hyungsoo Jung, Sooyong Kang, Hyuck Han, Hyeongwon Jang, Sang Youp Rhee and Jae Eun Kim. AutoBahn: Accelerating Concurrent, Durable File I/O via a Non-Volatile Buffer

Session 8: Best Papers 2

Chair : Rich Vuduc

Stratos Dimopoulos, Chandra Krintz and Rich Wolski. JUSTICE: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics
Jaehyun Han, Donghun Koo, Glenn K. Lockwood, Jaehwan Lee, Hyeonsang Eom and Soonwook Hwang. Accelerating a burst buffer via user-level I/O isolation.

Session 9: High Performance Big Data Analytics

Chair : Pierre Matri

Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong and Xiaokui Xiao. GraphH: High Performance Big Graph Analytics in Small Clusters
Sarat Sreepathi, Jitendra Kumar, Forrest Hoffman, Richard Mills, Vamsi Sripathi and William Hargrove. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers
Stefano Iannucci, Hisham A. Kholidy, Amrita Dhakal Ghimire, Rui Jia, Sherif Abdelwahed and Ioana Banicescu. A Comparison of Graph-Based Synthetic Data Generators for Benchmarking Next-Generation Intrusion Detection Systems

Session 10: Virtualization

Chair : Balazs Gerofi

Panagiotis Patros, Dayal Dilli, Kenneth Kent and Michael Dawson. Dynamically Compiled Artifact Sharing for Clouds
Daeyoun Kang, Tae Joon Jun, Dohyeun Kim, Jaewook Kim and Daeyoung Kim. ConVGPU: GPU Management Middleware in Container Based Virtualized Environment
Andrew Younge, Kevin Pedretti, Ryan Grant and Ron Brightwell. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters

Session 11: Emerging Architectures and Parallel Processing

Chair : Frank Mueller

Vicente Adolfo Bolea Sanchez, Wonbae Kim, Youngmoon Eom, Kibeom Jin, Moohyeon Nam, Deukyeon Hwang, Jik-Soo Kim and Beomseok Nam. EclipseMR: Distributed and Parallel Task Processing with Consistent Hashing
Reza Azimi, Tyler Fox and Sherief Reda. Understanding the Role of GPGPU-accelerated SoC-based ARM Clusters
(Short paper) Kun Tang, Devesh Tiwari, Saurabh Gupta, Sudharshan Vazhkudai and Xubin He. Effective Running of End-to-end HPC Workflows on Emerging Heterogeneous Architectures
(Short paper) Renan Fischer E Silva and Paul Carpenter. High Throughput and Low Latency on Hadoop Clusters using Explicit Congestion Notification: The Untold Truth
(Short paper) Hari Subramoni, Xiaoyi Lu and Dhabaleswar Panda. A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems

Session 12: Data Storage and Processing

Chair : Maria S. Perez

Houjun Tang, Suren Byna, Bin Dong, Jialin Liu and Quincey Koziol. SoMeta: Scalable Object-centric Metadata Management for High Performance Computing
Clement Mommessin, Matthieu Dreher, Bruno Raffin and Tom Peterka. Automatic Data Filtering for In Situ Workflows
(Short paper) Hongliang Li, Jie Wu, Zhen Jiang, Xiang Li and Xiaohui Wei. Task Allocation for Stream Processing with Recovery Latency Guarantee
(Short paper) Ashish Tapdiya, Yuan Xue and Daniel Fabbri. A Comparative Analysis of Materialized Views Selection and Concurrency Control Mechanisms in NoSQL Databases

Session 13: Scheduling

Chair : Sriram Krishnamoorthy <ul>

Jens Gustedt, Emmanuel Jeannot and Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement

Jens Breitbart, Simon Pickartz, Josef Weidendorfer, Stefan Lankes and Antonello Monti. Dynamic Co-scheduling Driven by Main Memory Bandwidth Utilization

</ul>

Session 14: Performance Profiling

Chair : Yves Robert

Niyazi Sorkunlu, Varun Chandola and Abani Patra. Tracking System Behavior from Resource Usage Data
David Boehme, David Beckingsale and Martin Schulz. Flexible Data Aggregation for Performance Profiling

Session 15: Leveraging Accelerated Systems

Chair : Misbah Mubarak

Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Akihiro Tabuchi, Taisuke Boku and Mitsuhisa Sato. Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster
Unnikrishnan Cheramangalath, Rupesh Nasre and Y N Srikant. DH-Falcon: A language for large-scale graph processing on Distributed Heterogeneous systems.
David Rohr and Volker Lindenstruth. Fast failure erasure encoding using just in time compilation for CPUs, GPUs, and FPGAs

Session 16: Fault Tolerance

Chair : Frederic Vivien

Omer Subasi, Sriram Krishnamoorthy and Gokcen Kestor. Toward A General Theory of Optimal Checkpoint Placement
Shuo Yang, Kai Wu, Yifan Qiao, Dong Li and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Li Han, Louis-Claude Canon, Henri Casanova, Yves Robert and Frédéric Vivien. Checkpointing Workflows for Fail-Stop Errors

Session 17: Numerical Methods and Libraries

Chair : Olga Pierce

Matt Martineau and Simon Mcintosh-Smith. Exploring on-node parallelism with neutral, a Monte Carlo neutral particle transport mini-app
Matthieu Dreher, Kiran Sasikumar, Subramanian Sankaranarayanan and Tom Peterka. Manala: a Flexible Flow Control Library for Asynchronous Task Communication
(Short paper) Balazs Nemeth, Tom Haber and Wim Lamotte. Distributed Affine-Invariant MCMC Sampler
(Short paper) Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera and Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers

Session 18: Programming and Systems Software

Chair : Sriram Krishnamoorthy

Yuping Fan, Paul Rich, William Allcock, Michael Papka and Zhiling Lan. Trade-off between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates
Pengfei Zou, Tyler Allen, Clauded Davis, Xizhou Feng and Rong Ge. CLIP: Cluster-Level Intelligent Power Coordination for Power-Bounded Systems
(Short paper) Tim Suess, Lars Nagel, Marc-Andre Vef, Andre Brinkmann, Dustin Feld and Thomas Soddemann. Pure Functions in C: A Small Keyword for Automatic Parallelization
(Short paper) Maruf Ahmed and Albert Zomaya. The Effect of Resource Allocation and System Events on the Consolidated Virtual Machines Performance

Session 19: Algorithms and Tools for I/O and Big Data Management

Chair : Naoya Maruyama

Jeremy Logan, Jong Choi, Matthew Wolf, George Ostrouchov, Lipeng Wan, Norbert Podhorszki, William Godoy, Erich Lohrmann, Greg Eisenhauer, Chad Wood, Kevin Huck and Scott Klasky. Extending Skel to support the development and optimization of next generation I/O systems
Xinyu Chen, Trilce Estrada and Jeremy Benson. keybin Key-based Binning for Distributed Clustering
(Short paper) Zhongqi An, Zhengyu Zhang and Qiang Li. Optimizing the Datapath for Key-value Middleware with NVMe SSDs over RDMA Interconnects
(Short paper) Jong Youl Choi, Jeremy Logan, Matthew Wolf, George Ostrouchov, Tahsin Kurc, Gary Liu, Norbert Podhorszki, Scott Klasky, Melissa Romanus, Qian Sun, Manish Parashar, Randy Michael Churchill and Choong-Seock Chang. TGE: Machine Learning Based Task Graph Embedding for Large-scale Topology Mapping

Session 20: Silent Data Corruption

Chair : Christian Engelmann

Pierre-Louis Guhur, Emil Constantinescu, Debojyoti Ghosh, Tom Peterka and Franck Cappello. Detection of Silent Data Corruption in Adaptive Numerical Integration Solvers
(Short paper) Scott Levy, Kurt Ferreira and Patrick Bridges. Detecting and Correcting Silent Corruption of Read-Mostly Application Data
(Short paper) Omer Subasi and Sriram Krishnamoorthy. A Gaussian Process Approach for Effective Soft Error Detection