Tentative Schedule

Technical Program

Workshops Main Conference
Tuesday Wednesday Thursday Friday
07 00
30 Registration Open
08 00
30 Workshops Keynote 1 :
Dr. Derek Chiou,
Microsoft and The University of Texas at Austin
Keynote 2 :
Dr. David Richards,
Lawrence Livermore National Laboratory
Keynote 3 :
Dr. Kate Keahey,
Argonne National Laboratory
09 00
30 Session 01 :
Best Papers 1
Session 08 :
Best Papers 2
Session 13 : Scheduling Session 14 : Performance Profiling
10 00
30 Coffee Break
11 00 Workshops Session 02 : Algorithms Session 03 : Big Data and HPC Session 09 : High-Performance Big Data Analytics Session 10 : Virtualisation Session 15 : Leveraging Accelerated Systems Session 16 : Fault Tolerance
30
12 00
30 Lunch
13 00
30
14 00 Workshops Session 04 : Performance and Energy Modeling and Analysis Session 05 : Resource Management and Runtime Systems Extended Lunch Session 17 : Numerical Methods and Libraries Session 18 : Programming and System Software
30
15 00
30 Coffee Break
16 00 Workshops Session 06 : Memory and Networks Session 07 : Virtualisation and I/O Session 11 : Emerging Architectures and Parallel Programming Session 12 : Data Storage and Processing Session 19 : Algorithms and Tools for I/O and Big Data Management Session 20 : Silent Data Corruption
30
17 00
30
18 00 Poster Reception Banquet
30
19 00
30

Room Assignments :

Workshops

Conference Rooms

Lanai
Honolulu
Kahuku
Floor Plan

Keynote 1 : Specializing Data Centers using Reconfigurable Logic

Speaker : Dr. Derek Chiou, Microsoft and The University of Texas at Austin
Chair : Rich Vuduc

Abstract : Introducing reconfigurable logic into data center servers provides both the benefits of specialized hardware and the convenience of homogeneous hardware. Placing an FPGA in the network path as well as attached to the server via PCIe enables an FPGA-centric computational model, in contrast to the CPU-centric computational model that pervades computing today. In an FPGA-centric model, the FPGA is the first to process each packet and only passes the packets it cannot handle to the CPU that acts as a complexity offload engine. Microsoft has deployed such an architecture throughout its cloud and implements a wide range of capabilities, including deep neural networks and software defined networking acceleration, on it. I will describe Microsoft’s Configurable Cloud, some cases of how it is used, and the resulting performance.

Bio : Derek Chiou is a Partner Architect at Microsoft where he leads the Azure Cloud Silicon team working on FPGAs and ASICs for data center applications and infrastructure and a researcher in the Electrical and Computer Engineering Department at The University of Texas at Austin. His research areas are FPGA acceleration, high-performance computer simulation, rapid system design, computer architecture, parallel computing, Internet router architecture, and network processors. Before going to UT, Dr. Chiou was a system architect and lead the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M. and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Keynote 2 : What is a Proxy App and Why Should I Care?

Speaker : Dr. David Richards, Lawrence Livermore National Laboratory
Chair : Todd Gamblin

Abstract : Proxy apps are small, nimble codes that can be used to represent larger applications in situations where it would be difficult or impossible to use the real application. Proxies are used to design and test new computer architectures, software technologies, and programming techniques, and will play a role in selection process of the exascale systems that will be purchased by the Department of Energy (DOE). This talk will discuss the role of proxy apps in DOE’s Exascale Computing Program (ECP) and provide examples of how proxies can be used (and misused) in co-design with vendors, research, and development. We will also describe the ECP Proxy App Project and show how that project can benefit anyone who wants to create or use proxy apps.

Keynote 3 : At the Crossroads of HPC and Big Data

Speaker : Dr. Kate Keahey, Argonne National Laboratory
Chair : Gabriel Antoniu

Abstract : Experiments, data, and computation have always been inextricably linked and are even more so today. Large experimental instruments, equipped with millions of sensors, and producing hundreds of terabytes of data per experiment will be used more efficiently if extended with a computational facility providing the scientist with ongoing insight into data. This relationship is becoming stronger as recently these sensors have left the lab and started multiplying at large: inexpensive and increasingly sophisticated sensor devices now allow scientists to instrument forests, oceans or cities turning our planet into an “instrument at large” and providing unprecedented opportunities in geophysical, environmental, and social sciences. All this is creating demand to process more data, faster, and produce results in a more timely fashion. This presentation will describe how emergent technology is creating potential for new avenues of exploration and how this potential is translated into new scientific applications – but also new infrastructure requirements and new ideas on how computing can support science. I will give examples of different approaches explored by various scientific application groups and discuss ideas on what we can do to catalyze change in tools and infrastructure – from specific solutions to changes in experimental approach – to support new modes of usage.

  • Full papers have a 30 minute-slot (25-minute talk + questions)
  • Short papers have a 15 minute-slot (10-minute talk + questions)

Sessions :


Session 1: Best Papers 1

Chair : Gabriel Antoniu

  • Tahsin Reza, Christine Klymko, Geoffrey Sanders, Roger Pearce and Matei Ripeanu. Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs
  • Sourav Chakraborty, Hari Subramoni and Dhabaleswar Panda. Contention-Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems
Session 2: Algorithms

Chair : Alexandru Costan

  • Willian Barreiros Jr., George Teodoro, Tahsin Kurc, Jun Kong, Alba Cristina M. A. Melo and Joel Saltz. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems
  • Xiaohui Duan, Kai Xu, Yuandong Chan, Christian Hundt, Bertil Schmidt, Pavan Balaji and Weiguo Liu. S-Aligner: Ultrascalable read mapping on Sunway Taihu Light
  • Bangtian Liu, Chengyao Wen, Anand D. Sarwate and Maryam Mehri Dehnavi. A Unified Optimization Approach for Sparse Tensor Operations on GPUs
Session 3: Big Data and HPC

Chair : Osamu Tatebe

  • Tao Lu, Eric Suchyta, Dave Pugmire, Jong Choi, Scott Klasky, Qing Liu, Norbert Podhorszki, Mark Ainsworth and Matthew Wolf. Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC Storage
  • Francois Tessier, Venkatram Vishwanath and Emmanuel Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers
  • (Short paper) Pierre Matri, Yevhen Alforov, Álvaro Brandon, Michael Kuhn, Philip Carns and Thomas Ludwig. Could Blobs Enable Storage-Based Convergence Between HPC and Big Data?
  • (Short paper) Orcun Yildiz, Amelie Chi Zhou and Shadi Ibrahim. Eley: On the Effectiveness of Burst Buffers for Big Data Processing in HPC systems
Session 4: Performance and Energy Modeling and Analysis

Chair : Olga Pearce

  • Kewen Meng and Boyana Norris. Mira: A Framework for Static Performance Analysis
  • Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie and Martin Quinson. Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node
  • An Huynh and Kenjiro Taura. Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems
Session 5: Resource Management and Runtime Systems

Chair : Dong Li

  • Harald Servat, Antonio J. Peña, Germán Llort, Estanislao Mercadal, Hans-Christian Hoppe and Jesus Labarta. Automating the Application Data Placement in Hybrid Memory Systems
  • Mohammadreza Hoseinyfarahabady, Albert Zomaya and Zahir Tari. Towards QoS-  Contention- Aware Resource Provisioning in Streaming Processing Engine
  • (Short paper) Xiang Ni, Nikhil Jain, Kavitha Chandrasekar and Laxmikant Kale. Runtime Techniques for Programming with Fast and Slow Memory
Session 6: Memory and Networks

Chair : Bronis R. de Supinski

  • Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang and Onur Mutlu. Utility-Based Hybrid Memory Management
  • Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems
  • Michihiro Koibuchi, Tomohiro Totoki, Hiroki Matsutani, Hideharu Amano, Fabien Chaix, Ikki Fujiwara and Henri Casanova. A Case for Uni-Directional Network Topologies in Large-Scale Clusters
  • (Short paper) Mauro Ianni, Alessandro Pellegrini and Francesco Quaglia. A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
Session 7: Visualization and I/O

Chair : Toni Cortes

  • Jianping Li, Misbah Mubarak, Kwan-Liu Ma, Robert Ross and Christopher Carothers. Visual Analytics Techniques for Exploring the Design Space of Large-Scale High-Radix Networks
  • Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Li, Nikhil Jain, Shane Snyder, Robert Ross, Abhinav Bhatele, Chris Carothers and Kwan-Liu Ma. Quantifying I/O and Communication Traffic Interference on Burst Buffer Equipped Dragonfly Networks
  • Shaomeng Li, Sudhanshu Sane, Leigh Orf, Pablo Mininni, John Clyne and Hank Child. Spatiotemporal Wavelet Compression for Visualization of Scientific Simulation Data
  • (Short paper) Hyungsoo Jung, Sooyong Kang, Hyuck Han, Hyeongwon Jang, Sang Youp Rhee and Jae Eun Kim. AutoBahn: Accelerating Concurrent, Durable File I/O via a Non-Volatile Buffer
Session 8: Best Papers 2

Chair : Rich Vuduc

  • Stratos Dimopoulos, Chandra Krintz and Rich Wolski. JUSTICE: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics
  • Jaehyun Han, Donghun Koo, Glenn K. Lockwood, Jaehwan Lee, Hyeonsang Eom and Soonwook Hwang. Accelerating a burst buffer via user-level I/O isolation.
Session 9: High Performance Big Data Analytics

Chair : Pierre Matri

  • Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong and Xiaokui Xiao. GraphH: High Performance Big Graph Analytics in Small Clusters
  • Sarat Sreepathi, Jitendra Kumar, Forrest Hoffman, Richard Mills, Vamsi Sripathi and William Hargrove. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers
  • Stefano Iannucci, Hisham A. Kholidy, Amrita Dhakal Ghimire, Rui Jia, Sherif Abdelwahed and Ioana Banicescu. A Comparison of Graph-Based Synthetic Data Generators for Benchmarking Next-Generation Intrusion Detection Systems
Session 10: Virtualization

Chair : Balazs Gerofi

  • Panagiotis Patros, Dayal Dilli, Kenneth Kent and Michael Dawson. Dynamically Compiled Artifact Sharing for Clouds
  • Daeyoun Kang, Tae Joon Jun, Dohyeun Kim, Jaewook Kim and Daeyoung Kim. ConVGPU: GPU Management Middleware in Container Based Virtualized Environment
  • Andrew Younge, Kevin Pedretti, Ryan Grant and Ron Brightwell. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters
Session 11: Emerging Architectures and Parallel Processing

Chair : Frank Mueller

  • Vicente Adolfo Bolea Sanchez, Wonbae Kim, Youngmoon Eom, Kibeom Jin, Moohyeon Nam, Deukyeon Hwang, Jik-Soo Kim and Beomseok Nam. EclipseMR: Distributed and Parallel Task Processing with Consistent Hashing
  • Reza Azimi, Tyler Fox and Sherief Reda. Understanding the Role of GPGPU-accelerated SoC-based ARM Clusters
  • (Short paper) Kun Tang, Devesh Tiwari, Saurabh Gupta, Sudharshan Vazhkudai and Xubin He. Effective Running of End-to-end HPC Workflows on Emerging Heterogeneous Architectures
  • (Short paper) Renan Fischer E Silva and Paul Carpenter. High Throughput and Low Latency on Hadoop Clusters using Explicit Congestion Notification: The Untold Truth
  • (Short paper) Hari Subramoni, Xiaoyi Lu and Dhabaleswar Panda. A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems
Session 12: Data Storage and Processing

Chair : Maria S. Perez

  • Houjun Tang, Suren Byna, Bin Dong, Jialin Liu and Quincey Koziol. SoMeta: Scalable Object-centric Metadata Management for High Performance Computing
  • Clement Mommessin, Matthieu Dreher, Bruno Raffin and Tom Peterka. Automatic Data Filtering for In Situ Workflows
  • (Short paper) Hongliang Li, Jie Wu, Zhen Jiang, Xiang Li and Xiaohui Wei. Task Allocation for Stream Processing with Recovery Latency Guarantee
  • (Short paper) Ashish Tapdiya, Yuan Xue and Daniel Fabbri. A Comparative Analysis of Materialized Views Selection and Concurrency Control Mechanisms in NoSQL Databases
Session 13: Scheduling

Chair : Sriram Krishnamoorthy <ul>

  • Jens Gustedt, Emmanuel Jeannot and Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement
  • Jens Breitbart, Simon Pickartz, Josef Weidendorfer, Stefan Lankes and Antonello Monti. Dynamic Co-scheduling Driven by Main Memory Bandwidth Utilization

    </ul>

    Session 14: Performance Profiling

    Chair : Yves Robert

    • Niyazi Sorkunlu, Varun Chandola and Abani Patra. Tracking System Behavior from Resource Usage Data
    • David Boehme, David Beckingsale and Martin Schulz. Flexible Data Aggregation for Performance Profiling
    Session 15: Leveraging Accelerated Systems

    Chair : Misbah Mubarak

    • Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Akihiro Tabuchi, Taisuke Boku and Mitsuhisa Sato. Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster
    • Unnikrishnan Cheramangalath, Rupesh Nasre and Y N Srikant. DH-Falcon: A language for large-scale graph processing on Distributed Heterogeneous systems.
    • David Rohr and Volker Lindenstruth. Fast failure erasure encoding using just in time compilation for CPUs, GPUs, and FPGAs
    Session 16: Fault Tolerance

    Chair : Frederic Vivien

    • Omer Subasi, Sriram Krishnamoorthy and Gokcen Kestor. Toward A General Theory of Optimal Checkpoint Placement
    • Shuo Yang, Kai Wu, Yifan Qiao, Dong Li and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
    • Li Han, Louis-Claude Canon, Henri Casanova, Yves Robert and Frédéric Vivien. Checkpointing Workflows for Fail-Stop Errors
    Session 17: Numerical Methods and Libraries

    Chair : Olga Pierce

    • Matt Martineau and Simon Mcintosh-Smith. Exploring on-node parallelism with neutral, a Monte Carlo neutral particle transport mini-app
    • Matthieu Dreher, Kiran Sasikumar, Subramanian Sankaranarayanan and Tom Peterka. Manala: a Flexible Flow Control Library for Asynchronous Task Communication
    • (Short paper) Balazs Nemeth, Tom Haber and Wim Lamotte. Distributed Affine-Invariant MCMC Sampler
    • (Short paper) Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera and Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers
    Session 18: Programming and Systems Software

    Chair : Sriram Krishnamoorthy

    • Yuping Fan, Paul Rich, William Allcock, Michael Papka and Zhiling Lan. Trade-off between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates
    • Pengfei Zou, Tyler Allen, Clauded Davis, Xizhou Feng and Rong Ge. CLIP: Cluster-Level Intelligent Power Coordination for Power-Bounded Systems
    • (Short paper) Tim Suess, Lars Nagel, Marc-Andre Vef, Andre Brinkmann, Dustin Feld and Thomas Soddemann. Pure Functions in C: A Small Keyword for Automatic Parallelization
    • (Short paper) Maruf Ahmed and Albert Zomaya. The Effect of Resource Allocation and System Events on the Consolidated Virtual Machines Performance
    Session 19: Algorithms and Tools for I/O and Big Data Management

    Chair : Naoya Maruyama

    • Jeremy Logan, Jong Choi, Matthew Wolf, George Ostrouchov, Lipeng Wan, Norbert Podhorszki, William Godoy, Erich Lohrmann, Greg Eisenhauer, Chad Wood, Kevin Huck and Scott Klasky. Extending Skel to support the development and optimization of next generation I/O systems
    • Xinyu Chen, Trilce Estrada and Jeremy Benson. keybin Key-based Binning for Distributed Clustering
    • (Short paper) Zhongqi An, Zhengyu Zhang and Qiang Li. Optimizing the Datapath for Key-value Middleware with NVMe SSDs over RDMA Interconnects
    • (Short paper) Jong Youl Choi, Jeremy Logan, Matthew Wolf, George Ostrouchov, Tahsin Kurc, Gary Liu, Norbert Podhorszki, Scott Klasky, Melissa Romanus, Qian Sun, Manish Parashar, Randy Michael Churchill and Choong-Seock Chang. TGE: Machine Learning Based Task Graph Embedding for Large-scale Topology Mapping
    Session 20: Silent Data Corruption

    Chair : Christian Engelmann

    • Pierre-Louis Guhur, Emil Constantinescu, Debojyoti Ghosh, Tom Peterka and Franck Cappello. Detection of Silent Data Corruption in Adaptive Numerical Integration Solvers
    • (Short paper) Scott Levy, Kurt Ferreira and Patrick Bridges. Detecting and Correcting Silent Corruption of Read-Mostly Application Data
    • (Short paper) Omer Subasi and Sriram Krishnamoorthy. A Gaussian Process Approach for Effective Soft Error Detection