Tentative Schedule

Technical Program

Workshops take place on Tuesday; the main conference runs from Wednesday to Friday. Where two parallel sessions are listed for a day, the first is in the Honolulu room and the second in the Kahuku room.

07:30  Registration Open

08:30  Tuesday: Workshop
       Wednesday: Keynote 1 * : Dr. Derek Chiou, Microsoft and The University of Texas at Austin
       Thursday: Keynote 2 * : Dr. David Richards, Lawrence Livermore National Laboratory
       Friday: Keynote 3 * : Dr. Kate Keahey, Argonne National Laboratory

09:30  Wednesday: Session 01 * : Best Papers 1
       Thursday: Session 08 * : Best Papers 2
       Friday: Session 13 : Scheduling; Session 14 : Performance Profiling

10:30  Coffee Break

11:00  Tuesday: Workshop
       Wednesday: Session 02 : Algorithms; Session 03 : Big Data and HPC
       Thursday: Session 09 : High-Performance Big Data Analytics; Session 10 : Virtualization
       Friday: Session 15 : Leveraging Accelerated Systems; Session 16 : Fault Tolerance

12:30  Lunch

14:00  Tuesday: Workshop
       Wednesday: Session 04 : Performance and Energy Modeling and Analysis; Session 05 : Resource Management and Runtime Systems
       Thursday: Panel *
       Friday: Session 17 : Numerical Methods and Libraries; Session 18 : Programming and System Software

15:30  Coffee Break

16:00  Tuesday: Workshop
       Wednesday: Session 06 : Memory and Networks; Session 07 : Visualization and I/O
       Thursday: Session 11 : Emerging Architectures and Parallel Programming; Session 12 : Data Storage and Processing
       Friday: Session 19 : Algorithms and Tools for I/O and Big Data Management; Session 20 : Silent Data Corruption

18:00  Poster Reception * Banquet *

(*) These sessions are held in the Lanai room.

Keynote 1 : Specializing Data Centers using Reconfigurable Logic

Speaker : Dr. Derek Chiou, Microsoft and The University of Texas at Austin


Abstract : Introducing reconfigurable logic into data center servers provides both the benefits of specialized hardware and the convenience of homogeneous hardware. Placing an FPGA in the network path, as well as attaching it to the server via PCIe, enables an FPGA-centric computational model, in contrast to the CPU-centric computational model that pervades computing today. In an FPGA-centric model, the FPGA is the first to process each packet and passes only the packets it cannot handle to the CPU, which acts as a complexity offload engine. Microsoft has deployed such an architecture throughout its cloud and implements a wide range of capabilities on it, including deep neural networks and software-defined networking acceleration. I will describe Microsoft’s Configurable Cloud, some cases of how it is used, and the resulting performance.

Bio : Derek Chiou is a Partner Architect at Microsoft, where he leads the Azure Cloud Silicon team working on FPGAs and ASICs for data center applications and infrastructure, and a researcher in the Electrical and Computer Engineering Department at The University of Texas at Austin. His research areas are FPGA acceleration, high-performance computer simulation, rapid system design, computer architecture, parallel computing, Internet router architecture, and network processors. Before going to UT, Dr. Chiou was a system architect and led the performance modeling team at Avici Systems, a manufacturer of terabit core routers. Dr. Chiou received his Ph.D., S.M., and S.B. degrees in Electrical Engineering and Computer Science from MIT.

Keynote 2 : Dr. David Richards, Lawrence Livermore National Laboratory

Speaker : Dr. David Richards, Lawrence Livermore National Laboratory


Abstract : TBD.

Keynote 3 : At the Crossroads of HPC and Big Data

Speaker : Dr. Kate Keahey, Argonne National Laboratory


Abstract : Experiments, data, and computation have always been inextricably linked and are even more so today. Large experimental instruments, equipped with millions of sensors, and producing hundreds of terabytes of data per experiment will be used more efficiently if extended with a computational facility providing the scientist with ongoing insight into data. This relationship is becoming stronger as recently these sensors have left the lab and started multiplying at large: inexpensive and increasingly sophisticated sensor devices now allow scientists to instrument forests, oceans or cities turning our planet into an “instrument at large” and providing unprecedented opportunities in geophysical, environmental, and social sciences. All this is creating demand to process more data, faster, and produce results in a more timely fashion. This presentation will describe how emergent technology is creating potential for new avenues of exploration and how this potential is translated into new scientific applications – but also new infrastructure requirements and new ideas on how computing can support science. I will give examples of different approaches explored by various scientific application groups and discuss ideas on what we can do to catalyze change in tools and infrastructure – from specific solutions to changes in experimental approach – to support new modes of usage.

Sessions :

Session 1: Best Papers 1 (Areas 1 and 2) :

   Tahsin Reza, Christine Klymko, Geoffrey Sanders, Roger Pearce and Matei Ripeanu. Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs 

  Sourav Chakraborty, Hari Subramoni and Dhabaleswar Panda. Contention-Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems   

Session 2: Algorithms (3 Full papers) :

   Willian Barreiros Jr., George Teodoro, Tahsin Kurc, Jun Kong, Alba Cristina M. A. Melo and Joel Saltz. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems    

   Xiaohui Duan, Kai Xu, Yuandong Chan, Christian Hundt, Bertil Schmidt, Pavan Balaji and Weiguo Liu. S-Aligner: Ultrascalable read mapping on Sunway Taihu Light    

   Bangtian Liu, Chengyao Wen, Anand D. Sarwate and Maryam Mehri Dehnavi. A Unified Optimization Approach for Sparse Tensor Operations on GPUs    

Session 3: Big Data and HPC (2 Full and 2 Short papers) :

  Tao Lu, Eric Suchyta, Dave Pugmire, Jong Choi, Scott Klasky, Qing Liu, Norbert Podhorszki, Mark Ainsworth and Matthew Wolf. Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC Storage    

  Francois Tessier, Venkatram Vishwanath and Emmanuel Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers    

   Pierre Matri, Yevhen Alforov, Álvaro Brandon, Michael Kuhn, Philip Carns and Thomas Ludwig. Could Blobs Enable Storage-Based Convergence Between HPC and Big Data? 

  Orcun Yildiz, Amelie Chi Zhou and Shadi Ibrahim. Eley: On the Effectiveness of Burst Buffers for Big Data Processing in HPC systems

Session 4: Performance and Energy Modeling and Analysis (3 Full papers) :

   Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie and Martin Quinson. Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node

   Kewen Meng and Boyana Norris. Mira: A Framework for Static Performance Analysis

   An Huynh and Kenjiro Taura. Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems    

Session 5: Resource Management and Runtime Systems (2 Full and 1 Short papers) :

  Harald Servat, Antonio J. Peña, Germán Llort, Estanislao Mercadal, Hans-Christian Hoppe and Jesus Labarta. Automating the Application Data Placement in Hybrid Memory Systems  

   Mohammadreza Hoseinyfarahabady, Albert Zomaya and Zahir Tari. Towards QoS- and Contention-Aware Resource Provisioning in Streaming Processing Engine

   Xiang Ni, Nikhil Jain, Kavitha Chandrasekar and Laxmikant Kale. Runtime Techniques for Programming with Fast and Slow Memory 

Session 6: Memory and Networks (3 Full and 1 Short papers) :

   Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang and Onur Mutlu. Utility-Based Hybrid Memory Management    

   Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems     

   Michihiro Koibuchi, Tomohiro Totoki, Hiroki Matsutani, Hideharu Amano, Fabien Chaix, Ikki Fujiwara and Henri Casanova. A Case for Uni-Directional Network Topologies in Large-Scale Clusters    

   Mauro Ianni, Alessandro Pellegrini and Francesco Quaglia. A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines       

Session 7: Visualization and I/O (3 Full and 1 Short papers) :

   Jianping Li, Misbah Mubarak, Kwan-Liu Ma, Robert Ross and Christopher Carothers. Visual Analytics Techniques for Exploring the Design Space of Large-Scale High-Radix Networks    

  Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Li, Nikhil Jain, Shane Snyder, Robert Ross, Abhinav Bhatele, Chris Carothers and Kwan-Liu Ma. Quantifying I/O and Communication Traffic Interference on Burst Buffer Equipped Dragonfly Networks

   Shaomeng Li, Sudhanshu Sane, Leigh Orf, Pablo Mininni, John Clyne and Hank Childs. Spatiotemporal Wavelet Compression for Visualization of Scientific Simulation Data

   Hyungsoo Jung, Sooyong Kang, Hyuck Han, Hyeongwon Jang, Sang Youp Rhee and Jae Eun Kim. AutoBahn: Accelerating Concurrent, Durable File I/O via a Non-Volatile Buffer   

Session 8: Best Papers 2 (Areas 3 and 4) :

   Stratos Dimopoulos, Chandra Krintz and Rich Wolski. JUSTICE: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics    

  Jaehyun Han, Donghun Koo, Glenn K. Lockwood, Jaehwan Lee, Hyeonsang Eom and Soonwook Hwang. Accelerating a burst buffer via user-level I/O isolation

Session 9: High Performance Big Data Analytics (3 Full papers) :

   Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong and Xiaokui Xiao. GraphH: High Performance Big Graph Analytics in Small Clusters    

   Sarat Sreepathi, Jitendra Kumar, Forrest Hoffman, Richard Mills, Vamsi Sripathi and William Hargrove. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers    

   Stefano Iannucci, Hisham A. Kholidy, Amrita Dhakal Ghimire, Rui Jia, Sherif Abdelwahed and Ioana Banicescu. A Comparison of Graph-Based Synthetic Data Generators for Benchmarking Next-Generation Intrusion Detection Systems

Session 10: Virtualization (3 Full papers) :

   Panagiotis Patros, Dayal Dilli, Kenneth Kent and Michael Dawson. Dynamically Compiled Artifact Sharing for Clouds   

   Daeyoun Kang, Tae Joon Jun, Dohyeun Kim, Jaewook Kim and Daeyoung Kim. ConVGPU: GPU Management Middleware in Container Based Virtualized Environment   

   Andrew Younge, Kevin Pedretti, Ryan Grant and Ron Brightwell. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters    

Session 11: Emerging Architectures and Parallel Processing (2 Full and 3 Short papers) :

   Vicente Adolfo Bolea Sanchez, Wonbae Kim, Youngmoon Eom, Kibeom Jin, Moohyeon Nam, Deukyeon Hwang, Jik-Soo Kim and Beomseok Nam. EclipseMR: Distributed and Parallel Task Processing with Consistent Hashing   

   Reza Azimi, Tyler Fox and Sherief Reda. Understanding the Role of GPGPU-accelerated SoC-based ARM Clusters    

   Kun Tang, Devesh Tiwari, Saurabh Gupta, Sudharshan Vazhkudai and Xubin He. Effective Running of End-to-end HPC Workflows on Emerging Heterogeneous Architectures   

   Renan Fischer E Silva and Paul Carpenter. High Throughput and Low Latency on Hadoop Clusters using Explicit Congestion Notification: The Untold Truth    

   Hari Subramoni, Xiaoyi Lu and Dhabaleswar Panda. A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems      

Session 12: Data Storage and Processing (2 Full and 3 Short papers) :

   Houjun Tang, Suren Byna, Bin Dong, Jialin Liu and Quincey Koziol. SoMeta: Scalable Object-centric Metadata Management for High Performance Computing  

   Clement Mommessin, Matthieu Dreher, Bruno Raffin and Tom Peterka. Automatic Data Filtering for In Situ Workflows      

   Hongliang Li, Jie Wu, Zhen Jiang, Xiang Li and Xiaohui Wei. Task Allocation for Stream Processing with Recovery Latency Guarantee 

   Ashish Tapdiya, Yuan Xue and Daniel Fabbri. A Comparative Analysis of Materialized Views Selection and Concurrency Control Mechanisms in NoSQL Databases    

  Hui Sun, Wei Liu, Weisong Shi and Jianzhong Huang. COL-KV: A Collaborative Key-Value Store Using Near-Data Processing   

Session 13: Scheduling (2 Full papers) :

  Jens Gustedt, Emmanuel Jeannot and Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement   

   Jens Breitbart, Simon Pickartz, Josef Weidendorfer, Stefan Lankes and Antonello Monti. Dynamic Co-scheduling Driven by Main Memory Bandwidth Utilization   

Session 14: Performance Profiling (2 Full papers) :

   Niyazi Sorkunlu, Varun Chandola and Abani Patra. Tracking System Behavior from Resource Usage Data    

   David Boehme, David Beckingsale and Martin Schulz. Flexible Data Aggregation for Performance Profiling        

Session 15: Leveraging Accelerated Systems (3 Full papers) :

  Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Akihiro Tabuchi, Taisuke Boku and Mitsuhisa Sato. Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster    

   Unnikrishnan Cheramangalath, Rupesh Nasre and Y N Srikant. DH-Falcon: A language for large-scale graph processing on Distributed Heterogeneous systems

  David Rohr and Volker Lindenstruth. Fast failure erasure encoding using just in time compilation for CPUs, GPUs, and FPGAs    

Session 16: Fault Tolerance (3 Full papers) :

   Omer Subasi, Sriram Krishnamoorthy and Gokcen Kestor. Toward A General Theory of Optimal Checkpoint Placement  

   Shuo Yang, Kai Wu, Yifan Qiao, Dong Li and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC    

   Li Han, Louis-Claude Canon, Henri Casanova, Yves Robert and Frédéric Vivien. Checkpointing Workflows for Fail-Stop Errors  

Session 17: Numerical Methods and Libraries (2 Full and 2 Short papers) :

   Matt Martineau and Simon McIntosh-Smith. Exploring on-node parallelism with neutral, a Monte Carlo neutral particle transport mini-app

   Matthieu Dreher, Kiran Sasikumar, Subramanian Sankaranarayanan and Tom Peterka. Manala: a Flexible Flow Control Library for Asynchronous Task Communication   

   Balazs Nemeth, Tom Haber and Wim Lamotte. Distributed Affine-Invariant MCMC Sampler    

   Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera and Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers    

Session 18: Programming and Systems Software (2 Full and 2 Short papers) :

   Yuping Fan, Paul Rich, William Allcock, Michael Papka and Zhiling Lan. Trade-off between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates    

   Pengfei Zou, Tyler Allen, Claude Davis, Xizhou Feng and Rong Ge. CLIP: Cluster-Level Intelligent Power Coordination for Power-Bounded Systems

  Tim Suess, Lars Nagel, Marc-Andre Vef, Andre Brinkmann, Dustin Feld and Thomas Soddemann. Pure Functions in C: A Small Keyword for Automatic Parallelization   

   Maruf Ahmed and Albert Zomaya. The Effect of Resource Allocation and System Events on the Consolidated Virtual Machines Performance

Session 19: Algorithms and Tools for I/O and Big Data Management (2 Full and 2 Short papers) :

   Jeremy Logan, Jong Choi, Matthew Wolf, George Ostrouchov, Lipeng Wan, Norbert Podhorszki, William Godoy, Erich Lohrmann, Greg Eisenhauer, Chad Wood, Kevin Huck and Scott Klasky. Extending Skel to support the development and optimization of next generation I/O systems   

   Xinyu Chen, Trilce Estrada and Jeremy Benson. keybin: Key-based Binning for Distributed Clustering

   Zhongqi An, Zhengyu Zhang and Qiang Li. Optimizing the Datapath for Key-value Middleware with NVMe SSDs over RDMA Interconnects   

   Jong Youl Choi, Jeremy Logan, Matthew Wolf, George Ostrouchov, Tahsin Kurc, Gary Liu, Norbert Podhorszki, Scott Klasky, Melissa Romanus, Qian Sun, Manish Parashar, Randy Michael Churchill and Choong-Seock Chang. TGE: Machine Learning Based Task Graph Embedding for Large-scale Topology Mapping   

Session 20: Silent Data Corruption (1 Full and 2 Short papers) :

   Pierre-Louis Guhur, Emil Constantinescu, Debojyoti Ghosh, Tom Peterka and Franck Cappello. Detection of Silent Data Corruption in Adaptive Numerical Integration Solvers   

   Scott Levy, Kurt Ferreira and Patrick Bridges. Detecting and Correcting Silent Corruption of Read-Mostly Application Data 

   Omer Subasi and Sriram Krishnamoorthy. A Gaussian Process Approach for Effective Soft Error Detection   

Workshops :

  • Rev-A : Waianae Room
  • FTS : Honolulu Room
  • HPCMASPA : Kahuku Room
  • WRAp : Oahu Room
  • DOE/MEXT : Waialua Room