Technical program

Accepted Papers

Full Papers

Area 1: Application, Algorithms, and Libraries

298 Willian Barreiros Jr., George Teodoro, Tahsin Kurc, Jun Kong, Alba Cristina M. A. Melo and Joel Saltz. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems

22 Xiaohui Duan, Kai Xu, Yuandong Chan, Christian Hundt, Bertil Schmidt, Pavan Balaji and Weiguo Liu. S-Aligner: Ultrascalable read mapping on Sunway Taihu Light

27 Tahsin Reza, Christine Klymko, Geoffrey Sanders, Roger Pearce and Matei Ripeanu. Towards Practical and Robust Labeled Pattern Matching in Trillion-Edge Graphs

40 Matthieu Dreher, Kiran Sasikumar, Subramanian Sankaranarayanan and Tom Peterka. Manala: a Flexible Flow Control Library for Asynchronous Task Communication

173 Franz C. Heinrich, Tom Cornebize, Augustin Degomme, Arnaud Legrand, Alexandra Carpen-Amarie, Sascha Hunold, Anne-Cécile Orgerie and Martin Quinson. Predicting the Energy-Consumption of MPI Applications at Scale Using Only a Single Node

267 David Boehme, David Beckingsale and Martin Schulz. Flexible Data Aggregation for Performance Profiling

54 Bangtian Liu, Chengyao Wen, Anand D. Sarwate and Maryam Mehri Dehnavi. A Unified Optimization Approach for Sparse Tensor Operations on GPUs

79 Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong and Xiaokui Xiao. GraphH: High Performance Big Graph Analytics in Small Clusters

200 Matt Martineau and Simon McIntosh-Smith. Exploring on-node parallelism with neutral, a Monte Carlo neutral particle transport mini-app

174 Jeremy Logan, Jong Choi, Matthew Wolf, George Ostrouchov, Lipeng Wan, Norbert Podhorszki, William Godoy, Erich Lohrmann, Greg Eisenhauer, Chad Wood, Kevin Huck and Scott Klasky. Extending Skel to support the development and optimization of next generation I/O systems

121 Sarat Sreepathi, Jitendra Kumar, Forrest Hoffman, Richard Mills, Vamsi Sripathi and William Hargrove. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

167 Kewen Meng and Boyana Norris. Mira: A Framework for Static Performance Analysis

217 Xinyu Chen, Trilce Estrada and Jeremy Benson. keybin: Key-based Binning for Distributed Clustering

127 Niyazi Sorkunlu, Varun Chandola and Abani Patra. Tracking System Behavior from Resource Usage Data

135 An Huynh and Kenjiro Taura. Delay Spotter: A Tool for Spotting Scheduler-Caused Delays in Task Parallel Runtime Systems

6 Stefano Iannucci, Hisham A. Kholidy, Amrita Dhakal Ghimire, Rui Jia, Sherif Abdelwahed and Ioana Banicescu. A Comparison of Graph-Based Synthetic Data Generators for Benchmarking Next-Generation Intrusion Detection Systems

286 David Rohr and Volker Lindenstruth. Fast failure erasure encoding using just in time compilation for CPUs, GPUs, and FPGAs

Area 2: Architecture, Network/Communications, and Management

133 Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang and Onur Mutlu. Utility-Based Hybrid Memory Management

229 Sourav Chakraborty, Hari Subramoni and Dhabaleswar Panda. Contention-Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems

30 Yingchao Huang and Dong Li. Performance Modeling for Optimal Data Placement on GPU with Heterogeneous Memory Systems

153 Michihiro Koibuchi, Tomohiro Totoki, Hiroki Matsutani, Hideharu Amano, Fabien Chaix, Ikki Fujiwara and Henri Casanova. A Case for Uni-Directional Network Topologies in Large-Scale Clusters

183 Vicente Adolfo Bolea Sanchez, Wonbae Kim, Youngmoon Eom, Kibeom Jin, Moohyeon Nam, Deukyeon Hwang, Jik-Soo Kim and Beomseok Nam. EclipseMR: Distributed and Parallel Task Processing with Consistent Hashing

226 Reza Azimi, Tyler Fox and Sherief Reda. Understanding the Role of GPGPU-accelerated SoC-based ARM Clusters

Area 3: Programming and System Software

120 Stratos Dimopoulos, Chandra Krintz and Rich Wolski. JUSTICE: A Deadline-aware, Fair-share Resource Allocator for Implementing Multi-analytics

44 Harald Servat, Antonio J. Peña, Germán Llort, Estanislao Mercadal, Hans-Christian Hoppe and Jesus Labarta. Automating the Application Data Placement in Hybrid Memory Systems

107 Panagiotis Patros, Dayal Dilli, Kenneth Kent and Michael Dawson. Dynamically Compiled Artifact Sharing for Clouds

140 Yuping Fan, Paul Rich, William Allcock, Michael Papka and Zhiling Lan. Trade-off between Prediction Accuracy and Underestimation Rate in Job Runtime Estimates

158 Daeyoun Kang, Tae Joon Jun, Dohyeun Kim, Jaewook Kim and Daeyoung Kim. ConVGPU: GPU Management Middleware in Container Based Virtualized Environment

213 Pierre-Louis Guhur, Emil Constantinescu, Debojyoti Ghosh, Tom Peterka and Franck Cappello. Detection of Silent Data Corruption in Adaptive Numerical Integration Solvers

32 Shuo Yang, Kai Wu, Yifan Qiao, Dong Li and Jidong Zhai. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

34 Mohammadreza Hoseinyfarahabady, Albert Zomaya and Zahir Tari. Towards QoS- and Contention-Aware Resource Provisioning in Streaming Processing Engine

141 Masahiro Nakao, Hitoshi Murai, Hidetoshi Iwashita, Akihiro Tabuchi, Taisuke Boku and Mitsuhisa Sato. Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster

155 Unnikrishnan Cheramangalath, Rupesh Nasre and Y N Srikant. DH-Falcon: A language for large-scale graph processing on Distributed Heterogeneous systems

161 Andrew Younge, Kevin Pedretti, Ryan Grant and Ron Brightwell. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters

297 Omer Subasi, Sriram Krishnamoorthy and Gokcen Kestor. Toward A General Theory of Optimal Checkpoint Placement

310 Jens Breitbart, Simon Pickartz, Josef Weidendorfer, Stefan Lankes and Antonello Monti. Dynamic Co-scheduling Driven by Main Memory Bandwidth Utilization

36 Li Han, Louis-Claude Canon, Henri Casanova, Yves Robert and Frédéric Vivien. Checkpointing Workflows for Fail-Stop Errors

278 Jens Gustedt, Emmanuel Jeannot and Farouk Mansouri. Automatic, Abstracted and Portable Topology-Aware Thread Placement

292 Pengfei Zou, Tyler Allen, Claude Davis, Xizhou Feng and Rong Ge. CLIP: Cluster-Level Intelligent Power Coordination for Power-Bounded Systems

Area 4: Data, Storage, and Visualization

103 Jaehyun Han, Donghun Koo, Glenn K. Lockwood, Jaehwan Lee, Hyeonsang Eom and Soonwook Hwang. Accelerating a burst buffer via user-level I/O isolation

109 Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Li, Nikhil Jain, Shane Snyder, Robert Ross, Abhinav Bhatele, Chris Carothers and Kwan-Liu Ma. Quantifying I/O and Communication Traffic Interference on Burst Buffer Equipped Dragonfly Networks

152 Francois Tessier, Venkatram Vishwanath and Emmanuel Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers

129 Jianping Li, Misbah Mubarak, Kwan-Liu Ma, Robert Ross and Christopher Carothers. Visual Analytics Techniques for Exploring the Design Space of Large-Scale High-Radix Networks

24 Tao Lu, Eric Suchyta, Dave Pugmire, Jong Choi, Scott Klasky, Qing Liu, Norbert Podhorszki, Mark Ainsworth and Matthew Wolf. Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC Storage

72 Shaomeng Li, Sudhanshu Sane, Leigh Orf, Pablo Mininni, John Clyne and Hank Childs. Spatiotemporal Wavelet Compression for Visualization of Scientific Simulation Data

249 Houjun Tang, Suren Byna, Bin Dong, Jialin Liu and Quincey Koziol. SoMeta: Scalable Object-centric Metadata Management for High Performance Computing

196 Clement Mommessin, Matthieu Dreher, Bruno Raffin and Tom Peterka. Automatic Data Filtering for In Situ Workflows

Short Papers

Area 1: Application, Algorithms, and Libraries

37 Balazs Nemeth, Tom Haber and Wim Lamotte. Distributed Affine-Invariant MCMC Sampler

102 Zhongqi An, Zhengyu Zhang and Qiang Li. Optimizing the Datapath for Key-value Middleware with NVMe SSDs over RDMA Interconnects

142 Takashi Shimokawabe, Toshio Endo, Naoyuki Onodera and Takayuki Aoki. A Stencil Framework to Realize Large-scale Computations Beyond Device Memory Capacity on GPU Supercomputers

206 Jong Youl Choi, Jeremy Logan, Matthew Wolf, George Ostrouchov, Tahsin Kurc, Gary Liu, Norbert Podhorszki, Scott Klasky, Melissa Romanus, Qian Sun, Manish Parashar, Randy Michael Churchill and Choong-Seock Chang. TGE: Machine Learning Based Task Graph Embedding for Large-scale Topology Mapping

Area 2: Architecture, Network/Communications, and Management

280 Mauro Ianni, Alessandro Pellegrini and Francesco Quaglia. A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines

67 Kun Tang, Devesh Tiwari, Saurabh Gupta, Sudharshan Vazhkudai and Xubin He. Effective Running of End-to-end HPC Workflows on Emerging Heterogeneous Architectures

190 Renan Fischer E Silva and Paul Carpenter. High Throughput and Low Latency on Hadoop Clusters using Explicit Congestion Notification: The Untold Truth

266 Hari Subramoni, Xiaoyi Lu and Dhabaleswar Panda. A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems

Area 3: Programming and System Software

26 Omer Subasi and Sriram Krishnamoorthy. A Gaussian Process Approach for Effective Soft Error Detection

97 Tim Suess, Lars Nagel, Marc-Andre Vef, Andre Brinkmann, Dustin Feld and Thomas Soddemann. Pure Functions in C: A Small Keyword for Automatic Parallelization

268 Xiang Ni, Nikhil Jain, Kavitha Chandrasekar and Laxmikant Kale. Runtime Techniques for Programming with Fast and Slow Memory

4 Maruf Ahmed and Albert Zomaya. The Effect of Resource Allocation and System Events on the Consolidated Virtual Machines Performance

228 Scott Levy, Kurt Ferreira and Patrick Bridges. Detecting and Correcting Silent Corruption of Read-Mostly Application Data

180 Nathan Hjelm, Matthew Dosanjh, Taylor Groves, Ryan E. Grant, Ron Brightwell, Patrick Bridges and Dorian Arnold. MPI Multi-threaded RMA Communication Performance for Next-Generation Applications

Area 4: Data, Storage, and Visualization

18 Pierre Matri, Yevhen Alforov, Álvaro Brandon, Michael Kuhn, Philip Carns and Thomas Ludwig. Could Blobs Enable Storage-Based Convergence Between HPC and Big Data?

88 Hyungsoo Jung, Sooyong Kang, Hyuck Han, Hyeongwon Jang, Sang Youp Rhee and Jae Eun Kim. AutoBahn: Accelerating Concurrent, Durable File I/O via a Non-Volatile Buffer

271 Orcun Yildiz, Amelie Chi Zhou and Shadi Ibrahim. Eley: On the Effectiveness of Burst Buffers for Big Data Processing in HPC systems

46 Hongliang Li, Jie Wu, Zhen Jiang, Xiang Li and Xiaohui Wei. Task Allocation for Stream Processing with Recovery Latency Guarantee

82 Ashish Tapdiya, Yuan Xue and Daniel Fabbri. A Comparative Analysis of Materialized Views Selection and Concurrency Control Mechanisms in NoSQL Databases

137 Hui Sun, Wei Liu, Weisong Shi and Jianzhong Huang. COL-KV: A Collaborative Key-Value Store Using Near-Data Processing

Program coming soon...