The third Emerging Tech conference, #EMiT2016, was held in Barcelona at the beginning of June. All papers are available in the conference proceedings alongside the presentations on the conference web site. The tech I’ll focus on here falls into three areas: avenues of technological innovation, energy efficiency, and advances in techniques. (See our LinkedIn post for more about HEC's support in organising this international conference.)

For this year’s EMiT, there were presentations on pushing the boundaries of hardware that has now become mainstream, such as Margetts and Siso on Intel’s Knights Corner, and Duran on GPUs, along with a couple of talks (Moulinec and Meng, presented by Emerson) on the newly arrived IBM Power8 with NVIDIA GPUs (albeit without NVLink).

Contemporary work on FPGAs was conspicuously absent, perhaps reflecting the greater programming effort they demand. Having said that, the presenters in the final session of the conference were linked by their use of OpenCL, albeit as a platform for solutions of rather different granularities: Perez and the Maat library for load balancing OpenCL kernels, and Dafinoiu using OpenCL to expand the capabilities of an HTCondor (high throughput) computing cluster.

OpenCL and FPGAs also featured in Bild's presentation on the convergence of HPC and Big Data (see also Michèle Weiland's keynote, below), where she outlined Intel's plans for incorporating FPGAs (using OpenCL as the programming model). Bild also gave indicative results from Knights Landing (KNL) showing that its high-bandwidth memory (MCDRAM) offers roughly 5x faster access than regular memory.

The ~100 people at EMiT2016 also heard various speakers discuss hardware. Ramirez covered NVIDIA chipsets in embedded solutions, including the Drive-PX, which delivers an astounding 2.3 TFLOPS to enable autonomous driving. Keynote Estela Suarez discussed how the DEEP project successfully used a cluster (Xeons) plus booster (384x Xeon Phi Knights Corner (KNC)) model, offloading to the "booster" to accelerate time to solution; the KNCs have mesh-like comms via the Extoll network interface, giving a peak of 500 TFLOPS at 3.5 GFLOPS/Watt. DEEP-ER extended DEEP by embedding memory into the booster, specifically with Non-Volatile Memory on the Xeon Phi (KNL) nodes and Network Attached Memory as a node in the mesh-like comms.

Energy efficiency was an important strand running throughout the conference – very topical with Mont-Blanc as host. Calore discussed how time-to-solution and energy-to-solution vary with the clock frequencies of the CPU (or GPU) and its main memory (RAM), with a general trend that faster uses less energy, but highlighting interesting exceptions. The trend arises because a node draws a "baseline" current even when neither the CPU/GPU nor its memory is in use (due to leakage current in the various silicon on the node/motherboard), so the less time it takes to complete a solution, the less this baseline contributes to the overall energy draw. This point was also highlighted by keynote Michèle Weiland, EPCC, in her presentation on ADEPT and NextGenIO. Michèle outlined how the ADEPT project, with 5 partners from across Europe, aims to improve power and energy efficiency whilst maintaining performance, and EPCC's focus on out-of-band, high-resolution, high-accuracy measurements via their FPGA-based measurement board (and associated suite of benchmarks). Michèle also outlined how the NextGenIO project can enable HPDA (high performance data analysis) by co-design of next-gen memory (e.g. efficient use of 3D XPoint).
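The baseline-power argument above can be captured in a toy model: energy-to-solution is (static + dynamic) power multiplied by runtime, so a higher clock that shortens the run can more than pay for its extra dynamic power. The sketch below is illustrative only – all numbers are made up, and real dynamic power grows superlinearly with frequency rather than linearly as assumed here.

```python
def energy_to_solution(work_cycles, freq_ghz, p_static_w, dyn_w_per_ghz):
    """Toy energy model: static (leakage) power is paid for the whole
    run, dynamic power grows with clock frequency, and runtime shrinks
    as the clock rises. Returns energy in Joules."""
    runtime_s = work_cycles / (freq_ghz * 1e9)
    power_w = p_static_w + dyn_w_per_ghz * freq_ghz  # simplistic linear model
    return power_w * runtime_s

# Made-up workload: 1e12 cycles, 50 W static power, 20 W per GHz dynamic.
slow = energy_to_solution(1e12, 1.0, p_static_w=50, dyn_w_per_ghz=20)
fast = energy_to_solution(1e12, 2.0, p_static_w=50, dyn_w_per_ghz=20)
print(slow, fast)  # → 70000.0 45000.0: the faster clock uses less energy
```

With static power dominating, "racing to idle" wins here – which is the general trend Calore reported, while his exceptions correspond to cases where the extra dynamic power outweighs the time saved.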

There were a couple of talks focused on networking, with Perez looking at Network on Chip and showing OmpSs outperformed pthreads for network performance on selected cases from the PARSEC benchmark suite. Navaridas looked at the networking challenges faced by the router-centric SpiNNaker project, which has ~1 million cores for neural processing capabilities.

On the techniques side, William Sawyer, CSCS, outlined in his keynote how theoretical limits from memory and the CPU/GPU interact to give the roofline model (Williams) cap on achieved performance, before explaining that the real challenge is how to program Piz Daint and future exascale machines efficiently. Will postulated that this would require data-centric (aka task-based) programming models such as HPX and Legion, but pointed out that for longevity it would be the likes of C++ (perhaps with MPI) that won out, given HPC is only a small fraction of the larger IT market. On the second day, Lorena Barba, George Washington University, posed in her keynote a set of questions regarding how to be a "reproducible" programmer, from consideration of finite-precision floating-point arithmetic, through appropriate use of libraries (including checking their convergence criteria), to open access to all input data and final publications.
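The roofline cap Will described reduces to one line of arithmetic: attainable performance is the lesser of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch, using illustrative peak figures rather than Piz Daint's actual specs:

```python
def roofline(ai_flops_per_byte, peak_gflops, peak_bw_gbs):
    """Attainable GFLOP/s under the roofline model (Williams): capped
    either by peak compute or by memory bandwidth times arithmetic
    intensity, whichever bites first."""
    return min(peak_gflops, peak_bw_gbs * ai_flops_per_byte)

# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s bandwidth.
# The ridge point (where a kernel stops being memory-bound) sits at
# 1000 / 100 = 10 FLOPs/byte.
print(roofline(0.25, 1000, 100))  # → 25.0: memory-bound, e.g. a stencil
print(roofline(100, 1000, 100))   # → 1000: compute-bound, e.g. dense GEMM
```

Plotting attainable GFLOP/s against arithmetic intensity on log-log axes gives the familiar sloped-then-flat "roofline" shape that the model is named for.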

EMiT2016 had five keynotes. As well as Estela, Lorena, Michèle and William, whom we have already mentioned, there was Chris Adeniyi-Jones, ARM, who suggested that future tech would involve a holistic approach to chip design: considering an application's requirements (of CPU, memory, network) and balancing the energy footprint of these in order to produce a time- and energy-efficient solution – but (much as Will Sawyer raised with respect to software) the question is whether it is economical to produce the relatively low volumes that HPC would require. One emerging tech to keep an eye on from Chris' talk is CCIX (Cache Coherent Interconnect for Accelerators), an open framework for coherent sharing irrespective of ISA.