Will 2016 be the year high end compute changes?

The immediate questions are probably "How much faster will things get in 2016?" or perhaps "How will programming change in the next couple of years?" But one theme will become increasingly prominent: "How will we minimise the energy footprint of high-end compute?"

Known developments

NVIDIA

From the Pascal GPU range onwards, NVIDIA will offer "Open Power" and ARM64 options for use with NVLink. Although Maxwell is out in 2015, it would seem better to hold on to your money and wait to invest in Pascal.

KNL arrives on the motherboard

Knights Landing (KNL) has 60+ Atom cores, each low power in terms of both compute grunt and energy consumption. It will be interesting to see whether its peak performance deficit relative to the Pascal GPUs proves immaterial because KNL sits right next to the traditional CPU. KNL will be available both as a PCI-e card and as an on-board co-processor, and is capable of up to 3 TFLOPS in double precision.

Open Power

Through the Open Power Foundation, IBM is opening up the Power Architecture by licensing its chip technology to Foundation members. There seems to be some focus on "Web 2.0" and data analytics.

One of the partners is NVIDIA, allowing the use of IBM's Power8 chip with NVIDIA's own GPUs (see above). The communications channel between the two will be NVIDIA's NVLink. However, IBM is also developing CAPI (Coherent Accelerator Processor Interface) for linking Power8 to any flavour of accelerator over PCI-e. According to HPC Wire, IBM is "working with FPGA makers Xilinx and Altera to show the benefits of a hybrid setup running over the CAPI".

As an example of how the Open Power Foundation may work, NVIDIA will license its NVLink technology for use by members of the Foundation.

Things to watch out for

But perhaps more important than the questions above will be the year's answers to questions such as:

  • what will be the total energy needed to perform a given model simulation?
  • how much memory will be immediately available to the cores doing the compute?
  • what will be the latency and bandwidth from the compute cores to main RAM?
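
To make the first and last of these questions concrete, here is a rough back-of-envelope sketch in Python. It assumes energy to solution is simply average power multiplied by run time, and it uses a roofline-style bound (attainable rate is the smaller of peak compute and memory bandwidth times FLOPs per byte) as a stand-in for how memory bandwidth limits the cores. The function names and every number except the 3 TFLOPS peak quoted above are hypothetical, chosen only for illustration, not measurements of any real system.

```python
def energy_to_solution_kwh(avg_power_watts, runtime_seconds):
    """Energy used by a run in kWh, assuming constant average power.

    1 kWh = 3.6e6 joules, and joules = watts x seconds.
    """
    return avg_power_watts * runtime_seconds / 3.6e6


def attainable_tflops(peak_tflops, bandwidth_tb_per_s, flops_per_byte):
    """Roofline-style upper bound on the sustained compute rate.

    A kernel cannot run faster than the peak compute rate, nor faster
    than the rate at which memory can feed it:
    bandwidth (TB/s) x arithmetic intensity (FLOPs per byte).
    """
    return min(peak_tflops, bandwidth_tb_per_s * flops_per_byte)


# Hypothetical comparison: a frugal but slower device against a hungrier,
# faster one running the same model simulation.
frugal_kwh = energy_to_solution_kwh(avg_power_watts=200.0, runtime_seconds=7200.0)
hungry_kwh = energy_to_solution_kwh(avg_power_watts=300.0, runtime_seconds=3600.0)

# Hypothetical memory-bound kernel: 3 TFLOPS peak (the figure quoted for KNL),
# an assumed 0.4 TB/s memory bandwidth and 0.25 FLOPs per byte moved.
bound_tflops = attainable_tflops(peak_tflops=3.0,
                                 bandwidth_tb_per_s=0.4,
                                 flops_per_byte=0.25)

print(f"frugal device: {frugal_kwh:.2f} kWh")
print(f"hungry device: {hungry_kwh:.2f} kWh")
print(f"attainable rate: {bound_tflops:.2f} TFLOPS")
```

With these made-up figures the device drawing more power still uses less energy overall because it finishes sooner, and the memory-bound kernel reaches only a small fraction of peak. That is why time to solution and memory bandwidth matter as much as headline wattage or peak TFLOPS.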