Client Name: Prof. Paul Popelier, The University of Manchester
High End Compute has worked on a number of commissions with Professor Popelier, including submissions to ARCHER eCSE (for various optimisation/parallelisation opportunities) and help supervising post-docs (on ML of quantum chemistry functions). The most recent commissions have taken place since Summer 2018 and focussed on:
- mentoring postgraduate student, Ben Symons, during his optimisation and OpenMP parallelisation of FFLUX leading to a 60x speed-up
- improving the "MORFI" computational chemistry code, reducing run times 65- to 80-fold
Whilst these are separate codes, each contributes to the advancement of quantum chemical topology (QCT), and further details are given below. A further project is underway to optimise and parallelise IO on another element of the Popelier Group's suite of codes.
FFLUX is a biomolecular force field that stands apart from its peers by virtue of its unique methodology. FFLUX is based on the theoretical framework of quantum chemical topology (QCT) and utilises modern machine learning techniques in the form of kriging models. This union results in an atomistic force field that is fully polarisable, multipolar and flexible and, therefore, highly accurate. The use of machine learning, carefully validated and acting on pre-calculated atomic properties, lets FFLUX achieve accurate predictions at far less computational cost than the first-principles calculations it draws its information from. However, because its computational cost is higher than that of traditional force fields, the FFLUX code required significant reductions in run times. With HEC's support, Ben first optimised the code by careful consideration and restructuring, and then applied OpenMP parallelisation.
As per good practice, our parallelisation process implemented OpenMP at the coarsest level allowed by the science, to reduce the overheads of parallel region fork/join. Parallelising at this coarse level meant considering a very deep call tree of subroutines and functions, with large lexical and dynamic scopes and several hundred variables defined in a collection of Fortran 90 module files. The combination of careful (profile-guided) code optimisation and OpenMP parallelisation has resulted in a final speed-up, on 16 Skylake cores, of approximately 55-60 times compared to the unoptimised single-core implementation. In real terms, this is a huge success: simulations that previously took a month now take just half a day.
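The "month to half a day" figure follows directly from the quoted speed-up. The short Python sketch below checks that arithmetic. Treating "a month" as 30 days is an assumption, and the helper names are purely illustrative. It also shows that, if 16 cores give at most a 16x gain, the serial optimisation alone must account for at least 55/16 of the total.

```python
def days_after_speedup(days_before: float, speedup: float) -> float:
    """Wall-clock days remaining once a run is `speedup` times faster."""
    return days_before / speedup


def min_serial_gain(total_speedup: float, cores: int) -> float:
    """Lower bound on the single-core gain, assuming at most linear scaling."""
    return total_speedup / cores


MONTH_DAYS = 30.0  # ASSUMPTION: "a month" taken as 30 days
CORES = 16         # Skylake cores quoted in the text

# Range of speed-ups quoted in the text: approximately 55-60 times.
for speedup in (55, 60):
    days = days_after_speedup(MONTH_DAYS, speedup)
    print(f"{speedup}x: {days:.2f} days ({days * 24:.1f} hours)")

print(f"serial gain of at least {min_serial_gain(55, CORES):.1f}x")
```

Under these assumptions a 30-day run drops to between 12 and roughly 13 hours, and the code restructuring on its own must have delivered more than a threefold gain.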
Professor Popelier noted, "High End Compute Ltd's collaboration with the Popelier group on this project was very helpful. PhD student Ben Symons learnt a lot about recommended coding practices and parallelisation. Our in-house software FFLUX is now highly parallelised and much more efficient. Thanks to this support we now have a far quicker turnaround time for our simulations, which means we can focus on obtaining a host of scientific results, and get on with developing the methodology."
A paper discussing further details and full results, including on Intel Knights Landing (KNL), is in production.
Improving MORFI's Lot
Central to the research of Prof. Popelier's group is Quantum Chemical Topology (QCT), which is an imaginative, minimal and rigorous approach to extracting chemical information from quantum systems at the atomistic level.
Three strands of development emerge from the resulting atomic and bond properties:
- the construction of the novel force field FFLUX (via machine learning),
- the identification of subsystems that behave energetically like the total system (i.e. rigorous interpretative chemistry compatible with the underpinning quantum reality),
- and Linear Free Energy Relationships with an eye on bulk property prediction (pKa in aqueous solution, for example).
The need to improve atomistic biomolecular force fields remains acute. Fortunately, the abundance of contemporary computing power enables an overhaul of the architecture of current force fields. Taking advantage of recent advances in computing power, FFLUX is a fully polarizable, multipolar, atomistic next-generation force field that is being built from machine learning (i.e., kriging) models and quantum mechanics. FFLUX deals with dispersive interactions in a unique manner, as we obtain ab initio atomistic correlation energies through our homemade software MORFI. However, the evaluation of the two-particle density matrix, necessary to run our calculation, can be very expensive. To get atomic correlation energies in reasonable times, state-of-the-art hardware is not enough: highly optimized and parallelized software is also required.
How HEC Helped
MORFI is a FORTRAN code with statically allocated arrays, many of them four-dimensional.
HEC profiled the code and confirmed that a nested DO loop was taking over 95% of the run time for the provided examples. We examined further details and identified opportunities to reduce memory overhead generally and to parallelise these loops using OpenMP (there being a significant cost in copying large 4D arrays where their scope is PRIVATE). Working with the research group members, we identified potential parallelism at the outermost DO loop level, which we implemented by careful use of OpenMP, noting the need for various REDUCTION clauses. The final version has a run time of 69 minutes on Haswell compared to the original time of 56 hours on the same chip, with a parallel efficiency of 86% when using 23 of the available 24 cores.
The Popelier Research Group had previously, due to a mix of the memory requirements and compiling for a given set of nodes, been accustomed to a run time of 3 days and 3 hours on SandyBridge/IvyBridge architectures. Because HEC provided Makefile and batch scripts that first compile on the target compute node and then run there, the group has been able to use all of the available architectures in their cluster. The comparison of Haswell (best times with new code) to SandyBridge/IvyBridge (only previous choice) is a whopping 65x improvement (75 hours down to 69 minutes). Similar improvements have been seen for other test example sizes, with one example giving an 87 times improvement.
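The timings quoted in the two paragraphs above can be cross-checked with a few lines of Python. This is only an arithmetic sketch. In particular, the text does not say what baseline the 86% parallel efficiency is measured against, so the split between serial and parallel gains below assumes it is relative to the optimised code on a single thread.

```python
# Timings quoted in the text, converted to minutes.
original_haswell_min = 56 * 60             # original code on Haswell: 56 hours
final_haswell_min = 69                     # final code on Haswell: 69 minutes
previous_typical_min = (3 * 24 + 3) * 60   # 3 days 3 hours on SandyBridge/IvyBridge

same_chip_speedup = original_haswell_min / final_haswell_min
cross_arch_speedup = previous_typical_min / final_haswell_min

# ASSUMPTION: efficiency = (speed-up over optimised single-thread run) / threads.
threads = 23
efficiency = 0.86
parallel_speedup = efficiency * threads
implied_serial_gain = same_chip_speedup / parallel_speedup

print(f"same-chip speed-up:      {same_chip_speedup:.1f}x")
print(f"cross-architecture:      {cross_arch_speedup:.1f}x")
print(f"parallel part (assumed): {parallel_speedup:.1f}x")
print(f"implied serial gain:     {implied_serial_gain:.1f}x")
```

The ratios come out at roughly 49x on the same chip and 65x across architectures, the latter matching the figure quoted above. Under the stated assumption, about 20x would come from OpenMP and the remainder from the serial and memory optimisations.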
HEC also showed that the numerical results remained in agreement to at least 11 significant figures, across all architectures and numbers of OpenMP threads.
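Such a check matters because an OpenMP REDUCTION combines per-thread partial results, so the order of floating-point additions changes with the number of threads and bit-identical output is not guaranteed. The Python sketch below imitates this with chunked partial sums and estimates how many significant figures two results share. It is an illustration only, not HEC's validation procedure; the function names and the test series are invented for the example.

```python
import math


def agreeing_sig_figs(a: float, b: float) -> float:
    """Approximate number of significant figures on which a and b agree."""
    if a == b:
        return math.inf
    scale = max(abs(a), abs(b))
    return -math.log10(abs(a - b) / scale)


def chunked_sum(values, n_chunks):
    """Sum `values` as contiguous partial sums, then combine them.

    This mimics a reduction over n_chunks workers: each chunk is
    accumulated separately and the partial results are added at the end.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


# Hypothetical test data: a slowly converging series of 10,000 terms.
values = [1.0 / (k * k) for k in range(1, 10001)]
serial = chunked_sum(values, 1)

for n_threads in (2, 8, 23):
    digits = agreeing_sig_figs(serial, chunked_sum(values, n_threads))
    print(f"{n_threads:2d} chunks: agreement to about {digits:.1f} significant figures")
```

Comparing each parallel result against the single-chunk sum in this way gives a simple pass/fail criterion, for example requiring at least 11 agreeing figures as reported above.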
Feedback from Popelier Research Group
"Thanks to the work and support of HIGH END COMPUTE LTD, our homemade software is now highly parallelized and much less memory hungry. Thanks to these improvements, dispersion energies for much larger systems can be assessed and we are one step closer to our final goal: highly realistic biomolecular simulations using dispersion forces obtained through first principles."
“Molecular Simulation by knowledgeable Quantum Atoms”, P.L.A. Popelier, Phys. Scripta, 91, 033007 (16 pages) (2016).