Performance Tools

Our top tips on using analysis and debugging tools. Analysis typically means profiling and tracing, but we believe it also includes the use of compiler reports and other sources of information. (Also see our Wikipedia entry on HPC Benchmarks.)

Benchmarking Systems

Methodology

Compilers

Although often overlooked, compilers are very powerful tools for programmers.

  • Always use the latest stable compiler version available; for example, going from ifort v12 to ifort v14 gave about a 5% improvement in performance
  • Most compilers today provide reports on optimisations and shared-memory parallelisation. For example, with Intel compilers, check out -opt-report and -par-report
  • gap analysis and feedback-guided optimisation
  • array bounds checking, which is useful when debugging but usually too costly to leave enabled in production builds

Vectorise!

All modern microprocessors have vector (SIMD) units. For example, Intel processors that support AVX-512 have 512-bit wide vector units. Making good use of these units is essential for high performance.
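To make this concrete, here is a minimal C sketch (the function names are hypothetical). The first loop has independent iterations and is a typical candidate for auto-vectorisation; the second has a loop-carried dependence, which generally prevents the compiler from vectorising it as written. The compiler's optimisation report is the way to confirm which loops were actually vectorised.

```c
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and y[i].
 * 'restrict' promises the compiler that x and y do not overlap,
 * removing an aliasing obstacle to vectorisation. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: each iteration needs the result of the
 * previous one, so this prefix-sum loop is generally not
 * auto-vectorised as written. */
void prefix_sum(size_t n, float *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Comparing the report entries for these two loops is a quick way to learn how your compiler explains a missed vectorisation.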

Intel compiler vectorisation

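It is important to use the relevant -x or -ax option for the target processor (-x generates code for that target only, while -ax adds a target-specific code path alongside a generic baseline). For example, for AVX-512: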

  • COMMON-AVX512: the 'base' vectorisation target for Intel AVX-512 processors
  • MIC-AVX512: for Knights Landing (and successors); includes COMMON-AVX512 plus prefetch, FP exponential and reciprocal vector instructions for KNL
  • CORE-AVX512: for Intel Xeon CPUs with AVX-512; includes COMMON-AVX512 plus additional integer, byte and word vector instructions
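The targets above differ in which AVX-512 subsets they enable: COMMON-AVX512 covers Foundation (F) and Conflict Detection (CD), MIC-AVX512 adds Exponential/Reciprocal (ER) and Prefetch (PF), and CORE-AVX512 adds Byte/Word (BW), Doubleword/Quadword (DQ) and Vector Length (VL). The minimal C sketch below checks which of these a build actually targets. It assumes the compiler defines the usual feature macros (__AVX512F__ and friends) for its target; GCC and Clang do, and Intel compilers are expected to, but check your compiler's documentation. All type and function names are hypothetical. These macros describe a single compilation only; they say nothing about which code path an -ax build selects at run time.

```c
#include <stdbool.h>

/* AVX-512 subsets a build may target (hypothetical helper type). */
struct avx512_features {
    bool f;  /* AVX512F: Foundation */
    bool cd; /* AVX512CD: Conflict Detection */
    bool er; /* AVX512ER: Exponential and Reciprocal (KNL) */
    bool pf; /* AVX512PF: Prefetch (KNL) */
    bool bw; /* AVX512BW: Byte and Word */
    bool dq; /* AVX512DQ: Doubleword and Quadword */
    bool vl; /* AVX512VL: Vector Length extensions */
};

enum avx512_target { AVX512_NONE, AVX512_COMMON, AVX512_MIC, AVX512_CORE };

/* Map a feature set onto the Intel -x/-ax target it corresponds to. */
enum avx512_target classify_avx512(struct avx512_features s)
{
    if (!(s.f && s.cd))
        return AVX512_NONE;   /* no AVX-512 base */
    if (s.er && s.pf)
        return AVX512_MIC;    /* COMMON + ER + PF */
    if (s.bw && s.dq && s.vl)
        return AVX512_CORE;   /* COMMON + BW + DQ + VL */
    return AVX512_COMMON;     /* F + CD only */
}

const char *avx512_target_name(enum avx512_target t)
{
    switch (t) {
    case AVX512_COMMON: return "COMMON-AVX512";
    case AVX512_MIC:    return "MIC-AVX512";
    case AVX512_CORE:   return "CORE-AVX512";
    default:            return "no AVX-512 target";
    }
}

/* Features this translation unit was compiled for, read from the
 * compiler's predefined macros (set by e.g. -xCORE-AVX512). */
struct avx512_features compiled_avx512_features(void)
{
    struct avx512_features s = {0};
#ifdef __AVX512F__
    s.f = true;
#endif
#ifdef __AVX512CD__
    s.cd = true;
#endif
#ifdef __AVX512ER__
    s.er = true;
#endif
#ifdef __AVX512PF__
    s.pf = true;
#endif
#ifdef __AVX512BW__
    s.bw = true;
#endif
#ifdef __AVX512DQ__
    s.dq = true;
#endif
#ifdef __AVX512VL__
    s.vl = true;
#endif
    return s;
}
```

Printing avx512_target_name(classify_avx512(compiled_avx512_features())) from a small test program is a cheap sanity check that the intended -x option really reached the build.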

OpenMP 4.0 Directives

Check out options such as

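  • SIMD (!$omp simd in Fortran, #pragma omp simd in C/C++), which tells the compiler that a loop can safely be vectorised
  • CONTIGUOUS, strictly a Fortran 2008 attribute rather than an OpenMP directive, which tells the compiler that an array occupies contiguous memory and so helps vectorisation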
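A minimal sketch of the SIMD directive in C (the function name is hypothetical). The reduction clause lets each vector lane keep its own partial sum, combined at the end of the loop. Build with OpenMP SIMD support enabled, e.g. -fopenmp or -fopenmp-simd with GCC or Clang; without it the pragma is ignored and the loop runs as ordinary serial code.

```c
#include <stddef.h>

/* Dot product with an OpenMP 4.0 SIMD directive. The reduction
 * reorders the additions, so the result may differ in the last bits
 * from a strictly serial sum. */
double dot(size_t n, const double *a, const double *b)
{
    double sum = 0.0;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The directive asserts that vectorising is safe rather than asking the compiler to prove it, so it should only be applied to loops without dependences between iterations (other than the declared reduction).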

Other useful links

Analysing GPU Codes

MPI Analysis

Useful tools include:

Hardware Counters

Benchmarks

Running a benchmark is only part of understanding system performance. A deeper understanding of why peak performance is not achieved is key to tuning libraries and codes for a given architecture.
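One common way to reason about the gap to peak is the roofline model: a kernel's attainable performance is bounded by the smaller of the machine's peak floating-point rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (flops per byte moved). A minimal C sketch follows; the function names are hypothetical, and the byte count for the triad ignores write-allocate traffic and cache reuse.

```c
/* Roofline bound (GFLOP/s) for a kernel with the given arithmetic
 * intensity (flop/byte) on a machine with the given peak compute
 * rate (GFLOP/s) and sustained memory bandwidth (GB/s). */
double roofline_gflops(double peak_gflops, double bandwidth_gbs,
                       double flops_per_byte)
{
    double memory_bound = bandwidth_gbs * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Arithmetic intensity of a STREAM-style triad a[i] = b[i] + s*c[i]
 * in double precision: 2 flops per element, 3 x 8 bytes moved. */
double triad_intensity(void)
{
    const double flops = 2.0;
    const double bytes = 3.0 * sizeof(double);
    return flops / bytes;
}

/* Fraction of the roofline bound achieved by a measured rate. */
double fraction_of_bound(double measured_gflops, double bound_gflops)
{
    return measured_gflops / bound_gflops;
}
```

For example, on a hypothetical node with 1000 GFLOP/s peak and 100 GB/s bandwidth, the triad is bounded at roughly 8.3 GFLOP/s, under 1% of peak. A low fraction of peak for such a kernel therefore points at memory bandwidth rather than at poor vectorisation; comparing a measurement with the bound (not with peak) shows how much room for tuning actually remains.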