What is a compiler?
A compiler translates a high-level programming language (C, C++, FORTRAN, …) into machine code for a specific architecture (chipset), i.e. its ISA (Instruction Set Architecture), optionally applying some optimisation.
Typically, a compiler will generate machine code for the ISA on which it is running, but it can also generate machine code for a different target processor ISA. For example, if your HPC facility has Intel SandyBridge login nodes but Intel Skylake compute nodes, then you have several choices:
- make the most of the increased capabilities of Skylake (wrt SandyBridge) and compile within batch (i.e. on the compute node) as part of a batch job
- make the most of the increased capabilities of Skylake (wrt SandyBridge) and "cross-compile" from the login node, targeting the Skylake ISA
- miss out on the increased capabilities of Skylake (wrt SandyBridge) and compile on the login node using default flags (thus for SandyBridge ISA) but run on Skylake compute nodes
It is also possible to build an executable with support for more than one ISA; the executable then chooses the most appropriate code path at run time based on the physical processor.
What compilers are available?
Traditionally, there are discrete compilers for C and FORTRAN, at least for the "front end" (transformation of the programming language into an intermediate representation). The "back end" transforms (with potential optimisations) the intermediate representation into the machine code that will actually be run. In a given toolchain on a specific architecture, the same back-end engine may be used on the intermediate representation, irrespective of which front-end engine generated it.
|Name (that invokes the compiler)|Compiler|Availability|
|---|---|---|
|gcc|GNU C/C++ compiler|Available at no charge; a standard version ships with Linux distributions|
|gfortran|GNU FORTRAN compiler|Available at no charge, but may need installation of the relevant Linux package|
|icc|Intel C compiler|Requires download/install. Available at no cost for learning & research; otherwise a purchase is required|
|ifort|Intel FORTRAN compiler|Requires download/install. Available at no cost for learning & research; otherwise a purchase is required|
|pgcc|PGI C compiler|Requires download/install. Available at no cost for learning & research; otherwise a purchase is required|
|pgfortran|PGI FORTRAN compiler|Requires download/install. Available at no cost for learning & research; otherwise a purchase is required|
|xlc|IBM C compiler|For IBM chipsets only|
|xlf|IBM FORTRAN compiler|For IBM chipsets only|
Note that PGI is now owned by NVIDIA.
What is an optimising compiler - and what does it optimise?
All mainstream compilers offer optimisations that minimise or maximise some metric associated with the input source code on a given target architecture. Common optimisation goals include: minimising the run time, minimising the size of the executable file, minimising the memory usage of the program, minimising the instantaneous power consumption at run time, and minimising the energy consumed over the program's run. It is usual to optimise for only one of these metrics in a given compilation. The time for compilation usually depends on the number of statements in the source code, and compilation time generally increases when optimisation is requested. There may also be issues with numerical stability of results (see below).
The back end can employ a number of transformations and strategies in order to perform optimisation, and there is no guarantee that a globally optimal solution is found. The key transformations include:
- loop transformations: re-ordering the iterators of nested loops, fusing loops with few instructions into a single loop, or splitting a long loop into several loops. These methods can improve memory reuse and access patterns, and improve use of any available vector units (AVX or SVE, for example)
- use of approximations rather than exact calculations, particularly for trigonometric functions
- re-ordering of statements and operations, to improve memory reuse and access, and to improve use of any available vector units
- re-arrangement of statements. For example, if there are many occurrences of division by a variable that is not changing, say x, the compiler may set another variable y = 1.0/x and then multiply by y instead of dividing, since a division costs significantly more processor core cycles than a multiplication
- dead code elimination
- what is automatic parallelisation - and how do I make the most of it?
- how well is my compiler doing - and how can I improve it?
- how much does vectorisation really matter?
- what is the fuss about floating point numbers?
- why would I bother with profiling?