OpenMP

OpenMP can be used to make your models, simulations and analyses run faster, whether on a multicore desktop, a node of a supercomputing cluster, or an accelerator such as the Xeon Phi or a GPU.

OpenMP is a thread-based standard for parallelism on shared-memory architectures. The standard is developed and published by the OpenMP Architecture Review Board (ARB), and the specifications are available on the official OpenMP web site.

OpenMP 4.0 introduced support for accelerators (such as GPUs and the Xeon Phi), and version 4.5 extended this support. At the time of writing, OpenMP is at version 5.2, with version 6.0 due for release in 2024.

If performance is a concern when using OpenMP, it is strongly advised:

  • to set OMP_DYNAMIC=false - this prevents the run-time environment from using fewer threads than OMP_NUM_THREADS specifies
  • to set OMP_PROC_BIND=true - this prevents the run-time environment from moving OpenMP threads between physical cores, i.e. the core a thread first runs on is the core it always runs on. Other values of OMP_PROC_BIND to consider are "primary" ("master" before OpenMP 5.1; binds every thread in the team to the same place as the primary thread, where places are defined by OMP_PLACES), "close" (binds threads to places close to the parent thread) and "spread" (distributes the team's threads sparsely among the places of the parent's place partition). See the OpenMP specification and the Stony Brook support page.
  • to consider using OMP_PLACES to set which cores the OpenMP threads run on. This setting should not be used on a shared node (unless, e.g., the scheduler gives each job its own subset of cores). Examples are given on the Stony Brook support page.

The main alternatives to using OpenMP are

  • OpenACC - a similar directives-based approach for running code on accelerators
  • CUDA and OpenCL - run-time libraries and language extensions specifically for running on accelerators. CUDA is specific to NVIDIA GPUs, whereas OpenCL is vendor-neutral and runs on GPUs and CPUs wherever a supported implementation is available
  • MPI - allows processes to be distributed across cores, potentially on different nodes. It is the main choice for running on more than one compute node.

OpenMP Resources