In the last article we gave a short introduction to parallelization: we mentioned different levels of parallelism, different mechanisms, and things to consider when writing parallel programs. Unfortunately, successfully parallelizing a program (or writing a parallel program from scratch) can often prove to be a difficult task. This is why many tools were developed to aid the programmer in these tasks; one such tool is OpenMP (Open Multi-Processing).
OpenMP gives programmers a simple and flexible interface for developing parallel applications on platforms ranging from the standard desktop computer to the supercomputer. It is a compiler extension for C, C++ and Fortran that lets you add parallelism to existing source code without rewriting it entirely. Most compilers support OpenMP, and building OpenMP C++ code is as simple as passing the -fopenmp flag (on GCC and Clang) when compiling and linking.
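For example, with GCC or Clang, a source file named omp_example.cpp (the file name is just for illustration) can be built like this:
g++ -fopenmp omp_example.cpp -o omp_example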
The easiest way to demonstrate the power and simplicity of OpenMP is by giving an example:
#include <cmath>

int main()
{
    const int size = 256;
    double array[size];

    // The directive below splits the loop iterations among the available threads.
    // Note: M_PI is provided on POSIX systems; on MSVC, define _USE_MATH_DEFINES
    // before including <cmath>.
    #pragma omp parallel for
    for(int i = 0; i < size; ++i)
        array[i] = std::sin(2 * M_PI * i / size);
}
Let's discuss this example: the code simply initializes the values of an array. The calculation performed is relatively demanding, which will let us compare the performance of the serial and parallel versions. If you ignore the
#pragma omp parallel for
line, the code looks exactly the same as the serial version. This is one of the greatest advantages of OpenMP. The directive divides the loop iterations among multiple threads, which run simultaneously: each thread initializes a portion of the array, and all of this is done with only one line of code. Another obvious advantage of OpenMP is its high level of abstraction: you do not see how each thread is created and initialized, you do not see a function declaration for the code each thread executes, and you do not even know exactly how the array is divided between the threads. In most other parallel programming systems you would have to write code for all of this and more.
Note: by default, the OpenMP runtime will most likely create as many threads as your CPU has cores. For example, on a 4-core CPU this can, in theory, improve performance up to 4 times. We encourage the reader to experiment with this: measure the performance of the serial and parallel versions and compare them. This shouldn't be too difficult; to get the serial version of your OpenMP program, just comment out the lines starting with #pragma omp.
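To give an idea of how such a measurement might look, here is a minimal sketch using std::chrono; the array size of 100000 is an arbitrary choice for this example, and in practice you would want a problem large enough to give stable timings:

#include <chrono>
#include <cmath>
#include <cstdio>

int main()
{
    const int size = 100000;  // arbitrary size chosen for this sketch
    static double array[size];

    auto start = std::chrono::steady_clock::now();

    // Comment out the next line to time the serial version.
    #pragma omp parallel for
    for(int i = 0; i < size; ++i)
        array[i] = std::sin(2 * M_PI * i / size);

    auto end = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::printf("elapsed: %lld microseconds\n", (long long)us);
}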
OpenMP also enables the programmer to get and set the number of threads with these library functions, declared in omp.h:
omp_get_num_threads()
omp_set_num_threads()
(The similarly named omp_get_thread_num() returns the index of the calling thread, not the thread count.)
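Here is a minimal sketch of how these calls fit together; the thread count of 4 is an arbitrary choice for this example:

#include <cstdio>
#include <omp.h>

int main()
{
    omp_set_num_threads(4);  // request 4 threads (arbitrary choice)

    #pragma omp parallel
    {
        // Every thread in the region sees the same thread count, while
        // omp_get_thread_num() gives each thread its own index (0, 1, ...).
        if(omp_get_thread_num() == 0)
            std::printf("running with %d threads\n", omp_get_num_threads());
    }
}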
Consider this simple example, a loop that sums the elements of an array:
for(int i = 0; i < n; ++i)
{
    sum += array[i];
}
Here we need some mechanism to protect the correctness of the result: since OpenMP is a shared-memory model, multiple threads would overwrite the value of sum concurrently, producing a wrong result. OpenMP solves this elegantly with the reduction clause:
#pragma omp parallel for reduction(+:sum)
for(int i = 0; i < n; ++i)
{
    sum += array[i];
}
Here, + is the reduction operator (some other reduction operators in C++ are -, *, & and |). Each thread accumulates into its own private copy of sum, and the partial results are combined into the shared sum when the loop finishes.
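Putting it all together, a complete, compilable version of the reduction example might look like this; the array contents and size are arbitrary choices for this sketch:

#include <cstdio>

int main()
{
    const int n = 1000;  // arbitrary size for this sketch
    static int array[n];
    for(int i = 0; i < n; ++i)
        array[i] = i + 1;  // the sum of 1..1000 is 500500

    int sum = 0;

    // The reduction clause gives each thread a private sum and combines
    // the partial results at the end of the loop.
    #pragma omp parallel for reduction(+:sum)
    for(int i = 0; i < n; ++i)
        sum += array[i];

    std::printf("sum = %d\n", sum);  // prints 500500
}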
OpenMP provides many more powerful concepts for C++ parallel programming; this is just the beginning, which will hopefully inspire readers to learn more about it.