Today we are going to talk about something a little bit different: parallel programming. Parallel programming is a paradigm which has become extremely popular in recent years. With recent trends in Big Data (huge amounts of data being collected) and our algorithms and programs becoming more and more complicated, it's becoming clear that we need to introduce parallelism - computation in which many calculations are carried out simultaneously - into our programs. This, however, is no simple task.
Parallelization can be implemented in many different ways and on many levels: bit level, instruction level, data parallelism and task parallelism. A distinction can also be made between hardware and software parallelism, but the goal is usually to take advantage of advanced hardware components: multi-core processors, graphics processors (which are particularly suitable for this purpose), and so on.
From a theoretical standpoint, one very important thing which has to be mentioned when talking about parallel computing is Amdahl's law. Amdahl's law states that the upper bound of the performance improvement gained by using multiple processor cores is always limited by the amount of serial code in the program - the portion of the code which cannot be parallelized. Having said this, the next step is to understand that not every program can be entirely parallelized. In fact, writing parallel algorithms (or rewriting serial ones for parallel execution) is a demanding task for any programmer. Arguably, it is more difficult than "regular" programming, since it introduces new concepts and new possibilities for error, such as the so-called race conditions.
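To make this concrete, here is a small sketch in C++ (the helper function name is just for illustration) of the speedup limit Amdahl's law predicts: if a fraction p of the program can be parallelized across n cores, the best possible speedup is 1 / ((1 - p) + p / n).

```cpp
#include <iostream>

// Upper bound on speedup predicted by Amdahl's law:
//   p - fraction of the program that can be parallelized (0.0 to 1.0)
//   n - number of processor cores
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    // Even with 95% of the code parallelized, 16 cores give
    // roughly a 9x speedup, not 16x - the serial 5% dominates.
    std::cout << amdahl_speedup(0.95, 16) << "\n";  // ~9.14
}
```

Notice how quickly the serial portion becomes the bottleneck: adding more cores to this example yields diminishing returns, and the speedup can never exceed 1 / (1 - p) = 20, no matter how many cores we use.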
Another thing that makes learning parallel programming difficult is the number of different models: shared memory, message passing, data parallelism, task parallelism... Discussing each of these models in detail is beyond the scope of this article, but it is important for the community to reach a consensus around particular programming models, because that leads to parallel computers being built with support for those models, which in turn makes software more portable. In this sense, programming models are often referred to as bridges between hardware and software.
The basic building block of parallel computing is the thread: a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically part of the operating system. Parallelism is achieved when multiple threads are executed as parts of one process, sharing memory and other resources among themselves. It is usually the programmer's responsibility to determine how many threads will be created and when, which resources they can access, what task each thread should perform, and so on. This is not always simple and straightforward, since one must also make sure that the threads are properly synchronized and do not block each other.
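As a minimal sketch of these responsibilities (using the standard C++ std::thread and std::mutex; the variable names are just for illustration), the example below creates two threads that share a counter. The mutex is what prevents a race condition when both threads increment it at the same time.

```cpp
#include <iostream>
#include <mutex>
#include <thread>

int counter = 0;           // resource shared by both threads
std::mutex counter_mutex;  // protects access to the shared counter

void work() {
    for (int i = 0; i < 100000; ++i) {
        // Without the lock, both threads could read and write
        // 'counter' at the same time - a classic race condition.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }
}

int main() {
    std::thread t1(work);  // the programmer decides how many threads
    std::thread t2(work);  // to create and what each of them does
    t1.join();             // wait for both threads to finish
    t2.join();
    std::cout << counter << "\n";  // always 200000 thanks to the mutex
}
```

Remove the lock and the final count will usually come out wrong, and differently on every run - a good illustration of why synchronization is the hard part of writing threaded code by hand.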
There are many different programming constructs and tools that simplify writing parallel programs. In the next article, we will talk about one of them, called OpenMP. OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems. It consists of a set of compiler directives, library routines and environment variables that influence run-time behavior. OpenMP enables the programmer to easily parallelize blocks of code by adding special directives, without changing the actual serial code. In the next article, we will see examples of using OpenMP in C++, and discuss whether we have successfully improved the efficiency of our code.
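Just to give a taste of what this looks like (a minimal sketch, not yet the full examples promised for the next article), the loop below is parallelized by adding a single #pragma directive; removing the directive leaves a perfectly valid serial program.

```cpp
#include <omp.h>
#include <iostream>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<double> data(n, 1.0);
    double sum = 0.0;

    // The pragma asks OpenMP to split the loop iterations across the
    // available threads; 'reduction' safely combines each thread's
    // partial sum. Compile with -fopenmp (GCC/Clang) to enable OpenMP.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }

    std::cout << "sum = " << sum << "\n";
}
```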