parallel-processing – Make Me Engineer

What is the purpose of the “PAUSE” instruction in x86?

April 27, 2023 by Tarik

Just imagine how the processor would execute a typical spin-wait loop:
1 Spin_Lock:
2     CMP lockvar, 0   ; Check if lock is free
3     JE Get_Lock
4     JMP Spin_Lock
5 Get_Lock:
After a few iterations the branch predictor will predict that the conditional branch (3) will never be taken and the pipeline will fill with … Read more
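For readers who want to see the same spin-wait pattern outside assembly, here is a minimal sketch in Rust. It assumes that `std::hint::spin_loop()` is the portable way to ask for a PAUSE-style hint (on x86 targets it is expected to lower to PAUSE; elsewhere it may do something else or nothing). The `SpinLock` type and the `count_with_spinlock` helper are hypothetical names made up for this illustration, not part of the original answer.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical minimal spin lock, for illustration only.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Try to flip the flag from false to true; if that fails, someone else holds the lock.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Spin on a plain load until the lock looks free again.
            while self.locked.load(Ordering::Relaxed) {
                // Spin-wait hint to the CPU (assumed to become PAUSE on x86).
                hint::spin_loop();
            }
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Hypothetical helper: `threads` workers each increment a counter `iters` times under the lock.
pub fn count_with_spinlock(threads: usize, iters: usize) -> usize {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock();
                    // Separate load and store: only correct because the lock serialises them.
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count_with_spinlock(4, 1000));
}
```

The inner loop spins on a read-only load and only retries the compare-exchange once the lock appears free, which is a common way to keep the waiting cores from fighting over the cache line.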

Processing vec in parallel: how to do safely, or without using unstable features?

November 22, 2022 by Tarik

Today the rayon crate is the de facto standard for this sort of thing:
use rayon::prelude::*;
fn main() {
    let mut data = vec![1, 2, 3];
    data.par_iter_mut()
        .enumerate()
        .for_each(|(i, x)| *x = 10 + i as u32);
    assert_eq!(vec![10, 11, 12], data);
}
Note that this is just one line different from the single-threaded version using … Read more
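If pulling in a crate is not an option, a rough standard-library-only sketch of the same update is possible with scoped threads. This swaps rayon for `std::thread::scope` and `chunks_mut` (both stable), so it is a different technique from the one in the excerpt. The `fill_parallel` function and its `chunk_len` parameter are hypothetical names chosen for this example.

```rust
use std::thread;

/// Hypothetical helper: sets data[i] = 10 + i, one scoped thread per chunk.
pub fn fill_parallel(data: &mut [u32], chunk_len: usize) {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        // chunks_mut hands out disjoint mutable slices, so no unsafe code is needed.
        for (chunk_idx, chunk) in data.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                let base = chunk_idx * chunk_len;
                for (offset, x) in chunk.iter_mut().enumerate() {
                    *x = 10 + (base + offset) as u32;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    let mut data: Vec<u32> = vec![1, 2, 3];
    fill_parallel(&mut data, 2);
    assert_eq!(data, vec![10, 11, 12]);
}
```

Unlike rayon, this spawns one OS thread per chunk with no work stealing, so it only makes sense when the chunks are large enough to pay for the thread startup.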

Julia: How to copy data to another processor in Julia

October 11, 2022 by Tarik

I didn’t know how to do this at first, so I spent some time figuring it out. Here are some functions I wrote to pass objects: sendto Send an arbitrary number of variables to specified processes. New variables are created in the Main module on specified processes. The name will be the key of the … Read more

MPI: blocking vs non-blocking

October 7, 2022 by Tarik

Blocking communication is done using MPI_Send() and MPI_Recv(). These functions do not return (i.e., they block) until the communication is finished. Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either because MPI saved it somewhere, or because it has been received by the destination. Similarly, MPI_Recv() returns when the receive … Read more

GNU parallel not working at all

September 3, 2022 by Tarik

As I was about to complete writing this question, I ran parallel --version to report the version, only to find: WARNING: YOU ARE USING --tollef. IF THINGS ARE ACTING WEIRD USE --gnu. It is not clear to me why that flag is set by default. Needless to say, using --gnu worked! Thought I would post … Read more

OpenMP program is slower than sequential one

July 8, 2022 by Tarik

The random number generator rand(3) uses global state variables (hidden in the (g)libc implementation). Accessing them from multiple threads causes cache contention and is also not thread-safe. You should use the rand_r(3) call with a seed parameter private to the thread: long i; unsigned seed; #pragma omp parallel private(seed) { // Initialise the … Read more
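The idea of giving each thread its own generator state can be sketched in Rust as well. This is not the OpenMP code from the answer: `next_rand` is a hypothetical stand-in for rand_r(3), built as a small linear congruential generator whose only state is the seed the caller passes in, and `per_thread_sums` is a made-up helper for the demonstration.

```rust
use std::thread;

/// Hypothetical stand-in for rand_r(3): all generator state lives in `seed`.
fn next_rand(seed: &mut u32) -> u32 {
    // Simple LCG step; the constants are illustrative, not a quality recommendation.
    *seed = seed.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
    *seed >> 16
}

/// Each thread draws `draws` values from its own private seed and returns their sum.
pub fn per_thread_sums(threads: u32, draws: u32) -> Vec<u64> {
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            thread::spawn(move || {
                // The seed is a local variable, so no state is shared between threads.
                let mut seed = id.wrapping_add(1);
                (0..draws)
                    .map(|_| u64::from(next_rand(&mut seed)))
                    .sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", per_thread_sums(4, 1000));
}
```

Because nothing is shared, the threads never contend for the generator, and the results are reproducible from run to run for the same thread ids.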

Optimal number of threads per core

May 11, 2022 by Tarik

If your threads don’t do I/O, synchronization, etc., and there’s nothing else running, 1 thread per core will get you the best performance. However, that is very likely not the case. Adding more threads usually helps, but after some point, they cause some performance degradation. Not long ago, I was doing performance testing on a 2 … Read more
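As a starting point for that kind of tuning, here is a small sketch that picks a thread count from the hardware and a multiplier. It assumes `std::thread::available_parallelism()` is a good enough estimate; note that it reports the parallelism available to the process (typically logical CPUs), which is not always the physical core count. `suggested_threads` and its `per_core` parameter are hypothetical names for this example.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: available parallelism times a per-core multiplier.
/// Falls back to 1 if the platform cannot report its parallelism.
pub fn suggested_threads(per_core: usize) -> usize {
    let cores = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Treat a multiplier of 0 as 1 so the result is never zero.
    cores.saturating_mul(per_core.max(1))
}

fn main() {
    // CPU-bound work: start at one thread per core.
    println!("cpu-bound start: {}", suggested_threads(1));
    // Work that blocks on I/O or locks: try a higher multiplier, then measure.
    println!("blocking start:  {}", suggested_threads(2));
}
```

The number it returns is only a first guess; the point where extra threads stop helping has to be found by measuring the actual workload.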