What is the purpose of the “PAUSE” instruction in x86?

Just imagine how the processor would execute a typical spin-wait loop:

    1 Spin_Lock:
    2     CMP lockvar, 0   ; Check if lock is free
    3     JE Get_Lock
    4     JMP Spin_Lock
    5 Get_Lock:

After a few iterations the branch predictor will predict that the conditional branch (3) will never be taken and the pipeline will fill with …
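For readers working in C rather than assembly, PAUSE is available through the `_mm_pause()` intrinsic. The following is only a minimal sketch of a spin-wait loop that uses it. The `struct spinlock` type, the `spin_acquire`/`spin_release` functions and the `cpu_relax` macro are hypothetical names chosen for illustration, and the no-op fallback on non-x86 targets is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>            /* _mm_pause() */
#define cpu_relax() _mm_pause()   /* compiles to the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: plain no-op on non-x86 targets */
#endif

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
struct spinlock {
    atomic_int locked;
};

static void spin_acquire(struct spinlock *l)
{
    for (;;) {
        /* Try to take the lock; exchange returns the previous value. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;

        /* Lock is held: wait on a plain load, issuing PAUSE each iteration. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static void spin_release(struct spinlock *l)
{
    /* Releasing a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A caller would wrap its critical section between `spin_acquire(&lock)` and `spin_release(&lock)`. The inner loop only reads the lock variable, so the PAUSE hint sits exactly in the part of the code that spins.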

Processing vec in parallel: how to do safely, or without using unstable features?

Today the rayon crate is the de facto standard for this sort of thing:

    use rayon::prelude::*;

    fn main() {
        let mut data = vec![1, 2, 3];
        data.par_iter_mut()
            .enumerate()
            .for_each(|(i, x)| *x = 10 + i as u32);
        assert_eq!(vec![10, 11, 12], data);
    }

Note that this is just one line different from the single-threaded version using …

MPI: blocking vs non-blocking

Blocking communication is done using MPI_Send() and MPI_Recv(). These functions do not return (i.e., they block) until the communication is finished. Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either because MPI saved it somewhere, or because it has been received by the destination. Similarly, MPI_Recv() returns when the receive …

OpenMP program is slower than sequential one

The random number generator rand(3) uses global state variables (hidden in the (g)libc implementation). Accessing them from multiple threads causes cache contention and is also not thread-safe. You should use the rand_r(3) call with a seed parameter private to each thread:

    long i;
    unsigned seed;

    #pragma omp parallel private(seed)
    {
        // Initialise the …
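As a rough, self-contained sketch of the per-thread seed idea (not the original answer's code, which is cut off here), the function below counts random points that fall inside the unit quarter circle. The name `count_hits`, the base seed value and the way the seed is derived from the thread number are all assumptions made for illustration. The seed is declared inside the parallel region, which makes it private to each thread just as `private(seed)` would.

```c
#define _POSIX_C_SOURCE 200112L   /* expose rand_r() in <stdlib.h> */
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of n random points inside the unit quarter circle. */
long count_hits(long n)
{
    assert(n >= 0);
    long hits = 0;

    #pragma omp parallel reduction(+:hits)
    {
        /* Declared inside the region, so every thread has its own seed. */
        unsigned seed = 12345u;   /* assumption: arbitrary base seed */
#ifdef _OPENMP
        seed += (unsigned)omp_get_thread_num();   /* distinct stream per thread */
#endif

        #pragma omp for
        for (long i = 0; i < n; i++) {
            /* rand_r() only touches the caller's seed, never global state. */
            double x = (double)rand_r(&seed) / RAND_MAX;
            double y = (double)rand_r(&seed) / RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    return hits;
}
```

Built with an OpenMP flag such as `-fopenmp`, the loop is shared among threads; without it the pragmas are ignored and the same code runs sequentially. The `reduction(+:hits)` clause gives each thread a private counter that is summed at the end, so the threads do not contend on a shared variable inside the loop.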
