CUDA kernel returning vectors

Something like this should work (coded in a browser, not tested):

```
// N is the maximum number of structs to insert
#define N 10000

typedef struct { int A, B, C; } Match;

__device__ Match dev_data[N];
__device__ int dev_count = 0;

__device__ int my_push_back(Match *mt) {
    int insert_pt = atomicAdd(&dev_count, 1);
    if (insert_pt < N) {
        …
```
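The excerpt cuts off mid-function. As a hedged, self-contained sketch of the same pattern (the `my_push_back` name and `Match` struct come from the excerpt; the kernel, launch parameters, and overflow handling are my assumptions), the complete device-side push-back might look like this:

```
#include <cstdio>

#define N 10000  // maximum number of structs the buffer can hold

typedef struct { int A, B, C; } Match;

__device__ Match dev_data[N];   // global output buffer
__device__ int   dev_count = 0; // number of structs appended so far

// Reserve a slot with an atomic counter; drop the item if the buffer is full.
__device__ int my_push_back(Match *mt) {
    int insert_pt = atomicAdd(&dev_count, 1);
    if (insert_pt < N) {
        dev_data[insert_pt] = *mt;
        return insert_pt;
    }
    return -1; // buffer full
}

// Hypothetical kernel: every third thread records a match.
__global__ void find_matches() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i % 3 == 0) {
        Match m = {i, 2 * i, 3 * i};
        my_push_back(&m);
    }
}

int main() {
    find_matches<<<32, 256>>>();
    cudaDeviceSynchronize();

    // Copy the count back; it can overshoot N when the buffer fills up.
    int count = 0;
    cudaMemcpyFromSymbol(&count, dev_count, sizeof(int));
    if (count > N) count = N;
    printf("%d matches recorded\n", count);
    return 0;
}
```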

128-bit integer on CUDA?

For best performance, one would want to map the 128-bit type onto a suitable CUDA vector type, such as uint4, and implement the functionality using PTX inline assembly. The addition would look something like this:

```
typedef uint4 my_uint128_t;

__device__ my_uint128_t add_uint128 (my_uint128_t addend, my_uint128_t augend)
{
    my_uint128_t res;
    asm ("add.cc.u32 %0, %4, %8;\n\t"
         …
```
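The asm statement is truncated in the excerpt. A plausible completion of the four-limb carry chain, based on PTX's add.cc/addc carry-flag instructions (the operand ordering is my inference from the first line, not the original answer's text), would be:

```
typedef uint4 my_uint128_t;  // .x = least significant 32 bits, .w = most significant

__device__ my_uint128_t add_uint128(my_uint128_t addend, my_uint128_t augend)
{
    my_uint128_t res;
    // add.cc sets the carry flag, addc.cc consumes and propagates it,
    // and the final addc consumes it, chaining four 32-bit additions.
    asm ("add.cc.u32  %0, %4,  %8;\n\t"
         "addc.cc.u32 %1, %5,  %9;\n\t"
         "addc.cc.u32 %2, %6, %10;\n\t"
         "addc.u32    %3, %7, %11;\n\t"
         : "=r"(res.x), "=r"(res.y), "=r"(res.z), "=r"(res.w)
         : "r"(addend.x), "r"(addend.y), "r"(addend.z), "r"(addend.w),
           "r"(augend.x), "r"(augend.y), "r"(augend.z), "r"(augend.w));
    return res;
}
```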

How do I select which GPU to run a job on?

The problem was caused by not setting the CUDA_VISIBLE_DEVICES variable within the shell correctly. To specify CUDA device 1, for example, you would set CUDA_VISIBLE_DEVICES using

```
export CUDA_VISIBLE_DEVICES=1
```

or

```
CUDA_VISIBLE_DEVICES=1 ./cuda_executable
```

The former sets the variable for the life of the current shell; the latter only for the lifespan of that particular executable invocation. …
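To verify what a given CUDA_VISIBLE_DEVICES setting actually exposes, a small enumeration program helps (this is my illustration, not part of the original answer):

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);  // counts only the devices CUDA_VISIBLE_DEVICES exposes
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Devices are renumbered from 0 within the visible set, so
        // "device 0" here may be physical device 1.
        printf("visible device %d: %s\n", i, prop.name);
    }
    return 0;
}
```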

How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?

The necessary instructions are contained in the documentation for the MPS service. You’ll note that those instructions don’t depend on or call out MPI, so there really isn’t anything MPI-specific about them. Here’s a walkthrough/example. Read section 2.3 of the above-linked documentation for the various requirements and restrictions. I recommend using CUDA 7, 7.5, or …

Horrible redraw performance of the DataGridView on one of my two screens

You just need to make a custom class based on DataGridView so you can enable its double buffering. That’s it!

```
class CustomDataGridView : DataGridView
{
    public CustomDataGridView()
    {
        // DoubleBuffered is protected on Control, so a subclass is
        // needed to turn it on for the grid.
        DoubleBuffered = true;
    }
}
```

As long as all of my instances of the grid use this custom version, all is well. If I ever run into …

Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]

Hardware

If a GPU device has, for example, 4 multiprocessing units and each of them can run 768 threads, then at a given moment no more than 4×768 threads will really be running in parallel (if you planned more threads, they will be waiting their turn).

Software

Threads are organized in blocks. A block is executed …
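A minimal kernel (my illustration, not from the excerpt) showing how the block and thread indices combine into one global index per thread:

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void show_indices(int *out, int n) {
    // Each thread derives a unique global index from its block and thread indices.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the grid is rounded up, so some threads exceed n
        out[i] = i;
}

int main() {
    const int n = 1000;
    const int threadsPerBlock = 256;
    // Round up so every element gets a thread: here 4 blocks x 256 threads = 1024.
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));
    show_indices<<<blocks, threadsPerBlock>>>(d_out, n);

    static int h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("h_out[999] = %d\n", h_out[999]);  // prints 999
    cudaFree(d_out);
    return 0;
}
```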