nvidia – Page 2 – Make Me Engineer

Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?

November 5, 2022 by Tarik

Under CUDA 8 with Pascal GPUs, managed memory data migration under a unified memory (UM) regime will generally occur differently than on previous architectures, and you are experiencing the effects of this. (Also see the note at the end about the updated CUDA 9 behavior for Windows.) With previous architectures (e.g. Maxwell), managed allocations used by a … Read more

How do CUDA blocks/warps/threads map onto CUDA cores?

October 6, 2022 by Tarik

Two of the best references are the NVIDIA Fermi Compute Architecture Whitepaper and the GF104 reviews. I’ll try to answer each of your questions. The programmer divides work into threads, threads into thread blocks, and thread blocks into grids. The compute work distributor allocates thread blocks to Streaming Multiprocessors (SMs). Once a thread block is distributed to a … Read more
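The excerpt stops before it reaches warps, so here is a minimal host-side C++ sketch of the arithmetic involved, assuming the warp size of 32 threads used by NVIDIA GPUs to date. The names `kWarpSize`, `warps_per_block`, and `idle_lanes` are hypothetical helpers for illustration, not part of any CUDA API; real code would typically read the warp size from the device properties.

```cpp
#include <cstdint>

// Assumption: warp size of 32 threads (true of NVIDIA GPUs to date).
constexpr std::uint32_t kWarpSize = 32;

// Number of warps needed to hold a block of `threads_per_block` threads.
// A partially filled last warp still counts as a whole warp.
constexpr std::uint32_t warps_per_block(std::uint32_t threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Thread slots left unused in the last, partially filled warp of a block.
constexpr std::uint32_t idle_lanes(std::uint32_t threads_per_block) {
    return warps_per_block(threads_per_block) * kWarpSize - threads_per_block;
}
```

With these helpers, a block of 100 threads occupies 4 warps and leaves 28 slots unused, which is one reason block sizes are usually chosen as multiples of 32.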

How to create an NVIDIA OpenCL project

October 5, 2022 by Tarik

The OpenCL runtime is already included in the NVIDIA graphics drivers. You only need the OpenCL C++ header files, the OpenCL.lib file, and, on Linux, the libOpenCL.so file. These come with the CUDA toolkit, but there is no need to install it just to get the 9 necessary files. Here are the OpenCL C++ … Read more

How to measure the inner kernel time in NVIDIA CUDA?

October 4, 2022 by Tarik

You can do something like this:

    __global__ void kernelSample(int *runtime) {
        // ...
        clock_t start_time = clock();
        // some code here
        clock_t stop_time = clock();
        // ...
        runtime[tidx] = (int)(stop_time - start_time);
    }

This gives the number of clock cycles between the two calls. Be a little careful, though: the timer will overflow after a couple … Read more
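A cycle count is easier to interpret once it is converted to time. The following is a minimal host-side C++ sketch of that conversion, assuming the clock rate is given in kHz (the unit reported by `cudaDeviceProp::clockRate`). The helper name `cycles_to_ms` is hypothetical, and the result is only approximate because the actual SM clock can vary at run time.

```cpp
#include <cstdint>

// Convert a cycle count (e.g. stop_time - start_time copied back from the
// kernel) to milliseconds.
// Assumption: clock_khz is the SM clock rate in kHz, which is the same as
// cycles per millisecond.
inline double cycles_to_ms(std::int64_t cycles, int clock_khz) {
    return static_cast<double>(cycles) / static_cast<double>(clock_khz);
}
```

For example, 1,500,000 cycles on a device clocked at 1,500,000 kHz (1.5 GHz) corresponds to 1 ms.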

What is the correct version of CUDA for my nvidia driver?

August 6, 2022 by Tarik

304.xx is a driver that supports CUDA 5 and earlier (it does not support newer CUDA versions). If you want to reinstall Ubuntu to create a clean setup, the Linux getting started guide has all the instructions needed to set up CUDA, if that is your intent. I believe you are picking up a 304.xx … Read more

How is CUDA memory managed?

July 10, 2022 by Tarik

The device memory available to your code at runtime is basically calculated as Free memory = total memory – display driver reservations – CUDA driver reservations – CUDA context static allocations (local memory, constant memory, device code) – CUDA context runtime heap (in kernel allocations, recursive call stack, printf buffer, only on Fermi and newer … Read more

Error message: Cannot find or open the PDB file

May 24, 2022 by Tarik

The PDB file is a Visual Studio-specific file that holds the debugging symbols for your project. You can ignore those messages unless you’re hoping to step into the code for those DLLs with the debugger (which is doubtful, as those are system DLLs). In other words, you can and should ignore them, as you … Read more

Swing rendering appears broken in JDK 1.8, correct in JDK 1.7

May 21, 2022 by Tarik

For those whose problem has not been solved, try this solution: set the global environment variable “J2D_D3D” to “false” in the OS. According to Sun, this setting turns off the Java 2D system’s use of Direct3D in Java 1.4.1_02 and later. That is, simply create an environment variable with name “J2D_D3D” and value … Read more

How do I choose grid and block dimensions for CUDA kernels?

May 8, 2022 by Tarik

There are two parts to that answer (I wrote it). One part is easy to quantify, the other is more empirical. Hardware Constraints: This is the easy-to-quantify part. Appendix F of the current CUDA programming guide lists a number of hard limits which restrict how many threads per block a kernel launch can … Read more
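As a rough illustration of the quantifiable part, here is a minimal host-side C++ sketch of the usual one-dimensional sizing arithmetic, assuming one thread per element. The helper names `blocks_for` and `block_size_ok` are hypothetical, and the per-block thread limit is passed in as a parameter because it depends on the device (in real code it would come from the device properties).

```cpp
#include <cstddef>

// Blocks needed to cover n elements with one thread per element.
// Rounds up so that a final, partially used block covers the remainder.
constexpr std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// True if a 1D block size respects the device's threads-per-block limit.
// Assumption: max_threads_per_block is supplied by the caller.
constexpr bool block_size_ok(std::size_t threads_per_block,
                             std::size_t max_threads_per_block) {
    return threads_per_block > 0 && threads_per_block <= max_threads_per_block;
}
```

Because the grid is rounded up, the kernel itself still needs a bounds check so that the surplus threads in the last block do nothing.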