How is CUDA memory managed?

The device memory available to your code at runtime is, roughly:

Free memory = total memory
              – display driver reservations
              – CUDA driver reservations
              – CUDA context static allocations (local memory, constant memory, device code)
              – CUDA context runtime heap (in-kernel allocations, recursive call stack, printf buffer; only on Fermi and newer … Read more
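As a concrete illustration (not part of the quoted answer), the remaining pool can be inspected at runtime with cudaMemGetInfo, which reports free and total device memory after the reservations above have been taken out; a minimal sketch:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_bytes = 0, total_bytes = 0;
    // Reports device memory after driver/context reservations are subtracted.
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Free: %zu MiB of %zu MiB total\n",
           free_bytes >> 20, total_bytes >> 20);
    return 0;
}
```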

GPU Emulator for CUDA programming without the hardware [closed]

For those seeking an answer in 2016 (and even 2017) … Disclaimer: I failed to emulate a GPU in the end. It might be possible to use gpuocelot if you can satisfy its list of dependencies. I tried to get an emulator working on BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux). I’ll tell you … Read more

Thrust inside user written kernels

As it was originally written, Thrust is purely a host-side abstraction; it cannot be used inside kernels. You can, however, pass the device memory encapsulated inside a thrust::device_vector to your own kernel like this:

thrust::device_vector<Foo> fooVector;
// Do something thrust-y with fooVector
Foo* fooArray = thrust::raw_pointer_cast(fooVector.data());
// Pass raw array and … Read more
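A complete, compilable sketch along the lines the excerpt begins (the Foo element type is replaced by float, and the scale kernel is hypothetical):

```cpp
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// Hypothetical kernel operating on the raw pointer unwrapped from Thrust.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    thrust::device_vector<float> fooVector(n);
    thrust::sequence(fooVector.begin(), fooVector.end()); // host-side Thrust call
    float* fooArray = thrust::raw_pointer_cast(fooVector.data());
    scale<<<(n + 255) / 256, 256>>>(fooArray, n, 2.0f);   // pass raw array to kernel
    cudaDeviceSynchronize();
    return 0;
}
```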

CUDA compute capability requirements

CUDA VERSION      Min CC   Deprecated CC   Default CC   Max CC
5.5 (and prior)   1.0      N/A             1.0
6.0               1.0      1.0             1.0
6.5               1.1      1.x             2.0
7.x               2.0      N/A             2.0
8.0               2.0      2.x             2.0          6.2
9.x               3.0      N/A             3.0          7.0
10.x              3.0      N/A             3.0          7.5 (3.0 deprecated in 10.2)
11.x              3.5      3.x, 5.0        5.2          8.6 (11.0: 8.0, 11.1: 8.6)

(CUDA … Read more
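To find which row of the table applies to a given GPU, the device's compute capability can be queried through the runtime API; a minimal sketch (not part of the quoted answer):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query device 0; prop.major and prop.minor form the compute capability.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    return 0;
}
```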

Different CUDA versions shown by nvcc and NVIDIA-smi

CUDA has two primary APIs: the runtime API and the driver API. Each has its own version (e.g. 8.0, 9.0, etc.). The necessary support for the driver API (e.g. libcuda.so on Linux) is installed by the GPU driver installer. The necessary support for the runtime API (e.g. libcudart.so on Linux, and also nvcc) is installed by … Read more
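Both versions can be read programmatically, which makes the distinction concrete; a minimal sketch (not part of the quoted answer) using cudaDriverGetVersion and cudaRuntimeGetVersion:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driver_ver = 0, runtime_ver = 0;
    cudaDriverGetVersion(&driver_ver);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtime_ver);  // version of the linked libcudart
    // Versions are encoded as 1000*major + 10*minor (e.g. 9000 = CUDA 9.0).
    printf("Driver API:  %d.%d\n", driver_ver / 1000, (driver_ver % 100) / 10);
    printf("Runtime API: %d.%d\n", runtime_ver / 1000, (runtime_ver % 100) / 10);
    return 0;
}
```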

Unspecified launch failure on Memcpy

When I compile and run your code, I get “an illegal memory access was encountered-3” printed out. You may indeed be getting “unspecified launch failure” instead; the exact error reporting depends on CUDA version, GPU, and platform. But we can proceed regardless: either message indicates that the kernel launched but encountered an error, … Read more
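To see why the message varies, note that a kernel fault is reported asynchronously: the launch itself can succeed while the error only surfaces at the next synchronizing call. A minimal sketch (the faulty kernel is hypothetical, written only to provoke the error):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Deliberately dereferences a null device pointer to trigger a fault.
__global__ void faulty(int* p) { *p = 42; }

int main() {
    faulty<<<1, 1>>>(nullptr);
    cudaError_t launch_err = cudaGetLastError();      // launch-configuration errors
    cudaError_t exec_err   = cudaDeviceSynchronize(); // asynchronous execution errors
    printf("launch: %s\n", cudaGetErrorString(launch_err));
    printf("exec:   %s\n", cudaGetErrorString(exec_err)); // e.g. illegal memory access
    return 0;
}
```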

What is the canonical way to check for errors using the CUDA runtime API?

Probably the best way to check for errors in runtime API code is to define an assert-style handler function and wrapper macro, like this:

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, … Read more
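The excerpt cuts off mid-definition; a complete, compilable version along the same lines:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true) {
    if (code != cudaSuccess) {
        // Print the error string plus the file and line of the failing call.
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

// Usage: wrap every runtime API call, e.g.
//   gpuErrchk(cudaMalloc(&d_ptr, nbytes));
//   gpuErrchk(cudaMemcpy(d_ptr, h_ptr, nbytes, cudaMemcpyHostToDevice));
```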