gpgpu – Page 2 – Make Me Engineer

How to measure the inner kernel time in NVIDIA CUDA?

October 4, 2022 by Tarik

You can do something like this:

    __global__ void kernelSample(int *runtime)
    {
        int tidx = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        // ...
        clock_t start_time = clock();
        // some code here
        clock_t stop_time = clock();
        // ...
        runtime[tidx] = (int)(stop_time - start_time);
    }

This gives the number of clock cycles between the two calls. Be a little careful, though: the timer will overflow after a couple …
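Because the value from clock() can overflow, one option is the 64-bit clock64() device function, which reads the same kind of per-multiprocessor cycle counter but wraps far later. The sketch below is a minimal illustration under assumptions: the kernel timedKernel, the placeholder loop, and the host helper measure_cycles are hypothetical names, not part of the original answer. The results are SM clock cycles per thread, not seconds.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example kernel: times a placeholder loop in each thread
// using the 64-bit clock64() counter instead of clock().
__global__ void timedKernel(long long *cycles, float *out, int iters)
{
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;

    long long start = clock64();

    // Placeholder work standing in for "some code here".
    float acc = 0.0f;
    for (int i = 0; i < iters; ++i)
        acc = acc * 0.999f + 1.0f;

    long long stop = clock64();

    out[tidx] = acc;              // store the result so the loop is not dead code
    cycles[tidx] = stop - start;  // elapsed SM clock cycles for this thread
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns the per-thread cycle counts, or an empty vector on a CUDA error.
std::vector<long long> measure_cycles(int threads, int iters)
{
    std::vector<long long> host(threads);
    long long *d_cycles = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_cycles, threads * sizeof(long long)) != cudaSuccess)
        return {};
    if (cudaMalloc(&d_out, threads * sizeof(float)) != cudaSuccess) {
        cudaFree(d_cycles);
        return {};
    }
    timedKernel<<<1, threads>>>(d_cycles, d_out, iters);
    // cudaMemcpy waits for the kernel and also reports launch errors.
    cudaError_t err = cudaMemcpy(host.data(), d_cycles,
                                 threads * sizeof(long long),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_cycles);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

The compiler may still move independent arithmetic across the counter reads, so if the numbers look implausible it can be worth inspecting the generated machine code.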

Should I unify two similar kernels with an ‘if’ statement, risking performance loss?

July 20, 2022 by Tarik

You have a third alternative: use C++ templates and make the variable used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and you then have multiple kernels doing different things, with no branch divergence or conditional evaluation to worry about, because the compiler …
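A minimal sketch of the template approach, assuming two hypothetical kernels that differ only in whether they add or multiply element-wise; the names combine and run_combine are illustrative. The bool template parameter DoAdd stands in for the former runtime flag, so each instantiation is compiled with the condition already known.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: the former runtime flag is a template parameter, so
// the `if (DoAdd)` is a compile-time constant in each instantiation.
template <bool DoAdd>
__global__ void combine(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;          // bounds check only
    if (DoAdd)                   // resolved when the template is instantiated
        c[i] = a[i] + b[i];
    else
        c[i] = a[i] * b[i];
}

// Hypothetical host wrapper: the only runtime branch left is here, choosing
// which instantiation to launch.
std::vector<float> run_combine(const std::vector<float> &a,
                               const std::vector<float> &b, bool add)
{
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (add)
        combine<true><<<blocks, threads>>>(da, db, dc, n);
    else
        combine<false><<<blocks, threads>>>(da, db, dc, n);

    std::vector<float> c(n);
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}
```

The host-side branch runs once per launch rather than once per thread, which is the point of moving the condition into the template.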

sending 3d array to CUDA kernel

July 17, 2022 by Tarik

First of all, I think that when talonmies posted the response to the previous question you mention, he was not intending it to be representative of good coding. So figuring out how to extend it to 3D might not be the best use of your time. For example, why do we want to write programs which …
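One common alternative to building nested pointer arrays on the device is to allocate the 3D array as a single flat buffer and compute the linear index inside the kernel. This is a minimal sketch, assuming a dense nx × ny × nz array of floats with x varying fastest; the helper idx3, the kernel scale3d, and the wrapper scale3d_host are hypothetical names, and this is not necessarily what the full answer recommends.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: linear offset of element (x, y, z) in a dense
// nx * ny * nz array stored with x varying fastest.
__host__ __device__ inline size_t idx3(int x, int y, int z, int nx, int ny)
{
    return (static_cast<size_t>(z) * ny + y) * nx + x;
}

// Hypothetical kernel: one thread per element on a 3D launch grid.
__global__ void scale3d(float *data, int nx, int ny, int nz, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz)
        data[idx3(x, y, z, nx, ny)] *= s;
}

// Hypothetical host wrapper: one allocation, one copy each way.
void scale3d_host(std::vector<float> &h, int nx, int ny, int nz, float s)
{
    size_t bytes = h.size() * sizeof(float);
    float *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(8, 8, 4);
    dim3 grid((nx + block.x - 1) / block.x,
              (ny + block.y - 1) / block.y,
              (nz + block.z - 1) / block.z);
    scale3d<<<grid, block>>>(d, nx, ny, nz, s);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

A single contiguous buffer needs only one cudaMalloc and one copy in each direction, whereas a pointer-to-pointer layout needs a separate allocation and copy for every row plus the pointer tables themselves.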

Modifying registry to increase GPU timeout, windows 7

July 17, 2022 by Tarik

The link in your post is correct; you just need to create the corresponding key with the desired value. You will find the TDR Registry Keys description here. The setting you are looking for is TdrDelay, which specifies the number of seconds that the GPU can delay the preempt request from the GPU scheduler. This is …

Utilizing the GPU with c# [closed]

July 13, 2022 by Tarik

[Edit Oct 2017, as even this answer is getting quite old] Most of these answers are quite old, so I thought I’d give an updated summary of where I think each project is: GPU.Net (TidePowerd): I tried this about 6 months ago and did get it working, though it took a little bit of …

CUDA limit seems to be reached, but what limit is that?

May 25, 2022 by Tarik

The resource being exhausted is time. On all current CUDA platforms, the display driver includes a watchdog timer that will kill any kernel taking more than a few seconds to execute. Running code on a card that is also driving a display is subject to this limit. On the WDDM Windows platforms you …
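To check whether a particular GPU is subject to this watchdog, the CUDA runtime exposes a per-device attribute. This is a minimal sketch using cudaDeviceGetAttribute with cudaDevAttrKernelExecTimeout (the same information appears as the kernelExecTimeoutEnabled field of cudaDeviceProp); the helper names watchdogEnabled and reportWatchdogs are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns 1 if a run-time limit applies to kernels on
// `device`, 0 if it does not, and -1 if the query fails (e.g. no device).
int watchdogEnabled(int device)
{
    int enabled = 0;
    if (cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout,
                               device) != cudaSuccess)
        return -1;
    return enabled ? 1 : 0;
}

// Hypothetical helper: prints the watchdog status of every visible device.
void reportWatchdogs()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        count = 0;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        std::printf("device %d (%s): run-time limit on kernels: %s\n",
                    d, prop.name, watchdogEnabled(d) == 1 ? "yes" : "no");
    }
}
```

On a machine with several GPUs, a device that reports no limit is typically one that is not driving a display.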

Fastest sort of fixed length 6 int array

May 20, 2022 by Tarik

For any optimization, it’s always best to test, test, test. I would try at least sorting networks and insertion sort. If I were betting, I’d put my money on insertion sort based on past experience. Do you know anything about the input data? Some algorithms will perform better with certain kinds of data. For example, …
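To follow the "test, test, test" advice, both candidates can be written as small functions and benchmarked against each other. This is a minimal sketch, assuming int data and a fixed length of 6; the names HOST_DEVICE, cmpSwap, sort6Network and sort6Insertion are hypothetical. The network uses 12 compare-exchange steps: it sorts each half of three elements and then merges the halves. Its comparison sequence does not depend on the data, while insertion sort can exit early on nearly sorted input. Which one wins on a given CPU or GPU is what the benchmark has to decide.

```cuda
// Compile the same functions for host and device under nvcc; fall back to
// plain C++ when built with an ordinary host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Compare-exchange: afterwards d[i] <= d[j].
HOST_DEVICE inline void cmpSwap(int *d, int i, int j)
{
    int a = d[i], b = d[j];
    d[i] = a < b ? a : b;
    d[j] = a < b ? b : a;
}

// Sorting network for 6 elements: sort d[0..2] and d[3..5], then merge.
HOST_DEVICE inline void sort6Network(int *d)
{
    cmpSwap(d, 1, 2); cmpSwap(d, 0, 2); cmpSwap(d, 0, 1);  // sort d[0..2]
    cmpSwap(d, 4, 5); cmpSwap(d, 3, 5); cmpSwap(d, 3, 4);  // sort d[3..5]
    cmpSwap(d, 0, 3); cmpSwap(d, 1, 4); cmpSwap(d, 2, 5);  // merge halves
    cmpSwap(d, 2, 4); cmpSwap(d, 1, 3); cmpSwap(d, 2, 3);  // fix the middle
}

// Plain insertion sort with the length fixed at 6.
HOST_DEVICE inline void sort6Insertion(int *d)
{
    for (int i = 1; i < 6; ++i) {
        int v = d[i];
        int j = i - 1;
        while (j >= 0 && d[j] > v) {  // shift larger elements right
            d[j + 1] = d[j];
            --j;
        }
        d[j + 1] = v;
    }
}
```

Knowing something about the input, as the answer suggests, matters here: already sorted or nearly sorted arrays favor insertion sort's early exit.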