How to measure the inner kernel time in NVIDIA CUDA?

You can do something like this:

    __global__ void kernelSample(int *runtime)
    {
        // ….
        clock_t start_time = clock();
        // some code here
        clock_t stop_time = clock();
        // ….
        runtime[tidx] = (int)(stop_time - start_time);
    }

This gives the number of clock cycles between the two calls. Be a little careful though, the timer will overflow after a couple … Read more
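As a rough, self-contained sketch of the same idea, the version below uses `clock64()`, the 64-bit variant of the device cycle counter, which makes wrap-around far less of a concern than with `clock()`. The names `timedKernel` and `measureCycles`, the `sink` buffer, and the dummy loop are all hypothetical stand-ins for illustration; they are not part of the original answer.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread times a placeholder loop with clock64().
__global__ void timedKernel(long long *runtime, float *sink, int iters)
{
    int tidx = blockIdx.x * blockDim.x + threadIdx.x;

    long long start_time = clock64();   // 64-bit per-multiprocessor cycle counter

    // Placeholder workload standing in for "some code here".
    float acc = static_cast<float>(tidx);
    for (int i = 0; i < iters; ++i)
        acc = acc * 1.0001f + 0.5f;

    long long stop_time = clock64();

    sink[tidx] = acc;                        // store the result so the loop is not optimized away
    runtime[tidx] = stop_time - start_time;  // elapsed cycles seen by this thread
}

// Hypothetical host helper: launches one block and returns per-thread cycle counts.
// Error checking is omitted for brevity.
std::vector<long long> measureCycles(int threads, int iters)
{
    long long *d_runtime = nullptr;
    float *d_sink = nullptr;
    cudaMalloc(&d_runtime, threads * sizeof(long long));
    cudaMalloc(&d_sink, threads * sizeof(float));

    timedKernel<<<1, threads>>>(d_runtime, d_sink, iters);
    cudaDeviceSynchronize();

    std::vector<long long> h_runtime(threads);
    cudaMemcpy(h_runtime.data(), d_runtime,
               threads * sizeof(long long), cudaMemcpyDeviceToHost);

    cudaFree(d_runtime);
    cudaFree(d_sink);
    return h_runtime;
}
```

Calling `measureCycles(32, 1000)` from `main` would return one cycle count per thread. The compiler is free to move instructions across the counter reads, so the numbers are best treated as approximate. To turn cycles into time, one option is to query the nominal clock rate (in kHz) with `cudaDeviceGetAttribute` and `cudaDevAttrClockRate`, keeping in mind that the actual clock can differ from the nominal one.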

Should I unify two similar kernels with an ‘if’ statement, risking performance loss?

You have a third alternative, which is to use C++ templating and make the variable which is used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and then you have multiple kernels doing different things with no branch divergence or conditional evaluation to worry about, because the compiler … Read more
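A minimal sketch of the templating approach might look like the following. The `Op` enum, the `combine` kernel, and the `runCombine` host helper are hypothetical names chosen for illustration; the original answer does not show its own code here.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical selector for the two "similar kernels".
enum class Op { Add, Mul };

// OP is a template parameter, so it is a compile-time constant
// inside each instantiation of the kernel.
template <Op OP>
__global__ void combine(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // The condition is known at compile time, so the compiler can
    // drop the branch that is not taken in each instantiation.
    if (OP == Op::Add)
        out[i] = a[i] + b[i];
    else
        out[i] = a[i] * b[i];
}

// Hypothetical host helper: the runtime decision is made once, on the host,
// by picking which instantiation to launch. Error checking is omitted.
std::vector<float> runCombine(Op op, const std::vector<float> &a,
                              const std::vector<float> &b)
{
    int n = static_cast<int>(a.size());
    if (n == 0) return {};
    size_t bytes = n * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    switch (op) {
    case Op::Add: combine<Op::Add><<<blocks, threads>>>(d_a, d_b, d_out, n); break;
    case Op::Mul: combine<Op::Mul><<<blocks, threads>>>(d_a, d_b, d_out, n); break;
    }

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

The shared logic lives in one place, while the `switch` on the host is the only runtime test. The cost is one compiled kernel per instantiated value, which is usually acceptable when the set of variants is small.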

sending 3d array to CUDA kernel

First of all, I think that when talonmies posted the response to the previous question you mention, he did not intend it to be representative of good coding. So figuring out how to extend it to 3D might not be the best use of your time. For example, why do we want to write programs which … Read more

Utilizing the GPU with C# [closed]

[Edit OCT 2017 as even this answer gets quite old] Most of these answers are quite old, so I thought I’d give an updated summary of where I think each project is: GPU.Net (TidePowerd) – I tried this 6 months ago or so, and did get it working though it took a little bit of … Read more