How to perform multiple matrix multiplications in CUDA?
The fastest approach is most likely the cuBLAS batched GEMM function, which was designed specifically for this purpose: performing a large number of “relatively small” matrix-matrix multiplications. Even though you want to multiply your array of matrices (M[]) by a single matrix (N), the batch gemm function …