I think I’ve recreated the csr
row indexing with:
def extractor(indices, N):
indptr=np.arange(len(indices)+1)
data=np.ones(len(indices))
shape=(len(indices),N)
return sparse.csr_matrix((data,indices,indptr), shape=shape)
Testing on a csr
I had hanging around:
In [185]: M
Out[185]:
<30x40 sparse matrix of type '<class 'numpy.float64'>'
with 76 stored elements in Compressed Sparse Row format>
In [186]: indices=np.r_[0:20]
In [187]: M[indices,:]
Out[187]:
<20x40 sparse matrix of type '<class 'numpy.float64'>'
with 57 stored elements in Compressed Sparse Row format>
In [188]: extractor(indices, M.shape[0])*M
Out[188]:
<20x40 sparse matrix of type '<class 'numpy.float64'>'
with 57 stored elements in Compressed Sparse Row format>
As with a number of other csr
methods, it uses matrix multiplication to produce the final value. In this case with a sparse matrix with 1 in selected rows. Time is actually a bit better.
In [189]: timeit M[indices,:]
1000 loops, best of 3: 515 µs per loop
In [190]: timeit extractor(indices, M.shape[0])*M
1000 loops, best of 3: 399 µs per loop
In your case the extractor matrix is (len(training_indices),347) in shape, with only len(training_indices)
values. So it is not big.
But if the matrix
is so large (or at least the 2nd dimension so big) that it produces some error in the matrix multiplication routines, it could give rise to segmentation fault without python/numpy trapping it.
Does matrix.sum(axis=1)
work. That too uses a matrix multiplication, though with a dense matrix of 1s. Or sparse.eye(347)*M
, a similar size matrix multiplication?