09 September 2017

I have been working with Python's package for sparse matrices (scipy.sparse), which offers a number of sparse storage formats: CSR, CSC, LIL, COO, etc. I find that certain operations are faster in some formats than in others.

In practice, the CSR structure is the fastest for, e.g., row or column sums. However, it is terribly slow for zeroising entire rows of a CSR matrix or for assigning an entire row. It also requires dedicated routines such as csr_matrix.eliminate_zeros() to remove all explicitly stored zeros from the CSR matrix.
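To make the role of eliminate_zeros() concrete, here is a minimal sketch on a small made-up 3x3 matrix (the values and the variable names are only for illustration). It overwrites the stored values of one row with zeros, which leaves them in the structure as explicit zeros, and then prunes them:

```python
import numpy as np
from scipy import sparse

# Small example matrix (made up for illustration): 6 stored non-zeros.
A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 5.0, 6.0]]))

# indptr[row]:indptr[row + 1] is the slice of A.data holding that row's values.
row = 2
start, end = A.indptr[row], A.indptr[row + 1]
A.data[start:end] = 0.0          # values are now zero but still stored

stored_before = A.nnz            # nnz counts stored entries, explicit zeros included
A.eliminate_zeros()              # drops the explicit zeros and rebuilds indptr/indices
stored_after = A.nnz
```

Only A.data is written here; indptr is read but not modified, and eliminate_zeros() is left to repair the index arrays.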

For zeroising and row assignment, LIL is faster, but the conversion to and from LIL has a cost of its own.
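The LIL round trip can be sketched as follows, again on a small made-up matrix. Whether the two conversions pay off depends on how many rows are modified per conversion, so this is an illustration of the pattern rather than a recommendation:

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 5.0, 6.0]]))

L = A.tolil()                           # CSR -> LIL (this copy is the conversion cost)
L[0, :] = 0                             # zeroise an entire row
L[1, :] = np.array([7.0, 0.0, 8.0])     # assign an entire row from a dense vector

B = L.tocsr()                           # LIL -> CSR for fast arithmetic afterwards
B.eliminate_zeros()                     # defensive: drop any explicit zeros that remain
```

Batching all row edits between a single tolil() and a single tocsr() keeps the conversion overhead to one round trip.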

On the web, I managed to find dedicated routines for fast zeroising of a CSR matrix, e.g.:

https://stackoverflow.com/questions/19784868/what-is-most-efficient-way-of-setting-row-to-zeros-for-a-sparse-scipy-matrix

and row assignment, e.g.:

https://stackoverflow.com/questions/28427236/set-row-of-csr-matrix

They basically work for the tasks at hand, but they muck around with the internal CSR structure using "indptr". You often have to "fix" the result with another call to csr_matrix() on the output matrix before proceeding to other tasks. I would like a better, more reliable way of doing fast zeroising and row assignment on a CSR matrix!
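For the zeroising part, one alternative that never touches indptr is to left-multiply by a diagonal 0/1 matrix. The sketch below assumes that building a new matrix (rather than modifying in place) is acceptable; zero_rows is a hypothetical helper name, not a scipy function, and I have not benchmarked it against the linked routines:

```python
import numpy as np
from scipy import sparse

def zero_rows(A, rows):
    """Return a new CSR matrix equal to A with the given rows set to zero.

    Hypothetical helper: uses only public scipy.sparse calls.
    """
    keep = np.ones(A.shape[0], dtype=A.dtype)
    keep[rows] = 0                            # 0 on the diagonal wipes that row
    D = sparse.diags(keep, format="csr")      # diagonal selector matrix
    out = (D @ A).tocsr()
    out.eliminate_zeros()                     # make sure no explicit zeros survive
    return out

A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 5.0, 6.0]]))
Z = zero_rows(A, [0, 2])
```

The input A is left unchanged, and the result comes out of scipy's own multiplication routine, so no follow-up call to csr_matrix() should be needed to repair it.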

Is there any book, documentation, forum, or simply advice that explains these pitfalls in detail and, hopefully, offers solutions to them?
