Hello, I need to parallelize (multi-thread) a Python subroutine that runs on Spark, and I would appreciate any feedback or references on getting a fast and efficient result. The subroutine builds a sparse matrix from the input data and then performs a logical row reduction inside a while loop. PySpark has modules for specific tasks, but I need to write my own. The online documentation has not helped much; this seems to be specialized knowledge. In particular, I don't want to run into shared-memory issues that can hurt performance. Thank you.

Best,
Tony Scott
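One common pattern is to avoid shared state altogether: give each Spark task its own slice of the rows, reduce it locally, and merge the small partial results on the driver. The sketch below assumes (1) "logical row reduction" means Gaussian elimination over GF(2), i.e. XOR of 0/1 rows, (2) the input is a collection of `(row, col)` pairs marking nonzero entries, and (3) all entries of a row live in the same partition. The helper names `build_sparse_rows`, `gf2_row_reduce`, and `reduce_partition` are hypothetical. Each row is stored as a Python int used as a bitset, so only set bits matter and XOR runs in C.

```python
from typing import Iterable, Iterator, List, Tuple


def build_sparse_rows(records: Iterable[Tuple[int, int]]) -> List[int]:
    """Build a sparse 0/1 matrix from (row, col) pairs (hypothetical helper).

    Each row becomes a Python int used as a bitset: bit c is set when
    column c is nonzero. Rows are returned ordered by row index.
    """
    rows = {}
    for r, c in records:
        rows[r] = rows.get(r, 0) | (1 << c)
    return [rows[r] for r in sorted(rows)]


def gf2_row_reduce(rows: Iterable[int]) -> List[int]:
    """Row-reduce 0/1 rows over GF(2) with XOR elimination.

    Returns independent rows in echelon form (distinct leading columns),
    sorted by leading column, highest first. Dependent rows reduce to 0
    and are dropped.
    """
    pivots = {}  # leading column -> stored row with that leading bit
    pending = list(rows)
    while pending:
        row = pending.pop()
        while row:
            lead = row.bit_length() - 1  # highest nonzero column
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # cancel the leading entry
    return [pivots[c] for c in sorted(pivots, reverse=True)]


def reduce_partition(records: Iterator[Tuple[int, int]]) -> Iterator[int]:
    """Per-partition worker in the shape rdd.mapPartitions expects.

    Each task works only on its own iterator and local objects, so no
    memory is shared between tasks.
    """
    yield from gf2_row_reduce(build_sparse_rows(records))


if __name__ == "__main__":
    # Local stand-in for two partitions, with rows grouped by row index.
    part_a = [(0, 0), (0, 2), (1, 1), (1, 2)]
    part_b = [(2, 0), (2, 1), (3, 3)]
    partial = []
    for part in (part_a, part_b):
        partial.extend(reduce_partition(iter(part)))
    # The union of per-partition bases spans the same space as all rows,
    # so one more reduction on the driver gives the global result.
    print(gf2_row_reduce(partial))

    # With PySpark (assuming a SparkContext `sc` and a list `records`):
    #   pairs = sc.parallelize(records).partitionBy(8)  # keyed by row index
    #   partial = pairs.mapPartitions(reduce_partition).collect()
    #   basis = gf2_row_reduce(partial)
```

The design choice here is to let Spark supply the parallelism. As far as I know, PySpark executes Python code in separate worker processes, so Python threads inside the subroutine would add little for CPU-bound work and would reintroduce the shared-memory concerns. Note that very large column indices make the bitset ints large; a different sparse row representation may suit that case better.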
