I'm working on a Python project that performs heavy calculations, and I'm trying to optimize performance using parallel processing. So far, using joblib has improved performance significantly.
However, when I use multiprocessing.Pool instead of joblib, the computation actually slows down, ending up slower than the non-parallel version.
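For context, here is a simplified sketch of my setup (heavy_calc is just a stand-in for my real CPU-bound workload; the joblib variant that runs fast for me is shown as a comment):

```python
from multiprocessing import Pool

def heavy_calc(n):
    # stand-in for my real CPU-bound computation
    return sum(i * i for i in range(n))

def run_with_pool(inputs, workers=4):
    # the multiprocessing.Pool path that ends up slower for me
    with Pool(processes=workers) as pool:
        return pool.map(heavy_calc, inputs)

# joblib equivalent that performs well:
#   from joblib import Parallel, delayed
#   results = Parallel(n_jobs=4)(delayed(heavy_calc)(n) for n in inputs)

if __name__ == "__main__":
    results = run_with_pool([100_000] * 8)
    print(len(results))
```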
Using profiling tools, I discovered that when I use multiprocessing.Pool, a large portion of the runtime is spent in Windows-specific _winapi calls.
So what I want to know is: is there a specific reason multiprocessing.Pool is slower here, and are there any workarounds to reduce the _winapi overhead?
Are there any specific techniques or libraries in Python that provide better control over CPU and memory allocation in parallel processing, especially for heavy computations?
Python's built-in libraries seem limited in handling resource allocation efficiently on Windows. Are there external tools or alternative approaches for managing CPU and memory resources more effectively?