I have code that runs in Python, and it takes more than 10 minutes to complete a calculation. I am therefore looking for tricks or techniques to optimize the running time of this code.
Simply switching to C/C++/Matlab is usually not a good solution unless the researcher is already skilled in those languages, especially C/C++. The time they spend just finding their way around will overwhelm any computational speed gains they might make.
In any case, Python's computational libraries (numpy and scipy) are based on C code anyway, so a dramatic increase in computational speed depends on the level of optimization, and that is much more dependent on the user's skill with a particular language than on the language itself.
A few recommendations for code optimization:
1. Use the built-in multiprocessing library to parallelize computation. I recommend the Pool.map_async method in case the computational sub-tasks don't all take exactly the same amount of time to finish. Generally, this works if your model involves lots of computations that are independent of each other. For example, Monte Carlo simulations are perfectly suited to this approach (see the first sketch after this list).
2. Use pre-computation. For example, if you are doing combinatorics, there will be lots of pre-factors that are integer-only, so there is only a finite number of combinations you will actually need. Simply compute a table of these values once (see the second sketch after this list). There is a memory vs. speed trade-off here, but memory is usually plentiful; just make sure you don't exceed your memory limits. Even pre-computing factors like the square root of pi can save you time (square roots are comparatively slow).
2a. Look carefully at your computation procedure and try to find chunks of values that can be calculated once and re-used. For example, some function optimization routines rely on computing the function's Jacobian. All of these derivatives look very similar, especially if your function has exponential terms, so you can compute the exponential term once and then use it in every element of the Jacobian.
3. Speaking of integers: if part of the calculation cycle can be done using pure integers, do it in integers and convert to floats at the appropriate time. Integer computation is almost always significantly faster than floating-point operations.
4. if statements and for loops are generally slow in pure Python. Use either vectorization (numpy arrays support it intrinsically) or list comprehensions when iterating over quantities (see the third sketch after this list). It's also more "Pythonic" this way.
5. Following Daewonn Lee's advice: profile your code to figure out where you're losing time. It's possible you are not stuck on computations but rather on reading from / writing to disk.
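Below are three minimal sketches illustrating points 1, 2, and 4. The function names, sample counts, and array sizes are invented for illustration; adapt them to your own model.

A sketch of point 1, parallel Monte Carlo with Pool.map_async:

    import random
    from multiprocessing import Pool

    def mc_chunk(n_samples):
        # One independent sub-task: count hits inside the unit quarter-circle.
        rng = random.Random()
        return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                   for _ in range(n_samples))

    if __name__ == "__main__":
        chunks = [250_000] * 8  # eight independent sub-tasks
        with Pool() as pool:
            # map_async returns immediately; workers pull chunks as they
            # free up, so uneven task durations waste less time.
            result = pool.map_async(mc_chunk, chunks)
            hits = sum(result.get())
        print("pi ~", 4 * hits / sum(chunks))

A sketch of points 2 and 2a, pre-computing reusable values once:

    import math

    MAX_N = 1000
    # Table of factorials 0! .. MAX_N!, built once (memory vs. speed trade-off).
    FACT = [1] * (MAX_N + 1)
    for n in range(1, MAX_N + 1):
        FACT[n] = FACT[n - 1] * n

    SQRT_PI = math.sqrt(math.pi)  # computed once, reused everywhere

    def comb(n, k):
        # n choose k straight from the table instead of recomputing factorials.
        return FACT[n] // (FACT[k] * FACT[n - k])

A sketch of point 4, replacing an explicit loop with numpy vectorization:

    import math
    import numpy as np

    x = np.linspace(0.0, 10.0, 1_000_000)

    # Slow: element-by-element Python loop.
    y_loop = np.empty_like(x)
    for i in range(x.size):
        y_loop[i] = math.sin(x[i]) * math.exp(-x[i])

    # Fast: one vectorized expression evaluated in compiled numpy code.
    y_vec = np.sin(x) * np.exp(-x)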
I don't think 10 minutes is that long, but if time is of the essence, consider choosing a different language that is faster and more efficient, such as C, C++, or Java.
Also, you might want to compute in parallel some of the functions that are independent of each other.
You can extend Python with C or C++ for the numerically intensive parts of the program; see the first link below. To identify those parts, you can use profiling; see the second link below. Sometimes, performance issues are caused by nested loops. You can then often optimize the code by computing quantities needed in the innermost loops at least partly outside the loop structure and storing them.
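As an illustration of that last point, here is a minimal sketch (the array and the weight function are invented for the example) of hoisting a loop-invariant quantity out of an inner loop:

    import math

    a = [[float(i + j) for j in range(200)] for i in range(200)]

    # Before: math.exp(-i) is recomputed in every inner iteration.
    total = 0.0
    for i in range(200):
        for j in range(200):
            total += math.exp(-i) * a[i][j]

    # After: the factor depends only on i, so compute it once per outer pass.
    total = 0.0
    for i in range(200):
        w = math.exp(-i)      # hoisted out of the inner loop
        total += w * sum(a[i])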
You can use parallel programming for faster execution.
First, find out which part of your code takes the most time; for that you can use Intel's profiling tools. Then apply a parallel approach. An MPI library (mpi4py) is available for Python.
C/C++ will help the code run faster than Python.
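As a minimal sketch of the MPI approach (assuming the mpi4py package; the work function and task count are placeholders):

    # Run with, e.g.: mpiexec -n 4 python script.py
    from mpi4py import MPI

    def heavy_task(i):
        return i * i  # placeholder for one independent unit of work

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Each rank takes every size-th task (simple static partitioning).
    local_sum = sum(heavy_task(i) for i in range(rank, 1000, size))

    # Combine partial results on rank 0.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
    if rank == 0:
        print("total:", total)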
Without information about what you're computing, or some code snippets, you cannot receive answers that will really be helpful to you.
In a few answers, parallel computation or switching to C has been suggested. However, I think you should avoid those options as long as simpler issues remain to be solved (I like the list provided by Boris L. Glebov).
I think that depends on your specific problem, but in general, when your Python code is too slow, you can follow these steps:
(1) Use a profiling tool such as cProfile or kernprof to measure the execution time of your code and find the performance bottleneck (see the sketch after this list).
(2) Try to improve the data structures and remove unnecessary computation in the bottleneck code. If necessary, implement that part in C and use it as a library from Python.
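For step (1), a minimal cProfile sketch (the functions here are placeholders for your own code):

    import cProfile
    import pstats

    def bottleneck():
        return sum(i * i for i in range(2_000_000))

    def main():
        return bottleneck()

    cProfile.run("main()", "profile.out")
    # Print the ten entries with the largest cumulative time.
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

From the command line, running "python -m cProfile -s cumtime your_script.py" gives the same information without modifying the code.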
Some of you noted that more needs to be known about the source code; indeed, this is important for narrowing down the problem. The code calculates heat transfer between a huge number of areas, which need meshing. It uses the view factor concept, which involves massive integral calculations.
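Since the bottleneck appears to be pairwise view-factor integrals, here is a hedged sketch of evaluating one such double area integral with a vectorized numpy midpoint rule instead of nested Python loops. The geometry (two parallel unit squares at separation d) is invented for illustration; the real integrand and mesh come from your own model:

    import numpy as np

    n = 50                        # quadrature points per axis
    d = 1.0                       # separation between the two parallel patches
    u = (np.arange(n) + 0.5) / n  # midpoint nodes on [0, 1]

    # All combinations of the four integration variables, via broadcasting.
    x1, y1, x2, y2 = np.meshgrid(u, u, u, u, indexing="ij", sparse=True)

    r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + d ** 2
    # cos(t1) * cos(t2) / (pi * r^2), with cos(t) = d / r for parallel patches.
    integrand = d * d / (np.pi * r2 ** 2)

    # Midpoint rule: mean of the integrand times the domain measure (1 here).
    F12 = integrand.mean()
    print("view factor ~", F12)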