I made this question because the sum with reduction(+...) is made in a different order of threads and the results are not exactly the same each time I run the code.
Can you post a basic summation example to show the problem at hand?
If you are summing integers and the resulting summation is still within the computer's word size for that number, the summation should always be the same, regardless of the sequence at which the numbers are added to the total.
If you are summing floating point, it is possible that the result will not be the same because some floating-point numbers cannot be exactly represented. An example is 1/3.
Something that should be considered: Are all the intended numbers actually being added to the overall summation? Does the difference matter in the final result, the required determination? For example, does 25.01 make a difference in the final result compared to 25.013? What numeric accuracy is required?
I am working with the Navier-Stokes solution and my question was made to optimize a Conjugated Gradient method to solve Poisson equations.
The code is on the link: https://github.com/lacia-udesc/SuLi/blob/main/SuLi-IC/9_grad.f90, and the part where I have the issue is in between lines 191-197 and 222-228.
Answering your questions: I am working with floating point numbers. The difference between using OpenMP and not using it is very small, but I am working with billions of iterations, and it changes the final results of the code.
I need that the results be exactly the same if the same set of initial parameters is considered for verification and validation procedures.
Do you think that there is a possible solution to this issue?
There is an entire field of mathematics regarding error accumulation during automated iterative operations employing floating-point numbers. It is a huge issue, to be sure.
Let me brainstorm a bit .....
If the results MUST be EXACTLY the same then some means is needed, as you originally said, to ensure parameter summation takes place in exactly the same sequence. Also implied is that sub-iterations and calculations also occur in the same order. That is a matter of programming and communication between processes and will be expensive in terms of runtime and message interchange.
Not being familiar with the calculation you are carrying out, I am wondering if "the same" can be defined as "within a given tolerance". If the result is within a given tolerance then it is said to be "the same". Why must the result be EXACTLY the same? Would it be "the same" if the result were rounded to a smaller number of decimal places? What is the smallest number of decimal places within the model's input data and parameter initialization? That is the number of valid decimal places in the result.
A small bit of mitigation might be to consider "floating point numbers". It has been a very long time since I worked with Fortran and I am not equipped for it here. Does Fortran have various word sizes for floating point? In C there is float and double, for instance. Both are floating point but double will accommodate larger numbers and more decimal places. Be aware that the word sizes for float and double are the same under certain compilers. In that case, there would be no change in accuracy.
How long does a run take on one core? Are you doing multiple runs using different initializations and data? Is it possible that the multiple runs could be done using one core, the results saved, and a report on results produced after all runs are finished? In that way, multiple arrangements of data and configuration parameters are assured of all being done in the same sequence of operations, as the model adapts to the data and initialization.
Thank you for your brainstorming! It made me think about some aspects of my code. In fact, for most of the simulations, there is no need of this EXACT similarity between results and the use of "reduction (+ ...)" is good enough. Only for some verification procedures this characteristic of similarity is needed. So I can work with the usual parallelization, and when I need to run some verification procedures I can work without OpenMP with only one core.
Answering some questions. Fortran can work with different word sizes of the floating number. I am already using double precision.
A run can take some days, but usually it takes 3 or 4 hours. I am not working with multiple runs at the same time. Only one run can take a long time.