Nowadays, all of the major Fortran numerical libraries have equivalent counterparts in more modern language frameworks such as Numerical Python (NumPy) and Scientific Python (SciPy).
What keeps physicists stuck with Fortran?
Performance?
Portability?
Scientific evidence?
http://numpy.org
http://scipy.org
Fortran is the favorite whipping boy of the programming world. People feel completely comfortable making condescending statements about the language and those who use it. I've been coding since I was 12, taught myself three languages by 10th grade (including 6502 machine language coded in hex), and learned several more by the time I finished grad school. Fortran has become my language of choice for many reasons, but being "stuck" isn't one of them. Expressiveness is part of it. Support for mathematical calculations is another. And performance is very high on the list. But the number one reason from which most of my other reasons flow is that Fortran is the only language with an international standards body that sees scientific programmers as its target audience.
On performance, take a look at this comparison of Fortran, C++, and Python in a domain of broad interest in physics (solving PDEs):
http://www.hindawi.com/journals/sp/2014/870146/abs/
Python isn't even in the running and C++ only comes close on the largest problems. And on the performance question, I think it's important to note that, in the multi-core/many-core world, high performance necessitates parallelism. Fortran is the only standardized language with its own high-performing and scalable parallel programming model, in the form of Fortran 2008 coarray parallel programming. If you download OpenCoarrays (www.opencoarrays.org), the test suite contains multiple PDE solvers, including
(1) an object-oriented, parallel Burgers equation solver that has been shown to scale to 16,384 cores with 87% parallel efficiency using nothing but Fortran 2008, with no need to embed any compiler directives or calls to libraries external to the language (I'll be glad to provide references if desired), and
(2) a spectral Navier-Stokes solver for which the coarray Fortran version outperforms the MPI version even when the compiler uses MPI under the hood to support coarray communication and synchronization.
While I think Python is wonderful and is great for scripting purposes, I'm more likely to move to something like Julia if and when I finally switch away from Fortran for production code. Like Fortran, Julia was conceived with numerical computation and high performance in mind. For a great perspective on languages for scientific computing, including Fortran, Python, and Julia, see http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/.
Many things can cause a group to stick with a given computer language.
Reuse of existing packages is one.
Ability to know and understand how boundary conditions (underflow and overflow) are handled is another.
Access to a support environment for debugging and performance testing can be another.
Support for input and output of large data sets can be a factor.
Ability to express well-understood algorithms is important.
Support for parallel processing can be a factor.
There are many reasons languages such as FORTRAN, COBOL, and SAS have their loyal followings despite the creation of languages that can in principle be considered superior.
Thank you John, you are making good points.
Let me try to discuss your points, one at a time.
1. REUSE:
SciPy and NumPy are built upon the long-standing Fortran legacy, rewritten and tested in the new language Python (and its high-performance derivatives).
Also reusability among Python projects is very well demonstrated by the extensive ecosystem that is now in place around Python.
A physics paper publishing results from a Python-ported numerical computation today will have no problem being reused, cited, and verified by independent research groups worldwide in the long-term future.
2. BOUNDARY CONDITIONS.
If I understood your point correctly, what do you need to be able to do here that cannot be done with a Python-based framework? It looks to me more related to how you formulate a problem than to which language you use to express the boundary conditions, right?
3. DEBUGGING AND TESTING. In my experience, the time spent on debugging and testing a Python program is less than 50% of that in any other language framework, including C, C++, and Java. I have no experience in debugging/testing with Fortran, but I can imagine...
4. BIG DATA INPUT/OUTPUT. Python has native support for memory-mapped I/O, which is the most efficient way to read large files that I know of, regardless of the language you use; see the link and the sketch below.
https://docs.python.org/2/library/mmap.html
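A minimal sketch of the idea, assuming a hypothetical raw binary file of float64 values named data.bin; mmap lets the OS page data in on demand instead of loading the whole file:

```python
# Memory-map a large binary file so only the pages actually touched are read.
import mmap

import numpy as np

with open("data.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# NumPy can view the mapped buffer without copying it.
samples = np.frombuffer(mm, dtype=np.float64)
print(samples[:10])  # touches only the first page(s) of the file
```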
5. READABILITY OF CODE. Python is definitely the best! And this is the key to its success. Have a look at the SciPy.org documentation page about integration methods (first link below).
Also, a Python program can be transformed into an almost paper-like form using the IPython Notebook, where code, formulas, and graphs are easily blended together, as you can see from the second link.
http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/integrate.html
http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-3-Scipy.ipynb
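To give a flavour of that readability, a minimal sketch in the spirit of the SciPy integration tutorial linked above:

```python
# Numerically integrate sin(x) over [0, pi] with scipy.integrate.quad.
import numpy as np
from scipy.integrate import quad

result, abs_error = quad(np.sin, 0.0, np.pi)
print(result)     # ~2.0, the exact value of the integral
print(abs_error)  # estimated absolute error
```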
6. PARALLEL PROCESSING. Well, on Unix/Linux machines, many of the message-passing and shared-memory software architectures well known in the literature are already supported in a Python environment. Native multiprocessing support is documented in the link below, followed by a short sketch.
https://docs.python.org/2/library/multiprocessing.html
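A minimal sketch of the standard multiprocessing module mentioned above, mapping a CPU-bound function over a pool of worker processes:

```python
# Map a CPU-bound function over inputs using a pool of worker processes.
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)
    try:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]
    finally:
        pool.close()
        pool.join()
```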
John, on the more general issues you mentioned in your concluding remarks, I can say that, while keeping all of the legacy Fortran code as it is, new team members should be encouraged to explore how to integrate new Python code with it, starting right from the I/O subsystem.
Perhaps one more argument. I, for example, feel very uncomfortable, like an ape with a razor, not knowing what I am really doing when using new efficient languages. Unconditional trust? No way, too many bad surprises in the past.
I've heard a lot of times that it's legacy code that works and that no one wants to touch or migrate. Old habits.
Marek, you can actually use the line debugger with a Python program, going as deep as you desire into the source code of the libraries, which are not perfect, we do agree. You can even show the assembly language generated from a user-defined function. At some point you will simply have to trust the underlying CPU system.
I hope the link helps recall the history of the boundary of trust.
https://en.wikipedia.org/wiki/Pentium_FDIV_bug
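To make the debugging point concrete, a minimal sketch using the standard pdb and dis modules (note that dis shows CPython bytecode; true machine assembly requires extra tools such as Cython or Numba):

```python
# Inspect a user-defined function: dis shows the compiled bytecode,
# and pdb steps line by line, including into library source code.
import dis
import pdb

def norm2(x, y):
    return (x * x + y * y) ** 0.5

dis.dis(norm2)          # disassemble norm2 into CPython bytecode
pdb.run("norm2(3, 4)")  # start an interactive debugging session
```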
1. What keeps physicists stuck with Fortran?
It's sort of like asking why someone speaks only English or prefers beef over pork--often these conversations turn religious.
Pragmatically, in a team environment, one prefers to speak a common language with colleagues to communicate ideas effectively and to read/write and easily comprehend codes. Maybe it's the inertia of a large mass--it tends to keep going without a change in direction and without much friction.
2. Performance?
Yes. Also, coding practices (lack of wide pointer usage) and existing optimized modules (purpose-built in-house libraries) tend to keep these codes performant.
3. Portability?
Definitely.
4. Scientific evidence?
Not sure what you meant by this...
In summary, it's a matter of taste and the habits of a group; besides, the language keeps evolving in a decent manner, with the notion "if it ain't broke, don't fix it."
Giovanni,
This is a misunderstanding. "The ape with a razor" is by no means related to the fine detail of the generated assembly code; it's on a different level. I don't care what code is behind a simple formatted PRINT statement in Fortran (and it is surely much longer than a simple assembly language statement). But I'm very unhappy to have at my disposal a system-delivered (built-in) fancy function performing, say, some kind of optimization by minimizing a "vector norm", without ever mentioning which norm. Here is where I don't know what I'm really doing, no matter that the results may seem sensible and are produced suspiciously quickly. Besides, digging into a 500-page book (the one I have on Python) is tiring, especially for a newcomer.
I agree with the suggestion to adopt the new language, but it depends on the requirements of a particular project. In my opinion, this question would be more interesting if Giovanni De Gasperis could provide some numerical results for the execution time of the same codes in Fortran and Python.
As many colleagues said above, I believe that it's just a matter of purpose. Computer languages are tools, in fact interfaces, to perform calculations. If your main purpose is speed no matter what, FORTRAN (or C, etc.) may be the answer. But if reproducibility, development speed, and the utilization of additional libraries matter more, I would recommend a scripting language such as Python, Perl, or Julia; and among those three I would definitely recommend Python.
It is important to say, as many colleagues have noted as well, that the main tools that allow Python to be fast (like the NumPy arrays) have their kernels implemented in FORTRAN and/or C, so Python works just as an interface.
One can argue that compiling a program provides a level of protection not provided by scripting languages: compilation ensures the entire program can be parsed and transformed into an executable.
Damian - thanks for mentioning our paper. I do agree that Fortran is faster than Python, but one of the points of the paper (http://www.hindawi.com/journals/sp/2014/870146/abs/) was to show that Python can be faster (especially with some extra tools/libraries/interpreters, e.g. PyPy) than most people think.
I also believe that not all scientists (and definitely not all the time) have to think only about running time. Coding/testing/learning time is also very important.
Marek Wojciech Gutowsk: People write papers about fancy functions/scientific schemes (because this is something that you might need to use in your work), and many of them are implemented in Fortran. Using them with full confidence from Fortran might be as hard as (or harder than?) using them from Python.
40 years of linear algebra libraries.
MPI implementations in F77.
Massive parallel computing in F77.
Need I say more?
Python is for pussies (i.e. psychologists and economists).
I may deviate from the gentlemen who prefer FORTRAN. I switched to C/C++/C# almost 15 years ago, and I feel I did the right thing. There are so many libraries in all branches (math, GUI, etc.). Honestly, for a C programmer, FORTRAN is a piece of cake, but not the other way around! C enables you to write low-level code and hence gives you more flexibility over your code. You can also import other languages into your C code pretty easily. There is a wide range of bonuses you get with C programming. Try it and see for yourself. Regards.
Please look up speed classifications of languages (a quick Google search will do): Python is very, very slow.
... and we are not stuck with Fortran, we love Fortran! It is the language of numerical science, math, and geoscience. Of course C++ has some benefits, but it is not very handy for scientists. Everything else cannot be used because of computational speed ...
Everybody's preferences are strongly related to practical needs and the libraries available to satisfy them. Python is a great tool for measurement data collection and manipulation, but it will need to develop a lot of models to compete with Matlab/Simulink optimization or system identification toolbox libraries, for example. There are many counter-arguments, but, again, everything depends on particular needs, where speed, convenience, and other arguments become widely negotiable.
I think one reason is that several generations of physicists have used the Fortran language in their programs and created a lot of subroutines and libraries. These programs today constitute a rich "gallery" of numerically solved problems that is very useful for future generations. Another reason is that older physicists have difficulty shifting to a different programming language such as Python or C, C++, etc.
Bejo, be assured that for a good physicist, young or old, moving to a new programming language is not a big deal.
Hehe, this is actually a very funny statement:
"Nowadays all of the major Fortran related numerical calculus have exactly mapped equivalent libraries in more modern language framework like Numerical Python (NumPy) and Scientific Python (SciPy).
What keeps physicists stuck with Fortran?"
Do you know why? Well, it is funny for the same reason NumPy and SciPy cannot be faster than Fortran by definition: the NumPy and SciPy libraries are wrappers around C wrappers around FORTRAN code.
So when you move to SciPy and NumPy you are actually still using Fortran... actually it is worse: you are still mainly using FORTRAN 77, while most Fortran programmers have moved on to Fortran 95 and Fortran 2003. (If you don't believe this, try to trace a high-performance NumPy routine down to its source code; you will be surprised, e.g. by leastsq.)
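To illustrate the leastsq example, the SciPy routine below is a thin wrapper around the FORTRAN 77 MINPACK code; here a minimal sketch fitting a noisy line:

```python
# scipy.optimize.leastsq delegates to the FORTRAN 77 MINPACK routines.
import numpy as np
from scipy.optimize import leastsq

x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.05 * np.random.randn(20)  # noisy line y = 2x + 1

def residuals(params):
    a, b = params
    return y - (a * x + b)

(a, b), ier = leastsq(residuals, [1.0, 0.0])
print(a, b)  # close to 2.0 and 1.0
```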
Guys, thank you for your hints. I very much like the approach reported by Aldo Dall'Osso, which is what NumPy is founded on. I am personally starting to use PyPy, the just-in-time compiler, with multiprocessing, and the model fits my application domains (text mining, extended Gaussian algebra expressions, and logic programming).
Looking forward to learning more about Julia; it seems very promising.
Cheers!
There is no need to convert old code into Python, because you can call Fortran code directly from Python using f2py.
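A hypothetical sketch of the f2py workflow; the Fortran file, module name, and routine below are made-up examples for illustration, not an existing package:

```python
# Assume a hypothetical Fortran file norm.f90 containing:
#
#   subroutine vecnorm(x, n, r)
#     integer, intent(in) :: n
#     double precision, intent(in) :: x(n)
#     double precision, intent(out) :: r
#     r = sqrt(sum(x**2))
#   end subroutine vecnorm
#
# compiled once from the shell with:  f2py -c norm.f90 -m norm
import numpy as np

import norm  # the extension module generated by f2py

# f2py infers n from the array and returns the intent(out) argument.
print(norm.vecnorm(np.array([3.0, 4.0])))  # 5.0
```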
You know, it becomes harder to learn with aging, and physicists love traditions.
Btw, I like the approach of the Software Carpentry website in one of their online Python courses for scientists:
the time to arrive at a solution is given by: Ts = Td + Te
where
Ts : time to the solution
Td: time to focus on the problem, think of an algorithm, write a program, and debug it
Te: time of execution
Nowadays Td >> Te, and that is why I am definitely stuck with Python/NumPy/SciPy.
Cheers!
http://software-carpentry.org/v4/python/intro.html
Why you want to "replace" fortran with numpy+scipy if scipy is in a considerable amount fortran and also numpy has fortran code?
The issue here is not to "replace" anything. How about the NEW problems that physicists want to tackle. Why they keep pushing young students to be stuck with Fortran, while those students could learn a new and fascinating language the improve their productivity by orders of magnitude?
It really depends what the students are doing.
If they are analyzing a small data set and looking to make some plots, obviously Python will be faster to code and easier to use.
However, for any serious simulation or numerical computation, Fortran will be many times faster. Also, if you want to parallelize your code (for clusters or GPUs), it is very difficult to do that with Python. Parallelization really can give a research group an edge and is becoming more and more popular.
I'd also argue that for many things Fortran is easier to learn than C and even Python. The programming model that Fortran uses is much easier to learn than the object-oriented model that Python is built around. If you wish to go object-oriented, you can use Fortran 2003, but you don't have to learn it from the outset.
Fortran offers a nice mix of ease of use for numerical calculation and raw speed which is so far unmatched.
The fact that physicists still use Fortran is more of an indictment of the computer science community than of physicists. The fact is, they have not yet developed a better alternative that combines speed and ease of use. The closest thing is the Julia language which is currently being developed.
I have written about this in much more detail on my blog (see attached link).
https://moreisdifferent.wordpress.com/2015/07/16/why-physicsts-still-use-fortran/
Dear all,
I am glad to see a great open science initiative from the LIGO research team that published a web page about the gravitational wave event with the full dataset and a scientific Python program so anyone can re-analyze it.
(see attached link)
https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html
I have been using Fortran actively for only about half a year, in the course of my PhD project – a requirement, because I'm writing an extension to an existing code. However, while the original reason to pick up Fortran was the age of the code, I found that it really makes life easier for implementing mathematics.
The equations I have to implement – preferably as close as possible to their written form to keep them readable – are most easily written in an index notation, which would be extremely inefficient when using e.g. Python or Matlab; rewriting them to use the efficient array operations provided by these languages would obscure their interpretation, especially when 3rd- or higher-rank tensors are involved. (Note: numpy.einsum may partly solve these issues, and indexing without temporary variables in Fortran may cost performance due to non-contiguous memory access.)
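As a small illustration of the numpy.einsum note, a sketch contracting a rank-3 tensor in index notation, close to how the equation would appear on paper:

```python
# C_i = sum over j,k of A_ijk * B_kj, written directly in index notation.
import numpy as np

A = np.random.rand(4, 5, 6)  # rank-3 tensor A_ijk
B = np.random.rand(6, 5)     # matrix B_kj

C = np.einsum("ijk,kj->i", A, B)

# Equivalent explicit loops (readable, but very slow in pure Python):
C_loops = np.zeros(4)
for i in range(4):
    for j in range(5):
        for k in range(6):
            C_loops[i] += A[i, j, k] * B[k, j]

print(np.allclose(C, C_loops))  # True
```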
C and C++ would allow efficient index-wise operations, but neither has built-in support for dynamically sized multidimensional arrays; in C++, at least, libraries can provide more convenient syntax here, but this still increases the learning curve compared to Fortran. (Disclaimer: so far I haven't used C/C++ for anything beyond "trying the language" and some exercises in computer science lectures.)
Granted though, Fortran has its own shortcomings – notably the limited support for generic programming and, possibly related, the lack of anything like C++'s STL (e.g. a generic, type-safe linked list).
If you use standard numerical methods, Python may be fine. Indeed, modules like SciPy are actually compiled C or FORTRAN code. But if you are going to develop something original, procedures that do not exist in the standard Python libraries, then FORTRAN is significantly better. It will work many times faster than Python.
I realize this is an old topic, but I just wanted to throw in my 3 cents on the Fortran vs C argument. (I won't be commenting on the Fortran vs Python thing because it's completely irrelevant: these two languages serve completely different purposes.)
I started my PhD project a year ago. Before that I had some considerable experience with programming, particularly in C, Python, IDL, PHP and several other languages. I decided to learn Fortran 2008 as my major language (I'm mainly solving PDEs) and I think it's a wonderful choice for this purpose.
I think the proper version of your question "Why do they keep pushing young students to be stuck with Fortran, when those students could learn a new and fascinating language and improve their productivity by orders of magnitude?" should be rewritten as "Why do they keep pushing students to be stuck with C, which is a horrible choice for numerical computations in terms of readability and code maintenance, when they could learn Fortran 2008 -- a modern language, as fast as C, with built-in array handling and a multi-processing mechanism?"
I think Fortran is way better for array operations and number crunching, while C is preferable for binary trees and other logical algorithms. So I guess it's not "being stuck with Fortran", it's "choosing the best solution available, which Fortran is".
Dominik, the original question was not about Fortran vs. C, and the original post didn't just ask about Python vs. Fortran either; it asked about using Python together with NumPy and SciPy instead of Fortran. There is research showing that scripting languages like Python need fewer lines of code and consequently contain fewer errors (the number of errors per line is almost constant across programming languages). So Python with NumPy and SciPy helps you write your code faster (as in, it requires less time to write the code), is more robust, and is almost as fast as Fortran. BTW, using modern Fortran, e.g. Fortran 2008, will decrease performance significantly if you actually use the modern features (I have seen benchmarks showing a performance drop of up to a factor of 100 compared to 'old' Fortran). In these cases Python would even be faster (because internally NumPy and SciPy are highly optimized Fortran and C).
Furthermore, you should distinguish between C and C++. I agree with you that plain C is not a good choice for numerical applications. However, C++ has libraries available which provide the same syntax for vectors and matrices as Fortran and allow for more sophisticated optimizations than what Fortran compilers can actually do. The secret for any programming language is to rely on libraries as much as possible. They are well tested (so they'll have few bugs) and mostly optimized. One huge disadvantage of Fortran is that it does not have many libraries, especially not for common tasks.
I agree with the original post that 'being stuck with Fortran' is the right wording. Built-in arrays are not a good argument for Fortran because other languages provide capabilities which are comparable or even superior (even if through libraries). Additionally, it is impossible to combine fast Fortran with modern Fortran; you have to make a choice (though you can mix them in different parts of the code). Other languages provide modern language features without any loss of performance (or with a moderate loss compared to 'old' Fortran). I prefer readable/maintainable code (even when the main requirement is high performance), which Fortran, in my opinion, is not able to provide (in my current job I am working on simulation software written in Fortran).
@Hossein: Sorry, I don't have any details. I saw this live in a Fortran course held by one of the Fortran standardization committee members. It had to do with how you pass arrays down to subroutines: initially plain fixed-size arrays, then pointers (not too bad), then types (structs of arrays), down to full classes (object-oriented programming). This was tested with gfortran.
Hello Simon,
I perfectly understand that Python makes your code shorter. I am not denying that for operations that can be represented by matrices, NumPy is an excellent tool, since its core is written in Fortran (as someone mentioned), yet you don't have to deal with all the issues that low-level languages have. I use NumPy and PyLab as the main tools in my work, mostly for post-processing models computed in Fortran (my case is not easy to parallelize, therefore I stick to F2008 to generate the data, and then handle it in Python). This is why I'm saying these languages serve completely different purposes and there's no point in comparing them.
However, I cannot believe the slowdown you're speaking about. I rewrote my code in C one day, which gave about a 1.5x speedup, but this came at the cost of code readability. Of course you can't get any faster than C, even using Fortran. The "more abstraction in Fortran helps the compiler do its optimizations" argument has not been valid for years, since compilers got so intelligent that they can optimize C code even better. Why I decided to keep my project in Fortran rather than moving to C is the ease of use, maintenance, modification, debugging, etc., as well as excellent code readability (comparable to Python+NumPy arrays). These "soft" factors are often missed, yet they are just as important as a 1.5x speed difference.
And, last, I am certainly not able to believe that built-in Fortran arrays can be slower than object-based arrays in C++ that provide the same functionality and readability unless I see it myself. Only then might I reconsider my language choices.
Have a good day :)
Dominik
Hello Dominik,
C itself probably will not be faster than Fortran. However, for C++ there exist matrix/vector libraries that make use of so-called expression templates. Blitz++ was the first library to use them, and Eigen is one of the most well-known modern libraries in this area today. I am not entirely sure in which case which library (even BLAS or LAPACK) will be the fastest.
The advantage of expression templates in this context is that these libraries can optimize the order of individual multiplication instructions. Especially when matrix multiplications are chained, expression templates help reduce the number of nested loops. In theory, Fortran compilers could do the same, I guess; but so far I haven't heard of it.
Speed always depends on your particular application. From my experience, the Intel Fortran compiler produces the fastest code. The Intel C++ compiler can produce fast code, but only if you don't use object-oriented programming (tested with version 14). With g++ you can use any feature of C++ without a major performance hit, but its baseline is slightly lower than that of Intel's Fortran compiler. And gfortran is on par with g++ (I haven't tested modern Fortran myself).
You might have heard of libraries like MKL or ATLAS which use traditional approaches (Fortran-like). Here are some benchmarks for Eigen:
http://eigen.tuxfamily.org/index.php?title=Benchmark
and Blaze:
https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks
Unfortunately I couldn't find good benchmarks comparing with Fortran. There is, however, an old article about Blitz++ vs. Fortran 77:
http://www.drdobbs.com/cpp/scientific-computing-c-versus-fortran/184410315?pgno=1
You can find the figures when clicking through the pages.
Have a good day as well
Simon
Hello Simon,
Python with NumPy / SciPy is NOT "almost as fast as Fortran". It is much slower, and the API really isn't that nice anyway.
Python+NumPy is popular because for a lot of the less demanding tasks (e.g. a lot of data analysis) it is fast enough. But these tasks were never done in Fortran anyway. Those tasks are traditionally the domain of Matlab, or in astronomy a language called IDL. But anything that actually requires performance is simply outside the scope of NumPy.
Also don't forget that Fortran is already a high-level language. Porting code from Fortran to Python+NumPy doesn't make it that much shorter.
If you want a scripting language to replace Fortran, have a look at Julia. It is faster than NumPy and it actually has a nice syntax that was designed for scientific computing from the start. It's still not as fast as Fortran, but it closes the gap a lot. I recently considered Julia for a hydrodynamic simulation, and the Julia version of the code was faster than the default compile with GFortran, but slower than the default compiles from the Intel and PGI Fortran compilers. Then I recompiled the Fortran with optimization flags, and it was 2x faster than Julia. The fact that Julia can get in the ballpark of Fortran is impressive. NumPy can't really do that, unless your program works mostly by calling external libraries like LAPACK, in which case it doesn't matter what language you use, because it's the library doing the work and not the language.
Finally, "being stuck" is definitely wrong; extremely wrong. A lot of physicists already know Python+NumPy. I certainly do. I am comfortable with Python/NumPy, Julia, C, C++, Fortran, and more. Instead of starting from the position that there is a fundamental need to replace Fortran, please consider that it might often be the right tool for the job. Python does not replace Fortran for the same reason that Python does not replace Java and Python does not replace C++. Different languages offer different features and trade-offs, and sometimes people will legitimately look at alternatives and choose a language that is not Python.
Dear all,
I encourage you to read this nice article about Python which underlines my point of view:
Yes, Python is Slow, and I Don’t Care
A rant on sacrificing performance for productivity.
Cheers
Giovanni
https://hackernoon.com/yes-python-is-slow-and-i-dont-care-13763980b5a1
Dear all,
again a supporting point of view:
Why Astronomers Love Python And Why You Should Too, YouTube video
Giovanni
http://youtu.be/W9dwGZ6yY0k
Because most of the existing code is available in the Fortran language. Users have worked with Fortran from the beginning, and converting from Fortran to other languages will be difficult, though Python-like languages are also used nowadays.
As a Fortran programmer for most of my life, I had not considered that expressiveness was one of the best aspects of Fortran. But on reflection, I think there is a lot less line noise in a Fortran program compared to Python and C. Most of the differences here are superficial, such as extra square brackets for multidimensional arrays, the lack of curly braces/more use of English (although this can also become a burden), and the way arguments are listed below the subroutine line, enabling INTENT, PURE, and other details to be documented clearly (the one-line C/C++ style is just not very readable).
Having said all that, nowadays modern C++ with libraries would be acceptable for me, even for speed, but not convincing enough to leave modern Fortran, unless I had to interact with non-numerical code or GUIs. But not everything comes down to compute speed; eliminating and detecting bugs, i.e. maintenance, is a huge consideration. And as I said, on reflection, modern Fortran is just really nice and pretty for coding maths. There is a huge amount of reusable, high-quality code out there.
Standardization is a big issue as regards a language. I write codes that I design and expect to work for half a century, if not more, so I don't want to deal with stability and standardization issues in this situation. The lifetime of the codes you write may not be like this.
The advantages of templates are highly overstated, in my opinion. Templates are easy to do in Fortran with include statements and macros. This is a much easier way to do inheritance for beginners; the "is-a" and "has-a" problems just disappear. I've never understood the "part of the language" argument as regards templates. In addition, with Fortran you can use the same code in different objects using the PASS() passed-object descriptor. The only time templates become very useful is when you are coding huge projects with very high-level abstractions, e.g. like Blitz++.
It is simply not true that "compilers got so intelligent they can optimize the C code even better". I don't know why this statement appears so often, and is stated with such conviction. It is almost never true for me in practice. And in theory, it depends very much on the compiler, and to get code as fast as Fortran in C you may have to help the compiler by avoiding aliasing (e.g. with the restrict keyword). For specific application libraries, C is undoubtedly better for writing the low-level compute kernels, since in most cases the OS is written in C, as is done in the modern FLAME libraries. But these professionally written libraries have APIs for Fortran and C, and no computational scientists write this stuff -- it is made by computer scientists and mathematicians. (Thanks guys, keep up the good work.)
It is true that much better designed compilers are now available, such as LLVM. I very much hope that those guys can make a Fortran compiler as good as gfortran; I think they are nearly there, but not yet. This is certainly not "compilers got so intelligent they can optimize the C code even better". In fact, it clearly points out the speed advantages of *not* doing that.
Python is great for doing little ditties, for quick-turnaround algorithm exploration, and for transforming data into useful input for further analysis. In many cases, especially in a time-limited situation, it will do the job. But it is not the optimum language for bespoke, quality numerical software for particular applications; in that case it's better to have Python wrappers. These days, anyway, you can link Python to any (old-style) Fortran subroutine easily. A very sad point is that it is hard to link Python to modern Fortran arrays and derived types. If you want to use Python for big, quality jobs, you will have to be more disciplined in the programming, design, and commenting, and to control errors you really need a typed language, in my opinion.
I can't recommend Julia either, for numerical work. Julia is really great, it is modern, but I reject it for long-term work because of standardization issues.
There are only two languages I would currently leave Fortran for, and it is too early to do so. The first is Chapel. It is the ultimate compute language; I am only waiting for Cray to throw more money at it. I may very well switch over in the near future, it is so well designed. The second is Haskell, only because, in principle, it can do everything Julia can do. In principle. In practice it is nowhere near what is required for numerical work, but it is fantastic for symbolic manipulations, and for the high-end mathematics side of numerics.
Fortran is just a lot faster than NumPy, and also more adaptable; NumPy, or Matlab for that matter, is sometimes difficult to adapt to a problem (like solving one 1D differential equation). It is well maintained and in constant evolution (the latest standard is Fortran 2018). There are some criticisms, like not supporting a quick interactive session, or lacking an easy interface to GPUs. But even that will probably be solved in the near future. Have a look at LFortran. https://docs.lfortran.org/
The first question is "What's wrong with Fortran?" Is Fortran "broken" and beyond fixing? If not, why move to Python? Why move to a much slower OO language to run procedural code? Because it is fashionable among people doing very different work? People in science spend their time doing research; computational aspects are very important, but not the end goal, just as a soccer player must have speed and endurance, but that is not his main job.
If you follow current #AI developments, you notice a dramatic shift towards Python and the named packages. Any good programmer can write code in any language; however, any good programmer will also make a good decision about what is acceptable for process, teamwork, the future, and quality, verification and validation, documentation, etc. There are so many aspects, and people should take a much broader perspective before deciding what the right language is, not just speed, or because it is cool, or because it glued some really deep complexity together. Thanks for having opened this discussion.
Related: https://www.fortran90.org/src/faq.html#what-is-the-advantage-of-using-fortran-as-opposed-to-for-example-c-c
Further reference: this is an excellent(!) introduction to Fortran:
https://ethz.ch/content/dam/ethz/special-interest/mavt/energy-technology/renewable-energy-carriers-dam/documents/teaching/Lectures/Additional-material/03530_Computing_with_Fortran_by_Haselbacher.pdf
One advertised advantage of Python is the simplicity of its syntax. But being simple can be the wrong attribute if a student is learning to program for the first time. The "implicit none" approach of Fortran, with its strict syntax, will serve a beginning science student better, and they can easily learn to code in any language later with the strong programming foundation gained.
@Ludwig Schreier: We are not talking about which language is better in general, but which one is better for scientific work, and the hands-down answer is Fortran because of speed, which is of primary importance. Imagine a meteorological application where solving the equations finishes only after the prediction interval has passed. And this is not just for the "final" results, but also for debugging and testing: when running tests to check code aspects, I want them to run fast, not wait a day per test. As for AI, do you seriously want to compare a clustering or neural-network calculation in Fortran vs. in Python? I can do this in R or Weka or RapidMiner as well, which may be easier to use, but not if speed is your concern.
Fortran 2018 is backwards compatible with Fortran 2008, Fortran 2003, Fortran 95, and Fortran 90, and (mostly) with Fortran 77 and Fortran IV.
Python 3.0 is not backwards compatible with Python 2.7.
What will happen with Python 4.0?
In terms of simplicity and availability, Python is very good. In terms of performance, Python is adequate for small projects but lags far behind in high-performance computing.
Modern Fortran (at least as recent as Fortran 90) is an excellent language, take a look: http://www.hpc.lsu.edu/training/weekly-materials/2014-Spring/Fortran-Spring-2014.pdf
Hi Ben,
I'm going with laziness. FORTRAN is well established, strongly typed, and very easy to use. Why should I devote any time to learning a new language when the one I truly understand does the same job (for my purposes) without requiring me to learn a new way to do the same old thing?
Hi Brad -- Good to hear from you -- it's been a long time! I hope you're enjoying retirement -- I've still got a ways to go.
I've got to respectfully disagree on laziness as a major cause, but my evidence is anecdotal: all the Fortran programmers I know have multiple languages under their belt -- most are using Fortran because its array processing is cleaner than C pointers and FOR loops. My problem with Python (besides slowness) is the need to rewrite everything at the whim of the Python gods. I'd like to see someone estimate how many man-hours are being wasted because of Python 2.x and 3.0 incompatibilities. I know my team has lost man-weeks of effort over this self-inflicted absurdity. Based on our experience over the past couple of years, I will never commit resources to any major long-term effort in Python until the Python gods provide assurance that ALL future Python versions will be 100% backwards compatible with packages written in Python 3.0. Until then, Python is a nice demonstration and prototyping toy, but our long-term codes will be in Fortran, C, or R. Thanks for listening to my rant.
Hey guys, nice comments. I just want to add to Ben's point that I have never had backwards-compatibility problems using R. Most of my code is done with base, data.table, and sf.
I have to go with reliability, compatibility, and simplicity. Many age-old scientific codes written in Fortran remain alive because they do the job. Fortran is also somewhat simpler to digest than C, in particular if you are not interested in having a deep relationship with your computer. I also agree with Ben: Python is risky and there are too many open ends in its development. Like Sergio, I would vouch for R as a high-level language (lovely for post-processing my results), but for performance I can clearly see why so many of us stick to C/Fortran. Another good reason why Fortran is alive and well is that some very large communities (climate science, for example) rely on very well established Fortran codes. They are not in the business of throwing away 20+ years of testing their implementations.
From my experience, the only reason to stick with Fortran is already-written code. When a scientist has a code that compiles and runs, maybe written in the '80s, so that it would require a lot of effort to move to another language, the options are very few.
If the code works well, fulfills all expectations, and is fast, efficient, and continuously updated, why change to another language? The only reason for swapping is functionality: if there is something you want in your code that Fortran cannot provide, that's a good reason for seeking another option. However, in science, I do not know what Fortran cannot carry out.
In physics the programmer needs more control over the code than AI usually provides.
A large library of linkable Fortran routines is available, and some major simulators were written in Fortran.
I've been writing Fortran for 50 years and now usually prefer other languages like Assembly, C++, and Pascal for various features in data structures and resource management.
Artificial Intelligence has come and gone several times over the years, and the new packages show some promise. Claims have been made.
Arguments are not compelling for many scientists to switch software.
FORTRAN is not a dead/obsolete language. It is a language that solves very well the problems for which it was designed: fast mathematical calculations. Since FORTRAN 77 there has been a lot of evolution, and since Fortran 90 it has had a renewed possibility of success. Note that with FORTRAN you can call C methods easily (https://rchg.github.io//computing-blog/C-in-Fortran/) and it also implements OpenMP and OpenMPI (https://rchg.github.io//computing-blog/OpenMPI/) for parallel computation on HPC systems and clusters.
In fact, NumPy and SciPy in Python have traditionally used wrappers around FORTRAN libraries like BLAS and LAPACK, among others. This means that the Python infrastructure for mathematical calculations has grown on the shoulders of languages like FORTRAN and C.
In general, I would say that both languages have different purposes and are used in different contexts. NumPy and SciPy are not so useful for scientific codes at a larger scale of computation: once you need a thousand processors working in parallel, you need a language with a strict type system to be confident that everything is working as you expect.
In practical situations, we use both FORTRAN and Python.
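As a small demonstration of the "wrappers around BLAS/LAPACK" point above, SciPy exposes the underlying Fortran routines directly; a minimal sketch using dgesv to solve A x = b:

```python
# scipy.linalg.lapack.dgesv calls the Fortran LAPACK routine DGESV.
import numpy as np
from scipy.linalg.lapack import dgesv

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

lu, piv, x, info = dgesv(A, b)
print(x)     # solution of A x = b, same as np.linalg.solve(A, b)
print(info)  # 0 indicates success
```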
As someone who absolutely loves Python and probably uses it more than any other language, I'll be the first to tell you not everything should be written in it.
PyPy and other JIT compilers that are coming out are helping a bit, but even then they are still often slower than well-optimized C/Fortran under heavy workloads. The other problem with them at the moment is that they do not work for every script. It can be a bit hit or miss, as they sometimes run slower than regular Python on code they aren't compatible with. That may change in the future, but at the moment you can't really bank on them.
In science fields, Python is excellent for
1. Codes with reasonably low intensity computations
2. Codes with a lot of high level decision making
3. Processes which need to coordinate I/O between many different codes written in multiple languages.
4. Codes which need to do a lot of string manipulation or file scraping.
5. Codes that need complicated functionality such as SSH or web interfaces.
Not an exhaustive list, but it gives you an idea.
But the areas it is not quite as good at are primarily codes that have extremely long run times and need to be distributed over more than one physical compute node. Mass parallelization in Python is actually one of the more frustrating things to do.
As such, C/C++/Fortran still have use cases for the foreseeable future. Julia might be useful at some point, but right now adoption rates with it are still pretty low, so writing code that no one can understand or maintain is a huge risk.
The advantage of Fortran is that it is a fast compiled language that is actually a lot easier to teach to fresh graduate students who have little to no background in programming, as many chemistry/physics/etc. students do. It also has a fair amount of safety built in compared to C/C++, especially when it comes to dynamic memory. You rarely need to mess with pointers in Fortran, whereas in C you have to interact with them all the time. C++ is a lot better than C in that regard, with the introduction of smart pointers, vectors, etc.
Modern Fortran has actually eliminated a lot of the weaknesses that Fortran previously had: it introduced object-oriented tools, massively better C compatibility, dynamic memory, etc. With the ISO_C_BINDING module it has become incredibly easy to directly link Fortran to Python/R/C++/etc.
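A hypothetical sketch of that linkage from the Python side via ctypes; the Fortran routine, library name, and path below are assumptions for illustration only:

```python
# Assume a hypothetical Fortran function compiled into a shared library:
#
#   function add(a, b) bind(C, name="add") result(c)
#     use iso_c_binding, only: c_double
#     real(c_double), value :: a, b
#     real(c_double) :: c
#     c = a + b
#   end function add
#
# built with:  gfortran -shared -fPIC add.f90 -o libadd.so
import ctypes

lib = ctypes.CDLL("./libadd.so")  # path is an assumption
lib.add.argtypes = [ctypes.c_double, ctypes.c_double]
lib.add.restype = ctypes.c_double

print(lib.add(2.0, 3.0))  # 5.0
```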
I often find that people knocking Fortran are still thinking the language is stuck on the 77 standard. I've lost count of the number of people who say "Well, Fortran doesn't have object-oriented programming" when that hasn't been true for decades now. And then there is the other myth, "the compilers don't have it implemented yet": both GNU and Intel have most of the OOP features implemented and working, and that has been the case for nearly a decade now. The 03 and later standards have really improved the language a lot.
I use both C++ and Fortran for most of my simulation work. C++ is a great language, but it's also considered one of the harder ones to learn, even among professional programmers, for a reason. There are a ton of ways to shoot yourself in the foot, and it is a lot more cryptic for newcomers. It takes quite a while to become proficient with it. Ironically, I've seen more students have success by starting with Fortran and then moving over to C++.
Sometimes as more experienced programmers, we forget what it's like to have to learn programming for the first time.
That said, older professors need to stop teaching F77. Modern Fortran is a wonderful language; F77 needs to be strapped to a rocket and aimed at the sun.
For the things you mention Python being good for, there are other excellent alternatives, perhaps better than Python, such as Perl. In my experience OOP is rarely useful for scientific apps. If you really want a wrapper, you could just as well have a Fortran code
to do the calculation and use system calls, such as a Perl/Tk interface to specify the parameters, then compile and run the Fortran code and even graph the results.
"The things you mention python being good for, there are other excellent alternatives, perhaps better than python such as perl."
No one said it was exclusive to Python. Rather, those are situations Python is extremely strong in.
The thing about Python at the moment is that it is a highly supported language both inside and outside of science. If you need some kind of functionality, there's probably a package already made for it. The user base is becoming absolutely massive, and as a result it's one of the best-supported languages at the moment.
It's also one of the easiest programming languages to learn, which is great for teaching students with minimal programming backgrounds.
If you wish to use Perl, go for it. But personally, Python is my go-to.
"If you really want a wrapper, you could just as well have a fortran code"
I have a code that uses SSH to submit jobs on a remote supercomputer. I don't think that could be implemented in Fortran anytime soon. I call up Fortran packages to do the computations and then use Python to submit the remote job and manage it via SSH.
The whole point of a wrapper is that you can wrap up a routine and then use it in a larger workflow, especially in situations where the greater workflow is 1000x easier to write in a high-level language like Python.
The other side of it is that writing interfaces ensures people who aren't Fortran gurus can use your code. If I write a Python extension, it widens the potential user base. In the process I also wrote a C extension, which means people who use R, Perl, or whatever language you desire can also import it.
"In m experience OOP is rarely useful for scientific apps"
I would have to heavily disagree. I would actually say a massive problem with scientific codes is that, because they are often written in a linear style, extending their functionality is a nightmare. I've lost count of the number of codes I've basically tossed out because they were so rigid you couldn't do anything besides what the author originally intended. If anything, OOP is criminally underutilized by scientists.
Part of the reason LAMMPS has pretty much become an industry-standard code among molecular dynamics users is that it's designed in a way that makes extending its features reasonable. And it takes advantage of C++'s advanced features to do so.
I have a Monte Carlo code that I've been maintaining since graduate school. I wrote it in an object-oriented way, and because of that I can quickly add a module like a new force field, data-gathering approach, sampling method, etc. with little to no effort. My three most recent publications all use a method I coded up and added with about one hour of effort.
It's fine to write short codes in a direct manner, but larger codes, or codes you want to reuse, should not be written that way. There's a reason these paradigms exist: to make managing large code bases reasonable. OOP allows you to write highly modular code that can be easily imported and used in other projects. It makes it so your code doesn't just sit in some GitHub repo somewhere, never to be used again. I am personally saddened by the amount of effort that goes into writing many of these codes just to see them rot away because of poor code management.
Well, religious wars are rarely productive, but just to point out:
1)If you need some kind of functionality there's probably a package already made for it.
-Same for perl
2)It's also one of the easiest programming languages to learn.
-Same for perl
3)I call up Fortran packages to do computations and then use Python to submit the remote job and manage it via SSH.
-Similar to what I do, e.g. for interactively preparing the input file
4) OOP allows you to write highly modular code
-It does, but so do good old procedural subroutines, like Unix commands. The reason for lots of obsolete codes, in my experience, is that nobody uses them; if someone wants to use a code, then a maintainer/further developer is found
"1)If you need some kind of functionality there's probably a package already made for it.
-Same for perl"
I'm sorry, but I can't say I believe this to be true based on the available data.
Python is the #2 most commonly used project language across all fields, second only to JavaScript. This is reported in several metrics on sites such as GitHub and other repository websites. In contrast, Perl isn't even in the top 10.
https://www.benfrederickson.com/images/github/language-popularity/major.svg
https://www.benfrederickson.com/images/github/language-popularity/oldthing.svg
https://www.benfrederickson.com/images/github/language-popularity/oldthing_u.svg
Especially when it comes to data science and machine learning, most of the codes coming out are in Python, and the ones that don't have their engine in Python usually have Python bindings out of the box. Given that fact alone, it's not possible for Perl to have the exact same set of packages available.
To give you an example, try to find a Perl equivalent of Gym, a package commonly used for reinforcement learning projects. I would genuinely be interested to see if that exists in Perl. Actually, I would even ask whether there are Perl bindings for TensorFlow, or a nice front end like Keras. Obviously, asking about PyTorch is out of the question given that it's Python-based.
Perl may be able to handle much of the day-to-day stuff, but when we start talking about specialized programs, that's where I can't really buy that Perl does exactly what Python does.
"The reason for lots of obsolete codes in my experience is that nobody uses it. If someone wants to use it, then a maintainer/further developper was found"
A huge reason no one wants to use them is that it's faster to write your own stuff or look for an alternative when a code base is a complete mess, which scientific codes sadly are much of the time: very frustrating for anyone besides the original author to read. Readability, extendability, and reusability are usually an afterthought in many codes.
If you don't make your code user-friendly, people tend not to adopt it and instead opt for codes that are less of a headache. The only exception to this rule is if it's a piece of code that no one else can write, at which point a lot of cursing and swearing occurs.
A major reason LAMMPS is a huge success in molecular dynamics is that it's actually an extendable code, it has a good user interface, and the documentation on the website is very robust. It makes for a fairly pleasant user experience. It uses OOP design under the hood to facilitate its extendable features.
I can point you to a number of specialized data science/ML Perl packages, but to be honest, if I want, say, a clustering or neural net code, I use FORTRAN with a Perl wrapper to read the data and prepare the input (numerize, scale, etc.).
An example of old codes (even F77 or F66!) that survived because of their usefulness is R.D. Cowan's codes:
https://www.tcd.ie/Physics/people/Cormac.McGuinness/Cowan/
That's just my experience: if something proves to be quite useful, it will find a maintainer...
I'll leave the Perl vs Python debate alone for now, as we could go around in circles on it.
"An example of old codes (even F77 or F66!) that survived because of their usefulness are R.D. Cowan's codes"
Sure, and banks code in COBOL because they have legacy codes which need to be maintained. However, if we were writing bank codes from scratch, I doubt many people would choose to write them the same way in 2020. People will grit their teeth and maintain a code if it's the only thing that can do the job or if the cost of rewriting it would be so massive that it isn't worth it. The reason we maintain old F77 codes isn't that it's a superior style of coding; it's that it was often what was available at the time. It may have been the best choice then, but that doesn't mean it's a great choice in 2020.
It isn't an ironclad rule that frustrating code bases won't be maintained. Code inertia is a thing that does happen. But for that to happen, a code needs to be adopted in the first place, and adoption rates are tied to user-friendliness.
The higher the barrier to using something, the more likely someone won't use it, which for young, fledgling codes can make or break them. "User friction" is a term used for exactly this phenomenon: the higher the resistance from a code base, the higher the chance that someone will just throw their hands up and go use something else. That's why they design sites like Amazon to give as little resistance as possible; they know that the easier it is for people to spend money, the more likely they are to impulse buy. The concept also applies to software. People tend to avoid software that is frustrating to use, especially when alternatives exist. It's basically reaction kinetics on a human level: the higher the reaction barrier, the fewer people cross it.
We should not be using old codes as a justification for writing new codes in a frustrating and difficult-to-maintain manner. It's fine to write in F77 if you are adding onto an old code, but if you are writing new Fortran in 2020, please stop writing like it's 1977. We've had nearly half a century of improvements since then.
Troy Loeffler Shells are the most natural tool for submitting jobs, ssh'ing, etc., aren't they?
For most common operations, shell commands and of course shell scripting work great. Batch submission, for example: I use shell scripting for that all the time, since it's pretty trivial.
The case I am talking about is more when the submission process is decided on the fly by a program.
To give an example I can probably talk about, since my exact research is under wraps:
Let's say you are running genetic algorithms with a computationally expensive model like coupled cluster. (I don't actually do this, but my real problem isn't too far off.) Due to limitations on our large supercomputer (no job smaller than 256 nodes x 32 cores, and anything running on the head node for longer than a certain time is blocked), running the main script, the one that decides which structures to compute, on that machine isn't possible. So a workaround is to run the brains of the operation (the GA code itself) on a local machine and have the script log in via SSH. When the local machine creates a structure and is ready to evaluate its energy, it feeds the structure information via SSH to the remote computer, sets up the job on the remote supercomputer, and then goes to sleep while it waits for the calculation to finish. It periodically polls the remote computer for the status of the job; when it detects the job has finished, it wakes back up, collects the output energy, and proceeds to make the next decision about what to compute.
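Since this thread is about Fortran, here is a hedged sketch of that submit-and-poll pattern in standard Fortran 2008 via execute_command_line (our actual driver is Python, as discussed below; the host name "cluster", the commands, and the file names are all hypothetical placeholders):

program submit_and_poll
  implicit none
  integer :: stat
  logical :: done

  ! 1) Ship the structure to the remote machine and submit the job there.
  call execute_command_line('scp structure.xyz cluster:run/ && ' // &
       'ssh cluster "cd run && sbatch job.sh"', exitstat=stat)
  if (stat /= 0) stop 'submission failed'

  ! 2) Sleep, then poll until the job has written its result file.
  done = .false.
  do while (.not. done)
    call execute_command_line('sleep 600')              ! wait 10 minutes
    call execute_command_line('ssh cluster "test -f run/energy.out"', &
                              exitstat=stat)
    done = (stat == 0)                                  ! remote test succeeded
  end do

  ! 3) Collect the output energy and move on to the next GA decision.
  call execute_command_line('scp cluster:run/energy.out .', exitstat=stat)
end program submit_and_poll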
We are using a scheme similar to this for problems related to this research:
Article Active Learning the Potential Energy Landscape for Water Clu...
This was the proof of concept paper, but we just submitted a paper where we actually trained a model against DFT.
It's technically possible to do all of this in shell scripting (I have seen GA codes written in it), but keeping it in Python makes the code a lot easier to reuse for other projects. That said, Python's tools for doing this are still pretty good. Shell scripting is easier for more routine stuff, but for complicated workflows Python does a very good job.
Ultimately no tool is perfect for every job, so pick the one that's best suited to it. While Python is a great language, it does have its weak points, as I mentioned. Computational efficiency is a major thorn in Python's side, which is why C++ and Fortran are still a large part of my daily coding.
Although I do not think we are that far apart, the 'user-friendliness' aspect may need some refining. User-friendliness typically refers, well, to the end user, so a program that is user-friendly but a complete black box would fit. In my research, for instance, I work with Cowan's code very happily, primarily not because of its 'user'-friendliness but because of source code availability, which allows people (in this case with the author's kind assistance) to make the necessary code changes to address the research questions at hand. When I want user-friendliness, I can write a GUI to specify parameters and run the code. So for scientific applications, the truly relevant 'user-friendliness' is source code readability.
I don't disagree that people are more than willing to put up with something being on the annoying side if it's the best tool for the job, especially if writing your own version would be either impossible or would take so long that it's not worth it. If I were to write it as an equation:
(Probability of Use) = (Tool Uniqueness/Rareness) + (Usefulness) - (Frustration)
If you make a tool that's useful but frustrating to use, and there are other codes that do the same thing, then people are probably going to use the other codes. If there are no other codes around that do the same thing and it's useful, people will probably put up with the frustration of using it. That said, it's always a good idea to minimize frustration.
I might add that this is one of the big reasons higher-level languages like Python, Perl, R, C#, etc. gained in popularity. They were significantly less frustrating to use than languages like C++, so whenever they could do the same job as C++, people preferred to use them. People still use C++ when the other languages are not up to the task, but it almost always loses out whenever another option is available. That's a prime example of the frustration factor influencing people's choices. Usefulness only carries you until less frustrating alternatives exist.
While black-boxing something and GUIs are more front-end versions of user-friendliness, there are concepts for the back end that also help quite a bit.
Source code readability is a big part of back-end friendliness, as you mention. If no one can understand what a code does, it's hard to modify it. Something that is equally important for back-end user friendliness, and often overlooked in a lot of scientific codes, is ease of integration: both adding new features into the code and importing the code into other projects. The reason is that it's often hard to retrofit extendability after the code has already been written. Scientists usually have a habit of writing a code for a specific problem but don't think about applications beyond that problem. Designing a code so that components are easy to swap around needs to be planned as part of the core engine.
It's the same concept as designing, say, a car. We wouldn't design a car such that all the parts of the engine are welded to every other part. We would instead design it so that we could replace the alternator or the muffler without needing to replace the entire engine if one piece goes bad. Likewise, we design computer hardware so that if the video card goes bad we can simply swap it out for a new one. (Well, OK, there are computers that aren't like that, but that's another conversation.) In the same way, with a car or a computer, you need to design things so that one part can be exchanged with another without compromising the core mission. In a car you have to build joints that can be disconnected but at the same time can hold up to the forces that naturally occur in an engine, which takes a lot of careful thought and engineering.
Likewise, when it comes to programming, if you want a code that is extendable, you need to design that into the code. For example, in a neural network code, if you want people to design their own custom loss functions, then you need to design the way the code calls the loss function so that pieces can be exchanged seamlessly, which for bigger code bases is not always an easy thing to do.
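As a hedged illustration, not taken from any real package (the names loss_fn, mse_loss, and mae_loss are invented for this sketch), modern Fortran can express this plug-in pattern with an abstract interface and a procedure pointer:

module loss_plugin
  implicit none

  abstract interface
    ! Any user-supplied loss function must match this interface.
    pure function loss_fn(pred, obs) result(l)
      real, intent(in) :: pred(:), obs(:)
      real :: l
    end function loss_fn
  end interface

contains

  pure function mse_loss(pred, obs) result(l)   ! mean squared error
    real, intent(in) :: pred(:), obs(:)
    real :: l
    l = sum((pred - obs)**2) / size(pred)
  end function mse_loss

  pure function mae_loss(pred, obs) result(l)   ! mean absolute error
    real, intent(in) :: pred(:), obs(:)
    real :: l
    l = sum(abs(pred - obs)) / size(pred)
  end function mae_loss

end module loss_plugin

program demo
  use loss_plugin
  implicit none
  procedure(loss_fn), pointer :: loss => null()
  real :: p(3), o(3)
  p = [1.0, 2.0, 3.0]
  o = [1.1, 1.9, 3.2]
  loss => mse_loss            ! swap in one loss function...
  print *, 'MSE:', loss(p, o)
  loss => mae_loss            ! ...or another, without touching the engine
  print *, 'MAE:', loss(p, o)
end program demo

The point of the design is that the training loop only ever sees the abstract interface, so users can point the code at their own function without modifying its core.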
If you write a code that's not only useful but also reasonably easy for users to add their own functionality to, or to import into their own projects, you usually end up with a code base like some of the more popular ones out there. I've sadly run across many codes in the molecular Monte Carlo space where rigidity inhibits usability; it's actually been a huge problem in that area. Molecular Dynamics has largely solved it with LAMMPS, Gromacs, NAMD, etc.
Troy Loeffler Back to your GA example. You have 1) calculations to construct your job and 2) tool(s) to submit it to a cluster (or clusters). You can do 1) with Fortran 66 :) producing some output, and 2) with your preferred shell eating this output and feeding the job to a cluster. In my limited experience with clusters, they always came with a quite small set of submission/retrieval *shell* commands/scripts, and it was natural to use them to build my own batch system. In the early 2000s I was given access to a university cluster and was asked to evaluate their own job manager. It was written in plain C... but the only reasons were the good C knowledge of the student who wrote it and his need to get a PhD :). I made a few clicks, found that the job manager did work, presented them a half-page report, and turned back to my shell scripts. In a week they told me they had a lot of complaints from users: why did my jobs fill all the running nodes when just 1-2 of theirs were waiting in the queues?! With their own jobs, they observed the opposite. The poor professors did it manually... they were unable to compete with my very simple scripts running 24/7.
"Back to your GA example. You have 1) calculations to construct your job and 2) tool(s) to submit it to (a) cluster(s). You can do 1) with Fortran66 :) producing some output and 2) with your preferrable shell eating this output and feeding job to a cluster."
Oh, certainly. We've done a few similar things with an old Fortran code, where we gave it the ability to make system calls so it could offload calculations to another engine. In that situation we had an older code that did an amazing job at optimization, so we gave it a way to call simulation software like COMSOL.
In the GA example, the actual "brains" of the project is pretty fast; the bottleneck is primarily the energy evaluations, since those typically take 30 minutes or more, whereas the GA component takes a few seconds. So we didn't really need the speed of Fortran; it's more about setting up the code in a way that's easier to manage. I like Fortran, as it's a great language for low-level stuff, but Python is definitely a much easier language to use.
"In my limited experience with clusters, they always came provided with a quite small set of submission/recuperation *shell* commands/scripts and it was natural to use it to produce my own batch system. In early 2000s I was given access to some university cluster and was asked to evaluate their own job manager. It was written in plain C... but the only reasons were a good C knowledge of student who did it and his need to reach PhD :). I made few clicks, found that job manager does work, presented them half a page report, and turned back to my shell scripts. In a week they told me they had a lot of complaints from users. Why my jobs fill all the running nodes when just 1-2 were waiting in queues?! With their own jobs, they observed the opposite. Poor professors did it manually... they were unable to compete with my very simple scripts running 24/7."
"People sleep, but scripts don't." is a term a former professor of mine said :)
I think shell scripting is a great tool to have in your arsenal, for many of the reasons you mention. It really helps cut out a lot of the redundancies that commonly come with running computations, from batching to automated resubmission to file management.
The first thing I tell the students I've helped is to optimize their workflow. The initial habit is always to simply type out each command one by one or to repeat a lot of work. I always tell them: when you realize you are doing something a lot, figure out a way to automate it. It's amazing how much time gets spent redoing commands over and over, but you often don't notice how much until you stop doing it. Taking a step back to write a good script or learn various shortcuts saves an amazing amount of time. Even something as simple as setting up an RSA key so you don't have to type your password every time you log into a cluster is surprisingly time-saving.
I think the best way to understand why some people are still "stuck" with Fortran is to perform numerical experiments: take an example that is computationally intensive, for instance billions of arithmetic operations, or a direct or iterative solver for a large-scale matrix system of equations, say a matrix of rank 1 million (it doesn't have to be a dense matrix).
Implement it in both Python and Fortran and note your observations.
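As a hedged sketch of the Fortran half of such an experiment (the kernel, the problem size, and the timing approach are arbitrary illustrations, not a rigorous benchmark), one might time a simple array update with the standard system_clock intrinsic and then write the same loop in pure Python for comparison:

program kernel_timing
  implicit none
  integer, parameter :: n = 10000000        ! 10^7 elements, chosen arbitrarily
  real(8), allocatable :: a(:), b(:)
  integer(8) :: t0, t1, rate
  integer :: i

  allocate (a(n), b(n))
  b = 1.0d0

  call system_clock(t0, rate)
  do i = 1, n
    a(i) = 2.5d0*b(i) + 1.0d0               ! simple "triad"-style update
  end do
  call system_clock(t1)

  print *, 'elapsed seconds:', real(t1 - t0, 8)/real(rate, 8)
  print *, 'check value    :', a(n)         ! keeps the loop from being optimized away
end program kernel_timing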
Chennakesava Kadapa , it is not as simple as evaluating performance issues. I just completed an update of around 40k lines of F77 code to F90. Do you think that, even with a gain of 50% in computing speed, someone is willing to pay me to translate the F90 to another language?
Roberto Casalegno I completely agree with you on the resources required to port the code into other languages.
My comment was intended for those who think that one can simply replace Fortran with Python because there are equivalent libraries in Python. My point is that, even if one can port the code to Python, Python is not a magic solution, especially for scientific computing. Python is good for some applications but not for all; it is notoriously slow for heavy numerical computing.
I think some definitions are needed here to understand Damian Rouson's answer.
1) C++ and Fortran (and COBOL, Ada, Forth, etc.) produce codes. The programmer uses generalized, high-level, CONTROLLED statements (regulated by an international organization) to convert an algorithm into bits a computer can understand. The programmer has full control (if they want it) over this conversion and execution (licensing issues aside, of course). The program you write for the compiler I would consider code.
2) Python (in any form) and shell (and Perl, Ruby, BASIC, Julia, etc.) produce scripts, not codes. Yes, you are programming using generalized statements (some at a high level, maybe higher than what is available in Fortran or C++, but none controlled by an international body: no standard), but the programmer is NOT in full control of the process and execution, much less of the version and reproducibility of what is being executed. These interpreted languages have systems underneath them that control things like data passing, data collection, garbage collection (and the list continues). Execution output can differ from one computer to the next, or from one manufacturer's version to the next, because there is no standard agreed upon by all. The program you write for these interpreters I would consider a script. It does not matter how many compiled (C++ and/or Fortran) language extensions you include: you are scripting together non-controlled pieces to perform an algorithm in a system that sits on top of the machine's operating system (OS), running a program (your script) on top of a program (Python). That will, by definition, be slow.
With those definitions, why Fortran? Yes, it is a '50s relic, but at the same time it is not. It has native, standardized, parallel, object-oriented language constructs, direct formula translation, and direct-memory-access data structures. Is it frustrating? Yes, if you are writing a code you will run once; scripting is much easier for that! However, if you are writing a code that will be run millions of times on different machines (weather, multi-physics, particle transport; the list is VERY long), or even a couple of hundred times to get a paper written, then I'd rather be "stuck" in Fortran! It is reliable, maintainable, overwhelmingly faster than any scripted (interpreted) language and, for scientific programming, even C++, and it has a huge support system - well, except if you are on the bleeding edge of Fortran 2018 like I am trying to be!
Why Fortran? Because it is the best tool for the job I am trying to complete.
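To make the "native, standardized, parallel" point concrete, here is a minimal, hedged sketch (illustrative only, not production code) of Fortran 2008 coarray parallelism; it can be compiled with, e.g., gfortran -fcoarray=single or the OpenCoarrays caf wrapper, and Fortran 2018 also provides co_sum for exactly this kind of reduction:

program coarray_sum
  implicit none
  integer :: total[*]        ! a coarray: one copy of 'total' per image
  integer :: i

  total = this_image()       ! each image contributes its own image number
  sync all                   ! make every image's value visible

  if (this_image() == 1) then
    do i = 2, num_images()
      total = total + total[i]   ! image 1 pulls the values from the others
    end do
    print *, 'sum of image numbers over', num_images(), 'images:', total
  end if
end program coarray_sum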
I agree with both Damian Rouson and Robert Singleterry as well. Having full control over how the machine solves your numerical problem is very important for what you are aiming at in your results. Obviously, scripts are welcome, but together with Fortran and C++. That way you can do great work.
Modern Fortran has everything needed, and it is compatible with OpenMP and MPI on all platforms. I do not see any need to switch to Python.
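As a minimal, hedged illustration of that compatibility (the loop and array size are arbitrary), a single OpenMP directive parallelizes a standard-conforming Fortran loop; compile with, e.g., gfortran -fopenmp:

program omp_loop
  use omp_lib, only: omp_get_max_threads
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: x(:)
  integer :: i

  allocate (x(n))

  !$omp parallel do
  do i = 1, n
    x(i) = sqrt(real(i, 8))    ! each iteration is independent
  end do
  !$omp end parallel do

  print *, 'threads available:', omp_get_max_threads()
  print *, 'sum =', sum(x)
end program omp_loop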