I would like to hear opinions on the current options in programming languages and compilers for developing high-performance computing codes (including, of course, parallel computing).
Generally speaking, I think it's best to use one of the modern Fortran incarnations (90/95, 2003, ...). Compared to C++, IMHO it is much easier to write well-performing (possibly MPI/OpenMP-parallel) code in Fortran -- the reason being that it is much easier to mess up memory access when juggling pointers in C++.
All the rest concerning the efficiency coming from implementing a good algorithm in a good way was probably already discussed above.
Concerning the idea of CUDA-based computation, you are right. If you want to use CUDA anyway, it is a lot easier (i.e. more cost-effective) to go with C/C++, because the only Fortran compiler I am aware of that offers a built-in CUDA interface is the PGI (i.e. expensive) one.
From my personal experience, I found the best performance for a hybrid-parallel code written mostly in Fortran 90, using Intel Fortran and MKL on homogeneous Intel Xeon clusters, or GNU Fortran, ACML, and FFTW on homogeneous AMD Opteron clusters.
I'm currently using Fortran in its modern incarnation. The new language features in versions 95, 2003, and 2008 enable nice coding, significantly better than Fortran 77. Fortran compilers are mature, although many of the Fortran 2008 features are not yet available.
The alternative would be C or C++. Compilers are also available for both languages, they likewise support MPI and OpenMP, and as far as I know the performance is generally the same. Performance will be influenced much more by your coding style than by your choice of language.
Mixing the languages in either direction is possible and not too complicated, e.g. calling C/C++ from Fortran and vice versa; coupling to other languages is also possible.
Therefore, in my opinion, which language you should use depends on the following questions:
1) If there is any existing code, in which language is it written?
2) In which language are you and your colleagues experienced?
3) Are there any libraries you want to use?
4) Do you want to use object-oriented features?
If you want to use object-oriented features heavily, C++ might be the language of your choice. If they are needed only rarely, modern Fortran will do. Currently, there is much more C++ code and experience than there is for modern, object-oriented Fortran.
Anyone may correct me if I'm wrong, but to me it seems that modern Fortran relates to old Fortran the way C++ relates to C. If you use C++, the C language features are still available as a subset, in the same way that F77 features are a subset of F95+.
I would suggest C++ in many cases, although Fortran is entrenched in some areas. In addition to the object-oriented vs. not-object-oriented discussion, a recent trend in modern C++ is template-based programming. In many cases, that can give you versatile libraries with simple syntax that still perform comparably to much more arcane hand-tuned code, since the compiler will have extensive information available on e.g. loop lengths. The recent C++11 standard and the Boost libraries have also raised the bar on what kind of library support you can almost take for granted in the C++ world.
However, it should also be noted that serious HPC efforts are also done in scripting or high-level languages, e.g. Python augmented by SciPy or MATLAB. While both environments are inherently constrained and rather slow, it is possible to write HPC codes in them given the proper parallelization support and assuming that most of your computational load will be contained within library calls. If most of your calculations consist of standard operations, be it matrix algebra or some optimization or clustering process (that you are calling, not implementing yourself), the convenience of a scripting environment with a wider pool of available talent and supposedly less steep learning curve may be preferable.
I would, however, generally not prefer Java in this case. Java will be much faster than Python for the code you write yourself, but in general interactions with external libraries for just the kind of tasks mentioned above can be more cumbersome. Java will be a compromise in this manner - "easier" than C++ or Fortran, but still not as productive as Python and with less flexible interactions to C libraries than you get in Python.
But, really, the algorithms matter most. If a good implementation of your core algorithms already exists (with a suitable license), use it. Don't reinvent the wheel, unless you feel that your competitive advantage will lie in adaptations of that core algorithm. Most frequently, though, the advantage is in how you use a core method, not in how you specifically implement it.
(NOTE: When it comes to personal preference, I tend to do a lot in C++. I'm trying to make the case for at least considering Python etc anyway just to give a more general view of the options available.)
I like C and C++. The code is a little difficult to manage during development, but it executes fastest. I use structures in C to get a feel of OOP even in C programming and to protect the global variables.
I guess the choice is between C and Fortran and their many incarnations.
I was not considering commercial scripting languages like MATLAB, because a large deployment, for example on 64-128 parallel nodes, significantly affects the overall cost (it is still quite an expensive piece of software). I don't have deep knowledge of Python.
I would, however, like to raise the bar of the discussion by noting that the licensing cost of codes and compilers is still an element to care about.
For example, under Linux and its variants (you don't want to develop HPC codes on Windows or Mac, do you?), Intel offers its set of compilers free of charge for academic purposes, and there is always the option of the GNU compilers.
I'm also evaluating the PGI compilers (very popular), but the compiler itself is expensive.
Thanks for your answer, this is a very interesting point. I played a bit with CUDA devices, but I only scratched the surface. However, there are a number of limitations with CUDA hardware that make me a little less confident (maybe you can give me feedback from your first-hand experience).
The real point about CUDA is that if you have memory-demanding problems (like FDTD), the cost of having, say, 24 GB of device memory per node (e.g. 4 x 6 GB Tesla cards) is about two orders of magnitude higher than the cost of a similar amount of central memory.
(Actually, I also tried the AMD solution, and it is a bit more cost-effective in double-precision math.)
In addition, if your computation requires more memory (say 128 GB per node), there is presently no available option. You need to implement very inefficient in-code transfers and flushes of the CUDA device memory. Basically, you incur a performance overhead that can eventually make the CUDA approach lose to the regular approach.
There are two directions in this area: multicore, multi-machine, or both. OpenMP is good for multicore, and MPI is the way I prefer for multi-machine. It depends on many of the items mentioned before. Both are C-centric, and C is supposedly taught in almost all universities, but most other languages probably support them as well.
If you have to go to clusters, you might also consider PVM: http://www.csm.ornl.gov/pvm/ . One can also use it from Fortran, C/C++, R, Python, and others.
Instead of CUDA I would always suggest OpenCL. It may be less feature-rich, but unlike CUDA, its CPU code path is not just an emulation mode -- mostly because AMD and Intel support it, and they both produce CPUs.
Alexander: I think you have a fair point in that computer science has produced a lot of supposedly useful techniques that have not entered the mainstream. On the other hand, most of the people discussing Fortran here have also highlighted the ways in which modern Fortran gives you a more expressive language than Fortran 77 (or even older versions). When I advocate C++, I do so assuming significant use of the STL, Boost, and similar libraries. That brings it quite some distance away from C and allows e.g. some aspects of functional paradigms to fit in nicely within an overall imperative structure.
And for parallel processing we all take OpenMP and MPI more or less for granted. While both are compromises of practicality, they have become mainstream and changed the simple serial view that dominated for decades. Someone simply writing purely serial code these days is rightly considered slightly backwards, or working on toy-size examples (or on problems that are not really demanding in the first place -- there are naturally plenty of relevant scientific problems that can benefit from computation while also being trivial from a computational standpoint).
Most have already mentioned OpenMP and MPI as methods for taking advantage of both multi-core machines and clusters -- available in C/C++/Fortran. Hybrid hardware (LLNL RoadRunner) and GPU programming (NVIDIA and others) present their own issues, as you need to handle, in some cases, separate compilations, because the machine code is different.
It is my opinion that the algorithms you select will have more influence on performance than the compiler/language you choose. When selecting (or designing) your algorithm, how it scales will be critical to its success.
I've done IBM Blue Gene programming for the past six years. Blue Gene is MPP, but with multiple cores per node, so I've done both OpenMP and MPI. The more you can keep on a given node, the better the performance. Scaling to a large/huge number of nodes can incur significant communication overhead. BG/P hardware is 1K nodes, with 4 cores per node. Are you better off with 4K MPI tasks or 1K tasks with 4 threads per task? What happens when you scale up to 8K nodes? Can your problem scale that much?
Answering those questions becomes critical before worrying about what language (and which compiler) to implement it in.
Regarding codes, Fortran (Intel ifort) or gfortran is, I think, a good choice for parallel computing; it is essential to know about OpenMPI to get the best performance levels from the source distribution.
When using computers with Intel processors, the ifort compilers are available, and we have seen very good results with them, for example when running programs like Dalton, Gaussian, and Dirac, among others.
A framework of Python + Fortran also gives very good results.
Certainly Fortran is the dominant language in HPC. HPC has been the focus of Fortran for as long as the concept of HPC has existed. There is a large existing code base, and the language semantics lead to better compiler optimization opportunities. Fortran is a continually evolving language even though the name stays the same. The current Fortran standard (Fortran 2008) includes some very useful features for HPC programming such as arrays and array operations built into the language, object-oriented programming and modules/submodules, and well defined syntax and semantics for calling C functions (for interacting with system utilities). Most, if not all, HPC codes are targeted for execution on parallel systems. Perhaps the most distinguishing feature is that SPMD (distributed memory) parallelism is built directly into the language syntax. Local (on node) parallelism can be represented using the same syntax, and OpenMP also has a Fortran binding.
Other languages (notably C and C++) try to emulate the native facilities of Fortran by adding layers of libraries, but the result generally does not perform as well and is harder to maintain -- which is why Fortran continues to dominate this space.
Pavel, not to focus on a side issue, but I'm trying to understand your statement
"Moreover, in Unix, all Fortran codes are previously translated on C++ and only then compiled into executables files!"
Are you saying that Unix Fortran compilers are really only translators - translating to C++ and then compiling using C++ compiler? If so, then I'll have to strongly disagree with you. If I recall correctly, GCC doesn't work that way and I *know* that IBM's XL compilers do not. I'm trying to understand where/why the statement is being made.
Fortran is a perfectly viable language. While new codes are being written in C/C++, many of the existing HPC codes are still maintained (and enhanced) in Fortran.
For an expert C++ practitioner, using the restrict extension and other compiler-specific extensions, it is certainly possible to meet or exceed Fortran performance. Those who care about portability, coding productivity, or maintenance may be inclined not to use those features fully.
I did not understand the comment "Moreover, in Unix, all Fortran codes are previously translated on C++ and only then compiled into executables files! ". Fortran compilers create either assembly language or binary output - never C++ source that is later compiled. That would defeat the optimization advantages that Fortran permits. And Fortran compilers have been around much longer than C++ (or C) has existed.
At the moment, C++ does not have native parallelism in the same way Fortran does, though it could be added to C++ in the future. So comparison is not so straightforward. If you are comparing Fortran + MPI to C++ + MPI, then I would not expect the parallel part of the performance to be much different, since both are calling the same library. The more interesting comparison is Fortran without MPI (or any other library) compared to C/C++ with MPI. There are several examples of Fortran coming out ahead in that comparison.
I agree that the large accumulation of legacy code helps Fortran maintain its dominance in HPC. However, the syntax and semantics are also specifically aimed at large-scale scientific computing, and these are advantages as well. Fortran's main popularity problem is that most computer scientists, especially at universities, do not know the language, so the students do not learn it. I agree that C++ attracts a more diverse set of users -- business applications, or even system programming -- and will provide more job opportunities. In fact, some Fortran compilers are written in C++.
A difficult question with no single proper answer. It really depends on what you want: different programming languages are good for different types of applications. I myself prefer to work with both Java and Python.
In my years of experience, up to 6 years ago, it was always Fortran and C. Which one depended mostly on whether the people came more from an engineering or more from a CS school of thought.
In what we were doing Fortran was almost exclusively used.
From what I read around nowadays, with the new updates to the Standard F2003/08 and with the introduction of Co-Arrays, it seems to be gaining momentum again. Its development has been slow due to new emerging languages. But old is gold.
I believe that MATLAB turns computationally intensive parts of a program into calls to library routines that are compiled, and evidently optimized. That would explain how it might be faster than user-written code. Of course, for an extensive project (large climate model, for example) you can get better overall performance with a hand-written code.
I presently make extensive use of MATLAB for analysis, but I would not say it compares in speed with native code. Indeed, computationally intensive library functions (e.g. FFT) run at a similar speed, but the extensive control code required in complex problems, as well as iteration-intensive problems (e.g. FDTD), run orders of magnitude faster when built with standard compilers.
In addition, how well MATLAB fits highly parallel settings is quite debatable.
The Intel C (icc) compilers are highly preferable to gcc: they are fast and provide better optimization (check out the -fast option in the man pages). I use MPI to build parallel programs, and it is the best option -- it's easy, and you can install it on a quad-core machine and use 8 processes to execute the tasks. C has always been my only choice. Though I have worked with C++ and Fortran (a bit), C coding seemed much better to me and also easier for building very large programs, e.g. you can write a subroutine in a single line, which lets you visualize the maximum segment of the software.
Ravi - gcc may not be as efficient, but icc limits you to Intel based platforms only :-(
Many of the top 500 (top500.org) HPC installations are not Intel.
As an aside, for those teaching parallel programming: remember that desktop systems, and even small clusters, generally don't go beyond 256 cores/threads. Limiting algorithmic implementations to that scale stunts the ability of some solutions to come to the forefront. Today's large HPC installations have tens of thousands (if not hundreds of thousands) of available compute elements; creating a solution that doesn't scale hurts both the implementation and the implementor.
If you are thinking seriously, try Mathematica. It is not only for matrices :) and is no worse in the field of numerics; parallel calculation, CUDA, etc. are also included (without the need for low-level programming). It can do the calculations in C if you wish. See also http://www.siam.org/books/reviews/ot86review.pdf
C/C++ and Fortran allow you to implement parallelism in an easy way. As Simen Gaure comments, both C++ and Fortran can be combined with threads, MPI, and OpenMP to implement parallel programs.
Actually, with modern Fortran you do not even need calls to the MPI (or any other) library. SPMD parallelism is built directly into the language syntax. And, for a beginning student, I would argue it is simpler to understand since the remote accesses are immediately visible in the syntax.
I think there is a misunderstanding here between the compiler (and the associated language), and the programming model. You can have one of the best compiler in the world, but if your parallel programming model is not well suited to your hardware environment, or to your initial problem, this will give poor results.
I personally think that even with the best compiler, using MPI on a single multi-core CPU is not a good idea in terms of performance. POSIX threads are, in this case, much better adapted to the hardware architecture. That said, it also depends heavily on the problem being solved.
Moreover, the final performance does not rely only on the compiler or the hardware; it also depends on the synchronization scheme used in your code. Two common errors in parallel programming come from bad synchronization schemes that generate too many sequential parts, and from an unadapted parallel algorithm (typically code that is a direct transformation of an existing sequential one). Think parallel :-)
Try CUDA C/C++ if you want to target NVIDIA GPU-specific features; otherwise use OpenCL, which provides an efficient way to increase performance in a vendor-independent manner. You can use OpenMP, threads, and all the basic functions with both. If you are familiar with Python, the PyCUDA wrapper may also be helpful to you.
Please do check out the new programming language called Julia. It is inherently parallelized for execution and can be embedded easily, thus producing a complete scientific solution. I wrote a book for introductory users, which can be found via a link on my website http://bookmuft.com/my-books/ ; there are some useful tutorials on the website too.