There are various research areas, such as live data mining and weather forecasting, in which we can exploit parallelism through GPU computing to increase performance (by applying GPGPU computing principles).
I agree that the paper is an excellent start. A few years ago, I would have said that the best application areas are computational chemistry and structural analysis, including implicit crash codes. Now I think certain forms of search (big data) are an even more important fit.
There are two key principles to check when examining whether any kind of accelerator is worth using: First, how many times will the data be re-used once moved to the accelerator? Second, can the CPU continue operating in parallel while the GPU works?
Discrete GPUs are usually attached by a PCIe bus, which in the current generation means about 16 GB/s transfer bandwidth, and more latency than you'd get simply accessing local DRAM. That means you need a rule-of-thumb about when you'll get enough speedup from the accelerator to make it worth the trip; a few years ago, you needed several thousand re-uses of each data point, but now it can be as low as 100. When it was in the thousands, people mainly used accelerators to raise their TOP500 score (dense matrix LINPACK) since matrix-matrix multiplication is very forgiving of bandwidth-challenged architectures (order M reuse to multiply M by M matrices). Chemistry applications tend to have a lot of re-use of data also. But now imagine we're trying to search for a fit to a particular complex object, like a video clip or a sound or a person's face... once each GPU is loaded with its own part of the database against which to compare, all you have to send the GPU is a few bytes for each query, so you get immense re-use from moving the original database to the GPU. This is why many search engine companies are planning to add GPU accelerators to their data centers.
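To make that rule-of-thumb concrete, here is a back-of-envelope calculation in Python. The bandwidth and throughput numbers are illustrative assumptions for a current-generation discrete card, not vendor specifications:

```python
# Back-of-envelope: how many operations must each data point receive
# before a discrete GPU pays for the PCIe round trip?
# All numbers below are illustrative assumptions, not vendor specs.

PCIE_BW = 16e9          # bytes/s across PCIe (assumed, per the figure above)
GPU_FLOPS = 1e12        # sustained flop/s on the GPU (assumed)
BYTES_PER_WORD = 8      # double precision

# Flops the GPU can retire in the time it takes to move one word
# across the bus and back again (in + out = 2 transfers):
break_even_reuse = GPU_FLOPS * (2 * BYTES_PER_WORD) / PCIE_BW
print(f"~{break_even_reuse:.0f} operations per word just to hide the transfer")
```

With these assumed numbers the answer comes out around a thousand operations per word, which is why a search workload (one bulk transfer of the database, then tiny queries forever after) is such a natural fit.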
And on the second point, GPUs are best NOT regarded as "offload engines", since there is generally no reason the CPU has to stop and wait for the GPU to finish its specialized task. Properly used, the CPU does general-purpose computing while asynchronously sending the "heavy-lifting" tasks out to the accelerators. Not all applications permit this, but when they do, the GPU provides additive performance rather than alternative performance (the latter being what people measure when they quote speedup ratios). Even if an accelerator were half as fast as the CPU, if you've got it installed for other reasons and can use it in parallel, why not invoke it to get 50% more speed?
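The additive-performance pattern looks like this in any language with asynchronous task submission. This is a minimal Python sketch in which a thread pool stands in for a real GPU stream or command queue (an assumption for illustration; a real deployment would use CUDA/OpenCL async launches):

```python
# Additive rather than alternative performance: the CPU keeps doing
# general-purpose work while the heavy task runs asynchronously.
# A one-worker thread pool stands in for a GPU stream here (assumption).
from concurrent.futures import ThreadPoolExecutor

def heavy_lifting(data):
    # Stands in for an asynchronous GPU kernel: a big reduction.
    return sum(x * x for x in data)

with ThreadPoolExecutor(max_workers=1) as accelerator:
    # "Launch" the heavy task; this call returns immediately.
    future = accelerator.submit(heavy_lifting, range(1_000_000))

    # Meanwhile the CPU continues with its own general-purpose work...
    cpu_result = sum(range(1000))

    # ...and joins only at the point the accelerator's answer is needed.
    gpu_result = future.result()

print(cpu_result, gpu_result)
```

The key design point is that the join (`future.result()`) is deferred until the answer is actually required, so the accelerator's time is overlapped with, not substituted for, the CPU's.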
Lastly, vendors are now incorporating GPUs into the same chip as the CPU, which dramatically lowers the barrier to being a good application fit; the GPU can share the same memory, even with cache coherency, which allows for very large memory applications that would not fit into the separate (although extremely fast) memory of a discrete graphics unit. So in a few years, the answer to your question will probably be very different from what it is now, and there will be a vast set of applications that can benefit from the GPU accelerator approach.
The GPU is not a panacea for all kinds of parallel activity, but it does suit workloads where I/O is not dominant and the ALU work on a substantial block of memory (once it has been transferred via PCIe) is significant. Thus a real-time system streaming large volumes of data would not be a good fit (for an example, LOFAR correlation, see R. van Nieuwpoort and J. Romein, "Using Many-Core Hardware to Correlate Radio Astronomy Signals," ICS '09, 2009). But for performing a large number of vector-style operations on all combinations of a big block of data they are very good, and the latter is precisely what they were designed for. Their evolution (in my case, of AMD GPUs) shows that the manufacturers are trying to expand their range: the I/O capabilities (i.e., fetching data from memory on the GPU) have improved significantly over the last three years.
Thus, I would guess, the domain is defined by the number of ALU operations performed on a data buffer relative to the number of data accesses needed to fill it, before you move on to the next buffer.
John S., I think you restated exactly what I said: the number of times a data point is re-used for arithmetic dictates whether a GPU fits or not.
It may be useful to start listing things for which a GPU is NOT a good thing, at least if it's a discrete card as opposed to an integrated GPU (that is, an APU). Streaming FFT data is one of the best examples I know, and it's frustrating because FFTs are so important to so many applications. It doesn't matter whether the FFT is 1D, 2D, or 3D; they just don't need more than a few dozen operations per data point, so it's very difficult to push data in and out at 16 GB/sec yet sustain over a teraflop of arithmetic performance unless you're doing repeated FFT operations or have a lot of other processing work to do on the data besides FFTs. It's a bit counterintuitive, because historically, accelerators have been GREAT for speeding up FFT operations... but that was back when CPUs were much less adept at floating-point than they are now.
If you can keep an entire problem, say a PDE solver that uses FFTs, in the graphics memory on the card (currently about 4 to 8 Gbytes), then you can get a high fraction of peak speed and it will be a great use of the GPU; but using it for streamed signal data is going to be an exercise in frustration, for example.
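A quick way to see why streamed FFTs disappoint: the standard textbook operation count for an N-point FFT is roughly 5·N·log2(N) flops, so each point receives only about 5·log2(N) operations. A short Python sketch:

```python
import math

# Approximate flops per data point for an N-point FFT, using the
# common 5*N*log2(N) textbook estimate for the total operation count.
def fft_ops_per_point(n):
    return 5 * math.log2(n)

for n in (2**10, 2**20, 2**30):
    print(f"N = {n:>10}: ~{fft_ops_per_point(n):.0f} ops per point")
```

Even a billion-point transform delivers only on the order of 150 operations per point, well short of the hundreds-to-thousands of re-uses needed to amortize the PCIe trip, which matches the "few dozen operations per data point" frustration described above.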
1. Predictive Modeling
Predictive modeling is done through extensive computer-simulation experiments, which often involve large-scale computations to achieve the desired accuracy and turnaround time; it can also be called "modeling the future". Such numerical modeling requires state-of-the-art computing at speeds of gigaflops and beyond. In the case of computational biology, it is the modeling and simulation of the self-organizing adaptive response of systems, where spatial and proximal information is of paramount importance.
Areas of research include:
1. Numerical weather forecasting
2. Flood warning
3. Semiconductor simulation
4. Oceanography
5. Astrophysics (modeling of black holes and astronomical formations)
6. Sequencing of the human genome
7. Socio-economic and government use

2. Engineering Design and Automation
Applications include:
1. Finite-element analysis
2. Computational aerodynamics
3. Remote sensing applications

3. Artificial Intelligence and Automation
This area requires parallel processing for the following intelligence functions:
1. Image processing
2. Pattern recognition
3. Computer vision

4. Energy Resources Exploration
Applications include:
1. Seismic exploration
2. Reservoir modeling
3. Plasma fusion power
4. Nuclear reactor safety

5. Medical, Military and Basic Research
Applications include:
1. Medical imaging
2. Quantum mechanics problems
3. Polymer chemistry
4. Nuclear weapon design

6. Visualization
Applications include:
1. Computer-generated graphics, films and animations