It would be nice to hear answers that talk about physical limitations as well as answers that discuss how efficiently low-level software anticipates memory allocation.
Douglas, how much RAM is shared depends on the architecture of the parallel computer. Some parallel computers have small shared RAM areas, some have none, and others have large ones. The size of the computer doesn't depend on shared RAM, but the amount of shared RAM may affect the type of processing it can do.
It basically depends on your understanding of a "large parallel computer". If you mean machines used in high-performance computing (HPC) - those that can be found on the TOP500 list - then most of the RAM is not shared on a physical level. These HPC installations are massive clusters of multi-core computers, where only the few cores within a node physically share the same RAM. Calling such an HPC cluster a single "large parallel computer" is common, but arguably already debatable. The MPI programming model reflects this by forcing developers to think in terms of message-based coordination of their parallel codes, instead of coordination through shared (synchronization) variables. Therefore, MPI programs fit seamlessly onto such large parallel computers.
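As a rough illustration of that message-based coordination, here is a minimal MPI sketch (the ranks and the transferred value are chosen arbitrarily): the two processes exchange data explicitly with MPI_Send and MPI_Recv rather than through a shared variable.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal sketch of message-based coordination: rank 0 sends a value,
       rank 1 receives it. No RAM is shared between the two processes. */
    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }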
The majority of the literature defines the problem as "scale up" (more processors / memory per box) vs. "scale out" (more connected machines with processors and memory) for performance improvement.
Scaling up hardware is technically and physically difficult. The more physical processors share the same physical RAM, the harder the hardware coordination becomes. One of the most obvious problems is cache coherency between the processors. GPUs are another interesting approach to hardware scale-up, where the shared-memory coordination problem is tackled by introducing a deep hierarchy of different, but still directly accessible, memory levels. But all in all, there are tough limits on scale-up, and HPC people understood that a long time ago. Check the Intel SCC project and IBM mainframe technology for state-of-the-art ideas in hardware scale-up.
Whatever the hardware situation is - a big single shared-memory machine or a cluster setup - runtime software can emulate either a shared-memory or a message-passing environment for the developer. One typical example is the distributed shared memory (DSM) concept.
SMP (Symmetric Multiprocessing) vs. MPP (Massively Parallel Processing) are the two architectures. SMP machines usually have very large memory that is shared among the different cores/CPUs. The more CPUs/cores you add to an SMP machine, the more on-chip logic you need to keep the caches coherent across *all* the cores/CPUs. Additionally, you are running one instance of the OS, which has to handle all the software scheduling, mapping threads to cores, and paging out applications - with a huge number of cores/CPUs, the OS spends many more cycles on the operation of the hardware, keeping your application from getting all the available cycles.
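For contrast, here is a minimal shared-memory sketch (OpenMP is used purely for illustration; the array size is arbitrary): all threads on an SMP read and write the same array in one address space, and the cache-coherency logic described above is what keeps their views consistent.

    #include <omp.h>
    #include <stdio.h>

    /* Minimal sketch of shared-memory parallelism on an SMP:
       every thread works on the same array in a single address space. */
    int main(void)
    {
        enum { N = 1000000 };
        static double data[N];          /* visible to all threads */
        double sum = 0.0;

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            data[i] = 2.0 * i;          /* each thread writes its slice */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += data[i];             /* threads combine via a reduction */

        printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }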
MPP would qualify as the scale-out that Peter mentions. In a previous job, I worked with IBM's Blue Gene supercomputer - an MPP machine. It had 1024 nodes (16 cores / 16 GB of memory per node) in a cabinet, and you could have as many as 96(?) cabinets linked together. In that scenario, there is no shared memory except within the individual node, so MPI programming was what was expected.
The answer to the original question depends on what is meant by "shared" for memory in an MPP distributed-memory system. The nodes in such a system are interconnected by a hardware network that facilitates moving data between nodes. Others have already mentioned using MPI_Send and MPI_Recv to program such data movement.
A reasonable definition of "shared" in this context is that a program execution stream running on node A can put a value into, or get a value from, a memory location on a different node B without any corresponding software executing on node B. This is sometimes referred to as "globally addressable" memory, with such remote addressing supported by the network hardware. In software terminology, this is called "one-sided communication" since only one of the nodes is executing program statements to affect the transfer. Such systems do exist, mainly in the HPC space, and including a fair fraction of the ones near the top of the Top500 list. Not surprisingly, there are performance advantages to a design like this.
The one-sided communication scheme is inherent in several parallel programming models, including the SHMEM library (the name is a contraction of SHared MEMory), the MPI 3 one-sided facility using MPI_Get and MPI_Put, and languages that have this model built into the syntax, such as UPC, UPC++, Coarray C++, and Fortran coarrays.
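As a minimal sketch of that one-sided style, using the MPI 3 facility mentioned above (the window size and transferred value are arbitrary): rank 0 puts a value straight into rank 1's exposed window, and rank 1 never executes a matching receive.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal sketch of one-sided communication with MPI-3 RMA:
       rank 0 writes into rank 1's window; rank 1 only exposes it. */
    int main(int argc, char **argv)
    {
        int rank, buf = 0, value = 42;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes one int through the window. */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                 /* open access epoch */
        if (rank == 0)
            /* Write into rank 1's buf; rank 1 runs no receive code. */
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);                 /* close epoch; data is visible */

        if (rank == 1)
            printf("rank 1's buf is now %d\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Strictly speaking, the fence calls are collective synchronization, so both ranks still participate in opening and closing the epoch, but no receive or copy code runs on the target node for the transfer itself.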