It would be nice to hear answers that talk about physical limitations as well as answers that discuss how efficiently low-level software anticipates memory allocation.
Douglas, how much RAM is shared depends on the architecture of the parallel computer. Some parallel computers have small shared RAM areas, some have none, and others have large ones. The size of the computer doesn't depend on shared RAM, but the amount of shared RAM may affect the type of processing it can do.
It basically depends on your understanding of a "large parallel computer". If you mean machines used in high-performance computing (HPC) - those that can be found on the TOP500 list - then most of the RAM is not shared on a physical level. These HPC installations are massive clusters of multi-core computers, where only the few cores within a node physically share the same RAM. Calling such an HPC cluster a single "large parallel computer" is common, but arguably already debatable. The MPI programming model reflects this by forcing developers to think in terms of message-based coordination of their parallel codes, instead of coordination through shared (synchronization) variables. Therefore, MPI programs fit seamlessly onto such large parallel computers.
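As a rough illustration of that message-based coordination, here is a minimal MPI sketch (the ranks and the transferred value are chosen arbitrarily): the two processes exchange data explicitly with MPI_Send and MPI_Recv rather than through a shared variable.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal sketch of message-based coordination: rank 0 sends a value,
       rank 1 receives it. No RAM is shared between the two processes. */
    int main(int argc, char **argv)
    {
        int rank, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }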
The majority of the literature defines the problem as "scale up" (more processors / memory per box) vs. "scale out" (more connected machines with processors and memory) for performance improvement.
Scaling up hardware is technically and physically difficult. The more physical processors share the same physical RAM, the harder the hardware coordination becomes. One of the most obvious problems is cache coherency between the processors. GPUs are another interesting approach to hardware scale-up, where the shared-memory coordination problem is tackled by introducing a deep hierarchy of different, but still directly accessible, memory levels. But all in all, there are tough limits on scale-up, and HPC people understood that a long time ago. Check the Intel SCC project and IBM mainframe technology for state-of-the-art ideas in hardware scale-up.
Whatever the hardware situation is - a big single shared-memory machine or a cluster setup - runtime software can emulate either a shared-memory or a message-passing environment for the developer. One typical example is the distributed shared memory (DSM) concept.
SMP (Symmetric Multiprocessing) vs. MPP (Massively Parallel Processing) are the two architectures. SMP machines usually have very large memory that is shared among the different cores/CPUs. The more CPUs/cores you add to an SMP machine, the more on-chip logic you need to keep the caches coherent across *all* the cores/CPUs. Additionally, you are running one instance of the OS, which has to handle all the software scheduling, mapping threads to cores, and paging out applications - with a huge number of cores/CPUs, the OS spends many more cycles on the operation of the hardware, keeping your application from getting all the available cycles.
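For contrast, here is a minimal shared-memory sketch (OpenMP is used purely for illustration; the array size is arbitrary): all threads on an SMP read and write the same array in one address space, and the cache-coherency logic described above is what keeps their views consistent.

    #include <omp.h>
    #include <stdio.h>

    /* Minimal sketch of shared-memory parallelism on an SMP:
       every thread works on the same array in a single address space. */
    int main(void)
    {
        enum { N = 1000000 };
        static double data[N];          /* visible to all threads */
        double sum = 0.0;

        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            data[i] = 2.0 * i;          /* each thread writes its slice */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += data[i];             /* threads combine via a reduction */

        printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }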
MPP would qualify as the scale-out that Peter mentions. In a previous job, I worked with IBM's Blue Gene supercomputer - an MPP machine. It had 1024 nodes (16 cores / 16 GB of memory per node) in a cabinet, and you could have as many as 96(?) cabinets linked together. In that scenario, there is no shared memory except within the individual node, so MPI programming was what was expected.
The answer to the original question depends on what is meant by "shared" for memory in an MPP distributed-memory system. The nodes in such a system are interconnected by a hardware network that facilitates moving data between nodes. Others have already mentioned using MPI_Send and MPI_Recv to program such data movement.
A reasonable definition of "shared" in this context is that a program execution stream running on node A can put a value into, or get a value from, a memory location on a different node B without any corresponding software executing on node B. This is sometimes referred to as "globally addressable" memory, with such remote addressing supported by the network hardware. In software terminology, this is called "one-sided communication" since only one of the nodes is executing program statements to affect the transfer. Such systems do exist, mainly in the HPC space, and including a fair fraction of the ones near the top of the Top500 list. Not surprisingly, there are performance advantages to a design like this.
The one-sided communication scheme is inherent in several parallel programming models, including the SHMEM library (the name is a contraction of SHared MEMory), the MPI 3 one-sided facility using MPI_Get and MPI_Put, and languages that have this model built into the syntax, such as UPC, UPC++, Coarray C++, and Fortran coarrays.
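As a minimal sketch of that one-sided style, using the MPI 3 facility mentioned above (the window size and transferred value are arbitrary): rank 0 puts a value straight into rank 1's exposed window, and rank 1 never executes a matching receive.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal sketch of one-sided communication with MPI-3 RMA:
       rank 0 writes into rank 1's window; rank 1 only exposes it. */
    int main(int argc, char **argv)
    {
        int rank, buf = 0, value = 42;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Every rank exposes one int through the window. */
        MPI_Win_create(&buf, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                 /* open access epoch */
        if (rank == 0)
            /* Write into rank 1's buf; rank 1 runs no receive code. */
            MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);                 /* close epoch; data is visible */

        if (rank == 1)
            printf("rank 1's buf is now %d\n", buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Strictly speaking, the fence calls are collective synchronization, so both ranks still participate in opening and closing the epoch, but no receive or copy code runs on the target node for the transfer itself.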