I don't really know what "grid site" means. "Grid" was just a faddish term from the last decade for PaaS or SaaS, though mainly distinguished by administrative as well as platform heterogeneity. "Cluster" really only refers to one of two things: either an HA cluster (failover, STONITH, etc.) or an HPC cluster (shared high-performance fabric, often heterogeneous), but either meaning implies a single administrative domain.
A grid site means a group of computers that are clustered together and contributed as resources to the grid by their owner. So in the context of job scheduling and resource management, I would say YES to using "grid site" and "cluster" interchangeably.
No, these words definitely cannot be used interchangeably. A grid can use a cluster, but a cluster is not necessarily a grid. A grid involves the aggregation of "geographically dispersed" and "heterogeneous" resources from "different organizations", mainly to solve computationally complex problems, although there are other types of grids such as grid services, data grids, etc.
A grid is a collection of resources. Those resources might be clusters or clusters plus other resources. In general, a grid is a geographically dispersed and, more importantly, organizationally and administratively diverse collection, i.e., a virtual organization. It is usually about more than just scheduling jobs on compute clusters, although for some grids (such as NSF's TeraGrid) that turns out to be the most important aspect. It can be a mushy idea since it was taken up and used as a marketing term, often by folks who didn't fully grok it. The classic paper as a reference is http://toolkit.globus.org/alliance/publications/papers/anatomy.pdf
Again, Grid is archaic. It's hard to imagine a context where it would make sense to pursue a Grid approach, rather than something more modern. The only significant Grid is CERN's, and it's more of a "distributed charity cluster" (since the nodes involved are statically partitioned from their hosting organization and run the CERN stack exclusively). Since TeraGrid has a mostly coordinated administrative infrastructure, I'd claim it's not a grid.
That's really the point of why Grids were a historic failure (OK, cul-de-sac). An organization that has acquired a large resource is responsible to its funding organization to justify the capital and operating costs. If the resource is loaned to some third party (the definition of Grid), then the funder may not be impressed. "Oh, yes, you gave us $10M for a cluster because we said we needed it, and we wound up giving 82% of the cycles away to Particles@Home."
This is not totally off-topic. The original question asks to distinguish grids and clusters for scheduling purposes. A grid is a virtual, opportunistic/charity cluster, and so yes, it has very different scheduling needs. A grid receives resources at unpredictable times, in varying amounts, and might not even be able to hold onto them for significant periods. A (real) cluster owns its resources, and can schedule them into the future. In fact, cluster efficiency (for a mixed workload) depends on being able to forward-schedule. (This fact is ironic, since many HPC clusters do no significant forward scheduling, and thus wind up with unfair opportunistic schedules.)
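To make the forward-scheduling point concrete, here is a minimal sketch of a reservation-based scheduler. It is an illustration only, not how any particular batch system works: the `Job` and `Reservation` types, the `forward_schedule` function, and the 4-node example cluster are all hypothetical, and jobs are assumed to arrive already sorted by priority with known run times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int      # nodes requested
    duration: int   # requested run time, in hours


@dataclass(frozen=True)
class Reservation:
    job: Job
    start: int      # hour at which the job is planned to start

    @property
    def end(self) -> int:
        return self.start + self.job.duration


def nodes_in_use(reservations, t):
    """Nodes held by reservations that are running at hour t."""
    return sum(r.job.nodes for r in reservations if r.start <= t < r.end)


def fits(reservations, total_nodes, job, start):
    """True if job can run for its whole duration starting at `start`.

    Usage only rises when a reservation starts, so it is enough to check
    the candidate start and every reservation start inside the window.
    """
    window_end = start + job.duration
    points = [start] + [r.start for r in reservations
                        if start < r.start < window_end]
    return all(nodes_in_use(reservations, t) + job.nodes <= total_nodes
               for t in points)


def forward_schedule(jobs, total_nodes):
    """Reserve the earliest feasible slot for each job, in priority order.

    Because the cluster owns its nodes, every job gets a firm future
    start time, and later jobs may only fill gaps that do not delay
    reservations already made.
    """
    reservations = []
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} can never fit on this cluster")
        # The earliest feasible start is hour 0 or the end of a reservation.
        candidates = sorted({0} | {r.end for r in reservations})
        start = next(t for t in candidates
                     if fits(reservations, total_nodes, job, t))
        reservations.append(Reservation(job, start))
    return reservations


# Hypothetical 4-node cluster; jobs listed in priority order.
jobs = [Job("A", 2, 4), Job("B", 4, 2), Job("C", 2, 3), Job("D", 2, 6)]
plan = {r.job.name: r.start for r in forward_schedule(jobs, total_nodes=4)}
```

In this example the wide job B is reserved for hour 4, the short job C is slotted in at hour 0 because it finishes before B needs the nodes, and D has to wait until hour 6. A purely opportunistic scheduler would start D at hour 3, as soon as two nodes came free, and push B back. A grid that cannot count on holding its resources cannot make the kind of promise B receives here.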
Thank you. You have some good comments, but I would like to clarify a few of them. As the ORNL site principal investigator, I don't think the statement that "Tera[G]rid has a mostly coordinated administrative infrastructure" is completely accurate. Parts were coordinated and parts were not. For example, each TeraGrid user had separate usernames and identities at each site, and the TeraGrid overlay provided identity mapping across the different nodes. There are other aspects of how it was decentralized with coordination overlays as well (for example, routing policies). Again, this is a point about the need to think about Virtual Organizations (VOs) when thinking about grids, whereas clusters tended to live within a single administrative domain.
The discussion about offering "charity" cycles making grid a "cul-de-sac" is important, but not complete. For example, there are cases where one organization funds system acquisition but does not make arrangements for operations. Unfortunately, this happens a lot. Finding a different organization to fund operations in exchange for a portion of the cycles is an option and has happened many, many times (and still does). Moreover, there is a common trend of funding a new system with portions of the cycles dedicated to different sponsors, both local and national/international in scope. Having 25% of a machine 4X larger than one could buy alone, but having to share it with three other partners, is, it turns out, a desired opportunity for many funding agencies, especially for large, leadership-class systems. I would rather have 3 months of a machine 4X larger than exclusive access to a smaller supercomputer/cluster.
However, your point about it creating operations management challenges is correct. A system operations staff now has more than one "customer" to satisfy, and the queue (and on-demand) priority structure must reflect that.
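As a rough illustration of what a multi-"customer" priority structure can look like, here is a minimal fair-share sketch. Everything in it is an assumption made for the example: the function name `fair_share_order`, the sponsor names, the equal 25% entitlements, and the node-hour figures. Real schedulers use more elaborate formulas (usage decay, job age, job size, and so on).

```python
def fair_share_order(shares, usage):
    """Rank sponsors by how far below their entitled share they are.

    shares: sponsor -> entitled fraction of the machine (sums to 1.0)
    usage:  sponsor -> node-hours consumed in the accounting window
    Returns sponsors ordered from most under-served to most over-served,
    i.e. the order in which their queued jobs should be favoured.
    """
    total = sum(usage.values())

    def deficit(sponsor):
        used_fraction = usage.get(sponsor, 0.0) / total if total else 0.0
        return shares[sponsor] - used_fraction

    return sorted(shares, key=deficit, reverse=True)


# Hypothetical machine split equally among four sponsors.
shares = {"local": 0.25, "partner_a": 0.25, "partner_b": 0.25, "partner_c": 0.25}
# Hypothetical node-hours consumed so far in the current window.
usage = {"local": 500, "partner_a": 50, "partner_b": 300, "partner_c": 150}

order = fair_share_order(shares, usage)
```

Here `partner_a` has used only 5% of the cycles against a 25% entitlement, so its jobs go to the front, while `local`, which has consumed half the machine, goes to the back until the others catch up.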
RE "charity" issues. You are correct that it does present challenges both in terms of operations and in terms of justification, but dismissing it as a "historic failure" on that basis is an incomplete characterization. For example, the Open Science Grid is alive and kicking at the current time, and many (including sponsors who buy large clusters) see it and its "charity" cycles as a success story showing good collaboration across HPC centers, across funding agencies, and internationally.
A grid computing system is a set of widely distributed resources working toward a common goal. It is a close relative of both cloud computing and supercomputing. We can think of a grid as a distributed system connected over a single network. This type of computing typically works with large volumes of files. It is basically a cluster-type system, which is why people often call it cluster computing.
Grids tend to be more geographically dispersed and heterogeneous by nature. Grid networks also come in various types. A single-purpose grid is dedicated to one task, much like a dedicated connection, whereas a general-purpose grid performs multiple tasks.
A grid is large, so grid computing resembles supercomputing. It consists of many networks, computers, and middleware. A grid is typically dedicated to some specific function that involves a large volume of data. In grid processing, each task is divided into multiple processes. All the processes start executing simultaneously on different computers. As a result, the whole task finishes much sooner than it would on one machine, which gives it the flavor of supercomputing.
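The divide-and-run-simultaneously idea can be sketched in a few lines. This is only a toy: worker threads stand in for the separate computers of a grid, and the function names (`split`, `process_chunk`, `run_on_grid`) and the sum-of-squares workload are made up for the example. Real grid middleware also has to move data, authenticate users, and cope with nodes that disappear.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut a list into at most `parts` chunks of nearly equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """The piece of work one node does on its own: sum of squares here."""
    return sum(x * x for x in chunk)


def run_on_grid(items, workers=4):
    """Divide one task into chunks, run the chunks concurrently, then
    combine the partial results into the final answer."""
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)


total = run_on_grid(list(range(10)), workers=3)
```

How much time this saves in practice depends on how evenly the work splits and on how much the pieces have to communicate, so the gain is rarely proportional to the number of machines.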