In low-level processor architecture, cache memory plays a vital role. If we increased the size of the cache, we could improve performance with the same hardware. So why is it so small?
Physically, cache memory belongs to the so-called associative memory type: instead of looking up a memory position through the usual addressing scheme, you compare a set of labels associated with each position against the input (usually in parallel). This requires extra space and extra hardware: associative memories are not only more expensive, they are also harder to miniaturize. Let me try to explain it with an example:
Let's say we have a 1024-byte ROM with access time t1 and a 128-byte cache with access time t2 equal to t1/10. In every program there are usually a few instructions that repeat again and again, whereas the rest are executed only once or twice.
Say a whole 500-byte program is in the ROM, but only 50 of those code lines repeat again and again. It makes sense to keep those lines in the cache so that, most of the time, the CPU spends only t2 seconds accessing the code.
Unfortunately, those code lines are not consecutive! For example, say the program occupies ROM positions 0 to 500 and the repeating instructions sit at 50, 51, 52 and 53, then 103, 104, 105, then 199, 200 ... you get the idea. The problem is that the CPU requests those code lines by the positions they have in the ROM, so how does it search the cache? Easy: we assign a "label" to each cache position. When we copy a code line into the cache, its label is set to the position of THAT code line in the ROM. Hence, when the CPU asks for a code line:
First, we compare (fast and in parallel) the requested ROM position with all the labels in the cache. If there is a match (a cache hit), we read the associated code line from the cache in t2 seconds.
If there is no match (a cache miss), we have to go to the ROM to fetch the line, taking t1 + t2 seconds (because we checked the cache first).
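Here is a minimal sketch of that lookup in software (the sizes match the example above, but the code, the sequential loop over labels and the FIFO replacement policy are my own simplifications; real hardware compares every label in parallel):

/* Toy model of a fully associative cache: each slot stores a data byte plus
 * the ROM address ("label") it came from.  Hardware would compare all the
 * labels at once; this loop stands in for that parallel comparison. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 128
#define ROM_SIZE    1024

static uint8_t rom[ROM_SIZE];              /* the slow memory (access time t1) */
static uint8_t cache_data[CACHE_SLOTS];    /* the fast copies (access time t2) */
static int32_t cache_label[CACHE_SLOTS];   /* ROM address held in each slot, -1 = empty */
static int     next_victim;                /* trivial FIFO replacement         */

/* Returns the byte at ROM position 'addr', filling the cache on a miss. */
static uint8_t cache_read(uint16_t addr, int *hit)
{
    for (int i = 0; i < CACHE_SLOTS; i++)  /* "compare the address with all labels" */
        if (cache_label[i] == (int32_t)addr) {
            *hit = 1;                      /* cache hit: cost t2                */
            return cache_data[i];
        }
    *hit = 0;                              /* cache miss: cost t1 + t2          */
    cache_label[next_victim] = addr;
    cache_data[next_victim]  = rom[addr];
    next_victim = (next_victim + 1) % CACHE_SLOTS;
    return rom[addr];
}

int main(void)
{
    memset(cache_label, 0xFF, sizeof cache_label); /* every label = -1 (empty)  */
    int hit, hits = 0, accesses = 0;
    for (int pass = 0; pass < 100; pass++)         /* the same few lines repeat */
        for (uint16_t a = 50; a <= 53; a++) {
            cache_read(a, &hit);
            hits += hit;
            accesses++;
        }
    printf("hits: %d of %d accesses\n", hits, accesses);  /* 396 of 400 here    */
    return 0;
}

After the first pass misses, every later access to those four positions is a hit, which is exactly why even a small cache pays off.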
Let's say the probability of a cache hit is p. On average, the access time of our memory system would be:
t_average = p t2 + (1-p) (t1+t2) = (1-p) t1 + t2
meaning that if p is large, the average access time is close to t2 even though the cache has only a few positions. Any cache worth its salt has a p larger than 95%, so you can do the math.
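To put numbers on the example above: with t2 = t1/10 and p = 0.95, t_average = 0.05 t1 + 0.1 t1 = 0.15 t1, so the tiny 128-byte cache makes the memory system behave roughly six to seven times faster than the ROM alone.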
However, notice that at the very least we need two memory elements per position (the label and the actual information), and the label won't be a single byte: it has to be at least as wide as a memory address. Add the hardware to compare labels and a few extras, and all of this costs space and money; but even if the cache is small, it pays for itself.
Manufacturers have always gone for a compromise. The most important reason is architectural limitations: a huge number of transistors is needed to build a cache, and the cache is also one of the main energy-consuming elements. The chosen capacity must therefore balance performance, power consumption and the area of the core, and thus the cost of production. Finally, more cache means a larger area of memory to "search", so the access time can get longer.
The other answers cover most of this, but I will just add that increasing the die area also raises the chance of defective chips, so a large die means fewer correctly working chips per silicon wafer. One of the reasons the Celeron was introduced in the Pentium II era was that the Pentium II had a very large die, which resulted in a huge number of defective chips, and most of the defects were in the cache; to reduce its losses, Intel introduced the Celeron with no cache. It became so successful that the Celeron turned into a product line for Intel, and caches were later designed not as one large block but as 4-8 smaller blocks so that the defective blocks could be disabled.
If you need a large cache, you can buy a Xeon, which offers up to 24 MB of cache.
Because it's expensive, and we need a balance between quality and cost: the cache has to be there to improve performance, but it has to stay small to keep the cost within limits.
I think cost is not the real issue behind the limited size of the cache. Nowadays GPUs are very popular as an alternative for fast processing; even though they are very expensive, people are willing to buy them to get much better performance. So, in short, money is not the issue.
Along with everything else that has been mentioned, there is also the additional circuitry needed to handle coherency between processors/cores. If you have a quad-core chip, each core will have its own cache. If you touch something in one cache, every other core/CPU has to be notified of that change, assuming it has loaded the same area of memory into its own cache.
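One way to see that coherence traffic from software is a false-sharing experiment (my own example, not something from the answer above): two threads each update their own counter, but when the two counters sit in the same cache line, every write by one core invalidates the line in the other core's cache. A rough C sketch, assuming 64-byte cache lines and POSIX threads (build with cc -O2 -pthread):

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters in the same 64-byte line: writes ping-pong the line between cores. */
static _Alignas(64) struct { volatile unsigned long a, b; } same_line;
/* Padding pushes the counters into different lines: no coherence ping-pong.       */
static _Alignas(64) struct { volatile unsigned long a; char pad[64]; volatile unsigned long b; } padded;

static void *bump(void *counter)
{
    volatile unsigned long *c = counter;
    for (unsigned long i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

static double run_pair(volatile unsigned long *x, volatile unsigned long *y)
{
    struct timespec t0, t1;
    pthread_t ta, tb;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&ta, NULL, bump, (void *)x);
    pthread_create(&tb, NULL, bump, (void *)y);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("same cache line: %.2f s\n", run_pair(&same_line.a, &same_line.b));
    printf("padded apart:    %.2f s\n", run_pair(&padded.a, &padded.b));
    return 0;
}

On a typical multicore machine the first case takes noticeably longer than the second, and the extra time is precisely the notification/invalidation traffic described above.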
The size limit at each level is driven by static power dissipation. In short, static power is directly proportional to the number of transistors. Static RAM (e.g., the L1 cache) requires 6 transistors per bit, whilst dynamic RAM requires one transistor per bit. The other levels are a hybrid of both technologies.
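To get a feel for the numbers (my own back-of-the-envelope figures, not from the answer above): a 32 KB SRAM array already needs about 32 × 1024 × 8 × 6 ≈ 1.6 million transistors for the data cells alone, before tags, comparators and sense amplifiers are counted, and an 8 MB L3 built the same way would need roughly 400 million.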
The purpose of cache memory is to act as a buffer between the very limited, very high-speed CPU registers and the relatively slower and much larger main system memory usually referred to as RAM. Cache memory has an operating speed similar to the CPU itself so, when the CPU accesses data in cache, the CPU is not kept waiting for the data. In terms of storage capacity, cache is much smaller than RAM. Therefore, not every byte in RAM can have its own unique location in cache. As such, it is necessary to split cache up into sections that can be used to cache different areas of RAM, and to have a mechanism that allows each area of cache to cache different areas of RAM at different times. Even with the difference in size between cache and RAM, given the sequential and localized nature of storage access, a small amount of cache can effectively speed access to a large amount of RAM.
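As a concrete illustration of that "sections of cache mapping areas of RAM" idea, here is a minimal sketch of how a direct-mapped cache splits an address into an offset, an index (which section of the cache) and a tag (which area of RAM currently occupies it). The 32 KB size and 64-byte line are assumptions for illustration:

#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 64                           /* bytes per cache line          */
#define NUM_LINES  (32 * 1024 / LINE_BYTES)     /* 512 lines in a 32 KB cache    */

int main(void)
{
    uint64_t addr = 0x7ffe12345678ULL;          /* an arbitrary example address  */

    uint64_t offset = addr % LINE_BYTES;               /* byte within the line   */
    uint64_t index  = (addr / LINE_BYTES) % NUM_LINES; /* which cache slot       */
    uint64_t tag    = addr / LINE_BYTES / NUM_LINES;   /* identifies which RAM   */
                                                       /* region sits in the slot */
    printf("tag=%#llx index=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}

Only an address whose stored tag matches can be served from that slot; anything else forces a refill from RAM.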
It's expensive, and it depends on the level of the hierarchy. For example, L1 is the level closest to the processor, so you cannot increase its size much. And a larger cache needs more address bits for its tags and indexing, which is also a limitation.
The total silicon area (maximum chip size) is limited. Putting in more cores, more cache, or more levels of cache hierarchy is a design trade-off, as James Coyle wrote.
I think an eDRAM cache for L3 is a better solution, as described in our paper.
Cache memory is static memory: its cells are built from flip-flops to be fast, and each flip-flop takes about 6 MOSFETs, so it is expensive and needs more chip area. Even if you do not care about the price, the chip area is still a concern, and so the size of the cache is limited.
The cache temporarily stores frequently used instructions and data for quicker processing by the CPU; holding frequently used instructions is its main purpose:
The smaller capacity of the cache reduces the time required to locate data within it and deliver it to the CPU for processing.
If the cache were bigger, the CPU would take longer to locate the desired address, and processing would slow down.
"If the Cache size will be bigger then the CPU seek time will be increase to find out the desirable address. Thus the processing speed will be slow down."
That does not have to be true. If the cache's set associativity does not change, then there is no increase in the work needed to locate an item in the cache: a lookup still compares only the tags within one set, no matter how many sets there are.
The only issue would be if the cache grows so large that the data it holds exceeds the virtual address range whose page mappings are covered by the TLB.
TLB misses are much more expensive than cache misses.
This, though, is quite easily solved by using huge pages to avoid thrashing the TLB.
(The TLB is a form of cache for page translation tables.)
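To put rough numbers on it (entry counts assumed for illustration): a TLB with 1536 entries and 4 KB pages covers only 1536 × 4 KB = 6 MB of address space, which is comparable to a large L3 cache, whereas with 2 MB huge pages the same 1536 entries cover 3 GB.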