09 September 2013

I wrote a paper in ISCA with two collaborators in which we investigate the possibility of performing computation with STT-MRAM memory technologies, where WRITE OPERATIONS are very expensive (5x more expensive than DRAM, in both speed and energy) and READ operations are considerably cheaper than DRAM (2x less expensive). The density of STT-MRAM can be 3x higher than DRAM, potentially giving you a far less ENERGY-HUNGRY RAM with 3x the capacity.

This memory is being heavily researched and is expected to be mainstream within 3-5 years. Here is my question: if STT-MRAM were the dominant RAM on the market, it would change the way we view data manipulation. Currently, READ and WRITE operations cost the same in speed, energy, and everything else, so we ignore whether we are reading or writing. However, if WRITE operations are more expensive, things change ... Check this out:

Look at the code fragment:

d=b+c;

d=d*q;

A typical way for a compiler to arrange this code is to perform the addition (TWO READ operations from memory) and WRITE the result into d first, then read from d, read from q (THIRD READ), perform the multiplication, and write back to d.

This is THREE READ and TWO WRITE operations.

Instead, if the variables are small enough, I could do:

d=LOOKUP_ADD_MULT(b,c,q);

The lookup table (LUT) is very large (1 MB, 16 MB, or more) and lets me use massive storage areas to eliminate WRITE operations. Notice that I now have only a single write operation. I am storing "pre-computed results" in a LUT, so there is no longer any reason for the intermediate write operation ...

NOTE: everything is in MEMORY, since this code is part of a large read of a 1 Giga-entry array. This simple example shows how READ+READ+WRITE+WRITE could be turned into LOOKUP+WRITE, saving energy.
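
To make the lookup idea concrete, here is a minimal C sketch. It is only an illustration under my own assumptions: the operands are taken to be 4-bit values (0..15), and the names build_lut, lookup_add_mult, and apply_lut are hypothetical, not from the paper. The table is filled once (a one-time write cost), after which each element needs only reads plus a single write to d.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each operand is a 4-bit value (0..15), so the table
 * needs 16 * 16 * 16 = 4096 entries. */
#define OPERAND_BITS  4
#define OPERAND_RANGE (1u << OPERAND_BITS)
#define LUT_ENTRIES   (OPERAND_RANGE * OPERAND_RANGE * OPERAND_RANGE)

/* Pre-computed results of (b + c) * q; the maximum is 30 * 15 = 450,
 * which fits in 16 bits. */
static uint16_t lut_add_mult[LUT_ENTRIES];

/* Pack the three operands into one table index. */
size_t lut_index(unsigned b, unsigned c, unsigned q)
{
    return ((size_t)b << (2 * OPERAND_BITS)) | ((size_t)c << OPERAND_BITS) | q;
}

/* One-time setup: this is where all the expensive table writes happen. */
void build_lut(void)
{
    for (unsigned b = 0; b < OPERAND_RANGE; b++)
        for (unsigned c = 0; c < OPERAND_RANGE; c++)
            for (unsigned q = 0; q < OPERAND_RANGE; q++)
                lut_add_mult[lut_index(b, c, q)] = (uint16_t)((b + c) * q);
}

/* Hypothetical stand-in for LOOKUP_ADD_MULT(b,c,q): a single table read. */
uint16_t lookup_add_mult(unsigned b, unsigned c, unsigned q)
{
    assert(b < OPERAND_RANGE && c < OPERAND_RANGE && q < OPERAND_RANGE);
    return lut_add_mult[lut_index(b, c, q)];
}

/* Array version: per element, the operands and the table entry are read,
 * and d[i] is written exactly once (no intermediate write of b + c). */
void apply_lut(const uint8_t *b, const uint8_t *c, const uint8_t *q,
               uint16_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = lookup_add_mult(b[i], c[i], q[i]);
}
```

Call build_lut() once, then apply_lut() over the arrays. Note that the operands still have to be read; what disappears is the intermediate write to d. The table also grows quickly: with k-bit operands it needs 2^(3k) entries, so 8-bit operands would already require 2^24 entries, which is why the variables have to be small.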

Also NOTE: LUTs are the heart of FPGAs.

Do you see this type of memory eventually taking over from DRAM? Or is this too non-traditional to ever catch on? Can you conceptualize a programming environment where READ operations make up 80% of all memory operations and WRITEs 20%, achieved by re-working the code to use massive LUTs?
