Scientists make applications run faster with die-stacked DRAM cache


Researchers from North Carolina State University and Samsung have used simulations to show that a Dense Footprint Cache can work efficiently: with the cache technique, applications run more than 9 percent faster.

With die-stacked DRAM, memory is stacked directly on top of the processor die. This enables lower latencies and, above all, higher bandwidth. If the DRAM is used as a last-level cache for the processor, however, a problem arises: the large tag array needed to look up data in such a big cache demands a great deal of the SRAM budget.
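A back-of-the-envelope sketch gives a feel for the scale of that problem. The roughly 8 bytes of tag metadata per cache block assumed below is an illustrative figure chosen so the numbers line up with the 1MB figure quoted in the next paragraph; it is not a value taken from the paper.

```python
# Back-of-the-envelope sketch of the tag-array problem for a DRAM cache.
# The ~8 bytes of tag metadata per block is an assumption, not a figure
# from the paper.

MiB = 1024 ** 2

def tag_sram_bytes(cache_bytes: int, block_bytes: int, bytes_per_entry: int = 8) -> int:
    """SRAM needed to hold one tag entry per block of the DRAM cache."""
    return (cache_bytes // block_bytes) * bytes_per_entry

for block in (64, 2048):  # conventional 64-byte lines vs 2 KiB Mblocks
    sram = tag_sram_bytes(256 * MiB, block)
    print(f"256 MB cache, {block:>4} B blocks -> ~{sram / MiB:g} MiB of SRAM for tags")
```

With conventional 64-byte lines the tags alone would occupy tens of megabytes of SRAM, which is what makes larger blocks attractive in the first place.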

To reduce this SRAM overhead, larger memory blocks, or Mblocks, can be used. With a block size of 2KiB instead of 64B, for example, the tags for a 256MB last-level cache consume only 1MB of SRAM. Intel, among others, has used Mblocks since the Haswell generation. The disadvantage is that large parts of these blocks are loaded into the cache even though the processor never needs them. The Footprint technique was developed to address this: it subdivides Mblocks into smaller blocks, which are only brought into the cache when there are indications that they will be needed.
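A minimal sketch of that Footprint idea, under assumed parameters (2KiB Mblocks split into 64-byte sub-blocks): the predictor below is a stand-in that simply remembers which sub-blocks were touched during an Mblock's previous stay in the cache, not the paper's actual prediction scheme.

```python
# Toy sketch of footprint-based fetching: only the sub-blocks that are
# predicted to be used are brought into the DRAM cache.

MBLOCK_BYTES = 2048
SUBBLOCK_BYTES = 64
SUBBLOCKS_PER_MBLOCK = MBLOCK_BYTES // SUBBLOCK_BYTES  # 32

# Hypothetical footprint history: Mblock address -> bitmask of sub-blocks
# that were actually used during the previous residency.
footprint_history: dict[int, int] = {}

def predicted_footprint(mblock_addr: int) -> int:
    """Return a bitmask of sub-blocks expected to be needed for this Mblock."""
    # With no history, fall back to fetching only the requested sub-block.
    return footprint_history.get(mblock_addr, 0)

def fetch_mblock(mblock_addr: int, requested_subblock: int) -> list[int]:
    """Fetch only the sub-blocks the footprint predicts will be needed."""
    mask = predicted_footprint(mblock_addr) | (1 << requested_subblock)
    return [i for i in range(SUBBLOCKS_PER_MBLOCK) if mask & (1 << i)]

def record_use(mblock_addr: int, used_subblocks: list[int]) -> None:
    """On eviction, remember which sub-blocks were actually touched."""
    mask = 0
    for i in used_subblocks:
        mask |= 1 << i
    footprint_history[mblock_addr] = mask
```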

The researchers from North Carolina State University and Samsung describe the new problem this creates: parts of the Mblocks remain unused, and those gaps could have held useful data. They therefore propose caching only the fetched parts of an Mblock, packed contiguously, a technique they call Dense Footprint Cache. Because the Mblocks then have variable sizes, this poses challenges for the placement, replacement and updating of blocks in the cache. The first test results, however, show efficiency improvements, according to the researchers.
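A very simplified sketch of the dense-placement idea: instead of reserving a full 2KiB slot per Mblock and leaving the unfetched sub-blocks as holes, the fetched sub-blocks are packed back to back. The class and method names below are illustrative assumptions, not the paper's actual design.

```python
# Simplified sketch of packing variable-size Mblocks contiguously in a
# region of the DRAM cache, with a small directory to find sub-blocks.

SUBBLOCK_BYTES = 64

class DenseRegion:
    """A contiguous DRAM-cache region filled with variable-size Mblocks."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.next_free = 0
        # Per-Mblock metadata: start offset and which sub-blocks are present.
        self.directory: dict[int, tuple[int, list[int]]] = {}

    def insert(self, mblock_addr: int, fetched_subblocks: list[int]) -> bool:
        """Place only the fetched sub-blocks, contiguously; no gaps for the rest."""
        size = len(fetched_subblocks) * SUBBLOCK_BYTES
        if self.next_free + size > self.capacity:
            return False  # a real design would evict or compact here
        self.directory[mblock_addr] = (self.next_free, fetched_subblocks)
        self.next_free += size
        return True

    def locate(self, mblock_addr: int, subblock: int) -> int | None:
        """Translate an access to its packed offset, or None on a miss."""
        entry = self.directory.get(mblock_addr)
        if entry is None:
            return None
        start, present = entry
        if subblock not in present:
            return None  # sub-block was not part of the predicted footprint
        return start + present.index(subblock) * SUBBLOCK_BYTES
```

The price of this packing is exactly what the researchers point out: because an Mblock no longer sits at a fixed offset, placement, replacement and lookups all become more involved.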

In simulations of big-data applications, Dense Footprint Cache made them run 9.5 percent faster than without the technique, with average energy consumption 4.3 percent lower. The miss ratio also decreased by 43 percent: a last-level-cache miss occurs when the processor tries to retrieve data from the cache that is not there, after which it has to be fetched from the slower main memory. Performance was measured using the CloudSuite benchmark suite.

The researchers’ paper is titled Dense Footprint Cache: Capacity-Efficient Die-Stacked DRAM Last Level Cache and will be presented at the International Symposium on Memory Systems in early October.
