Sectored DRAM: An Energy-Efficient High-Throughput and Practical Fine-Grained DRAM Architecture

by Ataberk Olgun, et al.

There are two major sources of inefficiency in computing systems that use modern DRAM devices as main memory. First, due to coarse-grained data transfers (the size of a cache block, usually 64B) between the DRAM and the memory controller, systems waste energy transferring data that is not used. Second, due to coarse-grained DRAM row activation, systems waste energy by activating DRAM cells that go unused in the many workloads whose spatial locality is lower than the large row size (usually 8-16KB). We propose Sectored DRAM, a new, low-overhead DRAM substrate that alleviates both inefficiencies by enabling fine-grained DRAM access and activation. To efficiently retrieve only the useful data from DRAM, Sectored DRAM exploits the observation that many cache blocks are not fully utilized in many workloads due to poor spatial locality. Sectored DRAM predicts the words in a cache block that will likely be accessed during the cache block's cache residency and: (i) transfers only the predicted words on the memory channel, as opposed to transferring the entire cache block, by dynamically tailoring the DRAM data transfer size for the workload, and (ii) activates a smaller set of cells that contain the predicted words, as opposed to activating the entire DRAM row, by carefully operating physically isolated portions of DRAM rows (MATs). Compared to prior work in fine-grained DRAM, Sectored DRAM greatly reduces DRAM energy consumption, does not reduce DRAM throughput, and can be implemented with low hardware cost. We evaluate Sectored DRAM using 41 workloads from widely-used benchmark suites. Sectored DRAM reduces the DRAM energy consumption of highly-memory-intensive workloads by up to (on average) 33% [...] DRAM energy savings, combined with its system performance improvement, allow system-wide energy savings of up to 23%.
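The idea of transferring and activating only the predicted words can be illustrated with a small sketch. This is not the paper's implementation: the 8-byte word size, the one-sector-per-word mapping, and the function names `sector_mask`, `bytes_transferred`, and `activated_fraction` are all assumptions made for illustration, and the sketch ignores real DRAM burst and timing constraints.

```python
CACHE_BLOCK_BYTES = 64  # from the abstract: a cache block is usually 64B
WORD_BYTES = 8          # assumption: 8-byte words, so 8 words per block
WORDS_PER_BLOCK = CACHE_BLOCK_BYTES // WORD_BYTES


def sector_mask(predicted_words):
    """Build a bitmask with bit i set when word i is predicted to be used."""
    mask = 0
    for w in predicted_words:
        if not 0 <= w < WORDS_PER_BLOCK:
            raise ValueError(f"word index {w} out of range")
        mask |= 1 << w
    return mask


def bytes_transferred(mask):
    """Bytes moved on the channel when only the masked words are sent."""
    return bin(mask).count("1") * WORD_BYTES


def activated_fraction(mask):
    """Fraction of the row activated, assuming one sector per word position."""
    return bin(mask).count("1") / WORDS_PER_BLOCK


# Hypothetical prediction: words 0, 1, and 5 of the block will be accessed.
mask = sector_mask([0, 1, 5])
print(f"mask={mask:08b} bytes={bytes_transferred(mask)} "
      f"fraction={activated_fraction(mask)}")
```

Under these assumptions, a prediction of three words moves 24 bytes instead of 64 and activates three of eight sectors, which is the kind of saving the abstract attributes to fine-grained access and activation.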



FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching

DRAM Main memory is a performance bottleneck for many applications due t...

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems

Non-volatile memory (NVM) is a class of promising scalable memory techno...

Understanding and Optimizing Serverless Workloads in CXL-Enabled Tiered Memory

Recent Serverless workloads tend to be largescaled/CPU-memory intensive,...

EXMA: A Genomics Accelerator for Exact-Matching

Genomics is the foundation of precision medicine, global food security a...

Scalable and Configurable Tracking for Any Rowhammer Threshold

The Rowhammer vulnerability continues to get worse, with the Rowhammer T...

OS Scheduling Algorithms for Improving the Performance of Multithreaded Workloads

Major chip manufacturers have all introduced multicore microprocessors. ...

Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

The number and diversity of consumer devices are growing rapidly, alongs...
