TLB and Pagewalk Performance in Multicore Architectures with Large Die-Stacked DRAM Cache

by Adarsh Patil, et al.

In this work we study the overheads of virtual-to-physical address translation in processor architectures, like x86-64, that implement paged virtual memory using a radix tree that is walked in hardware. Translation Lookaside Buffers (TLBs) are critical to system performance, particularly as applications demand larger memory footprints and as virtualization is adopted; however, a TLB miss can require multiple memory accesses to retrieve the translation. Architectural support for superpages has been introduced to increase TLB hits but is limited by the operating system's ability to find contiguous memory. Numerous prior studies have proposed TLB designs to lower miss rates and reduce page walk overhead; however, these studies have modeled the behavior analytically. Further, to eschew the paging overhead for big-memory workloads and virtualization, Direct Segment maps part of a process's linear virtual address space with segment registers, albeit requiring a few application and operating system modifications. The recently evolved die-stacked DRAM technology promises a high-bandwidth, large last-level cache (LLC), on the order of gigabytes, closer to the processors. With such large caches, the amount of data that can be accessed without causing a TLB miss (the reach of a TLB) is inadequate. TLBs are on the critical path for data accesses, and incurring an expensive page walk can hinder system performance, especially when the data being accessed is a cache hit in the LLC. Hence, we are interested in exploring novel address translation mechanisms commensurate with the size and latency of stacked DRAM. By accurately simulating the multitude of multi-level address translation structures using the QEMU-based MARSSx86 full system simulator, we perform a detailed study of TLBs in conjunction with large LLCs using multi-programmed and multi-threaded workloads.
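
As a rough illustration of the two quantities the abstract refers to, the following Python sketch splits a virtual address into the indices used by a four-level x86-64 radix page walk (48-bit addresses, 4 KiB pages, 9 index bits per level) and computes TLB reach as entries times page size. The function names and the TLB entry count are hypothetical and are not taken from the paper; this is a minimal sketch, not the simulator's model.

```python
PAGE_SHIFT = 12       # 4 KiB base pages: low 12 bits are the page offset
BITS_PER_LEVEL = 9    # each page-table level holds 512 entries
LEVELS = 4            # four-level radix tree (48-bit virtual addresses)


def walk_indices(vaddr):
    """Return ((top, ..., leaf) table indices, page offset) for a virtual address.

    Each index selects one entry at one level of the radix tree, so a
    TLB miss can cost up to LEVELS memory accesses in the worst case.
    """
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in reversed(range(LEVELS)):
        shift = PAGE_SHIFT + BITS_PER_LEVEL * level
        indices.append((vaddr >> shift) & ((1 << BITS_PER_LEVEL) - 1))
    return tuple(indices), offset


def tlb_reach(entries, page_size):
    """Bytes addressable without a TLB miss: entries times page size."""
    return entries * page_size


if __name__ == "__main__":
    ENTRIES = 1536  # hypothetical TLB size, chosen only for illustration
    print(walk_indices(0x00007FFFFFFFF000))
    print(tlb_reach(ENTRIES, 4 * 1024) // 2**20, "MiB with 4 KiB pages")
    print(tlb_reach(ENTRIES, 2 * 2**20) // 2**30, "GiB with 2 MiB superpages")
```

Under these assumed numbers, the reach with 4 KiB pages is a few megabytes, far smaller than a gigabyte-scale cache, which is the mismatch the abstract describes.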




