Flash-Cosmos: In-Flash Bulk Bitwise Operations Using Inherent Computation Capability of NAND Flash Memory

by Jisung Park, et al.

Bulk bitwise operations, i.e., bitwise operations on large bit vectors, are prevalent in a wide range of important application domains, including databases, graph processing, genome analysis, cryptography, and hyper-dimensional computing. In conventional systems, the performance and energy efficiency of bulk bitwise operations are bottlenecked by data movement between the compute units and the memory hierarchy. In-flash processing (i.e., processing data inside NAND flash chips) has high potential to accelerate bulk bitwise operations by fundamentally reducing data movement through the entire memory hierarchy. We identify two key limitations of the state-of-the-art in-flash processing technique for bulk bitwise operations: (i) it falls short of maximally exploiting the bit-level parallelism of bulk bitwise operations, and (ii) it is unreliable because it does not consider the highly error-prone nature of NAND flash memory. We propose Flash-Cosmos (Flash Computation with One-Shot Multi-Operand Sensing), a new in-flash processing technique that significantly increases the performance and energy efficiency of bulk bitwise operations while providing high reliability. Flash-Cosmos introduces two key mechanisms that can be easily supported in modern NAND flash chips: (i) Multi-Wordline Sensing (MWS), which enables bulk bitwise operations on a large number of operands with a single sensing operation, and (ii) Enhanced SLC-mode Programming (ESP), which enables reliable computation inside NAND flash memory. We demonstrate the feasibility of performing bulk bitwise operations with high reliability in Flash-Cosmos by testing 160 real 3D NAND flash chips. Our evaluation shows that Flash-Cosmos improves average performance and energy efficiency by 3.5x/32x and 3.3x/95x, respectively, over the state-of-the-art in-flash/outside-storage processing techniques across three real-world applications.
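To make the term "bulk bitwise operation" concrete, the following is a minimal software-only sketch of a multi-operand AND and OR over bit vectors, as might arise when intersecting bitmap indexes in a database. It is an illustration under stated assumptions, not the Flash-Cosmos mechanism: the function names `bulk_and` and `bulk_or` and the sample bitmaps are hypothetical, Python integers stand in for large bit vectors, and nothing here models NAND flash sensing or its error behavior.

```python
from functools import reduce
from operator import and_, or_


def bulk_and(operands):
    """Bitwise AND of all operand bit vectors (each a Python int)."""
    return reduce(and_, operands)


def bulk_or(operands):
    """Bitwise OR of all operand bit vectors (each a Python int)."""
    return reduce(or_, operands)


# Hypothetical 8-bit bitmaps; bit i is 1 if item i matches a predicate.
# Real bulk workloads would use vectors of many kilobytes or more.
bitmaps = [0b11011011, 0b11110011, 0b01011011]

matches_all = bulk_and(bitmaps)  # items matching every predicate
matches_any = bulk_or(bitmaps)   # items matching at least one predicate

print(f"{matches_all:08b}")         # prints 01010011
print(f"{matches_any:08b}")         # prints 11111011
print(bin(matches_all).count("1"))  # prints 4 (items matching all predicates)
```

On a conventional system, every operand in `bitmaps` has to travel from storage to the processor before the reduction can run, which is the data movement cost the abstract describes. The abstract's claim for MWS is that many operands can be combined with a single sensing operation inside the flash chip.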



SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM

Processing-using-DRAM has been proposed for a limited set of basic opera...

Brain-inspired Cognition in Next Generation Racetrack Memories

Hyperdimensional computing (HDC) is an emerging computational framework ...

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Data movement between the CPU and main memory is a first-order obstacle ...

Accelerating Time Series Analysis via Processing using Non-Volatile Memories

Time Series Analysis (TSA) is a critical workload for consumer-facing de...

Efficient Error-Correcting-Code Mechanism for High-Throughput Memristive Processing-in-Memory

Inefficient data transfer between computation and memory inspired emergi...

FAT-PIM: Low-Cost Error Detection for Processing-In-Memory

Processing In Memory (PIM) accelerators are promising architecture that ...
