Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

by Matthew J. Marinella, et al.

Neural networks are an increasingly attractive algorithm for natural language processing and pattern recognition applications. Deep networks with >50M parameters are made possible by modern GPU clusters operating at <50 pJ per operation and, more recently, by production accelerators capable of <5 pJ per operation at the board level. However, with the slowing of CMOS scaling, new paradigms will be required to achieve the next several orders of magnitude in performance-per-watt gains. Using an analog resistive memory (ReRAM) crossbar to perform key matrix operations in an accelerator is an attractive option that is gaining significant interest. This work presents a detailed design, using a state-of-the-art 14/16 nm PDK, of an analog crossbar circuit block designed to process three key kernels required in training and inference of neural networks. A detailed circuit- and device-level analysis of energy, latency, area, and accuracy is given and compared to relevant designs using standard digital ReRAM and SRAM operations. It is shown that the analog accelerator has a 310x energy and 270x latency advantage over a similar block utilizing only digital ReRAM and takes only 11 fJ per multiply and accumulate (MAC). Although training accuracy is degraded in the analog accelerator, several options to improve this are presented. The possible gains over a similar digital-only version of this accelerator block suggest that continued optimization of analog resistive memories is valuable. This detailed circuit and device analysis of a training accelerator may serve as a foundation for further architecture-level studies.
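
The abstract does not name the three kernels. As an illustration only, the sketch below assumes they are the operations commonly used in backpropagation training: a vector-matrix multiply, a matrix-vector multiply on the transposed access direction, and a rank-one outer-product weight update. The function names, the list-of-rows conductance matrix `G`, and the learning-rate parameter `lr` are all hypothetical. The model is idealized and ignores device noise, nonlinearity, and converter precision, which are the sources of the accuracy degradation the abstract mentions.

```python
# Idealized, hypothetical model of a crossbar: G[i][j] is the weight
# (conductance) at row i, column j. Not the authors' implementation.

def vmm(x, G):
    """Vector-matrix multiply: drive the rows with x, sum along each column."""
    rows, cols = len(G), len(G[0])
    return [sum(x[i] * G[i][j] for i in range(rows)) for j in range(cols)]

def mvm(G, d):
    """Matrix-vector multiply: drive the columns with d, sum along each row."""
    return [sum(g * dj for g, dj in zip(row, d)) for row in G]

def outer_product_update(G, x, d, lr):
    """Rank-one update in place: G[i][j] += lr * x[i] * d[j]."""
    for i, xi in enumerate(x):
        for j, dj in enumerate(d):
            G[i][j] += lr * xi * dj
    return G
```

In an analog crossbar, each of these would map to a single parallel array operation rather than nested loops, so every cell contributes one multiply-accumulate per operation; that parallelism is the basis for quoting energy per MAC.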


