The Fourier neural operator (FNO) is a powerful technique for learning s...
Stream processing engines (SPEs) are widely used for large-scale streami...
Training deep learning models can be computationally expensive. Prior wo...
As deep learning models nowadays are widely adopted by both cloud servic...
Reinforcement learning (RL) workloads take a notoriously long time to tr...
Modern deep learning applications require increasingly more compute to t...
Training deep neural networks on large datasets can often be accelerated...
Deep reinforcement learning (RL) has made groundbreaking advancements in...
Driven by the tremendous effort in researching novel deep learning (DL) ...
Deep learning researchers and practitioners usually leverage GPUs to hel...
Hospitals around the world collect massive amounts of physiological data ...
To accelerate CNN inference, existing deep learning frameworks focus on ...
We present FPRaker, a processing element for composing training accelera...
TensorDash is a hardware-level technique for enabling data-parallel MAC ...
Training a state-of-the-art deep neural network (DNN) is a computational...
Recently, large-scale Transformer-based language models such as BERT, GP...
We present automatic horizontal fusion, a novel optimization technique t...
Modern deep neural network (DNN) training jobs use complex and heterogen...
Machine-learning (ML) hardware and software system demand is burgeoning....
Machine learning is experiencing an explosion of software and hardware s...
In an era when the performance of a single compute device plateaus, soft...
Data parallel training is widely used for scaling distributed deep neura...
Machine learning (ML) techniques are enjoying rapidly increasing adoptio...
Stream analytics have an insatiable demand for memory and performance. E...
Long Short-Term Memory Recurrent Neural Network (LSTM RNN) is a state-of...
This paper summarizes the idea of ChargeCache, which was published in HP...
This paper summarizes the SoftMC DRAM characterization infrastructure, w...
This article summarizes key results of our work on experimental characte...
In existing systems, to perform any bulk data movement operation (copy o...
This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which...
The application resource specification--a static specification of severa...
The recent popularity of deep neural networks (DNNs) has generated a lot...
The application resource specification--a static specification of severa...
Variation has been shown to exist across the cells within a modern DRAM ...
In this thesis, we describe a new, practical approach to integrating har...
This paper summarizes the idea of Adaptive-Latency DRAM (AL-DRAM), which...