We introduce a software-hardware co-design approach to reduce memory tra...
Increasingly larger and better Transformer models keep advancing state-of-the-art...
Data accesses between on- and off-chip memories account for a large frac...
We present FPRaker, a processing element for composing training accelera...
TensorDash is a hardware-level technique for enabling data-parallel MAC ...
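The TensorDash entry is cut off mid-sentence, but the term it ends on, a data-parallel MAC unit, has a conventional meaning: a datapath that performs several multiply-accumulates per cycle into a shared accumulator. A minimal functional model follows, assuming a 4-lane unit; the names and lane count are mine for illustration, not TensorDash's design:

```python
def mac_cycle(acc, a_lane, w_lane):
    """One cycle of an N-lane data-parallel MAC unit:
    N products reduced into a single running accumulator."""
    return acc + sum(a * w for a, w in zip(a_lane, w_lane))

a = [1, 2, 3, 4, 5, 6, 7, 8]
w = [8, 7, 6, 5, 4, 3, 2, 1]
acc, lanes = 0, 4
for i in range(0, len(a), lanes):   # one group of `lanes` operand pairs per cycle
    acc = mac_cycle(acc, a[i:i+lanes], w[i:i+lanes])
assert acc == sum(x * y for x, y in zip(a, w))
```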
Attention-based models have demonstrated remarkable success in various n...
Neural networks have demonstrably achieved state-of-the-art accuracy usi...
We reduce training time in convolutional neural networks (CNNs) with a method t...
We motivate a method for transparently identifying ineffectual computati...
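The entry above is truncated, but in this body of work "ineffectual computations" conventionally refers to operations that cannot affect the result, most prominently multiplications where one operand is zero. A toy illustration of why such work is safe to skip; this demonstrates the property only, not the paper's identification method:

```python
def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))

def dot_skip_ineffectual(xs, ws):
    # A product with a zero operand cannot change the sum, so skip it.
    return sum(x * w for x, w in zip(xs, ws) if x != 0 and w != 0)

xs, ws = [0, 3, 0, 0, 2, 0], [5, 1, 7, 2, 0, 9]
assert dot(xs, ws) == dot_skip_ineffectual(xs, ws)  # 1 multiply instead of 6
```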
We show that selecting a fixed precision for all activations in Convolut...
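The abstract above is truncated, but the operation it hinges on, fixing one activation precision for every layer versus choosing it per layer, is easy to make concrete. A hedged sketch in NumPy; the `quantize` helper and the per-layer bit-widths are mine and purely illustrative, not the paper's method or its measured precisions:

```python
import numpy as np

def quantize(x, bits, frac_bits=4):
    """Uniform signed fixed-point quantization: `bits` total bits,
    `frac_bits` of them fractional, clamping on overflow."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

rng = np.random.default_rng(0)
acts = [rng.standard_normal(64) for _ in range(4)]  # stand-in per-layer activations

# One fixed precision for every layer ...
fixed = [quantize(a, bits=8) for a in acts]

# ... versus a per-layer profile (bit-widths chosen only for illustration).
per_layer = [quantize(a, bits=b) for a, b in zip(acts, [9, 7, 6, 8])]
```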
We show that, during inference with Convolutional Neural Networks (CNNs)...
Tartan (TRT), a hardware accelerator for inference with Deep Neural Netw...
Loom (LM), a hardware inference accelerator for Convolutional Neural Net...
Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation ...
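The Stripes entry is cut off, but bit-serial computation itself is a standard idea that fits in a few lines: decompose each activation over its bits, so a multiply-accumulate takes one pass (one cycle) per bit of activation precision rather than a fixed-width pass. A minimal sketch, assuming unsigned fixed-point activations; the function name is mine, and this models the arithmetic only, not the Stripes datapath:

```python
def bit_serial_mac(activations, weights, precision):
    """Multiply-accumulate with activations streamed one bit per cycle.

    Cycle count scales with `precision` (the activation bit-width)
    instead of a fixed datapath width.
    """
    acc = 0
    for bit in range(precision):        # one "cycle" per bit position
        for a, w in zip(activations, weights):
            if (a >> bit) & 1:          # bit set: add the weight, shifted
                acc += w << bit
    return acc

acts, wts = [3, 5, 2], [7, 1, 4]
assert bit_serial_mac(acts, wts, precision=3) == sum(a * w for a, w in zip(acts, wts))
```

The payoff is that cycles track the precision a layer actually needs: activations that fit in 5 bits finish in 5 passes.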
This work studies the behavior of state-of-the-art memory controller des...
This work investigates how using reduced precision data in Convolutional...