We introduce a software-hardware co-design approach to reduce memory tra...
Increasingly larger and better Transformer models keep advancing
state-o...
We present FPRaker, a processing element for composing training accelera...
TensorDash is a hardware-level technique for enabling data-parallel MAC ...
Attention-based models have demonstrated remarkable success in various
n...