TinyStack: A Minimal GPU Stack for Client ML

05/04/2021
by Heejin Park, et al.

TinyStack is a novel approach to deploying GPU-accelerated computation on mobile and embedded devices. It addresses the high complexity of a modern GPU stack. Without overhauling the stack, TinyStack provides a static, fast path for an app to push its computation to the GPU. It records GPU executions on the full GPU stack ahead of time and replays them at run time on new inputs with only a small replayer. TinyStack addresses the challenges of capturing key CPU/GPU interactions and GPU states, working around proprietary GPU internals, and preventing replay divergence. The resultant replayer is a drop-in replacement for the original GPU stack. It is tiny (an executable as small as 50 KB), robust (replaying long executions without divergence), portable (running on a POSIX OS, in a TEE, or on bare metal), and quick to launch (speeding up startup by up to two orders of magnitude). We have implemented TinyStack and tested it with a variety of ML frameworks, GPU programming APIs, and integrated GPUs.
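To make the record-and-replay idea concrete, below is a minimal sketch in C of how ahead-of-time recording and run-time replay of CPU/GPU interactions might look. This is not TinyStack's actual implementation: it assumes the GPU driver is driven through ioctl() calls on a device file, and the log format, fixed argument size, and the notion of an "input command index" to patch with new input are all hypothetical placeholders for illustration.

```c
/* Sketch only: hypothetical record/replay of GPU ioctl interactions.
 * Assumes the GPU stack talks to the driver via ioctl() on a device fd. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/* One recorded CPU->GPU interaction: the ioctl request code plus a
 * snapshot of its argument buffer (fixed size for simplicity). */
struct gpu_cmd {
    unsigned long request;   /* ioctl request code            */
    uint32_t      arg_len;   /* size of the argument snapshot */
    uint8_t       arg[256];  /* captured argument bytes       */
};

/* Record phase (run once, on the full GPU stack): wrap each ioctl the
 * stack issues, append it to a log file, then forward it to the driver. */
int record_cmd(FILE *log, int gpu_fd, unsigned long request,
               void *arg, uint32_t arg_len)
{
    struct gpu_cmd c = { .request = request, .arg_len = arg_len };
    memcpy(c.arg, arg, arg_len);
    if (fwrite(&c, sizeof c, 1, log) != 1)
        return -1;
    return ioctl(gpu_fd, request, arg);
}

/* Replay phase (run later, without the GPU stack): read the log and
 * reissue each interaction, patching in the new input before submission. */
int replay_log(FILE *log, int gpu_fd, const void *new_input,
               uint32_t input_len, long input_cmd_index)
{
    struct gpu_cmd c;
    long idx = 0;
    while (fread(&c, sizeof c, 1, log) == 1) {
        if (idx == input_cmd_index && input_len <= c.arg_len)
            memcpy(c.arg, new_input, input_len);  /* swap in new input */
        if (ioctl(gpu_fd, c.request, c.arg) < 0)
            return -1;  /* unexpected driver behavior: stop replay */
        idx++;
    }
    return 0;
}
```

In practice the paper's replayer must also capture GPU state and mapped memory and guard against divergence from the recorded execution; the sketch above only illustrates the basic shape of recording interactions once and replaying them with substituted inputs.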
