The orchestration of deep neural network (DNN) model inference on GPU cl...
We present a strongly polynomial-time algorithm to generate bandwidth op...
Cloud-native containerized applications constantly seek high-performance...
Service meshes play a central role in the modern application ecosystem b...
We consider the problem of distilling optimal network topologies for col...
Resource disaggregation has gained huge popularity in recent years. Exis...
ML workloads are becoming increasingly popular in the cloud. Good cloud ...
The learning rate (LR) schedule is one of the most important hyper-param...
Cost-efficiency and training time are primary concerns in cloud-based di...
Today's mobile devices sense, collect, and store huge amounts of persona...
Talek is a private group messaging system that sends messages through po...
The last few years have seen the proliferation of low-power wide area ne...
Training complex machine learning models in parallel is an increasingly ...
Virtual execution environments allow for consolidation of multiple appli...
Hardware acceleration is an enabler for ubiquitous and efficient deep le...
The advent of RoCE (RDMA over Converged Ethernet) has led to a significa...
We introduce a learning-based framework to optimize tensor programs for ...
Distributed deep neural network (DDNN) training constitutes an increasin...
A perennial question in computer networks is where to place functionalit...
There is an increasing need to bring machine learning to a wide diversit...
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive...
Most work in the deep learning systems community has focused on faster i...
Recent advances have enabled "oracle" classifiers that can classify acro...