Distributed full-graph training of Graph Neural Networks (GNNs) over lar...
When communicating with elders with cognitive impairment, cognitive stim...
We present Rhino, a system for accelerating tensor programs with automat...
As the size of deep learning models gets larger and larger, training tak...
This paper presents TAG, an automatic system to derive optimized DNN tra...
Inductive node-wise graph incremental learning is a challenging task due...
Machine learning (ML) tasks are one of the major workloads in today's ed...
This paper proposes DisCo, an automatic deep learning compilation module...
Distributed training using multiple devices (e.g., GPUs) has been widely...
To train modern large DNN models, pipeline parallelism has recently emer...
Fueled by advances in distributed deep learning (DDL), recent years have...
Efficient scheduling of distributed deep learning (DL) jobs in large GPU...
Graph neural networks (GNNs) have extended the success of deep neural ne...
Online algorithms are an important branch of algorithm design. Designing o...
Deep learning frameworks such as TensorFlow and PyTorch provide a produc...
Recent years have witnessed a rapid growth of distributed machine learni...
In recent years, to sustain the resource-intensive computational needs f...
It is a challenging task to train large DNN models on sophisticated GPU ...
Many emerging AI applications request distributed machine learning (ML) ...
Modern deep learning models have been exploited in various domains, incl...
More and more companies have deployed machine learning (ML) clusters, wh...
Resilience functionality, including failure resilience and flow migratio...
Nowadays large-scale distributed machine learning systems have been depl...
Optimization algorithms for training deep models not only affect the co...