Model pre-training on large text corpora has been demonstrated effective...
Contrastive loss has been increasingly used in learning representations ...
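Only the opening of that abstract survives, but since it names the contrastive objective, here is a minimal sketch of a typical InfoNCE-style batch contrastive loss in PyTorch. The function name `info_nce` and the temperature default are illustrative assumptions, not taken from the paper above.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style contrastive loss: matched pairs (row i of z_a, row i of z_b)
    are pulled together; all other in-batch pairs are pushed apart."""
    z_a = F.normalize(z_a, dim=-1)          # unit-normalize so dot products are cosine similarities
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature    # [batch, batch] similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)
```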
Mixture-of-Experts (MoE) parallelism is a recent advancement that sca...
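As a hedged illustration of the expert routing that MoE parallelism scales out, the sketch below implements a plain top-k gate. The class name `TopKGate` and its parameters are assumptions; production MoE systems additionally handle expert capacity limits and cross-device all-to-all dispatch, which this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKGate(nn.Module):
    """Minimal top-k gate: each token is routed to its k highest-scoring experts."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.proj = nn.Linear(d_model, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor):
        scores = self.proj(x)                         # [tokens, num_experts] routing logits
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)      # normalize over the chosen experts only
        return weights, topk_idx                      # combine weights and expert assignments
```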
Can we combine heterogeneous graph structure with text to learn high-qual...
Recent research has shown that large language models pretrained using un...
Existing general-purpose frameworks for gigantic model training, i.e., m...
Aligning signals from different modalities is an important step in visio...
Vision-language representation learning largely benefits from image-text...
Pre-training and then fine-tuning large language models is commonly used...
Vision-and-Language Pre-training (VLP) improves model performance for do...
Tiering is an essential technique for building large-scale information r...
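To make the tiering idea concrete, here is a minimal sketch of a two-tier posting-list lookup: a small, high-value tier is searched first, and the larger tail tier is consulted only when it cannot fill the top-k results. The data layout and the `k` cutoff are assumptions for illustration, not the method of the paper above.

```python
from typing import Dict, List, Tuple

def tiered_search(query_terms: List[str],
                  tier1: Dict[str, List[Tuple[str, float]]],
                  tier2: Dict[str, List[Tuple[str, float]]],
                  k: int = 10) -> List[Tuple[str, float]]:
    """Search the small, high-value tier first; fall back to the large
    tail tier only if tier 1 cannot fill the top-k result list."""
    def collect(tier: Dict[str, List[Tuple[str, float]]]) -> List[Tuple[str, float]]:
        scores: Dict[str, float] = {}
        for term in query_terms:
            for doc_id, weight in tier.get(term, []):
                scores[doc_id] = scores.get(doc_id, 0.0) + weight
        return sorted(scores.items(), key=lambda kv: -kv[1])

    results = collect(tier1)
    if len(results) < k:                  # tier 1 exhausted: pay the cost of the big tier
        seen = {doc for doc, _ in results}
        results += [r for r in collect(tier2) if r[0] not in seen]
    return results[:k]
```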