research ∙ 09/15/2023
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Self-supervised pre-training of language models usually consists in pred...
research ∙ 06/13/2023
Is Anisotropy Inherent to Transformers?
The representation degeneration problem is a phenomenon that is widely o...
research ∙ 12/14/2022