
Robust Embeddings Via Distributions

by Kira A. Selby, et al.

Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains. We propose a novel probabilistic embedding-level method to improve the robustness of NLP models. Our method, Robust Embeddings via Distributions (RED), incorporates information from both noisy tokens and surrounding context to obtain distributions over embedding vectors that can express uncertainty in semantic space more fully than any deterministic method. We evaluate our method on a number of downstream tasks using existing state-of-the-art models in the presence of both natural and synthetic noise, and demonstrate a clear improvement over other embedding approaches to robustness from the literature.
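The paper itself does not include code here, but the core idea of representing a token as a distribution rather than a point in embedding space can be illustrated with a minimal sketch. The example below is an assumption-laden toy, not RED's actual method: it models each embedding estimate as a diagonal Gaussian and fuses a high-variance token-level estimate (the noisy token) with a context-level estimate by precision-weighted averaging (a product of Gaussians). All names (`gaussian_embedding`, the toy vocabulary) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary of clean token embeddings (dimension 4).
dim = 4
vocab = {w: rng.normal(size=dim) for w in ["hello", "world", "noisy"]}

def gaussian_embedding(token_mu, token_var, context_mu, context_var):
    """Fuse a token-level and a context-level Gaussian estimate of an
    embedding via precision-weighted averaging (product of two diagonal
    Gaussians). Returns the mean and diagonal variance of the result."""
    precision = 1.0 / token_var + 1.0 / context_var
    var = 1.0 / precision
    mu = var * (token_mu / token_var + context_mu / context_var)
    return mu, var

# Token-level estimate: a noisy token gets a high variance,
# expressing uncertainty about its identity in semantic space.
token_mu, token_var = vocab["noisy"], np.full(dim, 4.0)

# Context-level estimate: here simply the mean of the surrounding
# clean embeddings, with lower variance.
context_mu = (vocab["hello"] + vocab["world"]) / 2
context_var = np.full(dim, 1.0)

mu, var = gaussian_embedding(token_mu, token_var, context_mu, context_var)

# The fused variance is smaller than either input variance,
# reflecting the reduced uncertainty after combining both sources.
print(bool(np.all(var < token_var) and np.all(var < context_var)))  # True
```

The design point the toy makes is the one the abstract claims: a distributional embedding can carry per-token uncertainty that a single deterministic vector cannot, and combining noisy-token and context evidence tightens that uncertainty.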
