Multinomial Loss on Held-out Data for the Sparse Non-negative Matrix Language Model

11/05/2015
by Ciprian Chelba, et al.

We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data. Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-features in the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing, which would be difficult to deal with correctly in leave-one-out training. In experiments on the one billion words language modeling benchmark, we are able to slightly improve on our previous results, which used a different loss function and employed leave-one-out training on a subset of the main training set. Surprisingly, an adjustment model with meta-features that discard all lexical information can perform as well as lexicalized meta-features. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model. In a real-life scenario where the training data is a mix of data sources that are imbalanced in size and of varying relevance to the held-out and test data, taking into account the data source for a given skip-/n-gram feature and combining the sources for best performance on held-out/test data improves over skip-/n-gram SNM models trained on pooled data by about 8% in one setup, and by as much as 15% in another. The ability to mix various data sources based on how relevant they are to a mismatched held-out set is probably the most attractive feature of the new estimation method for SNM LM.
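For intuition, the following is a minimal, self-contained Python sketch (not the paper's implementation) of the recipe the abstract describes: an SNM-style model whose link scores are training-set relative frequencies scaled by an exponential adjustment function of meta-features, with the adjustment weights fitted by minimizing multinomial loss (cross-entropy) on a held-out set. The toy corpus, the single previous-word feature, the two meta-features, and the numerical-gradient update are all illustrative assumptions; the actual models use rich skip-/n-gram features and meta-features such as count buckets and diversity counts.

```python
# Toy sketch of SNM-style estimation with an adjustment model trained by
# multinomial loss on held-out data. Illustrative only; not the paper's code.
import math
from collections import Counter, defaultdict

train = "the cat sat on the mat the cat ate the rat".split()
heldout = "the cat sat on the rat".split()   # stands in for (mismatched) held-out data
vocab = sorted(set(train))

# Context features of a prediction; here just the previous word.
def features(prev):
    return [("prev", prev)]

# Relative frequencies c(w, f) / c(f) estimated on the training data.
joint, feat_count = Counter(), Counter()
for prev, w in zip(train, train[1:]):
    for f in features(prev):
        joint[(w, f)] += 1
        feat_count[f] += 1
c_rel = {(w, f): c / feat_count[f] for (w, f), c in joint.items()}

# Meta-features of a (word, feature) link; the paper uses richer ones
# (count buckets, diversity counts, etc.).
def meta(w, f):
    return {"bias": 1.0, "log_count": math.log1p(joint[(w, f)])}

theta = defaultdict(float)  # adjustment-model weights

def prob(w, prev):
    """P(w | prev): relative frequencies scaled by exp(theta . meta), normalized."""
    def score(v):
        return sum(c_rel.get((v, f), 0.0)
                   * math.exp(sum(theta[k] * x for k, x in meta(v, f).items()))
                   for f in features(prev))
    z = sum(score(v) for v in vocab) or 1e-12
    return score(w) / z

def heldout_ppl():
    nll = sum(-math.log(prob(w, p) + 1e-12) for p, w in zip(heldout, heldout[1:]))
    return math.exp(nll / (len(heldout) - 1))

print("held-out perplexity before adjustment:", round(heldout_ppl(), 3))

# Fit theta by coordinate-wise numerical-gradient descent on the multinomial
# (cross-entropy) loss over held-out events; the paper derives gradients analytically.
lr, eps = 0.1, 1e-4
for epoch in range(50):
    for prev, w in zip(heldout, heldout[1:]):
        keys = set(theta) | {k for f in features(prev) for k in meta(w, f)}
        for k in keys:
            theta[k] += eps
            loss_up = -math.log(prob(w, prev) + 1e-12)
            theta[k] -= 2 * eps
            loss_down = -math.log(prob(w, prev) + 1e-12)
            theta[k] += eps                      # restore original value
            theta[k] -= lr * (loss_up - loss_down) / (2 * eps)

print("held-out perplexity after adjustment: ", round(heldout_ppl(), 3))
```

In the mixed-data-source scenario the abstract mentions, the same machinery would, roughly speaking, keep per-source relative frequencies or add source-identity meta-features, letting the held-out fit decide how much weight each source receives.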
