Variation of word frequencies in Russian literary texts

03/01/2015
by Vladislav Kargin, et al.

We study the variation of word frequencies in Russian literary texts. We find that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent 0.62. Because this exponent is below 1, the standard deviation shrinks more slowly than the average frequency, so rarer words show relatively larger frequency volatility (i.e., "burstiness"). We estimate several latent factor models to investigate the structure of the word frequency distribution. The dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of the latent factors.
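As an illustration of the power-law relationship described above, the following is a minimal sketch of how such an exponent could be estimated by least squares on log-transformed data. It is not the authors' procedure: the function name fit_power_law, the synthetic gamma-distributed frequencies, and the prefactor 0.01 are all assumptions chosen for the demonstration; only the exponent 0.62 comes from the abstract.

```python
import numpy as np


def fit_power_law(freqs):
    """Fit std = c * mean**alpha across words by log-log least squares.

    freqs: array of shape (n_texts, n_words) holding each word's
    relative frequency in each text (hypothetical input layout).
    Returns (alpha, c).
    """
    freqs = np.asarray(freqs, dtype=float)
    mean = freqs.mean(axis=0)          # average frequency of each word
    std = freqs.std(axis=0, ddof=1)    # sample std of each word across texts
    keep = (mean > 0) & (std > 0)      # logarithms need positive values
    alpha, log_c = np.polyfit(np.log(mean[keep]), np.log(std[keep]), 1)
    return alpha, np.exp(log_c)


# Synthetic data (an assumption, not the paper's corpus): each word's
# frequency is gamma-distributed with a chosen mean and a standard
# deviation that follows std = 0.01 * mean**0.62.
rng = np.random.default_rng(0)
n_texts, n_words = 200, 500
true_mean = np.logspace(-4, -2, n_words)
true_std = 0.01 * true_mean ** 0.62
shape = (true_mean / true_std) ** 2    # gamma shape giving that mean and std
scale = true_std ** 2 / true_mean      # gamma scale giving that mean and std
freqs = rng.gamma(shape, scale, size=(n_texts, n_words))

slope, prefactor = fit_power_law(freqs)
print(f"estimated exponent: {slope:.3f}, prefactor: {prefactor:.4f}")

# Relative volatility (std / mean) scales as mean**(slope - 1), so an
# exponent below 1 means rarer words are relatively more volatile.
print(f"relative volatility exponent: {slope - 1:.3f}")
```

On real data, freqs would be replaced by a texts-by-words matrix of relative frequencies. An ordinary least-squares fit in log-log space is only one possible estimator, and it gives equal weight to rare and frequent words.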


