Complexity and universality in the long-range order of words

03/03/2015
by Marcelo A Montemurro, et al.

As is the case with many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of order in linguistic sequences that give insights into two relevant aspects of language: the presence of statistical universals in word ordering, and the link between semantic information and the statistical structure of language. We first analyse a measure of relative entropy that assesses how much the ordering of words contributes to the overall statistical structure of language. This measure takes an almost constant value, close to 3.5 bits/word, across several linguistic families. Then, we show that a direct application of information theory leads to an entropy measure that can quantify and extract semantic structures from linguistic samples, even without prior knowledge of the underlying language.
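The abstract does not spell out how either measure is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that (i) the word-ordering contribution is the entropy of a randomly shuffled text minus the entropy of the original text, with both estimated by a simple Lempel-Ziv-type match-length estimator, and (ii) the semantic measure compares how a word's occurrences spread over equal-sized parts of the text against a shuffled baseline. The function names (`entropy_rate`, `ordering_entropy`, `part_entropy`, `semantic_score`) and parameters (number of shuffles, number of parts) are hypothetical.

```python
# Hypothetical sketch: the estimator, the shuffled baselines and all names
# are illustrative assumptions, not the authors' code.
import math
import random
from typing import Hashable, Sequence


def match_length(seq: Sequence[Hashable], i: int) -> int:
    """One more than the longest prefix of seq[i:] found entirely inside seq[:i]."""
    n, best = len(seq), 0
    for j in range(i):
        k = 0
        # Extend while the candidate match stays in the past and in bounds.
        while j + k < i and i + k < n and seq[j + k] == seq[i + k]:
            k += 1
        best = max(best, k)
    return best + 1


def entropy_rate(seq: Sequence[Hashable]) -> float:
    """Match-length (Lempel-Ziv type) estimate of entropy in bits per symbol."""
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two symbols")
    total = sum(match_length(seq, i) / math.log2(i + 1) for i in range(1, n))
    return (n - 1) / total


def ordering_entropy(words: Sequence[Hashable], shuffles: int = 5, seed: int = 0) -> float:
    """Mean entropy of shuffled copies minus entropy of the original (bits/word)."""
    rng = random.Random(seed)
    h_shuffled = 0.0
    for _ in range(shuffles):
        copy = list(words)
        rng.shuffle(copy)  # destroys word order, keeps word frequencies
        h_shuffled += entropy_rate(copy)
    return h_shuffled / shuffles - entropy_rate(words)


def part_entropy(words: Sequence[Hashable], target: Hashable, parts: int) -> float:
    """Entropy (bits) of how occurrences of `target` spread over equal-sized parts."""
    size = len(words) // parts
    counts = [list(words[p * size:(p + 1) * size]).count(target) for p in range(parts)]
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def semantic_score(words: Sequence[Hashable], target: Hashable,
                   parts: int = 8, shuffles: int = 20, seed: int = 0) -> float:
    """Shuffled-baseline part entropy minus observed part entropy for one word.

    Larger values mean the word is concentrated in fewer parts than chance predicts.
    """
    rng = random.Random(seed)
    expected = 0.0
    for _ in range(shuffles):
        copy = list(words)
        rng.shuffle(copy)
        expected += part_entropy(copy, target, parts)
    return expected / shuffles - part_entropy(words, target, parts)
```

For example, a word confined to one section of a toy text scores higher than a word spread evenly through it, and a strictly periodic sequence gives a positive `ordering_entropy`. The naive match search above is roughly cubic in text length in the worst case, so it only suits short samples; suffix-based data structures are a common way to speed it up. This toy code makes no claim to reproduce the 3.5 bits/word figure, which refers to the authors' own corpora and methods.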
