Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features

by   Yu Qiao, et al.

One of the key communicative competencies is the ability to maintain fluency in monologic speech and the ability to produce sophisticated language to argue a position convincingly. In this paper we aim to predict TED talk-style affective ratings in a crowdsourced dataset of argumentative speech consisting of 7 hours of speech from 110 individuals. The speech samples were elicited through task prompts relating to three debating topics. The samples received a total of 2211 ratings from 737 human raters pertaining to 14 affective categories. We present an effective approach to the classification task of predicting these categories through fine-tuning a model pre-trained on a large dataset of TED talks public speeches. We use a combination of fluency features derived from a state-of-the-art automatic speech recognition system and a large set of human-interpretable linguistic features obtained from an automatic text analysis system. Classification accuracy was greater than 60 categories, with a peak performance of 72 'informative'. In a secondary experiment, we determined the relative importance of features from different groups using SP-LIME.


BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic spee...

Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

In this paper, we present our progress in pretraining Czech monolingual ...

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

End-to-end models have achieved impressive results on the task of automa...

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

In recent years, automated approaches to assessing linguistic complexity...

The VoiceMOS Challenge 2022

We present the first edition of the VoiceMOS Challenge, a scientific eve...

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

English proficiency assessments have become a necessary metric for filte...

Automatic recognition of suprasegmentals in speech

This study reports our efforts to improve automatic recognition of supra...

Please sign up or login with your details

Forgot password? Click here to reset