Please, Don't Forget the Difference and the Confidence Interval when Seeking for the State-of-the-Art Status

05/23/2022
by   Yves Bestgen, et al.

This paper argues for the widest possible use of bootstrap confidence intervals for comparing NLP system performances, instead of the state-of-the-art (SOTA) status and statistical significance testing. Their main benefits are to draw attention to the difference in performance between two systems and to help assess the degree of superiority of one system over another. Two case studies, one comparing several systems and the other based on a K-fold cross-validation procedure, illustrate these benefits. A Python module for obtaining these confidence intervals, together with a second function implementing the Fisher-Pitman test for paired samples, is freely available on PyPI.
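The abstract refers to a paired bootstrap confidence interval for the difference between two systems scored on the same test items, and to a Fisher-Pitman test for paired samples. The PyPI module itself is not named here, so the following is only a minimal NumPy sketch of the two ideas, assuming per-item scores (e.g. 0/1 correctness) from both systems on a shared test set; the function names, defaults, and the Monte Carlo approximation of the permutation test are illustrative assumptions, not the published API.

import numpy as np

def paired_bootstrap_ci(scores_a, scores_b, n_boot=10_000, alpha=0.05, seed=0):
    # Percentile bootstrap CI for the mean per-item difference between two systems
    # evaluated on the same test set (sketch, not the author's module).
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    n = diffs.size
    # Resample test items with replacement, keeping the pairing intact,
    # and record the mean difference for each bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diffs[idx].mean(axis=1)
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return diffs.mean(), (lower, upper)

def fisher_pitman_paired(scores_a, scores_b, n_perm=10_000, seed=0):
    # Monte Carlo version of the Fisher-Pitman permutation test for paired samples:
    # under H0 the sign of each per-item difference is exchangeable, so we flip signs at random.
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    observed = diffs.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    perm_means = (signs * diffs).mean(axis=1)
    # Two-sided p-value: share of sign-flipped samples at least as extreme as the observed mean.
    p_value = (np.abs(perm_means) >= abs(observed)).mean()
    return observed, p_value

# Hypothetical usage with per-item correctness of two systems on ten items:
acc_a = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
acc_b = np.array([1, 0, 0, 1, 1, 0, 1, 0, 1, 0])
mean_diff, ci = paired_bootstrap_ci(acc_a, acc_b)
_, p = fisher_pitman_paired(acc_a, acc_b)

Resampling whole items (rather than each system's scores separately) preserves the pairing, which is what makes the interval describe the difference between the systems instead of their individual scores, and is what the paired permutation test relies on as well.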


Related research:

Cross-validation Confidence Intervals for Test Error (07/24/2020)
Resampling-free bootstrap inference for quantiles (02/22/2022)
Estimation of incidence from aggregated current status data (03/06/2023)
On the lengths of t-based confidence intervals (12/07/2018)
Homogeneity Tests and Interval Estimations of Risk Differences for Stratified Bilateral and Unilateral Correlated Data (03/31/2023)
confidence-planner: Easy-to-Use Prediction Confidence Estimation and Sample Size Planning (01/12/2023)
