Probabilistic performance estimators for computational chemistry methods: Systematic Improvement Probability and Ranking Probability Matrix. I. Theory

by Pascal Pernot, et al.

The comparison of benchmark error sets is an essential tool for evaluating theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error (MUE) is unsatisfactory for several reasons, linked to the non-normality of the error distributions and to the presence of underlying trends. Complementary statistics, such as quantiles of the absolute error distribution or the mean prediction uncertainty, have recently been proposed to remedy these deficiencies. Here we introduce a new score, the systematic improvement probability (SIP), based on the direct system-wise comparison of absolute errors. Independently of the chosen scoring rule, the uncertainty of the statistics due to the incompleteness of the benchmark data sets is generally overlooked, although it is essential for assessing the robustness of rankings. In the present article, we develop two indicators based on robust statistics to address this problem: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. We also demonstrate the essential contribution of the correlations between error sets to these score comparisons.
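
The system-wise comparison behind SIP can be illustrated with a short sketch. It assumes that SIP for going from method A to method B is the fraction of benchmark systems on which B's absolute error is strictly smaller than A's, with exact ties counted as neither improvement nor degradation. The function name `sip`, the tie handling and the example error values are our own hypothetical choices, not the authors' code or data.

```python
def sip(errors_a, errors_b):
    """Systematic improvement probability of method B over method A (assumed definition).

    Returns the fraction of systems for which |e_B| < |e_A| strictly;
    systems with equal absolute errors count as neither improved nor degraded.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("error sets must be paired (same systems) and non-empty")
    # Compare the two methods system by system, not through summary statistics.
    improved = sum(1 for ea, eb in zip(errors_a, errors_b) if abs(eb) < abs(ea))
    return improved / len(errors_a)


# Signed errors of two hypothetical methods on the same four systems.
errors_a = [1.0, -2.0, 0.5, 3.0]
errors_b = [0.5, -1.0, 0.7, -3.0]

print(sip(errors_a, errors_b))  # 0.5: B is better on the first two systems
print(sip(errors_b, errors_a))  # 0.25: A is better on the third system (the fourth is a tie)
```

Because ties are excluded from both directions, the two SIP values need not sum to 1; their complement is the fraction of systems on which the methods are indistinguishable.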

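The two uncertainty indicators can be sketched with a paired nonparametric bootstrap over the benchmark systems. This is a minimal illustration under our own assumptions, not the authors' implementation: the statistic is the MUE, P_inv is taken as the fraction of bootstrap replicates in which the sign of MUE(B) - MUE(A) is opposite to the observed one, and P_r[m][r] is the fraction of replicates in which method m has rank r. The helper names (`mue`, `paired_bootstrap`, `inversion_probability`, `ranking_probability_matrix`) are hypothetical.

```python
import random


def mue(errors, idx):
    """Mean unsigned error of `errors` over the (possibly resampled) system indices `idx`."""
    return sum(abs(errors[i]) for i in idx) / len(idx)


def paired_bootstrap(error_sets, n_boot, seed):
    """Draw index resamples shared by all methods.

    Using the same system indices for every method keeps the
    correlations between the error sets in each bootstrap replicate.
    """
    if not error_sets or not error_sets[0] or any(len(e) != len(error_sets[0]) for e in error_sets):
        raise ValueError("error sets must be non-empty and cover the same systems")
    n = len(error_sets[0])
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_boot)]


def inversion_probability(errors_a, errors_b, n_boot=1000, seed=0):
    """Fraction of replicates where MUE(B) - MUE(A) has the opposite sign to the observed value."""
    resamples = paired_bootstrap([errors_a, errors_b], n_boot, seed)
    all_systems = range(len(errors_a))
    d_obs = mue(errors_b, all_systems) - mue(errors_a, all_systems)
    inversions = 0
    for idx in resamples:
        d = mue(errors_b, idx) - mue(errors_a, idx)
        if d * d_obs < 0:  # sign flipped relative to the full data set
            inversions += 1
    return inversions / n_boot


def ranking_probability_matrix(error_sets, n_boot=1000, seed=0):
    """P[m][r] = fraction of replicates in which method m has rank r (0 = lowest MUE)."""
    m = len(error_sets)
    counts = [[0] * m for _ in range(m)]
    for idx in paired_bootstrap(error_sets, n_boot, seed):
        scores = [mue(e, idx) for e in error_sets]
        # Stable sort: exact ties keep the input order of the methods.
        order = sorted(range(m), key=lambda k: scores[k])
        for rank, method in enumerate(order):
            counts[method][rank] += 1
    return [[c / n_boot for c in row] for row in counts]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.1, 0.2, 0.3, 0.4]  # B beats A on every system, so no resample can invert the ranking
print(inversion_probability(a, b))         # 0.0
print(ranking_probability_matrix([a, b]))  # [[0.0, 1.0], [1.0, 0.0]]
```

Drawing independent resamples for each method instead of shared indices would ignore the correlations between error sets, which the abstract identifies as an essential contribution to these comparisons.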
