Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

by Yoshitomo Matsubara, et al.

This paper revisits datasets and evaluation criteria for symbolic regression (SR), focusing on its potential for scientific discovery. Starting from the set of formulas used in existing datasets based on the Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of symbolic regression for scientific discovery (SRSD). For each of the 120 SRSD datasets, we carefully review the properties of the formula and its variables to design reasonably realistic sampling ranges of values, so that our new SRSD datasets can be used to evaluate the potential of SRSD, such as whether an SR method can (re)discover physical laws from such datasets. We also create another 120 datasets that contain dummy variables to examine whether SR methods can select only the necessary variables. In addition, we propose using the normalized edit distance (NED) between the predicted and true equation trees to address a critical issue: existing SR metrics are either binary or measure the error between the target values and an SR model's predicted values for a given input. We conduct experiments on our new SRSD datasets using six SR methods. The experimental results show that we provide a more realistic performance evaluation, and our user study shows that the NED correlates with human judges significantly more than an existing SR metric.
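To make the idea of a normalized edit distance between equation trees concrete, the following is a minimal sketch rather than the authors' implementation. It computes an ordered tree edit distance with unit costs for insertion, deletion, and relabeling, using the standard recursive forest formulation. The abstract does not state how the distance is normalized, so dividing by the number of nodes in the true tree and capping the result at 1 is an assumption here. The names `node`, `size`, `forest_distance`, and `normalized_edit_distance` are hypothetical.

```python
from functools import lru_cache

# A tree is a pair (label, children), where children is a tuple of trees.
# Tuples are hashable, so results can be memoized.
def node(label, *children):
    return (label, tuple(children))

def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)   # insert every node of g
    if not g:
        return sum(size(t) for t in f)   # delete every node of f
    v_label, v_children = f[-1]          # rightmost root of f
    w_label, w_children = g[-1]          # rightmost root of g
    # Delete v: its children are promoted into the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match v with w (relabel if the labels differ), then align the rest.
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def normalized_edit_distance(pred, true):
    """Assumed normalization: min(1, distance / number of nodes in the true tree)."""
    distance = forest_distance((pred,), (true,))
    return min(1.0, distance / size(true))

# Example equation trees: the true law x * y, and two candidate predictions.
true_eq = node("*", node("x"), node("y"))
pred_wrong_op = node("+", node("x"), node("y"))      # x + y
pred_extra_term = node("+", true_eq, node("z"))      # x * y + z
```

Under these assumptions, a prediction identical to the true equation scores 0, `pred_wrong_op` needs one relabeling out of three true nodes, and `pred_extra_term` needs two deletions. Unlike a binary solved/unsolved metric, the score grows gradually as the predicted structure drifts from the true one.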


Active Learning in Symbolic Regression with Physical Constraints

Evolutionary symbolic regression (SR) fits a symbolic equation to data, ...

A computational framework for physics-informed symbolic regression with straightforward integration of domain knowledge

Discovering a meaningful symbolic expression that explains experimental ...

GSR: A Generalized Symbolic Regression Approach

Identifying the mathematical relationships that best describe a dataset ...

Symbolic regression for scientific discovery: an application to wind speed forecasting

Symbolic regression corresponds to an ensemble of techniques that allow ...

Information Fusion via Symbolic Regression: A Tutorial in the Context of Human Health

This tutorial paper provides a general overview of symbolic regression (...

Exhaustive Symbolic Regression

Symbolic Regression (SR) algorithms learn analytic expressions which bot...

Discovering Asymptotic Expansions Using Symbolic Regression

Recently, symbolic regression (SR) has demonstrated its efficiency for d...
