Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

01/27/2020
by   Jerome H. Friedman, et al.

The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.
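
To make the idea of a location and scale that both depend on x concrete, here is a minimal sketch. It is not the optimal transformation procedure summarized above. It assumes simulated data with perfect (uncensored, continuous) y-values, linear location and scale functions, and a simple two-stage least-squares fit; the function name `fit_location_scale` and all constants are hypothetical choices made for illustration.

```python
import numpy as np


def fit_location_scale(x, y):
    """Two-stage linear sketch (an illustrative assumption, not the paper's method).

    Stage 1 fits the location of p(y|x) by ordinary least squares.
    Stage 2 fits the scale by regressing absolute residuals on x.
    """
    X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
    loc_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    abs_resid = np.abs(y - X @ loc_coef)
    scale_coef, *_ = np.linalg.lstsq(X, abs_resid, rcond=None)
    return loc_coef, scale_coef


# Simulated heteroscedastic data: location 1 + 2x, standard deviation 0.5 + x.
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + (0.5 + x) * rng.standard_normal(n)

loc_coef, scale_coef = fit_location_scale(x, y)
print("location (intercept, slope):", loc_coef)
print("scale    (intercept, slope):", scale_coef)
```

The fitted scale here is a mean absolute deviation, so for Gaussian noise it is smaller than the standard deviation by a factor of about 0.8. A fit of this kind breaks down when the y-values are discrete, truncated or censored, which is the situation the abstract says the proposed procedures are designed to handle.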


research
06/01/2021

What's a good imputation to predict with missing values?

How to learn a good predictor on data with missing values? Most efforts ...
research
03/27/2013

Predicting the Likely Behaviors of Continuous Nonlinear Systems in Equilibrium

This paper introduces a method for predicting the likely behaviors of co...
research
05/27/2020

Antenna Optimization Using a New Evolutionary Algorithm Based on Tukey-Lambda Probability Distribution

In this paper, we introduce a new evolutionary optimization algorithm ba...
research
08/10/2021

Estimating a distribution function for discrete data subject to random truncation with an application to structured finance

The literature for estimating a distribution function from truncated dat...
research
01/18/2008

P-values for classification

Let (X,Y) be a random variable consisting of an observed feature vector ...
research
10/09/2019

Estimating regression errors without ground truth values

Regression analysis is a standard supervised machine learning method use...
research
09/10/2019

Generalized Score Distribution

A class of discrete probability distributions contains distributions wit...
