# A rigorous introduction to linear models

This note provides a rigorous introduction to linear models and the theory behind them, aimed at readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input, and deep learning goes further, seeking nonlinear dependencies through many layers at a large computational cost. Most of these algorithms, however, build on simple linear models. We therefore describe linear models from several points of view and develop the properties and theory behind them.

The linear model is the main technique for regression problems, and its primary tool is the least squares approximation, which minimizes a sum of squared errors. This is a natural choice when we want the regression function that minimizes the corresponding expected squared error. We first describe ordinary least squares from three different points of view. We then perturb the model with random noise and, in particular, with Gaussian noise. Under Gaussian noise the model defines a likelihood, which leads to the maximum likelihood estimator, and the same Gaussian assumption yields a distribution theory for the estimator. This distribution theory helps us answer various questions and motivates related applications. We then prove that the least squares estimator is the best linear unbiased estimator: among all linear unbiased estimators it has the smallest variance, and hence the smallest mean squared error. More importantly, under Gaussian noise it attains the theoretical limit, the Cramér-Rao lower bound on the variance of unbiased estimators. We conclude with the Bayesian approach to linear models and beyond.
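To make the least squares objective and its link to the Gaussian likelihood concrete, here is a minimal sketch in plain Python (no NumPy, so it stays self-contained). It assumes an n x p design matrix `X` with full column rank and solves the normal equations X^T X w = X^T y by Gaussian elimination. All function names (`ols_fit`, `solve`, `sse`, `gaussian_nll`, and so on) and the toy data are hypothetical illustrations, not part of the original text. The `gaussian_nll` function shows why the maximum likelihood estimator under i.i.d. Gaussian noise coincides with least squares: for a fixed noise level, the negative log-likelihood differs from the sum of squared errors only by a positive scale factor and an additive constant.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def normal_equations(X: Matrix, y: Vector):
    """Form X^T X (p x p) and X^T y (length p) for an n x p design matrix X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: columns of X are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def residuals(X: Matrix, y: Vector, w: Vector) -> Vector:
    """Residual vector y - X w."""
    return [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]


def sse(X: Matrix, y: Vector, w: Vector) -> float:
    """Sum of squared errors ||y - X w||^2, the least squares objective."""
    return sum(r * r for r in residuals(X, y, w))


def ols_fit(X: Matrix, y: Vector) -> Vector:
    """Least squares estimate via the normal equations X^T X w = X^T y."""
    xtx, xty = normal_equations(X, y)
    return solve(xtx, xty)


def gaussian_nll(X: Matrix, y: Vector, w: Vector, sigma: float) -> float:
    """Negative log-likelihood of y = X w + e with e_i ~ N(0, sigma^2) i.i.d.

    Equals n/2 * log(2 pi sigma^2) + SSE / (2 sigma^2), so for a fixed sigma
    it is minimized over w exactly where the SSE is minimized.
    """
    n = len(y)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse(X, y, w) / (2 * sigma ** 2)


if __name__ == "__main__":
    X = [[1.0, x] for x in [0.0, 1.0, 2.0, 3.0]]  # intercept column + one feature
    y = [1.1, 2.9, 5.2, 6.8]                      # hypothetical toy data
    w = ols_fit(X, y)
    print("estimate:", w)
    print("SSE:", sse(X, y, w))
```

Forming X^T X explicitly is the most direct route from the normal equations, but it squares the condition number of `X`; in practice a QR- or SVD-based solver such as `numpy.linalg.lstsq` is generally the more numerically stable choice.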