Now, as we relaxed the constraints of the deterministic model and introduced an error term \(\epsilon\), we run into another problem: there are infinitely many regression lines that fulfill the specifications of the probabilistic model.

Obviously, we need a strategy to select the particular regression
line that corresponds to the *best* model for describing
the data. In this section we discuss one of the most popular methods for
that task, the so-called **ordinary least squares method
(OLS)**.

As mentioned in the previous section, for each particular pair of
values \((x_i,y_i)\) the error \(\epsilon_i\) is calculated as \(y_i-\hat y_i\). In order to get the best
fitting line for the given data the **sum of squared
errors**, denoted by SSE, is minimized:

\[\min\; SSE = \sum_{i=1}^n \epsilon_i^2=\sum_{i=1}^n (y_i - \hat y_i)^2 \text{.}\]
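To make the quantity being minimized concrete, here is a minimal sketch in Python that evaluates the SSE for two candidate lines on a small, hypothetical data set (the data values, the `sse` helper, and the candidate coefficients are illustrative, not from the text):

```python
import numpy as np

# Small illustrative data set (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(alpha, beta, x, y):
    """Sum of squared errors for the line y_hat = alpha + beta * x."""
    y_hat = alpha + beta * x
    return np.sum((y - y_hat) ** 2)

# Two candidate lines: the second follows the data more closely,
# so it yields a much smaller SSE
print(sse(0.0, 1.50, x, y))   # a rough guess
print(sse(0.1, 1.95, x, y))   # a line close to the data
```

The OLS method searches, among all possible pairs \((\alpha, \beta)\), for the one with the smallest SSE; in the candidate comparison above, the second line is clearly preferable.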

For the simple linear model there exists an analytic solution for \(\beta\)

\[\beta = \frac{\sum_{i=1}^n (x_i- \bar x) (y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2} = \frac{\text{cov}(x,y)}{\text{var}(x)}\]

and \(\alpha\):

\[\alpha = \bar y - \beta \bar x\]
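The two closed-form expressions translate directly into code. The following sketch computes \(\beta\) and \(\alpha\) for the same hypothetical data as above and cross-checks the result against NumPy's built-in least-squares fit (`np.polyfit` with degree 1):

```python
import numpy as np

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates:
# beta = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# alpha = y_bar - beta * x_bar
alpha = y.mean() - beta * x.mean()

# Cross-check: np.polyfit(x, y, 1) returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
```

Equivalently, \(\beta\) can be computed as the ratio of the sample covariance of \(x\) and \(y\) to the sample variance of \(x\), since the common factor \(1/(n-1)\) cancels.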

OLS yields the maximum likelihood estimates for \(\alpha\) and \(\beta\) when the residuals \(\epsilon_i\) are uncorrelated, follow a Gaussian distribution, and have equal variance over the entire range of the predictor variable \(x\) (homoscedasticity).
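A quick simulation illustrates this behavior: when the data are generated exactly under these assumptions, the closed-form OLS estimates recover the true coefficients closely. The true values (\(\alpha = 2.0\), \(\beta = 0.5\)), the noise level, and the sample size below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulate data satisfying the OLS assumptions:
# i.i.d. Gaussian errors with constant variance (homoscedasticity)
x = rng.uniform(0, 10, n)
eps = rng.normal(loc=0.0, scale=1.0, size=n)
y = 2.0 + 0.5 * x + eps        # true alpha = 2.0, true beta = 0.5

# Closed-form OLS estimates
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
# alpha and beta should be close to 2.0 and 0.5, respectively
```

When these assumptions are violated, for instance when the error variance grows with \(x\), the OLS line is still the least-squares solution, but it is no longer the maximum likelihood estimate.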