In this section we discuss a special type of regression called simple linear regression. In this special case of regression analysis the relationship between the response variable \(y\) and the predictor variable \(x\) is given in the form of a linear equation

\[y= a + bx\text{,}\]

where \(a\) and \(b\) are constants. The number \(a\) is called the intercept and defines the point of intersection of the regression line and the \(y\)-axis (\(x=0\)). The number \(b\) is called the regression coefficient and measures the slope of the regression line. Thus, \(b\) indicates how much the \(y\)-value changes when the \(x\)-value increases by 1 unit. The adjective simple refers to the fact that the outcome variable is related to a single predictor. The model is considered a deterministic model, as it gives an exact relationship between \(x\) and \(y\).

Let us consider a simple example. Given is a population of \(n = 3\) points with Cartesian coordinates \((x_i,y_i)\) of \((1,6)\), \((2,8)\) and \((3,10)\). These points plot on a straight line and can thus be described by a linear equation model of the form \(y = a + bx\), with intercept \(a=4\) and slope \(b=2\).
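This exact relationship can be verified with a few lines of Python:

```python
# The three example points and the deterministic model y = 4 + 2x from the text.
points = [(1, 6), (2, 8), (3, 10)]
a, b = 4, 2  # intercept and slope

# Every point lies exactly on the line, with no error term needed.
for x, y in points:
    assert y == a + b * x

print("all points satisfy y = 4 + 2x")  # prints: all points satisfy y = 4 + 2x
```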

In many cases, however, the relationship between two variables \(x\) and \(y\) is not exact. This is due to the fact that the response variable \(y\) is affected by other unknown and/or random processes that are not fully captured by the predictor variable \(x\). In such a case the data points do not line up on a straight line. However, the data may still follow an underlying linear relationship. In order to take these unknowns into consideration, a random error term, denoted by \(\epsilon\), is added to the linear model equation, resulting in a probabilistic model in contrast to the deterministic model from above.

\[y = a + b x + \epsilon\text{,}\]

where the error term \(\epsilon_i\) is assumed to consist of independent, normally distributed values, \(\epsilon_i \sim N(0, \sigma^2)\).
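The probabilistic model can be illustrated with a small simulation sketch in Python. The parameter values \(a = 4\), \(b = 2\) and \(\sigma = 1\) are illustrative assumptions, not values prescribed by the model:

```python
import random

# Assumed population parameters for illustration only.
a, b, sigma = 4.0, 2.0, 1.0
random.seed(0)  # fixed seed for reproducibility

xs = [1, 2, 3, 4, 5]
# y = a + b*x + eps, with eps drawn from N(0, sigma^2)
ys = [a + b * x + random.gauss(0, sigma) for x in xs]

for x, y in zip(xs, ys):
    print(f"x = {x}, exact line = {a + b * x:.1f}, observed y = {y:.2f}")
```

Because of the random error term, the simulated \(y\)-values scatter around the line \(y = 4 + 2x\) instead of lying exactly on it.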

In linear regression modelling the following assumptions are made about the model (Mann 2012).

Let us consider another example. This time we take a random sample of size \(n = 8\) from a population. In order to emphasize that the values of the intercept and slope are calculated from sample data, \(a\) and \(b\) are denoted by \(\beta_0\) and \(\beta_1\), respectively. In addition, the error term \(\epsilon\) is denoted as \(e\). Thus, \(\beta_0\), \(\beta_1\) and \(e\) are estimates based on sample data for the population parameters \(a\), \(b\) and \(\epsilon\).

\[\hat y = \beta_0 + \beta_1 x \text{,}\]

where \(\hat y\) is the estimated or predicted value of \(y\) for any given value of \(x\).

The error \(e_i\) for each particular pair of values \((x_i,y_i)\), also called the residual, is computed as the difference between the observed value \(y_i\) and the predicted value \(\hat y_i\).

\[e_i = y_i - \hat y_i\]

Depending on the data, \(e_i\) is a negative number if \(y_i\) plots below the regression line, or a positive number if \(y_i\) plots above the regression line.
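As a sketch of how residuals are obtained in practice, the following Python snippet fits a straight line to a hypothetical sample of \(n = 8\) points using the standard least-squares formulas and then computes the residuals \(e_i = y_i - \hat y_i\). The data values are invented for illustration:

```python
# Hypothetical sample of n = 8 (x, y) pairs; values are illustrative only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.9, 8.3, 9.8, 12.4, 13.9, 16.2, 17.8, 20.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares estimates of slope (beta_1) and intercept (beta_0).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residuals: e_i = y_i - y_hat_i, where y_hat_i = b0 + b1 * x_i
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# A point above the fitted line gives a positive residual,
# a point below it gives a negative residual.
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.3f}")

# With an intercept in the model, least-squares residuals sum to zero.
print(abs(sum(residuals)) < 1e-9)  # prints True
```

Note that when the intercept is included in the model, the least-squares residuals always sum to (numerically) zero, which serves as a quick sanity check on the fit.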