The linear model is given by the equation
\[y = \beta_0 + \beta_1x + e\text{,}\] where \(\beta_0\) is the intercept, \(\beta_1\) is the regression coefficient and \(e\) the error term. The best regression line is found by applying the ordinary least squares method, which minimizes the sum of squared errors (SSE), that is, the squared differences between the measured response variable \(y\) and the model prediction \(\hat y\), given by
\[SSE = \sum_{i=1}^n e_i^2=\sum_{i=1}^n (y_i - \hat y_i)^2\text{.}\] Refer to the section on linear regression for more details on the linear model.
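To make the least squares idea concrete, the following sketch fits a simple linear model with the closed-form OLS formulas and computes the SSE by hand. The small data set is hypothetical and serves only as an illustration; the variable names are not taken from the text.

```python
import numpy as np

# Hypothetical example data (not from the text), purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.3, 1.6, 1.9, 2.1, 2.2, 2.6])

# Ordinary least squares estimates for the simple linear model y = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Model predictions and sum of squared errors (SSE)
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}, SSE = {sse:.4f}")
```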
However, we have to acknowledge that we build our models, in this case our linear regression model, on observational data. The data originates from a population, whose statistical properties are in general unknown to us. Each measured observation is thus one particular manifestation of that population; in statistical terms it is a realization of a random variable.
Let us consider an example, as shown in the figure below. In this example the population parameters are known, and thus we may build a linear regression model of the form \(y = \beta_0+\beta_1x = 1 + 0.25x\).
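As a sketch of such a known population, we may simulate data that follows the population regression function \(y = 1 + 0.25x\) plus a random error term; the population size, the range of \(x\) and the standard deviation of the error are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed population: 100,000 observations following y = 1 + 0.25*x + e,
# with x uniform on [0, 10] and normal errors (sd = 0.5 is an assumption)
N = 100_000
beta0, beta1 = 1.0, 0.25
x_pop = rng.uniform(0, 10, size=N)
y_pop = beta0 + beta1 * x_pop + rng.normal(0, 0.5, size=N)

# Fitting a line to the whole population recovers (approximately) the true parameters
b1 = np.sum((x_pop - x_pop.mean()) * (y_pop - y_pop.mean())) / np.sum((x_pop - x_pop.mean()) ** 2)
b0 = y_pop.mean() - b1 * x_pop.mean()
print(f"population fit: intercept = {b0:.3f}, slope = {b1:.3f}")
```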
However, if we take a random sample of the population and build a linear model based on the sample data, the sample regression line will not be the same as the population regression line. In the figure below we took four random samples of size 25 (blue dots). We immediately see that the sample regression line (blue dashed line) is not the same as the population regression line (grey line). In order to account for this variability, which is due to the random sampling process, a statistic is calculated by applying the equation
\[s_e = \sqrt{\frac{SSE}{n-2}}\text{,}\]
where \(SSE\) corresponds to the sum of squared errors and \(n\) to the sample size. The statistic \(s_e\) is referred to as the standard error of the estimate or the residual standard error.
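To illustrate the sampling variability and the residual standard error, the sketch below draws repeated random samples of size 25 from a simulated population with \(y = 1 + 0.25x\), fits a regression line to each sample and computes \(s_e = \sqrt{SSE/(n-2)}\). The error standard deviation of the simulated population is an assumption made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated population following y = 1 + 0.25*x + e (error sd = 0.5 is an assumption)
N = 100_000
x_pop = rng.uniform(0, 10, size=N)
y_pop = 1.0 + 0.25 * x_pop + rng.normal(0, 0.5, size=N)

n = 25  # sample size, as in the example above
for sample in range(4):
    idx = rng.choice(N, size=n, replace=False)
    x, y = x_pop[idx], y_pop[idx]

    # OLS fit on the sample: the estimates vary from sample to sample
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    # Residual standard error: s_e = sqrt(SSE / (n - 2))
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    s_e = np.sqrt(sse / (n - 2))

    print(f"sample {sample + 1}: intercept = {b0:.3f}, slope = {b1:.3f}, s_e = {s_e:.3f}")
```

Running the loop shows that each sample yields slightly different estimates of the intercept and the slope, while \(s_e\) quantifies the typical size of the residuals of each sample fit.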