Polynomial regression is a special case of linear regression in which the relationship between the predictor variable \(x\) and the response variable \(y\) is modeled by a \(k^{th}\)-degree polynomial in \(x\). In other words, we include second-order and higher powers of the predictor in the model along with the original linear term. Although the polynomial terms make the relation between \(y\) and \(x\) nonlinear, the model is still a linear model, because the relation between the coefficients \(\beta_i\) and the expected response is linear. Thus, the model equation can be written as

\[y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}\]
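Because the model is linear in the coefficients, it can be fitted with ordinary least squares: each power of \(x\) simply becomes one column of a design matrix. The following sketch illustrates this with NumPy on hypothetical synthetic data (the quadratic trend and noise level are made up for illustration):

```python
import numpy as np

# Hypothetical sample data: a noisy quadratic trend (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Although y is a nonlinear function of x, the model is linear in the
# coefficients beta: each power of x is just another column of the
# design (Vandermonde) matrix.
k = 2
X = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2

# Ordinary least squares solves the resulting linear system for beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # should be close to the true values (1.0, 2.0, -1.5)
```

Any linear-regression solver can be used here unchanged, which is the practical payoff of the model being linear in \(\beta_i\).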

The values of the coefficients are determined by fitting the polynomial to the observed data \(y_i\). As in the simple linear regression discussed in the previous section, this is done by minimizing the sum of squared errors (SSE), given by the equation

\[SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (\hat y_i - y_i)^2\text{.}\]
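As a minimal sketch of this criterion, the snippet below (using NumPy's `polyfit`, which performs exactly this least-squares minimization, on made-up data) computes the SSE explicitly and checks that perturbing the fitted coefficients can only increase it:

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 0.5 + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

def sse(coeffs, x, y):
    """Sum of squared errors for a polynomial with the given coefficients."""
    y_hat = np.polyval(coeffs, x)
    return np.sum((y_hat - y) ** 2)

# np.polyfit returns the coefficients (highest power first) minimizing SSE.
fit = np.polyfit(x, y, deg=2)

# The fitted coefficients are the global minimizer of the SSE, so any
# perturbed coefficient vector yields a larger SSE.
perturbed = fit + 0.1
print(sse(fit, x, y) < sse(perturbed, x, y))  # True
```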

Fitting a polynomial to observations raises the problem of choosing the order \(k\) of the polynomial. Choosing the right order is a matter of an important concept called model comparison or model selection. To keep it simple, we use the root-mean-square error (RMSE), defined by

\[RMSE = \sqrt{\frac{\sum_{i=1}^n (\hat y_i - y_i)^2}{n}}\]

to evaluate the goodness-of-fit of the model.
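The RMSE comparison across candidate orders can be sketched as follows, again on hypothetical synthetic data (here a noisy cubic, chosen only for illustration). Note that the training RMSE can only decrease as \(k\) grows, so in practice it is usually complemented by evaluation on held-out data:

```python
import numpy as np

# Hypothetical noisy cubic data (illustrative only).
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 50)
y = x**3 - x + rng.normal(scale=0.05, size=x.size)

def rmse(coeffs, x, y):
    """Root-mean-square error of a polynomial fit."""
    y_hat = np.polyval(coeffs, x)
    return np.sqrt(np.mean((y_hat - y) ** 2))

# Fit polynomials of increasing order and report the training RMSE;
# the drop levels off once k reaches the true order of the data.
for k in range(1, 6):
    coeffs = np.polyfit(x, y, deg=k)
    print(k, rmse(coeffs, x, y))
```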