Polynomial regression is a special type of linear regression in which the relationship between the predictor variable $$x$$ and the response variable $$y$$ is modeled by a $$k^{th}$$-degree polynomial in $$x$$. In other words, we include the second and higher powers of the variable in the model alongside the original linear term. Although incorporating these higher-order terms makes the relation between $$y$$ and $$x$$ nonlinear, the model is still a linear model, because the relation between the coefficients $$(\beta_i)$$ and the expected observations is linear. Thus, the model equation can be written as

$y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}$
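Because the model is linear in the coefficients, fitting it reduces to ordinary least squares on a design matrix whose columns are the powers of $$x$$. A minimal sketch with NumPy (the data values here are hypothetical, chosen only for illustration):

```python
import numpy as np

# Illustrative data (hypothetical values).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.5, 6.0, 12.5])

k = 2  # polynomial order
# Vandermonde design matrix with columns 1, x, x^2, ...: the model
# is linear in the coefficients beta even though it is nonlinear in x.
X = np.vander(x, N=k + 1, increasing=True)

# Ordinary least squares solves the linear system for beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # [beta_0, beta_1, beta_2]
```

Solving the system this way makes the "linear in the coefficients" point concrete: only the columns of `X` involve powers of $$x$$; the unknowns enter linearly.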

The values of the coefficients are determined by fitting the polynomial to the observed data $$(x_i, y_i)$$. As in the simple linear regression discussed in the previous section, this is done by minimizing the sum of squared errors (SSE), given by the equation

$SSE = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (\hat y_i - y_i)^2\text{.}$
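NumPy's `polyfit` performs exactly this least-squares minimization. A short sketch, again with hypothetical data values:

```python
import numpy as np

# Example data: noisy samples from a roughly quadratic trend (illustrative).
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.1, 1.8, 3.2, 5.1, 7.9, 11.2, 15.3])

# Fit a degree-2 polynomial by minimizing the sum of squared errors.
coeffs = np.polyfit(x, y, deg=2)  # highest-order coefficient first
y_hat = np.polyval(coeffs, x)     # fitted values

sse = np.sum((y_hat - y) ** 2)
print(coeffs, sse)
```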

Fitting a polynomial to observations raises the problem of choosing the order $$k$$ of the polynomial. Selecting the right order is a matter of an important concept called model comparison, or model selection. To keep things simple, we use the root-mean-square error (RMSE), defined by

$RMSE = \sqrt{\frac{\sum_{i=1}^n (\hat y_i - y_i)^2}{n}}$

to evaluate the goodness-of-fit of the model.
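A sketch of such a comparison, computing the RMSE for several candidate orders $$k$$ (the data here are synthetic and the `rmse` helper is hypothetical):

```python
import numpy as np

def rmse(y_hat, y):
    """Root-mean-square error between fitted and observed values."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

# Synthetic data from a quadratic trend plus noise (illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 20)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.5, size=x.size)

# Compare candidate orders k on the training data.
for k in (1, 2, 3, 5):
    y_hat = np.polyval(np.polyfit(x, y, k), x)
    print(k, rmse(y_hat, y))
```

Note that the training RMSE can only decrease as $$k$$ grows, since higher-order models nest the lower-order ones; practical model selection therefore also relies on held-out data or a complexity penalty rather than training error alone.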