Polynomial regression is a special case of linear regression in which the relationship between the predictor variable $$x$$ and the response variable $$y$$ is modeled by a $$k^{th}$$-degree polynomial in $$x$$. In other words, the model includes second-order and higher powers of the variable alongside the original linear term. These higher powers make the relation between $$y$$ and $$x$$ nonlinear. The model is nevertheless a linear model, because the expected response is a linear function of the coefficients $$(\beta_i)$$. The model equation can be written as

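$y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}$

Here $$\epsilon$$ denotes the error term, and the fitted value $$\hat y$$ is the polynomial part of the model evaluated at the estimated coefficients.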
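The linearity in the coefficients can be made concrete with a small numerical sketch. The snippet below uses Python with NumPy (an assumption; this section does not prescribe a tool) and hypothetical values for $$x$$ and the coefficients. It builds a matrix whose columns are the powers of $$x$$ and obtains the polynomial part of the model as a matrix product, which is why scaling the coefficients scales the predictions by the same factor.

```python
import numpy as np

def design_matrix(x, k):
    """Return the n x (k+1) matrix with columns 1, x, x^2, ..., x^k."""
    return np.vander(np.asarray(x, dtype=float), N=k + 1, increasing=True)

# Hypothetical example values, chosen only for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
beta = np.array([1.0, -2.0, 0.5])  # beta_0, beta_1, beta_2 (k = 2)

X = design_matrix(x, k=2)
y_hat = X @ beta  # beta_0 + beta_1 * x + beta_2 * x^2 for every x

# The predictions are nonlinear in x but linear in the coefficients:
# doubling beta doubles every prediction.
assert np.allclose(X @ (2 * beta), 2 * y_hat)
```

Because the powers of $$x$$ are simply additional columns of this matrix, the fitting machinery of ordinary linear regression carries over unchanged.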

The coefficients are estimated by fitting the polynomial to the observed data $$(y)$$. As in the simple linear regression discussed in the previous section, this is done by minimizing the sum of squared errors (SSE), given by the equation

$SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2\text{.}$
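As a minimal sketch of this fitting step, the following Python/NumPy snippet estimates the coefficients by least squares and evaluates the SSE. The data are synthetic, generated from a known quadratic plus noise, and the helper names `fit_polynomial` and `sse` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Least-squares estimate of beta_0, ..., beta_k."""
    X = np.vander(np.asarray(x, dtype=float), N=k + 1, increasing=True)
    beta_hat, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta_hat

def sse(x, y, beta):
    """Sum of squared differences between y and the polynomial defined by beta."""
    X = np.vander(np.asarray(x, dtype=float), N=len(beta), increasing=True)
    residuals = np.asarray(y, dtype=float) - X @ beta
    return float(np.sum(residuals ** 2))

# Synthetic data (an assumption for illustration): quadratic signal plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 30)
beta_true = np.array([1.0, -2.0, 0.5])
y = beta_true[0] + beta_true[1] * x + beta_true[2] * x**2
y = y + rng.normal(scale=0.2, size=x.size)

beta_hat = fit_polynomial(x, y, k=2)
print("estimated coefficients:", beta_hat)
print("SSE at the estimate:   ", sse(x, y, beta_hat))
print("SSE at the true values:", sse(x, y, beta_true))
```

Since the least-squares estimate minimizes the SSE over all coefficient vectors, its SSE cannot exceed the SSE obtained with the coefficients that generated the data.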

Fitting a polynomial to observations raises the question of which order $$k$$ to choose. Choosing the right order is an instance of an important concept called model comparison or model selection. To keep it simple, we use the root-mean-square error (RMSE), defined by

$RMSE = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n}}$

to evaluate the goodness-of-fit of the model.
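The comparison of polynomial orders by RMSE can be sketched as follows. The snippet is written in Python with NumPy, uses synthetic data, and the helper `rmse_for_order` is a hypothetical name. It is an illustration under these assumptions, not a prescribed procedure.

```python
import numpy as np

def rmse_for_order(x, y, k):
    """Fit a k-th order polynomial by least squares and return its RMSE."""
    X = np.vander(x, N=k + 1, increasing=True)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    return float(np.sqrt(np.mean(residuals ** 2)))

# Synthetic data (an assumption for illustration): quadratic signal plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 40)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# RMSE on the fitted data for polynomial orders 1 to 5.
rmse_by_order = {k: rmse_for_order(x, y, k) for k in range(1, 6)}
for k, value in rmse_by_order.items():
    print(f"k = {k}: RMSE = {value:.3f}")
```

When the RMSE is computed on the same data used for fitting, it cannot increase as $$k$$ grows, because every lower-order polynomial is a special case of the higher-order one. A reasonable reading of such a table is therefore to look for the order beyond which the RMSE stops improving noticeably, or to compute the RMSE on data that were not used for fitting.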

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.