Polynomial regression is a special type of linear regression in which the relationship between the predictor variable \(x\) and the response variable \(y\) is modeled by a \(k^{th}\)-degree polynomial of \(x\). In other words, we include second-order and higher powers of a variable in the model along with the original linear term. Including these higher-order terms makes the relation between \(y\) and \(x\) nonlinear. Still, the model is a linear model, because it is linear in the coefficients \((\beta_i)\): the expected response is a linear combination of them. The model equation can be written as

\[y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}\]
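To see why the model is linear in its coefficients, we can collect the powers of \(x\) into a design matrix whose columns are \(1, x, x^2, \dots, x^k\). The fitted values \(\hat y\) are then simply this matrix multiplied by the vector of coefficients. The following minimal sketch uses Python with NumPy rather than R; the helper name `design_matrix` and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np

def design_matrix(x, k):
    """Return the n x (k+1) matrix with columns 1, x, x^2, ..., x^k."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns from x^0 up to x^k
    return np.vander(x, N=k + 1, increasing=True)

x = np.array([0.0, 1.0, 2.0, 3.0])
beta = np.array([1.0, -2.0, 0.5])  # hypothetical beta_0, beta_1, beta_2 (k = 2)

X = design_matrix(x, k=2)
y_hat = X @ beta  # linear in beta, although quadratic in x
print(X)
print(y_hat)      # holds 1.0, -0.5, -1.0, -0.5
```

Each entry of `y_hat` is \(1 - 2x + 0.5x^2\) evaluated at one value of \(x\), yet it is computed by a single matrix-vector product, which is exactly what makes ordinary linear regression machinery applicable.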

The coefficients are estimated by fitting the polynomial to the observed data \((y)\). As with the simple linear regression discussed in the previous section, this is done by minimizing the sum of squared errors (SSE), given by

\[SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2\text{.}\]
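Because the model is linear in the coefficients, minimizing the SSE is an ordinary least-squares problem that standard linear algebra routines can solve. The sketch below, in Python with NumPy, assumes simulated data drawn from a quadratic trend with added noise; the function names `fit_polynomial` and `sse` are hypothetical helpers, not part of any library.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Least-squares estimates of beta_0, ..., beta_k (minimizes the SSE)."""
    # Design matrix with columns 1, x, ..., x^k
    X = np.vander(np.asarray(x, dtype=float), N=k + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

def sse(y, y_hat):
    """Sum of squared errors between observations y and fitted values y_hat."""
    return float(np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2))

# Hypothetical data: a quadratic trend plus Gaussian noise
rng = np.random.default_rng(42)
x = np.linspace(0, 5, 30)
y = 2.0 + 0.5 * x - 0.3 * x**2 + rng.normal(scale=0.2, size=x.size)

beta = fit_polynomial(x, y, k=2)
y_hat = np.vander(x, N=3, increasing=True) @ beta
print("coefficients:", beta)
print("SSE:", sse(y, y_hat))
```

As a cross-check, `np.polyfit(x, y, 2)` should return the same estimates, but ordered from the highest power down to \(\beta_0\).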

Fitting a polynomial to observations raises the problem of choosing its order \(k\). Choosing the right order is a question of model comparison, also called model selection, which is an important concept in its own right. To keep things simple, we use the root-mean-square error (RMSE), defined by

\[RMSE = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n}}\]

to evaluate the goodness-of-fit of the model.
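A minimal sketch of such a comparison, assuming simulated data and hypothetical helper names (`rmse`, `rmse_by_order`), fits polynomials of increasing order by least squares in Python with NumPy and reports the RMSE of each fit. The mean of the squared residuals is exactly \(SSE/n\), so the `rmse` helper implements the formula above.

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error: sqrt(SSE / n)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rmse_by_order(x, y, max_k):
    """Fit polynomials of order 1..max_k by least squares; return {k: RMSE}."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    scores = {}
    for k in range(1, max_k + 1):
        X = np.vander(x, N=k + 1, increasing=True)  # columns 1, x, ..., x^k
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        scores[k] = rmse(y, X @ beta)
    return scores

# Hypothetical data: a quadratic trend plus Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 40)
y = 1.0 - 1.5 * x + 0.4 * x**2 + rng.normal(scale=0.3, size=x.size)

for k, score in rmse_by_order(x, y, max_k=5).items():
    print(f"k = {k}: RMSE = {score:.3f}")
```

Because the RMSE here is computed on the same data used for fitting, and every lower-order polynomial is a special case of a higher-order one, the RMSE can only stay the same or decrease as \(k\) grows. In this form it is therefore most informative where the improvement levels off, rather than at its minimum.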


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by email at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.