Polynomial regression is a special case of linear regression in which the relationship between the predictor variable $x$ and the response variable $y$ is modeled by a $k^{th}$-degree polynomial in $x$. In other words, the model includes second-order and higher powers of the variable alongside the original linear term. These higher-order terms make the relation between $y$ and $x$ nonlinear. The model is nevertheless a linear model, because the expected response is a linear function of the coefficients $(\beta_i)$. The model equation can be written as

$y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}$
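
To see why the model stays linear in the coefficients, it can help to write it in matrix form for $n$ observations $(x_i, y_i)$. The symbols $\mathbf{X}$, $\boldsymbol{\beta}$ and $\boldsymbol{\epsilon}$ are notation introduced here for illustration only:

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{pmatrix}, \quad \boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_k)^T\text{.}$

Each column of $\mathbf{X}$ is a known function of $x$, so the unknown coefficients enter the model only through the linear combination $\mathbf{X}\boldsymbol{\beta}$.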

The values of the coefficients are determined by fitting the polynomial to the observed data $(y)$. As in the simple linear regression discussed in the previous section, this is done by minimizing the sum of squared errors (SSE) between the observations $y_i$ and the fitted values $\hat y_i$, given by the equation

$SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2\text{.}$
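
The minimization can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not the implementation used in this course. The function names `fit_polynomial`, `predict` and `sse` are hypothetical, the data are made up, and the coefficients are obtained by solving the normal equations of the least-squares problem directly, which is adequate for small, well-conditioned examples only.

```python
def fit_polynomial(x, y, k):
    """Return least-squares coefficients [b0, b1, ..., bk] of a degree-k polynomial."""
    m = k + 1
    # Normal equations (X^T X) beta = X^T y, written out via power sums of x.
    A = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    # Assumes the system is non-singular (at least k + 1 distinct x values).
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = (b[r] - s) / A[r][r]
    return beta


def predict(beta, x):
    """Evaluate the fitted polynomial at each value in x."""
    return [sum(bj * xi ** j for j, bj in enumerate(beta)) for xi in x]


def sse(y, y_hat):
    """Sum of squared errors between observations and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


# Made-up, noise-free data generated from y = 1 + 2x + 3x^2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

beta = fit_polynomial(x, y, 2)
y_hat = predict(beta, x)
print(beta, sse(y, y_hat))
```

Because the example data contain no noise, the fitted coefficients should recover the generating values 1, 2 and 3 up to floating-point rounding, and the SSE should be close to zero.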

Fitting a polynomial to observations raises the problem of choosing the order $k$ of the polynomial. Choosing an appropriate order is an instance of an important concept called model comparison or model selection. To keep it simple, we use the root-mean-square error (RMSE), defined by

$RMSE = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n}}$

to evaluate the goodness-of-fit of the model.
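
As a small worked example of the RMSE formula, the sketch below compares two sets of fitted values against the same observations. It is written in Python for illustration. The function name `rmse` is hypothetical, and the numbers are made up rather than taken from an actual fit.

```python
import math


def rmse(y, y_hat):
    """Root-mean-square error between observations y and fitted values y_hat."""
    n = len(y)
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n)


# Made-up observations and two hypothetical sets of fitted values.
y_obs = [1.0, 2.0, 3.0, 4.0]
fitted_a = [1.0, 2.0, 3.0, 6.0]  # one large residual of -2
fitted_b = [1.5, 2.5, 2.5, 3.5]  # four small residuals of +/- 0.5

print(rmse(y_obs, fitted_a))  # 1.0
print(rmse(y_obs, fitted_b))  # 0.5
```

The RMSE is expressed in the same units as $y$, which makes it easy to interpret. Note that when the RMSE is computed on the same data used for fitting, a least-squares polynomial of higher order never yields a larger RMSE than a lower-order one, so the smallest RMSE alone tends to favor unnecessarily high orders.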

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.