Polynomial Regression

Polynomial regression is a special case of linear regression in which the relationship between the predictor variable $x$ and the response variable $y$ is modeled by a polynomial of degree $k$ in $x$. In other words, the model includes second-order and higher powers of the variable along with the original linear term. Including these higher-order terms makes the relation between $y$ and $x$ nonlinear. The model is nevertheless a linear model, because the expected response is a linear function of the coefficients $(\beta_i)$. The model equation can be written as

$$y = \beta_0+\beta_1x+\beta_2x^2+\dots+\beta_kx^k+\epsilon\text{.}$$
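The linearity in the coefficients can be made concrete with a small sketch. It assumes NumPy is available; the values of `x` and `beta` are hypothetical and chosen only for illustration. The powers of $x$ become the columns of a design matrix `X`, and the prediction is a plain matrix-vector product.

```python
import numpy as np

# Hypothetical predictor values, chosen only for illustration
x = np.array([0.0, 1.0, 2.0, 3.0])
k = 2  # degree of the polynomial

# Design matrix with columns 1, x, x^2, ..., x^k
X = np.vander(x, N=k + 1, increasing=True)

# Hypothetical coefficients beta_0, beta_1, beta_2
beta = np.array([1.0, -2.0, 0.5])

# The prediction is a matrix-vector product, hence linear in beta
y_hat = X @ beta
print(X)
print(y_hat)
```

Although each row of `X` depends nonlinearly on $x$, doubling `beta` doubles `y_hat`, which is why the methods of linear regression still apply.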

The coefficients are estimated by fitting the polynomial to the observed data $(y)$. As in simple linear regression, discussed in the previous section, this is done by minimizing the sum of squared errors (SSE) between the observations $y_i$ and the fitted values $\hat y_i$, given by the equation

$$SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2\text{.}$$
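As a minimal sketch of this fitting step, the following example estimates the coefficients by least squares and computes the SSE. The data are synthetic and the "true" coefficients, noise level, and seed are assumptions made for illustration; `np.linalg.lstsq` is one of several ways to solve the least-squares problem.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a quadratic trend plus noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=x.size)

k = 2  # degree of the polynomial to fit

# Design matrix with columns 1, x, ..., x^k
X = np.vander(x, N=k + 1, increasing=True)

# Least-squares estimate of beta_0, ..., beta_k (minimizes the SSE)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values and sum of squared errors
y_hat = X @ beta_hat
sse = np.sum((y - y_hat) ** 2)

print("estimated coefficients:", beta_hat)
print("SSE:", sse)
```

Because the estimate minimizes the SSE, no other choice of coefficients for a degree-2 polynomial gives a smaller SSE on these data.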

Fitting a polynomial to observations raises the problem of choosing the order $k$ of the polynomial. Choosing an appropriate order is a question of model comparison, also called model selection. To keep it simple, we use the root-mean-square error (RMSE), defined by

$$RMSE = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n}}$$

to evaluate the goodness-of-fit of the model.
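The RMSE comparison across polynomial orders might be sketched as follows. The data set is synthetic and the helper `rmse` is a hypothetical name introduced here; the fit uses `numpy.polynomial.polynomial.polyfit`, which returns the coefficients ordered from $\beta_0$ to $\beta_k$.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def rmse(y, y_hat):
    """Root-mean-square error between observations and fitted values."""
    return np.sqrt(np.mean((y - y_hat) ** 2))


# Synthetic data (an assumption for illustration): a sine curve plus noise
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 40)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=x.size)

# Fit polynomials of order 1 to 6 and record the RMSE of each fit
results = {}
for k in range(1, 7):
    coeffs = P.polyfit(x, y, deg=k)  # beta_0, ..., beta_k
    y_hat = P.polyval(x, coeffs)     # fitted values
    results[k] = rmse(y, y_hat)

for k, value in results.items():
    print(f"k = {k}: RMSE = {value:.3f}")
```

Note that the RMSE here is computed on the same data used for fitting. For least-squares fits it cannot increase as $k$ grows, so a lower value on its own does not identify the best order.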

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.