A regression model relates the dependent variable (a.k.a. response variable), y, to a function of independent variables (a.k.a. explanatory or predictor variables), x, and unknown parameters (a.k.a. model coefficients) β. Such a regression model can be written as
$$y = f(x; \beta).$$
The goal of regression is to find a function such that $y \approx f(x; \beta)$ for each data pair $(x, y)$. The function $f(x; \beta)$ is called the regression function, and its free parameters $\beta$ are the model coefficients. A regression method is linear if the prediction function $f$ is a linear function of the unknown parameters $\beta$; it does not have to be linear in $x$.
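To make "linear in the parameters" concrete, here is a minimal sketch in Python. The two model functions and all numbers are hypothetical illustrations, not part of the course material. A function $f$ is linear in $\beta$ if $f(x; a\beta + b\gamma) = a\,f(x;\beta) + b\,f(x;\gamma)$ for any coefficient vectors $\beta, \gamma$ and scalars $a, b$. The check below tests this at a single point, so it can only refute linearity, not prove it.

```python
import math

def f_poly(x, beta):
    """Quadratic in x, but linear in the coefficients beta = (b0, b1, b2)."""
    b0, b1, b2 = beta
    return b0 + b1 * x + b2 * x ** 2

def f_exp(x, beta):
    """Exponential model b0 * exp(b1 * x): NOT linear in beta = (b0, b1)."""
    b0, b1 = beta
    return b0 * math.exp(b1 * x)

def is_linear_in_params(f, beta, gamma, x, a=2.0, b=-3.0):
    """Test f(x; a*beta + b*gamma) == a*f(x; beta) + b*f(x; gamma) at one x."""
    combo = [a * bi + b * gi for bi, gi in zip(beta, gamma)]
    lhs = f(x, combo)
    rhs = a * f(x, beta) + b * f(x, gamma)
    return math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)

print(is_linear_in_params(f_poly, (1.0, 2.0, 0.5), (0.0, -1.0, 3.0), x=1.5))  # True
print(is_linear_in_params(f_exp, (1.0, 0.5), (2.0, 1.0), x=1.5))             # False
```

The quadratic model therefore still counts as a linear regression model, whereas the exponential model does not.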
By extending the equation to a set of $m$ observations and $d$ explanatory variables, $x_1, \dots, x_d$, the regression model can be written as
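$$y_i = \beta_0 + \sum_{j=1}^{d} x_{ij}\beta_j + \epsilon_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_d x_{id} + \epsilon_i, \quad i = 1, 2, \dots, m, \quad x_i \in \mathbb{R}^d,$$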
where $\beta_0$ is the intercept, sometimes referred to as bias, shift, or offset, and $\epsilon_i$ is the error term. The estimates of the error terms obtained from a fitted model are called residuals.
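The model equation for a single observation can be evaluated directly. The following Python sketch assumes known, hypothetical coefficients for $d = 3$ explanatory variables; in practice $\beta_0, \dots, \beta_d$ are unknown and must be estimated, in which case the difference computed at the end is a residual rather than the true error term.

```python
def predict(x_i, beta0, beta):
    """Return beta0 + sum_j x_ij * beta_j for one observation x_i of length d."""
    if len(x_i) != len(beta):
        raise ValueError("x_i and beta must both have length d")
    return beta0 + sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))

# Hypothetical coefficients (assumed known here, for illustration only).
beta0 = 1.0
beta = [0.5, -2.0, 0.25]

x_i = [4.0, 1.0, 8.0]  # explanatory variables of one observation
y_i = 3.5              # its observed response

y_hat = predict(x_i, beta0, beta)  # 1.0 + 2.0 - 2.0 + 2.0 = 3.0
eps_i = y_i - y_hat                # error term epsilon_i = 0.5
print(y_hat, eps_i)                # 3.0 0.5
```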
A regression model based on $m$ observations (measurements) has $m$ response values, $y_1, y_2, \dots, y_m$. For ease of notation, we write the response variables as a column vector of size $m \times 1$:
$$y_{m \times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$
Moreover, for each observation $i = 1, 2, \dots, m$ we represent the $d$ associated explanatory variables as a column vector $x_i$ as well:
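$$x_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{id} \end{bmatrix} \quad \text{(e.g.)} \Rightarrow \quad \begin{bmatrix} \text{height} \\ \text{weight} \\ \vdots \\ \text{age} \end{bmatrix}$$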
Further, by transposing each $x_i$ into a row vector and stacking the $m$ observation vectors, we obtain the $m \times d$ matrix $X$:
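$$X_{m \times d} = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} \end{bmatrix}.$$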
This matrix notation closely resembles a spreadsheet, where each row corresponds to an observation and each column to a feature. Note that we assume all features are continuous-valued ($x_i \in \mathbb{R}^d$) and that there are more observations than dimensions ($m > d$).
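The spreadsheet analogy can be sketched in Python: each record is one row (observation), each key one column (feature). The records, the feature names `height`, `weight`, `age`, and the response key `y` are hypothetical. The helper `build_design` stacks each $x_i^T$ as a row of $X$ and collects the responses into an $m \times 1$ column, then checks the two stated assumptions. "Continuous-valued" cannot be verified from data alone, so finite real numbers serve as a proxy.

```python
import math

# Hypothetical observations: one dict per row (observation), one key per feature.
records = [
    {"height": 1.72, "weight": 68.0, "age": 34.0, "y": 2.1},
    {"height": 1.65, "weight": 59.5, "age": 28.0, "y": 1.8},
    {"height": 1.80, "weight": 81.2, "age": 45.0, "y": 2.6},
    {"height": 1.58, "weight": 52.3, "age": 22.0, "y": 1.5},
    {"height": 1.90, "weight": 88.0, "age": 51.0, "y": 2.9},
]
features = ["height", "weight", "age"]  # the d explanatory variables

def build_design(records, features, response="y"):
    """Stack each x_i^T as a row of X (m x d) and collect y as an m x 1 column."""
    X = [[float(r[f]) for f in features] for r in records]  # float() rejects non-numeric entries
    y = [[float(r[response])] for r in records]
    return X, y

X, y = build_design(records, features)
m, d = len(X), len(X[0])

# Proxy for the stated assumptions: finite real-valued features and m > d.
assert all(math.isfinite(v) for row in X for v in row), "features must be real-valued"
assert m > d, "need more observations than explanatory variables"

print(m, d)   # 5 3
print(X[2])   # row x_3^T: [1.8, 81.2, 45.0]
```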
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.