The logit function maps probabilities in the open interval \((0,1)\) to the entire real line. The logit model therefore writes the log-odds of an event/outcome/success \((y=1)\), given the set of predictors \(x_i\) (our data), as a linear function of those predictors:

\[\text{logit}(P(y=1|x_i))= \beta_0+ \beta_1x_1+ \beta_2x_2+ ... +\beta_kx_k\text{.}\]

To simplify, we express the inverse of the function above, the sigmoid (logistic) function, as

\[\phi(\eta) = \frac{1}{1+e^{-\eta}}\text{,}\]

where \(\eta\) is the linear combination of coefficients \((\beta_i)\) and predictor variables \((x_i)\), calculated as \(\eta = \beta_0+ \beta_1x_1+ \beta_2x_2+ ... +\beta_kx_k\).
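As a minimal sketch of the relationship between the logit, its inverse \(\phi\), and the linear predictor \(\eta\), the snippet below uses only the Python standard library. The function names, coefficient values, and the observation are hypothetical and chosen purely for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(eta):
    """Inverse of the logit: maps any real eta to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(beta, x):
    """eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k (beta[0] is the intercept)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Hypothetical coefficients and a single observation (illustration only).
beta = [-1.0, 0.8, 0.5]
x = [2.0, 0.4]

eta = linear_predictor(beta, x)  # -1.0 + 0.8*2.0 + 0.5*0.4
p = sigmoid(eta)                 # probability that y = 1
print(eta, p, logit(p))          # logit(p) recovers eta up to rounding
```

Applying `logit` to the output of `sigmoid` returns the original \(\eta\), which is what "inverse" means here. This version is not hardened against overflow for very large negative \(\eta\).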

The parameters \((\beta_i)\) of the logit model are estimated by the **method of maximum likelihood**. However, there is no closed-form solution, so the maximum likelihood estimates are obtained by using iterative algorithms such as Newton-Raphson, iteratively re-weighted least squares or gradient descent, among others.
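To illustrate what an iterative estimate looks like, here is a rough sketch that maximizes the Bernoulli log-likelihood by plain gradient ascent (equivalently, gradient descent on the negative log-likelihood). The toy data, learning rate, and iteration count are assumptions made for this example; statistical software typically relies on Newton-Raphson or iteratively re-weighted least squares instead.

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def predict_proba(beta, x):
    """P(y=1 | x, beta) for one observation; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of the logit model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(beta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logit(X, y, lr=0.1, n_iter=5000):
    """Estimate beta by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # d(log-lik)/d(eta)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Made-up, non-separable toy data: one predictor, eight observations.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

beta_hat = fit_logit(X, y)
print(beta_hat, log_likelihood(beta_hat, X, y))
```

The data are deliberately not perfectly separable; with perfectly separable classes the maximum likelihood estimates do not exist as finite values and the coefficients would keep growing.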

The output of the sigmoid function is interpreted as the probability that a particular observation belongs to class 1. It is written as \(\phi(\eta)=P(y=1|x_i,\beta_i)\), the probability of success \((y=1)\) given the predictor variables \(x_i\), parameterized by the coefficients \(\beta_i\). For example, if we compute \(\phi(\eta)=0.65\) for a particular observation, the estimated chance that this observation belongs to class 1 is 65%. Correspondingly, the probability that this observation belongs to class 0 is \(P(y=0|x_i,\beta_i)= 1 - P(y=1|x_i,\beta_i)=1-\phi(\eta)=1-0.65=0.35\), or 35%. For class assignment, the predicted probability is then converted into a binary outcome via a unit step function with a threshold of 0.5:

\[ \hat y = \begin{cases} 1, & \text{if } \phi(\eta) \ge 0.5 \\ 0, & \text{otherwise} \end{cases} \]
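The class assignment can be sketched in a few lines of Python. The helper name `predict_class` and its `threshold` parameter are hypothetical; the value 0.65 is the example probability from the text.

```python
def predict_class(p, threshold=0.5):
    """Unit step: class 1 if the predicted probability reaches the threshold."""
    return 1 if p >= threshold else 0

p_class1 = 0.65             # phi(eta) = P(y=1 | x, beta) from the example
p_class0 = 1.0 - p_class1   # P(y=0 | x, beta)
label = predict_class(p_class1)
print(p_class0, label)
```

Note that a probability of exactly 0.5 is assigned to class 1, matching the \(\ge\) in the step function.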