The logit function maps probabilities from the range $$(0, 1)$$ onto the entire real number line. Thus, the probability of success $$(y=1)$$, given the set of predictors $$x_i$$ (our data), is written as

$logit(P(y=1|x_i))= \beta_0+ \beta_1x_1+ \beta_2x_2+ ... +\beta_kx_k\text{.}$

For simplicity, we express the inverse of the function above as

$\phi(\eta) = \frac{1}{1+e^{-\eta}}\text{,}$

where $$\eta$$ is the linear combination of coefficients $$(\beta_i)$$ and predictor variables $$(x_i)$$, calculated as $$\eta = \beta_0+ \beta_1x_1+ \beta_2x_2+ ... +\beta_kx_k$$.
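A minimal sketch of this computation in Python, assuming illustrative (hypothetical) values for the coefficients $$\beta_i$$ and a single observation $$x_i$$:

```python
import math

def sigmoid(eta):
    """Logistic (inverse-logit) function: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_combination(beta, x):
    """eta = beta_0 + beta_1*x_1 + ... + beta_k*x_k; beta[0] is the intercept."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Hypothetical coefficients and one observation (illustrative values only)
beta = [0.5, 1.2, -0.7]   # beta_0, beta_1, beta_2
x = [2.0, 1.0]            # x_1, x_2

eta = linear_combination(beta, x)
p = sigmoid(eta)          # estimated probability that y = 1
```

Note that `sigmoid(0)` returns exactly 0.5: an observation whose linear combination is zero is on the decision boundary.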

The parameters $$(\beta_i)$$ of the logit model are estimated by the method of maximum likelihood. However, there is no closed-form solution, so the maximum likelihood estimates are obtained by using iterative algorithms such as Newton-Raphson, iteratively re-weighted least squares or gradient descent, among others.
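One of the iterative approaches mentioned above, gradient-based optimization, can be sketched as follows: we repeatedly move the coefficients in the direction of the gradient of the log-likelihood, $$\sum_i (y_i - \phi(\eta_i))\,x_i$$. This is a minimal batch gradient-ascent sketch on a tiny made-up dataset, not a production estimator (no convergence check, fixed learning rate):

```python
import math

def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logit(X, y, lr=0.1, n_iter=5000):
    """Maximum-likelihood fit via batch gradient ascent on the log-likelihood.

    X: list of observations (each a list of predictors), y: list of 0/1 labels.
    Returns [beta_0, beta_1, ..., beta_k], where beta_0 is the intercept.
    """
    k = len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum_i (y_i - phi(eta_i)) * x_i
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = yi - sigmoid(eta)
            grad[0] += err                      # intercept term (x_0 = 1)
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Tiny illustrative dataset: y tends to be 1 when the predictor is large
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
beta = fit_logit(X, y)
```

Newton-Raphson and iteratively re-weighted least squares follow the same idea but also use second-order (curvature) information, which typically converges in far fewer iterations.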

The output of the sigmoid function is interpreted as the probability of a particular observation belonging to class 1. It is written as $$\phi(\eta)=P(y=1|x_i,\beta_i)$$, the probability of success $$(y=1)$$ given the predictor variables $$x_i$$ parameterized by the coefficients $$\beta_i$$. For example, if we compute $$\phi(\eta)=0.65$$ for a particular observation, this means that the probability that this observation belongs to class 1 is 65%. Similarly, the probability that this observation belongs to class 0 is calculated as $$P(y=0|x_i,\beta_i)= 1 - P(y=1|x_i,\beta_i)=1-0.65=0.35$$ or 35%. For class assignment, the predicted probability is then converted into a binary outcome via a unit step function:

$\hat y = \begin{cases} 1, & \text{if } \phi(\eta) \ge 0.5 \\ 0, & \text{otherwise} \end{cases}$