you a desirable model size.

Finally, let's take a minute to talk about what the Logistic Regression model
actually looks like in case you're not already familiar with it. We'll denote
the label as \\(Y\\), and the set of observed features as a feature vector
\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). We define \\(Y=1\\) if an individual earned >
50,000 dollars and \\(Y=0\\) otherwise. In Logistic Regression, the probability of
the label being positive (\\(Y=1\\)) given the features \\(\mathbf{x}\\) is given
as:

$$P(Y=1|\mathbf{x}) = \frac{1}{1+\exp(-(\mathbf{w}^T\mathbf{x}+b))}$$

where \\(\mathbf{w}=[w_1, w_2, ..., w_d]\\) are the model weights for the features
\\(\mathbf{x}=[x_1, x_2, ..., x_d]\\). \\(b\\) is a constant that is often called
the **bias** of the model. The equation consists of two parts, a linear model and
a logistic function:

*   **Linear Model**: First, we can see that \\(\mathbf{w}^T\mathbf{x}+b = b +
    w_1x_1 + ... + w_dx_d\\) is a linear model where the output is a linear
    function of the input features \\(\mathbf{x}\\). The bias \\(b\\) is the
    prediction one would make without observing any features. The model weight
    \\(w_i\\) reflects how the feature \\(x_i\\) is correlated with the positive
    label. If \\(x_i\\) is positively correlated with the positive label, the
    weight \\(w_i\\) increases, and the probability \\(P(Y=1|\mathbf{x})\\) will be
    closer to 1. On the other hand, if \\(x_i\\) is negatively correlated with the
    positive label, then the weight \\(w_i\\) decreases and the probability
    \\(P(Y=1|\mathbf{x})\\) will be closer to 0.

*   **Logistic Function**: Second, we can see that there's a logistic function
    (also known as the sigmoid function) \\(S(t) = 1/(1+\exp(-t))\\) being applied
    to the linear model. The logistic function is used to convert the output of
    the linear model \\(\mathbf{w}^T\mathbf{x}+b\\) from any real number into the
    range of \\([0, 1]\\), which can be interpreted as a probability (see the
    sketch below).
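
To make the two parts concrete, here is a minimal NumPy sketch of this
computation; the weights, bias, and feature values below are made up for
illustration, not values learned from the Census data:

```python
import numpy as np

def predict_proba(w, x, b):
    """Compute P(Y=1|x) = 1 / (1 + exp(-(w^T x + b)))."""
    logit = np.dot(w, x) + b             # linear model: b + w_1*x_1 + ... + w_d*x_d
    return 1.0 / (1.0 + np.exp(-logit))  # logistic (sigmoid) function

# Hypothetical weights, bias, and feature vector for a 3-feature model.
w = np.array([0.8, -1.2, 0.5])
b = -0.3
x = np.array([1.0, 0.0, 2.0])

print(predict_proba(w, x, b))  # ~0.82, a probability in [0, 1]
```

Raising a weight \\(w_i\\) whose feature value \\(x_i\\) is positive increases
the logit and moves the output toward 1, matching the intuition above.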

Model training is an optimization problem: The goal is to find a set of model
weights (i.e. model parameters) to minimize a **loss function** defined over the