@@ -14,9 +14,9 @@ MSE is one of the most commonly used cost functions, particularly in regression
The MSE is defined as:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
- - \( n \) is the number of samples.
- - \( y_i \) is the actual value.
- - \( y^i \) is the predicted value.
+ - $n$ is the number of samples.
+ - $y_i$ is the actual value.
+ - $\hat{y}_i$ is the predicted value.

**Advantages:**
- Sensitive to large errors due to squaring.
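For quick reference, a minimal NumPy sketch of the MSE formula above (the name `mse_loss` and the use of NumPy arrays are assumptions, not part of this file):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    # Mean of squared residuals between actual and predicted values.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```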
@@ -43,9 +43,9 @@ MAE is another commonly used cost function for regression tasks. It measures the
The MAE is defined as:
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
- - \( n \) is the number of samples.
- - \( y_i \) is the actual value.
- - \( y^i \) is the predicted value.
+ - $n$ is the number of samples.
+ - $y_i$ is the actual value.
+ - $\hat{y}_i$ is the predicted value.

**Advantages:**
- Less sensitive to outliers compared to MSE.
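Likewise, a minimal sketch of the MAE, assuming NumPy inputs (the name `mae_loss` is illustrative):

```python
import numpy as np

def mae_loss(y_true, y_pred):
    # Mean of absolute residuals; each error contributes linearly.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))
```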
@@ -76,9 +76,9 @@ For binary classification, the cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$

Where:
- - \( n \) is the number of samples.
- - \( y_i \) is the actual class label (0 or 1).
- - \( y^i \) is the predicted probability of the positive class.
+ - $n$ is the number of samples.
+ - $y_i$ is the actual class label (0 or 1).
+ - $\hat{y}_i$ is the predicted probability of the positive class.

**Advantages:**
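A minimal sketch of the binary cross-entropy above, assuming NumPy inputs and predicted probabilities in (0, 1); the function name and the clipping constant `eps` are assumptions added to avoid log(0):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true holds 0/1 labels, y_pred holds predicted probabilities.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```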
@@ -109,11 +109,10 @@ The multiclass cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$

Where:
- - \( n \) is the number of samples.
- - \( C \) is the number of classes.
- - \( y_ {i,c} \) is the indicator function for the true class of sample \( i \) .
-
- - (y^i,c) is the predicted probability of sample \( i \) belonging to class \( c \) .
+ - $n$ is the number of samples.
+ - $C$ is the number of classes.
+ - $y_{i,c}$ is the indicator for the true class of sample $i$ (1 if $c$ is the true class, 0 otherwise).
+ - $\hat{y}_{i,c}$ is the predicted probability of sample $i$ belonging to class $c$.

**Advantages:**
- Handles multiple classes effectively.
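A minimal sketch of the multiclass version, assuming `y_true` is one-hot encoded and each row of `y_pred` contains class probabilities (the names and `eps` clipping are assumptions):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: (n, C) one-hot indicators; y_pred: (n, C) class probabilities.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)  # avoid log(0)
    # Sum over classes, then average over samples.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```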
@@ -143,9 +142,9 @@ For binary classification, the hinge loss is defined as:
$$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$

Where:
- - \( n \) is the number of samples.
- - \( y_i \) is the actual class label (-1 or 1).
- - \( \ hat{y}_ i \) is the predicted score for sample \( i \) .
+ - $n$ is the number of samples.
+ - $y_i$ is the actual class label (-1 or 1).
+ - $\hat{y}_i$ is the predicted score for sample $i$.

**Advantages:**
- Encourages margin maximization in SVMs.
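A minimal sketch of the hinge loss above, assuming -1/+1 labels and raw decision scores (the name `hinge_loss` is illustrative):

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    # y_true holds -1/+1 labels, y_pred holds raw (unthresholded) scores.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))
```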
@@ -182,8 +181,8 @@ $$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\right. $$

Where:
- - \( n \) is the number of samples.
- - \( delta\) is a threshold parameter.
+ - $n$ is the number of samples.
+ - $\delta$ is the threshold that sets where the loss switches from quadratic to linear.

**Advantages:**
- Provides a smooth loss function.
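A minimal sketch of the Huber loss, assuming the standard piecewise definition (quadratic for errors within $\delta$, linear beyond); the name `huber_loss` and the default `delta=1.0` are assumptions:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    quadratic = 0.5 * error ** 2                    # used where |error| <= delta
    linear = delta * (np.abs(error) - 0.5 * delta)  # used where |error| > delta
    return np.mean(np.where(np.abs(error) <= delta, quadratic, linear))
```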
@@ -214,7 +213,7 @@ The Log-Cosh loss is defined as:
$$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$

Where:
- - \( n \) is the number of samples.
+ - $n$ is the number of samples.

**Advantages:**
- Smooth and differentiable everywhere.
@@ -234,5 +233,3 @@ def logcosh_loss(y_true, y_pred):
```

These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its own advantages and disadvantages, making it a better fit for some scenarios and problem domains than others.
-
- ---