
Commit f125cf4

Update cost-functions.md
1 parent c774608 commit f125cf4

1 file changed (+19 -22 lines)


contrib/machine-learning/cost-functions.md

@@ -14,9 +14,9 @@ MSE is one of the most commonly used cost functions, particularly in regression
 The MSE is defined as:
 $$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
 Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual value.
-- \( y^i\) is the predicted value.
+- `n` is the number of samples.
+- $y_i$ is the actual value.
+- $\hat{y}_i$ is the predicted value.
 
 **Advantages:**
 - Sensitive to large errors due to squaring.
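
For reference, a minimal NumPy sketch of the MSE formula shown in this hunk; the helper name `mse_loss` is illustrative and not part of the committed file:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean Squared Error: average of the squared differences (y_i - y_hat_i)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)
```

For example, `mse_loss([3.0, 5.0], [2.5, 5.5])` evaluates to 0.25.
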
@@ -43,9 +43,9 @@ MAE is another commonly used cost function for regression tasks. It measures the
 The MAE is defined as:
 $$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
 Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual value.
-- \( y^i\) is the predicted value.
+- `n` is the number of samples.
+- $y_i$ is the actual value.
+- $\hat{y}_i$ is the predicted value.
 
 **Advantages:**
 - Less sensitive to outliers compared to MSE.
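
A matching NumPy sketch of the MAE formula above; again, the helper name is illustrative rather than taken from the file:

```python
import numpy as np

def mae_loss(y_true, y_pred):
    """Mean Absolute Error: average of |y_i - y_hat_i|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))
```
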
@@ -76,9 +76,9 @@ For binary classification, the cross-entropy loss is defined as:
 $$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
 
 Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual class label (0 or 1).
-- \( y^i\) is the predicted probability of the positive class.
+- `n` is the number of samples.
+- $y_i$ is the actual class label (0 or 1).
+- $\hat{y}_i$ is the predicted probability of the positive class.
 
 
 **Advantages:**
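
A minimal NumPy sketch of the binary cross-entropy formula shown in this hunk; the `eps` clipping is an assumption added here to avoid `log(0)` and is not part of the diff:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
```
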
@@ -109,11 +109,10 @@ The multiclass cross-entropy loss is defined as:
 $$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
 
 Where:
-- \( n \) is the number of samples.
-- \( C \) is the number of classes.
-- \( y_{i,c} \) is the indicator function for the true class of sample \( i \).
-
-- (y^i,c) is the predicted probability of sample \( i \) belonging to class \( c \).
+- `n` is the number of samples.
+- `C` is the number of classes.
+- $y_{i,c}$ is the indicator function for the true class of sample `i`.
+- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`.
 
 **Advantages:**
 - Handles multiple classes effectively.
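
A NumPy sketch of the multiclass formula above, assuming one-hot labels `y_true` of shape `(n, C)`; the helper name and the `eps` clipping are illustrative assumptions:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Multiclass cross-entropy for one-hot labels and predicted class probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    # Sum over classes for each sample, then average over samples.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
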
@@ -143,9 +142,9 @@ For binary classification, the hinge loss is defined as:
 $$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$
 
 Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual class label (-1 or 1).
-- \( \hat{y}_i \) is the predicted score for sample \( i \).
+- `n` is the number of samples.
+- $y_i$ is the actual class label (-1 or 1).
+- $\hat{y}_i$ is the predicted score for sample \( i \).
 
 **Advantages:**
 - Encourages margin maximization in SVMs.
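
A small NumPy sketch of the hinge loss shown in this hunk, assuming labels in {-1, +1} and raw decision scores; the name `hinge_loss` is illustrative:

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    """Hinge loss: zero when y_i * score_i >= 1, linear penalty otherwise."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))
```
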
@@ -182,8 +181,8 @@ $$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
 \right.$$
 
 Where:
-- \( n \) is the number of samples.
-- \(delta\) is a threshold parameter.
+- `n` is the number of samples.
+- $\delta$ is a threshold parameter.
 
 **Advantages:**
 - Provides a smooth loss function.
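
A NumPy sketch of the piecewise Huber definition referenced by this hunk, with `delta` as the threshold parameter; the helper is illustrative and not part of the committed file:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return np.mean(np.where(abs_error <= delta, quadratic, linear))
```
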
@@ -214,7 +213,7 @@ The Log-Cosh loss is defined as:
 $$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$
 
 Where:
-- \( n \) is the number of samples.
+- `n` is the number of samples.
 
 **Advantages:**
 - Smooth and differentiable everywhere.
@@ -234,5 +233,3 @@ def logcosh_loss(y_true, y_pred):
 ```
 
 These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.
-
----
