@@ -12,7 +12,7 @@ MSE is one of the most commonly used cost functions, particularly in regression
**Mathematical Formulation:**
The MSE is defined as:
- $$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
+ $$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
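As a quick numerical check of the formula above, here is a minimal NumPy sketch; the function and variable names are illustrative and not taken from the file being changed.

```python
import numpy as np

def mse_sketch(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Residuals 0.5, 0.5, 0.0 -> squared 0.25, 0.25, 0.0 -> mean ~0.167
print(mse_sketch([3.0, -0.5, 2.0], [2.5, 0.0, 2.0]))  # -> 0.1666...
```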
@@ -41,7 +41,7 @@ MAE is another commonly used cost function for regression tasks. It measures the
**Mathematical Formulation:**
The MAE is defined as:
- $$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
+ $$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
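For contrast with MSE, a minimal sketch of this formula under the same illustrative naming; it shows how a single outlier shifts MAE far less than it would shift MSE.

```python
import numpy as np

def mae_sketch(y_true, y_pred):
    """Average of absolute differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# The penalty grows only linearly with the outlier residual of 10.
print(mae_sketch([3.0, -0.5, 2.0], [2.5, 0.0, 12.0]))  # (0.5 + 0.5 + 10.0) / 3 = 3.666...
```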
@@ -70,8 +70,11 @@ def mean_absolute_error(y_true, y_pred):
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.
**Mathematical Formulation:**
+
For binary classification, the cross-entropy loss is defined as:
- $$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
+
+ $$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
+
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (0 or 1).
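A minimal sketch of this formula; the clipping step and `eps` value are assumptions added here to keep \( \log(0) \) from producing infinities, and the names are illustrative.

```python
import numpy as np

def binary_cross_entropy_sketch(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood for labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Confident correct predictions give a small loss; confident wrong ones inflate it.
print(binary_cross_entropy_sketch([1, 0, 1], [0.9, 0.1, 0.8]))  # ~0.14
print(binary_cross_entropy_sketch([1, 0, 1], [0.1, 0.9, 0.2]))  # ~2.07
```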
@@ -100,8 +103,11 @@ def binary_cross_entropy(y_true, y_pred):
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.
**Mathematical Formulation:**
+
The multiclass cross-entropy loss is defined as:
- $$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$
+
+ $$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$
+
Where:
- \( n \) is the number of samples.
- \( C \) is the number of classes.
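A minimal sketch of the multiclass form, assuming one-hot encoded labels; the names and the clipping constant are illustrative, not taken from the file.

```python
import numpy as np

def categorical_cross_entropy_sketch(y_true, y_pred, eps=1e-12):
    """y_true: one-hot labels, shape (n, C); y_pred: predicted probabilities, shape (n, C)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    # Only the log-probability assigned to the correct class contributes per sample.
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])               # true classes 0 and 1
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted class probabilities
print(categorical_cross_entropy_sketch(y_true, y_pred))  # -(ln 0.7 + ln 0.8) / 2 ~ 0.29
```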
@@ -131,8 +137,11 @@ def categorical_cross_entropy(y_true, y_pred):
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It applies a linear penalty to predictions that are misclassified or fall inside the margin.
**Mathematical Formulation:**
+
For binary classification, the hinge loss is defined as:
- $$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$
+
+ $$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$
+
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (-1 or 1).
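A minimal sketch of the formula; note that it expects labels coded as -1/1 and raw decision scores rather than probabilities. Names and the example scores are illustrative.

```python
import numpy as np

def hinge_loss_sketch(y_true, y_pred):
    """y_true: labels in {-1, 1}; y_pred: raw decision-function scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # No penalty once a sample is on the correct side of the margin (y * score >= 1).
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

# Correct with margin -> 0 loss; inside the margin or misclassified -> positive loss.
print(hinge_loss_sketch([1, -1, 1], [2.0, -1.5, 0.3]))  # (0 + 0 + 0.7) / 3 = 0.2333...
```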
@@ -165,17 +174,16 @@ Huber loss is a combination of MSE and MAE, providing a compromise between the t
The Huber loss is defined as:
- $$
- \text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
+ $$ \text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
\end{array}
- \right.
- $$
+ \right. $$
+
Where:
- \( n \) is the number of samples.
- - \( \ delta \) is a threshold parameter.
+ - \( \delta \) is a threshold parameter.
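A minimal NumPy sketch of the piecewise definition above, using `np.where` to switch between the quadratic and linear branches; the names and the example value of \( \delta \) are illustrative.

```python
import numpy as np

def huber_loss_sketch(y_true, y_pred, delta=1.0):
    """Quadratic for residuals up to delta, linear beyond it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * residual ** 2
    linear = delta * (residual - 0.5 * delta)
    return np.mean(np.where(residual <= delta, quadratic, linear))

# Small residuals behave like MSE; the outlier (residual 10) is only penalized linearly.
print(huber_loss_sketch([3.0, -0.5, 2.0], [2.5, 0.0, 12.0], delta=1.0))
# (0.125 + 0.125 + 9.5) / 3 = 3.25
```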
**Advantages:**
- Provides a smooth loss function.
@@ -200,8 +208,11 @@ def huber_loss(y_true, y_pred, delta):
Log-Cosh loss is a smooth approximation of the MAE and is less sensitive to outliers than MSE. It provides a smooth transition from quadratic for small errors to linear for large errors.
**Mathematical Formulation:**
+
The Log-Cosh loss is defined as:
- $$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$
+
+ $$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$
+
Where:
- \( n \) is the number of samples.
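A minimal sketch of the formula; for very large residuals `np.cosh` can overflow, so treat this as a didactic version only. Names and example values are illustrative.

```python
import numpy as np

def logcosh_loss_sketch(y_true, y_pred):
    """Mean of log(cosh(residual)): ~x^2/2 for small residuals, ~|x| - log(2) for large ones."""
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.log(np.cosh(residual)))

# Close to MSE/2 for the small residuals, close to MAE for the large one.
print(logcosh_loss_sketch([3.0, -0.5, 2.0], [2.5, 0.0, 5.0]))  # ~0.85
```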
@@ -224,4 +235,4 @@ def logcosh_loss(y_true, y_pred):
These implementations cover a range of cost functions for different machine learning tasks. Each has its own advantages and disadvantages, which make it better suited to some scenarios and problem domains than others.
- ---
+ ---