Commit 1915923

Merge pull request animator#726 from manishh12/main
Added types of cost functions
2 parents f671f4d + f125cf4 commit 1915923

File tree

2 files changed, +236 -0 lines

contrib/machine-learning/cost-functions.md

+235
@@ -0,0 +1,235 @@

# Cost Functions in Machine Learning

Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data.

## Types of Cost Functions

### 1. Mean Squared Error (MSE)

**Explanation:**
MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values.

**Mathematical Formulation:**
The MSE is defined as:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Where:
- `n` is the number of samples.
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.

**Advantages:**
- Sensitive to large errors due to squaring.
- Differentiable and convex, facilitating optimization.

**Disadvantages:**
- Sensitive to outliers, as the squared term amplifies their impact.

**Python Implementation:**
```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Mean of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)
```
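
As a quick sanity check, the function above can be called on small NumPy arrays; the values below are made up purely for illustration:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative ground-truth values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # illustrative predictions

print(mean_squared_error(y_true, y_pred))  # 0.375
```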

### 2. Mean Absolute Error (MAE)

**Explanation:**
MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values.

**Mathematical Formulation:**
The MAE is defined as:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Where:
- `n` is the number of samples.
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.

**Advantages:**
- Less sensitive to outliers compared to MSE (illustrated in the comparison after the implementation below).
- Provides a linear error term, which can be easier to interpret.

**Disadvantages:**
- Not differentiable at zero, which can complicate gradient-based optimization.

**Python Implementation:**
```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Mean of the absolute differences between actual and predicted values
    return np.mean(np.abs(y_true - y_pred))
```
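
To illustrate the outlier sensitivity mentioned above, the following sketch compares MSE and MAE on the same predictions before and after one value is corrupted by a large error. The numbers are invented for illustration, and the snippet reuses the `mean_squared_error` and `mean_absolute_error` functions defined earlier:

```python
import numpy as np

y_true   = np.array([1.0, 2.0, 3.0, 4.0])
clean    = np.array([1.1, 1.9, 3.2, 3.8])    # small errors everywhere
outliers = np.array([1.1, 1.9, 3.2, 14.0])   # one prediction is far off

print(mean_squared_error(y_true, clean),    mean_absolute_error(y_true, clean))     # ~0.025, ~0.15
print(mean_squared_error(y_true, outliers), mean_absolute_error(y_true, outliers))  # ~25.0,  ~2.6
```

The squared penalty lets a single outlier dominate the MSE, while the MAE grows only linearly with the same error.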

### 3. Cross-Entropy Loss (Binary)

**Explanation:**
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.

**Mathematical Formulation:**

For binary classification, the cross-entropy loss is defined as:

$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

Where:
- `n` is the number of samples.
- $y_i$ is the actual class label (0 or 1).
- $\hat{y}_i$ is the predicted probability of the positive class.

**Advantages:**
- Penalizes confident wrong predictions heavily (see the example below).
- Suitable for probabilistic outputs.

**Disadvantages:**
- Sensitive to class imbalance.

**Python Implementation:**
```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Clip predicted probabilities away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```
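
The heavy penalty on confident wrong predictions can be seen with a small made-up example that reuses the `binary_cross_entropy` function above:

```python
import numpy as np

y_true = np.array([1, 0, 1, 0])

cautious      = np.array([0.7, 0.3, 0.6, 0.4])     # moderately confident, all on the correct side
overconfident = np.array([0.99, 0.01, 0.01, 0.99]) # very confident, but wrong on the last two

print(binary_cross_entropy(y_true, cautious))       # ~0.43
print(binary_cross_entropy(y_true, overconfident))  # ~2.31, dominated by the confident mistakes
```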

### 4. Cross-Entropy Loss (Multiclass)

**Explanation:**
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.

**Mathematical Formulation:**

The multiclass cross-entropy loss is defined as:

$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$

Where:
- `n` is the number of samples.
- `C` is the number of classes.
- $y_{i,c}$ is 1 if sample `i` belongs to class `c` and 0 otherwise (one-hot encoding).
- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`.

**Advantages:**
- Handles multiple classes effectively.
- Encourages the model to assign high probabilities to the correct classes.

**Disadvantages:**
- Requires one-hot encoding for class labels, which can increase computational complexity.

**Python Implementation:**
```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # y_true: one-hot encoded labels; y_pred: predicted class probabilities
    # Clip predicted probabilities away from 0 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
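
A minimal usage sketch with one-hot labels (three samples, three classes; the probabilities are invented) using the `categorical_cross_entropy` function above:

```python
import numpy as np

# One-hot encoded true labels: classes 0, 1, and 2
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Predicted class probabilities; each row sums to 1
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6]])

print(categorical_cross_entropy(y_true, y_pred))  # ~0.364
```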

### 5. Hinge Loss (SVM)

**Explanation:**
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It applies a linear penalty to misclassified samples and to correctly classified samples that fall inside the margin.

**Mathematical Formulation:**

For binary classification, the hinge loss is defined as:

$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$

Where:
- `n` is the number of samples.
- $y_i$ is the actual class label (-1 or 1).
- $\hat{y}_i$ is the predicted score for sample $i$.

**Advantages:**
- Encourages margin maximization in SVMs.
- Robust to outliers due to the linear penalty.

**Disadvantages:**
- Not differentiable at the margin, which can complicate optimization.

**Python Implementation:**
```python
import numpy as np

def hinge_loss(y_true, y_pred):
    # y_true: labels in {-1, +1}; y_pred: raw decision scores
    loss = np.maximum(0, 1 - y_true * y_pred)
    return np.mean(loss)
```
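
A quick illustrative call (labels and scores made up) using the `hinge_loss` function above:

```python
import numpy as np

y_true = np.array([1, -1, 1, -1])          # labels must be -1 or +1
y_pred = np.array([2.0, -0.5, 0.3, 1.5])   # raw decision scores, not probabilities

# Margins y * score:  2.0, 0.5, 0.3, -1.5
# Per-sample losses:  0.0, 0.5, 0.7,  2.5  -> mean = 0.925
print(hinge_loss(y_true, y_pred))
```

Only samples with a margin below 1 contribute to the loss; confidently correct predictions cost nothing.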

### 6. Huber Loss

**Explanation:**
Huber loss combines MSE and MAE: it is quadratic for small errors and linear for large ones, making it less sensitive to outliers than MSE while remaining smooth near zero.

**Mathematical Formulation:**

The Huber loss is defined as:

$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta \left( |y_i - \hat{y}_i| - \frac{1}{2} \delta \right) & \text{otherwise}
\end{array}
\right.$$

Where:
- `n` is the number of samples.
- $\delta$ is a threshold parameter that controls where the loss switches from quadratic to linear.

**Advantages:**
- Provides a smooth loss function.
- Less sensitive to outliers than MSE.

**Disadvantages:**
- Requires tuning of the threshold parameter $\delta$.

**Python Implementation:**
```python
import numpy as np

def huber_loss(y_true, y_pred, delta):
    error = y_true - y_pred
    # Quadratic for |error| <= delta, linear beyond it
    loss = np.where(np.abs(error) <= delta,
                    0.5 * error ** 2,
                    delta * (np.abs(error) - 0.5 * delta))
    return np.mean(loss)
```
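
The effect of $\delta$ can be seen with invented numbers, reusing the `huber_loss` function above: a small $\delta$ caps the influence of the outlier, while a very large $\delta$ makes the loss behave like half of the MSE.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 1.8, 3.1, 9.0])   # the last prediction is a large outlier

print(huber_loss(y_true, y_pred, delta=1.0))   # ~1.14, outlier penalized only linearly
print(huber_loss(y_true, y_pred, delta=10.0))  # ~3.14, effectively 0.5 * MSE here
```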

### 7. Log-Cosh Loss

**Explanation:**
Log-Cosh loss is a smooth regression loss: it behaves approximately like MSE for small errors and like MAE for large errors, which makes it less sensitive to outliers than MSE while staying differentiable everywhere.

**Mathematical Formulation:**

The Log-Cosh loss is defined as:

$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$

Where:
- `n` is the number of samples.
- $y_i$ is the actual value.
- $\hat{y}_i$ is the predicted value.

**Advantages:**
- Smooth and differentiable everywhere.
- Less sensitive to outliers than MSE.

**Disadvantages:**
- Computationally more expensive than simple losses like MSE.

**Python Implementation:**
```python
import numpy as np

def logcosh_loss(y_true, y_pred):
    error = y_true - y_pred
    # log(cosh(x)) ~ x^2 / 2 for small x and |x| - log(2) for large x
    loss = np.log(np.cosh(error))
    return np.mean(loss)
```
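
For intuition, the sketch below (same invented outlier data as in the Huber example) compares log-cosh with MSE, reusing the functions defined earlier on this page:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 1.8, 3.1, 9.0])   # one large outlier

print(mean_squared_error(y_true, y_pred))  # ~6.27, dominated by the outlier
print(logcosh_loss(y_true, y_pred))        # ~1.09, the outlier contributes roughly |error| - log(2)
```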

These implementations provide a range of cost functions for different machine learning tasks. Each has its own advantages and disadvantages, making it appropriate for particular scenarios and problem domains.

contrib/machine-learning/index.md

+1
@@ -13,5 +13,6 @@
- [PyTorch.md](pytorch.md)
- [Types of optimizers](types-of-optimizers.md)
- [Logistic Regression](logistic-regression.md)
- [Types_of_Cost_Functions](cost-functions.md)
- [Clustering](clustering.md)
- [Grid Search](grid-search.md)
