
Commit 56e9721

added types of cost functions issue#625
1 parent 3f999a6 commit 56e9721

File tree

2 files changed: +228 -0 lines changed

@@ -0,0 +1,227 @@

# Cost Functions in Machine Learning

Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data.

## Types of Cost Functions

### 1. Mean Squared Error (MSE)

**Explanation:**
MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values.

**Mathematical Formulation:**
The MSE is defined as:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
- \( \hat{y}_i \) is the predicted value.

**Advantages:**
- Sensitive to large errors due to squaring.
- Differentiable and convex, facilitating optimization.

**Disadvantages:**
- Sensitive to outliers, as the squared term amplifies their impact.

**Python Implementation:**
```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)
```
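
As a quick illustrative check (the values below are arbitrary), the function can be called on small NumPy arrays:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Squared errors are [0.25, 0.25, 0.0, 1.0], so the mean is 0.375
print(mean_squared_error(y_true, y_pred))  # 0.375
```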

### 2. Mean Absolute Error (MAE)

**Explanation:**
MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values.

**Mathematical Formulation:**
The MAE is defined as:
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
- \( \hat{y}_i \) is the predicted value.

**Advantages:**
- Less sensitive to outliers compared to MSE.
- Provides a linear error term, which can be easier to interpret.

**Disadvantages:**
- Not differentiable at zero, which can complicate optimization.

**Python Implementation:**
```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between actual and predicted values
    return np.mean(np.abs(y_true - y_pred))
```

### 3. Cross-Entropy Loss (Binary)

**Explanation:**
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.

**Mathematical Formulation:**
For binary classification, the cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (0 or 1).
- \( \hat{y}_i \) is the predicted probability of the positive class.

**Advantages:**
- Penalizes confident wrong predictions heavily.
- Suitable for probabilistic outputs.

**Disadvantages:**
- Sensitive to class imbalance.

**Python Implementation:**
```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```

### 4. Cross-Entropy Loss (Multiclass)

**Explanation:**
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.

**Mathematical Formulation:**
The multiclass cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$
Where:
- \( n \) is the number of samples.
- \( C \) is the number of classes.
- \( y_{i,c} \) is 1 if sample \( i \) belongs to class \( c \) and 0 otherwise (the one-hot indicator).
- \( \hat{y}_{i,c} \) is the predicted probability of sample \( i \) belonging to class \( c \).

**Advantages:**
- Handles multiple classes effectively.
- Encourages the model to assign high probabilities to the correct classes.

**Disadvantages:**
- Requires one-hot encoding for class labels, which can increase computational complexity.

**Python Implementation:**
```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # y_true: one-hot labels of shape (n, C); y_pred: predicted probabilities of shape (n, C)
    y_pred = np.clip(y_pred, 1e-15, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
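
A small illustrative call (the numbers are made up), showing the expected one-hot label format:

```python
import numpy as np

# Three samples, three classes; labels are one-hot, each prediction row sums to 1
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6]])

# Mean of -log(0.7), -log(0.8) and -log(0.6), roughly 0.364
print(categorical_cross_entropy(y_true, y_pred))
```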

### 5. Hinge Loss (SVM)

**Explanation:**
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes predictions that fall on the wrong side of the margin, with a penalty that grows linearly with the violation.

**Mathematical Formulation:**
For binary classification, the hinge loss is defined as:
$$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (-1 or 1).
- \( \hat{y}_i \) is the predicted score for sample \( i \).

**Advantages:**
- Encourages margin maximization in SVMs.
- Robust to outliers due to the linear penalty.

**Disadvantages:**
- Not differentiable at the margin, which can complicate optimization.

**Python Implementation:**
```python
import numpy as np

def hinge_loss(y_true, y_pred):
    # y_true must use the -1/+1 label encoding; y_pred are raw decision scores
    loss = np.maximum(0, 1 - y_true * y_pred)
    return np.mean(loss)
```
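
A brief usage sketch with arbitrary scores; note that the labels are encoded as -1/+1 rather than 0/1:

```python
import numpy as np

y_true = np.array([1, -1, 1, -1])           # labels in {-1, +1}
y_pred = np.array([0.8, -0.5, -0.2, 0.3])   # raw decision scores

# Per-sample losses are [0.2, 0.5, 1.2, 1.3], so the mean is 0.8
print(hinge_loss(y_true, y_pred))  # 0.8
```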

### 6. Huber Loss

**Explanation:**
Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors.

**Mathematical Formulation:**
The Huber loss is defined as:

$$
\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta \left( |y_i - \hat{y}_i| - \frac{1}{2} \delta \right) & \text{otherwise}
\end{array}
\right.
$$

Where:
- \( n \) is the number of samples.
- \( \delta \) is a threshold parameter.

**Advantages:**
- Provides a smooth loss function.
- Less sensitive to outliers than MSE.

**Disadvantages:**
- Requires tuning of the threshold parameter.

**Python Implementation:**
```python
import numpy as np

def huber_loss(y_true, y_pred, delta):
    # Quadratic penalty for errors within delta, linear penalty beyond delta
    error = y_true - y_pred
    loss = np.where(np.abs(error) <= delta, 0.5 * error ** 2, delta * (np.abs(error) - 0.5 * delta))
    return np.mean(loss)
```
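
To illustrate the effect of the threshold (the numbers below are arbitrary), the same residuals yield a smaller loss for a smaller \( \delta \), because the single large error is only penalized linearly:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.5, 2.0, 3.5, 4.0])   # one large error of 6.0

print(huber_loss(y_true, y_pred, delta=1.0))  # 1.4375
print(huber_loss(y_true, y_pred, delta=3.0))  # 3.4375
```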

### 7. Log-Cosh Loss

**Explanation:**
Log-Cosh loss is a smooth alternative to MSE that is less sensitive to outliers: it behaves roughly quadratically for small errors and transitions to a linear, MAE-like penalty for large errors.

**Mathematical Formulation:**
The Log-Cosh loss is defined as:
$$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$
Where:
- \( n \) is the number of samples.

**Advantages:**
- Smooth and differentiable everywhere.
- Less sensitive to outliers.

**Disadvantages:**
- Computationally more expensive than simple losses like MSE.

**Python Implementation:**
```python
import numpy as np

def logcosh_loss(y_true, y_pred):
    # log(cosh(x)) is roughly x**2 / 2 for small x and |x| - log(2) for large x
    error = y_true - y_pred
    loss = np.log(np.cosh(error))
    return np.mean(loss)
```

These implementations cover a range of cost functions for different machine learning tasks. Each has its own advantages and disadvantages, making it suitable for particular scenarios and problem domains.

---

contrib/machine-learning/index.md

+1 line changed

@@ -10,3 +10,4 @@
- [PyTorch.md](pytorch.md)
- [Types of optimizers](Types_of_optimizers.md)
- [Logistic Regression](logistic-regression.md)
- [Types_of_Cost_Functions](Types_of_Cost_Functions.md)
