From 56e972133ff33ca0532c6d0637fb465425b41c09 Mon Sep 17 00:00:00 2001
From: manishh12
Date: Fri, 31 May 2024 12:02:09 +0530
Subject: [PATCH 1/4] added types of cost functions issue#625

---
 .../Types_of_Cost_Functions.md    | 227 ++++++++++++++++++
 contrib/machine-learning/index.md |   1 +
 2 files changed, 228 insertions(+)
 create mode 100644 contrib/machine-learning/Types_of_Cost_Functions.md

diff --git a/contrib/machine-learning/Types_of_Cost_Functions.md b/contrib/machine-learning/Types_of_Cost_Functions.md
new file mode 100644
index 00000000..547a05e3
--- /dev/null
+++ b/contrib/machine-learning/Types_of_Cost_Functions.md
@@ -0,0 +1,227 @@

# Cost Functions in Machine Learning

Cost functions, also known as loss functions, play a crucial role in training machine learning models. They measure how well the model performs on the training data by quantifying the difference between predicted and actual values. Different types of cost functions are used depending on the problem domain and the nature of the data.

## Types of Cost Functions

### 1. Mean Squared Error (MSE)

**Explanation:**
MSE is one of the most commonly used cost functions, particularly in regression problems. It calculates the average squared difference between the predicted and actual values.

**Mathematical Formulation:**
The MSE is defined as:
$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
- \( y^i\) is the predicted value.

**Advantages:**
- Strongly penalizes large errors due to squaring.
- Differentiable and convex, facilitating optimization.

**Disadvantages:**
- Sensitive to outliers, as the squared term amplifies their impact.

**Python Implementation:**
```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared residuals
    return np.mean((y_true - y_pred) ** 2)
```

### 2. Mean Absolute Error (MAE)

**Explanation:**
MAE is another commonly used cost function for regression tasks. It measures the average absolute difference between predicted and actual values.

**Mathematical Formulation:**
The MAE is defined as:
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
- \( y^i\) is the predicted value.

**Advantages:**
- Less sensitive to outliers compared to MSE.
- Provides a linear error term, which can be easier to interpret.

**Disadvantages:**
- Not differentiable at zero, which can complicate optimization.

**Python Implementation:**
```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute residuals
    return np.mean(np.abs(y_true - y_pred))
```
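To see the difference in outlier sensitivity directly, here is a small usage sketch (the toy arrays are illustrative; it assumes the `mean_squared_error` and `mean_absolute_error` definitions above are in scope):

```python
import numpy as np

# Toy targets and predictions; the last prediction is badly off (an outlier).
y_true = np.array([3.0, 2.5, 4.0, 5.0, 3.5])
y_pred = np.array([2.8, 2.7, 3.9, 5.1, 9.5])

print(mean_squared_error(y_true, y_pred))   # ~7.22, dominated by the single large error
print(mean_absolute_error(y_true, y_pred))  # ~1.32, grows only linearly with the outlier
```

The six-unit error contributes 36 to the squared sum but only 6 to the absolute sum, which is exactly the trade-off described above.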
### 3. Cross-Entropy Loss (Binary)

**Explanation:**
Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.

**Mathematical Formulation:**
For binary classification, the cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (0 or 1).
- \( y^i\) is the predicted probability of the positive class.

**Advantages:**
- Penalizes confident wrong predictions heavily.
- Suitable for probabilistic outputs.

**Disadvantages:**
- Sensitive to class imbalance.

**Python Implementation:**
```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Mean negative log-likelihood of the true labels
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```
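One practical caveat with the implementation above: `np.log` diverges at 0, so a predicted probability of exactly 0 or 1 yields `inf` or `nan`. A common guard, shown here as an illustrative variant (the `eps` value is an arbitrary choice, not part of the original implementation), is to clip the probabilities first:

```python
import numpy as np

def binary_cross_entropy_stable(y_true, y_pred, eps=1e-12):
    # Keep probabilities strictly inside (0, 1) so np.log never sees 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 1.0, 0.8])  # the exact 1.0 would break the unclipped version

print(binary_cross_entropy_stable(y_true, y_pred))  # ~0.11
```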
### 4. Cross-Entropy Loss (Multiclass)

**Explanation:**
For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.

**Mathematical Formulation:**
The multiclass cross-entropy loss is defined as:
$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$
Where:
- \( n \) is the number of samples.
- \( C \) is the number of classes.
- \( y_{i,c} \) is the indicator function for the true class of sample \( i \).

- (y^i,c) is the predicted probability of sample \( i \) belonging to class \( c \).

**Advantages:**
- Handles multiple classes effectively.
- Encourages the model to assign high probabilities to the correct classes.

**Disadvantages:**
- Requires one-hot encoding for class labels, which can increase computational complexity (see the usage sketch below).

**Python Implementation:**
```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    # Sum over classes, then average over samples
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```
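A short usage sketch (toy numbers; it assumes the `categorical_cross_entropy` definition above) that makes the one-hot encoding step explicit:

```python
import numpy as np

labels = np.array([0, 2, 1])   # integer class labels for 3 samples
y_true = np.eye(3)[labels]     # one-hot encode: shape (3, 3)

# Predicted probabilities, one row per sample (each row sums to 1)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.2, 0.6, 0.2]])

print(categorical_cross_entropy(y_true, y_pred))  # ~0.41
```

As in the binary case, probabilities of exactly zero should be clipped before taking the logarithm.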
### 5. Hinge Loss (SVM)

**Explanation:**
Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks. It penalizes predictions that are misclassified or fall inside the margin, with a penalty that grows linearly.

**Mathematical Formulation:**
For binary classification, the hinge loss is defined as:
$$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (-1 or 1).
- \( \hat{y}_i \) is the predicted score for sample \( i \).

**Advantages:**
- Encourages margin maximization in SVMs.
- Robust to outliers due to the linear penalty.

**Disadvantages:**
- Not differentiable at the margin, which can complicate optimization.

**Python Implementation:**
```python
import numpy as np

def hinge_loss(y_true, y_pred):
    # Zero loss once a sample is correctly classified with margin >= 1
    loss = np.maximum(0, 1 - y_true * y_pred)
    return np.mean(loss)
```
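A usage sketch (toy scores; it assumes the `hinge_loss` definition above). Note that the labels are -1/+1 and the predictions are raw decision scores, not probabilities:

```python
import numpy as np

y_true = np.array([1, -1, 1, -1])
y_pred = np.array([2.0, -0.5, 0.3, 1.5])

# Per-sample losses max(0, 1 - y * score) = [0.0, 0.5, 0.7, 2.5]:
#   score  2.0: correct with margin > 1, no loss
#   scores -0.5 and 0.3: correct side but inside the margin, small loss
#   score  1.5: wrong side entirely, largest loss
print(hinge_loss(y_true, y_pred))  # 0.925
```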
### 6. Huber Loss

**Explanation:**
Huber loss is a combination of MSE and MAE, providing a compromise between the two. It is less sensitive to outliers than MSE and provides a smooth transition to MAE for large errors.

**Mathematical Formulation:**

The Huber loss is defined as:

$$
\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
\end{array}
\right.
$$
Where:
- \( n \) is the number of samples.
- \( \delta \) is a threshold parameter.

**Advantages:**
- Provides a smooth loss function.
- Less sensitive to outliers than MSE.

**Disadvantages:**
- Requires tuning of the threshold parameter.

**Python Implementation:**
```python
import numpy as np

def huber_loss(y_true, y_pred, delta):
    error = y_true - y_pred
    loss = np.where(np.abs(error) <= delta, 0.5 * error ** 2, delta * (np.abs(error) - 0.5 * delta))
    return np.mean(loss)
```
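The effect of the threshold is easiest to see with one error on each side of it. A small sketch (illustrative values; it assumes the `huber_loss` definition above):

```python
import numpy as np

y_true = np.array([0.0, 0.0])
y_pred = np.array([0.5, 3.0])  # one small error, one large error
delta = 1.0

# |error| <= delta -> quadratic branch: 0.5 * 0.5**2          = 0.125
# |error| >  delta -> linear branch:    1.0 * (3.0 - 0.5*1.0) = 2.5
print(huber_loss(y_true, y_pred, delta))  # mean of the two: 1.3125
```

If the quadratic branch were applied everywhere (as MSE does, up to the factor of 1/2), the large error would contribute 4.5 rather than 2.5, which is the outlier damping Huber loss is designed to provide.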
### 7. Log-Cosh Loss

**Explanation:**
Log-Cosh loss is a smooth loss function that behaves quadratically for small errors and linearly for large errors, making it less sensitive to outliers than MSE.

**Mathematical Formulation:**
The Log-Cosh loss is defined as:
$$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$
Where:
- \( n \) is the number of samples.

**Advantages:**
- Smooth and differentiable everywhere.
- Less sensitive to outliers.

**Disadvantages:**
- Computationally more expensive than simpler losses such as MSE.

**Python Implementation:**
```python
import numpy as np

def logcosh_loss(y_true, y_pred):
    error = y_true - y_pred
    loss = np.log(np.cosh(error))
    return np.mean(loss)
```

These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.

---
diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md
index 46100dfb..cfe9a67d 100644
--- a/contrib/machine-learning/index.md
+++ b/contrib/machine-learning/index.md
@@ -10,3 +10,4 @@
- [PyTorch.md](pytorch.md)
- [Types of optimizers](Types_of_optimizers.md)
- [Logistic Regression](logistic-regression.md)
+-[Types_of_Cost_Functions](Types_of_Cost_Functions.md)

From d23389a8ea17ca0d16b96b14b710ac41606e8383 Mon Sep 17 00:00:00 2001
From: Manish kumar gupta <97523900+manishh12@users.noreply.github.com>
Date: Fri, 31 May 2024 12:07:34 +0530
Subject: [PATCH 2/4] Updated maths formulas

---
 .../Types_of_Cost_Functions.md | 35 ++++++++++++-------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/contrib/machine-learning/Types_of_Cost_Functions.md b/contrib/machine-learning/Types_of_Cost_Functions.md
index 547a05e3..f6507268 100644
--- a/contrib/machine-learning/Types_of_Cost_Functions.md
+++ b/contrib/machine-learning/Types_of_Cost_Functions.md
@@ -12,7 +12,7 @@ MSE is one of the most commonly used cost functions, particularly in regression
**Mathematical Formulation:**
The MSE is defined as:
-$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
+$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
@@ -41,7 +41,7 @@ MAE is another commonly used cost function for regression tasks. It measures the
**Mathematical Formulation:**
The MAE is defined as:
-$$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
+$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual value.
@@ -70,8 +70,11 @@ Cross-entropy loss is commonly used in binary classification problems. It measures the dissimilarity between the true and predicted probability distributions.
**Mathematical Formulation:**
+
For binary classification, the cross-entropy loss is defined as:
-$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)] $$
+
+$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$
+
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (0 or 1).
@@ -100,8 +103,11 @@ For multiclass classification problems, the cross-entropy loss is adapted to handle multiple classes.
**Mathematical Formulation:**
+
The multiclass cross-entropy loss is defined as:
-$$ \text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) $$
+
+$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$
+
Where:
- \( n \) is the number of samples.
- \( C \) is the number of classes.
@@ -131,8 +137,11 @@ Hinge loss is commonly used in support vector machines (SVMs) for binary classification tasks.
**Mathematical Formulation:**
+
For binary classification, the hinge loss is defined as:
-$$ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $$
+
+$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$
+
Where:
- \( n \) is the number of samples.
- \( y_i \) is the actual class label (-1 or 1).
@@ -165,17 +174,16 @@ Huber loss is a combination of MSE and MAE, providing a compromise between the two.
The Huber loss is defined as:

-$$
-\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
+$$\text{Huber Loss} = \frac{1}{n} \sum_{i=1}^{n} \left\{
\begin{array}{ll}
\frac{1}{2} (y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
\delta(|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
\end{array}
-\right.
-$$
+\right.$$
+
Where:
- \( n \) is the number of samples.
-- \( \delta \) is a threshold parameter.
+- \(delta\) is a threshold parameter.

**Advantages:**
- Provides a smooth loss function.
@@ -200,8 +208,11 @@ def huber_loss(y_true, y_pred, delta):
Log-Cosh loss is a smooth loss function that behaves quadratically for small errors and linearly for large errors, making it less sensitive to outliers than MSE.

**Mathematical Formulation:**
+
The Log-Cosh loss is defined as:
-$$ \text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i)) $$
+
+$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$
+
Where:
- \( n \) is the number of samples.

These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.

----
\ No newline at end of file
+---

From c7746086b91d44369aa6f8e7f3c4a863cd655da2 Mon Sep 17 00:00:00 2001
From: Ankit Mahato
Date: Sun, 2 Jun 2024 04:18:14 +0530
Subject: [PATCH 3/4] Rename Types_of_Cost_Functions.md to cost-functions.md

---
 .../{Types_of_Cost_Functions.md => cost-functions.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename contrib/machine-learning/{Types_of_Cost_Functions.md => cost-functions.md} (100%)

diff --git a/contrib/machine-learning/Types_of_Cost_Functions.md b/contrib/machine-learning/cost-functions.md
similarity index 100%
rename from contrib/machine-learning/Types_of_Cost_Functions.md
rename to contrib/machine-learning/cost-functions.md

From f125cf4a33de46d7f8fee1066a347551f71ea13a Mon Sep 17 00:00:00 2001
From: Ankit Mahato
Date: Sun, 2 Jun 2024 04:26:45 +0530
Subject: [PATCH 4/4] Update cost-functions.md

---
 contrib/machine-learning/cost-functions.md | 41 ++++++++++------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/contrib/machine-learning/cost-functions.md b/contrib/machine-learning/cost-functions.md
index f6507268..c1fe2170 100644
--- a/contrib/machine-learning/cost-functions.md
+++ b/contrib/machine-learning/cost-functions.md
@@ -14,9 +14,9 @@ MSE is one of the most commonly used cost functions, particularly in regression
The MSE is defined as:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual value.
-- \( y^i\) is the predicted value.
+- `n` is the number of samples.
+- $y_i$ is the actual value.
+- $\hat{y}_i$ is the predicted value.

**Advantages:**
- Strongly penalizes large errors due to squaring.
@@ -43,9 +43,9 @@ MAE is another commonly used cost function for regression tasks. It measures the
The MAE is defined as:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual value.
-- \( y^i\) is the predicted value.
+- `n` is the number of samples.
+- $y_i$ is the actual value.
+- $\hat{y}_i$ is the predicted value.

**Advantages:**
- Less sensitive to outliers compared to MSE.
@@ -76,9 +76,9 @@ For binary classification, the cross-entropy loss is defined as:
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual class label (0 or 1).
-- \( y^i\) is the predicted probability of the positive class.
+- `n` is the number of samples.
+- $y_i$ is the actual class label (0 or 1).
+- $\hat{y}_i$ is the predicted probability of the positive class.

**Advantages:**
@@ -109,11 +109,10 @@ The multiclass cross-entropy loss is defined as:
$$\text{Cross-Entropy} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$$

Where:
-- \( n \) is the number of samples.
-- \( C \) is the number of classes.
-- \( y_{i,c} \) is the indicator function for the true class of sample \( i \).
-
-- (y^i,c) is the predicted probability of sample \( i \) belonging to class \( c \).
+- `n` is the number of samples.
+- `C` is the number of classes.
+- $y_{i,c}$ is the indicator function for the true class of sample `i`.
+- $\hat{y}_{i,c}$ is the predicted probability of sample `i` belonging to class `c`.

**Advantages:**
- Handles multiple classes effectively.
@@ -143,9 +142,9 @@ For binary classification, the hinge loss is defined as:
$$\text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$

Where:
-- \( n \) is the number of samples.
-- \( y_i \) is the actual class label (-1 or 1).
-- \( \hat{y}_i \) is the predicted score for sample \( i \).
+- `n` is the number of samples.
+- $y_i$ is the actual class label (-1 or 1).
+- $\hat{y}_i$ is the predicted score for sample `i`.

**Advantages:**
- Encourages margin maximization in SVMs.
@@ -182,8 +181,8 @@
\right.$$

Where:
-- \( n \) is the number of samples.
-- \(delta\) is a threshold parameter.
+- `n` is the number of samples.
+- $\delta$ is a threshold parameter.

**Advantages:**
- Provides a smooth loss function.
@@ -214,7 +213,7 @@ The Log-Cosh loss is defined as:
$$\text{Log-Cosh Loss} = \frac{1}{n} \sum_{i=1}^{n} \log(\cosh(y_i - \hat{y}_i))$$

Where:
-- \( n \) is the number of samples.
+- `n` is the number of samples.

**Advantages:**
- Smooth and differentiable everywhere.
- Less sensitive to outliers.
@@ -234,5 +233,3 @@ def logcosh_loss(y_true, y_pred):
```

These implementations provide various options for cost functions suitable for different machine learning tasks. Each function has its advantages and disadvantages, making them suitable for different scenarios and problem domains.
-
----
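The "quadratic for small errors, linear for large errors" behaviour claimed for Log-Cosh loss above is easy to verify numerically. A small check (illustrative values; `np.log(np.cosh(x))` is approximately `x**2 / 2` for small `x` and `|x| - log(2)` for large `x`):

```python
import numpy as np

x_small, x_large = 0.1, 10.0

# Small errors: log(cosh(x)) ~ x**2 / 2 (MSE-like regime)
print(np.log(np.cosh(x_small)), x_small ** 2 / 2)      # 0.00499... vs 0.005

# Large errors: log(cosh(x)) ~ |x| - log(2) (MAE-like regime)
print(np.log(np.cosh(x_large)), x_large - np.log(2))   # 9.30685... vs 9.30685...
```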