
Loss Functions: Comprehensive Notes
Introduction to Loss Functions
A loss function, also known as a cost function or objective function, measures the difference between predicted values
and actual values in a machine learning model. It quantifies how well a model performs and provides the basis for model
optimization.

Classification Loss Functions


1. Binary Cross-Entropy Loss (Log Loss)
Used for binary classification
Formula:

BCE(y, ŷ) = -[y log(ŷ) + (1-y)log(1-ŷ)]


where:
- y: true label (0 or 1)
- ŷ: predicted probability

Properties:

Range: [0, ∞)
Perfect prediction: 0
Heavily penalizes confident wrong predictions
Requires predicted probabilities

Implementation:

import numpy as np

def binary_cross_entropy(y_true, y_pred):
    epsilon = 1e-15  # Small constant to avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

2. Categorical Cross-Entropy Loss


Used for multi-class classification
Formula:

CCE(y, ŷ) = -Σᵢ yᵢlog(ŷᵢ)


where:
- yᵢ: true probability of class i
- ŷᵢ: predicted probability of class i

Properties:

Generalizes binary cross-entropy
Requires one-hot encoded labels
Suitable for softmax outputs
Range: [0, ∞)

Implementation:

def categorical_cross_entropy(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]

3. Hinge Loss (Support Vector Machine Loss)


Used in SVMs and margin-based classifiers
Formula:

L(y, ŷ) = max(0, 1 - y * ŷ)
where:
- y: true label (-1 or 1)
- ŷ: predicted score

Properties:

Linear penalty for misclassified or low-margin predictions
Zero loss for correct predictions beyond the margin
Range: [0, ∞)
Promotes sparse solutions (only points on or inside the margin contribute); see the sketch below
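
Implementation (sketch):

The notes give no code for hinge loss; the following is a minimal NumPy sketch, assuming labels are encoded as -1/1 and y_pred holds raw decision scores (the function name is illustrative):

def hinge_loss(y_true, y_pred):
    # y_true in {-1, 1}; y_pred are raw (unbounded) decision scores
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))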

Regression Loss Functions


1. Mean Squared Error (MSE)
Most common regression loss
Formula:

MSE = (1/n)Σᵢ(yᵢ - ŷᵢ)²


where:
- n: number of samples
- yᵢ: true value
- ŷᵢ: predicted value

Properties:

Heavily penalizes large errors
Differentiable everywhere
Sensitive to outliers
Range: [0, ∞)

Implementation:

def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

2. Mean Absolute Error (MAE)


Also called L1 Loss
Formula:

MAE = (1/n)Σᵢ|yᵢ - ŷᵢ|

Properties:

Less sensitive to outliers than MSE
Linear penalty
Not differentiable at zero
Range: [0, ∞)

Implementation:

def mae_loss(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

3. Huber Loss
Combines MSE and MAE
Formula:

L(y, ŷ) = {
0.5(y - ŷ)² if |y - ŷ| ≤ δ
δ|y - ŷ| - 0.5δ² otherwise
}

Properties:

Robust to outliers
Differentiable everywhere
Adjustable sensitivity (δ)
Range: [0, ∞)

Implementation:

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    is_small_error = np.abs(error) <= delta
    squared_loss = 0.5 * error ** 2
    linear_loss = delta * np.abs(error) - 0.5 * delta ** 2
    return np.mean(np.where(is_small_error, squared_loss, linear_loss))

Specialized Loss Functions


1. Focal Loss
Modified cross-entropy for imbalanced data
Formula:

FL(p) = -α(1 - p)^γ log(p)


where:
- α: balancing factor
- γ: focusing parameter
- p: predicted probability

Properties:

Down-weights easy examples
Focuses on hard negatives
Addresses class imbalance
Range: [0, ∞)
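
Implementation (sketch):

A binary focal-loss sketch in NumPy; the function name and the convention of weighting the negative class by 1 - α are assumptions, not part of the notes above:

def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # p_t: probability the model assigns to the true class of each sample
    p_t = np.where(y_true == 1, y_pred, 1 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))
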
2. Contrastive Loss
Used in siamese networks
Formula:

L(y, d) = (1 - y)·0.5·d² + y·0.5·max(0, m - d)²

where:
- y: binary label (1 for a dissimilar pair, 0 for a similar pair)
- d: distance between the two embeddings in the pair
- m: margin
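
Implementation (sketch):

A minimal NumPy version of the formula above (illustrative only), assuming the pairwise distance d has already been computed, for example as a Euclidean distance between embeddings:

def contrastive_loss(y, d, margin=1.0):
    # y = 1 for dissimilar pairs, 0 for similar pairs; d = distance within each pair
    similar_term = (1 - y) * 0.5 * d ** 2
    dissimilar_term = y * 0.5 * np.maximum(0.0, margin - d) ** 2
    return np.mean(similar_term + dissimilar_term)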

3. Triplet Loss
Used in metric learning
Formula:

L = max(d(a,p) - d(a,n) + margin, 0)


where:
- a: anchor
- p: positive
- n: negative
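
Implementation (sketch):

An illustrative NumPy sketch, assuming anchor, positive, and negative are embedding vectors (or batches of them) compared with Euclidean distance:

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Distances from the anchor to the positive and negative embeddings
    d_ap = np.linalg.norm(anchor - positive, axis=-1)
    d_an = np.linalg.norm(anchor - negative, axis=-1)
    return np.mean(np.maximum(0.0, d_ap - d_an + margin))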

Custom Loss Functions


1. Creating Custom Losses

class CustomLoss:
    def __init__(self, weights):
        self.weights = weights

    def __call__(self, y_true, y_pred):
        return self.forward(y_true, y_pred)

    def forward(self, y_true, y_pred):
        # Custom loss computation
        pass

    def backward(self, y_true, y_pred):
        # Gradient computation
        pass
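
As a concrete, hypothetical example of filling in this template, a weighted MSE subclass could look like the following (WeightedMSELoss is an illustrative name, not part of the notes above):

class WeightedMSELoss(CustomLoss):
    def forward(self, y_true, y_pred):
        # Per-sample weighted squared error, averaged over the batch
        return np.mean(self.weights * (y_true - y_pred) ** 2)

    def backward(self, y_true, y_pred):
        # Gradient of the loss above with respect to y_pred
        return 2 * self.weights * (y_pred - y_true) / y_true.shape[0]

It is then called like any other loss: loss = WeightedMSELoss(weights)(y_true, y_pred).
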
2. Combining Multiple Losses

def combined_loss(y_true, y_pred, alpha=0.5):
    mse = mse_loss(y_true, y_pred)
    mae = mae_loss(y_true, y_pred)
    return alpha * mse + (1 - alpha) * mae

Loss Function Selection Guidelines


1. Classification Tasks
Binary: Binary Cross-Entropy
Multi-class: Categorical Cross-Entropy
Imbalanced: Focal Loss
Margin-based: Hinge Loss

2. Regression Tasks
General purpose: MSE
Outlier robust: MAE or Huber
Custom requirements: Combined loss

3. Special Cases
Metric learning: Contrastive/Triplet Loss
Generative models: Custom losses
Multi-task learning: Weighted combinations

Practical Considerations
1. Numerical Stability
Add small epsilon to logs
Clip prediction ranges
Use stable implementations
Monitor for NaN values
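
As one example of a stable implementation (a sketch, not the only option), categorical cross-entropy can be computed directly from raw logits with the log-sum-exp trick, which avoids exponentiating large values and then taking logarithms:

def cross_entropy_from_logits(y_true, logits):
    # Log-softmax via the log-sum-exp trick: subtract the row-wise maximum first
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    # y_true is one-hot encoded; average the negative log-likelihood over samples
    return -np.sum(y_true * log_probs) / y_true.shape[0]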

2. Scaling and Normalization


Normalize inputs
Scale targets appropriately
Consider batch statistics
Use appropriate initializations
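
For example, a simple standardization helper for targets might look like this (illustrative; the mean and standard deviation should be computed on the training split only):

def standardize_targets(y, epsilon=1e-8):
    # Zero-mean, unit-variance scaling; keep the statistics to invert predictions later
    mean, std = y.mean(), y.std()
    return (y - mean) / (std + epsilon), mean, std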

3. Gradient Properties
Check gradients magnitude
Monitor gradient flow
Implement gradient clipping
Use appropriate optimizers
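
A simple norm-based gradient clipping helper (a sketch; deep learning frameworks provide built-in equivalents):

def clip_gradient_by_norm(grad, max_norm=1.0):
    # Rescale the gradient if its L2 norm exceeds max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad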

Best Practices
1. Loss Function Selection

Consider problem nature
Evaluate data distribution
Test multiple options
Validate assumptions

2. Implementation

Use stable implementations
Add proper testing
Monitor training
Implement early stopping

3. Debugging

Verify loss values
Check gradients
Monitor convergence
Validate predictions

Common Issues and Solutions


1. Vanishing Gradients
Use appropriate activation functions
Implement gradient clipping
Consider loss scaling
Monitor gradient flow

2. Exploding Gradients
Clip gradient norms
Scale loss appropriately
Use stable implementations
Monitor loss values

3. Class Imbalance
Use weighted losses
Implement focal loss
Balance dataset
Adjust class weights
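
As a sketch of a weighted loss for imbalanced binary problems, the binary cross-entropy defined earlier can be given a positive-class weight (pos_weight is an assumed parameter name, typically set near the negative/positive class ratio):

def weighted_binary_cross_entropy(y_true, y_pred, pos_weight=1.0):
    # pos_weight > 1 increases the penalty for errors on the rare positive class
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(pos_weight * y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))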

Conclusion
Choosing and implementing appropriate loss functions is crucial for:

1. Model performance
2. Training stability
3. Convergence speed
4. Robustness to outliers
5. Handling specific problem requirements
