# Logistic Regression

Logistic Regression is a statistical method used for binary classification problems. It is a type of regression analysis in which the dependent variable is categorical. This README provides an overview of logistic regression, including its fundamental concepts, assumptions, and how to implement it using Python.

## Table of Contents

1. [Introduction](#introduction)
2. [Concepts](#concepts)
3. [Assumptions](#assumptions)
4. [Implementation](#implementation)
   - [Using Scikit-learn](#using-scikit-learn)
   - [Code Example](#code-example)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Conclusion](#conclusion)
7. [References](#references)

## Introduction

Logistic Regression models the probability of a binary outcome based on one or more predictor variables (features). It is widely used in fields such as medical research, the social sciences, and machine learning, for tasks including spam detection, fraud detection, and predicting user behavior.

## Concepts

### Sigmoid Function

The logistic regression model uses the sigmoid function to map predicted values to probabilities. The sigmoid function is defined as:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$

where $z$ is a linear combination of the input features. The output always lies strictly between 0 and 1, so it can be read as a probability.
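
The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sigmoid` and the branch on the sign of `z` (which avoids overflow for large negative inputs) are choices made here, not part of the original text.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-5.0, 0.0, 5.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

An input of 0 maps to exactly 0.5, and inputs of equal magnitude but opposite sign map to probabilities that sum to 1.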

### Odds and Log-Odds

- **Odds**: The odds represent the ratio of the probability of an event occurring to the probability of it not occurring.

  $$\text{Odds} = \frac{P(Y=1)}{P(Y=0)}$$

- **Log-Odds**: The log-odds is the natural logarithm of the odds.

  $$\text{Log-Odds} = \log \left( \frac{P(Y=1)}{P(Y=0)} \right)$$

Logistic regression models the log-odds as a linear combination of the input features.
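
As a rough illustration of these two quantities, the sketch below converts a probability to odds and log-odds and back again. The helper names (`odds`, `log_odds`, `prob_from_log_odds`) are hypothetical and introduced only for this example.

```python
import math


def odds(p: float) -> float:
    """Odds of an event with probability p: P(Y=1) / P(Y=0)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Natural logarithm of the odds (also called the logit)."""
    return math.log(odds(p))


def prob_from_log_odds(z: float) -> float:
    """Invert the log-odds; this is the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8
print(f"odds      = {odds(p):.2f}")
print(f"log-odds  = {log_odds(p):.4f}")
print(f"recovered = {prob_from_log_odds(log_odds(p)):.2f}")
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0; probabilities above 0.5 give positive log-odds and those below give negative log-odds.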

### Model Equation

The logistic regression model equation is:

$$
\log \left( \frac{P(Y=1)}{P(Y=0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n
$$

Where:
- β<sub>0</sub> is the intercept.
- β<sub>i</sub> are the coefficients for the predictor variables X<sub>i</sub>.
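
Solving the model equation for the probability shows how it connects to the sigmoid function. As shorthand introduced here, write $z$ for the right-hand side of the model equation and use $P(Y=0) = 1 - P(Y=1)$:

$$
\begin{aligned}
\log \left( \frac{P(Y=1)}{1 - P(Y=1)} \right) &= z \\
\frac{P(Y=1)}{1 - P(Y=1)} &= e^{z} \\
P(Y=1) &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}} = \sigma(z)
\end{aligned}
$$

The second line also gives the usual reading of a coefficient: increasing X<sub>i</sub> by one unit, with the other predictors held fixed, multiplies the odds by e<sup>β<sub>i</sub></sup>.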


## Assumptions

1. **Linearity**: The log-odds of the response variable are a linear function of the predictor variables.
2. **Independence**: Observations should be independent of each other.
3. **No Multicollinearity**: Predictor variables should not be highly correlated with each other.
4. **Large Sample Size**: The coefficients are estimated by maximum likelihood, which becomes unreliable on small samples, particularly when one of the two classes is rare.
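
One rough way to screen for the multicollinearity assumption is to compute pairwise Pearson correlations between features and flag pairs above some cutoff. The sketch below uses only the standard library; the feature values, the helper names, and the 0.9 threshold are illustrative assumptions, not recommendations from the original text.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up data: feature2 is roughly twice feature1, feature3 is unrelated.
features = {
    "feature1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature2": [2.1, 3.9, 6.2, 8.0, 9.8],
    "feature3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

for name_a, name_b, r in highly_correlated_pairs(features):
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

Pairwise correlation only catches relationships between two features at a time; a predictor that is a near-linear combination of several others would need a more thorough check, such as variance inflation factors.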

## Implementation

### Using Scikit-learn

Scikit-learn is a popular machine learning library for Python. It provides logistic regression through the `LogisticRegression` class in `sklearn.linear_model`.

### Code Example

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset (replace the path with your own CSV file)
data = pd.read_csv('path/to/your/dataset.csv')

# Define features and target variable (column names are placeholders)
X = data[['feature1', 'feature2', 'feature3']]
y = data['target']

# Split data into training (70%) and testing (30%) sets;
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train logistic regression model
# (scikit-learn applies L2 regularization by default)
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions (hard class labels) on the held-out test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)
```
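
The example above returns hard class labels. Since logistic regression is a probabilistic model, it can be useful to look at the predicted probabilities and the fitted coefficients as well. The following sketch assumes scikit-learn is installed and uses synthetic data from `make_classification` in place of a real CSV file; the 0.3 decision threshold is an arbitrary value chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data with three informative features.
X, y = make_classification(
    n_samples=200,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba(X_test)
p_positive = proba[:, 1]

# Apply a custom threshold instead of the default 0.5 used by predict().
custom_pred = (p_positive >= 0.3).astype(int)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("First five P(Y=1):", p_positive[:5])
print("Positives at threshold 0.3:", int(custom_pred.sum()))
```

Lowering the threshold trades precision for recall, which can matter when missing a positive case is costlier than a false alarm.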

## Evaluation Metrics

- **Accuracy**: The proportion of correctly classified instances among all instances.
- **Confusion Matrix**: A table showing the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
- **Precision, Recall, and F1-Score**: Precision is TP / (TP + FP), the share of predicted positives that are correct. Recall is TP / (TP + FN), the share of actual positives that are found. The F1-score is the harmonic mean of precision and recall.
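
To make these definitions concrete, the sketch below computes the metrics by hand from a small set of made-up labels, using only the standard library. The function names and the label vectors are hypothetical; in practice the scikit-learn functions from the code example would normally be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```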

## Conclusion

Logistic regression is a fundamental classification technique that is easy to implement and interpret. It suits binary classification problems well because it predicts a probability for each outcome rather than only a class label.