From 97f852c99616cab741e8c06469f239bd34b74dd6 Mon Sep 17 00:00:00 2001 From: rohit Date: Fri, 31 May 2024 10:26:09 +0530 Subject: [PATCH 1/2] Added sklearn.md file --- contrib/machine-learning/index.md | 1 + contrib/machine-learning/sklearn.md | 144 ++++++++++++++++++++++++++++ 2 files changed, 145 insertions(+) create mode 100644 contrib/machine-learning/sklearn.md diff --git a/contrib/machine-learning/index.md b/contrib/machine-learning/index.md index 46100dfb..68bd24d2 100644 --- a/contrib/machine-learning/index.md +++ b/contrib/machine-learning/index.md @@ -10,3 +10,4 @@ - [PyTorch.md](pytorch.md) - [Types of optimizers](Types_of_optimizers.md) - [Logistic Regression](logistic-regression.md) +- [sklearn.md](sklearn.md) diff --git a/contrib/machine-learning/sklearn.md b/contrib/machine-learning/sklearn.md new file mode 100644 index 00000000..8174d938 --- /dev/null +++ b/contrib/machine-learning/sklearn.md @@ -0,0 +1,144 @@ +# scikit-learn (sklearn) Python Library + +## Overview + +scikit-learn, also known as sklearn, is a popular open-source Python library that provides simple and efficient tools for data mining and data analysis. It is built on NumPy, SciPy, and matplotlib. The library is designed to interoperate with the Python numerical and scientific libraries. + +## Key Features + +- **Classification**: Identifying which category an object belongs to. Example algorithms include SVM, nearest neighbors, random forest. +- **Regression**: Predicting a continuous-valued attribute associated with an object. Example algorithms include support vector regression (SVR), ridge regression, Lasso. +- **Clustering**: Automatic grouping of similar objects into sets. Example algorithms include k-means, spectral clustering, mean-shift. +- **Dimensionality Reduction**: Reducing the number of random variables to consider. Example algorithms include PCA, feature selection, non-negative matrix factorization. +- **Model Selection**: Comparing, validating, and choosing parameters and models. Example methods include grid search, cross-validation, metrics. +- **Preprocessing**: Feature extraction and normalization. + +## When to Use scikit-learn + +- **Use scikit-learn if**: + - You are working on machine learning tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. + - You need an easy-to-use, well-documented library. + - You require tools that are compatible with NumPy and SciPy. + +- **Do not use scikit-learn if**: + - You need to perform deep learning tasks. In such cases, consider using TensorFlow or PyTorch. + - You need out-of-the-box support for large-scale data. scikit-learn is designed to work with in-memory data, so for very large datasets, you might want to consider libraries like Dask-ML. + +## Installation + +You can install scikit-learn using pip: + +```bash +pip install scikit-learn +``` + +Or via conda: + +```bash +conda install scikit-learn +``` + +## Basic Usage with Code Snippets + +### Importing the Library + +```python +import numpy as np +from sklearn.model_selection import train_test_split +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import accuracy_score +``` + +### Loading Data + +For illustration, let's create a simple synthetic dataset: + +```python +from sklearn.datasets import make_classification + +X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42) +``` + +### Splitting Data + +Split the dataset into training and testing sets: + +```python +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) +``` + +### Preprocessing + +Standardizing the features: + +```python +scaler = StandardScaler() +X_train = scaler.fit_transform(X_train) +X_test = scaler.transform(X_test) +``` + +### Training a Model + +Train a Logistic Regression model: + +```python +model = LogisticRegression() +model.fit(X_train, y_train) +``` + +### Making Predictions + +Make predictions on the test set: + +```python +y_pred = model.predict(X_test) +``` + +### Evaluating the Model + +Evaluate the accuracy of the model: + +```python +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +``` + +### Putting it All Together + +Here is a complete example from data loading to model evaluation: + +```python +import numpy as np +from sklearn.datasets import make_classification +from sklearn.model_selection import train_test_split +from sklearn.preprocessing import StandardScaler +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import accuracy_score + +# Load data +X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42) + +# Split data +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) + +# Preprocess data +scaler = StandardScaler() +X_train = scaler.fit_transform(X_train) +X_test = scaler.transform(X_test) + +# Train model +model = LogisticRegression() +model.fit(X_train, y_train) + +# Make predictions +y_pred = model.predict(X_test) + +# Evaluate model +accuracy = accuracy_score(y_test, y_pred) +print(f"Accuracy: {accuracy * 100:.2f}%") +``` + +## Conclusion + +scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively. \ No newline at end of file From d6e610d8cc3505c806687db1d4dfb9ad0f4f5e66 Mon Sep 17 00:00:00 2001 From: Ankit Mahato Date: Sun, 2 Jun 2024 04:03:36 +0530 Subject: [PATCH 2/2] Rename sklearn.md to sklearn-introduction.md --- .../machine-learning/{sklearn.md => sklearn-introduction.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename contrib/machine-learning/{sklearn.md => sklearn-introduction.md} (99%) diff --git a/contrib/machine-learning/sklearn.md b/contrib/machine-learning/sklearn-introduction.md similarity index 99% rename from contrib/machine-learning/sklearn.md rename to contrib/machine-learning/sklearn-introduction.md index 8174d938..7bb5aa8d 100644 --- a/contrib/machine-learning/sklearn.md +++ b/contrib/machine-learning/sklearn-introduction.md @@ -141,4 +141,4 @@ print(f"Accuracy: {accuracy * 100:.2f}%") ## Conclusion -scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively. \ No newline at end of file +scikit-learn is a powerful and versatile library that can be used for a wide range of machine learning tasks. It is particularly well-suited for beginners due to its easy-to-use interface and extensive documentation. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively.