
Commit 97f852c

Added sklearn.md file
1 parent 3f999a6 commit 97f852c

2 files changed

+145
-0
lines changed

contrib/machine-learning/index.md

+1
@@ -10,3 +10,4 @@
 - [PyTorch.md](pytorch.md)
 - [Types of optimizers](Types_of_optimizers.md)
 - [Logistic Regression](logistic-regression.md)
+- [sklearn.md](sklearn.md)

contrib/machine-learning/sklearn.md

+144
@@ -0,0 +1,144 @@
# scikit-learn (sklearn) Python Library

## Overview

scikit-learn (imported as `sklearn`) is a popular open-source Python library that provides simple and efficient tools for machine learning and data analysis. It is built on NumPy, SciPy, and matplotlib, and is designed to interoperate with the rest of the Python numerical and scientific stack.

## Key Features

- **Classification**: Identifying which category an object belongs to. Example algorithms include SVM, nearest neighbors, random forest.
- **Regression**: Predicting a continuous-valued attribute associated with an object. Example algorithms include support vector regression (SVR), ridge regression, Lasso.
- **Clustering**: Automatic grouping of similar objects into sets. Example algorithms include k-means, spectral clustering, mean-shift.
- **Dimensionality Reduction**: Reducing the number of random variables to consider. Example algorithms include PCA, feature selection, non-negative matrix factorization.
- **Model Selection**: Comparing, validating, and choosing parameters and models. Example methods include grid search, cross-validation, metrics.
- **Preprocessing**: Feature extraction and normalization.
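
The walkthrough in this file covers only classification, so here is a minimal sketch of two of the other feature areas, dimensionality reduction and clustering. The toy dataset from `make_blobs` and the choice of three clusters are assumptions made purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Hypothetical toy data: 300 points in 5 dimensions drawn around 3 centers.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)

# Dimensionality reduction: project the 5 features onto 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the projected points into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_2d)

print(X_2d.shape)           # shape of the reduced data
print(sorted(set(labels)))  # cluster ids assigned by k-means
```

Both estimators follow the same `fit` / `transform` / `predict` conventions as the classifier used later in this file, which is what makes it easy to swap algorithms.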

## When to Use scikit-learn

- **Use scikit-learn if**:
  - You are working on machine learning tasks such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
  - You need an easy-to-use, well-documented library.
  - You require tools that are compatible with NumPy and SciPy.

- **Do not use scikit-learn if**:
  - You need to perform deep learning tasks. In such cases, consider using TensorFlow or PyTorch.
  - You need out-of-the-box support for large-scale data. scikit-learn is designed to work with in-memory data, so for datasets that do not fit in memory, consider libraries such as Dask-ML.

## Installation

You can install scikit-learn using pip:

```bash
pip install scikit-learn
```

Or via conda:

```bash
conda install scikit-learn
```
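
To confirm the installation worked, a quick check along these lines can help. Note that the package is installed as `scikit-learn` but imported as `sklearn`; the version printed depends on your environment.

```python
import sklearn

# Prints the installed version string; the exact value depends on your environment.
print(sklearn.__version__)
```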

## Basic Usage with Code Snippets

### Importing the Library

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```

### Loading Data

For illustration, let's create a simple synthetic dataset:

```python
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
```
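
Before going further it can be useful to look at what was generated. The sketch below repeats the call above and inspects the array shapes and the number of samples per class; the use of `np.bincount` is just one convenient way to count labels.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# X holds one row per sample and one column per feature; y holds the class labels.
print(X.shape, y.shape)

# Number of samples in class 0 and class 1 (roughly balanced by default).
print(np.bincount(y))
```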

### Splitting Data

Split the dataset into training and testing sets:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
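
With `test_size=0.3`, 30% of the samples are held out for testing. As an optional variation (not used in the walkthrough), passing `stratify=y` asks `train_test_split` to keep the class proportions similar in both subsets, which can matter for imbalanced data. A minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# stratify=y is an optional addition here; it preserves the class ratio in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
# Fraction of class-1 labels in each subset; these should be close to each other.
print(y_train.mean(), y_test.mean())
```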
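
### Preprocessing

Standardize the features. The scaler is fitted on the training set only and then applied unchanged to the test set, so that no test-set statistics leak into training:

```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training data
X_test = scaler.transform(X_test)        # reuse the training mean/std
```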
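
To see what standardization does, the following sketch checks the per-feature mean and standard deviation after scaling. The `_scaled` variable names are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# On the training set every feature now has mean ~0 and standard deviation ~1.
print(np.allclose(X_train_scaled.mean(axis=0), 0.0, atol=1e-8))
print(np.allclose(X_train_scaled.std(axis=0), 1.0))

# The test set is scaled with the training statistics, so its means are only close to 0.
print(np.abs(X_test_scaled.mean(axis=0)).max())
```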

### Training a Model

Train a Logistic Regression model:

```python
model = LogisticRegression()
model.fit(X_train, y_train)
```
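
After `fit`, the learned parameters are available as attributes whose names end in an underscore. The sketch below rebuilds the same model and inspects a few of them; which attributes are worth looking at is a judgment call, not something the walkthrough prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train = StandardScaler().fit_transform(X_train)

model = LogisticRegression()
model.fit(X_train, y_train)

print(model.classes_)     # class labels seen during training
print(model.coef_.shape)  # one weight per feature for the binary problem
print(model.intercept_)   # bias term
```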

### Making Predictions

Make predictions on the test set:

```python
y_pred = model.predict(X_test)
```
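
`predict` returns hard class labels. When a confidence estimate is more useful, `LogisticRegression` also provides `predict_proba`, which returns one probability per class. A minimal sketch, assuming the same data and model as above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)         # hard labels (0 or 1)
y_proba = model.predict_proba(X_test)  # columns follow the order of model.classes_

print(y_pred[:5])
print(y_proba[:5])
```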

### Evaluating the Model

Evaluate the accuracy of the model:

```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
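
Accuracy is a single number and hides which kinds of errors the model makes. As a possible complement, `confusion_matrix` and `classification_report` from `sklearn.metrics` break the result down per class. In this sketch scaling is skipped for brevity and `max_iter=1000` is an assumption added to give the solver room to converge on unscaled data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# max_iter=1000 is an illustrative choice, not part of the original walkthrough.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_test, y_pred))
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy.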

### Putting it All Together

Here is a complete example from data loading to model evaluation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Preprocess data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
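
The same steps can also be expressed with a `Pipeline`, which bundles the scaler and the model into one estimator, and scored with cross-validation (one of the model selection tools listed under Key Features). This is an alternative sketch rather than the approach used above; the choice of 5 folds is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# The pipeline refits the scaler inside each fold, so held-out data never influences scaling.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# cv=5 is an illustrative choice: five train/validation splits, one accuracy score each.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean() * 100:.2f}%")
```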

## Conclusion

scikit-learn is a powerful and versatile library that covers a wide range of machine learning tasks. Its consistent, easy-to-use interface and extensive documentation make it particularly well suited to beginners. Whether you are working on a simple classification task or a more complex clustering problem, scikit-learn provides the tools you need to build and evaluate your models effectively.
