
MACHINE LEARNING MATERIAL

Basic libraries required: pandas, numpy, matplotlib.pyplot and seaborn.
Look at a few data samples with the head() method. Use the info() method to get a quick description of the data. To understand the nature of the numeric attributes, use the describe() method.
Data Visualisation: Enables us to understand the features and their relationships, both among themselves and with the output label.
Relationship between features: Standard correlation coefficient between features.
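A minimal sketch of this first look at a dataset, not from the original notes; the file name data.csv and the seaborn pairplot are assumptions:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset (data.csv is a placeholder file name).
df = pd.read_csv("data.csv")

# First look at the data.
print(df.head())        # a few data samples
df.info()               # quick description: columns, dtypes, non-null counts
print(df.describe())    # nature of the numeric attributes

# Standard correlation coefficients between numeric features.
print(df.select_dtypes("number").corr())

# Visualise relationships among features and with the output label.
sns.pairplot(df)
plt.show()
```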
When we look at the test set, we are likely to notice patterns in it, and based on them we may select certain models. This leads to a biased estimate on the test set, which may not generalize well in practice. This is called data snooping bias.
Scikit-Learn provides a few functions for creating test sets based on:
Random sampling, which randomly selects k% of the points for the test set.
Stratified sampling, which samples test examples such that they are representative of the overall distribution.
Random Sampling: The train_test_split() function performs random sampling. Its random_state parameter sets the random seed, which ensures that the same examples are selected for the test set across runs; the test_size parameter specifies the size of the test set; and the shuffle flag specifies whether the data needs to be shuffled before splitting. It can also process multiple datasets with an identical number of rows and select the same indices from all of them, which is useful when the labels are in a different dataframe.
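A minimal sketch of such a split, assuming the toy arrays and the 20% test size below:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and labels kept separately (they could just as well be
# two dataframes with an identical number of rows).
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# 20% of the points go to the test set; random_state fixes the seed so the
# same examples land in the test set across runs; shuffle controls whether
# the data is shuffled before splitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, shuffle=True)

print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```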
Stratified Sampling: The data distribution may not be uniform in real-world data, and random sampling, by its very nature, introduces biases into such data sets. How do we sample? We divide the population into homogeneous groups called strata, and data is sampled from each stratum so as to match the overall data distribution. Scikit-Learn provides the class StratifiedShuffleSplit that helps us with stratified sampling.
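A minimal sketch of a stratified split; the imbalanced toy labels and the 25% test size are assumptions:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data with an imbalanced class column to stratify on (15 vs 5 examples).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# One stratified split with 25% of the data held out for testing.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

# Class counts in the test set approximately match the overall 3:1 ratio.
print(np.bincount(y_test))
```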
Part 1: Feature Extraction
DictVectorizer: Converts lists of mappings of feature name and feature value into a matrix.
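A minimal sketch of DictVectorizer; the city/temperature mappings are invented for illustration, and get_feature_names_out assumes a reasonably recent scikit-learn:

```python
from sklearn.feature_extraction import DictVectorizer

# Each example is a mapping of feature name -> feature value.
data = [
    {"city": "Chennai", "temperature": 33.0},
    {"city": "Delhi", "temperature": 41.0},
    {"city": "Mumbai", "temperature": 30.0},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(data)           # feature matrix
print(vec.get_feature_names_out())    # derived column names
print(X)
```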
FeatureHasher: A high-speed, low-memory vectorizer that uses the feature hashing technique. Instead of building a hash table of the features, as the vectorizers do, it applies a hash function to the features to determine their column index in the sample matrices directly. This results in increased speed and reduced memory usage, at the expense of inspectability: the hasher does not remember what the input features looked like and has no inverse_transform method. The output of this transformer is a scipy.sparse matrix.
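A minimal sketch of FeatureHasher; the word-count style mappings and n_features=8 are assumptions:

```python
from sklearn.feature_extraction import FeatureHasher

# Same kind of input as DictVectorizer, but the column index of each feature
# is computed with a hash function instead of a lookup table.
data = [
    {"dog": 1, "cat": 2, "elephant": 4},
    {"dog": 2, "run": 5},
]

hasher = FeatureHasher(n_features=8, input_type="dict")
X = hasher.transform(data)   # scipy.sparse matrix with 8 columns
print(X.toarray())
# There is no inverse_transform: the original feature names are not stored.
```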
sklearn.feature_extraction.image.* has useful APIs to extract features from image data, and sklearn.feature_extraction.text.* has useful APIs to extract features from text data.
Part 2: Data Cleaning
Handling Missing Values
Missing values occur due to errors in data capture, such as sensor malfunction or measurement errors. Many ML algorithms do not work with missing data and need all features to be present. Discarding records containing missing values would result in a loss of valuable training samples. The sklearn.impute API provides functionality to fill missing values in a dataset.
SimpleImputer: Fills missing values with one of the following strategies: 'mean', 'median', 'most_frequent' and 'constant'.
KNNImputer: Uses a k-nearest-neighbours approach to fill missing values in a dataset. The missing value of an attribute in a specific example is filled with the mean value of the same attribute of its n_neighbors closest neighbours, where the nearest neighbours are decided based on Euclidean distance.
Marking imputed values: It is useful to indicate the presence of missing values in the dataset. MissingIndicator helps us get those indications. It returns a binary matrix in which True values correspond to missing entries in the original dataset.
provides a class StratifiedShuffleSplit Numeric Transformers
that helps us in stratified sampling. Feature Scaling:
Numerical features with different scales a feature which has same value, i.e. zero
leads to slower convergence of iterative variance.
optimization procedures. It is a good
practice to scale numerical features so
that all of them are on the same scale. LabelEncoder: Encodes target labels
1. StandardScaler: with value between 0 and K-1, where K
is number of distinct values.

OrdinalEncoder: Encodes categorical


2. MinMaxScaler: features with value between 0 and
K − 1, where K is number of distinct
values.

OrdinalEncoder can operate multi


dimensional data, while LabelEncoder
can transform only 1D data.
3. MaxAbsScaler: LabelBinarizer: Several regression and
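A minimal sketch comparing the three scalers on an assumed toy matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

X = np.array([[1.0, -10.0],
              [2.0, 0.0],
              [3.0, 10.0],
              [4.0, 20.0]])

print(StandardScaler().fit_transform(X))  # zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))    # each column rescaled to [0, 1]
print(MaxAbsScaler().fit_transform(X))    # each column divided by its max |value|
```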
FunctionTransformer: Builds a transformer from an arbitrary user-defined function, for example a log transformation.
Polynomial transformation: PolynomialFeatures generates polynomial and interaction features of the inputs up to a specified degree.
add_dummy_feature: Augments the dataset with a column vector; each value in the column vector is 1.
KBinsDiscretizer: Bins continuous features into discrete intervals.
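A minimal sketch of these four transformations; the toy matrix, degree=2 and n_bins=3 are assumptions:

```python
import numpy as np
from sklearn.preprocessing import (FunctionTransformer, PolynomialFeatures,
                                   KBinsDiscretizer, add_dummy_feature)

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Transformer built from an arbitrary callable, here log(1 + x).
print(FunctionTransformer(np.log1p).fit_transform(X))

# Degree-2 polynomial and interaction features: 1, x1, x2, x1^2, x1*x2, x2^2.
print(PolynomialFeatures(degree=2).fit_transform(X))

# Augment the dataset with a leading column of ones.
print(add_dummy_feature(X))

# Discretise each continuous feature into 3 ordinal bins of equal width.
kbd = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
print(kbd.fit_transform(X))
```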
Categorical Transformers
OneHotEncoder: Encodes a categorical feature or label as a one-hot numeric array. It creates one binary column for each of the K unique values; exactly one column has 1 in it and the rest have 0.
LabelEncoder: Encodes target labels with values between 0 and K-1, where K is the number of distinct values.
OrdinalEncoder: Encodes categorical features with values between 0 and K-1, where K is the number of distinct values. OrdinalEncoder can operate on multi-dimensional data, while LabelEncoder can transform only 1D data.
LabelBinarizer: Several regression and binary classification models can be extended to a multi-class setup in one-vs-all fashion. This involves training a single regressor or classifier per class, which requires converting multi-class labels to binary labels, and LabelBinarizer performs this task. If the estimator already supports multi-class data, LabelBinarizer is not needed.
MultiLabelBinarizer: Encodes multi-label targets, where each example can carry a set of labels, as a binary indicator matrix with one column per class.
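A minimal sketch of the encoders; the colour/animal data are invented, and sparse_output assumes scikit-learn 1.2 or newer (older versions call it sparse):

```python
import numpy as np
from sklearn.preprocessing import (OneHotEncoder, LabelEncoder, OrdinalEncoder,
                                   LabelBinarizer, MultiLabelBinarizer)

X = np.array([["red"], ["green"], ["blue"], ["green"]])   # 2D feature column
y = ["cat", "dog", "cat", "bird"]                         # 1D target labels

print(OneHotEncoder(sparse_output=False).fit_transform(X))  # one binary column per value
print(OrdinalEncoder().fit_transform(X))                    # integers 0..K-1, works on 2D data
print(LabelEncoder().fit_transform(y))                      # integers 0..K-1, 1D targets only
print(LabelBinarizer().fit_transform(y))                    # one-vs-all binary label matrix

# Multi-label targets: each example has a *set* of labels.
y_multi = [{"action", "comedy"}, {"drama"}, {"action"}]
print(MultiLabelBinarizer().fit_transform(y_multi))          # binary indicator matrix
```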
PART 3: FEATURE SELECTION
Features that do not contribute significantly can be removed. This decreases the size of the dataset and hence the computational cost of fitting a model.
Filter Based
Removing features with low variance
VarianceThreshold: Removes from the input feature matrix all features with variance below a threshold specified by the user. By default it removes any feature that has the same value in every example, i.e. zero variance.
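A minimal sketch of VarianceThreshold; the toy matrix and the 0.05 threshold are assumptions:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Second column is constant (zero variance) and the third barely varies.
X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 0.1],
              [3.0, 5.0, 0.0],
              [4.0, 5.0, 0.1]])

# With the default threshold (0.0) only the constant column is dropped;
# a user-specified threshold removes every feature whose variance is below it.
selector = VarianceThreshold(threshold=0.05)
X_reduced = selector.fit_transform(X)
print(selector.get_support())   # [ True False False]
print(X_reduced)
```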
