Getting Started
CS771: Introduction to Machine Learning
Purushottam Kar
An overview of ML
Study the similarity of ML with a student preparing for an exam
Look at a toy ML problem (warning: lots of oversimplifications ahead!)
Learn what training data and test data are
Learn what a model is
A typical study cycle (e.g. in a course)
[Figure: Preparation using practice tests/study material builds subject knowledge; that knowledge is then tested in the actual exam, giving a pass/fail outcome.]
Spam Filtering
Suppose Mary has already tagged several old emails as spam/non-spam; can we tag her new emails too?
Trick: use the old tagged emails to try and understand what sort of emails Mary thinks of as spam and which as non-spam!
E.g. we may find that emails about shopping are always tagged as spam
E.g. we may find that emails from Jill are never tagged as spam
These insights/patterns are what are stored in the spam filter
Our spam filter helps us make predictions on new emails
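A minimal sketch of this idea, assuming scikit-learn is available; the tiny dataset, the tag names, and the model choice (a naive Bayes classifier over word counts) are illustrative assumptions, not the course's prescribed method:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Old emails Mary has already tagged (training data); invented for illustration
old_emails = [
    "huge shopping discounts today",     # tagged spam
    "shopping offer million dollars",    # tagged spam
    "dinner with Jill tonight",          # tagged non-spam
    "notes for tomorrow's lecture",      # tagged non-spam
]
old_tags = ["spam", "spam", "non-spam", "non-spam"]

# Turn each email into a numerical vector of word counts, then fit a simple model;
# the per-word statistics the model learns are the "insights/patterns" it stores
vectorizer = CountVectorizer()
spam_filter = MultinomialNB().fit(vectorizer.fit_transform(old_emails), old_tags)

# Use the stored patterns to tag new, unseen emails (test data)
new_emails = ["exclusive shopping deal", "lecture moved to Monday"]
print(spam_filter.predict(vectorizer.transform(new_emails)))
```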
A typical ML workflow
[Figure: Old emails are fed to an ML algorithm, which produces a spam filter; the spam filter then tags new emails as spam/non-spam.]
The spam filter stores information about your personal preferences about what looks like spam to you.
How is this information stored? Ah! That is the fun (and artistic) bit about ML. We will learn tons of ways in which ML algos store patterns.
A typical ML workflow
[Figure: Training data is fed to an ML algorithm, which produces a model; the model is applied to test data to produce the output.]
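A generic sketch of the workflow in this figure, assuming scikit-learn and its bundled iris dataset; the choice of logistic regression and the 70/30 split are assumptions made purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split a toy dataset into training data and test data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# ML algorithm + training data -> model
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model + test data -> output
output = model.predict(X_test)
print("predictions:", output[:5], " test accuracy:", model.score(X_test, y_test))
```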
ML as an “examination”
Exam                                   | ML
Our brain stores subject matter        | The model stores data patterns
Use subject matter to solve exam       | Use model to predict on test data
Critical to do well on exam day        | Critical to do well on test data
Mock test results indicative           | Training accuracies indicative
No out-of-syllabus questions           | Training/test data are similar
Should not leak exam paper before exam | Should not look at test data while training
ML can do lots of cool things with test data
In this course, we will learn how to do most of these operations with test data.
[Figure: Test data can be used for regression, binary classification, multi-classification, clustering, tagging (e.g. URGENT, OFFICIAL, TAX), and ranking.]
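A small sketch contrasting a few of these operations, assuming scikit-learn; the toy arrays below are invented for illustration, and only regression, binary classification, and clustering are shown:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])    # toy inputs
y_real = np.array([1.1, 1.9, 3.2, 3.9])       # real-valued targets (regression)
y_bin = np.array([0, 0, 1, 1])                # binary labels (binary classification)
X_test = np.array([[2.5], [3.5]])

print(LinearRegression().fit(X, y_real).predict(X_test))   # regression: real numbers
print(LogisticRegression().fit(X, y_bin).predict(X_test))  # binary classification: 0/1 labels
print(KMeans(n_clusters=2, n_init=10).fit_predict(X))      # clustering: group indices, no labels used
```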
ML can take in lots of kinds of training data
We won't be able to cover all these training settings in this course – there are entire courses devoted to specific training settings, e.g. CS773 (Online Learning).
ML can store info in lots of innovative ways
ML Models and Algorithms: Linear/Opt, Local (e.g. Nearest Neighbours), Kernel, Probabilistic/Bayesian, Neural/Deep.
We will learn how to use each of these techniques in the course, but a bit briefly. As before, there are entire courses devoted to each technique, e.g. CS772 (Prob ML), CS774 (Opt).
We can mix-n-match these methods too, e.g. Bayesian Deep Learning or Kernel Nearest Neighbours (Local). Correct! But we will not be able to cover such advanced methods either.
Fantastic Features
… and how to find them
What are vectors
How are vectors used in ML
Useful operations on vectors
What are features
Features are a way for us to give input to ML algorithms.
Most ML algorithms use numerical vectors to represent features.
For example, in spam classification, every old email (training data) as well as every new email (test data) must be converted into a vector (a sketch of building such vectors appears after the feature types below).
[Figure: each email becomes a vector of counts over a vocabulary of words such as Do, You, Want, Go, Million, Dollars, Dinner, Today.]
Why did we not keep the word “to” as a feature? We could have – but it does not carry much information about spam/non-spam since it is such a common word!
Guys, something is wrong with our feature. The word “do” means different things in two emails!
Categorical Features
E.g. blood group (A, B, AB, O), degree programme (BTECH, MTECH, MS, PHD)
Relational Features
E.g. ranking in class, social media connections
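A small sketch of how such vectors can be built: a bag-of-words vector over the slide's word list, and one-hot encodings for the categorical examples. The helper functions are made up for illustration:

```python
vocabulary = ["do", "you", "want", "go", "million", "dollars", "dinner", "today"]

def bag_of_words(email):
    """Count how often each vocabulary word appears in the email."""
    words = email.lower().split()
    return [words.count(w) for w in vocabulary]

def one_hot(value, categories):
    """Represent a categorical value as a 0/1 vector with a single 1."""
    return [1 if value == c else 0 for c in categories]

print(bag_of_words("Do you want a million dollars"))      # [1, 1, 1, 0, 1, 1, 0, 0]
print(one_hot("AB", ["A", "B", "AB", "O"]))                # blood group -> [0, 0, 1, 0]
print(one_hot("MTECH", ["BTECH", "MTECH", "MS", "PHD"]))   # degree -> [0, 1, 0, 0]
```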
Derived Features
The way we represented emails in the spam problem is called the bag-of-words feature. As we saw, the bag-of-words (also called BoW) is not always a very good feature (it confuses polysemous words). However, it is still extremely popular due to its simplicity.
Bagged/binned features
E.g. the grade list ESC101 (A), ESO207 (B), CS220 (B), CS340 (C), MSO201 (A), CS771 (A) becomes counts over the grade bins A, B, C, D, E, F
Pooled/aggregated features
E.g. the same grade list ESC101 (A), ESO207 (B), CS220 (B), CS340 (C), MSO201 (A), CS771 (A) becomes 10 (max), 8.67 (avg)
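A sketch of these two derived features, using the grade list from the slide; the grade-to-point mapping below is an assumption, chosen so that the pooled values match the slide's 10 (max) and 8.67 (avg):

```python
# Grade list from the slide, and an assumed grade-to-point mapping (A=10, B=8, C=6, ...)
grades = {"ESC101": "A", "ESO207": "B", "CS220": "B",
          "CS340": "C", "MSO201": "A", "CS771": "A"}
grade_points = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2, "F": 0}

# Bagged/binned feature: count how many courses fall into each grade bin
bins = ["A", "B", "C", "D", "E", "F"]
binned = [list(grades.values()).count(g) for g in bins]
print(binned)                                              # [3, 2, 1, 0, 0, 0]

# Pooled/aggregated features: summarize the grade points by max and average
points = [grade_points[g] for g in grades.values()]
print(max(points), round(sum(points) / len(points), 2))    # 10 8.67
```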
Feature Selection??
Suppose we wish to predict a household's EXPENDITURE from features such as DIET (VEG/NON-VEG), EYE COLOR, TOTAL INCOME, EDUCATION LEVEL, FAMILY SIZE, HOUSE NO (ODD/EVEN), NO OF CHILDREN, RENTED/SELF OWNED, and SURNAME LENGTH.
Some of these (e.g. TOTAL INCOME, FAMILY SIZE, NO OF CHILDREN) are useful for predicting expenditure; others (e.g. EYE COLOR, HOUSE NO (ODD/EVEN), SURNAME LENGTH) are not.
In fact, one reason for the success of deep learning is that it learns good features themselves!
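A sketch of one simple way to judge feature usefulness, scoring each candidate by its correlation with the target; the synthetic data below is invented so that income and family size matter by construction, and correlation is only one of many possible scores:

```python
import numpy as np

# Synthetic data: expenditure depends on income and family size by construction;
# surname length and house-number parity are irrelevant by construction.
rng = np.random.default_rng(0)
n = 500
total_income   = rng.uniform(20, 200, n)
family_size    = rng.integers(1, 7, n)
surname_length = rng.integers(3, 12, n)
house_no_odd   = rng.integers(0, 2, n)
expenditure = 0.6 * total_income + 15.0 * family_size + rng.normal(0, 5, n)

features = {"TOTAL INCOME": total_income, "FAMILY SIZE": family_size,
            "SURNAME LENGTH": surname_length, "HOUSE NO (ODD/EVEN)": house_no_odd}

# Score each feature by absolute correlation with the target, highest first
scores = {name: abs(np.corrcoef(x, expenditure)[0, 1]) for name, x in features.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {score:.2f}")   # income and family size should score highest
```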