Working With Data and Features
CS771: Introduction to Machine Learning
Nisheeth Srivastava
Features
People who need a math refresher can use this handy tutorial
A Loose Taxonomy of ML
Supervised Learning: learning using labeled data (we will see some examples of supervised learning problems shortly).
Unsupervised Learning: learning using unlabeled data.
Reinforcement Learning: an agent learns via its interactions with an environment. RL doesn’t use “labeled” or “unlabeled” data in the traditional sense!

Many other specialized flavors of ML also exist, some of which include:
Semi-supervised Learning
Active Learning
Transfer Learning
Multitask Learning
Imitation Learning (somewhat related to RL)
Zero-Shot Learning
Few-Shot Learning
Continual Learning
A Typical Supervised Learning Workflow

Note: This example is for the problem of binary classification, a supervised learning problem. Can you think of a problem you would try to solve using supervised learning?

Training phase: labeled training data (images labeled “cat” or “dog”) go through “feature” extraction; the extracted features and their labels are fed to the ML algorithm, which outputs a “model”. Feature extraction converts raw inputs to a numeric representation that the ML algo can understand and work with. More on feature extraction later.

Test phase: a test image goes through the same “feature” extraction, and the learned cat-vs-dog prediction model outputs the predicted label (cat/dog).
In the analogous unsupervised learning workflow, the (unlabeled) inputs also go through “feature” extraction, but the ML algorithm instead outputs a grouping of the inputs (e.g., a clustering).
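The supervised part of this workflow can be sketched in a few lines of Python. This is only an illustrative sketch: the flattening feature extractor, the toy random images, and the choice of logistic regression are all assumptions made for the example, not the course's prescribed method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for "feature" extraction: flatten each raw
# image (an H x W array) into a single numeric vector.
def extract_features(images):
    return np.array([img.ravel() for img in images])

# Toy labeled training data: random 8x8 "images" with cat/dog labels.
rng = np.random.default_rng(0)
train_images = [rng.random((8, 8)) for _ in range(20)]
train_labels = ["cat"] * 10 + ["dog"] * 10

# Training phase: features + labels -> ML algorithm -> model.
model = LogisticRegression(max_iter=1000)
model.fit(extract_features(train_images), train_labels)

# Test phase: same feature extraction, then predict the label.
test_image = rng.random((8, 8))
print(model.predict(extract_features([test_image])))  # e.g., ['cat']
```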
Geometric Perspective

Recall that feature extraction converts inputs into a numeric representation.

Basic fact: Inputs in ML problems can often be represented as points or vectors in some vector space.

Dimensionality Reduction: An unsupervised learning problem. The goal is to compress the size of each input without losing much information present in the data.
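As a concrete illustration of dimensionality reduction, here is a minimal sketch using scikit-learn's PCA; the toy data and the choice of 2 output dimensions are arbitrary assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 inputs, each a 50-dimensional feature vector (toy data).
X = np.random.default_rng(0).random((100, 50))

# Compress each input to 2 dimensions while retaining as much
# variance (information) in the data as possible.
pca = PCA(n_components=2)
X_small = pca.fit_transform(X)
print(X_small.shape)  # (100, 2)
```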
Perspective as function approximation

Supervised Learning (“predict output given input”) can usually be thought of as learning a function f that maps each input to the corresponding output, i.e., y ≈ f(x). For example, a probabilistic classifier for images learns p(label = “cat” | image).

Unsupervised learning can also be viewed as function approximation, but it is harder since we don’t know the labels in this case.
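As a toy illustration of “learning a function f” from input-output pairs, here is a least-squares line fit in numpy; the linear form of f and the synthetic data are assumptions made purely for this example.

```python
import numpy as np

# Toy input-output pairs generated from a noisy underlying function.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

# "Learn" f by fitting a degree-1 polynomial (a line) to the pairs.
slope, intercept = np.polyfit(x, y, deg=1)
f = lambda x_new: slope * x_new + intercept
print(f(5.0))  # close to 11.0
```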
Data and Features
ML algos require a numeric feature representation of the inputs. For example, with a bag-of-words (BoW) representation of text, each sentence is represented as a binary vector (each feature is a binary value, denoting presence or absence of a vocabulary word). BoW is also called the “unigram” representation.
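A minimal sketch of building such binary BoW vectors in plain Python; the example sentences and vocabulary are made up for illustration.

```python
# Toy corpus; the vocabulary is the set of all words seen.
sentences = ["the cat sat", "the dog ran", "a cat and a dog"]
vocab = sorted({w for s in sentences for w in s.split()})

# Binary BoW: 1 if the vocabulary word occurs in the sentence, else 0.
def bow_vector(sentence, vocab):
    words = set(sentence.split())
    return [1 if w in words else 0 for w in vocab]

for s in sentences:
    print(s, "->", bow_vector(s, vocab))
```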
Example: Feature Extraction for Image Data
A very simple feature extraction approach for image data is flattening
A histogram of visual patterns is another popular feature extraction method for images.

Many other manual feature extraction techniques have been developed in the computer vision and image processing communities (SIFT, HoG, and others).
Pic credit: cat.uab.cat/Research/object-recognition
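A minimal numpy sketch of the flattening approach mentioned above; the image here is random toy data standing in for a real grayscale image.

```python
import numpy as np

# A toy 28 x 28 grayscale image.
image = np.random.default_rng(0).random((28, 28))

# Flattening: stack the pixel rows into one long feature vector.
features = image.ravel()
print(features.shape)  # (784,)
```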
Feature Selection
Not all the extracted features may be relevant for learning the model (some may
even confuse the learner)
Feature selection (a step after feature extraction) can be used to identify the
features that matter, and discard the others, for more effective learning
Example: from the features Age, Gender, Height, Weight, and Eye color of a person, Height and Weight are what matter for predicting body-mass index (BMI); the others can be discarded.
More common in supervised learning but can also be done for unsup. learning
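A minimal sketch of supervised feature selection using scikit-learn's SelectKBest; the synthetic data (in which only height and weight drive the target) is an assumption made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 200
# Columns: age, gender, height (m), weight (kg), eye color (coded).
X = np.column_stack([
    rng.integers(18, 80, n),   # age
    rng.integers(0, 2, n),     # gender
    rng.uniform(1.5, 2.0, n),  # height
    rng.uniform(50, 100, n),   # weight
    rng.integers(0, 4, n),     # eye color
]).astype(float)
y = X[:, 3] / X[:, 2] ** 2     # BMI = weight / height^2

# Keep the 2 features most correlated with the target.
selector = SelectKBest(f_regression, k=2).fit(X, y)
print(selector.get_support())  # likely True at height and weight columns
```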
Some More Postprocessing: Feature Scaling
Even after feature selection, the features may not be on the same scale. This can be problematic when comparing two inputs: features that have larger scales may dominate the result of such comparisons.

It is therefore helpful to standardize the features, e.g., by bringing all of them onto the same scale, such as between 0 and 1.
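A minimal sketch of min-max scaling to the [0, 1] range in numpy, on toy data; scikit-learn's MinMaxScaler performs the same per-column operation.

```python
import numpy as np

# Toy data: 5 inputs with two features on very different scales
# (e.g., height in meters and weight in kilograms).
X = np.array([[1.6, 55.0], [1.7, 70.0], [1.8, 90.0],
              [1.9, 80.0], [1.5, 60.0]])

# Min-max scaling: map each feature (column) to the [0, 1] range.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```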
[Figure: feature learning with a deep network. Raw input passes through one or more layers of a feature learning module; the penultimate layer gives the learned features, which feed into a classification model. Pic an adaptation of the original from: https://deepai.org/]
Some Notation/Nomenclature/Convention
Sup. learning requires training data as input-output pairs {(x_n, y_n)}, n = 1, …, N; unsupervised learning requires training data as inputs {x_n} only. (RL and other flavors of ML problems also use similar notation.)

Each input x_n is (usually) a vector containing the values of the features or attributes or covariates that encode properties of the object it represents, e.g., for a 7 × 7 image, x_n can be a 49 × 1 vector of pixel intensities.

The size or length D of the input is commonly known as the data/input dimensionality or feature dimensionality.

(In sup. learning) Each y_n is the output or response or label associated with input x_n (and its value is known for the training inputs). The output can be a scalar, a vector of numbers, or even a structured object (more on this later).
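In code, this convention usually becomes an N × D feature matrix and a length-N label vector; a tiny numpy sketch with toy numbers:

```python
import numpy as np

N, D = 5, 49  # 5 training inputs, each a 7 x 7 image flattened to D = 49

# X[n] is the feature vector x_n; y[n] is its label y_n.
X = np.random.default_rng(0).random((N, D))
y = np.array(["cat", "dog", "cat", "cat", "dog"])

print(X.shape, y.shape)  # (5, 49) (5,)
```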
Types of Features and Types of Outputs
Features as well as outputs can be real-valued, binary, categorical, ordinal, etc.

Real-valued: Pixel intensity, house area, house price, rainfall amount, temperature, etc.
Categorical/Discrete: Zipcode, blood group, or any “one from finitely many choices” value
Ordinal: Grade (A/B/C etc.) in a course, or any other type where relative values matter

Often, the features can be of mixed types (some real, some categorical, some ordinal, etc.)
Some Basic Operations on Inputs

Assume each input to be a feature vector x_n of size D.

Given N inputs x_1, …, x_N, their average or mean can be computed as

mean = (1/N) · Σ_{n=1}^{N} x_n

What does such a “mean” represent? It is itself a D-dimensional vector: the “average” input.
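A one-line numpy version of this, computing the mean feature vector over toy inputs:

```python
import numpy as np

# 4 inputs, each a feature vector of size D = 3.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 4.0],
              [3.0, 2.0, 5.0],
              [2.0, 4.0, 0.0]])

mean = X.mean(axis=0)  # (1/N) * sum of the x_n's, a D-dimensional vector
print(mean)            # [2. 2. 3.]
```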