Working With Data and Features
CS771: Introduction to Machine Learning
Nisheeth Srivastava
Features
People who need a math refresher can use this handy tutorial
A Loose Taxonomy of ML
Supervised Learning: learning using labeled data (we will see some examples of supervised learning problems shortly).
Unsupervised Learning: learning using unlabeled data.
Reinforcement Learning: an agent learns via its interactions with an environment. RL doesn’t use “labeled” or “unlabeled” data in the traditional sense!

Many other specialized flavors of ML also exist, some of which include:
Semi-supervised Learning
Active Learning
Transfer Learning
Multitask Learning
Imitation Learning (somewhat related to RL)
Zero-Shot Learning
Few-Shot Learning
Continual Learning
A Typical Supervised Learning Workflow

Note: This example is for the problem of binary classification, a supervised learning problem. Can you think of a problem you would try to solve using supervised learning?

Training phase: labeled training data (images labeled “cat” or “dog”) go through “feature” extraction; the extracted features and their labels are fed to the ML algorithm, which outputs a “model”. Feature extraction converts raw inputs to a numeric representation that the ML algo can understand and work with. More on feature extraction later.

Test phase: a test image goes through the same “feature” extraction, and the learned cat-vs-dog prediction model outputs the predicted label (cat/dog).
In the analogous unsupervised learning workflow, the (unlabeled) inputs also go through “feature” extraction, but the ML algorithm instead outputs a grouping of the inputs (e.g., a clustering).
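The supervised part of this workflow can be sketched in a few lines of Python. This is only an illustrative sketch: the flattening feature extractor, the toy random images, and the choice of logistic regression are all assumptions made for the example, not the course's prescribed method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for "feature" extraction: flatten each raw
# image (an H x W array) into a single numeric vector.
def extract_features(images):
    return np.array([img.ravel() for img in images])

# Toy labeled training data: random 8x8 "images" with cat/dog labels.
rng = np.random.default_rng(0)
train_images = [rng.random((8, 8)) for _ in range(20)]
train_labels = ["cat"] * 10 + ["dog"] * 10

# Training phase: features + labels -> ML algorithm -> model.
model = LogisticRegression(max_iter=1000)
model.fit(extract_features(train_images), train_labels)

# Test phase: same feature extraction, then predict the label.
test_image = rng.random((8, 8))
print(model.predict(extract_features([test_image])))  # e.g., ['cat']
```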
Geometric Perspective

Recall that feature extraction converts inputs into a numeric representation.

Basic fact: Inputs in ML problems can often be represented as points or vectors in some vector space.

Dimensionality Reduction: An unsupervised learning problem. The goal is to compress the size of each input without losing much information present in the data.
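As a concrete illustration of dimensionality reduction, here is a minimal sketch using scikit-learn's PCA; the toy data and the choice of 2 output dimensions are arbitrary assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 inputs, each a 50-dimensional feature vector (toy data).
X = np.random.default_rng(0).random((100, 50))

# Compress each input to 2 dimensions while retaining as much
# variance (information) in the data as possible.
pca = PCA(n_components=2)
X_small = pca.fit_transform(X)
print(X_small.shape)  # (100, 2)
```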
Perspective as function approximation

Supervised Learning (“predict output given input”) can usually be thought of as learning a function f that maps each input to the corresponding output, i.e., y ≈ f(x). For example, a probabilistic classifier for images learns p(label = “cat” | image).

Unsupervised learning can also be viewed as function approximation, but it is harder since we don’t know the labels in this case.
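As a toy illustration of “learning a function f” from input-output pairs, here is a least-squares line fit in numpy; the linear form of f and the synthetic data are assumptions made purely for this example.

```python
import numpy as np

# Toy input-output pairs generated from a noisy underlying function.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

# "Learn" f by fitting a degree-1 polynomial (a line) to the pairs.
slope, intercept = np.polyfit(x, y, deg=1)
f = lambda x_new: slope * x_new + intercept
print(f(5.0))  # close to 11.0
```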
Data and Features
ML algos require a numeric feature representation of the inputs. For example, with a bag-of-words (BoW) representation of text, each sentence is represented as a binary vector (each feature is a binary value, denoting presence or absence of a vocabulary word). BoW is also called the “unigram” representation.
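A minimal sketch of building such binary BoW vectors in plain Python; the example sentences and vocabulary are made up for illustration.

```python
# Toy corpus; the vocabulary is the set of all words seen.
sentences = ["the cat sat", "the dog ran", "a cat and a dog"]
vocab = sorted({w for s in sentences for w in s.split()})

# Binary BoW: 1 if the vocabulary word occurs in the sentence, else 0.
def bow_vector(sentence, vocab):
    words = set(sentence.split())
    return [1 if w in words else 0 for w in vocab]

for s in sentences:
    print(s, "->", bow_vector(s, vocab))
```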
Example: Feature Extraction for Image Data
A very simple feature extraction approach for image data is flattening
A histogram of visual patterns is another popular feature extraction method for images.

Many other manual feature extraction techniques have been developed in the computer vision and image processing communities (SIFT, HoG, and others).
Pic credit: cat.uab.cat/Research/object-recognition
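A minimal numpy sketch of the flattening approach mentioned above; the image here is random toy data standing in for a real grayscale image.

```python
import numpy as np

# A toy 28 x 28 grayscale image.
image = np.random.default_rng(0).random((28, 28))

# Flattening: stack the pixel rows into one long feature vector.
features = image.ravel()
print(features.shape)  # (784,)
```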
Feature Selection
Not all the extracted features may be relevant for learning the model (some may
even confuse the learner)
Feature selection (a step after feature extraction) can be used to identify the
features that matter, and discard the others, for more effective learning
Example: from the features Age, Gender, Height, Weight, and Eye color of a person, Height and Weight are what matter for predicting body-mass index (BMI); the others can be discarded.
More common in supervised learning but can also be done for unsup. learning
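A minimal sketch of supervised feature selection using scikit-learn's SelectKBest; the synthetic data (in which only height and weight drive the target) is an assumption made for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 200
# Columns: age, gender, height (m), weight (kg), eye color (coded).
X = np.column_stack([
    rng.integers(18, 80, n),   # age
    rng.integers(0, 2, n),     # gender
    rng.uniform(1.5, 2.0, n),  # height
    rng.uniform(50, 100, n),   # weight
    rng.integers(0, 4, n),     # eye color
]).astype(float)
y = X[:, 3] / X[:, 2] ** 2     # BMI = weight / height^2

# Keep the 2 features most correlated with the target.
selector = SelectKBest(f_regression, k=2).fit(X, y)
print(selector.get_support())  # likely True at height and weight columns
```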
Some More Postprocessing: Feature Scaling
Even after feature selection, the features may not be on the same scale. This can be problematic when comparing two inputs: features that have larger scales may dominate the result of such comparisons.

It is therefore helpful to standardize the features, e.g., by bringing all of them onto the same scale, such as between 0 and 1.
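A minimal sketch of min-max scaling to the [0, 1] range in numpy, on toy data; scikit-learn's MinMaxScaler performs the same per-column operation.

```python
import numpy as np

# Toy data: 5 inputs with two features on very different scales
# (e.g., height in meters and weight in kilograms).
X = np.array([[1.6, 55.0], [1.7, 70.0], [1.8, 90.0],
              [1.9, 80.0], [1.5, 60.0]])

# Min-max scaling: map each feature (column) to the [0, 1] range.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)
```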
[Figure: feature learning with a deep network. Raw input passes through one or more layers of a feature learning module; the penultimate layer gives the learned features, which feed into a classification model. Pic an adaptation of the original from: https://deepai.org/]
Some Notation/Nomenclature/Convention
Sup. learning requires training data as input-output pairs {(x_n, y_n)}, n = 1, …, N; unsupervised learning requires training data as inputs {x_n} only. (RL and other flavors of ML problems also use similar notation.)

Each input x_n is (usually) a vector containing the values of the features or attributes or covariates that encode properties of the object it represents, e.g., for a 7 × 7 image, x_n can be a 49 × 1 vector of pixel intensities.

The size or length D of the input is commonly known as the data/input dimensionality or feature dimensionality.

(In sup. learning) Each y_n is the output or response or label associated with input x_n (and its value is known for the training inputs). The output can be a scalar, a vector of numbers, or even a structured object (more on this later).
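In code, this convention usually becomes an N × D feature matrix and a length-N label vector; a tiny numpy sketch with toy numbers:

```python
import numpy as np

N, D = 5, 49  # 5 training inputs, each a 7 x 7 image flattened to D = 49

# X[n] is the feature vector x_n; y[n] is its label y_n.
X = np.random.default_rng(0).random((N, D))
y = np.array(["cat", "dog", "cat", "cat", "dog"])

print(X.shape, y.shape)  # (5, 49) (5,)
```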
Types of Features and Types of Outputs
Features as well as outputs can be real-valued, binary, categorical, ordinal, etc.

Real-valued: Pixel intensity, house area, house price, rainfall amount, temperature, etc.
Categorical/Discrete: Zipcode, blood group, or any “one from finitely many choices” value
Ordinal: Grade (A/B/C etc.) in a course, or any other type where relative values matter

Often, the features can be of mixed types (some real, some categorical, some ordinal, etc.)
Some Basic Operations on Inputs

Assume each input to be a feature vector x_n of size D.

Given N inputs x_1, …, x_N, their average or mean can be computed as

mean = (1/N) · Σ_{n=1}^{N} x_n

What does such a “mean” represent? It is itself a D-dimensional vector: the “average” input.
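A one-line numpy version of this, computing the mean feature vector over toy inputs:

```python
import numpy as np

# 4 inputs, each a feature vector of size D = 3.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 0.0, 4.0],
              [3.0, 2.0, 5.0],
              [2.0, 4.0, 0.0]])

mean = X.mean(axis=0)  # (1/N) * sum of the x_n's, a D-dimensional vector
print(mean)            # [2. 2. 3.]
```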