CHP 4
Iris Dataset
Feature Engineering
Refers to the process of transforming a data set into features
such that these features represent the data set more
effectively and result in better learning performance.
Situations in which it is essential to use Feature
Construction
When features have categorical values but the machine learning
algorithm needs numeric inputs.
When features have numeric (continuous) values that need
to be converted to ordinal values.
When text-specific feature construction needs to be done.
Encoding Categorical (Nominal) Variables
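A minimal sketch of encoding a nominal variable by one-hot (dummy) encoding, assuming pandas is available; the "species" column and its values are hypothetical stand-ins for any nominal feature.

import pandas as pd

# Hypothetical nominal feature: categories with no inherent order
df = pd.DataFrame({"species": ["setosa", "versicolor", "virginica", "setosa"]})

# One-hot encoding: each category becomes a separate 0/1 column
encoded = pd.get_dummies(df, columns=["species"])
print(encoded)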
Encoding Categorical (Ordinal) Variables
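A minimal sketch of encoding an ordinal variable, again assuming pandas; the "grade" feature and its ordering are illustrative assumptions.

import pandas as pd

# Hypothetical ordinal feature: categories with a natural order
df = pd.DataFrame({"grade": ["low", "high", "medium", "low"]})

# Map each category to an integer that preserves the order
order = {"low": 0, "medium": 1, "high": 2}
df["grade_encoded"] = df["grade"].map(order)
print(df)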
• In PCA, a new set of features is extracted from the original features;
the new features are quite dissimilar in nature to the original ones.
(An n-dimensional feature space gets transformed into an m-dimensional
feature space, where the new dimensions are orthogonal to each other.)
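A minimal PCA sketch using scikit-learn on the Iris data (the 4-dimensional feature space reduced to 2 orthogonal components); the library choice and parameters are illustrative assumptions, not the chapter's prescribed code.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Transform the 4-dimensional feature space into 2 orthogonal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                    # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component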
• Unlike PCA, the focus of LDA is not to capture the data set
variability. Instead, LDA focuses on class separability.
(i.e. it projects the features so that the classes are well separated,
which helps avoid over-fitting of the machine learning model.)
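A comparable sketch for LDA with scikit-learn; because LDA uses class labels, the Iris species labels guide the projection. The setup below is an assumption for illustration only.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA projects onto at most (n_classes - 1) axes that maximise class separability
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_lda.shape)  # (150, 2)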
1. Correlation-based measures
2. Distance-based measures
3. Other coefficient-based measures
Correlation-based Similarity Measure
Correlation is a measure of linear dependency between two
random variables.
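A small sketch computing the Pearson correlation between two features with NumPy; the feature values below are made up purely for illustration.

import numpy as np

# Two hypothetical numeric features measured on the same cases
f1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
f2 = np.array([1.0, 3.0, 5.0, 9.0, 11.0])

# Pearson correlation: +1/-1 = perfect linear dependency, 0 = none
r = np.corrcoef(f1, f2)[0, 1]
print(r)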
Jaccard coefficient, J = n11 / (n11 + n01 + n10)
where, n11 = number of cases where both the features have value 1
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0
Jaccard distance, d = 1 - J
• Simple matching coefficient (SMC) is almost the same as the
Jaccard coefficient, except that it also includes the number of
cases where both the features have a value of 0.
SMC = (n11 + n00) / (n11 + n01 + n10 + n00)
where, n11 = number of cases where both the features have value 1
n01 = number of cases where feature 1 has value 0 and feature 2 has value 1
n10 = number of cases where feature 1 has value 1 and feature 2 has value 0
n00 = number of cases where both the features have value 0
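A sketch computing the Jaccard coefficient, the Jaccard distance and the SMC for two binary feature vectors, assuming NumPy; the vectors themselves are made up.

import numpy as np

# Two hypothetical binary features observed on the same cases
f1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])
f2 = np.array([1, 0, 0, 1, 1, 0, 0, 0])

n11 = np.sum((f1 == 1) & (f2 == 1))  # both features 1
n01 = np.sum((f1 == 0) & (f2 == 1))  # feature 1 is 0, feature 2 is 1
n10 = np.sum((f1 == 1) & (f2 == 0))  # feature 1 is 1, feature 2 is 0
n00 = np.sum((f1 == 0) & (f2 == 0))  # both features 0

jaccard = n11 / (n11 + n01 + n10)
smc = (n11 + n00) / (n11 + n01 + n10 + n00)
print(jaccard, 1 - jaccard, smc)      # J, Jaccard distance d = 1 - J, SMC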
• Cosine similarity is one of the most popular measures
in text classification.
Calculate the cosine similarity of x and y, where x = (2, 4, 0, 0,
2, 1, 3, 0, 0) and y = (2, 1, 0, 0, 3, 2, 1, 0, 1).
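A sketch of the calculation with NumPy, using the x and y vectors from the exercise; the dot product is 19, |x| = sqrt(34), |y| = sqrt(20), so the result works out to roughly 0.73.

import numpy as np

x = np.array([2, 4, 0, 0, 2, 1, 3, 0, 0])
y = np.array([2, 1, 0, 0, 3, 2, 1, 0, 1])

# cosine similarity = (x . y) / (|x| * |y|)
cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_sim)  # approximately 0.73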
• Cosine similarity actually measures the angle between the x and
y vectors.
• If the cosine similarity is 1, the angle between x and y
is 0°, which means x and y are the same except for magnitude.
• If the cosine similarity is 0, the angle between x and y is 90°.
Hence, they do not share any similarity (in the case of text data,
no term/word is common).
Overall Feature Selection Process
• A search may start with the full feature set and successively remove
features. This strategy is termed sequential backward elimination
(a code sketch of this strategy follows the stopping criteria below).
• In certain cases, the search starts from both ends, adding and removing
features simultaneously. This strategy is termed bi-directional
selection.
• Each candidate subset is then evaluated and compared with
the previous best performing subset based on certain
evaluation criterion.
• If the new subset performs better, it replaces the previous
one.
• The cycle of subset generation and evaluation continues till a
pre-defined stopping criterion is fulfilled.
Stopping Criteria:
1. The search completes.
2. Some given bound (e.g. a specified number of iterations) is
reached.
3. Subsequent addition (or deletion) of features does not
produce a better subset.
4. A sufficiently good subset (e.g. a subset having better
classification accuracy than the existing benchmark) is
selected.
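A hedged sketch of this overall cycle using scikit-learn's SequentialFeatureSelector in backward mode; the estimator, scoring function and stopping bound (n_features_to_select) are illustrative choices, not the chapter's own code.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Start from the full feature set and successively remove features
# (sequential backward elimination); stop when 2 features remain.
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="backward",
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)

print(selector.get_support())       # mask of the selected features
X_selected = selector.transform(X)  # reduced feature matrix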