Data Normalization in Data Mining
INTRODUCTION:
Data normalization is a technique used in data mining to transform the values of a
dataset into a common scale. This is important because many machine learning
algorithms are sensitive to the scale of the input features and can produce better
results when the data is normalized.
There are several different normalization techniques that can be used in data
mining, including:
1. Min-Max normalization: This technique scales the values of a feature to a range
between 0 and 1. This is done by subtracting the minimum value of the feature
from each value, and then dividing by the range of the feature.
2. Z-score normalization: This technique scales the values of a feature to have a
mean of 0 and a standard deviation of 1. This is done by subtracting the mean of
the feature from each value, and then dividing by the standard deviation.
3. Decimal scaling: This technique scales the values of a feature by dividing them by a power of 10.
4. Logarithmic transformation: This technique applies a logarithmic transformation
to the values of a feature. This can be useful for data with a wide range of values,
as it can help to reduce the impact of outliers.
5. Root transformation: This technique applies a square root transformation to the
values of a feature. This can be useful for data with a wide range of values, as it
can help to reduce the impact of outliers.
It's important to note that normalization should be applied only to the input features, not the target variable, and that different normalization techniques may work better for different types of data and models.
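To make the logarithmic and root transformations above concrete, here is a minimal sketch in Python using NumPy; the feature values are made up purely for illustration. (Min-Max, Z-score, and decimal scaling are sketched in their respective sections below.)

```python
import numpy as np

# A made-up feature with a wide range of values.
x = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# Logarithmic transformation: log1p(x) = log(1 + x) compresses large values,
# which reduces the impact of outliers (and keeps zero values valid).
log_transformed = np.log1p(x)

# Root (square root) transformation: a milder compression than the logarithm.
root_transformed = np.sqrt(x)

print(log_transformed)   # [0.693 2.398 4.615 6.909 9.21 ] approximately
print(root_transformed)  # [  1.      3.162  10.     31.623 100.   ]
```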
In conclusion, normalization is an important step in data mining, as it can help to improve the performance of machine learning algorithms by scaling the input features to a common scale. This can help to reduce the impact of outliers and improve the accuracy of the model.
Normalization is used to scale the data of an attribute so that it falls in a smaller
range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally useful for classification
algorithms.
Need of Normalization –
Normalization is generally required when we are dealing with attributes on different scales; otherwise, an equally important attribute (on a lower scale) may be diluted in effectiveness because another attribute has values on a larger scale. In simple words, when multiple attributes are present but have values on different scales, this may lead to poor data models while performing data mining operations, so the attributes are normalized to bring them all onto the same scale.
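As a small illustration of this point, the following sketch (with made-up attribute names, ranges, and values) shows how a Euclidean distance is dominated by the attribute on the larger scale until both attributes are rescaled:

```python
import numpy as np

# Two made-up records with attributes on very different scales:
# age (years) and annual income.
a = np.array([25, 500000])   # age 25, income 500,000
b = np.array([55, 510000])   # age 55, income 510,000

# Without normalization, the distance is dominated by the income attribute,
# even though the 30-year age gap is arguably the more significant difference.
print(np.linalg.norm(a - b))                # ~10000.04

# After min-max scaling each attribute to [0, 1] (assuming age lies in
# [20, 60] and income in [100000, 1000000]), both contribute comparably.
a_scaled = np.array([(25 - 20) / 40, (500000 - 100000) / 900000])
b_scaled = np.array([(55 - 20) / 40, (510000 - 100000) / 900000])
print(np.linalg.norm(a_scaled - b_scaled))  # ~0.75
```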
Methods of Data Normalization –
Decimal Scaling
Min-Max Normalization
z-Score Normalization (zero-mean Normalization)
Decimal Scaling Method For Normalization –
It normalizes the data by moving the decimal point of its values. To normalize the data by this technique, we divide each value by a power of 10 large enough that the largest absolute value falls below 1. A data value, vi, is normalized to vi' by using the formula below –

vi' = vi / 10^j

where j is the smallest integer such that max(|vi'|) < 1. Example –
Let the input data be: -10, 201, 301, -401, 501, 601, 701.
Step 1: The maximum absolute value in the given data (m) is 701.
Step 2: Divide the given data by 1000 (i.e. j = 3).
Result: The normalized data is: -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701.
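A minimal sketch of decimal scaling in Python (NumPy assumed; the helper function decimal_scaling is hypothetical, written only to mirror the formula above):

```python
import numpy as np

def decimal_scaling(values):
    """Divide each value by 10^j, where j is the smallest integer such that
    the maximum absolute normalized value is less than 1."""
    values = np.asarray(values, dtype=float)
    max_abs = np.max(np.abs(values))
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return values / (10 ** j)

data = [-10, 201, 301, -401, 501, 601, 701]
print(decimal_scaling(data))
# [-0.01   0.201  0.301 -0.401  0.501  0.601  0.701]
```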
Min-Max Normalization –
In this technique of data normalization, a linear transformation is performed on the original data. The minimum and maximum values of the data are fetched, and each value is replaced according to the following formula.
v' = ((v - Min(A)) / (Max(A) - Min(A))) * (new_max(A) - new_min(A)) + new_min(A)
Where A is the attribute data, Min(A) and Max(A) are the minimum and maximum values of A respectively, v' is the new value of each entry in the data, v is the old value of each entry in the data, and new_max(A), new_min(A) are the maximum and minimum values of the new range (i.e. the boundary values of the required range) respectively.
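A minimal sketch of min-max normalization (NumPy assumed; the sample values and the default [0, 1] target range are made up for illustration):

```python
import numpy as np

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values from [Min(A), Max(A)] to [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    return (values - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

marks = [8.0, 10.0, 15.0, 20.0]              # made-up attribute values
print(min_max_normalize(marks))              # [0.    0.167 0.583 1.   ]
print(min_max_normalize(marks, 0.0, 100.0))  # [  0.    16.667  58.333 100.   ]
```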
Z-score normalization –
In this technique, values are normalized based on the mean and standard deviation of the data A. The formula used is:
v' = (v - Ā) / σ_A

where v' and v are the new and old values of each entry in the data respectively, and σ_A and Ā are the standard deviation and mean of A respectively.
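A minimal sketch of z-score normalization (NumPy assumed; the sample values are made up):

```python
import numpy as np

def z_score_normalize(values):
    """Rescale values so the attribute has mean 0 and standard deviation 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

data = [10.0, 20.0, 30.0, 40.0, 50.0]   # made-up attribute values
normalized = z_score_normalize(data)
print(normalized)           # [-1.414 -0.707  0.     0.707  1.414]
print(normalized.mean())    # ~0.0
print(normalized.std())     # ~1.0
```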
ADVANTAGES OR DISADVANTAGES:
Data normalization in data mining can have a number of advantages and
disadvantages.
Advantages:
1. Improved performance of machine learning algorithms: Normalization can help to improve the performance of machine learning algorithms by scaling the input features to a common scale. This can help to reduce the impact of outliers and improve the accuracy of the model.
2. Better handling of outliers: Normalization can help to reduce the impact of
outliers by scaling the data to a common scale, which can make the outliers less
influential.
3. Improved interpretability of results: Normalization can make it easier to interpret
the results of a machine learning model, as the inputs will be on a common scale.
4. Better generalization: Normalization can help to improve the generalization of a
model, by reducing the impact of outliers and by making the model less sensitive
to the scale of the inputs.
Disadvantages:
1. Loss of information: Normalization can result in a loss of information if the original scale of the input features is important.
2. Impact on outliers: Normalization can make it harder to detect outliers as they will
be scaled along with the rest of the data.
3. Impact on interpretability: Normalization can make it harder to interpret the
results of a machine learning model, as the inputs will be on a common scale,
which may not align with the original scale of the data.
4. Additional computational costs: Normalization can add additional computational
costs to the data mining process, as it requires additional processing time to scale
the data.
In conclusion, data normalization can have both advantages and disadvantages. It can improve the performance of machine learning algorithms and make it easier to interpret the results. However, it can also result in a loss of information and make it harder to detect outliers. It's important to weigh the pros and cons of data normalization and carefully assess the risks and benefits before implementing it.