Lesson 5 Data Wrangling in Data Science.
Data wrangling generally refers to transforming raw data into a usable form
for your analyses of interest, including loading, aggregating, merging, grouping,
concatenating and formatting.
Data is not useful until it can be analyzed and presented as insight that drives
better decision making.
Data cannot be effectively analyzed until it is well structured, clean, and
converted into a suitable format. Simply put, that is why good data wrangling is
important.
This process ensures that data is prepared for automation and additional
analysis.
Here are the goals of data wrangling:
Data scientists reportedly spend 75% to 80% of their time wrangling data, which is
not a surprise at all.
1. Data exploration: here we load the data into a DataFrame and then view it in
tabular form.
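As a minimal sketch of this step, the snippet below builds a small DataFrame and prints it. The student names and values are made up for illustration; only the GENDER and MARKS column names come from this lesson, and NAME is an assumed column name.

```python
import numpy as np
import pandas as pd

# Hypothetical student records (names and values are invented for this sketch).
data = {
    "NAME": ["Asha", "Ben", "Chitra", "Dev", "Esha"],
    "GENDER": ["F", "M", "F", "M", "F"],
    "MARKS": [85.0, np.nan, 72.0, 91.0, np.nan],
}

# Assign the data to a DataFrame so it can be viewed as a table.
df = pd.DataFrame(data)

print(df)               # tabular view of the records
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # number of missing values per column
```

Printing the missing-value count per column at this stage shows which columns will need attention in the next step.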
2. Dealing with missing values: as the previous output shows, the MARKS column
contains NaN values, which we handle by replacing them with the column mean.
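One possible way to do this replacement is sketched below, using Series.mean() and Series.fillna(). The sample records are hypothetical, and the NAME column is an assumed name.

```python
import numpy as np
import pandas as pd

# Hypothetical records with two missing marks.
df = pd.DataFrame({
    "NAME": ["Asha", "Ben", "Chitra", "Dev", "Esha"],
    "MARKS": [85.0, np.nan, 72.0, 91.0, np.nan],
})

# mean() skips NaN by default, so this is the mean of the known marks only.
column_mean = df["MARKS"].mean()

# Replace every NaN in MARKS with the column mean.
df["MARKS"] = df["MARKS"].fillna(column_mean)

print(df)
```

Mean imputation keeps the column average unchanged, but it is only one option; the median or dropping the rows may suit other datasets better.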
3. Reshaping data: in the GENDER column, we can reshape the data by mapping each
category to a number.
Explanation
The pandas Series.map() function substitutes each value in a Series with
another value.
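The mapping can be sketched as follows. The codes 0 and 1, the "M"/"F" labels, and the sample names are assumptions made for this example, not values taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical records; the gender labels are assumed to be "M" and "F".
df = pd.DataFrame({
    "NAME": ["Asha", "Ben", "Chitra"],
    "GENDER": ["F", "M", "F"],
})

# Dictionary that assigns a number to each category (codes chosen arbitrarily).
gender_codes = {"M": 0, "F": 1}

# map() looks up every value of the Series in the dictionary.
df["GENDER"] = df["GENDER"].map(gender_codes)

print(df)
```

Any value that is missing from the dictionary becomes NaN after map(), so it is worth checking that every category has an entry.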
4. Filtering data: suppose we need only the name, gender, and marks of the
top-scoring students. Here we need to remove the unwanted data.
Explanation
What does axis 1 in pandas mean?
A DataFrame object has two axes, “axis 0” and “axis 1”.
“axis 0” represents rows and
“axis 1” represents columns.
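A minimal sketch of the filtering step, assuming a cut-off of 80 marks for "top-scoring" and an unwanted AGE column. Both the threshold and the AGE column are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical records; AGE stands in for a column we do not need.
df = pd.DataFrame({
    "NAME": ["Asha", "Ben", "Chitra", "Dev"],
    "GENDER": ["F", "M", "F", "M"],
    "MARKS": [85.0, 82.0, 72.0, 91.0],
    "AGE": [20, 21, 19, 22],
})

# Keep only the rows of students scoring at least 80 (assumed threshold).
top = df[df["MARKS"] >= 80]

# axis=1 tells drop() that "AGE" is a column label, not a row label.
top = top.drop("AGE", axis=1)

print(top)
```

With axis=0, drop() would look for a row labelled "AGE" instead and raise a KeyError, since no such row exists here.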
Hence, we have finally obtained a clean dataset which can be further used
for various purposes.
Now that we know the basics of data wrangling, we will discuss two operations
that can be used to perform it:
a) Merge operation
b) Grouping Method
Syntax:
pd.merge(data_frame1, data_frame2, on="field")
For example: suppose a teacher has two sets of data. The first contains the
details of the students and the second contains the pending-fees status obtained
from the accounts office. The teacher can use the merge operation to combine the
two sets into one meaningful table. This makes the data easier to analyze and
saves the teacher the time and effort of merging it manually.
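The teacher's scenario could be sketched like this. The column names (ID, NAME, BRANCH, PENDING) and all values are hypothetical; the only assumption that matters is that both tables share a common key column.

```python
import pandas as pd

# Hypothetical student details kept by the teacher.
students = pd.DataFrame({
    "ID": [101, 102, 103],
    "NAME": ["Asha", "Ben", "Chitra"],
    "BRANCH": ["CSE", "ECE", "CSE"],
})

# Hypothetical pending-fees status from the accounts office.
fees = pd.DataFrame({
    "ID": [101, 102, 103],
    "PENDING": [5000, 0, 1250],
})

# Combine the two tables on the shared ID column.
merged = pd.merge(students, fees, on="ID")

print(merged)
```

By default pd.merge() performs an inner join, so only IDs present in both tables appear in the result.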
WRANGLING DATA USING GROUPING METHOD:
Example: A car-selling company stocks brands from various car manufacturers,
such as Maruti, Toyota, Mahindra, and Ford, and holds data on which cars were
sold in which years. The company wants to extract only the records for cars
sold during the year 2010. For this problem, we use another wrangling
technique, the groupby() method.
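A minimal sketch of this example, assuming the sales data has BRAND, YEAR, and SOLD columns. The column names and all figures are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records for the car-selling company.
sales = pd.DataFrame({
    "BRAND": ["Maruti", "Toyota", "Mahindra", "Ford", "Maruti", "Toyota"],
    "YEAR": [2010, 2011, 2010, 2012, 2011, 2010],
    "SOLD": [12, 9, 7, 4, 15, 6],
})

# Group the records by year of sale.
by_year = sales.groupby("YEAR")

# Pull out only the group for the year 2010.
sold_2010 = by_year.get_group(2010)

print(sold_2010)
```

The same grouped object can also be aggregated, for example by_year["SOLD"].sum() gives the total number of cars sold in each year.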