Lesson 3 Data Cleaning and Preparation
Lesson 3 Data Cleaning and Preparation
Lesson 3 Data Cleaning and Preparation
You are required to sort and filter data.Also, improper data needs to be eliminated and data
must be cleaned and be meaningful which serves the business purpose of the organization
To achieve these tasks, you will be learning a few concepts, such as sort, filter, group by,
subtotal and removing duplicates.
Sort and Filter
Sort and Filter
The sort and filter functionalities are available in Excel to order or filter the data for further analysis.
Sort and Filter
To sort the data based on the CGPA in descending order, choose the CGPA column name and then click on
Sort under the Data tab
Steps to Sort Value
Under sort columns, choose CGPA and then select ‘Largest to Smallest’
Steps to Sort Value
Sorting can be done based on character values from either A-Z or Z-A.
Under the Sort by tab, select Name and choose order as A to Z and click OK
Steps to Sort Value
Filter option allows us to choose any column we would like to filter the data on.
Steps to Filter Data
Choose the greater than or equal to option from the dropdown, and then mention the number
Group by and ungroup allow data to group data by collapse and expanding rows with similar content to
create more compact and understandable views.
Group by and ungroup by are available under the Data tab within the outline section.
Group By
The group by functionality in Excel allows us to show necessary data for easy viewing and analysis.
Step 1: To group data, select the rows and columns you want to group
Steps for Grouping
Step 4: This groups the three columns chosen, and applies a control to show or hide the grouped content
Grouping for Columns
Clicking on – hides the content, while clicking on + shows the grouped content.
Grouping for Rows
Similarly, for row-wise grouping, select the rows you want to group.
Grouping for Rows
Step 1: Click on Group under the Data tab, and then select rows option from the dialog box
Step 2: Click on OK
Grouping for Rows
Clicking on – hides the content and clicking on + shows the grouped content.
Grouping for Rows
We can create a group within a group by choosing rows or columns within the grouped data.
Step 1:
Choose the data already chosen for grouping
(row/column)
Ungrouping
Subtotal allows us to create groups and have a subtotal for each group.
Subtotal: Example
Step 3: Click on the column to which the sum function has to be applied
Subtotal: Example
The subtotaling provides control to the group and shows subtotals per student.
Text to Column
Text to Column
It converts raw text into columns in excel, which can save a user the time of manually separating the text
in a cell into several columns.
Step 2:Step
Choose column
3: Go to the AData
and tab
clickand
on click
text to
oncolumns under Data tab.
Text to Columns
Text to Column: Example
The text to column function puts each element separated by comma in an individual box
Removing Duplicates
Duplicate
In any data analytics work, there will always be cases where we get duplicates in
different columns.
When duplicates are removed using Excel, we can choose a single column or
multiple columns to check the data.
Removing Duplicates Using Single Column: Example
Step 2:
• Select the entire column
• Click on Data
• Click on Remove Duplicates
Removing Duplicates Using Single Column: Example
Click OK
Removing Duplicates Using Single Column: Example
Let us consider the following data set as an example for removing duplicates with
multiple columns:
Another pop up will appear which notifies that, 1 duplicate value was found and removed,
also 3 unique values remain.
Removing Duplicates Using Multiple Columns: Example
Data in Excel can be validated using some rules set in data validation dialog.
This helps in reducing the amount of unstandardized data, errors, or irrelevant information in the
worksheet.
Data Validation: Example
• Validation applies to new data entered in the cells where rules are placed.
After clicking on the data validation, a pop-up appears regarding the validation criteria
and the following validations are possible.
The sort and filter functionalities are available to order or filter the
data for further analysis.
b. Data Tools
c. Outline
d. Analyze
Knowledge
Check In which of the following sections can we find Group By and Subtotal under the data
tab?
1
b. Data Tools
c. Outline
d. Analyze
a. True
b. False
c.
d.
Knowledge
Check
Group By within a Group By is possible. True or False.
2
a. True
b. False
c.
d.
a. Options
b. Add Level
c. Sort On
d. Order
Knowledge
Check
Which of the following options can be used for sorting on multiple columns?
3
a. Options
b. Add Level
c. Sort On
d. Order
a. True
b. False
c.
d.
Knowledge
Check
Pattern matching is possible in filters. True or False.
4
a. True
b. False
c.
d.
a. Delimiter
b. Fixed Width
c. Comma
d. Space
Knowledge
Check Which of the following options is used to convert text to columns when there is no
delimiters?
5
a. Delimiter
b. Fixed Width
c. Comma
d. Space
Fixed width allows us to convert data into columns based on the length of each column.
Knowledge
Check
How to convert a CSV format data into excel?
6
d.
Knowledge
Check
How to convert a CSV format data into excel?
6
d.
a. True
b. False
c.
d.
Knowledge
Check Is it possible to separate data with multiple delimiters into columns?(Example
7 1,2,3;4,5|6)? True or False.
a. True
b. False
c.
d.
a. Missing validation
d.
Knowledge
Check
Why do duplicates occur in a dataset?
8
a. Missing validation
d.
Duplicates occur if the input feed has not validated the data and allowed duplicates.
Knowledge
Check
How do you specify that data has header while removing duplicates?
9
c. Cannot be specified
d.
Knowledge
Check
How do you specify that data has header while removing duplicates?
9
c. Cannot be specified
d.
The "My data has headers" checkbox specifies that the data has headers
Knowledge
Check Is it possible to remove rows in a dataset where only one row has duplicates? True or
10 False.
a. True
b. False
c.
d.
Knowledge
Check Is it possible to remove rows in a dataset where only one row has duplicates? True or
10 False.
a. True
b. False
c.
d.
True. It is possible to remove all rows in a dataset where one column only has duplicates.
Knowledge
Check
Which of the following options in data validation allows us to validate a list of values?
11
a. Any Value
b. Data
c. List
d. Custom
Knowledge
Check
Which of the following options in data validation allows us to validate a list of values?
11
a. Any Value
b. Data
c. List
d. Custom
a. not between
b. equal to
c. greater than
d. between
Knowledge
Check
Which of the following range of values can be provided in data validation?
12
a. not between
b. equal to
c. greater than
d. between