Course 2: Excel Basics for Data Analysis
Course Introduction
Course Overview
Goal: Build foundational Excel skills for data cleaning,
wrangling, and analysis—key competencies for aspiring
data analysts
Format: Video lessons + hands-on labs; two distinct
business scenarios and datasets used throughout
Module Breakdown
1. Module 1: Spreadsheet Fundamentals
o Terminology: workbook vs. worksheet, cells, ranges
o Interface navigation: ribbons, menus, tabs, shortcuts
2. Module 2: Data Entry & Basic Manipulation
o Selecting, entering, and editing cell data
o Copying, AutoFill, and Flash Fill
o Formatting (numbers, text, dates)
o Basic formulas and functions
3. Module 3: Data Cleaning & Wrangling
o Data quality & privacy fundamentals
o Removing duplicates, empty rows, and extraneous
whitespace
o Standardizing inconsistent entries
o Text-to-Columns for parsing strings
4. Module 4: Data Analysis Techniques
o Filtering and sorting datasets
o Common analysis functions (e.g., SUM, AVERAGE,
COUNTIFS)
o PivotTables for aggregation
o Slicers and timelines for interactive filtering
5. Module 5: Capstone Labs
o Apply end-to-end workflow:
1. Understand business scenario
2. Clean & prepare the provided dataset
3. Analyze data to produce actionable deliverables
Learning Outcomes
Recognize when Excel is an appropriate data-analysis tool
(and when it is limited)
Build and navigate spreadsheets confidently
Perform intermediate data-wrangling tasks in Excel
Conduct exploratory analysis using filters, sorts, and
PivotTables
Deliver a complete analysis report—cleaned data, insights,
and visualizations
Support & Next Steps
Use course forums for questions and peer discussion
Proceed to the next video: “Introduction to Spreadsheets”
Module 1: Spreadsheet Fundamentals
Introduction to Spreadsheets
1. Common Spreadsheet Applications
Microsoft Excel (paid desktop application; Excel Online is
free with limited features)
Google Sheets (free, web-based; integrates with Google
apps)
LibreOffice Calc (free, open-source; basic charts,
conditional formatting, PivotTables)
Other options: Zoho Sheet, OpenOffice Calc, Quip,
Smartsheet, Apple Numbers
2. Key Advantages of Spreadsheets
Automation & Accuracy: Formulas recalculate
automatically and reliably
Organization & Accessibility: Easy sorting, filtering,
formatting, and error-checking
Visualization: Built-in charting and reporting tools to
surface insights
Flexibility: From simple tables (1970s) to today’s powerful
data-analysis features
3. Typical Business Uses
Data entry & storage
Large-dataset comparison
Modeling & planning (budgets, forecasts)
Trend identification & statistical analysis
Financial reporting (P&L, payroll, invoicing, auditing)
Process flowcharts and scheduling
4. Everyday & Personal Uses
Expense tracking, household budgets
Fitness/health logs, recipe collections
Hobby management (music catalogs, sports leagues)
Personal lists (contacts, shopping, holiday cards)
5. Role in Data Analysis
Data Collection: Aggregate from multiple sources into a
single sheet
Data Cleaning: Remove duplicates, fix errors, standardize
formats
Data Analysis: Sort, filter, and apply functions to uncover
patterns
Data Visualization & Storytelling: Build charts and reports
to communicate findings
Next Up:
Basics of spreadsheets: key terminology, navigation, and
workbook/worksheet concepts.
Spreadsheet Basics - Part 1
1. Spreadsheet Applications
Excel Desktop (.xlsx files; full features)
Excel Online (free, limited web version)
Google Sheets, LibreOffice Calc, Zoho Sheet, etc.
2. Workbooks & Worksheets
Workbook = the entire file (one or more sheets; saved as
*.xlsx)
Worksheet = individual “tabs” within a workbook
o Default names: Sheet1, Sheet2… → Rename via
double-click or right-click → “Rename”
o Active sheet = highlighted tab
o Reorder tabs by drag-drop or right-click → Move or
Copy
3. Cell Structure
Cells = rectangular boxes holding data (text, numbers,
formulas)
Organized into:
o Columns (vertical; labeled A, B, C…)
o Rows (horizontal; numbered 1, 2, 3…)
Cell Reference: Column+Row (e.g., M20)
o Active cell = the currently selected cell, shown in
Name Box
4. Cell Ranges (Arrays)
Definition: A block of cells selected together
o Single row: cells D9 through H9 → notation D9:H9
o Single column: cells D9 through D19 → notation D9:D19
o Multiple rows & columns: cells D9 through H19 →
notation D9:H19
Used in formulas to operate on many cells at once
Cross-sheet reference: prefix the sheet name, e.g.
Sheet2!D9:D19 (a true 3D reference spans several sheets,
e.g. Sheet1:Sheet3!D9:D19)
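To make the column-letter/row-number addressing concrete, here is a minimal Python sketch (an illustration only; Excel does not run it) that expands an A1-style range into its individual cells. The helper names `col_to_num`, `num_to_col`, `split_ref`, and `expand_range` are hypothetical, and sheet-qualified references such as Sheet2!D9:D19 are deliberately not handled.

```python
import re


def col_to_num(col: str) -> int:
    """Convert a column label (A ... Z, AA, AB, ...) to a 1-based number."""
    n = 0
    for ch in col.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Convert a 1-based column number back to its letter label."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def split_ref(ref: str) -> tuple:
    """Split a cell reference such as 'M20' into (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not a simple cell reference: {ref!r}")
    return col_to_num(match.group(1)), int(match.group(2))


def expand_range(rng: str) -> list:
    """List every cell address in a range like 'D9:H19', row by row."""
    start, end = rng.split(":")
    c1, r1 = split_ref(start)
    c2, r2 = split_ref(end)
    return [
        f"{num_to_col(c)}{r}"
        for r in range(min(r1, r2), max(r1, r2) + 1)
        for c in range(min(c1, c2), max(c1, c2) + 1)
    ]


print(expand_range("D9:H9"))        # ['D9', 'E9', 'F9', 'G9', 'H9']
print(len(expand_range("D9:H19")))  # 55 (5 columns x 11 rows)
```

A SUM or AVERAGE over D9:H19 works on exactly these 55 cells, which is why a range is so much shorter than listing cells one by one.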
5. Key Takeaways
Workbooks contain multiple worksheets; rename &
organize tabs for clarity
Cells are addressed by column letter then row number
(e.g., A1, B2)
Ranges combine contiguous cells for bulk calculations
(SUM, AVERAGE, etc.)
These basics apply across Excel desktop, web, and other
spreadsheet tools
Spreadsheet Basics - Part 2
1. Opening & File Management
Backstage View: File → Open (Recent or Browse) to load
workbooks
Other Options: New, Save, Print, Excel Options
2. The Ribbon & Quick Access Toolbar
Ribbon Tabs: Home, Insert, Formulas, Data, View, Power
Pivot, etc.
Hide/Show Ribbon: Double-click any tab or press Ctrl + F1
Groups: Buttons organized under Font, Alignment,
Number, Styles, Cells; extra options via the “launcher”
arrow
Quick Access Toolbar: Above the ribbon—add your most-
used commands (Save, Undo/Redo, Sort Asc/Desc, etc.)
via the toolbar’s dropdown menu
3. Worksheet Navigation
Arrow Keys: Move one cell at a time
Page Up/Page Down: Jump one screen vertically
Scroll Bars: Fast horizontal/vertical movement in large
sheets
Shortcut Keys:
o Ctrl + Home: Go to A1
o Ctrl + End: Go to last used cell
o Ctrl + ↓ / Ctrl + ↑: Jump to the last/first filled cell
of the current data block in that column
4. Selecting Cells & Ranges
Single Cell: Click or use arrow keys
Contiguous Range:
o Mouse + drag, or
o Shift + arrow keys
Entire Column/Row: Click its header letter/number
Multiple Columns/Rows:
o Mouse + drag across headers, or
o Select one header and Shift + arrow keys
Noncontiguous Selections: Ctrl + click each separate
header or cell range
All Data:
o Top-left corner button; or
o Ctrl + A (selects the current data region; press again
to select the whole sheet)
5. Cursor Icons & Their Functions
White Cross (“Select”): Default for selecting cells
Four-Pointed Arrow (“Move”): When hovering cell edges—
drags cell contents
Thin Black Plus (“Fill Handle”): At bottom-right corner—
drags to copy or auto-fill values
Next Steps: Entering, copying/pasting, and formatting data in
spreadsheets.
Viewpoints: Using Spreadsheets as a Data Analysis Tool
Advantages of Using Spreadsheets
Immediate Data Visibility
o Tabular layout makes every record and field easy to
inspect at a glance
One-Stop Analysis & Reporting
o Built-in formulas (SUMIF, INDEX-MATCH), PivotTables,
and charts let you calculate, aggregate, and visualize
without extra software
Low Barrier & Ubiquity
o Nearly everyone has access to Excel, Google Sheets,
or similar—no special installs required
Flexible Data Exploration
o Sorting, filtering, and conditional formatting turn
“20,000 rows” into manageable, bite-sized insights
Rapid Prototyping
o Ideal for quick, ad-hoc analyses or proof-of-concept
before scaling to more robust tools
Custom Reporting
o Export ERP or transactional data into spreadsheets
and tailor lookup-driven summaries for stakeholders
Limitations of Using Spreadsheets
Poor Reproducibility
o Ad-hoc filters, manual corrections, and hidden steps
make it hard to trace or share exactly how data was
transformed
Complexity & Maintenance
o Nested formulas (VLOOKUP, IF statements) can break
unexpectedly and become difficult to debug or
update
Performance & Scale
o Large datasets (>10–20K rows) can slow down or crash
complex workbooks, and a worksheet holds at most
1,048,576 rows; this may force migration to databases
or BI tools
Analysis Paralysis
o Endless feature sets and formula choices can lead to
overthinking and wasted time when simpler
approaches might suffice
Limited Advanced Analytics
o Not well suited for large-scale data modeling,
predictive analytics, or collaborative version control
Summary and Highlights
In this lesson, you have learned:
There are several spreadsheet applications available in the
marketplace; the most commonly used and fully featured
spreadsheet application is Microsoft Excel.
Spreadsheets provide several advantages over manual
calculation methods and they help you keep data
organized and easily accessible.
As a Data Analyst, you can use spreadsheets as a tool for
your data analysis tasks.
There are several elements that make up a workbook in a
spreadsheet application.
The ribbon provides access to all the features and tools
required to view, enter, edit, manipulate, clean,
and analyze data in Excel.
There are several ways to navigate around a worksheet
and workbook in Excel.
Module 2: Data Entry & Basic Manipulation
Viewing, Entering, and Editing Data
1. Viewing Features
Zoom Slider (bottom-right): drag or +/– buttons
View → Zoom: preset levels, 100%, or Zoom to
Selection
Split Pane (View tab): divide screen into scrollable
sections; double-click a split line to remove it
Freeze Panes (View tab):
o Freeze Top Row or Freeze First Column
o Or select a cell one row below & one column right of
your freeze point, then Freeze Panes to lock
multiple rows/columns
2. Switching Between Workbooks
View → Switch Windows
Shortcut: Ctrl + F6
3. Entering Data & Headers
New Workbook: Quick Access Toolbar → New (or Ctrl +
N)
Header Row: type heading in first row; use Tab (not
Enter) to move right
Insert Column: right-click a column header → Insert to
add a new column at that position
Resizing Columns:
o Manual: drag divider between column headings
o Auto-fit all selected columns: select columns A:E,
then double-click any divider
4. Editing Cell Contents
Overwrite: select cell and type new entry
In-Cell Edit:
o F2 to edit at end
o Double-click cell to place cursor
o Click in Formula Bar to edit
5. Basic Formatting
Select header cells and apply Bold (Ctrl+B)
Adjust column widths for readability
Next Topic: Copying & filling data; cell and number formatting.
Copying, Filling, and Formatting Cells and Data
1. Moving & Copying Data
Drag & Drop Move: Select cells → hover edge until four-
pointed Move cursor → drag
Drag & Drop Copy: Hold Ctrl while dragging → Copy
cursor appears
Cut/Copy & Paste:
o Ctrl+C to copy, Ctrl+X to cut
o Select destination → Ctrl+V or right-click → Paste
Paste Options:
o Default uses destination column widths
o Paste → Keep Source Column Widths to preserve
original widths
2. AutoFill Patterns
Fill Handle (small black +): Drag from cell corner to
auto-complete series
Text Series: “Jan” → fills months; “Mon” → days
Custom Step: Provide two entries (e.g., “Mon”, “Wed”) →
pattern every two days
Numeric Series: Single value → copies the same value; two
values (e.g., 5 and 10) → continues in steps of their
difference (15, 20, …)
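The AutoFill rules above can be imitated in a few lines of Python. This is a loose sketch, not Excel's algorithm: `autofill` is a hypothetical helper, it only knows abbreviated day and month names, and for numbers it uses just the first two values to set the step.

```python
# Illustrative series lists (Excel's built-in lists also include full names).
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def autofill(seed: list, count: int) -> list:
    """Extend `seed` to `count` cells, loosely imitating a Fill Handle drag."""
    for cycle in (DAYS, MONTHS):
        if all(item in cycle for item in seed):
            start = cycle.index(seed[0])
            # One entry steps by 1; two entries define the step (Mon, Wed -> 2).
            step = (cycle.index(seed[1]) - start) % len(cycle) if len(seed) > 1 else 1
            return [cycle[(start + i * step) % len(cycle)] for i in range(count)]
    numeric = all(isinstance(x, (int, float)) for x in seed)
    if len(seed) == 1 or not numeric:
        # A single number (or unrecognized text) is simply repeated.
        return [seed[i % len(seed)] for i in range(count)]
    step = seed[1] - seed[0]  # only the first two values set the step here
    return [seed[0] + i * step for i in range(count)]


print(autofill(["Jan"], 4))         # ['Jan', 'Feb', 'Mar', 'Apr']
print(autofill(["Mon", "Wed"], 4))  # ['Mon', 'Wed', 'Fri', 'Sun']
print(autofill([5, 10], 4))         # [5, 10, 15, 20]
print(autofill([7], 3))             # [7, 7, 7]
```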
3. Cell Formatting vs. Data Formatting
1. Cell Formatting (Appearance)
o Styles: Home → Styles gallery; choose fill color, bold,
italic, borders
o Borders: Select range → Home → Borders → choose
style
o Font & Size: Home → Font controls
2. Data Formatting (Number/Text Type)
o Decimal Places: Home → Number →
Decrease/Increase Decimal
o Number Formats: Home → Number drop-down →
select Currency, Percentage, Text, etc.
o Custom Formats: Home → More Number Formats →
define specifics (symbol, decimals)
4. Common Formatting Tasks
Highlight Headers & Columns:
o Select A3:P3 → apply a cell style + bold
o Select Manufacturer or Model columns → different
styles + italic if desired
Uniform Column Widths:
o Adjust one manually or Auto-Fit all: select A:P →
double-click any column divider
Fix Import Errors:
o Cells mis-interpreted as dates (e.g., “9-5” → date) →
change format to Text → re-enter correct values
Currency Formatting:
o Select price column → More Number Formats →
Currency → choose symbol and 2 decimals
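Data formatting changes how a value is displayed, not the value itself. A minimal Python sketch of that idea, assuming a symbol prefix, a thousands separator, and a fixed number of decimals; `format_currency` is hypothetical and ignores Excel's other negative-number styles (red text, parentheses).

```python
def format_currency(value: float, symbol: str = "$", decimals: int = 2) -> str:
    """Render a number roughly the way a Currency number format displays it."""
    sign = "-" if value < 0 else ""
    # ",": thousands separator; ".{decimals}f": fixed number of decimal places
    return f"{sign}{symbol}{abs(value):,.{decimals}f}"


price = 1234.5                  # the value actually stored in the cell
print(format_currency(price))   # $1,234.50 (what the formatted cell shows)
print(price * 2)                # 2469.0 (calculations still use the stored value)
```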
The Basics of Formulas
1. Formula Components
Equal Sign (=): Indicates start of a formula
Function: Built-in operations (e.g., SUM, AVERAGE)
Arguments / References: Cell or range inside
parentheses (e.g., E2:E13)
Operators:
o Arithmetic: +, -, *, /
o Comparison (=, <, >), text concatenation (&), and
reference operators (: for ranges)
Constants: Hard-coded values (e.g., 20, 10%, dates)
2. Writing a Simple SUM Formula
1. Type = → SUM(
2. Enter range as E2:E4 (or select with mouse/Shift+arrows)
3. Close ) → Enter
Tip: Use : between first and last cell instead of listing each cell
individually.
3. Editing Formulas
Select & re-type
F2 to edit in-cell
Double-click cell or edit in the Formula Bar
4. Relative References & AutoFill
Fill Handle: Drag bottom-right corner to copy formulas
Relative References: Cell addresses adjust
automatically (e.g., E2:E13 → F2:F13)
Double-click Fill Handle: Auto-copy down to match
adjacent data length
5. AutoSum Shortcut
Home → Editing → AutoSum
Alt + = quickly inserts =SUM(...) for the adjacent range
Press Enter, then use Fill Handle or double-click to fill
6. Applying Formulas Across Rows & Columns
Drag Fill Handle horizontally to apply same pattern
across columns
Combine values: e.g., =B15 + B16 for subtotal + tax,
then AutoFill across
7. Final Formatting
Select results → Home → Number Format → Currency (e.g.,
US $)
Intro to Functions
1. Statistical Functions via AutoSum
AutoSum Dropdown (Home → Editing → ∑): quick access
to
o Average → =AVERAGE(range)
o Min → =MIN(range)
o Max → =MAX(range)
o Count Numbers → =COUNT(range)
Median (not in AutoSum list): Home → AutoSum → More
Functions → Statistical → MEDIAN(range)
Copy Across/Down: use Fill Handle (drag or double-click)
to apply formulas to adjacent columns/rows
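When these functions receive a range, they work only on the numbers in it: text, empty cells, and TRUE/FALSE values are skipped. The Python sketch below mirrors that behavior; the function names are hypothetical stand-ins, and the sample column is invented.

```python
import statistics


def numbers_only(cells: list) -> list:
    """Keep numeric cells; text, blanks (None) and TRUE/FALSE are skipped."""
    return [c for c in cells
            if isinstance(c, (int, float)) and not isinstance(c, bool)]


def average(cells):  # =AVERAGE(range); no numbers -> ZeroDivisionError (Excel: #DIV/0!)
    nums = numbers_only(cells)
    return sum(nums) / len(nums)


def count(cells):    # =COUNT(range) counts numbers only
    return len(numbers_only(cells))


def minimum(cells):  # =MIN(range); Excel returns 0 when there are no numbers
    nums = numbers_only(cells)
    return min(nums) if nums else 0


def maximum(cells):  # =MAX(range); likewise 0 when there are no numbers
    nums = numbers_only(cells)
    return max(nums) if nums else 0


def median(cells):   # =MEDIAN(range)
    return statistics.median(numbers_only(cells))


sales = [120, 80, "n/a", None, 100, 60]  # hypothetical column with text and a blank
print(average(sales), count(sales), minimum(sales), maximum(sales), median(sales))
# 90.0 4 60 120 90.0
```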
2. Anatomy of a Function
= FUNCTION_NAME ( argument1, argument2, … )
Function Name: e.g., AVERAGE, SUM, VLOOKUP
Arguments: cell references, ranges, or constants
Parentheses enclose the arguments
3. Function Categories (Formulas → Function Library)
Recently Used: your latest functions
Financial: e.g., PV, RATE, ACCRINT
Logical: AND, OR, IF
Text: CONCAT (replaces CONCATENATE), LEFT, FIND,
SEARCH
Date & Time: TODAY, NETWORKDAYS, WEEKDAY
Lookup & Reference: VLOOKUP, HLOOKUP, INDEX,
MATCH, SORTBY
Math & Trig: SUMIF, SUMPRODUCT, POWER, COS, SIN
More Functions:
o Statistical: AVERAGE, COUNT, MEDIAN, STDEV
o Engineering, Information, Compatibility
4. Finding & Inserting Any Function
Insert Function (fx button in Formula Bar or Formulas →
Insert Function)
o Search by name or description
o Browse all categories or “All” functions
Key Takeaway:
Excel offers 400+ built-in functions—master AutoSum for quick
statistics, learn core categories on the Formulas tab, and use
the Insert Function tool to discover and insert advanced
formulas.
Referencing Data in Formulas
1. Reference Types
1. Relative References (default)
o Adjust when copied: reference moves relative to new
cell location
o Example: In cell D4 =A1 + A3 → copy to D5 becomes
=A2 + A4
2. Absolute References
o Use $ to lock row and/or column so it never shifts
when copied
o $A$1 locks both column A and row 1
o Example: In E4 =$A$1 + $A$3 → copy anywhere
remains =$A$1 + $A$3
3. Mixed References
o One part fixed, one relative:
A$1 fixes row 1, column changes
$A3 fixes column A, row changes
o Example: In G4 =A$1 + $A3 → copying downward
keeps A$1 fixed but increments the row in $A3
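The copy rules can be checked with a short Python sketch. It assumes formulas contain only plain A1-style references (no sheet names, no function names ending in digits such as LOG10, and no shifts that would run off the sheet); `copy_formula` is a hypothetical helper that rewrites a formula as though it had been copied a given number of rows down and columns right.

```python
import re

# One reference: optional $, column letters, optional $, row digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col: str) -> int:
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def copy_formula(formula: str, rows: int, cols: int) -> str:
    """Shift relative parts of each reference; $-locked parts stay put."""
    def shift(match: re.Match) -> str:
        col_lock, col, row_lock, row = match.groups()
        if not col_lock:  # relative column moves with the copy
            col = num_to_col(col_to_num(col) + cols)
        if not row_lock:  # relative row moves with the copy
            row = str(int(row) + rows)
        return f"{col_lock}{col}{row_lock}{row}"
    return REF.sub(shift, formula)


print(copy_formula("=A1 + A3", 1, 0))      # =A2 + A4     (relative, copied down)
print(copy_formula("=$A$1 + $A$3", 5, 2))  # =$A$1 + $A$3 (absolute, copied anywhere)
print(copy_formula("=A$1 + $A3", 1, 0))    # =A$1 + $A4   (mixed, copied down)
print(copy_formula("=SUM(E2:E13)", 0, 1))  # =SUM(F2:F13) (filled one column right)
```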
2. Choosing the Right Reference
Relative for formulas that should “shift” with each
row/column
Absolute for constants (e.g., a single tax rate cell)
Mixed when one dimension (row or column) must stay
constant
3. Common Formula Errors & Indicators
1. #####
o Not an error: Column too narrow or negative
date/time
o Fix: Widen column
2. #NAME?
o Mistyped function name or invalid operator (e.g.,
typing x instead of * for multiplication)
o Green triangle in corner + error badge → hover or
click for details
o Fix: Correct spelling, use proper operators
3. Using the Error Badge
o Show Calculation Steps: Opens Evaluate Formula to
step through the calculation and locate the problem
o Help on This Error: Opens contextual Excel help
o Ignore Error: Suppress for this cell
o Error Checking Options: Configure which errors
Excel flags
4. Troubleshooting Workflow
1. Identify error code in cell
2. Hover/click the warning icon for hint
3. Use Show Calculation Steps to pinpoint issue
4. Edit formula in Formula Bar (F2)
5. Re-evaluate or consult Excel Help for code-specific
guidance
Key Takeaway:
Master relative, absolute, and mixed references for flexible
formula behavior, and use Excel’s built-in error indicators and
tools to diagnose and correct formula mistakes.
Summary and Highlights
In this lesson, you have learned:
There are several features to modify views in Excel, and it
is very straightforward to enter and edit data in a
spreadsheet.
You can move or copy data within a worksheet or between
worksheets, and you can use AutoFill to automatically
enter data that is in a series or that fits a pattern.
You can format both cells and data in Excel.
A formula is made up of several component parts,
and formulas can perform calculations using numbers
directly or by using references to data in the worksheet.
You can use the Fill Handle in Excel to quickly copy
formulas to other cells.
There are several different categories of function you can
use for different purposes, and you can search for a
function by name, or by category.
You can reference cells in the worksheet in your
formulas by using relative, absolute, or mixed references.
You can make a formula absolute by adding a dollar
symbol ($) to a cell reference.
If you get errors in your formulas, you can use the error-
checking capabilities of Excel to resolve them.
Module 3: Basics of Data Quality and Privacy
Introduction to Data Quality
Why Data Quality Matters
Reliable data underpins confident business decisions
Profiling identifies errors and inconsistencies before
analysis
Five Key Data-Quality Traits
1. Accuracy
o Data correctly represents real-world values
o Clean by removing duplicates, fixing formatting
errors, deleting blank rows
2. Completeness
o All required fields are present
o Incomplete if essential information (e.g., region
codes) is missing—requires sourcing additional data
3. Reliability
o Data consistently trustworthy over time
o Evaluate source processes (e.g., shared database vs.
siloed records) and implement controls if needed
4. Relevance
o Only include data necessary for the analysis
o Exclude extraneous fields (e.g., customer birthdays
when calculating sales revenue) to streamline work
5. Timeliness
o Data is up-to-date and refreshed at appropriate
intervals
o Match refresh frequency to purpose (e.g., weekly
dashboards vs. monthly updates)
Key Takeaway:
Assess each dataset against these five traits—Accuracy,
Completeness, Reliability, Relevance, and Timeliness—to
ensure clean, fit-for-purpose data before proceeding with
analysis.
Importing File Data
1. Text & CSV Files
Formats:
o Plain text (.txt) or comma-separated values (.csv)
o Each record’s fields are delimited (e.g., by commas)
2. Text Import Wizard Steps
1. Open File: File → Open → Browse → select .txt/.csv
2. Step 1:
o Confirm Delimited
o Check My data has headers
3. Step 2:
o Choose delimiter(s) (e.g., Comma)
o Preview data layout
4. Step 3:
o Assign column data formats (General, Text, Date)
o Finish to load into Excel
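The wizard's three decisions (delimiter, header row, column data formats) map onto a few lines of Python's standard `csv` module. This is a rough sketch, not a model of Excel's exact conversion rules: `convert` and `import_delimited` are hypothetical helpers and the sample data is invented. It shows why assigning Text to code-like columns such as ZIP codes matters: a general numeric conversion drops their leading zeros.

```python
import csv
import io


def convert(value):
    """Turn numeric-looking text into a number, roughly like the General format."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def import_delimited(text: str, delimiter: str = ",", text_columns=()) -> list:
    """Parse delimited text whose first line holds headers.

    Columns named in `text_columns` are kept exactly as typed, like choosing
    the Text column data format in step 3 of the wizard.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {field: value if field in text_columns else convert(value)
         for field, value in record.items()}
        for record in reader
    ]


raw = "Order,ZIP,Sales\n1001,00501,2500.50\n1002,02134,1800\n"  # hypothetical data
print(import_delimited(raw)[0])                        # ZIP becomes 501: leading zeros lost
print(import_delimited(raw, text_columns={"ZIP"})[0])  # ZIP stays '00501'
```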
3. Adjusting Column & Row Sizes
Manual Resize: Drag header or row divider
Auto-Fit All: Select all columns/rows → double-click any
divider
Auto-Fit One: Double-click single divider
4. Adding & Removing Columns/Rows
Delete Column/Row:
o Select header → Home → Delete → Delete Sheet
Columns/Rows
o Or right-click header → Delete
Insert Column/Row:
o Select existing header → Home → Insert → Insert
Sheet Columns/Rows
o Or right-click header → Insert
5. Saving as Excel Workbook
After import, click Save As → choose Excel Workbook
(*.xlsx)
Next Topic: Data privacy essentials—handling sensitive and
personally identifiable information.
Basics of Data Privacy
1. Three Pillars of Data Privacy
1. Confidentiality
o Recognize that customer data “belongs” to
individuals
o Data Types:
Personal Information (PI): Any data traceable to
a person (email, photos)
Personally Identifiable Information (PII): Direct
identifiers (SSN, driver’s license)
Sensitive Personal Information (SPI): Private
attributes (race, biometrics, sexual orientation)
2. Collection & Use
o Know where data was collected and respondent
location—determines which laws apply
o GDPR (EU): Protects individuals in the EU, including
when their data is processed outside the EU
o LGPD (Brazil): Applies to individuals in Brazil
regardless of processor location (effective Aug 2020)
o CCPA (California): State law protecting consumer data
in CA
o Industry Regulations:
HIPAA (Healthcare): Protected health
information
PCI DSS (Retail): Credit-card data security
3. Compliance
o Maintain up-to-date policies and employee awareness
o Breach example: Stolen laptop containing sensitive
data—even offsite loss violates policy, invites fines
and reputational damage
o Best Practice: Enforce data-handling rules at all times
2. When Regulations Don’t Apply
Anonymized Data: All identifiers removed so individuals
can’t be traced; such data is exempt from most privacy laws
Trade-Off: May limit usefulness if individual-level analysis
is required
Key Takeaway:
Data analysts must distinguish PI, PII, and SPI; know applicable
laws (GDPR, LGPD, CCPA, HIPAA, PCI) based on data origin and
industry; enforce confidentiality, collection/use rules, and
compliance to protect both customers and the organization.
Viewpoints: Data Quality and Privacy
1. Data Quality: The Foundation of Trustworthy Analysis
Aligns with Expectations: Any discrepancy between results
and stakeholder assumptions prompts immediate scrutiny
of data sources and transformations
“Garbage In, Garbage Out”: Poor-quality input yields
unreliable output—always perform integrity and sense
checks before analysis
Business Impact: Decisions on product launches,
profitability, and inventory hinge on correct data; a single
wrong SKU or misaligned metric can cost time, money,
and credibility
Due Diligence: Establish trusted data pipelines, document
cleansing steps, and validate accuracy to uphold the
integrity of your insights
2. Data Privacy: Safeguarding Sensitive Information
Role-Based Access Control: Ensure users see only what
their role and geography permit—row- and report-level
security can be built into BI models (e.g., Cognos)
Protecting PII: Personally Identifiable Information (names,
SSNs, birthdates) must never travel unencrypted; use
secure channels, masking tools, or one-time passcodes
rather than plain email
Industry-Wide Importance: While critical in healthcare and
finance, privacy is non-negotiable everywhere—missteps
invite legal penalties and erode customer trust
Best Practices: Encrypt sensitive files, limit data exposure,
and maintain rigorous policies to keep personal data safe
at all costs
Key Takeaway:
High data quality and stringent privacy measures are both
essential pillars of responsible, impactful analytics—ensuring
reliable insights while protecting individuals and organizations.
Summary and Highlights
In this lesson, you have learned the following information:
The Five Traits of Data Quality:
Accuracy
Completeness
Reliability
Relevance
Timeliness
Importing Text:
You can use the ‘Text Import Wizard’ to import data
from other formats, such as plain text, or comma-
separated value files.
The Three Fundamentals of Data Privacy:
Confidentiality
Collection and Use
Compliance
Cleaning Data
Removing Duplicated or Inaccurate Data and Empty
Rows
1. Common Data Inconsistencies
Spelling errors, whitespace, incorrect casing
Empty rows interrupt formulas and navigation
Duplicate records skew analyses
Misspelled text entries (e.g., names, categories)
2. Spell Checking
1. Select target column (e.g., Product Line)
2. Review → Spelling
3. Accept suggested corrections or Ignore
3. Removing Empty Rows
Manual vs. Automated
Manual: scroll and delete—impractical at scale
Automated (Filtered approach):
1. Select the entire data range (Ctrl+Home, then
Ctrl+Shift+End)
2. Data → Filter
3. On a column’s filter menu, uncheck Select All,
then check Blanks
4. Filter shows rows that are blank in that column →
confirm they are empty across all columns, then
select and Delete Rows
5. Data → Clear Filter
6. Verify navigation (Ctrl+Down Arrow) now skips
blanks
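The same clean-up as a Python sketch, assuming each row is a list of cell values with None or empty strings for empty cells (the helper names are hypothetical). It treats a row as blank only when every cell is empty, which is the check worth doing before deleting rows surfaced by one column's Blanks filter: a row can be blank in that column yet still hold data elsewhere.

```python
def is_blank(row: list) -> bool:
    """A row is blank only if every cell is empty or whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def drop_blank_rows(rows: list) -> list:
    """Return the rows that contain at least one non-empty cell."""
    return [row for row in rows if not is_blank(row)]


data = [  # hypothetical [ProductLine, Sales] rows
    ["Classic Cars", 2871.0],
    ["", None],            # fully blank: removed
    [None, 950.0],         # blank in column 1 only: kept
    ["Motorcycles", 1200.0],
]
print(drop_blank_rows(data))
# [['Classic Cars', 2871.0], [None, 950.0], ['Motorcycles', 1200.0]]
```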
4. Removing Duplicate Rows
Method A (Preferred – Review First):
1. Pick a column unlikely to repeat (e.g., Total Sales)
2. Home → Conditional Formatting → Highlight Cells
Rules → Duplicate Values
3. Inspect highlighted rows for true duplicates → delete
unwanted entries
Method B (Quick – No Preview):
1. Select full dataset
2. Data → Remove Duplicates
3. Uncheck all columns, then check only the key column
(e.g., Sales)
4. Click OK to delete duplicates immediately
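Both methods can be sketched in Python to show why the choice of key column matters. `flag_duplicates` marks every occurrence of a repeated value (the highlight-first idea of Method A), and `remove_duplicates` keeps the first row per key (the idea behind Method B). Both helpers and the data are hypothetical, and Excel's own comparison rules (for example, around text case) may differ.

```python
from collections import Counter


def flag_duplicates(rows: list, key: int) -> list:
    """Mark every row whose key value appears more than once."""
    counts = Counter(row[key] for row in rows)
    return [counts[row[key]] > 1 for row in rows]


def remove_duplicates(rows: list, key: int) -> list:
    """Keep the first row for each key value and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


orders = [  # hypothetical [OrderID, Sales] rows
    ["10100", 5151.0],
    ["10101", 1660.1],
    ["10100", 5151.0],   # true duplicate of the first row
    ["10102", 1660.1],   # different order that happens to share a Sales value
]
print(flag_duplicates(orders, key=1))         # [True, True, True, True]
print(len(remove_duplicates(orders, key=0)))  # 3: only the repeated 10100 row goes
print(len(remove_duplicates(orders, key=1)))  # 2: order 10102 would be lost too
```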
5. Find & Replace for Text Corrections
1. Home → Find & Select → Replace
2. Find what: misspelling (e.g., “Larson”)
3. Replace with: correct form (e.g., “Larsson”)
4. Replace All to fix every instance at once
Key Takeaway:
Routine cleaning—spelling checks, purging blanks,
deduplicating, and targeted text correction—ensures high data
quality and reliable analysis outcomes.
Dealing with Inconsistencies in Data
1. Changing Text Case
UPPER(text): Converts to ALL UPPERCASE
LOWER(text): Converts to all lowercase
PROPER(text): Capitalizes first letter of each word
(“Proper Case”)
Workflow:
1. Insert helper column beside target column
2. Enter formula (=UPPER(A2), =LOWER(A2), or
=PROPER(A2))
3. Copy down via Fill Handle (double-click) or
Ctrl+Enter over selection
4. Copy helper column → target column using Paste
Values
5. Delete helper column
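A Python sketch of the three case functions and the helper-column workflow. `upper`, `lower`, and `proper` are hypothetical stand-ins; `proper` is written out to follow the rule usually described for PROPER (capitalize the first letter and any letter after a non-letter character, lowercase the rest), and reassigning the list plays the role of Paste Values.

```python
def upper(text: str) -> str:   # =UPPER(text)
    return text.upper()


def lower(text: str) -> str:   # =LOWER(text)
    return text.lower()


def proper(text: str) -> str:  # =PROPER(text)
    """Capitalize the first letter and any letter that follows a non-letter;
    lowercase every other letter."""
    result = []
    prev_is_letter = False
    for ch in text:
        result.append(ch.lower() if prev_is_letter else ch.upper())
        prev_is_letter = ch.isalpha()
    return "".join(result)


# Helper-column workflow: compute new values, then write them back as values.
column = ["classic CARS", "o'neil-smith", "VINTAGE cars"]
column = [proper(v) for v in column]
print(column)  # ['Classic Cars', "O'Neil-Smith", 'Vintage Cars']
```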
2. Fixing Date Formats
Issue: Imported dates in UK format (dd/mm/yyyy) → need
US format or custom style
Number Format Dialog (Home → Number → More
Number Formats):
1. Change Locale to English (United States)
2. Choose desired built-in format (e.g., “March 14,
2012”)
3. Or under Custom, modify code (e.g., d-mmm-yyyy,
mmmm d, yyyy)
Apply to entire column:
o Select column → choose new format, or
o Use Format Painter
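Excel stores a date as a number and the format only controls how it is shown. As a rough Python analogue, assuming the source text is in dd/mm/yyyy order and English month names (the default C locale), `reformat_date` below is a hypothetical helper that covers only the two custom patterns named above plus a US mm/dd/yyyy style.

```python
from datetime import datetime


def reformat_date(text: str, style: str) -> str:
    """Read a UK-style dd/mm/yyyy date and render it in another display style."""
    d = datetime.strptime(text, "%d/%m/%Y")
    if style == "mmmm d, yyyy":
        return f"{d:%B} {d.day}, {d.year}"
    if style == "d-mmm-yyyy":
        return f"{d.day}-{d:%b}-{d.year}"
    if style == "mm/dd/yyyy":
        return f"{d:%m/%d/%Y}"
    raise ValueError(f"unsupported style: {style}")


print(reformat_date("14/03/2012", "mmmm d, yyyy"))  # March 14, 2012
print(reformat_date("14/03/2012", "d-mmm-yyyy"))    # 14-Mar-2012
print(reformat_date("05/09/2020", "mm/dd/yyyy"))    # 09/05/2020 (5 September)
```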
3. Trimming Unwanted Whitespace
Double-Spaces in Text:
1. Select range → Home → Find & Select → Replace
2. Find what: two spaces ␣␣
3. Replace with: one space ␣
4. Click Replace or Replace All (use cautiously)
Leading/Trailing & Extra Inner Spaces:
1. Insert helper column
2. Use formula =TRIM(A2) to remove extra spaces
3. Copy helper → original via Paste Values
4. Delete helper column
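A minimal Python version of what TRIM does, offered as a sketch (the `trim` helper is hypothetical). Like TRIM, it treats only the ordinary space character as a space, so non-breaking spaces pasted from web pages survive. The last line shows why a single two-spaces-to-one replacement pass is less thorough: runs of three or more spaces leave doubles behind.

```python
def trim(text: str) -> str:
    """Remove leading/trailing spaces and collapse inner runs of spaces to one.

    Only the ordinary space character is treated as a space; other characters
    such as non-breaking spaces are left alone.
    """
    return " ".join(part for part in text.split(" ") if part)


print(repr(trim("  Classic   Cars  ")))             # 'Classic Cars'
print(repr("Classic   Cars".replace("  ", " ")))    # 'Classic  Cars' (one pass)
```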
Key Takeaway:
Helper-column formulas plus Paste Values let you standardize
text case, normalize date displays, and strip superfluous spaces
—critical steps for clean, consistent data.
More Excel Features for Cleaning Data
1. Flash Fill for Pattern-Based Transformations
Use Cases:
o Combine separate columns (e.g., First Name + Last
Name → Full Name)
o Reformat a single column (e.g., “John Smith” →
“Smith, John”)
Workflow:
1. Insert helper column next to source data
2. Manually enter the desired result for the first row
(define the pattern)
3. Begin typing the second row—Excel previews the
filled pattern
4. Press Enter to accept and fill the entire column
5. Copy → Paste Values over original column (if
desired)
6. Delete helper column
2. Text to Columns for Delimited Splits
Use Case: Break one column of multi-part text into
separate columns (e.g., full name → forename + surname)
Workflow:
1. Select the source column (e.g., A2:A23)
2. Data → Text to Columns → Delimited → Next
3. Choose delimiter(s) (e.g., Space) → Next
4. Set Destination to first output cell (e.g., B2) →
Finish
5. Remove or hide the original column if no longer
needed
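A Python sketch of a delimited split, assuming a single space as the delimiter (`text_to_columns` is a hypothetical helper). It also shows a practical consequence: a name with a middle name produces an extra column, so the destination needs room for the widest result.

```python
def text_to_columns(values: list, delimiter: str = " ") -> list:
    """Split each value on `delimiter` into separate columns.

    Rows with fewer parts are padded with "" so every row has the same width.
    """
    parts = [value.split(delimiter) for value in values]
    width = max(len(p) for p in parts)
    return [p + [""] * (width - len(p)) for p in parts]


names = ["Ana Lopez", "Mary Ann Smith"]  # hypothetical source column
for row in text_to_columns(names):
    print(row)
# ['Ana', 'Lopez', '']
# ['Mary', 'Ann', 'Smith']
```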
3. Formula Alternative in Excel Online (Functions)
When Flash Fill or Text to Columns Is Unavailable
Extract Forename:
=LEFT(A2, FIND(" ", A2) - 1)
Extract Surname:
=RIGHT(A2, LEN(A2) - FIND(" ", A2))
Copy formulas down via Fill Handle
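To see exactly what the two formulas compute, here is a Python translation with hypothetical helpers `find`, `left`, and `right` standing in for the Excel functions of the same names. FIND counts positions from 1, which is why the forename formula subtracts 1. Everything after the first space becomes the surname, so "Mary Ann Smith" splits differently than a space-delimited Text to Columns would, and a value with no space makes FIND fail (shown here as a ValueError).

```python
def find(find_text: str, within_text: str) -> int:
    """1-based position of find_text, like FIND; missing text is an error."""
    pos = within_text.find(find_text)
    if pos == -1:
        raise ValueError("#VALUE!")  # Excel shows #VALUE! when FIND fails
    return pos + 1


def left(text: str, num_chars: int) -> str:
    return text[:num_chars]


def right(text: str, num_chars: int) -> str:
    return text[max(len(text) - num_chars, 0):]


def split_name(a2: str) -> tuple:
    forename = left(a2, find(" ", a2) - 1)        # =LEFT(A2, FIND(" ", A2) - 1)
    surname = right(a2, len(a2) - find(" ", a2))  # =RIGHT(A2, LEN(A2) - FIND(" ", A2))
    return forename, surname


print(split_name("John Smith"))      # ('John', 'Smith')
print(split_name("Mary Ann Smith"))  # ('Mary', 'Ann Smith')
```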
Key Takeaway:
Flash Fill offers quick, pattern-driven joins and reformatting;
Text to Columns handles reliable splitting on delimiters. For
environments without these tools, simple LEFT/RIGHT/FIND
formulas achieve the same results.
Viewpoints: Issues with Data Quality
1. Prevalence of Imperfect Data
Human-Entered Variability: Different users describe the
same thing inconsistently (e.g., “navy blue” vs. “dark
blue”)
Database Design Limits: Source systems often capture
data in formats not suited for analysis (e.g., combined
date&time vs. separate day/month/quarter fields)
2. Cleaning & Verification Effort
Time Investment: A large share of analysis work is spent
on data cleaning, integrity checks, and validation
Common Tasks: Standardizing text values, splitting or
reformatting fields, filling or excluding missing values
3. Impact on Business & Analysis
Misleading Results: Incomplete or misaligned financial
data (e.g., missing revenue, out-of-period costs) can skew
ratios and profitability assessments
Eroded Credibility: Frequent data errors force analysts
to backtrack, invite second-guessing, and distract from
insights
4. Best Practices for Maintaining Quality
1. Source Validation: When errors appear, trace back to
the original data source and correct at the origin (e.g.,
update general ledger entries)
2. Document Transformations: Use data-catalog or
lineage tools (e.g., Watson Knowledge Catalog) to log
every cleansing step for auditability
3. Attention to Detail: Early detection and resolution of
anomalies saves time and preserves trust—proactive
checks prevent repeated rework
Key Takeaway:
Clean, consistent data is the bedrock of reliable analysis—but it
rarely arrives that way. Expect to invest in thorough validation,
root-cause corrections, and transparent documentation to
uphold data integrity and stakeholder confidence.
Summary and Highlights
In this lesson, you have learned the following information:
It’s important to remove any duplicated or inaccurate
data, and it’s important to remove any empty rows in your
dataset.
There are several other types of data inconsistency that
you may need to resolve, in order to properly clean your
data:
1. Change the case of text
2. Fix date formatting errors
3. Trim whitespace from your data
You can use the Flash Fill and Text to Columns features in
Excel to manipulate and standardize your data, and
functions can also be used to help manipulate and
standardize your data.
MODULE 4
PART 1: Data Analysis Basics, Filtering and Sorting Data
Intro to Analysing Data Using Spreadsheets
1. Plan Your Analysis
Visualize the desired output before you begin
Ask:
o How large is the dataset?
o What filters will isolate the needed records?
o How should the data be sorted?
o What calculations must be performed?
2. Sorting & Filtering Basics
Sorting:
o Sort alphabetically or numerically to group like
records (e.g., find duplicate Order IDs)
o Remove or flag duplicates once they’re adjacent
Filtering:
o Apply criteria (e.g., MONTH_ID = 11) to display only
relevant rows
o Combined with sorting, yields a focused view for
analysis
3. Using Functions for Calculations
Avoid long manual formulas:
=B1+B2+B3+ … +B160 → error-prone
Built-in functions simplify work:
=AVERAGE(B1:B160) → quick average over a range
Categories include:
o Mathematical (SUM, PRODUCT)
o Statistical (AVERAGE, MEDIAN, COUNT)
o Logical (IF, AND, OR)
o Financial, Date & Time, Text, etc.
4. Converting Data to a Table
Benefits of Excel Tables:
o Automatic filter dropdowns on every header
o “Totals” row can show SUM, AVERAGE, COUNT, etc.,
dynamically
o Table auto-expands when adding new rows or
columns
o Banding for readability; headers stay visible on scroll
o Formulas autofill into new rows
5. PivotTables & PivotCharts
PivotTable:
o Drag-and-drop fields to Rows, Columns, Values, and
Filters
o Quickly summarize large datasets without writing
formulas
o Apply slicers or page filters (e.g., Month = October)
to instantly recalc
PivotChart:
o Visual companion to a PivotTable
o Updates automatically with PivotTable changes
o Great for presenting trends, comparisons, and
distributions
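What a PivotTable does with Rows, Values (Sum), and a Filter can be sketched with a dictionary in Python. `pivot_sum` and the order lines are hypothetical; a real PivotTable offers many more aggregations and layouts.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, value_field, filters=None):
    """Sum `value_field` for each distinct `row_field` value.

    `filters` maps a field to the one value to keep, similar to a report filter
    or a slicer with a single item selected.
    """
    filters = filters or {}
    totals = defaultdict(float)
    for row in rows:
        if all(row[field] == wanted for field, wanted in filters.items()):
            totals[row[row_field]] += row[value_field]
    return dict(totals)


orders = [  # hypothetical order lines
    {"ProductLine": "Classic Cars", "Month": 10, "Sales": 2500.0},
    {"ProductLine": "Motorcycles", "Month": 10, "Sales": 1200.0},
    {"ProductLine": "Classic Cars", "Month": 11, "Sales": 3100.0},
    {"ProductLine": "Classic Cars", "Month": 10, "Sales": 500.0},
]
print(pivot_sum(orders, "ProductLine", "Sales"))
# {'Classic Cars': 6100.0, 'Motorcycles': 1200.0}
print(pivot_sum(orders, "ProductLine", "Sales", {"Month": 10}))
# {'Classic Cars': 3000.0, 'Motorcycles': 1200.0}
```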
Key Takeaway:
By strategically sorting, filtering, leveraging functions, and
using Tables and PivotTables, you can transform raw data into
focused insights and engaging visualizations with minimal
manual effort.
Filtering and Sorting Data in Excel
1. Turning Filters On
Enable filters:
o Data → Filter (adds dropdown to every header)
o Or format range as a Table (filters added
automatically)
2. Applying AutoFilters
Single‐column filter:
o Click the filter arrow on a column header
o Check/uncheck items (e.g., Year = 2004, ProductLine
= Classic Cars, CustomerName = Mini Gifts…)
Multiple filters at once:
o Combine criteria across different columns to narrow
the view (e.g., Classic Cars + Mini Gifts + 2004)
Clearing filters:
o Per-column: Clear Filter from… in that header’s
menu
o All at once: Data → Clear
3. Custom AutoFilters
Number Filters:
o Conditions like “Greater Than…”, “Less Than…” (e.g.,
Sales > 2000 shows only orders above $2,000)
Text Filters:
o Options such as “Begins With…”, “Contains…”, etc.
Note: Filtered-out rows remain hidden, not deleted (row
numbers will skip)
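Custom filters are row-by-row conditions, and filtered-out rows are only hidden. The Python sketch below models that with a visibility mask instead of deleting rows; the CustomerName and Sales fields and the sample values are hypothetical. Text comparisons are case-insensitive here because Excel's text filters do not distinguish upper and lower case.

```python
def visibility_mask(rows, condition):
    """Return True (shown) or False (hidden) for each row; no row is removed."""
    return [condition(row) for row in rows]


def greater_than(column, threshold):
    """Number filter: Greater Than..."""
    return lambda row: row[column] > threshold


def begins_with(column, prefix):
    """Text filter: Begins With... (case-insensitive)."""
    return lambda row: str(row[column]).casefold().startswith(prefix.casefold())


def contains(column, fragment):
    """Text filter: Contains... (case-insensitive)."""
    return lambda row: fragment.casefold() in str(row[column]).casefold()


# Hypothetical sample rows
rows = [
    {"CustomerName": "Mini Models", "Sales": 1500},
    {"CustomerName": "Toy Land", "Sales": 2500},
    {"CustomerName": "mini wheels", "Sales": 2000},
]

print(visibility_mask(rows, greater_than("Sales", 2000)))         # [False, True, False]
print(visibility_mask(rows, begins_with("CustomerName", "MINI")))  # [True, False, True]
print(len(rows))  # 3: hidden rows are still in the data
```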
4. Basic Sorting
Single-column sort:
o Select a cell in the target column
o Data → Sort A→Z / Z→A for text
o Data → Sort Smallest→Largest /
Largest→Smallest for numbers
o Data → Sort Oldest→Newest / Newest→Oldest for
dates
5. Multi‐Level Sorting
Data → Sort… to launch dialog
Sort by: first column & order (e.g., OrderDate
Oldest→Newest)
Add Level: second column & order (e.g., Sales
Largest→Smallest)
Check “My data has headers” if your range has a
header row
Result: primary sort key applied first, then secondary
within each group
Tip: Always select a cell in your data (or the entire table/range)
before sorting, so Excel knows the target region.
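A multi-level sort can be reproduced by sorting once per level, starting from the last level and finishing with the primary key; because Python's sort is stable, rows that tie on a higher level keep the order set by the lower one. This sketches the result the Sort dialog produces, not how Excel implements it; the OrderDate and Sales fields and values are hypothetical.

```python
from datetime import date


def sort_by_levels(rows, levels):
    """Sort rows by several keys.

    levels lists (column, descending) pairs from the primary key down,
    mirroring the "Sort by" row followed by each added level.
    """
    result = list(rows)
    # Apply the least important level first; stable sorts preserve it within ties
    for column, descending in reversed(levels):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Hypothetical sample rows
rows = [
    {"OrderDate": date(2004, 1, 5), "Sales": 100},
    {"OrderDate": date(2003, 6, 1), "Sales": 50},
    {"OrderDate": date(2004, 1, 5), "Sales": 300},
    {"OrderDate": date(2003, 6, 1), "Sales": 80},
]

# OrderDate oldest to newest, then Sales largest to smallest
ordered = sort_by_levels(rows, [("OrderDate", False), ("Sales", True)])
print([(row["OrderDate"].isoformat(), row["Sales"]) for row in ordered])
# [('2003-06-01', 80), ('2003-06-01', 50), ('2004-01-05', 300), ('2004-01-05', 100)]
```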
Viewpoints: Filtering and Sorting
Class Notes: Why Filter & Sort Your Data
Create a Focused View
o Filters let you narrow to a specific subset (e.g. one
product line, geography, or month) without wading
through irrelevant rows.
o Sorting (e.g. highest→lowest, A→Z, oldest→newest)
puts your key items front-and-center.
Enable Self-Service Analysis
o By providing filter controls and sensible sort orders,
end users can “slice and dice” the same worksheet
for their own questions.
Support Custom Orderings
o Beyond standard sorts, you can define a custom
sequence (e.g. show your top products or competitor
group first) to tell the story you need.
Drive Efficiency & Clarity
o Quickly drill down to the “heart” of the data—like
revenue for a given timeframe—without manually
scanning thousands of rows.
o Reduces noise and highlights only the rows that
matter for your analysis.
Improve Visualization & Decision-Making
o When charts or reports are tied to a filtered/sorted
table, they update dynamically—so stakeholders
always see exactly the slice of data they need.
Filtering and sorting are foundational skills for making large
datasets both manageable and meaningful.
Useful Functions for Data Analysis
1. The IF Function
Purpose: Return one value if a condition is TRUE, another
if FALSE.
Syntax: =IF(logical_test, value_if_true, value_if_false)
Examples:
o Mark shipped orders:
=IF(G2="shipped","Yes","No")
o Label big vs. small sales:
=IF(E2>3000,"Over 3K","Under 3K")
1.1 Nested IFs vs. IFS
Nested IFs: Chain multiple IF calls inside each other for
multi-way logic—but become hard to read/maintain.
=IF(F2>=5000,"Large",
IF(F2>=3000,"Medium",
IF(F2>0,"Small","Unknown")))
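IFS Function (Excel 2019+): Replace nested IFs with
cleaner syntax. IFS returns #N/A when no condition is
TRUE, so end with TRUE as a catch-all to match the
nested version's "Unknown":
=IFS(
F2>=5000, "Large",
F2>=3000, "Medium",
F2>0, "Small",
TRUE, "Unknown")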
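The branching logic of IF and IFS maps directly onto ordinary conditionals. The Python sketch below mirrors the formulas in this section; the helper names are hypothetical. It models two Excel behaviors: comparing text with = ignores case, and IFS returns #N/A when none of its conditions is TRUE.

```python
def if_shipped(status):
    """=IF(G2="shipped","Yes","No"); Excel's = ignores case for text."""
    return "Yes" if status.casefold() == "shipped" else "No"


def size_band(sales):
    """=IF(E2>3000,"Over 3K","Under 3K"); exactly 3000 falls in the else branch."""
    return "Over 3K" if sales > 3000 else "Under 3K"


def ifs(*pairs):
    """Return the value paired with the first TRUE condition, else "#N/A".

    Conditions are evaluated up front here, which is harmless for simple comparisons.
    """
    for condition, value in zip(pairs[0::2], pairs[1::2]):
        if condition:
            return value
    return "#N/A"


def order_size(amount):
    """Same tiers as the nested IF: Large, Medium, Small, otherwise Unknown."""
    return ifs(
        amount >= 5000, "Large",
        amount >= 3000, "Medium",
        amount > 0, "Small",
        True, "Unknown",
    )


print(if_shipped("Shipped"))  # Yes
print(size_band(3000))        # Under 3K
print(order_size(4200))       # Medium
print(order_size(0))          # Unknown
print(ifs(False, "never"))    # #N/A
```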
2. COUNTIF & COUNTIFS
COUNTIF: Count cells in one range that meet a single
criterion.
o Syntax: =COUNTIF(range, criterion)
o Text criteria must be quoted:
=COUNTIF(T2:T100,"United Kingdom")
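COUNTIFS (Excel 2007+): Count cells that meet multiple
criteria across one or more ranges.
o Syntax: =COUNTIFS(criteria_range1, criterion1,
[criteria_range2, criterion2], …)
o Every criteria range must be the same size; a row is
counted only if all criteria are met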
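COUNTIF counts the cells that pass one test; COUNTIFS counts positions where every range passes its own test. The Python sketch below passes criteria as functions rather than Excel criterion strings, so it does not parse text such as ">3000"; the helper names and sample data are hypothetical. The equals_text helper imitates Excel's case-insensitive text criteria.

```python
def countif(cells, criterion):
    """=COUNTIF(range, criterion) with the criterion given as a function."""
    return sum(1 for cell in cells if criterion(cell))


def countifs(*pairs):
    """=COUNTIFS(range1, criterion1, range2, criterion2, ...).

    A position counts only if every range satisfies its criterion there.
    """
    ranges = pairs[0::2]
    criteria = pairs[1::2]
    if len({len(r) for r in ranges}) != 1:
        # Excel returns #VALUE! when the criteria ranges differ in size
        raise ValueError("criteria ranges must all be the same size")
    return sum(
        1
        for i in range(len(ranges[0]))
        if all(test(rng[i]) for rng, test in zip(ranges, criteria))
    )


def equals_text(expected):
    """Case-insensitive text match, like a quoted criterion such as "United Kingdom"."""
    return lambda cell: str(cell).casefold() == expected.casefold()


# Hypothetical sample columns
countries = ["United Kingdom", "USA", "united kingdom", "France"]
years = [2004, 2004, 2003, 2004]

print(countif(countries, equals_text("United Kingdom")))  # 2
print(countifs(countries, equals_text("United Kingdom"),
               years, lambda y: y == 2004))                 # 1
```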
3. SUMIF & SUMIFS
SUMIF: Sum values in one range where a corresponding
cell meets a single criterion.
o Syntax: =SUMIF(range, criterion, [sum_range])
o E.g., total sales over $3,000:
=SUMIF(E2:E100,">3000")
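o E.g., sum “Classic Cars” sales:
=SUMIF(A2:A100,"Classic Cars",E2:E100)
o Wildcards widen the match: "*Cars" would match any
product line ending in “Cars”, not just Classic Cars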
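SUMIFS (Excel 2007+): Sum with multiple criteria.
o Syntax: =SUMIFS(sum_range, criteria_range1, criterion1,
[criteria_range2, criterion2], …)
o Note that sum_range comes first in SUMIFS but last in
SUMIF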
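The sketch below mirrors SUMIF and SUMIFS in Python, including Excel's wildcard rules (* for any run of characters, ? for exactly one, ~ to escape either). It is a hypothetical helper set rather than a full reimplementation: criteria are passed as functions, and the sample data is made up. It also shows why "*Cars" can over-count, since it matches every product line that ends in "Cars".

```python
import re


def like(pattern):
    """Criterion for text patterns using Excel wildcards (case-insensitive)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "~" and i + 1 < len(pattern) and pattern[i + 1] in "*?~":
            parts.append(re.escape(pattern[i + 1]))  # escaped literal *, ? or ~
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    regex = re.compile("".join(parts), re.IGNORECASE | re.DOTALL)
    return lambda cell: regex.fullmatch(str(cell)) is not None


def sumif(cells, criterion, sum_range=None):
    """=SUMIF(range, criterion, [sum_range]); sums range itself if sum_range is omitted."""
    if sum_range is None:
        sum_range = cells
    return sum(value for cell, value in zip(cells, sum_range) if criterion(cell))


def sumifs(sum_range, *pairs):
    """=SUMIFS(sum_range, criteria_range1, criterion1, ...); sum_range comes first."""
    ranges, criteria = pairs[0::2], pairs[1::2]
    return sum(
        value
        for i, value in enumerate(sum_range)
        if all(test(rng[i]) for rng, test in zip(ranges, criteria))
    )


# Hypothetical sample columns
lines = ["Classic Cars", "Vintage Cars", "Ships", "Classic Cars"]
sales = [100.0, 50.0, 30.0, 20.0]
years = [2004, 2004, 2003, 2003]

print(sumif(sales, lambda v: v > 40))             # 150.0
print(sumif(lines, like("Classic Cars"), sales))  # 120.0
print(sumif(lines, like("*Cars"), sales))         # 170.0
print(sumifs(sales, lines, like("Classic Cars"), years, lambda y: y == 2004))  # 100.0
```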
4. Applying Conditional Formatting with IF
1. Use IF to compute flags (e.g. “Good” vs. “Poor”).
2. Apply Home → Conditional Formatting → Highlight Cells
Rules → Text that Contains… to color cells for each outcome.
Takeaways:
IF/IFS handle logical branching.
COUNTIF(S) & SUMIF(S) aggregate by criteria.
Combine these with Conditional Formatting to spotlight
key results.
Using the VLOOKUP and HLOOKUP Functions
1. VLOOKUP (Vertical Lookup)
Purpose: Find a value in the leftmost column of a table, then return a
corresponding value from another column.
Syntax:
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
o lookup_value: cell or value to search for (must appear in first
column of table).
o table_array: range containing lookup column + return columns.
o col_index_num: column number (within table_array) to return.
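o [range_lookup]: FALSE (exact match) or TRUE (approximate).
Defaults to TRUE. Approximate match expects the first
column sorted ascending and returns the largest value less
than or equal to lookup_value.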
Key points:
o The lookup column must be the leftmost in table_array.
o Use absolute references ($A$2:$G$156) to lock the lookup range
when copying formulas.
o Use mixed references as needed (e.g. $V5 locks the column so
only the row adjusts when the formula is copied).
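The two match modes behave quite differently, which the Python sketch below tries to make concrete. It is a simplified model, not Excel's implementation: exact match scans down the first column (ignoring case for text), while approximate match assumes the first column is sorted ascending and returns the row with the largest key less than or equal to the lookup value. The error strings mimic Excel's #N/A, #VALUE!, and #REF! results; the grade table is hypothetical.

```python
from bisect import bisect_right


def _norm(value):
    """Excel lookups ignore case for text; normalize strings for comparison."""
    return value.casefold() if isinstance(value, str) else value


def vlookup(lookup_value, table_array, col_index_num, range_lookup=True):
    """Simplified VLOOKUP over a list of rows; the key is always row[0]."""
    if col_index_num < 1:
        return "#VALUE!"
    if col_index_num > len(table_array[0]):
        return "#REF!"
    if not range_lookup:
        # Exact match: first row whose key equals lookup_value
        for row in table_array:
            if _norm(row[0]) == _norm(lookup_value):
                return row[col_index_num - 1]
        return "#N/A"
    # Approximate match: keys must already be sorted ascending
    keys = [_norm(row[0]) for row in table_array]
    position = bisect_right(keys, _norm(lookup_value))
    if position == 0:
        return "#N/A"  # lookup_value is smaller than the first key
    return table_array[position - 1][col_index_num - 1]


# Hypothetical score-to-grade table, sorted ascending by its first column
grades = [[0, "F"], [60, "D"], [70, "C"], [80, "B"], [90, "A"]]

print(vlookup(85, grades, 2))         # B (largest key <= 85 is 80)
print(vlookup(85, grades, 2, False))  # #N/A (no exact 85 in the first column)
print(vlookup(70, grades, 2, False))  # C
print(vlookup(85, grades, 3))         # #REF! (the table has only 2 columns)
```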
2. HLOOKUP (Horizontal Lookup)
Purpose: Find a value in the top row of a table, then return a
corresponding value from another row.
Syntax:
=HLOOKUP(lookup_value, table_array, row_index_num, [range_lookup])
o row_index_num: row number (within table_array) from which to
return the value.
Usage: Far less common than VLOOKUP, since most data is organized
vertically.
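HLOOKUP is the same search turned on its side: the key row is the first row of table_array, and the answer comes from another row in the same column. The Python sketch below shows the exact-match case only (range_lookup = FALSE); the function name, the quarterly table, and its values are hypothetical.

```python
def hlookup_exact(lookup_value, table_array, row_index_num):
    """Simplified HLOOKUP with range_lookup = FALSE; table_array is a list of rows."""
    if row_index_num < 1:
        return "#VALUE!"
    if row_index_num > len(table_array):
        return "#REF!"
    for column, key in enumerate(table_array[0]):
        # Scan across the top row; text comparison ignores case, as in Excel
        if isinstance(key, str) and isinstance(lookup_value, str):
            same = key.casefold() == lookup_value.casefold()
        else:
            same = key == lookup_value
        if same:
            return table_array[row_index_num - 1][column]
    return "#N/A"


# Hypothetical table: quarters across the top, metrics down the side
quarterly = [
    ["Q1", "Q2", "Q3"],
    [100, 120, 90],  # row 2: orders
    [10, 12, 9],     # row 3: returns
]

print(hlookup_exact("Q2", quarterly, 2))  # 120
print(hlookup_exact("q3", quarterly, 3))  # 9
print(hlookup_exact("Q4", quarterly, 2))  # #N/A
```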
3. Newer Alternative: XLOOKUP
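Supported in: Excel for Microsoft 365, Excel 2021 and later, Excel for the web, iPad, Android.
Syntax: =XLOOKUP(lookup_value, lookup_array, return_array,
[if_not_found], [match_mode], [search_mode])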
Advantages:
o Works vertically or horizontally.
o Uses separate lookup and return ranges (no col/row index
number).
o Defaults to exact match.
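Because XLOOKUP takes the lookup and return ranges separately, the return column can sit anywhere, even to the left of the key, and there is no column index to keep in sync. The Python sketch below models only the default behavior (exact match, searching first to last) plus the optional if_not_found value; the order IDs and names are hypothetical, and values are compared exactly here even though Excel ignores case for text.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found="#N/A"):
    """Simplified XLOOKUP: exact match, first-to-last search."""
    if len(lookup_array) != len(return_array):
        return "#VALUE!"  # Excel also rejects arrays of incompatible size
    for key, result in zip(lookup_array, return_array):
        if key == lookup_value:  # exact comparison in this sketch
            return result
    return if_not_found


# Hypothetical parallel columns
customers = ["Ann", "Raj", "Li"]
order_ids = [10101, 10102, 10103]

print(xlookup(10102, order_ids, customers))               # Raj
print(xlookup("Li", customers, order_ids))                # 10103 (return range "to the left")
print(xlookup(99999, order_ids, customers, "Not found"))  # Not found
```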
Takeaway:
Use VLOOKUP when your key is in the first column; use HLOOKUP when
your key is in the first row.
Lock your table references with absolute addressing to reliably copy
lookup formulas.
Consider XLOOKUP for more flexibility (any direction, separate ranges,
default exact match).
Summary and Highlights
In this lesson, you have learned the following information:
Before shaping your data, you need to visualize the final
output, and ask yourself the following questions:
How big is the dataset?
What type of filtering is required to find the necessary
information?
How should the data be sorted?
What type of calculations are needed?
There are several advantages to formatting your data as a
table:
Automatic calculations even when filtering
Column headings never disappear
Banded rows to make reading easier
Tables will automatically expand when adding new rows
The most basic way of shaping your data is to sort and filter it:
Sorting data helps you organize it by a specified
criterion, such as numeric, alphabetical, or
chronological order.
Filtering your data makes it easier to control which data is
displayed and which is hidden, based on the filtered fields.
Excel Functions:
Functions in Excel are arranged into
multiple categories, including mathematical, statistical,
logical, financial, and date and time.
Common functions for a data analyst include IF, IFS,
COUNTIF, SUMIF, VLOOKUP, HLOOKUP
PART 2: USING PIVOT TABLES
Introduction to Creating Pivot Tables in Excel
Why Pivot Tables?
Quickly summarize, analyze, and present data
Reveal patterns, trends, and comparisons
Dynamic: update automatically as source data changes
Preparing Your Data
1. Format as Table
o Organize data; auto-adds filters
o Auto-expands when you add new rows
2. Check Data Quality
o Single header row, no blanks
o Numeric fields truly numbers; dates truly dates
Creating a Pivot Table
1. Select any cell in your table
2. Insert → PivotTable
o Choose table/range (or table name)
o New worksheet recommended
3. Layout the Pivot via the field list:
o Rows: drag fields to group rows
o Columns: drag fields for column headers
o Values: drag numeric fields to aggregate
o Filters (optional): drag fields to filter entire pivot
Example: Car Sales Pivot
Rows: Manufacturer → Model
Values: Price (Sum), Unit Sales (Sum)
Format Price in Pivot: Value Field Settings → Number
Format → Currency
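The Car Sales layout (Manufacturer, then Model, in Rows; Sum of Price and Sum of Unit Sales in Values) amounts to totals at two levels. The Python sketch below computes both levels and formats a number roughly as a Currency format would display it; the manufacturer and model names and the figures are hypothetical, and the $ format assumes US currency settings.

```python
from collections import defaultdict


def car_sales_pivot(rows):
    """Return (model-level totals, manufacturer subtotals) of Price and Unit Sales."""
    by_model = defaultdict(lambda: {"Price": 0.0, "Unit Sales": 0})
    by_maker = defaultdict(lambda: {"Price": 0.0, "Unit Sales": 0})
    for row in rows:
        for bucket in (by_model[(row["Manufacturer"], row["Model"])],
                       by_maker[row["Manufacturer"]]):
            bucket["Price"] += row["Price"]
            bucket["Unit Sales"] += row["Unit Sales"]
    return dict(by_model), dict(by_maker)


def as_currency(value):
    """Display a number like a US-dollar Currency format with 2 decimals."""
    return f"${value:,.2f}"


# Hypothetical rows
rows = [
    {"Manufacturer": "Brand A", "Model": "M1", "Price": 20000.0, "Unit Sales": 10},
    {"Manufacturer": "Brand A", "Model": "M2", "Price": 30000.0, "Unit Sales": 5},
    {"Manufacturer": "Brand B", "Model": "M3", "Price": 25000.0, "Unit Sales": 8},
]

by_model, by_maker = car_sales_pivot(rows)
print(by_maker["Brand A"])                        # {'Price': 50000.0, 'Unit Sales': 15}
print(as_currency(by_maker["Brand A"]["Price"]))  # $50,000.00
```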
Calculated Fields
PivotTable Analyze → Fields, Items & Sets → Calculated
Field
Example: Total Model Sales = Price * Unit Sales
Appears as a new field in Values and can be formatted
like any other value field
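One detail is worth checking before relying on a calculated field like Price * Unit Sales: Excel evaluates calculated-field formulas on the sums of the underlying fields for each cell of the PivotTable, not row by row. When a group contains several source rows (for example a manufacturer subtotal), Sum(Price) * Sum(Unit Sales) is generally not total revenue. The Python sketch below contrasts the two results on hypothetical data; adding a helper column (Price * Unit Sales) to the source table and summing that is a common way to get the row-by-row total.

```python
def calculated_field_result(rows):
    """What a calculated field =Price*Unit Sales shows for a group of rows."""
    return sum(r["Price"] for r in rows) * sum(r["Unit Sales"] for r in rows)


def helper_column_total(rows):
    """Sum of a source-table helper column holding Price*Unit Sales per row."""
    return sum(r["Price"] * r["Unit Sales"] for r in rows)


# Hypothetical rows for one manufacturer with two models
brand_a = [
    {"Model": "M1", "Price": 20000.0, "Unit Sales": 10},
    {"Model": "M2", "Price": 30000.0, "Unit Sales": 5},
]

print(calculated_field_result(brand_a))  # 750000.0 (50000 * 15)
print(helper_column_total(brand_a))      # 350000.0 (200000 + 150000)

# For a single-row group the two approaches agree
print(calculated_field_result(brand_a[:1]) == helper_column_total(brand_a[:1]))  # True
```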
Benefits of Pivot Tables
No manual formulas for summarizing large data sets
Easy to reconfigure (drag-and-drop)
Supports grouped, filtered, and calculated analyses
Viewpoints: Pivot Tables
Everyday Analysis & Aggregation
o Pivot tables let you sum, average, count and group
large data sets without writing queries or code.
o They handle anything from a few thousand records
up to Excel’s worksheet limit of 1,048,576 rows.
Drill-Down & Granularity
o Easily group by dimensions like country, store type,
or product to view KPIs at multiple levels (e.g. gross
merchandise volume, take rate).
o In auditing, pivot tables quickly isolate high-value
invoices from hundreds of transactions.
Rapid Exploration & Prototyping
o Drag-and-drop interface provides an instant “big
picture” of your data fields and distributions.
o Great for initial data familiarization before building
more complex models or dashboards.
Versatility for Different Use Cases
o Used in e-commerce sales analysis, financial audits,
operational reporting, customer segmentation, and
more.
o Supports time-based analyses (by year, month, join-
date) as well as demographic and geographic
breakdowns.
Key Advantages
o Speed: Build summaries in seconds.
o Flexibility: Rearrange rows, columns, filters on the fly.
o Clarity: Turn raw tables into meaningful insights
without formulas.
o Drill-Up/Down: Expand or collapse groups to focus on
details or high-level trends.
Takeaway:
Pivot tables are an indispensable, code-free tool for quickly
transforming raw data into actionable, multi-dimensional
insights.
Pivot Table Features
1. Recommended Pivot Tables
o Automatically suggests layouts based on selected
data.
o Helps beginners explore useful summaries without
manual setup.
o You can still customize fields after inserting a
recommended layout.
2. Pivot Table Filters
o Each Row and Column label has a built-in filter icon.
o Clear individual filters via the drop-down or clear all
on the Data tab.
o Moving a field into the Filters area lets you apply
“classic” filters above the table.
3. Slicers
o Graphical, on-screen buttons to filter pivot data (e.g.
by Territory or Product Line).
o Show active filters at a glance and support multi-
select.
o Must click inside the pivot first, then Insert ➔ Slicer.
o Clear a slicer's selections with the Clear Filter
button in the slicer's header.
4. Timelines
o Specialized date-filter slicer for pivot tables.
o Lets you filter by Day, Month, Quarter, or Year with a
draggable time bar.
o Insert via PivotTable Analyze ➔ Insert Timeline
(or Insert tab).
o Combine with slicers for multi-dimensional filtering (e.g.
trains in EMEA during Q3 2003–Q2 2004).
5. Formatting Slicers & Timelines
o Each gets its own ribbon tab when selected.
o Change colors, columns (for slicers), and styles to
match your report theme.
6. Cleanup
o Remove any slicer or timeline by selecting it and
pressing Delete (or right-click ➔ Cut).
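Slicers and a timeline combine as an AND: a row stays visible only if it matches every slicer's selection and its date falls inside the timeline's range. The Python sketch below models that with hypothetical Territory, ProductLine, and OrderDate fields; an empty selection is treated like a cleared slicer (everything shown), and the quarter helpers turn a Q3 2003–Q2 2004 timeline selection into a date range.

```python
from datetime import date, timedelta


def quarter_start(year, quarter):
    """First day of a calendar quarter (1-4)."""
    return date(year, 3 * (quarter - 1) + 1, 1)


def quarter_end(year, quarter):
    """Last day of a calendar quarter (1-4)."""
    if quarter == 4:
        return date(year, 12, 31)
    return quarter_start(year, quarter + 1) - timedelta(days=1)


def apply_filters(rows, slicers, start, end, date_field="OrderDate"):
    """Keep rows matching every slicer selection and the timeline date range.

    slicers maps a field name to the set of selected items; an empty set
    behaves like a cleared slicer and lets every value through.
    """
    kept = []
    for row in rows:
        slicers_ok = all(not chosen or row[field] in chosen
                         for field, chosen in slicers.items())
        if slicers_ok and start <= row[date_field] <= end:
            kept.append(row)
    return kept


# Hypothetical rows
rows = [
    {"Territory": "EMEA", "ProductLine": "Trains", "OrderDate": date(2003, 8, 15)},
    {"Territory": "EMEA", "ProductLine": "Trains", "OrderDate": date(2004, 7, 1)},
    {"Territory": "EMEA", "ProductLine": "Ships", "OrderDate": date(2003, 9, 1)},
    {"Territory": "APAC", "ProductLine": "Trains", "OrderDate": date(2004, 1, 10)},
]

selection = {"Territory": {"EMEA"}, "ProductLine": {"Trains"}}
visible = apply_filters(rows, selection, quarter_start(2003, 3), quarter_end(2004, 2))
print(len(visible))  # 1
```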
Key Takeaway:
Beyond basic row/column layouts, Recommended Pivot
Tables, Filters, Slicers, and Timelines turn Pivot Tables into
dynamic, user-friendly dashboards for fast, visual exploration of
large datasets.
Summary and Highlights
In this lesson, you have learned the following information:
Pivot Tables:
Pivot Tables are a fast way to turn your data into usable
and presentable insights.
Pivot tables provide a simple and quick way to summarize
and analyze data, to observe trends and patterns in your
data and to make comparisons of your data.
Pivot tables are dynamic, so as you change and add data
to the original dataset on which the pivot table is based,
the analysis and summary information changes too.
A Data Analyst can use pivot tables to draw useful and
relevant conclusions about, and create insights into, an
organization’s data in order to present those insights to
interested parties within the company.
Use this Pivot Table checklist to ensure your data is in a fit state
to make a Pivot Table:
Format your data as a table for best results.
Ensure column headings are correct, and there is only one
header row, as these column headings become the field
names in a Pivot Table.
Remove any blank rows and columns, and try to eliminate
blank cells also.
Ensure value fields are formatted as numbers, and not
text, and ensure date fields are formatted as dates, and
not text.
Arranging Pivot Tables with Filters and Recommended Tables:
You use the Pivot Table Fields pane to add and arrange
data fields in your pivot table.
Recommended Pivot Tables are a list of suggested
field combinations, based on the data selected in the
worksheet, that you can use as a starting point when
creating a Pivot Table.
Filters and Slicers:
Slicers are on-screen graphical filter objects that enable
you to filter your data using buttons, which makes it
easier to perform quick filtering of your pivot table data.
Timelines are another type of filter tool that enable you to
filter specifically on date-related data in your pivot table.
This is a much quicker and more effective way of
dynamically filtering by date than creating and adjusting
filters on your date columns.