Introduction to Data Visualization
Definition:
Data visualization is the process of representing data in a visual format such as graphs, charts, or diagrams to make
it easier to understand.
Data Representation: Computers and smartphones store data like names and numbers in digital formats.
To extract useful insights from this data, we need proper visualization techniques.
Why Visualization Matters: Without clear representation, data may lose its value. Visualizing data helps
communicate key insights effectively.
Turning Data into Information: Data itself is raw and unorganized. By presenting it visually, we transform
it into meaningful information that is easier to interpret and analyze.
The Importance of Data Visualization:
Data visualization helps us understand data better than just looking at numbers in rows and
columns (like in an Excel sheet).
Example:
Imagine a scatter plot showing the relationship between body mass and maximum lifespan of
animals. The plot reveals a positive correlation, meaning animals with higher body mass tend to
live longer.
Advantages of Data Visualization:
Easier Understanding: Complex data becomes easier to interpret.
Spotting Trends and Outliers: Visuals make patterns, unusual data points, and audience
trends clearer.
Storytelling: Dashboards and animations help present data in an engaging way.
Interactive Exploration: Users can explore data by interacting with visual elements for
deeper insights.
Mr. Gopinath C B., Assistant Professor, Dept. of AI&DS, NCE, Hassan 1
Sample Graph (Scatter Plot Example):
Imagine a scatter plot with:
X-axis: Body Mass (kg)
Y-axis: Maximum Longevity (years)
Here, is a simple scatter plot representing body mass versus maximum longevity for four animals:
Elephant (Red) → Largest body mass with the highest longevity.
Horse (Green) → Moderate body mass and longevity.
Dog (Blue) → Smaller body mass with lower longevity.
Rabbit (Purple) → Smallest body mass with the shortest lifespan.
Figure: A simple example of data visualization.
Mr. Gopinath C B., Assistant Professor, Dept. of AI&DS, NCE, Hassan 2
Data Wrangling:
Data wrangling is the process of transforming raw data into a structured format suitable for
analysis. It ensures data is clean, organized, and meaningful for tasks such as visualization and
decision-making.
Example Scenario:
Data Wrangling Process to Measure Employee Engagement.
Step 1: Raw Data Collection
Data is gathered from various sources like:
Feedback surveys
Employee tenure records
Exit interviews
One-on-one meetings
This data is often unorganized and may contain inconsistencies, missing values, or errors.
Step 2: Data Cleaning and Import
The collected data is imported into tools like Pandas (Python) or Excel as a DataFrame.
Cleaning operations include:
Handling missing values
Removing duplicates
Correcting data types
Filtering unnecessary data
Step 3: Data Transformation
The cleaned data is transformed into meaningful visualizations such as:
Bar graphs — To compare employee engagement factors
Pie charts — To show percentage distributions
Line charts — To track engagement trends over time
Mr. Gopinath C B., Assistant Professor, Dept. of AI&DS, NCE, Hassan 3
Step 4: Analysis and Results
Insights are derived from the visualized data.
For instance, employee engagement may be evaluated based on:
Referrals
Faith in Leadership
Scope for Promotions
These insights help identify key areas for improvement.
Figure: Data wrangling process to measure employee engagement.
Mr. Gopinath C B., Assistant Professor, Dept. of AI&DS, NCE, Hassan 4
Tools and Libraries for Visualization:
Creating data visualizations can be done using both coding and non-coding tools. The choice of
tool depends on the complexity of the data and user preferences.
Non-Coding Tools:
Tableau: A powerful tool that allows users to visualize data without coding. Ideal for
business users who want to explore data quickly.
Coding Tools:
Python: The most widely used language for data visualization due to its simplicity,
flexibility, and extensive library support.
MATLAB: Often used in engineering and scientific fields for complex data analysis.
R: A preferred choice in statistical analysis and data science research.
Why Python Stands Out:
Python is the industry favorite because:
It is easy to learn and efficient for data manipulation.
Libraries like Matplotlib, Seaborn, and Plotly offer powerful visualization capabilities.
Python ensures faster development and is widely adopted across industries.
Mr. Gopinath C B., Assistant Professor, Dept. of AI&DS, NCE, Hassan 5