Pandas - Panel Data System
May 06, 2025
📘 Introduction to Python Libraries – Pandas and Matplotlib
In Python, a library is a collection of modules that help you perform specific tasks without
writing all the code yourself. Libraries save time and effort by offering pre-built functions for
data handling, visualization, mathematics, and more. Two important libraries that are
frequently used in data science and analysis are Pandas and Matplotlib.
🔹 What is Pandas?
Pandas is a high-performance, open-source Python library for data analysis and data
manipulation. Wes McKinney began developing it in 2008. It is especially useful when you
need to work with large volumes of structured data arranged in rows and columns, much like
an Excel sheet. With Pandas, you can clean, organize, filter, sort, and analyze data efficiently.
It can also read data from various sources such as CSV files, Excel files, and SQL databases.
Pandas introduces two main data structures:
1. Series – A one-dimensional labeled array (like a single column).
2. DataFrame – A two-dimensional labeled data structure (like a full table).
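As a minimal sketch of how these pieces fit together, the snippet below reads a small CSV into a DataFrame and then picks out one column as a Series. The CSV text, the column names, and the student names are made up for illustration; `io.StringIO` stands in for a real file so the example runs without anything on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """Name,Marks
Asha,78
Ravi,92
Meera,65
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column of a DataFrame gives a Series.
marks = df["Marks"]

print(type(df).__name__)     # DataFrame
print(type(marks).__name__)  # Series
```

With a real file you would pass its path instead, for example `pd.read_csv("marks.csv")`, where the file name is again only a placeholder.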
🔹 What is Matplotlib?
Matplotlib is another essential library used for data visualization. It helps you create a wide
range of graphs such as line graphs, bar charts, pie charts, histograms, and more. Visualization
is important because it makes data easier to understand and interpret. Rather than reading
numbers in a table, graphs provide a visual representation of trends and comparisons.
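To make this concrete, here is a small sketch of a line graph drawn with Matplotlib. The temperature readings and the output file name are invented for the example, and selecting the `Agg` backend is optional: it is used here only so the script can save an image without opening a window.

```python
import matplotlib

matplotlib.use("Agg")  # optional: render to a file instead of a window
import matplotlib.pyplot as plt

# Made-up daily temperature readings for illustration.
days = [1, 2, 3, 4, 5]
temperatures = [30.5, 32.0, 31.2, 29.8, 33.1]

fig, ax = plt.subplots()
ax.plot(days, temperatures, marker="o")  # one line with a marker per reading
ax.set_xlabel("Day")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Daily temperature")

# Hypothetical output file name.
fig.savefig("temperatures.png")
```

The rise and fall across the five days is easier to see in the saved image than in the raw list of numbers.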
🗂️ Data Structures in Pandas
Pandas provides two key data structures:
1. Series
A Series is a one-dimensional array that holds data along with labels, called the index. You can
think of it as a single column of values, each value paired with a label. It is useful for
representing things like a list of marks, names, or prices.
Key Features of Series:
One-dimensional
Each value has a label (index)
Supports mathematical operations
Holds values of a single data type (integers, floats, strings, etc.), i.e. homogeneous data
Size immutable
Data mutable
Examples in real life:
List of student marks
Daily temperature readings
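The features listed above can be sketched with a short example. The student names and marks are invented for illustration.

```python
import pandas as pd

# A Series of marks, with student names as the index labels.
marks = pd.Series([78, 92, 65], index=["Asha", "Ravi", "Meera"], name="marks")

# Each value is reached through its label.
print(marks["Ravi"])  # 92

# Mathematical operations apply to every element at once.
curved = marks + 5
print(curved["Asha"])  # 83

# Data mutable: an existing value can be changed in place.
marks["Meera"] = 70
print(marks["Meera"])  # 70
```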
2. DataFrame
A DataFrame is a two-dimensional table-like data structure. It has rows and columns, and
each column can be considered a Series. Think of a DataFrame as an entire spreadsheet or
table, with multiple columns such as Name, Age, Marks, etc.
Key Features of DataFrame:
Two-dimensional (rows and columns)
Labeled axes (row index and column names)
Can store different data types in each column - heterogeneous data
Allows filtering, sorting, grouping, and more
Size mutable
Data mutable
Real-life examples:
Class report card (Name, Subject, Marks)
Employee database (Name, Salary, Department)
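As a rough sketch of these features, the example below builds a small employee table and applies filtering, sorting, and grouping to it. All names, salaries, and the 10% bonus rule are made up for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column; columns may have different types.
employees = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera", "John"],
    "Salary": [52000, 61000, 58000, 45000],
    "Department": ["HR", "IT", "IT", "HR"],
})

# Filtering: keep only the rows where Salary is above 50000.
high_paid = employees[employees["Salary"] > 50000]

# Sorting: order the rows by Salary, highest first.
by_salary = employees.sort_values("Salary", ascending=False)

# Grouping: average salary per department.
avg_by_dept = employees.groupby("Department")["Salary"].mean()

# Size mutable: a new column can be added to an existing DataFrame.
employees["Bonus"] = employees["Salary"] * 0.10

print(len(high_paid))     # 3
print(avg_by_dept["IT"])  # 59500.0
```

Each column here, such as `employees["Salary"]`, is itself a Series, which is why the Series operations carry over directly.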
🔑 Key Differences: Series vs. DataFrame
Feature | Series | DataFrame
Dimension | One-dimensional | Two-dimensional
Structure | Like a single column | Like a complete table
Indexing | Only one axis (row index) | Two axes (row and column labels)
Data Storage | Stores a single list of values | Stores multiple columns
Complexity | Simpler, for basic data | More complex, used for structured data
✅ Key Points to Remember
Pandas is for handling and analyzing data. It helps in reading, cleaning, modifying, and
storing data.
Series is ideal for simple lists with labels, like test scores.
DataFrame is ideal when you need to represent data in rows and columns, such as a
student database.
Matplotlib is for drawing graphs and visualizing data.
Data visualization helps in understanding large datasets quickly and making decisions based
on trends and comparisons.
Pandas and Matplotlib are often used together in data science projects to first
clean/analyze data and then visualize it.
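The clean-then-visualize workflow from the last point might look like the sketch below. The data, the missing value, and the output file name are all invented for illustration, and the `Agg` backend is selected only so the chart can be saved without opening a window.

```python
import matplotlib

matplotlib.use("Agg")  # optional: render to a file instead of a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up marks; one student has a missing value.
scores = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera", "John"],
    "Marks": [78, 92, None, 65],
})

# Step 1 (Pandas): drop rows with missing marks, then sort highest first.
clean = scores.dropna(subset=["Marks"]).sort_values("Marks", ascending=False)

# Step 2 (Matplotlib): draw the cleaned data as a bar chart.
fig, ax = plt.subplots()
ax.bar(clean["Name"], clean["Marks"])
ax.set_ylabel("Marks")
ax.set_title("Marks by student")

# Hypothetical output file name.
fig.savefig("marks.png")
```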
Pandas Series
🔷 1. Creating Series in Pandas
A Pandas Series is like a column of data with an index attached to every element. Unlike a
regular Python list or array, each value in a Series is associated with a label, making it more
powerful and flexible. The structure is similar to a dictionary or a one-dimensional table,
where each entry is stored with a key (index) and a value (data).
🔹 Ways to Create a Series:
1. From ndarray (NumPy array):
This is a quick way to create a Series with numerical data. If you don’t specify an index, it
automatically assigns one starting from 0.
Use this when you're dealing with arrays or numerical datasets.
2. From dictionary:
Each key becomes an index label and each value becomes the corresponding data.
This is ideal when your data is already labeled, such as names paired with marks.
3. From scalar value:
A scalar is a single number or value. When you pass a scalar together with an index, the
same value is repeated for every label in that index.
Good for initializing a Series with default values.
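The three approaches above can be sketched side by side. The values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# 1. From a NumPy array: no index given, so labels 0, 1, 2 are assigned.
from_array = pd.Series(np.array([10, 20, 30]))
print(list(from_array.index))  # [0, 1, 2]

# 2. From a dictionary: keys become the index, values become the data.
from_dict = pd.Series({"Asha": 78, "Ravi": 92, "Meera": 65})
print(from_dict["Ravi"])  # 92

# 3. From a scalar: the value is repeated for each label in the index.
from_scalar = pd.Series(0, index=["a", "b", "c"])
print(list(from_scalar))  # [0, 0, 0]
```

A custom index can also be supplied in the first case, for example `pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])`.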
Why is this important?
In data analysis, we often need to label our data. A Series allows this while maintaining
performance and flexibility. It is also the building block of the DataFrame, since each
DataFrame column is a Series, so understanding Series is the first step toward DataFrames.