Pandas vs SQL: Essential
Concepts for Beginners
A Quick, Practice-Oriented Guide
“Read Less, Practice More, Master Quickly
Merging and Joining
SQL:
SELECT * FROM Orders
INNER JOIN Customers
ON Orders.Customer_ID = Customers.Customer_ID;
Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')
Note: Practice all join types (inner, outer, left, right).
Outer Joins
SQL:
SELECT * FROM Customers
FULL OUTER JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;
Pandas:
outer_join_df = pd.merge(customers_df, orders_df, on='Customer_ID',
how='outer')
Purpose: Use outer joins to include unmatched rows from both tables.
Unmatched Rows with Outer
Joins
SQL:
SELECT * FROM Customers
FULL OUTER JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID
WHERE Orders.Customer_ID IS NULL OR Customers.Customer_ID IS NULL;
Pandas:
unmatched_rows = outer_join_df[(outer_join_df['Order_ID'].isna()) |
(outer_join_df['Customer_Name'].isna())]
Why? Identify gaps in data to ensure completeness.
Basic Set
Operations(Union)
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
UNION
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
west_categories = orders_df[orders_df['Region'] == 'West']['Category']
east_categories = orders_df[orders_df['Region'] == 'East']['Category']
union_categories = pd.concat([west_categories, east_categories]).drop_duplicates()
Find common categories (Intersect)
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
INTERSECT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East’;
Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive to the West (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]
Practice All: UNION, INTERSECT, DIFFERENCE.
Slicing and Indexing
Filtering by a column: Subset specific columns:
SQL: SQL:
SELECT * FROM Orders WHERE Region = SELECT Order_ID, Sales FROM Orders;
'West';
Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== 'West']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Create a MultiIndex:
SQL: multi_index_df = orders_df.set_index(['Region',
'Category'])
SELECT * FROM Orders LIMIT 10 OFFSET 5;
Access data for a specific `Region` and
Pandas: `Category`:
sliced_df = orders_df.iloc[5:15] multi_index_df.loc[('West', 'Furniture’)]
Ranking and Row
Numbering
Partition by Region (SQL): SQL:
SQL: SELECT *, RANK() OVER (ORDER BY Sales DESC) AS
SELECT *, RANK() OVER (PARTITION BY Region rank
ORDER BY Sales DESC) AS rank FROM Orders;
FROM Orders;
Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;
Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min')print(orders_df[['Product_ID', method='dense')print(orders_df[['Product_ID',
'Sales', 'Rank']]) 'Sales', 'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;
Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False,
method='first')print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])
Merging Deep Dive
1. Merge `Orders` with `Customers` using `INNER
JOIN`.
2. Experiment with mismatches using `LEFT JOIN`.
3. Use `OUTER JOIN` to identify customers with no
orders.
Why? Mastering joins is crucial for data integration.
Real Examples of Set
Operations
- Find common categories sold in West and East
regions (INTERSECT).
- Combine all categories across both regions
(UNION).
- Identify categories sold only in one region
(DIFFERENCE).
Exercise: Implement these with the Global
Superstore dataset in SQL and Pandas.
Advanced Indexing
Techniques
- Multi-level indexing: Index by `Region` and
`Category`.
- Combine WHERE clauses: Filter by `Ship Mode` and
`Profit`.
Why? Efficient data selection = faster analysis.
Ranking in Real Scenarios
- Rank customers by total `Sales`.
- Assign row numbers for the top 5 products by
category.
- Partition rankings by `Region` for localized insights.
Task: Replicate ranking in both tools.
The Power of Practice
1. Explore my previous PowerPoints for **data
filtering** and **aggregation**.
2. Practice concepts side-by-side in SQL and Pandas.
3. Use the Global Superstore dataset for meaningful
insights.
Tagline: "Read less, practice more – your expertise is
just a few queries away!