Pandas vs SQL: Essential
Concepts for Beginners
A Quick, Practice-Oriented Guide
Less read, Practice More, Master Quickly
Merging and Joining
SQL:
SELECT * FROM Orders
INNER JOIN Customers
ON Orders.Customer_ID = Customers.Customer_ID;
Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')
Note: Practice all join types (inner, outer, left, right).
Outer Joins
SQL: SQL
SELECT * FROM Customers
FULL OUTER JOIN Orders SELECT Orders.Order_ID, Orders.Customer_ID,
ON Customers.Customer_ID = Customers.Customer_NameFROM Orders LEFT[RIGHT]
Orders.Customer_ID; JOIN CustomersON Orders.Customer_ID =
Customers.Customer_ID;
Pandas:
Pandas:
outer_join_df = pd.merge(customers_df, Join_df = pd.merge(orders_df, customers_df,
orders_df, on='Customer_ID', how='outer') on='Customer_ID', how='right’[‘left’])
print(right_join)
Purpose: Use outer joins to include unmatched
rows from both tables.
Unmatched Rows with Outer
Joins
SQL:
SELECT * FROM Customers
FULL OUTER JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID
WHERE Orders.Customer_ID IS NULL OR Customers.Customer_ID IS NULL;
Pandas:
unmatched_rows = outer_join_df[(outer_join_df['Order_ID'].isna()) | (outer_join_df['Customer_Name'].isna())]
Why? Identify gaps in data to ensure completeness.
Basic Set
Operations(Union)
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
UNION
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
west_categories = orders_df[orders_df['Region'] == 'Western US']['Category']
east_categories = orders_df[orders_df['Region'] == 'Eastern US']['Category']
union_categories = pd.concat([west_categories, east_categories]).drop_duplicates()
Find common categories (Intersect)
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
INTERSECT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East’;
Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]
Practice All: UNION, INTERSECT, DIFFERENCE.
Slicing and Indexing
Filtering by a column: Subset specific columns:
SQL: SQL:
SELECT * FROM Orders WHERE Region = SELECT Order_ID, Sales FROM Orders;
‘Central US';
Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== ‘Central US'']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Pandas:
SQL: Create a MultiIndex:
SELECT * FROM Orders LIMIT 10 OFFSET 5; multi_index_df = orders_df.set_index(['Region',
'Category'])
Pandas:
Access data for a specific `Region` and
sliced_df = orders_df.iloc[5:15] `Category`:
multi_index_df.loc[('West', 'Furniture’)]
Ranking and Row
Numbering
Partition by Region (SQL): SQL:
SQL: SELECT *, RANK() OVER (ORDER BY Sales DESC) AS
SELECT *, RANK() OVER (PARTITION BY Region rank
ORDER BY Sales DESC) AS rank FROM Orders;
FROM Orders;
Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;
Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min’) method='dense’)
print(orders_df[['Product_ID', 'Sales', 'Rank']])
print(orders_df[['Product_ID', 'Sales',
'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;
Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False, method='first’)
print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])
Key Differences to Highlight:
•Rank: Skips ranks if there are ties.
•Dense Rank: No gaps in rank sequence, even with ties.
•Row Number: Provides a unique number for every row, even with ties.
The Power of Practice
1. Explore the previous PowerPoints for **data filtering** and
**aggregation**.
2. Practice concepts side-by-side in SQL and Pandas.
3. Use the Global Superstore dataset for meaningful insights.
4. Dataset link https://github.com/codewithVnsastry/learnNdo
"Less read, practice more – your expertise is just a few queries away!