Pandas vs SQL Concepts Updated
Pandas vs SQL Concepts Updated
Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')
Pandas:
outer_join_df = pd.merge(customers_df, orders_df, on='Customer_ID',
how='outer')
Purpose: Use outer joins to include unmatched rows from both tables.
Unmatched Rows with Outer
Joins
SQL:
SELECT * FROM Customers
ON Customers.Customer_ID = Orders.Customer_ID
Pandas:
unmatched_rows = outer_join_df[(outer_join_df['Order_ID'].isna()) |
(outer_join_df['Customer_Name'].isna())]
UNION
Pandas:
Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive to the West (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]
Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== 'West']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Create a MultiIndex:
Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;
Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min')print(orders_df[['Product_ID', method='dense')print(orders_df[['Product_ID',
'Sales', 'Rank']]) 'Sales', 'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;
Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False,
method='first')print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])
Merging Deep Dive
1. Merge `Orders` with `Customers` using `INNER
JOIN`.
2. Experiment with mismatches using `LEFT JOIN`.
3. Use `OUTER JOIN` to identify customers with no
orders.