Pandas vs SQL Concepts Final
Pandas vs SQL Concepts Final
Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')
ON Customers.Customer_ID = Orders.Customer_ID
Pandas:
UNION
Pandas:
Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';
Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]
Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== ‘Central US'']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Pandas:
Pandas:
Access data for a specific `Region` and
sliced_df = orders_df.iloc[5:15] `Category`:
multi_index_df.loc[('West', 'Furniture’)]
Ranking and Row
Numbering
Partition by Region (SQL): SQL:
SQL: SELECT *, RANK() OVER (ORDER BY Sales DESC) AS
SELECT *, RANK() OVER (PARTITION BY Region rank
Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;
Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min’) method='dense’)
print(orders_df[['Product_ID', 'Sales', 'Rank']])
print(orders_df[['Product_ID', 'Sales',
'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;
Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False, method='first’)
print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])