0% found this document useful (0 votes)
3 views

Pandas vs SQL Concepts Updated

Uploaded by

nvinaysastry
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Pandas vs SQL Concepts Updated

Uploaded by

nvinaysastry
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 17

Pandas vs SQL: Essential

Concepts for Beginners


A Quick, Practice-Oriented Guide
“Read Less, Practice More, Master Quickly
Merging and Joining
SQL:
SELECT * FROM Orders
INNER JOIN Customers
ON Orders.Customer_ID = Customers.Customer_ID;

Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')

Note: Practice all join types (inner, outer, left, right).


Outer Joins
SQL:
SELECT * FROM Customers
FULL OUTER JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;

Pandas:
outer_join_df = pd.merge(customers_df, orders_df, on='Customer_ID',
how='outer')

Purpose: Use outer joins to include unmatched rows from both tables.
Unmatched Rows with Outer
Joins
SQL:
SELECT * FROM Customers

FULL OUTER JOIN Orders

ON Customers.Customer_ID = Orders.Customer_ID

WHERE Orders.Customer_ID IS NULL OR Customers.Customer_ID IS NULL;

Pandas:
unmatched_rows = outer_join_df[(outer_join_df['Order_ID'].isna()) |
(outer_join_df['Customer_Name'].isna())]

Why? Identify gaps in data to ensure completeness.


Basic Set
Operations(Union)
SQL:

SELECT DISTINCT Category FROM Orders WHERE Region = 'West'

UNION

SELECT DISTINCT Category FROM Orders WHERE Region = 'East';

Pandas:

west_categories = orders_df[orders_df['Region'] == 'West']['Category']

east_categories = orders_df[orders_df['Region'] == 'East']['Category']

union_categories = pd.concat([west_categories, east_categories]).drop_duplicates()


Find common categories (Intersect)
SQL:

SELECT DISTINCT Category FROM Orders WHERE Region = 'West'


INTERSECT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East’;

Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive to the West (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';

Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]

Practice All: UNION, INTERSECT, DIFFERENCE.


Slicing and Indexing
Filtering by a column: Subset specific columns:
SQL: SQL:
SELECT * FROM Orders WHERE Region = SELECT Order_ID, Sales FROM Orders;
'West';

Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== 'West']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Create a MultiIndex:

SQL: multi_index_df = orders_df.set_index(['Region',


'Category'])
SELECT * FROM Orders LIMIT 10 OFFSET 5;

Access data for a specific `Region` and


Pandas: `Category`:
sliced_df = orders_df.iloc[5:15] multi_index_df.loc[('West', 'Furniture’)]
Ranking and Row
Numbering
Partition by Region (SQL): SQL:
SQL: SELECT *, RANK() OVER (ORDER BY Sales DESC) AS
SELECT *, RANK() OVER (PARTITION BY Region rank

ORDER BY Sales DESC) AS rank FROM Orders;


FROM Orders;

Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;

Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min')print(orders_df[['Product_ID', method='dense')print(orders_df[['Product_ID',
'Sales', 'Rank']]) 'Sales', 'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;

Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False,
method='first')print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])
Merging Deep Dive
1. Merge `Orders` with `Customers` using `INNER
JOIN`.
2. Experiment with mismatches using `LEFT JOIN`.
3. Use `OUTER JOIN` to identify customers with no
orders.

Why? Mastering joins is crucial for data integration.


Real Examples of Set
Operations
- Find common categories sold in West and East
regions (INTERSECT).
- Combine all categories across both regions
(UNION).
- Identify categories sold only in one region
(DIFFERENCE).

Exercise: Implement these with the Global


Superstore dataset in SQL and Pandas.
Advanced Indexing
Techniques
- Multi-level indexing: Index by `Region` and
`Category`.
- Combine WHERE clauses: Filter by `Ship Mode` and
`Profit`.

Why? Efficient data selection = faster analysis.


Ranking in Real Scenarios
- Rank customers by total `Sales`.
- Assign row numbers for the top 5 products by
category.
- Partition rankings by `Region` for localized insights.

Task: Replicate ranking in both tools.


The Power of Practice
1. Explore my previous PowerPoints for **data
filtering** and **aggregation**.
2. Practice concepts side-by-side in SQL and Pandas.
3. Use the Global Superstore dataset for meaningful
insights.

Tagline: "Read less, practice more – your expertise is


just a few queries away!

You might also like