0% found this document useful (0 votes)
10 views

Pandas vs SQL Concepts Final

Uploaded by

nvinaysastry
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views

Pandas vs SQL Concepts Final

Uploaded by

nvinaysastry
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 13

Pandas vs SQL: Essential

Concepts for Beginners


A Quick, Practice-Oriented Guide
Less read, Practice More, Master Quickly
Merging and Joining
SQL:
SELECT * FROM Orders
INNER JOIN Customers
ON Orders.Customer_ID = Customers.Customer_ID;

Pandas:
merged_df = pd.merge(orders_df, customers_df, on='Customer_ID', how='inner')

Note: Practice all join types (inner, outer, left, right).


Outer Joins
SQL: SQL
SELECT * FROM Customers

FULL OUTER JOIN Orders SELECT Orders.Order_ID, Orders.Customer_ID,

ON Customers.Customer_ID = Customers.Customer_NameFROM Orders LEFT[RIGHT]


Orders.Customer_ID; JOIN CustomersON Orders.Customer_ID =
Customers.Customer_ID;
Pandas:
Pandas:

outer_join_df = pd.merge(customers_df, Join_df = pd.merge(orders_df, customers_df,


orders_df, on='Customer_ID', how='outer') on='Customer_ID', how='right’[‘left’])
print(right_join)
Purpose: Use outer joins to include unmatched
rows from both tables.
Unmatched Rows with Outer
Joins
SQL:

SELECT * FROM Customers

FULL OUTER JOIN Orders

ON Customers.Customer_ID = Orders.Customer_ID

WHERE Orders.Customer_ID IS NULL OR Customers.Customer_ID IS NULL;

Pandas:

unmatched_rows = outer_join_df[(outer_join_df['Order_ID'].isna()) | (outer_join_df['Customer_Name'].isna())]

Why? Identify gaps in data to ensure completeness.


Basic Set
Operations(Union)
SQL:

SELECT DISTINCT Category FROM Orders WHERE Region = 'West'

UNION

SELECT DISTINCT Category FROM Orders WHERE Region = 'East';

Pandas:

west_categories = orders_df[orders_df['Region'] == 'Western US']['Category']

east_categories = orders_df[orders_df['Region'] == 'Eastern US']['Category']

union_categories = pd.concat([west_categories, east_categories]).drop_duplicates()


Find common categories (Intersect)
SQL:

SELECT DISTINCT Category FROM Orders WHERE Region = 'West'


INTERSECT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East’;

Pandas:
intersect_categories = pd.merge(west_categories, east_categories, on='Category')
Categories exclusive (DIFFERENCE):
SQL:
SELECT DISTINCT Category FROM Orders WHERE Region = 'West'
EXCEPT
SELECT DISTINCT Category FROM Orders WHERE Region = 'East';

Pandas:
difference_categories = west_categories[~west_categories.isin(east_categories)]

Practice All: UNION, INTERSECT, DIFFERENCE.


Slicing and Indexing
Filtering by a column: Subset specific columns:
SQL: SQL:
SELECT * FROM Orders WHERE Region = SELECT Order_ID, Sales FROM Orders;
‘Central US';

Pandas: Pandas:
filtered_df = orders_df[orders_df['Region'] subset_df = orders_df[['Order_ID', 'Sales']]
== ‘Central US'']
Indexing Techniques
Explore Pandas loc/iloc for advanced indexing. Pandas:

SQL: Create a MultiIndex:

SELECT * FROM Orders LIMIT 10 OFFSET 5; multi_index_df = orders_df.set_index(['Region',


'Category'])

Pandas:
Access data for a specific `Region` and
sliced_df = orders_df.iloc[5:15] `Category`:

multi_index_df.loc[('West', 'Furniture’)]
Ranking and Row
Numbering
Partition by Region (SQL): SQL:
SQL: SELECT *, RANK() OVER (ORDER BY Sales DESC) AS
SELECT *, RANK() OVER (PARTITION BY Region rank

ORDER BY Sales DESC) AS rank FROM Orders;


FROM Orders;

Pandas:
Pandas:
orders_df['rank'] =
orders_df['rank'] = orders_df.groupby('Region') orders_df['Sales'].rank(ascending=False)
['Sales'].rank(ascending=False)
Ranking and Row
Numbering Dense Rank Example
Rank products based on their sales (highest first).
SQL: SQL: SELECT Product_ID, Sales, DENSE_RANK()
SELECT Product_ID, Sales, RANK() OVER OVER (ORDER BY Sales DESC) AS
(ORDER BY Sales DESC) AS RankFROM Orders; Dense_RankFROM Orders;

Pandas:
Pandas:
orders_df['Rank'] = orders_df['Dense_Rank'] =
orders_df['Sales'].rank(ascending=False, orders_df['Sales'].rank(ascending=False,
method='min’) method='dense’)
print(orders_df[['Product_ID', 'Sales', 'Rank']])
print(orders_df[['Product_ID', 'Sales',
'Dense_Rank']])
Row Numbering
SQL :
SELECT Product_ID, Sales,ROW_NUMBER() OVER (ORDER BY Sales DESC) AS Row_Number FROM Orders;

Pandas:
orders_df['Row_Number'] = orders_df['Sales'].rank(ascending=False, method='first’)
print(orders_df[['Product_ID', 'Sales', 'Row_Number’]])

Key Differences to Highlight:


•Rank: Skips ranks if there are ties.
•Dense Rank: No gaps in rank sequence, even with ties.
•Row Number: Provides a unique number for every row, even with ties.
The Power of Practice
1. Explore the previous PowerPoints for **data filtering** and
**aggregation**.
2. Practice concepts side-by-side in SQL and Pandas.
3. Use the Global Superstore dataset for meaningful insights.
4. Dataset link https://github.com/codewithVnsastry/learnNdo
"Less read, practice more – your expertise is just a few queries away!

You might also like