Pandas Vs SQL
Pandas Vs SQL
Pandas vs
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Select specific
columns
Pandas
df[['col1', 'col2']]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df[df['col1'] > 5]
SQL
SELECT *
FROM table
WHERE col1 > 5;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
SQL
SELECT *
FROM table
WHERE col1 > 5 AND
col2 < 10;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Group by a column
and count
Pandas
df.groupby('col1').size()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Group by a column
and sum another
column
Pandas
df.groupby('col1')['col2'].sum()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Order by a column
Pandas
df.sort_values(by='col1')
SQL
SELECT *
FROM table
ORDER BY col1;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Order by a column
descending
Pandas
df.sort_values(by='col1',
ascending=False)
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].mean()
SQL
SELECT AVG(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].sum()
SQL
SELECT SUM(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Calculate the
maximum of a column
Pandas
df['col1'].max()
SQL
SELECT MAX(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Calculate the
minimum of a column
Pandas
df['col1'].min()
SQL
SELECT MIN(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].nunique()
SQL
SELECT
COUNT(DISTINCT col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Rename a column
Pandas
df.rename(columns=
{'old_name': 'new_name'})
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['new_col'] = value
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Drop a column
Pandas
df.drop(columns=['col1'])
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].fillna(0)
SQL
SELECT
COALESCE(col1, 0)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Calculate the
difference between
two columns
Pandas
df['col1'] - df['col2']
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Concatenate two
columns
Pandas
df['col1'] + df['col2']
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['date_col'].dt.year
SQL
SELECT YEAR(date_col)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['date_col'].dt.month
SQL
SELECT
MONTH(date_col) FROM
table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['date_col'].dt.day
SQL
SELECT DAY(date_col)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df[df['col1'].str.contains('pattern')]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Aggregate functions
with group by
Pandas
df.groupby('col1')['col2'].mean()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pivot table
Pandas
df.pivot_table(values='col1',
index='col2', columns='val1')
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Unpivot table
Pandas
df.melt(id_vars=['id'],
value_vars=['col1', 'col2'])
SQL
Calculate cumulative
sum
Pandas
df['col2'].cumsum()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Calculate moving
average
Pandas
df['col2'].rolling(window=3).mean()
SQL
df.loc[df['col1'].idxmax()]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
df.loc[df['col1'].idxmin()]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df.drop_duplicates()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].str.len()
SQL
SELECT LENGTH(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Convert string to
uppercase
Pandas
df['col1'].str.upper()
SQL
SELECT UPPER(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Convert string to
lowercase
Pandas
df['col1'].str.lower()
SQL
SELECT LOWER(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col1'].str.strip()
SQL
SELECT TRIM(col1)
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Replace a substring
Pandas
df['col1'].str.replace('old',
'new')
SQL
SELECT REPLACE(col1,
'old', 'new')
FROM table;
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df[df['col1'].isnull()]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df[df['col1'].notnull()]
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Conditional column
based on other
columns
Pandas
df['col3'] =
df['col2'].apply(lambda x:
'High' if x > 10 else 'Low')
SQL
Pandas
df.dropna()
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Pandas
df['col3'] = df['col1'] +
df['col2']
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Calculate percentage
of a column
Pandas
df['percentage'] =
df['col1'] /
df['col1'].sum() * 100
SQL
Pandas
SQL
Shwetank Singh
GritSetGrow - GSGLearn.com
Data Engineering 101: SQL vs Pandas
Update a row
Pandas
SQL
UPDATE table
SET col1 = val1 WHERE
col2 = val2;
Shwetank Singh
GritSetGrow - GSGLearn.com