Python Cheatsheet for Data Scientists
Core Python for Data Science
x = 10 # int
y = 3.14 # float
name = "AI" # str
flag = True # bool
lst = [1, 2, 3]
tpl = (1, 2, 3)
dct = {"a": 1, "b": 2}
st = {1, 2, 3}
squares = [x**2 for x in range(10)]
def square(x):
    return x**2
f = lambda x: x**2
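A minimal sketch of how the container types above behave differently. The sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample values, used only to illustrate the containers.
readings = [3, 1, 3, 2, 1]

# A set keeps one copy of each value.
unique = set(readings)

# Dict comprehension: map each distinct value to how often it occurs.
counts = {v: readings.count(v) for v in unique}

# List comprehension with a filter: squares of the even numbers below 10.
even_squares = [v**2 for v in range(10) if v % 2 == 0]

# Tuples are immutable, so item assignment raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

print(unique, counts, even_squares, tuple_is_immutable)
```

Lists suit ordered, changeable data; tuples suit fixed records; sets suit membership tests and deduplication.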
NumPy
import numpy as np
a = np.array([1, 2, 3])
b = np.zeros((2, 3))
c = np.ones(5)
d = np.eye(3)
e = np.linspace(0, 1, 5)
a.mean(), a.std(), a.sum()
a.reshape(3, 1)
np.dot(a, a)
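A short sketch of what the calls above return for the same array `a`. The names `col`, `inner`, `outer`, `pop_std`, and `sample_std` are illustrative, not part of the original cheatsheet.

```python
import numpy as np

a = np.array([1, 2, 3])

# reshape returns a new view with shape (3, 1); a keeps shape (3,).
col = a.reshape(3, 1)

# Dot product of a 1-D array with itself: 1*1 + 2*2 + 3*3.
inner = np.dot(a, a)

# Broadcasting (3, 1) against (3,) gives a (3, 3) outer product.
outer = col * a

# std() defaults to the population formula (ddof=0);
# pass ddof=1 for the sample standard deviation.
pop_std = a.std()
sample_std = a.std(ddof=1)

# linspace includes both endpoints.
grid = np.linspace(0, 1, 5)

print(col.shape, inner, outer.shape, pop_std, sample_std, grid)
```

Note that pandas uses `ddof=1` by default for `std()`, so the same data can give different results in the two libraries.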
Pandas
import pandas as pd
df = pd.read_csv("data.csv")
df.head(), df.info(), df.describe()
df["col"], df[["col1", "col2"]]
df[df["col"] > 5]
df.groupby("group_col").mean(numeric_only=True)  # skip non-numeric columns
df.isnull().sum()
df.fillna(0), df.dropna()
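The Pandas calls above can be tried without a CSV file. This sketch builds a small in-memory DataFrame with made-up values, reusing the column names `col` and `group_col` from the lines above; the other names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, for illustration only.
df = pd.DataFrame({
    "group_col": ["a", "a", "b", "b"],
    "col": [2.0, 8.0, np.nan, 10.0],
})

# Count of missing values per column.
missing = df.isnull().sum()

# fillna and dropna return new DataFrames; df itself is unchanged.
filled = df.fillna(0)
dropped = df.dropna()

# Boolean filtering: comparisons with NaN are False, so that row is excluded.
big = df[df["col"] > 5]

# Group means skip NaN by default.
means = df.groupby("group_col")["col"].mean()

print(missing, filled, dropped, big, means, sep="\n\n")
```

To keep a result, assign it back, for example `df = df.dropna()`.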
Matplotlib & Seaborn
import matplotlib.pyplot as plt
import seaborn as sns
plt.plot([1,2,3], [4,5,6])
plt.hist([1,2,2,3])
plt.show()
sns.boxplot(x="col", data=df)
sns.heatmap(df.corr(numeric_only=True), annot=True)  # correlate numeric columns only
Scikit-learn (ML Basics)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
X = df[["feature1", "feature2"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split
model = LinearRegression()
model.fit(X_train, y_train)
preds = model.predict(X_test)
mse = mean_squared_error(y_test, preds)
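An end-to-end sketch of the workflow above on synthetic data, so it runs without `data.csv`. The generated values, the true coefficients (3 and -2), the noise level, and the `rmse` variable are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: target = 3*feature1 - 2*feature2 + small noise (assumed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature1": rng.normal(size=200),
    "feature2": rng.normal(size=200),
})
df["target"] = (3.0 * df["feature1"] - 2.0 * df["feature2"]
                + rng.normal(scale=0.1, size=200))

X = df[["feature1", "feature2"]]
y = df["target"]

# Hold out 20% of the rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

# MSE is in squared target units; its square root is in target units.
mse = mean_squared_error(y_test, preds)
rmse = np.sqrt(mse)

print(model.coef_, mse, rmse)
```

The fitted coefficients should land close to the values used to generate the data, which is a quick sanity check on the pipeline.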
Common Data Science Tasks
pd.get_dummies(df["category"])
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # when evaluating a model, fit on the training split only
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier().fit(X, y)  # y must hold class labels, not a continuous target
importances = rf.feature_importances_
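A sketch that combines scaling and feature importances on synthetic classification data. The data, the labeling rule, and the `n_estimators` and `random_state` settings are assumptions chosen so the example is small and repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: the class label depends only on the first feature (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learn the mean and standard deviation from the training rows only,
# then apply the same transform to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train_scaled, y_train)

# Importances are normalized to sum to 1 across features.
importances = rf.feature_importances_
accuracy = rf.score(X_test_scaled, y_test)

print(importances, accuracy)
```

Tree-based models do not need scaled inputs; the scaler is included here only to show the fit-on-train, transform-on-both pattern.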
Bonus: Libraries to Know
- numpy, pandas: Data handling
- matplotlib, seaborn, plotly: Visualization
- scikit-learn: Machine learning
- xgboost, lightgbm: Gradient boosting
- statsmodels: Statistical modeling
- tensorflow, pytorch: Deep learning