Python Data Science Cheat Sheet
1. Importing Libraries
- import numpy as np
- import pandas as pd
- import matplotlib.pyplot as plt
- import seaborn as sns
- from sklearn.model_selection import train_test_split
- from sklearn.linear_model import LinearRegression
2. NumPy Basics
- np.array([1,2,3])
- np.zeros((2,2))
- np.ones((3,3))
- np.arange(0, 10, 2)
- np.linspace(0, 1, 5)
- np.mean(arr), np.median(arr), np.std(arr)
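A short sketch of how the constructors above behave, using small made-up inputs (the array values are assumptions chosen for illustration). Note that np.arange excludes the stop value while np.linspace includes it, and that np.std defaults to the population standard deviation (ddof=0).

```python
import numpy as np

arr = np.array([1, 2, 3, 4])     # 1-D array built from a Python list
zeros = np.zeros((2, 2))         # 2x2 array filled with 0.0
ones = np.ones((3, 3))           # 3x3 array filled with 1.0
evens = np.arange(0, 10, 2)      # start, stop (exclusive), step
grid = np.linspace(0, 1, 5)      # 5 evenly spaced points, both endpoints included

print(evens)            # [0 2 4 6 8]
print(grid)
print(np.mean(arr))     # 2.5
print(np.median(arr))   # 2.5
print(np.std(arr))      # population std; pass ddof=1 for the sample std
```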
3. Pandas Basics
- pd.Series([1,2,3])
- pd.DataFrame(data)
- df.head(), df.tail()
- df.info(), df.describe()
- df['col'], df[['col1','col2']]
- df.loc[0], df.iloc[0]
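The difference between loc and iloc is easiest to see with a non-default index. The sketch below uses a hypothetical three-row table (names, ages, and index labels are invented): loc selects by index label, iloc by integer position.

```python
import pandas as pd

# Hypothetical sample data for illustration only
data = {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 40]}
df = pd.DataFrame(data, index=["a", "b", "c"])

first_by_label = df.loc["a"]       # row selected by index label
first_by_position = df.iloc[0]     # row selected by integer position
ages = df["age"]                   # single column -> Series
subset = df[["name", "age"]]       # list of columns -> DataFrame

print(df.head(2))                  # first two rows
print(first_by_label)
```

With the default RangeIndex, df.loc[0] and df.iloc[0] return the same row; they diverge as soon as the index is relabeled, filtered, or sorted.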
4. Data Cleaning
- df.dropna(), df.fillna(value)
- df.replace(to_replace, value)
- df.rename(columns={'old':'new'})
- df.duplicated(), df.drop_duplicates()
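A minimal sketch of the cleaning calls above on a tiny invented table (column names and values are assumptions). Each call returns a new object and leaves df unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value, one repeated row, one placeholder string
df = pd.DataFrame({
    "old": [1.0, np.nan, 1.0, 3.0],
    "city": ["NY", "LA", "NY", "N/A"],
})

dropped = df.dropna()                         # removes the row containing NaN
filled = df.fillna({"old": 0.0})              # fills NaN in column "old" only
replaced = df.replace("N/A", "unknown")       # swaps a placeholder value
renamed = df.rename(columns={"old": "new"})   # renames a column
dup_mask = df.duplicated()                    # True where a row repeats an earlier row
deduped = df.drop_duplicates()                # keeps the first occurrence of each row

print(dup_mask.tolist())   # [False, False, True, False]
```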
5. Filtering & Sorting
- df[df['col'] > 10]
- df.sort_values(by='col', ascending=False)
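A sketch of boolean filtering and sorting on invented data. The combined-condition line is an addition not in the list above: conditions are joined with & or |, and each one needs its own parentheses.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"col": [5, 20, 12, 8], "label": ["a", "b", "c", "d"]})

mask = df["col"] > 10                                # boolean Series, one entry per row
filtered = df[mask]                                  # keeps rows where mask is True
ranked = df.sort_values(by="col", ascending=False)   # largest values first
both = df[(df["col"] > 6) & (df["label"] != "b")]    # two conditions combined

print(filtered["col"].tolist())   # [20, 12]
print(ranked["col"].tolist())     # [20, 12, 8, 5]
```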
6. GroupBy & Aggregation
- df.groupby('col').sum()
- df.groupby('col').agg({'col2':'mean'})
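A sketch of the two aggregation forms above on invented data, plus named aggregation as one optional way to get several labeled statistics at once (the output column names col2_mean and col3_max are hypothetical).

```python
import pandas as pd

# Hypothetical data: two groups, two numeric columns
df = pd.DataFrame({
    "col": ["x", "y", "x", "y"],
    "col2": [1.0, 2.0, 3.0, 6.0],
    "col3": [10, 20, 30, 40],
})

totals = df.groupby("col").sum()                  # sum of every numeric column per group
means = df.groupby("col").agg({"col2": "mean"})   # one statistic for one column
summary = df.groupby("col").agg(                  # named aggregation
    col2_mean=("col2", "mean"),
    col3_max=("col3", "max"),
)

print(totals)
print(summary)
```

The grouping column becomes the index of the result; call .reset_index() to turn it back into a regular column.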
7. Merging & Joining
- pd.merge(df1, df2, on='col')
- pd.merge(df1, df2, on='col', how='left')
- df1.join(df2, on='key')
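A sketch contrasting the three calls above on two small invented tables. The point worth noting is that DataFrame.join matches the caller's on column against the index of the other frame, so the second table is re-indexed first.

```python
import pandas as pd

# Hypothetical tables sharing the key column "col"
df1 = pd.DataFrame({"col": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"col": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = pd.merge(df1, df2, on="col")               # default how="inner": keys 2 and 3
left = pd.merge(df1, df2, on="col", how="left")    # all df1 rows; NaN where df2 has no match

# join looks up df1["col"] in the index of the other frame
lookup = df2.set_index("col")
joined = df1.join(lookup, on="col")

print(inner["col"].tolist())   # [2, 3]
print(left)
```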
8. Visualization
- plt.plot(x, y)
- sns.barplot(x='col1', y='col2', data=df)
- df.hist(), df.plot(kind='box')
- plt.show()
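A minimal Matplotlib sketch of the plt.plot workflow using invented data. It assumes a non-interactive backend and writes the figure to an in-memory buffer so it runs without a display; in an interactive session, plt.show() would take the place of savefig.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, no GUI window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")   # line plot, same role as plt.plot(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render to PNG bytes instead of opening a window
plt.close(fig)
```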
9. Scikit-learn Basics
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
- model = LinearRegression()
- model.fit(X_train, y_train)
- pred = model.predict(X_test)
- model.score(X_test, y_test)
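The steps above can be sketched end to end on synthetic data. The data-generating line (y = 3x + 2 plus small noise) and the random_state values are assumptions made so the example is reproducible; for LinearRegression, model.score returns the R^2 coefficient of determination.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y is roughly 3*x + 2 with a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# train_test_split returns four arrays, in this order
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)        # learn slope and intercept from the training split
pred = model.predict(X_test)       # predictions for the held-out rows
r2 = model.score(X_test, y_test)   # R^2 on the held-out rows

print(model.coef_, model.intercept_)
print(r2)
```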