Python Data Science Cheat Sheet

This Python data science cheat sheet covers essential libraries and basic operations for data analysis in Python. It has sections on importing libraries, NumPy and pandas basics, data cleaning, filtering and sorting, grouping and aggregation, merging, visualization, and scikit-learn basics. Each section lists code snippets for common data analysis and machine learning tasks.

1. Importing Libraries
- import numpy as np

- import pandas as pd

- import matplotlib.pyplot as plt

- import seaborn as sns

- from sklearn.model_selection import train_test_split

- from sklearn.linear_model import LinearRegression

2. NumPy Basics
- np.array([1,2,3])

- np.zeros((2,2))

- np.ones((3,3))

- np.arange(0, 10, 2)

- np.linspace(0, 1, 5)

- np.mean(arr), np.median(arr), np.std(arr)
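
The array constructors and summary statistics above can be tried on a small array. This is a minimal sketch; the sample values in `arr` are an assumption made for illustration and do not come from the cheat sheet.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
arr = np.array([1, 2, 3, 4, 10])

evens = np.arange(0, 10, 2)   # start 0, stop 10 (excluded), step 2
grid = np.linspace(0, 1, 5)   # 5 evenly spaced points, both endpoints included

mean = np.mean(arr)
median = np.median(arr)
std = np.std(arr)                 # population standard deviation (ddof=0) by default
sample_std = np.std(arr, ddof=1)  # sample standard deviation

print(evens, grid, mean, median, std, sample_std)
```

Note that `np.std` divides by N by default, whereas pandas' `Series.std` divides by N - 1, so the two give different results on the same data unless `ddof` is set explicitly.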

3. Pandas Basics
- pd.Series([1,2,3])

- pd.DataFrame(data)

- df.head(), df.tail()

- df.info(), df.describe()

- df['col'], df[['col1','col2']]

- df.loc[0], df.iloc[0]
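
The difference between `df.loc[0]` and `df.iloc[0]` only shows up when the index labels are not 0, 1, 2, ... in order. The sketch below assumes a small hypothetical `data` dictionary and a deliberately shuffled index to make that visible.

```python
import pandas as pd

# Hypothetical data and index labels, chosen only for illustration.
data = {"name": ["a", "b", "c"], "score": [10, 20, 30]}
df = pd.DataFrame(data, index=[2, 0, 1])

by_label = df.loc[0]      # row whose index LABEL is 0 (the second row here)
by_position = df.iloc[0]  # row at POSITION 0 (the first row)

col = df["score"]            # single brackets return a Series
sub = df[["name", "score"]]  # a list of columns returns a DataFrame

print(by_label["name"], by_position["name"])
```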

4. Data Cleaning
- df.dropna(), df.fillna(value)

- df.replace(to_replace, value)

- df.rename(columns={'old':'new'})

- df.duplicated(), df.drop_duplicates()
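
Each cleaning call above returns a new object by default and leaves `df` unchanged. The following sketch illustrates this on a small made-up frame; the column names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, one duplicate row, and a placeholder string.
df = pd.DataFrame({
    "old": [1.0, np.nan, 1.0, 3.0],
    "city": ["NY", "LA", "NY", "N/A"],
})

dropped = df.dropna()                        # rows containing NaN removed
filled = df.fillna(0)                        # NaN replaced by 0
replaced = df.replace("N/A", "unknown")      # exact-value replacement
renamed = df.rename(columns={"old": "new"})  # column renamed in the copy
dup_mask = df.duplicated()                   # True for repeats of an earlier row
deduped = df.drop_duplicates()               # keeps the first occurrence

# df itself is untouched; assign the result back (df = df.dropna()) to keep a change.
print(len(df), len(dropped), len(deduped))
```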

5. Filtering & Sorting
- df[df['col'] > 10]

- df.sort_values(by='col', ascending=False)
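
Boolean filters are often combined, which the one-line snippet above does not show. Here is a minimal sketch, assuming a hypothetical frame with a numeric `col` and a boolean `flag` column.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"col": [5, 12, 30, 8], "flag": [True, True, False, True]})

over_10 = df[df["col"] > 10]

# Combine conditions with & (and) or | (or); each comparison needs its own
# parentheses because & binds more tightly than >.
combined = df[(df["col"] > 6) & df["flag"]]

ranked = df.sort_values(by="col", ascending=False)

print(over_10, combined, ranked, sep="\n\n")
```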

6. GroupBy & Aggregation
- df.groupby('col').sum()

- df.groupby('col').agg({'col2':'mean'})
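
The two aggregation forms above can be compared on a small example. In this sketch the frame and its column names (`col2`, `col3`) are assumptions; the last call uses pandas named aggregation, which is not in the cheat sheet but gives explicit output column names.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({
    "col": ["a", "b", "a", "b"],
    "col2": [1.0, 2.0, 3.0, 4.0],
    "col3": [10, 20, 30, 40],
})

totals = df.groupby("col").sum()                 # sums every remaining column per group
means = df.groupby("col").agg({"col2": "mean"})  # one function for one column

# Named aggregation: output_name=(input_column, function).
summary = df.groupby("col").agg(
    col2_mean=("col2", "mean"),
    col3_max=("col3", "max"),
)

print(totals, means, summary, sep="\n\n")
```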

7. Merging & Joining
- pd.merge(df1, df2, on='col')

- pd.merge(df1, df2, on='col', how='left')

- df1.join(df2, on='key')  # matches df1['key'] against df2's index
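
The practical difference between the default inner merge, a left merge, and `join` is which rows survive and what the keys are matched against. This sketch assumes two small hypothetical frames that share a `col` key.

```python
import pandas as pd

# Hypothetical frames, chosen only for illustration.
df1 = pd.DataFrame({"col": [1, 2, 3], "left_val": ["x", "y", "z"]})
df2 = pd.DataFrame({"col": [2, 3, 4], "right_val": ["p", "q", "r"]})

inner = pd.merge(df1, df2, on="col")             # default how='inner': keys in both frames
left = pd.merge(df1, df2, on="col", how="left")  # every df1 row; NaN where df2 has no match

# join matches df1's 'col' column against the INDEX of the other frame,
# so df2 is indexed by 'col' first. join defaults to how='left'.
joined = df1.join(df2.set_index("col"), on="col")

print(inner, left, joined, sep="\n\n")
```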

8. Visualization
- plt.plot(x, y)

- sns.barplot(x='col1', y='col2', data=df)

- df.hist(), df.plot(kind='box')

- plt.show()
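
The plotting calls above can be combined into one figure. The following is a minimal sketch using matplotlib and pandas only; the data, the labels, and the choice of the non-interactive "Agg" backend (so the script also runs without a display) are assumptions. In an interactive session you would drop the backend line and call `plt.show()` instead of saving.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.plot(df["x"], df["y"], marker="o")  # line plot, like plt.plot(x, y)
ax1.set_xlabel("x")
ax1.set_ylabel("y")

df["y"].plot(kind="box", ax=ax2)        # box plot drawn by pandas on the second axes

buf = io.BytesIO()
fig.savefig(buf, format="png")          # render to memory instead of calling plt.show()
plt.close(fig)
```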

9. Scikit-learn Basics
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

- model = LinearRegression()

- model.fit(X_train, y_train)

- pred = model.predict(X_test)

- model.score(X_test, y_test)
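
The steps above form one workflow: split, fit, predict, score. The sketch below runs it end to end on synthetic data; the data-generating line (slope 3, intercept 2, Gaussian noise) and the `random_state` value are assumptions added so the example is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # features must be 2-D: (n_samples, n_features)
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# train_test_split returns four arrays in this order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = model.score(X_test, y_test)  # for regressors, score() is the R^2 coefficient
print(model.coef_, model.intercept_, r2)
```

Scoring on the held-out test split rather than the training split is what makes `r2` an estimate of performance on unseen data.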
