Pandas Notes

The document outlines essential Pandas functions for data manipulation, emphasizing their importance for data scientists. Key functions include df.head(), df.sample(), df.info(), and df.describe(), which help in data inspection, sampling, and summarization. The document also highlights functions for data cleaning, such as df.duplicated() and df.drop_duplicates(), along with methods for handling categorical data.

Uploaded by

Tayyab Ali

Master These Essential Pandas Functions
Every Data Scientist Must Know
Part 1

Afshan Nadeem
Why Pandas?
Whether you're cleaning,
transforming, or analyzing data,
Pandas is your go-to Python
library. It makes data
manipulation easier and faster!

df.head()
Returns the first rows of the DataFrame
(five by default), helping you quickly
inspect the data structure.

df.sample()
Returns a random sample of rows from the
DataFrame (one row by default; pass n or
frac for more), useful for getting a quick
sense of the data variety.
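
A minimal sketch of both calls, assuming a small made-up DataFrame (the column names and values are hypothetical and exist only for illustration):

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus"],
    "score": [88, 92, 75, 64, 90, 81, 70],
})

first_rows = df.head()    # first 5 rows by default
first_two = df.head(2)    # pass n to change how many rows come back

# random_state fixes the seed so the same rows are drawn on every run.
random_rows = df.sample(n=3, random_state=0)

print(first_rows)
print(random_rows)
```

Setting random_state is optional; leaving it out gives a different sample each time.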

df.info()
Provides a summary of the DataFrame,
including column names, non-null
counts, and data types.

df.shape
Returns the number of rows and columns
in the DataFrame as a (rows, columns)
tuple. It is an attribute, so it takes no
parentheses.
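
A short sketch of the two inspection tools above, using a hypothetical DataFrame with one missing value. Note that info() prints its summary and returns None, while shape is an attribute rather than a method:

```python
import pandas as pd

# Hypothetical toy data with one missing value in "city".
df = pd.DataFrame({
    "city": ["A", "B", None],
    "temp_c": [21.5, 19.0, 25.3],
})

# Prints column names, non-null counts, and dtypes to stdout.
df.info()

# shape is a (rows, columns) tuple; no parentheses after it.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```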

df.describe()
Generates summary statistics for
numerical columns, giving a quick
overview of the distribution.

df.columns
Displays the column names in the
DataFrame.
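
As a rough illustration, assuming a hypothetical DataFrame with one numeric and one text column, describe() summarizes only the numeric column by default and columns lists every column name:

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0],
    "label": ["a", "b", "c"],
})

# Summary statistics (count, mean, std, min, quartiles, max)
# for the numeric columns only.
stats = df.describe()
print(stats)

# columns is an Index object; list() turns it into a plain list.
column_names = list(df.columns)
print(column_names)
```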

df.rename()
Renames columns (or index labels) and
returns a new DataFrame. Useful when you
need to correct column names.

df.dtypes
Shows the data types of each column,
helping you ensure data consistency for
further analysis.
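
A small sketch, assuming a hypothetical DataFrame whose first column name contains a typo. rename() returns a new DataFrame and leaves the original untouched unless you reassign it:

```python
import pandas as pd

# Hypothetical toy data; "Nmae" is a deliberate typo to fix.
df = pd.DataFrame({"Nmae": ["x", "y"], "age": [30, 41]})

# Map old column names to new ones; the result is a new DataFrame.
renamed = df.rename(columns={"Nmae": "name"})

# dtypes is an attribute holding one dtype per column.
print(renamed.dtypes)
```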

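df.value_counts()
Returns counts of unique values. Call it on
a single column, as in
df["col"].value_counts(), to count that
column's values; on the whole DataFrame
it counts unique rows.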

df.insert()
Inserts a new column at a specific position.
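
A minimal sketch of both calls on hypothetical data. Here value_counts() is called on one column, and insert() modifies the DataFrame in place and returns None:

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "fig"],
    "qty": [3, 1, 2, 5],
})

# Count how often each value appears in the "fruit" column.
fruit_counts = df["fruit"].value_counts()
print(fruit_counts)

# Insert a "price" column at position 1 (between "fruit" and "qty").
# This changes df in place; do not assign the result back to df.
df.insert(1, "price", [0.5, 0.8, 0.5, 1.2])
print(df.columns.tolist())
```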

df.corr()
Computes pairwise correlation between
numerical columns, which helps identify
relationships between variables.

df.sort_values()
Sorts the DataFrame by the values of a
specific column. Use this to reorder data
based on a column.
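
A brief sketch, assuming a hypothetical DataFrame that mixes numeric and text columns. Passing numeric_only=True (available in recent pandas versions) tells corr() to skip the text column:

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [52, 60, 71, 80],
    "group": ["a", "b", "a", "b"],
})

# Pairwise Pearson correlation between the numeric columns.
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix)

# Sort rows by "score", highest first; returns a new DataFrame.
by_score = df.sort_values("score", ascending=False)
print(by_score)
```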

unique()
Returns the unique values of a single
column. Helpful when you want to see
what distinct values are present in a
column.

nunique()
Returns the number of unique values.
Called on a single column, it returns one
count; called on the whole DataFrame, it
returns a count for every column.
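
A small sketch of unique() and nunique(), using hypothetical data, showing the single-column form and the all-columns form:

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "S", "S"],
})

# Distinct values of one column, in order of first appearance.
distinct_colors = df["color"].unique()

# Number of distinct values in one column.
n_colors = df["color"].nunique()

# Number of distinct values in every column (returns a Series).
per_column = df.nunique()

print(distinct_colors, n_colors)
print(per_column)
```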

df.duplicated()
Returns a True/False value for each row,
marking rows that repeat an earlier row.
Identifying duplicates this way is
essential for cleaning datasets.

df.drop_duplicates()
Removes duplicate rows from a
DataFrame, ensuring data integrity.
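
A minimal sketch on hypothetical data with one repeated row. By default the first occurrence is kept and only later repeats are flagged or dropped:

```python
import pandas as pd

# Hypothetical toy data; the third row repeats the second.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "value": ["a", "b", "b", "c"],
})

# Boolean Series: True where a row repeats an earlier row.
dup_mask = df.duplicated()
print(dup_mask)

# New DataFrame without the repeated rows; df itself is unchanged.
deduped = df.drop_duplicates()
print(deduped)
```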

pd.get_dummies()
Converts categorical variables into
dummy/indicator variables. Use this to
transform categories into a numeric
format.
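
As a rough illustration with hypothetical data, get_dummies() replaces the chosen categorical column with one indicator column per category. The indicator dtype depends on the pandas version (boolean in recent releases), so the sketch does not assume one:

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "qty": [1, 2, 3],
})

# Encode only the "color" column; "qty" is passed through as-is.
# New columns are named <original column>_<category>.
encoded = pd.get_dummies(df, columns=["color"])
print(encoded)
```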

Found this useful?
Interested in learning more about AI and
Data Science? Stay tuned for more insights
and concepts in future posts!
