
Pandas 1


Pandas is a data manipulation package in Python for tabular data: data arranged in rows and columns, also known as DataFrames.


Pandas works well with other popular Python data science packages, often called the PyData ecosystem, including:

 NumPy for numerical computing
 Matplotlib, Seaborn, Plotly, and other data visualization packages
 scikit-learn for machine learning

Pandas is used throughout the data analysis workflow.


With pandas, you can:

 Import datasets from databases, spreadsheets, comma-separated values (CSV) files, and more.
 Clean datasets, for example, by dealing with missing values.
 Reshape datasets into a structure suitable for analysis.
 Aggregate data by calculating summary statistics such as the mean of columns, the correlation between them, and more.
 Visualize datasets and uncover insights.

Pandas also contains functionality for time series analysis and analyzing text data.

Importing CSV files

Use read_csv() with the path to the CSV file to read a comma-separated values file into a DataFrame.
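
For example, a minimal call might look like this (assuming a hypothetical file named data.csv in the working directory):

import pandas as pd

df = pd.read_csv("data.csv")  # read the CSV file into a DataFrame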

Importing text files

Reading text files is similar to reading CSV files. The only nuance is that you need to specify a separator with the sep argument, as shown below. The separator argument refers to the symbol used to separate the values within each row of the file. Comma (sep = ","), whitespace (sep = "\s+"), tab (sep = "\t"), and colon (sep = ":") are the commonly used separators.
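
A minimal sketch, assuming a hypothetical tab-separated file named data.txt:

import pandas as pd

df = pd.read_csv("data.txt", sep="\t")  # "\t" tells pandas the values are tab-separated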

Importing Excel files (single sheet)

Reading Excel files (both XLS and XLSX) is as easy as calling the read_excel() function with the file path as an input.
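
For example, assuming a hypothetical file named data.xlsx (pandas also needs an Excel engine such as openpyxl installed):

import pandas as pd

df = pd.read_excel("data.xlsx")  # read the first sheet into a DataFrame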

Importing Excel files (multiple sheets)

Reading Excel files with multiple sheets is not that different. You just need to specify one additional argument, sheet_name, where you can either pass a string for the sheet name or an integer for the sheet position (note that Python uses 0-indexing, so the first sheet can be accessed with sheet_name = 0).
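
A sketch, assuming a hypothetical workbook data.xlsx that contains a sheet named "Sales":

import pandas as pd

df_first = pd.read_excel("data.xlsx", sheet_name=0)        # first sheet, selected by position
df_sales = pd.read_excel("data.xlsx", sheet_name="Sales")  # same workbook, sheet selected by name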

Importing JSON file

Similar to the read_csv() function, you can use read_json() for JSON file types with
the JSON file name as the argument.
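
For example, assuming a hypothetical file named data.json:

import pandas as pd

df = pd.read_json("data.json")  # read the JSON file into a DataFrame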

How to view data using .head() and .tail()


You can view the first few or last few rows of a DataFrame using
the .head() or .tail() methods, respectively. You can specify the number of rows
through the n argument (the default value is 5).
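
For example, assuming df is an existing DataFrame:

df.head()      # first 5 rows (the default for n)
df.tail(n=10)  # last 10 rows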

Understanding data using .describe()

The .describe() method prints summary statistics for all numeric columns, such as the count, mean, standard deviation, minimum and maximum values, and quartiles.
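
For example, assuming df is an existing DataFrame:

df.describe()  # summary statistics for the numeric columns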

Understanding data using .info()

The .info() method is a quick way to look at the data types, missing values, and data size of a DataFrame. Here, we’re setting the show_counts argument to True, which gives a count of the non-missing values in each column. We’re also setting memory_usage to True, which shows the total memory usage of the DataFrame elements. When verbose is set to True, it prints the full summary from .info().

df.info(show_counts=True, memory_usage=True, verbose=True)

Understanding your data using .shape

The number of rows and columns of a DataFrame can be identified using the .shape attribute of the DataFrame. It returns a tuple (rows, columns) and can be indexed to get only the row count or only the column count.
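
For example, assuming df is an existing DataFrame:

df.shape     # tuple: (number of rows, number of columns)
df.shape[0]  # row count only
df.shape[1]  # column count only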

Get all columns and column names

Calling the .columns attribute of a DataFrame object returns the column names in
the form of an Index object.
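
For example, assuming df is an existing DataFrame:

df.columns        # Index object holding the column names
list(df.columns)  # the same names as a plain Python list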
