Pandas basics
NumPy and pandas are two of the most widely used Python libraries for working with data, but they serve different purposes and have distinct features. Here’s a comparison of the two:
NumPy
Purpose:
• NumPy is designed for numerical computing. It provides an efficient multi-dimensional array object (ndarray) along with mathematical functions that operate on whole arrays at once.
pandas
Purpose:
• pandas is designed for data manipulation and analysis. It provides data structures and
functions needed to work with structured data, such as tables and time series.
• NOTE: A time series is a sequence of data points typically measured at successive points in
time, usually at equally spaced intervals. Time series data is often used in various fields such
as finance, economics, weather forecasting, signal processing, and many others to analyze
patterns, trends, and cycles.
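For illustration, a minimal time series in pandas is just a Series indexed by timestamps (the values and dates below are made up):
import pandas as pd
# Three daily readings at equally spaced intervals form a simple time series
ts = pd.Series([100, 102, 101], index=pd.date_range('2024-01-01', periods=3, freq='D'))
print(ts)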
Pandas is a powerful and widely-used data manipulation and analysis library in Python. It
provides data structures and functions needed to work with structured data seamlessly. Here's
an introduction to the core components of pandas:
• Series: A one-dimensional labeled array capable of holding any data type. It is similar
to a column in a spreadsheet.
• DataFrame: A two-dimensional labeled data structure with columns that can be of
different types, much like a table in a database or a data frame in R.
import pandas as pd
# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series:")
print(series)
# Creating a DataFrame
data = { 'Name': ['John', 'Anna', 'Peter', 'Linda'], 'Age': [28, 24, 35, 32] }
df = pd.DataFrame(data)
print("DataFrame:")
print(df)
Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
DataFrame:
Name Age
0 John 28
1 Anna 24
2 Peter 35
3 Linda 32
The Series is a one-dimensional array with integer values and default index labels. The DataFrame
is a two-dimensional table with columns for 'Name' and 'Age', and it also has default integer index
labels.
A pandas Series is a one-dimensional labeled array capable of holding any data type
(integers, strings, floats, etc.). It is similar to a column in a table or a single column in an
Excel sheet.
1. pd: This is the common alias for the pandas library. Before you can use it, you need to
import pandas with import pandas as pd.
2. Series: This is a constructor for creating a Series object in pandas.
3. data: This is the input data that you want to convert into a Series. In this case, data is
a list [1, 2, 3, 4, 5].
Creating the Series
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(series)
a    1
b    2
c    3
d    4
e    5
dtype: int64
To create a Series from a dictionary, the keys of the dictionary become the indices of the Series, and
the values of the dictionary become the values of the Series
import pandas as pd
# Creating a dictionary
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)
print(series)
Explanation
1. Creating the dictionary: The dictionary data has keys 'a', 'b', 'c', 'd', 'e'
and corresponding values 1, 2, 3, 4, 5.
2. Creating the Series: When you pass this dictionary to pd.Series, pandas creates a
Series where the dictionary keys become the index, and the dictionary values become
the data of the Series.
Output
a    1
b    2
c    3
d    4
e    5
dtype: int64
• Indices (a, b, c, d, e): These are the keys from the dictionary.
• Values (1, 2, 3, 4, 5): These are the values from the dictionary.
• dtype: int64: This indicates the data type of the values in the Series. In this case, it is
int64, which means 64-bit integers.
This approach is useful when you have labeled data that you want to convert into a pandas
Series for further manipulation or analysis.
Creating a pandas Series from scalar values involves specifying the scalar value and an index. This
will generate a Series where each index label is associated with the same scalar value.
import pandas as pd
# Scalar value
scalar_value = 10
series = pd.Series(scalar_value, index=['a', 'b', 'c', 'd', 'e'])
print(series)
output
a 10
b 10
c 10
d 10
e 10
dtype: int64
Explanation of the Output
• Indices (a, b, c, d, e): These are the index labels specified in the index list.
• Values (10): Each index label is associated with the scalar value 10.
• dtype: int64: This indicates the data type of the values in the Series. In this case, it is
int64, which means 64-bit integers.
This approach is useful when you want to create a Series with a constant value for each
index, such as initializing a Series or filling a Series with a specific value.
Creating a pandas Series using a specified index involves associating values with specific indices.
import pandas as pd
# List of values
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print(series)
output:
a 10
b 20
c 30
d 40
e 50
dtype: int64
An empty Series can also be created with a specified index. This is useful for initializing a Series to be filled later.
import pandas as pd
# Index labels, no data yet
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(index=index, dtype='float64')
print(series)
output:
a NaN
b NaN
c NaN
d NaN
e NaN
dtype: float64
Explanation
• NaN: Stands for "Not a Number" and is used to denote missing or undefined values in
pandas.
• dtype: float64: When no data is supplied, the Series is filled with NaN values; here the dtype is float64 (passing dtype explicitly, as above, avoids relying on the default, which newer pandas versions may warn about).
By specifying the index, you can create Series that are tailored to your data structure needs,
whether you are initializing with specific values, using a scalar value, or creating an empty
Series for later use.
import pandas as pd
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
print("Series:")
print(series)
print("Size of Series:", series.size)
print("Dimensions of Series:", series.ndim)
print("Shape of Series:", series.shape)
print("Index of Series:", series.index)
output:
Series:
a 10
b 20
c 30
d 40
e 50
dtype: int64
Size of Series: 5
Dimensions of Series: 1
Shape of Series: (5,)
Index of Series: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Explanation
Importance of dtype
• Efficiency: Knowing the dtype helps Pandas optimize storage and computation.
dtype='object' indicates that the index labels of the Series are stored as generic Python objects, which is appropriate for string labels. If the index labels were numeric, the dtype would instead be int64 or float64.
These attributes provide essential information about the Series and help you understand its structure and how to manipulate it.
Note: A tuple is an immutable, ordered collection of elements in Python. Tuples are similar to
lists but have some key differences:
1. Ordered: The elements in a tuple have a defined order, and this order will not
change.
2. Immutable: Once a tuple is created, you cannot modify, add, or remove elements
from it. This immutability makes tuples a good choice for read-only collections of
data.
3. Heterogeneous: Tuples can contain elements of different types, including other
tuples, lists, dictionaries, and more.
4. Indexable: You can access elements in a tuple using their index, starting from 0 for
the first element.
The comma in the shape (5,) signifies that the shape is a tuple with one element. This is required by Python's syntax to differentiate single-element tuples from regular parenthesized expressions.
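A quick illustration of these properties:
t = ('Alice', 25, 'New York')   # heterogeneous, ordered tuple
print(t[0])                     # indexable: prints 'Alice'
single = (5,)                   # single-element tuple: the trailing comma is required
print(len(single))              # prints 1
# t[0] = 'Bob' would raise a TypeError because tuples are immutable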
Creating an empty DataFrame can be useful in various scenarios when working with data in pandas, for example as a placeholder that will be filled with data later. The simplest case is a DataFrame with no columns and no rows:
import pandas as pd
df = pd.DataFrame()
print(df)
output:
Empty DataFrame
Columns: []
Index: []
Creating an empty DataFrame with specific columns is a way to initialize a DataFrame with
predefined column names but without any data. This can be useful when you know the
structure of your DataFrame in advance but will populate it with data later.
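A minimal sketch, assuming the same 'Name', 'Age', 'City' columns used elsewhere in these notes:
import pandas as pd
# Empty DataFrame with predefined column names but no rows
df = pd.DataFrame(columns=['Name', 'Age', 'City'])
print(df)
This prints an empty DataFrame whose Columns list shows the three names and whose Index is empty.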
Explanation
Creating a DataFrame:
df = pd.DataFrame(data)
data is assumed to be a predefined variable containing the data to convert into a DataFrame. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
This function, extract_column, is defined to take two arguments: a DataFrame (dataframe) and
a column name (column_name). The purpose of the function is to check if the specified column
exists in the DataFrame and print its data. If the column does not exist, it prints a message indicating
that the column was not found.
if column_name in dataframe.columns:
This line checks if the provided column_name exists in the DataFrame's columns. If it does, the
function proceeds to the next step; otherwise, it goes to the else block.
Extracting and printing the column:
column_data = dataframe[column_name]
print(column_data)
If the column exists, the function extracts the column data from the DataFrame and prints it.
else:
print(f"Column '{column_name}' not found in DataFrame.")
If the column does not exist in the DataFrame, this block prints a message indicating that the column
was not found.
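Assembled from the fragments above, the complete function might look like this (a sketch; the full original listing is not shown):
def extract_column(dataframe, column_name):
    # Check whether the requested column exists in the DataFrame
    if column_name in dataframe.columns:
        column_data = dataframe[column_name]
        print(column_data)
    else:
        print(f"Column '{column_name}' not found in DataFrame.")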
# Sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Example usage
row_2 = extract_row(df, 2)
print("Extracted row:")
print(row_2)
return dataframe.iloc[row_index]
If the row index exists, the function extracts the row data from the DataFrame using iloc,
which is an integer-location based indexing for selection by position, and returns the row
data.
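For completeness, here is a sketch of the extract_row function implied by the explanation above; the bounds check and exact message wording are assumptions, since the full listing is not shown:
def extract_row(dataframe, row_index):
    # Make sure the integer position is valid before using iloc
    if 0 <= row_index < len(dataframe):
        return dataframe.iloc[row_index]
    else:
        print(f"Row index '{row_index}' not found in DataFrame.")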
The f in the string print(f"Row index '{row}' not found in DataFrame.") represents
an f-string, which is a feature introduced in Python 3.6. An f-string (formatted string literal)
allows you to embed expressions inside string literals, using curly braces {}.
Working
• f-string: The f before the opening quote marks indicates that the string is an f-string.
• Curly braces {}: Any expression inside the curly braces {} will be evaluated and its
result will be inserted into the string at that position.
row = 5
print(f"Row index '{row}' not found in DataFrame.")
Benefits
• Readability: f-strings make it easier to format strings and read the code.
• Performance: f-strings are generally faster than other methods of string formatting,
like using % or str.format().
# Sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
• len(df): This expression returns the number of rows in the DataFrame. If the
DataFrame initially has 3 rows, len(df) will return 3.
• df.loc[len(df)]: The loc accessor is used to access a group of rows and columns
by labels or a boolean array. In this case, it is used to add a new row at the index
position returned by len(df).
• ['David', 40, 'San Francisco']: This is the data for the new row being added to
the DataFrame. The new row will have 'David' as the Name, 40 as the Age, and 'San
Francisco' as the City.
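Putting the bullets together, the new row is appended with a single assignment (a sketch consistent with the explanation above):
# Append a new row at the next integer position
df.loc[len(df)] = ['David', 40, 'San Francisco']
print(df)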
Output
      Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   40  San Francisco
Data Types
Pandas uses NumPy for its underlying data storage, which means it leverages NumPy's
efficient array-based storage. Common data types in pandas include:
• int64
• float64
• bool
• datetime64[ns]
• object (for string or mixed types)
Example:
print(df.dtypes)
Output:
Name object
Age int64
City object
dtype: object
3. Indexing
Indexes provide fast lookups and are essential for aligning data:
Example:
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)
Output:
      Name  Age         City
a    Alice   25     New York
b      Bob   30  Los Angeles
c  Charlie   35      Chicago
4. Storage Formats
In-Memory Storage
Data is typically stored in memory in the form of DataFrame and Series objects, allowing for
fast data manipulation and analysis.
File-Based Storage
• CSV
• Excel
• HDF5
• Parquet
• SQL databases
• JSON
• and more
EXPLANATION:
df.to_csv('data.csv', index=False)
This call writes the contents of a pandas DataFrame to a CSV (Comma Separated Values) file.
• df: This is the pandas DataFrame that you want to write to a CSV file. In your context, df
contains data with columns such as 'Name', 'Age', and 'City'.
• .to_csv(): This is a pandas DataFrame method that exports the DataFrame to a CSV file.
The to_csv method has several optional parameters that allow you to control the output
format, but here we're using two of them: the file path and index.
• 'data.csv': This is the path to the file where the DataFrame will be written. If the file
does not exist, it will be created. If it does exist, it will be overwritten.
• index=False: This parameter specifies whether to write row indices to the CSV file. By
default, index=True, meaning the row indices are included in the CSV file. Setting
index=False excludes the row indices from the output file.
DELIMITER
A delimiter in a CSV (Comma Separated Values) file is a character that separates individual
data values within each row of the file. The delimiter tells the software reading the CSV file
how to split the data into individual fields or columns.
Common Delimiters
1. Comma (,): This is the most common delimiter and is the default in CSV files. Each
value is separated by a comma. Example:
Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
2. Semicolon (;): Sometimes used instead of a comma, especially in regions where the
comma is used as a decimal separator. Example:
Name;Age;City
Alice;25;New York
Bob;30;Los Angeles
Charlie;35;Chicago
3. Pipe (|): Occasionally used to avoid conflicts with data that may contain commas or
semicolons. Example:
Name|Age|City
Alice|25|New York
Bob|30|Los Angeles
Charlie|35|Chicago
When reading or writing CSV files with pandas, you can specify the delimiter using the sep
parameter in the read_csv and to_csv methods.
import pandas as pd
df = pd.read_csv('data.csv', sep=';')
Writing a CSV File with a Custom Delimiter
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
df.to_csv('data.csv', sep=';', index=False)
Summary
Pandas can read and write CSV files with any delimiter by passing the sep parameter to read_csv and to_csv; by default the comma is used, and row indices are included unless index=False is specified.
In Pandas, slicing rows from a DataFrame can be done in various ways depending on your
requirements.
Using iloc
import pandas as pd
# Sample DataFrame
data = {
    # Sample values (the same as in the example that follows)
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 200, 300, 400, 500]
}
df = pd.DataFrame(data)
print(df)
# Select the rows at integer positions 1 and 2 (the stop position 3 is exclusive)
sliced_df = df.iloc[1:3]
print(sliced_df)
import pandas as pd
# Sample DataFrame
data = {
'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [100, 200, 300, 400, 500]
}
df = pd.DataFrame(data)
print(df)
output
   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400
4  5  50  500
# Slicing with a step of 2 keeps every other row
sliced_df = df.iloc[::2]
print(sliced_df)
   A   B    C
0  1  10  100
2  3  30  300
4  5  50  500
Explanation of df.iloc[::2]
The expression df.iloc[::2] is using the iloc indexer to slice the DataFrame. Here's a
breakdown of the syntax and how it works:
• iloc: This is a Pandas method used for integer-location based indexing. It allows you
to select rows and columns by their integer positions.
• ::2: This is Python's slicing syntax. In general, slicing has the form
start:stop:step, where:
o start is the index to start the slice (inclusive).
o stop is the index to end the slice (exclusive).
o step is the step size or stride between each index in the slice.
Since start and stop are omitted and step is 2, only every other row from the original DataFrame is included in the sliced DataFrame: rows at positions 0, 2, and 4 are selected, while rows at positions 1 and 3 are skipped. This kind of slicing is useful when you want to downsample your data by taking every nth row, in this case every 2nd row.
import pandas as pd
This dictionary data contains the keys 'Timestamp', 'Temperature', 'Vibration', and 'Position', each mapping to a list of ten sensor readings taken one minute apart.
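A reconstruction of that setup, continuing from the import above; the values come from the output shown below, and the use of pd.date_range to build the timestamps is an assumption:
data = {
    'Timestamp': pd.date_range('2023-01-01 00:00:00', periods=10, freq='min'),
    'Temperature': [20, 21, 19, 22, 20, 21, 19, 23, 20, 22],
    'Vibration': [0.01, 0.02, 0.03, 0.02, 0.01, 0.04, 0.02, 0.03, 0.01, 0.02],
    'Position': [5, 6, 5, 7, 5, 6, 5, 8, 5, 7]
}
df = pd.DataFrame(data)
print(df)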
Output Explanation
Timestamp Temperature Vibration Position
0 2023-01-01 00:00:00 20 0.01 5
1 2023-01-01 00:01:00 21 0.02 6
2 2023-01-01 00:02:00 19 0.03 5
3 2023-01-01 00:03:00 22 0.02 7
4 2023-01-01 00:04:00 20 0.01 5
5 2023-01-01 00:05:00 21 0.04 6
6 2023-01-01 00:06:00 19 0.02 5
7 2023-01-01 00:07:00 23 0.03 8
8 2023-01-01 00:08:00 20 0.01 5
9 2023-01-01 00:09:00 22 0.02 7
This DataFrame could be used for further analysis, such as plotting the sensor data, detecting
anomalies, or performing statistical analysis on the readings.
Loading data into a pandas DataFrame:
Loading data into a pandas DataFrame is a fundamental step in data analysis and
manipulation. Pandas provides a variety of functions to read data from different file formats
and data sources. Here’s a guide on how to load data into pandas DataFrames from common
sources. When loading, it is often worth specifying the data type of each column explicitly, for several reasons (see the example after this list):
• Accuracy: Ensures columns are interpreted correctly (e.g., 'Age' as integers, not
strings).
• Performance: Helps pandas optimize memory usage and improve performance by
using appropriate data types.
• Error Prevention: Prevents potential issues with data type mismatches during data
processing and analysis.
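A minimal sketch, assuming a file named data.csv with Name, Age, and City columns:
import pandas as pd
# Load a CSV file and tell pandas how to interpret each column
df = pd.read_csv('data.csv', dtype={'Name': str, 'Age': 'int64', 'City': str})
print(df.dtypes)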
Using SQLAlchemy
from sqlalchemy import create_engine
import pandas as pd
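A minimal sketch, assuming a local SQLite database file example.db containing a table called users:
# Create a database connection and load a query result into a DataFrame
engine = create_engine('sqlite:///example.db')
df = pd.read_sql('SELECT * FROM users', engine)
print(df.head())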
Reading JSON
import pandas as pd
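A minimal sketch, assuming a file named data.json whose top level is a list or dict of records:
# Read a JSON file into a DataFrame
df = pd.read_json('data.json')
print(df.head())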
From Dictionary
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
From List of Dictionaries
# Creating a DataFrame from a list of dictionaries
data = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
df = pd.DataFrame(data)
print(df)