Session-7: Data file operations using pandas
Aim: Develop a Python program that reads data from a CSV file and applies various
operations to it using the Pandas library.
Software requirements: Python 3 with the pandas library (the python-docx library is also needed for the Word document example)
Program:
import pandas as pd
# Read data from a CSV file (replace 'data.csv' with your file path)
df = pd.read_csv('data.csv')
# Display the first few rows of the DataFrame
print("First 5 rows:")
print(df.head())
# Basic statistics
print("\nSummary Statistics:")
print(df.describe())
# Filtering data
filtered_df = df[df['Age'] > 25]
# Sorting data
sorted_df = df.sort_values(by='Age', ascending=False)
# Grouping and aggregation
grouped_df = df.groupby('Department')['Salary'].mean()
# Adding a new column
df['Salary Increased'] = df['Salary'] * 1.1
# Save the modified DataFrame to a new CSV file
df.to_csv('modified_data.csv', index=False)
# Pivot table
pivot_table = df.pivot_table(index='Department', columns='Gender', values='Salary',
                             aggfunc='mean')
# Display the results
print("\nFiltered DataFrame:")
print(filtered_df)
print("\nSorted DataFrame:")
print(sorted_df)
print("\nGrouped DataFrame:")
print(grouped_df)
print("\nDataFrame with Added Column:")
print(df)
print("\nPivot Table:")
print(pivot_table)
# This program reads data from a CSV file and applies various operations to it:
# filtering, sorting, grouping, adding a new column, and creating a pivot table.
# Replace 'data.csv' with the path to your CSV file, and adjust the column names
# ('Age', 'Department', 'Salary', 'Gender') and the operations to suit your data.
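Because data.csv is not supplied with this exercise, the sketch below builds a small table in memory so the operations can be tried straight away. The rows are invented for illustration only (hypothetical data), chosen to have the Age, Gender, Department and Salary columns the program assumes. io.StringIO stands in for the file path, since pd.read_csv also accepts a file-like object.

```python
import io

import pandas as pd

# Hypothetical sample data: invented rows with the columns the lab program expects.
CSV_TEXT = """Name,Age,Gender,Department,Salary
Asha,24,F,HR,30000
Ravi,31,M,IT,52000
Meena,28,F,IT,48000
Kiran,35,M,HR,41000
Divya,22,F,Sales,27000
"""

# pd.read_csv accepts a file-like object, so StringIO replaces 'data.csv' here.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# The same operations as in the program above.
filtered_df = df[df['Age'] > 25]                          # rows with Age over 25
sorted_df = df.sort_values(by='Age', ascending=False)     # oldest first
grouped_df = df.groupby('Department')['Salary'].mean()    # mean salary per department
df['Salary Increased'] = df['Salary'] * 1.1               # 10% raise column
pivot_table = df.pivot_table(index='Department', columns='Gender',
                             values='Salary', aggfunc='mean')

print(filtered_df)
print(grouped_df)
print(pivot_table)
```

In the pivot table, a Department/Gender combination with no rows (here Sales/M) shows up as NaN, which is why pivot tables of real data often contain missing cells.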
Theory: Pandas has no built-in reader for Word documents, because it is designed for
structured tabular sources such as CSV files, Excel workbooks and databases. Data held
in a Word document therefore has to be extracted first and converted into a form that
Pandas can handle, such as a list of strings or CSV text, before any operations are
applied. The python-docx library can do this extraction, but it reads only .docx files;
a legacy .doc file must first be converted to .docx (for example by re-saving it in
Word or LibreOffice). Here's an example that uses python-docx to read data from a Word document:
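As a rough illustration of the "convert, then analyse" idea, the sketch below assumes a hypothetical layout in which the Word document stores one record per paragraph as comma-separated text. The paragraph strings are hard-coded in place of `[p.text for p in Document('document.docx').paragraphs]` so the example runs without python-docx; they are joined into CSV text and handed to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical paragraph texts, standing in for
# [p.text for p in Document('document.docx').paragraphs].
paragraphs = [
    "Product,Units,Price",
    "Pen,10,5.0",
    "",                      # empty paragraphs are common in Word files
    "Notebook,4,40.0",
    "Pencil,25,2.5",
]

# Drop blank paragraphs, then join the rest into CSV text that pandas can parse.
csv_text = "\n".join(p for p in paragraphs if p.strip())
df = pd.read_csv(io.StringIO(csv_text))

# Once the text is a DataFrame, the usual operations apply.
df['Total'] = df['Units'] * df['Price']
print(df)
print(df['Total'].sum())
```

This only works when the document text really is delimited consistently; free-form prose is better kept as one string per row, as in the program that follows.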
First, you'll need to install the python-docx library:
pip install python-docx
Now, let's create a Python program that reads data from a Word document, converts it
to a DataFrame, and applies some operations using Pandas:
import pandas as pd
from docx import Document
# Read data from a Word document (replace 'document.docx' with your file path)
document = Document('document.docx')
# Extract text from the Word document
text = []
for paragraph in document.paragraphs:
    text.append(paragraph.text)  # one string per paragraph (may be empty)
# Create a DataFrame from the extracted text
df = pd.DataFrame({'Text': text})
# Display the first few rows of the DataFrame
print("First 5 rows:")
print(df.head())
# Basic statistics
print("\nSummary Statistics:")
print(df.describe())
# Keep only paragraphs containing 'keyword' (replace it with your search term)
filtered_df = df[df['Text'].str.contains('keyword')]
# Save the filtered DataFrame to a new CSV file
filtered_df.to_csv('filtered_data.csv', index=False)
# Display the filtered DataFrame
print("\nFiltered DataFrame:")
print(filtered_df)
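The filter above is case-sensitive and leaves empty paragraphs in the DataFrame. The sketch below shows a few common clean-up and text operations on such a DataFrame: removing blank paragraphs, a case-insensitive literal match with `case=False, regex=False`, and a per-paragraph word count. It uses a hard-coded list of hypothetical paragraph strings instead of a real .docx file, so it runs without python-docx.

```python
import pandas as pd

# Hypothetical paragraph texts standing in for document.paragraphs.
text = [
    "Quarterly Report",
    "",
    "Sales rose in the north region.",
    "The KEYWORD appears here in capitals.",
    "No match in this line.",
]

df = pd.DataFrame({'Text': text})

# Remove paragraphs that are empty or contain only whitespace.
df = df[df['Text'].str.strip() != ''].reset_index(drop=True)

# Case-insensitive, literal (non-regex) search for the keyword.
filtered_df = df[df['Text'].str.contains('keyword', case=False, regex=False)]

# Number of whitespace-separated words in each paragraph.
df['Words'] = df['Text'].str.split().str.len()

print(filtered_df)
print(df)
```

Passing `regex=False` treats the search term literally, which avoids surprises when it contains characters such as `.` or `(` that have special meaning in regular expressions.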
Result: A Python program was developed that reads data from a CSV file and applies
various operations to it using the Pandas library.