John Glauben J.
Caduan
01/25/2025
Fundamentals of Analytics Assignment
1. What is Pandas?
Pandas is a Python library used for working with data sets. It has
functions for analyzing, cleaning, exploring, and manipulating data.
2. Types of data structures in Pandas, with sample declarations
a. Series: A one-dimensional labeled array, capable of holding data of
any type.
import pandas as pd
series = pd.Series([1, 2, 3, 4])
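A Series can carry its own labels, and it can hold non-numeric values as well as numbers. The following is a minimal sketch. The labels 'w' through 'z' are hypothetical and chosen only for illustration:

```python
import pandas as pd

# A Series with explicit (hypothetical) labels instead of the default 0..n-1 positions
labeled = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'z'])

# Access a value by its label and by its integer position
by_label = labeled['y']        # 3
by_position = labeled.iloc[0]  # 1

# A Series can also hold strings, not just numbers
names = pd.Series(['a', 'b', 'c'])

print(by_label, by_position)
print(names)
```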
b. DataFrame: A two-dimensional labeled data structure with columns
of potentially different types.
dataframe = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
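The phrase "columns of potentially different types" means that each column keeps its own dtype. This is a small sketch. Column 'C' and the float values in 'B' are illustrative additions, not part of the original declaration:

```python
import pandas as pd

# Each column can have its own dtype
dataframe = pd.DataFrame({
    'A': [1, 2],        # integers
    'B': [3.5, 4.0],    # floats
    'C': ['x', 'y'],    # strings
})

print(dataframe.dtypes)  # one dtype per column
print(dataframe.shape)   # (rows, columns) -> (2, 3)

# Selecting a single column returns a Series
col_a = dataframe['A']
```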
3. Types of indexes in Pandas and how to use them
a. Default Index: A RangeIndex of sequential integers starting from 0, created automatically when no index is given.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
# Default index: 0, 1, 2
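With the default index, labels and positions match at first. After filtering, however, the original labels are kept. The sketch below shows `loc` (label-based) versus `iloc` (position-based) access, and uses `reset_index(drop=True)` to renumber the rows:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})  # default RangeIndex: 0, 1, 2

# At first, label 0 and position 0 refer to the same row
first_by_label = df.loc[0, 'A']
first_by_position = df.iloc[0, 0]

# Filtering keeps the original labels (1 and 2), so labels no longer equal positions
filtered = df[df['A'] > 1]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = filtered.reset_index(drop=True)
```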
b. Custom Index: User-defined index values.
df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
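A custom index is used by selecting rows through their labels with `loc`. It can also be built from an existing column with `set_index()`. This is a minimal sketch, and the `people` data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])

# Select by custom label
row_b = df.loc['b', 'A']     # 2
subset = df.loc[['a', 'c']]  # rows 'a' and 'c'

# Turn an existing column into the index (hypothetical data)
people = pd.DataFrame({'Name': ['Ana', 'Ben'], 'Age': [30, 25]})
people = people.set_index('Name')
ben_age = people.loc['Ben', 'Age']  # 25
```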
c. MultiIndex: Hierarchical indexing for multiple levels.
multi_index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)])
df = pd.DataFrame({'Value': [10, 20, 30]}, index=multi_index)
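A MultiIndex can be queried one level at a time. This sketch gives the levels the names 'group' and 'number' through the `names` argument. Those names are an assumption added for readability and are not part of the original declaration:

```python
import pandas as pd

# Level names 'group' and 'number' are illustrative assumptions
multi_index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1)], names=['group', 'number']
)
df = pd.DataFrame({'Value': [10, 20, 30]}, index=multi_index)

# An outer-level label returns all rows under it: ('A', 1) and ('A', 2)
group_a = df.loc['A']

# A full tuple selects a single row
single = df.loc[('A', 2), 'Value']  # 20

# xs() selects by an inner level: ('A', 1) and ('B', 1)
number_1 = df.xs(1, level='number')

# Aggregate per outer level
totals = df.groupby(level='group')['Value'].sum()
```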
d. DatetimeIndex: Index based on datetime objects.
dates = pd.date_range('2023-01-01', periods=3)
df = pd.DataFrame({'A': [1, 2, 3]}, index=dates)
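A DatetimeIndex supports selecting rows by date, slicing with date strings, extracting date components, and resampling to another frequency. This is a sketch built on the same three daily dates:

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=3)  # daily: Jan 1, 2, 3
df = pd.DataFrame({'A': [1, 2, 3]}, index=dates)

# Select one day with a Timestamp
jan_2 = df.loc[pd.Timestamp('2023-01-02'), 'A']  # 2

# Slice a date range with strings (label slices include both endpoints)
first_two = df.loc['2023-01-01':'2023-01-02']

# Date components are available directly on the index
weekdays = df.index.day_name()

# Resample to month-start frequency and total the values
monthly = df.resample('MS').sum()
```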
e. CategoricalIndex: Index with categorical data.
categories = pd.Categorical(['low', 'medium', 'high'])
df = pd.DataFrame({'Value': [10, 20, 30]}, index=categories)
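When the categories are not given explicitly, `pd.Categorical` infers them in sorted (alphabetical) order, so 'high' would sort before 'low'. The sketch below declares an ordered `pd.CategoricalIndex` so that sorting follows low < medium < high. The ordering and the reshuffled values are illustrative choices:

```python
import pandas as pd

# Explicit, ordered categories: low < medium < high
categories = pd.CategoricalIndex(
    ['high', 'low', 'medium'],
    categories=['low', 'medium', 'high'],
    ordered=True,
)
df = pd.DataFrame({'Value': [30, 10, 20]}, index=categories)

# sort_index() follows the category order, not alphabetical order
sorted_df = df.sort_index()

# Select by category label
medium_value = df.loc['medium', 'Value']  # 20
```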
4. Enumerate the Series and DataFrame methods
Data Manipulation
add(), sub(), mul(), div()
append() (removed in pandas 2.0; use pd.concat() instead)
combine()
update()
replace()
map()
apply()
drop()
fillna(), bfill(), ffill()
Index & Access
at[], iat[]
loc[], iloc[]
get()
Aggregation & Statistics
mean(), sum(), prod()
min(), max(), idxmin(), idxmax()
median(), mode()
std(), var()
cumsum(), cumprod(), cummin(), cummax()
Sorting & Filtering
sort_values()
sort_index()
where(), mask()
Conversion
to_frame() (Series only)
astype()
to_list() (Series only)
Other
describe()
value_counts()
unique() (Series only)
isna(), notna()
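Several of the listed methods are shown together in this sketch. The small Series and DataFrame are hypothetical data made up for illustration. Because `append()` no longer exists in pandas 2.x, `pd.concat()` is used to join objects instead:

```python
import pandas as pd

s = pd.Series([4, None, 1, 3])  # one missing value (NaN)
df = pd.DataFrame({'x': [3, 1, 2], 'y': ['b', 'a', 'b']})

filled = s.fillna(0)                # NaN -> 0.0
doubled = s.apply(lambda v: v * 2)  # element-wise function (NaN stays NaN)
missing = s.isna().sum()            # number of missing values
stats = s.describe()                # count, mean, std, min, quartiles, max

ordered = df.sort_values('x')                        # rows ordered by column x
counts = df['y'].value_counts()                      # frequency of each label
mapped = df['y'].map({'a': 'alpha', 'b': 'beta'})    # relabel via a dict
capped = df['x'].where(df['x'] > 1, 0)               # keep values > 1, others -> 0
running = df['x'].cumsum()                           # running total

# append() was removed in pandas 2.0; pd.concat() stacks objects instead
combined = pd.concat([df, df], ignore_index=True)
```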
5. Given the data below, answer the following:
a. How to display the highest sales
b. How to display the total sales in the East region
import pandas as pd
# Create the DataFrame
data = {'Name': ['William', 'Emma', 'Sofia', 'Markus', 'Edward'],
        'Region': ['East', 'North', 'East', 'South', 'West'],
        'Sales': [50000, 52000, 90000, 34000, 42000],
        'Expense': [42000, 43000, 50000, 44000, 38000]}
df = pd.DataFrame(data)
# a. Display the highest sales (the full row of the top seller)
highest_sales = df.loc[df['Sales'].idxmax()]
print("Highest Sales:")
print(highest_sales)
# b. Display the total sales in the East region
total_sales_east = df[df['Region'] == 'East']['Sales'].sum()
print("\nTotal Sales in East Region:", total_sales_east)
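The same questions can be answered in a few other ways, depending on whether you want only the number or the whole row, and one region or all of them. This sketch uses the same data and standard pandas calls (`max()`, `nlargest()`, `groupby()`):

```python
import pandas as pd

data = {'Name': ['William', 'Emma', 'Sofia', 'Markus', 'Edward'],
        'Region': ['East', 'North', 'East', 'South', 'West'],
        'Sales': [50000, 52000, 90000, 34000, 42000],
        'Expense': [42000, 43000, 50000, 44000, 38000]}
df = pd.DataFrame(data)

# Only the highest sales figure, without the rest of the row
max_sales = df['Sales'].max()  # 90000

# Top row by sales; nlargest(1, ...) gives the same row as idxmax(), as a DataFrame
top_seller = df.nlargest(1, 'Sales')

# Total sales for every region at once
sales_by_region = df.groupby('Region')['Sales'].sum()

# East total using loc with a boolean mask and a column label in one step
east_total = df.loc[df['Region'] == 'East', 'Sales'].sum()  # 50000 + 90000
```

The `groupby()` version is handy when more than one region is needed, because it computes every total in a single pass.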