Advanced Python Data Science Exercise Set
1. Matplotlib: Advanced OHLC Candlestick Plot with Volume Annotations
Using the OHLC dataset below, plot candlestick-style OHLC bars. Annotate the three
highest-volume days, and adjust the annotation text size dynamically so that labels do not
overlap.
Dataset Preparation Code:
import pandas as pd
import numpy as np
np.random.seed(42)
dates = pd.date_range("2023-01-01", periods=100)
ohlc_df = pd.DataFrame({
    'Date': dates,
    'Open': np.random.uniform(100, 200, 100).round(2),
    'High': np.random.uniform(200, 300, 100).round(2),
    'Low': np.random.uniform(50, 100, 100).round(2),
    'Close': np.random.uniform(100, 200, 100).round(2),
    'Volume': np.random.randint(1_000, 10_000, 100)
})
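A possible starting point, not a reference solution: candle bodies are drawn with ax.bar and wicks with ax.vlines, and the three highest-volume days are picked with nlargest. The helper name shrink_until_clear, the 12 pt starting font, the 6 pt floor, and the headless Agg backend are all assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
n = 100
ohlc_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=n),
    "Open": np.random.uniform(100, 200, n).round(2),
    "High": np.random.uniform(200, 300, n).round(2),
    "Low": np.random.uniform(50, 100, n).round(2),
    "Close": np.random.uniform(100, 200, n).round(2),
    "Volume": np.random.randint(1_000, 10_000, n),
})

x = mdates.date2num(ohlc_df["Date"].to_numpy())
rising = ohlc_df["Close"] >= ohlc_df["Open"]
colors = np.where(rising, "tab:green", "tab:red").tolist()

fig, ax = plt.subplots(figsize=(12, 5))
# Wick: low-to-high line. Body: bar spanning open and close.
ax.vlines(x, ohlc_df["Low"], ohlc_df["High"], colors=colors, linewidth=1)
ax.bar(x, (ohlc_df["Close"] - ohlc_df["Open"]).abs(),
       bottom=ohlc_df[["Open", "Close"]].min(axis=1), width=0.6, color=colors)
ax.xaxis_date()
ax.set_ylabel("Price")
fig.autofmt_xdate()

# Label the three highest-volume days just above each day's high.
top3 = ohlc_df.nlargest(3, "Volume")
labels = [
    ax.annotate(f"Vol {row.Volume:,}", xy=(x[row.Index], row.High),
                xytext=(0, 10), textcoords="offset points",
                ha="center", fontsize=12)
    for row in top3.itertuples()
]

def shrink_until_clear(fig, texts, min_size=6):
    """Reduce the shared font size 1 pt at a time until no two labels overlap."""
    while True:
        fig.canvas.draw()  # extents are only valid after a draw
        boxes = [t.get_window_extent() for t in texts]
        clash = any(a.overlaps(b)
                    for i, a in enumerate(boxes) for b in boxes[i + 1:])
        size = texts[0].get_fontsize()
        if not clash or size <= min_size:
            return size
        for t in texts:
            t.set_fontsize(size - 1)

final_size = shrink_until_clear(fig, labels)
```

Call fig.savefig(...) or plt.show() afterwards to view the result. Shrinking the font is only one way to avoid overlaps; staggering the label offsets is an alternative worth comparing.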
2. Matplotlib: Multi-Axis Climate Plot with Interactive Hover
Plot Temperature, Humidity, and WindSpeed on a shared x-axis with separate y-axes. Add
hover interactivity using mpl_connect or mplcursors.
Dataset Preparation Code:
climate_df = pd.DataFrame({
    'Date': pd.date_range("2023-01-01", periods=100),
    'Temperature': np.random.uniform(20, 40, 100),
    'Humidity': np.random.uniform(40, 90, 100),
    'WindSpeed': np.random.uniform(5, 30, 100)
})
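A minimal sketch of the mpl_connect route, under a few assumptions: the third y-axis is created with a second twinx() call and pushed outward by moving its right spine, and the tooltip snaps to the nearest date. The helper names nearest_index, describe, and on_move are hypothetical, as is the Agg backend line (hover only does anything in an interactive window).

```python
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; remove to get a live window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
climate_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=100),
    "Temperature": np.random.uniform(20, 40, 100),
    "Humidity": np.random.uniform(40, 90, 100),
    "WindSpeed": np.random.uniform(5, 30, 100),
})

x = mdates.date2num(climate_df["Date"].to_numpy())
fig, ax_temp = plt.subplots(figsize=(11, 5))
ax_hum = ax_temp.twinx()
ax_wind = ax_temp.twinx()
ax_wind.spines["right"].set_position(("axes", 1.12))  # move third axis outward
fig.subplots_adjust(right=0.82)

series = [(ax_temp, "Temperature", "tab:red"),
          (ax_hum, "Humidity", "tab:blue"),
          (ax_wind, "WindSpeed", "tab:green")]
for ax, column, color in series:
    ax.plot(x, climate_df[column], color=color, linewidth=1)
    ax.set_ylabel(column, color=color)
    ax.tick_params(axis="y", colors=color)
ax_temp.xaxis_date()

# The tooltip lives on the last twin axis, which is drawn on top.
tooltip = ax_wind.annotate("", xy=(x[0], 0), xytext=(12, 12),
                           textcoords="offset points",
                           bbox=dict(boxstyle="round", fc="white"),
                           visible=False)

def nearest_index(xdata):
    """Index of the sample whose date is closest to the cursor position."""
    return int(np.abs(x - xdata).argmin())

def describe(idx):
    row = climate_df.iloc[idx]
    return (f"{row['Date']:%Y-%m-%d}\n"
            f"Temperature: {row['Temperature']:.1f}\n"
            f"Humidity: {row['Humidity']:.1f}\n"
            f"WindSpeed: {row['WindSpeed']:.1f}")

def on_move(event):
    """Show the tooltip at the nearest sample while the cursor is over the plot."""
    if event.inaxes is None or event.xdata is None:
        tooltip.set_visible(False)
    else:
        idx = nearest_index(event.xdata)
        tooltip.xy = (x[idx], climate_df["WindSpeed"].iloc[idx])
        tooltip.set_text(describe(idx))
        tooltip.set_visible(True)
    fig.canvas.draw_idle()

fig.canvas.mpl_connect("motion_notify_event", on_move)
```

The mplcursors package mentioned in the exercise can replace the manual handler; the version above only depends on Matplotlib itself.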
3. Plotly: Drilldown Sunburst with Time-Series Update
Create a Plotly Dash app with a sunburst chart for Region > Country > Product > Quarter. On
clicking a segment, show a corresponding time series.
Dataset Preparation Code:
regions = ['Asia', 'Europe', 'America']
countries = {
    'Asia': ['India', 'China'],
    'Europe': ['France', 'Germany'],
    'America': ['USA', 'Brazil'],
}
products = ['A', 'B', 'C']
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
data = []
for region in regions:
    for country in countries[region]:
        for product in products:
            for quarter in quarters:
                data.append({
                    'Region': region,
                    'Country': country,
                    'Product': product,
                    'Quarter': quarter,
                    'Sales': np.random.randint(1000, 10000)
                })
sales_df = pd.DataFrame(data)
4. Plotly: Linked Hover and Animated Subplots
Use Plotly to plot: (a) a choropleth map, (b) a scatter plot, and (c) a time-series bar chart.
Animate monthly data and enable linked hover.
Dataset Preparation Code:
world_df = pd.DataFrame({
    'Country': ['USA', 'India', 'Germany', 'Brazil', 'China'],
    'Sales': np.random.randint(10_000, 100_000, 5),
    'ISO': ['USA', 'IND', 'DEU', 'BRA', 'CHN']
})
product_df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Price': np.random.uniform(10, 100, 4),
    'Volume': np.random.randint(100, 500, 4)
})
time_series_df = pd.DataFrame({
    # Month-end dates; pandas >= 2.2 deprecates freq='M' in favor of freq='ME'
    'Month': pd.date_range("2023-01-01", periods=12, freq='M'),
    'Sales': np.random.randint(5000, 15000, 12)
})
5. Pandas: Rolling Average Anomaly Detection
From a MultiIndex (UserID, Date) dataset of transaction logs, compute per-user rolling
averages of spend and flag anomalous increases in spend.
Dataset Preparation Code:
user_ids = [f'U{i}' for i in range(1, 21)]
dates = pd.date_range('2023-01-01', '2023-04-30')
transaction_data = []
for uid in user_ids:
    for date in np.random.choice(dates, 40):
        transaction_data.append({
            'UserID': uid,
            'Date': date,
            'Spend': round(np.random.uniform(10, 500), 2)
        })
transaction_df = pd.DataFrame(transaction_data)
transaction_df = transaction_df.sort_values(['UserID', 'Date']).set_index(['UserID', 'Date'])
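The exercise leaves the anomaly rule open, so the sketch below picks one for illustration: a transaction is flagged when its spend exceeds twice the mean of the same user's previous five transactions. The window of 5, the factor of 2.0, and the function name flag_spend_anomalies are assumptions, not part of the exercise.

```python
import numpy as np
import pandas as pd

def flag_spend_anomalies(df, window=5, factor=2.0):
    """Flag rows whose Spend exceeds `factor` times the mean of the user's
    previous `window` transactions (both values are illustrative choices)."""
    out = df.copy()
    # shift(1) keeps the current transaction out of its own baseline.
    out["RollingMean"] = out.groupby(level="UserID")["Spend"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=window).mean()
    )
    # Comparisons against NaN are False, so each user's first rows are never flagged.
    out["Anomaly"] = out["Spend"] > factor * out["RollingMean"]
    return out

# Rebuild a dataset with the same shape as transaction_df.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", "2023-04-30")
records = [
    {"UserID": f"U{i}", "Date": date, "Spend": round(rng.uniform(10, 500), 2)}
    for i in range(1, 21)
    for date in dates[rng.integers(0, len(dates), 40)]
]
transaction_df = (
    pd.DataFrame(records)
    .sort_values(["UserID", "Date"])
    .set_index(["UserID", "Date"])
)

flagged = flag_spend_anomalies(transaction_df)
print(flagged[flagged["Anomaly"]].head())
```

The window here counts transactions rather than calendar days. A time-based window (for example rolling("7D") on the Date level) is a reasonable alternative if the exercise is read that way.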
6. Pandas: Funnel Analysis from Multi-source Data
Using three tables (users, logins, purchases), built below as DataFrames in place of CSV
files, compute user funnel conversion metrics based on time windows.
Dataset Preparation Code:
user_ids = [f'U{i}' for i in range(1, 21)]
users_df = pd.DataFrame({
    'user_id': user_ids,
    'join_date': pd.date_range('2023-01-01', periods=20)
})
logins_df = pd.DataFrame({
    'user_id': np.random.choice(user_ids, 50),
    'login_date': pd.date_range('2023-01-01', periods=50)
})
purchases_df = pd.DataFrame({
    'user_id': np.random.choice(user_ids, 30),
    'purchase_date': pd.date_range('2023-01-10', periods=30),
    'amount': np.random.randint(100, 1000, 30)
})
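The funnel stages and time windows are not fixed by the exercise. The sketch below assumes a three-stage funnel: joined, first login within 7 days of joining, and first purchase within 30 days of joining (counted only for users who passed the login stage). The window lengths and the function names first_event_after_join and funnel_metrics are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def first_event_after_join(users, events, date_col):
    """Earliest event per user that happened on or after the user's join date."""
    merged = events.merge(users, on="user_id")
    merged = merged[merged[date_col] >= merged["join_date"]]
    return merged.groupby("user_id")[date_col].min()

def funnel_metrics(users, logins, purchases, login_days=7, purchase_days=30):
    # Series are aligned on user_id; users without an event get NaT.
    funnel = users.set_index("user_id").assign(
        first_login=first_event_after_join(users, logins, "login_date"),
        first_purchase=first_event_after_join(users, purchases, "purchase_date"),
    )
    to_login = funnel["first_login"] - funnel["join_date"]
    to_purchase = funnel["first_purchase"] - funnel["join_date"]
    # NaT comparisons evaluate to False, so missing events drop out of the funnel.
    logged_in = to_login <= pd.Timedelta(days=login_days)
    purchased = logged_in & (to_purchase <= pd.Timedelta(days=purchase_days))
    return {
        "joined": len(funnel),
        "logged_in": int(logged_in.sum()),
        "purchased": int(purchased.sum()),
        "login_rate": float(logged_in.mean()),
    }

# Rebuild datasets with the same shape as users_df, logins_df, purchases_df.
rng = np.random.default_rng(42)
user_ids = [f"U{i}" for i in range(1, 21)]
users_df = pd.DataFrame({
    "user_id": user_ids,
    "join_date": pd.date_range("2023-01-01", periods=20),
})
logins_df = pd.DataFrame({
    "user_id": rng.choice(user_ids, 50),
    "login_date": pd.date_range("2023-01-01", periods=50),
})
purchases_df = pd.DataFrame({
    "user_id": rng.choice(user_ids, 30),
    "purchase_date": pd.date_range("2023-01-10", periods=30),
    "amount": rng.integers(100, 1000, 30),
})

print(funnel_metrics(users_df, logins_df, purchases_df))
```

Events dated before a user's join date are ignored here, since the randomly generated logins can precede the join date. Whether that is the right treatment is a modeling decision for the exercise.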
7. NumPy: Memory-Efficient Weighted Window Function
Apply a custom window function to a large 1D NumPy array (10 million elements or more)
using broadcasting (no Python loops).
Dataset Preparation Code:
large_array = np.random.rand(10_000_000)
# Goal: Apply custom weighted moving average of size 5
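One way to approach this, sketched under assumptions: sliding_window_view exposes every length-5 window as a strided view without copying the data, and np.einsum then reduces each window against the weights, which should avoid building a full (n, 5) temporary array. The weights [1, 2, 3, 4, 5] and the function name weighted_moving_average are illustrative choices, not given by the exercise.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def weighted_moving_average(values, weights):
    """Weighted moving average over every full window, with no Python loop."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Read-only strided view of shape (n - k + 1, k); no data is copied.
    windows = sliding_window_view(values, weights.size)
    # Reduce each window against the weights.
    return np.einsum("ij,j->i", windows, weights)

rng = np.random.default_rng(0)
large_array = rng.random(10_000_000)
example_weights = [1, 2, 3, 4, 5]  # assumed: later samples weigh more
wma = weighted_moving_average(large_array, example_weights)
print(wma.shape)  # (9999996,)
```

Only full windows are returned, so the output is four elements shorter than the input. np.convolve with mode="valid" and reversed weights gives the same values and is a useful cross-check.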
8. NumPy: Vectorized Random Walks with Reset Constraint
Simulate 100,000 random walks of 1,000 steps each. Reset a walk to zero whenever it drops
below -10. Track the number of resets and the final position of each walk.
Dataset Preparation Code:
num_walks = 100_000
steps = 1000
random_walks = np.random.choice([-1, 1], size=(num_walks, steps))
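Because a reset makes each walk depend on its own history, a plain cumulative sum is not enough. The sketch below takes a middle route: it loops over the 1,000 time steps but updates all walks at once with array operations. This is one possible reading of "vectorized", not the only one. The function name simulate_with_resets and the int8 step storage (chosen to keep the step matrix near 100 MB) are assumptions.

```python
import numpy as np

def simulate_with_resets(step_matrix, floor=-10):
    """Advance all walks in lockstep; a walk returns to 0 when it drops below `floor`."""
    num_walks, num_steps = step_matrix.shape
    position = np.zeros(num_walks, dtype=np.int32)
    resets = np.zeros(num_walks, dtype=np.int32)
    for t in range(num_steps):       # loop over time only
        position += step_matrix[:, t]
        below = position < floor     # boolean mask of walks to reset
        position[below] = 0
        resets += below
    return position, resets

num_walks = 100_000
steps = 1000
rng = np.random.default_rng(42)
# Draw 0/1 as int8 and map to -1/+1 to keep memory use low.
random_walks = rng.integers(0, 2, size=(num_walks, steps), dtype=np.int8) * 2 - 1

final_position, reset_count = simulate_with_resets(random_walks)
print(final_position[:5], reset_count[:5])
```

With unit steps, "drops below -10" means the walk reaches -11, at which point it is set back to 0 and the reset counter for that walk increases by one.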