Data Science and Python Session
Date Time Topic
Mon, 30 June 11:00 AM – 1:00 PM Python for Data Science – The Groundwork
Tue, 1 July 11:00 AM – 1:00 PM Data Cleaning, Wrangling & Visualization
Wed, 2 July 11:00 AM – 1:00 PM Core Machine Learning Algorithms & Metrics
Thu, 3 July 11:00 AM – 1:00 PM Real-World ML Pipelines + GitHub + Project Workflow
Fri, 4 July 11:00 AM – 1:00 PM Industry Skills, Career Tips & MDPI Paper Showcase
● Tools used: Google Colab, Pandas, Scikit-learn, Seaborn, GitHub
● Outcome: Students build and push a complete ML project to GitHub by Day 5
● Extras: GitHub starter repo and an open Q&A on internships and freelancing
Detailed Schedule
📍 Date: 30 June – 4 July 2025
🕚 Time: 11:00 AM to 1:00 PM (2 hours daily)
🔵 Day 1: Python for Data Science – The Groundwork
🧭 Objective:
Equip students with Python fundamentals and introduce them to data structures and
basic libraries (NumPy, Pandas).
⏰ Agenda (11:00 AM – 1:00 PM)
Time Activity
11:00–11:10 Welcome, goals of the workshop, what to expect
11:10–11:40 Python basics (variables, loops, functions)
11:40–12:00 Intro to Jupyter/Colab & Python data structures
12:00–12:30 NumPy & Pandas overview: arrays, Series, DataFrames
12:30–12:50 Hands-on: Load the Titanic dataset
12:50–1:00 Q&A + Assignment: Explore the dataset independently
🎯 Deliverables:
● Colab notebook for Day 1
● Assignment: calculate survival statistics with .groupby() and explore the dataset visually
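As an illustration of the Day 1 assignment, here is a minimal sketch of computing survival statistics with .groupby(). The small table is a hypothetical stand-in for the Titanic data (the column names sex, pclass and survived are assumptions, not taken from the workshop notebook), so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical six-row sample shaped like the Titanic data (not real records).
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female", "male"],
    "pclass": [3, 1, 3, 1, 2, 3],
    "survived": [0, 1, 1, 0, 1, 0],
})

# The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
survival_by_sex = df.groupby("sex")["survived"].mean()

# Several statistics at once: group size and survival rate per passenger class.
survival_by_class = df.groupby("pclass")["survived"].agg(["count", "mean"])

print(survival_by_sex)
print(survival_by_class)
```

The same two lines work unchanged on the full dataset once it is loaded into a DataFrame with matching column names.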
🟢 Day 2: Data Cleaning, Wrangling & Visualization
🧭 Objective:
Teach students to clean messy data and explore insights visually using Matplotlib and
Seaborn.
⏰ Agenda (11:00 AM – 1:00 PM)
Time Activity
11:00–11:20 Data cleaning: missing values, outliers, duplicates
11:20–11:50 Hands-on with Pandas (dropna(), fillna(), filters)
11:50–12:30 Visualizations: Histograms, Boxplots, Pairplots
12:30–12:50 EDA mini-project: Visualize Titanic or new dataset
12:50–1:00 Q&A + GitHub intro + notebook submission guidance
🎯 Deliverables:
● Colab notebook with 4 types of visualizations + short summary
● GitHub push of notebook (can be assisted live)
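The cleaning steps named in the Day 2 agenda (duplicates, missing values, filters) can be sketched roughly as follows. The data frame is a made-up example, and the choice of median and mode as fill values is one common option assumed here, not necessarily the approach used in the session.

```python
import numpy as np
import pandas as pd

# Hypothetical messy sample: one duplicate row and a few missing values.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 22.0, 58.0],
    "fare": [7.25, 71.28, 8.05, 7.25, np.nan],
    "embarked": ["S", "C", None, "S", "S"],
})

# 1. Remove exact duplicate rows (row 3 repeats row 0).
deduped = df.drop_duplicates()

# 2. Fill missing values: median for a numeric column, most frequent value for a category.
filled = deduped.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["embarked"] = filled["embarked"].fillna(filled["embarked"].mode()[0])

# 3. Drop rows that still lack a value in a column we choose not to impute.
cleaned = filled.dropna(subset=["fare"])

# 4. Filter with a boolean mask.
adults_30_plus = cleaned[cleaned["age"] >= 30]

print(cleaned)
print(adults_30_plus)
```

Whether to fill or drop a missing value depends on the column and the question being asked, so it is worth checking df.isna().sum() before deciding.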
🔴 Day 3: Core ML Algorithms & Model Evaluation
🧭 Objective:
Introduce essential ML algorithms, training/testing logic, and model evaluation metrics.
⏰ Agenda (11:00 AM – 1:00 PM)
Time Activity
11:00–11:30 Overview of Machine Learning & real-life examples
11:30–12:10 Hands-on: Linear Regression (exam scores prediction)
12:10–12:40 Hands-on: Logistic Regression (Titanic classification)
12:40–12:50 Confusion Matrix, Accuracy, Precision, Recall
12:50–1:00 Assignment: Try KNN or Decision Tree on same dataset
🎯 Deliverables:
● Notebook with two working models (regression + classification)
● Evaluation metrics output
● Homework: experiment with KNN or SVM
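For the classification deliverable, a minimal sketch of the train/test logic and the four Day 3 metrics might look like this. It uses synthetic data from scikit-learn's make_classification as a stand-in for the Titanic features, which is an assumption made to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in, not the Titanic dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hold out 25% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Swapping LogisticRegression for KNeighborsClassifier, DecisionTreeClassifier or SVC leaves the rest of the code unchanged, which makes the homework comparison straightforward.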
🟠 Day 4: Real-World ML Pipeline + GitHub Integration
🧭 Objective:
Demonstrate industry-style pipeline, preprocessing, hyperparameter tuning, and using
GitHub effectively.
⏰ Agenda (11:00 AM – 1:00 PM)
Time Activity
11:00–11:30 Preprocessing: Label Encoding, Scaling, Train-Test Split
11:30–12:00 Using Pipelines in scikit-learn
12:00–12:30 GridSearchCV: tuning model hyperparameters
12:30–12:50 Live: Git basics + Pushing project notebook to GitHub
12:50–1:00 Bonus Tips: How to present a project professionally
🎯 Deliverables:
● Full pipeline notebook on GitHub
● Template README.md for showcasing project
● PDF of evaluation metrics (optional)
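The Day 4 combination of scaling, a scikit-learn Pipeline and GridSearchCV can be sketched as below. The synthetic data, the step names "scaler" and "clf", and the grid of C values are illustrative assumptions, not the workshop's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data as a stand-in for the project dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Chaining the scaler and the model means the scaler is fitted only on the
# training portion of each cross-validation fold, which avoids data leakage.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# After fitting, the search object behaves like the best pipeline found.
test_score = search.score(X_test, y_test)
print(search.best_params_)
print(f"held-out accuracy: {test_score:.3f}")
```

Keeping preprocessing inside the pipeline also means the whole workflow can be refitted or reused through a single object in the project notebook.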
🟣 Day 5: Industry Use-Cases + Research Showcase
🧭 Objective:
Highlight career paths, portfolio development, and showcase your research paper as
inspiration.
⏰ Agenda (11:00 AM – 1:00 PM)
Time Activity
11:00–11:30 Industry Use Cases: ML in Finance, Health, Retail, Startups
11:30–11:50 How to build a DS career: Resume, GitHub, LinkedIn
11:50–12:20 Showcase your MDPI paper: simplified
12:20–12:40 Walkthrough of a model from your paper (non-technical)
12:40–1:00 Open Q&A + Feedback
🎯 Deliverables:
● A GitHub repo containing all 5-day notebooks
● Career guidance doc