Data Science Lecture Notes
Course: Introduction to Data Science
Instructor: Dr. Sarah Thompson
Semester: Fall 2025
Week 1: Overview of Data Science
Data Science is an interdisciplinary field that combines statistics, computer science,
and domain expertise to extract insights from structured and unstructured data.
Applications include predictive modeling, natural language processing, and data
visualization.
Week 2: Data Collection & Cleaning
Data quality is critical because errors in the raw data propagate into every later
analysis. Cleaning techniques include handling missing values, detecting outliers, and
normalizing numerical features. Tools: Python (pandas), R, SQL.
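The three cleaning techniques named above can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library so it runs without extra dependencies. The sample readings, the function names, the choice of median imputation, and the 1.5 x IQR outlier rule are all assumptions made for the example, not methods prescribed by the course.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0, None, 14.0]

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

filled = impute_median(raw)                        # step 1: missing values
outliers = iqr_outliers(filled)                    # step 2: outlier detection
kept = [v for v in filled if v not in outliers]    # drop flagged points
scaled = min_max_scale(kept)                       # step 3: normalization

print("outliers:", outliers)
print("scaled:", scaled)
```

The order matters in this sketch: scaling before removing the extreme reading would compress all the ordinary values into a narrow band near 0.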
Week 3: Exploratory Data Analysis (EDA)
EDA involves summarizing datasets and visualizing distributions to identify patterns
and relationships. Common tools: matplotlib, seaborn, Tableau.
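As a small illustration of the summarizing step, the sketch below computes descriptive statistics for one column and the Pearson correlation between two columns. The data (hours studied and exam scores) and the function names are hypothetical, and plotting is left out so the example needs only the standard library.

```python
import math
import statistics

# Hypothetical paired observations: hours studied and exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]

def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(summarize(scores))
print("correlation:", round(pearson(hours, scores), 3))
```

A correlation close to 1 or -1 suggests a linear relationship worth plotting; a value near 0 does not rule out a nonlinear one, which is why EDA pairs numbers like these with visual checks.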
Week 4: Statistical Foundations
Key concepts: probability distributions, hypothesis testing, confidence intervals, and
regression analysis.
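Two of these concepts, confidence intervals and hypothesis testing, can be sketched with the standard library's NormalDist class. The example assumes the normal approximation is acceptable (a reasonably large sample); for small samples a t-distribution would usually be preferred. The sample values and function names are made up for illustration.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: population mean equals mu0."""
    n = len(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - mu0) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample of 35 measurements centered on 50.
sample = [50.0 + (i % 7) - 3 for i in range(35)]

low, high = mean_confidence_interval(sample)
print("95% CI:", (round(low, 2), round(high, 2)))
print("p-value for mu0 = 52:", z_test_p_value(sample, 52.0))
```

The two results agree with each other: a hypothesized mean that lies outside the 95% interval yields a two-sided p-value below 0.05.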
Week 5: Machine Learning Basics
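Supervised learning trains on labeled examples; unsupervised learning finds structure
in unlabeled data. Supervised algorithms: Linear Regression, Decision Trees, Random
Forests. Unsupervised algorithms: k-Means Clustering.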
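To make the supervised/unsupervised contrast concrete, the sketch below fits a least-squares line to labeled data and runs a one-dimensional k-means on unlabeled points. Both functions are simplified teaching versions written for this example; the data, the function names, the starting centers, and the fixed iteration count are assumptions.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def k_means_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            statistics.mean(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers

# Supervised: every input x comes with a known target y (here y = 2x + 1).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(x, y)
print("slope:", slope, "intercept:", intercept)

# Unsupervised: only the points are given; the groups must be discovered.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers = k_means_1d(points, centers=[0.0, 5.0])
print("cluster centers:", centers)
```

Decision Trees and Random Forests are omitted here because a faithful version would not fit in a short sketch.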
Week 6: Model Evaluation
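Metrics include accuracy, precision, recall, F1 score, and ROC-AUC. Cross-validation
estimates how well a model generalizes to unseen data by repeatedly training on one part
of the dataset and testing on the held-out remainder.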
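The label-based metrics and the splitting step of cross-validation can be sketched as follows. The labels and predictions are invented for the example, and the function names are hypothetical. ROC-AUC is left out because it needs predicted scores rather than hard 0/1 labels. The fold splitter uses contiguous, unshuffled folds, which is an assumption that suits only data without a meaningful ordering.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop

# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("test fold:", test)
```

In a full cross-validation loop, a model would be trained on each train list, scored on the matching test list, and the k scores averaged.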
Week 7: Big Data & Cloud Computing
Introduction to distributed systems (Hadoop, Spark) and cloud platforms (AWS, GCP,
Azure) for large-scale data processing.
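The notes do not describe a programming model, but the map, shuffle, and reduce pattern associated with Hadoop's MapReduce can be imitated in a single Python process to show the idea. The sketch below uses no Hadoop or Spark API; the text chunks and function names are hypothetical, and in a real cluster each chunk would be processed on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key; here, sum the counts."""
    return {key: sum(values) for key, values in grouped.items()}

# Each string stands in for a block of a large file stored on a separate node.
chunks = ["big data big clusters", "data pipelines move data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because map_phase looks at one chunk at a time and reduce_phase looks at one key at a time, both steps can in principle run in parallel, which is what makes the pattern suitable for large-scale processing.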
Week 8: Ethics in Data Science
Discussion of bias, fairness, and privacy in AI models. Importance of transparent
algorithms and data governance.
Week 9: Capstone Project
Students apply the full data science workflow to a real-world dataset, delivering a
report and presentation.
End of Notes