Class Notes: Introduction to Data Science
What is Data Science?
Data Science is an interdisciplinary field that uses scientific methods, algorithms, and
systems to extract insights and knowledge from structured and unstructured data. It
combines aspects of statistics, computer science, and domain expertise.
Data Science vs. Data Analytics
- Data Science: Focuses on building models to predict future trends using machine learning
and complex algorithms.
- Data Analytics: Focuses on interpreting historical data to gain actionable insights.
Key Components of Data Science
1. Data Collection: Gathering data from various sources.
2. Data Cleaning: Handling missing values, outliers, and inconsistencies.
3. Data Exploration: Using statistical techniques and visualizations.
4. Modeling: Applying algorithms to find patterns.
5. Deployment: Making models accessible in production systems.
Data Science Process (CRISP-DM)
The Cross-Industry Standard Process for Data Mining includes:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
Tools and Technologies
Common tools include:
- Programming: Python, R
- Data Handling: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-learn, TensorFlow
- Platforms: Jupyter, Google Colab
Real-World Applications
Data science is used in healthcare (disease prediction), finance (fraud detection), retail
(recommendation engines), and marketing (customer segmentation).
Careers in Data Science
Popular roles include Data Scientist, Data Analyst, Machine Learning Engineer, and Business
Intelligence Analyst. Skills in programming, statistics, and data visualization are essential.