Assignment Unit I and II
Unit I
1. What is data science? Explain its key components and how it differs from traditional
data analysis.
2. Describe the data science workflow. What are the major steps involved in solving a
data science problem?
3. How is data science applied in various industries? Provide examples of its
applications in fields like healthcare, finance, and marketing.
4. Differentiate between data science, machine learning, and artificial intelligence. How
do they interrelate in practice?
Big Data
1. What are the key traits of big data? Explain the "5 Vs" (Volume, Velocity, Variety,
Veracity, and Value) of big data.
2. How do the characteristics of big data impact the methods used for storing,
processing, and analyzing data? Provide examples.
3. Discuss the challenges associated with big data. How do these challenges influence
the choice of tools and technologies for big data analysis?
4. Explain how scalability and distributed computing are important when dealing with
big data. What are some common tools used to handle big data?
Web Scraping
1. What is web scraping? Describe its importance in data science and list some common
tools used for web scraping in Python.
2. Explain the ethical considerations and legal implications of web scraping. What are
some guidelines to follow when scraping data from websites?
3. Describe the process of web scraping using BeautifulSoup and requests in Python.
Provide an example of scraping data from a website.
4. What are some common challenges in web scraping, and how can they be mitigated?
Discuss issues such as CAPTCHA, rate limiting, and dynamic content.
Analysis vs Reporting
1. Differentiate between data analysis and data reporting. How does each contribute to
the decision-making process?
2. Explain the key differences between exploratory data analysis (EDA) and generating
business reports. When should you use each approach?
3. How does the focus of data analysis differ from data reporting in terms of the
audience and the purpose? Provide examples of each.
4. Discuss the tools and techniques commonly used for data analysis versus those used
for data reporting. How do their outputs differ?
Unit II
Matplotlib
1. Explain how to create a bar chart using Matplotlib in Python. What are some common
use cases for bar charts in data science?
2. How can you customize line charts in Matplotlib to show multiple data series on the
same plot? Give an example.
3. Describe the process of creating a scatterplot in Matplotlib. How can you modify the
size and color of points based on additional data?
4. What are the different types of visualizations available in Matplotlib for comparing
categorical and numerical data? Provide examples.
NumPy
1. What is NumPy and how is it used in data science? Explain the concept of arrays and
how NumPy arrays differ from Python lists.
2. Demonstrate how to perform basic mathematical operations (addition, subtraction,
multiplication, etc.) on NumPy arrays.
3. Explain the concept of broadcasting in NumPy. Provide an example where
broadcasting is used for efficient computation.
4. How can you use NumPy to generate random numbers and create datasets for data
analysis? Provide examples.
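As a starting point for questions 3 and 4 above, the sketch below shows one possible use of broadcasting (centering each column of an array) and one way to draw reproducible random numbers. The array contents, variable names, and seed are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Illustrative data: 3 samples (rows) by 2 features (columns).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# col_means has shape (2,). Subtracting it from a (3, 2) array
# broadcasts the means across every row without an explicit loop.
col_means = data.mean(axis=0)
centered = data - col_means

# A seeded generator makes the random dataset reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
```

A complete answer would also explain why the shapes (3, 2) and (2,) are compatible under NumPy's broadcasting rules.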
Scikit-learn
1. Explain the purpose of Scikit-learn in Python. How is it used for machine learning?
2. What is the difference between supervised and unsupervised learning in Scikit-learn?
Provide examples of algorithms for each type.
3. Describe the process of training a linear regression model in Scikit-learn. What
functions are used to evaluate the model’s performance?
4. How does Scikit-learn handle feature scaling? Explain the importance of scaling in
machine learning models.
NLTK
1. What is the purpose of the NLTK library in Python? How is it used for text
processing?
2. Explain how tokenization is performed using NLTK. Why is it an important step in
natural language processing (NLP)?
3. Describe the process of sentiment analysis using NLTK. How can this be applied in
analyzing social media data?
4. How can NLTK be used for named entity recognition (NER)? Provide an example of
extracting entities from text.
Getting Data
1. Describe the process of reading and writing CSV files in Python using Pandas.
Provide an example.
2. Explain how web scraping works in Python using BeautifulSoup. What precautions
should be taken when scraping websites?
3. How can the Twitter API be used to collect data for sentiment analysis? Provide an
example of connecting to the API and retrieving tweets.
4. What are some common methods for handling missing data in Python? Explain the
pros and cons of different approaches.
Data Munging
1. What is data munging, and why is it important in the data analysis process?
2. How can Pandas be used to clean and manipulate data? Provide an example of
filtering and modifying data in a DataFrame.
3. Explain how to handle outliers in a dataset. What impact can outliers have on the
results of a data analysis?
4. Describe the process of rescaling data using MinMaxScaler and StandardScaler in
Scikit-learn. When should you use each?
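The arithmetic behind the two scalers in question 4 can be sketched with plain NumPy. This is not Scikit-learn's implementation; it only reproduces the default formulas (MinMaxScaler mapping each column to the range 0 to 1, StandardScaler subtracting the mean and dividing by the population standard deviation). The sample column is made up for illustration.

```python
import numpy as np

# Made-up single-feature column containing one large value.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-max rescaling: (x - min) / (max - min), computed per column.
min_max = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Standardization: (x - mean) / std, computed per column.
# np.std defaults to the population standard deviation (ddof=0).
standard = (x - x.mean(axis=0)) / x.std(axis=0)
```

Comparing `min_max` and `standard` on this column is one way to start discussing how each method reacts to an extreme value.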
Dimensionality Reduction