Unit 1-Part3-Compressed
5. Sensor Data:
• Data generated by sensors, IoT devices, and other monitoring tools.
• This can include data from smart devices, industrial sensors, environmental sensors, etc.
Steps Used in Data Science
• Data collection
• Data cleaning
• Exploratory data analysis
• Modeling
• Deployment
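The five steps above can be sketched as a tiny Python pipeline in which each step is one function. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function names (collect, clean, explore, model, deploy), the hard-coded hourly temperature readings, and the straight-line least-squares model are all hypothetical choices made for this example.

```python
from statistics import mean

def collect():
    # Hypothetical hard-coded sensor readings: (hour, temperature in °C).
    # A real project would gather these from files, APIs, or databases.
    return [(0, 20.0), (1, 21.0), (1, 21.0), (2, None), (3, 23.0)]

def clean(rows):
    # Drop rows with a missing value, then keep only the first copy of each row.
    seen, out = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        out.append(row)
    return out

def explore(rows):
    # Simple exploratory summary of the temperature column.
    temps = [t for _, t in rows]
    return {"n": len(rows), "mean": mean(temps), "min": min(temps), "max": max(temps)}

def model(rows):
    # Ordinary least-squares fit of: temperature = a + b * hour.
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

def deploy(params):
    # "Deployment" here just means exposing the fitted model as a callable.
    a, b = params
    return lambda hour: a + b * hour

rows = clean(collect())
print(explore(rows))
predict = deploy(model(rows))
print(predict(4))
```

Cleaning removes the duplicated reading at hour 1 and the incomplete reading at hour 2, so the model is fitted on three distinct, complete observations.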
Data collection
After formulating the problem statement, the main task is to collect data that can support our analysis and manipulation.
• Sometimes data is collected by conducting a survey; at other times it is gathered by web scraping.
• Gather relevant data from various sources, which
may include databases, APIs, files, or external
datasets.
• Ensure the data collected is sufficient and
appropriate for addressing the defined problem.
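As a rough sketch of gathering data from more than one source and checking that it is sufficient, the Python example below combines records from a CSV file and a JSON payload of the kind an API might return. Both sources are simulated in memory, and the field names, the REQUIRED_FIELDS set, and the min_rows threshold are hypothetical assumptions for illustration only.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV file (simulated in memory with io.StringIO).
CSV_TEXT = """device_id,temperature
s1,20.5
s2,21.0
"""

# Hypothetical source 2: a JSON payload such as an API might return.
JSON_TEXT = '[{"device_id": "s3", "temperature": 19.8}]'

# Hypothetical fields the analysis needs in every record.
REQUIRED_FIELDS = {"device_id", "temperature"}

def from_csv(text):
    # Parse CSV rows into dicts with a numeric temperature.
    reader = csv.DictReader(io.StringIO(text))
    return [{"device_id": r["device_id"], "temperature": float(r["temperature"])} for r in reader]

def from_json(text):
    # Parse a JSON list of objects into the same record shape.
    return [{"device_id": r["device_id"], "temperature": float(r["temperature"])} for r in json.loads(text)]

def is_sufficient(records, min_rows=3):
    # Hypothetical check: enough rows, and every record has the required fields.
    return len(records) >= min_rows and all(REQUIRED_FIELDS.issubset(rec) for rec in records)

records = from_csv(CSV_TEXT) + from_json(JSON_TEXT)
print(len(records), is_sufficient(records))  # 3 True
```

Normalizing every source into the same record shape before combining them makes the later cleaning and analysis steps independent of where each record came from.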
Data cleaning
Step 1: Remove Duplicates
When you are working with large datasets, combining multiple data sources, or have not implemented any quality checks before adding an entry, your data will likely contain duplicated values.
Duplicated values add redundancy to your data and can skew your calculations. For example, duplicate product serial numbers in a dataset will make the product count higher than the actual number.
Duplicate email IDs or mobile numbers may cause the same person to receive a message more than once, making your communication look like spam. We handle these duplicate records by keeping just one occurrence of each unique observation in our data.
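A minimal Python sketch of "keep just one occurrence" is shown below, using only the standard library. The product records and the remove_duplicates helper are hypothetical examples; the key argument could equally be an email or mobile-number field. In pandas, the same idea is usually expressed with DataFrame.drop_duplicates(subset="serial", keep="first").

```python
# Hypothetical product records; serial number "P-002" appears twice.
products = [
    {"serial": "P-001", "name": "Sensor A"},
    {"serial": "P-002", "name": "Sensor B"},
    {"serial": "P-002", "name": "Sensor B"},
    {"serial": "P-003", "name": "Sensor C"},
]

def remove_duplicates(records, key):
    """Keep the first occurrence of each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

deduped = remove_duplicates(products, "serial")
print(len(products), len(deduped))  # 4 3
```

Counting products before deduplication would report 4 items, while the true number of distinct products is 3.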