Practice Questions Answers IA
Q3. With the help of an example, explain any one of the techniques of EDA.
Ans.
Exploratory Data Analysis (EDA) is a crucial initial step in the data analysis process that
involves examining and understanding a dataset to uncover insights, patterns, and anomalies.
EDA helps analysts gain a better understanding of the data's structure, distribution, and
characteristics before applying more advanced statistical or machine learning techniques.
EDA typically consists of three main stages: summarization, visualization, and normalization.
1. Summarization:
In this stage, the aim is to summarize the main characteristics of the dataset using
descriptive statistics and metrics. This provides a high-level overview of the data.
Common summarization techniques include:
Mean: The sum of all values divided by the number of values.
Median: The middle value in a sorted dataset.
Mode: The most frequently occurring value in a dataset.
Variance: The average of the squared deviations from the mean.
Standard Deviation: The square root of the variance; it measures spread in the
same units as the data.
Quartiles: Three values (Q1, Q2, Q3) that divide a sorted dataset into four
equal parts; Q2 is the median.
Example: Suppose we have a dataset of customer age, height, and weight. We can
calculate the mean, median, and mode of each column to describe its typical
values, and use one of them (for example, the median) to fill in missing/null
values.
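The summary statistics listed above can be computed with Python's standard
statistics module. The following is a minimal sketch, not a definitive
implementation: the ages list and the helper names summarize and
fill_missing_with_median are hypothetical, and filling gaps with the median is
only one possible choice.

```python
import statistics

# Hypothetical customer ages; None marks a missing value (illustrative data only).
ages = [23, 35, None, 41, 35, 29, None, 52]


def summarize(values):
    """Return descriptive statistics for the non-missing entries."""
    present = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        # Population variance and standard deviation (divide by N, not N - 1).
        "variance": statistics.pvariance(present),
        "std_dev": statistics.pstdev(present),
        "quartiles": (q1, q2, q3),
    }


def fill_missing_with_median(values):
    """Replace each missing entry with the median of the observed entries."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values]


summary = summarize(ages)
filled_ages = fill_missing_with_median(ages)

for name, value in summary.items():
    print(f"{name}: {value}")
print("filled:", filled_ages)
```

The median is used for filling here because it is less affected by extreme
values than the mean; for a roughly symmetric column the mean would work
equally well.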
2. Visualization:
Visualization is a powerful tool in EDA that helps to explore data patterns,
relationships, and outliers through charts, graphs, and plots.
Common visualization techniques include histograms, box plots, scatter plots, and
bar charts. These visualizations provide insights into data distributions, correlations,
and potential anomalies.
Example: We can create a histogram to visualize the weight distribution in the
customer dataset. The histogram shows the range and spread of the data.
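A histogram groups values into equal-width bins and counts how many fall into
each one. The sketch below does this with plain Python and prints a text
histogram; the weights list and the histogram_counts helper are hypothetical,
and the code assumes the values are not all identical. A plotting library
(for example, matplotlib's hist function) would draw the same counts as bars.

```python
# Hypothetical customer weights in kg (illustrative data only).
weights = [52, 58, 61, 64, 66, 67, 70, 71, 73, 75, 78, 82, 88, 95]


def histogram_counts(values, num_bins):
    """Split [min, max] into equal-width bins and count the values in each bin."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins  # assumes high > low
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = histogram_counts(weights, num_bins=5)

# One row per bin: the bin range followed by one '#' per value in that bin.
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} - {edges[i + 1]:5.1f} | {'#' * count}")
```

The longest rows show where most weights are concentrated, and the first and
last bin edges give the range of the data.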
3. Normalization:
Normalization is a method used to adjust data so that it follows a standard scale or
distribution. This adjustment makes it simpler to compare and analyze various
features or datasets.
Common normalization techniques include min-max scaling (scaling data to a
specific range, e.g., [0, 1]) and z-score standardization (scaling data to have a mean
of 0 and a standard deviation of 1).
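Both techniques can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions: the weights list and the function names
min_max_scale and z_score_standardize are hypothetical, the values are assumed
not to be all identical, and the population standard deviation is used.

```python
import statistics

# Hypothetical customer weights in kg (illustrative data only).
weights = [52, 61, 70, 82, 95]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]  # assumes high > low


def z_score_standardize(values):
    """Shift and scale values to have mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


scaled = min_max_scale(weights)
standardized = z_score_standardize(weights)

print("min-max:", [round(x, 3) for x in scaled])
print("z-score:", [round(x, 3) for x in standardized])
```

Min-max scaling keeps every value inside [0, 1] but is sensitive to outliers,
since a single extreme value sets the range. Z-score standardization has no
fixed bounds and expresses each value as a number of standard deviations from
the mean.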
Example: