
JAI NARAIN COLLEGE OF TECHNOLOGY, BHOPAL (M.P.)
Approved by AICTE New Delhi & Govt. of M.P.
Affiliated with Rajiv Gandhi Technical University (RGPV), Bhopal
__________________________________________________________________________________
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
Syllabus: AD404_DATA SCIENCE
Unit II: Unstructured Data Analytics - Importance of Unstructured Data, Unstructured Data
Analytics: Descriptive, diagnostic, predictive and prescriptive data analytics based on case
study. Data Visualization: box plots, histograms, scatterplots, feature map visualization, t-SNE.
Overview of Advanced Excel - Introduction, data validation, introduction to charts, pivot table,
Scenario Manager, protecting data, Excel miner (XLMiner), introduction to macros.

Unstructured Data Analytics- Importance of Unstructured Data:

Unstructured data, which lacks a predefined format, is crucial for businesses because it holds
valuable, often untapped, insights that can inform decisions, improve customer experiences, and
drive innovation, going beyond what structured data alone can provide.

1. Richer Insights and Context:


●​ Unstructured data, such as emails, social media posts, documents, and videos, provides a
more complete picture of customer behavior, market trends, and business operations
than structured data alone.
●​ It allows for nuanced analysis, such as sentiment analysis, trend detection, and
understanding customer preferences.
●​ This deeper understanding helps businesses make more informed decisions and tailor
their products and services to meet customer demands effectively.

2. Innovation and Business Intelligence:


●​ Unstructured data can fuel innovation by providing new sources of inspiration,
creativity, and discovery.
●​ By exploring unstructured datasets, organizations can uncover novel insights, ideas, and
solutions that lead to breakthrough innovations, product enhancements, and business
opportunities.
●​ Unstructured data can also help businesses identify emerging trends and opportunities,
allowing them to stay ahead of the competition.

3. Improved Customer Experience:


●​ Analyzing customer interactions captured in unstructured data, such as reviews, call
transcripts, or product feedback, can provide valuable insights into customer preferences
and pain points.
●​ This knowledge can be used to improve customer service, personalize marketing
campaigns, and enhance overall customer satisfaction.
●​ Businesses can use unstructured data to understand customer intent and market shifts,
empowering them to provide better, more secure, and resilient customer experiences.

4. Scalability and Flexibility:


●​ Unstructured data can be stored in its native format, making it highly flexible and
adaptable to different use cases.
●​ This flexibility allows businesses to collect and analyze data from a variety of sources,
without being constrained by rigid schemas.
●​ Unstructured data is also highly scalable, allowing businesses to handle large volumes of
data without sacrificing performance.

5. Examples of Unstructured Data:


●​ Emails
●​ Social media posts
●​ Documents (reports, presentations, etc.)
●​ Videos and images
●​ Audio files
●​ Customer reviews
●​ Call transcripts
●​ Surveillance footage

Unstructured data analytics is the process of extracting valuable insights and knowledge from
data that doesn't conform to a structured format, such as text, images, audio, and video, using
specialized techniques and tools.

​ What is Unstructured Data


●​ It's data that doesn't have a predefined format or structure, unlike structured data
stored in relational databases.
●​ Examples include text documents, emails, social media posts, images, audio files,
and video files.
​ Why is it Important?
●​ Unstructured data can hold valuable insights into customer behavior, market trends,
and operational processes.
●​ Analyzing this data can help businesses make better decisions, improve customer
experiences, and gain a competitive edge.
​ How is it Analyzed?
●​ Text Mining: Extracts valuable information from text-based sources, such as
customer reviews or social media posts.
●​ Sentiment Analysis: Identifies emotions or opinions expressed in text data (a minimal sketch follows this list).
●​ Image and Video Analysis: Uses techniques like object detection and facial
recognition to extract information from visual data.
●​ Audio Analysis: Analyzes audio data for speech recognition, voice sentiment
analysis, and other applications.
●​ Machine Learning: Uses algorithms to find patterns and insights in unstructured
data.
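As a rough illustration of how sentiment analysis is applied in practice, the following Python sketch scores two invented customer reviews with NLTK's VADER analyzer; the sample texts and the positive/negative threshold are illustrative assumptions, not part of the syllabus.

```python
# A minimal sentiment-analysis sketch using NLTK's VADER analyzer
# (assumes the nltk package is installed).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

reviews = [  # invented customer reviews
    "The delivery was fast and the product works great!",
    "Terrible support, I will not order again.",
]

sia = SentimentIntensityAnalyzer()
for text in reviews:
    scores = sia.polarity_scores(text)  # neg/neu/pos plus a compound score in [-1, 1]
    label = "positive" if scores["compound"] > 0 else "negative"
    print(f"{label:8} {scores['compound']:+.2f}  {text}")
```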
​ Tools and Techniques
●​ Specialized tools are needed to handle the diverse formats and large volumes of
unstructured data.
●​ These tools often include natural language processing (NLP), image processing, and
machine learning capabilities.
​ Examples of Use Cases
●​ Healthcare: Analyzing medical images, patient records, and social media posts to
improve diagnosis and treatment.
●​ Finance: Detecting fraud, analyzing customer behavior, and identifying market
trends.
●​ Retail: Understanding customer preferences, predicting demand, and improving
marketing campaigns.
●​ Government: Analyzing social media sentiment, monitoring public opinion, and
detecting potential threats.
​ Challenges
●​ Unstructured data can be difficult to store, manage, and analyze due to its varied
formats and lack of structure.
●​ It requires specialized skills and tools to extract meaningful insights.

Types of Data Analytics


There are four major types of data analytics:
1.​ Predictive (forecasting)
2.​ Descriptive (business intelligence and data mining)
3.​ Prescriptive (optimization and simulation)
4.​ Diagnostic analytics

Predictive Analytics

Predictive analytics turns data into valuable, actionable information. It uses data to determine
the probable outcome of an event or the likelihood of a situation occurring.
Predictive analytics draws on a variety of statistical techniques from modeling, machine learning,
data mining, and game theory that analyze current and historical facts to make predictions about
future events. Techniques used for predictive analytics (a minimal sketch follows the lists below) are:
●​ Linear Regression
●​ Time Series Analysis and Forecasting
●​ Data Mining

Basic Cornerstones of Predictive Analytics


●​ Predictive modeling
●​ Decision Analysis and optimization
●​ Transaction profiling
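
As a minimal sketch of predictive analytics in Python, the snippet below fits a linear regression to invented yearly sales figures and forecasts the next year; the numbers and variable names are hypothetical, not taken from any case study in these notes.

```python
# Minimal predictive-analytics sketch: fit a linear trend to past yearly
# sales (made-up numbers) and forecast the next year with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[2019], [2020], [2021], [2022], [2023]])  # feature matrix
sales = np.array([110, 125, 138, 150, 166])                 # observed outcomes

model = LinearRegression().fit(years, sales)
forecast = model.predict([[2024]])[0]  # probable outcome of a future event
print(f"Predicted 2024 sales: {forecast:.1f}")
```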

Descriptive Analytics

Descriptive analytics looks at data and analyzes past events for insight into how to approach
future events. It looks at past performance and understands it by mining historical data to
identify the causes of past success or failure. Almost all management reporting, such as sales,
marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify
customers or prospects into groups. Unlike a predictive model that focuses on predicting the
behavior of a single customer, descriptive analytics identifies many different relationships
between customers and products.
Common examples of descriptive analytics are company reports that provide historical
reviews such as the following (a small pandas sketch follows this list):
●​ Data Queries
●​ Reports
●​ Descriptive Statistics
●​ Data dashboard
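
A small pandas sketch of descriptive analytics, summarizing an invented sales table the way a management report might; the data values are made up for illustration.

```python
# Descriptive-analytics sketch: summarize historical sales with pandas.
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "West"],
    "sales":  [120, 135, 90, 110, 150],
})

print(df["sales"].describe())               # count, mean, std, quartiles, etc.
print(df.groupby("region")["sales"].sum())  # a typical management report view
```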

Prescriptive Analytics

Prescriptive analytics automatically synthesizes big data, mathematical science, business rules,
and machine learning to make a prediction and then suggests decision options to take advantage
of the prediction.
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that benefit
from the predictions and showing the decision maker the implications of each decision option.
Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it
will happen. Further, it can suggest decision options on how to take
advantage of a future opportunity or mitigate a future risk, and illustrate the implications of each
decision option.
For example, Prescriptive Analytics can benefit healthcare strategic planning by using analytics
to leverage operational and usage data combined with data of external factors such as economic
data, population demography, etc.
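
Since prescriptive analytics leans on optimization, here is a minimal sketch using SciPy's linear programming solver to pick a production plan under resource constraints; the profit figures and limits are invented for illustration, not part of any case study here.

```python
# Prescriptive-analytics sketch: choose production quantities that maximize
# profit under resource constraints, using SciPy's linear programming solver.
from scipy.optimize import linprog

# Maximize 40*x1 + 30*x2  ->  minimize the negated objective.
c = [-40, -30]
A_ub = [[2, 1],   # labour hours per unit of each product
        [1, 1]]   # machine hours per unit of each product
b_ub = [100, 80]  # available labour and machine hours

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("Optimal plan:", res.x, "profit:", -res.fun)
```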

Diagnostic Analytics

In this analysis, we generally use historical data to answer a question or diagnose a problem. We
try to find dependencies and patterns in the historical data of the particular problem.
For example, companies go for this analysis because it gives great insight into a problem, and
keeping detailed information at their disposal avoids collecting data afresh for every individual
problem, which would be very time-consuming. Common techniques used for
diagnostic analytics are listed below, followed by a short correlation sketch:
●​ Data discovery
●​ Data mining
●​ Correlations
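
A short diagnostic-analytics sketch: computing a correlation matrix with pandas to surface dependencies in historical data; the columns and values are invented.

```python
# Diagnostic-analytics sketch: look for dependencies in historical data
# with a correlation matrix.
import pandas as pd

df = pd.DataFrame({
    "ad_spend":   [10, 12, 15, 18, 20],
    "web_visits": [200, 230, 280, 330, 360],
    "returns":    [5, 6, 5, 7, 6],
})

print(df.corr())  # Pearson correlations; strong values hint at dependencies
```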

The Role of Data Analytics


Data analytics plays a pivotal role in enhancing operations, efficiency, and performance across
various industries by uncovering valuable patterns and insights. Implementing data analytics
techniques can provide companies with a competitive advantage. The process typically involves
four fundamental steps:
●​ Data Mining : This step involves gathering data and information from diverse
sources and transforming them into a standardized format for subsequent analysis.
Data mining can be a time-intensive process compared to other steps but is crucial for
obtaining a comprehensive dataset.
●​ Data Management : Once collected, data needs to be stored, managed, and made
accessible. Creating a database is essential for managing the vast amounts of
information collected during the mining process. SQL (Structured Query Language)
remains a widely used tool for database management, facilitating efficient querying
and analysis of relational databases.
●​ Statistical Analysis : In this step, the gathered data is subjected to statistical analysis
to identify trends and patterns. Statistical modeling is used to interpret the data and
make predictions about future trends. Open-source programming languages like
Python, as well as specialized tools like R, are commonly used for statistical analysis
and graphical modeling.
●​ Data Presentation : The insights derived from data analytics need to be effectively
communicated to stakeholders. This final step involves formatting the results in a
manner that is accessible and understandable to various stakeholders, including
decision-makers, analysts, and shareholders. Clear and concise data presentation is
essential for driving informed decision-making and supporting business growth.

Steps in Data Analysis


●​ Define Data Requirements : This involves determining how the data will be grouped
or categorized. Data can be segmented based on various factors such as age,
demographic, income, or gender, and can consist of numerical values or categorical
data.
●​ Data Collection : Data is gathered from different sources, including computers,
online platforms, cameras, environmental sensors, or through human personnel.
●​ Data Organization : Once collected, the data needs to be organized in a structured
format to facilitate analysis. This could involve using spreadsheets or specialized
software designed for managing and analyzing statistical data.
●​ Data Cleaning : Before analysis, the data undergoes a cleaning process to ensure
accuracy and reliability. This involves identifying and removing any duplicate or
erroneous entries, as well as addressing any missing or incomplete data. Cleaning the
data helps to mitigate potential biases and errors that could affect the analysis results.
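
To make the cleaning step concrete, here is a minimal pandas sketch that removes duplicate rows and fills a missing value; the records are invented for illustration.

```python
# Data-cleaning sketch with pandas: drop duplicates and handle missing
# values before analysis.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "customer": ["A", "B", "B", "C"],
    "age":      [34, 29, 29, np.nan],
    "income":   [52000, 48000, 48000, 61000],
})

df = df.drop_duplicates()                         # remove duplicate entries
df["age"] = df["age"].fillna(df["age"].median())  # address missing data
print(df)
```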

Usage of Data Analytics


There are some key domains and strategic planning techniques in which Data Analytics has
played a vital role:
●​ Improved Decision-Making – If we have data supporting a decision, we can implement it
with a higher probability of success. For example, if a certain decision or plan has led to
better outcomes in the past, there is little doubt about implementing it again.
●​ Better Customer Service – Churn modeling is the best example of this: we try to predict
or identify what leads to customer churn and change those things accordingly, so that
customer attrition is kept as low as possible, which is a critically important factor for any
organization.
●​ Efficient Operations – Data analytics helps us understand what a situation demands and
what should be done to get better results, so that we can streamline our processes, which
in turn leads to efficient operations.
●​ Effective Marketing – Market segmentation techniques help identify the marketing
approaches that will increase sales and leads, resulting in effective marketing strategies.

Future Scope of Data Analytics


●​ Retail : To study sales patterns, consumer behavior, and inventory management, data
analytics can be applied in the retail sector. Data analytics can be used by retailers to
make data-driven decisions regarding what products to stock, how to price them, and
how to best organize their stores.
●​ Healthcare : Data analytics can be used to evaluate patient data, spot trends in patient
health, and create individualized treatment regimens. Data analytics can be used by
healthcare companies to enhance patient outcomes and lower healthcare expenditures.
●​ Finance : In the field of finance, data analytics can be used to evaluate investment
data, spot trends in the financial markets, and make wise investment decisions. Data
analytics can be used by financial institutions to lower risk and boost the performance
of investment portfolios.
●​ Marketing : By analyzing customer data, spotting trends in consumer behavior, and
creating customized marketing strategies, data analytics can be used in marketing.
Data analytics can be used by marketers to boost the efficiency of their campaigns
and their overall impact.
●​ Manufacturing : Data analytics can be used to examine production data, spot trends
in production methods, and boost production efficiency in the manufacturing sector.
Data analytics can be used by manufacturers to cut costs and enhance product quality.
●​ Transportation : To evaluate logistics data, spot trends in transportation routes, and
improve transportation routes, the transportation sector can employ data analytics.
Data analytics can help transportation businesses cut expenses and speed up delivery
times.

What is Data Visualization and Why is It Important:


Data visualization is the graphical representation of information. In this section we will study
what data visualization is and why it is important, with use cases.
Understanding Data Visualization
Data visualization translates complex data sets into visual formats that are easier for the human
brain to understand. This can include a variety of visual tools such as:
●​ Charts: Bar charts, line charts, pie charts, etc.
●​ Graphs: Scatter plots, histograms, etc.
●​ Maps: Geographic maps, heat maps, etc.
●​ Dashboards: Interactive platforms that combine multiple visualizations.

The primary goal of data visualization is to make data more accessible and easier to interpret,
allowing users to identify patterns, trends, and outliers quickly. This is particularly important
with big data, where the large volume of information can be overwhelming without effective
visualization techniques.
Why is Data Visualization Important:
Let’s take an example. Suppose you compile data on the company’s profits from 2013 to 2023
and create a line chart. It would be very easy to see the line going constantly up, with a drop
only in 2018. So you can observe in a second that the company had continuous profits in all the
years except for a loss in 2018.
It would not be that easy to get this information so fast from a data table. This is just one
demonstration of the usefulness of data visualization. Let’s see some more reasons why
visualization of data is so important.

1. Data Visualization Simplifies the Complex Data: Large and complex data sets can be
challenging to understand. Data visualization helps break down complex information into
simpler, visual formats making it easier for the audience to grasp.
2. Enhances Data Interpretation: Visualization highlights patterns, trends, and correlations in
data that might be missed in raw data form. This enhanced interpretation helps in making
informed decisions. Consider a Tableau visualization that demonstrates the relationship
between sales and profit: it might show that higher sales do not necessarily equate to higher
profits, a trend that could be difficult to spot in raw data alone. This perspective helps
businesses adjust strategies to focus on profitability rather than just sales volume.
3. Data Visualization Saves Time: It is much faster to gather insights using data visualization
than by studying a raw data table. In a Tableau heat-map view, for example, it is very easy to
identify the states that have suffered a net loss rather than a profit, because all the cells with a
loss are coloured red, making the loss-making states obvious at a glance. Compare this to a
normal table, where you would need to check each cell for a negative value to determine a loss.
Visualizing data can save a lot of time in this situation.
4. Improves Communication: Visual representations of data make it easier to share findings
with others, especially those who may not have a technical background. This is important in
business, where stakeholders need to understand data-driven insights quickly. Consider a
treemap visualization in Tableau showing the number of sales in each region of the United
States, with the largest rectangle representing California due to its high sales volume. This
visual context is much easier to grasp than a detailed table of numbers.
5. Data Visualization Tells a Data Story: Data visualization is also a medium to tell a data story
to the viewers. The visualization can be used to present the data facts in an easy-to-understand
form while telling a story and leading the viewers to an inevitable conclusion. This data story
should have a good beginning, a basic plot, and an ending that it leads towards. For
example, if a data analyst has to craft a data visualization for company executives detailing the
profits of various products, then the data story can start with the profits and losses of multiple
products and move on to recommendations on how to tackle the losses.
Types of Data Visualization Analysis
Data visualization is used to visually analyze the behavior of the different variables in a dataset,
such as the relationship between data points or the distribution of a variable. Depending on the
number of variables you want to study at once, you can distinguish three types of data
visualization analysis.
●​ Univariate analysis: Used to summarize the behavior of only one variable at a time.
●​ Bivariate analysis: Helps to study the relationship between two variables
●​ Multivariate analysis: Allows data practitioners to analyze more than two variables at
once

Data Visualization Techniques


Box Plot:
A box plot is a graphical method to visualize data distribution for gaining insights and making
informed decisions. It is a type of chart that depicts a group of numerical data through their
quartiles.
Below, we discuss the components of a box plot, how to create one, its uses, and how to
compare box plots.

What is a Box Plot:


The idea of the box plot was presented by John Tukey in 1970, and he wrote about it in his book
“Exploratory Data Analysis” in 1977. The box plot is also known as a whisker plot, box-and-whisker
plot, or simply a box-and-whisker diagram. It is a graphical representation of the
distribution of a dataset, displaying key summary statistics such as the median, quartiles, and
potential outliers in a concise and visual manner. A box plot provides a summary of the
distribution, helps identify potential outliers, and allows different datasets to be compared in a
compact, visual form.
Elements of Box Plot
A box plot gives a five-number summary of a set of data:
●​ Minimum – It is the minimum value in the dataset excluding the outliers.
●​ First Quartile (Q1) – 25% of the data lies below the First (lower) Quartile.
●​ Median (Q2) – It is the mid-point of the dataset. Half of the values lie below it and
half above.
●​ Third Quartile (Q3) – 75% of the data lies below the Third (Upper) Quartile.
●​ Maximum – It is the maximum value in the dataset excluding the outliers.

The area inside the box (the middle 50% of the data) is known as the Inter Quartile Range
(IQR). The IQR is calculated as:
IQR = Q3 - Q1
Outliers are the data points below the lower limit and above the upper limit, which are
calculated as:
Lower Limit = Q1 - 1.5*IQR
Upper Limit = Q3 + 1.5*IQR
The values below and above these limits are considered outliers, and the minimum and maximum
are calculated from the points that lie within the limits.
How to create a box plot:
Let us take a sample data to understand how to create a box plot.
Here are the runs scored by a cricket team in a league of 12 matches – 100, 120, 110, 150, 110,
140, 130, 170, 120, 220, 140, 110.
To draw a box plot for the given data first we need to arrange the data in ascending order and
then find the minimum, first quartile, median, third quartile and the maximum.
Ascending Order
100, 110, 110, 110, 120, 120, 130, 140, 140, 150, 170, 220
Median (Q2) = (120+130)/2 = 125, since there is an even number of values.
To find the First Quartile we take the first six values and find their median.
Q1 = (110+110)/2 = 110
For the Third Quartile, we take the next six and find their median.
Q3 = (140+150)/2 = 145
Note: If the total number of values is odd, we exclude the median while calculating Q1 and
Q3. Here, since there were two central values, both were included in their respective halves.
Now, we need to calculate the Inter Quartile Range.
IQR = Q3-Q1 = 145-110 = 35
We can now calculate the Upper and Lower Limits to find the minimum and maximum values
and also the outliers if any.
Lower Limit = Q1-1.5*IQR = 110-1.5*35 = 57.5
Upper Limit = Q3+1.5*IQR = 145+1.5*35 = 197.5
So, the minimum and maximum values within the range [57.5, 197.5] for our given data are –
Minimum = 100
Maximum = 170
The outliers which are outside this range are –
Outliers = 220
Now we have all the information, so we can draw the box plot, which is shown below.

We can see from the diagram that the median is not exactly at the center of the box and one
whisker is longer than the other. We also have one outlier.
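
The same box plot can be drawn with matplotlib, as sketched below. Note that matplotlib computes quartiles by interpolation, so its Q1/Q3 may differ slightly from the midpoint-of-halves method used above, but it flags the 220 outlier the same way.

```python
# Draw the box plot for the cricket scores worked out above with matplotlib;
# the library computes the quartiles and flags the 220 outlier itself.
import matplotlib.pyplot as plt

runs = [100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110]

plt.boxplot(runs, vert=False, whis=1.5)  # whiskers at 1.5 * IQR, as above
plt.xlabel("Runs scored")
plt.title("Runs in 12 league matches")
plt.show()
```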
Use-Cases of Box Plot
●​ Box plots provide a visual summary of the data, with which we can quickly identify
the central value of the data, how dispersed the data is, and whether the data is skewed
(skewness).
●​ The median gives you the central value of the data.
●​ Box plots show the skewness of the data:
a) If the median is at the center of the box and the whiskers are almost the same on both
ends, the data is normally distributed.
b) If the median lies closer to the first quartile and the whisker at the lower end is shorter
(as in the above example), the data has a positive skew (right skew).
c) If the median lies closer to the third quartile and the whisker at the upper end is
shorter, the data has a negative skew (left skew).

●​ The dispersion or spread of the data can be visualized from the minimum and maximum
values, which are found at the ends of the whiskers.
●​ The box plot also gives us an idea about the outliers, which are the points numerically
distant from the rest of the data.
Histograms:
A histogram is a type of graphical representation used in statistics to show the distribution of
numerical data. It looks somewhat like a bar chart, but unlike bar graphs, which are used for
categorical data, histograms are designed for continuous data, grouping it into logical ranges
which are also known as "bins."
A histogram helps in visualizing the distribution of data across a continuous interval or period
which makes the data more understandable and also highlights the trends and patterns.

Parts of a Histogram
A histogram is a graph that represents the distribution of data. Here are the essential
components, presented in simple terms:
●​ Title: This is the name of the histogram. It explains what the histogram is
about and what data it displays.
●​ X-axis: X-axis is a horizontal line at the bottom of the histogram. It displays the
categories or groups that the data is sorted into. For example, if you're measuring
people's heights, the X-axis may indicate several height ranges such as "5-6 feet" or
"6-7 feet".
●​ Y-axis: The Y-axis is a vertical line on the side of the histogram. It displays the
number of times something occurs in each category or group shown on the X-axis.
So, if you're measuring heights, the Y-axis may display how many individuals are in
each height range.
●​ Bars: Bars are the vertical rectangles you see on the chart. Each bar on the X-axis
represents a category or group, and its height indicates how many times something
occurs inside that category and the width indicates the range covered by each
category on the X-axis. So, higher bars indicate more occurrences, whereas shorter
bars indicate fewer occurrences.
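A minimal matplotlib sketch of a histogram, binning invented height data into ten ranges; the title and axis labels play the roles described above.

```python
# Histogram sketch: group invented height data into bins with matplotlib.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=5.6, scale=0.3, size=200)  # heights in feet

plt.hist(heights, bins=10, edgecolor="black")  # 10 bins over the data range
plt.title("Distribution of Heights")  # title
plt.xlabel("Height (feet)")           # X-axis: the ranges (bins)
plt.ylabel("Frequency")               # Y-axis: occurrences per bin
plt.show()
```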
Scatter plots:
A scatter plot (also called a scatter graph or scatter chart) is a technique used to represent data:
it uses dots to describe two different numeric variables. The position of each dot on the
horizontal and vertical axes indicates the values for an individual data point.
The scatter plot is one of the most important data visualization techniques and is considered one
of the Seven Basic Tools of Quality. A scatter plot is used to plot the relationship between two
variables on a two-dimensional graph, known mathematically as the Cartesian plane.
It is generally used to plot the relationship between one independent variable and one dependent
variable, where an independent variable is plotted on the x-axis and a dependent variable is
plotted on the y-axis so that you can visualize the effect of the independent variable on the
dependent variable. These plots are known as Scatter Plot Graph or Scatter Diagram.
Applications of Scatter Plot
As already mentioned, a scatter plot is a very useful data visualization technique. A few
applications of Scatter Plots are listed below.
●​ Correlation Analysis: Scatter plot is useful in the investigation of the correlation
between two different variables. It can be used to find out whether two variables have
a positive correlation, negative correlation or no correlation.
●​ Outlier Detection: Outliers are data points, which are different from the rest of the
data set. A Scatter Plot is used to bring out these outliers on the surface.
●​ Cluster Identification: In some cases, scatter plots can help identify clusters or
groups within the data.
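
A minimal scatter-plot sketch in matplotlib, plotting invented study hours (independent variable, x-axis) against exam scores (dependent variable, y-axis) to eyeball the correlation.

```python
# Scatter-plot sketch: invented study hours versus exam scores.
import matplotlib.pyplot as plt

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [35, 45, 50, 58, 62, 70, 74, 82]

plt.scatter(hours, scores)
plt.xlabel("Hours studied (independent variable)")
plt.ylabel("Exam score (dependent variable)")
plt.title("Positive correlation between study time and score")
plt.show()
```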

Data visualization using feature maps, especially in the context of Convolutional Neural
Networks (CNNs), helps understand what features a network learns and how it processes data
by visualizing the output of each filter at different layers. This allows for debugging,
optimization, and a deeper understanding of the network's inner workings.

What are Feature Maps


​ Output of Filters: In a CNN, feature maps represent the output of each filter (or kernel)
in a given layer after processing the input data (like an image).
​ Layer-Specific: Each layer of a CNN has its own set of filters, and each filter produces a
feature map.
​ Understanding Network Behavior: By visualizing these feature maps, you can see what
types of features (edges, textures, shapes, etc.) the network is detecting at different layers.
​ Hierarchical Representations: Comparing feature maps across different layers reveals how
the network learns and extracts hierarchical representations of the input data.
Why Visualize Feature Maps:
​ Debugging and Optimization: Feature maps can help identify issues in the network's
architecture or training process.
​ Understanding Network Decisions: By seeing what features the network is focusing on,
you can gain insights into its decision-making process.
​ Improving Model Interpretability: Visualizing feature maps makes it easier to
understand how the network works and what it's learning.
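
As a sketch of one common way to visualize feature maps (assuming TensorFlow/Keras is available), the snippet below builds a model whose outputs are VGG16's convolutional activations and plots a few maps from the first layer. Using weights=None keeps the example self-contained but untrained; with pretrained weights and a real image, the maps would show learned edges and textures.

```python
# Feature-map visualization sketch with Keras: expose each conv layer's
# output and plot a few activation maps from the first layer.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(224, 224, 3))
conv_outputs = [l.output for l in base.layers if "conv" in l.name]
activation_model = tf.keras.Model(inputs=base.input, outputs=conv_outputs)

img = np.random.rand(1, 224, 224, 3).astype("float32")  # stand-in image
feature_maps = activation_model.predict(img)

first = feature_maps[0]  # shape (1, 224, 224, 64): one map per filter
for i in range(8):       # show the first 8 filters' outputs
    plt.subplot(2, 4, i + 1)
    plt.imshow(first[0, :, :, i], cmap="viridis")
    plt.axis("off")
plt.show()
```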
T-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality
reduction technique suited for visualizing high-dimensional data in a lower-dimensional
space, typically 2D or 3D. It is one of the most widely used dimensionality reduction
techniques for visualization.
Intuition behind the t-SNE Algorithm
t-SNE is a dimensionality reduction technique that uses a randomized, non-linear approach to
reduce the dimensionality of data. Unlike linear methods such as Principal Component
Analysis (PCA), t-SNE focuses on preserving the local structure and pattern of the data. It
is especially effective for visualizing high-dimensional datasets as it keeps similar data points
close to each other in the lower-dimensional space making it easier to see patterns and clusters.
This ability to retain the local structure of the dataset helps in exploring and understanding
complex, high-dimensional data. Visualizing the data in 2D or 3D can provide us valuable
insights into the relationships between different data points.
Now we will learn how t-SNE works.
How t-SNE Works
t-SNE works by looking at the similarity between data points in the high-dimensional space. The
similarity is computed as a conditional probability. It calculates how likely it is that one data
point would be near another.
Once the similarities are calculated, t-SNE tries to keep similar points close when it reduces the
data to lower dimensions (like 2D or 3D). The goal is to make sure that points that are close in
the original space stay close in the lower-dimensional space, preserving the structure of the data.
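
A minimal t-SNE sketch with scikit-learn, embedding the 64-dimensional digits dataset into 2D; the perplexity value of 30 is simply the common default choice.

```python
# t-SNE sketch: project the 64-dimensional digits dataset to 2D and
# colour points by their class to reveal clusters.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 1797 samples, 64 features
emb = TSNE(n_components=2, perplexity=30,
           random_state=0).fit_transform(X)  # nonlinear 2D embedding

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.colorbar(label="digit class")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```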

Advanced Excel- Introduction:


Advanced Excel takes your spreadsheet skills beyond the basics, equipping you with tools for
complex data analysis, automation, and dynamic reporting, including features like pivot tables,
macros, and advanced formulas.

What is Advanced Excel:


​ Beyond the Basics: Advanced Excel builds upon fundamental Excel knowledge, delving
into more powerful and specialized features.
​ Data Analysis: Advanced Excel empowers users to perform in-depth data analysis,
including techniques like time series analysis, regression, and statistical analysis.
​ Automation: Advanced Excel offers tools for automating repetitive tasks through macros
and VBA (Visual Basic for Applications), saving time and improving efficiency.
​ Dynamic Reporting: Advanced Excel enables the creation of dynamic and interactive
reports using features like pivot tables, slicers, and data visualization techniques.
​ Data Manipulation: Advanced Excel provides tools for cleaning, transforming, and
manipulating large datasets efficiently.
​ Key Features:
●​ Pivot Tables: Summarize and analyze data by grouping and aggregating information.
●​ Macros and VBA: Automate repetitive tasks and create custom functions.
●​ Conditional Formatting: Highlight data based on specific criteria.
●​ Advanced Formulas: Utilize complex formulas like INDEX & MATCH,
VLOOKUP, and HLOOKUP.
●​ Data Validation: Ensure data accuracy by setting constraints on cell input.
●​ Power Query: Import, transform, and clean data from various sources.
​ Applications:​
Advanced Excel skills are valuable in various sectors, including finance, data analysis, and
market research, where data-driven decision-making is crucial.
​ Benefits:
●​ Increased Productivity: Automate tasks to free up time for more strategic work.
●​ Improved Data Accuracy: Use data validation and advanced formulas to ensure
data integrity.
●​ Better Decision-Making: Analyze data effectively to gain insights and make
informed decisions.
●​ Enhanced Reporting: Create dynamic and informative reports to communicate
findings effectively.

Data validation is the process of checking the accuracy, consistency, and completeness of data.
It is a type of data cleansing that is performed before using, importing, or processing data.

Why is data validation important:


●​ It helps to prevent issues like data corruption or inconsistencies.
●​ It ensures that data is correct for specific contexts.
●​ It makes data useful for an organization or for a specific application operation.
●​ It facilitates useful analytics for a wide variety of applications.

Types of data validation


●​ Type check: Confirms the data type (integer, string, or some other format)
●​ Code check: Ensures that the data comes from a valid list of values or follows certain
other formatting rules
●​ Range check: Checks the value of data to see if it is within a certain range
●​ Uniqueness check: Checks for the uniqueness of some data entries
●​ Consistency check: Monitors and ensures that data remains consistent throughout the
validation stage and beyond
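
To make the checks above concrete, here is a small pandas sketch applying type, code, range, and uniqueness checks to an invented orders table.

```python
# Validation sketch: apply the checks above to an invented orders table.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 102, 104],
    "status":   ["shipped", "pending", "shipped", "unknown"],
    "quantity": [3, -1, 2, 5],
})

assert pd.api.types.is_integer_dtype(orders["quantity"])  # type check
valid_status = {"shipped", "pending", "cancelled"}
print(~orders["status"].isin(valid_status))               # code check
print(~orders["quantity"].between(1, 100))                # range check
print(orders["order_id"].duplicated())                    # uniqueness check
```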

Data validation in Excel:


Excel has built-in data validation that provides predefined criteria to restrict user input; for
example, users can be restricted to entering only dates between two specified dates.
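
Validation rules can also be attached to a workbook programmatically. Below is a minimal sketch with the openpyxl library that restricts a column to a drop-down list of allowed values; the file name and cell range are arbitrary choices for illustration.

```python
# Sketch of adding Excel data validation programmatically with openpyxl:
# restrict a column to a drop-down list of allowed values.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active

dv = DataValidation(type="list", formula1='"Yes,No,Maybe"', allow_blank=True)
dv.error = "Entry must be Yes, No or Maybe"  # message shown on bad input
ws.add_data_validation(dv)
dv.add("A1:A100")  # cells the rule applies to
wb.save("validated.xlsx")
```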

Charts are visual representations of data that transform information into easily understandable
formats, such as graphs, diagrams, or maps, to help uncover patterns, trends, and relationships.
Here's a more detailed explanation:
What they are: Charts are a powerful tool for communicating data because they make complex
information more accessible and easier to grasp at a glance.
Why they are important:​
Charts are used to:
●​ Analyze data: They help identify patterns, trends, and relationships that might not be
immediately obvious from raw numbers.
●​ Emphasize a point: Charts can be used to highlight specific data points or
comparisons, making an argument more compelling.
●​ Compare multiple sets of data: Charts allow for easy comparison of different
datasets, making it easier to see similarities and differences.
​ Common chart types:
●​ Bar charts: Use rectangular bars to compare different categories.
●​ Line charts: Show trends over time by connecting data points with lines.
●​ Pie charts: Represent parts of a whole as slices of a circle.
●​ Scatter plots: Display the relationship between two variables as points on a graph.
●​ Area charts: Use filled areas to show trends and magnitudes.
​ How to effectively use charts
●​ Choose the right chart type for your data and purpose.
●​ Ensure your charts are clear, concise, and easy to understand.
●​ Use appropriate labels and titles.
●​ Consider your audience and tailor your charts accordingly.

In Excel, a PivotTable is an interactive tool that allows you to quickly summarize and analyze
data by grouping and aggregating values, offering flexible ways to explore and present data.

1. Creating a PivotTable:
●​ Select Data: Choose a cell within the data range or table you want to analyze.
●​ Insert PivotTable: Go to the "Insert" tab and click "PivotTable".
●​ Choose Location: Decide whether to place the PivotTable on a new worksheet or an
existing one, and specify the location if creating it on an existing worksheet.
●​ Create the Table: Click "OK" to create the blank PivotTable and display the PivotTable
Fields list.

2. Using the PivotTable Fields List:

​ Fields List: This list on the right side of the screen shows all the fields (column headers)
from your data.
​ Drag and Drop: Drag fields from the list into the "Rows", "Columns", "Values", and
"Filters" areas to structure your PivotTable.
●​ Rows: Fields that will be displayed as rows.
●​ Columns: Fields that will be displayed as columns.
●​ Values: Fields that will be summarized (e.g., sum, count, average).
●​ Filters: Fields that can be used to filter the data.
​ Summarize Values:​
By default, Excel sums numerical values in the "Values" area, but you can change this to
other calculations (e.g., count, average, min, max) by right-clicking on the value field and
selecting "Value Field Settings".

3. Additional Features:
●​ PivotCharts: You can create charts directly from PivotTables by clicking on the "Insert
Chart" button in the PivotTable ribbon.
●​ Slicers: Add slicers to filter your PivotTable by specific values.
●​ Recommended PivotTables: Excel can suggest PivotTable layouts based on your data.
●​ Refresh: If your source data changes, you can refresh the PivotTable to see the updated
results.
●​ External Data Sources: You can also create PivotTables based on data from external
sources like SQL Server.

Example:
Let's say you have sales data with columns for "Region", "Product", and "Sales Amount". You
can create a PivotTable to see total sales by region and product:
●​ Select the data (including headers).
●​ Insert a PivotTable.
●​ Drag "Region" to "Rows".
●​ Drag "Product" to "Columns".
●​ Drag "Sales Amount" to "Values".
●​ Choose "Sum" as the calculation.
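
For comparison, the same summary can be expressed in pandas, whose pivot_table function mirrors Excel's PivotTable; the four sample rows are invented.

```python
# pandas analogue of the PivotTable above: regions as rows, products as
# columns, summed sales as values.
import pandas as pd

sales = pd.DataFrame({
    "Region":       ["East", "East", "West", "West"],
    "Product":      ["Pen", "Book", "Pen", "Book"],
    "Sales Amount": [100, 250, 80, 300],
})

pivot = sales.pivot_table(index="Region", columns="Product",
                          values="Sales Amount", aggfunc="sum")
print(pivot)
```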

In the context of Microsoft Excel, the "Scenario Manager" is a built-in tool that allows users
to create, save, and compare different sets of input values (cells) within a worksheet, enabling
"what-if" analysis.

Here's a more detailed explanation:


​ What it does: The Scenario Manager lets you define and switch between various scenarios,
each with its own set of input values, while keeping the underlying formulas and calculations
the same.
​ How it helps: It's a powerful tool for exploring different possibilities and making informed
decisions by seeing how changes in input values affect the results of your calculations.
​ Where to find it: You can access the Scenario Manager in Excel by going to the "Data" tab,
then the "What-If Analysis" group, and clicking on "Scenario Manager".
​ Key features:
●​ Create Scenarios: You can create multiple scenarios by specifying the cells that will
change and the values they will take in each scenario.
●​ Save Scenarios: Once created, scenarios are saved and can be easily switched
between.
●​ Compare Scenarios: You can compare different scenarios side-by-side to see the
impact of different input values.
●​ Scenario Summary: You can generate a summary report that shows the results of
each scenario, making it easy to compare and analyze the outcomes.
​ Use Cases:
●​ Budget Analysis: Explore different budget scenarios (e.g., optimistic, pessimistic,
baseline).
●​ Financial Forecasting: Analyze how different economic conditions might affect
your financial projections.
●​ Risk Assessment: Evaluate the potential impact of various risks on your business.
●​ Strategic Planning: Consider different strategic options and their potential outcomes.
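
Outside Excel, the same what-if idea can be sketched in a few lines of Python: the formula stays fixed while named scenarios supply different input values. The profit model and numbers below are invented for illustration.

```python
# A Python analogue of the Scenario Manager: the formula stays fixed while
# named sets of input values are swapped in and compared.
def profit(units, price, unit_cost, fixed_cost):
    return units * (price - unit_cost) - fixed_cost

scenarios = {
    "baseline":    dict(units=1000, price=20, unit_cost=12, fixed_cost=5000),
    "optimistic":  dict(units=1300, price=22, unit_cost=12, fixed_cost=5000),
    "pessimistic": dict(units=800,  price=18, unit_cost=13, fixed_cost=5000),
}

for name, inputs in scenarios.items():  # a "scenario summary" report
    print(f"{name:12} profit = {profit(**inputs):>7}")
```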
Protecting data involves implementing various measures to safeguard sensitive information
from unauthorized access, loss, or damage, including encryption, backups, firewalls, and data
loss prevention (DLP) tools.
Why is Data Protection Important:
Preventing Data Breaches: Data protection helps prevent unauthorized access, use, or
disclosure of sensitive information, which can lead to data breaches.
Identity Theft: Protecting data is crucial in preventing identity theft and other forms of
cyberattacks that can result in financial loss and reputational damage.
Compliance with Regulations: Data protection ensures compliance with relevant regulations
like GDPR, which protects personal data and imposes strict controls on how organizations
process it.
​ Maintaining Data Availability: Data protection strategies focus on ensuring data
availability, even in the event of loss or damage, through measures like backups and disaster
recovery plans.
Key Data Protection Measures:
​ Encryption: Converting data into a coded form, making it unreadable to anyone without the
proper decryption key.
​ Data Backups: Regularly backing up data to ensure data can be restored in case of loss or
damage, either through physical disks or cloud storage.
​ Firewalls: Monitoring and filtering network traffic to prevent unauthorized access to data.
​ Data Loss Prevention (DLP): Implementing tools and processes to prevent sensitive data
from being accessed, shared, or transferred inappropriately.
​ Access Control: Implementing measures to restrict access to sensitive data to authorized
personnel only, using methods like usernames, passwords, and biometric authentication.
​ Data Erasure: Deleting data that is no longer needed or relevant, which is also a
requirement of GDPR.
​ Data Resiliency: Building resilient systems within software and hardware to ensure security
in case of natural disasters or power outages.
​ Secure Wi-Fi Connections: Ensuring that Wi-Fi connections are secure to protect data
transmitted over the network.
​ Strong Passwords and Multi-Factor Authentication: Using strong, unique passwords and
enabling multi-factor authentication for added security.
​ Be Aware of Your Surroundings: Being aware of your surroundings and protecting your
devices when they are unattended.
​ Be Wary of Suspicious Emails: Be cautious of suspicious emails and phishing attempts that
could compromise your data.

Excel Miner (XLMiner): XLMiner is a robust and user-friendly solution in the vast landscape of data
analysis tools. Developed to empower users with the ability to glean valuable insights from data,
XLMiner simplifies the complex world of analytics. This section looks at XLMiner’s different
parts, including its features and functions, and how it can help in making decisions based on data.
Understanding the Basics
XLMiner is a Microsoft Excel add-in designed to make data analysis accessible to all kinds
of users, from beginners to seasoned professionals.
Its seamless integration with Excel provides users with a familiar environment, reducing the
learning curve and allowing for a smooth transition into data analytics.
Features and Capabilities
One of the critical strengths of XLMiner lies in its diverse set of features. From fundamental
statistical analysis to advanced machine learning, XLMiner covers a wide range of analytical
techniques.
Data Preparation Made Effortless
With XLMiner, data preparation becomes a breeze. Raw data, often scattered and unorganized,
can be effortlessly transformed into a structured format suitable for analysis. Whether it is
cleaning missing values, handling outliers, or merging datasets, XLMiner simplifies these tasks,
allowing users to focus on deriving meaningful insights.
Models Built with Ease
In the realm of predictive analytics, building models can be a daunting task. XLMiner, however,
streamlines this process, providing a user-friendly interface for model development. Models can
be trained while the software handles the intricate details behind the scenes, empowering users
to harness the power of predictive analytics without delving into the complexities of model
building.
Exploring Advanced Analytics
For those venturing into advanced analytics, XLMiner proves to be a reliable companion. Its
support for machine learning algorithms opens up possibilities for predictive modeling,
classification, and clustering. Complex analyses that once required specialized knowledge can
now be executed with minimal effort from the user, thanks to XLMiner’s intuitive design.
Visualizing Insights
Data representation is essential to data analysis because it helps people understand and discuss
the results. XLMiner excels in this area, offering a range of visualization options, from basic
charts to interactive dashboards, so users can present their insights in a visually appealing and
comprehensive way with minimal manual effort.
Time Series Analysis and Forecasting
In business and finance, predicting future trends is invaluable. XLMiner facilitates time series
analysis and forecasting, allowing users to uncover patterns and make informed predictions
without delving into the complexities of time series modeling.
Optimization for Decision-Making
Making optimal decisions is a cornerstone of effective management. XLMiner supports
optimization techniques, enabling users to find the best solutions to complex problems. Whether
it’s resource allocation, production planning, or budget optimization, XLMiner guides users
through the process, making decision-making more data-driven and informed.
Introduction to macros.
In essence, a macro is a sequence of actions or instructions that can be recorded and then
executed repeatedly, automating tasks and saving time.
What it is: A macro is essentially a small program or script that records your actions (like
keystrokes and mouse clicks) within a software application, allowing you to perform a series of
operations with a single command.
Why use them: Macros are particularly useful for automating repetitive or complex tasks,
streamlining workflows, and increasing efficiency.
​ How they work: You record a macro by performing the desired actions, and then the
software stores these actions as a macro that can be run later.
​ Examples:
●​ In Microsoft Excel: You can create a macro to automatically format cells, perform
calculations, or generate reports.
●​ In Microsoft Word: You can use macros to insert boilerplate text, change page
layouts, or perform complex formatting tasks.
​ Underlying Technology:​
In many applications, macros are often implemented using a scripting language, such as
VBA (Visual Basic for Applications) in Microsoft Office applications.
​ Benefits:
●​ Time-saving: Automates repetitive tasks, saving significant time and effort.
●​ Error Reduction: Minimizes the risk of human error by automating tasks.
●​ Increased Efficiency: Streamlines workflows and improves overall productivity.
●​ Flexibility: Allows for customization and automation of a wide range of tasks.
