
Data Science Through R

Lesson-1
Introduction to Data Science

Prof.Dr. A. B. Chowdhury,HOD,CA

Techno India University, West Bengal, India


Reach me here: hod.ca@technoindiaeducation.com

August 12, 2023

Prof.Dr. A. B. Chowdhury,HOD,CA (TIU,W.B.)


Data Science Through R Lesson-1Introduction to Data Science
August 12, 2023 1 / 33
Objectives of the lesson

This lesson introduces the basic concepts of Data Science and the topics closely related to this new paradigm of science, which is based on empirical facts, theoretical knowledge, computational efficiency and the explosion of information. Care has been taken to discuss every point from its original implication to the current notion of this rapidly emerging technology, keeping the discussion pertinent, clear and brief.

Overview of the Concepts of Data
The term ‘Data’ has been derived from the Latin word ‘datum’ (plural ‘data’), meaning ‘something given’, i.e., a given fact. In computer science and its applications we use the term ‘data’ in both the singular and the plural sense to denote readily available facts about entities. These facts may already be in a meaningful and useful form, or they may require further manipulation, called processing, before they can be interpreted meaningfully to serve some useful purpose. Data that still needs processing is also termed raw facts by some authors. Although the term ‘raw’ is relative and varies from person to person, we use it to signify that the data has not yet been analysed and organized to convey the desired meaning. Meaningful and useful data is known as information. Nowadays, data has become the new soil of business and the core of all subjects of study, from material science to healthcare. As a result, new terms and techniques have been defined to cover the various sets of activities performed with data.
In this context, we may refer to two terms used frequently in the domain of Data Science, namely, Data Engineering and Data Wrangling.
We define Data Engineering as the act of designing and building systems for collecting, storing, and analyzing data to empower data-driven decisions.
On the other hand, Data Wrangling may be defined as the process of procuring relevant data and transforming it into formats that are easier to understand, for fast access and analysis leading to better decision-making.
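To make data wrangling concrete, the sketch below takes a few hypothetical raw CSV rows (inconsistent capitalization, stray spaces, an amount stored as text with a thousands separator, and a missing value) and turns them into clean, typed records. The column names and values are invented for illustration, and dropping incomplete rows is only one of several possible policies.

```python
import csv
import io
from datetime import date

# Hypothetical raw export: inconsistent case, stray spaces, a missing value,
# and amounts stored as text with thousands separators.
RAW = """customer, region ,order_date,amount
 Asha ,east,2023-07-01,"1,200"
RAVI,West,2023-07-03,850
Meera,east,2023-07-04,
"""

def wrangle(raw_text):
    """Parse raw CSV text into clean, typed records; skip incomplete rows."""
    reader = csv.DictReader(io.StringIO(raw_text), skipinitialspace=True)
    # Normalise header names (strip spaces, lower-case).
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    clean = []
    for row in reader:
        amount_text = (row["amount"] or "").strip()
        if not amount_text:  # policy choice: drop rows with a missing amount
            continue
        clean.append({
            "customer": row["customer"].strip().title(),
            "region": row["region"].strip().lower(),
            "order_date": date.fromisoformat(row["order_date"].strip()),
            "amount": float(amount_text.replace(",", "")),
        })
    return clean

for record in wrangle(RAW):
    print(record)
```

Imputing the missing amount instead of dropping the row would be an equally valid wrangling decision; the right choice depends on the analysis that follows.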
Overview of the basic notion of Science and Data Science
The term ‘Science’ has been derived from the Latin word ‘scientia’, meaning “knowledge”. In the pursuit of this knowledge, we study natural as well as social phenomena and seek evidence for their interpretation, leading to the application of the derived knowledge. Science may be defined as a systematic effort that builds and organizes knowledge in the form of testable explanations and predictions about anything in the universe.
The applied knowledge of extracting interesting and decisive information from data using some model of analysis may be termed Data Science. This applied knowledge is usually a blend of mathematical and statistical concepts, leading to algorithms and numerical techniques that yield useful insights from the data. It is supported by skills acquired through the study of computer science, data engineering and wrangling, and coding, as well as by knowledge of the domains to which the data belong. Business acumen and experience also contribute a great deal to putting this knowledge to practical use.
Definition of Data Science and the prospect of its study
We define Data Science as the systematic methodology for absorbing data from various sources, like a pipeline, and subjecting it to purposive manipulation with a view to making it more reliable, better organized, better understandable and more usable; extracting practical knowledge from it through appropriate modeling; visualizing it with a theoretical understanding of all possibilities; and disseminating the knowledge effectively for the betterment of an enterprise or a society.
A person who practices data science is called a data scientist. The Harvard Business Review (2012) described the job of a data scientist as the sexiest job of the 21st century, Forbes (2012) declared that the terms data scientist and sexy are simply synonymous, and Fortune called data science the “Hot New Gig in Tech”.
The Venn diagram in figure 1.1 below shows the place of data science among the modern application areas of computer science.

The Venn Diagram showing the position of Data Science among its related fields of study

Explanation of the related fields of study
In the Venn diagram, the term Substantive Expertise refers to Domain Knowledge: the skill and understanding related to specific facts and relationships in a certain subject matter, acquired through a specific field of study and experience. It may be a set of heuristics, acquired in the practical field of work, that aid analysis in finding reasonable solutions (if any exist at all).
Hacking skills refer to the ability to write code cleverly from scratch to solve problems; knowledge of mathematics and statistics allows one merely to solve mathematical problems and perform statistical analysis of the data; but substantive expertise lets one apply one's background, for instance in zoology, to finding diseases in DNA codes. Without substantive expertise, one usually does not even know which of one's technical skills matter, or what to do with them. It is substantive expertise that can turn a metallurgical engineer into a successful product manager or entrepreneur, or an economist into a successful healthcare specialist.

Business and Business Decisions
Since Data Science influences business decisions to a great extent, it has become a buzzword in the business world; hence a brief idea of business and business decisions is given below.
Business: It may be defined as an enterprise or economic system in which goods and services are exchanged for one another or for money. It is a conglomeration of man, money, machines, material, market and information. Every business requires some form of investment, and enough customers (a market for its products and services) to whom its output can be sold on a consistent basis in order to make a profit. Businesses can be privately owned, non-profit or state-owned. An example of a corporate business is Reliance InfoTech, while Palatable, a catering business, is a private enterprise.
Decision: It is a conclusion or resolution reached after considering all foreseeable possibilities. Every business venture needs to take decisions, by considering various possible actions, before any sort of business planning. Such decisions are often called business decisions.

Examples of business decisions
Usually a business enterprise takes shareholders’ investment and bank loans and uses that money to make or buy something or to generate services. It then sells the products and services to earn revenue, and uses the revenue to repay debts, if any, and to make a profit. It repeats this cycle with plans for further expansion and diversification.
Some simple business decisions may be answers to questions like the following:
How likely is client X to buy product Z while buying product Y?
How can the customer base be increased?
Which product should we introduce next?
Which clients are “at risk” of going to our competitors?
What kind of sales force do we need to deliver on our targets?
Which units are not performing at target?
Which sellers might miss their quota?
Who are the influencers for this product in the marketplace?
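The first of these questions can be made precise as a conditional probability, P(buys Z | buys Y), estimated by counting past transactions. The sketch below assumes a small hypothetical list of baskets for client X; in association-rule terms, the ratio it computes is the confidence of the rule Y ⇒ Z.

```python
# Hypothetical purchase history: each set is one basket bought by client X.
baskets = [
    {"Y", "Z"},
    {"Y"},
    {"Y", "Z", "W"},
    {"W"},
    {"Y", "W"},
]

def conditional_purchase_probability(baskets, given, target):
    """Estimate P(target in basket | given in basket) by simple counting."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return 0.0  # convention used here when `given` was never bought
    with_both = [b for b in with_given if target in b]
    return len(with_both) / len(with_given)

p = conditional_purchase_probability(baskets, given="Y", target="Z")
print(f"P(Z | Y) = {p:.2f}")  # prints P(Z | Y) = 0.50
```

Y appears in four baskets and Z in two of those, giving 2/4 = 0.5. With so few baskets the estimate is very uncertain; real analyses pool many clients or use smoothing.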
The Concept of Business Analytics
The word ‘Analytics’ implies a science of analysis, i.e., the analysis of data using scientific methods and software tools. Modern analytics products are often offered as AI-infused integrated planning solutions that let one transcend the limits of manual planning and quickly create more accurate plans for financial operations, sales, supply chain and beyond.
Business Analytics is the process by which businesses use statistical methods and technologies to analyze historical data and gain new insights that improve the quality and speed of strategic decision-making, resulting in better operational efficiency.
Business Analytics may be categorized as stated below:
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Diagnostic Analytics
Planning Analytics
Descriptive Analytics
We define Descriptive Analytics as an analytical process that attempts to understand what happened in a particular situation, and that tracks key performance indicators (KPIs) in order to set the goals by which the enterprise will be measured and managed.
Descriptive Analytics generates graphs, charts, reports and dashboards, using the available Business Intelligence tools, which are presented to the stakeholders of the business to set directions for the future.
The data visualizations generated by Descriptive Analytics may represent answers to questions like the following:
What were the sales of the enterprise in the last quarter or last month
or last week?
Which customers required the most customer service help?
Which product had the most defects?
Which product had the highest demand in the last quarter?
What additional features were provided by the close competitors?
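At its core, descriptive analytics is aggregation of historical records. The sketch below answers three of the questions above (sales per quarter, the product with the most defects, the highest-demand product in the last quarter) on a small hypothetical sales table; the product names and figures are invented, and a BI tool would perform the same grouping before drawing a chart or dashboard.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, product, units_sold, defects_reported)
sales = [
    ("Q1", "Kettle", 120, 3),
    ("Q1", "Toaster", 80, 5),
    ("Q2", "Kettle", 150, 2),
    ("Q2", "Toaster", 95, 7),
    ("Q2", "Blender", 60, 1),
]

def summarize(records):
    """Return per-quarter unit totals and per-product defect totals."""
    units_by_quarter = defaultdict(int)
    defects_by_product = defaultdict(int)
    for quarter, product, units, defects in records:
        units_by_quarter[quarter] += units
        defects_by_product[product] += defects
    return dict(units_by_quarter), dict(defects_by_product)

units_by_quarter, defects_by_product = summarize(sales)
most_defective = max(defects_by_product, key=defects_by_product.get)
last_quarter_demand = {p: u for q, p, u, _ in sales if q == "Q2"}
top_product_q2 = max(last_quarter_demand, key=last_quarter_demand.get)

print("Units sold per quarter:", units_by_quarter)
print("Product with most defects:", most_defective)
print("Highest demand in Q2:", top_product_q2)
```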

Predictive analytics
Predictive analytics is meant for making predictions based on trend data to assess
the likelihood of future outcomes. It makes use of historical data combined with
statistical modeling, data mining techniques and machine learning to find patterns
in the data to identify risks and opportunities.
Predictive analytics can therefore be used as a decision-making tool in a variety of industries, such as insurance and consumer products. For example, an insurance company may want to determine the likelihood of having to pay out for a future claim, based on the current risk factors of its policyholders as well as past records of payouts. A consumer product marketer may want to assess consumers’ past reactions to a product, and the challenges to be faced, when planning a new campaign for it.
Predictive analytics plays a significant role in making accurate and actionable predictions that help decision-makers navigate a consumer world where rapid change and market volatility are constantly present.
Predictive analytics was also used in the fight against COVID-19. Hospitals and health systems used predictive models to assess risk, predict disease outcomes, and manage supply chains for medical equipment and Personal Protective Equipment (PPE). Researchers, in turn, used models to map the spread of the virus, predict case numbers, and manage contact tracing, all with the goal of reducing the number of infections and deaths.
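The simplest instance of “finding patterns in historical data to assess future outcomes” is fitting a trend line and extrapolating it. A minimal sketch, assuming a hypothetical monthly sales series and an ordinary least-squares linear trend; practical predictive models would add seasonality, more explanatory variables and validation on held-out data.

```python
# Hypothetical monthly sales history (units); month index t starts at 1.
history = [100, 108, 115, 123, 130, 139]

def fit_linear_trend(values):
    """Ordinary least-squares fit of y = a + b * t for t = 1..n."""
    n = len(values)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, steps_ahead):
    """Extrapolate the fitted trend a given number of periods ahead."""
    a, b = fit_linear_trend(values)
    t_next = len(values) + steps_ahead
    return a + b * t_next

print(f"Forecast for next month: {forecast(history, 1):.1f} units")
```

For this series the fitted slope is about 7.69 units per month, so the one-step forecast is about 146.1 units. A straight line is only appropriate while the trend stays roughly linear.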
Prescriptive Analytics
Prescriptive Analytics is a form of advanced analytics that examines data or content to decide what should be done, or what can be done, to make something happen. Its aim is to generate an optimal course of action for the organization, drawing on strategies determined previously, so that similar situations can be handled in the future.
Prescriptive Analytics is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning.
The method factors in data/information about possible situations or scenarios, available resources, past performance and current performance, and suggests the best feasible course of action or strategy.
Prescriptive Analytics is found to be useful where the cost of human error is high, including in the financial services and health care sectors; hence it benefits many types of data-intensive businesses and government agencies.
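“Suggesting the best feasible course of action” can be illustrated with a toy optimization. The sketch assumes a hypothetical two-product plant with invented profits and resource limits, and finds the most profitable feasible production plan by exhaustive search. Real prescriptive systems would use linear or integer programming solvers, simulation or heuristics rather than brute force, which only works for tiny problems like this one.

```python
from itertools import product

# Hypothetical scenario: decide how many units of two products to make this week.
PROFIT = {"A": 40, "B": 30}          # profit per unit
MACHINE_HOURS = {"A": 2, "B": 1}     # machine hours needed per unit
LABOUR_HOURS = {"A": 1, "B": 2}      # labour hours needed per unit
AVAILABLE = {"machine": 100, "labour": 80}

def is_feasible(a, b):
    """Check a plan (a units of A, b units of B) against available resources."""
    machine = a * MACHINE_HOURS["A"] + b * MACHINE_HOURS["B"]
    labour = a * LABOUR_HOURS["A"] + b * LABOUR_HOURS["B"]
    return machine <= AVAILABLE["machine"] and labour <= AVAILABLE["labour"]

def best_plan(max_units=100):
    """Exhaustively search integer plans; return the most profitable feasible one."""
    best, best_profit = None, float("-inf")
    for a, b in product(range(max_units + 1), repeat=2):
        if not is_feasible(a, b):
            continue
        profit = a * PROFIT["A"] + b * PROFIT["B"]
        if profit > best_profit:
            best, best_profit = (a, b), profit
    return best, best_profit

plan, profit = best_plan()
print(f"Make {plan[0]} of A and {plan[1]} of B for a profit of {profit}")
```

Here both resources are exhausted at the optimum (40 units of A and 20 of B, profit 2200), which is typical: the recommended action sits where the constraints bind.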

Diagnostic Analytics
Diagnostic analytics is a form of advanced analytics that examines data or content to determine the reason behind some happening. It takes a deeper look at data in an attempt to understand the causes of events and behaviors.
Diagnostic analytics is usually performed using techniques such as data discovery, drilling down, data mining, and correlations. In the discovery process, analysts identify the data sources that will help them interpret the results. Drilling down involves focusing on a certain facet of the data.
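Drilling down and correlating can be shown on a toy dataset. The sketch assumes hypothetical weekly sales with a discount column: the company-wide weekly totals look flat, but drilling down by region shows units rising with discount in one region and falling despite discounts in the other, which flags that region for further investigation. Correlation only points to where to look; it does not by itself establish the cause.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weekly records for one product: (region, week, discount_pct, units_sold)
records = [
    ("north", 1, 0, 50), ("north", 2, 5, 58), ("north", 3, 10, 66),
    ("south", 1, 0, 40), ("south", 2, 5, 30), ("south", 3, 10, 22),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def drill_down(rows, region):
    """Focus on one facet of the data: the rows of a single region."""
    return [r for r in rows if r[0] == region]

# Company-wide view: weekly totals look roughly flat.
weekly_total = defaultdict(int)
for _, week, _, units in records:
    weekly_total[week] += units
print("Total units per week:", dict(weekly_total))

# Drill down by region and correlate discount with units sold.
region_corr = {}
for region in ("north", "south"):
    rows = drill_down(records, region)
    region_corr[region] = pearson([r[2] for r in rows], [r[3] for r in rows])
    print(f"{region}: correlation(discount, units) = {region_corr[region]:.2f}")
```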
Diagnostic analytics lets one understand one's data faster to answer critical workforce questions. For example, Cornerstone View is presented by its vendor as a fast and simple way for organizations to gain more meaningful insight into their employees and solve complex workforce issues. Its interactive data visualization tools allow managers to easily search, filter and compare people by centralizing information from across the Cornerstone unified talent management suite. Users can, for instance, find the right candidate to fill a position, select high-potential employees for succession, and quickly compare succession metrics and performance reviews across selected employees to reveal meaningful insights about talent pools. Filters also provide a snapshot of employees across multiple categories such as location, division, performance and tenure.
Planning Analytics
We know that a plan is a detailed proposal for doing or achieving something. So Planning Analytics starts with a plan, which may be the overall corporate plan or one of many lower-level plans associated with enterprise performance management, which includes financial planning, budgeting and forecasting. Planning Analytics is in a unique position compared with the other analytics categories because it relies on the outputs of all the other steps. It requires an understanding of past performance, identification of deviations from the norm (planned outcome vs. actual outcome), evaluation of possible scenarios, prediction of likely outcomes, and assessment of risks and constraints.
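The comparison of planned and actual outcomes, together with a simple what-if scenario, can be sketched as below. The figures, the 5% tolerance and the uniform-growth scenario are assumptions made for illustration; dedicated planning tools perform this kind of calculation over multidimensional cubes rather than a flat dictionary.

```python
# Hypothetical monthly plan vs. actual revenue for one business unit (invented figures).
plan = {"Jan": 100.0, "Feb": 110.0, "Mar": 120.0}
actual = {"Jan": 98.0, "Feb": 121.0, "Mar": 102.0}
TOLERANCE = 0.05  # assumed threshold: flag deviations larger than 5% of plan

def variances(plan, actual, tolerance):
    """Return (month, variance, variance_pct, flagged) for every planned month."""
    report = []
    for month, planned in plan.items():
        diff = actual[month] - planned
        pct = diff / planned
        report.append((month, diff, pct, abs(pct) > tolerance))
    return report

def what_if(plan, growth):
    """Scenario: scale every planned figure by (1 + growth)."""
    return {month: value * (1 + growth) for month, value in plan.items()}

for month, diff, pct, flagged in variances(plan, actual, TOLERANCE):
    note = "  <- investigate" if flagged else ""
    print(f"{month}: variance {diff:+.1f} ({pct:+.1%}){note}")

print("Plan if demand grows 10%:", what_if(plan, 0.10))
```

February (+10%) and March (-15%) fall outside the tolerance and would be investigated; January (-2%) would not.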
Planning Analytics puts the power of algorithmic forecasting in the hands of users, even those without data science skills, for more accurate, consistent and timely forecasts. IBM Planning Analytics powered by TM1 (originally Table Manager 1) is available as a ready-to-use tool for the task. It is a business performance management software suite designed to implement collaborative planning, budgeting and forecasting solutions, interactive “what-if” analyses, and analytical and reporting applications.
Planning Analytics–Contd.
IBM Planning Analytics powered by TM1® thus brings algorithmic forecasting within reach of users who lack data science skills. For instance, an analyst in a retail organization can use its rich workspace to monitor the performance of the stores. The following figure depicts the BA processes that precede Planning Analytics:

Figure: The BA Processes before the Planning Analytics

Big Data Analytics
The term Big Data in IT refers to massive, complex data sets, growing exponentially with time, that cannot be stored, processed or analyzed using traditional tools. The data generated by the New York Stock Exchange, amounting to one terabyte of new trade data per day, is an example of Big Data. Another example is the 500+ terabytes of new data ingested into the databases of the social media site Facebook every day, comprising photo and video uploads, message exchanges, comments, etc. Still another example is the data generated by a single jet engine, which can exceed 10 terabytes in 30 minutes of flight time; with many thousands of flights per day, the data generated reaches many petabytes. As all these data are very important, Big Data has become one of the hottest buzzwords around the world.

Big Data Analytics is a complex process of extracting meaningful insights, such as hidden patterns, unknown correlations, market trends and customer preferences, in order to make useful and effective business decisions in an optimal manner. It fuels much of what we do online, in every industry. It involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.

Big Data Analytics–Contd.
Take the music streaming platform Spotify, for example. It has nearly 96 million users who generate a tremendous amount of data every day. Using this data, the cloud-based platform automatically generates suggested songs through a smart recommendation engine, based on likes, shares, search history and more. The techniques, tools and frameworks behind this are a result of Big Data analytics.
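Spotify's actual recommendation engine is far more elaborate and is not described here. The sketch below only illustrates the general idea of recommending from “likes”, using user-based collaborative filtering with cosine similarity on a hypothetical likes table: songs liked by users who resemble the target user score higher.

```python
from math import sqrt

# Hypothetical "likes" table: user -> set of liked songs.
likes = {
    "ana":  {"song1", "song2", "song3"},
    "ben":  {"song1", "song2"},
    "chen": {"song2", "song3", "song4"},
    "dev":  {"song4", "song5"},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, likes, k=3):
    """Score songs the user has not liked by the similarity of users who liked them."""
    scores = {}
    for other, their_likes in likes.items():
        if other == user:
            continue
        sim = cosine_similarity(likes[user], their_likes)
        for song in their_likes - likes[user]:
            scores[song] = scores.get(song, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, score in ranked[:k] if score > 0]

print(recommend("ben", likes))  # prints ['song3', 'song4']
```

At Spotify's scale the same idea has to be computed with distributed tools and approximate similarity search, which is where Big Data analytics comes in.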
Organizations can use big data analytics systems and software to make
data-driven decisions that can improve business-related outcomes.

Importance of Big Data Analytics
More effective marketing
New revenue opportunities
Customer personalization
Improved operational efficiency
An effective strategy

Benefits and Advantages of Big Data Analytics
On a broad scale, the technologies and techniques of data analytics give organizations a way to analyze data sets and gather new information. Business intelligence (BI) queries answer basic questions about business operations and performance.
Organizations can use big data analytics systems and software to make data-driven decisions that can improve business-related outcomes. The benefits may include more effective marketing, new revenue opportunities, customer personalization and improved operational efficiency. With an effective strategy, these benefits can provide competitive advantages over rivals.

Advantages of Big Data Analytics
Big Data Analytics provides the following advantages:
1. Risk Management
Use Case: Banco de Oro, a Philippine banking company, uses Big Data analytics to identify fraudulent activities and discrepancies. The organization leverages it to narrow down a list of suspects or root causes of problems.
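How the bank actually detects fraud is not described in the source. A common first step is to flag transactions that deviate strongly from a customer's usual behaviour. A minimal sketch, assuming hypothetical transaction amounts and the modified z-score (based on the median and the median absolute deviation, with the conventional 0.6745 factor and 3.5 cut-off), which is less distorted by the outlier itself than a mean-based z-score. Flagged items are suspects for review, not proven fraud.

```python
from statistics import median

# Hypothetical card transactions for one customer: (transaction_id, amount)
transactions = [
    ("t01", 1200), ("t02", 950), ("t03", 1100), ("t04", 1000), ("t05", 1050),
    ("t06", 980), ("t07", 1020), ("t08", 1150), ("t09", 990), ("t10", 25000),
]

def flag_outliers(txns, threshold=3.5):
    """Flag transactions whose modified z-score (median/MAD based) exceeds threshold."""
    amounts = [amount for _, amount in txns]
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged this way
    flagged = []
    for txn_id, amount in txns:
        score = 0.6745 * (amount - med) / mad
        if abs(score) > threshold:
            flagged.append((txn_id, amount, round(score, 1)))
    return flagged

print(flag_outliers(transactions))
```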
Advantages of Big Data Analytics–Contd.

2. Product Development and Innovations
Use Case: Rolls-Royce, one of the largest manufacturers of jet engines for airlines and armed forces across the globe, uses Big Data analytics to analyze how efficient its engine designs are and whether there is any need for improvement.
3. Quicker and Better Decision Making Within Organizations
Use Case: Starbucks uses Big Data analytics to make strategic decisions. For example, the company leverages it to decide whether a particular location would be suitable for a new outlet. It analyzes several different factors, such as population, demographics, accessibility of the location, and more.
4. Improve Customer Experience
Use Case: Delta Air Lines uses Big Data analysis to improve customer experiences. The airline monitors tweets to find out about its customers’ experiences regarding their journeys, delays and so on. It identifies negative tweets and does what is necessary to remedy the situation. Publicly addressing these issues and offering solutions helps the airline build good customer relations.
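Delta's actual system is not described in the source. As a minimal illustration of “identifying negative tweets”, the sketch below scores each tweet against a tiny hand-made word lexicon (an assumption made for illustration; production systems rely on trained sentiment models) and picks out the ones that need a response.

```python
import re

# Small hypothetical sentiment lexicon; real systems use trained models.
NEGATIVE = {"delay", "delayed", "lost", "cancelled", "rude", "worst", "stuck"}
POSITIVE = {"great", "thanks", "smooth", "friendly", "love", "on-time"}

def sentiment_score(text):
    """Positive minus negative lexicon hits in the lower-cased text."""
    words = re.findall(r"[a-z\-]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def negative_tweets(tweets):
    """Return the tweets whose lexicon score is below zero."""
    return [t for t in tweets if sentiment_score(t) < 0]

tweets = [
    "Flight delayed 3 hours and my bag is lost. Worst trip ever.",
    "Smooth boarding and friendly crew, thanks!",
    "Stuck on the tarmac again...",
]
for tweet in negative_tweets(tweets):
    print("Needs follow-up:", tweet)
```

A word list misses sarcasm and negation ("not great"), which is one reason real deployments move to learned models.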

Lifecycle of Big Data Analytics
The Lifecycle of Big Data Analytics can be described as follows.
Stage 1 - Business case evaluation - The Big Data analytics lifecycle begins with a business case, which defines the reason and goal behind the analysis.
Stage 2 - Identification of data - Here, a broad variety of data sources is identified.
Stage 3 - Data filtering - All of the data identified in the previous stage is filtered here to remove corrupt data.
Stage 4 - Data extraction - Data that is not compatible with the tool is extracted and then transformed into a compatible form.
Stage 5 - Data aggregation - In this stage, data with the same fields across different datasets is integrated.
Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to discover useful information.
Stage 7 - Visualization of data - With tools like Tableau, Power BI and QlikView, Big Data analysts can produce graphic visualizations of the analysis for the business stakeholders who will take action.
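Stages 3 to 6 can be mimicked on a toy scale to show how each stage feeds the next. The sketch assumes two hypothetical data sources with invented field names (note that they name the shared key differently). In a real Big Data setting each function would be a distributed job, for example in Spark, rather than a Python list comprehension, and Stage 7 would hand the result to a visualization tool.

```python
# Hypothetical records from two identified sources (Stage 2) for a store chain.
store_feed = [
    {"store": "S1", "sales": "1200"},
    {"store": "S2", "sales": "n/a"},     # corrupt value
    {"store": "S3", "sales": "900"},
]
web_feed = [
    {"store_id": "S1", "visits": 340},   # different field name for the same key
    {"store_id": "S3", "visits": 150},
]

def filter_corrupt(rows):
    """Stage 3: keep only rows whose sales figure is numeric."""
    return [r for r in rows if r["sales"].isdigit()]

def extract_and_transform(rows):
    """Stage 4: convert text sales to integers the analysis step can use."""
    return [{"store": r["store"], "sales": int(r["sales"])} for r in rows]

def aggregate(store_rows, web_rows):
    """Stage 5: integrate the two datasets on the shared store key."""
    visits = {r["store_id"]: r["visits"] for r in web_rows}
    return [dict(r, visits=visits.get(r["store"], 0)) for r in store_rows]

def analyse(rows):
    """Stage 6: a simple metric, sales per visit, for each store with visits."""
    return {r["store"]: round(r["sales"] / r["visits"], 2) for r in rows if r["visits"]}

combined = aggregate(extract_and_transform(filter_corrupt(store_feed)), web_feed)
print(analyse(combined))  # prints {'S1': 3.53, 'S3': 6.0}
```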
Tools for Big Data Analytics

Here are some of the tools for Big Data Analytics:
Hadoop - helps in storing and analyzing data.
MongoDB - a document-oriented NoSQL database, used on datasets that change frequently.
Talend - used for data integration and management.
Cassandra - a distributed database used to handle large volumes of data spread across many servers.
Spark - used for real-time processing and analysis of large amounts of data.
Storm - an open-source real-time computational system.
Kafka - a distributed streaming platform that provides fault-tolerant storage of data streams.
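Hadoop's processing model, MapReduce, splits a job into a map phase that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines the values for each key. The sketch below imitates these three phases in a single Python process on hypothetical log lines; it does not use Hadoop itself and is only meant to show the data flow that Hadoop distributes across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into "blocks", as a distributed file system would store it.
blocks = [
    ["error disk full", "login ok"],
    ["login ok", "error timeout"],
]

def map_phase(line):
    """Emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(line) for block in blocks for line in block)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel on different machines; only the shuffle needs data movement.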

Avenues of Applications of Big Data Analytics
Some of the sectors where Big Data is actively used are stated below:
E-commerce - Predicting customer trends and optimizing prices are a few of the ways e-commerce uses Big Data analytics.
Marketing - Big Data analytics helps to drive high-ROI marketing campaigns, which result in improved sales.
Education - Used to develop new courses and improve existing ones based on market requirements.
Healthcare - With the help of a patient’s medical history, Big Data analytics is used to predict how likely the patient is to have health issues.
Media and entertainment - Used to understand the demand for shows, movies, songs and more, in order to deliver a personalized recommendation list to users.
Banking - Customer income and spending patterns help to predict the likelihood of a customer choosing various banking offers, like loans and credit cards.
Telecommunications - Used to forecast network capacity and improve customer experience.
Government - Big Data analytics helps governments in law enforcement, among other things.

Applications of Business Analytics
Some specific applications of Business Analytics, which open many novel avenues of opportunity for businesses to optimize and adapt their business model, are:
Critical product analysis allows minor alterations to be made to a location-specific product, and aids the study of trends associated with those locations.
Improved customer service comes from keeping track of frequent customer queries, which helps businesses avoid repeating mistakes and improves customer satisfaction.
Up-selling opportunities are found by identifying the most prominent needs of a business’ customer base.
Simplified inventory management becomes feasible, as gathered data can help predict which products are on the verge of becoming outdated, minimizing losses.
Competitive price insights can help businesses make their prices competitive by tracking customer trends and the price ranges that suit their customers.
Business Intelligence
The term Business Intelligence (BI) refers to technologies, applications
and practices for the collection, integration, analysis, and presentation of
business information to equip the business executives with a competitive
edge for profitable business actions. BI leverages software and services to
transform data into actionable insights that inform an organization’s strate-
gic and tactical business decisions. BI tools access and analyze data sets
and present analytical findings in reports, summaries, dashboards, graphs,
charts and maps to provide users with detailed intelligence about the state
of the business.
BI usually comprises a suite of software and services that transform data into actionable intelligence and knowledge to support better business decision-making. Essentially, BI systems are data-driven Decision Support Systems (DSS). The term BI is sometimes used interchangeably with briefing books, report and query tools, and Executive Information Systems (EIS). Used intelligently, these systems help an enterprise harness the synergy of its business executives for its all-round growth.

Importance of Business Intelligence tools or software solutions
Business Intelligence systems provide historical, current and predictive views of business operations, most often using highly summarized historical data gathered into a data warehouse or a data mart, and occasionally working from operational data.
Software elements support reporting, interactive “slice-and-dice” pivot-table analyses, visualization, and statistical data mining. Applications tackle sales, production, financial and many other sources of business data for purposes that include business performance management.
Information is often gathered about other companies in the same industry for comparative analysis. Organizations are now moving towards Operational Business Intelligence, a segment that is currently underserved and largely uncontested by vendors.
Traditionally, Business Intelligence vendors focused on reaching the top of the pyramid. However, there is now a paradigm shift towards taking Business Intelligence to the bottom of the pyramid, with a focus on self-service business intelligence.
Self-service Business Intelligence gives end users the ability to do more with their data without necessarily having technical skills. These solutions are usually designed for flexibility and ease of use, so that end users themselves can analyze data, make decisions, and plan and forecast on their own.
Distinction between BA and Business Intelligence (BI)
BA mostly focuses on creating novel insights and understanding of business performance based on statistical methods, data, quantitative analysis, explanatory and predictive modeling, and fact-based management to drive decision-making.
Business Intelligence also uses data and statistical methods, but it focuses instead on using a set of metrics both to measure past performance and to guide business planning. It also covers querying, reporting, OLAP and alerts.
Examples of questions asked by Business Analytics are:
Why is this happening?
What if these trends continue?
What will happen next?
What is the optimal outcome?
Examples of questions asked by BI are:
What happened?
How many times did it happen?
Where is the problem?
What are the solutions to the problem?
Tools of Business Analytics and OLAP
Data visualization tools
Business intelligence reporting software
Self-service analytics platforms
Statistical analysis tools
Big data platforms
OLAP (Online Analytical Processing) is a computing method that enables users to easily and selectively extract and query data in order to analyze it from different points of view. In OLAP, a variety of activities are usually performed by end users in online systems: generating and answering various queries, requesting ad hoc reports and executing them, conducting traditional or modern statistical analyses, and building visual presentations. OLAP includes multidimensional analysis and presentation, EIS/ESS (Executive Information Systems/Executive Support Systems) and data mining. It can provide modeling, analysis and visualization capabilities for large data sets held in database management systems (DBMS) or data warehouse systems, and it offers a multidimensional conceptual view of the data. OLAP is the technology behind many Business Intelligence (BI) applications. It is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning.
Multidimensional Data Structure in a Data Warehouse
The following figure depicts a multidimensional data structure:

OLAP Tools – Characteristics and Types
OLAP tools are used for the following types of analysis:
Categorical analysis – static analysis based on historical data;
Exegetical analysis – further querying of historical data, including
drill-down analysis (to determine the detail data used for a derived value);
Contemplative analysis – allows a user to change a single value and determine
its effect on a derived value;
Formulaic analysis – permits changes to multiple variables.
OLAP systems can be classified as follows:
Multidimensional OLAP (MOLAP) – stores data in a cube structure that users can
rotate; queries are fast;
Relational OLAP (ROLAP) – creates multidimensional views on the fly from
relational data; suited to data with a large number of attributes that cannot
easily be placed in a cube structure;
Web OLAP (WOLAP) – refers to OLAP data that is accessible from a Web browser;
Desktop OLAP – involves low-priced, simple OLAP tools that perform local
multidimensional analysis and presentation of data downloaded to client
machines from relational or multidimensional databases.
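The contrast between MOLAP and ROLAP can be sketched in base R, assuming a toy relational table `sales` with hypothetical data. The ROLAP-style function filters and aggregates the relational rows every time it is asked; the MOLAP-style version precomputes every region-by-year total once with `xtabs()` and then answers a query by direct cell lookup. The function names `rolap_query` and `molap_query` are illustrative only, not part of any OLAP product.

```r
# Hypothetical relational table: one row per sale
sales <- data.frame(
  region = c("East", "East", "West", "West", "East", "West"),
  year   = c(2016, 2016, 2016, 2017, 2017, 2017),
  amount = c(100, 40, 80, 60, 120, 90)
)

# ROLAP-style: the summary is computed from the relational rows at query time
rolap_query <- function(data, reg, yr) {
  rows <- data[data$region == reg & data$year == yr, ]
  sum(rows$amount)
}

# MOLAP-style: all region x year totals are precomputed once into a cube ...
molap_cube <- xtabs(amount ~ region + year, data = sales)

# ... so answering a query is a direct cell lookup
molap_query <- function(cube, reg, yr) {
  cube[reg, as.character(yr)]
}

rolap_query(sales, "East", 2016)        # 140
molap_query(molap_cube, "East", 2016)   # 140
```

Both return the same answer; the MOLAP-style cube trades up-front computation and storage for faster lookups, while the ROLAP-style query needs no precomputed structure.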
Working principles of OLAP systems
Data collected from multiple data sources are cleansed, organized into data cubes
and then stored in data warehouses. Each OLAP cube contains data categorized by
dimensions (such as customers, geographic sales region and time period) derived
from dimension tables in the data warehouse. The dimensions are then populated by
members (such as customer names, countries and months) that are organized
hierarchically. OLAP cubes are often pre-summarized across dimensions, which can
drastically reduce query time compared with querying a relational database directly.
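A dimension hierarchy and the pre-summarization of a cube can be sketched in base R as follows. The monthly figures in `sales` and the month-to-quarter mapping `quarter_of` are hypothetical; the point is only that totals at the quarter, region and grand-total levels are computed once up front, so later queries at those levels need no further aggregation.

```r
# Hypothetical monthly sales: rows are regions, columns are months
sales <- matrix(c(10, 12, 8, 15, 11, 9,
                  7,  9, 6, 10,  8, 12),
                nrow = 2, byrow = TRUE,
                dimnames = list(region = c("East", "West"),
                                month  = c("Jan", "Feb", "Mar", "Apr", "May", "Jun")))

# Time hierarchy: each month member rolls up to a quarter member
quarter_of <- c(Jan = "Q1", Feb = "Q1", Mar = "Q1",
                Apr = "Q2", May = "Q2", Jun = "Q2")

# Pre-summarize once: group month columns by quarter and sum within each region
months_by_quarter <- split(colnames(sales), quarter_of[colnames(sales)])
by_quarter <- sapply(months_by_quarter,
                     function(m) rowSums(sales[, m, drop = FALSE]))

# Further pre-computed totals across dimensions
region_totals  <- rowSums(by_quarter)   # East 65, West 52
quarter_totals <- colSums(by_quarter)   # Q1 52, Q2 65
grand_total    <- sum(by_quarter)       # 117
```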
Analysts perform five types of OLAP analytical operations on multidimensional
databases, as stated below:
Roll-up. Also known as consolidation or drill-up, this summarizes the data along
a dimension, for example from months up to years.
Drill-down. This allows analysts to navigate deeper into the dimensions of the
data, for example drilling down from “time period” to “years” and “months” to
chart sales growth for a product.
Slicing. This enables an analyst to fix a single value of one dimension and
display the resulting layer of information, such as “sales in 2017”.
Dicing. This allows an analyst to select data on multiple dimensions at once,
such as “sales of blue beach balls in Iowa in 2017”.
Pivoting. It is the task of rotating the data axes of the cube to gain a new
perspective on the data.
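The five operations can be imitated on a small in-memory cube in base R. The sketch below assumes a hypothetical product × region × year array `cube` with made-up sales figures: `apply()` sums away dimensions (roll-up, and drill-down to a finer level), indexing with `[` fixes dimension values (slice and dice), and `t()` swaps the axes of a two-dimensional slice (pivot). It illustrates the concepts only, not how an OLAP server implements them.

```r
# Hypothetical sales cube: product x region x year (toy numbers)
# array() fills column-major, so 'product' varies fastest, then 'region', then 'year'
cube <- array(
  c(10, 20, 30, 40,   # 2016: Ball-Iowa, Kite-Iowa, Ball-Ohio, Kite-Ohio
    15, 25, 35, 45),  # 2017: same order
  dim = c(2, 2, 2),
  dimnames = list(product = c("Ball", "Kite"),
                  region  = c("Iowa", "Ohio"),
                  year    = c("2016", "2017"))
)

# Roll-up: summarize along product and region, keeping only the year dimension
rollup_year <- apply(cube, "year", sum)               # 2016: 100, 2017: 120

# Drill-down: go from yearly totals to the finer region-by-year level
drill_region <- apply(cube, c("region", "year"), sum)

# Slice: fix one dimension to a single value (year = 2017)
slice_2017 <- cube[, , "2017"]                        # product x region matrix

# Dice: fix values on several dimensions at once (Ball, Iowa, 2017)
dice <- cube["Ball", "Iowa", "2017"]                  # 15

# Pivot: rotate the axes of the slice so that regions become rows
pivot <- t(slice_2017)                                # region x product matrix
```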
An extended view of Data Science
Data Science is all about exploring treasures of truth in data to help business
leaders make the right decisions quickly for the development, expansion and
diversification of business activities. It comprises a sequential but iterative set
of activities consisting of procurement, preparation, analysis, visualization,
management and preservation of huge collections of facts. It aims at developing
products out of data that empower others to use the data wisely through analysis
and to communicate the results. A data product may be an interactive visualization
such as Google Flu Trends or the Global Burden of Disease visualizations, a
data-driven application such as a spell checker, Google Maps or a machine
translator, or an online database. The ingredients for developing data products
may be identified as i) data, ii) technical expertise (knowledge of machine
learning) and iii) people and processes forming the requisite talent. The use of
Data Science is now all-pervading, helping to turn a dream business into a reality.
The most relevant areas of Data Science are statistics, machine learning, databases,
distributed systems, networking, cloud computing, natural language processing etc.
The ultimate goal of data science is to generate wisdom for welfare.

An extended view of Data Science–Contd.
We use data to generate information for drawing conclusions or taking immediate
decisions with certainty; information to generate knowledge for skill and
understanding; and, finally, knowledge to generate wisdom for modeling the utility
of data. Readily available facts about anything most often require analytics or
further manipulation, called processing, to become information; information
requires the quality of repeated usability with definiteness, through preservation,
to become knowledge; and knowledge must possess the permeability to tackle real-life
situations successfully to become wisdom.
A person dealing with the activities of data science is called a data scientist.
A data scientist should take an interest in all spheres of knowledge so that he/she
can deal with data from any real-life process. Data scientists often generate new
data out of some process, such as text mining, and then convert the data into
useful facts called information. This process is often termed datafication. A data
scientist always attempts to improve on previously found results with new ideas and
approaches, often merging new concepts with fresh algorithms for better predictions
and forecasts. By ‘prediction’ we mean the identification of one outcome, whereas by
‘forecast’ we imply a range of outcomes. Data Science is, therefore, an evolving
process. According to Josh Wills of Cloudera, “A data scientist is a person who is
better at statistics than any software engineer and better at software engineering
than any statistician”. A data scientist may, therefore, be regarded as the master
of all trades.
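The lesson's distinction between a prediction (one outcome) and a forecast (a range of outcomes) can be illustrated in R with a linear model: `predict()` returns a single point estimate, and with `interval = "prediction"` it also returns a lower and an upper bound. The data below are simulated, and the variable names `spend` and `sales` are hypothetical.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated (hypothetical) data: sales grow roughly linearly with advertising spend
spend <- 1:20
sales <- 5 + 2 * spend + rnorm(20, sd = 3)
fit <- lm(sales ~ spend)

new_spend <- data.frame(spend = 25)

# Prediction in the lesson's sense: a single outcome (point estimate)
point <- predict(fit, newdata = new_spend)

# Forecast in the lesson's sense: a range of outcomes (95% prediction interval)
band <- predict(fit, newdata = new_spend, interval = "prediction", level = 0.95)
band   # columns: fit (same value as 'point'), lwr, upr
```

Raising `level` (for example to 0.99) widens the range, which shows that a forecast states its uncertainty explicitly while a single prediction does not.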