Data Science Through R Lesson-1 Introduction To Data Science
Lesson-1
Introduction to Data Science
Prof. Dr. A. B. Chowdhury, HOD, CA
This lesson introduces the basic concepts of Data Science and the
topics closely related to this new paradigm of science, which is based on
empirical facts, theoretical knowledge, computational efficiency and the
information explosion. Every point is discussed from its original meaning
through to the current understanding of this rapidly emerging technology.
The discussion is kept pertinent, clear and brief.
Stage 4 - Data extraction - Data that is not compatible with the tool is
extracted and then transformed into a compatible form.
Stage 5 - Data aggregation - In this stage, data with the same fields
across different datasets are integrated.
Stage 6 - Data analysis - Data is evaluated using analytical and statistical
tools to discover useful information.
Stage 7 - Visualization of data - With tools like Tableau, Power BI, and
QlikView, Big Data analysts can produce graphic visualizations of the
analysis for the business stakeholders who will take action.
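Stages 4 to 6 above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frames `sales_raw` and `targets`, their columns and all values are hypothetical and do not come from the lesson.

```r
# Hypothetical data: two small datasets that share the field "region".
sales_raw <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c("1,200", "950", "1,800", "1,050"),  # amounts arrive as text
  stringsAsFactors = FALSE
)
targets <- data.frame(
  region = c("East", "West"),
  target = c(2500, 2500),
  stringsAsFactors = FALSE
)

# Stage 4 (extraction): transform the incompatible text column into numbers.
sales <- transform(sales_raw, amount = as.numeric(gsub(",", "", amount)))

# Stage 5 (aggregation): integrate the datasets on their shared field.
combined <- merge(sales, targets, by = "region")

# Stage 6 (analysis): total the sales per region and compare with the target.
totals <- aggregate(amount ~ region + target, data = combined, FUN = sum)
totals$met_target <- totals$amount >= totals$target
print(totals)
```

With these made-up figures, East totals 3000 and meets its target while West totals 2000 and does not. Stage 7 would then present `totals` graphically for the stakeholders.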
Prof. Dr. A. B. Chowdhury, HOD, CA (TIU, W.B.)
Data Science Through R Lesson-1 Introduction to Data Science
August 12, 2023
Tools for Big Data Analytics
Desktop OLAP involves low-priced, simple OLAP tools that perform local
multidimensional analysis and presentation of data downloaded to client
machines from relational or multidimensional databases.
Working principles of OLAP systems
Data collected from multiple data sources is cleansed and organized into data
cubes and then stored in data warehouses. Each OLAP cube contains data
categorized by dimensions (such as customers, geographic sales region and time
period) derived from dimension tables in the data warehouse. The dimensions are
then populated by members (such as customer names, countries and months) that
are organized hierarchically. OLAP cubes are often pre-summarized across
dimensions, which drastically improves query time compared with relational
databases.
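As a minimal sketch of the idea, a small cube can be built in base R with `xtabs`, which pre-summarizes a measure across named dimensions. The fact table `sales`, its columns and its values are hypothetical, and a real OLAP server would store and index its cubes quite differently.

```r
# Hypothetical fact table: one row per sale, with three dimension columns.
sales <- data.frame(
  customer = c("Asha", "Asha", "Ben", "Ben", "Asha", "Ben"),
  country  = c("India", "India", "USA", "USA", "India", "USA"),
  month    = c("Jan", "Feb", "Jan", "Feb", "Jan", "Jan"),
  amount   = c(100, 150, 200, 120, 50, 80)
)

# Pre-summarize the measure across all three dimensions into a 3-D array.
cube <- xtabs(amount ~ customer + country + month, data = sales)

dim(cube)        # one axis per dimension
dimnames(cube)   # the members that populate each dimension

# A query is now an array lookup rather than a scan of the fact table.
cube["Asha", "India", "Jan"]   # 150, the sum of the two matching sales
```

Because the sums are computed once when the cube is built, repeated queries only index into the array, which is the reason pre-summarized cubes answer faster than re-aggregating the raw rows each time.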
Analysts perform five types of OLAP analytical operations against the
multidimensional databases, as stated below:
Roll-up. Also known as consolidation or drill-up, this summarizes
the data along a dimension.
Drill-down. This allows analysts to navigate deeper among the dimensions
of data, for example drilling down from "time period" to "years" and
"months" to chart sales growth for a product.
Slicing. This enables an analyst to take one level of information for display,
such as "sales in 2017."
Dicing. This allows an analyst to select data from multiple dimensions to
analyze, such as "sales of blue beach balls in Iowa in 2017."
Pivoting. It is the task of rotating the data axes of the cube to gain a new
view of the data.
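The five operations can be imitated on a small in-memory cube in base R. The sketch below assumes a hypothetical fact table `sales` with the dimensions product, state and year; the names and figures are invented for illustration, and array indexing merely stands in for what an OLAP engine does.

```r
# Hypothetical fact table with three dimensions: product, state and year.
sales <- data.frame(
  product = c("ball", "ball", "kite", "kite", "ball", "kite"),
  state   = c("Iowa", "Ohio", "Iowa", "Ohio", "Iowa", "Iowa"),
  year    = c("2016", "2016", "2016", "2017", "2017", "2017"),
  amount  = c(10, 20, 30, 40, 50, 60)
)
cube <- xtabs(amount ~ product + state + year, data = sales)

# Roll-up: summarize along the state dimension, keeping product (1) x year (3).
rollup <- apply(cube, c(1, 3), sum)

# Drill-down: start from yearly totals, then break them down by state.
by_year       <- apply(cube, 3, sum)
by_state_year <- apply(cube, c(2, 3), sum)

# Slicing: fix one member of one dimension (year = 2017).
slice_2017 <- cube[, , "2017"]

# Dicing: select members on several dimensions at once.
dice <- cube["ball", "Iowa", "2017"]

# Pivoting: rotate the axes so that year comes first.
pivoted <- aperm(cube, c(3, 1, 2))
```

Here `by_year` gives 60 for 2016 and 150 for 2017, and drilling down shows that Iowa accounts for 110 of the 2017 total. The dice returns the single cell 50, the analogue of "sales of balls in Iowa in 2017".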
An extended view of Data Science
Data Science is about finding the truths hidden in data so that business
leaders can quickly make the right decisions for the development, expansion and
diversification of business activities. It comprises a sequential but iterative
set of activities: procurement, preparation, analysis, visualization,
management and preservation of huge collections of facts. It aims to develop
products out of data that empower others to use the data wisely through
analysis and to communicate the results. A data product may be an interactive
visualization such as Google Flu Application or Global Burden Of Disease, a
data-driven app such as a spellchecker, Google Maps or a machine translator, or
an online database. The ingredients for developing data products may be
identified as i) data, ii) technical expertise (knowledge of machine learning)
and iii) people and process forming the requisite talent. The use of Data
Science is now all-pervading, turning dream businesses into reality. The most
relevant areas of Data Science are statistics, machine learning, databases,
distributed systems, networking, cloud computing, natural language processing,
etc. The ultimate goal of Data Science is to generate wisdom for welfare.