Intro Lectures To DSA


Module in Data Science Analytics

Module 1
Overview of Data, Data Science Analytics, and Tools

What is Data Science?

Data science is a field of study that focuses on techniques and algorithms to extract
knowledge from data. The area combines data mining and machine learning with
data-specific domains. This section focuses on defining "data" before going to any
complicated topic.

Data analysis refers to the process of inspecting, cleaning, transforming, and
interpreting data to discover valuable insights, draw conclusions, and support
decision-making. It involves using various techniques and tools to analyze large sets
of data and extract meaningful patterns, trends, correlations, and relationships
within the data. Data analysis is essential across various industries and disciplines, as
it helps uncover valuable information that can be used to optimize processes, solve problems,
and make informed decisions.

What is the purpose of data analysis?

The purpose of data analysis is to gain meaningful insights from raw data to support
decision-making, identify patterns, and extract valuable information. Some of the key objectives
of data analysis include identifying trends and patterns, making data-driven decisions, finding
correlations and relationships, detecting anomalies, improving performance, and building
predictive models.

AN INTRODUCTION TO DATA

What is Data?

The word “data” has the following meaning, based on the Oxford dictionary.

Data refers to facts and statistics collected for reference or analysis.

Based on the definition, data has three aspects: (1) Data comes from facts and statistics, (2) Data
is collected, and (3) Data is used for reference or analysis.

The simplest form of data

A table is probably the simplest form of data. Surprisingly, most implementations of data science
algorithms still today use tabular data as inputs. Data scientists prefer to convert any type of
complex data — such as text, image, or time series — to tables to make sure that existing tools
can be leveraged for analysis.

As an example, let us say that a company keeps information about its employees in an Excel
table. Here is the table.

One reason for the popularity of tabular representation is the ease of storing tabular data
directly in the main memory of the computer. Regardless of the number of rows or columns, a
tabular dataset can always be stored in a 2-dimensional array.
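
To make the 2-dimensional array idea concrete, here is a minimal Python sketch. The names Jane, Dave, and Delilah are taken from this module; the fourth name and every age and salary value are invented for illustration and are not the real table.

```python
# Hypothetical employee table. "Alex" and all ages and salaries are invented.
columns = ["name", "age", "salary"]
rows = [
    ["Jane", 52, 90000],
    ["Dave", 52, 90000],
    ["Alex", 35, 60000],
    ["Delilah", 24, 40000],
]

# A table is rows x columns, so it fits in a 2-dimensional array (a list of lists).
n_rows, n_cols = len(rows), len(columns)


def cell(row_index, column_name):
    """Look up one cell by row position and column name."""
    return rows[row_index][columns.index(column_name)]


print(n_rows, n_cols, cell(3, "salary"))
```

Adding an employee appends one inner list, and adding an attribute appends one element to every row, so the structure stays two-dimensional however large the table grows.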

Given some data, a data scientist tries to retrieve interesting information that might help the
data owner make decisions.

Can data speak?

Given the following data table regarding employees, we will try to retrieve interesting
information from it.

Look closely at the table for several minutes. Then, write down anything interesting you can find.
Here is what I could find from the table above. You can see how many of your findings match
the findings listed below and how many of your findings are not listed.

Jane and Dave earn the highest salary.

Delilah earns the least.

Jane and Dave are the oldest people in the group.

Delilah is the youngest among all the employees in the table.

These are all interesting findings. What else?

Will the following statement be a correct one based on the information provided in the table?

Older people earn more in the company from where the data was collected.

It is indeed a correct piece of information based on the data provided to us in the table.
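
Findings like these can also be checked programmatically. The sketch below is one possible way to do it in plain Python. The records are hypothetical: only the names Jane, Dave, and Delilah come from the text, while "Alex" and all numbers are invented so that the data agrees with the findings listed above.

```python
# Hypothetical records. "Alex" and every age and salary value are invented.
employees = [
    {"name": "Jane", "age": 52, "salary": 90000},
    {"name": "Dave", "age": 52, "salary": 90000},
    {"name": "Alex", "age": 35, "salary": 60000},
    {"name": "Delilah", "age": 24, "salary": 40000},
]

# Who earns the most and the least?
top_salary = max(e["salary"] for e in employees)
highest_paid = [e["name"] for e in employees if e["salary"] == top_salary]
lowest_paid = min(employees, key=lambda e: e["salary"])["name"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# A value close to +1 means salary rises with age in this (invented) sample.
age_salary_corr = pearson(
    [e["age"] for e in employees],
    [e["salary"] for e in employees],
)
print(highest_paid, lowest_paid, round(age_salary_corr, 3))
```

A strong positive correlation supports the statement for this sample only; it does not by itself show that age causes higher pay.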

Now, let us go back to the definition: Data refers to “facts and statistics collected for reference
or analysis.”

This table has facts. This table is collected from a company.

We used the table for analysis. We revealed that the company appreciates experienced
employees. Basically, the data reflects a general trend:

Experience, wisdom, (and money, which is the salary in this case) come with age.

There can be debates regarding the conclusion but the main point is — Data Speaks. Data gives
us insights. Data gives us those light-bulb moments.

The Difference Between Data Science and Data Analytics

What is Data Science?


Data Science is the process of using scientific methods, tools and systems to shape raw data into
meaningful information. Data scientists use machine learning algorithms to build complex,
predictive models that find patterns and trends in the data. Data Science software is used to
manipulate, organise and build predictive data models.

What is Data Analytics?


Data Analytics is the science of analysing either raw or processed data to derive useful insights
that can then be turned into actionable plans or strategies. Data Analytics often builds on the
work first done by Data Science, using the predictive information to make decisions.

What is Business Analytics?


Business Analytics is the application of Data Analytics tools and techniques in a business context.
The historical data of a business is statistically analysed in order to understand past
performance, predict future market trends, create more accurate budgets and much more.

The Function of Data Science vs Data Analytics


The function of Data Science is to build the foundation on which Data Analytics then works.
The key functions associated with this field are:

• Programming: Coding algorithms and computer models that can analyse large data sets.
The most common programming languages used in Data Science are R, SQL and Python.

• Data wrangling: Cleaning the data and then organising it coherently so that it’s
both easier to use and more readily available.

• Statistical modelling: Using statistical assumptions and mathematical models such as
regression analysis, k-means clustering and more, to identify relationships between two or
more variables. This function is tied to quantitative research methodologies.
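
As an illustration of the statistical modelling function, the following is a minimal sketch of simple linear regression (ordinary least squares with one predictor) in plain Python. The function name `fit_line` and the data are made up for this example. In practice a library such as scikit-learn or R's `lm` would normally be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: advertising spend (x) and units sold (y), exactly y = 2x + 1.
spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]

slope, intercept = fit_line(spend, units)
print(slope, intercept)
```

The slope estimates how much the second variable changes per unit of the first, which is the kind of relationship between variables that the bullet above refers to.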

The function of Data Analytics is to apply a set of analysis-specific frameworks and tools to data
sets in order to generate information that can be used to make decisions. These frameworks are:

• Predictive Analytics: The use of past trends, patterns and historical data to make
predictions about future events, and act accordingly. An example of this would be to
increase the inventory count of an item that sees spikes in sales during a specific month or
season.

• Prescriptive Analytics: This uses all available data to determine the best strategy, action or
plan that should be taken in a specific scenario, in order to reach the objective. It is
considered a more advanced form of Predictive Analytics. An example of this would be
e-commerce websites that show consumers a specific product they know would entice a
purchase, based on that consumer’s lifestyle data, browsing patterns and previous purchase
history.

• Descriptive Analytics: The means of summarising data to analyse, understand and
describe ‘what happened’ either in real-time or at a particular point of time in a business. An
example of this would be KPI reports or dashboards that depict the current figures against an
established benchmark.

• Diagnostic Analytics: This uses data to understand and analyse ‘why something happened’.
An example of this would be identifying why a social media campaign fared either very
poorly or very well, in order to either avoid or duplicate the parameters.
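
The inventory example given for Predictive Analytics can be sketched in a few lines of Python, together with a Descriptive Analytics summary of the same data. The sales figures and the 1.5x spike threshold are assumptions invented for this sketch; a real forecast would use a proper time-series model.

```python
# Invented monthly unit sales for one item over two years (January..December).
sales_by_year = {
    2022: [20, 22, 21, 19, 23, 20, 22, 21, 20, 24, 30, 60],
    2023: [21, 20, 23, 22, 21, 24, 22, 20, 23, 25, 34, 66],
}

# Descriptive analytics: summarise what happened (average sales per calendar month).
monthly_avg = [
    sum(year[m] for year in sales_by_year.values()) / len(sales_by_year)
    for m in range(12)
]
overall_avg = sum(monthly_avg) / 12

# Predictive analytics (naive): assume next year repeats the seasonal pattern and
# flag the months whose average is at least 1.5x the overall average.
SPIKE_FACTOR = 1.5  # arbitrary threshold chosen for this sketch
spike_months = [
    m + 1 for m, avg in enumerate(monthly_avg) if avg >= SPIKE_FACTOR * overall_avg
]
print(spike_months)
```

The flagged months are the ones where the business would increase the inventory count ahead of time.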

What are the steps in the data science process?

The data science process typically involves several steps, including:

1. Defining the problem: Identifying the problem or question that the data science project is
intended to solve.

2. Collecting and cleaning data: Gathering the data needed for the project and preparing it
for analysis by cleaning and preprocessing it.

3. Exploring and visualizing data: Examining the data to get a better understanding of its
characteristics and patterns. Visualization techniques, such as plots and charts, can be used
to help identify trends and relationships in the data.

4. Modeling and evaluation: Building and testing machine learning models to make
predictions or inferences from the data. This step may involve selecting and tuning the
model, as well as evaluating its performance using metrics such as accuracy or precision.

5. Communicating findings: Presenting the results of the data science project to stakeholders,
including key findings and recommendations.
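
The five steps above can be walked through in a toy Python sketch. Everything here is an assumption made for illustration: the problem, the records, and the deliberately simple threshold "model". A real project would also evaluate on data held out from training, which this sketch skips for brevity.

```python
# Step 1 (hypothetical problem): predict whether a customer will buy (1) or not (0)
# from the minutes spent on a product page. All records are invented.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.5, 0), (4.0, 1),
       (5.0, 1), (6.5, 1), (2.5, 1), (7.0, 1), (0.5, 0)]

# Step 2: clean the data by dropping records with a missing feature value.
data = [(x, y) for x, y in raw if x is not None]


# Step 3: explore, for example by comparing the average feature value per class.
def mean(values):
    return sum(values) / len(values)


mean_buyers = mean([x for x, y in data if y == 1])
mean_others = mean([x for x, y in data if y == 0])

# Step 4a: model, here a threshold halfway between the two class means.
threshold = (mean_buyers + mean_others) / 2
predictions = [1 if x >= threshold else 0 for x, _ in data]

# Step 4b: evaluate with accuracy and precision.
labels = [y for _, y in data]
true_pos = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
false_pos = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
precision = true_pos / (true_pos + false_pos)

# Step 5: communicate the findings.
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f} precision={precision:.2f}")
```

Accuracy is the share of all predictions that are correct, while precision is the share of predicted buyers who really bought.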

What are some common applications of data science?

Data science is used in a wide range of industries and sectors, including finance, healthcare, retail,
and technology. Some common applications of data science include:

• Predictive modeling: Using machine learning algorithms to predict future outcomes based
on past data.

• Customer segmentation: Grouping customers into different categories based on their
characteristics and behaviors.

• Fraud detection: Identifying fraudulent activity using patterns and anomalies in data.

• Supply chain optimization: Analyzing data to improve efficiency and reduce costs in the supply
chain.

Fraud and Risk Detection

The earliest applications of data science were in finance. Companies were fed up with bad debts
and losses every year. However, they had a lot of data that used to be collected during the
initial paperwork while sanctioning loans. They decided to bring in data scientists to
rescue them from losses.

Over the years, banking companies learned to divide and conquer data via customer profiling,
past expenditures, and other essential variables to analyze the probabilities of risk and default.
It also helped them promote their banking products based on customers’ purchasing
power.
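
Fraud detection, as described above, relies on spotting anomalies in data. One very simple way to sketch that idea is a z-score rule: flag any transaction that lies far from the customer's average. The amounts and the 2.5 cut-off below are invented for this example, and real fraud systems use far richer features and models.

```python
from statistics import mean, pstdev

# Invented transaction amounts for one customer; the last one is unusually large.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 44.0, 41.0, 43.5, 37.0, 400.0]

mu = mean(amounts)
sigma = pstdev(amounts)  # population standard deviation

Z_THRESHOLD = 2.5  # arbitrary cut-off chosen for this sketch

# Flag every amount whose distance from the mean exceeds the cut-off,
# measured in standard deviations.
flagged = [a for a in amounts if abs(a - mu) / sigma > Z_THRESHOLD]
print(flagged)
```

A flagged transaction is only a candidate for review, since a large but legitimate purchase would be flagged in the same way.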

Healthcare

The healthcare sector, especially, receives great benefits from data science applications.

1. Medical Image Analysis

Procedures such as tumor detection, artery stenosis detection, and organ delineation employ
various methods and frameworks, such as MapReduce, to find optimal parameters for tasks like
lung texture classification. They apply machine learning methods, support vector machines (SVM),
content-based medical image indexing, and wavelet analysis for solid texture classification.

2. Genetics & Genomics

Data Science applications also enable an advanced level of treatment personalization through
research in genetics and genomics. The goal is to understand the impact of the DNA on our
health and find individual biological connections between genetics, diseases, and drug response.
Data science techniques allow integration of different kinds of data with genomic data in the
disease research, which provides a deeper understanding of genetic issues in reactions to
particular drugs and diseases. As soon as we acquire reliable personal genome data, we will
achieve a deeper understanding of the human DNA. The advanced genetic risk prediction will be
a major step towards more individual care.

3. Drug Development

The drug discovery process is highly complicated and involves many disciplines. The greatest
ideas are often constrained by billions of tests and by huge financial and time expenditure. On
average, it takes twelve years to make an official submission.

Data science applications and machine learning algorithms simplify and shorten this process,
adding a perspective to each step from the initial screening of drug compounds to the prediction
of the success rate based on the biological factors. Such algorithms can forecast how the
compound will act in the body using advanced mathematical modeling and simulations instead
of the “lab experiments”. The idea behind the computational drug discovery is to create
computer model simulations as a biologically relevant network simplifying the prediction of
future outcomes with high accuracy.

4. Virtual assistance for patients and customer support

Optimization of the clinical process builds upon the concept that for many cases it is not actually
necessary for patients to visit doctors in person. A mobile application can give a more effective
solution by bringing the doctor to the patient instead.

AI-powered mobile apps, usually chatbots, can provide basic healthcare support. You simply
describe your symptoms, or ask questions, and then receive key information about your medical
condition derived from a wide network linking symptoms to causes. Apps can remind you to take
your medicine on time, and if necessary, assign an appointment with a doctor.

This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions,
saves their time waiting in line for an appointment, and allows doctors to focus on more critical
cases.

The most popular applications nowadays are Your.MD and Ada.

Internet Search

This is probably the first thing that comes to mind when you think of data science
applications.

When we speak of search, we think ‘Google’. Right? But there are many other search engines
like Yahoo, Bing, Ask, AOL, and so on. All these search engines (including Google) make use of
data science algorithms to deliver the best result for our searched query in a fraction of a second.
This is remarkable considering that Google processes more than 20 petabytes of data every day.

Had there been no data science, Google wouldn’t have been the ‘Google’ we know today.

Targeted Advertising

If you thought Search would have been the biggest of all data science applications, here is a
challenger – the entire digital marketing spectrum. Starting from the display banners on various
websites to the digital billboards at the airports – almost all of them are decided by using data
science algorithms.

This is the reason why digital ads have been able to get a much higher CTR (Click-Through Rate) than
traditional advertisements. They can be targeted based on a user’s past behavior.

This is the reason why you might see ads for Data Science Training Programs while I see an ad for
apparel in the same place at the same time.

Website Recommendations

Aren’t we all used to the suggestions about similar products on Amazon? They not only help you
find relevant products among the billions of products available but also add a lot to the user
experience.

A lot of companies have eagerly used this engine to promote their products in accordance with
the user’s interests and the relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDb, and many more use this system to improve the user experience. The
recommendations are made based on previous search results for a user.
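
To give a feel for how recommendations can be derived from previous searches, here is a minimal co-occurrence sketch in Python. It is not how any of the companies named above actually build their systems. The function `recommend` and the search histories are invented for illustration.

```python
from collections import Counter

# Invented search histories; each set holds the items one user searched for.
histories = [
    {"laptop", "mouse", "keyboard"},
    {"laptop", "mouse"},
    {"laptop", "backpack"},
    {"novel", "bookmark"},
]


def recommend(user_items, histories, top_n=2):
    """Suggest items that most often co-occur with the user's past searches."""
    counts = Counter()
    for history in histories:
        if history & user_items:                 # shares at least one item
            counts.update(history - user_items)  # count only items new to the user
    return [item for item, _ in counts.most_common(top_n)]


suggestions = recommend({"laptop"}, histories)
print(suggestions)
```

Users who searched for a laptop most often also searched for a mouse, so that item ranks first for a user whose history contains only "laptop".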

Speech Recognition

Some of the best examples of speech recognition products are Google Voice, Siri, and Cortana.
With the speech-recognition feature, you can still send a message even if you aren’t in a position
to type it. Simply speak out the message and it will be converted to text. However, at times
you will notice that speech recognition doesn’t perform accurately.

Airline Route Planning

The airline industry across the world is known to bear heavy losses. Except for a few airline service
providers, companies are struggling to maintain their occupancy ratio and operating profits.
The steep rise in air-fuel prices and the need to offer heavy discounts to customers have made
the situation worse. It wasn’t long before airline companies started using data science to
identify strategic areas for improvement. Now, using data science, airline companies can:

Predict flight delays

Decide which class of airplanes to buy

Decide whether to fly directly to the destination or take a halt in between (for example, a flight
can have a direct route from New Delhi to New York, or it can halt in another
country)

Effectively drive customer loyalty programs

Southwest Airlines and Alaska Airlines are among the top companies that have embraced data science
to bring changes in their way of working.

Gaming

Games are now designed using machine learning algorithms that improve/upgrade themselves
as the player moves up to a higher level. In motion gaming also, your opponent (computer)
analyzes your previous moves and accordingly shapes up its game. EA Sports, Zynga, Sony,
Nintendo, Activision-Blizzard have led the gaming experience to the next level using data
science.

Augmented Reality

This is the last of the data science applications, and the one that seems most exciting for the
future: augmented reality.

Data Science and Virtual Reality do have a relationship, considering a VR headset contains
computing knowledge, algorithms and data to provide you with the best viewing experience. A
very small step towards this is the popular game Pokemon GO, which lets you walk
around and look at Pokemon on walls, streets, and things that aren’t really there. The creators
of this game used the data from Ingress, the previous app from the same company, to choose the
locations of the Pokemon and gyms.

References:

https://www.staffordglobal.org/articles-and-blogs/data-science-articles-and-blogs/fundamental-data-science-concepts/

https://www.staffordglobal.org/articles-and-blogs/data-science-articles-and-blogs/difference-data-science-data-analytics/

https://studymafia.org/data-analysis-ppt/?expand_article=1

https://medium.com/@hamzakhalid2111/the-fundamentals-of-data-science-a-guide-for-beginners-b563db9522ba

https://mitu.co.in/wp-content/uploads/2021/11/7.-Data-Analytics.pdf

https://www.investopedia.com/terms/p/prescriptive-analytics.asp

https://www.upgrad.com/blog/types-of-data/
