Intro Lectures To DSA
Module 1
Overview of Data, Data Science Analytics, and Tools
The purpose of data analysis is to gain meaningful insights from raw data to support
decision-making, identify patterns, and extract valuable information. Some of the key objectives
of data analysis include identifying trends and patterns, making data-driven decisions, finding
correlations and relationships, detecting anomalies, improving performance, and predictive
modeling.
AN INTRODUCTION TO DATA
What is Data?
According to the Oxford dictionary, the word “data” means “facts and statistics collected for
reference or analysis.”
Based on the definition, data has three aspects: (1) Data comes from facts and statistics, (2) Data
is collected, and (3) Data is used for reference or analysis.
A table is probably the simplest form of data. Surprisingly, most implementations of data science
algorithms still use tabular data as input. Data scientists prefer to convert any type of
complex data, such as text, images, or time series, to tables so that existing tools
can be leveraged for analysis.
As an example, let us say that a company keeps information about its employees in an Excel
table. Here is the table.
One reason for the popularity of the tabular representation is the ease of storing tabular data
directly in the main memory of the computer. Regardless of the number of rows or columns, a
tabular dataset can always be stored in a 2-dimensional array.
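As a minimal sketch of this idea, a table can be held in memory as a list of rows, where each row is a list of cell values. The column names and employee values below are hypothetical, invented only for illustration; they are not taken from the company table.

```python
# Hypothetical column names and rows, invented for illustration only.
columns = ["name", "age", "department", "salary"]
table = [
    ["Asha", 29, "Sales", 42000],
    ["Ben", 41, "Finance", 58000],
    ["Chen", 35, "Sales", 50000],
]


def cell(table, columns, row_index, column_name):
    """Return one cell, addressed by row number and column name."""
    return table[row_index][columns.index(column_name)]


def column_values(table, columns, column_name):
    """Return every value stored in one column, top to bottom."""
    j = columns.index(column_name)
    return [row[j] for row in table]


# The shape of the 2-dimensional array: number of rows and number of columns.
n_rows, n_cols = len(table), len(columns)

print(n_rows, n_cols)
print(cell(table, columns, 1, "salary"))
print(column_values(table, columns, "age"))
```

Adding more rows or columns changes only the two dimensions of the array, not the way a cell is addressed.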
Given some data, a data scientist tries to retrieve interesting information that might help the
data owner make decisions.
Given the following data table regarding employees, we will try to retrieve interesting
information from it.
Closely look at the table for several minutes. Then, write down anything interesting you can find.
Here is what I could find from the table above. Compare your findings with the ones listed
below, and note which of your findings are not listed.
Is the following statement correct, based on the information provided in the table?
Older people earn more in the company from where the data was collected.
It is indeed correct, based on the data provided to us in the table.
Now, let us go back to the definition: Data refers to “facts and statistics collected for reference
or analysis.”
There can be debate about the conclusion, but the main point is this: Data Speaks. Data gives
us insights. Data gives us those light-bulb moments.
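One way to check a statement such as "older people earn more" is to measure how strongly age and salary move together. The sketch below computes the Pearson correlation coefficient by hand. The (age, salary) pairs are hypothetical, invented for illustration, and are not the values from the employee table. A coefficient close to 1 would support the statement for that dataset, although it would not show that age causes the higher salary.

```python
from math import sqrt

# Hypothetical (age, salary) pairs, invented for illustration only.
employees = [(25, 30000), (32, 41000), (38, 47000), (45, 60000), (52, 66000)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ages = [age for age, _ in employees]
salaries = [salary for _, salary in employees]

r = pearson(ages, salaries)
print(f"correlation between age and salary: {r:.2f}")
```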
The Difference between Data Science and Data Analytics
Programming: Coding algorithms and computer models that can analyse large data sets.
The most common programming languages used in Data Science are R, SQL and Python.
Data wrangling: Cleaning the data and then organising the data coherently so that it’s
both easier and more readily available to use.
Predictive Analytics: The use of past trends, patterns and historical data to make
predictions about future events, and act accordingly. An example of this would be to
increase the inventory count of an item that sees spikes in sales during a specific month or
season.
Prescriptive Analytics: This uses all available data to determine the best strategy, action or
plan that should be taken in a specific scenario, in order to reach the objective. It is
considered a more advanced form of Predictive Analytics. An example of this would be
e-commerce websites that show consumers a specific product they know would entice a
purchase, based on that consumer’s lifestyle data, browsing patterns and previous purchase
history.
Diagnostic Analytics: This uses data to understand and analyse ‘why something happened’.
An example of this would be identifying why a social media campaign fared either very
poorly or very well, in order to either avoid or replicate the parameters.
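The data wrangling and predictive analytics items above can be sketched together with the seasonal inventory example. The code first cleans messy sales records, then averages sales per calendar month and flags months that stand out. The records, the function names, and the 1.5 threshold are all hypothetical choices made for illustration; a real forecast would use a proper time-series model rather than a plain average.

```python
from collections import defaultdict

# Hypothetical raw records: (year-month, units sold) with messy string values.
raw_sales = [
    ("2022-11", "120"), ("2022-12", " 310 "), ("2023-01", "95"),
    ("2023-11", "130"), ("2023-12", "n/a"), ("2023-12", "290"),
    ("2024-01", ""), ("2024-01", "105"),
]


def clean(records):
    """Data wrangling: drop rows whose count is not a number, convert the rest."""
    cleaned = []
    for month, units in records:
        units = units.strip()
        if units.isdigit():
            cleaned.append((month, int(units)))
    return cleaned


def average_by_calendar_month(records):
    """Average units sold per calendar month ("01".."12") across all years."""
    by_month = defaultdict(list)
    for month, units in records:
        by_month[month[5:]].append(units)
    return {m: sum(values) / len(values) for m, values in by_month.items()}


def spike_months(averages, factor=1.5):
    """Calendar months whose average exceeds `factor` times the overall average."""
    overall = sum(averages.values()) / len(averages)
    return sorted(m for m, avg in averages.items() if avg > factor * overall)


sales = clean(raw_sales)
averages = average_by_calendar_month(sales)
print(averages)
print(spike_months(averages))  # months where extra inventory may be worth stocking
```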
What are the steps in the data science process?
1. Defining the problem: Identifying the problem or question that the data science project is
intended to solve.
2. Collecting and cleaning data: Gathering the data needed for the project and preparing it
for analysis.
3. Exploring and visualizing data: Examining the data to get a better understanding of its
characteristics and patterns. Visualization techniques, such as plots and charts, can be used
for this.
4. Modeling and evaluation: Building and testing machine learning models to make
predictions or inferences from the data. This step may involve selecting and tuning the
model, as well as evaluating its performance using metrics such as accuracy or precision.
5. Communicating findings: Presenting the results of the data science project to stakeholders.
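The evaluation metrics named in step 4 can be computed directly from a model's predictions. The sketch below assumes a binary classification task; the true labels and predictions are hypothetical values chosen for illustration.

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Share of predicted positives that are truly positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else 0.0


print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
```

Accuracy counts every correct prediction, while precision looks only at the cases the model labelled positive, so the two can differ considerably when one class is rare.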
Data science is used in a wide range of industries and sectors, including finance, healthcare, and retail.
Predictive modeling: Using machine learning algorithms to predict future outcomes based
on past data.
The earliest applications of data science were in finance. Companies were fed up with bad debts
and losses every year. However, they had a lot of data that was collected during the
initial paperwork when sanctioning loans. They decided to bring in data scientists to
rescue them from losses.
Over the years, banking companies learned to divide and conquer data via customer profiling,
past expenditures, and other essential variables to analyze the probabilities of risk and default.
It also helped them to push their banking products based on customers’ purchasing
power.
Supply chain optimization: Analyzing data to improve efficiency and reduce costs in the supply
chain.
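The customer-profiling idea from the finance discussion above can be sketched by grouping past loans into segments and estimating a default rate for each one. The income bands and loan outcomes below are hypothetical, invented for illustration; a real risk model would use many more variables and a fitted statistical model.

```python
from collections import defaultdict

# Hypothetical past loans: (income band of the customer, whether the loan defaulted).
loans = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("medium", False), ("medium", True), ("medium", False), ("medium", False),
    ("high", False), ("high", False),
]


def default_rate_by_segment(loans):
    """Estimate the probability of default as defaults / total loans per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [defaults, total loans]
    for segment, defaulted in loans:
        counts[segment][0] += int(defaulted)
        counts[segment][1] += 1
    return {segment: d / n for segment, (d, n) in counts.items()}


rates = default_rate_by_segment(loans)
for segment, rate in rates.items():
    print(f"{segment}: estimated default rate {rate:.0%}")
```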
Healthcare
The healthcare sector, especially, receives great benefits from data science applications.
Procedures such as tumor detection, artery stenosis detection, and organ delineation employ various
methods and frameworks, such as MapReduce, to find optimal parameters for tasks like lung texture
classification. They apply machine learning methods such as support vector machines (SVM),
content-based medical image indexing, and wavelet analysis for solid texture
classification.
2. Genetics & Genomics
Data science applications also enable an advanced level of treatment personalization through
research in genetics and genomics. The goal is to understand the impact of DNA on our
health and find individual biological connections between genetics, diseases, and drug response.
Data science techniques allow integration of different kinds of data with genomic data in
disease research, which provides a deeper understanding of genetic issues in reactions to
particular drugs and diseases. As soon as we acquire reliable personal genome data, we will
achieve a deeper understanding of human DNA. Advanced genetic risk prediction will be
a major step towards more individual care.
3. Drug Development
The drug discovery process is highly complicated and involves many disciplines. The greatest
ideas are often constrained by billions of tests and by huge financial and time expenditure. On average,
it takes twelve years to make an official submission.
Data science applications and machine learning algorithms simplify and shorten this process,
adding a perspective to each step, from the initial screening of drug compounds to the prediction
of the success rate based on biological factors. Such algorithms can forecast how a
compound will act in the body using advanced mathematical modeling and simulations instead
of lab experiments. The idea behind computational drug discovery is to create
computer model simulations in the form of a biologically relevant network, which simplifies the
prediction of future outcomes with high accuracy.
Optimization of the clinical process builds on the idea that in many cases it is not actually
necessary for patients to visit doctors in person. A mobile application can give a more effective
solution by bringing the doctor to the patient instead.
AI-powered mobile apps, usually chatbots, can provide basic healthcare support. You simply
describe your symptoms, or ask questions, and then receive key information about your medical
condition derived from a wide network linking symptoms to causes. Apps can remind you to take
your medicine on time and, if necessary, schedule an appointment with a doctor.
This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions,
saves their time waiting in line for an appointment, and allows doctors to focus on more critical
cases.
Internet Search
This is probably the first thing that comes to mind when you think of data science
applications.
When we speak of search, we think ‘Google’. Right? But there are many other search engines,
such as Yahoo, Bing, Ask, and AOL. All these search engines (including Google) make use of
data science algorithms to deliver the best result for our searched query in a fraction of a second.
Consider that Google processes more than 20 petabytes of data every day.
Had there been no data science, Google wouldn’t have been the ‘Google’ we know today.
Targeted Advertising
If you thought search was the biggest of all data science applications, here is a
challenger: the entire digital marketing spectrum. From the display banners on various
websites to the digital billboards at airports, almost all of them are decided by using data
science algorithms.
This is the reason why digital ads have been able to get a much higher CTR (Click-Through Rate) than
traditional advertisements. They can be targeted based on a user’s past behavior.
This is also why you might see ads for Data Science Training Programs while I see an ad for
apparel in the same place at the same time.
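The CTR of an ad is usually computed as the number of clicks divided by the number of times the ad was shown (its impressions). The sketch below compares two campaigns; the campaign names and the click and impression counts are hypothetical figures chosen for illustration, not measured results.

```python
def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions; defined as 0.0 when the ad was never shown."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical campaigns: name -> (clicks, impressions).
campaigns = {
    "untargeted_banner": (20, 10000),
    "targeted_banner": (150, 10000),
}

rates = {name: click_through_rate(c, i) for name, (c, i) in campaigns.items()}
for name, rate in rates.items():
    print(f"{name}: CTR = {rate:.2%}")
```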
Website Recommendations
Aren’t we all used to the suggestions about similar products on Amazon? They not only help you
find relevant products among the billions of products on offer but also add a lot to the user
experience.
A lot of companies have eagerly used recommendation engines to promote their products in accordance
with the user’s interests and the relevance of information. Internet giants like Amazon, Twitter, Google Play,
Netflix, LinkedIn, IMDb, and many more use this system to improve the user experience. The
recommendations are made based on previous search results for a user.
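A very simple recommendation engine can be sketched by suggesting items that appear in the histories of users with similar past behavior. The user names, items, and scoring rule below are hypothetical and chosen for illustration; this is not how any of the companies named above actually build their systems.

```python
from collections import Counter

# Hypothetical search/browsing histories, invented for illustration only.
histories = {
    "user_a": {"laptop", "mouse", "keyboard"},
    "user_b": {"laptop", "mouse", "monitor"},
    "user_c": {"novel", "bookmark"},
    "user_d": {"mouse", "monitor", "webcam"},
}


def recommend(user, histories, top_n=2):
    """Suggest unseen items, weighted by how much each other user overlaps with `user`."""
    seen = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(seen & items)  # number of items both users have in common
        if overlap == 0:
            continue  # ignore users with nothing in common
        for item in items - seen:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("user_a", histories))
print(recommend("user_c", histories))
```

A user whose history overlaps with nobody else's gets no suggestions, which is one form of the cold-start problem that real systems have to handle separately.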
Speech Recognition
Some of the best examples of speech recognition products are Google Voice, Siri, and Cortana.
With the speech-recognition feature, you can still send a message even if you aren’t in a
position to type. Simply speak out the message and it will be converted to text. However, at times,
you will find that speech recognition doesn’t perform accurately.
The airline industry across the world is known to bear heavy losses. Except for a few airline service
providers, companies are struggling to maintain their occupancy ratio and operating profits.
The sharp rise in air-fuel prices and the need to offer heavy discounts to customers have made
the situation worse. It wasn’t long before airline companies started using data science to
identify strategic areas of improvement. Using data science, airline companies can now:
Decide whether to land directly at the destination or take a halt in between (for example, a flight can
have a direct route from New Delhi to New York. Alternatively, it can choose to halt in any
country.)
Southwest Airlines and Alaska Airlines are among the top companies that have embraced data science
to bring changes in their way of working.
Gaming
Games are now designed using machine learning algorithms that improve or upgrade themselves
as the player moves up to a higher level. In motion gaming, too, your opponent (the computer)
analyzes your previous moves and shapes its game accordingly. EA Sports, Zynga, Sony,
Nintendo, and Activision-Blizzard have taken the gaming experience to the next level using data
science.
Augmented Reality
This is the last of the data science applications, and the one that seems most exciting for the
future: augmented reality.
Data science and virtual reality do have a relationship, considering that a VR headset combines
computing knowledge, algorithms, and data to provide you with the best viewing experience. A
very small step towards this is the high-trending game Pokemon GO, which lets players walk
around and look at Pokemon on walls, streets, and other places where they aren’t really there.
The creators of this game used the data from Ingress, the previous app from the same company,
to choose the locations of the Pokemon and gyms.