Unit 1 Notes


Unit 1

INTRODUCTION TO DATA SCIENCE


Defining data science

What is Data Science?

Data Science is about data gathering, analysis and decision-making.

Data Science is about finding patterns in data through analysis, and making future
predictions.

By using Data Science, companies are able to make:

• Better decisions (should we choose A or B?)
• Predictive analysis (what will happen next?)
• Pattern discoveries (finding patterns, or perhaps hidden information, in the data)

Where is Data Science Needed?

Data Science is used in many industries in the world today, e.g. banking,
consultancy, healthcare, and manufacturing.

Examples of where Data Science is needed:

• For route planning: to discover the best routes to ship
• To foresee delays for flights, ships, trains, etc. (through predictive analysis)
• To create promotional offers
• To find the best time to deliver goods
• To forecast next year’s revenue for a company
• To analyze the health benefits of training
• To predict who will win elections

What is Data?

The quantities, characters, or symbols on which operations are performed by a
computer, which may be stored and transmitted in the form of electrical signals
and recorded on magnetic, optical, or mechanical recording media.

Now, let’s learn the definition of Big Data.

What is Big Data?

Big Data is a collection of data that is huge in volume and keeps growing
exponentially with time. Its size and complexity are so great that traditional data
management tools cannot store or process it efficiently. In short, Big Data is still
data, but of a huge size.

What is an Example of Big Data?

Following are some examples of Big Data:

Social Media

Statistics show that 500+ terabytes of new data are ingested into the databases of
the social media site Facebook every day. This data is mainly generated through
photo and video uploads, message exchanges, comments, etc.

Recognizing the different types of data


Introduction – Importance of Data

“Data is the new oil.” Today, data is everywhere, in every field. Whether you are a
data scientist, marketer, businessperson, data analyst, researcher, or in any other
profession, you need to work or experiment with raw or structured data. Because
data matters so much, it must be handled and stored properly, without errors. When
working with data, it is important to know its type so that you can process it
correctly and get the right results. There are two types of data, qualitative and
quantitative, which are further classified into four categories:

• Nominal data
• Ordinal data
• Discrete data
• Continuous data

Today, business runs on data, and most companies use data-driven insights to create
and launch campaigns, design strategies, launch products and services, or try out
different things. According to a report, at least 2.5 quintillion bytes of data are
produced every day.
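To make the 2.5 quintillion figure more concrete, the short Python sketch below converts it into larger units. It assumes decimal (SI) prefixes, i.e. 1 TB = 10**12 bytes and 1 EB = 10**18 bytes; with binary prefixes the results would differ slightly. The variable names are illustrative only.

```python
# Assumption: decimal (SI) prefixes, 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TB = 10**12
EB = 10**18
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = 2.5e18                    # 2.5 quintillion bytes, as quoted above

exabytes_per_day = bytes_per_day / EB
terabytes_per_day = bytes_per_day / TB
tb_per_second = terabytes_per_day / SECONDS_PER_DAY

print(f"{exabytes_per_day:g} EB per day = {terabytes_per_day:,.0f} TB per day")
print(f"about {tb_per_second:.1f} TB every second")
```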
Types of Data

Qualitative or Categorical Data

Qualitative or categorical data is data that can’t be measured or counted in the
form of numbers. These types of data are sorted by category, not by number,
which is why they are also known as categorical data. They consist of audio,
images, symbols, or text. The gender of a person, i.e., male, female, or others, is
qualitative data.

Qualitative data describes people’s perceptions. This data helps market
researchers understand customers’ tastes and then design their ideas and
strategies accordingly.

Other examples of qualitative data are:

• The language you speak
• Favorite holiday destination
• Opinion on something (agree, disagree, or neutral)
• Colors

Qualitative data are further classified into two types:

Nominal Data

Nominal data is used to label variables without any order or quantitative value.
The color of hair can be considered nominal data, as one color can’t be ranked
above another.

The name “nominal” comes from the Latin word “nomen,” which means “name.”
Nominal data cannot be used for numerical calculations, nor can it be sorted into a
meaningful order. These data have no meaningful order; their values are simply
distributed into distinct categories.

Examples of Nominal Data:

• Color of hair (Blonde, Red, Brown, Black, etc.)
• Marital status (Single, Widowed, Married)
• Nationality (Indian, German, American)
• Gender (Male, Female, Others)
• Eye color (Black, Brown, etc.)
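A small Python sketch can show which operations make sense on nominal data. The hair-color values below are made up for illustration; the point is that counting categories, testing equality, and finding the most frequent category (the mode) are valid, while sorting or averaging the labels is not.

```python
from collections import Counter

# Made-up sample of a nominal variable: each value is only a label.
hair_colors = ["Black", "Brown", "Black", "Blonde", "Red", "Brown", "Black"]

# Valid on nominal data: count how often each category occurs.
counts = Counter(hair_colors)

# Valid on nominal data: the mode, i.e. the most frequent category.
mode, mode_count = counts.most_common(1)[0]

# Valid on nominal data: equality tests ("is this person's hair Black?").
is_black = [c == "Black" for c in hair_colors]

print(dict(counts))
print(f"Mode: {mode} ({mode_count} of {len(hair_colors)})")
# Not meaningful: sorted(hair_colors) would order the labels alphabetically,
# which says nothing about the colors themselves.
```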

Ordinal Data
Ordinal data have a natural ordering: each value has a position on a scale relative
to the others. These data are used for observations such as customer satisfaction or
happiness, but we can’t perform arithmetic operations on them.

Ordinal data is qualitative data whose values have a relative position. This kind of
data can be considered “in-between” qualitative and quantitative data. Ordinal data
shows only the sequence of values, so it supports order-based statistics such as the
median or percentiles, but not arithmetic-based ones such as the mean. Compared to
nominal data, ordinal data have an order that nominal data lack.

Examples of Ordinal Data:

• When companies ask for feedback, experience, or satisfaction ratings on a scale
of 1 to 10
• Letter grades in an exam (A, B, C, D, etc.)
• Ranking of people in a competition (First, Second, Third, etc.)
• Economic status (High, Medium, Low)
• Education level (Higher, Secondary, Primary)
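The difference from nominal data can be sketched in Python: the order of the categories has to be supplied explicitly, after which sorting and the median become meaningful. The rank mapping and the survey responses below are assumptions made up for illustration, based on the Economic Status example above.

```python
import statistics

# The order is supplied explicitly; the labels themselves carry no numeric value.
# Assumed ranks for the "Economic Status" example above (Low < Medium < High).
RANK = {"Low": 1, "Medium": 2, "High": 3}
LABEL = {rank: label for label, rank in RANK.items()}

# Made-up survey responses.
responses = ["High", "Low", "Medium", "Medium", "High", "Low", "Medium"]

# Valid on ordinal data: sorting by position on the scale.
ordered = sorted(responses, key=RANK.get)

# Valid on ordinal data: the median, because it only depends on order.
# median_low always returns an actual observed rank, never an in-between value.
median_label = LABEL[statistics.median_low(RANK[r] for r in responses)]

print(ordered)
print("Median response:", median_label)
# Not meaningful: the mean of the ranks, since the gap between Low and Medium
# is not guaranteed to equal the gap between Medium and High.
```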

Difference between Nominal and Ordinal Data


| Nominal Data | Ordinal Data |
|---|---|
| Nominal data can’t be quantified, nor do they have any intrinsic ordering | Ordinal data gives some kind of sequential order by their position on the scale |
| Nominal data is qualitative or categorical data | Ordinal data is said to be “in-between” qualitative and quantitative data |
| They don’t provide any quantitative value, nor can we perform any arithmetical operation | They provide a sequence, and numbers can be assigned to ordinal data, but arithmetical operations cannot be performed |
| Nominal data cannot be used to compare one item with another | Ordinal data can help to compare one item with another by ranking or ordering |
| Examples: eye color, housing style, gender, hair color, religion, marital status, ethnicity, etc. | Examples: economic status, customer satisfaction, education level, letter grades, etc. |

Quantitative Data

Quantitative data can be expressed in numerical values, making it countable and
suitable for statistical data analysis. These kinds of data are also known as
numerical data. They answer questions like “how much,” “how many,” and “how
often.” For example, the price of a phone, a computer’s RAM, and the height or
weight of a person all fall under quantitative data.

Quantitative data can be used for statistical manipulation. These data can be
represented on a wide variety of graphs and charts, such as bar graphs, histograms,
scatter plots, boxplots, pie charts, line graphs, etc.

Examples of Quantitative Data:

• Height or weight of a person or object
• Room temperature
• Scores and marks (e.g., 59, 80, 60)
• Time
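Because quantitative data are numbers, arithmetic-based statistics apply to them directly. A short sketch using Python’s standard statistics module; the scores are made up, extending the “59, 80, 60” example above.

```python
import statistics

# Made-up exam scores, extending the "59, 80, 60" example above.
scores = [59, 80, 60, 72, 91, 68]

mean_score = statistics.mean(scores)       # arithmetic average
median_score = statistics.median(scores)   # middle value of the sorted scores
spread = statistics.stdev(scores)          # sample standard deviation
score_range = max(scores) - min(scores)    # highest minus lowest

print(f"mean={mean_score:.2f} median={median_score} "
      f"stdev={spread:.2f} range={score_range}")
```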

Quantitative data are further classified into two types:


Discrete Data

The term discrete means distinct or separate. Discrete data contain values that are
integers or whole numbers. The total number of students in a class is an example
of discrete data. These data can’t be broken into decimal or fractional values.

Discrete data are countable and have finite values; they cannot be subdivided.
These data are represented mainly by a bar graph, number line, or frequency table.

Examples of Discrete Data:

• Total number of students present in a class
• Cost of a cell phone
• Number of employees in a company
• Total number of players who participated in a competition
• Days in a week
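Since discrete data are countable, a frequency table is a natural summary, as mentioned above. The Python sketch below builds one with collections.Counter from made-up daily attendance counts and prints a text bar next to each value, imitating a simple bar graph.

```python
from collections import Counter

# Made-up data: number of students present in a class on ten different days.
students_present = [28, 30, 29, 30, 27, 30, 29, 28, 30, 29]

# Frequency table: each distinct whole-number value and how often it occurs.
frequency = Counter(students_present)

# Print the table in ascending order of value, with a text bar for each count.
for value in sorted(frequency):
    print(f"{value:>3} | {'#' * frequency[value]} ({frequency[value]})")
```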

Continuous Data

Continuous data are in the form of fractional numbers. It can be the version of an
android phone, the height of a person, the length of an object, etc. Continuous data
represents information that can be divided into smaller levels. The continuous
variable can take any value within a range.

The key difference between discrete and continuous data is that discrete data
contains integers or whole numbers, whereas continuous data can take fractional
values, which is how data such as temperature, height, width, time, and speed are
recorded.

Examples of Continuous Data:

• Height of a person
• Speed of a vehicle
• Time taken to finish the work
• Wi-Fi frequency
• Market share price
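Continuous values rarely repeat exactly, so they are usually summarized by grouping them into intervals (bins), which is what a histogram shows. A minimal Python sketch, assuming made-up heights and an arbitrary bin width of 10 cm:

```python
import math

# Made-up heights in centimetres (continuous data: any value within a range).
heights_cm = [151.2, 158.7, 160.0, 163.4, 165.9, 167.25, 171.8, 174.1, 179.6, 182.3]

# Group values into equal-width intervals [lower, lower + BIN_WIDTH).
BIN_WIDTH = 10.0  # assumed bin width; a different width gives a different histogram
bins = {}
for h in heights_cm:
    lower = math.floor(h / BIN_WIDTH) * BIN_WIDTH   # left edge of the bin containing h
    bins[lower] = bins.get(lower, 0) + 1

for lower in sorted(bins):
    print(f"[{lower:.0f}, {lower + BIN_WIDTH:.0f}) cm: {'#' * bins[lower]}")
```

The choice of bin width is a design decision: wider bins smooth out detail, narrower bins show more detail but may leave many bins nearly empty.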

Difference between Discrete and Continuous Data


| Discrete Data | Continuous Data |
|---|---|
| Discrete data are countable and finite; they are whole numbers or integers | Continuous data are measurable; they are in the form of fractions or decimals |
| Discrete data are represented mainly by bar graphs | Continuous data are represented in the form of a histogram |
| The values cannot be divided into smaller pieces | The values can be divided into smaller pieces |
| Discrete data have gaps between the values | Continuous data form a continuous sequence |
| Examples: total students in a class, number of days in a week, size of a shoe, etc. | Examples: temperature of a room, weight of a person, length of an object, etc. |

Data Science Process


Overview

Introduction:

Data science is a field that involves working with huge amounts of data, developing
algorithms, working with machine learning, and more, in order to arrive at business
insights. Several processes are involved in deriving insights from the source data,
such as data extraction, data preparation, model planning, and model building. The
image below depicts the various processes of Data Science.

Different steps
• Discovery: To begin with, it is very important to understand the various
specifications, requirements, priorities, and budget associated with the project.
You must be able to ask the right questions, such as whether you have the
required resources. These resources can be in terms of people, technology, time,
and data. In this stage, you also need to frame the business problem and
formulate initial hypotheses (IH) to test.
• Data Preparation: In this stage, you explore, preprocess, and condition the data
for modeling. You can perform data cleaning, transformation, and visualization.
This helps you spot outliers and establish relationships between the variables.
Once you have cleaned and prepared the data, it is time to do exploratory
analytics on it.
• Model Planning: Here, you determine the methods and techniques to draw the
relationships between variables. These relationships set the base for the
algorithms that you will implement in the next stage. You can apply Exploratory
Data Analytics (EDA) using various statistical formulas and visualization tools.
• Model Building: In this stage, you develop datasets for training and testing
purposes. You analyze various learning techniques such as classification,
association, and clustering, and finally implement the best-fitting technique to
build the model.
• Operationalize: In this stage, you deliver the final briefings, code, and technical
reports. In addition, a pilot project is implemented in a real-time production
environment. This gives you a clear picture of the performance and other related
constraints.
• Communicate Results: Now it is important to evaluate whether the goal has been
achieved. So, in the final stage, you identify all the key findings, communicate
them to the stakeholders, and determine whether the results of the project are a
success or a failure based on the criteria developed in Stage 1.

Machine Learning
Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can
learn from data, identify patterns and make decisions with minimal human
intervention.
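The idea that a system can “learn from data” and then make decisions can be illustrated with one of the simplest learning methods, the 1-nearest-neighbour classifier, chosen here purely as an example. No decision rule is written by hand: each prediction comes from the labelled examples, so adding or changing examples changes the decisions. The study-hours data and the predict function are hypothetical.

```python
import math

# Made-up labelled examples: (hours studied, hours slept) -> exam outcome.
training = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((2.5, 6.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((7.5, 8.0), "pass"),
    ((8.0, 6.5), "pass"),
]

def predict(point, examples):
    """Return the label of the training example closest to `point`
    (Euclidean distance), i.e. a 1-nearest-neighbour decision."""
    _, label = min(examples, key=lambda ex: math.dist(point, ex[0]))
    return label

print(predict((7.0, 7.0), training))
print(predict((1.5, 5.0), training))
```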
Relationship between Machine Learning and Data Science

If you are wondering how machine learning and data science are related, and why
the terms are not interchangeable, note that although the distinction between them
is not always clear, understanding each term gives you a clearer and deeper
understanding of the relationship between machine learning and data science.

It will also help you understand how closely they are connected.

While machine learning uses a variety of algorithms to parse and learn from data in
order to make accurate decisions, data science is a broad, interdisciplinary field
that interprets huge amounts of data and is used for a number of applications.

Data science involves the collection, management, analysis, and interpretation of
data, and presenting it visually.

Data science draws on related fields such as deep learning, machine learning, and
artificial intelligence to present meaningful insights from given data, including
exploratory data analytics and predictive analytics, to make accurate predictions
when large datasets are given.

This makes it clear that data science uses fields such as machine learning,
visualization, and statistics, to name a few.

Data science and machine learning are key skills that help businesses make
accurate decisions at the strategic level.

In simpler terms, one must understand that artificial intelligence works on the basis
of machine learning, and the data it collects is used as part of data science. This
gives you a better idea of the relationship between these two fields.
Machine learning is a critical part of data science. It uses statistics and algorithms
to analyze and work on data extracted from multiple sources.

When asked about the difference between Artificial Intelligence, Machine
Learning, Deep learning and Data Science, Mr. Omkar Raikar, CTO of Business
Toys Pvt Ltd, says:

“AI is enabling machines to think by using Machine Learning, which provides us
statistical & mathematical tools to explore and analyze data and deep learning
techniques, a subset of Machine learning, which is an attempt to mimic human
brain using Multi-Neural network architecture. Now this entire study of processes
involving one to solve data problems using Machine learning and deep learning is
Data Science”.

With this, one can say that data science is a field that merges algorithms derived
from machine learning to develop practical solutions while making use of domain
expertise, mathematics, and statistics.

Therefore, it stands as a one-stop term for merging machine learning and artificial
intelligence into one function.

It is crucial to keep in mind that machine learning and artificial intelligence are
parts of data science, through which its goals are achieved at a more advanced
level.

Data science is a very broad field that cannot focus solely on complex algorithms,
and this is why machine learning is a necessity for it.

Machine learning can be used for various projects, such as clustering (an
unsupervised technique) or regression (a supervised technique), minimizing human
error.
However, it cannot be denied that when it comes to searching for available patterns
and bringing a solid structure to big data, data science is revolutionizing the needs
of businesses across the globe.

It is necessary to understand that data science and machine learning cannot work
in isolation.

Both fields need to be integrated to achieve maximum results. A set plan might
give you the results you need, but without integrating these concepts, there is no
quick way to learn.

Data science helps you focus on what problems you need to solve, and machine
learning helps you in building real-world applications that facilitate you in solving
the problems you just recognized.
