Introduction to Data Analysis

Uploaded by farhan selamat


Definition of data analysis

Data analysis is the process of evaluating data using analytical and logical reasoning to examine each component of the data provided. This form of analysis is just one of the many steps that must be completed when conducting a research experiment. Data from various sources is gathered, reviewed, and then analyzed to form some sort of finding or conclusion.
Types of data analysis

- Descriptive Analysis
Descriptive analysis is an important first step for conducting statistical analyses. It gives you an idea of the distribution of your data, helps you detect outliers and typos, and enables you to identify associations among variables, thus preparing you for further statistical analysis.
- Predictive Analysis
Predictive analytics uses historical data to predict future events. Typically, historical data is used to build a mathematical model that captures important trends. That predictive model is then applied to current data to predict what will happen next, or to suggest actions to take for optimal outcomes.
- Prescriptive Analytics
Prescriptive analytics is the area of data analytics that focuses on finding the best course of action in a scenario given the available data.
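To make the predictive workflow concrete, here is a minimal Python sketch, assuming a hypothetical series of monthly sales figures and a simple straight-line model. It fits the trend to the historical data by ordinary least squares and then applies the fitted model to the next period. The data and the names `fit_trend` and `predict` are illustrative only; real predictive models are usually richer and must be validated before use.

```python
def fit_trend(ys):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def predict(model, t):
    """Apply a fitted (intercept, slope) model to time index t."""
    intercept, slope = model
    return intercept + slope * t


# Hypothetical historical data: monthly sales for months 0..5
history = [100, 104, 111, 115, 119, 126]
model = fit_trend(history)

# Use the model on the next, not-yet-observed month
next_month = predict(model, len(history))
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}, forecast={next_month:.2f}")
```

Here the fitted trend forecasts about 130.4 units for month 6. A straight line was chosen only because it is the simplest model that "captures important trends"; it is not the only option.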
Analysis Life Cycle

1. Problem identification
2. Hypothesis formulation
3. Data collection
4. Data exploration/preparation
5. Model building
6. Model validation and evaluation
1. Problem identification
- A problem is a situation that is judged to need correcting or solving.
- Problems can be identified through:
i) comparative/benchmarking studies
ii) performance reporting
iii) asking some basic questions:
a) Who is affected by the problem?
b) What will happen if the problem is not solved?
c) When and where does the problem occur?
d) Why is the problem occurring?
e) How are people currently handling the problem?
2. Hypothesis formulation
i) Frame the questions that need to be answered.
ii) Develop a comprehensive list of all possible issues related to the problem.
iii) Reduce the list by eliminating duplicates and combining overlapping issues.
iv) Use consensus building to narrow it down to a list of major issues.
3. Data collection
i) Using data that has already been collected by others (secondary data)
ii) Systematically selecting and observing characteristics of people, objects, and events (observation)
iii) Orally questioning respondents, either individually or as a group (interviews)
iv) Collecting data based on answers provided by respondents in written form (questionnaires)
4. Data exploration/preparation
i) Importing data
ii) Variable identification
iii) Data cleaning
iv) Summarizing data
v) Selecting a subset of data
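The five exploration steps can be sketched in Python using only the standard library. This is an illustrative sketch: the small CSV data set is made up, and the rule used for variable identification (treat a column as quantitative if most of its non-empty values parse as numbers) is a simple heuristic assumed here, not a standard method.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical raw data standing in for a real file; note the missing
# age for Siti and the mistyped age for Raj.
RAW = """name,age,city
Ali,34,Kuala Lumpur
Siti,,Penang
Tan,29,Penang
Raj,abc,Kuala Lumpur
Mei,41,Penang
"""


def parses_as_number(value):
    """Return True if the string can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# i) Importing data
rows = list(csv.DictReader(io.StringIO(RAW)))

# ii) Variable identification (assumed heuristic: quantitative if most
# non-empty values in the column parse as numbers)
types = {}
for col in rows[0]:
    values = [r[col] for r in rows if r[col]]
    numeric = sum(parses_as_number(v) for v in values)
    types[col] = "quantitative" if numeric > len(values) / 2 else "qualitative"
quantitative = [c for c, t in types.items() if t == "quantitative"]

# iii) Data cleaning: drop rows with missing or non-numeric quantitative
# values, then convert the remaining quantitative values to numbers
clean = [r for r in rows if all(parses_as_number(r[c]) for c in quantitative)]
for r in clean:
    for c in quantitative:
        r[c] = float(r[c])

# iv) Summarizing data
ages = [r["age"] for r in clean]
summary = {
    "n": len(ages),
    "mean_age": statistics.mean(ages),
    "median_age": statistics.median(ages),
    "city_counts": Counter(r["city"] for r in clean),
}

# v) Selecting a subset of data
penang = [r for r in clean if r["city"] == "Penang"]

print(types)
print(summary)
print([r["name"] for r in penang])
```

Dropping incomplete rows is only one cleaning choice; depending on the study, missing values might instead be imputed or followed up with the respondent.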
5. Model building
- Model building is a highly iterative process, because there is no such thing as a final, perfect solution.
- Many machine learning and statistical techniques are available on traditional technology platforms.
Population

- The entire group of individuals is called the population.
- For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for the population of third-grade children.
Sample

- Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
A census is a list of all individuals in a
population along with certain characteristics of
each individual.

A pilot study is a study done before the actual fieldwork is carried out. It is also used to test questionnaires and to improve them in terms of flow, question design, language, and clarity.

A sample survey, on the other hand, involves a subgroup (or sample) of the population being chosen and questioned on a set of topics. The results of a sample survey are usually used to make inferences about the larger population.
A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n is equally likely to be selected. The sample is then called a simple random sample.
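In Python, `random.sample` draws items without replacement so that every subset of the requested size is equally likely, which matches the definition of a simple random sample. The sketch below uses a hypothetical population of 10 student IDs; `math.comb` counts how many distinct samples of size n exist.

```python
import math
import random

# Hypothetical population of N = 10 student IDs: S01, S02, ..., S10
population = [f"S{i:02d}" for i in range(1, 11)]
n = 3

# Number of distinct samples of size n; under simple random sampling each
# of them has the same probability, 1 / C(N, n), of being the one drawn.
num_samples = math.comb(len(population), n)
print(num_samples)  # 120

# random.sample draws n items without replacement; a seeded generator is
# used only so that the run is reproducible.
rng = random.Random(42)
srs = rng.sample(population, n)
print(srs)
```

With N = 10 and n = 3 there are 120 possible samples, so each has a probability of 1/120 of being the one drawn.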
Notation

- The individual measurements or scores obtained for a research participant will be identified by the letter X (or X and Y if there are multiple scores for each individual).
- The number of scores in a data set will be identified by N for a population or n for a sample.
- Summing a set of values is a common operation in statistics and has its own notation. The Greek letter sigma, Σ, will be used to stand for "the sum of." For example, ΣX identifies the sum of the scores.
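In code, ΣX is simply the sum over the list of scores, and n is the number of scores. The scores below are hypothetical.

```python
# Hypothetical scores X for a sample of n = 5 research participants
X = [3, 1, 7, 4, 5]

n = len(X)      # number of scores (n for a sample, N for a population)
sum_X = sum(X)  # ΣX, "the sum of" the scores
print(n, sum_X)  # 5 20
```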
EXAMPLE Parameter versus Statistic

Suppose the percentage of all students on your campus who have a job is 84.9%. This value represents a parameter because it is a numerical summary of a population.
Suppose a sample of 250 students is obtained, and from this sample we find that 86.3% have a job. This value represents a statistic because it is a numerical summary based on a sample.
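The distinction can be illustrated with a small simulation. This sketch assumes a hypothetical campus of 10,000 students in which exactly 84.9% have a job; the parameter is computed from the whole population, while the statistic comes from a simple random sample of 250.

```python
import random

# Hypothetical campus: N = 10,000 students, 8,490 of whom have a job,
# so the population parameter is exactly 84.9%.
N = 10_000
population = [True] * 8_490 + [False] * (N - 8_490)
parameter = sum(population) / N

# Statistic: the proportion with a job in a simple random sample of 250.
rng = random.Random(1)  # seeded only for reproducibility
sample = rng.sample(population, 250)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.1%}, statistic = {statistic:.1%}")
```

Changing the seed changes the sample and therefore the statistic, while the parameter stays fixed at 84.9%.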
Data

- The measurements obtained in a research study are called the data.
- The goal of statistics is to help researchers organize and interpret the data.
Some Characteristics of Data
- Not all data is the same. What can and cannot be done with a data set depends on the characteristics of the data.
- Some key characteristics that must be considered are:
  - Continuous vs. discrete
  - Grouped vs. individual
  - Scale of measurement
Variables

- A variable is a characteristic or condition that can change or take on different values.
- Most research begins with a general question about the relationship between two variables for a specific group of individuals.

Variables are the characteristics of the individuals within the population.

Key Point: Variables vary. Consider the variable height. If all individuals had the same height, then obtaining the height of one individual would be sufficient to know the heights of all individuals. Of course, this is not the case. As researchers, we wish to identify the factors that influence variability.
Types of Variables

- Variables can be classified as discrete or continuous.
- Discrete variables (such as class size) consist of indivisible categories, and continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.
Qualitative or categorical variables allow for classification of individuals based on some attribute or characteristic.

Quantitative variables provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of the quantitative variable and provide meaningful results.
EXAMPLE Distinguishing between Qualitative and Quantitative Variables

Researcher Elisabeth Kvaavik and others studied factors that affect the eating habits of adults in their mid-thirties. (Source: Kvaavik E, et al. Psychological explanatory factors of eating habits among adults in their mid-30's (2005). International Journal of Behavioral Nutrition and Physical Activity 2:9.) Classify each of the following variables considered in the study as qualitative or quantitative.
a. Nationality: Qualitative
b. Number of children: Quantitative
c. Household income in the previous year: Quantitative
d. Level of education: Qualitative
e. Daily intake of whole grains (measured in grams per day): Quantitative
A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term “countable” means the values result from counting, such as 0, 1, 2, 3, and so on.

A continuous variable is a quantitative variable that has an infinite number of possible values and can be measured to any desired level of accuracy. For example, 1, 1.43, and 3.1415926 are all acceptable values.
Geographic examples: distance, tree height, amount of precipitation, etc.
EXAMPLE Distinguishing between Discrete and Continuous Variables

a. Number of children: Discrete
b. Household income in the previous year: Continuous
c. Daily intake of whole grains (measured in grams per day): Continuous
Measuring Variables
- To establish relationships between variables, researchers must observe the variables and record their observations. This requires that the variables be measured.
- The process of measuring a variable requires a set of categories called a scale of measurement and a process that classifies each individual into one category.
Scales of Measurement

- The data used in statistical analyses can be divided into four types:
1. The Nominal Scale
2. The Ordinal Scale
3. The Interval Scale
4. The Ratio Scale
- As we progress through these scales, the types of data they describe have increasing information content.
The Nominal Scale
- Nominal scale data are data that can simply be broken down into categories, i.e., having to do with names or types.
- Dichotomous or binary nominal data has just two types, e.g., yes/no, female/male, is/is not, hot/cold, etc.
- Multichotomous data has more than two types, e.g., vegetation types, soil types, counties, eye color, etc.
- Not a scale in the sense that categories cannot be ranked or ordered (no greater/less than).
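Because nominal categories have no order, about the only meaningful summaries are counts of each category and the most frequent category (the mode). A minimal sketch with hypothetical eye-color data, using `collections.Counter`:

```python
from collections import Counter

# Hypothetical nominal data: eye color of 8 people
eye_color = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

counts = Counter(eye_color)                  # frequency of each category
mode, mode_count = counts.most_common(1)[0]  # most frequent category
print(counts)
print(mode, mode_count)  # brown 4
```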
The Ordinal Scale
- Ordinal scale data can be categorized AND can be placed in an order, i.e., categories can be assigned a relative importance and ranked, so that higher category values indicate a higher rank.
  - e.g., star-system restaurant rankings: 5 stars > 4 stars, 4 stars > 3 stars, 5 stars > 2 stars
- BUT ordinal data still are not scalar in the sense that differences between categories do not have a quantitative meaning.
  - e.g., a 5-star restaurant is not necessarily superior to a 4-star restaurant by the same amount that a 4-star restaurant is superior to a 3-star restaurant.
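For ordinal data, ordering is meaningful, so sorting and the median are valid summaries, whereas differences and means are questionable because the gaps between categories are not known to be equal. In the sketch below (hypothetical ratings), `statistics.median_low` is used so the result is always one of the observed categories rather than the average of two middle values.

```python
import statistics

# Hypothetical restaurant star ratings (ordinal: 1 < 2 < 3 < 4 < 5)
ratings = [5, 3, 4, 4, 2, 5, 3, 4]

ranked = sorted(ratings)                        # ordering is meaningful
median_rating = statistics.median_low(ratings)  # always an observed category
print(ranked)         # [2, 3, 3, 4, 4, 4, 5, 5]
print(median_rating)  # 4
```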
The Interval Scale
- Interval scale data take the notion of ranking items in order one step further, since the distances between adjacent points on the scale are equal.
- For instance, the Fahrenheit scale is an interval scale, since each degree is equal but there is no absolute zero point (0°F does not mean an absence of temperature).
- This means that although we can add and subtract degrees (100° is 10° warmer than 90°), we cannot multiply values or create ratios (100° is not twice as warm as 50°).
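A quick way to see why ratios fail on an interval scale is to convert the same temperatures to another interval scale. In this sketch (using the temperatures from the example above), the 10° difference converts consistently to Celsius, but the apparent 2:1 ratio between 100°F and 50°F does not survive the conversion.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


hot, warm, mild = 100.0, 90.0, 50.0

# Differences carry over: a 10°F gap is always a 50/9 ≈ 5.56°C gap.
print(hot - warm, f_to_c(hot) - f_to_c(warm))

# Ratios do not: 100/50 = 2 in Fahrenheit, but the same two temperatures
# are about 37.8°C and 10°C, a ratio of about 3.78.
print(hot / mild, f_to_c(hot) / f_to_c(mild))
```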
The Ratio Scale
- Similar to the interval scale, but with the addition of a meaningful zero value, which allows us to compare values using multiplication and division, e.g., precipitation, weights, heights, etc.
- e.g., rain: we can say that 2 inches of rain is twice as much rain as 1 inch of rain because this is a ratio scale measurement.
- e.g., age: a 100-year-old person is indeed twice as old as a 50-year-old one.
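By contrast, a ratio-scale comparison does not depend on the unit chosen. This minimal sketch converts the rainfall amounts from the example above from inches to centimeters (1 inch = 2.54 cm exactly) and checks that the 2:1 ratio is preserved.

```python
CM_PER_INCH = 2.54  # exact by definition

rain_a, rain_b = 2.0, 1.0  # inches of rain

# Zero inches means "no rain", so the ratio is the same in any unit.
ratio_inches = rain_a / rain_b
ratio_cm = (rain_a * CM_PER_INCH) / (rain_b * CM_PER_INCH)
print(ratio_inches, ratio_cm)  # 2.0 2.0
```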