Biostatistics Book


Asst.Prof.dr.sufian M.salih 2018

Engineering statistics
Asst.Prof.dr.sufian M.salih
2018


PART 1

INTRODUCTION AND BASIC CONCEPTS

1.1. Introduction
Statistics (common usage): Interpretable figures obtained by recording the results of a specific matter, such as production, consumption, population, health, education, traffic or the economy (its size, assets, distribution, and so on), are called statistics. This is the definition most frequently encountered, and the one usually meant in the visual and written media.

Statistics (scientific): Statistics is the art of describing data. It allows decisions and predictions about the future to be made from existing information. It is the scientific method that covers the planning and implementation of research, the collection of the data, the summarizing and evaluation of the data obtained, and the analyses and forecasts drawn from them. This definition is of interest mainly to researchers, so it is the one used by university researchers and research institutions.

1.1.1. Classification of Statistics


Descriptive / Explanatory statistics: Comprises methods that summarize a mass of raw data and present the results in an easily understood form. It uses the deductive method of logic and relies on tables and graphs to summarize the data.

Analytical / Computational statistics: Comprises principles of estimation and analysis applied to data obtained from samples. It uses the inductive method of reasoning.

According to field of use: The collection of statistical methods used to evaluate the results of research in the health, biological and agricultural sciences is known as biostatistics or biometrics.

1.1.2. Some important terms used in Statistics / concepts


The Data: Figures and symbols obtained as the result of observation, counting or measurement.

Extreme, abnormal or biased observations: One or more data values that are markedly larger or smaller than the rest, and that therefore raise or lower the mean substantially, are called abnormal or aberrant observations.


The Rate (ratio): The proportion between two values of the same unit. Examples: income-expenditure ratio, birth-death ratio, export-import ratio, etc.

Percent (%): A ratio expressed as a percentage by multiplying it by 100.

Per thousand: If the ratio is too small, it is multiplied by 1000 to express it per thousand.

Velocity: A quantity that relates two variables with different units. Price = Money/Goods; Velocity = Distance/Time, etc.

Population: The community that contains all elements carrying the characteristic to be examined. The terms universe and main mass are also used.

Parameter: Values calculated over all elements of the population, such as the mean (µ, mu), the variance (σ², sigma squared) and the regression coefficient (β, beta), are called parameters.

Sample: A group of elements drawn at random from the population that represents the population in quality and quantity. The basis of sampling is random selection. Because of limits on manpower, funding, instruments and equipment, research is usually carried out on samples.

Statistics (Calculation/Estimate): Values calculated from sample data, such as the mean ($\bar{X}$), the variance (S²), the standard deviation (S), the correlation coefficient (r) and the regression coefficient (b), are called statistical estimates. It follows from this definition that any statistic is an estimate, or estimator, of a population parameter.

Hypothesis: A claim put forward on any matter is called a hypothesis.

Parametric: Tests and estimates that use the mean, variance and ratio.

Non-parametric: Tests and estimates based on ranks and signs.

Unit values and measurement accuracy: If the numerical data consist of whole numbers such as 3, 5, 10, the unit value is 1; if they consist of decimal numbers such as 0,3; 0,5; 10,2, the unit value is 0,1; for data recorded to hundredths it is 0,01. These values are defined as the measurement accuracy.

Variable: A characteristic whose values are obtained as data through observation, counting, measuring or evaluation. Variables are generally denoted by the last letters of the alphabet, such as x, y, z, or by abbreviated words. Variables are divided into two types.

1. Discrete variables: If the data being examined are qualitative or are counts, and can take values only at isolated points on the number line, they are called discrete variables. Discrete variables are usually obtained by counting or classification.

Example
Health condition                    : Sick – Healthy
Gender                              : Female – Male
Quality                             : First class, second class, third class
Number of pens in pocket            : 5, 7, 12
Number of harmful (in Petri dish)   : 50, 75, 67
Number of children in the family    : 2, 3, 4

2. Continuous variables: If the data being examined are quantitative and can take a value at any point on the number line, they are called continuous variables. Continuous variables are obtained by measuring and weighing. Examples: length (177,5 cm; 182,3 cm; 190 cm), body weight (60 kg, 55 kg), volume, area and time.

Increase/Decrease Ratio: Expresses, in percent (%), how much a variable changes over a certain period of time. The formula is missing from this copy; in its usual form it is

Change ratio (%) = (final value − initial value) / initial value × 100

If the result is positive it is an increase ratio; if the result is negative it is a decrease ratio.
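A minimal sketch of this calculation, assuming the common definition (final − initial) / initial × 100, under which a positive result means an increase and a negative result a decrease. The function name `change_ratio` and the figures are made up for illustration and do not come from the text.

```python
def change_ratio(initial, final):
    """Percent change from initial to final (assumes initial is non-zero)."""
    return (final - initial) / initial * 100


# Hypothetical figures: 80 cases in the first period, 100 in the second.
print(change_ratio(80, 100))   # positive result: an increase ratio

# Hypothetical figures: 80 cases in the first period, 60 in the second.
print(change_ratio(80, 60))    # negative result: a decrease ratio
```

The sign of the result carries the direction of the change, so the same function serves for both increase and decrease questions.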

1.1.4. Measurement Scales (Scales)

Measurement: The representation of the variable under study, obtained by observing, counting, weighing, etc., with symbols, especially numerical symbols. Measurement determines the degree to which individuals or objects possess a certain characteristic and expresses the result with symbols, numbers in particular. Measurement is a process of description (identification).


1.1.5. Different scales are used according to the features of the variable


1. Naming (Classification-Grouping) scale: Elements that share the same features and characteristics are gathered into a group. It is used for qualitative variables. The elements being measured are grouped and expressed by the number that fall into each group. By counting the elements in each group, the frequency distribution and the mode can be found. More advanced statistical procedures are not applicable. Example: development vigor of a living organism: weak, medium, strong.

2. Ranking (Rank) scale: Ranking usually follows grouping. Objects are put in order according to the degree to which they have a particular property. Among objects of similar character, the most outstanding is ranked 1st, followed by 2nd, 3rd, 4th, down to the weakest. After ranking, the size of the difference between objects loses its importance; what matters is which has more or less, or which is larger or smaller. The data are classified in the form of rankings. Example: product quality: 1st quality, 2nd quality, etc.

3. Interval scale: The interval scale indicates the amount of the difference between objects, so addition and subtraction can be carried out. A wide range of statistical procedures can be applied. The scale is built from an arbitrary, relative starting point, or from two points, with the interval between them divided into equal parts (such as the Celsius and Fahrenheit scales for temperature measurement). Thermometers are examples of interval scales.

4. Ratio scale: This is the highest-level scale. Its only difference from the interval scale is that it has a starting point that indicates absolute absence. Every scale with a true starting point (zero point) is a ratio scale, and its data are expressed as exact quantities. Variables measured on this kind of scale are quantitative.

The ratio scale is the most comprehensive type of scale; all arithmetic operations and statistical techniques can be applied to data obtained on it. Examples: length, area, time, weight, volume and density measurements.

1.1.6. Conducting research and data collection

Research carried out within a planned framework should observe the following.
• The sample size (number of replications) should be sufficient.
• Objectivity and impartiality should be maintained at all stages of the research.
• Tools and equipment should be appropriate to the research and measure accurately.
• The members and workers of the research should be trained, impartial and know what to do.
• Data must be recorded with attention to the precision of weighing or measuring.

1.2. Sample Questions


Q1. Mark each statement below as true (T) or false (F) in the brackets.
( ) The variance is denoted S² in the population and σ² in the sample.
( ) The mean and variance calculated over the population elements are called parameters.
( ) The number of people injured in traffic accidents is a continuous variable.
( ) The number of patients with AIDS in a region is a discrete variable.
Q2. In Isparta State Hospital, 4000 children are born and 1000 die. What is the birth/death ratio of this hospital?

Q3. Enter the variable type and measurement scale for the following variables.
Variable                 Variable Type     Scale Type
Health condition
Success condition
Hematocrit value
Height
Blood pressure values
Body temperature
Pulse value

Q4. Explain the meaning of data and variable.

Q5. Explain the meaning of statistics, population, parameter and sample.
Q6. What is a variable? What are its types?
Q7. A state hospital used to receive 200 polio patients. After 7 years of vaccination, this number decreased to around 60 patients. What is the decrease ratio of polio patients?
Q8. A regional state hospital used to receive 450 rubeola (measles) patients a year. After 3 years of study and research, the number of patients with this disease decreased to around 30. What is the decrease ratio of rubeola disease?

Q9. Fill in the blanks below with the correct answer.

• Figures, symbols, signs and similar values obtained as the result of observation, counting and measuring are called …………..
• The smaller group of elements selected at random from the population to represent it in quality and quantity is called ................

PART 2

Identification / Descriptive statistics

Descriptive statistics is concerned with making the raw data obtained from research clear by summarizing and interpreting them with suitable, simple methods. These methods can be divided into two main groups: tables and figures (graphs).

2.1. Tables
a) Special tables
b) Frequency tables

Researchers can use suitable special tables to present their research results. These tables generally contain statistics such as the mean and standard deviation for specific characteristics. However, such tables do not show some features of the data; where the frequency distribution of the data is of interest, frequency tables and graphs are more appropriate. Frequency is the number of times a value is repeated.

2.1.1. Frequency tables


Preparing frequency tables
The first step in creating a frequency table is to determine the number of classes.
1. Number of Classes (NC): This is guided by the researcher's needs and depends on the nature and number of the data; it can be set between 6 and 15. The researcher may choose any number of classes that allows the research data to be examined properly; there is no strict limit. If the number of classes is too small, the data are over-summarized and information is lost; if it is too large, the table becomes cluttered and difficult to interpret. The number of classes is therefore usually set at around 10.
For discrete data no special procedure is needed; the natural values of the data are used as classes. Examples are the number of patients referred to a service and the number of seeds in an apple.

The number of classes can also be determined by Sturges' rule:
NC = 1 + 3,32*log(n)
where n is the number of data and log is the base-10 logarithm. If the result is a decimal, it should be rounded to an integer.
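Sturges' rule can be sketched in a few lines. The function name `sturges_classes` is hypothetical, and rounding to the nearest integer is an assumption based on the instruction to round a decimal result.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule, rounded to the nearest integer."""
    return round(1 + 3.32 * math.log10(n))


# For 70 observations the rule suggests about 7 classes.
print(sturges_classes(70))
```

The rule grows slowly with n, which is why the result usually stays inside the 6 to 15 range mentioned above for typical sample sizes.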


2. Change Width / Range (DG): Maximum value − minimum value.

3. Class Range (CR): DG/NC. This value is the width of each class, that is, the difference between the lower limits of consecutive classes. If the class range is a decimal, it should be rounded up to an integer to make calculations easier.
4. Class Lower Limit (LL): The minimum value of the class. For convenience, the minimum value of the data can be taken as the lower limit of the first class. The lower limits of the other classes are found by repeatedly adding the class range. The lower limit of the first class cannot be greater than the minimum value.
5. Class Upper Limit (UL): The maximum value of the class. The upper limit of the first class is obtained by subtracting one unit from the lower limit of the second class. The upper limits of the other classes are found by adding the class range. The upper limit of the last class cannot be lower than the maximum value. The class limits are used to place the data in the frequency table.
6. Class Boundaries: Found by subtracting half of the measurement accuracy from the lower limit of each class and adding it to the upper limit. Class boundaries are used for drawing graphs and in calculating the median and mode, described in the part on measures of location and distribution.
7. Frequency (F): The number of data values between the lower and upper limits of each class. It shows how densely the data fall in each class and so gives information about the distribution of the data. Together with the class values, frequencies allow the mean and the variance to be estimated with a close approximation to the actual values.
8. Class Value (CV): The mean of the class limits, (LL+UL)/2. The class value represents all the values in its class. If the classes are too wide, the class value may represent its class poorly; this is a known disadvantage of frequency tables. Class values are used to estimate the true mean and variance.
9. Relative Frequency (RF): The frequency of each class as a percentage of the total frequency. It is sometimes easier to interpret than the actual frequency. It is found by dividing the class frequency by the total frequency and multiplying by 100.
10. Incremental Frequency (IF) and Incremental Relative Frequency (IRF): Sometimes the number or percentage of values below or above a given class is needed for interpretation. Only the "less than" form is considered here. Incremental frequencies are found by successively adding the class frequencies. The IRF percentage is found by dividing these frequencies by the total frequency and multiplying by 100.


Sample 1: The heights of 70 children were measured and are given below. Summarize them in a frequency table.

Values not in order             |  Values in order (read down each column)
 98 107  96 102 105 103  98     |   90  96 100 102 104 106 110
103 108  93 104  98 101 106     |   91  97 100 102 104 107 110
 96 100  97  91 103  93 102     |   92  97 100 103 104 107 110
101 103  92 100 106  99 114     |   93  98 100 103 105 107 111
104 104  97 105 109 109 102     |   93  98 101 103 105 107 111
113 111 104  99  95 112 105     |   94  98 101 103 105 108 112
106  96 104 103 111 100 107     |   94  99 101 103 105 108 113
 99 101 105 110 108 110 103     |   95  99 101 103 106 109 113
102  94  90 101  94 110 107     |   96  99 102 104 106 109 114
107 109 100 106  99 114 113     |   96  99 102 104 106 109 114

Unsorted data are difficult to interpret as they stand, and with a larger number of data it would be almost impossible. Sorting the data makes some simple interpretation possible: the shortest and tallest children appear, and repeated values are immediately visible. Summarizing the data in a frequency table allows much better interpretation.

Let us follow the steps above to create a frequency table.

NC = 1 + 3,32*log(n) = 1 + 3,32*log(70) = 7,13 ≈ 7 classes.
DG = 114 − 90 = 24; CR = 24/7 = 3,43 ≈ 4
The lower limit of the first class is the smallest value, 90. The lower limits of the other classes are found by adding 4 (the class range) each time. Subtracting 1 from the lower limit of the 2nd class (94) gives the upper limit of the 1st class, 93. Adding 4 repeatedly to this value gives the upper limits of the other classes.

Height is given as an integer, so the unit value of the data (measurement accuracy) is 1. Class boundaries are found by subtracting half of the unit value (0,5) from the lower limit and adding it to the upper limit.

The data are then tallied against the class limits: for each value, a tally mark is written in the row of the class into which it falls. The number of tally marks in a class is its frequency. The tally marks also show the shape of the distribution of the data.

The mean of the class limits, (90+93)/2 = 91,5, gives the class value.

To find the relative frequency, each class frequency is divided by the total frequency: (5/70)*100 = 7,1. The frequencies are added successively to find the incremental frequencies, which are then divided by 70 and multiplied by 100 to give the incremental relative frequencies, for example (13/70)*100 = 18,6 for the 2nd class.

Class limit   Class border      Frequency   Tally                  SD      NF (%)   EF   ENF (%)
90 – 93       89,5 – 93,5       5           /////                  91,5    7,1      5    7,1
94 – 97       93,5 – 97,5       8           ////////               95,5    11,4     13   18,6
98 – 101      97,5 – 101,5      15          ///////////////        99,5    21,4     28   40,0
102 – 105     101,5 – 105,5     19          ///////////////////    103,5   27,1     47   67,1
106 – 109     105,5 – 109,5     13          /////////////          107,5   18,6     60   85,7
110 – 113     109,5 – 113,5     8           ////////               111,5   11,4     68   97,1
114 – 117     113,5 – 117,5     2           //                     115,5   2,9      70   100,0

SD: class value; NF: relative frequency; EF: incremental frequency; ENF: incremental relative frequency.

Examining this table, it can be seen immediately that the data are almost symmetrically distributed and concentrated around a mean of about 102 cm. The number of data in particular intervals can also be interpreted.
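The construction of this table can be sketched in code. This is an illustration under stated assumptions, not the author's procedure: the function `frequency_table` and its field names are hypothetical, and the 70 heights are entered as value-to-count pairs read off the sorted data of Sample 1.

```python
# Heights of the 70 children as value: count pairs (from the sorted data).
counts = {90: 1, 91: 1, 92: 1, 93: 2, 94: 2, 95: 1, 96: 3, 97: 2, 98: 3,
          99: 4, 100: 4, 101: 4, 102: 4, 103: 6, 104: 5, 105: 4, 106: 4,
          107: 4, 108: 2, 109: 3, 110: 3, 111: 2, 112: 1, 113: 2, 114: 2}
heights = [value for value, c in counts.items() for _ in range(c)]


def frequency_table(data, first_lower, width, n_classes, unit=1):
    """Build class limits, boundaries, class values and frequencies."""
    total = len(data)
    rows, cumulative = [], 0
    for k in range(n_classes):
        lower = first_lower + k * width          # class lower limit
        upper = lower + width - unit             # class upper limit
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - unit / 2, upper + unit / 2),
            "class_value": (lower + upper) / 2,
            "f": freq,                                        # frequency
            "rf": round(freq / total * 100, 1),               # relative, %
            "cf": cumulative,                                 # incremental
            "crf": round(cumulative / total * 100, 1),        # incremental, %
        })
    return rows


table = frequency_table(heights, first_lower=90, width=4, n_classes=7)
for row in table:
    print(row)
```

The `unit` argument stands for the measurement accuracy, so the same sketch should work for data recorded to one decimal by passing `unit=0.1`, subject to the usual floating-point rounding.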

2.2. Figures and Charts


c) Histogram: A column chart with the class boundaries on the X axis and the frequencies on the Y axis is called a histogram. There is no space between the columns. Histograms give important visual information about the shape of the distribution and where the data are centered.
d) Polygon: A line graph drawn in a rectangular coordinate system by plotting the frequencies (Y axis) against the class values (X axis). In other words, it is constructed by connecting the midpoints of the tops of the histogram columns.
The histogram and frequency polygon prepared for the children's height example are shown in Graph 1.


Graph 1. Histogram and frequency polygon of the children's heights, with other descriptive statistics.

e) Other Graphs: Data obtained from research can be made more concise and understandable by drawing column, line, circle and similar graphs. Presenting results visually lets the reader grasp and interpret them faster. The graph should be chosen to suit the data.

Column graph: Column charts are appropriate for presenting more than one property over the same period.
Line graph: Line graphs are used to examine the change of a feature over time. Growth curves are drawn as line graphs; growth generally increases up to a certain time and then levels off.
Circle graph: A circle (pie) chart is more suitable for expressing the parts of a whole. An example of each of these three most common graphs is given below.

[Column chart. Y axis: 0 to 50. X axis: Illiterate, Primary School, Secondary School, High School, College. Series: User, Non User.]

Graph 2. Use of family planning by education level.

Graph 3. Growth curve for weight in children.

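[Pie chart. Slices: Eye, Internal Medicine, Orthopedics, Child, Psychiatry. Shares shown: 9%, 13%, 28%, 31%, 19%; the assignment of shares to slices cannot be recovered from this copy.]

Graph 4. Daily examination rates in the hospital's five services.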

2.3. Sample Questions


Q1. The pulse rates of 50 adults are listed below. Create a frequency table summarizing the pulse rates in 7 classes.
60 80 50 45 49 48 35 55 91 50
65 74 54 40 42 80 90 65 67 45
80 71 61 38 55 84 88 79 81 48
74 40 68 52 62 76 81 82 64 38
62 45 59 42 60 66 70 60 68 29
Q2. The class values and frequencies of hemoglobin measurements are given below. Using these data, complete the SS (class limits), NF (relative frequency), EF (incremental frequency) and ENF (incremental relative frequency) columns of the frequency table. Draw the histogram and frequency polygon from the resulting table.

SD = {10, 11, 12, 13, 14, 15}; F = {2, 8, 12, 16, 8, 4}
Q3. The ALES entrance grades of 80 people are given in the frequency table below. Fill in the blanks in the table.

Class Value   Frequency   Class limits   NF   EF
40            5
50            25
60            15
70            18
80            17

Q4. The systolic blood pressure values of 30 people are given below. Summarize these data in an appropriate frequency table. Xi: { 56 60 68 90 97 88 80 78 76 69
80 77 59 60 65 86 90 96 64 77 70 75 68 80 95 41 79 91 63 74}
Q5. In a frequency table the total frequency is 200. The frequency of the 4th class is 12 and its incremental frequency is 40; the incremental frequency of the 5th class is 64. Find the incremental relative frequency (IRF) and relative frequency (RF) of the 4th and 5th classes.
Q6. The numbers of patients in a hospital's clinics are given below. Choose the most appropriate graph and draw the distribution of the total number of patients across the clinics.

Obstetrics = 75, Internal medicine = 100, Otorhinolaryngology = 50, Eye = 125, Dermatology = 150

PART 3

CENTRAL TENDENCY / LOCATION AND CHANGE / DISTRIBUTION MEASURES

Measures that indicate the central point around which the data concentrate are called measures of central tendency (location measures); measures that show the variability of the data around this center are called measures of change (distribution). The methods of descriptive statistics (tables, figures and graphs) are often not enough to summarize research data; statistics that estimate central tendency and variability analytically are also required. The most commonly used measures of location and distribution are discussed in this part. A measure of location or of distribution alone is not enough to describe a population; the two should be considered together.

3.1. Central Tendency (Location) Measures

3.1.1. Arithmetic mean ($\bar{x}$)

When the word mean is used, the arithmetic mean is usually what first comes to mind. It is the most commonly used measure of the point where the data are centered. It is used especially in the analysis and evaluation of continuous data obtained by measuring and weighing. Because many other measures are built on the arithmetic mean, it is regarded as a strong measure.
Where the arithmetic mean is not used
1. It is usually not used for data obtained by counting and classification.
2. The arithmetic mean is strongly affected by extreme or abnormal observations. If removing such observations does not harm the research or hinder its interpretation, they can be dropped and the mean calculated from the remaining values. If they cannot be dropped, another location measure such as the mode or the median should be used instead of the mean.

The arithmetic mean is estimated by the following formula.

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum x_i}{n}$$

Example: The birth weights of five babies are given below. Find the arithmetic mean.

x: {3, 2, 4, 3.5, 2.5}

$$\bar{x} = \frac{3 + 2 + 4 + 3.5 + 2.5}{5} = 3 \text{ kg}$$

Significant properties of the mean

• The sum of the deviations from the mean is zero, and the sum of the squared deviations from the mean is a minimum.

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 \;\Rightarrow\; (3-3)+(2-3)+(4-3)+(3.5-3)+(2.5-3) = 0$$

and

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \text{minimum} \;\Rightarrow\; (3-3)^2+(2-3)^2+(4-3)^2+(3.5-3)^2+(2.5-3)^2 = 2.5$$

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - A)^2$$

where A is any value different from the mean. Whatever value is used for A in place of the mean (3), the result will always be greater than 2.5.

• If a fixed number A is added to or subtracted from every value, the mean increases or decreases by A.

$$y_i = x_i \pm A; \qquad \bar{y} = \bar{x} \pm A$$

x: {3, 2, 4, 3.5, 2.5} and A = 10; for the values $y_i = x_i + 10$, y: {13, 12, 14, 13.5, 12.5}

$$\bar{y} = 3 + 10 = 13$$

• If every value is multiplied by A, the mean is multiplied by A.

$$y_i = x_i \cdot A; \qquad \bar{y} = \bar{x} \cdot A$$

x: {3, 2, 4, 3.5, 2.5} and A = 10; for the values $y_i = x_i \cdot 10$, y: {30, 20, 40, 35, 25}

$$\bar{y} = 3 \cdot 10 = 30$$

• If every value is divided by A, the mean is divided by A.

$$y_i = x_i / A; \qquad \bar{y} = \bar{x} / A$$

x: {3, 2, 4, 3.5, 2.5} and A = 10; for the values $y_i = x_i / 10$, y: {0.3, 0.2, 0.4, 0.35, 0.25}

$$\bar{y} = 3/10 = 0.3$$

For data consisting of large values, these properties can be used to simplify the calculation of the mean.
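The properties of the mean listed here can be checked numerically on the five birth weights. This is only a sketch: the helper `sum_sq` and the alternative centers 2.8 and 3.4 are chosen arbitrarily for illustration.

```python
x = [3, 2, 4, 3.5, 2.5]
mean = sum(x) / len(x)


def sum_sq(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((v - center) ** 2 for v in values)


# Deviations from the mean cancel out.
print(sum(v - mean for v in x))

# The sum of squares around the mean, then around two other centers;
# the other centers always give a larger result.
print(sum_sq(x, mean), sum_sq(x, 2.8), sum_sq(x, 3.4))

# Adding 10 to every value shifts the mean by 10; multiplying scales it.
shifted = [v + 10 for v in x]
scaled = [v * 10 for v in x]
print(sum(shifted) / len(shifted), sum(scaled) / len(scaled))
```

Trying other values for the center is a quick way to see that the minimum of the sum of squares is reached only at the mean.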

3.1.2. The weighted mean

If the values to be averaged have different weights, the weighted mean is used. Examples are semester and diploma grade point averages. Because the classes of a frequency table have different weights (frequencies), the mean of a frequency table is also found as a weighted mean.

The weighted mean is estimated as follows:

$$\bar{X}_T = \frac{\sum_{i=1}^{n} t_i x_i}{\sum_{i=1}^{n} t_i} = \frac{\sum t_i x_i}{\sum t_i}$$

For the mean of a frequency table:

$$\bar{X}_{FT} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum f_i x_i}{\sum f_i}$$

Here t is the weight, f is the frequency and x is the value (for a frequency table, the class value).


Sample 1: The credits and grades of the courses taken by a student in one semester are given below. Calculate the semester grade point mean.

Lesson           Credit (t)   Point (x)   t*x
Statistics       2            80          160
Birth            10           60          600
Biochemistry     4            70          280
Total            16                       1040

$$\bar{X}_T = \frac{160 + 600 + 280}{16} = \frac{1040}{16} = 65$$


Sample 2: The mean of the frequency table from the children's height example (Sample 1 of Part 2):

Frequency (f)   SD (x)    f*x
5               91.5      457.5
8               95.5      764.0
15              99.5      1492.5
19              103.5     1966.5
13              107.5     1397.5
8               111.5     892.0
2               115.5     231.0
Total: 70                 7201

$$\bar{X}_{FT} = \frac{7201}{70} = 102{,}9$$
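Both samples use the same calculation, which can be sketched as a single helper. The function name `weighted_mean` is hypothetical; the numbers are the credits, grades, class values and frequencies from the two samples.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Sample 1: grades weighted by course credits.
print(weighted_mean([80, 60, 70], [2, 10, 4]))

# Sample 2: class values weighted by class frequencies.
class_values = [91.5, 95.5, 99.5, 103.5, 107.5, 111.5, 115.5]
frequencies = [5, 8, 15, 19, 13, 8, 2]
print(round(weighted_mean(class_values, frequencies), 2))
```

Treating frequencies as weights is what makes the frequency-table mean an estimate: every value in a class is replaced by its class value.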

3.1.3. Geometric Mean (GM)


Used to examine characteristics that increase as a geometric series over a certain time period. Geometric characteristics (bacterial growth, population growth and interest) generally increase exponentially over a given period. The geometric mean gives the growth rate of such a characteristic per unit time.

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod x_i}$$

Sample 1: The following data were obtained in a survey over a certain period of time. The geometric mean of these data is:

$$X_i: \{2, 3, 6, 10\} \qquad GM = \sqrt[4]{2 \cdot 3 \cdot 6 \cdot 10} \approx 4.4$$

Sample 2: A pot that starts with 100 bacteria is known to reach 3000 bacteria in 5 hours. What is the rate of increase per hour?

The compound interest formula is used in this type of assessment:

$$A = B(1+r)^t$$

where B is the starting amount, A is the amount after the given period, r is the rate of increase per unit of time and t is the number of time units.

$$3000 = 100(1+r)^5 \;\Rightarrow\; r = 0.97$$

The rate of increase per hour is 97%.
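The two samples can be reproduced with a short sketch. The function names `geometric_mean` and `growth_rate` are hypothetical; `growth_rate` simply solves A = B(1+r)^t for r.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def growth_rate(start, end, periods):
    """Solve end = start * (1 + r) ** periods for r."""
    return (end / start) ** (1 / periods) - 1


print(round(geometric_mean([2, 3, 6, 10]), 2))   # Sample 1
print(round(growth_rate(100, 3000, 5), 2))       # Sample 2
```

For long series, summing logarithms and exponentiating the average is a common alternative that avoids overflow in the product.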


3.1.4. Median and Mode


They are often used to evaluate data obtained by classification and counting, so the median and mode are used mostly with discrete data. They are also preferred when there are extreme observations, because these statistics are not affected by them. If one value is repeated many times in the data series, the mode should be used as the location measure; if repeated values are few, the median should be used.

Median (Med): The middle value. When the values are ordered from smallest to largest, the value in the middle of the series is called the median. The median is unaffected by extreme or biased observations. Its position depends on whether the number of data (n) is odd or even. If n is odd, the median is the (n+1)/2-th value. If n is even, the median is the mean of the (n/2)-th and (n/2)+1-th values.

Sample 1: The sample size (n) is odd. What is the median of the x variable?
xi: {60, 62, 58, 50, 100, 58, 60, 58, 58};
Ordered values; xi: {50, 58, 58, 58, 58, 60, 60, 62, 100}
When the data are examined, 50 and especially 100 appear to be abnormal values. Using the mean can be misleading in this case, but the median is not affected by these anomalous observations. The median is the value in the center, at position (9 + 1) / 2 = 5, which is 58.
Sample 2: Let us add one more value (65) to the data and determine the median again. The ordered data series now has 10 values:
xi: {50, 58, 58, 58, 58, 60, 60, 62, 65, 100}
The 10/2 = 5th value is 58 and the 10/2+1 = 6th value is 60. The median is the mean of these two values, (58+60)/2 = 59.
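The odd and even cases can be sketched as follows. The function name `median` is chosen for illustration; Python's standard library offers `statistics.median` with the same behavior.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the (n+1)/2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of n/2-th and (n/2+1)-th


print(median([60, 62, 58, 50, 100, 58, 60, 58, 58]))        # n = 9
print(median([60, 62, 58, 50, 100, 58, 60, 58, 58, 65]))    # n = 10
```

Replacing 100 by any larger number leaves both results unchanged, which is the insensitivity to extreme observations described above.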


For classified data taken from frequency tables, the median is found in a similar way, but it is estimated with a formula. The following formula is used to calculate the median of a frequency table:

$$Med = L + \frac{N/2 - F_b}{F_{med}} \cdot c$$

where L is the real lower limit (lower boundary) of the median class; N = ∑fi is the total number of observations; Fb is the total frequency of the classes before the median class; Fmed is the frequency of the median class; and c is the class interval.

The median class is the first class whose cumulative frequency reaches half of the total frequency. Let us examine the frequency table from the children's height example (Sample 1 of Part 2). The columns needed to calculate the median are given below. Half of the total frequency is 35, and the first class whose cumulative frequency includes it is the 4th class.

Class number   Class boundaries   Frequency   EF
1              89,5 – 93,5        5           5
2              93,5 – 97,5        8           13
3              97,5 – 101,5       15          28
4              101,5 – 105,5      19          47
5              105,5 – 109,5      13          60
6              109,5 – 113,5      8           68
7              113,5 – 117,5      2           70

Substituting these values in the formula:

$$Med = L + \frac{N/2 - F_b}{F_{med}} \cdot c = 101{,}5 + \frac{70/2 - 28}{19} \cdot 4 = 102{,}97$$

Med ≈ 103
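The grouped-median calculation can be sketched as a search for the first class whose cumulative frequency reaches N/2. The function name `grouped_median` is hypothetical, and equal class intervals are assumed.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of a frequency table: L + (N/2 - Fb) / Fmed * c."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    before = 0                      # Fb: total frequency before the median class
    for lower, freq in zip(lower_boundaries, frequencies):
        if before + freq >= half:   # first class whose cumulative frequency reaches N/2
            return lower + (half - before) / freq * width
        before += freq


lowers = [89.5, 93.5, 97.5, 101.5, 105.5, 109.5, 113.5]
freqs = [5, 8, 15, 19, 13, 8, 2]
print(round(grouped_median(lowers, freqs, 4), 2))
```

The formula interpolates linearly inside the median class, so it assumes that the observations are spread evenly between the class boundaries.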

Mode / Top value: the most repeated value in the data series. The data in the
most repeated value called mode.
According to the median example Xi: {50, 58, 58, 58, 58, 60, 60, 62, 65, 100}
mod of the series is 58. Becaouse this is the most repeated value.To calculate
mod from a freqeuncy table a formula is used.
d1
Mod = L  * c Here, L: Median class’s real lower limit; d1: The
d1  d 2
difference of Mod class’ses frequency between the previous class’es, d2: The

18
Asst.Prof.dr.sufian M.salih 2018

difference of Mod class’ses frequency between the next class’es, ve c: is the


interval of classes.
The modal class is the class with the highest frequency. Let's apply the mode formula to the frequency table given before in Part 1.

$$\text{Mod} = 101.5 + \frac{19 - 15}{(19 - 15) + (19 - 13)} \cdot 4 = 103.1$$
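
As a cross-check, the same calculation can be sketched in Python. The function name `grouped_mode` is a hypothetical helper, and treating a missing neighbouring class as frequency 0 is an assumption made only for this sketch.

```python
# Illustrative sketch: mode of grouped data, Mod = L + d1 / (d1 + d2) * c.
# grouped_mode is a hypothetical helper, not a library function.

def grouped_mode(lower_limits, freqs, width):
    k = max(range(len(freqs)), key=freqs.__getitem__)      # index of the modal class
    prev_f = freqs[k - 1] if k > 0 else 0                   # assumption: 0 if no previous class
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0      # assumption: 0 if no next class
    d1 = freqs[k] - prev_f
    d2 = freqs[k] - next_f
    return lower_limits[k] + d1 / (d1 + d2) * width

lower_limits = [89.5, 93.5, 97.5, 101.5, 105.5, 109.5, 113.5]
freqs = [5, 8, 15, 19, 13, 8, 2]
mode = grouped_mode(lower_limits, freqs, width=4)
print(round(mode, 1))  # 103.1
```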

Depending on the shape of the data distribution, there is a relationship between the mode, the median and the mean (ME):

Right skewness:   Mod < Med < ME
Symmetric:        Mod = Med = ME
Left skewness:    Mod > Med > ME

3.2. Measures of Variation (Dispersion)

3.2.1. Range (Width of Variation)

The range is the difference between the maximum and the minimum value. It is the simplest measure of dispersion and provides little information: it only shows how widely the data vary.

3.2.2. Variance
Variance indicates the deviation of the data from the mean. It is a measure of the variability in the data. The smaller the variance, the closer the data are to each other, that is, the smaller the deviations from the mean. The variance is the sum of the squared deviations from the mean divided by the degrees of freedom. The following formulas are used to calculate the variance;

For population variance;

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

and for sample variance;

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad S^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

According to the formulas, N: is the number of individuals in the population, n: is the number of individuals in the sample, µ: is the population mean and $\bar{x}$: is the sample mean.


Studies are usually carried out on samples, so only the sample variance is used in the examples. As the formula shows, the unit of the variance is the square of the unit of the data: when the values are squared, their units are squared too. As squared units (g², kg²) are not meaningful, units are not written with the variance. The denominator of the sample variance is called the degrees of freedom. For a sample the degrees of freedom is n-1.

Sample 1: The birth weights (kg) of five babies are given below. Calculate the variance.

X: {3, 2, 4, 3.5, 2.5}

Let's find it with both formulas. To use the first formula, the mean must be known.

$$\bar{X} = \frac{3 + 2 + 4 + 3.5 + 2.5}{5} = 3$$

$$S^2 = \frac{(3 - 3)^2 + (2 - 3)^2 + (4 - 3)^2 + (3.5 - 3)^2 + (2.5 - 3)^2}{5 - 1} = 0.63$$

$$S^2 = \frac{3^2 + 2^2 + 4^2 + 3.5^2 + 2.5^2 - \frac{(3 + 2 + 4 + 3.5 + 2.5)^2}{5}}{5 - 1} = 0.63$$
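
Both formulas can be verified with a few lines of Python. This is only a sketch; the variable names are illustrative, and `statistics.variance` from the standard library is used as an independent check because it also divides by n - 1.

```python
import statistics

weights = [3, 2, 4, 3.5, 2.5]   # birth weights from Sample 1
n = len(weights)
mean = sum(weights) / n

# First formula: squared deviations from the mean divided by n - 1
s2_definition = sum((x - mean) ** 2 for x in weights) / (n - 1)

# Second formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in weights) - sum(weights) ** 2 / n) / (n - 1)

print(mean, s2_definition, s2_shortcut)  # 3.0 0.625 0.625
print(statistics.variance(weights))      # 0.625
```

The unrounded value is 0.625, which is written as 0.63 above.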
To calculate the variance from a frequency table, the variance formulas given above are transformed into the following form.

For sample variance;

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad S^2 = \frac{\sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{n}}{n - 1}; \qquad n = \sum f_i$$
Sample 2: Let's calculate the variance for the frequency table given in Part 1 (SD(x) is the class midpoint);

Frequency (f)   SD (x)    f*x       f*x²        fi(xi − x̄)²
5               91.5      457.5     41861.3     638.45
8               95.5      764.0     72962.0     426.32
15              99.5      1492.5    148503.8    163.35
19              103.5     1966.5    203532.8    9.31
13              107.5     1397.5    150231.3    287.17
8               111.5     892.0     99458.0     605.52
2               115.5     231.0     26680.5     322.58
Total: 70       724.5     7201      743229.5    2452.7

$$S^2 = \frac{5 \cdot 91.5^2 + 8 \cdot 95.5^2 + \dots + 2 \cdot 115.5^2 - \frac{(5 \cdot 91.5 + 8 \cdot 95.5 + \dots + 2 \cdot 115.5)^2}{70}}{70 - 1} = 35.54$$

$$S^2 = \frac{743229.5 - \frac{(7201)^2}{70}}{70 - 1} = 35.54$$

Or

First the mean has to be calculated;

$$\bar{X}_{FT} = \frac{7201}{70} = 102.8$$

Then the variance is calculated with the formula given below;

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$

$$S^2 = \frac{5(91.5 - 102.8)^2 + 8(95.5 - 102.8)^2 + \dots + 2(115.5 - 102.8)^2}{70 - 1} = \frac{2452.7}{69} = 35.54$$
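
The frequency-table variance can be reproduced with the sketch below. The variable names are illustrative assumptions; the class midpoints and frequencies are those of Sample 2.

```python
# Illustrative sketch: variance from a frequency table (Sample 2).
freqs = [5, 8, 15, 19, 13, 8, 2]
midpoints = [91.5, 95.5, 99.5, 103.5, 107.5, 111.5, 115.5]

n = sum(freqs)                                               # n = sum of f_i
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f_i * x_i
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f_i * x_i^2

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(n, sum_fx, sum_fx2)                  # 70 7201.0 743229.5
print(round(mean, 2), round(variance, 2))  # 102.87 35.54
```

The unrounded mean is about 102.87; the notes write it as 102.8, and both routes give a variance of 35.54 to two decimals.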
Properties of variance
Its properties are similar to those of the mean.

• If a fixed number A is added to or subtracted from the values, the variance does not change; it stays the same.

$$y_i = x_i \pm A; \quad S_y^2 = S_x^2$$

• If the values are multiplied by a fixed number A, the variance is multiplied by the square of A.

$$y_i = x_i \cdot A; \quad S_y^2 = A^2 \cdot S_x^2$$

• If the values are divided by a fixed number A, the variance is divided by the square of A.

$$y_i = \frac{x_i}{A}; \quad S_y^2 = \frac{S_x^2}{A^2}$$
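
These three properties can be illustrated numerically. The sketch below reuses the Sample 1 data and picks A = 2 as an arbitrary constant (an assumption made only for the illustration).

```python
import statistics

x = [3, 2, 4, 3.5, 2.5]   # Sample 1 data
A = 2                     # arbitrary fixed number chosen for the illustration

s2 = statistics.variance(x)
s2_shifted = statistics.variance([v + A for v in x])   # adding A: unchanged
s2_scaled = statistics.variance([v * A for v in x])    # multiplying by A: times A^2
s2_divided = statistics.variance([v / A for v in x])   # dividing by A: divided by A^2

print(s2, s2_shifted, s2_scaled, s2_divided)  # 0.625 0.625 2.5 0.15625
```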

3.2.3. Standard Deviation


Because the unit of the variance is the square of the unit of the data, the variance is not reported with a unit. Research results, however, are usually given with their units. Therefore the standard deviation, which is obtained by taking the square root of the variance, may be used. The unit of the standard deviation is the same as the unit of the data. It describes how the data vary around the mean and is widely used. Commercial firms should state the standard deviation together with the mean of a product; products are usually introduced together with their mean and standard deviation.

Population standard deviation, $\sigma = \sqrt{\sigma^2}$; Sample's standard deviation, $S = \sqrt{S^2}$

Sample 1's standard deviation: $\sqrt{0.63} = 0.79$ kg.
Sample 2's standard deviation: $\sqrt{35.54} = 5.96$ cm.

Properties of standard deviation:

For a perfectly symmetric normal distribution;
µ ± 1 standard deviation covers about 68% of the values,
µ ± 2 standard deviations cover about 95% of the values,
µ ± 3 standard deviations cover about 99% of the values.


3.2.4. Standard Error


The standard deviation of the means of samples taken repeatedly from a population is defined as the standard error. It is the standard deviation divided by the square root of the sample size. The standard error is a more precise statistic, which is why it is more commonly used in scientific research. The standard deviation is used more for commercial products and the standard error more in scientific studies.

$$S_{\bar{x}} = \sqrt{\frac{S^2}{n}} = \frac{S}{\sqrt{n}}$$

Sample 1's standard error: $S_{\bar{x}} = \frac{0.79}{\sqrt{5}} = 0.35$ kg.

Sample 2's standard error: $S_{\bar{x}} = \frac{5.96}{\sqrt{70}} = 0.71$ cm.
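
A minimal Python sketch of the two steps, standard deviation and then standard error, starting from the variances of Sample 1 and Sample 2. The helper name `std_and_se` is an illustrative assumption.

```python
import math

def std_and_se(variance, n):
    """Return (standard deviation, standard error) for a sample variance and size n."""
    s = math.sqrt(variance)       # S = sqrt(S^2)
    return s, s / math.sqrt(n)    # standard error = S / sqrt(n)

s1, se1 = std_and_se(0.63, 5)     # Sample 1 (kg)
s2, se2 = std_and_se(35.54, 70)   # Sample 2 (cm)
print(round(s1, 2), round(se1, 2))  # 0.79 0.35
print(round(s2, 2), round(se2, 2))  # 5.96 0.71
```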

3.2.5. Coefficient of Variation


The coefficient of variation is a value that has no units and is expressed as a percentage. It represents the amount of deviation from the mean in percent: it is the standard deviation divided by the mean, multiplied by 100. The coefficient of variation is frequently used to judge the accuracy and reliability of a study. If the coefficient of variation is greater than 30%, the variability is too high and the cause should be investigated, because the level of credibility has been exceeded. In health research even a ratio of 5% - 10% could amount to a vital mistake.

$$VK = \frac{S}{\bar{x}} \cdot 100$$

Sample 1's variation coefficient: $VK = \frac{0.79}{3} \cdot 100 = 26\%$.

Sample 2's variation coefficient: $VK = \frac{5.96}{102.8} \cdot 100 = 6\%$.
When the variability of two different populations, or of measurements on different scales, is compared, the variance or the standard deviation can be misleading. In such cases, the coefficient of variation should be used.
For example; the following statistics are given for mothers and babies. Because the standard deviation of maternal weight is greater than the standard deviation of the babies' weight, the variation among mothers appears to be larger. When analyzed according to the coefficient of variation, however, it is seen that the real variation is higher in birth weight: birth weight deviates from its mean by 29%, while maternal weight deviates by only 15%.


            Mean weight   Standard deviation   VK
Mothers     65 kg         10 kg                (10/65)*100 = 15%
Babies      3.5 kg        1 kg                 (1/3.5)*100 = 29%
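
The comparison in the table can be written as a small Python sketch; `coefficient_of_variation` is a hypothetical helper name used only for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """VK = S / mean * 100, a unitless percentage."""
    return std_dev / mean * 100

vk_mothers = coefficient_of_variation(10, 65)   # kg
vk_babies = coefficient_of_variation(1, 3.5)    # kg
print(round(vk_mothers), round(vk_babies))      # 15 29

# The group with the larger standard deviation is not the more variable one here.
print(vk_babies > vk_mothers)                   # True
```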

3.2.6. Skewness and Kurtosis

Skewness Coefficient: It measures the degree of departure from the symmetry of the normal distribution. The coefficient of skewness shows whether the data are skewed to the right or to the left. It is estimated by the following formula. If the distribution is symmetric the coefficient is 0; a + (positive) value means that the distribution is skewed to the right and a - (negative) value means that the distribution is skewed to the left.

$$\alpha_3 = \frac{\sum (x - \mu)^3 / n}{\sigma^3}$$

Kurtosis Coefficient: Kurtosis gives information about the sharpness of the distribution of the data. It is estimated by the following formula. For the full normal distribution, which is neither sharp nor flat, this coefficient is 0; a + (positive) value means that the distribution is sharp and a - (negative) value means that the distribution is flattened.

$$\alpha_4 = \frac{\sum (x - \mu)^4 / n}{\sigma^4} - 3$$
3.3. Sample Questions
Q1. What is the relation between the mean, the mode and the median?
Q2. In winter, 15 pregnant women per day come to a clinic because of hyperemesis gravidarum; 4 days later the number has increased to 150 per day. What is the rate of spread of hyperemesis gravidarum?

Q3. Find the mode and median of the breathing rates given below.
Xi: (12, 12, 12, 14, 14, 13, 13, 16, 16, 20, 22, 24, 24, 24, 18, 18, 18, 18, 18, 18, 18)

Q4. The heart rate values of a fetus are given below. Find the measures of dispersion of the series (range DG, S², S and VK).
Xi: (120, 126, 134, 136, 140, 144, 148, 150, 154, 158, 162)

Q5. The body temperatures of 8 pregnant women are given below. Find the mean of this series.
Xi: (36.2, 36.5, 36.7, 36.8, 36.9, 36.3, 37, 37.2)

Q6. Head circumference measurements of newborns in a hospital are given in the table below. Accordingly, find the mean, the variance and the coefficient of variation of newborn head circumference.

Number of children         6    8    14   7    5
Head circumference (cm)    31   32   33   34   35

Q7. 328 patients with hepatitis are admitted to a state hospital per year. Following studies and research on this disease over 4 years, the number of patients decreased to 30. Find the rate of decrease in cases and interpret it.
Q8. The course credits and grades of a first-year student in a school of midwifery are given below. Calculate the student's GPA.

Course name   Anatomy   Genetics   Microbiology   Biochemistry   Psychology
Credit        4         3          2              4              2
Mark          94        88         72             96             68

Q9. What are the uses of the coefficient of variation?
Q10. For pregnant women at a health center, the mean height is 162 cm with a variance of 100, and the mean weight is 70 kg with a variance of 49. By calculating the coefficients of variation of height and weight, determine which has the greater variability.
Q11. During a period of population growth, 40 patients were coming to the delivery room; 2 days later the number of patients had increased to 600. Calculate the daily growth rate.
Q12. A colony count is done on a bacterial culture. The count was 1000 on the first day and 8000 on the fourth day. Assuming that the increase is geometric, calculate the daily growth rate.
Q13. In April, 10 children with diarrhea were admitted to a clinic. Two months later, in June, the number had increased to 50. Calculate the monthly growth rate of diarrheal disease.

Q14. Find the mode and median of the heart rates of 15 people given below.


Xi: {70, 74, 82, 80, 74, 80, 88, 92, 80, 96, 74, 80, 76, 88, 78}

Q15. During a period when bird flu was spreading, 40 patients were being admitted to a clinic; 2 days later the number had increased to 400. Calculate the daily rate of increase in the spread of bird flu.

Q16. In a study of food spending on a sample of 15 families, the following information on family size, weekly income and weekly food expenditure was obtained.

Family   Family size   Income   Food expenditure (Y)
1        2             62       14.3
2        3             62       20.8
3        3             87       22.7
4        5             65       30.5
5        4             58       41.2
6        7             92       28.2
7        2             88       24.2
8        4             79       30.0
9        2             83       24.2
10       5             62       44.4
11       3             63       13.4
12       6             62       19.8
13       4             60       29.4
14       4             75       27.1
15       2             90       22.2

a) Find the mean weekly food expenditure per person.


b) Find the mean food expenditure per person.
c) Find the percentage of revenue allocated to food.


PART 4

Probability and Population Distribution

4.1. Probability
Probability is one of the most basic topics in statistics, because every estimate and every decision is expressed at a certain probability level. That is, a certain error or confidence level is involved. Therefore some of the basic concepts and rules of probability should be known. Only simple probability will be discussed here, because probability is a subject in its own right.
The probability of an event is the ratio of the number of times the event occurs to the total number of events. It is calculated by the following formula:

$$P(x) = \frac{x}{n}$$

Here; n: total number of events, x: represents the number of desired events.
The probability of an event is in the range 0≤P(x)≤1. P(x)=0 means that the event is impossible and P(x)=1 means that the event is certain.
Sample 1: In a region, 30 of 75 newborns were girls. The ratio of girls in this population is found accordingly; $P(x) = \frac{30}{75} = 0.40$.

Sample 2: Suppose that of the patients coming to a health center 20 are flu patients, 15 are internal medicine patients and 10 are Infectious Diseases patients.

The probability that a patient chosen from all of those patients is an internal medicine patient is

$$P(x) = \frac{15}{45} = 0.33$$

But sometimes the event may be more complicated. For example, we may want the probability that a patient waiting for treatment is either an internal medicine patient or an Infectious Diseases patient. In this case the probability is calculated as:

$$P(x) = \frac{15}{45} + \frac{10}{45} = 0.33 + 0.22 = 0.55$$

As shown, each group is a separate event. In such cases, the probability that one or the other occurs is the sum of the separate probabilities of occurrence; this is defined as the addition rule.


If the events are not affected by each other, that is, if they are independent, the probability of both events occurring together is found by multiplying the probabilities of the events occurring separately; this is defined as the multiplication rule.

For example, suppose the success rates of the operations of two patients operated on at the same time are 0.70 and 0.80. The probability that the operations of both of these patients are successful is;

P(x)=0.70*0.80=0.56
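
The addition and multiplication rules can be checked with a short Python sketch. The dictionary layout and variable names are illustrative assumptions; `fractions.Fraction` from the standard library keeps the ratios exact.

```python
from fractions import Fraction

# Patient counts from Sample 2
patients = {"flu": 20, "internal medicine": 15, "infectious diseases": 10}
total = sum(patients.values())                       # 45

p_internal = Fraction(patients["internal medicine"], total)
p_infectious = Fraction(patients["infectious diseases"], total)

p_either = p_internal + p_infectious   # addition rule: one or the other (separate groups)
p_both_ok = 0.70 * 0.80                # multiplication rule: two independent operations

print(p_internal, p_either)                             # 1/3 5/9
print(round(float(p_either), 2), round(p_both_ok, 2))   # 0.56 0.56
```

The exact sum 25/45 rounds to 0.56; a value of 0.55 appears when the two terms are rounded to 0.33 and 0.22 before they are added.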

4.2. Permutations and Combinations

When x individuals are taken from n individuals to form an ordered arrangement, permutations are used; if the order of the arrangement is not important, combinations are used. Accordingly, the number of ways of taking x individuals from n is found with these formulas:

Permutations; order is important: $_nP_x = \frac{n!}{(n-x)!}$ and

Combinations; order is not important: $_nC_x = \frac{n!}{(n-x)!\,x!}$

Here, n: represents the total number of events, and x: the number of events taken.
Sample: A, B and C are 3 students, and a class president and vice president are to be chosen from them. Taking two at a time, the permutations are AB, AC, BC, BA, CA and CB, that is 6, because the order matters. For combinations, AB is not different from BA, AC from CA, or BC from CB. Thus there are three combinations: AB, AC, BC.

Permutation Number; $_3P_2 = \frac{3!}{(3-2)!} = 6$ and

Combination Number; $_3C_2 = \frac{3!}{(3-2)!\,2!} = 3$
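
The student example can be enumerated directly in Python. This sketch uses `itertools.permutations`, `itertools.combinations`, and `math.perm` / `math.comb` (available in Python 3.8 and later); the variable names are illustrative.

```python
import math
from itertools import combinations, permutations

students = ["A", "B", "C"]

ordered = ["".join(p) for p in permutations(students, 2)]     # order matters
unordered = ["".join(c) for c in combinations(students, 2)]   # order does not matter

print(ordered)    # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(unordered)  # ['AB', 'AC', 'BC']
print(math.perm(3, 2), math.comb(3, 2))  # 6 3
```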

4.3. Distribution of Population


Distributions have been developed for different data structures. The data obtained from research generally conform to one of these distributions, although data that fit no distribution are occasionally found. Non-parametric tests are used to analyze such data. Like data structures, distributions fall into two groups, namely discrete and continuous. Qualitative data obtained with nominal and ordinal scales show a discrete distribution. Quantitative data obtained with interval and ratio scales show a continuous distribution.
Each distribution has a function. This function is called the probability density function; it is usually written f (x) for continuous distributions and P (x) for discrete distributions.

There are a number of defined continuous and discrete distribution functions. In this section some discrete and continuous distributions will be evaluated and their importance will be explained.

Of the discrete distributions the binomial and Poisson distributions, and of the continuous distributions the normal distribution, will be examined with applications. All problems related to the normal distribution will be analyzed and interpreted by converting it to the standard normal distribution.

4.3.1. Discrete Distributions


4.3.1.1. Binomial Distribution
It is the distribution shown by discrete data. It is usually the distribution of data with two outcomes: Yes-No, Female-Male, on-off etc. Each distribution has a density function. The function of the binomial distribution;

$$P(n, x, p) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!} \, p^x q^{n-x}$$

Here, n: represents the total number of events, x: represents the number of desired events; p: represents the probability that the event happens (probability of success); q: represents the probability that the event does not happen (probability of failure).
Descriptive parameters of the binomial distribution,
Mean of the binomial distribution; µ=np
Variance of the binomial distribution; σ²=npq
Properties of the Binomial Distribution:
While the event is repeated the success rate should not change. For example: the probability of heads or tails for a coin is 1/2 and it never changes. Likewise the gender of an expected child may be girl or boy; the probability is again 1/2 and it never changes.
In the binomial distribution the number of repetitions of the event is small and the success rate of the event is usually high.
Example: In a family with 4 children, what is the probability that at least one of the children is a boy?
Here; n=4; p=0.5 and q=0.5 and x is {1, 2, 3 and 4}. In short;
1. Option; P(x≥1)= P(x=1) + P(x=2) + P(x=3) + P(x=4), or, since p + q = 1,
2. Option; P(x≥1)=1 - P(x<1)=1 - P(x=0). As seen, the 2. Option needs less calculation, so it is the wiser choice for the answer.

$$P(x \ge 1) = 1 - P(x < 1) = 1 - \frac{4!}{0!\,(4-0)!} (0.5)^0 (0.5)^{4-0} = 1 - (0.5)^4 = 0.94$$

In a family with 4 children the probability of at least one boy is 94%.
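
Both options can be compared with a small Python sketch of the binomial function. The helper name `binomial_pmf` is an illustrative assumption; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(n, x, p):
    """P(n, x, p) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# 2. Option: complement of "no boy at all"
p_at_least_one = 1 - binomial_pmf(4, 0, 0.5)

# 1. Option: add the probabilities of 1, 2, 3 and 4 boys
p_direct = sum(binomial_pmf(4, x, 0.5) for x in range(1, 5))

print(p_at_least_one, p_direct)  # 0.9375 0.9375
```

The unrounded value 0.9375 is written as 0.94 in the example.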

4.3.1.2. Poisson Distribution


It is a discrete distribution shown by rare events within a certain time interval. In other words, it is the distribution used in the investigation of events that occur rarely.
For example; a patient dying from narcosis, a good secretary making a typing mistake in a letter, or catching a cold in summer. In the Poisson distribution the probability of the desired event is very small and the number of repeated events is large. The probability of occurrence of the event is low; it does not exceed 5%. The probability density function of the Poisson distribution is

$$P(x) = \frac{e^{-\mu} \mu^x}{x!} = \frac{\mu^x}{e^{\mu} \, x!}$$

Here x: represents the desired number of events, µ: represents the population mean and e: represents the base of the natural logarithm, which is approximately 2.718. The parameters of the distribution are µ=σ²=n*p; the variance is equal to the mean.
Example: The death rate from narcosis in a hospital is 0.001. What is the probability that at most 1 of 100 patients who undergo surgery in a year dies?
Given µ = np =100*0.001= 0.1;
The desired probability is P(x≤1) = P(x=0) + P(x=1). When the values are substituted into the function,

$$P(x \le 1) = \frac{e^{-0.1} (0.1)^0}{0!} + \frac{e^{-0.1} (0.1)^1}{1!} = \frac{1 + 0.1}{e^{0.1}} = 0.995$$

the probability of at most one death from narcosis is about 99.5%.
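
A quick numerical check of P(x≤1) for µ = 0.1 can be sketched as follows. The helper name `poisson_pmf` is an illustrative assumption. Note that (0.1)^0 = 1, so the first term is simply e^(-0.1).

```python
import math

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

mu = 100 * 0.001          # mu = n * p = 0.1
p0 = poisson_pmf(0, mu)   # no death
p1 = poisson_pmf(1, mu)   # exactly one death
print(round(p0, 4), round(p1, 4), round(p0 + p1, 4))  # 0.9048 0.0905 0.9953
```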

4.3.2. Continuous Probability Distributions


The most commonly encountered continuous distribution is the normal distribution. It is the distribution shown by continuous data.
In general, it is the shape of the distribution of data obtained by measuring and weighing. The probability density function of the normal distribution is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}$$

However, to make an estimate with this formula an integral has to be taken in each case. This is not easy, and in terms of time it would not be possible at all during an exam. Therefore the symmetric standard normal variable (z) is used. The mean of the standard normal values is 0 ($\mu_z = 0$), the variance is 1 ($\sigma_z^2 = 1$), and this is summarized as z ~ N(0,1). When $z = \frac{x - \mu}{\sigma}$ is written into the function f(x), the normal distribution function changes into the standard normal distribution function

$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^2}$$

All of the integrals are given in the Z tables in the appendix.
Example: The mean height of 7-year-old children was determined to be 130 cm and the standard deviation 8 cm. Answer the questions below accordingly.
a) What is the ratio of these kids that are taller than 130 cm ?
b) What is the ratio of these kids that are taller than 135 cm ?
c) What is the ratio of these kids that are shorter than 120 cm ?
d) What is the ratio of these kids that are between 120 cm and 135 cm ?
Solution:
a) As the distribution is symmetric, the ratios of those shorter and taller than the mean are equal: P(z<0) = P(z>0) = 0.50, that is 50%.

b) $$P(x > 135) = P\left(z > \frac{135 - 130}{8}\right) = P(z > 0.63) = 0.2643; \quad 26.43\%$$

c) $$P(x < 120) = P\left(z < \frac{120 - 130}{8}\right) = P(z < -1.25) = 0.1056; \quad 10.56\%$$

Note that negative values are not included in the z table. Accordingly, the symmetry has to be used: P(z < -1.25) = P(z > 1.25) = 0.1056.

d) $$P(120 \le x \le 135) = P\left(\frac{120 - 130}{8} \le z \le \frac{135 - 130}{8}\right) = P(-1.25 \le z \le 0.63)$$

This value cannot be read directly from the z table. Thus the areas found above are subtracted from the total area.

$$P(-1.25 \le z \le 0.63) = 1 - \{P(z < -1.25) + P(0.63 < z)\} = 1 - \{0.1056 + 0.2643\} = 0.63$$

According to this, the ratio of kids with a height between 120 cm and 135 cm is 63%.
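
The z table lookups can be approximated in Python with the error function `math.erf` from the standard library. This is only a sketch; `normal_cdf` is a hypothetical helper, and because it does not round z to two decimals the results differ slightly from the table values (for example 0.266 instead of 0.2643).

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal distribution, computed with the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 130, 8                        # mean and standard deviation of height (cm)
above_135 = 1 - normal_cdf(135, mu, sigma)
below_120 = normal_cdf(120, mu, sigma)
between = normal_cdf(135, mu, sigma) - below_120

print(round(above_135, 3), round(below_120, 3), round(between, 3))
# approximately 0.266 0.106 0.628
```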

4.3.3. Central limit theorem


Some research is based on means; the evaluation of the means of repeated samples is examined with the central limit theorem. For example, suppose you have done a research on the hemoglobin values of a community. Let us take samples of size 10 from this population, whose mean is 12.5 and standard deviation is 2.5. The probability that the mean of any such sample is greater than 13.5 is found with the central limit theorem. For this, the formula $z = \frac{x - \mu}{\sigma}$ changes to

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

The value in the denominator is the standard error. When $\bar{x}$ replaces x, $\sigma_{\bar{x}}$ replaces σ in the denominator. Accordingly the desired probability is;

$$P(13.5 < \bar{x}) = P\left(\frac{13.5 - 12.5}{2.5 / \sqrt{10}} < z\right) = P(1.26 < z) = 0.1038 \approx 0.10$$

It is 10%.
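
The same steps, standard error first and then the z value and its upper-tail area, can be sketched in Python. `normal_sf` is a hypothetical helper built on `math.erfc`; it stands in for the z table.

```python
import math

def normal_sf(z):
    """Upper-tail area P(Z > z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 12.5, 2.5, 10
standard_error = sigma / math.sqrt(n)      # sigma of the sample mean
z = (13.5 - mu) / standard_error
print(round(standard_error, 3), round(z, 2), round(normal_sf(z), 2))  # 0.791 1.26 0.1
```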

4.4. Sample Questions


Q1. It is known that 0.003 of the injectors in a pharmacy store are defective. When 1500 injectors are chosen randomly, what is the probability that more than 1 of them is defective?
Q2. It is known that 65% of the patients admitted to the internal medicine clinic of a hospital are women. For 5 patients selected randomly,
a) Find the probability that 3 are male,
b) Find the mean and variance of the distribution.

Q3. The mean grade in the Birth 2 course is 70 and the variance is 169. If the success rate of the class is 40%, what is the minimum grade of the successful students? (The distribution is normal.)
Q4. The probability of an error in a pregnancy test carried out in a health center is 0.001. What is the probability that at least 2 of 1000 women tested are misdiagnosed?
Q5. A midwife has misdiagnosed 20% of the pregnant women that came for medical examination. What is the probability that at most 4 of 5 pregnant women examined are diagnosed correctly?
Q6. In Turkey the death rate of pregnant women because of certain complications is 0.0001. What is the probability that 3 of 1000 women who have these complications die?
Q7. With an experienced midwife, the probability of newborn death during birth is 1%. For this experienced midwife, what is the probability that at least 2 of 200 newborns die during birth?
Q8. It is known that the probability that a type of medicine given to an epilepsy patient has an adverse effect is 0.02. Find the probability that this medicine has an adverse effect in at most 3 of 300 randomly selected epilepsy patients.
Q9. In a pediatrics service, icterus has been found in 10 of 90 newborns. In the clinic, 6 of the babies have been checked for icterus. What is the probability that 2 of the 6 newborns have icterus?
Q10. In a hospital there are 400 boys and 300 girls, a total of 700 children. When 3 children are chosen, what is the probability that at least 1 of the 3 children is a boy?
Q11. The probability of AIDS infection in a hospital is 0.002. What is the yearly probability of infection among 600 employees?
Q12. In a health check at a health centre, joint disease has been seen in 3 of 900 midwives. 600 midwives have also been checked for edema. What is the probability that 5 of the midwives checked have the joint disease?
Q13. In a hospital the probability of a side effect of an antibiotic has been recorded as 0.02. What is the probability that 2 of 100 randomly chosen patients show the side effect?
Q14. The mean age of women admitted to the family planning clinic of a hospital has been calculated as 25 and the standard deviation as 5. If the ages are normally distributed;
a) What is the probability of an age between 25 and 27 ?
b) What is the probability of an age below 30 ?
c) Calculate the probability of an age below 27.
Q15. In a region, the ratio of people with blood group B has been found to be 40%. 5 people have donated blood to a hospital. What is the probability that all 5 of these people have blood group B?


Q16. In a population the systolic blood pressure values are normally distributed, with a mean of 130 mmHg and a standard deviation of 25 mmHg. What percentage of people have a blood pressure between 110 and 140 mmHg?
Q17. The heart rates of the babies of some pregnant women have been listened to; the mean has been found to be µ=140 and the variance σ²=144. Accordingly, when a fetus is taken at random;

a) What is the probability that the heart rate is greater than 150 ?
b) What is the probability that the heart rate is between 130 and 150 ?
Q18. The risk of an infectious disease in a village, for people under treatment over a year, is 0.0001. What is the probability that the disease treatment has to be repeated for 2 of 2000 people?
Q19. 20% of the midwives admitted to a hospital have hyperemesis gravidarum. In a group of 10 midwives, what is the probability that 2 have hyperemesis gravidarum?
Q20. In a research, the mean weight gained during pregnancy is 12 kg and the variance is 16. According to this;
a) What is the ratio of those who gained more than 14 kg ?
b) What is the ratio of those who gained between 10 and 14 kg ?
Q21. In cancer treatment, the probability that a medicine causes hair loss as a side effect is 0.01. What is the probability that 3 of 100 cancer patients using the medicine show hair loss?
Q22. In a hospital the yearly probability of a foreign object being forgotten in the body of a patient during surgery is 0.001. What is the probability that this happens to 2 of 3000 people who undergo surgery in a year?
Q23. Kids whose nutrition is based on fast food have been researched. According to this research the mean weight of these kids has been calculated as 50 kg and the standard deviation as 6 kg. One of the kids with a fast food eating habit is chosen;
a) What is the probability that the weight is below 42 kg ?
b) What is the probability that the weight is above 62 kg ?
c) What is the probability that the weight is between 42 kg and 62 kg ?
Q24. The results of a statistics exam are normally distributed with a mean of 74 and a variance of 49. The 10% with the highest marks will receive an AA. According to this information, what is the minimum mark needed to receive an AA grade?


PART 5

SAMPLING AND ESTIMATION OF SAMPLE SIZE

5.1. Sampling
As mentioned in the first part, studies are carried out on samples for various reasons. Creating a sample is a topic large enough to be a stand-alone course; only an introduction will be given here.
Basically, each individual in the population should have an equal chance of entering the sample. The selection must be made at random. However, because the material is not uniform or homogeneous, or has different properties, reliance on chance alone may have to be limited. For this, various sampling methods have been developed. Let's examine a few of them.

1) Simple Random Sampling: This sampling is applied to homogeneous populations of limited size. For example, if we want to investigate the level of knowledge of a class, the students in the class are considered homogeneous in terms of level of knowledge and the sample is created by simple random sampling. The selection can be made by placing the names in a bag and drawing them at random.

2) Systematic Sampling: It is used with records (registers). For example, hospitals and health centers keep the information of patients in records. A system is developed for selecting the sample. For example, if only 100 people are needed from a record of 1000 people, the sampling step should be 1000/100 = 10. After this, to find who the first person should be, a random number between 1 and 10 is chosen. Suppose the chosen number is 7. The first person in the sample is the 7th person in the record. The others are found by adding 10 to the number of the previous one, like 7+10 = 17, and it continues by adding 10 again and again, like 17+10 = 27, 27+10 = 37. A sample created with this method is called a systematic sample.

3) Multi-Stage Random Sampling: In this sampling system the population is made up of homogeneous subgroups. For example, universities have their own faculties, departments, etc. Each of those has to be grouped into subgroups and the numbers in each have to be taken into account.


4) Stratified Random Sampling: It is used when the population is heterogeneous and is formed of subsets. For example, in a research on blood pressure across hospital services, the blood pressure of patients with heart disease is different from that of dermatology patients. Thus it is a heterogeneous community. Here stratified random sampling has to be used.
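
The systematic sampling steps described in item 2 can be sketched in Python. The function name `systematic_sample` and its parameters are illustrative assumptions; record numbers are taken to run from 1 to the population size.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Pick every step-th record number, beginning at a random start in 1..step."""
    step = population_size // sample_size          # e.g. 1000 // 100 = 10
    if start is None:
        start = random.randint(1, step)            # random first record
    return list(range(start, population_size + 1, step))[:sample_size]

sample = systematic_sample(1000, 100, start=7)     # start fixed at 7 as in the example
print(sample[:4], len(sample))  # [7, 17, 27, 37] 100
```

Leaving `start` unset draws the first record number at random, as the method requires; it is fixed here only to reproduce the example.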

5.2. Estimation of the sample size

The required sample size must be known for research in any population. The sample size varies depending on the parameter to be estimated and on the variation in the study to be carried out.

5.2.1. Determination of the sample size for estimating the population mean

The formulas to be used differ depending on whether the research is about a mean or about a ratio. The methods used for a mean are divided into two:
1) The method for the case where the population variance is known.
2) The method for the case where the population variance is unknown.

1) If the population variance is known: If you have previously done a


research on the work to be done, it is a result of the screening of sources
subject variance can be determined. The sample size in this case;
z2 / 2 2
Is predicted with this formula n . Here; n: represents sample
d2
size, z / 2 represents the standard normal value from certain probability level in
the z table. d: represents the researchers maksiumum deviation value.
Sample: The variance of the blood protein value of midwives is known to be 2.7. What should the sample size be for a 95% confidence level and a deviation of 0.3 mg/dl?
Solution: The variance and the deviation are given, so only the critical Z value has to be found from the Z table for the chosen error level. The confidence coefficient is $1-\alpha = 0.95$, so $\alpha = 0.05$. The estimated mean ($\bar{x}$) may be smaller or larger than the real population mean (µ), so the error of 0.05 is split over both tails, leaving $\alpha/2 = 0.025$ in each tail. The Z table gives $z_{0.025} = 1.96$.

$$n = \frac{z_{\alpha/2}^2 \, \sigma^2}{d^2} = \frac{1.96^2 \times 2.7}{0.3^2} \approx 115$$

According to this result, a sample of 115 people gives a 95% confidence level with a deviation of 0.3 mg/dl.
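The known-variance formula can be checked with a few lines of Python. This is a minimal sketch, not part of the original notes: the function name `sample_size_known_variance` is illustrative, and the critical value is taken from the standard library's normal distribution instead of a printed z table.

```python
from statistics import NormalDist

def sample_size_known_variance(variance, d, confidence):
    """Sample size n = z^2 * sigma^2 / d^2 for estimating a mean."""
    alpha = 1.0 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z ** 2 * variance / d ** 2

# Values of the blood protein example: variance 2.7, deviation 0.3 mg/dl.
n = sample_size_known_variance(variance=2.7, d=0.3, confidence=0.95)
print(round(n))
```

In practice the result is often rounded up to the next whole subject so that the requested precision is not undershot.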


2) If the population variance is unknown: In this case a pilot study is carried out first. A pilot study is a study of the research question on a small sample. The sample size of the real survey is estimated from the variance obtained in this pilot study. Because the pilot study is conducted on a small sample, the small-sample distribution, the t distribution, is used.

Student's t distribution is used in tests and estimates related to a mean. It is the form the normal distribution takes in small samples. Because research often has to be carried out on small samples, it is used more often than the standard normal distribution. Unlike the z table, the critical value in the t table is determined together with the degrees of freedom (df), for example df = n-1.
For small samples, the sample size is estimated with the formula

$$n = \frac{t_{\alpha/2,(n-1)}^2 \, S^2}{d^2}$$

The subscript $\alpha/2,(n-1)$ is not a multiplier in this formula. It only indicates which critical t value to look up. $S^2$ is the sample variance of the pilot study and d is the maximum deviation the researcher will accept.

Sample: A pilot study on blood sugar was carried out with a group of 20 people, and the standard deviation of blood sugar was calculated as 30. How many people should the study include for a 99% confidence level and a deviation of 5 units?

Solution: The sample variance and the deviation are given. The unknown critical t value is found from the t table for the error level and the degrees of freedom. The confidence coefficient is $1-\alpha = 0.99$, so $\alpha = 0.01$. The estimated mean ($\bar{x}$) may be smaller or larger than the real population mean (µ), so the two-tailed error is 0.01. The t table gives $t_{0.005,19} = 2.861$.

$$n = \frac{t_{\alpha/2,(n-1)}^2 \, S^2}{d^2} = \frac{2.861^2 \times 30^2}{5^2} \approx 295$$

According to this result, a sample of 295 people gives a 99% confidence level with a deviation of 5 units.
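The pilot-study calculation can be reproduced as follows. This is a small sketch under stated assumptions: the function name `sample_size_pilot` is hypothetical, and the critical t value is passed in by hand as read from a t table, because the Python standard library has no t distribution.

```python
import math

def sample_size_pilot(t_crit, s, d):
    """Sample size n = t^2 * S^2 / d^2, with t_crit read from a t table."""
    return t_crit ** 2 * s ** 2 / d ** 2

# t_{0.005,19} = 2.861 is the table value quoted in the example above.
n = sample_size_pilot(t_crit=2.861, s=30, d=5)
print(math.ceil(n))  # rounded up to whole subjects
```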

5.2.2. Determining the Sample Size for Population Ratio


Research on ratios is carried out on large samples, and for that reason the Z distribution is usually used. The sample size for a study of a ratio is found with the formula below. Here $\hat{p}$ is the success rate of the event of interest, $\hat{q} = 1 - \hat{p}$ is its failure rate, $z_{\alpha/2}$ is the standard normal value for the chosen probability level in the z table, and d is the maximum deviation the researcher will accept.

$$n = \frac{\hat{p}\,\hat{q}\,z_{\alpha/2}^2}{d^2}$$
Sample: In a group of 50 dietitians, 7 were found to eat unhealthily. Based on this, what sample size is needed to estimate the rate of unhealthy eating with a 95% confidence level and a deviation of 1%?
Solution: $\hat{p} = 7/50 = 0.14$; $\hat{q} = 1 - 0.14 = 0.86$ and $z_{0.025} = 1.96$

$$n = \frac{\hat{p}\,\hat{q}\,z_{\alpha/2}^2}{d^2} = \frac{0.14 \times 0.86 \times 1.96^2}{0.01^2} \approx 4625$$
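The same arithmetic for a ratio can be sketched in Python. The function name `sample_size_proportion` is illustrative only, and the default z = 1.96 is the table value for 95% confidence used in the example.

```python
def sample_size_proportion(successes, trials, d, z=1.96):
    """Sample size n = p*q*z^2 / d^2 for estimating a population ratio."""
    p_hat = successes / trials   # rate observed in the preliminary group
    q_hat = 1.0 - p_hat
    return p_hat * q_hat * z ** 2 / d ** 2

# 7 of 50 dietitians, maximum deviation 1% (0.01).
n = sample_size_proportion(7, 50, d=0.01)
print(round(n))
```

Note how strongly n depends on d: halving the allowed deviation multiplies the required sample size by four.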

5.3. Sample questions


Q1. The standard deviation of hemoglobin values is known to be 15. How many people are needed in the sample to estimate the mean Hg value with 95% confidence and a deviation of 7 units?
Q2. The variance of the birth weight of children is known to be 36. How many people are needed in the sample to estimate the real birth weight with 95% confidence and a maximum deviation of 0.8 kg?

Q3. In a pilot study with a sample of 25 people, the standard deviation of blood pressure was found to be S=15. How many people are needed in the sample to estimate the mean blood pressure with 95% confidence and a deviation of 5 units?
Q4. The standard deviation of the weight of 15 dropsical patients was found to be 310 g. How many people are needed in the sample to estimate the weight of dropsical patients with 90% confidence and a deviation of 120 g from the real value?
Q5. Among 40 midwives who took postpartum education, 10 were found to lack knowledge about infections. How many people are needed in the sample to estimate this rate in the midwife population with 99% confidence and a maximum deviation of 7 units?
Q6. In a study of 50 doctors, 10 were found to smoke cigarettes. How many people are needed in the sample to estimate the smoking rate in the population with 99% confidence and a maximum deviation of 4 units?
Q7. In a health education study, 7 of 35 students were identified as smokers. How many people are needed in the sample to estimate the smoking rate in this student population with 99% confidence and a maximum deviation of 5%?


PART 6
Hypothesis testing and Confidence Limits

Studies are usually planned to test claims raised on some matter. Hypothesis testing is a method for statistically testing such claims at a certain error level. This makes it one of the most important and most widely used statistical methods. Using statistics obtained from the sample, such as the mean, ratio or variance, a decision is taken about the corresponding population parameter. The decision on the hypothesis is never certain; it always carries a specific level of error.

6.1. Error types

The Type I error is the rejection of a hypothesis that should have been accepted, and its probability is denoted by α. The Type I error level is set by the investigator and is generally taken to be 0.01 or 0.05, although other error levels can also be chosen. The Type II error is the acceptance of a hypothesis that should have been rejected, and its probability is denoted by β. This error determines the power of the test: 1-β is called the power of the test. The error types are given below.

                             Real situation (in the population)
According to test results    H0 true               H0 false
H0 accepted                  Right decision        Type II error (β)
H0 rejected                  Type I error (α)      Right decision

There is an inverse relationship between the two types of error: when α increases, β decreases. In a scientific study the significance level is denoted by α, and the acceptance or rejection of the H0 hypothesis is reported as described below:

If the H0 hypothesis is accepted, this is reported as (P>0.05): at the significance level α = 0.05 the H0 hypothesis is not rejected.

If the H0 hypothesis is rejected:

If it is rejected at α = 0.05, this is reported as (P<0.05); the difference is statistically significant and is marked with one star (*).

If it is rejected at α = 0.01, this is reported as (P<0.01); the difference is statistically significant at the 0.01 level and is marked with two stars (**).

If it is rejected at α = 0.001, this is reported as (P<0.001); the difference is statistically significant at the 0.001 level and is marked with three stars (***).

Summary
P; α value   Decision                                                       Degree        Symbol
P>0.05       H0 hypothesis accepted. No difference, equal, did not          Unimportant   ud/ns
             affect …
P<0.05       H1 hypothesis accepted. 5% error rate (95% confidence          Important     *
             level). Difference, not equal, did affect …
P<0.01       H1 hypothesis accepted. 1% error rate (99% confidence          Important     **
             level). Difference, not equal, did affect …
P<0.001      H1 hypothesis accepted. 0.1% error rate (99.9% confidence      Important     ***
             level). Difference, not equal, did affect …
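The reporting convention in the summary table can be written as a small lookup. This is only an illustrative sketch; the function name `significance_symbol` is hypothetical and "ns" stands for the "ud/ns" (not significant) entry of the table.

```python
def significance_symbol(p_value):
    """Map a P value to the star notation of the summary table."""
    if p_value < 0.001:
        return "***"   # H0 rejected at the 0.1% error rate
    if p_value < 0.01:
        return "**"    # H0 rejected at the 1% error rate
    if p_value < 0.05:
        return "*"     # H0 rejected at the 5% error rate
    return "ns"        # H0 accepted, difference not significant

for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, significance_symbol(p))
```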

Two hypotheses are set up in hypothesis testing to test the claims raised:
H0 hypothesis: Called the basic hypothesis or null hypothesis. It is stated in the negative form, that is, it assumes that normal conditions hold and that there is no difference or effect.
Samples:
a) The mean life span of people living in the highlands is not different from that of people living in Turkey.
b) Blood pressure medicine is not a factor in increasing blood pressure.
c) There is no difference between the growth rates of twins.
d) There is no difference in intelligence between the genders.
H1 hypothesis: Called the alternative or opposing hypothesis. It states the claim that the researcher wants to test. Depending on the research situation, it is expressed with the terms different, bigger or smaller.
Samples:
a) The mean life span of people living in the highlands is different from that of people living in the cities of Turkey; it may be longer or shorter.
b) Blood pressure medications are effective in raising blood pressure.
c) There is a difference between the growth rates of twins.
d) There is a dependency between gender and success rate.

The rejection regions are arranged in different ways depending on how the alternative hypothesis is set up.

If H1: µ1≠µ2, the hypothesis is two-way: the means are different (α/2 is looked up),
If H1: µ1>µ2, the hypothesis is one-way: the first mean is bigger than the second mean (right tail test),
If H1: µ1<µ2, the hypothesis is one-way: the first mean is smaller than the second mean (left tail test).

There are 4 stages in all hypothesis testing

1. Creating the hypotheses; H0: null hypothesis. H1: alternative hypothesis.
2. Finding the critical value; the critical value is read from the table of the suitable distribution, as one-way or two-way according to the alternative hypothesis.
3. Finding the test statistic; the calculated value is found according to the chosen distribution. For means from a population or from large samples the Z distribution is used. For small samples (n<30) the t distribution is used. The Z distribution is also used for tests and estimates of ratios. In summary:

Condition                                                           Test used
Tests related to a rate                                             Z distribution (test)
Tests related to a mean:
  Population variance (σ²) is known                                 Z distribution (test)
  Population variance (σ²) is unknown, sample variance (S²)         Z distribution (test)
  is known and n≥30
  Population variance (σ²) is unknown, sample variance (S²)         t distribution (test)
  is known and n<30

4. The critical value and the test statistic are compared with each other and the result is interpreted.
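The decision table in stage 3 can be expressed as a short function. This is a sketch of the rule as stated in these notes, with a hypothetical function name `choose_distribution` and hypothetical argument names.

```python
def choose_distribution(parameter, n, population_variance_known=False):
    """Return "Z" or "t" following the summary table of stage 3."""
    if parameter == "rate":
        return "Z"                      # rates are always tested with Z
    if parameter != "mean":
        raise ValueError("parameter must be 'rate' or 'mean'")
    if population_variance_known or n >= 30:
        return "Z"                      # known variance or large sample
    return "t"                          # unknown variance and n < 30

print(choose_distribution("mean", n=20))   # small sample, variance unknown
print(choose_distribution("mean", n=36, population_variance_known=True))
```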

6.2. Tests of hypotheses about mean

6.2.1. Hypothesis testing for a single mean

This is the method for testing whether a sample mean is different from, smaller than or bigger than a standard value. Because tests on means are parametric, some assumptions have to hold. In summary, these assumptions are that the distribution is normal and that the sample is large enough. Let us look at both the Z and the t test. The 4 stages above are followed in order.

Sample 1: Z test application

The mean length of newborns in Turkey is known to be 50 cm and the variance is 25. The mean length of a sample of 36 newborns in Isparta was found to be 60 cm. Can it be said that the mean length of newborns in this location is greater than the general mean? (α=0.05)

Theoretical calculation steps:

1) The hypotheses are established:

Null hypothesis: $H_0: \mu = \mu_0$
Alternative hypothesis: $H_1: \mu \neq \mu_0$; $\mu > \mu_0$; or $\mu < \mu_0$

2) The distribution is determined and the table value is found from this distribution.
For one-way hypotheses: $z_c = z_{\alpha}$
For two-way hypotheses: $z_c = z_{\alpha/2}$

3) The test statistic is calculated:

$$z_h = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \text{with standard error } \sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

4) The comparison is made.
When $z_h > z_c$, $H_0$ is rejected and $H_1$ is accepted.
When $z_h < z_c$, $H_0$ is accepted.

Solution:
$H_0: \mu = 50$
$H_1: \mu > 50$
Critical value for the one-way test at α = 0.05:

$z_c = z_{\alpha} = z_{0.05} = 1.64$

The test statistic:

$$z_h = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{60 - 50}{\sqrt{25/36}} = \frac{10}{5/6} = \frac{10}{0.83} = 12$$

Decision: Because 12 > 1.64, H0 is rejected and the H1 hypothesis is accepted. It can be said with 95% confidence that the newborns in this location are longer than the general mean.
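The four stages of the Z test can be followed in a few lines of Python. This is a minimal sketch; the function name `one_sample_z` is illustrative, and the critical value 1.64 is the table value used in the notes.

```python
import math

def one_sample_z(sample_mean, mu0, variance, n):
    """Return the z statistic and the standard error of the mean."""
    se = math.sqrt(variance / n)          # sigma_xbar = sqrt(sigma^2 / n)
    return (sample_mean - mu0) / se, se

# Newborn length example: mean 60, mu0 = 50, variance 25, n = 36.
z_h, se = one_sample_z(60, 50, 25, 36)
z_c = 1.64                                # one-way critical value, alpha = 0.05
print(round(z_h, 2), "reject H0" if z_h > z_c else "accept H0")
```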

6.2.1.1. Confidence limits of the mean (Z distribution)

For the lower and upper confidence limits of the mean, the error is split so that half of it remains in each tail. For 95% confidence, 0.025 remains in each tail, so $z_{0.025} = 1.96$.

Upper limit = $\bar{x} + z_{\alpha/2} \times \sigma_{\bar{x}}$ = 60 + 1.96*0.83 = 61.63

Lower limit = $\bar{x} - z_{\alpha/2} \times \sigma_{\bar{x}}$ = 60 - 1.96*0.83 = 58.37

As a result, the mean length of the newborns is between 58.37 and 61.63 with 95% confidence.

Sample 2: t test application

In a study of the blood of 20 patients, the mean Na value was found to be 3.2 and the variance 0.98. Test whether the Na level in the blood differs from 2.9 at an error rate of 1%.

Theoretical calculation steps:

1) The hypotheses are established:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$; $H_1: \mu > \mu_0$; or $H_1: \mu < \mu_0$
2) The critical value is determined.
For one-way hypotheses: $t_c = t_{\alpha,(n-1)}$
For two-way hypotheses: $t_c = t_{\alpha/2,(n-1)}$

3) The test statistic is calculated:

$$t_h = \frac{\bar{x} - \mu}{S_{\bar{x}}}, \qquad \text{with standard error } S_{\bar{x}} = \sqrt{\frac{S^2}{n}} = \frac{S}{\sqrt{n}}$$

4) The comparison is made.

$t_h > t_c \Rightarrow H_0$ is rejected and $H_1$ is accepted.
$t_h < t_c \Rightarrow H_0$ is accepted.

Solution:
$H_0: \mu = 2.9$
$H_1: \mu \neq 2.9$
The hypothesis is two-way, so 0.005 remains in each tail:
$t_c = t_{\alpha/2,(n-1)} = t_{0.01/2,\,19} = 2.861$

The test statistic:

$$S_{\bar{x}} = \sqrt{\frac{S^2}{n}} = \sqrt{\frac{0.98}{20}} = 0.22$$

$$t_h = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{3.2 - 2.9}{0.22} = 1.36$$

Decision: Because -2.861 < 1.36 < 2.861, the H0 hypothesis is accepted. At an error rate of 1%, the Na value in the blood is not different from 2.9.
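The t test for a single mean follows the same pattern as the Z test, with the sample variance in place of the population variance. A minimal sketch, assuming the critical value is read from a t table (the function name `one_sample_t` is hypothetical):

```python
import math

def one_sample_t(sample_mean, mu0, sample_variance, n):
    """Return the t statistic and the standard error of the mean."""
    se = math.sqrt(sample_variance / n)   # S_xbar = sqrt(S^2 / n)
    return (sample_mean - mu0) / se, se

# Na example: mean 3.2, mu0 = 2.9, variance 0.98, n = 20.
t_h, se = one_sample_t(3.2, 2.9, 0.98, 20)
t_c = 2.861                               # t_{0.005,19} from the t table
print(round(t_h, 2), "reject H0" if abs(t_h) > t_c else "accept H0")
```

The two-way decision compares the absolute value of the statistic with the critical value, which is equivalent to checking that it lies between -2.861 and 2.861.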

6.2.1.2. Confidence limits of the mean (t distribution)

Upper limit = $\bar{x} + t_{\alpha/2,(n-1)} \times S_{\bar{x}}$ = 3.2 + 2.861*0.22 = 3.83

Lower limit = $\bar{x} - t_{\alpha/2,(n-1)} \times S_{\bar{x}}$ = 3.2 - 2.861*0.22 = 2.57

With 99% confidence, the mean sodium value is between 2.57 and 3.83.

6.2.2. Comparing the difference between two means


To compare two means, it must first be decided whether the groups are dependent or independent. Independent groups (group comparison) can be compared with both t and z tests, but dependent (paired) groups can be assessed only by the t test.

Samples of independent means to be compared (group comparison):
• Comparing the hemoglobin values of girls and boys,
• Comparing the recovery times of people of the same age and sex who suffer from the same disease and are divided into two groups receiving two different drug treatments,
• Comparing the heights of boys and girls in certain age groups,
• Comparing the endurance times of products from two factories producing the same type of product, etc. Events like these are analysed with group comparison.

Samples of dependent means to be compared (pairs):
• Comparing the sleeping times of twins,
• Comparing blood pressure values before and after taking blood pressure medicine,
• Comparing hemoglobin values before and after taking a medicine,
• Comparing temperature, blood pressure, blood content, etc. values taken from patients.

The basic assumptions for comparing two means can be summarized as follows:

Group comparison                              Pairing method
Both methods: samples have to be drawn by chance (randomly), and the distribution should be normal
Material has to be homogeneous                Material can be heterogeneous; pairs have to be homogeneous
Variances have to be homogeneous              Only one variance is used
Groups are independent of each other          Pairs are dependent
Groups can be equal (n1=n2) or                Each observation has to have a pair, so n1=n2
different (n1≠n2) in size

6.2.2.1. Group Comparison


Both the z and the t test are illustrated with examples.

Sample 1: Z test application

In a study, the hemoglobin statistics of men and women were found as below. Can it be said that the hemoglobin values of men are higher than those of women (α=0.01)? In addition, find the 99% confidence limits of the difference between the hemoglobin values of men and women.

Statistics           Man (1)    Woman (2)
Mean ($\bar{x}$)     15         12
Variance             9          10
n                    35         35

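Theoretical calculation steps:

1) The hypotheses are established:
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$; $H_1: \mu_1 - \mu_2 > 0$; or $H_1: \mu_1 - \mu_2 < 0$
2) The critical value $z_c$ is determined:
For one-way hypotheses: $z_c = z_{\alpha}$
For two-way hypotheses: $z_c = z_{\alpha/2}$

3) The test statistic $z_h$ is calculated:

$$z_h = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

4) The comparison is made.
$z_h > z_c \Rightarrow H_0$ is rejected and $H_1$ is accepted.
$z_h < z_c \Rightarrow H_0$ is accepted.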

Solution:
1) The hypotheses are established:
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 > 0$
2) The critical value is determined:

For one-way hypotheses: $z_c = z_{\alpha} = z_{0.01} = 2.33$

3) The test statistic is calculated:

$$z_h = \frac{15 - 12 - (0)}{\sqrt{\dfrac{9 + 10}{35}}} = \frac{3}{0.74} = 4.07$$

4) The comparison is made: because $z_h = 4.07 > z_c = 2.33$, H0 is rejected and H1 is accepted (P<0.01). At an error rate of 1%, the hemoglobin value of men is found to be higher than that of women.
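The two-sample Z statistic can be computed directly from the summary statistics in the table. This is a minimal sketch; the function name `two_sample_z` is illustrative and 2.33 is the one-way table value for α = 0.01.

```python
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2, delta0=0.0):
    """z statistic for the difference of two independent means."""
    se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of the difference
    return (mean1 - mean2 - delta0) / se, se

# Hemoglobin example: men (15, 9, 35) and women (12, 10, 35).
z_h, se = two_sample_z(15, 12, 9, 10, 35, 35)
z_c = 2.33
print(round(z_h, 2), "reject H0" if z_h > z_c else "accept H0")
```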

6.2.2.2. Confidence limits of the difference between the means

Confidence limits are usually determined two-way. For this, the critical value from the Z table, $z_{\alpha/2} = z_{0.005} = 2.57$, and the other statistics are placed in the formula, and the confidence limits are estimated as follows.

Lower limit: $(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) - \sigma_{\bar{x}_1 - \bar{x}_2} \times z_{\alpha/2}$ = 15 - 12 - 0.74*2.57 = 1.1

Upper limit: $(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) + \sigma_{\bar{x}_1 - \bar{x}_2} \times z_{\alpha/2}$ = 15 - 12 + 0.74*2.57 = 4.9

With 99% confidence, the difference between the hemoglobin values of men and women is between 1.1 and 4.9.

Sample 2: t test application

Nine samples (n=9) were taken from cheese with and without microorganisms, and the calcium (Ca) values of the samples are given below. According to these values, do cheeses with and without microorganisms have equal mean Ca (α=0.05)? In addition, estimate the 95% confidence limits of the difference between the two means.

Without microorganisms (1)   11.0 13.0 12.8 12.6  9.0 12.0 13.2 12.7 12.8
With microorganisms (2)      12.5 12.0 11.9 12.3 12.6 11.6 11.8 11.9 12.1

Theoretical calculation steps:

1) The hypotheses are established.
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$; $H_1: \mu_1 - \mu_2 > 0$; or $H_1: \mu_1 - \mu_2 < 0$
2) The critical value is determined.
For the two-way test, $t_c = t_{\alpha/2,(n_1+n_2-2)}$
For the one-way test, $t_c = t_{\alpha,(n_1+n_2-2)}$
3) The test statistic is calculated:

$$t_h = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{S_{\bar{x}_1 - \bar{x}_2}}$$

4) The comparison is made.
$t_h < t_c \Rightarrow H_0$ is accepted,
$t_h > t_c \Rightarrow H_0$ is rejected and $H_1$ is accepted.
The standard error of the difference between the two means, $S_{\bar{x}_1 - \bar{x}_2}$, is calculated as follows.

When $n_1 = n_2 = n$: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{S_1^2 + S_2^2}{n}}$

When $n_1 \neq n_2$: $S_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_0^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$

In this formula $S_0^2$ is the common (pooled) variance, calculated with

$$S_0^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$

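Solution:
Let us first estimate the descriptive statistics:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}; \qquad \bar{x}_1 = \frac{109.1}{9} = 12.12; \qquad \bar{x}_2 = \frac{108.7}{9} = 12.08$$

$$S^2 = \frac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}{n - 1}; \qquad S_1^2 = \frac{1336.97 - \dfrac{109.1^2}{9}}{9 - 1} = 1.80$$

$$S_2^2 = \frac{1313.73 - \dfrac{108.7^2}{9}}{9 - 1} = 0.11 \quad \text{and} \quad S_2 = 0.33$$

The standard error (when the n's are equal) is

$$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{1.80 + 0.11}{9}} = 0.46$$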
1) The hypotheses are established.
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
2) The critical value is determined.
For the two-way test, $t_c = t_{\alpha/2,(n_1+n_2-2)} = t_{0.05/2;\,16} = 2.12$

3) The test statistic is calculated:

$$t_h = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{12.12 - 12.08 - 0}{0.46} = 0.09$$

4) The comparison is made.
Because the test statistic is lower than the critical value, H0 is accepted: at a 5% error rate there is no difference between the mean Ca values of the two samples (P>0.05).
The confidence limits of the difference between the means are:

Upper limit: $(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) + S_{\bar{x}_1 - \bar{x}_2} \times t_{\alpha/2,(n_1+n_2-2)}$ = 12.12 - 12.08 + 0.46*2.12 = 1.02

Lower limit: $(\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) - S_{\bar{x}_1 - \bar{x}_2} \times t_{\alpha/2,(n_1+n_2-2)}$ = 12.12 - 12.08 - 0.46*2.12 = -0.94

With 95% confidence, the difference between the mean Ca values of the two samples lies between -0.94 and 1.02. The interval contains 0, which agrees with accepting H0.
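The group-comparison t test can be recomputed from the raw Ca values with the standard library. This is a sketch under stated assumptions: the names `two_sample_t`, `without_micro` and `with_micro` are illustrative, the pooled-variance form is used (it reduces to the equal-n formula when n1 = n2), and the critical value 2.12 is taken from the t table as in the text.

```python
import math
from statistics import mean, variance

without_micro = [11.0, 13.0, 12.8, 12.6, 9.0, 12.0, 13.2, 12.7, 12.8]
with_micro = [12.5, 12.0, 11.9, 12.3, 12.6, 11.6, 11.8, 11.9, 12.1]

def two_sample_t(a, b):
    """t statistic and standard error using the pooled variance S0^2."""
    n1, n2 = len(a), len(b)
    s0_sq = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(s0_sq * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, se

t_h, se = two_sample_t(without_micro, with_micro)
t_c = 2.12                                  # t_{0.025,16} from the t table
diff = mean(without_micro) - mean(with_micro)
lower, upper = diff - t_c * se, diff + t_c * se
print(round(t_h, 2), round(lower, 2), round(upper, 2))
```

Because the code keeps full precision, the last decimal can differ slightly from a hand calculation with rounded intermediate values.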

6.2.2.3. Test for dependent groups (paired test)

This test was described at the beginning of this topic. Because the test is carried out on the differences between the members of each pair, the stages are the same as in hypothesis testing for a single mean. Because the interpretation concerns the difference, the conclusion is stated as for the difference between two means.

Example: A test was given to twins, and the results are listed below. Test whether the difference between the mean scores of the twins is significant (α=0.05). Estimate the 95% confidence limits of the mean difference.

                                                              Total
Pair 1                82  80  78  80  76  74  84  76  68  84   782
Pair 2                84  76  82  84  72  70  82  84  72  80   786
Difference ($x_i$)     2  -4   4   4  -4  -4  -2   8   4  -4     4
$x_i^2$                4  16  16  16  16  16   4  64  16  16   184

Theoretical calculation steps:

1) The hypotheses are established.

$H_0: \mu_f = 0$
$H_1: \mu_f \neq 0$; $H_1: \mu_f > 0$; or $H_1: \mu_f < 0$
or $H_0: \mu_f = a$
$H_1: \mu_f \neq a$; $H_1: \mu_f > a$; or $H_1: \mu_f < a$

2) The critical value is determined.

For the one-way test $t_c = t_{\alpha,(n-1)}$ and for the two-way test $t_c = t_{\alpha/2,(n-1)}$
3) The test statistic is calculated:

$$t_h = \frac{\bar{x}_f - \mu_f}{S_{\bar{x}_f}} \quad \text{and} \quad S_{\bar{x}_f} = \sqrt{\frac{S_f^2}{n}} = \frac{S_f}{\sqrt{n}}$$

4) The comparison is made.
$t_h < t_c \Rightarrow H_0$ is accepted.
$t_h > t_c \Rightarrow H_0$ is rejected and $H_1$ is accepted.
Solution:
Let us first estimate the descriptive statistics:

$$\bar{x}_f = \frac{\sum x_i}{n} = \frac{4}{10} = 0.4$$

$$S_f^2 = \frac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}}{n - 1} = \frac{184 - \dfrac{4^2}{10}}{10 - 1} = 20.27$$

1) The hypotheses are established.
$H_0: \mu_f = 0$
$H_1: \mu_f \neq 0$
2) The critical value is determined.
For the two-way test $t_c = t_{\alpha/2,(n-1)} = t_{0.05/2;\,9} = 2.262$
3) The test statistic is calculated:

$$t_h = \frac{\bar{x}_f - \mu_f}{S_{\bar{x}_f}} = \frac{0.4 - 0}{\sqrt{20.27/10}} = \frac{0.4}{1.42} = 0.28$$

4) Because the test statistic is lower than the critical value, H0 is accepted: at a 5% error rate there is no difference between the scores of the twins (P>0.05).

6.2.2.4. Confidence limits of the mean difference

The 95% confidence limits of the mean difference between the pairs can be calculated as follows:
Upper limit = $\bar{x}_f + t_{\alpha/2,(n-1)} \times S_{\bar{x}_f}$ = 0.4 + 2.262*1.42 = 3.61
Lower limit = $\bar{x}_f - t_{\alpha/2,(n-1)} \times S_{\bar{x}_f}$ = 0.4 - 2.262*1.42 = -2.81
The negative lower limit reflects the large variance of the differences. With 95% confidence, the mean difference between the scores of the twins lies between -2.81 and 3.61. The interval contains 0, which agrees with accepting H0.
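The paired test can be recomputed directly from the two rows of twin scores. This is a minimal sketch, not the author's code: the variable names are illustrative, differences are taken as Pair 2 minus Pair 1 as in the table, and the critical value 2.262 is the table value quoted in the text.

```python
import math
from statistics import mean, variance

pair1 = [82, 80, 78, 80, 76, 74, 84, 76, 68, 84]
pair2 = [84, 76, 82, 84, 72, 70, 82, 84, 72, 80]

# The test works on the within-pair differences only.
diffs = [b - a for a, b in zip(pair1, pair2)]
n = len(diffs)
mean_d = mean(diffs)
var_d = variance(diffs)                 # sample variance S_f^2 (n - 1 divisor)
se = math.sqrt(var_d / n)               # standard error of the mean difference
t_h = (mean_d - 0) / se                 # H0: mean difference = 0

t_c = 2.262                             # t_{0.025,9} from the t table
lower, upper = mean_d - t_c * se, mean_d + t_c * se
print(round(t_h, 2), round(lower, 2), round(upper, 2))
```

Working from the raw scores avoids carrying rounded intermediate values, so the printed limits can differ in the last decimal from a hand calculation.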

6.3. Tests of hypotheses regarding rates

Only the Z distribution is used for tests and estimates of rates, because such research is carried out on a population or on large samples.

6.3.1. Hypothesis test of a rate

This is the test of whether a sample rate is different from, smaller than or bigger than an accepted standard rate. The number of individuals in the sample has to be large enough (n>30). The test process is shown with a sample application.

Example: Of 75 two-year-old children admitted to a hospital, 15 have a malnutrition problem.
a) Test whether the malnutrition rate in the child population is above 15% (α=0.05).
b) Determine the 90% confidence limits of the malnutrition rate in the child population.

Theoretical calculation steps:

1) The hypotheses: $H_0: p = p_0$

and $H_1: p \neq p_0$; $H_1: p > p_0$; or $H_1: p < p_0$

2) The critical value is determined:
For the one-way hypothesis: $z_c = z_{\alpha}$
For the two-way hypothesis: $z_c = z_{\alpha/2}$
3) The test statistic:

$$z_h = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}$$

4) The comparison is made.
$z_h > z_c \Rightarrow H_0$ is rejected and $H_1$ is accepted.

$z_h < z_c \Rightarrow H_0$ is accepted.

Here $\hat{p} = \dfrac{x}{n}$ is the ratio calculated from the sample, $p_0$ is the ratio of the population, and $\sigma_{\hat{p}} = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ is the standard error of the ratio.
Solution:
a)
1) The hypotheses: $H_0: p = 0.15$
$H_1: p > 0.15$
2) The critical value is determined:
For the one-way hypothesis: $z_c = z_{\alpha} = z_{0.05} = 1.64$
3) The test statistic: $\hat{p} = \dfrac{15}{75} = 0.20$ and $\hat{q} = 1 - 0.20 = 0.80$, $\sigma_{\hat{p}} = \sqrt{\dfrac{0.20 \times 0.80}{75}} = 0.046 \approx 0.05$,

$$z_h = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.20 - 0.15}{0.046} = 1.08$$

4) Because the calculated value is lower than the critical value, H0 is accepted: the malnutrition rate is accepted as not being above 15% (P>0.05).
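The test of a single rate can be checked numerically. This is a small sketch with a hypothetical function name `one_proportion_z`; following the notes, the standard error is computed from the sample ratio, and 1.64 is the one-way table value for α = 0.05.

```python
import math

def one_proportion_z(x, n, p0):
    """z statistic for a sample ratio x/n against a reference ratio p0."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)   # sigma_p = sqrt(p*q/n)
    return (p_hat - p0) / se, p_hat, se

# Malnutrition example: 15 of 75 children, reference ratio 0.15.
z_h, p_hat, se = one_proportion_z(15, 75, 0.15)
z_c = 1.64
print(round(z_h, 2), "reject H0" if z_h > z_c else "accept H0")
```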
6.3.1.1. Confidence limits of the rate
For the 90% confidence limits of the malnutrition rate, the critical value is $z_{0.10/2} = z_{0.05} = 1.64$.

Upper limit: $\hat{p} + z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ = 0.20 + 1.64*0.05 = 0.28

Lower limit: $\hat{p} - z_{\alpha/2} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$ = 0.20 - 1.64*0.05 = 0.12

The 90% confidence limits of the malnutrition rate are 0.12 (12%) and 0.28 (28%).

6.3.2. Hypothesis testing of the difference between two rates

The number of individuals has to be large enough in both samples (n1 and n2 > 30).

Sample 1: Families were chosen from two different provinces; 450 of 500 families in the first province and 200 of 350 families in the second province had a certain level of knowledge about hepatitis B. Do the families of the first province know more about hepatitis B than the families of the second province (α=1%)? Estimate the 99% confidence limits of the difference between the rates.

Theoretical calculation steps:

1) The hypotheses: $H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 \neq 0$; $p_1 - p_2 > 0$; or $p_1 - p_2 < 0$
2) The critical value is determined:
$z_c = z_{\alpha}$ for the one-way hypothesis,
$z_c = z_{\alpha/2}$ for the two-way hypothesis.
3) The test statistic is defined as

$$z_h = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$$

Here,

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_0 q_0 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}, \qquad p_0 = \frac{x_1 + x_2}{n_1 + n_2}, \qquad q_0 = 1 - p_0$$

4) The comparison is performed as below.

If $z_h > z_c$, $H_0$ is rejected and $H_1$ is accepted.
If $z_h < z_c$, $H_0$ is accepted.

Solution:
1) The hypotheses $H_0: p_1 - p_2 = 0$ and $H_1: p_1 - p_2 > 0$ should be established. The alternative hypothesis is one-way.

2) Critical value for the one-way hypothesis: $z_c = z_{\alpha} = z_{0.01} = 2.33$
3) The test statistic: $\hat{p}_1 = \dfrac{450}{500} = 0.90$ and $\hat{p}_2 = \dfrac{200}{350} = 0.57$

$$p_0 = \frac{450 + 200}{500 + 350} = \frac{650}{850} = 0.76, \qquad q_0 = 1 - 0.76 = 0.24$$

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{0.76 \times 0.24 \left( \frac{1}{500} + \frac{1}{350} \right)} = 0.03$$

$$z_h = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}} = \frac{0.90 - 0.57 - 0}{0.03} = \frac{0.33}{0.03} = 11.0$$

4) Interpretation: because the test statistic is bigger than the critical value, the H0 hypothesis is rejected and the alternative hypothesis is accepted. At a 1% error rate, the families in the first province know more about hepatitis B than the families in the second province (P<0.01).

6.3.2.1. Confidence limits of the difference between two proportions

The confidence limits can be estimated with the following formulas, using $z_{\alpha/2} = z_{0.005} = 2.57$.

Upper limit: $(\hat{p}_1 - \hat{p}_2) + z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$ = 0.90 - 0.57 + 2.57*0.03 = 0.41

Lower limit: $(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$ = 0.90 - 0.57 - 2.57*0.03 = 0.25

In this case, the difference in knowledge between the provinces is seen to vary between 25% and 41%.
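The test of the difference between two rates can be recomputed from the counts. This is a minimal sketch; the function name `two_proportion_z` is illustrative, the pooled ratio is used for the standard error as in the theoretical steps, and 2.33 is the one-way table value for α = 0.01.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)              # pooled ratio under H0
    se = math.sqrt(p0 * (1.0 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, p1, p2, se

# Hepatitis B example: 450 of 500 families and 200 of 350 families.
z_h, p1, p2, se = two_proportion_z(450, 500, 200, 350)
z_c = 2.33
print(round(p1 - p2, 2), round(se, 2), "reject H0" if z_h > z_c else "accept H0")
```

With unrounded intermediate values the statistic comes out slightly above the hand-calculated figure, but the decision is the same.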

6.4. Chi-squared (χ²) Distribution

The chi-squared distribution is used in the interpretation and evaluation of discrete (qualitative or quantitative) data obtained by counting and classification. Of the tests based on this distribution, only the goodness-of-fit and independence tests are covered here. These tests are often used to evaluate and interpret survey results. The chi-squared distribution is positively skewed (skewed to the right). The χ² test statistic cannot take a negative value in statistical calculations. Hypotheses in χ² tests are stated in words rather than written explicitly in terms of parameters.

6.4.1. Chi-squared goodness-of-fit test

This test examines whether ratios obtained from counted and classified data fit expected ratios. Only the fit of ratios is tested here. In a goodness-of-fit test for ratios, the degrees of freedom are determined from the number of classes instead of the number of observations n: the degrees of freedom equal the number of groups minus 1.
Sample: The daily numbers of patients coming to 5 services of a hospital are given below. Is the distribution of patients across the services equal, that is, do patients come to the services in the same ratio? Test at the α=0.05 error level.

Internal Medicine   ENT   Eye   Child   Orthopedics   Total
24                  21    15    28      12            100
Solution: First the hypotheses have to be established.

H0: Patients are distributed across the services in equal ratios.

H1: Patients are not distributed across the services in equal ratios.
The test statistic is calculated with

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

In the formula, $O_i$ is the observed value and $E_i$ is the expected value. The critical test value is $\chi^2_{\alpha,(df)}$, where the degrees of freedom are df = number of categories - 1.

The observed values in the test statistic are the numbers of patients coming to the services. The expected values are found by asking what the values would be if patients were distributed equally across the services. The distribution ratio would then be (1:1:1:1:1), so 100/5 = 20 patients are expected in each service.

                        Internal Medicine   ENT   Eye   Child   Orthopedics   Total
Observed value (Oi)     24                  21    15    28      12            100
Expected value (Ei)     20                  20    20    20      20            100

χ² = 0.80 + 0.05 + 1.25 + 3.20 + 3.20 = 8.50

To compare the test statistic, the critical χ² table value is found for the degrees of freedom: df = 5-1 = 4 and the alpha error level is 0.05, so the table value is $\chi^2_{0.05,4} = 9.488$.
Decision: Because the test statistic 8.50 is lower than the critical table value 9.49, H0 is accepted: the numbers of patients coming to the services do not differ in statistical terms.
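The goodness-of-fit statistic is a sum of one term per service, which is easy to reproduce. A minimal sketch, assuming equal expected frequencies as in the example; the function name `chi_square_fit` is hypothetical and 9.488 is the table value quoted in the text.

```python
def chi_square_fit(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [24, 21, 15, 28, 12]               # patients per service
expected = [sum(observed) / len(observed)] * len(observed)   # 1:1:1:1:1 ratio

chi2 = chi_square_fit(observed, expected)
df = len(observed) - 1                        # number of categories - 1
chi2_c = 9.488                                # table value for alpha 0.05, df 4
print(round(chi2, 2), df, "reject H0" if chi2 > chi2_c else "accept H0")
```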

6.4.2. Chi-Squared Independence Test

Classified data obtained from surveys are mostly summarized in two-dimensional tables. In these tables the relationship between two qualitative factors is examined with the χ² test. Examples are the relationship between hair colour or eye colour and gender, the relationship between success and sex, and the relationship between type of birth and neighbourhood. Whether such a relationship exists is determined by the independence test.

Tables to which an independence test is applied have the size r x c, where r is the number of rows and c is the number of columns. The smallest such table is 2 x 2 with 4 cells. The numbers of rows and columns depend on the numbers of categories and do not have to be equal. There are some rules that depend on the frequencies written in the cells.

These rules can be summarized as follows:

In a 2x2 table, if the cell frequencies are between 5 and 20, the χ² test statistic is calculated with Yates' correction; if the cell frequencies are lower than 5, Fisher's exact χ² test is used. For any factor with more than 2 categories, if more than 20% of the cells have a frequency lower than 5, those categories (rows or columns) are combined with neighbouring categories before the χ² test is carried out.

Sample: The distribution of boy and girl births in two locations is given below. Is there a relation between the boy/girl birth distribution and the location? Test it (α=0.05).

         I. Location   II. Location   Total
Boy      150           105            255
Girl     100           125            225
Total    250           230            480

Solution:
The hypotheses are established.
H0: There is no relationship between gender and location.

H1: There is a relationship between gender and location.

The critical value is determined: $\chi^2_c = \chi^2_{\alpha,(r-1)(c-1)} = \chi^2_{0.05,1} = 3.84$
Here r is the number of rows and c is the number of columns. The degrees of
freedom (df) of the independence test are defined by the formula
df = (r-1)(c-1). Here df = (2-1)(2-1) = 1.
The test statistic is calculated:

$$\chi^2_h = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(g_{ij}-b_{ij})^2}{b_{ij}}$$

In this formula, $g_{ij}$ is the observed value in row i and column j, and
$b_{ij}$ is the expected value in row i and column j. The observed and expected
values are written with two indices because the value in each cell belongs to
two factors. The indices denote the row (i) and the column (j) respectively.
The expected values are calculated with the formula

$$b_{ij} = \frac{r_i\,c_j}{T}$$

Here $r_i$ is the total of row i, $c_j$ is the total of column j, and T is the
grand total.

Once as many cells as there are degrees of freedom have been calculated with
this formula, the remaining expected values can be found by subtraction from
the row and column totals. So,

$b_{11} = \frac{255 \times 250}{480} = 132.8$; $b_{12} = 255 - 132.8 = 122.2$; $b_{21} = 250 - 132.8 = 117.2$

and $b_{22} = 225 - 117.2 = 107.8$. The test statistic is calculated using these values.
                 Observed values                      Expected values
         I. Location  II. Location  Total     I. Location  II. Location  Total
Boy          150          105        255         132.8        122.2       255
Girl         100          125        225         117.2        107.8       225
Total        250          230        480         250          230         480
$$\chi^2_h = \frac{(150-132.8)^2}{132.8} + \frac{(105-122.2)^2}{122.2} + \frac{(100-117.2)^2}{117.2} + \frac{(125-107.8)^2}{107.8}$$
$$\chi^2_h = 2.23 + 2.42 + 2.52 + 2.74 \approx 9.92$$
Each term of the test statistic is the contribution of the corresponding cell
to the test statistic. For example, the contribution of the cell for boys in
the first location is 2.23.

Commentary: Since χ²h = 9.92 > χ²c = 3.84, H0 is rejected and H1 is accepted.
So there is a significant relationship between location and gender (P<0.05).
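
The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `chi_square_independence` and the variable names are assumptions made for this example, and the critical value 3.84 is taken from the text rather than computed. Because the code does not round the expected values to one decimal, it gives about 9.90 instead of the hand-calculated 9.92; the decision is the same.

```python
# Observed counts from the worked example
# (rows: Boy, Girl; columns: I. Location, II. Location).
observed = [[150, 105],
            [100, 125]]


def chi_square_independence(table):
    """Return (chi-square statistic, expected table, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected value of each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over all cells.
    chi2 = sum((g - b) ** 2 / b
               for obs_row, exp_row in zip(table, expected)
               for g, b in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, expected, df


chi2, expected, df = chi_square_independence(observed)
CRITICAL_VALUE = 3.84  # table value for alpha = 0.05 and df = 1, as in the text
reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```

The same function works for any r x c table of counts, but the critical value must then be looked up for the corresponding degrees of freedom.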

6.5. Sample Questions

Q1. In a dental clinic, 56 of 300 women and 72 of 480 men had a tooth pulled.
Is there a difference between the genders in the tooth extraction ratio?
(α=0.05)

Q2. A medicine that increases the heart pulse rate was given to 6 people and
the results below were observed. Is there an increase in the pulse rate? (α=0.01)

Person              1    2    3    4    5    6
Before medicine    65   70   68   80   73   69
After medicine     69   73   70   85   70   70
Difference (A-B)    4    3    2    5   -3    1

Q3. The values below are the breath rates of patients in the chest treatment
service of a hospital. Calculate the mean and variance of the values. Test
whether the mean breath rate differs from 20. (α=0.05)
xi : {16,15,17,20,15,19,19,16,22,21,19,20}

Q4. Of 75 two-year-old children admitted to a hospital, 15 have a malnutrition
problem. Estimate the malnutrition ratio in the child population with 90%
confidence limits.

Q5. The standard deviation of the birth weight of newborn babies is known to
be 1.6 kg. In a newborn service 12 babies are chosen at random and their mean
birth weight is found to be 3.2 kg.
a) Test whether the mean birth weight is greater than 3 kg. (α=0.05)
b) Estimate the population mean with 95% confidence.

Q6. In the internal medicine service of a hospital, the mean blood sugar of 20
diabetes patients was calculated as 188 and the variance as 64. Estimate the
true mean blood sugar of diabetes patients with 99% confidence.

Q7. In the surgery service of a hospital, the mean blood sugar of 20 patients
was calculated as 168 and the variance as 64. Estimate the true mean blood
sugar of the patients with 95% confidence.

Q8. The data below were collected in a state hospital to evaluate dyspnea.
Test whether the mean breath value differs from 15. (α=0.01)
S.D={ 14,12,12,15,19,19,18,16,16,17,16}

Q9. In a location, the standard deviation of the weight of 5-month-old girls is
known to be 1.5 kg. The mean weight of 25 randomly chosen girls was found to
be 5.5 kg. According to these values, estimate the population mean with 95%
confidence limits.

Q10. In a health center, the mean height of 25 boys aged 1.5 years was found
to be 80 cm and the variance 64. Estimate the true mean height of the children
with 99% confidence limits.

Q11. The variance of the birth weight of newborns is known to be 2.4 kg. The
mean birth weight of 16 randomly chosen babies was found to be $\bar{x} = 3.2$.
According to these values, estimate the population mean with 95% confidence
limits.

Q12. In a class, the mean height of 28 students was calculated as 80 cm and
the variance as 64. Estimate the true mean height of the students with 95%
confidence limits.

Q13. In a study, the mean blood sodium value of 20 patients was found to be
1.4 and the variance 0.81. Estimate the patients' true mean blood sodium value
with 99% confidence limits.
Q14. In a study, the mean blood potassium value of 20 pregnant women was found
to be 2.6 and the variance 0.64. Estimate the patients' true mean blood
potassium value with 95% confidence limits.
Q15. Families were chosen from two different provinces: 450 of 500 families in
the first and 200 of 350 families in the second had a certain level of
knowledge about hepatitis B. Do the families of the first province have more
knowledge about hepatitis B than the families of the second province (α=0.01)?
Estimate the 99% confidence limits for the difference between the two rates.
Q16. The mean height of 30 baby boys was calculated as 150 and the standard
deviation as 81.
a) Is the mean height of the babies smaller than 160 cm? (α=0.01)
b) Estimate the true mean height of the babies with 95% confidence.
Q17. The distribution of the number of boys in 60 families with 3 children is
given below. Are families with 0, 1 and 2 boys equally frequent? Test it with
an error rate of α=0.05.
Number of boys        0    1    2
Number of families   25   12   23

Q18. Of 150 products chosen randomly in a factory, 120 are observed to be of
fine quality. Estimate the ratio of fine quality products in this factory with
95% confidence.
Q19. A researcher wants to investigate the effect of cigarettes on getting
cancer. He observed that 45 people in a group of 300 smokers had cancer, and
that 30 people in a group of 400 non-smokers had cancer. According to these
data, test the difference in the cancer rate between smokers and non-smokers.
(α=0.05)
Q20. The distribution over the weekdays of the internship of the students of a
health school is given below. Are the students distributed equally over the
days? (α=0.05)
Monday   Tuesday   Wednesday   Thursday   Friday
  15        25         25         15        20

Q21. In the newborn service of a hospital, 36% of 50 premature babies have
icterus. According to these data, test whether the rate is different from 40%,
and estimate the rate with 95% confidence.

Q22. The weekly distribution of duty shifts for the staff of a hospital's
maternity service is given below. Are the shifts distributed equally over the
weekdays? Is the distribution consistent with the (1:1:1:1:1) ratio? (α=0.01)
Monday   Tuesday   Wednesday   Thursday   Friday   Total
  10        8          5           8         4       35

Q23. A researcher who wants to compare the duration of the anesthetic effect
of two medicines found these results: $\bar{x}_A = 200$ s and $\bar{x}_B = 150$ s,
$S_A = 25$ and $S_B = 30$, $n_A = 15$ and $n_B = 26$. According to these values,
do the two medicines differ in the duration of their effect? (α=0.01)

Q24. A random sample of 200 people from the Mediterranean region has a mean
life span of 65 years. The population variance of life span is 12. Find the
confidence limits of the population mean with 99% confidence.

Q25. Two wild-type Drosophila were hybridized and the following results were
obtained. Assuming that the wild type is dominant to the mutant type, can the
Mendelian ratio (3:1) be assumed to be valid for this hybridization? (α=0.05)
                        Wild Type   Mutant Type   Total
Number of individuals       80          10          90

Q26. The hair and eye colors of 1000 workers in a workplace are classified
below. Test whether eye color is independent of hair color. Calculate the
independence coefficient and interpret it.

                      Hair Color
Eye Color    Yellow   Brown   Black      Σ
Green          40       50     160     250
Blue           50      100      50     200
Brown          60      100     190     350
Black          50      100      50     200
Σ             200      350     450    1000

Q27. It is claimed that a pesticide kills 80% of a pest population. A
researcher studying this claim carried out 5 different field trials and found
the results below. Is the efficacy of the pesticide really 80%? (α=0.05)

Location   Dead pests   Surviving pests   Total
1              75             25           100
2              73             27           100
3             151             49           200
4             230             70           300
5              35             15            50
Total         564            186           750

PART 7

REGRESSION AND CORRELATION

Regression and correlation analysis is used to examine the relationships
between variables that affect each other. It is mostly applied to data measured
on ordered or ratio scales, or to continuous variables.

7.1. Correlation analysis


Correlation: a linear relationship between two variables, and the method of
analysis that provides information about its strength and direction. In
research it is often used to study the relationships between different
properties of the material examined. For example, age-length, age-weight, or
the dry matter, protein and pH of a material are properties whose correlations
can be studied; the relationships between blood sugar, hemoglobin, protein,
fat and so on can also be examined by correlation analysis.
The correlation coefficient is denoted by (r) for a sample and by (ρ) for the
population.
The correlation coefficient takes values between -1 and +1. As it approaches
+1 there is an increasing linear relationship, as it approaches -1 there is a
decreasing linear relationship, and as it approaches 0 there is no linear
relationship. A value outside this interval indicates a miscalculation.
[Figure: scatter plots for r = 1, r close to 1, r close to -1, r close to 0 and r = -1]

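The correlation coefficient is estimated by the following formula.

$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}} \quad \text{or} \quad r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}$$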
The correlation coefficient can be tested to see whether it differs
significantly from zero (no relationship), or whether it is smaller or larger
than zero.
Hypotheses;
H0: ρ = 0
H1: ρ ≠ 0; H1: ρ > 0; H1: ρ < 0
Critical value; for the one-way test $t_c = t_{\alpha}(n-1)$ and for the two-way test
$t_c = t_{\alpha/2}(n-1)$
The test statistic is defined by $t_h = \frac{r}{S_r}$; $S_r = \sqrt{\frac{1-r^2}{n-2}}$

Sample: The ages (months) and weights (kg) of 6 babies are given below. Assess
the relationship between age and weight by testing whether the correlation
coefficient differs significantly from zero. (α=0.05)

          Age (x)   Weight (y)     x*y        x²        y²
             0         3.5         0.0        0.0      12.3
             5         7.0        35.0       25.0      49.0
            10        10.0       100.0      100.0     100.0
            15        12.0       180.0      225.0     144.0
            20        13.0       260.0      400.0     169.0
            25        14.0       350.0      625.0     196.0
Total       75        59.5       925.0     1375.0     670.3

$$r = \frac{925 - \frac{75 \times 59.5}{6}}{\sqrt{\left(1375 - \frac{75^2}{6}\right)\left(670.3 - \frac{59.5^2}{6}\right)}} = 0.967$$

Because the correlation coefficient is close to +1, we can say that there is
an increasing linear relationship between age and weight. However, a
significance test is needed to state this more precisely;

Hypotheses; H0: ρ = 0
H1: ρ ≠ 0
Critical value for the two-way test; $t_c = t_{\alpha/2}(n-1) = t_{0.05/2;5} = 2.571$
The test statistic is found as
$$S_r = \sqrt{\frac{1-r^2}{n-2}} = \sqrt{\frac{1-0.967^2}{6-2}} = 0.127; \quad t_h = \frac{0.967-0}{0.127} = 7.61$$

As the test statistic is greater than the critical value, the H0 hypothesis is
rejected and the H1 hypothesis is accepted; it can be said that there is a
significant linear relationship between weight and age (P<0.05).
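
The correlation coefficient and its significance test for the age and weight data can be checked with a short Python sketch. The function and variable names are assumptions made for this illustration. The critical value 2.571 is the one used in the text (5 degrees of freedom); many textbooks use n-2 = 4 degrees of freedom for this test, which gives 2.776 and leads to the same decision here.

```python
import math

# Age (months) and weight (kg) of the 6 babies in the example.
age = [0, 5, 10, 15, 20, 25]
weight = [3.5, 7.0, 10.0, 12.0, 13.0, 14.0]


def correlation(x, y):
    """Pearson correlation coefficient from the sums-of-products formula."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / math.sqrt(sxx * syy)


n = len(age)
r = correlation(age, weight)
s_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t_h = r / s_r                            # test statistic for H0: rho = 0

T_CRITICAL = 2.571  # two-way table value used in the text (alpha = 0.05)
significant = abs(t_h) > T_CRITICAL
print(f"r = {r:.3f}, S_r = {s_r:.3f}, t_h = {t_h:.2f}, significant: {significant}")
```

Small differences from a hand calculation come from rounding r to three decimals before computing the standard error.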

7.2. Regression Analysis


Regression analysis is the method of describing the relationship between two
or more variables by means of a function. Simple examples are the
relationships between age and length, age and weight, or dry matter and
protein. The relationship between two variables can be linear, parabolic
(quadratic), cubic, exponential, geometric or hyperbolic. More complex
relationships among more than two variables can also be evaluated by
regression analysis. As this section is an introduction to regression
analysis, it focuses only on the simplest case, the linear relationship
between two variables.

The main purpose of regression analysis is to provide an estimate of unknown
values. Growth curves are the most typical example of this: physicians in a
health center or hospital use such a curve to evaluate the development of a
child.
In a regression analysis, one of the variables has to be dependent (the
affected one). There may be one or more independent variables. For example,
in the age-weight relationship, weight depends on age. There may be other
variables that affect weight, but we will not consider more than one
independent variable. The dependent variable is denoted by (y) and the
independent one by (x). The parameters of the linear equation are defined as:
$y_i = \alpha + \beta x_i + \varepsilon_i$
Ignoring the error term, the estimate of this equation is given as follows.

$\hat{y}_i = a + b x_i$

$\alpha \approx a$ : is the intercept coefficient. It is the value that y takes
when the independent variable is x = 0. The intercept coefficient can take
negative (-) or positive (+) values. If it is negative, the estimated line
crosses the horizontal axis (for a positive slope, at a positive x value).
$\beta \approx b$ : is the regression coefficient. Mathematically it is the slope
of the line: the amount of change in the dependent variable that corresponds
to a one-unit change in the independent variable. The regression coefficient
can take negative (-) or positive (+) values. The correlation coefficient and
the regression coefficient have the same sign. So if the correlation
coefficient is negative, the regression coefficient is also negative, and
vice versa.

It is necessary to draw a scatter chart before creating the regression
equation. The scatter chart helps the researcher to choose the function that
is appropriate for the data.
Let us study the sample application below, which gives the ages (months) and
weights (kg) of children.

          Age (x)   Weight (y)     x*y        x²        y²
             0         3.5         0.0        0.0      12.3
             5         7.0        35.0       25.0      49.0
            10        10.0       100.0      100.0     100.0
            15        12.0       180.0      225.0     144.0
            20        13.0       260.0      400.0     169.0
            25        14.0       350.0      625.0     196.0
Total       75        59.5       925.0     1375.0     670.3

Graph 5. Distribution graph of Age and Weight

This graph shows that the data follow a linear equation. The following
formulas are used to estimate the parameters of the equation.

$$b = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \quad \text{or} \quad b = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$$
The values calculated in the data table are placed into the second formula and
the parameters are estimated as below.

$$b = \frac{925 - \frac{75 \times 59.5}{6}}{1375 - \frac{75^2}{6}} = 0.414; \quad \bar{y} = \frac{59.5}{6} = 9.92; \quad \bar{x} = \frac{75}{6} = 12.5$$
$$a = \bar{y} - b\bar{x} = 9.92 - 0.414 \times 12.5 = 4.74$$

$\hat{y} = 4.74 + 0.414x$. This equation estimates y within certain limits (the
confidence limits of the estimate). It is obtained by the method of least
squares, which minimizes the squared deviations. For example, the estimated
weight of a 23-month-old baby is $\hat{y} = 4.74 + 0.414 \times 23 = 14.26$ kg.
But using the same equation to forecast the weight of a 50-month-old baby
would be wrong, because growth does not continue in a linear direction: the
rate of increase eventually declines as age grows.

Graph 6. Distribution graph of Age and Weight, prediction line and equation
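
The least squares estimates for the age and weight example can be reproduced with the following Python sketch. The helper names `fit_line` and `predict` are assumptions introduced for this illustration only. Because the code keeps full precision, the intercept and the prediction may differ in the second decimal from the hand calculation, which uses rounded values.

```python
# Age (months) and weight (kg) of the children in the example.
age = [0, 5, 10, 15, 20, 25]
weight = [3.5, 7.0, 10.0, 12.0, 13.0, 14.0]


def fit_line(x, y):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: second (computational) form of the formula for b.
    b = (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)
    # Intercept: a = mean(y) - b * mean(x).
    a = sum(y) / n - b * sum(x) / n
    return a, b


a, b = fit_line(age, weight)


def predict(x_new):
    """Estimated weight (kg) for a given age in months."""
    return a + b * x_new


print(f"y = {a:.2f} + {b:.3f} x")
print(f"Estimated weight at 23 months: {predict(23):.2f} kg")
```

As the text warns, `predict` should only be used for ages inside the observed range (0 to 25 months); the line says nothing reliable about a 50-month-old child.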

7.3. Relations Between Correlation and Regression Coefficients


The relationship between these coefficients reflects how the two variables
change together.
In regression analysis, the regression of the dependent variable (y) on the
independent variable (x) is denoted by b(yx). If the variables defined as
dependent and independent change places, the values of the estimated
parameters also change, because the term in the denominator of the
coefficient b changes accordingly.
In short, in this situation the regression of x on y is denoted by b(xy), and
the two coefficients differ from each other: b(yx) ≠ b(xy). In correlation
analysis there is no distinction between independent and dependent variables,
so the same coefficient is obtained in both situations.
The regression coefficients and the correlation coefficient are related as
follows:

$$r = \sqrt{b_{yx}\, b_{xy}}$$

In other words, the geometric mean of the reciprocal regression coefficients
gives the correlation coefficient, with the same sign as the regression
coefficients.
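
The relation between the two regression coefficients and the correlation coefficient can be verified numerically on the age and weight data. The sketch below is only an illustration; the variable names are assumptions, and `math.copysign` is used so that the square root takes the sign of the regression coefficients.

```python
import math

# Age (months) and weight (kg) from the example in section 7.1.
x = [0, 5, 10, 15, 20, 25]
y = [3.5, 7.0, 10.0, 12.0, 13.0, 14.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_yx = sxy / sxx  # regression of y on x
b_xy = sxy / syy  # regression of x on y
r = sxy / math.sqrt(sxx * syy)

# Geometric mean of the two regression coefficients, with their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
print(f"r = {r:.3f}, sqrt(b_yx * b_xy) = {r_from_b:.3f}")
```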

7.4. Determination Coefficient (R²)


The determination coefficient is the proportion of the dependent variable that
is explained by the independent variable. It is also called the coefficient of
fit. It is a ratio and has no units. If it is multiplied by 100, it is
expressed as a percentage (%). As the determination coefficient approaches 0,
the explanatory power of the independent variable decreases because of the
poor fit; as it approaches +1, the fit is adequate and the relationship is
described well.
The determination coefficient is denoted by R² and is equal to the square of
the correlation coefficient. For the age-weight sample given above,
R² = r² = 0.967² = 0.935. As this value is close to +1, age explains the
variation in weight at a sufficient rate (about 94%).
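
As a hedged illustration, the determination coefficient can also be computed as the share of the variation in y that the fitted line explains, and compared with the square of r. The names `sse` (unexplained sum of squares) and `syy` (total sum of squares) are assumptions made for this sketch and do not appear in the text.

```python
import math

# Age (months) and weight (kg) from the example in section 7.1.
x = [0, 5, 10, 15, 20, 25]
y = [3.5, 7.0, 10.0, 12.0, 13.0, 14.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation in y

b = sxy / sxx
a = mean_y - b * mean_x

# Variation in y left unexplained by the fitted line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / syy
r = sxy / math.sqrt(sxx * syy)

print(f"R^2 = {r_squared:.3f} ({100 * r_squared:.0f}%), r^2 = {r ** 2:.3f}")
```

For simple linear regression the two values coincide, which is why the text can obtain R² by squaring r.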

7.5. Sample Questions


Q1. Define regression and correlation.

Q2. The ages and pulse rates of 6 people in a study are as follows.

Age       1     5    10    15    20    25
Pulse   130   110   100    95    86    84

a) Set up the regression model representing the relationship.
b) Estimate the pulse rate of a teenager who is 17 years old.
c) Estimate the correlation coefficient and interpret it.
d) Calculate the determination coefficient and interpret it.

Q3. The regression coefficient of weight on chest circumference is bxy=1.20
and that of chest circumference on weight is byx=0.60. What is the correlation
coefficient between the two variables?
Q4. The weight (x) and height (y) values of 10 persons are as follows.
X    45    55    70    60    50    65    68    70    75    78
Y   160   164   170   165   165   170   175   168   175   177
a) Estimate the correlation coefficient (r) that describes the linear
relationship between height and weight and determine its direction.
b) Calculate the determination coefficient from your estimate.
c) Set up a simple linear regression equation expressing the relationship
between weight and height as a function.
d) Predict the weight of someone whose height is 150 cm.

Q5. The ages of the patients and their total numbers of dental fillings are
given below. Find the regression equation that gives the relationship between
age and the number of dental fillings.

Patients   Age (xi)   Dental fillings (yi)    xi²    yi²   xi*yi
1              9               1               81      1      9
2             13               2              169      4     26
3             15               2              225      4     30
4             16               3              256      9     48
5             19               4              361     16     76
Total         72              12             1092     34    189

Q6. A researcher who is investigating the relationship between the weights of
fathers and sons found the weights of fathers and their eldest sons as below.

Weight of fathers   65  63  67  64  68  62  70  66  68  67  69  71
Weight of sons      68  66  68  65  69  66  68  65  71  67  68  70

a) Estimate the correlation coefficient of the linear relationship between the
weights of the fathers and their sons.
b) Calculate the determination coefficient from your estimate.
c) Estimate the linear regression equation of the weights of fathers and sons.
d) If the father weighs 70 kg, predict the weight of his son.

Q7. In a bread experiment, the increase in fermentation over time was found
as below.

ni      Ferment amount/100 (yi)   Time (xi)    yi²     xi²   xi*yi
1                 1.0                  5       1.00     25     5.0
2                 1.4                 15       1.96    225    21.0
3                 2.5                 30       6.25    900    75.0
4                 1.9                 20       3.61    400    38.0
5                 1.2                 18       1.44    324    21.6
Total             8.0                 88      14.26   1874   160.6

a) Estimate the linear regression equation.
b) Assess the relationship between the two variables by calculating the
correlation coefficient.

Q8. The monthly weight loss of someone who is on a diet is given below.
According to this;
a) Find the regression equation for the relationship between weight loss and
month.
b) Find the starting value and the weight lost after three months.
c) Estimate the correlation and determination coefficients.

Months   Lost weight amount
1               1.0
2               1.5
3               1.6
4               2.0
Index

PART 1
INTRODUCTION AND BASIC CONCEPTS
1.1. Introduction
1.1.1. Classification of Statistics
1.1.2. Some important terms used in Statistics / concepts
1.1.4. Measurement Scales (Scales)
1.1.5. Different scales are used depending on the variable feature
1.1.6. Conducting research and data collection
1.2. Sample Questions
PART 2
DESCRIPTIVE STATISTICS
2.1. Tables
2.1.1. Frequency Tables
2.2. Figures and Graphs
2.3. Sample Questions
PART 3
CENTRAL TENDENCY / LOCATION AND CHANGE / DISTRIBUTION MEASUREMENTS
3.1. Central Tendency (Location) Measurements
3.1.1. Arithmetic Mean ( x )
3.1.2. Weighted Mean
3.1.3. Geometric Mean (GO)
3.1.4. Median and Mode
3.2. Change (Distribution) Measurements
3.2.1. Change Width (Range)
3.2.2. Variance
3.2.3. Standart Deviation
3.2.4. Standart Error
3.2.5. Variation Coefficient
3.2.6. Skewness and Kurtosis
3.3. Sample Questions
PART 4
POSSIBILITIES AND THE POPULATION DISTRIBUTION
4.1. Possibilities
4.2. Permutations and Combinations
4.3. Population Distribution
4.3.1. Discrete Distributions
4.3.1.1. Binomial Distribution
4.3.1.2. Poisson Distribution
4.3.2. Continuous Distributions
4.3.3. Central limit theorem
4.4. Sample Questions
PART 5
SAMPLING AND PREDICTION OF SAMPLE SIZE
5.1. Sampling
5.2. Prediction of the sample size
5.2.2. Determining The Sample Size For Population Ratio
5.3. Sample Questions
PART 6
HYPOTHESIS TESTING AND SAVE LIMITS
6.1. Error Types
6.2. Tests of hypotheses about mean
6.2.1. Hypothesis Testing Of An Mean :
Sample 1: Z test application
6.2.1.1. Means’s save limits (Z distribution)
Sample 2: t test application
6.2.1.2. Means’s save limits (t distribution)
6.2.2. Comparing the difference between two means
6.2.2.1. Group Comparison
Sample 1: Z test application
6.2.2.2. Save limits of the Mean difference
Sample 2: t test application
6.2.2.3. Test for dependent groups (Pairing Test)
6.2.2.4. Save limits of the mean differences
6.3. Tests of a hypotheses regarding to rates
6.3.1. Hypothesis test of a rate
6.3.1.1. Save limits of a ratio
6.3.2. Hypothesis Test of Two Rate Difference
6.3.2.1. Save limits of the difference between two rate
6.4. Chi-squared (χ²) Distribution
6.4.1. Chi-squared Compliance Test
6.4.2. Chi-squared Independence Test
6.5. Sample Questions
PART 7
REGRESSION AND CORRELATION
7.1. Correlation analysis
7.2. Regression Analysis
7.3. Relations Between Correlation and Regression Coefficients
7.4. Determination Coefficient (R²)
7.5. Sample Questions
