Lecture 03-Handling missing values in RCBD


HANDLING MISSING VALUES IN RCBD

STA 514 2.0: Advanced Design and Analysis of Experiments


Lecture 03

Dr. Prasansha Liyanaarachchi

Department of Statistics
University of Sri Jayewardenepura
20 January 2024
REVISIT: Introduction to RCBD
• Randomized complete block designs (RCBD) differ from completely
randomized designs in that the experimental units are grouped
into blocks according to a known or suspected source of variation,
which the blocks isolate.

• Similar experimental units are grouped into blocks (replicates)
so that the observed differences are largely due to true
differences between treatments.

• Units within a block are homogeneous; units in different blocks
are heterogeneous.

• Treatments are assigned at random to the experimental units
within each block, and each treatment appears once in each block.
20/01/2024 HANDLING MISSING VALUES IN RCBD 2
• The “complete block” part of the name indicates that each
treatment combination is applied in every block.

• If a block is missing one or more treatment combinations, the
design is called a randomized incomplete block design.

• The design is still called randomized because the treatment
combinations are randomly assigned to the experimental units
within the blocks.
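The randomization described above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `randomize_rcbd`, the treatment labels, and the seed are assumptions made for the example and are not part of the lecture.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one random treatment order per block (hypothetical helper).

    Every treatment appears exactly once in every block, which is what
    makes the block design "complete".
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # each block receives all treatments
        rng.shuffle(order)        # random assignment within the block
        layout.append(order)
    return layout


# Illustrative labels: four treatments, five blocks.
layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=5, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```

Randomizing separately inside each block, rather than across all units at once, is what distinguishes this layout from a completely randomized design.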

Figure 1: The randomized complete block design.

Notation
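Treatment                    Block                             Row            Row
(level)      1          2          ...   b                     total          average
1            $y_{11}$   $y_{12}$   ...   $y_{1b}$              $y_{1.}$       $\bar{y}_{1.}$
2            $y_{21}$   $y_{22}$   ...   $y_{2b}$              $y_{2.}$       $\bar{y}_{2.}$
...
a            $y_{a1}$   $y_{a2}$   ...   $y_{ab}$              $y_{a.}$       $\bar{y}_{a.}$
Column total     $y_{.1}$         $y_{.2}$         ...   $y_{.b}$
Column average   $\bar{y}_{.1}$   $\bar{y}_{.2}$   ...   $\bar{y}_{.b}$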

Notation (Cont.)
$a$: number of treatments.

$b$: number of blocks.

$y_{ij}$: observation taken under treatment $i$ in block $j$.

$y_{i.}$: total of the observations under treatment $i$.

$y_{.j}$: total of the observations in block $j$.

$y_{..}$: grand total of all observations.

$N = ab$: total number of observations.
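
As a small illustration of this notation, the totals and averages can be computed directly from a table of observations. The numbers below are hypothetical values invented for the example (rows are treatments, columns are blocks); only the definitions come from the notation above.

```python
# Hypothetical data: a = 3 treatments (rows), b = 4 blocks (columns).
y = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 11.0, 10.0, 12.0],
]

a = len(y)      # number of treatments
b = len(y[0])   # number of blocks
N = a * b       # total number of observations

treatment_totals = [sum(row) for row in y]        # y_{i.}
block_totals = [sum(col) for col in zip(*y)]      # y_{.j}
grand_total = sum(treatment_totals)               # y_{..}

treatment_means = [t / b for t in treatment_totals]   # row averages
block_means = [t / a for t in block_totals]           # column averages
grand_mean = grand_total / N

print("Treatment totals:", treatment_totals)
print("Block totals:", block_totals)
print("Grand total:", grand_total)
```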

Model for RCBD
$$y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}, \qquad i = 1, 2, \dots, a; \quad j = 1, 2, \dots, b$$

where $y_{ij}$ is the observation for treatment $i$ in block $j$, $\mu$ is the overall mean, $\tau_i$ is the $i$th treatment
effect, $\beta_j$ is the $j$th block effect, and $\epsilon_{ij}$ is the random error.

The model errors are assumed to be normally and independently distributed
random variables with mean zero and constant variance $\sigma^2$:

$$\epsilon_{ij} \sim NID(0, \sigma^2)$$

For the effects model, the treatment and block effects are constrained to sum to zero:

$$\sum_{i=1}^{a} \tau_i = \sum_{j=1}^{b} \beta_j = 0$$

Hypotheses
• Main (for the factor of primary interest)

$H_0: \tau_1 = \tau_2 = \dots = \tau_a = 0$
$H_1: \tau_i \neq 0$ for at least one $i$

• Sub (for the nuisance factor)

$H_0: \beta_1 = \beta_2 = \dots = \beta_b = 0$
$H_1: \beta_j \neq 0$ for at least one $j$

Analysis of Variance (ANOVA)
• ANOVA is derived from decomposing the total sum of squares
(partitioning the total variability into its components).

$$Variability_{Total} = Variability_{Trt} + Variability_{Blk} + Variability_{Error}$$

Source of     Sum of squares   DF                 Mean square    F
variation     (SS)                                (MS)
Treatment     $SS_{Trt}$       $a - 1$            $MS_{Trt}$     $F_0 = MS_{Trt} / MS_E$
Block         $SS_{Block}$     $b - 1$            $MS_{Block}$
Error         $SS_E$           $(a - 1)(b - 1)$   $MS_E$
Total         $SS_T$           $ab - 1$
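
The table can be turned into a short computation. The sketch below uses the usual computing formulas for the sums of squares (totals squared and divided by the number of observations in each total, minus the correction term $y_{..}^2 / N$); these formulas, the function name `rcbd_anova`, its `n_estimated` argument, and the data are assumptions added for illustration rather than material from the slides.

```python
def rcbd_anova(y, n_estimated=0):
    """ANOVA quantities for an RCBD (rows = treatments, columns = blocks).

    n_estimated is the number of cells that hold estimated values; the
    error degrees of freedom are reduced by that amount.
    """
    a, b = len(y), len(y[0])
    n = a * b
    grand_total = sum(sum(row) for row in y)
    correction = grand_total ** 2 / n                      # y_{..}^2 / N

    ss_total = sum(v ** 2 for row in y for v in row) - correction
    ss_trt = sum(sum(row) ** 2 for row in y) / b - correction
    ss_block = sum(sum(col) ** 2 for col in zip(*y)) / a - correction
    ss_error = ss_total - ss_trt - ss_block                # by subtraction

    df_trt, df_block = a - 1, b - 1
    df_error = (a - 1) * (b - 1) - n_estimated
    ms_trt = ss_trt / df_trt
    ms_block = ss_block / df_block
    ms_error = ss_error / df_error
    return {
        "ss_trt": ss_trt, "ss_block": ss_block,
        "ss_error": ss_error, "ss_total": ss_total,
        "df_trt": df_trt, "df_block": df_block, "df_error": df_error,
        "ms_trt": ms_trt, "ms_block": ms_block, "ms_error": ms_error,
        "f0": ms_trt / ms_error,
    }


# Hypothetical data: 3 treatments (rows) in 4 blocks (columns).
y = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 11.0, 10.0, 12.0],
]
table = rcbd_anova(y)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The statistic `f0` is compared with the F distribution on `df_trt` and `df_error` degrees of freedom to test the treatment hypothesis.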

Missing Values in RCBD
• Missing values may arise from carelessness or error, or for
reasons beyond our control, such as unavoidable damage to
an experimental unit.

• A missing observation introduces a new problem into the analysis
because treatments are no longer orthogonal to blocks; that is,
not every treatment occurs in every block.

• There are two general approaches to the missing value problem:

❖ Approximate analysis.
❖ Exact analysis.

Approximate Analysis
• Suppose the observation $y_{ij}$ for treatment $i$ in block $j$ is missing.
Denote the missing observation by $x$. It is estimated as the value
that makes the minimum contribution to the error sum of squares,

$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \left( y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..} \right)^2$$

• This is equivalent to

$$SS_E = x^2 - \frac{1}{b} \left( y'_{i.} + x \right)^2 - \frac{1}{a} \left( y'_{.j} + x \right)^2 + \frac{1}{ab} \left( y'_{..} + x \right)^2 + R$$

where $R$ includes all terms not involving $x$, and $y'_{..}$, $y'_{i.}$, $y'_{.j}$ are the
grand total, the $i$th treatment total, and the $j$th block total computed
without the missing observation.

• $x$ is obtained by solving $dSS_E/dx = 0$:

$$x = \frac{a y'_{i.} + b y'_{.j} - y'_{..}}{(a - 1)(b - 1)} \qquad (1)$$

• The ANOVA can then be performed with the estimate in place of the
missing observation, reducing the error degrees of freedom by 1.
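
Equation (1) can be checked numerically: the estimate should give a smaller error sum of squares than any nearby value of $x$. The sketch below does this for a small table with one missing cell. The data and the helper names `estimate_missing`, `ss_error`, and `ss_error_with` are hypothetical and serve only to illustrate the idea.

```python
def estimate_missing(y, i, j):
    """Equation (1) for a table whose cell (i, j) is None."""
    a, b = len(y), len(y[0])
    trt_total = sum(v for v in y[i] if v is not None)            # y'_{i.}
    blk_total = sum(row[j] for row in y if row[j] is not None)   # y'_{.j}
    grand_total = sum(v for row in y for v in row if v is not None)  # y'_{..}
    return (a * trt_total + b * blk_total - grand_total) / ((a - 1) * (b - 1))


def ss_error(y):
    """Error sum of squares of a complete table, from its definition."""
    a, b = len(y), len(y[0])
    row_means = [sum(row) / b for row in y]
    col_means = [sum(col) / a for col in zip(*y)]
    grand_mean = sum(sum(row) for row in y) / (a * b)
    return sum(
        (y[r][c] - row_means[r] - col_means[c] + grand_mean) ** 2
        for r in range(a) for c in range(b)
    )


def ss_error_with(y, i, j, x):
    """SS_E after placing the value x in the missing cell (i, j)."""
    filled = [list(row) for row in y]
    filled[i][j] = x
    return ss_error(filled)


# Hypothetical data with one missing cell (treatment 2, block 3).
data = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, None, 16.0],
    [9.0, 11.0, 10.0, 12.0],
]
x_hat = estimate_missing(data, 1, 2)
print("Estimate from equation (1):", x_hat)
for x in (x_hat - 0.5, x_hat, x_hat + 0.5):
    print(f"x = {x:.4f}  SS_E = {ss_error_with(data, 1, 2, x):.6f}")
```

Because $SS_E$ is a quadratic in $x$ with a positive leading coefficient, the stationary point found from the derivative is the minimum.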

• If several values are missing, the same procedure applies: write
$SS_E$ as a function of the missing values, differentiate with
respect to each missing value, set each derivative to zero, and
solve the resulting equations.

• Alternatively, equation (1) can be applied iteratively. Procedure
for two missing values:

❖ Choose an arbitrary starting estimate for the first missing value
and use equation (1) to estimate the second missing value.

❖ With that estimate of the second value in place, use equation (1)
to re-estimate the first missing value. Continue alternating
until the estimates converge.
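
The iterative procedure for two missing values can be sketched as follows. This is one possible reading of the steps above, not a definitive implementation: the data, the function names, the starting value, and the convergence tolerance are all assumptions chosen for the example.

```python
def equation_1(y, i, j):
    """Equation (1) for cell (i, j); every other cell must hold a number."""
    a, b = len(y), len(y[0])
    trt_total = sum(y[i]) - y[i][j]                      # y'_{i.}
    blk_total = sum(row[j] for row in y) - y[i][j]       # y'_{.j}
    grand_total = sum(sum(row) for row in y) - y[i][j]   # y'_{..}
    return (a * trt_total + b * blk_total - grand_total) / ((a - 1) * (b - 1))


def estimate_two_missing(y, first, second, start=0.0, tol=1e-10, max_iter=1000):
    """Alternate equation (1) between two missing cells until both settle."""
    filled = [list(row) for row in y]
    (i1, j1), (i2, j2) = first, second
    filled[i1][j1] = start   # arbitrary starting estimate for the first cell
    filled[i2][j2] = start   # placeholder; overwritten in the first pass
    for iteration in range(1, max_iter + 1):
        old_first, old_second = filled[i1][j1], filled[i2][j2]
        filled[i2][j2] = equation_1(filled, i2, j2)   # update second value
        filled[i1][j1] = equation_1(filled, i1, j1)   # update first value
        change = max(abs(filled[i1][j1] - old_first),
                     abs(filled[i2][j2] - old_second))
        if change < tol:
            return filled[i1][j1], filled[i2][j2], iteration
    raise RuntimeError("estimates did not converge")


# Hypothetical data with two missing cells, marked None.
data = [
    [10.0, None, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 11.0, 10.0, None],
]
x1, x2, n_iter = estimate_two_missing(data, first=(0, 1), second=(2, 3))
print(f"first = {x1:.4f}, second = {x2:.4f}, iterations = {n_iter}")
```

At convergence each estimate satisfies equation (1) given the other, which is the same pair of equations obtained by differentiating $SS_E$ with respect to both missing values.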

Example 1: Chemical Effect
A chemist wishes to test the effect of four chemical agents on the
strength of a particular type of cloth. Because there might be
variability from one bolt to another, the chemist decides to use a
randomized block design, with the bolts of cloth considered as
blocks. Five bolts are selected, and all four chemicals are applied
to each bolt in random order. Unfortunately, the observation for
chemical type 2 and bolt 3 is missing. Estimate the missing
observation, then analyze the data and draw appropriate conclusions.

First, Consider the Case Where None of the Data Is Missing
For comparison, the same experiment is first analyzed as a complete
data set, in which the observation for chemical type 2 and bolt 3
is recorded as 75.

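$$y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}, \qquad i = 1, 2, 3, 4; \quad j = 1, 2, 3, 4, 5$$

where,
$y_{ij}$ is the strength for the $i$th chemical on the $j$th bolt,
$\mu$ is the overall mean,
$\tau_i$ is the $i$th chemical effect,
$\beta_j$ is the $j$th bolt effect, and
$\epsilon_{ij}$ is the random error.

Assumption and Constraints

$$\epsilon_{ij} \sim NID(0, \sigma^2)$$

$$\sum_{i=1}^{4} \tau_i = \sum_{j=1}^{5} \beta_j = 0$$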

By considering the Q-Q plot and the p-value of the normality test
($H_0$: residuals are normally distributed), we can conclude
that the normality assumption is not seriously violated.

The residuals vs. fitted plot shows a random pattern (no unusual
pattern):
➢ The independence assumption is not violated.
➢ The constant variance assumption is not violated.
➢ The zero mean assumption is not violated.
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$
$H_1: \tau_i \neq 0$ for at least one $i$, $i = 1, 2, 3, 4$

The p-value > 0.05. Therefore $H_0$ is not rejected at the 5% significance level.

We do not have enough evidence to conclude that the chemical agents
have a significant effect on the strength of this type of cloth.
Now, Consider the Case Where One Observation Is Missing
The observation for chemical type 2 and bolt 3 is now treated as
missing. It is estimated first, and the analysis is then repeated
with the estimate in its place.

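$$y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}, \qquad i = 1, 2, 3, 4; \quad j = 1, 2, 3, 4, 5$$

where,
$y_{ij}$ is the strength for the $i$th chemical on the $j$th bolt,
$\mu$ is the overall mean,
$\tau_i$ is the $i$th chemical effect,
$\beta_j$ is the $j$th bolt effect, and
$\epsilon_{ij}$ is the random error.

Assumption and Constraints

$$\epsilon_{ij} \sim NID(0, \sigma^2)$$

$$\sum_{i=1}^{4} \tau_i = \sum_{j=1}^{5} \beta_j = 0$$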

By considering the Q-Q plot and the p-value of the normality test
($H_0$: residuals are normally distributed), we can conclude
that the normality assumption is marginally satisfied.

The residuals vs. fitted plot shows a random pattern (no unusual
pattern):
➢ The independence assumption is not violated.
➢ The constant variance assumption is not violated.
➢ The zero mean assumption is not violated.
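Totals (a prime marks a total computed without the missing observation):

$y_{1.} =$
$y'_{2.} = 282$
$y_{3.} =$
$y_{4.} =$
$y_{.1} =$    $y_{.2} =$    $y'_{.3} = 227$    $y_{.4} =$    $y_{.5} =$    $y'_{..} = 1360$

$$x = \frac{a y'_{i.} + b y'_{.j} - y'_{..}}{(a - 1)(b - 1)}$$

$$= \frac{a y'_{2.} + b y'_{.3} - y'_{..}}{(a - 1)(b - 1)}$$

$$= \frac{4(282) + 5(227) - 1360}{(4 - 1)(5 - 1)} = 75.25$$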
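
The arithmetic above can be reproduced in a few lines. The three totals are the ones substituted into the formula on this slide; the variable names are chosen for the example.

```python
a, b = 4, 5                 # four chemicals, five bolts

treatment_total = 282       # y'_{2.}: chemical 2 total without the missing value
block_total = 227           # y'_{.3}: bolt 3 total without the missing value
grand_total = 1360          # y'_{..}: grand total without the missing value

numerator = a * treatment_total + b * block_total - grand_total
denominator = (a - 1) * (b - 1)
x = numerator / denominator

# One value is estimated, so the error degrees of freedom drop by 1.
error_df = (a - 1) * (b - 1) - 1

print("numerator:", numerator)        # 1128 + 1135 - 1360 = 903
print("estimate x:", x)               # 903 / 12 = 75.25
print("error degrees of freedom:", error_df)
```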

$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$
$H_1: \tau_i \neq 0$ for at least one $i$, $i = 1, 2, 3, 4$

The p-value > 0.05. Therefore $H_0$ is not rejected at the 5% significance level.

We do not have enough evidence to conclude that the chemical agents
have a significant effect on the strength of this type of cloth.
Example 2: Mail Marketing Example
A consumer products company relies on direct mail marketing pieces as a
major component of its advertising campaigns. The company has three
different designs for a new brochure and wants to evaluate their
effectiveness, as there are substantial differences in costs between the
three designs. The company decides to test the three designs in four
different regions of the country. Since there are known regional differences
in the customer base, regions are considered as blocks. The response rate
for each mailing is given in the table below. Suppose the response rate
for Design 2 in Region SE is missing (marked x). Estimate the missing
observation, then analyze the data and draw appropriate conclusions.

                   Region
Design   NE (1)   NW (2)   SE (3)   SW (4)
1        2.5      3.5      2.2      3.8
2        4.0      5.3      x        5.8
3        2.8      3.4      2.0      3.1
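
A possible starting point for this example is sketched below: it applies equation (1) to the table and then computes the approximate ANOVA quantities with the error degrees of freedom reduced by 1. The sums of squares use the usual computing formulas, which are an assumption added here rather than taken from the slides, and the variable names are illustrative. Residual diagnostics and the p-value are left to the analysis itself.

```python
# Response rates from the table; None marks the missing cell
# (Design 2, Region SE). Rows are designs, columns are regions.
rates = [
    [2.5, 3.5, 2.2, 3.8],
    [4.0, 5.3, None, 5.8],
    [2.8, 3.4, 2.0, 3.1],
]
a, b = len(rates), len(rates[0])   # a = 3 designs, b = 4 regions
i, j = 1, 2                        # zero-based position of the missing cell

# Totals computed without the missing observation.
trt_total = sum(v for v in rates[i] if v is not None)                # y'_{2.}
blk_total = sum(row[j] for row in rates if row[j] is not None)       # y'_{.3}
grand_total = sum(v for row in rates for v in row if v is not None)  # y'_{..}

# Equation (1).
x = (a * trt_total + b * blk_total - grand_total) / ((a - 1) * (b - 1))
print(f"Estimated response rate: {x:.2f}")

# Approximate analysis: insert the estimate, then compute the ANOVA
# quantities with one error degree of freedom removed.
filled = [list(row) for row in rates]
filled[i][j] = x

total = sum(sum(row) for row in filled)
correction = total ** 2 / (a * b)
ss_total = sum(v ** 2 for row in filled for v in row) - correction
ss_trt = sum(sum(row) ** 2 for row in filled) / b - correction
ss_block = sum(sum(col) ** 2 for col in zip(*filled)) / a - correction
ss_error = ss_total - ss_trt - ss_block

df_trt = a - 1
df_error = (a - 1) * (b - 1) - 1   # reduced by 1 for the estimated value
f0 = (ss_trt / df_trt) / (ss_error / df_error)
print(f"F0 = {f0:.2f} on {df_trt} and {df_error} degrees of freedom")
```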