Statistical Analysis


Part 2: STATISTICAL ANALYSIS
References
 Core textbooks
Douglas A. Skoog, Donald M. West, F. James Holler and
Stanley R. Crouch, Fundamentals of Analytical Chemistry,
Ninth Edition.
 Background textbooks
1. Analytical Chemistry 2.1 by David Harvey (summer
2016)
2. Gary D. Christian, Purnendu K. (Sandy) Dasgupta,
Kevin A. Schug, Analytical Chemistry, 7th Edition.
Contents
• Introduction
• Quantitative analysis
• Steps involved in a typical quantitative analysis
 Select a method
 Sampling
 Sample preparation
 Performing the measurement
 Processing of data
 Estimate the reliability of the results

• Statistics and data handling


• Significant figures
• Propagation of uncertainty
• Error in chemical analysis
• Accuracy and precision
• Types of errors in experimental data

• Statistical data treatment and evaluation


• Confidence intervals
• Test of significance
• Correlation and regression analysis
Recall
Qualitative Analysis (identification)
Provides information about the elements and compounds in a sample.
“Identifying what is in an unknown.”
Quantitative Analysis
Provides information about the amount of each substance in a sample.
“Identifying how much is present.”
Recall
Quantitative Analysis

 A typical quantitative analysis


involves a sequence of steps.
 In some instances one or more of
these steps may be left out
depending on the type of sample and
a number of factors influencing some
of the choices considered in a step.
Quantitative Analysis
Steps in a typical quantitative
analysis
 Select a method
 Sampling
 Sample preparation
 Performing the measurement
 Processing of data
 Estimate the reliability of the results
1. Select a method
 This is the most essential and difficult step, and
requires experience as well as intuition.
 The following need to be considered in the selection
process:
• The nature of the analyte (inorganic, organic or
biological).
• Expected concentration of the analyte since
different methods have different detection limits.
• The level of accuracy required. High reliability
nearly always requires time and expensive
equipment.
• Economic factors such as time and money are
therefore also crucial.
Sampling
Sampling
 This process can be the most critical aspect of an
analysis.
 The significance and accuracy of measurements can be
limited by the sampling process.
 Unless sampling is done properly, it becomes the weak
link in the chain of the analysis.
 For instance, a life could sometimes depend on the
proper handling of a blood sample during and after
sampling.
 If the analyst is given a sample and does not actively
participate in the sampling process, then the results
obtained can only be attributed to the sample “as it was
received.”
Sampling
 By definition, sampling is the process of obtaining a representative and homogeneous sample from the bulk material.
 A sample that is representative of the whole bulk material
is called the gross sample.
 Its composition should closely reflect the composition of the
bulk.
 Its size may vary from a few grams or less to several
pounds, depending on the type of bulk material.
 Once a gross sample is obtained, it may have to be reduced to a sufficiently small size to be handled. This is called the laboratory sample.
 Once the sample is obtained, an aliquot, or portion, of it will
be analysed. This aliquot is called the analysis sample.

Steps involved in sampling bulk
material
Identify the population
from which the sample
is to be obtained.
Collect a gross sample
that is truly
representative of the
population being
sampled.
Reduce the gross
sample to a laboratory
sample that is suitable
for analysis.
An aliquot, or portion, of the sample will be analysed (usually from a few millilitres to a fraction of a drop (a few microliters) in quantity).
Sampling
How do we obtain gross samples of solids?
 Inhomogeneity of the material,
variation in particle size, and
variation within the particle make
sampling of solids more difficult than
other materials.
 The easiest but usually most
unreliable way to sample a material
is the grab sample, which is one
sample taken at random and
assumed to be representative.

 The grab sample will be satisfactory only if the material from which it is taken is homogeneous.
 For the most reliable results, it is best to take 1/50 to 1/100 of the total bulk for the gross sample, unless the sample is fairly homogeneous.
Sampling
How do we obtain gross samples of solids?
 The easiest and most reliable time to sample large
bodies of solid materials is while they are being
moved.
 In this way any portion of the bulk material can usually
be exposed for sampling.
 Thus, a systematic sampling can be performed to
obtain aliquots representing all portions of the bulk.
 e.g. In the loading or unloading of bags of cement, a
representative sample can be obtained by
taking every fiftieth or so bag or by taking a sample
from each bag.
In the moving of grain by wheelbarrow, samples can similarly be taken at regular intervals.
Sampling
How do we obtain gross samples of liquids?
 Liquid samples tend to be homogeneous and
representative samples are much easier to get.
 If the material is indeed homogeneous, a simple grab
(single random) sample will suffice.
 If liquid samples are not homogeneous, and if they are
small enough, they can be shaken and sampled
immediately.
E.g. There may be particles in the liquid that have tended
to settle.
 Large bodies of liquids are best sampled after a transfer or
if in a pipe, after passing through a pump when they have
undergone thorough mixing.

Sampling
Thief sampler
 It is used for the sampling of large
stationary liquids.
 The separate portions of liquids can
be analyzed individually and results
combined.
 Or, preferably, portions can be combined into one gross sample and replicate analyses performed on it.
 A heterogeneous large body of liquid is sampled by collecting several individual samples; e.g., when analysing contaminants in lake water, water samples are collected from each side of the lake.
Sampling
Sampling of biological fluids
 The composition of some samples varies with when they are taken.
 This is the case for urine samples, therefore 24-h urine sample
collections
are generally more representative than a single “spot sample”.
 The timing of sampling of biological fluids is very
important.
 E.g. The composition of blood varies considerably before and after meals, so sampling after a 12-h fast is often recommended.
 In case of blood, a grab sample is OK. Syringes are used
to collect blood samples.
 Preservatives such as NaF for glucose preservation, and anticoagulants, may be added when blood samples are collected.
Sampling
How do we obtain gross samples of gases?
 The usual method of sampling gases involves sampling
into an evacuated container (stainless steel canister or an
inert polyvinyl fluoride (Tedlar) bag is commonly used).
 The sample may be collected rapidly (a grab sample) or
over a long period of time, using a small orifice to slowly
fill the bag.
 To collect a breath sample, for example, the subject could
blow into an evacuated bag or blow up a mylar balloon.
 Auto exhaust could be collected in a large evacuated
plastic bag.
 O2 and CO2 dissolved in a liquid (e.g., blood) are treated as part of a liquid sample.
 In active sampling procedures, workplace air is drawn through a collection medium by a pump.
Sample Storage and
Preservation
 There is a time gap between when the sample is taken and when the actual analysis is carried out.
 Without preservation, a solid sample may undergo a
change in composition due to the loss of volatile material,
biodegradation, or chemical reactivity (particularly redox
reactions).
 Storing samples at lower temperatures makes them less
prone to biodegradation and to the loss of volatile material
but fracturing of solids and phase separations may present
problems.
 To minimize the loss of volatile compounds, the sample
container is filled completely, eliminating a headspace
where gases collect.
 Samples that have not previously been exposed to O2 are particularly susceptible to oxidation.
Sample preparation
 After a sample has been collected, a solution of the
analyte must usually be prepared before the
analysis can be continued.
 Drying of the sample may be required, and it must
be weighed or the volume measured.
 If the sample is already a solution (e.g., serum,
urine, or water), then extraction, precipitation, or
concentration of the analyte
may be in order, and this may also be true with
other samples.
 In this section we describe common means for preparing solutions of inorganic and organic materials.
Sample preparation
Drying the sample
 Solid samples will usually contain variable amounts of
adsorbed water.
 With inorganic materials, the sample will generally be
dried before weighing.
 This is accomplished by placing it in a drying oven at
105 to 110 ◦C for 1 or 2 h (Water entrapped in the
crystals may require higher temp.)
 Decomposition or side reactions of the sample must be
considered during drying.
 Thermally unstable samples can be dried in a
desiccator; using a vacuum desiccator will hasten the
drying process.
Sample preparation
Drying the sample

A desiccator: the solid in the bottom of the desiccator is the desiccant.
A freeze dryer: the sample must be frozen prior to placing it in the vessel.
Sample preparation

Sample dissolution
 Before the analyte can be measured, some sort of
sample manipulation is generally necessary to get the
analyte into solution or,
 For biological samples, to get rid of interfering
substances, such as proteins.
 Complex samples can be subjected to centrifugal
filtration prior to analysis
e.g., perchlorate and iodide in milk have been
chromatographically
determined after centrifugal filtration.
 There are two types of sample preparation:
 Those that totally destroy the sample matrix, and
 Those involving only partial destruction or non-destruction of the matrix.
Sample preparation
Sample preparation involving destruction of the sample matrix
 This type can generally be used only when the
analyte is inorganic or can be converted to an
inorganic derivative for measurement.
e.g., -Kjeldahl analysis, in which organic nitrogen is
converted to ammonium ion.
-Iodine in food is similarly determined after
total oxidative digestion to HIO3.
 Destructive digestion is typically used if trace
element analysis is conducted in a largely organic
matrix.
Sample preparation
Dissolving inorganic solids
 Strong mineral acids or bases are good solvents for
many inorganics.
Sample preparation
Dissolving inorganic solids
 Inorganic samples that resist decomposition by
digesting with acids or bases often are brought into
solution by fusing with a large excess of an alkali metal
salt, called a flux.
 The sample is mixed with the flux in a sample-to-flux ratio of about 1 to 10 or 1 to 20, and the combination is heated in an appropriate crucible until the flux becomes molten, then allowed to cool slowly to room temperature.
 The resulting cooled solid usually dissolves readily in distilled water or dilute acid.
Sample preparation
Dissolving inorganic solids
 Table below summarizes several common fluxes and their
uses.
 Fusion works when other methods of decomposition do not
because of the high temperature and the flux’s high
concentration in the molten liquid.
 Disadvantages include contamination from the flux and the
crucible, and the loss of volatile materials.
Sample preparation
Destruction of organic materials for inorganic
analysis-Burning or acid oxidation
 Animal and plant tissue, biological fluids, and organic
compounds are usually decomposed by:
 Wet digestion with a boiling oxidizing acid or

mixture of acids, or
 Dry ashing at a high temperature (400 to 700 °C) in a muffle furnace.
 In wet digestion, the acids oxidize organic matter to CO2, H2O, and other volatile products, leaving behind salts or acids of the inorganic constituents.
 In dry ashing, atmospheric oxygen serves as the oxidant; that is, the organic matter is burned off, leaving an inorganic residue.
(Figure: example of a muffle furnace.)
Sample preparation

DRY ASHING
 Performed at high temperature.
 Atmospheric O2 serves as the oxidant.
 Organic matter is burnt off, leaving an inorganic residue.
Sample preparation

Dry ashing
 Performed by heating a weighed sample in a porcelain crucible in a muffle furnace; the residue is then dissolved in a suitable acid.
 Typical ashing temperatures are 450 to 550 °C.
Magnesium nitrate is commonly used as an ashing
aid (oxidizing material).
 Charring the sample prior to muffling is preferred.
Charring is accomplished using an open flame.
 Care must be taken to prevent volatile elements (Hg, As, Pb) from escaping.
Sample preparation

Dry ashing
What happens if samples are liquids or wet tissues?
 The samples are dried on a steam bath or by gentle heat before they are placed in a muffle furnace.
 The heat from the furnace should be applied
gradually up to full temperature to prevent rapid
combustion and foaming.
 After dry ashing is complete, the residue is usually leached from the vessel with 1 or 2 mL of concentrated or 6 M HCl and transferred to a flask or beaker for further treatment.
Sample preparation

WET ASHING (Digestion)
 Usually uses a combination of acids, e.g., sulphuric acid and nitric acid.
 Performed in a Kjeldahl flask.
 The mixture is boiled until the acids are driven off and white fumes evolve.
Wet ashing (Digestion)
 A method for the decomposition of an organic
material, such as resins or fibers, into an ash by
treatment with a boiling oxidizing acid or mixture
of acids.
 Mixture of HNO3 and HCl is often used (aqua regia)
 A small amount (5 mL) of H2SO4 is used with larger
volumes of HNO3 (20 to 30 mL).
 Usually performed in a Kjeldahl flask.
 The acids oxidize organic matter to CO2, H2O, and other volatile products, which are driven off, leaving behind the inorganic constituents as salts or acids.
Sample preparation
Wet ashing (Digestion)
 The following reagents can also be used in wet
digestion
• A mixture of hydrochloric acid and nitric acid, or aqua regia (3:1 HCl:HNO3), dissolves many inorganic substances.
• HF decomposes silicates.
• Perchloric acid is used to break up organic
complexes.
Microwave-assisted
dissolution
 In some cases the dissolution of the sample can be accelerated by using a microwave oven (at T = 100–250 °C).
 The sample is sealed in a specially designed microwave digestion vessel with a mixture of appropriate acids.
 Microwave ovens can be used for rapid
and efficient drying and acid
decomposition of samples.
 Advantages of microwave digestions
include reduction in times from hours to
minutes and low blank levels due to
reduced amounts of reagents required.
Sample preparation
Partial destruction or non-destruction of
sample matrix
 This method is used when the substance to be
determined is organic in nature.
 For the determination of metallic elements, it is also
sometimes unnecessary to destroy the molecular
structure of the sample, particularly with biological fluids.
 Constituents of solid materials such as
soils can sometimes be extracted by
an appropriate reagent.
 Thorough grinding, mixing, and refluxing
are necessary to extract the analyte.
 Many trace metals can be extracted from
soils with 1M ammonium chloride
or acetic acid solution.
Sample preparation

Eliminating Interferences
 Interferences are substances that prevent direct
measurement of the analyte and must be removed.
 Interference removal may include separation steps such as:
i) Precipitation
ii) Chromatography
iii) Distillation
iv) Dialysis
v) Extraction into an immiscible solvent
Sample preparation

Protein-free filtrates
 Proteins in biological fluids interfere with many
analyses and must be removed non-destructively.
 Several reagents will precipitate (coagulate) proteins.
 Trichloroacetic acid (TCA), tungstic acid (sodium
tungstate plus sulfuric acid), and barium hydroxide plus
zinc sulfate (a neutral mixture) are some of the
common ones.
 A measured volume of sample (e.g., serum) is usually
treated with a measured volume of reagent.
 Following precipitation of the protein (approximately 10
min), the sample is filtered through dry filter paper.
Sample preparation

Defining replicate samples


 Analyses of replicate samples are always performed unless the quantity of analyte, expense, or other factors prohibit it.
 Replicate samples are portions of a material of approximately the same size that are carried through an analytical procedure at the same time and in the same way.
 Obtaining replicate data on samples improves the
quality of the results and provides a measure of
their reliability.
Performing the measurement
 The property to be
measured must vary in
a known and
reproducible way with
the concentration of the
analyte.
 Calibration is the most
important step in most
analyses and standards
are used in this step.
Processing of data
 Computations on raw
experimental data
collected from the
measurement step are
done.
 Usually, the characteristics of the measurement, the stoichiometry of the analytical reaction, and other factors determine the computations needed.
Estimate the reliability of the
results
 The experimenter must provide some
measure of the uncertainties associated with
the computed results if the data are to have
any value.
 At least, the standard deviation of a
measurement must be presented (hence at
least three measurements of the same
quantity must be made).
 Statistical theories can, when necessary, be
used to estimate the reliability of the results.
STATISTICS AND
DATA HANDLING
IN ANALYTICAL
CHEMISTRY
Recall
 Significant figures
The minimum number of digits needed to write a
given value in scientific notation without loss of
accuracy.
Examples:
 9.25 × 10^4
 9.250 × 10^4
 9.2500 × 10^4
Significant figures in numerical
Computations

Determining the appropriate number of significant figures in the result of an arithmetic combination of two or more numbers requires great care.
Significant figures in numerical
computations
 Sums and Differences
The result should have the same number of decimal places as the number with the smallest number of decimal places.
3.4 + 0.020 + 7.31 = 10.730
= 10.7 (rounded to one decimal place)

 Products and Quotients


- Answer should be rounded so that it contains
the same number of significant digits as the
original number with the smallest number of
significant digits. Unfortunately, this procedure
sometimes leads to incorrect rounding.
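As a quick illustration of the product/quotient rule, here is a minimal Python sketch (the helper name round_sig is ours, not from the slides) that rounds a result to the number of significant figures of the least precise factor:

```python
import math

def round_sig(x, sig_figs):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sig_figs - int(exponent) - 1)

# Product rule: keep as many significant figures as the factor with the fewest.
value = 3.4 * 7.31          # 3.4 has 2 sig figs, 7.31 has 3
print(round_sig(value, 2))  # 24.854 rounds to 25 (2 significant figures)
```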
Significant figures in numerical computations
Logarithms and Antilogarithms
n = 10^a means that log n = a
In a logarithm of a number, keep as many digits to the right of the decimal point as there are significant figures in the original number.
log 339 = 2.530  (characteristic = 2, mantissa = 0.530)
The number of significant figures in the mantissa should equal the number of significant figures in the original number.
Significant figures in numerical computations
Logarithms and Antilogarithms
n = 10^a means that log n = a
In an antilogarithm of a number, keep as many digits as there are digits to the right of the decimal point in the original number.
antilog(−3.42) = 10^−3.42 = 3.8 × 10^−4  (2 digits)
The number of significant figures in the antilogarithm should equal the number of digits in the mantissa.
Propagation of uncertainty
 Absolute Uncertainty
Expresses the margin of uncertainty associated with a measurement.
For example, if a burette has an absolute uncertainty of ±0.02 mL and the reading is 30.25 mL, the true value could be anywhere in the range 30.23 to 30.27 mL.
Propagation of Uncertainty
 Relative Uncertainty
Compares the size of the absolute uncertainty with the size of its associated measurement.
Relative Uncertainty (RU) = Absolute uncertainty / Magnitude of measurement
The relative uncertainty of a burette reading of 30.25 ± 0.02 mL is
RU = 0.02 mL / 30.25 mL = 0.0007
%RU = RU × 100 = 0.0007 × 100 = 0.07%
Propagation of Uncertainty
Addition and Subtraction
  1.76 (±0.03)   e1
+ 1.89 (±0.02)   e2
− 0.59 (±0.02)   e3
  3.06 (±e4)
Uncertainty in addition and subtraction: e4 = √(e1² + e2² + e3²)
e4 = √(0.03² + 0.02² + 0.02²) = 0.041
%Relative uncertainty = (0.041 / 3.06) × 100 = 1.3%
Result: 3.06 (±0.04) (absolute uncertainty) or 3.06 (±1%) (relative uncertainty)
Propagation of Uncertainty
Multiplication and Division
- First convert all uncertainties to percent relative uncertainties. Then calculate the error of the product or quotient as follows:
Uncertainty in multiplication and division: %e4 = √(%e1² + %e2² + %e3²)
- Example: 1.76 (±0.03) × 1.89 (±0.02) / 0.59 (±0.02) = 5.64 (±e4)
%e4 = √(1.7² + 1.1² + 3.4²) = 4%
Result: 5.64 (±0.22) (absolute uncertainty) or 5.64 (±4%) (relative uncertainty)
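Both propagation rules are easy to script. A minimal Python sketch (function names are ours, not from the slides) reproduces the worked numbers above:

```python
import math

def add_sub_uncertainty(*abs_errors):
    """Propagated absolute uncertainty for addition/subtraction."""
    return math.sqrt(sum(e**2 for e in abs_errors))

def mul_div_uncertainty(*percent_errors):
    """Propagated percent relative uncertainty for multiplication/division."""
    return math.sqrt(sum(p**2 for p in percent_errors))

# Addition/subtraction example: 1.76(+/-0.03) + 1.89(+/-0.02) - 0.59(+/-0.02)
result = 1.76 + 1.89 - 0.59
e4 = add_sub_uncertainty(0.03, 0.02, 0.02)
print(f"{result:.2f} +/- {e4:.2f}")                # 3.06 +/- 0.04

# Multiplication/division example: 1.76(+/-0.03) * 1.89(+/-0.02) / 0.59(+/-0.02)
value = 1.76 * 1.89 / 0.59
pct = mul_div_uncertainty(0.03/1.76*100, 0.02/1.89*100, 0.02/0.59*100)
print(f"{value:.2f} +/- {pct:.0f}%  (+/-{value*pct/100:.2f} absolute)")  # 5.64 +/- 4%
```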
Sample problems
1. Calculate the molar concentration of 8.45 (±0.473%) mL of 0.2517 (±1.82%) g/mL ammonia solution that was diluted to 0.5000 (±0.0002) L.
(Ans. 0.250 (±0.005) M)
2. Consider the function pH = −log[H+], where [H+] is the molarity of H+. For pH = 5.21 ± 0.03, find [H+] and its uncertainty.
(Ans. 6.2 (±0.4) × 10^−6)
Errors in Chemical
Analysis
Errors in Chemical
Analysis
Errors can sometimes be calamitous, as this picture of
the famous train accident at Montparnasse station in
Paris illustrates. On October 22, 1895, a train from
Granville, France, crashed through the platform and
the station wall because the brakes failed. The engine
fell thirty feet into the street below killing a woman.
Fortunately, no one on the train was seriously hurt,
although the passengers were badly shaken. The
story of the train derailment was featured in the
children's story "The Invention of Hugo Cabret" by
Brian Selznick (2007) and as part of Hugo's nightmare in
the movie Hugo (2011), winner of 5 Academy Awards
in 2012.
Errors in chemical analyses are seldom this dramatic,
but they may have equally serious effects as
described in this chapter. Among other applications,
analytical results are often used in the diagnosis of
disease, in the assessment of hazardous wastes and
pollution, in the solving of major crimes, and in the
quality control of industrial products. Errors in these
results can have serious personal and societal effects.
This chapter considers the various types of errors
encountered in chemical analyses and the methods
we can use to detect them.
Errors in Chemical
Analysis
 Every measurement has some degree of
uncertainty.
 Measurement uncertainties can never be
completely eliminated, so the true value for any
quantity is generally unknown.
 The probable magnitude of the error in a
measurement can be evaluated, however.
It is then possible to define limits within which the true value of a measured quantity lies with a given level of probability.
Errors in Chemical Analysis

 The reliability of that result can be assessed in


different ways:
• Compared with a standard reference material
or certified reference material.
• Some measurements are available in the
literature.
• Statistical tests
 None of these options is perfect, so in the end the analyst must make a judgment about the probable accuracy of the result.
Errors in Chemical Analysis

Some important terms


 To improve the reliability and to obtain
information about the variability of results,
two to five portions (replicates) of a sample
are usually carried through an entire analytical
procedure.
Replicates are samples of about the same size that
are carried through an analysis in exactly the same
way
Some important terms

 Individual results from a set of measurements are seldom


the same, so we usually consider the “best” estimate to be
the central value for the set.
Results from six replicate
determinations of iron in
aqueous samples of a standard
solution containing 20.0 ppm
iron(III).

 Analysis of replicates can be justified in two ways:


• The central value of a set should be more reliable than
any of the individual results. The mean or the median is
used as the central value.
• Analysis of the variation in the data allows us to estimate the uncertainty associated with the central value.
Some important terms
The Mean
 The most widely used measure of central value.
 Also known as the arithmetic mean or the average
 It is obtained by dividing the sum of replicate
measurements by the number of measurements in the set
where xi represents the individual values
of x making up the set of N replicate
measurements.

The median
 Middle result when replicate data are arranged in increasing or decreasing order.
• For an odd number of results, the median is the middle value.
• For an even number of results, the median is the average of the middle pair.
Some important terms

Precision
 Precision describes the reproducibility of measurements
 In other words, the closeness of results that have been
obtained in exactly the same way.
 Generally, the precision of a measurement is readily
determined by simply repeating the measurement on
replicate samples.
 Three terms are widely used to describe the precision of a
set of replicate data: standard deviation, variance, and
coefficient of variation.
 These three are functions of how much an individual result
(xi) differs from the mean, called the deviation from the
mean di.
Some important terms
Standard deviation
 The standard deviation (s) describes the spread of individual values about their mean, and is given as
s = √( Σ(xi − x̄)² / (n − 1) )
where xi is one of the n individual values in the data set, and x̄ is the data set's mean value.
 Frequently, we report the relative standard deviation, sr, instead of the absolute standard deviation:
sr = s / x̄
 The percent relative standard deviation, %sr, is sr × 100.


Problem
 Report the standard deviation, the relative standard
deviation, and the percent relative standard deviation for
the data below.
Solution
 To calculate the mean we add together the results for all measurements
3.080+3.094+3.107+3.056+3.112 +3.174+3.198=21.821g
and divide by the number of measurements
21.821g
X= = 3.117 g
7
 To calculate the standard deviation we first calculate the difference between
each measurement and the data set’s mean value, square the resulting
differences, and add them together to find the numerator of standard deviation
equation
Solution
 Next, we divide this sum of squares (0.0156) by n − 1, where n is the number of measurements, and take the square root:
s = √(0.0156 / 6) = 0.051 g
 Finally, the relative standard deviation and percent relative standard deviation are
sr = 0.051 g / 3.117 g = 0.016 and %sr = 1.6%
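As a quick cross-check of these numbers, a short sketch using Python's built-in statistics module (the variable name masses is ours):

```python
import statistics

masses = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]  # g

mean = statistics.mean(masses)       # 3.117 g
s = statistics.stdev(masses)         # sample standard deviation (divides by n - 1)
rsd = s / mean
print(f"mean = {mean:.3f} g, s = {s:.3f} g, sr = {rsd:.3f}, %sr = {100*rsd:.1f}%")
# mean = 3.117 g, s = 0.051 g, sr = 0.016, %sr = 1.6%
```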
Some important terms
Accuracy
 Accuracy indicates the closeness of the measurement to the true
or accepted value and is expressed by the error.
 The difference between accuracy and precision can be illustrated
as follow.
 Note that accuracy measures agreement between a result and
the accepted value.
 Precision, on the other hand, describes the
agreement among
several results
obtained in the same way.
 We can determine precision just by
measuring replicate samples.
 Accuracy is often more difficult to determine because the true value is usually not known.
Some important terms
Accuracy
 Accuracy is expressed in terms of either absolute or relative
error.
Absolute Error
 The absolute error (E) of a measurement is the difference between the measured value (Xi) and the true value (Xt): E = Xi − Xt
 The sign of the absolute error tells you whether the value in question is high or low.
 A negative sign shows that the experimental result is smaller than the accepted value.
 A positive sign shows that the experimental result is larger than the accepted value.
Some important terms
Relative Error
 The relative error of a measurement is the absolute error divided by the true value:
Er = (Xi − Xt) / Xt × 100%
 Relative error may be expressed in percent, parts per thousand (ppt), or parts per million (ppm), depending on the magnitude of the result.
 For example, the relative error for the mean of the data in the Fe(III) example above (x̄ = 19.78 ppm, accepted value 20.00 ppm) is:
Er = (19.78 − 20.00) / 20.00 × 100% = −1.1%
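A two-line check of these formulas (using the 19.78 ppm mean quoted for this data set in a later figure caption and the 20.00 ppm accepted value):

```python
x_measured = 19.78   # ppm Fe(III), mean of the replicate results
x_true = 20.00       # ppm Fe(III), accepted value of the standard

absolute_error = x_measured - x_true            # E = Xi - Xt
relative_error = absolute_error / x_true * 100  # Er in percent

print(f"E = {absolute_error:.2f} ppm, Er = {relative_error:.1f}%")
# E = -0.22 ppm, Er = -1.1%
```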
Types of Errors in
Experimental Data
 Errors associated with the central
tendency reflect the accuracy of the
analysis.

 Errors associated with the spread


reflect the
precision of the analysis
Types of Errors in
Experimental Data
 Chemical analyses are affected by at least two types of errors.
1) Random (or indeterminate) error, causes data to be
scattered more or less symmetrically around a mean value.
Random, or indeterminate, errors affect measurement
precision.
2) -Systematic (or determinate) error, causes the mean of
a data set to differ from the accepted value. Systematic, or
determinate, errors affect the accuracy of results.
-In general, a systematic error in a series of replicate
measurements causes all the results to be too
high or too low.
-An example of a systematic error is the loss of a volatile
analyte while heating a sample.
3. -Gross error. Gross errors usually occur only occasionally,
are often large, and may cause a result to be either high or
low. They are often the product of human errors.
Systematic errors
 Systematic errors have a definite value, an
assignable cause, and are of the same magnitude
for replicate measurements made in the same way.
 They lead to bias in measurement results.
Sources of Systematic Errors
 Instrumental errors are caused by:
• Non-ideal instrument behaviour
• Faulty calibrations
• use under inappropriate conditions.
 Method errors arise from:
• Non-ideal chemical or physical behaviour of analytical
systems.
 Measurement errors
• Due to limitations in the equipment and instruments
used to make measurements e.g. analytical balance.
 Sampling errors
• When sampling strategy fails to provide a
representative sample e.g. soil sampling
(heterogeneous sample).
Detection of systematic
instrument and personal errors
 Most systematic instrument errors are found and
corrected by calibration.
 Periodic calibration of equipment is always
desirable because the response of most
instruments changes with time as a result of wear,
corrosion, or mistreatment.
 Many personal errors can be minimised by care
and self-discipline.
 It must be a good habit to check instrument
readings, notebook entries, and calculations
systematically.
 Errors due to limitations of the experimenter can
Detection of systematic method
errors
 Bias in an analytical method is particularly difficult
to detect.
 One or more of the following steps can be taken to recognize and adjust for a systematic error in an analytical method.
1. Analysis of standard reference materials (SRMs)
• SRMs are substances sold by the National Institute of Standards and Technology (NIST) and certified to contain specified concentrations of one or more analytes.
• Often, analysis using standard reference materials gives results that differ from the accepted value.
• The question becomes one of establishing whether such a difference is due to bias or to random error.
Detection of systematic
method errors
2. Independent analysis
• If standard samples are not available, a second
independent and reliable analytical method can be
used in parallel with the method being evaluated.
• The independent method should differ as much as
possible from the one under study.
• This practice minimizes the possibility that some
common factor in the sample has the same effect
on both methods.
• Again, a statistical test must be used to determine whether any difference is a result of random errors in the two methods or due to bias in the method under study.
Detection of systematic
method errors
3. Blank determination
 A blank contains the reagents and solvents used in
a determination, but no analyte.
 Often, many of the sample constituents are added
to simulate the analyte environment, which is
called the sample matrix.
 In the blank determination, all steps of the analysis
are performed on the blank material and the
results are used to correct the sample
measurements.
 Blank determinations reveal errors due to interfering contaminants from the reagents and vessels used in the analysis.
Exercises
1. Explain the difference between
(a) random and systematic error.
(b) constant and proportional error.
(c) absolute and relative error.
(d) mean and median.
2. Name three types of systematic errors.
3. Describe at least three systematic errors that might occur
while weighing a solid on an analytical balance.
4. Describe at least three ways in which a systematic error
might occur while using a pipet to transfer a known
volume of liquid.
5. How are systematic method errors detected?
Random errors in chemical
analysis
 As we saw previously, precision is a measure of the
spread of individual measurements or results about a
central value.
 That spread is expressed as a range, a standard deviation, or a variance.
 There are two types of precision: repeatability and
reproducibility.
 Repeatability is the precision when a single analyst
completes an analysis in a single session using the
same solutions, equipment, and instrumentation.
 Reproducibility, on the other hand, is the precision under any other set of conditions, including between analysts or between laboratory sessions for a single analyst.
Random errors in chemical
analysis
 Errors that affect precision are called random or
indeterminate and are characterized by random
variations in their magnitude and their direction.
 Random errors can never be totally eliminated and
are often the major source of uncertainty in a
determination.
 Usually, most contributors to random error cannot be
positively identified.
 Even if we can identify random error sources, it is
often impossible to measure them because most are
so small that they cannot be detected individually.
 The accumulated effect of the individual uncertainties, however, causes replicate results to fluctuate randomly around the mean of the set.
Random errors in chemical
analysis
 For example, the scatter of data in Figures below is
a direct result of the accumulation of small random
uncertainties.
Results from six
replicate determinations
of iron in aqueous
samples of a standard
solution containing 20.0
ppm iron(III). The mean
value of 19.78 has been
rounded to 19.8 ppm
Random Error Sources
 We can assign indeterminate errors to several
sources such as:
• Samples collection: When we collect a sample, for
instance, only a small portion of the available material is
taken, which increases the chance that small-scale
inhomogeneities in the sample will affect repeatability.
• Samples manipulation during the analysis: During an
analysis there are many opportunities to introduce
indeterminate method errors.
• Sample analysis: all measuring devices are subject to
indeterminate measurement errors due to limitations in
our ability to read its scale.
 Because they are random, positive and negative indeterminate errors tend to cancel, provided that enough measurements are made.
Random Error Sources
 For example: We can get a qualitative idea of the
way small undetectable uncertainties produce a
detectable random error in the following way.
 Imagine a situation in which just four small random
errors combine to give an overall error.
 We will assume that each error has an equal
probability of occurring and that each can cause
the final result to be high or low by a fixed amount
±U.
 The table below shows all the possible ways the
four errors can combine to give the indicated
deviations from the mean value.
Random Error Sources

 The negative errors have the


same relationship.
 This ratio of 1:4:6:4:1 is a
measure of the probability for a
deviation of each magnitude.

 If we make a sufficiently large number of


measurements, we can expect a frequency
distribution like that shown in Figure on the left.
 Such a plot is called a Gaussian curve or a normal
error curve.
 Note that the y-axis in the plot is the relative frequency of occurrence of the possible combinations.
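The 1:4:6:4:1 pattern can be verified by brute force; the sketch below simply enumerates every way four equal errors of ±U can combine:

```python
from collections import Counter
from itertools import product

U = 1  # each small error is +U or -U with equal probability

# Enumerate all 2**4 = 16 ways four errors of +/-U can combine.
totals = Counter(sum(signs) * U for signs in product((+1, -1), repeat=4))
for deviation in sorted(totals):
    print(f"deviation {deviation:+d}U occurs {totals[deviation]} time(s)")
# -4U: 1, -2U: 4, 0U: 6, +2U: 4, +4U: 1  -> the 1:4:6:4:1 ratio
```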
Distribution of Experimental
Results
 From experience with many determinations, it was
found that the distribution of replicate data from
most quantitative analytical experiments
approaches that of the Gaussian curve.
 A Gaussian, or normal error curve, is a curve that shows the symmetrical distribution of data around the mean of an infinite set of data.
(Figure: Gaussian probability distribution)
• Shows that data are scattered more or less symmetrically around the mean.
Statistical treatment of random
errors
 We can use statistical methods to evaluate the
random errors discussed in the preceding section.
 Statistical methods, do allow us to categorize and
characterize
data in different ways and to make objective and
intelligent
decisions about data quality and interpretation.
 Generally, we base statistical analyses on the
assumption that random errors in analytical results
follow a Gaussian, or normal, distribution such as
that illustrated above.
 The approximation becomes better in the limit of a large number of measurements.
Statistical treatment of
random errors
Samples and Populations
 Typically in a scientific study, we infer information
about a population or universe from
observations made on a subset or sample.
 In some cases,
the population is
finite and real,

while in others,
the
population
is hypothetical
Statistical treatment of
random errors
Exercises
Statistical treatment of
random errors
 Statistical laws have been derived for populations,
but they can be used for samples after suitable
modification.
 Such modifications are needed for small samples
because a few data points may not represent the
entire population.
 Do not confuse the statistical sample with the
analytical sample.
 Consider four water samples taken from the same
water supply and analysed in the laboratory for
calcium.
 The four analytical samples give four results that together constitute a statistical sample drawn from the population of all such possible measurements.
Statistical treatment of
random errors
Properties of Gaussian Curves
The Population Mean μ and the Sample Mean x̄
 The sample mean x̄ is found from
x̄ = Σxi / N
 where N is the number of measurements in the sample set.
 The same equation is used to calculate the population mean μ.
Statistical treatment of
random errors
Properties of Gaussian Curves
The Population Standard Deviation σ and the Sample Standard Deviation s
 The population standard deviation σ, which is a measure of the precision of the population, is given by
σ = √( Σ(xi − μ)² / N )
where N is the number of data points making up the population.
 This equation must be modified when it is applied to a small sample of data. Thus, the sample standard deviation s is given by
s = √( Σ(xi − x̄)² / (N − 1) )
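As a side note, Python's statistics module implements both formulas, which gives a quick way to see the N versus N − 1 difference (the data values below are illustrative, not from the slides):

```python
import statistics

data = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # illustrative replicate results, ppm

print(statistics.pstdev(data))  # population formula, divides by N
print(statistics.stdev(data))   # sample formula, divides by N - 1
```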
Statistical treatment of
random errors
An Alternative Expression for Sample Standard
Deviation
 To find s with a calculator that does not have a standard deviation key, the following rearrangement is easier to use than directly applying the equation above:
s = √( [ Σxi² − (Σxi)² / N ] / (N − 1) )
Problem
Statistical treatment of
random errors
How to improve the reliability of s?
 Increase N: as N → ∞, x̄ → μ and s → σ, so s becomes a more reliable estimate of σ.
 Pooling data also improves the reliability of s:
s_pooled = √( [ Σ(xi − x̄1)² + Σ(xj − x̄2)² + … ] / (N1 + N2 + … − Nt) )
 where N1 is the number of results in set 1, N2 is the number in set 2, and so forth. The term Nt is the total number of data sets pooled.
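A minimal sketch of pooling, following the formula above (sums of squared deviations from each set's own mean, divided by the total number of values minus the number of sets; the data are illustrative):

```python
import math
import statistics

def pooled_std(*data_sets):
    """Pooled standard deviation of several data sets measured the same way."""
    total_ss = sum(sum((x - statistics.mean(ds))**2 for x in ds) for ds in data_sets)
    total_n = sum(len(ds) for ds in data_sets)
    return math.sqrt(total_ss / (total_n - len(data_sets)))

set1 = [3.080, 3.094, 3.107, 3.056]   # illustrative replicate results
set2 = [3.112, 3.174, 3.198]
print(f"pooled s = {pooled_std(set1, set2):.3f}")
```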
Problem
Variance and Other Measures of
Precision
Variance (s²)
 The variance is just the square of the standard deviation.
 The sample variance s² is an estimate of the population variance σ² and is given by:
s² = Σ(xi − x̄)² / (N − 1)
 Note that the standard deviation has the same units as the data, while the variance has the units of the data squared.
 Scientists tend to use standard deviation rather than variance because it is easier to relate a measurement and its precision when both have the same units.
Variance and Other Measures of
Precision
Relative Standard Deviation (RSD) and Coefficient of
Variation (CV)
 RSD is calculated by dividing the standard deviation by the mean value of the data set:
RSD = s / x̄
 The result is often expressed in parts per thousand (ppt) or in percent by multiplying this ratio by 1000 ppt or by 100%.
Variance and Other Measures of
Precision
 The relative standard deviation multiplied by 100%
is called the coefficient of variation (CV).

 Relative standard deviations often give a clearer


picture of data quality than do absolute standard
deviations.
 As an example, suppose that a copper
determination has a standard deviation of 2 mg. If
the sample has a mean value of 50 mg of copper,
 The CV for this sample is 4% (2/50 x100%).
 For a sample containing only 10 mg, the CV is 20%.
Variance and Other Measures of
Precision
Spread or Range (w)
 The spread, or range, w, is another term that is
sometimes used to describe the precision of a set
of replicate results.
 It is the difference between the largest value in the
set and the smallest.
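The precision measures just listed are one-liners in Python; a compact sketch with illustrative data:

```python
import statistics

data = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # illustrative results, ppm

mean = statistics.mean(data)
s = statistics.stdev(data)

variance = s**2                    # units of the data, squared
cv = s / mean * 100                # coefficient of variation, %
spread = max(data) - min(data)     # range, w

print(f"s = {s:.2f}, s^2 = {variance:.3f}, CV = {cv:.1f}%, w = {spread:.1f}")
```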
Statistical Data
Treatment and
Evaluation
Introduction
 The consequences of making errors in statistical tests
are often compared with the consequences of errors
made in judicial procedures.
 In the jury room, we can make two types of errors. An
innocent person can be convicted, or a guilty person
can be set free.
 In our justice system, we consider it a more serious
error to convict an innocent person than to acquit a
guilty person.
 Similarly, in statistical tests to determine whether two
quantities are the same, two types of errors can be
made.
 A type I error occurs when we reject a hypothesis that is actually true; a type II error occurs when we accept a hypothesis that is actually false.
Introduction
 Scientists use statistical data analysis to evaluate the
quality of experimental measurements, to test various
hypotheses, and to develop models to describe
experimental results.
 In this chapter, we consider several of the most common
applications of statistical data treatment. These include
1) Defining confidence interval
2) Determining the number of replicate measurements required to ensure
that an experimental mean falls within a certain range with a given level
of probability.
3) Estimating the probability that (a) an experimental mean and a true value
or (b) two experimental means are different, that is, whether the
difference is real or simply the result of random error.
4) Determining at a given probability level whether the precision of two sets
of measurements differs.
5) Comparing the means of more than two samples to determine whether differences in the means are real or the result of random error (analysis of variance).
Confidence intervals
Confidence intervals
 For example, we might say that it is 99% probable
that the true population mean for a set of
potassium measurements lies in the interval 7.25
± 0.15% K.
 Thus, the probability that the mean lies in the
interval from 7.10 to 7.40% K is 99%.
 The size of the confidence interval, which is computed from the sample standard deviation, depends on how well the sample standard deviation s estimates the population standard deviation σ.
 If s is a good estimate of σ, the confidence interval can be significantly narrower than if s is based on only a few measurements.
Finding the confidence interval when σ is known or s is a good estimate of σ
 If we select at random a single member from a population, what is its most likely value?
 This can be obtained from the formula below:
x = μ ± zσ
 where the value of z reflects how confident we are in assigning this range.
 Values reported in this fashion are called
confidence intervals.
Finding the confidence interval when σ is known or s is a good estimate of σ
 Table below gives the confidence intervals for
several values of z.
 However, a 95% confidence level is a common
choice in analytical chemistry.
Problem
Finding the confidence interval when σ is known or s is a good estimate of σ
 Alternatively, we can rewrite the above equation so that it gives the confidence interval for μ based on the population's standard deviation and the value of a single member drawn from the population:
μ = x ± zσ
 Note the qualification that the prediction for μ is based on one sample; a different sample likely will give a different 95% confidence interval.
 Our result here, therefore, is an estimate for μ based on this one sample.
Problem
Finding the confidence interval when σ is known or s is a good estimate of σ
 It is unusual to predict the population's expected mean from the analysis of a single sample.
 Instead, we collect n samples drawn from a population of known σ, and report the mean, x̄.
 The standard deviation of the mean, σx̄, which is also known as the standard error of the mean, is:
σx̄ = σ / √n
 The confidence interval for the population's mean, therefore, is:
μ = x̄ ± zσ / √n
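A minimal sketch of the z-based confidence interval for a mean; the data and σ are illustrative, and the critical z (1.96 for 95%) is hard-coded rather than looked up from the table above:

```python
import math
import statistics

data = [7.22, 7.28, 7.25, 7.24, 7.26]  # illustrative %K results
sigma = 0.02                            # assumed known population standard deviation, %K
z = 1.96                                # z for a 95% confidence level

x_bar = statistics.mean(data)
std_error = sigma / math.sqrt(len(data))   # standard error of the mean
half_width = z * std_error

print(f"95% CI: {x_bar:.3f} +/- {half_width:.3f} %K")
```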
Problem
Finding the confidence interval when σ is unknown
 Often, limitations in time or in the amount of available sample prevent us from making enough measurements to assume that s is a good estimate of σ.
 In such a case, a single set of replicate measurements must provide not only a mean but also an estimate of precision.
 As indicated earlier, s calculated from a small set of data may be quite uncertain.
 Thus, confidence intervals are necessarily broader when we must use a small-sample value of s as our estimate of σ.
 To account for the variability of s, we use the important statistical parameter t (Student's t), which is defined in exactly the same way as z.
 For a single measurement with result x, we can define t as:
t = (x − μ) / s
Finding the confidence interval when σ is unknown
 Like z in the previous section, t depends on the
desired confidence level.
 However, t also depends on the number of degrees
of freedom in the calculation of s.
Finding the confidence interval when σ is unknown
 For the mean of N measurements:
t = (x̄ − μ) / (s / √N)
 The confidence interval for the mean x̄ of N replicate measurements can be calculated from t by the equation below, which is similar to the previous equation using z:
μ = x̄ ± t s / √N
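A sketch of the t-based confidence interval, reusing the seven replicate masses from the earlier standard-deviation example; SciPy is assumed only for the critical t value, which could equally be read from the t table above:

```python
import math
import statistics
from scipy import stats  # assumed available; otherwise read t from the table

data = [3.080, 3.094, 3.107, 3.056, 3.112, 3.174, 3.198]  # g

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)
t_crit = stats.t.ppf(0.975, df=n - 1)   # two-sided 95% confidence level

half_width = t_crit * s / math.sqrt(n)
print(f"95% CI: {x_bar:.3f} +/- {half_width:.3f} g")
```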
Problem
Tests of Significance-Is There a
Difference?
 In developing a new analytical method, it is often
desirable to compare the results of that method
with those of an accepted (perhaps standard)
method.
 How, though, can one tell if there is a significant
difference between the new method and the
accepted one?
 Again, we resort to statistics for the answer.
 Deciding whether one set of results is significantly
different from another depends not only on the
difference in the means but also on the amount of
data available and the spread.
Constructing a Significance
Test
 The four steps for a statistical analysis of data
using a significance test:
1. Pose a question, and state the null hypothesis,
Ho, and the alternative hypothesis, HA.
2. Choose a confidence level for the statistical
analysis.
3. Calculate an appropriate test statistic and
compare it to a critical value.
4. Either retain the null hypothesis, or reject it
and accept the alternative hypothesis.
Constructing a Significance
Test
Null hypothesis and alternative hypothesis
 The first step in constructing a significance test is to
state the problem as a yes or no question, such as “Is
this medication effective at lowering a patient’s blood
glucose levels?”
 A null hypothesis and an alternative hypothesis define
the two possible answers to our yes or no question.
 The null hypothesis, Ho , is that indeterminate errors
are sufficient to explain any differences between our
results.
 The alternative hypothesis, HA, is that the differences in our results are too great to be explained by random error and that they must be determinate in nature.
Statistical aids to hypothesis
testing
 Specific examples of hypothesis tests that
scientists often use include the comparison of
1. The mean of an experimental data set with
what is believed to be the true value.
2. The mean to a predicted or cut-off (threshold)
value.
3. The means or the standard deviations from
two or more sets of data.
Comparing an Experimental Mean (x̄) with a Known Value (μ0)
 There are many cases in which a scientist or
engineer needs to compare the mean of a data set
with a known value.
 In some cases, the known value is the true or
accepted value based on prior knowledge or
experience.
 There are two contradictory outcomes that we
consider in any hypothesis test.
1. The null hypothesis, H0, states that μ = μ0.
2. The alternative hypothesis, Ha, can be stated in several ways.
 We might reject the null hypothesis in favour of Ha if the experimental mean is different from, greater than, or less than μ0.
Comparing an Experimental Mean (x̄) with a Known Value (μ0)
For examples:
Comparing an Experimental Mean (x̄) with a Known Value (μ0)
 The null hypothesis is rejected if the test
statistic lies within the rejection region.
 For tests concerning one or two means, the
test statistic might be the z statistic if we
have a large number of measurements or if
we know s.
 Quite often, however, we use the t statistic for
small numbers of measurements with
unknown s.
 When in doubt, the t statistic should be used.
Large Sample z Test
 For a large number of measurements (or when σ is known), the test statistic is
z = (x̄ − μ0) / (σ / √N)
Large Sample z Test
 The rejection regions are illustrated below for the 95% confidence level.
 Note that for Ha: μ ≠ μ0, we can reject for either a positive value of z or a negative value of z that exceeds the critical value.
 This is called a two-tailed test, since rejection can occur for results in either tail of the distribution.
Large Sample z Test

 If instead our alternative hypothesis is Ha: μ > μ0, the test is said to be a one-tailed test.
 In this case, we can reject only when z ≥ zcrit.
Large Sample z Test

 Similarly, if the alternative hypothesis is μ < μ0, we can reject only when z ≤ −zcrit.
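A minimal sketch of the two-tailed z test; all numbers, including σ, are illustrative, and zcrit = 1.96 for the 95% level is taken as given:

```python
import math

x_bar = 19.78      # experimental mean (illustrative, ppm)
mu0 = 20.00        # accepted value, ppm
sigma = 0.10       # known (or well-estimated) population standard deviation, ppm
n = 6              # number of measurements
z_crit = 1.96      # two-tailed critical value at the 95% confidence level

z = (x_bar - mu0) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")
if abs(z) > z_crit:
    print("Reject H0: the mean differs significantly from mu0.")
else:
    print("Retain H0: no significant difference detected.")
```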
Problem
Small Sample t Test
 For a small number of measurements with unknown σ, the test statistic is t = (x̄ − μ0) / (s / √N), which is compared with the critical value of t for N − 1 degrees of freedom.
Comparison of two experimental
means
 Three factors influence the result of an analysis:
the method, the sample, and the analyst.
 We can study the influence of these factors by
conducting experiments in which we change one
factor while holding constant the other factors.
 For example, to compare two analytical methods
we can have the same analyst apply each method
to the same sample and then examine the
resulting means.
 In a similar fashion, we can design experiments to
compare two analysts or to compare two samples.
Comparison of two experimental
means
Unpaired data and paired data
 Before we consider the significance tests for
comparing the means of two samples, we need to
make a distinction between unpaired data and paired
data.
 This is a critical distinction and learning to
distinguish between
these two types of data is important.
 Here are two simple examples that highlight the
difference between unpaired data and paired data.
 In each example the goal is to compare two
balances by weighing coins.
Comparison of two
experimental means
Unpaired data and paired data
 Example 1: We collect 10 coins and weigh each coin on each balance. This is an example of paired data because we use the same 10 coins to evaluate each balance.
 Example 2: We collect 10 coins and divide them into two groups of five coins each. We weigh the coins in the first group on one balance and we weigh the second group of coins on the other balance.
 Note that no coin is weighed on both balances. This is an example of unpaired data.
The t test for differences in
means
Unpaired data
 Let us assume that N1 replicate analyses by analyst 1 yielded a mean value of x̄1 and that N2 analyses by analyst 2, obtained by the same method, gave x̄2.
 The null hypothesis: H0: μ1 = μ2.
 Most often when testing differences in means, the alternative hypothesis is Ha: μ1 ≠ μ2, and the test is a two-tailed test.
 If the data were collected in the same manner, it is often safe to assume that the standard deviations of both data sets are similar.
The t test for differences in
means
Unpaired data
 The test statistic t:
t = (x̄1 − x̄2) / ( s_pooled √(1/N1 + 1/N2) )
 The test statistic is then compared with the critical value of t obtained from the table for the particular confidence level desired.
 The number of degrees of freedom for finding the critical value of t in the table is N1 + N2 − 2.
 Reject H0 if t > tcrit.
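A sketch of the unpaired t test, computing the pooled standard deviation and the test statistic as defined above; the data are illustrative and SciPy is assumed only for the critical value (a t table works equally well):

```python
import math
import statistics
from scipy import stats  # assumed available for t_crit

set1 = [12.1, 12.4, 12.3, 12.2]   # illustrative results, analyst 1
set2 = [12.5, 12.6, 12.4, 12.7]   # illustrative results, analyst 2

n1, n2 = len(set1), len(set2)
x1, x2 = statistics.mean(set1), statistics.mean(set2)
ss1 = sum((x - x1)**2 for x in set1)
ss2 = sum((x - x2)**2 for x in set2)
s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))

t = abs(x1 - x2) / (s_pooled * math.sqrt(1/n1 + 1/n2))
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)   # two-tailed, 95% confidence level

print(f"t = {t:.2f}, t_crit = {t_crit:.2f}")
print("Reject H0" if t > t_crit else "Retain H0")
```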
The t test for differences in
means
Problem
The t test for differences in
means
Unpaired data
 In the example above, no significant difference in the alcohol content of the two wines was detected at the 95% probability level.
 This statement is equivalent to saying that μ1 is equal to μ2 with a certain degree of confidence.
 However, the tests do not prove that the wine in the glass
came from the same bottle.
 To establish with a reasonable probability that the two wines
are identical would require extensive testing of other
characteristics, such as taste, color, odor, etc…
 If no significant differences are revealed by all these tests and
by others, then it might be possible to judge the glass of wine
as originating in the open bottle.
The t test for differences in
means
Paired Data
Scientists and engineers often make use of pairs of
measurements on the same sample in order to
minimize sources of variability that are not of
interest.
 For example, two methods for determining glucose
in blood serum are to be compared. Method A
could be performed on samples from five randomly
chosen patients and Method B could
be performed on samples from five different
patients.
 However, there would be variability because of the
different glucose levels of each patient.
The t test for differences in
means
Paired Data
 The paired t test uses the same type of procedure as the normal t test, except that we analyze pairs of data and compute the differences, di.
 The standard deviation is now the standard deviation of the mean difference.
 Our null hypothesis is H0: μd = Δ0, where Δ0 is a specific value of the difference to be tested, often zero.
 The alternative hypothesis could be μd ≠ Δ0, μd > Δ0, or μd < Δ0.
 The test statistic value is
t = (d̄ − Δ0) / (sd / √N)
 where d̄ is the average difference = Σdi / N and sd is the standard deviation of the differences.
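A sketch of the paired t test with Δ0 = 0; the glucose values are illustrative, and SciPy is assumed only for the critical value:

```python
import math
import statistics
from scipy import stats  # assumed available for t_crit

method_a = [105, 98, 112, 120, 90]   # illustrative glucose results, mg/dL
method_b = [102, 97, 110, 118, 91]   # same five patients, second method

diffs = [a - b for a, b in zip(method_a, method_b)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)

t = abs(d_bar - 0) / (s_d / math.sqrt(n))     # H0: mean difference = 0
t_crit = stats.t.ppf(0.975, df=n - 1)

print(f"d_bar = {d_bar:.2f}, t = {t:.2f}, t_crit = {t_crit:.2f}")
print("Reject H0" if t > t_crit else "Retain H0")
```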
Problem
Solution
Comparison of Variances (or
standard deviations)
F test
 At times, there is a need to compare the variances
(or standard deviations) of two data sets.
 For example, the normal t test requires that the
standard deviations of the data sets being
compared are equal.
 A simple statistical test, called the F test, can be
used to test this assumption under the provision
that the populations follow the normal (Gaussian)
distribution.
 The F test is based on the null hypothesis that the two population variances under consideration are equal, H0: σ1² = σ2².
Comparison of Variances (or
standard deviations)
F test
 The test statistic F:
F = s1² / s2²   (s² = sample variance)
 F is compared with the critical value of F at the desired significance level.
 The null hypothesis is rejected if the test statistic F differs too much from unity.
 Critical values of F at the 95% significance level are shown in the table below.
 Note that two degrees of freedom are given, one associated with the numerator and the other with the denominator.
Comparison of Variances (or
standard deviations)
F test
Comparison of Variances (or
standard deviations)
F test
 The F test can be used in either a one-tailed mode or in a two-tailed mode.
 One-tailed mode
Ha: σ1² > σ2²  or  Ha: σ1² < σ2²   (σ = population standard deviation)
The variance of the supposedly more precise procedure is placed in the denominator and that of the less precise procedure is placed in the numerator.
 Two-tailed mode
Ha: σ1² ≠ σ2²
For this application, the larger variance always appears in the numerator.
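A sketch of the one-tailed F test described above; the data are illustrative, and SciPy is assumed only for the critical value (the table above can be used instead):

```python
import statistics
from scipy import stats  # assumed available for the critical value

new_method = [20.1, 20.3, 19.9, 20.2, 20.0]   # illustrative results (supposedly more precise)
old_method = [20.5, 19.6, 20.4, 19.7, 20.3]   # illustrative results

# One-tailed mode, Ha: variance(old) > variance(new):
# the supposedly more precise procedure goes in the denominator.
F = statistics.variance(old_method) / statistics.variance(new_method)
df_num, df_den = len(old_method) - 1, len(new_method) - 1

F_crit = stats.f.ppf(0.95, df_num, df_den)    # 95% significance level, one-tailed
print(f"F = {F:.2f}, F_crit = {F_crit:.2f}")
print("Old method significantly less precise" if F > F_crit else "No significant difference")
```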

Problem
Problem
Rejection of Outliers
 There are times when a set of data contains an
outlying result that appears to be outside the
range that the random errors in the procedure
would give.
 It is generally considered inappropriate and in
some cases unethical to discard data without a
reason.
 However, the questionable result, called an outlier,
could be the result of an undetected gross error.
 Hence, it is important to develop a criterion to
decide whether to retain or reject the outlying
data point.
Rejection of Outliers
The Q Test
 The absolute value of the difference between the questionable result xq and its nearest neighbour xn is divided by the spread (w) of the entire set to give the quantity Qexp:
Qexp = |xq − xn| / w
 Qexp is compared with Qcrit found in the table below, and if Qexp > Qcrit the result can be rejected with the indicated degree of confidence.
Rejection of Outliers

E.g.:
Problem
Rejection of Outliers
Tn test
 The absolute value of the difference between the questionable result (xq) and the mean (x̄) is divided by the standard deviation (s) to give Tn:
Tn = |xq − x̄| / s
 If Tn > the critical value (table below), the result is rejected with the indicated degree of confidence.
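Both outlier tests are easy to code. The sketch below computes Qexp and Tn for the most extreme value of the copper data from the problem that follows; the critical values must still be taken from the tables (the 0.71 quoted for n = 5 at 95% is our assumption, not from the slides):

```python
import statistics

data = [16.54, 16.64, 16.30, 16.67, 16.70]   # %Cu, from the problem below
suspect = min(data)                           # 16.30 looks like the outlier

# Q test: |suspect - nearest neighbour| / spread
ordered = sorted(data)
neighbour = ordered[1] if suspect == ordered[0] else ordered[-2]
q_exp = abs(suspect - neighbour) / (ordered[-1] - ordered[0])

# Tn test: |suspect - mean| / standard deviation
t_n = abs(suspect - statistics.mean(data)) / statistics.stdev(data)

print(f"Qexp = {q_exp:.2f}  (compare with Qcrit for n = 5, e.g. about 0.71 at 95%)")
print(f"Tn   = {t_n:.2f}  (compare with the tabulated critical value for n = 5)")
```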
Rejection of Outliers
Problem

A new method for the analysis of copper was tested


with a sample known to contain 16.68% Cu. A total of
5 replicate measurements were carried out, giving
the following results (%): 16.54, 16.64, 16.30, 16.67,
16.70.
(a)Evaluate the mean and the median
percentages of copper for these data.
(b)Apply the Q test and the Tn test (95%
confidence level) to the outlying result.
(c) Which value, the mean or the median, do you prefer as the best value for this analysis? Defend your answer.
Correlation and regression
analysis
 Correlation analysis refers to a broad class of
relationships in statistics that involve dependence.
 Dependence in statistics means a relationship
between two sets of data or two random variables.
 The purpose of correlation analysis is to find how
strong the relationship is between the two
variables.
 In order to study and develop an equation that expresses the relationship between two variables, e.g. Y and X, we need to estimate the value of the dependent variable Y based on a selected value of the independent variable X.
Correlation and regression
analysis
 The technique of developing the equation is a
regression analysis.
 The relationship equation is the regression
equation.
 The equation of a straight line is the simplest relationship from which to make such an estimate.
 Because of the indeterminate error in instrument
reading, some points may not lie exactly on the
straight line given by the regression equation:
y = ax + b
Correlation and regression analysis
 The values of b (the intercept) and a (the gradient) in the regression equation are called the regression coefficients and are given by:
a = [ n(Σxy) − (Σx)(Σy) ] / [ n(Σx²) − (Σx)² ]
b = Σy/n − a(Σx/n)
 Where: X = value of the independent variable
Y = value of the dependent variable
n = number of items in the sample
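A minimal sketch of the least-squares coefficients computed exactly as in the formulas above (the x and y values are illustrative, e.g. a calibration of signal versus concentration):

```python
x = [0.0, 1.0, 2.0, 3.0, 4.0]        # illustrative independent variable (e.g. concentration)
y = [0.05, 2.18, 4.39, 6.51, 8.65]   # illustrative dependent variable (e.g. signal)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi**2 for xi in x)

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)   # gradient
b = sum_y / n - a * sum_x / n                                # intercept

print(f"y = {a:.3f} x + {b:.3f}")
```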
Correlation and regression
analysis
 If for instance a has been calculated to be equal to
2.14 and b = - 1.76, the regression equation would
be: y = 2.14 x – 1.76.
 This mathematical method of determining the
regression equation in order to draw the regression
line is called the least squares principle.
 The regression line is the best-fitting straight line. It
gives the minimum sum of squares of the vertical
deviations from the line.
 Regression analysis involves the identification of the
relationship that exist between a dependent variable
and either one or more independent variables.

How is the regression line
drawn?
 First the raw data are drawn on scatter diagram.
Each dot gives a pair of X and Y values.
 Then the a and b values are calculated and the regression equation formulated as above.
 Using any two pairs of X and Y, i.e. two points are
marked on the scatter diagram.
 The two points are then connected by a straight
line graph.
Coefficient of correlation
 In correlation analysis, we estimate a sample
correlation coefficient, more specifically the Pearson
Product Moment correlation coefficient to test if there
is a linear relationship between the variables.
 To quantify the strength of the relationship, we can calculate the correlation coefficient (r) as follows:
r = Σ(xi − x̄)(yi − ȳ) / [ (n − 1) sx sy ]
 where r is the correlation coefficient, n is the number of observations, sx is the standard deviation of x, sy is the standard deviation of y, xi and yi are the individual values of the variables x and y, respectively, and x̄ and ȳ are their means.
 The numerical value of r ranges from +1.0 to −1.0.
 r > 0 indicates a positive linear relationship; r < 0 indicates a negative linear relationship.
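The same calculation in code, following the definition above; the data are illustrative, and the final comment notes a standard-library cross-check available in newer Python versions:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # illustrative data
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
print(f"r = {r:.4f}")
# On Python 3.10+, statistics.correlation(x, y) gives the same value.
```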
Coefficient of correlation
 The sign of the correlation coefficient indicates the
direction of the association. The magnitude of the
correlation coefficient indicates the strength of the
association.
 For example, a correlation of r = 0.9 suggests a
strong, positive association between two variables,
whereas a correlation of r = -0.2 suggest a weak,
negative association.
 A correlation close to zero suggests no linear
association between two continuous variables.
 It is important to note that there may be a non-linear association between two continuous variables, but computation of a correlation coefficient does not detect it.
Coefficient of correlation
 Graphical displays are particularly useful to explore
associations between variables.
 The figure below shows four hypothetical scenarios
in which one continuous variable is plotted along
the X-axis and the other along the Y-axis.
Coefficient of correlation
 From the plots above, one can see that the more closely the points gather around a straight line, the higher the correlation and the stronger the linear relationship between the two variables.
 If the points are randomly scattered, there is no relationship between the two variables; that is, the correlation is very low or zero.
 Zero or very low correlation can also result from a non-linear relationship existing between the variables.
End
