Educational Statistics
Graduate School
Tacloban City
COMPREHENSIVE EXAMINATION
in
EDUCATIONAL STATISTICS
4. Discuss the types of data presentation and cite examples to substantiate your
answer.
7. Discuss the following basic measurements in statistics and give the formula for
each.
Answers:
Statistics are defined as numerical data; statistics is also the field of mathematics
that deals with the collection, tabulation, and interpretation of numerical data. An
example of statistics is a report of numbers saying how many followers of each
religion there are in a particular country.
The science of statistics deals with the collection, analysis, interpretation, and
presentation of data. We see and use data in our everyday lives. In statistics, we
generally want to study a population. You can think of a population as a collection
of persons, things, or objects under study. To study the population, we select a
sample. The idea of sampling is to select a portion (or subset) of the larger
population and study that portion (the sample) to gain information about the
population. Data are the result of sampling from a population.
Example:
If you wished to compute the overall grade point average at your school, it would
make sense to select a sample of students who attend the school. The data collected
from the sample would be the students' grade point averages.
Population and sample: the population is all students enrolled at the school; the
sample is the group of students actually selected.
In presidential elections, opinion poll samples of 1,000–2,000 people are taken.
The opinion poll is supposed to represent the views of the people in the entire
country.
Population and sample: the population is all eligible voters in the country; the
sample is the 1,000–2,000 people who are polled.
The City of Houston wants to know if the annual household income in the city is
higher than the national average. The statisticians collect data from 1,500 families.
Population and sample: the population is all households in the City of Houston; the
sample is the 1,500 families surveyed.
An automobile manufacturer wanted to know if more than 50% of US drivers own
at least one domestic car. The company surveyed 10,000 drivers across the US.
Population and sample: the population is all US drivers; the sample is the 10,000
drivers who were surveyed.
From the sample data, we can calculate a statistic. A statistic is a number that
represents a property of the sample. For example, if we consider one math class to
be a sample of the population of all math classes, then the average number of
points earned by students in that one math class at the end of the term is an
example of a statistic. The statistic is an estimate of a population parameter. A
parameter is a number that is a property of the population. Since we considered all
math classes to be the population, then the average number of points earned per
student over all the math classes is an example of a parameter.
Answer:
Continuity, in mathematics, rigorous formulation of the intuitive concept of
a function that varies with no abrupt breaks or jumps. A function is a relationship
in which every value of an independent variable—say x—is associated with a
value of a dependent variable—say y. Continuity of a function is sometimes
expressed by saying that if the x-values are close together, then the y-values of the
function will also be close. But if the question “How close?” is asked, difficulties
arise.
For close x-values, the distance between the y-values can be large even if the
function has no sudden jumps. For example, if y = 1,000x, then two values of x that
differ by 0.01 will have corresponding y-values differing by 10. On the other hand,
for any point x, points can be selected close enough to it so that the y-values of this
function will be as close as desired, simply by choosing the x-values to be closer
than 0.001 times the desired closeness of the y-values.
Thus, continuity is defined precisely by saying that a function f(x) is continuous at
a point x0 of its domain if and only if, for any degree of closeness ε desired for
the y-values, there is a distance δ for the x-values (in the above example equal to
0.001ε) such that for any x of the domain within the distance δ from x0, f(x) will be
within the distance ε from f(x0). In contrast, the function that equals 0 for x less
than or equal to 1 and that equals 2 for x larger than 1 is not continuous at the
point x = 1, because the difference between the value of the function at 1 and at
any point ever so slightly greater than 1 is never less than 2.
A function is said to be continuous if and only if it is continuous at every point of
its domain. A function is said to be continuous on an interval, or subset of its
domain, if and only if it is continuous at each point of the interval. The sum,
difference, and product of continuous functions with the same domain are also
continuous, as is the quotient, except at points at which the denominator is zero.
Continuity can also be defined in terms of limits by saying that f(x) is continuous
at x0 of its domain if and only if, for values of x in its domain, the limit of f(x) as x
approaches x0 exists and equals f(x0); in symbols, lim x→x0 f(x) = f(x0).
A more abstract definition of continuity can be given in terms of sets, as is done
in topology, by saying that for any open set of y-values, the corresponding set of x-
values is also open. (A set is “open” if each of its elements has a “neighbourhood,”
or region enclosing it, that lies entirely within the set.) Continuous functions are
the most basic and widely studied class of functions in mathematical analysis, as
well as the most commonly occurring ones in physical situations.
Answers:
What is Measurement? Normally, when one hears the term measurement, they may
think in terms of measuring the length of something (e.g., the length of a piece of
wood) or measuring a quantity of something (e.g., a cup of flour). This represents a
limited use of the term measurement. In statistics, the term measurement is used
more broadly and is more appropriately termed scales of measurement. Scales of
measurement refer to ways in which variables/numbers are defined and
categorized. Each scale of measurement has certain properties which in turn
determines the appropriateness for use of certain statistical analyses. The four
scales of measurement are nominal, ordinal, interval, and ratio. Nominal:
Categorical data and numbers that are simply used as identifiers or names represent
a nominal scale of measurement. Numbers on the back of a baseball jersey (St.
Louis Cardinals 1 = Ozzie Smith) and your social security number are examples of
nominal data. If I conduct a study and I'm including gender as a variable, I will
code Female as 1 and Male as 2, or vice versa, when I enter my data into the
computer. Thus, I am using the numbers 1 and 2 to represent categories of data.
Ordinal: An ordinal scale of measurement represents an ordered series of
relationships or rank order. Individuals competing in a contest may be fortunate to
achieve first, second, or third place. First, second, and third place represent ordinal
data. If Roscoe takes first and Wilbur takes second, we do not know if the
competition was close; we only know that Roscoe outperformed Wilbur. Likert-
type scales (such as "On a scale of 1 to 10 with one being no pain and ten being
high pain, how much pain are you in today?") also represent ordinal data.
Fundamentally, these scales do not represent a measurable quantity. An individual
may respond 8 to this question and actually be in less pain than someone else who
responded 5. Likewise, a person who responds 4 is not necessarily in half as much
pain as a person who responds 8. All we know from these data is that an individual
who responds 6 reports less pain than if they had responded 8 and more pain than if
they had responded 4. Therefore, Likert-type scales only represent a rank
ordering. Interval: A scale
which represents quantity and has equal units but for which zero represents simply
an additional point of measurement is an interval scale. The Fahrenheit scale is a
clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or
-10 degrees Fahrenheit are interval data. Measurement of sea level is another
example of an interval scale. With each of these scales there is a direct, measurable
quantity with equality of units. In addition, zero does not represent the absolute
lowest value. Rather, it is a point on the scale with numbers both above and below it
(for example, -10 degrees Fahrenheit). Ratio: The ratio scale of measurement is
similar to the interval scale in that it also represents quantity and has equality of
units. However, this scale also has an absolute zero (no numbers exist below the
zero). Very often, physical measures will represent ratio data (for example, height
and weight). If one is measuring the length of a piece of wood in centimeters, there
is quantity, equal units, and that measure cannot go below zero centimeters. A
negative length is not possible. The summary below will help clarify the
fundamental differences between the four scales of measurement.
– Nominal: categories or labels only, with no inherent order (e.g., jersey numbers, gender codes).
– Ordinal: ordered categories, but the intervals between values are not equal (e.g., contest placings, Likert-type ratings).
– Interval: ordered, with equal units, but zero is simply another point on the scale (e.g., degrees Fahrenheit, sea level).
– Ratio: ordered, with equal units and a true zero, so ratios are meaningful (e.g., height, weight, length).
Answers:
Nonparametric Tests vs. Parametric Tests
Nonparametric tests don’t require that your data follow the normal distribution.
They’re also known as distribution-free tests and can provide benefits in certain
situations. Typically, people who perform statistical hypothesis tests are more
comfortable with parametric tests than nonparametric tests.
You’ve probably heard it’s best to use nonparametric tests if your data are not
normally distributed—or something along these lines. That seems like an easy way
to choose, but there’s more to the decision than that.
In this post, I'll compare the advantages and disadvantages of parametric and
nonparametric hypothesis tests to help you decide which type to use.
4. Discuss the types of data presentation and cite examples to substantiate your
answer.
Introduction
Data are a set of facts, and provide a partial picture of reality. Whether data are
being collected with a certain purpose or collected data are being utilized,
questions regarding what information the data are conveying, how the data can be
used, and what must be done to include more useful information must constantly
be kept in mind.
Since most data are available to researchers in a raw format, they must be
summarized, organized, and analyzed to usefully derive information from them.
Furthermore, each data set needs to be presented in a certain way depending on
what it is used for. Planning how the data will be presented is essential before
appropriately processing raw data.
First, a question for which an answer is desired must be clearly defined. The more
detailed the question is, the more detailed and clearer the results are. A broad
question results in vague answers and results that are hard to interpret. In other
words, a well-defined question is crucial for the data to be well-understood later.
Once a detailed question is ready, the raw data must be prepared before processing.
These days, data are often summarized, organized, and analyzed with statistical
packages or graphics software. Data must be prepared in such a way that they are
properly recognized by the program being used. The present study does not discuss
this data preparation process, which involves creating a data frame,
creating/changing rows and columns, changing the level of a factor, categorical
variable, coding, dummy variables, variable transformation, data transformation,
missing value, outlier treatment, and noise removal.
We describe the roles and appropriate use of text, tables, and graphs (graphs, plots,
or charts), all of which are commonly used in reports, articles, posters, and
presentations. Furthermore, we discuss the issues that must be addressed when
presenting various kinds of information, and effective methods of presenting data,
which are the end products of research, and of emphasizing specific information.
Data Presentation
Data can be presented in one of three ways:
–as text;
–in tabular form; or
–in graphical form.
Methods of presentation must be determined according to the data format, the
method of analysis to be used, and the information to be emphasized.
Inappropriately presented data fail to clearly convey information to readers and
reviewers. Even when the same information is being conveyed, different methods
of presentation must be employed depending on what specific information is going
to be emphasized. A method of presentation must be chosen after carefully
weighing the advantages and disadvantages of different methods of presentation.
For easy comparison of different methods of presentation, let us look at a table
(Table 1) and a line graph (Fig. 1) that present the same information [1]. If one
wishes to compare or introduce two values at a certain time point, it is appropriate
to use text or the written language. However, a table is the most appropriate when
all information requires equal attention, and it allows readers to selectively look at
information of their own interest. Graphs allow readers to understand the overall
trend in data, and intuitively understand the comparison results between two
groups. One thing to always bear in mind regardless of what method is used,
however, is the simplicity of presentation.
5. Compare and contrast the four scales of measurement
Answers:
Ratio Scale. The ratio scale is exactly the same as the interval scale with one
major difference: zero is meaningful. For example, a height of zero is meaningful
(it means you don't exist). Compare that to a temperature of zero degrees, which,
while it exists, doesn't mean anything in particular (although admittedly, in the
Celsius scale it is the freezing point of water).
Weight is measured on the ratio scale.
6. Give at least five (5) sampling techniques and give examples to concretize your
answer.
Answers:
Probability Sampling Methods
1. Simple random sampling
In this case each individual is chosen entirely by chance and each member of the
population has an equal chance, or probability, of being selected. One way of
obtaining a random sample is to give each individual in a population a number, and
then use a table of random numbers to decide which individuals to include.[1]
For example, if you have a sampling frame of 1000 individuals, labelled 0 to 999,
use groups of three digits from the random number table to pick your sample. So, if
the first three numbers from the random number table were 094, select the
individual labelled “94”, and so on.
As with all probability sampling methods, simple random sampling allows the
sampling error to be calculated and reduces selection bias. A specific advantage is
that it is the most straightforward method of probability sampling. A disadvantage
of simple random sampling is that you may not select enough individuals with your
characteristic of interest, especially if that characteristic is uncommon. It may also
be difficult to define a complete sampling frame and inconvenient to contact them,
especially if different forms of contact are required (email, phone, post) and your
sample units are scattered over a wide geographical area.
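As an illustrative addition (not part of the original answer), here is a minimal Python sketch of simple random sampling; the frame of 1,000 labelled individuals and the sample size of 50 are assumptions made for the example:

import random

# Hypothetical sampling frame: 1,000 individuals labelled 0 to 999
sampling_frame = list(range(1000))

# Draw a simple random sample of 50 without replacement;
# every individual has the same probability of being selected.
sample = random.sample(sampling_frame, k=50)
print(sorted(sample))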
2. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals
are chosen to ensure an adequate sample size. If you need a sample size n from a
population of size x, you should select every x/nth individual for the sample.
For example, if you wanted a sample size of 100 from a population of 1000,
select every 1000/100 = 10th member of the sampling frame.
Systematic sampling is often more convenient than simple random sampling, and it
is easy to administer. However, it may also lead to bias, for example if there are
underlying patterns in the order of the individuals in the sampling frame, such that
the sampling technique coincides with the periodicity of the underlying pattern. As
a hypothetical example, if a group of students were being sampled to gain their
opinions on college facilities, but the Student Record Department’s central list of
all students was arranged such that the sex of students alternated between male and
female, choosing an even interval (e.g. every 20th student) would result in a sample
of all males or all females. Whilst in this example the bias is obvious and should be
easily corrected, this may not always be the case.
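A minimal Python sketch of systematic sampling, using the figures from the example above (a frame of 1,000 and a required sample of 100); the random starting point within the first interval is an assumption added to avoid always beginning with the first member:

import random

def systematic_sample(frame, n):
    # Sampling interval k = population size divided by sample size
    k = len(frame) // n
    # Random starting point within the first interval
    start = random.randrange(k)
    return frame[start::k][:n]

frame = list(range(1000))              # sampling frame of 1,000 individuals
print(systematic_sample(frame, 100))   # every 10th member from a random start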
3. Stratified sampling
In this method, the population is first divided into subgroups (or strata) who all
share a similar characteristic. It is used when we might reasonably expect the
measurement of interest to vary between the different subgroups, and we want to
ensure representation from all the subgroups.
For example, in a study of stroke outcomes, we may stratify the population
by sex, to ensure equal representation of men and women. The study sample is
then obtained by taking equal sample sizes from each stratum. In stratified
sampling, it may also be appropriate to choose non-equal sample sizes from each
stratum. For example, in a study of the health outcomes of nursing staff in a
county, if there are three hospitals each with different numbers of nursing staff
(hospital A has 500 nurses, hospital B has 1000 and hospital C has 2000), then it
would be appropriate to choose the sample numbers from each
hospital proportionally (e.g. 10 from hospital A, 20 from hospital B and 40 from
hospital C). This ensures a more realistic and accurate estimation of the health
outcomes of nurses across the county, whereas simple random sampling would
over-represent nurses from hospitals A and B. The fact that the sample was
stratified should be taken into account at the analysis stage.
Stratified sampling improves the accuracy and representativeness of the results by
reducing sampling bias. However, it requires knowledge of the appropriate
characteristics of the sampling frame (the details of which are not always
available), and it can be difficult to decide which characteristic(s) to stratify by.
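A small Python sketch of proportional allocation for stratified sampling, using the illustrative hospital numbers from the example above (500, 1,000 and 2,000 nurses, with an overall sample of 70):

# Strata sizes from the nursing-staff example above (illustrative figures)
strata = {"hospital A": 500, "hospital B": 1000, "hospital C": 2000}
total_sample = 70
total_population = sum(strata.values())

for name, size in strata.items():
    # Allocate the sample in proportion to the stratum size
    allocation = round(total_sample * size / total_population)
    print(name, allocation)   # hospital A 10, hospital B 20, hospital C 40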
4. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit,
rather than individuals. The population is divided into subgroups, known as
clusters, which are randomly selected to be included in the study. Clusters are
usually already defined, for example individual GP practices or towns could be
identified as clusters. In single-stage cluster sampling, all members of the chosen
clusters are then included in the study. In two-stage cluster sampling, a selection of
individuals from each cluster is then randomly selected for inclusion. Clustering
should be taken into account in the analysis.
The General Household survey, which is undertaken annually in England, is a
good example of a (one-stage) cluster sample. All members of the selected
households (clusters) are included in the survey.[1]
Cluster sampling can be more efficient than simple random sampling, especially
where a study takes place over a wide geographical region. For instance, it is easier
to contact lots of individuals in a few GP practices than a few individuals in many
different GP practices. Disadvantages include an increased risk of bias, if the
chosen clusters are not representative of the population, resulting in an increased
sampling error.
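A brief Python sketch of single-stage cluster sampling, with hypothetical GP practices as the pre-defined clusters; the practice names, cluster sizes and number of selected clusters are all assumptions for the example:

import random

# Hypothetical clusters: 20 GP practices, each with a list of 50 patients
clusters = {f"practice_{i}": [f"patient_{i}_{j}" for j in range(50)]
            for i in range(20)}

# Randomly select 4 practices, then include everyone in those practices
chosen_practices = random.sample(list(clusters), k=4)
sample = [person for practice in chosen_practices
          for person in clusters[practice]]
print(chosen_practices, len(sample))   # 4 practices, 200 people in total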
Non-Probability Sampling Methods
1. Convenience sampling
Convenience sampling is perhaps the easiest method of sampling, because
participants are selected based on availability and willingness to take part. Useful
results can be obtained, but the results are prone to significant bias, because those
who volunteer to take part may be different from those who choose not to
(volunteer bias), and the sample may not be representative of other characteristics,
such as age or sex. Note: volunteer bias is a risk of all non-probability sampling
methods.
2. Quota sampling
This method of sampling is often used by market researchers. Interviewers are
given a quota of subjects of a specified type to attempt to recruit.
For example, an interviewer might be told to go out and select 20 adult men, 20
adult women, 10 teenage girls and 10 teenage boys so that they could interview
them about their television viewing. Ideally the quotas chosen would
proportionally represent the characteristics of the underlying population.
Whilst this has the advantage of being relatively straightforward and potentially
representative, the chosen sample may not be representative of other characteristics
that weren't considered (a consequence of the non-random nature of sampling).[2]
3. Judgement (or Purposive) Sampling
Also known as selective, or subjective, sampling, this technique relies on the
judgement of the researcher when choosing who to ask to participate.
Researchers may thus implicitly choose a "representative" sample to suit their
needs, or specifically approach individuals with certain characteristics. This
approach is often used by the media when canvassing the public for opinions and
in qualitative research.
Judgement sampling has the advantage of being time- and cost-effective to perform
whilst resulting in a range of responses (particularly useful in qualitative research).
However, in addition to volunteer bias, it is also prone to errors of judgement by
the researcher and the findings, whilst being potentially broad, will not necessarily
be representative.
4. Snowball sampling
This method is commonly used in social sciences when investigating hard-to-reach
groups. Existing subjects are asked to nominate further subjects known to them, so
the sample increases in size like a rolling snowball.
For example, when carrying out a survey of risk behaviors amongst intravenous
drug users, participants may be asked to nominate other users to be interviewed.
Snowball sampling can be effective when a sampling frame is difficult to identify.
However, by selecting friends and acquaintances of subjects already investigated,
there is a significant risk of selection bias (choosing a large number of people with
similar characteristics or views to the initial individual identified).
Bias in sampling
There are five important potential sources of bias that should be considered when
selecting a sample, irrespective of the method used. Sampling bias may be
introduced when:[1]
1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for example if they are
difficult to contact
4. There are low response rates
5. An out-of-date list is used as the sample frame (for example, if it excludes
people who have recently moved into the area).
7. Discuss the following basic measurements in statistics and give the formula for
each.
Answer:
Statistics Basics: Overview
The most common basic statistics terms you’ll come across are the mean, mode
and median. These are all what are known as “Measures of Central Tendency.”
Also important in this early chapter of statistics is the shape of a distribution. This
tells us something about how data is spread out around the mean or median.
Perhaps the most common distribution you’ll see is the normal distribution,
sometimes called a bell curve. Heights, weights, and many other things found in
nature tend to be shaped like this:
Overview
Stuck on how to find the mean, median, & mode in statistics?
1. The mean is the average of a data set.
2. The mode is the most common number in a data set.
3. The median is the middle of the set of numbers.
Of the three, the mean is the only one that requires a formula. I like to think of it in
the other dictionary sense of the word (as in, it’s mean as opposed to nice!). That’s
because, compared to the other two, it’s not as easy to work with.
Other Types
There are other types of means, and you’ll use them in various branches of math.
Most have very narrow applications to fields like finance or physics; if you’re in
elementary statistics you probably won’t work with them.
These are some of the most common types you’ll come across.
1. Weighted mean.
2. Harmonic mean.
3. Geometric mean.
4. Arithmetic-Geometric mean.
5. Root-Mean Square mean.
6. Heronian mean.
1. Weighted Mean
These are fairly common in statistics, especially when studying populations.
Instead of each data point contributing equally to the final average, some data
points contribute more than others. If all the weights are equal, then this will
equal the arithmetic mean. There are certain circumstances when this can give
incorrect information, as shown by Simpson’s Paradox.
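A tiny Python sketch of a weighted mean (an illustrative addition; the scores and weights are made up for the example):

# Each value contributes according to its weight
values  = [80, 90, 70]        # e.g., three exam scores
weights = [0.5, 0.3, 0.2]     # relative importance of each score

weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted_mean)          # 81.0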
2. Harmonic Mean
A. Add the reciprocals of the numbers in the set. To find a reciprocal, flip
the fraction so that the numerator becomes the denominator and the
denominator becomes the numerator. For example, the reciprocal of 6/1 is
1/6.
B. Divide the answer by the number of items in the set.
C. Take the reciprocal of the result.
The harmonic mean is used quite a lot in physics. In some cases involving rates
and ratios it gives a better average than the arithmetic mean. You’ll also find
uses in geometry, finance and computer science.
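Following steps A–C above, here is a small Python sketch of the harmonic mean; the two example values are assumptions for illustration (e.g., two speeds in km/h):

data = [40, 60]                                     # illustrative values

reciprocal_sum = sum(1 / x for x in data)           # step A: add the reciprocals
mean_of_reciprocals = reciprocal_sum / len(data)    # step B: divide by the count
harmonic_mean = 1 / mean_of_reciprocals             # step C: take the reciprocal
print(harmonic_mean)                                # 48.0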
3. Geometric Mean
This type has very narrow and specific uses in finance, social sciences and
technology. For example, let’s say you own stocks that earn 5% the first year,
20% the second year, and 10% the third year. If you want to know the average
rate of return, you can’t use the arithmetic average. Why? Because when you
are finding rates of return you are multiplying, not adding. For example, the
first year you are multiplying by 1.05.
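A short Python sketch of the geometric mean rate of return for the stock example above (5%, 20% and 10% over three years; the figures are illustrative):

# Convert each annual return to a growth factor, multiply, then take the cube root
growth_factors = [1.05, 1.20, 1.10]

product = 1.0
for factor in growth_factors:
    product *= factor

geometric_mean_return = product ** (1 / len(growth_factors)) - 1
print(round(geometric_mean_return, 4))   # about 0.1149, i.e. roughly 11.5% per year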
4. Arithmetic-Geometric Mean
This is used mostly in calculus and in machine computation (i.e., as the basis
for many computer calculations). It's related to the perimeter of an ellipse.
When it was first developed by Gauss, it was used to calculate planetary orbits.
The arithmetic-geometric mean is (not surprisingly!) a blend of the arithmetic and
geometric averages. The math is quite complicated, but relatively simple
explanations of it are available.
5. Root-Mean Square
It is very useful in fields that study sine waves, like electrical engineering. This
particular type is also called the quadratic average. See: Quadratic Mean / Root
Mean Square.
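A minimal Python sketch of the root-mean-square (quadratic mean) of a small, made-up data set:

data = [3, 4, 12]

# Square each value, average the squares, then take the square root
rms = (sum(x ** 2 for x in data) / len(data)) ** 0.5
print(round(rms, 3))   # about 7.506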
6. Heronian Mean
Used in geometry to find the volume of a pyramidal frustum. A pyramidal
frustum is basically a pyramid with the tip sliced off.
1. What is the Mean?
The mean is the average of a data set: add up all the numbers and divide by how
many numbers there are.
2. What is the Mode?
The mode is the most common number in a set. For example, the mode in this set
of numbers is 21:
21, 21, 21, 23, 24, 26, 26, 28, 29, 30, 31, 33
3. What is the Median?
The median is the middle number in a data set. To find the median, list your data
points in ascending order and then find the middle number. The middle number in
this set is 28 as there are 4 numbers below it and 4 numbers above:
23, 24, 26, 26, 28, 29, 30, 31, 33
Note: If you have an even set of numbers, average the middle two to find the
median. For example, the median of this set of numbers is 28.5 ((28 + 29) / 2).
23, 24, 26, 26, 28, 29, 30, 31, 33, 34
How to find the mean, median and mode by hand: Steps
How to find the mean, median and mode: MODE
Step 1: Put the numbers in order so that you can clearly see patterns.
For example, let's say we have 2, 19, 44, 44, 44, 51, 56, 78, 86, 99, 99.
The mode is the number that appears the most often. In this case: 44, which
appears three times.
How to find the mean, median and mode: MEAN
Step 2: Add the numbers up to get a total.
Example: 2 + 19 + 44 + 44 + 44 + 51 + 56 + 78 + 86 + 99 + 99 = 622. Set this
number aside for a moment.
Step 3: Count the amount of numbers in the series.
In our example (2, 19, 44, 44, 44, 51, 56, 78, 86, 99, 99), we have 11 numbers.
Step 4: Divide the number you found in step 2 by the number you found in step 3.
In our example: 622 / 11 = 56.5454545. This is the mean, sometimes called the
average.
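For reference, a short Python sketch that reproduces the hand calculations above using the standard library's statistics module:

import statistics

data = [2, 19, 44, 44, 44, 51, 56, 78, 86, 99, 99]   # the example set above

print(statistics.mode(data))     # 44 (appears three times)
print(statistics.mean(data))     # 56.5454... (622 divided by 11)
print(statistics.median(data))   # 51 (the middle value of the 11 ordered numbers)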
Answers:
INTRODUCTION
Statistics is a branch of science that deals with the collection, organisation, and
analysis of data and the drawing of inferences from samples to the whole
population.[1]
This requires a proper design of the study, an appropriate selection of the study
sample and choice of a suitable statistical test. An adequate knowledge of statistics
is necessary for proper designing of an epidemiological study or a clinical trial.
Improper statistical methods may result in erroneous conclusions which may lead
to unethical practice.
VARIABLES
A variable is a characteristic that varies from one individual member of a
population to another.[3] Variables such as height and weight are measured by
some type of scale, convey quantitative information, and are called quantitative
variables. Sex and eye color give qualitative information and are called qualitative
variables.
Quantitative variables
Quantitative or numerical data are subdivided into discrete and continuous
measurements. Discrete numerical data are recorded as a whole number such as 0,
1, 2, 3,… (integer), whereas continuous data can assume any value. Observations
that can be counted constitute the discrete data and observations that can be
measured constitute the continuous data. Examples of discrete data are number of
episodes of respiratory arrests or the number of re-intubations in an intensive care
unit. Similarly, examples of continuous data are the serial serum glucose levels,
partial pressure of oxygen in arterial blood and the oesophageal temperature.
A hierarchical scale of increasing precision can be used for observing and
recording the data, based on categorical, ordinal, interval and ratio scales.
Categorical or nominal variables are unordered. The data are merely classified into
categories and cannot be arranged in any particular order. If only two categories
exist (as in gender: male and female), the data are called dichotomous (or binary).
The various causes of re-intubation in an intensive care unit due to upper airway
obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary
oedema and neurological impairment are examples of categorical variables.
Ordinal variables have a clear ordering between the variables. However, the
ordered data may not have equal intervals. Examples are the American Society of
Anesthesiologists status or Richmond agitation-sedation scale.
Interval variables are similar to an ordinal variable, except that the intervals
between the values of the interval variable are equally spaced. A good example of
an interval scale is the Fahrenheit degree scale used to measure temperature. With
the Fahrenheit scale, the difference between 70° and 75° is equal to the difference
between 80° and 85°: The units of measurement are equal throughout the full range
of the scale.
Ratio scales are similar to interval scales, in that equal differences between scale
values have equal quantitative meaning. However, ratio scales also have a true zero
point, which gives them an additional property. For example, the system of
centimetres is an example of a ratio scale. There is a true zero point and the value
of 0 cm means a complete absence of length. The thyromental distance of 6 cm in
an adult may be twice that of a child in whom it may be 3 cm.
Descriptive statistics
The extent to which the observations cluster around a central location is described
by the central tendency and the spread towards the extremes is described by the
degree of dispersion.
Measures of central tendency
The measures of central tendency are mean, median and mode.[6] Mean (or the
arithmetic average) is the sum of all the scores divided by the number of scores.
Mean may be influenced profoundly by the extreme variables. For example, the
average stay of organophosphorus poisoning patients in ICU may be influenced by
a single patient who stays in ICU for around 5 months because of septicaemia. The
extreme values are called outliers. The formula for the mean is
Mean (x̄) = Σx / n,
where x = each observation and n = number of observations. Median[6] is defined
as the middle of a distribution in ranked data (with half of the variables in the
sample above and half below the median value) while mode is the most frequently
occurring variable in a distribution. Range defines the spread, or variability, of a
sample.[7] It is described by the minimum and maximum values of the variables. If
we rank the data and, after ranking, group the observations into percentiles, we can
get better information about the pattern of spread of the variables. In percentiles, we
rank the observations into 100 equal parts. We can then describe 25%, 50%, 75%
or any other percentile amount. The median is the 50th percentile. The interquartile
range will be the observations in the middle 50% of the observations about the
median (25th–75th percentile). Variance[7] is a measure of how spread out the
distribution is. It gives an indication of how closely an individual observation
clusters about the mean value. The variance of a population is defined by the
following formula:
σ² = Σ(x − μ)² / N,
where x = each observation, μ = the population mean and N = the number of
observations.
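To make the formula concrete, a short Python sketch computing the population variance of a small, made-up data set:

import statistics

data = [4, 8, 6, 5, 7]                                # illustrative observations

mu = statistics.mean(data)                             # population mean
variance = sum((x - mu) ** 2 for x in data) / len(data)
print(variance)                                        # 2.0
print(statistics.pvariance(data))                      # same result from the library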