Lecture One - Statistical Data
predictions, and inform decisions. It comes in various forms, each serving different
purposes in analysis.
Types of Statistical Data
1. Quantitative Data: Numerical values (e.g., height, weight).
o Discrete Data: Countable values that can take only a finite (or countable) number
of distinct values (e.g., number of students in a class).
o Continuous Data: Measurable values that can take on any value within a
range (e.g., height, weight).
2. Qualitative Data (Categorical Data):
o Nominal Data: Categories without a natural order (e.g., colors, names).
o Ordinal Data: Categories with a natural order (e.g., rankings, grades).
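To make the four types concrete, here is a minimal Python sketch that labels some of the example variables above by type and shows the practical difference between ordinal and nominal data: ordinal categories can be sorted once their natural order is written down, whereas nominal categories such as colours have no order to sort by. The `DataType` names, the `sort_ordinal` helper and the `GRADE_ORDER` scale are illustrative assumptions, not part of the lecture.

```python
from enum import Enum

class DataType(Enum):
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"
    NOMINAL = "qualitative, nominal"
    ORDINAL = "qualitative, ordinal"

# Some example variables from the list above, labelled with their type.
EXAMPLES = {
    "number of students in a class": DataType.DISCRETE,
    "height": DataType.CONTINUOUS,
    "colour": DataType.NOMINAL,
    "grade": DataType.ORDINAL,
}

# Hypothetical grade scale, lowest to highest. Ordinal data only becomes
# sortable once its natural order is stated explicitly like this.
GRADE_ORDER = ["F", "E", "D", "C", "B", "A"]

def sort_ordinal(values, order):
    """Sort ordinal values by their position in the given order."""
    rank = {category: i for i, category in enumerate(order)}
    return sorted(values, key=rank.__getitem__)

for variable, dtype in EXAMPLES.items():
    print(f"{variable}: {dtype.value}")

print(sort_ordinal(["B", "A", "D", "C"], GRADE_ORDER))  # ['D', 'C', 'B', 'A']
```

Sorting nominal values (for example, colours alphabetically) would also run, but the resulting order carries no statistical meaning.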
A more helpful definition of ‘data’ is ‘a series of facts from which conclusions may
be drawn’.
Data is a set of measured or described observations. For example, the heights or ages
of 30 students in a class are measured observations, while described observations may
be sex (male or female), dry or wet, hot or cold, or germinated or ungerminated
seedlings. Data are a set of research information expressed in quantifiable and
qualifiable forms for the purpose of statistical analysis.
Richard (1997) defines quantitative data as … numerical information such as “how
much” or “how many”. They are measured on a numerical scale, e.g. weight, age, price,
rank, length and volume. Qualitative data, on the other hand, represent categorical or
attribute information that can be classified by some criterion or quality, e.g. sex (male
or female), colour (red, green, black, etc.), blood type (A, B, AB, O), computer brand
(IBM, KAYPRO, COMPAQ) and favourite make of car (Ford, Chevrolet).
Roberts (1992) also defines data as a set of values collected for the response
variable from each of the elements belonging to the sample, e.g.:
- heights of 100 students
- ages of 500 students
- account numbers of 90 customers in a bank
- scores of 800 students in an examination
For example, a department store describes the ages of its 100 employees with the
following grouped frequency distribution.
Table 1.1 Frequency Distribution
Ages      Number of Employees
20-29     30
30-39     35
40-49     20
50-59     10
60-69     5
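A frequency distribution such as Table 1.1 is often extended with relative frequencies (each class's share of the total) and cumulative frequencies (the running total). The sketch below takes the class counts from Table 1.1 and computes both; the variable names are illustrative assumptions.

```python
# Class intervals (ages) and frequencies (number of employees) from Table 1.1.
table = [
    ("20-29", 30),
    ("30-39", 35),
    ("40-49", 20),
    ("50-59", 10),
    ("60-69", 5),
]

total = sum(count for _, count in table)  # total number of employees

cumulative = 0
rows = []
for ages, count in table:
    cumulative += count        # running total up to and including this class
    relative = count / total   # proportion of employees in this age class
    rows.append((ages, count, relative, cumulative))

for ages, count, relative, cum in rows:
    print(f"{ages}  {count:>3}  {relative:.2f}  {cum:>3}")
```

With these counts the relative frequencies are 0.30, 0.35, 0.20, 0.10 and 0.05, and the cumulative frequency reaches 100, the total number of employees.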
In order to collect data, you need to observe or measure some property. This
property is called a variable.
In general terms, variables are characteristics of objects that can change and take
different values at different times, depending on the prevailing conditions imposed on
the characteristic under consideration. For example, the age, height or score of a
student is a variable; in Table 1.2 above, the variable x takes the values 2, 2, 2, 3, 3,
3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8 and 9.
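The values of x listed above are raw observations of a discrete variable. As a minimal sketch, Python's standard `collections.Counter` can tally them into an ungrouped frequency distribution (value and number of occurrences):

```python
from collections import Counter

# The 20 observed values of x listed in the text.
x = [2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9]

freq = Counter(x)  # maps each distinct value to how often it occurs

for value in sorted(freq):
    print(value, freq[value])

print("n =", sum(freq.values()))  # total number of observations
```

For these 20 values, 3 is the most frequent (4 occurrences) and 9 the least frequent (1 occurrence).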
There are two basic sources of data, namely primary and secondary data.
The company might also have carried out a survey to produce their own primary data on
prospective customer attitudes and the availability of distribution through wholesalers.
The Use of Secondary Data
Secondary data are generally used when:
(a) the time, manpower and resources necessary for your own survey are not available
(and, of course, the relevant secondary data exists in a usable form), or
(b) it already exists and provides most, if not all, of the information required.
The advantages of using secondary data are savings in time, manpower and resources in
sampling and data collection. In other words, somebody else has done the ‘spade work’
already.
The disadvantages of using secondary data can be formidable and careful examination of
the source(s) of the data is essential. Problems include the following.
i. Data quality might be questionable. For example, the sample(s) used may have been
too small, interviewers may not have been experienced or any questionnaires used
may have been badly designed.
ii. The data collected might now be out of date.
iii. The geographical coverage of the survey may not coincide with what you require. For
example, you might require information for Liverpool and the secondary data
coverage is for the whole of Merseyside.
iv. The strata of the population covered may not be appropriate for your purposes. For
example, the secondary data might be split up into male/female and full-time/part-time
workers, and you might consider that, for your purposes, whether part-time
workers are permanent or temporary is significant.
v. Some terms used might have different meanings.
Common examples of this are:
* Wages (basic only or do they include overtime)
* Level of production (are rejects included?)
* Workers (factory floor only or are office staff included?)
1.3 Censuses
A census is the name given to a survey which examines every member of a
population.
(a) A firm might take a census of all its employees to find out their opinions on
the possible introduction of a new incentive scheme.
(b) The Government Statistical Service carries out many official censuses. Some
of them are described as follows.
i. A Population Census is taken every ten years, obtaining information such as
age, sex, relationship to head of household, occupation, hours of work,
education, use of a car for travel to work, number of rooms in place of
dwelling etc. for the whole population of Nigeria.
ii. A Census of Distribution is taken every five years, covering virtually all retail
establishments and some wholesalers. It obtains information on numbers of
employees, type of goods sold, turnover and classification etc.
iii. A Census of Production is taken every five years, covering manufacturing
industries, mines and quarries, building trades and public utility production
services. The information obtained and analysed includes distribution of
labour, allocation of capital resources, stocks of raw materials and finished
goods and expenditure on plant and machinery.
A census has the obvious advantages of completeness and being accepted as
representative, but of course must be paid for in terms of manpower, time and
resources. The three government censuses described above involve a great deal of
organization, with some staff needed permanently to answer queries on the census
form, check and correct errors and omissions, and extensively analyze and print the
information collected. Forms can take up to a year to be returned, with a further gap
of up to two years before the complete results are published.
1.4 Method of Data Collection
Data collection can be thought of as the means by which information is obtained
from the selected subjects of an investigation. There are various data collection
methods which can be employed. Sometimes the sampling technique will dictate
which method is used and in other cases there will be a choice, depending on how
much time and manpower (and inevitably money) is available. The following list
gives the most common methods.
1. Observational Studies: Data is collected by observing subjects without
manipulation. This method is often used in social sciences.
2. Interviews: In-depth data collection through direct interaction with
respondents. This method provides qualitative insights.
3. Surveys and Questionnaires: These are used to gather information from a
large group of respondents. They can be conducted via paper, online, or face-to-
face.
4. Experiments: Conducted to test hypotheses by manipulating variables and
observing the effects. This method is common in scientific research.
5. Secondary Data Analysis: Utilizing existing data from sources like
government reports, scientific articles, and online databases.
a. Direct Observation: The investigator obtains the data by observing the object
under study. This method requires the investigator to be personally present, so that
first-hand information is obtained. It can be ‘Direct’ or ‘Participant’. This method
can be used for examining items sampled from a production line, in traffic surveys
or in work study. It is very labour-intensive, expensive and lengthy, and cannot be
used in many situations. It is seldom used because most statistical inquiries cover a
wider field than any one investigator can cover single-handedly within a reasonable
period of time.
b. Individual (Personal) Interview: The respondents are asked questions from a set
of questions prepared in advance. It can be ‘formal’ or ‘informal’. The method is
generally adopted when the information desired is complex or when the informant
is reluctant or indifferent. The method may be time-consuming, and its success or
failure depends largely upon the personal knowledge of the interviewer and the
representative character of the respondent.
This method is probably the most expensive, but has the advantage of completeness
and accuracy.
Other factors involved are:
➢ interviewers need to be trained;
➢ interviews need arranging;
➢ can be used to advantage for pilot surveys, since questions can be
thoroughly tested;
➢ uniformity of approach if only one interviewer is used;
➢ an interviewer can see or sense if a question has not been fully
understood, and it can be followed up on the spot.
This form of data collection can be used in conjunction with random or
quasi-random sampling.
i. Street (informal) interview:
This method of data collection is normally used in conjunction with quota
sampling, where the interviewer is often just one of a team. Some factors
involved are:
➢ possible difference in interviewer approach to the respondents and the
way replies are recorded;
➢ questions must be short and simple;
➢ non-response is not normally a problem, since refusals are ignored
and another subject is selected;
➢ convenient and cheap.
ii. Telephone interview:
This method is sometimes used in conjunction with a systematic sample
(from the telephone book). It would generally be used within a local area
and is often connected with selling a product or a service (for example,
insurance). It has an in-built bias if private homes are being telephoned
(rather than businesses), since only those people with telephones can be
contacted and interviewed. It can cause aggravation and the interviewer
needs to be very skilled.
c. Questionnaire: A set of questions relating to the subject under investigation
(a questionnaire) is prepared. The questionnaires are sent to the informants,
either by post or by hand, to be completed. Mail questionnaires are generally
cheaper and quicker, but their response rate may be much lower than that of
questionnaires that are hand-delivered.
Postal questionnaire
This is a much cheaper method than the personal interview since manpower (one of
the most expensive resources) is not used in the data collection. However, much
more effort needs to be put into the design of the questionnaire, since there is often
no way of telling whether or not a respondent has understood the questions or has
answered them correctly (both of these are generally no problem in a personal
interview).
Other factors involved are:
➢ low response rates (although inducements, such as free gifts, often
help);
➢ convenience and cheapness of the method when the population is
scattered geographically;
➢ no prior arrangements necessary (unlike the personal interview);
➢ questionnaires sent to a company may not be filled in by the correct
person.
This method can be used in conjunction with most forms of sampling.
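The telephone interview above mentions drawing a systematic sample from the telephone book. In a systematic sample, the sampling interval k is the population size divided by the sample size (rounded down here); a random starting point is chosen among the first k entries, and every k-th entry after it is taken. The sketch below assumes the directory is simply a Python list; the function name `systematic_sample` and the 500-entry directory are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Return n items: a random start among the first k items, then every k-th item.

    k, the sampling interval, is len(population) // n (rounded down).
    """
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n
    start = random.Random(seed).randrange(k)  # random start in 0 .. k-1
    return [population[start + i * k] for i in range(n)]

# Hypothetical telephone directory of 500 entries; draw a sample of 10 (k = 50).
directory = [f"Entry {i}" for i in range(1, 501)]
sample = systematic_sample(directory, 10, seed=1)
print(sample)
```

Because only the starting point is random, the method is quick to apply to a printed list, but it can introduce bias if the list has a pattern that repeats every k entries.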