QT Short Notes
Introduction to statistics
The term ‘statistics’ seems to have been derived from the Latin word ‘status’, the
Italian word ‘statista’, the German word ‘statistik’ or the French word
‘statistique’. Each word means a ‘political state’. In the early years, a collection of facts
about the people in the state for administrative or political purposes was known as
statistics. For the proper administration of the state it is necessary to collect data
regarding the income, expenditure, wealth, health, employment, births, deaths, etc. of the
people belonging to the state. Thus the subject of statistics developed as a ‘science of
statecraft’.
Meaning of statistics
The term ‘statistics’ is used as a plural noun as well as a singular noun. In the plural
sense, statistics means numerical data. In the singular sense, statistics means the different
techniques and methods used for the collection, analysis and interpretation of numerical data.
Definition of statistics
According to Croxton and Cowden, “statistics is the collection, presentation, analysis and
interpretation of numerical data”.
Importance of statistics
1. Simple presentation of data
2. Helps in comparison
3. Helps in decision making
4. Helps in measuring relationship
5. Helps in planning and policy formulation
Limitations of statistics
1. Statistics does not study qualitative phenomena
2. Statistics is incapable of revealing all aspects of a problem
3. Statistical laws lack accuracy
4. Statistics does not deal with individual items
5. Statistics is liable to be misused
6. Data must be uniform
7. Too many methods to find a single result
8. Statistics are true only on an average
Distrust of statistics
Distrust of statistics refers to disbelief or lack of faith in statistics.
Distrust of statistics arises for the following reasons:
1. Incomplete knowledge of statistical methods
2. Unrealistic assumptions
3. Deliberate misuse of statistics
4. Ignoring limitations of statistics
5. Wrong application of statistical methods
To overcome the problems of distrust, the following precautions should be taken into
The term survey means search for information, knowledge or truth. The
investigation is statistical when it is conducted by using statistical methods.
1. Suitability
- Primary data: The suitability of primary data for the current investigation will be
high.
- Secondary data: The suitability of secondary data for the current investigation cannot
be predicted. It may or may not suit the objectives of the study.
2. Precaution
- Primary data: Can be used without much precaution because the data are collected by
the investigator.
- Secondary data: Should be used with greater care; otherwise it may lead to wrong
interpretations.
3. Source
- Primary data: The source may be the result of an experiment, a survey, etc.
- Secondary data: The sources are governmental and non-governmental organisations,
published reports, journals, books, etc.
4. Personal prejudice
- Primary data: The possibility of personal prejudice exists.
- Secondary data: The possibility of personal prejudice is smaller.
1. Published sources
- Official publications of central, state and local governments, official
publications of foreign governments or international bodies, reports
submitted by university bureaus, economists, research scholars, etc.
2. Unpublished sources
- These include enquiries of a private nature conducted by individuals or
organisations, which are not published since they are usually meant for private use.
- Examples: data relating to trade associations, chambers of commerce, etc.
● It should be divided into two parts : The first part should contain the aims
and objectives of the enquiry and the reasons for issuing the questionnaire.
The second part is the main part and contains the questions.
● Questions should be clear, simple and easy to understand : Questionnaires
should use apt words in the apt places. Words with multiple meanings
should be avoided.
● Questionnaire should be brief : The number of questions should be reduced to
the minimum.
● Questions should be arranged in a logical order.
● Personal questions should be avoided : Avoid private, confidential or
personal questions which respondents will be reluctant to answer.
● Questions should be capable of objective answers : As far as possible,
questions should be answerable with yes or no.
● Questionnaire should look attractive.
● Questions requiring calculations should be avoided.
● Questionnaire should be pre-tested with a group before mailing it out :
Pre-testing helps to overcome the shortcomings of the questionnaire.
● Crosscheck.
● Method of tabulation
Census method: All individual items are studied.
Sampling method: Only selected items are studied.
1. Convenient Sampling
- It is a non-probability sampling method in which sampling units are selected
according to the convenience of the investigator.
- This method is also known as accidental sampling because the researcher
selects those respondents he or she meets accidentally.
3. Quota Sampling
- In this method the interviewer is instructed to interview a specified number
of persons from each quota.
- The quota for each group is fixed in advance according to certain specific
homogeneous characteristics.
4. Snowball Sampling
- In this method a set of respondents is selected initially and interviewed.
- After this, the respondents are asked to name other people who, in their
opinion, form part of the target sample.
- The sample therefore keeps growing in size, like a snowball rolling downhill.
● Purpose of survey
● Measurability
● Degree of precision
● Information about population
● Nature of population
● Geographical area of the study and size of population
● Financial resources
● Time limitation
● Economy
Theories of sampling
Coding - It is the process of organising and sorting the collected data. Coding is
done by assigning symbols, alphabetical or numerical or both, to the
collected data.
Module 3 A
Univariate Data Analysis- 1
Measures of Central Tendency
Essential characteristics
Types of averages
The arithmetic average is obtained by adding together all the items and dividing this
total by the number of items.
An average may be defined as “the quotient obtained by dividing the total of the
values of a variable by the total number of their observations”.
Calculation of Arithmetic Average
1. Individual series
2. Discrete series
3. Continuous series
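The three cases above can be sketched in code. This is a minimal illustration using the direct method only; the function names and sample figures are assumptions made for the example, and the continuous-series version assumes that class mid-points represent each class.

```python
def mean_individual(values):
    """Individual series: total of the items divided by the number of items."""
    return sum(values) / len(values)


def mean_discrete(values, freqs):
    """Discrete series: sum of (frequency x value) divided by total frequency."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_continuous(class_limits, freqs):
    """Continuous series: treat each class mid-point as the value x."""
    mids = [(lower + upper) / 2 for lower, upper in class_limits]
    return mean_discrete(mids, freqs)


print(mean_individual([10, 20, 30, 40, 50]))                       # 30.0
print(mean_discrete([1, 2, 3], [2, 3, 5]))                         # 2.3
print(mean_continuous([(0, 10), (10, 20), (20, 30)], [1, 2, 1]))   # 15.0
```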
Merits and Demerits of arithmetic average
Merits
1. It is rigidly defined and hence there is no scope for ambiguity or
misunderstanding about its meaning.
2. It is easy to understand and thus it is a popular average.
3. It is simple to calculate.
4. It is based on all the items of a series.
5. It is not very much affected by fluctuations in sampling.
6. It is capable of further algebraic treatment.
7. It is a calculated value, and not based on position in the series.
Demerits
1. It cannot be determined by inspection, nor can it be located graphically.
2. It cannot be used in the study of qualitative phenomena like intelligence, beauty,
etc.
3. It is affected by extreme values.
4. It is not suitable for averaging ratios and percentages.
5. If a single observation is missing, lost or illegible, the mean cannot be computed.
6. In a distribution with open-end classes the value of the mean cannot be computed
without making assumptions regarding the size of the class interval of the open-end
classes.
Combined mean
“A combined mean is the mean of the whole series when there are two or more
component series.”
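A short sketch of the usual calculation follows: each component mean is weighted by the size of its series, so the combined mean is (n1 × mean1 + n2 × mean2 + ...) / (n1 + n2 + ...). The function name and the figures are illustrative assumptions.

```python
def combined_mean(sizes, means):
    """Combined mean of several component series.

    sizes: number of items in each component series
    means: arithmetic mean of each component series
    """
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)


# 40 items with mean 50 and 60 items with mean 60
print(combined_mean([40, 60], [50, 60]))  # 56.0
```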
Correction in mean
From the total of the values the incorrect values are first subtracted and
then the correct values are added. This total is divided by the number of items to
get the correct value of the mean.
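The procedure can be sketched as below. The function name and the numbers are assumptions for illustration, and the sketch assumes the number of items stays the same after correction.

```python
def corrected_mean(reported_mean, n, wrong_values, correct_values):
    """Correct a mean that was computed with some wrongly recorded values."""
    total = reported_mean * n            # recover the (incorrect) total
    total -= sum(wrong_values)           # subtract the incorrect values
    total += sum(correct_values)         # add the correct values
    return total / n                     # divide by the number of items


# Mean of 50 items was reported as 40, but one item was read as 35 instead of 53
print(corrected_mean(40, 50, [35], [53]))  # 40.36
```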
Weighted average (weighted mean)
A weighted average may be defined as an average in which each item is multiplied
by a value known as its weight, and the sum of these products is divided by the
sum of the weights instead of the number of items.
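As a minimal sketch, the weighted mean is the sum of (weight × value) divided by the sum of the weights. The function name, marks and weights below are made up for the example.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Marks in three papers carrying weights 1, 2 and 3
marks = [70, 80, 90]
weights = [1, 2, 3]
print(round(weighted_mean(marks, weights), 2))  # 83.33
```

With equal weights the result is the same as the simple arithmetic average.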
Advantages of weighted average
1. Unequal importance to items
2. Varying frequencies
3. Wide change in values or frequencies
4. Comparison
5. Calculation of average from different series
6. Calculation of rates
Geometric mean
Geometric mean is defined as the nth root of the product of n items.
Steps for calculating Geometric mean
1. Find the logarithm of each value
2. Add the logarithmic values
3. Divide the sum by the number of items
4. Find the antilogarithm of the result
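The four steps can be sketched as follows. This is only an illustration: it uses base-10 logarithms (any base works if the antilogarithm uses the same base) and assumes all values are positive, since logarithms of zero or negative numbers are not defined.

```python
import math


def geometric_mean(values):
    """Geometric mean by the logarithm method (positive values only)."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithm method needs strictly positive values")
    logs = [math.log10(v) for v in values]   # step 1: logarithm of each value
    total = sum(logs)                        # step 2: add the logarithms
    mean_log = total / len(values)           # step 3: divide by number of items
    return 10 ** mean_log                    # step 4: antilogarithm


print(round(geometric_mean([2, 4, 8]), 6))  # 4.0, the cube root of 2 x 4 x 8 = 64
```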
Merits and demerits of geometric mean
Merits
1. Well defined.
2. Based on all the items.
3. Capable of further algebraic treatment.
4. Not much affected by sampling fluctuations.
5. Gives weight to items according to their size.
6. Less affected by extreme values than the arithmetic mean.
Demerits
1. It is difficult for a layman to understand.
2. It cannot be calculated if the number of negative values is odd.
3. It cannot be calculated if any value is zero.
4. At times it gives a value which may not be found in the series.
Harmonic mean
Harmonic mean is a mathematical average. It is defined as “the reciprocal
of the arithmetic average of the reciprocals of the values of a variable”.
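The definition translates directly into code. The sketch below is illustrative only; the function name and the speed example are assumptions, and it assumes no value is zero.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic average of the reciprocals."""
    reciprocals = [1 / v for v in values]
    mean_of_reciprocals = sum(reciprocals) / len(values)
    return 1 / mean_of_reciprocals


# Same distance covered at 40 km/h and then at 60 km/h: average speed
print(round(harmonic_mean([40, 60]), 6))  # 48.0
```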
Merits and demerits of harmonic mean
Merits
1. Well defined
2. Based on all the items
3. Capable of further algebraic treatment
4. Not much affected by sampling fluctuations
5. Measures relative changes
6. Can be calculated even when a series contains negative values
Demerits
1. It is not easy for a layman to understand
2. It is only a summary figure and may not be an actual item in the series
3. It is very difficult to calculate
4. It is not truly representative of the statistical series
5. Its algebraic treatment is very limited
Module 3B
Uni-variate data analysis- 1
Positional averages and Partition values
Median is a positional average. It is the middlemost item of a series
when the values are arranged according to their magnitudes.
Median may be defined as “that value of the variable which divides the group into
two equal parts, one part comprising all the values greater and the other, all values
being less than the median”.
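A minimal sketch for an individual (ungrouped) series is given below. It follows the usual convention that, when the number of items is even, the median is taken as the arithmetic average of the two middle items. The function name and data are illustrative.

```python
def median(values):
    """Median of an ungrouped series."""
    ordered = sorted(values)          # arrange by magnitude first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle item
        return ordered[mid]
    # even count: average of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 5, 1]))  # 5
print(median([7, 3, 9, 5]))     # 6.0
```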
Merits of median
1. Rigidly defined
2. Easiness
3. Positional average
4. Good for open-end classes
5. Location by inspection
6. Can deal with qualitative data
7. Central location
Demerits of median
1. In the case of an even number of observations for ungrouped data, the median cannot be
determined exactly. In this case the median is the arithmetic average of the two middle
items.
2. Median is not suitable for further mathematical treatment.
3. Median is relatively less stable than the mean, particularly for small samples.
4. Median, being a positional average, is not based on each and every item of the
distribution.
5. For calculating the median, it is necessary to arrange the data in ascending or
descending order; other averages do not need any arrangement.
6. The value of the median is affected more by sampling fluctuations than the value of
the arithmetic mean.
7. At times, it produces a value which is never found in the series.
Partition values
The values which break the series into a number of equal parts are called the
partition values.
Median, quartiles, deciles, percentiles, etc. are the important partition values.
The values which divide the series into four equal parts are known as
quartiles. There are three quartiles, namely Q1, Q2 and Q3.
The values which divide the series into 10 equal parts are known as deciles. There
are 9 deciles: D1, D2, D3, ..., D9.
The values which divide the series into 100 equal parts are known as
percentiles. There are 99 percentiles: P1, P2, P3, ..., P99.
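The sketch below locates partition values for an individual series. The notes do not state a formula, so it assumes one common textbook convention: the k-th partition value is the item at position k(n + 1)/parts in the ordered series, with linear interpolation when the position is fractional. The function name and data are illustrative.

```python
def partition_value(values, k, parts):
    """k-th partition value when the series is split into `parts` equal parts.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    Assumes the position rule k * (n + 1) / parts (1-based).
    """
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts
    lower = int(position)             # whole part of the position
    fraction = position - lower       # fractional part, used to interpolate
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [5, 10, 15, 20, 25, 30, 35]
print(partition_value(data, 1, 4))     # Q1 -> 10.0
print(partition_value(data, 2, 4))     # Q2 (the median) -> 20.0
print(partition_value(data, 3, 4))     # Q3 -> 30.0
print(partition_value(data, 5, 10))    # D5 -> 20.0
print(partition_value(data, 50, 100))  # P50 -> 20.0
```

Q2, D5 and P50 all coincide with the median, which is why the median is counted among the partition values.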
Module 3 C
Uni-variate data analysis- 1
Mode
According to A M Tuttle, “mode is the value which has the greatest frequency
density in its immediate neighbourhood”.
Characteristics of mode
Merits
1. It gives the most representative value of a series.
2. It is not affected by the extreme values of a series.
3. It can be determined graphically.
4. It is considered a reliable average for studying the skewness of a distribution.
5. It is commonly understood and easily calculated.
6. Open-end classes do not pose any problem in the location of the mode.
7. For the calculation of the mode it is not necessary to know the values of all the
items of a series.
8. It is very useful in the field of business and commerce.
Demerits
1. It is not rigidly defined, so in some cases it may give different results.
2. It is not based on all the observations of a series but on the concentration of
frequencies of the items.
3. It is not capable of further algebraic treatment.
4. It is ill-defined, indeterminate and indefinite.
5. As compared with the mean, the mode is affected to a greater extent by sampling
fluctuations.
6. It cannot be determined from a series with unequal class intervals unless they are
made equal.
7. In many cases, it may be impossible to get a definite value of the mode.
Module 4 B
Univariate data analysis - 2
Skewness means asymmetry or lack of symmetry.
1. It refers to lack of symmetry.
2. It refers to the difference in value of the mean, median and mode.
3. It refers to the difference in distance between the quartiles and the median.
4. It may be positive or negative.
Types of skewness
1. Karl Pearson’s coefficient of skewness.
2. Bowley’s coefficient of skewness.
3. Kelly’s coefficient of skewness.
4. Measures of skewness based on moments and kurtosis.
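The first three measures can be sketched with their standard textbook formulas, which the notes do not spell out: Karl Pearson's coefficient is (Mean - Mode) / Standard deviation, Bowley's is (Q3 + Q1 - 2 Median) / (Q3 - Q1), and Kelly's is (P90 + P10 - 2 P50) / (P90 - P10). The function names and figures are assumptions for illustration; the summary values are passed in directly so that no particular method of locating quartiles or percentiles is implied.

```python
def pearson_skewness(mean, mode, std_dev):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient, based on the quartiles."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def kelly_skewness(p10, p50, p90):
    """Kelly's coefficient, based on the 10th, 50th and 90th percentiles."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)


print(pearson_skewness(mean=50, mode=44, std_dev=12))  # 0.5
print(bowley_skewness(q1=20, median=28, q3=40))        # 0.2
print(kelly_skewness(p10=10, p50=30, p90=60))          # 0.2
```

A positive result indicates positive skewness, a negative result negative skewness, and zero a symmetrical distribution.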
Moments are defined as “the arithmetic average of a certain power of deviations
of the items from their arithmetic mean”.
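Following this definition, the r-th moment about the mean is the average of (x - mean) raised to the power r. A minimal sketch for an individual series, with an illustrative function name and data:

```python
def central_moment(values, r):
    """r-th moment about the arithmetic mean of an ungrouped series."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


data = [2, 4, 6, 8, 10]
print(central_moment(data, 1))  # 0.0 (deviations from the mean always sum to zero)
print(central_moment(data, 2))  # 8.0 (the variance)
print(central_moment(data, 3))  # 0.0 (zero for a symmetrical series)
```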
Uses of moments
Interpolation and Extrapolation
Interpolation is a statistical technique used for arriving at a missing value from the
known values pertaining to a given phenomenon.
Extrapolation is a statistical technique used for arriving at an unknown projected
value from the known values pertaining to a given phenomenon.
Methods of interpolation
1. Graphic method
2. Algebraic method
(a) Newton’s formula of advancing differences
(b) Binomial expansion method
(c) Lagrange's method
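As an illustration of the algebraic approach, the sketch below implements Lagrange's method, which fits the polynomial passing through all the known points and does not require equally spaced x values. The function name and sample data are assumptions for the example.

```python
def lagrange_interpolate(xs, ys, x):
    """Estimate y at x from known points (xs[i], ys[i]) by Lagrange's formula."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # multiply by (x - xj) / (xi - xj) for every other known point
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Known values follow y = x squared; the value at x = 3 is missing
known_x = [1, 2, 4]
known_y = [1, 4, 16]
print(round(lagrange_interpolate(known_x, known_y, 3), 6))  # 9.0
```

The same function gives an extrapolated value when x lies outside the range of the known points, though such estimates are generally less reliable.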