RESEARCH ANALYSIS: QUANTITATIVE DATA
The Good Research Guide by Martyn Denscombe
Slides dev: Musicha NS
⦿ Sources of quantitative data
⦿ Types of quantitative data
⦿ Preparing quantitative data for analysis
⦿ Grouping the data
⦿ Describing the mid-point or average
⦿ Describing the spread of data
⦿ Looking for patterns and relationships in the data
⦿ Statistical tests for association and difference
⦿ Presenting the data – tables and charts
⦿ Validating the data
⦿ Advantages of quantitative analysis
⦿ Disadvantages of quantitative analysis
⦿ Checklist for the use of basic statistics

Sources of quantitative data
⦿ Quantitative data take the form of numbers.
⦿ They are associated primarily with (but not limited to) research strategies such as surveys and experiments, and with research methods such as questionnaires and observation.

Types of quantitative data
⦿ Nominal data: come from counting things and placing them into categories, e.g. male/female or White/South Asian/African Caribbean. There is no underlying order to the names.
⦿ Ordinal data: counts of things assigned to categories that stand in a clear, ordered, ranked relationship, which allows comparison, e.g. responses on a Likert scale.
⦿ Interval data: categories are ranked on a scale, and the ‘distance’ between the categories is known and proportionate, e.g. the calendar years 1966, 1976, 1986.
⦿ Ratio data: the categories exist on a scale which has a ‘true zero’ or an absolute reference point, e.g. incomes, distances and weights.
⦿ Discrete data: come in whole units (1, 2, 3, ...), e.g. the number of children per family. We might aggregate the numbers to arrive at an average figure of 1.9 or whatever, but we do not suppose that there exists anywhere 0.9 of a child belonging to a family.
⦿ Continuous data: measured ‘to the nearest unit’ because they do not come in discrete chunks, e.g. height in mm, weight in grams, age in seconds.

Preparing quantitative data for analysis
⦿ Coding the data: entails attributing a number to a piece of data, or group of data. It is advisable to do this prior to collecting the data.
⦿ For example, instead of asking 1,000 people to state what their job is, they can be asked to identify which category of occupations best fits their own.
⦿ If, say, ten categories of occupation are used, the researcher will obtain results in the form of numbers in each of the ten categories, rather than a list of 1,000 words.

Grouping the data (1st stage)
⦿ Organize the raw data in a way that makes them more easily understood. As Table 13.2 shows, it is difficult to make sense of raw data when the number of items goes above ten.
⦿ Construct an array of the raw data, i.e. arrange the data in order.
⦿ Make a tally of the frequencies.
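The array and tally steps can be sketched in Python as follows. This is only an illustration: the raw values are hypothetical (they are not the figures from Table 13.2), and the variable names are chosen for this example.

```python
from collections import Counter

# Hypothetical raw data: days absent for 15 employees (not from Table 13.2).
raw_data = [3, 0, 2, 5, 2, 1, 0, 2, 3, 1, 2, 0, 4, 1, 2]

# Step 1: build an array, i.e. arrange the raw values in order.
array = sorted(raw_data)

# Step 2: tally how often each value occurs.
tally = Counter(array)

# Print one line per value, with tally marks and the frequency.
for value in sorted(tally):
    print(f"{value}: {'|' * tally[value]} ({tally[value]})")
```

Sorting first makes the minimum, maximum and any repeated values visible at a glance; the tally then turns the array into a frequency distribution.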
⦿ If there is a large number of frequencies, make grouped frequency distributions.

Describing the mid-point or average (a measure of central tendency)
1) The mean (the arithmetic average): it is based on an equal distribution of the values. Add all the values and divide by the number of cases. E.g., if the total is 116 and the number of cases is 8, the mean is 116/8 = 14.5.
2) The median (the middle point): the values are placed in ascending or descending rank order, and the point which lies in the middle of the range is the median. In the following set of data, the median is 11.5, half-way between the two middle values.
3) The mode (the most common): the value which occurs most often, e.g. 17 in the set below.

Describing the spread of data (a measure of dispersion)
1) The range: subtract the minimum value from the maximum value. In the set below, the range is 44 (i.e. 47 minus 3).
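The three averages and the range can be reproduced in a few lines of Python. This is only a sketch: the data set is hypothetical, constructed so that it yields the figures quoted in these slides (a total of 116 over 8 cases, a median of 11.5, a mode of 17 and a range of 44). It is not necessarily the set used in the book.

```python
from statistics import mean, median, mode

# Hypothetical data set, chosen to match the figures quoted in the slides.
values = [3, 4, 5, 11, 12, 17, 17, 47]

average = mean(values)                   # 116 / 8
middle = median(values)                  # half-way between 11 and 12
most_common = mode(values)               # the value that occurs most often
value_range = max(values) - min(values)  # 47 minus 3

print(average, middle, most_common, value_range)  # 14.5 11.5 17 44
```

Note how the single value of 47 pulls the mean (14.5) above the median (11.5) and accounts for most of the range.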
⦿ It may give a biased picture of the spread of the values between the extremes. Here, the range of 44 is determined by the outlier value of 47.
2) Fractiles (e.g. the inter-quartile range, IQR): fractiles are based on the idea of dividing the range of the data into fractions, with each fraction containing the same number of data cases. With quartiles, the data are divided into four sections; with deciles, into tenths; with percentiles, into 100 equal parts.
3) Standard deviation: to understand the dispersion of the data, it is useful to know how far, on average, the values vary from the mean.

Looking for patterns and relationships (looking for connections between variables)
⦿ Does one category of data match another, or vary along with another? Is there a difference between two sets of data where similarity was expected?
⦿ Find out:
• whether the findings were a fluke;
• how strong the connection is between the two variables;
• whether one causes the other, or whether they are mutually interdependent.
⦿ Start with a ‘null hypothesis’: the presumption that there is no real relationship between the sets of data.
⦿ Only if the patterns or relationships are shown to be statistically significant (p < 0.05) should you be persuaded to reject the null hypothesis.

Statistical tests for association and difference
How do I find out if two variables are associated to a significant level?
⦿ Probably the most flexible and certainly the most commonly used statistical test for this is the chi-square test. It works with nominal, ordinal, interval and ratio data.
⦿ It works on the supposition that, if there were no relationship between the variables, the units would be distributed across the cells of the contingency table in proportion to the row and column totals.
⦿ This means that if there were three times as many women as men in a sample, we would start from the assumption that women would account for 75 per cent of the amounts or frequencies in each category.

How do I see if two groups or categories are different to a significant level?
⦿ A teacher might wish to compare the results obtained from students in one class with those from another.
⦿ The best statistical test for this purpose is the t-test. It uses the means of the two sets of data and their standard deviations to arrive at a figure which tells you the specific likelihood that any difference between the two sets of data is due to chance.
⦿ When using the t-test, take the null hypothesis as the starting point. Treat a difference as ‘real’ only when the probability that it arose by chance is less than 1 in 20 (p < 0.05).

Example: The t-test
⦿ We have two sets of marks obtained from two classes who have both taken the same test at about the same time. Is there a significant difference between the scores obtained by students in class A and the scores obtained by those in class B?
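A minimal sketch of how such a comparison might be computed in Python is given here. Several things are assumptions made for illustration: the marks are hypothetical (the slide's own table is not reproduced), the function name `independent_t` is invented, and the pooled-variance form of the t-test is used, which presumes the two classes have similar variances.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical marks; the groups need not be the same size.
class_a = [70, 65, 80, 75, 60, 72, 68]
class_b = [55, 60, 58, 62, 50, 57]

t, df = independent_t(class_a, class_b)

# 2.201 is the two-tailed critical value of t for p = 0.05 with 11 degrees
# of freedom, as listed in standard t tables.
CRITICAL_T_DF11 = 2.201
significant = abs(t) > CRITICAL_T_DF11
print(f"t = {t:.2f}, df = {df}, reject null hypothesis: {significant}")
```

If the absolute value of t exceeds the critical value for the relevant degrees of freedom, the null hypothesis is rejected at p < 0.05. In practice a statistics package would report the exact p-value instead of relying on a table lookup.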
⦿ The t-test works well with small sample sizes (fewer than 30), and the groups do not have to be exactly the same size.

How do I see if three or more groups or categories are significantly related?
⦿ This suggests a fairly large data set with more than two variables potentially connected and worthy of statistical test.
⦿ It calls for a test such as one-way ANOVA (analysis of variance), which analyses the variation within and between groups or categories of data using a comparison of means.

How do I assess the strength of a relationship between two variables?
⦿ Correlations between two variables can be visualized using a scatter plot. They need ratio, interval or ordinal data, and cannot be used with nominal data.
⦿ They also require reasonably large data sets.
⦿ Spearman’s rank correlation coefficient (which works with ordinal data) and Pearson’s product moment correlation coefficient (which works with interval and ratio data) are the statistical tests used to arrive at a correlation coefficient based on the data.
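Both coefficients can be sketched with the Python standard library alone. The data and the function names (`pearson`, `ranks`, `spearman`) are hypothetical, introduced only for illustration. Spearman's coefficient is computed here as Pearson's coefficient applied to the ranks of the values, with tied values sharing their average rank.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: hours of revision and exam mark for eight students.
hours = [2, 4, 5, 7, 8, 10, 11, 13]
marks = [41, 50, 48, 60, 58, 71, 69, 80]

print(f"Pearson:  {pearson(hours, marks):.2f}")
print(f"Spearman: {spearman(hours, marks):.2f}")
```

Both functions return a value between -1 and +1; the sign gives the direction of the relationship and the size gives its strength.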
⦿ A coefficient of 0.3 is reasonably weak; 0.7 is reasonably strong.

Presenting the data – tables and charts
Vital information
The table or chart must always have:
• a title;
• information about the units being represented;
• the source of the data, if they were originally produced elsewhere.
Bar charts
⦿ Used with both nominal and discrete data. The bars should be of equal width, with the height of the bars representing the frequency or the amount for each separate category.

Histograms
⦿ A histogram is used for continuous data, whereas a bar chart is used for discrete or nominal data:
• there are no gaps between the bars;
• the data ‘flow’ along the x axis, rather than being separate items.

Scatter plots
⦿ Used to display the extent of a relationship between two variables.
⦿ The closer the points come together, the closer the relationship between the variables. Good at showing patterns and deviations.

Line graphs
⦿ Used for depicting development or progression in a sequence of data. Good for showing trends in data.

Pie charts
⦿ They convey the proportions of each category which make up the total. Mostly, the segments are presented in terms of percentages. Use no more than seven segments, and no segment which accounts for less than 2% of the total.

Validating the data
⦿ The research instrument does not vary in the results it produces on different occasions when it is used.
⦿ Respondents provide the same kind of answers to similar questions and generally answer in a consistent fashion.
⦿ Different researchers would arrive at similar conclusions when looking at the same data.
⦿ The findings will apply to other people and other contexts.
⦿ The analysis is ‘correct’ and ‘works’.

Advantages of quantitative analysis
⦿ Scientific: statistical techniques are based on the principles of mathematics and probability.
⦿ Confidence: statistical tests of significance give researchers additional credibility.
⦿ Measurement: interpretations and findings are based on measured quantities rather than impressions.
⦿ Analysis: large volumes of quantitative data can be analysed relatively quickly.
⦿ Presentation: tables and charts provide a succinct and effective way of organizing the data.

Disadvantages of quantitative analysis
⦿ Quality of data: the data are only as good as the methods used to collect them and the questions that are asked.