Quantitative Research Techniques and Statistics Notes
Cecelia Hof
Descriptive Statistics
1. Definition: Descriptive statistics focuses on organizing, summarizing, and presenting
data to reveal its key features in an informative way. It helps in understanding the
characteristics of a dataset.
2. Graphical Techniques: Graphical methods like histograms or bar graphs visually
represent data distributions, showing patterns such as normal distribution, skewness, or
multimodality.
3. Numerical Techniques: Numerical methods summarize data using measures like the
mean (average), median (midpoint), mode (most frequent value), and range (difference
between highest and lowest values). Variance and standard deviation are also used to
describe data variability.
4. Practical Use: Descriptive statistics are useful for summarizing data, such as estimating
annual profits from an exclusivity agreement by analyzing sample data to understand
overall consumption patterns, which then informs decision-making.
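As a quick illustration of the numerical techniques in point 3, here is a minimal Python sketch using the standard library's statistics module; the sample values are purely hypothetical.

import statistics

data = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]   # hypothetical sample values

print("mean:           ", statistics.mean(data))      # average
print("median:         ", statistics.median(data))    # midpoint of the ordered data
print("mode:           ", statistics.mode(data))      # most frequent value
print("range:          ", max(data) - min(data))      # highest minus lowest value
print("sample variance:", statistics.variance(data))  # average squared deviation from the mean
print("sample std dev: ", statistics.stdev(data))     # square root of the variance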
Inferential Statistics
1. Purpose: Inferential statistics involves using sample data to make inferences or draw
conclusions about a larger population.
2. Sampling: Instead of surveying an entire population, a smaller sample is used to infer
characteristics of the whole. This is more practical and cost-effective.
3. Accuracy and Uncertainty: Predictions based on samples come with some degree of
uncertainty. The accuracy of these predictions is typically expressed as a confidence
level, often between 90% and 99%.
In summary, inferential statistics helps in making educated guesses about large populations
based on sample data, though these inferences come with inherent uncertainty and are subject
to correction as new information becomes available.
Key Concepts
Statistical inference rests on three key concepts: the population, the sample, and statistical inference itself. A
population encompasses all items of interest, such as the diameters of ball bearings, and its
descriptive measure is a parameter. A sample is a subset of the population, and its descriptive
measure is a statistic. Statistical inference uses sample data to estimate, predict, or make
decisions about the larger population. Since examining the entire population is often
impractical, samples are used instead. The reliability of these inferences is quantified by
confidence levels (the proportion of times an estimate is correct) and significance levels (the
likelihood of incorrect conclusions).
Confidence Level + Significance Level = 1
Data Collection and Sampling
Data collection in statistics involves several methods, including direct observation, experiments,
and surveys. Direct observation is inexpensive but may yield limited insights and is prone to
bias. Experiments offer more reliable data but are costlier. Surveys, which can be conducted
through personal interviews, telephone interviews, or self-administered questionnaires, vary in
cost and response accuracy. Key aspects of surveys include ensuring a high response rate and
designing questions clearly to avoid biases.
Sampling methods are used to make inferences about a population based on a smaller sample,
with common techniques including simple random sampling, stratified random sampling, and
cluster sampling. Each method has its advantages and drawbacks in terms of cost, accuracy,
and representativeness.
Sampling errors arise from natural variations between the sample and the population, while
non-sampling errors result from issues like data acquisition mistakes or non-responses. Non-
sampling errors can seriously affect results and are not mitigated by increasing sample size.
Ensuring accurate and representative data collection is crucial to valid statistical analysis.
Sampling Plans
A simple random sample is a sample selected in such a way that every possible sample
with the same number of observations is equally likely to be chosen.
A stratified random sample is obtained by separating the population into mutually
exclusive sets, or strata, and then drawing simple random samples from each stratum.
A cluster sample is a simple random sample of groups or clusters of elements versus a
simple random sample of individual objects.
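The three sampling plans above can be sketched in a few lines of Python. The population, strata, and clusters below are hypothetical, and the standard library's random module stands in for whatever sampling frame would be used in practice.

import random

population = list(range(1, 101))                    # hypothetical population of 100 unit IDs
strata = {"north": population[:50],                 # hypothetical, mutually exclusive strata
          "south": population[50:]}
clusters = [population[i:i + 10] for i in range(0, 100, 10)]   # ten clusters of ten units

# Simple random sample: every possible sample of 10 units is equally likely.
srs = random.sample(population, 10)

# Stratified random sample: a simple random sample drawn from each stratum.
stratified = {name: random.sample(units, 5) for name, units in strata.items()}

# Cluster sample: a simple random sample of whole clusters rather than individual units.
cluster_sample = [unit for cluster in random.sample(clusters, 2) for unit in cluster]

print(srs, stratified, cluster_sample, sep="\n")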
Probability
To understand probability, we first need to define a random experiment, which is an action or
process leading to one of several possible outcomes.
Probabilities are assigned to outcomes using a sample space, which is a list of all possible
outcomes that is both exhaustive (includes all possibilities) and mutually exclusive (no two
outcomes can occur simultaneously).
Assigning Probabilities involves three approaches:
1. Classical Approach: Used for well-defined scenarios like games of chance. If an
experiment has n possible outcomes, each outcome is assigned a probability of 1/n.
2. Relative Frequency: Defines probability based on the long-run frequency of outcomes.
For instance, if an event occurs a certain number of times in a large number of trials, its
probability is estimated as the ratio of the number of occurrences to the total number of
trials. This estimate becomes more accurate with a larger sample size.
3. Subjective Approach: Used when historical data is unavailable or impractical. This
involves assigning probabilities based on personal judgment or belief, such as
predictions based on experience or analysis.
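The relative frequency approach (point 2 above) can be illustrated with a short simulation. This is a sketch assuming a fair six-sided die: it estimates P(six) from increasingly long runs of trials and shows the estimate approaching the classical value of 1/6.

import random

# True (classical) probability of rolling a six with a fair die is 1/6, about 0.1667.
for n_trials in (100, 10_000, 1_000_000):
    sixes = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 6)
    print(f"{n_trials:>9} trials: estimated P(six) = {sixes / n_trials:.4f}")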
Interpreting Probability involves understanding it as the long-term relative frequency of an
event occurring. This approach links probability with statistical inference and real-world
applications.
Sampling Distributions
A sampling distribution is the probability distribution of a given statistic based on a random
sample. It shows how the statistic varies from sample to sample.
1. Distribution of the Sample Mean:
a. If you take multiple samples from a population and calculate the mean of each
sample, the distribution of these sample means is called the sampling
distribution of the sample mean.
b. According to the Central Limit Theorem, if the sample size is sufficiently large,
the sampling distribution of the sample mean will be approximately normally
distributed, regardless of the shape of the population distribution.
2. Standard Error:
a. The standard error measures the dispersion of the sample statistic around the
population parameter. For the sample mean, it is the standard deviation of the
sampling distribution of the sample mean.
3. Central Limit Theorem (CLT):
a. The CLT states that the sampling distribution of the sample mean will tend to be
normal (or approximately normal) if the sample size is large enough, regardless
of the population's distribution.
4. Sampling Distribution of Proportions:
a. When dealing with proportions, the sampling distribution of the sample
proportion (e.g., proportion of successes in a sample) also approaches a normal
distribution as the sample size increases, provided the sample is large enough (a
common rule of thumb is np ≥ 5 and n(1 − p) ≥ 5).
5. Application:
a. Sampling distributions are used to estimate population parameters, construct
confidence intervals, and conduct hypothesis tests.
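A short simulation can make points 1 through 3 concrete. The sketch below uses illustrative parameters: it draws many samples from a clearly non-normal (exponential) population and checks that the sample means are roughly centered on the population mean with spread close to the theoretical standard error σ/√n.

import numpy as np

rng = np.random.default_rng(42)
pop_mean, n, n_samples = 5.0, 40, 10_000    # illustrative: exponential population (mean = std = 5)

# Draw 10,000 samples of size 40 and compute each sample's mean.
sample_means = rng.exponential(scale=pop_mean, size=(n_samples, n)).mean(axis=1)

print("mean of the sample means:", sample_means.mean())        # close to the population mean, 5.0
print("std of the sample means: ", sample_means.std(ddof=1))   # close to the standard error
print("theoretical standard error (sigma / sqrt(n)):", pop_mean / np.sqrt(n))
# A histogram of sample_means would look approximately normal, as the CLT predicts.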
Testing the Population Mean When the Population Standard Deviation Is Known
1. Formulate Hypotheses: State the null hypothesis (H₀) and the alternative hypothesis (H₁).
2. Determine the Significance Level (α): This is the probability of rejecting the null
hypothesis when it is true, commonly set at 0.05, 0.01, or 0.10.
3. Calculate the Test Statistic: When the population standard deviation is known, the test
statistic is z = (x̄ − μ₀) / (σ / √n), where x̄ is the sample mean, μ₀ is the hypothesized
population mean, σ is the known population standard deviation, and n is the sample size.
4. Determine the Rejection Region:
a. For a two-tailed test, the rejection regions are in both tails of the normal
distribution, determined by critical z-values corresponding to the significance
level
b. For a one-tailed test, the rejection region is in one tail (either right or left)
depending on whether the alternative hypothesis specifies greater than or less
than
5. Make a Decision:
a. Rejection Region Method: Compare the test statistic to the critical z-value(s). If
the test statistic falls into the rejection region, reject the null hypothesis.
b. p-Value Method: Calculate the p-value, which is the probability of observing a
test statistic as extreme as, or more extreme than, the one computed. If the p-
value is less than α, reject the null hypothesis.
6. Interpret Results:
a. If you reject the null hypothesis, there is statistical evidence suggesting that the
population mean differs from the specified value. If you do not reject the null
hypothesis, there is insufficient evidence to claim a difference.
This process allows you to determine whether observed sample data provide enough evidence
to make inferences about the population mean.
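A hedged sketch of these steps, using made-up numbers and SciPy's normal distribution for the critical value and p-value:

from math import sqrt
from scipy.stats import norm

# Hypothetical two-tailed test: H0: mu = 100 versus H1: mu != 100, sigma known.
x_bar, mu_0, sigma, n, alpha = 103.0, 100.0, 10.0, 36, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))       # test statistic
z_crit = norm.ppf(1 - alpha / 2)             # two-tailed critical value (about 1.96)
p_value = 2 * (1 - norm.cdf(abs(z)))         # two-tailed p-value

print(f"z = {z:.2f}, critical value = +/-{z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")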
Multiple Comparisons
Multiple comparisons involve testing several hypotheses simultaneously to determine whether
there are significant differences among groups or treatments.
1. Purpose: Identify which specific group means differ after an overall test (e.g., ANOVA)
returns a significant result.
2. Problem: Performing multiple statistical tests increases the risk of Type I errors (false
positives), where you incorrectly conclude that a difference exists when it does not.
3. Techniques:
o Tukey's Honestly Significant Difference (HSD): Compares all pairs of means while
controlling for Type I errors. Suitable for equal sample sizes.
o Bonferroni Correction: Adjusts the significance level by dividing it by the number
of comparisons. It is conservative and reduces Type I errors but may increase
Type II errors (false negatives).
o Scheffé’s Test: Flexible and can handle unequal sample sizes. It is less powerful
but controls Type I errors in a broader range of comparisons.
o Dunnett’s Test: Compares each group to a control group, controlling Type I
errors when comparing multiple treatments to a single control.
4. Decision: Choose an appropriate method based on the nature of the comparisons and
the balance between controlling Type I and Type II errors.
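As one concrete example, the Bonferroni correction described above can be applied by hand: run the pairwise t-tests and compare each p-value to α divided by the number of comparisons. The groups and values below are hypothetical.

from itertools import combinations
from scipy.stats import ttest_ind

groups = {                                   # hypothetical data for three groups
    "A": [23, 25, 21, 26, 24],
    "B": [30, 29, 31, 28, 32],
    "C": [24, 26, 25, 23, 27],
}
alpha = 0.05
adjusted_alpha = alpha / 3                   # Bonferroni: divide alpha by the number of comparisons

for (name1, x), (name2, y) in combinations(groups.items(), 2):
    t_stat, p = ttest_ind(x, y)              # ordinary two-sample t-test
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(f"{name1} vs {name2}: p = {p:.4f} -> {verdict} at alpha = {adjusted_alpha:.4f}")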
Randomized Block vs. Two-Factor Experiments
In general, the difference between the two experimental designs is that, in the randomized
block experiment, blocking is performed specifically to reduce variation, whereas in the two-
factor model, the effect of the factors on the response variable is of interest to the statistics
practitioner. The criteria that define the blocks are always characteristics of the experimental
units. Consequently, factors that are characteristics of the experimental units will be treated
not as factors in a multifactor study, but as blocks in a randomized block experiment.
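A minimal sketch of a randomized block analysis, assuming hypothetical data and column names and using statsmodels' formula interface: the block term is included only to absorb variation among experimental units, and the F-test of interest is the one for the treatment. In a two-factor study, the same kind of model would be fit, but both factors (and possibly their interaction) would be of substantive interest.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical complete block design: 3 treatments, each observed once in each of 4 blocks.
df = pd.DataFrame({
    "response":  [20, 22, 19, 25, 24, 26, 23, 27, 30, 31, 29, 24],
    "treatment": ["T1", "T1", "T1", "T1", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3"],
    "block":     ["B1", "B2", "B3", "B4", "B1", "B2", "B3", "B4", "B1", "B2", "B3", "B4"],
})

# The block term removes unit-to-unit variation; the treatment F-test is the one of interest.
model = smf.ols("response ~ C(treatment) + C(block)", data=df).fit()
print(anova_lm(model))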
Google uses "people analytics" to inform its talent management strategies, employing data to
enhance various aspects of its workforce dynamics. This approach includes relational analytics,
which examines how employee interactions impact overall performance. By analyzing what’s
termed "digital exhaust," which encompasses data from emails, chats, and collaboration tools,
Google gains insights into the underlying social networks that contribute to its success.
Relational analytics centers on six key elements: ideation, influence, efficiency, innovation,
silos, and vulnerability. By
assessing these factors, Google can identify employees who play critical roles in achieving
company objectives. Initiatives like Project Oxygen exemplify this data-driven strategy, as they
identify the traits of effective managers and share these insights to enhance leadership
development across teams.
I believe that leveraging data in this manner is ethical, especially when employees are kept
informed about how their data is utilized. Google’s focus on fostering a positive work
environment while pursuing organizational goals reflects a commitment to employee welfare.
It’s essential for companies to establish clear policies regarding data collection and usage,
ensuring that employees feel comfortable and respected throughout the process. Moreover,
management must remember that data represents real individuals, not just numbers. By
maintaining a personal connection while employing analytics, organizations can create a more
engaged and productive workforce. As long as employees are aware of and consent to the use
of their data, relational analytics can serve as a valuable tool for enhancing both employee
experience and overall performance. In summary, as companies like Google continue to
advance their talent management practices through data, prioritizing ethical considerations and
a human-centered approach is crucial for long-term success.
In evaluating whether Amazon should be "broken up" under antitrust laws, it’s essential to
consider the ethical implications of its business practices, particularly regarding competition.
Reports indicating that Amazon employees have used proprietary data from independent
sellers to develop competing products raise serious concerns about fairness and transparency.
This behavior not only undermines trust among third-party sellers but also threatens the
competitive landscape of the marketplace. If Amazon's actions are shown to significantly stifle
competition and harm independent brands, it may warrant regulatory scrutiny akin to the
enforcement of the Sherman Antitrust Act under President Theodore Roosevelt.
To draw the line for intervention, we should assess factors such as Amazon's market power, the
impact of its practices on consumer prices and innovation, and whether these practices mirror
historical monopolistic behaviors. Ultimately, fostering open discussions around these ethical
issues in our data analytics framework can help illuminate the broader implications of such
business practices and guide potential regulatory approaches, ensuring a fair and competitive
marketplace for all stakeholders.