
Topic 1 Introduction To Statistics

Uploaded by

aiskrimsoda483

Topic 1

Introduction to Statistics
What is statistics?

• Statistics are numerical facts or figures. As a discipline, statistics deals with:
a) Collecting
b) Classifying
c) Analysing
d) Presenting
data in order to get a better understanding of some given situation.
• This may mean summarizing the data in tabulated, graphical or numerical
form.
• Its uses range from gaining insight into a virtually unknown situation to a
sophisticated analysis designed to produce numerical confirmation, or
rejection, of some widely held belief.
Types of Statistics

In maritime business, we may be interested in the following types of statistics
based on a set of data:

• Descriptive analysis: describes data both numerically and graphically in
the most appropriate manner.
• Inferential statistics: a branch of statistics that involves using data
from a sample to make inferences or draw conclusions about a larger
population.
• Forecasting: uses data up to the present time to estimate the value of
the quantity in the future.
Descriptive Analysis

Descriptive analysis is a branch of statistics that involves summarizing and
describing the basic features of a data set. Here are some examples of
descriptive analysis:
• Measures of Central Tendency: Descriptive analysis can involve calculating
measures of central tendency, such as mean, median, and mode, to
describe the typical or central value of a data set.
• Measures of Variability: Descriptive analysis can also involve calculating
measures of variability, such as range, variance, and standard deviation, to
describe the spread or dispersion of a data set.
• Frequency Distribution: Descriptive analysis can involve creating a
frequency distribution, which shows how often each value or category
appears in a data set.
• Cross-Tabulation: Descriptive analysis can involve cross-tabulating or
creating contingency tables, which summarize the frequency distribution of
two or more categorical variables.
• Graphical Analysis: Descriptive analysis can also involve creating graphical
representations of data, such as histograms, box plots, scatterplots, and
bar charts, to visually display the distribution, central tendency, and
variability of the data.
• These are just a few examples of descriptive analysis techniques.
Descriptive analysis is useful for summarizing and understanding the basic
characteristics of a data set, which can provide insights for further analysis
and decision-making.
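The measures listed above can be sketched with Python's standard library. The berth-time figures are hypothetical and exist only to illustrate the calculations; the variance and standard deviation shown are the sample versions.

```python
import statistics
from collections import Counter

# Hypothetical data: days each of ten vessels spent at berth.
days_at_berth = [2, 3, 3, 4, 2, 5, 3, 6, 4, 3]

# Measures of central tendency
mean = statistics.mean(days_at_berth)
median = statistics.median(days_at_berth)
mode = statistics.mode(days_at_berth)

# Measures of variability (sample variance and sample standard deviation)
data_range = max(days_at_berth) - min(days_at_berth)
variance = statistics.variance(days_at_berth)
std_dev = statistics.stdev(days_at_berth)

# Frequency distribution: how often each value appears
frequency = Counter(days_at_berth)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
for value, count in sorted(frequency.items()):
    print(f"{value} days: {count} vessel(s)")
```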
Inferential Statistics
• Inferential statistics is a branch of statistics that involves using data from a
sample to make inferences or draw conclusions about a larger population. It
involves analyzing and interpreting data in a way that allows researchers to make
predictions or draw conclusions that can be generalized beyond the specific
sample that was studied.
• Inferential statistics uses a variety of techniques to estimate population
parameters (such as means, variances, and proportions) based on sample data.
These techniques include hypothesis testing, confidence intervals, and regression
analysis.
• The goal of inferential statistics is to make accurate and reliable predictions about
a population based on a smaller sample of data. However, it is important to note
that there are certain assumptions and limitations to inferential statistics, and care
must be taken to ensure that the sample data is representative of the population
being studied.
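As an illustration of one technique named above, the following sketch computes a confidence interval for a population mean from a sample. The turnaround times are hypothetical, and the interval uses the normal critical value as a simplifying assumption; for a sample this small, a t-distribution critical value would give a somewhat wider interval.

```python
import math
import statistics

# Hypothetical sample: turnaround time (hours) for 12 port calls.
sample = [18.5, 22.0, 19.5, 24.0, 21.0, 20.5, 23.5, 19.0, 22.5, 21.5, 20.0, 23.0]

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: the normal critical value is used instead of the
    t critical value, which is only a rough guide for small samples.
    """
    n = len(data)
    sample_mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

lower, upper = mean_confidence_interval(sample)
print(f"95% CI for the mean turnaround time: {lower:.2f} to {upper:.2f} hours")
```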
Forecasting Analysis

• Forecasting analysis is a branch of statistical analysis that involves using
historical data to predict future trends or outcomes. It involves analyzing
past patterns, trends, and relationships within the data to create a model
that can be used to make predictions about future events or behaviors.
• There are various techniques used in forecasting analysis, including time
series analysis, regression analysis, and machine learning algorithms. Time
series analysis involves analyzing the historical patterns of a variable over
time to identify trends, cycles, seasonality, and other patterns. Regression
analysis involves identifying relationships between variables and using those
relationships to make predictions about future outcomes. Machine learning
algorithms use historical data to train models that can then be used to make
predictions about future events.
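A minimal sketch of regression-based forecasting is a straight-line trend fitted by least squares and extended one period ahead. The throughput figures are hypothetical, and a linear trend is only one of many possible models; real series often need seasonality or other terms.

```python
# Hypothetical yearly container throughput (thousand TEU).
years = [2019, 2020, 2021, 2022, 2023]
throughput = [410, 430, 450, 470, 490]

def fit_linear_trend(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_linear_trend(years, throughput)

# Use the fitted trend on past data to estimate the next year's value.
forecast_2024 = slope * 2024 + intercept
print(f"Trend: {slope:.1f} thousand TEU per year; 2024 forecast: {forecast_2024:.0f}")
```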
Types of Data & Scale of Measurement

There are four types of data, which are:

• Nominal Data: Nominal data is a type of categorical data in which data
values represent discrete categories with no inherent order or ranking.
Examples of nominal data include gender, nationality, and favorite color.

• Ordinal Data: Ordinal data is a type of categorical data in which data values
represent discrete categories with a specific order or ranking. Examples of
ordinal data include education level (such as elementary, high school, and
college) and customer satisfaction ratings (such as "very satisfied",
"satisfied", "neutral", "dissatisfied", and "very dissatisfied").
• Interval Data: Interval data is a type of numerical data in which data values
have equal intervals between them but no true zero point. Examples of
interval data include temperature in degrees Celsius, IQ scores, and year of
birth.

• Ratio Data: Ratio data is a type of numerical data in which data values
have equal intervals between them and a true zero point, so ratios between
values are meaningful. Examples of ratio data include height, weight, and
income.

It is important to recognize the type of data when analyzing and interpreting
data because different types of data require different statistical analysis
techniques. For example, nominal and ordinal data often require frequency
distribution tables and chi-square tests, while interval and ratio data are
analyzed using measures of central tendency, variability, correlation, and
regression analysis.
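One way to see why the scale matters is to pick a "typical value" that is meaningful for each scale. The function below is a hypothetical helper written for illustration, not a standard library routine: it uses the most frequent category for nominal data, the middle category for ordinal data, and the mean for interval or ratio data.

```python
import statistics

def typical_value(values, scale, order=None):
    """Return a central value that is meaningful for the given scale.

    `order` lists the ordinal categories from lowest to highest.
    """
    if scale == "nominal":
        # Categories cannot be ranked, so only the most frequent one is meaningful.
        return statistics.mode(values)
    if scale == "ordinal":
        # Categories can be ranked, so the middle category can be found,
        # but arithmetic on the labels is not possible.
        ranked = sorted(values, key=order.index)
        return ranked[(len(ranked) - 1) // 2]
    if scale in ("interval", "ratio"):
        # Equal intervals make the arithmetic mean meaningful.
        return statistics.mean(values)
    raise ValueError(f"unknown scale: {scale}")

satisfaction_order = ["very dissatisfied", "dissatisfied", "neutral",
                      "satisfied", "very satisfied"]
ratings = ["satisfied", "neutral", "very satisfied", "satisfied", "dissatisfied"]

print(typical_value(["MY", "ID", "MY", "PH"], "nominal"))
print(typical_value(ratings, "ordinal", order=satisfaction_order))
print(typical_value([10, 20, 30], "ratio"))
```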
Qualitative versus Quantitative Data

Qualitative and quantitative data are two types of data used in research and
analysis.

• Qualitative data is non-numerical data that is descriptive and subjective. It
involves collecting and analyzing data through observation, interviews, focus
groups, and other forms of non-numeric data collection. Qualitative data
provides insights into the opinions, attitudes, behaviors, and perceptions of
individuals and groups. It often involves analyzing text, images, or other
forms of non-numerical data to identify patterns, themes, and relationships.
Examples of qualitative data include interview transcripts, open-ended survey
responses, and observational field notes.
• Quantitative data, on the other hand, is numerical data that is objective and
measurable. It involves collecting and analyzing data through structured
surveys, experiments, and other forms of numeric data collection.
Quantitative data provides insights into the relationships between variables,
and allows for statistical analysis and hypothesis testing. Examples of
quantitative data include age, height, weight, test scores, and numerical
ratings.
• One important difference between qualitative and quantitative data is that
qualitative data tends to be exploratory and used to generate hypotheses,
while quantitative data tends to be confirmatory and used to test hypotheses.
Qualitative data is often used in social sciences, such as sociology and
management, while quantitative data is often used in sciences such as
physics, chemistry, and economics.
• Both qualitative and quantitative data have their own strengths and
weaknesses, and the choice between them depends on the research question
and the goals of the study. In some cases, researchers may use both types
of data to gain a more comprehensive understanding of a phenomenon.
Discrete and Continuous Data

Discrete and continuous data are two types of numerical data used in
statistics.

• Discrete data is numerical data that consists of distinct, separate values,
typically counts. The values of discrete data cannot be divided or broken
down into smaller units. Examples of discrete data include the number of
staff in a company, the number of customers who purchased a particular
product, and the number of vessels in a berth area.
• Continuous data, on the other hand, is numerical data that can take on any
value within a range or interval. Continuous data is usually measured using
decimal points or fractions. Examples of continuous data include
temperature, height, weight, and time. In contrast to discrete data,
continuous data can be divided into smaller and smaller units.
• The distinction between discrete and continuous data is important because
different statistical methods are used to analyze each type of data. Discrete
data is usually analyzed using frequency tables, graphs, and measures of
central tendency, such as the mean and mode. Continuous data is analyzed
using histograms, density plots, and measures of variability, such as the
range and standard deviation.
• It is important to determine whether data is discrete or continuous before
analyzing it, as using the wrong statistical method can lead to incorrect
conclusions.
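The difference shows up in how the data are tabulated. In this sketch, with hypothetical figures, discrete counts are tallied value by value, while continuous measurements are first grouped into intervals (bins), as a histogram would do.

```python
from collections import Counter

# Hypothetical data
vessels_at_berth = [3, 5, 4, 3, 6, 5, 3]               # discrete: whole counts
waiting_hours = [1.2, 3.8, 2.5, 0.7, 4.1, 2.9, 3.3]    # continuous: measured

# Discrete data: a frequency table of the exact values.
frequency_table = Counter(vessels_at_berth)

def bin_counts(values, bin_width):
    """Count continuous values per interval [lower, lower + bin_width)."""
    counts = Counter()
    for value in values:
        lower = int(value // bin_width) * bin_width
        counts[(lower, lower + bin_width)] += 1
    return dict(sorted(counts.items()))

# Continuous data: counts per one-hour interval.
histogram = bin_counts(waiting_hours, 1)

print(dict(sorted(frequency_table.items())))
print(histogram)
```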
Population and Sampling

• In statistics, a population is the entire group of individuals, objects, events,
or other entities that are of interest to a researcher or analyst. A population
is often defined by certain characteristics or criteria, such as age, positions,
geographic location, or other relevant factors.

• For example, the population for a study on the effectiveness of a new
nation-flagged regulation might be defined as the seafarers working on board
whose opinions are sought. In another example, the population for a survey
on public opinion about a trade issue might be defined as all importers and
exporters in a particular country.
• It is important to define the population clearly and precisely in order to draw
accurate conclusions about the characteristics or behaviors of the
population. However, in many cases, it is impractical or impossible to study
the entire population, and so a sample is used instead.
• A sample is a subset of the population that is selected for study or analysis.
Sampling techniques are used to ensure that the sample is representative of
the population and that the results obtained from the sample can be
generalized to the population as a whole.
Types of Sampling
There are several types of sampling methods, including:
• Simple Random Sampling: A method where each member of the population
has an equal chance of being selected.
• Systematic Sampling: A method where members of the population are
selected at regular intervals from an ordered list, after a randomly chosen
starting point.
• Stratified Sampling: A method where the population is divided into
subgroups or strata, and members are selected from each stratum in
proportion to its size.
• Cluster Sampling: A method where the population is divided into clusters,
and a random sample of clusters is selected for analysis.
(Figure: stratified versus cluster sampling)
• Convenience Sampling: A method where members of the population are
selected based on their easy accessibility or availability.
• Snowball Sampling: A method where participants are asked to refer other
potential participants, and the sample grows through referrals.
• Quota Sampling: A method where participants are selected based on
specific characteristics or quotas, such as age, gender, or income level.

• Each sampling method has its advantages and disadvantages, and the
choice of sampling method will depend on the research question, the
characteristics of the population, and other factors.
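The four probability-based methods above can be sketched as follows. The population of 100 officers and their departments is hypothetical, all function names are illustrative, and the stratified version assumes the proportional shares round to whole numbers that add up to the requested sample size.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random starting point, then every k-th member of the ordered list."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def group_by(population, key):
    groups = defaultdict(list)
    for member in population:
        groups[key(member)].append(member)
    return groups

def stratified_sample(population, stratum_of, n, rng):
    """Random selection within each stratum, in proportion to its size."""
    sample = []
    for members in group_by(population, stratum_of).values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def department_of(officer):
    return officer[1]

# Hypothetical population: 100 officers, each tagged with a department.
departments = ["deck"] * 60 + ["engine"] * 30 + ["catering"] * 10
officers = list(enumerate(departments, start=1))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
random_sample = simple_random_sample(officers, 10, rng)
systematic = systematic_sample(officers, 10, rng)
stratified = stratified_sample(officers, department_of, 10, rng)
clustered = cluster_sample(officers, department_of, 2, rng)
```

Note the contrast between the last two: the stratified sample contains some officers from every department, while the cluster sample contains all officers from only the chosen departments.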
Example
• Kirsty would like to survey what people enjoy eating for lunch in
the canteen. She lists all the members in the population and
gives them a unique reference number. After selecting the first
member at random, she then selects the remainder of the
sample over a specified interval. What type of sampling method
is Kirsty using?

= Systematic sampling
• Peter is carrying out a survey. He would like to find out how
many miles people drive per day. He divides the population into
a two-way table, and calculates the proportion of each category
required for his sample. Which sampling technique is Peter
using?

= Stratified sampling
• Laura has a sampling frame of all the people who are members
of a club. She gives each member a unique reference number
and uses a random number generator to select the sample.
Which sampling method is Laura using?

= Simple random sampling
Krejcie and Morgan, 1970
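Krejcie and Morgan (1970) give a formula for the sample size needed from a finite population of known size. The sketch below assumes the commonly quoted defaults: a chi-square value of 3.841 (one degree of freedom, 95% confidence), a population proportion of 0.5, and a 5% margin of error. Rounding up is a choice made here, so individual results may differ slightly from a published table.

```python
import math

def krejcie_morgan_sample_size(population_size, chi_square=3.841,
                               proportion=0.5, margin=0.05):
    """Required sample size for a finite population of known size.

    s = X^2 * N * P * (1 - P) / (d^2 * (N - 1) + X^2 * P * (1 - P))
    """
    variability = chi_square * proportion * (1 - proportion)
    numerator = variability * population_size
    denominator = margin ** 2 * (population_size - 1) + variability
    return math.ceil(numerator / denominator)

for population_size in (100, 1000, 10000):
    needed = krejcie_morgan_sample_size(population_size)
    print(f"Population {population_size}: sample of {needed}")
```

The required sample grows much more slowly than the population, which is why a few hundred respondents can be enough for populations in the thousands.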
G*Power

• G*Power is statistical power analysis software that allows researchers to
estimate the sample size required for their research study or to calculate
statistical power based on the sample size, effect size, and significance
level. It is designed to help researchers and students in planning,
conducting, and analyzing various types of statistical tests.

• With G*Power, researchers can calculate statistical parameters such as
sample size, effect size, statistical power, and alpha level. It also offers a
range of statistical tests and models, including post hoc tests, correlation
analysis, and factorial designs.
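The relationship between sample size, effect size, significance level, and power can be illustrated with a textbook normal approximation for comparing two group means. This is a sketch for intuition only and is not how G*Power computes its results; exact calculations for a t-test typically give a slightly larger sample size. The function name and defaults are assumptions made for this example.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group mean comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2,
    where d is the standardized effect size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A smaller effect needs a larger sample for the same power.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {sample_size_per_group(d)} per group")
```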
Summary

Scale of Measurement    Non-numeric data                    Numeric data
Nominal                 Name or label only                  Numbers only identify groups which cannot be ordered
Ordinal                 Name or label only; can be ranked   These numbers allow ranking but no arithmetic
Interval                Always numeric                      Intervals between numbers are meaningful
Ratio                   Always numeric                      Intervals between numbers are meaningful, and also their ratios, as the lowest value is a meaningful zero
Example:
You are doing a study to determine the number of years of education each
officer at a shipping company has. Identify the suitable sampling technique.

• Can be a cluster sample because each department is a naturally occurring
subdivision.
or
• A convenience sample because you are using the officers that are readily
available to you.
or
• A stratified sample because the officers are divided by department and
some from each department are randomly selected.
