TRENDWorkshop

www.toolkit.net.au
Detecting trends in
environmental time series data
Francis Chiew
The University of Melbourne
Brisbane 6 July Sydney 20 July

Workshop Objectives
• To discuss reasons for detecting trend/change in
environmental time series data.
• To show some visual tools for exploratory data analysis
(EDA).
• To explain basic concepts in formal statistical testing of
trend/change in time series data.
• To show some examples of the statistical methods.
• To provide hands-on experience on the use of TREND, a
model product in the Modelling Toolkit.
The methods presented in this workshop can be applied to
any time series data, but the focus of the workshop is mainly
on annual streamflow, rainfall and other hydrologic data.
TREND
• TREND is a model product in the Modelling Toolkit
(www.toolkit.net.au/trend).
• TREND is designed to facilitate statistical testing for
trend, change and randomness in hydrological and other
time series data.
• TREND has 12 statistical tests, based on the
WMO/UNESCO Expert Workshop on Trend/Change
Detection and the CRC for Catchment Hydrology publication
Hydrological Recipes.
Workshop Program
9:00 – 9:15 Welcome and Introduction
9:15 – 10:00 Presentation on trend detection, exploratory data
analysis and formal statistical testing (basic
concepts, and types of tests in TREND)
10:00 – 10:30 Demonstration of statistical tests (in Excel)
10:30 – 10:50 Morning tea
10:50 – 12:30 Demonstration of TREND
12:30 – 1:15 Lunch
1:15 – 3:00 Hands-on experience with TREND
3:00 – 3:20 Afternoon tea
3:20 – 5:00 More statistical tests and TREND

Why detect trend/change in
environmental time series data?
Most water resources systems have been designed and operated
based on the assumption of stationary hydrology. If this
assumption of stationarity is not valid, current systems may be
under- or over-designed.
Trend/change in environmental time series data can be caused by
• climate change as a result of increased greenhouse gas
concentrations
• land use change (urbanisation, clearing, afforestation, etc…)
• change in management practice
• etc …
Climate change and climate variability
Climate change defines the difference between long-term mean
values of a climate parameter or statistic, where the mean is taken
over a specified interval of time, usually several decades. Climate
change can occur because of internal changes in the climate system
or changes in external forcing, either for natural reasons or because
of human activities. It is difficult to make clear attribution between
these causes.
Climate variability can be regarded as the variability (the extremes
and differences of monthly, seasonal and annual values from the
climatically expected value) inherent in the stationary process
approximating the climate on a scale of a few decades. The
inter-annual hydroclimate can vary considerably, which makes it
difficult to detect a statistically significant change in the climate.
Climate change signal in streamflow data?
[Figure: time series plot of natural annual inflows into Hume Weir; annual flow (0 to 14,000,000) against year (1891 to 1991)]
• Clear jump in mean (climate change?)

• Multi-jumps in mean (inter-decadal variability?)
Hydroclimate is always changing
over various time scales
[Figure: time series plot of annual inflows into O’Shannassy Reservoir; annual inflow (GL, 0 to 200) against year (1910 to 2010), annotated "Interdecadal variability" and "Climate change?"]
• Seasonal
• Inter-annual
• ENSO (3-7 years)
• Interdecadal (20-30 years)
• “Climate change”
Exploratory data analysis (EDA)
• Exploratory data analysis (EDA) involves using graphs to explore,
understand and present data. EDA is an iterative process where
graphs are plotted and refined so that important features of the
data can be seen clearly.
• EDA is an essential component of any statistical analysis. EDA
allows much greater appreciation of the features in the data than
summary statistics or statistical significance levels. This is because
the human brain and visual system are very powerful at identifying
and interpreting patterns. A well conducted EDA may eliminate the
need for a formal statistical analysis.
• A statistical analysis is incomplete without EDA. Statistical test
results can be meaningless without a proper understanding of the
data. For example, EDA can identify, amongst other things, outliers
due to poor data quality, strong data dependence/autocorrelation,
and change in station location or recording technique.
Time series plot
• The time series plot is the most useful visual tool for analysing
trend/change.
• The variable of interest is plotted against time as a scatter or line
plot.
• A trend line can be fitted to the data. Trend lines include moving
average, linear regression, quadratic regression, and nonparametric
smoothing (e.g., lowess smooth).
• An Excel spreadsheet can be used to perform most of the plotting
and visual trend analysis (see examples in Excel spreadsheet).
• Multiple time series plots can be used when data from several sites
or variables are available (plot the same type of data on the same
scale). It can be informative to present data from several sites
within a region (or different variables for a location) on a single
page. For example, a trend/change is more conclusive if it is
observed in data from several locations.
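The two simplest trend lines named above, a moving average and a linear regression, can be sketched in a few lines of Python. This is only an illustration: the function names and the flow values are made up for the example and are not taken from the workshop spreadsheet or from TREND.

```python
from statistics import mean

def moving_average(values, window):
    """Trailing moving average: one smoothed value per full window."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

def linear_trend(times, values):
    """Ordinary least-squares slope and intercept of values against times."""
    t_bar, v_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    return slope, v_bar - slope * t_bar

# Hypothetical annual flows (illustrative numbers only)
years = list(range(2000, 2010))
flows = [120, 135, 110, 128, 105, 118, 98, 112, 95, 101]

smoothed = moving_average(flows, window=3)      # 3-point moving average
slope, intercept = linear_trend(years, flows)   # change in flow per year
```

Plotting `smoothed` and the fitted line over the raw series gives the kind of visual trend check described in the bullets; the slope here only describes the fitted line and says nothing yet about statistical significance.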
Example time series plots
[Figure: annual River Nile flows at Aswan in cumecs (1000 to 5000), 1860 to 1950, shown in three panels: scatter plot; line plot with data points; line plot without data points]
Example trend fits to time series data
[Figure: four panels on the same axes (1000 to 5000; 1860 to 1950): 10-point moving average; lowess smooth; linear regression; polynomial regression]

Some commercial software for visual data
analysis and time series data analysis
• Excel
• Mathematica
• MATLAB
• MINITAB
• SAS
• SPlus
• SPSS
• Systat
Statistical methods for detecting
trend/change in time series data
Change in a time series can occur steadily (a trend), abruptly (a step
change) or in a more complex form. It may affect the mean, median,
variance or other aspect of the data.
TREND has various statistical methods for detecting trend, step
change, differences in means/medians between two data periods and
randomness in hydrological time series data.
The statistical methods in TREND are based on the WMO/UNESCO
WCP Expert Workshop on “Detecting Trend and Other Changes in
Hydrological Data” and the CRCCH “Hydrological Recipes”.
Kundzewicz, Z.W. and Robson, A. (Editors) (2000) Detecting Trend and Other Changes in
Hydrological Data. World Climate Program – Water, WMO/UNESCO, WCDMP-45,
WMO/TD 1013, Geneva, 157 pp.
Grayson, R.B., Argent, R.M., Nathan, R.J., McMahon, T.A. and Mein, R. (1996) Hydrological
Recipes: Estimation Techniques in Australian Hydrology. Cooperative Research Centre for
Catchment Hydrology, Australia, 125 pp.
Basic concepts of statistical testing
Hypotheses
The starting point of a statistical test is to define a null hypothesis (H0) and an
alternative hypothesis (H1). For example, to test for trend in the mean of a
time series, H0 would be that there is no change in the mean of the data, and H1
would be that the mean is either increasing or decreasing with time.
Test statistic
The test statistic is a means of comparing H0 and H1. It is a numerical value
calculated from the data series that is being tested.
Significance level
The significance level is a means of measuring whether the test statistic is very
different from values that would typically occur under H0.
Power and errors
There are two possible types of errors. Type I error is when H0 is incorrectly
rejected. Type II error is when H0 is accepted when H1 is true. A test with low
Type II error is said to be powerful.
Significance level
• The significance level (α) is a means of measuring whether the test statistic is very
different from values that would typically occur under H0.
• Specifically, the significance level is the probability of a test statistic value as extreme
as, or more extreme than, the observed value assuming no trend/change (H0). For
example, for α = 0.05, the critical test statistic value is the value that would be
exceeded by 5% of test statistic values obtained from randomly generated data. If the
test statistic value is greater than the critical test statistic value, H0 is rejected.
• The significance level is therefore the probability that a test detects a trend/change
(rejects H0) when none is present (Type I error).
• A possible interpretation of the significance level might be:
α > 0.10           little evidence against H0
0.05 < α < 0.10    possible evidence against H0
0.01 < α < 0.05    strong evidence against H0
α < 0.01           very strong evidence against H0
• For most traditional statistical methods, critical test statistic values for various
significance levels can be looked up in statistical tables or calculated from simple
formulas, provided the test assumptions are satisfied. Where test assumptions are
violated, resampling methods can be used to estimate the significance level of a test
statistic.
• For detecting trend/change, the critical test statistic value at α/2 is used (two-sided
tail). For detecting an increase (or decrease), the critical test statistic value at α is
used (one-sided tail).
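As a small worked example of the two-sided versus one-sided rule, the sketch below computes critical values for a test statistic that is assumed to follow a standard normal distribution under H0. That distributional assumption is made only for this illustration; each test has its own reference distribution.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: assumed distribution of the statistic under H0

# Two-sided test (any trend/change): alpha/2 in each tail
two_sided_critical = z.inv_cdf(1 - alpha / 2)

# One-sided test (increase only, or decrease only): all of alpha in one tail
one_sided_critical = z.inv_cdf(1 - alpha)
```

At the same significance level the two-sided critical value is the larger of the two (about 1.96 against about 1.645), so a stronger signal is needed to reject H0 when the direction of change is not specified in advance.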
Resampling to estimate significance level
• Resampling is a robust method for estimating the significance level of a
test statistic. It is particularly useful when the test assumptions are
violated.
• In resampling, the original time series data are resampled to provide many
replicates of time series data of equal length as the original data. The
time series data for each replicate are obtained by randomly selecting
a data value from any year in the original time series, repeatedly, until a
time series of equal length as the original data is constructed. In TREND,
the data are resampled with replacement (bootstrapping method), i.e., the
replicate series may contain more than one of some values in the original
series and none of other values.
• The test statistic value of the original time series data is then compared
with the test statistic values of the generated data (replicates) to
estimate the significance level. For example, if the test statistic value of
the original data equals the 950th value when the test statistic values from
1000 replicates are ranked in ascending order, H0 is rejected at α = 0.05
(i.e., a trend/change is detected, with a 5% probability that this
trend/change is incorrectly detected).
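The resampling steps above can be sketched as follows. This is a minimal illustration, not the TREND implementation: the function names, the choice of test statistic (difference between the means of the second and first halves of the record) and the use of absolute values for a two-sided comparison are all assumptions made for the example.

```python
import random
from statistics import mean

def half_mean_difference(values):
    """Illustrative test statistic: mean of second half minus mean of first half."""
    mid = len(values) // 2
    return mean(values[mid:]) - mean(values[:mid])

def resampled_significance(values, statistic, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates whose |statistic| is at least the observed |statistic|."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = abs(statistic(values))
    n = len(values)
    exceed = 0
    for _ in range(n_replicates):
        # Draw n values at random from any year, with replacement
        replicate = [rng.choice(values) for _ in range(n)]
        if abs(statistic(replicate)) >= observed:
            exceed += 1
    return exceed / n_replicates

# Hypothetical series with an obvious step change half way through
series = [0] * 15 + [10] * 15
p = resampled_significance(series, half_mean_difference)
```

A small fraction `p` means the observed statistic lies in the extreme tail of the replicate values, which corresponds to the ranking comparison described in the last bullet.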
Parametric and non-parametric tests
• Most statistical tests assume that the time series data are
independent and identically distributed.
• Parametric tests also assume that the time series data and the
errors (deviations from the trend) follow a particular distribution.
Most parametric tests assume that the data are normally
distributed. Parametric tests are useful as they also quantify the
change in the data (e.g., change in mean or gradient of trend).
Parametric tests are generally more powerful than non-parametric
tests.
• Non-parametric tests are generally distribution-free. They detect
trend/change, but do not quantify the size of the trend/change.
They are very useful because most hydrologic time series data are
not normally distributed.
Statistical tests in TREND
Tests for trend

• Mann-Kendall (non-parametric)
• Spearman’s Rho (non-parametric)
• Linear Regression (parametric)
Tests for step change in mean/median

• Distribution Free CUSUM (non-parametric)
• Cumulative Deviation (parametric)
• Worsley Likelihood Ratio (parametric)
Tests for difference in mean/median in two different data periods

• Rank-Sum (non-parametric)
• Student’s t-test (parametric)
Tests for randomness

• Median Crossing (non-parametric)
• Turning Points (non-parametric)
• Rank Difference (non-parametric)
• Autocorrelation (parametric)
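To give a feel for how one of these tests works, here is a minimal sketch of the Mann-Kendall test for trend. It uses the standard textbook form with a normal approximation and a continuity correction, and it assumes no tied values and a reasonably long series; the exact formulation used inside TREND may differ, and the function name is chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist

def mann_kendall(values):
    """Return (S, z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(values)
    # S counts later-value-greater pairs minus later-value-smaller pairs
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S with no ties
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p_value

# Hypothetical steadily increasing series (illustrative only)
s, z, p_value = mann_kendall(list(range(1, 21)))
```

Because the statistic depends only on the signs of pairwise differences, it is unaffected by the shape of the data distribution, which is why the test is listed as non-parametric.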
Cautionary Words
• Must have good data and must understand the data (via exploratory
data analysis).
• Must understand the statistical test and its assumptions.
• A statistical test provides evidence, not proof.
• Significance is not the same as importance (e.g., a change may be
detected, but the size of the change may be so small that it is of
no importance).
• If H0 is rejected, the reason for the trend/change must be
investigated.
