TRENDWorkshop
Detecting trends in
environmental time series data
Francis Chiew
The University of Melbourne
[Figure: time series of annual flow, 1891 to 1991]
[Figure: time series of annual inflow (GL), 1910 to 2010, annotated "Interdecadal variability" and "Climate change?"]
• A trend line can be fitted to the data. Options include a moving
average, linear regression, quadratic regression, and nonparametric
smoothing (e.g., a lowess smooth).
• Multiple time series plots can be used when data from several sites
or for several variables are available (plot the same type of data on
the same scale). Presenting data from several sites within a region (or
different variables for a location) on a single page can be informative.
For example, a trend/change is more conclusive if it is observed in data
from several locations.
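Two of the trend lines mentioned above, a moving average and a linear regression line, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the sample series are hypothetical and are not taken from any of the data sets plotted here.

```python
from statistics import mean

def moving_average(values, window):
    """Centred moving average for an odd window length.

    Returns None at positions where the full window does not fit.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            smoothed.append(mean(values[i - half:i + half + 1]))
    return smoothed

def linear_trend(times, values):
    """Ordinary least-squares line; returns (slope, intercept)."""
    t_bar, y_bar = mean(times), mean(values)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar

# Hypothetical annual series, made up for illustration only.
years = list(range(2000, 2010))
flows = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 16.0]

smoothed = moving_average(flows, window=3)
slope, intercept = linear_trend(years, flows)
```

The moving average follows local fluctuations, while the regression line summarises the whole record with a single gradient; plotting both over the raw data is usually more informative than either alone.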
Example time series plots
[Figure: annual River Nile flows at Aswan (cumecs), 1860 to 1950, shown as a scatter plot, a line plot with data points, and a line plot without data points]
Example trend fits to time series data
[Figure: four panels of example trend fits, each spanning 1860 to 1950 on a 1000 to 5000 scale]
• Excel
• Mathematica
• MATLAB
• MINITAB
• SAS
• SPlus
• SPSS
• Systat
Statistical methods for detecting
trend/change in time series data
Change in a time series can occur steadily (a trend), abruptly (a
step-change) or in a more complex form. It may affect the mean, median,
variance or other aspects of the data.
Kundzewicz, Z.W. and Robson, A. (Editors) (2000) Detecting Trend and Other Changes in
Hydrological Data. World Climate Program – Water, WMO/UNESCO, WCDMP-45,
WMO/TD 1013, Geneva, 157 pp.
Grayson, R.B., Argent, R.M., Nathan, R.J., McMahon, T.A. and Mein, R. (1996) Hydrological
Recipes: Estimation Techniques in Australian Hydrology. Cooperative Research Centre for
Catchment Hydrology, Australia, 125 pp.
Basic concepts of statistical testing
Hypotheses
The starting point of a statistical test is to define a null hypothesis (H0) and an
alternative hypothesis (H1). For example, to test for trend in the mean of a
time series, H0 would be that there is no change in the mean of the data, and H1
would be that the mean is either increasing or decreasing with time.
Test statistic
The test statistic is a means of comparing H0 and H1. It is a numerical value
calculated from the data series that is being tested.
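As one illustration of a test statistic, the sketch below computes the t-statistic for the slope of a least-squares line fitted against the time step. This is only one possible choice, assumed here for illustration; the function name `slope_t_statistic` and both sample series are hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(values):
    """t-statistic for the least-squares slope of values against time step.

    Under H0 (no trend in the mean) the slope should be close to zero,
    so a large absolute value is evidence against H0.
    """
    n = len(values)
    t_bar = (n - 1) / 2          # mean of the time steps 0..n-1
    y_bar = mean(values)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    # Residual sum of squares about the fitted line.
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(values))
    standard_error = sqrt(sse / (n - 2) / sxx)
    return slope / standard_error

# Hypothetical series, made up for illustration only.
no_trend = [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0]
upward = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0]

t_no_trend = slope_t_statistic(no_trend)
t_upward = slope_t_statistic(upward)
```

The series without a trend gives a value near zero, while the upward series gives a much larger one. Whether a given value is large enough to reject H0 depends on the significance level.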
Significance level
The significance level is a means of measuring whether the test statistic is very
different from values that would typically occur under H0.
• Specifically, the significance level is the probability of a test statistic value as extreme
as, or more extreme than, the observed value assuming no trend/change (H0). For
example, for α = 0.05, the critical test statistic value is the value that would be
exceeded by 5% of test statistic values obtained from randomly generated data. If the
test statistic value is greater than the critical test statistic value, H0 is rejected.
• The significance level is therefore the probability that a test detects a trend/change
(rejects H0) when none is present (Type I error).
• For most traditional statistical methods, critical test statistic values for various
significance levels can be looked up in statistical tables or calculated from simple
formulas, provided the test assumptions are satisfied. Where test assumptions are
violated, resampling methods can be used to estimate the significance level of a test
statistic.
• For detecting trend/change, the critical test statistic value at α/2 is used (two-sided
tail). For detecting an increase (or decrease), the critical test statistic value at α is
used (one-sided tail).
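The difference between the two-sided and one-sided critical values can be made concrete with a short sketch. It assumes, purely for illustration, a test statistic that follows a standard normal distribution under H0; other tests use other distributions, and the function names here are hypothetical.

```python
from statistics import NormalDist

def critical_value(alpha, two_sided=True):
    """Critical value of a standard normal test statistic.

    Two-sided tests (any trend/change) put alpha/2 in each tail;
    one-sided tests (an increase, or a decrease) put alpha in one tail.
    """
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

def reject_h0(test_statistic, alpha, two_sided=True):
    """Compare a test statistic with the critical value for alpha."""
    value = abs(test_statistic) if two_sided else test_statistic
    return value > critical_value(alpha, two_sided)

z_two_sided = critical_value(0.05, two_sided=True)   # about 1.96
z_one_sided = critical_value(0.05, two_sided=False)  # about 1.645
```

A test statistic of 1.8, for example, would lead to rejection of H0 in a one-sided test for an increase at α = 0.05, but not in a two-sided test at the same level.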
Resampling to estimate significance level
• In resampling, the original time series data are resampled to provide many
replicates of time series data of the same length as the original data. The
time series data for each replicate are obtained by randomly selecting a
data value from any year in the original time series, repeatedly, until a
time series of the same length as the original data is constructed. In TREND,
the data are resampled with replacement (bootstrapping method), i.e., the
replicate series may contain more than one of some values in the original
series and none of other values.
• The test statistic value of the original time series data is then compared
with the test statistic values of the generated data (replicates) to
estimate the significance level. For example, if the test statistic value of
the original data equals the 950th value when the test statistic values from
1000 replicates are ranked in ascending order, H0 is rejected at α = 0.05
(i.e., a trend/change is detected, with a 5% probability that this
trend/change is incorrectly detected).
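The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the TREND implementation: the test statistic (difference between the means of the second and first halves of the series), the function names, and the sample series are all assumptions made for the example.

```python
import random
from statistics import mean

def half_mean_difference(values):
    """Hypothetical test statistic: mean of second half minus mean of first half."""
    middle = len(values) // 2
    return mean(values[middle:]) - mean(values[:middle])

def bootstrap_significance(values, statistic, n_replicates=1000, seed=0):
    """Estimate a one-sided significance level (test for an increase).

    Each replicate is built by drawing values at random, with replacement,
    until it has the same length as the original series. The result is the
    fraction of replicates whose statistic is at least as large as the
    statistic of the original series.
    """
    rng = random.Random(seed)
    observed = statistic(values)
    exceed = 0
    for _ in range(n_replicates):
        replicate = rng.choices(values, k=len(values))
        if statistic(replicate) >= observed:
            exceed += 1
    return exceed / n_replicates

# Hypothetical series with a clear upward trend, made up for illustration.
series = [float(i + i % 3) for i in range(30)]

level = bootstrap_significance(series, half_mean_difference)
reject_h0 = level <= 0.05
```

Resampling destroys the time ordering, so the replicates show how the statistic behaves when no trend/change is present. For a two-sided test, the absolute value of the statistic could be compared instead.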
Parametric and non-parametric tests
• Most statistical tests assume that the time series data are
independent and identically distributed.
• Parametric tests also assume that the time series data and the
errors (deviations from the trend) follow a particular distribution.
Most parametric tests assume that the data are normally
distributed. Parametric tests are useful as they also quantify the
change in the data (e.g., change in mean or gradient of trend).
Parametric tests are generally more powerful than non-parametric
tests.
• Good-quality data and a sound understanding of the data (via exploratory
data analysis) are essential.