The Interpretation of Geochemical Survey Data

Downloaded from http://geea.lyellcollection.org/ at Cornell University on November 19, 2012
Geochemistry: Exploration, Environment, Analysis

Eric C. Grunsky
Geochemistry: Exploration, Environment, Analysis 2010, v.10; p27-74.

doi: 10.1144/1467-7873/09-210
© The Geological Society of London 2012


Eric C. Grunsky
Geological Survey of Canada, Natural Resources Canada, Ottawa, Ontario, Canada K1A 0E9
(e-mail: egrunsky@nrcan.gc.ca)
ABSTRACT: Geochemical data are generally derived from government and industry
geochemical surveys that cover areas at various spatial resolutions. These survey data
are difficult to assemble and integrate due to their heterogeneous mixture of media,
size fractions, methods of digestion and analytical instrumentation. These assembled
sets of data often contain thousands of observations with as many as 50 or more
elements. Although the assembly of these data is a challenge, the resulting integrated
datasets provide an opportunity to discover a wide range of geochemical processes
that are associated with underlying geology, alteration, landscape modification,
weathering and mineralization. The use of data analysis and statistical visualization
methods, combined with geographical information systems, provides an effective
environment for process identification and pattern discovery in these large sets of
data.
Modern methods of evaluating data for associations, structures and patterns are
grouped under the term ‘data mining’. Mining data includes the application of
multivariate data analysis and statistical techniques, combined with geographical
information systems, and can significantly assist the task of data interpretation and
subsequent model building. Geochemical data require special handling when
measures of association are required. Because geochemical data are compositional in
nature, logratios are required to eliminate the effects of closure. Exploratory
multivariate methods include: scatterplot matrices (SPLOM), adjusting for censored
and missing data, detecting atypical observations, computing robust means,
correlations and covariances, principal component analysis, cluster analysis and
knowledge-based indices of association. Modelled multivariate methods include
discriminant analysis, analysis of variance, classification and regression trees,
neural networks and related techniques. Many of these topics are covered with
examples to demonstrate their application.
KEYWORDS: geochemistry, data analysis, visualization, statistical methods, data interpretation, review

A review of contributions to the Exploration 1977, 1987 and 1997 conferences held in Toronto
in the field of exploration geochemistry and the interpretation of regional geochemical
survey data provides a perspective and appreciation of the very powerful tools that
geoscientists now have at their disposal. Boyle (1979) described the first part of the
twentieth century when rapid advancements were made in the recognition of primary and
secondary dispersion haloes: development of accurate and rapid analytical methods (e.g. the
development of atomic absorption spectroscopy, fluorimetry, chromatography, neutron
activation analysis, mass spectrometry); improvements in sampling technologies; radiometric
methods, airborne geochemical sampling methods; improvements in field techniques and access
(helicopters); heavy minerals in glacial media; and developments in statistical and computer
techniques. At that time, Boyle also pointed out that further research was required to
understand the trace and major element chemistry of rocks and their geochemical relationship
to metallogenic belts. Boyle also noted that future research should focus on the
identification of mineral deposits at depth, and for countries such as Canada, the evaluation
of basal till geochemistry is an effective means of exploration for metallic mineral
deposits. The role of government surveys in the collection of various geological media and
subsequent geochemical analysis was considered paramount for a successful mineral exploration
strategy for any country. Boyle discussed the term ‘vectors’ as a means to identify mineral
deposits through the evaluation of patterns and trends in geochemical data in both two and
three dimensions.

At the time of Exploration ’77, the use of geochemical data in glacial terrains (Bølviken &
Gleeson 1979), non-glaciated terrains (Bradshaw & Thomson 1979), lithogeochemistry (Govett &
Nichol 1979), biogeochemistry (Brooks 1979; Cannon 1979), stream sediment geochemistry (Meyer
et al. 1979), lake sediments (Coker et al. 1979) and hydrogeochemistry were well advanced.
The fundamentals of these developments are still applicable today. There have been
refinements in methods of extraction (digestion methods and selective
Geochemistry: Exploration, Environment Analysis, Vol. 10 2010, pp. 27–74 1467-7873/10/$15.00 2009 AAG/Geological Society of London
leaches), improvements in detection limits and better understanding of the sedimentary
environments of stream, lake, glacial and weathered environments. Howarth & Martin (1979)
provided the basics of evaluating geochemical data, the principles of which are still in use
today. The term ‘integration’ was already in use in the 1970s when it was realized that
several types of geoscience data could be merged using computer-based methods (Coope &
Davidson 1979).

The Exploration ’87 meeting contained similar discussions along the lines of weathered
terrains (Butt 1989; Mazzucchelli 1989; Smith 1989), glaciated terrains (Coker & DiLabio
1989; Shaw 1989), stream sediments (Plant et al. 1989), lake sediments (Hornbrook 1989),
biogeochemistry (Dunn 1989), and bedrock geochemistry (Govett 1989). In addition, the role of
computers, databases and computer-based methods for use in mineral exploration were distinct
contributions to the meeting (Garrett 1989a; Harman et al. 1989; Holroyd 1989) and expert
systems were introduced as a means for decision-making in exploration (Campbell 1989; Martin
1989). Exploration ’87 also contained more results on the benefits of integrated exploration
strategies.

Exploration ’97 covered much of the same material of advances in geochemical exploration
methods for the geochemistry of glaciated terrains (Klassen 1997; McClenaghan et al. 1997),
the geochemistry of deeply weathered terrains (Mazzucchelli 1997; Smith et al. 1997),
geochemistry of stream sediments (Fletcher 1997), lake sediment geochemistry (Friske 1997;
Davenport et al. 1997a), lithogeochemistry (Franklin 1997; Harris et al. 1997), plus
developments in extraction techniques for the enhancements of geochemical responses (Bloom
1997; Hall 1997; Smee 1997). Closs (1997) emphasized that careful sample design and
objectives are the fundamental tenets of exploration geochemistry, which had not changed in
the previous 30 years. Integrated exploration information management was a major focus at the
Exploration ’97 conference with significant contributions by Bonham-Carter (1997), Davenport
et al. (1997b), de Kemp & Desnoyers (1997) and Harris et al. (1997) along with the early
developments on the use of the world wide web (internet) by Cox (1997).

Prior to the arrival of Geographic Information Systems (GIS) and desktop statistical
computing packages, exploration geochemistry was limited in scope in terms of extensive data
analysis. Textbooks such as those by Hawkes & Webb (1962), Rose et al. (1979) and Levinson
(1980) provided the foundation for exploration geochemistry strategies and defined the
principles for planning, executing and interpreting geochemical surveys. These texts were
written before the development of GIS or easily accessible statistical packages. As a result,
they offered limited treatment of the statistical analysis of geochemical survey data. In the
late 1980s, GIS began to play an increasingly important role in the display and management of
spatially referenced data (e.g. geochemical data). These systems required large computers and
specialists in the management and maintenance of the software. GIS have evolved into ‘Desktop
Mapping’ systems that allow users of personal computers to display, query, manage, and to a
limited extent analyse spatially referenced data.

Geochemical surveys are an important part of geoscience investigations in both mineral
exploration and environmental monitoring. The International Geological Correlation Program
(IGCP Project 259; Darnley et al. 1995) summarized the value of geochemical surveys for both
exploration and global change monitoring. This report contains recommendations for sampling
strategies, data management, analytical methods and numerous other topics for the development
of a global network of geochemical knowledge. A soil or lake sediment survey can consist of
collecting several thousand specimens that are analysed for at least 50 elements. Analysing
and interpreting these large sets of data can be a challenge. Data can be categorical
(discrete numeric or non-numeric) or continuous in nature. To extract the maximum amount of
information from these data, various multivariate data analysis techniques are available. In
many cases, these techniques reduce these large datasets into a few simple diagrams that
outline the principal geochemical trends and assist with interpretation. The trends that are
identified may include variation associated with underlying lithologies, zones of alteration,
and in special cases, zones of potentially economic mineralization. Areas of mineralization
are typically small in geographic extent and less likely to be ‘sampled’ in the course of a
regional geochemical sampling survey. Thus, they can be considered as rare events relative to
the regional geochemical signatures within a study area and they will commonly be
under-represented within a population. This means that they may be observed as atypical or
masked by the main mass of the population.

The term ‘sample’ in statistical literature usually refers to a selection of observations
from a population. In the lexicon of geoscientists, specimens of soil, rocks, stream
sediments and other such media are generally called ‘samples’. This has been a source of
confusion between the geoscience and the statistical communities. Within this contribution,
specimens (i.e. the geochemist’s samples) that have been collected in the field are referred
to as ‘specimens’ and the data derived from them as ‘observations’. Elements are the
geochemical entities that become variables in the application of statistics. The terms
‘variable’ and ‘element’ are used interchangeably in this contribution. Specimen collection
strategies are an important part of any geochemical survey programme. Garrett (1983, Chapter
4) provides a useful discussion on various approaches for sampling media for geochemical
surveys.

The evaluation and interpretation of geochemical data rely on understanding the nature of the
material that has been sampled. Different materials require a variety of methods and
techniques of data analysis for the interpretation of geochemical results. In the case of
surficial sedimentary materials (glacial till, lake and stream sediments), different size
fractions of specimens can reflect different geological processes. The choice of size
fraction can have a profound influence on the interpretation of the geochemistry of an area.
In any geochemical survey the material for study should be carefully collected and classified
in order to provide any clues about the underlying geochemical processes.

Quality control is an essential part of assessing geochemical data. All data should be
initially examined for analytical reliability and screened for the identification of suspect
analyses. Typically, this is done using exploratory data analysis (EDA) methods. Issues of
quality control, analytical accuracy and precision are beyond the scope of this contribution;
however, they are briefly discussed in the section ‘Special Problems’.

Five sets of data have been used in this contribution.

1. Lithogeochemical data from Ben Nevis township, Ontario, Canada (Plate 1). Rock specimens
were collected as part of a study to examine the nature of alteration and associated
mineralization in a sequence of volcanic rocks (Grunsky 1986a, b). Two significant
Zn–Ag–Cu–Au occurrences have been investigated in this area: the Canagau Mines deposit and
the Croxall property (Grunsky 1986a). The results of a detailed lithogeochemical sampling
programme outlined a zone of extensive carbonatization associated with the Canagau Mines
deposit. The alteration consists of a large north–south trending zone of carbonate alteration
with a central zone of
silica enrichment with gold and copper sulphide mineralization. A lesser zone of
carbonatization is associated with the Croxall property. Small isolated zones of sulphide
mineralization occur throughout the area. The specimens were not collected over a regular
grid but were collected wherever rock outcrops could be located in the field. The geology of
the area and the specimen locations are shown in Plate 1. Lithogeochemical sampling was
carried out over the area in 1969, 1972 and 1979–1981. A total of 825 specimens were analysed
for SiO2, Al2O3, Fe2O3, FeO, MgO, CaO, Na2O, K2O, TiO2, P2O5, MnO, CO2, S, H2O+, H2O, Ag, As,
Au, Ba, Be, Bi, Cl, Co, Cr, Cu, F, Ga, Li, Ni, Pb, Zn, B, Mo, Sc, Sn, Sr, V, Y, U and Zr.
Initially, the major element oxides were assessed using a multivariate procedure known as
‘correspondence analysis’ that is documented in Grunsky (1986a). Details on the geology,
sampling methodology and mineral occurrence descriptions can be found in Grunsky (1986b). A
regional picture of the alteration and prospectivity for volcanogenic massive sulphide
deposits can be found in Hannington et al. (2003).

2. Lake sediment survey data from the Batchawana district, Ontario, Canada (Plate 2). This
set of survey data consists of 3047 lake sediment specimens collected between 1989 and 1995
from a series of lakes that overlie a Precambrian volcanic-sedimentary sequence that has been
intruded by granitic rocks (Grunsky 1991). The lake sediments in the area are derived from
the underlying bedrock (shown in the legend), glacial overburden and organic matter (not
shown). Glacial till, outwash sand, lacustrine deposits and recent re-worked glacial deposits
blanket the area in varying thickness. Bedrock exposure is less than 5% of the area with most
of the glacial overburden being less than 3 m thick.

3. Data from the island of Sumatra, Indonesia. This dataset, from a soil survey over a Cu–Au
prospect on the island of Sumatra, Indonesia, provides an example of how multivariate data
analysis and digital elevation data can be used to isolate geochemical responses related to
different processes. A geochemical survey was carried out on a grid of lines 100 m apart with
sampling sites every 25 m. The geology is poorly understood because of extensive weathering
in the tropical climate. The mineralization of Cu and Au occurs in breccia zones that are
associated with a felsic intrusion and appear to be structurally controlled as en echelon
fractures that parallel the great Sumatra fault. Plate 3 shows the generalized geology for
the area.

4. The Campo Morado mining camp in the Guerrero state of Mexico. This camp hosts seven
precious-metal-bearing volcanogenic massive sulphide deposits in the complexly folded and
faulted Guerrero terrain (Oliver et al. 1996; Rebagliati 1999), shown in Plate 4. A total of
29 221 samples were collected over a soil grid with 25 m sample spacing along lines that were
100 m apart. The field samples were analysed for Al, Fe, Ca, K, Mg, Na, Ti, Au, Ag, As, Ba,
Cd, Co, Cr, Cu, Hg, Mn, Mo, Ni, P, Pb, Sc, Sr, V, W and Zn using aqua regia digestion and
ICP-ES. A digital elevation model (DEM) was created at 25 m resolution. Plate 4 shows the
location of each sample point and is coloured according to the lithology over which the
sample was collected. The high density of sampling yields a detailed picture of the
lithologies of the area as shown in the figure. Principal component analysis (PCA) was
carried out on the data and revealed several significant patterns related to lithological
variation and mineralization.

5. Kimberlite bodies from Fort à la Corne, Saskatchewan (Fig. 1). Five kimberlite phases from
the Fort à la Corne area of Saskatchewan have been analysed for major and trace element
geochemistry. These five phases are shown to be statistically distinct and can be used to
form the basis of a classification scheme for scoring unknown samples (Grunsky & Kjarsgaard
2008). Because of confidentiality requirements, geographic coordinates are not presented with
these results.

Fig. 1. Location map of the Fort à la Corne kimberlite field, Saskatchewan, Canada.

GEOCHEMICAL DATA MINING
Data mining involves the use of automatic and knowledge-based procedures for the recognition
of patterns that can be attributed to known processes (e.g. crystal fractionation,
hydrothermal alteration, weathering). Common forms of data mining involve supervised and
unsupervised pattern recognition. Unsupervised data mining includes techniques such as
cluster analysis, principal component analysis, exploratory data analysis, multivariate
ranking of data, neural networks and empirical indices. These methods range from automatic
through semi-automatic to manual in the degree of pattern delineation. The use of a fully
automatic method does not guarantee a result that necessarily represents the best view or
meaningful structure in the data. Caution must be applied in using such techniques.
Supervised methods include discriminant analysis, canonical variate analysis, model-based
clustering, neural networks, support vector machines and cell automata. All require a priori
assumptions and/or ‘target’ and ‘background’ definitions to which unknown data can be
classified. Typically, target populations represent sets of geochemical data that define
mineral exploration targets.

Visualization of geochemical data
Visualization is one of the most effective ways of evaluating data. The human eye is more
adept at recognizing patterns from pictures than with tables of numbers. Geochemists need to
evaluate data comparatively in both the spatial domain (geographic location) and the variable
(element/oxide) domain. When a single element’s data are being evaluated, simple plots such
as probability plots (Sinclair 1976; Stanley & Sinclair 1987, 1989; Stanley 1987),
histograms, or box plots can be used. However, there are many other ways to evaluate data
graphically. Many of these methods have been outlined by Cleveland (1993). Garrett (1988)
developed a data analysis, statistics and visualization system, IDEAS, that provides a
multitude of methods that are useful to the exploration geochemist. This package was recently
replaced by another package, ‘rgr’ (Garrett & Chen 2007), which is available from
www.r-project.org. Reimann et al. (2008) have published a book that provides methods for
evaluating geochemical data in an environmental context using R. Even the field of
statistical evaluation of data has changed significantly in the past 10 years. This is
exemplified by texts that combine extensive visualization techniques (Sarkar 2008) together
with modern statistical methods (Venables & Ripley 2002).

This contribution has made extensive use of the data analysis and statistical analysis
software package, R (R-Development Core Team 2008), which provides a number of powerful tools
for manipulating and visualizing data. Most of the statistical graphics herein have been
created using R. The application of this environment for geoscience applications is described
by Grunsky (2002a).

Geographical information systems
GIS represent digital visualization of spatially-based data on a map. GIS require a spatial
definition of the data plus attribute tables that contain information relevant to the
specified geographic locations and the representation of geochemical data. Examples of this
have been presented by Mellinger et al. (1984), Gaál (1988), Kuosmanen (1988), Bonham-Carter
(1989a, b), George & Bonham-Carter (1989), Hausberger (1989) and Mellinger (1989). In
particular, GIS facilitate the organized storage and management of spatially-based data that
are linked to a number of other features or other geo-referenced data sets. Bonham-Carter
(1994) has written a monograph of geoscience applications using GIS and Harris (2006a) has
edited a volume on GIS applications in the Earth sciences covering a wide range of topics in
which geochemistry is covered by Cheng (2006), Grunsky (2006), Harris (2006b) and Wilkinson
et al. (2006).

As geoscience information and data become available in ever-increasing volumes, exploration
programmes and government research programmes involve significant amounts of data
compilation. The compiled datasets are subsequently placed into GIS and integrated with other
geoscience information. Recent developments in the use of GIS together with data compilation
programmes have been discussed in Wilkinson et al. (1999) and Harris et al. (1997, 1999,
2000) and in a book with chapters on the evaluation of geochemical data using GIS (Harris
2006a, Chapters 12–16).

Depending on the nature of the geochemical data (stream sediment, soil, lake sediment, or
lithogeochemical), various types of analysis can be performed that are dependent on the type
of associated data present. Point, polygon (vector) and raster (regular array cells) features
can be overlain, merged and analysed through the associated map merging and database querying
tools. Raster image grid cells can be considered as points provided there is an associated
attribute record of data with each grid cell.

Desktop mapping systems have evolved to the point that they are cheaper and less complex, are
easier to use and offer an effective way for the geochemist to evaluate data. Thus, the goals
of the geochemist can be achieved faster and at less cost. As digitally based map and
attribute data are being created continually, there has been an increasing demand to view and
assess these data without the use of complex GIS. In its simplest form, a desktop mapping
system has significant advantages in exploration geochemistry. Geochemical data can be loaded
and visualized in both the geochemical space and the geographical space very quickly.
Geochemical data can also be processed using a number of statistical or other data analysis
techniques from which the results can also be loaded into a desktop mapping system. The
permutations and combinations of data layer manipulation provide a wide variety of ways of
examining and interpreting data.

Image processing
When the sampling density of geochemical data is adequate, it is desirable to produce maps
that represent smoothed gridded data and coloured/shaded surfaces. Smoothed, gridded data can
be considered a raster image. Image analysis is primarily used for presentation purposes to
enhance the results of an analysis or to show variation within data. Image analysis
manipulates integer-scaled raster data using a number of matrix-based methods and, after the
use of additional integer-scaling procedures, represents the resulting transformed data on
various graphical output devices using colour (e.g. intensity, hue, saturation, RGB, CMYK).
Richards & Jia (1999) provide an introduction to image processing methods. Carr (1994)
provides an introduction to image processing in geological applications and Gupta (1991) and
Vincent (1997) provide comprehensive reviews of remote sensing applications in geology. Rencz
(1999) contains a collection of papers covering the topic of remote sensing in the Earth
sciences and Pieters & Englert (1993) cover the topic of remote geochemical analysis through
the evaluation of satellite spectroscopy.

Exploratory data analysis (EDA)
Exploratory data analysis is concerned with analysing geochemical data for the purpose of
detecting trends or structures in the data. These features can provide insight into the
geochemical/geological processes from which models can be constructed. Exploratory methods of
data analysis include the evaluation of the marginal (individual) distributions of the data
by numerical and graphical methods. These include the use of summary tables (minimum,
maximum, mean, median, standard deviation, 1st and 3rd quartiles), measures of correlation,
covariance and skewness. Graphical methods include histograms, probability
(quantile–quantile) plots, box plots, density plots and ScatterPLOt Matrices (SPLOM). More
sophisticated data visualization can be carried out using packages such as the ‘lattice’
library that is available in R (Sarkar 2008). The spatial presentation of data summaries can
be incorporated into a GIS using features such as bubble and symbol plots, and interpolated
grids.

Multivariate methods include the use of PCA, cluster analysis, Mahalanobis distance plots,
empirical indices and various measures of spatial association.

Target and background populations
In an exploration programme, geochemical background represents a population of observations
that reflect unmineralized ground. Background may be a mixture of several populations
(gravel–sand–clay or granitoid–volcanic–sedimentary lithologies). The separation of the
background population into similar
subsets that represent homogeneous multivariate normal popu- Standard numerical and statistical methods have been devel-
lations is important and forms the basis of the modelled oped for data analysis where the values being considered add to
approach of geochemical data analysis. This can be achieved a constant sum (e.g. whole rock analyses summing to 100%).
using exploratory methods such as PCA, methods of spatial This is discussed in more detail below.
analysis, Mahalanobis distance plots and cluster analysis. Quality assurance and quality control of geochemical data
A group of specimens that represent an entity under require that rigorous procedures be established prior to the
investigation (features of geochemical alteration or mineraliza- collection and subsequent analysis of geochemical data. This
tion) is termed the ‘sample’ population, from which inferences includes the inclusion of certified reference standards, randomi-
will be made about the ‘target’ population that cannot be zation of samples and the application of statistical methods for
sampled in its entirety. These populations are derived from testing the analytical results. Historical accounts of ‘Thompson
specimens collected from orientation studies over known and Howarth’ plots for analytical precision studies can be found
mineral deposits or areas of specific interest. in Thompson & Howarth (1973, 1976a, b, 1978). Additional
Sample populations, whether representing background or discussion on the subject was most recently covered by Stanley
other populations, represent training sets with unique charac- (2003, 2006) and Garrett & Grunsky (2003).
teristics. These training sets are generally distinct from one
another through their statistical properties, although it is
common for training sets to overlap. Unknown specimens can Compositional data
be tested against these populations to determine if they have Geochemical data are reported as proportions (weight %, parts
similar characteristics. Probability-based methods can deter- per million, etc.) For a given observation compositional pro-
mine if the unknown specimen belongs to none, one or more portions (i.e. weight %) always sum to a constant (100%). As a
of the populations. result, as some measures increase, others are ‘forced’ to
A case study is presented where distinctions between kim- decrease to keep the sum constant. Because compositional data
berlites from the Fort à la Corne area, Saskatchewan have been occur only in the real positive number space, the calculation of
statistically determined based on their multi-element signatures. statistical measures, such as correlation and covariance, can be
misleading and result in incorrect assessment of correlation or
other measures of association. It is dangerous to make the
assumption that closure has no effect on the outcome of any
Special problems
statistical measure. Raw compositional data is useful for observ-
Problems that commonly occur in geochemical data include: ing stoichiometric trends in data (e.g. Grunsky & Kjarsgaard
+ many elements have a ‘censored’ distribution, meaning that 2008); however, any type of regression or procedure that
values at less than the detection limit can only be reported as requires statistical measures necessitates the use of logratios
being less than that limit; which are described below.
+ the distribution of the data is not normal; Aitchison (1986) developed a methodology for data analysis
+ the data have missing values. That is, not every specimen has and statistical inference of compositional data using logratio
been analysed for the same number of elements. Often, transformations. These transformations project the composi-
missing values are reported as zero, which is not the same as tional data into the entire (positive and negative) real number
a specimen having a zero amount of an element. This can space, which allows standard statistical procedures to be
create complications in statistical applications;
+ combining groups of data that show distinctive differences between elements where none is expected. This may be the result of different limits of detection, instrumentation or poor Quality Assurance/Quality Control (QA/QC) procedures. Levelling of the groups is required;
+ the constant sum problem for compositional data.

These problems create difficulties when applying mathematical or statistical procedures to the data. Statistical procedures have been devised to deal with all of these problems. In the case of varying detection limits, the data require separation into the original groups so that appropriate adjustments can be applied to the groups of data.

To overcome the problems of censored distributions, procedures have been developed to estimate replacement values for the purposes of statistical calculations. When data have missing values, several procedures can be applied to impute replacement values that have complete analyses. This will be discussed in more detail further on in the text.

Plate 5 summarizes the problems of censoring, non-normality and the discrete differences in the data due to analytical resolution. The image is a shaded relief map derived from the density of observations of As v. Au. The ‘valleys’ represent limits in data resolution near the lower limit of detection for Au. The actual limit of detection appears as a ‘wall’ at the zero end of the Au axis. In contrast, As displays a continuous range of values without the same resolution or detection limit problems exhibited by Au.

applied. These methods are gaining popularity and examples of application to geochemical data are given by Aitchison (1990), Grunsky et al. (1992) and Buccianti et al. (2006). The approach has also been extended into spatial data processing that is commonly used in ore reserve estimation (Pawlowsky 1989). Recent work by Barcelo et al. (1995, 1996, 1997), Martin-Fernandez et al. (1998, 2000), Pawlowsky-Glahn & Buccianti (2002) and von Eynatten et al. (2002, 2003) documents methods and issues around the treatment of compositional data. Aitchison (1997) provides a very readable account of compositional data issues. Appendix 1 provides a basic description of the use of logratios. Buccianti et al. (2006) provide the most recent developments in the field of compositional data analysis. A package for compositional data analysis (van den Boogaart & Tolosana-Delgado 2008) known as ‘compositions’ provides a set of tools for evaluating compositional data using the R statistical package (www.r-project.org).

Most geochemical survey data comprise trace element measurements that are reported as parts per million (ppm). Reporting in ppm carries the potential for closure: the trace element concentrations may interfere with each other, particularly when one or more of the elements of interest is close to zero. The application of a centred logratio transformation (clr) will provide more reliable and statistically defensible results than the use of raw data. The use of the isometric logratio (ilr) (Egozcue et al. 2003), where balances between the elements are constructed, provides an orthonormal basis in the compositional data space in which statistical and vector calculations can be applied.
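As an illustration of the centred logratio transformation mentioned above, the following is a minimal sketch in Python rather than the ‘compositions’ package in R. The function name `clr` and the sample values are hypothetical and are chosen only for illustration. Each part is divided by the geometric mean of the composition before taking logarithms, which is equivalent to subtracting the mean of the logs.

```python
import math

def clr(parts):
    """Centred logratio: log of each part minus the mean of the logs
    (i.e. the log of each part divided by the geometric mean)."""
    if any(p <= 0 for p in parts):
        # Zeros and censored values must be replaced before logratios are taken.
        raise ValueError("clr needs strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical four-part composition in ppm (illustrative values only).
sample_ppm = [12.0, 30.0, 5.0, 953.0]
z = clr(sample_ppm)

# The same composition expressed in weight % gives the same clr values,
# because the transformation depends only on ratios between parts.
z_pct = clr([p / 10000.0 for p in sample_ppm])
```

The clr values of a composition sum to zero, and rescaling the units leaves them unchanged, which is why the transformation removes the dependence on the constant sum.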
SUMMARIZING GEOCHEMICAL DATA

Univariate data summaries

The following description of data exploration is based on examining univariate populations. EDA plots are shown in Figures 2a–d and 3a–d. These plots are often useful when grouped together as they provide different ways of summarizing data. Data summaries, in combined graphical and text form, provide a basis for context and comparison of different data types.

Fig. 2. Exploratory Data Analysis (EDA) plot of Al in lake sediments, Batchawana area, Ontario. Note the distinct polymodal nature of the distribution. The Q–Q plot suggests that this polymodal appearance may be due to lack of precision in the chemical analysis.

Fig. 3. Exploratory Data Analysis (EDA) plot of As in lake sediments, Batchawana area, Ontario. Arsenic exhibits a log-normal type of distribution. Extreme values (outliers) influence the shape of the distributions in all four plots.

Histograms

The histogram is one of the most popular graphical means of displaying a distribution because its shape resembles that of theoretical frequency distributions. Figures 2a and 3a illustrate how the histogram can be used to display the distribution of Al and As in lake sediments. These two elements have been chosen to demonstrate two very different geochemical responses. Aluminium is ubiquitous in the lake sediments, mostly derived from aluminosilicates such as feldspars and some clay minerals (kaolinite). Aluminium abundance is largely controlled by rock types such as granites and volcanic rocks. Figure 2a illustrates the range of Al values from sediments in lake catchments. The distribution appears polymodal, which could lead to the interpretation that the lake sediments have been derived from several different lithologies. In the Batchawana area of Ontario, these lithologies are granite gneiss, migmatite, granitoid intrusions, metasediments and metavolcanic rocks. However, on closer examination, these ‘peaks’ appear to be artefacts of the analytical method (varying detection limits) and can create difficulties with the interpretation. Other graphical methods that are discussed below are better suited for interpreting these data.

Arsenic is much less abundant in the country rocks of the area. When it is present, it is usually associated with sulphide minerals. Relative to Al, elevated amounts of As are a ‘rare event’. This is reflected in the histogram of Figure 3a where most As values are below 10 ppm. The shape of this kind of distribution is commonly thought of as ‘lognormal’. However, such a distribution may be the result of mixtures of values from different distributions where the number of values in the lower range is greater than the number of values in the upper range.

For constructing a histogram, a number of objective procedures have been established as initial starting points for interval selection (see Venables & Ripley 2002, p. 112). If the nature of the distribution is normal or close to normal then Sturges’ rule can be applied. Sturges’ rule sets the number of intervals equal to log2(n) + 1, where n is the number of observations. Sturges’ rule does not work well if the distributions are not normal. If the number of intervals is too few, then the finer details of the distribution are smoothed over. If the number of intervals is too many, then the distribution appears discontinuous.

Histograms can be tuned by experimenting with starting points, cut-off points and interval selections. This process is subjective and when the end points and intervals are well chosen, a meaningful interpretation is likely. Conversely, if the end points and intervals are poorly chosen, an incorrect interpretation, or no meaningful interpretation at all, may result.

Box plots

The box plot is a method used to display order statistics in a graphical form (Tukey 1977). The main advantage of the box plot is that, unlike the histogram, its shape does not depend on a choice of interval. Provided the scale of presentation is reasonable, the box plot provides a fast visual estimate of the frequency distribution. A box plot for As in lake sediments is shown in Figure 3b.

Within a box plot, the box is made up of the median (50th percentile), left and right hinges (25th and 75th percentile, or first and third quartile). The ‘whiskers’ are the lines that extend beyond the box. Several variations exist on the graphical presentation of box plots. The extreme ends (maximum and
minimum values) of the data are marked by vertical bars at the end of the whiskers. Alternatively, the whiskers can extend to the ‘fences’, which are defined as the last value before 1.5 × midrange beyond the hinges of the data. Observations that plot beyond 3 × midrange are plotted as bars or special symbols. The location of the median line within the box gives an indication of how symmetrical the distribution is within the range of the upper to lower hinge (midrange). The lengths of the whiskers on each side of the box provide an estimate of the symmetry of the distribution. Notches can also be added to the diagram, which identify the width of the confidence bounds about the median. Notches are evident in the box plot of Figure 2b, where the distribution of Al is not highly skewed. The notches are not visible in Figure 3b because of the skewed nature of the data and the scaling of the plot.

When using these plots to compare datasets representing different lithologies, and so on, the notches provide an informal statistical analysis. If the notches do not overlap, it is evidence that the difference between the medians is significant.

Density plot

The distribution of data can also be described graphically through the use of density plots. Density plots are smooth continuous curves that are derived from computing the probability density function of the data. The density plot is similar to the histogram; however, the curve actually represents an estimate of the probability density function. Density estimation involves the use of smoothing procedures to compute the curves and is described in Venables & Ripley (2002, p. 126–132). Density curves can be modified by specifying the range of the data from which the smoothing and estimation is calculated.

Figure 2c shows a density plot for Al in lake sediments. The polymodal nature of Al is shown more clearly than in Figure 2a and b. Figure 3c shows the density plot for As where the skewed nature of the distribution is illustrated by the sharp single peak followed by a long tail.

Quantile–quantile (Q–Q) plots

Quantile–quantile (Q–Q) plots are a graphical means of comparing a frequency distribution with respect to an expected frequency distribution, which is usually the normal distribution. Q–Q plots are equivalent to normal probability plots that have been extensively used by Sinclair (1976) for the analysis of geochemical data. Stanley & Sinclair (1987, 1989) and Stanley (1987) have written extensively on the use of probability plots for dissecting populations. A general description of Q–Q plots can be found in Venables & Ripley (2002, p. 108). These plots are generated by calculating quantile values for the normal frequency distribution (value of the normal frequency distribution over the range of probability, 0.0–1.0) and then plotting these against the ordered observed data. If a frequency distribution is normally distributed, when the quantile values are plotted against the ordered values of the population, the plot will be a straight line. If the frequency distribution of the population is skewed or the population is polymodal, the Q–Q plot will be curved or discontinuous. The advantage of the Q–Q plot is that each individual observation is plotted and thus the detailed characteristics of groups of observations can be observed.

Figure 2d shows a plot for Al in lake sediments. The plot provides some insight into the nature of the data that is not shown by any of the other three plots (Fig. 2a–c). The ‘stepped’ nature of the plot suggests that many values of the data are not continuous but are reported as discrete values rounded off at the nearest percentage value. The step-like pattern indicates that measurements were made in 1% increments for some of the data and in 0.01% increments for other data. In fact, the pattern that is observed is a mixture of four surveys, three of which have a resolution of 1% for Al, and the fourth survey has a resolution of 0.01%. The departure of the stepped plot from the straight line indicates that it is a slightly skewed distribution. Figure 3d shows the Q–Q plot for As, which clearly reveals the non-normal nature of the distribution by its non-linearity. Q–Q plots are also useful for identifying extreme values at the tails of the distribution. The line drawn through the data passes through the 25th and 75th percentiles of the data. In the case of the As data (Fig. 3d), the distribution is very skewed.

Summary statistical tables

Summary statistical tables are useful descriptions of data when quantitative measures are desired. Summary statistical tables commonly include listings of the minimum, maximum, mean, median, 1st quartile and 3rd quartile. Measures of dispersion include the standard deviation, median absolute deviation (MAD), and the coefficient of variation (CV). The coefficient of variation is useful because the dispersion is expressed as a proportion of the mean (the standard deviation divided by the mean, often quoted as a percentage), so it can be used as a relative measure to compare different elements. An example of a summary table for a selected group of elements from the lake sediment data is shown in Table 1. The table lists minimum, maximum, mean, median and selected percentile values for 35 elements and loss on ignition (LOI). Comparison of the mean and median values for each of the elements shows that many of them are significantly different from each other. This implies that the distributions for these elements are not normal and are likely skewed.

Summary tables are useful for the purpose of publishing actual values; however, graphical methods, as previously described, provide insight into the nature of distributions and the relationships between observations. The values of a summary table are best interpreted when used in combination with graphical summaries.

Spatial presentation

It is particularly meaningful to display geochemical survey data in a geographical context. As discussed previously, GIS is a very useful tool for evaluating geochemical data during the exploratory analysis phase. Plate 6a shows a symbol plot of As from lake sediments in the Batchawana area of Ontario. Each symbol represents a collection site. The number of symbols and the symbol sizes were chosen based on an evaluation of the accompanying EDA plot in Plate 6b. An initial view of the EDA plot for As showed that the distribution was positively skewed and the plot was difficult to interpret. A log10 transform was then applied to the data values and the resulting EDA plot was much easier to interpret. The EDA plot of Plate 6b shows at least four distinct populations. The first population ranges in values from < −0.02 to 0 on the log10 scale (0.9–1 ppm) and is related to the many specimens with As values close to the detection limit. The second population ranges from 0 to 1.2 on the log10 scale (1–16 ppm) and reflects background As values associated with the geology. The third population ranges from 1.2 to 1.6 on the log10 scale (16–40 ppm) and occurs mainly in the south-central part of the Batchawana greenstone belt in an area where there is known pervasive carbonate alteration associated with shear zones. The fourth population ranges from 1.6 to 2.0 on the log10 scale (40–100 ppm) and represents areas where there are known sulphides.
Table 1. Summary statistics for lake sediments, Batchawana Area, Ontario.
Element Units LLD NumObs Min 1% 5% 10% 25% Median(50%) Mean 75% 90% 95% 99% Max StdDev MAD CV
LOI weight % 2.96 3019 3 8.6 20.55 27 35 44 44 53 61 65.8 76.08 91.5 13.7 13.3 0.3
Ag ppm 0.2 2900 0.2 0.2 0.2 0.2 0.2 0.5 0.7 1 1 1 1 72 1.5 0.4 2.3
Al weight % 0.36 3047 0.4 0.64 0.93 1 1.52 2 2.5 3 4 5 6 8 1.2 1.4 0.5
As ppm 0.5 3046 0.5 0.6 0.9 1 1 1.2 2.2 2 4 6 17 96 4 0.4 1.8
Au ppb 1 3042 1 1 1 1 1 1 2.1 3 5 5 8 64 2.1 0 1
Ba ppm 30 3047 30 50 70 80 109 148 167.8 210 290 340 440 680 85.2 71.2 0.5
Be ppm 0.5 3047 0.5 0.5 0.5 0.5 0.5 0.5 0.8 1 1 1 2 54.1 1 0 1.3
Bi ppm 2 3047 2 2 2 2 2 2 2.9 5 5 5 6 10 1.4 0 0.5
Br ppm 1 3046 1 3 6 8.5 14 22 25.6 34 48 57.4 76.7 132 16.1 14.1 0.6
Ca weight % 0.23 2685 0.2 0.43 0.56 0.66 0.89 1 1 1.04 1.35 1.58 2 9.1 0.4 0.1 0.4
Cd ppm 0.2 3047 0.2 0.2 0.5 0.5 0.6 1 1 1 2 2 3 6 0.6 0.3 0.5
Co ppm 1 3047 1 1 2 3 4 6 6.9 9 11 13 21 105 5 3 0.7
Cr ppm 1 3047 1 8 12 15 20 27 31.3 38 49 63 99 328 18.2 13.3 0.6
Cu ppm 2 3047 2 7 11 14 20 29 34.2 41 60 74 120 441 24.3 14.8 0.7
Fe weight % 0.14 2649 0.1 0.2 0.31 0.4 0.63 1 1 1 1.7 2 4 15 0.7 0.3 0.7
Hf ppm 1 3046 1 1 1 1 1 2 2.3 3 4 5 7 30 1.4 1.5 0.6
K weight % 0.05 1809 0.1 0.09 0.13 0.15 0.21 0.3 0.5 0.69 1 1 1.36 2 0.3 0.3 0.7
La ppm 1 3046 1 5 9 11 17 25 29 36 49 60 95 408 19.3 13.3 0.7
Lu ppm 0.1 1605 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.4 1 2 0.2 0 0.7
Mg weight % 0.04 1636 0 0.06 0.08 0.09 0.12 0.2 0.3 0.32 0.5 0.99 1 2 0.2 0.1 0.9
Mn ppm 20 3047 20 30 42 50 70 114 159.8 195 295 415 745 3410 168 77.1 1.1
Mo ppm 1 3047 1 1 1 1 1 2 2.3 3 4 5 10 84 3.2 1.5 1.4
Na weight % 0.03 1999 0 0.06 0.09 0.12 0.21 0.5 0.7 1 1.25 1.94 2.19 4 0.5 0.5 0.8
Ni ppm 3 3047 3 6 8 10 12 16 17.3 21 26 31 44 153 7.9 5.9 0.5
P ppm 150 2197 150 260 340 400 540 820 941 1240 1630 1890 2410 4700 508.6 474.4 0.5
Pb ppm 2 3047 2 2 4 4 6 10 11.6 14 19 22 35 1340 27.3 5.9 2.4
Sb ppm 0.1 1627 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.1 0.2 0.3 1 7 0.3 0 1.8
Sc ppm 0.1 3046 0.1 1.7 2.4 3 4 5 5.2 6.1 8 9 12 19 2.2 1.5 0.4
Sr ppm 12 3047 12 21 29 32 42 60 78.3 95 153 195 276 427 54.3 34.1 0.7
Ta ppm 0.5 3046 0.5 0.5 0.5 0.5 0.5 2 1.4 2 2 2 2 3 0.7 0 0.5
Th ppm 0.4 3044 0.4 1 1.2 1.7 2 3 3.3 4 5.2 6 9 26 1.7 1.5 0.5
Ti weight % 0.009 1557 0 0.02 0.029 0.032 0.047 0.1 0.1 0.103 0.137 0.16 0.21 0.3 0 0 0.5
U ppm 0.1 3009 0.1 0.3 0.6 0.9 1 2 4.2 4.1 9.3 16 34 195.5 7.5 1.5 1.8
V ppm 5 3047 5 7 10 12 16 24 27.1 34 46 54 79 301 15.9 13.3 0.6
W ppm 1 3046 1 1 1 1 1 1 1.7 2 2 3 8 46 1.7 0 1.1
Zn ppm 13 3047 13 21 36 45 62 86 98.6 118 155 184 361 952 68.1 38.5 0.7
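The entries of a summary table such as Table 1 can be computed as in the following minimal sketch. The function name `summarize` and the input values are hypothetical. The MAD is multiplied by 1.4826, the scaling applied by default in R so that it is comparable with the standard deviation for normal data; that choice is an assumption here, although it appears consistent with the values listed in the table. The CV is the standard deviation divided by the mean.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values, mad_constant=1.4826):
    """Summary statistics for one element (list of concentrations)."""
    ordered = sorted(values)
    med = median(ordered)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    avg = mean(ordered)
    sd = stdev(ordered)  # sample standard deviation (n - 1 denominator)
    # Scaled median absolute deviation about the median.
    mad = mad_constant * median([abs(v - med) for v in ordered])
    return {
        "n": len(ordered),
        "min": ordered[0], "q1": q1, "median": med, "q3": q3, "max": ordered[-1],
        "mean": avg, "std": sd, "mad": mad,
        "cv": sd / avg,  # coefficient of variation as a ratio
    }

# Hypothetical concentrations with one extreme value.
stats = summarize([1.0, 2.0, 3.0, 4.0, 100.0])
```

With the single extreme value, the mean (22) lies far above the median (3), and the CV is large while the MAD remains small, which is the pattern seen in Table 1 for skewed elements such as As and Pb.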
The choice of symbol size and colour can be used to illustrate patterns of similarity and difference between several elements in the data. If the goal is to illustrate atypical observations, then once a background range of values has been established, observations that exceed the limit of the background can be assigned unique colours or different sized symbols. If the distribution of the data is non-normal and the observations of interest are in the positive tail of the distribution, then a logarithmic scale can be used to assign symbol sizes.

Kürzl (1988) and Reimann et al. (2005) suggest a unique approach by creating symbols based on EDA methods. Using the divisions within a box plot, the median value (Q2) and the interquartile range Q1–Q3 (r), the upper fence (Q3 + 1.5*(Q3 − Q1)), the lower fence (Q1 − 1.5*(Q3 − Q1)), lower outside values (Q1 − 3*(Q3 − Q1)), and upper outside values (Q3 + 3*(Q3 − Q1)) can be used to define unique symbols which express the ranking of an observation. An example of a seven-symbol set can be defined as:

1. < lower outside values
2. lower outside values to the lower fence
3. lower fence to Q1
4. Q1 to Median (Q2)
5. Median (Q2) to Q3
6. Q3 to upper fence
7. upper fence to upper outside values
8. > upper outside values

Application of geostatistical techniques for evaluating the spatial continuity of geochemical processes

Contouring or imaging techniques are most reliable when the sampling density is sufficient that variation between sample sites is minimal for the purposes of the sampling survey. Subjective judgment is often employed for a decision to use contouring or imaging techniques. If the sampling density is high, but the investigator believes that the geochemical response between sample sites is predictable, then contouring or imaging may be an appropriate means of visually describing the data. If the geochemical variability between sampling sites is unknown or large then it is better to use point or bubble plots as described previously. Spatial variability can be assessed quantitatively by the use of geostatistical procedures. The construction of a semi-variogram or correlogram can provide a measure of the spatial continuity/variability of a specific element. A semi-variogram measures the average variance between sample points at specific distances (lags). Generally, as the distance increases between any pair of points, the variance is expected to increase, the limit of which is the total variance of all of the data. In the correlogram, as the distance between any pair of points increases, the average correlation between the points decreases, eventually decaying to zero. Isaaks & Srivastava (1989, Chapter 4) describe a number of detailed methods for evaluating the spatial continuity of data. The effectiveness of employing geostatistical methods relies on an adequate sampling density in terms of representing the actual
variation of the data as well as the spatial distribution of the points themselves.

A large number of freeware and commercial geostatistical software packages are now available for carrying out geostatistical analysis. The website www.ai-geostats.org provides a list of software that is currently available. A geostatistical package, ‘gstat’, has been written for the R programming environment (Pebesma 2004), which is freely available from the Comprehensive R Archive Network (R Development Core Team) (see: www.r-project.org). Deutsch & Journel (1997) provide a library of software routines in Fortran (GSLIB). A general introductory discussion on spatial statistics can be found in Venables & Ripley (2002, Chapter 15) and Davis (2002, Chapter 5).

If the spatial sampling density appears to be continuous then it may be possible to carry out spatial prediction techniques such as spatial regression modelling and kriging. A major difficulty with the application of spatial statistics to regional geochemical data is that the data seldom exhibit stationarity. Stationarity means that the data have some type of location invariance, that is, the relationship between points is the same regardless of geographic location. Thus, interpolation techniques such as kriging must be applied cautiously, particularly if the data cover several geochemical domains in which the same element has significantly different spatial characteristics.

Evaluation of the variogram or the autocorrelation plots can provide insight about the spatial continuity of an element. If the autocorrelation decays to zero over a specified range, then this represents the spatial domain of a particular geological process associated with the element. Similarly, for the variogram, the range represents the spatial domain of an element, which reaches its limit when the variance reaches the ‘sill’ value, the regional variance of the element. Theoretically, at the origin (lag = 0), the variance should be zero. However, typically, an element may have a significant degree of variability even at short distances from neighbouring points. This variance is termed the ‘nugget’ effect.

Figure 4 displays four semi-variograms for Zn from the Batchawana lake geochemistry survey data covering an area of 95 km (east–west) and 62 km (north–south). Semi-variograms have been calculated for four preferred orientations: east–west (0°), north–south (90°), NE–SW (45°) and SE–NW (135°), using a search angle tolerance of 22.5°. The y-axis of each figure is the semi-variance and the x-axis is the lag interval. The maximum lag distance was chosen as 20 000 m and the lag interval was selected as 200 m. The selection of a suitable lag distance can be made by visually examining the distribution of sample sites; geostatistical software packages can also determine optimum lag intervals. These figures were generated using the gstat package from R. Each figure has been fitted with an exponential model. The most regular semi-variograms appear for the 135° and 90° orientations. This is no surprise given that there are two primary stratigraphic orientations in the area, one trending east–west and the other trending SE–NW. The orientations of 0° and 90° display different nugget values, with the lowest nugget occurring with the east–west orientation, also suggesting better correlation between adjacent points in that direction. All four semi-variograms display periodicity, which indicates that there is heterogeneity in the spatial structure of the data, most likely reflecting changes in the underlying geology (granite vs. greenstone).

Fig. 4. Semi-variogram of Zn from lake sediments, Batchawana area, Ontario. Semi-variograms are derived for four different orientations.

The use of kriging makes some assumptions about the spatial uniformity (stationarity) properties of the data. In many cases, particularly in regional sampling programmes, there are several lithological domains in which elements have different spatial ranges. Kriging can account for various types of spatial drift in datasets; however, the error in the kriged estimates tends to increase.

The use and application of geostatistical methods is a combination of art and science. Skill, knowledge and experience are required to use geostatistical techniques effectively. It requires considerable effort and time to model and extract information from spatial data. The benefit of these efforts is a better understanding of the spatial properties of the data which permits better estimates of geochemical trends. However, they must be used and interpreted with the awareness of problems with techniques of interpolation and the spatial behaviour of the data.

Fractal methods

The use of fractal mathematics is playing an increasingly important role in the geosciences. Carr (1994) gave a good introduction into the use of fractal methods in the geosciences. Cheng & Agterberg (1994) have shown how fractal methods can be used to determine thresholds of geochemical distributions on the basis of the spatial relationship of abundance. They have shown that where the concentration of a particular component per unit area satisfies a fractal or multifractal model, then the area of the component follows a power law relationship with the concentration. This can be expressed mathematically as:

$$A(\rho \le \nu) \propto \rho^{-\alpha_1}$$
$$A(\rho > \nu) \propto \rho^{-\alpha_2}$$

where $A(\rho)$ denotes the area with concentration values greater than the contour (abundance) value $\rho$. This also implies that $A(\rho)$ is a decreasing function of $\rho$. If $\nu$ is considered the threshold value then the empirical model shown above can provide a reasonable fit for some of the elements. In areas where the distribution of an element represents a continuous single process (i.e. background) then the value of $\alpha$ remains constant. In areas where more than one process has resulted in a number of superimposed spatial distributions, there may be one or more values of $\alpha$ defining the different processes.
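The power law relationship can be illustrated with a minimal sketch that tabulates the area above a series of contour values and estimates the exponent from the slope of the log–log plot. The function names, the grid values and the unit cell area are hypothetical, and the grid is constructed so that the area follows a single power law with an exponent of 2. The sketch counts cells with values greater than or equal to each contour value, which is an assumption made so that the synthetic example is exact.

```python
import math

def area_above(values, cell_area, thresholds):
    """Area occupied by grid cells whose value is >= each contour value."""
    return [cell_area * sum(1 for v in values if v >= t) for t in thresholds]

def loglog_slope(x, y):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical interpolated grid: 48 cells at 1, 12 at 2, 3 at 4 and 1 at 8,
# so that the area above each contour value falls as the inverse square.
grid = [8.0] * 1 + [4.0] * 3 + [2.0] * 12 + [1.0] * 48
contours = [1.0, 2.0, 4.0, 8.0]

areas = area_above(grid, cell_area=1.0, thresholds=contours)
alpha = -loglog_slope(contours, areas)  # exponent of the power law
```

For real data the log–log plot is usually fitted in segments; a change in slope between adjacent segments marks a candidate threshold between populations.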
An example of the use of concentration v. area plots is
shown for As derived from lake sediments collected over the
Batchawana area. Plate 7 shows a colour-contoured image of As
values superimposed on the sample sites, together with a plot of
log10 As concentration v. log10 area occupied by each contour
interval. Distinct changes in the slope of the plot represent
breaks based on the spatial distribution of the data and each
break represents a threshold between populations of data
possibly derived from different processes. There are three
distinct trends shown on the concentration–area plot of Plate 7.
The regional background is characterized by a straight line of
points ranging from 0.7 (5 ppm) to 1.3 (20 ppm). Interpolated
As values greater than 5 ppm and less than 20 ppm are shown
as red, blue and cyan. This represents the regional background
of the area. The group of points that form a straight line from
1.3 (20 ppm) to 1.6 (40 ppm) represent the next population
reflecting As associated with mineralization and anthropogenic
effects. Anthropogenic effects are prevalent in the eastern part
of the map area, whereas As values associated with potential
mineralization are shown in the central and western part of
the map area. Values above 1.6 (40 ppm) represent a small
population of observations that are greater than 40 ppm
(shown as orange and red on the map). These observations occur in the SE portion of the map area and may represent areas of mineralization.

Cheng et al. (2000) have also implemented the use of power-spectrum methods to evaluate concentration–area plots derived from geochemical data. By the application of filters, patterns can be detected related to background and noise, thus enabling the identification of areas that are potentially related to mineralization. More details on this methodology can be found in Cheng (2006).

Multivariate data summaries

Scatterplot matrix

The ScatterPLOt Matrix (SPLOM) is a useful graphical multivariate method for visually assessing the relationships between variables. When categorical information is available, colour can be used to show differences between the categories.

Two areas were chosen from the Ben Nevis mapsheet (Plate 8): one representing an area of carbonate alteration and the other, an area of metavolcanics without carbonate alteration. Figure 5 shows a scatterplot matrix of a selected number of elements from the two areas. The matrix highlights associations and patterns in the data. There is a clear distinction between the altered and unaltered observations for CO2 with Co, Cu and Cr. CO2 shows an overall increase for the altered specimens, whereas the abundances of Cu, Cr and Co vary widely in a suite of specimens from the carbonate alteration zone. The distribution patterns for these elements can be studied further using other graphical measures such as box plots.

Fig. 5. Scatterplot matrix of altered and unaltered metavolcanics from the Ben Nevis area of Ontario. Carbonate altered rocks cluster differently from the non-altered rocks.

Multiple box plots

In Figure 6, box plots for nine elements from the Ben Nevis lithogeochemistry data show that there are clear differences in the geochemistry between the two areas. Box plots are a convenient way of summarizing the differences between groups of data. Note that there is a distinct shift in the median values for CO2 and Li (an increase) and a corresponding decrease in Ca and Sr for the specimens from the altered area. This is consistent with studies that indicate that there is overall loss of Ca and Sr in the zone of carbonate alteration, and an increase of Li and Na. Chromium, Na, Ni, Cu and Co show greater variability in the altered area. The greater variability is due to a breakdown of the original mineralogy accompanied by the addition of CO2, Si, Li, Cu and several other elements that are associated with hydrothermal activity and mineralization.

Fig. 6. Box plots showing the character of selected elements between the altered and unaltered sites.

Lattice graphics

Lattice graphics is a special graphics library in R that enables multivariate summaries of data for more effective visualization and subsequent interpretation (Sarkar 2008). For example, a correlation matrix can be expressed graphically as illustrated in Plate 9: this is a graphical expression of the correlation matrix of the lithogeochemical data from the Ben Nevis, Ontario area. The colour ramp, on the right side of the figure, gives the scale of the correlation coefficient from −1 (blue) to +1 (red). Thus the positive, negative and neutral associations of the elements can be quickly assessed.

DIFFERENTIATING GEOCHEMICAL BACKGROUND FROM ANOMALIES

The recognition of a geochemical anomaly requires that a geochemical background has been established, which in itself can be difficult to define. Geochemical values that depart from the background, that is, those values which are atypical, may be anomalous. Howarth & Sinding-Larsen (1983, p. 208) discuss the concept of anomaly and suggest that anomalous concentrations are those values that exceed a given threshold. Workshops held by the Association of Exploration Geochemists (AEG) in 1983 and 1985 (Garrett 1984; Aucott 1987) failed to give any formal definitions and concluded that an anomaly is a desired level of abundance in which the geologist has a particular interest and is different from the regional or background values. Joyce (1984, p. 15) discusses the definition of an anomaly in terms of an adequate definition of background.

Historically, values exceeding the 98th percentile were scrutinized for their potential to be identified as geochemical
anomalies. As well, the threshold was defined as the mean ± 2 standard deviations (Hawkes & Webb 1962; Howarth 1983, p. 208). This definition was based on the assumption of normality of the data. However, with the introduction of computer-based methods for evaluating geochemical data, the ability to study sample populations and the nature of geochemical distributions has provided powerful tools for the identification of outliers and specimens that might be related to mineralization targets (anomalies). As a result, choosing thresholds based on the calculation of the mean ± 2 standard deviations is no longer recommended (see Rose et al. 1979; Levinson 1980; Garrett 1989b). Filzmoser et al. (2005) describe an approach to outlier and anomaly detection using robust methods and adaptive techniques for recognizing outliers.

The threshold and pathfinder elements

An important goal of the investigation of geochemical data is the detection of spatially continuous zones of elevated values of a strategic element that exceed a specified threshold value. Observations that exceed the threshold are termed ‘anomalies’. Joyce (1984, p. 9–13) provides a detailed description of indicator and pathfinder elements and minerals that can be used in exploration strategies. Garrett (1991) defined the threshold as the outer limit of background variation; the term ‘outer’ is used instead of ‘upper’. This allows the definition to include both ‘upper’ and ‘lower’ limits, as it is common in some geochemical environments for depletion haloes to be as important as enrichment haloes. Reimann et al. (2005) further refined the definition of threshold and background based on robust methods.

the basis of examination of Mahalanobis distance plots or some other more robust measure of background and departures from it. Observations from distributions that represent processes of interest (mineralization or anthropogenic effects) usually overlap with observations from background distributions such that the threshold is more likely a range of values where the two distributions overlap. Rather than choose a specific threshold value, it may be better to assign a probability of the likelihood of an unknown specimen belonging to each population. In geochemical surveys, anomalies have a spatial association and are small and only occupy a fraction of the area that is covered by the regional population.

Plate 10 shows the threshold as determined by a visual inspection of the Q–Q plot. In this case, the threshold for K2O is chosen at 2.5 wt%, which is considered above the usual range of values for volcanic rocks. The values that exceed the threshold can be identified on the map by choosing a symbol size or colour to identify them.

Mineral deposits are often characterized by a unique suite of elements whose values exceed the threshold of the surrounding background material. These elements are called pathfinder elements and often have a greater spatial extent relative to the target being sought. In the Ben Nevis metavolcanic sequence, K can be considered as a pathfinder element. Elevated values of K are typically associated with epithermal Au deposits. Examination of the distribution of K2O in Plate 10b suggests that values above 2.5 wt% K2O are atypical and that value defines the threshold. The map of K2O values in Plate 10a indicates that K2O values greater than 2.5 wt% are associated with the two known mineral occurrences as well as several other sulphide-bearing occurrences.
The concept of threshold can be extended from single
element to multi-element data by the use of multivariate statistical Outliers or anomalies?
methods such as the use of the Mahalanobis distance (Garrett An outlier can be defined as an observation with a value that
1989c). In the multivariate case, the threshold can be selected on is distinctively different from observations with which it is
intimately associated. If a threshold has been defined, then an
outlier, by default, exceeds the threshold. Outliers may be of
significance from an exploration or contamination point of
view. An outlier may define a mineralized zone (anomaly) or
a value that is above an accepted environmental background
level. Outliers can also be artefacts of erroneous analytical
results or data entries. An outlier can be identified as a
geochemical anomaly if it exceeds the threshold and is neither the
result of an analytical problem nor assigned to an improper
population. In other words, an anomaly is associated with a
process of interest (alteration or mineralization), whereas an
outlier is a value without an interpretation that requires
further assessment.
Outliers should always be examined carefully to be certain
that the observed values are not the result of an error. An
observation that is an outlier in one group may be indistinguish-
able (masked) from other observations within another group.
In practice, outliers are assessed by a graphical examination of
the upper and lower rankings of the data and the identification
of observations that occur as distinct breaks from the back-
ground population. The application of a transformation may be
sufficient to separate the background from outliers.
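The sensitivity of the historical mean + 2 standard deviations rule to the very outliers it is meant to find can be illustrated with a short sketch. The robust alternative used here (median + 2 scaled median absolute deviations) is an assumption chosen for illustration and is not a method prescribed in the text; the concentration values are hypothetical.

```python
import statistics

def classical_threshold(values, k=2.0):
    """Upper threshold as mean + k standard deviations (the historical rule)."""
    return statistics.fmean(values) + k * statistics.stdev(values)

def robust_threshold(values, k=2.0):
    """Upper threshold as median + k * scaled MAD (illustrative robust choice).

    The factor 1.4826 makes the MAD comparable to a standard deviation
    when the background is approximately normal.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med + k * 1.4826 * mad

# Hypothetical element concentrations (ppm): a background near 11 ppm
# plus three extreme values.
values = [10, 12, 11, 9, 13, 12, 10, 11, 14, 9,
          12, 11, 10, 13, 11, 12, 10, 60, 75, 90]

flagged_classical = [v for v in values if v > classical_threshold(values)]
flagged_robust = [v for v in values if v > robust_threshold(values)]
print("classical:", flagged_classical)
print("robust:   ", flagged_robust)
```

In this sketch the extreme values inflate both the mean and the standard deviation, so the classical rule misses the 60 ppm observation, whereas the robust rule flags all three.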
Plate 11a shows a Q–Q plot of As from the lake sediment data. Arsenic, a pathfinder element, is commonly associated with gold deposits. An examination of the plot shows that ‘breaks’ occur at the approximate values of 20, 25 and 35 ppm. In comparison with the fractal approach, the break at 20 ppm is equivalent to the abrupt change in slope in Plate 7, where the concentration–area plot identifies a distinct change in the data population at a value of log10As=1.3 (19.95 ppm). These breaks most likely represent distinct populations that can be attributed to different source lithologies. The breaks are used as the basis for a change in symbol sizes on the map of Plate 11b. There are six extreme values that occur above the level of 35 ppm, which is considered to be the threshold. These values can be considered as anomalies because of the break in the slope of the curve and the distance between these values and the bulk of the population. These outliers would be of interest in a mineral exploration programme.
In the case of two or more (multi-modal) populations it is necessary to decompose the populations into separate distinct populations through the analysis of Q–Q plots, probability plots or by computer-based means (Sinclair 1976; Stanley 1987; Bridges & McCammon 1980). Garrett (1989c), Filzmoser et al. (2005) and Filzmoser & Hron (2008) have developed methods for outlier detection in multivariate data using a multivariate outlier plot, which identifies observations that appear to belong to a population different from the main population. This has obvious benefits in evaluating geochemical data for observations associated with alteration or mineralization.
Truncated and censored data
When an analytical procedure detects the presence of an element, but the value is too low to be accurately quantified, the value is reported as ‘less than the limit of detection’ (lld). The same applies for values that exceed the upper limit of detection. The lower/upper limits of detection are the limits of reliable quantification by the analytical procedure. Typically, a laboratory will report the value prefixed with a ‘<’ for a value less than the lld or ‘>’ for a value that exceeds the upper limit of detection. When a group of values contains observations that exceed the detection limits, the effect is called ‘censoring’.
Figure 7 shows the distribution of Co in metavolcanics collected during a lithogeochemical sampling programme in the Ben Nevis township area of Ontario. The analytical procedure for Co has a lower limit of detection of 5.0 ppm and 85 out of the 824 observations fall below that limit. The histogram of Figure 7a shows a bar with a high frequency of observations at the lowest end of the scale. This bar represents the 85 values that are less than the detection limit. The Q–Q plot (Fig. 7d) shows these values as a flat part of the distribution at the left side of the figure. The box and density plots (Fig. 7b, c) do not show the censored values as clearly. Historically, censored data were handled by applying a substitute value, somewhere between 1/3 and 1/2 of the actual detection limit. As the number of observations below the lld (censored) increases, this substitution will produce inaccurate estimates of the mean and variance (see Sanford et al. 1993).
Fig. 7. Cobalt (ppm) in metavolcanics, Ben Nevis Township, Ontario, Canada.
Several techniques have been developed to minimize the problem of censored data. The problem of censored data becomes more important when means of elements and covariances between elements are required. Using an arbitrary ‘replacement’ value (i.e. 1/2 or 1/3 the lld) can introduce bias in the computation of the moments of the distribution. However, if the nature of the distribution can be assumed as normal, then the replacement value of the censored data and parameters of the distribution (mean, variance) can be estimated based on the portion of the distribution that is not censored. The process of finding suitable replacement values is known as ‘imputation’ in the statistical literature. Estimates of the distribution parameters are obtained using the EM algorithm (Dempster et al. 1977), which is discussed by Chung (1985, 1988, 1989) and Campbell (1986). From these characteristics, an estimate can be made as to how the data are distributed below the lld. The assumption of normality is essential for the EM algorithm to work. Campbell (1986) invokes an algorithm to transform the data to normality using Box-Cox. Sanford et al. (1993) have developed a method that allows for the calculation of a suitable replacement value based on a maximum likelihood approach. Helsel (1990) provides a detailed discussion on dealing with missing data in environmental studies. Chung (1985, 1989), Campbell (1986) and Lee & Helsel (2005, 2007) have published computer procedures that estimate the mean and variance of censored distributions by calculating a replacement value that is derived from the characteristics of the uncensored portion of the sample population. Dickson & Giblin (2007) have used self-organizing maps as a means of finding suitable replacement values.
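The contrast between simple substitution and an EM-based estimate can be sketched as follows. This is a minimal illustration and not the procedure of any of the cited authors: it assumes a single normal population, a single lower limit of detection, and simulated data (true mean 10, standard deviation 2, lld of 8); the function name `em_left_censored_normal` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def em_left_censored_normal(observed, n_censored, lld, iterations=200):
    """EM estimate of the mean and standard deviation of a normal population
    when n_censored values are known only to be below lld."""
    n = len(observed) + n_censored
    std_normal = NormalDist()
    # Starting values: moments of the uncensored observations only.
    mu = sum(observed) / len(observed)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in observed) / len(observed))
    for _ in range(iterations):
        alpha = (lld - mu) / sigma
        ratio = std_normal.pdf(alpha) / std_normal.cdf(alpha)
        # E-step: conditional mean and variance of a value known to be < lld.
        cens_mean = mu - sigma * ratio
        cens_var = sigma ** 2 * (1.0 - alpha * ratio - ratio ** 2)
        # M-step: update the parameters using the expected sufficient statistics.
        new_mu = (sum(observed) + n_censored * cens_mean) / n
        ss = sum((x - new_mu) ** 2 for x in observed)
        ss += n_censored * (cens_var + (cens_mean - new_mu) ** 2)
        mu, sigma = new_mu, math.sqrt(ss / n)
    return mu, sigma

# Simulated (hypothetical) data, censored at lld = 8.
random.seed(1)
lld = 8.0
data = [random.gauss(10.0, 2.0) for _ in range(5000)]
observed = [x for x in data if x >= lld]
n_censored = len(data) - len(observed)

# Historical approach: replace every censored value with lld / 2.
substituted_mean = (sum(observed) + n_censored * lld / 2) / len(data)
em_mu, em_sigma = em_left_censored_normal(observed, n_censored, lld)
print(f"substitution mean: {substituted_mean:.2f}")
print(f"EM mean, sd:       {em_mu:.2f}, {em_sigma:.2f}")
```

With real survey data the normality assumption would need to be checked first, for example after a suitable transformation.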
Robust estimation
The presence of extreme or atypical values in a sample popu-
lation can have a dramatic effect on the estimation of the mean
and variance, which in turn will affect the estimation of
correlation and covariance with other variables. As these
measures of association are used by many statistical techniques,
it is useful to minimize the influence of atypical observations.
Methods of robust estimation are primarily concerned with
minimizing the influence of observations that are atypical. There
are several methods for determining robust estimates of location
(mean/median) and scale (variance). Robust estimation proce-
dures can be applied to both single and multivariate populations.
Good reviews on robust statistics can be found in Venables &
Ripley (2002, Chapter 5.5) and Daszykowski et al. (2007).
Geochemical distributions are often positively skewed and
lognormal in appearance. The skewed nature is commonly
attributed to a mixture of different populations and/or the
presence of outliers. For such distributions, a robust estimate of
the mean will be less than the standard estimate of the mean
because the influence of the long tail and outliers is reduced.
Methods for robust estimation of location and scale include
trimmed means, adaptive trimmed means, dominant cluster
mode, L-estimates, M-estimates and Huber W-estimates (see
Grunsky 2006).
Fig. 8. Ni in lake sediments, Batchawana area, Ontario.
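As a small illustration of why robust estimates of location fall below the ordinary mean for positively skewed data, the sketch below compares the mean, a symmetric trimmed mean and the median. The 10% trimming proportion and the concentration values are assumptions made for the example only.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after discarding the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.fmean(kept)

# Hypothetical positively skewed concentrations (ppm) with a long upper tail.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 10, 15, 40, 120]

ordinary_mean = statistics.fmean(sample)
robust_mean = trimmed_mean(sample, proportion=0.1)
sample_median = statistics.median(sample)
print(f"mean: {ordinary_mean:.2f}  10% trimmed mean: {robust_mean:.2f}  "
      f"median: {sample_median}")
```

The trimmed mean sits between the median and the ordinary mean because only the most extreme value in each tail is discarded.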
Transformation of data
Statistical testing and comparison between groups of data usually requires the estimation of means, variances and covariances. Most statistical procedures assume that the populations being tested are normal in nature. If there are outliers (extreme data values) or a mixture of populations (polymodal or skewed distributions) then the assumption of normality is violated. In right-skewed distributions (the most common effect observed with geochemical data), estimates of the mean exceed the median value. Similarly, the estimation of the variance is inflated for a skewed distribution. The skewed nature of the data can be overcome by applying a suitable transformation that shifts the values of the distribution such that it becomes normally distributed. It has been common in the geological literature to apply logarithmic transformations to data as a way to correct for a positive skew. The application of transformations to data should be carefully applied to avoid masking the presence of multiple populations and outliers (Link & Koch 1975). If transformations are applied to data to minimize the effect of skewness, then Q–Q plots of the transformed data should be examined for changes in slope or breaks in the line, as these features might suggest the presence of two or more populations.
Transformations that can be applied are:
+ linear scaling $y = kx$ or $y = (x_i - \bar{x})/s$, where $s$ is the standard deviation,
+ exponential $y = e^{x}$
+ Box-Cox generalized power transform $y = (x^{\lambda} - 1)/\lambda$, $y = \ln(x)$ for $\lambda = 0$.
The linear scaling transformations do not change the shape of the distribution; however, the degree of dispersion (variance) can change. The logarithmic (ln or log10), exponential and Box-Cox generalized power transforms modify both the shape and the dispersion characteristics of the distributions and are the transformations most commonly used. Howarth & Earle (1979) provided a computer program for estimating parameters for the generalized Box-Cox power transform based on the optimization of skew and kurtosis and the optimization of the maximum likelihood criterion of Box & Cox (1964). Lindqvist (1976) published a computer program (SELLO) for transforming skewed distributions based on minimizing skew.
In EDA, transformations are useful in assessing whether outliers are the result of a non-normal frequency distribution or are truly atypical values. The distribution should be examined for outliers both before and after a transformation has been applied to the data. Once any outliers are eliminated, the data should be re-examined for outliers as above until all are identified and eliminated. Campbell (1986) prepared computer programs that account for atypical values in the estimation of transformations and robust estimates of means and variances. Stanley (2006) discusses the application of transformations to maximize geochemical contrast and improve data presentation.
Figure 8 shows the effect of applying four different transformations on Ni for lake sediments from the Batchawana area of Ontario. The data are represented on Q–Q plots. Figure 8a shows the untransformed data; Figure 8b shows the log10 transformation of the data; Figure 8c shows a square root transformation; and Figure 8d shows a Box-Cox generalized transformation with a value of $\lambda$ determined after the top 5% of the data were trimmed. The resulting value of $\lambda = 0.08$ is close enough to zero that there is little difference between the log transform of Figure 8b and 8d.
Discussions on the application of transformations of geochemical data have traditionally been based on raw analytical values and the potential problems associated with closure have not been taken into account. Further research is required in this field.
LEVELLING GEOCHEMICAL SURVEY DATASETS
Regional exploration programmes and integration projects often involve the assembly of diverse sets of data. A common problem associated with the assembly of geochemical survey datasets is known as levelling. Levelling involves the adjustment of values of an element from one survey to be similar to the
values of another survey. This ‘similarity’ implies that the means, medians and variations are similar, or in other words, have the same parametric characteristics. Levelling geochemical survey data involves many assumptions and is mitigated by many factors, which are discussed below.
In many geochemical studies, the integration of several sets of data is necessary. Geochemical surveys may have been carried out over an extended period of time during which field sampling methods, sample preparation, methods of digestion and analytical instrumentation may have changed. Thus, there is the potential for a large degree of heterogeneity in the data that is not based on the underlying geology. It is not advisable to level the results of geochemical data derived from different methods of collection (media), preparation (digestion) or analytical methods. The detection limits may be different and there may be systematic shifts between the groups of data. In order to use these data effectively, one or more sets of data must be adjusted. This is known as levelling. One set of data is chosen against which all other sets of data will be levelled. The relationship of each element is compared and an adjustment is made through the application of a linear transformation. Given an observation x with variables $x_i$ ($i = 1, \ldots, n$),
$y_i = a x_i + b$
where
$x_i$ is the unadjusted variable for observation x,
$y_i$ is the adjusted variable for observation x,
$a$ represents the slope of the line in the transformation,
$b$ represents the intercept or additive adjustment.
The adjustment can be determined through regression methods. Non-linear transformations may also be applied if necessary. Figure 9 shows the types of levelling scenarios that can be encountered. The x and y axes of each panel show the values of the quantiles (values at 5, 10, 15, etc. percentiles) for the two variables. With the exception of Figure 9e, each scenario shows a possible relationship that will permit levelling. Figure 9e shows a random association between the two variables and in this case levelling is not possible. A detailed example of levelling geochemical data is provided below.
There are several challenges in levelling data, the first of which is the choice of data against which to level everything else. Considerable time should be spent on assessing the variability of each element across all of the surveys to be levelled. There may or may not be one set of survey data that can be used as the benchmark dataset for all elements. Choosing when an element requires levelling must be carried out with caution. Comparing values on maps using bubble plots can be misleading, unless the data are evaluated using the same range and scaling.
Assembling a large number of geochemical surveys and evaluating the need for levelling can be a challenging problem. Trepanier (pers. comm. 2006; Identifcation de domains géochemiques à partier des levés régionaux de sediments de fond de lacs, Projet 2004–09. Presentation at the Consortium de recherche en exploration minérale) developed an iterative and adaptive method for levelling a large number of surveys. The method assumes that, for each element, one set of survey data represents the standard by which all other surveys will be levelled. All data are stored in a database and an automatic procedure is invoked to search through and adjust the data for each element. The method is computationally intensive and time-consuming.
As shown in Figure 9, there are four typical scenarios for levelling between two datasets. Note that in Figure 9, the values that are plotted are the values at specified quantiles of the data (i.e. the 5th, 10th, 15th, ..., 90th and 95th percentiles). The worst possible scenario is shown in Figure 9e where no levelling is possible because no linear relationship exists between the two sets of data. It is also possible that a non-linear shift or multiplier will level two datasets. Graphical inspection of quantile plots between two sets of data should be carried out prior to assessing the type of levelling required.
Daneshfar & Cameron (1998) have demonstrated a method of levelling geochemical data described in Darnley et al. (1995) that accounts for the geology that underlies geochemical data survey sites. The method requires the use of GIS and a statistical package that computes quantiles and linear regression.
A strategy for levelling several datasets involves the determination of which dataset should be chosen for all of the other databases to be levelled against. The choice of this dataset, the ‘standard dataset’, will depend on the following factors: spatial proximity of the two datasets; accuracy and precision of the standard dataset; and that the standard dataset contains enough specimens and enough elements so that the other datasets can be levelled to it.
The integration of geochemical survey datasets requires the identification of several key parameters so that the data can be accurately interpreted, that is: type of media; method of preparation; method of digestion; method of analysis; and lower and upper limits of detection.
If levelling involves geochemical datasets where these characteristics are different then it may be unwise to attempt to level the data. An alternative approach is to map the departure from the median or some other measure that characterizes individual specimens against the distribution for a particular area. Non-spatial levelling is often required (i.e. adjusting location and scale) to remove boundary effects and the comparison of different analytical methods. The following discussion describes some of the challenges associated with levelling geochemical survey datasets.
The lower and upper limits of detection are commonly different between geochemical survey reports. This is due to
the nature of the method of analysis and the developments in the analytical procedures that have taken place over time. As the technology of geochemical analysis improves, the lower limits of detection also decrease. Thus, when merging geochemical survey datasets, the choice of a replacement value for the lower limit of detection (lld) may become an issue. A straight replacement method of a single value will not be sufficient because the replacement value is used only to ensure a better estimate of the mean and variance of the data. Varying detection limits within a large dataset assembled from many sources may create significant problems when deciding on a replacement value. One approach is to set the lower limit of detection at the weighted median value for the range of llds in the dataset. A replacement value can then be determined based on the number of observations and associated llds.
Fig. 9. Levelling scenarios for geochemical data.
Fig. 10. Boxplots of Zn from the five survey areas, Batchawana area, Ontario.
Plate 1. General geology of the Ben Nevis Township area, Ontario, Canada.
Plate 2. General geology of the Batchawana area, Ontario, Canada.
Plate 3. Location of the soil survey area, Island of Sumatra, Indonesia.
Plate 4. Lithologies of the Campo Morado area, Mexico.
Plate 5. Density plot of arsenic versus gold displaying censoring and quantization of the analytical data.
Plate 6. (a) Exploratory data analysis of arsenic in lake sediments, Batchawana area, Ontario. (b) Arsenic (log10) in lake sediments, Batchawana area, Ontario.
Plate 7. Arsenic from lake sediments, Batchawana area, Ontario. The contoured image reflects the area associated with each As contour level. The corresponding concentration–area plot displays changes in slope, which reflect changes in spatial patterns. These changes are associated with differences in geology, anthropogenic effects and mineralization.
Plate 8. Map of altered/unaltered sampling sites in the Ben Nevis Township area.
Plate 9. Correlation matrix expressed in terms of colour. The scale bar on the right of the matrix provides the measure of correlation based on colour.
Plate 10. K2O map across Ben Nevis Township. Separation of atypical K2O values.
Plate 11. Map of atypical As (ppm) across the Batchawana area, Ontario.
Plate 12. Lake sediment survey sites across the Batchawana area, Ontario.
Plate 13. Unlevelled Zn values in lake sediments, Batchawana area, Ontario.
Plate 14. Band selection for quantile regression. Zn in lake sediments, Batchawana area, Ontario.
Plate 15. Levelled Zn values after applying quantile regression based on the 25 km band selection. See text for a detailed explanation.
Levelling geochemical survey datasets: an example using lake sediments in Northern Ontario
Plate 12 shows sites for five different lake sediment surveys in the Batchawana greenstone belt of Northern Ontario. These five surveys were collected during the 1980s by Fortescue & Vida (1989, 1990, 1991a, b). Hamilton (1995) describes the results of the survey conducted by Fortescue in the Cow River Area. The area is an Archaean volcano-sedimentary terrane within the Abitibi-Wawa subprovince of the Superior Province. The geology of the area is described by Grunsky (1991).
Regional lake sediment surveys were carried out in five areas: Pancake Lake, Trout Lake, Hanes Lake, Montreal River and Cow River. The sampling programme was carried out over several years and the methods of analysis were similar for all five datasets. However, a levelling problem does exist amongst the survey areas. The greatest difference between geochemical data exists between the Cow River map sheet and the adjacent Montreal River and Hanes Lake survey areas.
Figure 10 shows the range of values for Zn over the five areas in the Batchawana area. The interquartile range, shown in the solid box, is significantly higher for the Cow River data than for the other survey areas. However, the Cow River area also contains abundant mafic volcanic rocks of tholeiitic affinity that would naturally tend to have higher Zn values relative to the other survey areas which are composed of a mixture of tholeiitic and calc-alkaline volcanics, sediments and granitoid rocks. Plate 13 shows a map of Zn values throughout the region. The levels of Zn in the Cow River area (NE corner) are high relative to the other areas. There are a number of high Zn values within the centre of the volcanic sequence and these could be considered legitimate. However, the Cow River background Zn values appear to be 10–20 ppm higher than the background for the adjacent areas.
Using the approach outlined by Daneshfar & Cameron (1998), a quantile regression technique was applied. The procedure involves selecting ‘bands’ of specific distances (5, 10, 15, 20, 25 km, or some suitable scale depending on the nature of the surveys) between adjacent map sheets from which quantile regression is carried out for each of the bands. The reasoning for choosing bands is that an optimum distance, which results in the selection of an optimal number of specimens, will result in a best-fit quantile regression formula for levelling.
Plate 14 shows the selection of bands that were made for levelling the Cow River survey area against the Hanes Lake survey area. Bands were selected at the 5, 10, 15, 20 and 25 km ranges in a north–south direction.
For each of these bands, a linear regression was carried out. A measure, D, is used to determine which band provides the best quantile regression. D is defined as:
$$D = \sum_i w_i \left[ (q_i)_e - (q_i)_{e'} \right]^2$$
where
$w_i$ is the assigned weight to the ith quantile,
$(q_i)_e$ is the ith quantile in band of width $e$,
$(q_i)_{e'}$ is the ith quantile in band of width $e'$ in the adjacent map sheet,
$e$ is the width of the band expressed as a measure of distance (i.e. m or km).
The weights favour quantile pairs at or near the median (50th percentile) of the distribution and are based on the ordinates of a normal distribution (weight for the median value = 0.399). These weights are listed in Table 2.
The work by Daneshfar & Cameron (1998) was originally carried out in British Columbia where the adjoining map sheets show broad geological similarity. When the same approach was tried in the Batchawana area the selection of bands of appropriate size became problematic.
Because of the deformed nature of the rocks and the subvertical stratigraphy, there is a significant variation in geochemical character over short distances. Figure 11a shows the results of the values of D applied to the five band selections and it is clear that the 5 km and 25 km bands have the lowest D values. The difference in D values for the different band selections is mostly due to the diversity of lithologies associated with each band. For the 5 km band, the lithologies are similar on both sides of the survey boundary: mafic volcanic and granitoid rocks. However, for the 10, 15 and 20 km bands, Plate 14 shows that there is a range of lithologies within the bands between the two surveys and the
Table 2. Weights used for quantile regression in levelling geochemical data.
Regression weights
Quantile   5      10     20     30     40     50     60     70     80     90     95
Weight     0.103  0.175  0.28   0.348  0.386  0.399  0.386  0.348  0.28   0.175  0
In Daneshfar & Cameron (1998) the weight for the 95th
percentile was chosen as 0.103. For this application, many of the
values for the Cow River Zn data were atypical and represented
a group of specimens unique to Zn mineralization within the
mafic volcanic sequence. There was no equivalent Zn response in
the Hanes Lake survey area. Thus, the 95th percentile weight was
changed from 0.103 to zero so that the effects of these large Zn
values did not bias the levelling of the background.
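The weighting scheme and the measure D can be sketched as follows. This is an illustration under stated assumptions rather than the published procedure: the weights are computed as ordinates of the standard normal density at each percentile (which reproduces the values in Table 2 to three decimals), the 95th-percentile weight is set to zero as described above, and the two bands of Zn values are hypothetical. The function names are illustrative only.

```python
from statistics import NormalDist, quantiles

PERCENTILES = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

def normal_ordinate_weights(percentiles, zero_top=True):
    """Weights equal to the standard normal density at each percentile.

    With zero_top=True the weight of the last (highest) percentile is set
    to zero so that atypical high values do not influence the comparison.
    """
    std = NormalDist()
    weights = [std.pdf(std.inv_cdf(p / 100)) for p in percentiles]
    if zero_top:
        weights[-1] = 0.0
    return weights

def percentile_values(values, percentiles):
    """Values of the data at the requested integer percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    return [cuts[p - 1] for p in percentiles]

def band_mismatch(band_a, band_b, percentiles=PERCENTILES):
    """Weighted sum of squared differences between paired quantiles (D)."""
    weights = normal_ordinate_weights(percentiles)
    qa = percentile_values(band_a, percentiles)
    qb = percentile_values(band_b, percentiles)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, qa, qb))

weights = normal_ordinate_weights(PERCENTILES)

# Hypothetical Zn values (ppm) in two bands either side of a survey boundary;
# the second band is simply shifted upward by 10 ppm.
band_a = list(range(20, 120))
band_b = [v + 10 for v in band_a]
d_value = band_mismatch(band_a, band_b)
print([round(w, 3) for w in weights])
print(f"D = {d_value:.1f}")
```

A lower D indicates that the quantiles of the two bands agree more closely, which is the criterion used above to compare band widths.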
The values of D, regression coefficients (intercept, slope)
and plot of the quantiles for the 5 km band selection are shown
in Figure 11b and for the 25 km band selection in Figure 11c.
From the two plots, it can be seen that the 25 km band is a
better fit and the results from this regression were used to
adjust the Zn values in the Cow River survey area. Note that
the results of this regression are equivalent to the shift and
multiplier effect as shown in Figure 9d.
The results of applying the regression to the Cow River
survey data for Zn are shown in Plate 15. The levelling
procedure has had a significant effect on the lower values of Zn
in the granitoid terrane but left the upper values, associated
with the mafic volcanic rocks and some Zn rich zones within
the volcanic sequence, relatively unaffected.
Levelling that combines GIS and statistical procedures can produce
an optimal result, and a combination of these tools is a
recommended way to level geochemical survey data.
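The regression step of the levelling procedure can be sketched as a weighted least-squares fit of the quantiles of the reference survey on the quantiles of the survey to be levelled, followed by the linear adjustment y = ax + b. The quantile values below are hypothetical and were constructed so that the background follows y = 0.8x - 5; they are not the Batchawana results. The weights are those of Table 2, with the 95th-percentile weight set to zero.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares slope a and intercept b for y = a * x + b."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Table 2 weights for the 5, 10, 20, ..., 90 and 95th percentiles.
WEIGHTS = [0.103, 0.175, 0.28, 0.348, 0.386, 0.399,
           0.386, 0.348, 0.28, 0.175, 0.0]

# Hypothetical Zn quantiles (ppm) for the survey to be levelled and for the
# reference survey. The last pair represents atypical, mineralization-related
# values and carries zero weight.
to_level_q = [40, 45, 52, 58, 63, 68, 74, 81, 90, 105, 180]
reference_q = [27, 31, 36.6, 41.4, 45.4, 49.4, 54.2, 59.8, 67, 79, 95]

a, b = weighted_linear_fit(to_level_q, reference_q, WEIGHTS)

# Apply the adjustment to some hypothetical Zn values from the survey.
survey_values = [42, 60, 75, 110]
levelled = [a * v + b for v in survey_values]
print(f"a = {a:.3f}, b = {b:.3f}")
print([round(v, 1) for v in levelled])
```

Because the top quantile pair has zero weight, the fitted line describes the background relationship only, which mirrors the reasoning given for changing the 95th-percentile weight.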
MULTIVARIATE DATA ANALYSIS TECHNIQUES

Multivariate data analysis techniques such as PCA, cluster analy-
sis, non-linear mapping and projection pursuit regression pro-
vide numerical and graphical means by which the relationships
of a large number of elements and observations can be studied.
These techniques typically simplify the variation and relation-
ships of the data in a reduced number of dimensions, which
may commonly be tied to specific geochemical/geological pro-
cesses. The basics of multivariate data analysis techniques can
be found in Jöreskog et al. (1976), Howarth & Sinding-Larsen
(1983), Krzanowski (1988), Reyment & Jöreskog (1993) and
Davis (2002). Mellinger (1987) provides a systematic approach
to the application of multivariate methods in geological studies.
Other methods include non-linear mapping (Sammon 1969),
projection pursuit (Friedman 1987), multi-dimensional scaling
(Kruskal 1964) and self-organizing maps (Kohonen 1995). A
recent technique, independent components analysis (Comon
1994), is similar to the method of projection pursuit.
Incorporation of the spatial association with multi-element geochemistry involves the computation of auto- and cross-correlograms or co-variograms. This field of study falls into the realm of geostatistics, which is not covered in this contribution. A number of texts are available that provide details on geostatistics (David 1977, 1988; Journel & Huijbregts 1978; Isaaks & Srivastava 1989).
Grunsky (1986a) employed the use of PCA and clustering methods to evaluate the lithogeochemistry of Archaean volcanic terrains from which a number of geological processes were inferred, ranging from primary compositional variation to alteration and associated mineralization. This is discussed in greater detail below.
Fig. 11. Selection of optimum band width and quantile regression for Zn in lake sediments, Batchawana area, Ontario.
lithologies are most dissimilar for the 15 km band. At the 25 km band, it is not surprising that the D value is lowest for the similar range of lithologies between the two survey areas and was thus the best band for the quantile regression methodology.
Quantile regressions were computed for both the 5 and 25 km bands (Fig. 11b, c) using the weights for each quantile, which are shown in Table 2.
Fig. 12. Quantile–quantile plots of log-centred major and trace elements for the Ben Nevis lithogeochemical data.
Multivariate techniques that have been developed specifically for geochemistry include various empirical techniques such as the chalcophile and pegmatophile indices developed by Smith & Perdrix (1983), which were used to outline areas of potential base and precious metal mineralization in the Yilgarn craton of Western Australia.

Robust estimation of mean and covariance matrices
Many multivariate methods require estimates of correlation or covariance so that interrelationships between the variables can be quantified. Estimates of correlation/covariance are sensitive to the presence of outliers in the data that can bias the results. The influence of outliers can be reduced by applying robust methods to the estimation of the means, correlations and covariances between variables. In multivariate analysis, the distance of an observation to a centroid is estimated by the Mahalanobis distance, which depends on an estimate of the multivariate mean and covariance. The Mahalanobis distance is defined as:

$$D^2 = (x - \bar{x})' C^{-1} (x - \bar{x})$$

where:
$x$ is a vector of variables for a given observation;
$\bar{x}$ is a vector of the group mean;
$C^{-1}$ is the inverse of the covariance matrix.

There are many techniques for determining robust estimates of mean and variances for individual populations (Rock 1987, 1988). Robust estimates can be determined for each individual variable or simultaneously for all variables. Multivariate estimates are affected by observations with missing values (no value) in any one of the individual variables. These must be discarded or have some suitable replacement value. Additionally, observations that are censored (less than the detection limit) must have a proper replacement value as discussed previously. Campbell (1980) gave some early insight into the application of robust procedures in multivariate analysis. Venables & Ripley (2002, p. 336) provided a good discussion on robust estimation methods.
Two methods can be used to obtain robust multivariate estimates of means and covariance:

1. Minimum Volume Ellipsoid (MVE). A multivariate method of determining means and correlations/covariances with minimal effect from outliers based on finding a hyperellipsoid that contains a subset of 'good' observations that minimize the volume of the ellipsoid. A geochemical application of this method is given by Chork (1990).
2. Minimum Covariance Determinant (MCD) Estimation. This method works by minimizing the determinant (a measure of ellipsoid volume) of the covariance matrix based on a symmetric Gaussian hyperellipsoid. The method is faster than the minimum volume ellipsoid but has a lower breakdown point (Rousseeuw & van Driessen 1999). The determinant is based on a minimum number of 'good' observations. As the determinant decreases, the dispersion of the ellipsoid decreases with a corresponding drop in the estimates of central values, resulting in a 'robust' estimate.

If there are many observations with values at the same detection limit, a condition of collinearity occurs, which has a direct effect on the covariance matrix. If there are too many identical observations, the method fails. However, by increasing
the number of observations, the methods will generate less robust estimates. In the case of non-normal skewed distributions, the means and covariances will be affected. This type of problem is typically encountered when a percentage of the observations have elements with abundances below the detection limit (censored data) and increases the likelihood of collinearity problems.
An example of applying multivariate robust estimates is shown in Table 3 where estimates of the mean for 12 elements are given for 825 lithogeochemical observations from the Ben Nevis Township lithogeochemical data set. In this table, only estimates of the mean are shown. Classical estimates of the mean, based on univariate statistics, multivariate classical estimate, minimum volume ellipsoid and minimum covariance determinant methods are shown. Compared with classical methods of estimation, the robust estimate tends to minimize the effect of those distributions that are skewed.
For the minimum covariance determinant method, two estimates are shown based on two groups of 'good' observations. The initial estimate for the MCD used 419 observations based on an initial starting formula of (825 observations + 12 variables + 1)/2. Because of the large number of observations with values at the detection limit, the initial MCD estimate was singular. The MCD was applied using 540 and 800 observations. Table 3 shows that as the number of 'good' observations increases, the mean value tends towards the standard estimate, where the effect of the long tailed skewed distribution increases the estimate of the mean for several elements.

PRINCIPAL COMPONENT ANALYSIS

The objective of Principal Component Analysis (PCA) is to reduce the number of variables necessary to describe the observed variation within a dataset. This is achieved by forming linear combinations of the variables (components) that describe the distribution of the data. These linear combinations are derived from some measure of association (i.e. correlation or covariance matrix). Davis (2002, Chapter 6) gives a very readable account of the mathematics of PCA. More complete discussions on the theory and application of PCA can be found in Jöreskog et al. (1976), Jolliffe (2002) and Jackson (2003). Appendix 2 provides a simple geometric description of PCA.
A method of PCA known as simultaneous RQ-mode principal component analysis (Zhou et al. 1983) has the advantage of presenting the principal component scores of the observations and the variables (elements) at the same scale, which permits plots of the observations and variables on the same diagram. This method is similar to the biplot method of Gabriel (1971). The interpretation of the results of PCA is usually oriented on placing a geological/geochemical interpretation on the linear combinations of elements (loadings) that comprise the components. This method has been implemented in the S programming language (Grunsky 2001).
Ideally, each principal component might be interpreted as describing a geological process such as differentiation (partial melting, crystal fractionation, etc.), alteration/mineralization (carbonatization, silicification, alkali depletion, metal associations and enrichments, etc.) and weathering processes (bedrock–saprolite–laterite). In lithogeochemical, weathered profile, lake sediment and stream sediment surveys, the first and second components commonly reveal relationships of observations and variables that reflect underlying lithological variation. In areas of thick overburden such as glacial till, alluvium or colluvium, the linear combinations of variables and the plots of the loadings may not be so easy to interpret as they may reflect a mixture of several surficial processes.
Maps of the principal component scores of the observations can be useful in understanding geochemical processes. If a component expresses underlying lithologies, then a map of that component will clearly outline the major lithological variation of the area. Components that outline other processes such as mineralization or alteration can also be expressed clearly on maps that display the component scores (e.g. Grunsky 1986a).
The measure of association, or metric, can have a significant effect on the derivation of principal components. Covariance relationships between the elements reflect the magnitude of the elements and thus elements with large values tend to dominate the variance–covariance matrix. This has the effect of increasing the significance of these elements in the results of the PCA. The correlation matrix represents the inter-element correlations, which is actually the standardized equivalent of the variance–covariance matrix. Other metrics of association can be used and this is discussed by Jöreskog et al. (1976) and Davis (2002). If the distributions of the elements are non-normal or outliers are present, the estimates of correlation/covariance may be affected and it may be necessary to apply robust procedures (Zhou 1985, 1989).
In situations where there are outliers or atypical observations, or where the marginal distributions are not normal, a number of choices can be made:

1. If the marginal distribution is censored, find a suitable replacement value so that the mean and variance are good estimates of the population mean and variance. This can be done by:
   a) assigning a replacement value that is c. ½ to ⅓ the censored value;
   b) using statistical procedures to estimate (impute) a replacement value based on the statistical characteristics of the uncensored portion of the data (i.e. the EM method) discussed previously.

2. If there are outliers present:
   a) remove the outliers from the calculation for means and covariances;
   b) apply robust procedures that minimize or eliminate the effect of these values.

Rare events, such as mineral occurrences or deposits, are usually under-represented in regional geochemical survey sampling schemes. A chemical signature that may be
Table 3. Robust and non-robust estimates of central values, Ben Nevis Township lithogeochemistry.
Method                                              Ba   Co  Cr  Cu  Li  Ni  Pb  Zn  Sr   V    Y   Zr
Univariate mean                                     208  23  83  56  17  78  17  89  135  132  24  132
Classical robust estimate                           208  23  83  56  17  78  17  89  135  132  24  132
Univariate median                                   170  24  68  42  14  85  5   74  120  150  21  130
Minimum volume ellipsoid                            194  22  81  38  15  78  7   73  140  139  26  138
Minimum covariance determinant (800 observations)   207  23  84  47  17  79  10  78  136  133  24  132
Minimum covariance determinant (540 observations)   198  22  82  39  15  79  6   73  140  139  25  136
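The quantities behind Table 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the paper: it computes the classical mean and covariance, the squared Mahalanobis distance D², and the initial MCD subset size (n + p + 1)/2 quoted in the text. The helper names are hypothetical, and the small Gaussian-elimination solver stands in for a proper linear algebra library. A full MCD or MVE search is deliberately not attempted here.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (a[i][n] - s) / a[i][i]
    return y

def mean_and_covariance(rows):
    """Classical (non-robust) mean vector and sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov

def mahalanobis_sq(x, mean, cov):
    """D^2 = (x - mean)' C^-1 (x - mean), without forming C^-1 explicitly."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))

def mcd_initial_subset_size(n_obs, n_vars):
    """Initial number of 'good' observations, (n + p + 1)/2 as in the text."""
    return (n_obs + n_vars + 1) // 2
```

For the Ben Nevis example, `mcd_initial_subset_size(825, 12)` gives the 419 observations quoted above. A subset made up of identical rows (for instance, many values at the same detection limit) produces a singular covariance matrix, and `solve` raises an error, which mirrors the failure described in the text.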
Fig. 13. Biplot of the first two principal components for the Ben Nevis lithogeochemical log-centred data.

Fig. 14. Biplot of the first and third principal components for the Ben Nevis lithogeochemical log-centred data.

diagnostic of a unique geological event may show up as a linear combination of elements with a lesser principal component. Thus, it is important to scan all of the components to check for such features.
The following examples illustrate the use of PCA from the Ben Nevis metavolcanic data (see Plate 1). As it is a 'compositional' set of data, it sums to a constant (100%). The data were transformed using the logcentred transformation method described previously. The distributions for these transformed variables are shown in Figure 12.
The results of the PCA are shown in Table 4 where the eigenvalues, R-mode loadings, as well as the relative and actual contributions of the variables are presented. Results are shown for the first seven components only, which account for more than 72% of the variation in the data. The accompanying scree plot displays the successive eigenvalues for all of the components.
The R-mode loadings are the eigenvectors scaled by multiplying, in order, each of the eigenvectors by the square root of the eigenvalues. The first component accounts for 34% of the overall variation of the data as shown by the eigenvalues. The relative and actual contributions shown in Table 4 provide details on the relative significance of the variables. The relative contribution is the contribution that a variable makes over all of the components. The actual contribution is the contribution that a variable makes within a given component.
Biplots of PC1 v. PC2 and PC1 v. PC3 are shown in Figures 13 and 14, respectively. The scores of the observations are shown as crosses and the scores of the elements are shown as their name. Figure 13 (PC1 v. PC2) shows that the compositions of the mafic (Ni, Cr, Co, Mg, Fe) rocks plot on the positive side of PC1. Rocks reflecting felsic metavolcanic rocks (Si, Zr, Ba, K, Y, Al) plot on the negative side of PC1. Observations with relative enrichment in CO2, S, Li, Pb and Cu plot along the positive side of the PC2 axis. Figure 14 is a biplot of the first and third components where samples with relative enrichment in S and Cu plot along the negative side of the PC3 axis.
Examination of the relative contributions for the first component shows that elements such as Si, Al, Mg, K, Ba, Co, Cr, Ni, V and Zr are accounted for primarily by this component. The actual contribution shows that the variation is spread almost equally amongst Si, Mg, K, Ba, Co, Cr, Ni, V and Zr within the first component (see Table 4). The relative contributions of the second component suggest alteration of the volcanic rocks with high loadings for CO2, S, Li, Sr, Ti, Na, Ca, Fe3+ and Al. The relative contributions of the third component suggest alteration associated with more mafic rocks as indicated by Fe2+, Mn, CO2, S, H2O+, Cu and Li.
The Q-mode scores were interpolated to a 100 m resolution grid by kriging. Plate 16 shows an interpolated image of the first principal component. The distinction between the mafic and felsic volcanic rocks is evident by the colour map of the image. Green and blue areas are associated with felsic rocks and red to yellow areas are associated with mafic rocks as shown in the relationships of the observations and elements in Figure 13.
Plate 17 shows an image of the second principal component, which accounts for 11% of the variation in the data. The plot of PC1 v. PC2 in Figure 13 shows that the second component has Cu, Li, S, Pb and CO2 associated with positive values of PC2. The image of Plate 17 shows that areas in red–yellow correspond to the zones of carbonate alteration and mineralization that are present around the Canagau Mines deposit and the Croxall property.
Plate 18 is an image of the third principal component (7.8% of the variation in the data). Areas associated with S and Cu enrichment are evident, most notably around the Canagau Mines Cu–Au deposit in the eastern part of the image. These areas are also adjacent to areas of CO2, Li, and Zn enrichment, which represent altered and mineralized country rocks that surround the S–Cu zones of relative enrichment. Figure 14 shows that positive values correspond with areas of increased CO2, Li, and Zn enrichment and negative values with S and Cu enrichment.
Much more information can be obtained by examining all of the principal components. Other components exhibit zoning of Ca around the main zone of carbonate alteration and K has an association with S at the mineral occurrences. The fourth component highlights the relationship between Zn and S at both the Canagau and Croxall properties. However, the illustration of the first three components shows that PCA is an effective method for exploring the structure of the geochemical data and assisting in deriving models of geochemical processes by the use of graphics and geographic representation.
PCA has many different uses in evaluating geochemical data, including the development of empirical indices for specific element targeting (see sections on Empirical indices and Weighted sums).

CLUSTER ANALYSIS METHODS

Cluster analysis methods are useful as an exploratory tool for detecting groups of multi-element data that may not be readily observable in simple scatter plots or through the use of methods such as PCA. The main objective of clustering algorithms is to identify distinct natural groupings within multi-dimensional data. Clustering methods can be broadly divided into hierarchical and non-hierarchical methods. The
Table 4. Principal components analysis of Ben Nevis lithogeochemical data. Analysis carried out on log-centred data.
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Eigenvalue 8.93 2.86 2.03 1.56 1.28 1.15 0.99
% 34.38 11.00 7.83 6.03 4.94 4.41 3.80
Cumulative % 34.38 45.38 53.22 59.24 64.18 68.59 72.39
R-mode loadings (values <0 in italics)
SiO2 0.87 0.26 0.03 0.06 0.04 0.06 0.11
Al2O3 0.72 0.48 0.01 0.07 0.18 0.15 0.18
Fe2O3 0.17 0.48 0.16 0.55 0.01 0.06 0.18
FeO 0.63 0.15 0.46 0.25 0.03 0.03 0.11
MgO 0.86 0.03 0.16 0.09 0.19 0.14 0.02
CaO 0.40 0.47 0.01 0.40 0.25 0.28 0.10
Na2O 0.36 0.44 0.06 0.40 0.15 0.15 0.04
K2O 0.69 0.19 0.08 0.03 0.34 0.16 0.27
TiO2 0.43 0.60 0.02 0.12 0.02 0.14 0.08
P2O5 0.12 0.29 0.10 0.01 0.14 0.79 0.24
MnO 0.20 0.25 0.57 0.01 0.47 0.31 0.07
CO2 0.35 0.42 0.37 0.51 0.24 0.16 0.02
S 0.30 0.49 0.41 0.28 0.37 0.07 0.07
H2O+ 0.47 0.07 0.43 0.38 0.27 0.03 0.30
Ba 0.76 0.00 0.06 0.03 0.39 0.16 0.20
Co 0.88 0.15 0.11 0.03 0.05 0.04 0.05
Cr 0.86 0.03 0.03 0.12 0.02 0.20 0.11
Cu 0.31 0.29 0.55 0.24 0.18 0.15 0.04
Li 0.06 0.49 0.56 0.10 0.39 0.09 0.20
Ni 0.92 0.04 0.08 0.10 0.07 0.08 0.05
Pb 0.47 0.33 0.04 0.17 0.14 0.12 0.44
Zn 0.16 0.01 0.31 0.40 0.02 0.16 0.55
Sr 0.11 0.53 0.28 0.21 0.21 0.05 0.24
V 0.80 0.07 0.22 0.04 0.13 0.13 0.07
Y 0.67 0.37 0.21 0.13 0.27 0.13 0.02
Zr 0.80 0.22 0.16 0.13 0.09 0.13 0.00
Relative Contributions values <10 in italics Actual Contributions values <10 in italics
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC1 PC2 PC3 PC4 PC5 PC6 PC7
SiO2 76.63 6.93 0.11 0.42 0.16 0.38 1.13 SiO2 8.57 2.42 0.05 0.27 0.12 0.33 1.15
Al2O3 51.37 23.50 0.01 0.46 3.28 2.32 3.21 Al2O3 5.75 8.21 0.00 0.29 2.56 2.02 3.25
Fe2O3 2.82 22.97 2.57 30.83 0.02 0.41 3.21 Fe2O3 0.32 8.03 1.26 19.68 0.02 0.36 3.24
FeO 40.09 2.39 20.99 6.04 0.10 0.08 1.23 FeO 4.48 0.84 10.31 3.86 0.08 0.07 1.25
MgO 74.13 0.08 2.57 0.85 3.52 1.92 0.05 MgO 8.29 0.03 1.26 0.54 2.74 1.67 0.05
CaO 15.68 21.67 0.01 16.36 6.45 8.04 0.94 CaO 1.75 7.57 0.01 10.44 5.02 7.01 0.95
Na2O 13.08 18.99 0.42 16.27 2.17 2.15 0.19 Na2O 1.46 6.64 0.21 10.39 1.69 1.87 0.19
K2O 48.18 3.78 0.66 0.09 11.60 2.44 7.38 K2O 5.39 1.32 0.32 0.05 9.04 2.13 7.47
TiO2 18.84 35.64 0.03 1.51 0.05 1.87 0.60 TiO2 2.11 12.46 0.01 0.96 0.04 1.63 0.61
P2O5 1.53 8.51 1.03 0.01 1.89 62.58 5.68 P2O5 0.17 2.97 0.51 0.01 1.47 54.55 5.75
MnO 4.19 6.41 32.38 0.01 22.15 9.47 0.43 MnO 0.47 2.24 15.90 0.01 17.25 8.25 0.44
CO2 12.34 17.52 13.93 25.77 5.67 2.47 0.05 CO2 1.38 6.12 6.84 16.45 4.42 2.15 0.05
S 8.95 24.51 16.44 7.59 13.50 0.43 0.54 S 1.00 8.57 8.07 4.84 10.52 0.38 0.55
H2O+ 21.91 0.53 18.59 14.41 7.24 0.07 8.99 H2O+ 2.45 0.19 9.13 9.20 5.64 0.06 9.10
Ba 57.25 0.00 0.37 0.09 15.04 2.67 3.82 Ba 6.41 0.00 0.18 0.06 11.71 2.33 3.86
Co 78.41 2.27 1.10 0.06 0.24 0.14 0.24 Co 8.77 0.79 0.54 0.04 0.19 0.12 0.24
Cr 74.12 0.08 0.12 1.35 0.04 4.06 1.18 Cr 8.29 0.03 0.06 0.86 0.03 3.54 1.20
Cu 9.44 8.52 30.45 6.00 3.17 2.19 0.15 Cu 1.06 2.98 14.95 3.83 2.47 1.91 0.15
Li 0.38 23.88 31.60 0.97 15.43 0.79 4.04 Li 0.04 8.35 15.52 0.62 12.02 0.69 4.08
Ni 84.74 0.16 0.59 1.07 0.55 0.66 0.25 Ni 9.48 0.05 0.29 0.68 0.43 0.58 0.26
Pb 22.52 10.69 0.16 2.75 1.99 1.50 18.97 Pb 2.52 3.74 0.08 1.75 1.55 1.31 19.20
Zn 2.69 0.00 9.60 15.86 0.05 2.66 30.15 Zn 0.30 0.00 4.72 10.13 0.04 2.32 30.52
Sr 1.24 27.99 8.04 4.41 4.47 0.27 5.80 Sr 0.14 9.78 3.95 2.81 3.48 0.24 5.87
V 64.40 0.46 4.92 0.18 1.60 1.61 0.54 V 7.20 0.16 2.42 0.12 1.24 1.40 0.55
Y 45.31 13.64 4.50 1.62 7.16 1.74 0.04 Y 5.07 4.77 2.21 1.03 5.58 1.52 0.04
Zr 63.61 4.98 2.45 1.69 0.85 1.80 0.00 Zr 7.12 1.74 1.20 1.08 0.66 1.57 0.00
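The relationships among the columns of Table 4 can be checked with a short sketch. The formulas below are inferred from the table rather than stated in the text, so they should be read as assumptions: the 26 log-centred variables are taken to have unit variance (total variance 26), the relative contribution is taken as 100 × loading², and the actual contribution as 100 × loading² / eigenvalue. These reproduce the tabulated values to within rounding of the two-decimal loadings. Function names are hypothetical.

```python
import math

def scale_to_loadings(eigenvector, eigenvalue):
    """R-mode loadings: eigenvector elements times sqrt(eigenvalue)."""
    return [v * math.sqrt(eigenvalue) for v in eigenvector]

def percent_variance(eigenvalues, total_variance):
    """Percentage of the total variance carried by each component."""
    return [100.0 * ev / total_variance for ev in eigenvalues]

def relative_contribution(loading, variable_variance=1.0):
    """Share of one variable's variance captured by one component (assumed form)."""
    return 100.0 * loading ** 2 / variable_variance

def actual_contribution(loading, eigenvalue):
    """Share of one component's variance owed to one variable (assumed form)."""
    return 100.0 * loading ** 2 / eigenvalue

# First seven eigenvalues from Table 4; 26 variables assumed to have unit variance.
eigenvalues = [8.93, 2.86, 2.03, 1.56, 1.28, 1.15, 0.99]
n_variables = 26
pct = percent_variance(eigenvalues, n_variables)
cumulative = [sum(pct[:k + 1]) for k in range(len(pct))]

# Ni on PC1 has a tabulated loading of 0.92.
ni_relative = relative_contribution(0.92)        # table lists 84.74
ni_actual = actual_contribution(0.92, 8.93)      # table lists 9.48
```

Under these assumptions the squared PC1 loadings in Table 4 sum to about 8.90, close to the tabulated eigenvalue of 8.93, which supports reading the actual contributions as a breakdown of each eigenvalue by variable.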
following example shows the use of k-means clustering as a method for partitioning multivariate geochemical data. Davis (2002) is a good introductory review of clustering methods; Sinding-Larsen (1975) used clustering methods for the initial subdivision of a heterogeneous geochemical area; Jaquet et al. (1975) gave a detailed analysis of lake sediment geochemistry using clustering procedures; Howarth & Sinding-Larsen (1983) provided a general discussion of clustering methods applied to geochemical exploration; and Grunsky (1986a) has shown how dynamic cluster analysis (Diday 1973) was used to detect different types of mineralization based on distinct geochemical differences between the mineral occurrences. The use of fuzzy clustering methods in geochemistry was introduced by Bochang & Xuejing (1985).
Hierarchical clustering is based on the linking of variables (R-mode) or observations (Q-mode) through measures of similarity. The relationships between the variables or observations can be graphically expressed using a dendrogram. Individual clusters can be discriminated by choosing an appropriate value of linkage, which separates internally similar groups of objects into dissimilar groups. Hierarchical clustering assumes that all variables are linked at some level, which may not be a reasonable assumption in some instances.
The correlation coefficient (R-mode) is the most common measure of similarity for clustering. For Q-mode analysis (similarities between the observations), the Euclidean distance can be used as a measure of proximity by which observations can be clustered. However, when the number of observations is large the computation becomes intractable.
Arbitrary origin methods are non-hierarchical and may offer some advantage over hierarchical methods since the clusters are formed based on multivariate similarities (proximities) rather than individual correlation coefficients. These methods start with an initial number of cluster centres that can be specified or randomly chosen. Each observation is allocated to one of the groups based on proximity to the group centres. The process is iterative and group centres change until a stable solution results. Methods such as K-means (McQueen 1967; Everitt 1974; Hartigan 1975) or dynamic cluster analysis (Diday 1973) are examples of these techniques. Kaufman & Rousseeuw (1990) also describe a number of clustering methods.

K-means clustering
K-means cluster analysis is a method that starts with an initial 'guess' of the cluster centres. The distance of each observation from each cluster centre is measured and each observation is then provisionally assigned to the closest cluster centre. A new cluster centre is calculated based on the designated observations for each previous centre. The process is iterative until it converges on stable centres. The method requires an initial choice of the number of cluster centres. If the number is too great, there will be many small clusters that have few points. If the number of centres is too few, then the structure of the data may not be realized. A disadvantage of the procedure is that a less than optimal clustering may result if the initial cluster centres do not fall in distinct clusters (Davis 2002, p. 500). Venables & Ripley (2002) provide a method by which a suitable number of starting clusters may be determined by using a combination of hierarchical clustering and PCA.
It is common to apply non-hierarchical clustering methods to principal component scores. If one or more principal components can be inferred to represent specific geological/geochemical processes, then the application of cluster analysis can provide further insight in how those processes may be related. Additionally, the component plots provide a reduced set of dimensions for viewing the multi-element associations of the data and thus provide additional visual assistance in examining grouped associations.
K-means clustering was applied to the logcentred transformed Ben Nevis township metavolcanic data. The number of clusters was set at 10, based on the perceived variation in the rock types (felsic metavolcanics, mafic volcanics, mafic intrusions, granite) as well as the two known mineralization zones that have surrounding alteration. The results of the clustering are shown in Plate 19. Each observation is labelled with the group number to which it was assigned. Several clusters (Groups 1, 2, 5, 6, 8 and 10) are associated with the distinctions between mafic and felsic metavolcanic rocks. Groups 3 and 9 are directly associated with mineralization. Observations that belong to these groups occur where there is known mineralization. There are also two clusters associated with carbonate alteration (Groups 4 and 7), which occur in the eastern part of the map area. It is apparent that the observations assigned to each group not only share similar geochemical characteristics but also have close spatial associations, as shown in Plate 19.

Multivariate ranking using the Mahalanobis distance: a multivariate extension of Q–Q plots
The use of the covariance matrix as a tool for distinguishing background from anomalous populations is well established in geochemical research (Garrett 1989c, 1990; Chork 1990). Filzmoser et al. (2005) have written a library of routines ('mvoutlier') that is available as part of the R environment (www.r-project.org/cran). The covariance matrix contains information on the variability of the elements as well as their inter-relationships. The multi-element data constitute a hyper-ellipsoid in multi-dimensional space. The mean value of each element defines the centroid of this hyper-ellipsoid and the distance from each observation point to the centroid is the Mahalanobis distance. In a multivariate normal population, most observations lie within an expected radius of the centroid, which defines the background group of observations. However, if outliers are included in the data, the shape of the hyper-ellipsoid will change. This resulting distortion affects the location of the centroid and thus affects the Mahalanobis distance for all of the observations. In such cases, the application of robust procedures is recommended.
Outliers can be distinguished from the main background population by determining the Mahalanobis distance of each observation from the group centroid. The distances can be compared to the 'expected' distances of a multivariate normal population (cumulative probability with the number of degrees of freedom defined as the number of variables) by the use of χ² values as defined by Garrett (1989c). If the population is multivariate normal, then the plotted pairs form a straight line. If the population contains outliers, then the observed Mahalanobis distances (D²) are greater than the expected χ² quantiles and the plot becomes non-linear. However, the χ² distribution is long-tailed near the extreme ends of the distribution and this property may mask outliers with large Mahalanobis distances. An alternative to the use of the χ² values is the cube root of a normal distribution, which does not have the long tail property of the χ² distribution and is thus less likely to mask outliers.
The lake sediment survey data from the Batchawana area of Ontario were evaluated for the potential to host Cu, Zn and precious metal deposits. A suite of elements (Cu, Zn, As, Sb and W) was chosen to test the possibility that these elements could identify potential mineral deposits. For these data, censored values were replaced with estimates from the EM method for
Fig. 15. Mahalanobis distance (D²) plots of a multi-element suite (Cu, Zn, As, Sb, W) of lake sediment data. Successive trimming of the outliers defines a homogeneous background population. The deleted outliers are then followed up for their potential as sites of mineralization.
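The kind of plot shown in Fig. 15 can be sketched as follows: D² is computed for every observation, the largest values are trimmed, the centroid and covariance are re-estimated from the remaining data, and D² is then recomputed for all observations (including the trimmed ones) and ranked against expected χ² quantiles. This is a minimal sketch under stated assumptions, not the procedure used for the paper. The data are synthetic and two-variate, the helper names are hypothetical, and the χ² quantile uses the Wilson–Hilferty cube-root approximation, chosen here only to avoid an external statistics library.

```python
import math
from statistics import NormalDist

def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (a[i][n] - s) / a[i][i]
    return y

def mahalanobis_sq_all(rows, reference):
    """D^2 of every row, using the centroid and covariance of `reference`."""
    n, p = len(reference), len(reference[0])
    mean = [sum(r[j] for r in reference) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in reference) / (n - 1)
            for j in range(p)] for i in range(p)]
    result = []
    for row in rows:
        d = [v - m for v, m in zip(row, mean)]
        result.append(sum(di * yi for di, yi in zip(d, solve(cov, d))))
    return result

def chi2_quantile(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3

def trim(rows, fraction):
    """Split rows into (kept, removed), removing the largest D^2 values."""
    d2 = mahalanobis_sq_all(rows, rows)
    order = sorted(range(len(rows)), key=lambda i: d2[i])
    keep = len(rows) - int(round(fraction * len(rows)))
    return [rows[i] for i in order[:keep]], [rows[i] for i in order[keep:]]

# Synthetic data (made up for illustration): 95 background points near the
# origin plus 5 displaced points standing in for atypical observations.
background = [[math.cos(k) * (1 + (k % 5) * 0.3),
               math.sin(k) * (1 + (k % 7) * 0.2)] for k in range(95)]
outliers = [[10.0 + 0.1 * k, 10.0 - 0.1 * k] for k in range(5)]
data = background + outliers

kept, removed = trim(data, 0.10)
d2_all = mahalanobis_sq_all(data, kept)  # trimmed points re-inserted
expected = [chi2_quantile((i + 0.5) / len(data), 2) for i in range(len(data))]
qq = list(zip(expected, sorted(d2_all)))  # pairs for a ranked D^2 plot
```

Plotting the `qq` pairs for successive trim fractions would give a sequence of panels comparable in spirit to Fig. 15; a 10% trim is used here only because the text mentions it as a possible starting point.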
determining replacement values for censored distributions. Because these data are compositional, they were normalized to a constant sum and then transformed using logratios.
Figure 15 shows a series of ranked Mahalanobis distance plots versus the cube root of a normal distribution for different degrees of trimming. The first figure shows a plot of all of the observations. The plot displays a curved line with several outliers at the positive end of the curve, suggesting that there are observations which are not part of a multivariate normal population. Each successive plot is the data with the outliers from the previous plot removed. For each plot, a new centroid and corresponding Mahalanobis distances were re-computed. Trimming of the data in the 7% to 10% range yields a reasonably straight curve which suggests that the trimmed observations could be considered atypical and warrant further investigation.
The 10% of the data that were trimmed were then re-inserted into the data matrix from which the D² values were computed based on the covariance from the other 90% of the data. The ranked multivariate distance values are plotted on the map and graph in Plate 20. Observations with high D² values are locales of interest and warrant further investigation. Note that observations that are atypical are not necessarily geochemically 'anomalous'. No multivariate equivalent of a threshold was established, although the 10% trim could be used as an initial starting point in establishing the threshold.

The use of empirical indices
The existence of pathfinder elements has prompted the use of several numerical procedures through which selected elements can be used in an exploration programme by creating mineralization potential indices based on the weighted sum scores of the pathfinder elements. Empirical indices can be determined from selected elements that are associated with specified geochemical processes. The techniques used in this approach are described by Garrett et al. (1980), Chaffee (1983), Smith & Perdrix (1983), Smith et al. (1987) and Garrett (1991). Garrett & Grunsky (2001) have reviewed objective comparisons of various weighting schemes used to highlight observations defined by pathfinder elements.
In many geochemical studies, several pathfinder elements may be identified for defining target areas (mineralization, anthropogenic sources). These pathfinder elements may be chosen based on geological/geochemical knowledge of the processes of interest. Combining these pathfinder elements together through a multivariate ranking scheme is a potentially useful tool for defining multi-element anomalies. Defining the pathfinder elements can be based on geological knowledge or through the use of data analysis/discovery procedures discussed previously, such as PCA and cluster analysis. These methods can reveal relationships in the data that may be directly related to underlying lithologies or processes of interest (mineralization, anthropogenic effects) from which pathfinder elements can be determined.
Chaffee (1983) developed a method of scoring observations for anomaly potential. Each element is evaluated such that the range of values is subdivided into four groups, by thresholds, with corresponding scores that represent background (0), weakly anomalous (1), moderately anomalous (2), and strongly anomalous (3). These ranges are derived from orientation studies over areas where the range of values and underlying geochemical distributions are reasonably well understood. Each observation is then assessed with respect to each element. Observations with the highest scores are considered anomalous and are targeted for further follow-up.
Smith & Perdrix (1983), Smith et al. (1987) and Smith et al. (1989) made use of three indices derived from geochemical trends that were noted in the laterite geochemistry of the Yilgarn Block of Western Australia. A group of pathfinder elements, As, Sb, Bi, Mo, Ag, Sn, and W, form the basis of these empirical
indices known as CHI-6*X, NUMCHI, and PEG-4. These methods for isolating the patterns associated with Cu enrich-
indices show elevated values of these pathfinder elements in ment in the area.
lateritic materials associated with greenstone belts, shear zones, Difficulties were encountered when the interpretation of
base metal and precious metal deposits (CHI-6*X and PEG-4). selected elements was attempted and the observed patterns
These indices are based on simple equations as follows. appeared to be discontinuous and erratic. However, the applica-
The coefficients provide weighting to the elements such tion of multivariate statistical methods identified two distinct
that observations with elevated chalcophile values have high geochemical associations: recent volcanic ash, and a saprolitic soil
CHI-6*X or PEG-4 indices. These coefficients were derived profile containing a mineralized zone of Cu associated with mafic
for lateritic materials only. The coefficients need to be volcanic rocks. Plate 3 shows the soil sampling grid from which
altered for other materials. The CHI-6*X index is suited 1665 samples were collected and analysed for Au, Cu, Pb, Zn, As,
more to isolating observations with elements associated with Sb, Ba, Ca, Cd, Co, Cr, Fe, Ga, K, La, Li, Mg, Mn, Nb, Ni, Sc, Sr,
precious metal deposits, whereas the PEG-4 index is suited Ti, V, Y, Zr, and Hg, using aqua regia digestion and ICP-ES.
for isolating observations with elements associated with The results of the application of a PCA applied to the
pegmatophile environments, such as Sn deposits within logcentred data in which two distinct sample populations
granitoid terrains. representing saprolite and ash and a trend of Cu enrichment
The NUMCHI index is a score of the number of elements associated with Cu mineralization are shown in Figure 16. The
that exceed the threshold for each element. Thus for a given bi-modal population, seen along the C1 axis of the biplot
specimen, if nine elements exceed their respective thresholds, represents material that is interpreted to be volcanic ash that
then the NUMCHI index will have a value of 9. As discussed overlies the saprolitic soils.
previously, threshold values are chosen from visual inspection Plate 21 shows a draped image of the interpolated scores of
of summary tables, order statistics, Q–Q plots etc. the first principal component draped over a 25 m DEM,
derived from the scores of population on the positive side of
Weighted sum index the C1 axis (Fig. 16). The elevation ranges from 1180–1350 m.
Note that the cyan-green-yellow-red areas represent the inter-
Garrett et al. (1980, p.144) suggested the use of a linear polated positive scores of the first principal component. These
combination of a group of indicator elements that give a areas have been interpreted to be volcanic ash occurring along
weighted sum. In a multi-element survey, those elements which hill tops and the eastern slopes of the hills. This interpretation
are considered pathfinders are given more weight than elements is supported by observations of the sampled media and reports
that may be more diagnostic of background. The choice of by geologists in Indonesia where this phenomenon is com-
weights may be based on the knowledge of the investigator. monly observed. The second component draped over the
Alternatively, principal component loadings may be used as a DEM (Plate 22) represents the Cu-enrichment trend and is
starting point. Examples of the use of this index are given by associated with mafic volcanic rocks trending northwesterly
Garrett et al. (1980) and Garrett & Grunsky (2001). along the western slopes and coincident with the regional
stratigraphy.
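The two simplest indices described in this section can be sketched in a few lines. This is a minimal illustration only: a NUMCHI-style score counts how many elements exceed their thresholds, and a weighted sum applies investigator-chosen weights to indicator elements. The function names, thresholds, weights and concentrations below are hypothetical and are not the published coefficients for CHI-6*X or PEG-4.

```python
def numchi(sample, thresholds):
    """Count the elements whose concentration exceeds its threshold."""
    return sum(1 for element, limit in thresholds.items()
               if sample.get(element, 0.0) > limit)


def weighted_sum(sample, weights):
    """Linear combination of indicator elements with chosen weights."""
    return sum(weight * sample.get(element, 0.0)
               for element, weight in weights.items())


# Hypothetical thresholds (ppm), e.g. picked from Q-Q plots or order statistics.
thresholds = {"As": 50.0, "Sb": 5.0, "Bi": 2.0, "Mo": 4.0}
# Hypothetical weights: elements treated as pathfinders get more weight.
weights = {"As": 1.0, "Sb": 3.0, "Bi": 10.0, "Mo": 3.0}
# Hypothetical sample concentrations (ppm).
sample = {"As": 120.0, "Sb": 2.0, "Bi": 4.5, "Mo": 1.0}

score = numchi(sample, thresholds)      # As and Bi exceed their thresholds
index = weighted_sum(sample, weights)
print(score, index)
```

In practice the weights would be set from knowledge of the area and commodity, or seeded from principal component loadings as suggested in the text.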
INTEGRATION OF MULTI-ELEMENT This example highlights the effective use of multivariate
GEOCHEMISTRY AND DIGITAL TOPOGRAPHY: statistical methods for distinguishing between different sample
AN EXAMPLE OF PROCESS IDENTIFICATION, media as well as the isolation of geochemical trends that define
INDONESIA zones of possible mineralization. The use of these types of
multivariate methods isolates relationships of the elements that
Modern methods of data management including the use of are difficult or impossible to see by examining individual
desktop database management systems (DBMS) combined elements. The application of multivariate techniques integrated
with GIS that can produce images of multiple datasets simul- with digital elevation models provides a more effective way of
taneously provide significant assistance in the management and visualizing and interpreting elemental data.
presentation of geochemical data. In many areas of the world,
digital base maps can be acquired from local governments that ANALYSING LARGE GEOCHEMICAL DATASETS:
typically include lakes, rivers, streams, road networks and other AN EXAMPLE FROM THE CAMPO MORADO
topographic information that is useful in the orientation and DISTRICT, MEXICO
interpretation of geochemical data. In addition, digital topography
that provides a topographic relief backdrop for the interpreta- The Campo Morado mining camp in the Guerrero state of Mexico
tion of geochemical data may also be available. Digital geological hosts seven precious metal-bearing volcanogenic massive sulphide
maps are now routinely provided by many geological surveys, deposits in the complexly folded and faulted Guerrero terrain
together with mineral occurrence inventory databases that have (Oliver et al. 1996; Rebagliati 1999). Approximately 29 221 samples
been accumulated from both geological survey and private were collected over a soil grid comprising 25 m sample intervals
company data. along lines and spaced 100 m apart. The field samples were
Digital topography offers a unique view of data in that it analysed for Al, Fe, Ca, K, Mg, Na, Ti, Au, Ag, As, Ba, Cd, Co, Cr,
provides a ‘real world view’ of the data over the terrain. When Cu, Hg, Mn, Mo, Ni, P, Pb, Sc, Sr, V, W and Zn using aqua regia
digital air photos or satellite imagery are integrated with digital digestion and ICP-ES. A DEM was created at 25 m resolution.
topography and viewed using image processing systems with PCA was carried out on the data and revealed several significant
three dimensional rendering ability, the viewer gets a sense of patterns related to lithological variation and mineralization.
looking at the terrain from an aircraft. Interpolated geochemical Because of the high topographic relief in the area, the problem of
images can generally be interpreted more effectively when transported material from weathering has the potential to result in
merged with digital topography and viewed in a similar manner. false anomalies that are often due to hydromorphic dispersion and
Grunsky & Smee (1999) demonstrated the usefulness of down-slope creep. When the results of the PCA are draped over
integrating digital elevation data with multi-element geochem- the topography, there is an increased ability to distinguish
istry from a soil survey on the island of Sumatra in Indonesia. anomalies associated with hydromorphic dispersion from those
Cheng et al. (2000) also demonstrated the use of fractal associated with a bedrock source.
Plate 16. Image of the first principal component derived from the log-centred lithogeochemical data, Ben Nevis Township, Ontario. This image
outlines the lithological variation.
Plate 17. Image of the second principal component derived from the log-centred lithogeochemical data, Ben Nevis Township, Ontario. This
image outlines the zones of carbonatization.
Plate 18. Image of the third principal component derived from the log-centred lithogeochemical data, Ben Nevis Township, Ontario. This image
outlines the sulphide and mineralized occurrences.
Plate 19. K-means clustering of the log-centred lithogeochemical data, Ben Nevis Township, Ontario. Specific groups are associated with
distinctive lithologies and zones of alteration and mineralization.
The biplot of Figure 17 shows a dominant trend associated with mineralization. This is due to the
high density of sampling over mineralized terrain that is closely associated with sedimentary
horizons within the volcanic assemblage of the area. The ‘horseshoe’ effect in Figure 17 is due to
the correlation between the two trends; highly mineralized samples are depleted in Na and K and
Plate 20. Plot of D2 scores on the geological map. Sites highlighted in red indicate a significant departure from background and warrant further evaluation.
Plate 21. Interpolated scores of the first principal component draped over a digital elevation
model for the area. Also shown is a histogram of the scores for the first principal component.
The positive (right) side of the histogram is coloured and the corresponding colours are shown
draped over the DEM. These areas are interpreted to be recent volcanic ash that has accumulated
on hill tops and the windward-lee side of slopes.
Plate 22. Interpolated scores of the second principal component draped over the digital elevation
model. The image shows that the Cu enrichment trend is mostly exposed along the valley walls in
areas where the weathering is likely to be most active.
Plate 23. Plot of the interpolated PC1 scores over the digital terrain model in the Campo Morado
area, Mexico. Areas highlighted in red are elevated in Au, Cu, Ag, Pb and Zn values. The image is
termed an ‘index of mineralization’.
subsequently slightly more enriched (relatively) in elements associated with intermediate to
mafic volcanic rocks.
A planimetric image of the second principal component over a shaded relief image of the DEM is
given in Plate 23. Felsic volcanic rocks (red and yellow) are distinguished from mafic volcanic
rocks (blue). Felsic rocks show relative enrichment in K and Na, while the mafic rocks show
relative enrichment in Fe, Co, Ti, Mg, Cr, Al, Sc, and V. The areas highlighted in green represent
lithologies of intermediate compositions and are mostly mudstones, argillites and sandstones.
These are the host rocks for several of the mineral deposits in the Campo Morado area. The same
image is shown in Plate 24 where it is draped over the DEM of the area. The first principal
component highlights areas of relative enrichment of Ag, Zn, Au, As, Pb, Hg, Sb and Cu. These
areas, shown in red and yellow, are potential sites of mineralization (Plate 23). This image is a
three-dimensional rendering over the DEM. Examination of these areas in conjunction with the
DEM assists in setting priorities for follow-up. Anomalies that lie along riverbeds or show
significant dispersion must be treated with
caution because of hydromorphic and downslope creep dispersion effects.

CLASSIFYING GEOCHEMICAL DATA: AN EXAMPLE USING KIMBERLITE GEOCHEMISTRY

A suite of kimberlites from an area in central Saskatchewan, Canada, has been evaluated. A suite
of 263 lithogeochemical samples selected from drill core was studied by Grunsky & Kjarsgaard
(2008). This study followed an initial evaluation of the application of geochemistry to
characterize kimberlitic processes (Kjarsgaard et al. 1997). On the basis of macroscopic
core-logging observations, the data were partitioned into four distinct suites representing
phases of the kimberlitic eruptions and contamination with surrounding country rock. The
emphasis of this evaluation is on the discrimination between the various eruptive phases and
diamond-bearing versus non-diamond-bearing phases. In this example, the geographic coordinates
have not been made available. As a result, a geospatial analysis has not been carried out.
The following major element oxides and trace elements were used in the evaluation: SiO2, TiO2,
Al2O3, Fe2O3, MgO, CaO, Na2O, K2O, P2O5, Rb, Nb, Zr, Th, V, Cr, Co, Ni, La, Er, Yb, Y and Ga.
Initially, the data were plotted as a large scatterplot matrix to examine the distributions and
associations amongst all of the major element oxides and trace elements. Figures 18a and b show
a scatterplot matrix for Yb, P, La, and Zr. These four elements show a range of compositional
variation that reflects kimberlite fractionation. Figure 18a shows distinct differences between
the kimberlite phases with clearly defined linear relationships that reflect the stoichiometry of
the individual mineral assemblages. Figure 18b shows the same elements after applying a logcentre
transform. The overall distinctiveness of the individual kimberlite phases is readily apparent;
however, the logcentre transform has distorted the linear stoichiometric relationships. Figure 18b
provides the basis for calculating statistical measures of association through the application of
a logcentred transform. Both figures are important and useful in understanding the nature of the
multivariate geochemical relationships in kimberlitic rocks. Note that the phases of kimberlite
show variable degrees of distinction. In both Figure 18a and 18b, the early- (eJF) and mid-Joli
Fou (mJF) phases are less distinct than those of the late-Joli Fou (lJF), Cantuar and Pense
phases. Note, also, that the phases do not appear to be homogeneous in their distribution, which
can cause some difficulty in describing their differences within a statistical framework.
RQ-mode PCA was applied to the centred logratio data and the results are shown in Table 5. Nearly
90% of the data variation is accounted for by the first seven components. These components were
subsequently used to find groups in the data
Plate 24. The index of mineralization is draped over the digital terrain model and rendered in
2.5D. This enhances the interpretation of mineralization with respect to the terrain variation.
Plate 25. Biplot of the first two principal components derived from the kimberlite
lithogeochemical data. Each kimberlite phase is shown by a different symbol and colour. The scores
of the samples are shown as symbols. The corresponding scores of the elements are plotted as the
element symbol.
there is likely to be some confusion when classifying unknown
samples. However, the scenario, as presented here, is typical of
geochemical classification problems.
Following the initial evaluation of the data using PCA, the
data were subjected to a k-means cluster analysis using software
from Venables & Ripley (2002). It was reasoned that, from the
patterns exhibited in the first two principal components and
from observations based on core-logging studies, the five
phases of kimberlites could be discriminated using the group of
22 elements. The k-means cluster analysis for five groups was
carried out on the first seven principal components and the
results of the analysis are shown graphically in Plate 26. The
choice of five groups was based on petrographical evidence
combined with the observed clusters through several trial and
error procedures of using k-means clustering. A choice of five
groups split the data into groups that were coincident with the
five major kimberlite phases. Plate 26 shows that the early-Joli
Fou and Pense phases are clustered into Groups 3 and 5. On
the other hand, the mid- and late-Joli Fou groups are suffi-
ciently distinguished as separate groups (1 and 4). The Cantuar
phase (Group 2) shows a small amount of overlap with the
Pense phase.
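The clustering step can be illustrated with a small, self-contained sketch of k-means (Lloyd's algorithm) applied to principal component scores. This is not the software used in the study, which came from Venables & Ripley (2002); the scores, the choice of two groups and the starting centres below are hypothetical and chosen only to keep the example deterministic. The study itself clustered seven components into five groups.

```python
import math


def kmeans(points, centres, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centre.
        new_labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical (PC1, PC2) scores for six samples forming two obvious groups.
scores = [(-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.0),
          (1.9, 0.3), (2.1, -0.1), (2.0, 0.0)]
labels, centres = kmeans(scores, centres=[scores[0], scores[3]])
print(labels)
```

Because k-means depends on the starting centres and on the number of groups requested, repeated trials of the kind described in the text are a sensible check on the stability of the groups.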
Plate 26. K-means clusters for five groups from the kimberlite data based
on 7 principal components. The observations are labeled according to suite
membership and coloured according to K-means group membership.
Given the statistically distinct groups based on the k-means
clustering, linear discriminant analysis was applied to the data
using the phases of the kimberlite as classified based on
core-logging observations. Linear discriminant analysis (lda)
was applied to the seven principal components derived from
the geochemical data using methodology described by Venables
& Ripley (2002). Discriminant analysis requires the inverse of the
covariance matrix, but the covariance matrix of centred logratio
data is singular. For this reason a logratio transformation
was applied using Ga as the divisor. The results of the analysis
are shown graphically in Plate 27. The plot reveals reasonably
good discrimination between the suites of kimberlites.
Although there is overlap between the groups, the overall
discrimination is quite good.
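The reason for switching to a single divisor can be seen in a short sketch. Each centred logratio (clr) vector sums to zero, so the clr variables are linearly dependent and their covariance matrix cannot be inverted; a logratio taken against one chosen part drops that part and avoids the dependency. The composition below is hypothetical, and the last part simply stands in for the divisor (Ga in the text).

```python
import math


def clr(parts):
    """Centred logratio: log of each part relative to the geometric mean."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def alr(parts, divisor_index):
    """Logratio of every other part against one chosen divisor part."""
    d = parts[divisor_index]
    return [math.log(x / d) for i, x in enumerate(parts) if i != divisor_index]


# Hypothetical four-part composition (any positive units).
composition = [45.0, 30.0, 20.0, 5.0]
clr_values = clr(composition)
alr_values = alr(composition, divisor_index=3)

# The clr values sum to zero (up to rounding), which is the source of the
# singular covariance; the divisor-based logratio has one fewer variable.
print(abs(sum(clr_values)) < 1e-12, len(clr_values), len(alr_values))
```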
Diamond content is directly related to specific suites in the
kimberlites. Elevated diamond content is associated with those
samples affiliated with more mafic material, most notably in the
rocks of suite B. The posterior probabilities generated by the
linear discriminant analysis show that the designated suites are
clearly separable on the basis of lithogeochemistry.
Table 6 provides measures of the types of errors that
occurred in the classification. The overall accuracy of 91.9%
is the percentage ratio of the total number of correctly
classified observations divided by the total number of obser-
vations for all eruptive phases. The terms used in the
measures of accuracy are described in Appendix 3. The error
of commission for each eruptive phase is the percentage of
Plate 27. Linear discriminant plot of the first discriminant functions. observations belonging to another eruptive phase, yet
Note the overlap between the kimberlite phases. See text for a more assigned the eruptive phase of interest. These errors are
detailed explanation.
5.3%, 6.7%, 16.0%, 17.9% and 9.5%, respectively, for the
eJF, mJF, lJF, Pense and Cantuar eruptive phases. This is also
and apply linear discriminant analysis. Grunsky & Kjarsgaard reflected by examining the rows of the confusion matrix
(2008) provide further details on the evaluation and interpreta- (Table 6) that show to which eruptive phases some of the
tion of the lithogeochemical data including the use of k-means observations have been assigned. The error of omission for
cluster analysis as a means to confirm the distinctive groupings each eruptive phase is the percentage of observations that
of the kimberlite phases. belong to a given eruptive phase but have been assigned to
Plate 25 shows a biplot of the first two principal compo- another eruptive phase. These errors are 3.9%, 17.8%, 4.0%,
nents with the observations coloured to represent the four 21.4% and 4.8%, respectively, for the eJF, mJF, lJF, Pense
phases of the kimberlite data based on core-logging criteria. and Cantuar eruptive phases. The columns of the confusion
These two components represent more than 65% of the total matrix (Table 6) indicate to which eruptive phase the
variation and formed the basis of a subsequent cluster analysis. observations have been assigned.
The overlap of the groups is shown clearly in this figure. For The user accuracy of Table 6 is a measure of percentage ratio,
the most part, the groups are distinct, but the early-Joli Fou, for each eruptive phase, of correctly classified observations
Cantuar and Pense phases are inhomogeneous. As a result, divided by the total number of observations that has been
Plate 28. Planimetric map of lithologies and sample location points in the central Noranda area, Quebec.
Plate 29. A map of posterior probability for a given sample being classed as a calc-alkaline rhyolite as described in the text.
Plate 30. Normative corundum values plotted on the geological map of the central Noranda area. High normative corundum indicates a relative
enrichment of Al over the alkali elements (Ca, Na, K) and is a likely indicator of alteration through alkali mobility.
Plate 31. Three dimensional visualization of normative corundum using data spheres of normative corundum and volcanic isosurfaces derived
from probability estimates of volcanic class designation.
Fig. 16. Biplot of the first two principal components of the soil survey geochemical data from
the island of Sumatra. Note the two distinct populations that represent the saprolite and
volcanic ash.
Fig. 17. Biplot of the first two principal components from the geochemistry of the Campo Morado
soil survey data. Note the significant correlation of PC1 with PC2, which is the result of
relative depletion of Na and K from the volcanic rocks and the mineralized areas.
Fig. 18. Scatterplot matrix of elements associated with kimberlite magma fractionation.
Figure 18a shows the distinctions between the kimberlite phases in different symbols and colours
for the raw untransformed data. Figure 18b shows the same data after the application of a
logcentre transform. See the text for a more detailed explanation.
assigned to each eruptive phase. The producer accuracy of Table 6 is a measure, for each eruptive
phase, of the number of observations correctly classified divided by the number of observations
that actually belong to the eruptive phase. Both measures of accuracy show high values for the
eJF, mJF, and lJF phases and a lower accuracy for the Pense and Cantuar eruptive phases. This
lower accuracy is a reflection of the higher degree of dispersion of the Pense and Cantuar
observations and subsequent overlap with other eruptive phases.
The use of principal component analysis as a mechanism for classifying the data is based on the
ability to recognize distinctive geochemical processes, as outlined previously. The application
of discriminant analysis to the first seven components confirms that these linear combinations
of data describe variation associated with specific processes and that the classification
accuracies are acceptable.

APPLICATION OF LITHOGEOCHEMISTRY IN A 3D ENVIRONMENT, NORANDA CAMP, QUEBEC

Recent studies of a large lithogeochemical database from Ontario and Quebec, Canada, have
highlighted the usefulness of three-dimensional imaging from a set of diverse geological data
for the purpose of geological modelling and mineral exploration projects. Data from various
sources in Ontario have been assembled into an Open File Report (Hillary et al. 2008) that
contains databases that can be used for subsequent evaluation by mineral exploration companies
and detailed mapping in geological surveys.
A group of 17 164 lithogeochemical samples were processed using the R statistical package. These
data were derived from
Table 5. RQ-Mode principal components analysis of the 5 phase kimberlite data.
Component      PC1   PC2   PC3   PC4   PC5   PC6   PC7
Eigenvalue     7.54  7     2.34  0.85  0.84  0.6   0.55
%              34.41 31.94 10.66 3.89  3.82  2.74  2.5
Cumulative %   34.41 66.36 77.01 80.9  84.73 87.47 89.97
R-Scores Values <0 in italics

Si 0.95 0.11 0.14 0.09 0.06 0.04 0.08
Ti 0.1 0.8 0.02 0.16 0.31 0 0.31
Al 0.28 0.61 0.61 0.13 0.07 0.03 0.08
Fe 0.89 0.34 0 0.14 0 0.01 0.1
Mg 0.88 0.33 0.19 0.09 0.06 0.05 0.11
Ca 0.23 0.26 0.54 0.35 0.57 0.32 0.01
Na 0.27 0.73 0.09 0.08 0.37 0.38 0.01
K 0.25 0.88 0.18 0.08 0.06 0.28 0.07
P 0.41 0.67 0.25 0.2 0.2 0.15 0.24
Rb 0.29 0.84 0.19 0.07 0.03 0.32 0.12
Nb 0.4 0.87 0.09 0.08 0.15 0.04 0.02
Zr 0.62 0.61 0.24 0.07 0.12 0.05 0.06
Th 0.57 0.69 0.01 0.03 0.2 0.02 0.19
V 0.28 0.7 0.09 0.16 0.24 0.22 0.4
Cr 0.76 0.4 0.15 0.12 0.04 0.04 0.1
Co 0.89 0.34 0.07 0.13 0.02 0.02 0.02
Ni 0.93 0.22 0.07 0.16 0.03 0.05 0.05
La 0.55 0.78 0.11 0.03 0.08 0.07 0.1
Er 0.56 0.17 0.57 0.18 0.2 0.06 0.05
Yb 0.42 0.01 0.64 0.37 0.19 0.25 0.26
Y 0.72 0.37 0.4 0.12 0.06 0.01 0.07
Ga 0.14 0.2 0.69 0.56 0.06 0.13 0.2
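The cumulative row of Table 5 is a running sum of the per-component percentages, and the same calculation gives the number of components needed to reach a chosen share of the variation. The sketch below uses the percentages listed in the table; the 85% target is a hypothetical cut-off for illustration, and small differences from the tabulated cumulative values arise from rounding.

```python
from itertools import accumulate

# Percentage of variation for PC1..PC7 as listed in Table 5.
percent = [34.41, 31.94, 10.66, 3.89, 3.82, 2.74, 2.5]

# Running total of the percentages.
cumulative = list(accumulate(percent))

# Smallest number of components whose cumulative share reaches the target.
target = 85.0  # hypothetical cut-off chosen for illustration
n_keep = next(i + 1 for i, c in enumerate(cumulative) if c >= target)

print([round(c, 2) for c in cumulative], n_keep)
```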
government surveys and mineral industry drill-hole data in both surface geographic coordinates
and three-dimensional geographic coordinates. The data were compiled and organized as follows:

1. Use samples with the following minimum information: SiO2, Al2O3, FeO, MgO, CaO, Na2O, K2O,
P2O5, MnO, TiO2 and LOI (loss on ignition).
2. Cation equivalent values were computed for each sample.
3. Normative minerals were computed using a standard Barth-Niggli normative classification scheme.
4. The samples were classed according to the two volcanic classification schemes of Irvine &
Baragar (1971) and Jensen (1975).
5. The samples were logcentre transformed and then classified using a linear discriminant
analysis based on reference groups defined by Grunsky et al. (1992).

Plate 28 shows a planimetric map of the samples projected to the surface. Drill-hole sample data
are projected onto the surface, resulting in a denser pattern of points than is actually present
on the surface.
Grunsky et al. (1992) developed a set of reference groups representing typical volcanic
compositions using the classification scheme of Jensen (1975). Each composition from the central
Noranda area was classified using a linear discriminant analysis as documented in Venables &
Ripley (2002). Posterior probabilities of rock type membership were derived for each sample, from
which maps can be created that depict the likelihood of rock type based solely on the
lithogeochemistry. A logcentred transform was applied to the data and reference groups prior to
the classification. For the classification, LOI was used as the divisor for the logratio
transform. Plate 29 shows a map of the likelihood of a sample being classified as a rhyolite. The
advantage of this type of scoring is that it provides a classification that is independent of
the geological map and can help define lithologies in areas where the surface or subsurface
geology is not known.
Many methods exist for assessing alteration of volcanic rocks. An initial measure of alkali
alteration and migration can be obtained through normative mineral calculations. The use of
normative mineral procedures is well established (Yegorov et al. 1988; de Caritat et al. 1994;
Cohen & Ward 1991; Merodio et al. 1992; Rosen et al. 2000; Piche & Jebrak 2004). When corundum
occurs in the calculated norm (Plate 30), it generally signifies the mobility of Na, K and Ca,
which can be linked to alteration signatures associated with base- and precious-metal
mineralization.
Plate 31 shows normative corundum (diagnostic of alkali alteration) plotted in GoCad by Eric de
Kemp (Geological Survey of Canada, pers. comm.) indicating an association with known mineral
deposits in the central Noranda camp area. The map is a down-plunge multi-parameter 3D model of
northern Central Noranda mining camp, Quebec, Canada, combining ore bodies, regional geometry,
structural observations, a lithological simulation and a geochemical classification.
Volcanogenic Massive Sulphide (VMS) deposits are depicted as orange irregular surfaces with the
Horne mine in the foreground (lower left inset). The deformed stratigraphic grid (in green)
represents the mean of realizations for felsic volcanic lithologies with bright orange (90%) and
blue (< 10%) probabilities. An exhalite stratigraphic unit (C-Horizon) is shown as a white
surface (inset upper left) contoured at 1 km depth intervals with outcrop dip measurements
depicted as blue-red tablets with a Wulff net plot of 42 structural observations. Variably sized
spheres (green–red) represent normative corundum values > 5%. Geochemically, highly altered
zones are represented by the largest red spheres. An east–west horizontal white cylindrical
scale bar is shown at the ground elevation.
Table 6. Measures of confusion, accuracy and error based on the PC.
Overall Accuracy (%) 91.9
Confusion (numbers) eJF mJF lJF Pense Cantuar Total

eJF 146 4 0 4 0 154
mJF 2 37 1 0 0 40
lJF 0 4 24 0 0 28
Pense 4 0 0 22 1 27
Cantuar 0 0 0 2 20 22
Total 152 45 25 28 21
Confusion (%) eJF mJF lJF Pense Cantuar Total (%)

eJF 94.8 2.6 0.0 2.6 0.0 100.0
mJF 5.0 92.5 2.5 0.0 0.0 100.0
lJF 0.0 14.3 85.7 0.0 0.0 100.0
Pense 14.8 0.0 0.0 81.5 3.7 100.0
Cantuar 0.0 0.0 0.0 9.1 90.9 100.0
Total (%) 114.6 109.4 88.2 93.2 94.6
Error/Accuracy eJF mJF lJF Pense Cantuar

Errors of Commission (%) 5.3 6.7 16.0 17.9 9.5
Errors of Omission (%) 3.9 17.8 4.0 21.4 4.8
User Accuracy (%) 96.1 82.2 96.0 78.6 95.2
Producer Accuracy (%) 94.8 92.5 85.7 81.5 90.9
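The accuracy figures in Table 6 can be recomputed directly from the confusion counts. The sketch below is a minimal check, assuming the row and column layout printed in the table; the variable names are illustrative. Accuracy along the rows (diagonal over row total) reproduces the tabulated producer accuracy, and accuracy down the columns (diagonal over column total) reproduces the tabulated user accuracy.

```python
phases = ["eJF", "mJF", "lJF", "Pense", "Cantuar"]

# Confusion counts as printed in Table 6.
confusion = [
    [146, 4, 0, 4, 0],
    [2, 37, 1, 0, 0],
    [0, 4, 24, 0, 0],
    [4, 0, 0, 22, 1],
    [0, 0, 0, 2, 20],
]

n = len(phases)
row_totals = [sum(row) for row in confusion]
col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
correct = [confusion[i][i] for i in range(n)]

# Overall accuracy: correctly classified observations over all observations.
overall = 100.0 * sum(correct) / sum(row_totals)
# Labels follow the rows of Table 6: diagonal / row total and diagonal / column total.
producer = [100.0 * correct[i] / row_totals[i] for i in range(n)]
user = [100.0 * correct[i] / col_totals[i] for i in range(n)]

print("overall", round(overall, 1))
for name, p, u in zip(phases, producer, user):
    print(name, round(p, 1), round(u, 1))
```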
A STRATEGY FOR GEOCHEMICAL DATA ANALYSIS

Every set of geochemical data and area requires a unique approach in the application of methods
to analyse and assess the data. The evaluation of geochemical data is an iterative and adaptive
process. The methods of data analysis and visualization in both the geochemical and geographic
spaces change throughout the procedure of discovery of geological/geochemical processes. Below
is a list of suggested ways to evaluate data that should be considered in any investigation. Of
course, not all steps are necessary or appropriate, but the list should serve as a guideline for
a thorough investigation of geochemical data.

Preliminary data analysis

+ Know your data! There is no substitute for spending time evaluating the data using a wide
variety of procedures so that associations and structures in the data can be identified.
+ Examine each element with histograms, box plots, Q–Q plots, scatter plot matrix and summary
tables.
+ Use bubble or symbol maps to show the range and spatial variability of the elements of interest.
+ Interpolated images can be used where appropriate.
+ Trim the distribution of each element of gross outliers.
+ Investigate outliers for each element (analytical error or atypical value?).
+ Adjust data for censored values if required.
+ Consider the application of logratio transformations (logcentred, isometric logratio) so that
compositional data can be evaluated without the effect of ‘closure’. This is necessary if
measures of association are required (correlation, covariance).
+ Apply measures of association using standard, as well as robust, procedures. Examine the
differences and scrutinize the outliers.
+ Test the data to see if the identification of patterns and outliers is improved by the use of
transformations. Apply Box-Cox power transformations using observations below the 95th–98th
percentile to determine the optimal transformation. The transform parameters can be chosen
visually (Q–Q plots, box plots, histograms) or by semi-automatic means.
+ Examine scatter plots and Q–Q plots for the presence of multiple populations.
+ If assembling datasets from diverse sources, examine the requirement for levelling.

Exploratory multivariate data analysis

The following is a summary of exploratory multivariate techniques.

+ Create a scatter plot matrix of the raw data and transformed (logcentred ratios, isometric
logratios) data. Look for trends/associations.
+ Use robust estimates to compute means and covariances to enhance the detection of outliers.
+ Apply dimension-reducing techniques, such as PCA, to identify patterns and trends in the data.
Other methods such as non-linear mapping, multi-dimensional scaling and self-organizing maps may
help discover structure in the data.
+ Use geographic maps of the component scores to assist in identifying spatially-based
geochemical processes.
+ Apply methods such as cluster analysis to isolate groups of observations with similar
characteristics and atypical observations. Specific groups of interest can often be isolated
using these methods. Maps of the locations of the groups can help to examine the spatial
continuity of the groups.
+ Use robust Mahalanobis distance plots (D²) applied to transformed data to assist in isolating
outliers based on a selected number of elements of interest. Maps of large distances (>95th
percentile) can assist in identifying observations or groups of observations of interest.
+ Calculate specifically tailored empirical indices in areas where multi-element associations
are well understood. The indices are based on a linear combination of pathfinder elements with
coefficients that are selected for each area and commodity being sought. Observations with high
indices can be investigated for mineralization potential.
+ Visualize the results! Use GIS for visualizing data analysis/ image analysis systems offer limited analytical and developmen-
statistical results. Use the visualization features in programs, tal capability. Increased integration of multivariate methods
such as R, for a better understanding of the data. together with spatial analysis will provide a comprehensive
approach to assessing all spatially referenced multivariate data.
Modelled multivariate data analysis

+ Where target and background groups have been established, use procedures such as linear discriminant analysis (and variants) to test the ability to classify sample groups of interest and to determine which elements provide the best discriminating power.

CONCLUDING COMMENTS AND FUTURE DIRECTIONS

Garrett (1989a) stated that the power of computers and capability of software would continue to grow along with a corresponding decrease in price. Almost 20 years later, that prediction still holds. Computers are not only more powerful, but they are more portable, which permits the most sophisticated processing even in the most remote parts of the planet. Developments in software, in terms of data capacity, three-dimensional visualization and statistical methods, have made enormous contributions to the way that exploration geochemists can evaluate and integrate all types of geoscience data. The rapid expansion of the internet has allowed new statistical communities to grow, such as the R project (www.r-project.org) in which thousands of statisticians and users throughout the world develop and contribute to an open source statistical software environment. Recent developments in freely available software (Grunsky 2002b) will make it easier to integrate geochemical data with geospatial data. In the R community, new statistical developments can become available within weeks to anyone who has internet access. There is no doubt that this type of cooperative approach to the sharing of knowledge will increase the ability of geoscientists to extract as much information from their data as possible.

Another factor that has contributed to significant advancements in evaluating regional geochemical data is the widespread development of internet resources that make geochemical data available. In addition, internet resources have contributed significantly to information on how to evaluate geochemical data. The internet itself is one of the first places one starts to ‘mine’ for data.

Discussions on the application of transformations of geochemical data have traditionally been based on raw analytical values and the potential problems associated with closure have not been taken into account. Further research is required in this field. There is ongoing research at the University of Girona, Spain, where the issues of evaluating compositional data are being addressed. Emphasis is being placed on research and the development of tools for the user.

Surprisingly, the scientific literature on levelling geochemical data is sparse. Levelling is routinely carried out in geophysical and geochemical programmes; however, a formal review of procedures has not yet been published. A full review of levelling methods applied to geochemical survey data is due.

Integrating spatially referenced data together with multivariate observations is an area that is undergoing many interesting developments. The use of fractals has been shown to highlight different spatial patterns that are attached to multivariate patterns and trends (e.g. Cheng & Agterberg 1994). Similarly, the integration of multivariate statistics with geostatistical analysis is developing and will lead to new methods for extracting spatially-dependent multivariate patterns and trends. Current implementations of statistics with GIS are not fully integrated and spatial statistics that are employed by GIS or by

Multivariate geostatistics, which incorporates both the spatial and inter-element relationships, has been studied by only a few. Grunsky & Agterberg (1988, 1992), Grunsky (1990) and Wackernagel & Butenuth (1989) discuss two approaches to multivariate geostatistics. Bailey & Krzanowski (2000), Christensen & Amemiya (2003) and Krzanowski & Bailey (2007) discuss approaches to ‘spatial factor’ methods. Such methods will permit the simultaneous evaluation of geochemical processes within the geochemical and geospatial domain. The long-term benefit of this will be to identify geochemical processes as a function of spatial scale (sampling density) and will permit further discrimination between geochemical background and mineralization.

There are many data analysis and statistical methods available to assess geochemical data. This manuscript has reviewed and demonstrated the application of some of the more popular methods. Geochemists are encouraged to investigate the developing world of data analysis and statistical methods through projects such as R (www.r-project.org).

The author wishes to acknowledge helpful discussions with colleagues at CSIRO, Australia, and the Geological Survey of Canada and in the mineral exploration industry. Most notably, this includes Frits Agterberg, Norm Campbell, Graeme Bonham-Carter, Bob Garrett, Bruce Kjarsgaard, Harri Kiiveri, Barry Smee, Ray Smith and Jeremy Wallace. An earlier version of this manuscript has also benefited from reviews by Robert Jackson, David Lawie and Graham Closs. The author gratefully acknowledges the contribution of Eric de Kemp for providing the 3D imagery of the processed geochemical data from the Noranda area of Quebec. The author wishes to acknowledge thanks to the following for permission to use their data: Ontario Geological Survey and the Ontario Ministry of Natural Resources for the provision of the digital elevation data for the Ben Nevis area of Ontario; Farallon Mining Ltd and Mark Rebagliati of Hunter Dickinson Inc., Vancouver, are also gratefully acknowledged for their full cooperation and permission to present the results of the Campo Morado geochemical study. Shore Gold Inc. is also thanked for permission to present the results of the kimberlite geochemical data from the Fort à la Corne area, Saskatchewan. This is Geological Survey of Canada contribution number: 20090302.

APPENDIX 1

Logratios and compositional data

Compositional data should be transformed by the use of logratios. A compositional vector x is defined by D component variables (elements). By definition, this vector will sum to a constant (100%) and, as a result, the composition can be described by D − 1 of the variables. A composition x can be transformed by

$$ y_i = \log(x_i / x_D) \quad (i = 1, \ldots, D - 1) $$

There is no loss of information by choosing one of the variables as a divisor. This transformation is known as the ‘additive logratio’ (alr). The resulting logratio coordinates cannot be projected onto orthogonal axes because the axes are at 60° (Pawlowsky-Glahn & Egozcue 2006), and this creates difficulties when comparing compositions using different denominators. In particular, measures of distances between alr-transformed observations are not equal when using different denominators and the angles between vectors cannot be computed using a standard Euclidean inner product.

An alternative way of transforming a compositional vector is by applying the logcentered ratio, namely:
$$ z_i = \log\left( x_i / g(x_D) \right) \quad (i = 1, \ldots, D) $$

where g(x_D) is the geometric mean of the composition. The logcentered ratio (clr) is useful because it preserves all of the variables in the composition. However, the covariance matrix for this transform is singular, so computing its inverse requires a special generalized inverse procedure.

An important aspect of assessing compositions is the calculation of an adequate measure of variability. This is done by the creation of a variation matrix, T, defined by:

$$ \tau_{ij} = \mathrm{var}\{\log(x_i / x_j)\} \quad (i = 1, \ldots, d;\ j = i + 1, \ldots, D) $$

and the mean, E, is expressed as:

$$ \xi_{ij} = E\{\log(x_i / x_j)\} \quad (i = 1, \ldots, d;\ j = i + 1, \ldots, D) $$

where d = D − 1.

The variability matrix T summarizes the contribution that any pair of variables makes in a sub-compositional analysis. For example, consider a major element oxide composition consisting of SiO2, Al2O3, MgO, FeO, CaO, Na2O, K2O, TiO2 and MnO. A sub-compositional analysis may examine the relationships of MgO, FeO and Na2O. The amount of compositional variability that these elements will account for can be expressed by the sum of ($\tau_{MgO,FeO}$, $\tau_{MgO,Na_2O}$, $\tau_{FeO,Na_2O}$). This is an important concept in understanding the significance of sub-compositional data, which will never fully explain the overall variation of the data.

More recent developments by Egozcue et al. (2003) have identified the isometric logratio (ilr), which is a transformation that defines compositional vectors in an orthonormal basis. A very simple explanation of this transformation is given in Pawlowsky-Glahn & Egozcue (2006). The application of the ilr transform requires the construction of ‘balances’, which are ratios of selected variables arranged into groups (e.g. elements associated with a fractionation process versus elements associated with alteration). These balances are used to construct new variables that exist in an orthonormal base from which standard Euclidean measures can be calculated (mean, variance, etc.).

APPENDIX 2

The method of RQ-mode principal component analysis

Given a data matrix X of m variables and n observations, X can be scaled (i.e. correlation or covariance) to produce an n × m matrix W where

$$ W = V \Lambda^{1/2} U' $$

where Λ is the diagonal matrix of eigenvalues, V is the n × m eigenvector matrix of WW′ and U is the m × m eigenvector matrix of W′W.

By use of the Eckart-Young theorem (Reyment & Jöreskog 1993), W can be re-written as

$$ W = F^R A^R \quad \text{(R-mode solution)}, \quad \text{where } F^R = V \text{ and } A^R = \Lambda^{1/2} U' $$

or

$$ W = A^Q F^Q \quad \text{(Q-mode solution)}, \quad \text{where } A^Q = V \Lambda^{1/2} \text{ and } F^Q = U' $$

F^R and F^Q represent the factor loadings for both the R and Q mode solutions, and A^R and A^Q represent the coordinates of the variables and objects (the scores) in the same factor space and can be plotted on the same figures. W is scaled to permit the projection of both F^R and F^Q in the same coordinate space. W can be standardized by:

$$ W_{ij} = \frac{1}{n^{1/2}} (x_{ij} - \bar{x}_j), \quad \text{where } \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} $$

which yields a variance-covariance matrix from the minor product matrix W′W. W can also be standardized by:

$$ W_{ij} = (s_j n^{1/2})^{-1} (x_{ij} - \bar{x}_j), \quad \text{where } s_j = \left[ \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \right]^{1/2} $$

which results in a correlation matrix from the minor product matrix W′W.

The advantage of plotting both the scores of the variables and objects on the same diagram is that the relationships between the two can be more clearly observed. Samples with relative abundance of one variable over another will plot near the location of the score for that variable. Grunsky (2001) has written program code for this method of PCA for both the S-Plus and R computing environments.

The relative contribution is the contribution that a variable makes over all of the components. It is defined as follows. For m variables (i = 1, ..., m), p components (j = 1, ..., p), (p ≤ m) and the R-mode loadings given by A^R, the relative contribution rc_ij for a variable i is:

$$ rc_{ij} = 100 \times \left( A^R_{ij} \Big/ \sum_{j=1}^{p} A^R_{ij} \right) $$

The actual contribution is the contribution that a variable makes within a given component. Similarly, the actual contribution is defined as follows. For m variables, p components (p ≤ m) and the R-mode loadings given by A^R, the actual contribution ac_ij for a variable i is:

$$ ac_{ij} = 100 \times \left( A^R_{ij} \Big/ \sum_{i=1}^{m} A^R_{ij} \right) $$

The following simple example illustrates the method of PCA.

[Figure: RQ PCA Example.]
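Before turning to the worked example, the decomposition defined in this appendix can be checked numerically. The following is a minimal sketch, not the S-Plus/R code of Grunsky (2001): the function names `clr` and `rq_pca` are hypothetical, the four-part compositions are made-up values generated for illustration, and the singular value decomposition from NumPy is assumed as the way to obtain V, Λ^(1/2) and U′ in one step.

```python
import numpy as np


def clr(x):
    """Centred logratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


def rq_pca(x, standardize=False):
    """RQ-mode PCA of an n x m matrix (rows = observations) via the SVD."""
    n = x.shape[0]
    centred = x - x.mean(axis=0)
    if standardize:
        # Population standard deviation (divisor n), matching s_j in the text.
        centred = centred / x.std(axis=0)
    w = centred / np.sqrt(n)
    # W = V diag(sing) U', where sing**2 are the eigenvalues of W'W.
    v, sing, ut = np.linalg.svd(w, full_matrices=False)
    eigenvalues = sing ** 2
    a_r = np.diag(sing) @ ut   # A^R = Lambda^(1/2) U'
    a_q = v @ np.diag(sing)    # A^Q = V Lambda^(1/2)
    return w, v, ut, eigenvalues, a_r, a_q


# Made-up four-part compositions (percent), for illustration only.
rng = np.random.default_rng(42)
comp = rng.dirichlet([8.0, 3.0, 1.0, 1.0], size=30) * 100.0

z = clr(comp)
w, v, ut, eigenvalues, a_r, a_q = rq_pca(z)

# Both factorizations reproduce W, and W'W is the covariance matrix of z.
print(np.allclose(v @ a_r, w), np.allclose(a_q @ ut, w))
print(np.allclose(w.T @ w, np.cov(z, rowvar=False, bias=True)))

# Percentage of the total variance carried by each component.
explained = 100.0 * eigenvalues / eigenvalues.sum()
print(np.round(explained, 1))

# With standardization, W'W is the correlation matrix instead.
w_std = rq_pca(comp, standardize=True)[0]
print(np.allclose(w_std.T @ w_std, np.corrcoef(comp, rowvar=False)))
```

Because the clr coordinates of each observation sum to zero, the smallest eigenvalue in this sketch is numerically zero, which is the singularity of the clr covariance matrix noted in Appendix 1.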
The lithogeochemical data from the Ben Nevis township B, T.C. & K, W.J. 2000. Extensions to spatial factor methods
area in Ontario represents a suite of metavolcanics comprised with an illustration in geochemistry. Mathematical Geology, 32, 657–682.
of calc-alkalic basalt, anadesite, dacite and rhyolite. The B, C., P, V. & G, E. 1995. Classification problems
of samples of finite mixtures of compositions. Mathematical Geology, 27,
sequence has also been intruded by tholeiitic mafic sills and 129–148.
granodiorite stocks. A plot of Cr v. Ni clearly shows the three B, C., P, V. & G, E. 1996. Some aspects of
main groups of the data. transformations of compositional data and the identification of outliers. In:
The figure on the previous page top shows the linear O, R.A. (ed.) Geostatistics. Mathematical Geology, 28, 501–518.
relationship of Cr and Ni that is related to the mineralogy of the B-V, C., P-G, V. & G, E.C. 1997. A
volcanic rocks. Rocks rich in minerals containing Cr–Ni (i.e. critical approach to the Jensen diagram for the classification of a volcanic
sequence. In: P-G, V. (ed.) Proceedings of IAMG ’97. Third
pyroxenes) are enriched in Cr and Ni whereas rocks that are annual conference of the International Association for Mathematical Geology, 117–122.
poor in Cr–Ni bearing minerals (i.e. rhyolites, granites) are B, L. 1997. The critical importance of monitoring chemical analyses in
depleted in Cr and Ni. The results of the principal components frontier exploration. In: G, A.G. (ed.) Proceedings of Exploration ’97.
reflect this same relationship. The three groups of data are still Fourth decennial International Conference on Mineral Exploration, 295–300.
evident in the scatter plot of PC1 v. PC2. The loadings of Cr B, Y. & X, X. 1985. Fuzzy cluster analysis in geochemical
and Ni reveal the following information. Observations that plot exploration. Journal of Geochemical Exploration, 23, 281–292.
on the positive side of the PC1 axis closer to the loadings of Cr Bø, B. & G, C.F. 1979. Focus on the use of soils for
geochemical exploration in glaciated terrane. In: H, P.J. (ed.) Geophysics
and Ni are enriched in those elements and observations that and Geochemistry in the Search for Metallic Ores. Proceedings of Exploration ’77 – an
plot on the negative side of the PC1 axis are depleted in Cr and international symposium, Ottawa, Canada, October 1977. Geological Survey of
Ni. In addition, observations that plot on the positive side of Canada Economic Geology Report, 31, 295–326.
the PC2 axis are relatively enriched in Cr whilst those observa- B-C, G.F. 1989a. Integrating global databases with a raster-
tions that plot on the negative side of the PC2 axis are relatively based geographic information system. In:  D, J.N. & D, J.C.
enriched in Ni and relatively depleted in Cr. Using the method (eds) Digital Geologic and Geographic Information Systems. American Geophysi-
of PCA, patterns in the resulting plots can assist in producing cal Union Short Course in Geology, 10, 1–13.
B-C, G.F. 1989b. Comparison of image analysis and Geographic
meaningful interpretations of the data. Information Systems for integrating geoscientific maps. In: A,
PCA also reveals information about significance of each F.P. & B-C, G.F. (eds) Statistical Applications in the Earth
component. The first component (PC1) accounts for more Sciences. Geological Survey of Canada Paper 89-9,141–155.
than 92% of the variation in the data and the second compo- B-C, G.F. 1994. Geographic Information Systems for Geoscientists,
nent (PC2) accounts for c. 8% of the variation. Thus the first Modelling with GIS. Computer Methods in the Geosciences, 13. Pergammon
component is interpreted as the most significant and reflects Press, New York.
B-C, G.F. 1997. GIS methods for integrating exploration data
the dominant geochemical process. The second component
sets. In: G, A.G. (ed.) Proceedings of Exploration ’97. Fourth decennial
reflects a subtle feature that might be related to Cr–Ni variation International Conference on Mineral Exploration, 59–64.
in the more mafic observations. B, R.W. 1979. Geochemistry overview. In: H, P.J. (ed.) Geophysics and
Geochemistry in the Search for Metallic Ores. Proceedings of Exploration ’77 – an
international symposium, Ottawa, Canada, October 1977. Geological Survey of
APPENDIX 3 Canada Economic Geology Report, 31, 25–31.
B, G.E.P. & C, D.R. 1964. An analysis of transformations. Journal of the
Measures of accuracy Royal Statistical Society, Series B, 26, 211–252.
B, P.M.D. & T, I. 1979. The application of soil sampling to
+ Confusion matrix: a cross-referenced matrix of classified geochemical exploration in nonglaciated regions of the world. In: H,
samples for each class. Ideally, there should be zeros in every P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores. Proceedings of
element of the matrix except along the diagonal. Each column Exploration ’77 – an international symposium, Ottawa, Canada, October 1977.
represents a training class and the values in the column Geological Survey of Canada Economic Geology Report 31, 327–338.
correspond to the classification results applied to that particular B, N.J. & MC, R.B. 1980. Discrim. A computer program using
training class. The values can be expressed in the actual number an interactive approach to dissect a mixture of normal or lognormal
distributions. Computers & Geosciences, 6, 361–396.
of samples, or as a percentage.
B, R.R. 1979. Advances in botanical methods of prospecting for
+ Commission: Errors of commission represent samples that minerals. Part1 – Advances in biogeochemical methods of prospecting. In:
have been incorrectly classified as belonging to the class of H, P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores.
interest. Proceedings of Exploration ’77 – an international symposium, Ottawa, Canada,
+ Omission: Errors of omission represent samples that belong October 1977. Geological Survey of Canada Economic Geology Report
to a class of interest but have been classified incorrectly. 31, 397–410.
+ Producer accuracy:a measure of correctly classified samples B, A., M-F, G. & P-G, V. (eds) 2006.
Compositional Data Analysis in the Geosciences: From Theory to Practice. Geologi-
divided by the total number of samples used in the classification cal Society, London, Special Publications, 264.
for a specific class of interest. B, C.R.M., 1989, Geomorphology and climatic history – keys to under-
+ User accuracy: a measure of correctly classified samples standing geochemical dispersion in deeply weathered terrains, exemplified
divided by the total number of samples classified to the specific by gold. In: G, G.D. (ed.) Proceedings of Exploration ’87. Third decennial
class of interest. International Conference on Geophysical and Geochemical Exploration for Minerals
and Groundwater, Ontario Geological Survey, Toronto. Special Volume 3,
323–334.
REFERENCES C, A.N. 1989. Putting expert system technology to work. In:
G, G.D. (ed.) Proceedings of Exploration ’87. Third decennial International
A, J. 1986. The Statistical Analysis of Compositional Data. Methuen Inc. Conference on Geophysical and Geochemical Exploration for Minerals and Ground-
A, J. 1990. Relative variation diagrams for describing patterns of water, Ontario Geological Survey, Toronto. Special Volume 3, 825.
compositional variability. Mathematical Geology, 22, 487–511. C, N.A. 1980, Robust procedures in multivariate analysis. I Robust
A, J. 1997. The one-hour course in compositional data analysis or covariance estimation. Applied Statistics, 29, 231–237.
compositional data analysis is simple. In: P-G, V. (ed.) C, N.A. 1986. A General Introduction to a Suite of Multivariate Programs.
Proceedings of IAMG ’97. Third annual conference of the International Association for CSIRO Division of Mathematics and Statistics, unpaginated unpublished
Mathematical Geology, 3–35. report.
A, J.W. 1987. Workshop 5. Geochemical anomaly recognition. Journal of C, H. 1979. Advances in botanical methods of prospecting for
Geochemical Exploration, 29, 375–376. minerals. Part1 – Advances in geobotanical methods. In: H, P.J. (ed.)
Geophysics and Geochemistry in the Search for Metallic Ores. Proceedings of Exploration ’77 – an international symposium, Ottawa, Canada, October 1977. Geological Survey of Canada Economic Geology Report 31, 385–396.
C, J.R. 1994. Numerical Analysis for the Geological Sciences, Prentice Hall.
C, M.A. 1983. Scoresum–a technique for displaying and evaluating multi-element geochemical information, with examples of its use in Regional Mineral Assessment Programs. Journal of Geochemical Exploration, 19, 361–381.
C, Q. 2006. GIS-based multifractal anomaly analysis for prediction of mineralization and mineral deposits. In: H, J. (ed.) GIS for the Earth Sciences. Geological Association of Canada, Special Publication, 44, 285–297.
C, Q. & A, F.P. 1994. The separation of geochemical anomalies from background by fractal methods. Journal of Geochemical Exploration, 51, 109–130.
C, Q., X, Y. & G, E.C. 2000. Integrated spatial and spectrum analysis for geochemical anomaly separation. Natural Resources Research, 9, 43–51.
C, C.Y. 1990. Unmasking multivariate anomalous observations in exploration geochemical data from sheeted-vein tin mineralization near Emmaville, N.S.W. Journal of Geochemical Exploration, 37, 205–223.
C, W.F. & A, Y. 2003. Modeling and prediction for multivariate spatial factor analysis. Journal of Statistical Planning and Inference, 115, 543–564.
C, C.F. 1985. Statistical treatment of geochemical data with observations below the detection limit. Current Research, Part B, Geological Survey of Canada Paper, 85-1B, 141–150.
C, C.F. 1988. Statistical analysis of truncated data in geosciences. Sciences de la Terre, Series Inf., Nancy, 27, 157–180.
C, C.F. 1989. FORTRAN 77 program for constructing and plotting confidence bands for the distribution and quantile functions for truncated data. Computers & Geosciences, 15, 625–643.
C, W.S. 1993. Visualizing Data. Hobart Press.
C, L.G. 1997. Exploration geochemistry: expanding contributions to mineral exploration. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 3–8.
C, D. & W, C.R. 1991. SEDNORM–a program to calculate a normative mineralogy for sedimentary rocks based on chemical analyses. Computers & Geosciences, 17, 1235–1253.
C, W.B. & D, R.N.W. 1989. Geochemical exploration in glaciated terrain: geochemical responses. In: G, G.D. (ed.) Proceedings of Exploration ’87: Third decennial International Conference on Geophysical and Geochemical Exploration for Minerals and Groundwater, Ontario Geological Survey, Toronto. Special Volume 3, 336–383.
C, W.B., H, E.H.W. & C, E.H. 1979. Lake sediment geochemistry. In: H, P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores. Proceedings of Exploration ’77 – an international symposium, Ottawa, Canada, October 1977. Geological Survey of Canada Economic Geology Report 31, 385–396.
C, P. 1994. Independent component analysis. A new concept? Signal Processing, 36, 287–314.
C, J.A. & D, M.J. 1979. Some aspects of integrated exploration. In: H, P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores. Proceedings of Exploration ’77 – an international symposium, Ottawa, Canada, October 1977. Geological Survey of Canada Economic Geology Report 31, 575–592.
C, S. 1997. Delivering exploration information on-line using the WWW: challenges, and an Australian experience. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 135–143.
D, B. & C, E. 1998. Levelling geochemical data between map sheets. Journal of Geochemical Exploration, 63, 189–201.
D, A.G., B, A. et al. 1995. A global geochemical database for environmental and resource management, recommendations for International Geochemical Mapping. Final Report of IGCP 259, with contributions by R.G. Garrett and G.E.M. Hall. Earth Sciences Report 19, UNESCO Publishing.
D, M., K, K., V H, Y. & W, B. 2007. Robust statistics in data analysis – A review: Basic concepts. Chemometrics and Intelligent Laboratory Systems, 85, 203–219.
D, P.H., F, P.W.B. & B, M. 1997a. The application of lake sediment geochemistry to mineral exploration: recent advances and examples from Canada. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 261–270.
D, P.H., K, G.J., C-S, S.P. & N, L.W. 1997b. Towards comprehensive digital geoscience data coverages for Newfoundland and Labrador. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 161–164.
D, M. 1977. Geostatistical Ore Reserve Estimation. Elsevier Scientific Publishing Company, New York.
D, M. 1988. Handbook of Applied Advanced Geostatistical Ore Reserve Estimation. Elsevier.
D, J.C. 2002. Statistics and Data Analysis in Geology. 3rd edn. John Wiley & Sons Inc.
C, P., B, J. & H, I. 1994. LPNORM: A linear programming normative analysis code. Computers and Geosciences, 20, 313–347.
K, E.A. & D, D.W. 1997 3-D visualization of structural field data and regional sub-surface modelling for mineral exploration. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 157–160.
D, A.P., L, N.M. & R, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
D, C.V. & J, A.G. 1997. GSLIB: Geostatistical Software Library and Users Guide. 2nd edn. Oxford University Press, New York.
D, E. 1973. The dynamic clusters method in non-hierarchical clustering. International Journal of Computer Informatics, 2, 61–88.
D, B.L. & G, A.M. 2007. An evaluation of methods for imputation of missing trace element data in groundwaters. Geochemistry: Exploration, Environment, Analysis, 7, 173–178.
D, C.E. 1989. Developments in Biogeochemical Exploration. In: G, G.D. (ed.) Proceedings of Exploration ’87: Third decennial International Conference on Geophysical and Geochemical Exploration for Minerals and Groundwater, Ontario Geological Survey, Toronto. Special Volume 3, 417–438.
E, J.J., P-G, V., M-F, G. & B́-V, C. 2003. Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35, 279–300.
E, B. 1980. Cluster Analysis. 2nd edn. Heinemann, London.
F, P. & H, K. 2008. Outlier detection for compositional data using robust methods. Mathematical Geosciences, 40, 233–248.
F, P., G, R.G. & R, C. 2005. Multivariate outlier detection in exploration geochemistry. Computers & Geosciences, 31, 579–587.
F, W.K. 1997. Stream Sediment Geochemistry in Today’s Exploration World. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 249–260.
F, J.A.C. 1992. Landscape geochemistry: retrospect and prospect–1990. Applied Geochemistry, 7, 1–53.
F, J.A.C. & V, E.A. 1989. Geochemical Survey of the Trout Lake Area. Ontario Geological Survey, Toronto, Map 80803.
F, J.A.C. & V, E.A. 1990 Geochemical Survey, Hanes Lake Area. Ontario Geological Survey, Toronto, Map 80806.
F, J.A.C. & V, E.A. 1991a. Geochemical Survey, Montreal River Area. Ontario Geological Survey, Toronto, Map 80808.
F, J.A.C. & V, E.A. 1991b. Geochemical Survey, Pancake Lake Area. Ontario Geological Survey, Toronto, Map 80807.
F, J.M. 1997. Lithogeochemical and mineralogical methods for base metal and gold exploration. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Exploration, 191–208.
F, J.H. 1987. Exploratory projection pursuit. Journal of the American Statistical Association, 82, 249–266.
F, P.W.B. 1997. Putting it all together—surficial geochemistry maps for large areas of Canada. In: G, A.G. (ed.) Proceedings of Exploration ’97: Fourth Decennial International Conference on Mineral Exploration, 363.
Ǵ, G. (ed.) 1988. Exploration target selection by integration of geodata using statistical and image processing techniques: an example from Central Finland. Geological Survey of Finland, Report of Investigation 80, Part 1.
G, K.R. 1971. The biplot graphical display of matrices with application to principal component analysis. Biometrika, 58, 453–467.
G, R.G. 1983. Sampling Methodology, In: H, R.J. (ed.) Statistics and Data Analysis in Geochemical Prospecting, Handbook of Exploration Geochemistry, 2, 83–110, Elsevier.
G, R.G. 1984. Workshop 5. Thresholds and anomaly interpretation. Journal of Geochemical Exploration, 21–142.
G, R.G. 1988. IDEAS: an interactive computer graphics tool to assist the exploration geochemist. In: Current Research, Part F, Geological Survey of Canada, Paper 88-1F, 1–13.
G, R.G. 1989a. The role of computers in exploration geochemistry. In: G, G.D. (ed.) Proceedings of Exploration ’87: Third Decennial International Conference on Geophysical and Geochemical Exploration for Minerals and
Groundwater, Ontario Geological Survey, Toronto, Special Volume 3, G, E.C. & S, B.W. 1999. The differentiation of soil types and
586–608. mineralization from multi-element geochemistry using multivariate methods
G, R.G. 1989b. A cry from the heart. Explore, 66, 18–20. and digital topography. Journal of Geochemical Exploration, 67, 287–299.
G, R.G. 1989c. The chi-square plot. a tool for multivariate outlier G, E.C., E, R.M., T, P.C. & J, L.S. 1992. A
detection. Journal of Geochemical Exploration, 32, 319–41. statistical approach to the characterization and classification of Archean volcanics rocks
G, R.G. 1990. A robust multivariate procedure with applications to of the Superior Province, geology of Ontario. Ontario Geological Survey, Toronto,
geochemical data. In: A, F.P. & B-C, G.F. (eds) Special 4, Part 2, 1397–1438.
Statistical Applications in the Earth Sciences. Geological Survey of Canada Paper G, R.P. 1991. Remote Sensing Geology. Springer-Verlag, Heidelberg.
89-9, 309–318. H, G.E.M. 1997. Recent advances in geoanalysis and their implications. In:
G, R.G. 1991. The management, analysis and display of exploration G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International
geochemical data. In: Exploration Geochemistry Workshop. Geological Survey Conference on Mineral Exploration, 293–294.
of Canada, Open File 2390, 9.1-9.41. H, S. 1995. Lake sediment geochemistry of the Cow River Area. Ontario
G, R.G. & C, Y. 2007. rgr: The GSC (Geological Survey of Canada) Geological Survey, Toronto, Open File Report 5917.
Applied Geochemistry EDA Package – R tools for determining background ranges H, M.D., S, F., K, I.M. & C, L.M.
and thresholds, Geological Survey of Canada, Open File 5583, 2007; 1 2003. Regional-scale hydrothermal alteration in the Central Blake River
CD-ROM Group, western Abitibi subprovince, Canada: implications for VMS
G, R.G. & G, E.C. 2001. Weighted sums – knowledge based prospectivity. Mineralium Deposita, 38, 393–422.
empirical indices for use in exploration geochemistry. Geochemistry: Explo- H, P.G., B, S.M. & M, A.G. 1989. Image processing of
ration, Environment, Analysis, 1, 135–141. geophysical and geochemical exploration data sets. In: G, G.D.
G, R.G. & G, E.C. 2003. S and R functions for the display of (ed.) Proceedings of Exploration ’87: Third decennial International Conference on
Thompson-Howarth plots. Computers & Geosciences, 29, 239–242. Geophysical and Geochemical Exploration for Minerals and Groundwater, Ontario
G, R.G., K, V.E. & Z, R.K. 1980. The management and Geological Survey, Toronto, Special Volume 3, 822.
analysis of regional geochemical data. Journal of Geochemical Exploration, 13, H, J.R. 2006a. Statistical, mathematical and geostatistical methods for
113–152. dealing with glacial dispersal: application of GIS technology to till data
G, H. & B-C, G.F. 1989. An example of spatial modelling from the Swayze greenstone belt and Cape Breton Island. In: H, J.
of geological data for gold exploration Star Lake area. In: A, F.P. (ed.) GIS for the Earth Sciences. Geological Association of Canada, Special
Publication, 44, 317–368.
& B-C, G.F. (eds) Statistical Applications in the Earth Sciences.
Geological Survey of Canada Paper 89-9, 171–183. H, J.R. 2006b. Integration of geoscience data for mapping potassic
alteration, Swayze greenstone belt, Ontario, Canada. In: H, J. (ed.)
G, G.J.S. 1989. Bedrock geochemistry in mineral exploration.
GIS for the Earth Sciences. Geological Association of Canada, Special
In: G, G.D. (ed.) Proceedings of Exploration ’87: Third decennial
Publication, 44, 369–396.
International Conference on Geophysical and Geochemical Exploration for Minerals
H, J.R., G, E.C. & W, L. 1997. Developments in the
and Groundwater. Ontario Geological Survey, Toronto, Special Volume 3,
effective use of lithogeochemistry in regional exploration programs:
273–200.
application of GIS Technology. In: G, A.G. (ed.) Proceedings of
G, G.J.S. & N, I. 1979. Lithogeochemistry in mineral exploration.
Exploration ’97: Fourth decennial International Conference on Mineral Exploration,
In: H, P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores.
285–292.
Proceedings of Exploration ’77 – an international symposium. Ottawa, Canada, H, J.R., W, L., G, E.C., H, K. & A, J. 1999.
October 1977. Geological Survey of Canada Economic Geology Report
Techniques for analysis and visualization of lithogeochemical data with
31, 339–362.
applications to the Swayze greenstone belt, Ontario. Journal of Geochemical
G, E.C. 1986a. Recognition of alteration in volcanic rocks using
Exploration, 67, 301–334.
statistical analysis of lithogeochemical data. Journal of Geochemical Exploration, H, J.R., G, G. & W, L. 2000. Effective use and
25, 157–183. interpretation of lithogeochemical data in regional mineral exploration
G, E.C. 1986b. Recognition of alteration and compositional variation patterns in programs: application of Geographic Information System (GIS) tech-
volcanic rocks using statistical analysis of lithogeochemical data, Ben Nevis Township nology. Ore Geology Reviews, 16, 107–143.
Area, District of Cochrane, Ontario. Ontario Geological Survey, Toronto, H, J.A. 1975. Clustering Algorithms. Wiley, New York.
Open File Report 5628. H, G. 1989. GIS and computer-mapping aspects of the Austrian
G, E.C. 1990. Spatial factor analysis: a technique to assess the spatial stream-sediment geochemical sampling project. In: V D, J.N. &
relationships of multivariate data. In: A, F.P. & D, J.C. (eds) Digital Geologic and Geographic Information Systems. American
B-C, G.F. (eds) Statistical Applications in the Earth Sciences. Geophysical Union Short Course in Geology, 10, 25–45.
Geological Survey of Canada, Paper 89-9, 329–347. H, H.E. & W, J.S. 1962. Geochemistry in Mineral Exploration. 1st edn,
G, E.C. 1991. Geology of the Batchawana Area, District of Algoma. Ontario Harper and Row, New York.
Geological Survey, Toronto, Open File Report 5791. H, D.R. 1990. Less than obvious: Statistical treatment of data below the
G, E.C. 2000 Strategies and Methods for the Interpretation of Geochemical Data detection limit. Environmental Science and Technology, 24, 1766–1774.
in Exploration Geochemistry in Today’s World. Queen’s University, Kingston, H, E.M., G, E.C. & A, S.A. 2008. Compilation of
11–17 March, 2000. lithogeochemistry: Abitibi Greenstone Belt, Ontario Portion. Digital publi-
G, E.C. 2001. A program for computing rq-mode principal compo- cation containing a database of lithogeochemical analyses. Ontario, Geological
nents analysis for S-Plus and R. Computers & Geosciences, 27, 229–235. Survey of Canada, Open File 5510.
G, E.C. 2002a. R: a data analysis and statistical programming environ- H, M.T. 1989. The relevance of data base technology to resource
ment – an emerging tool for the geosciences. Computers & Geosciences, 28, exploration data In: G, G.D. (ed.) Proceedings of Exploration ’87: Third
1219–1222. decennial International Conference on Geophysical and Geochemical Exploration for
G, E.C. 2002b. Shareware and freeware in the Geosciences II. A Minerals and Groundwater, Ontario Geological Survey, Toronto, Special
special issue in honour of John Butler. In: G, E.C. (ed.) Computers Volume 3, 811–821.
& Geosciences, 28. H, E.H. 1989. Lake sediment geochemistry: Canadian applications
G, E.C. 2006. The evaluation of geochemical survey data: Data in the eighties. In: G, G.D. (ed.) Proceedings of Exploration ’87: Third
analysis and statistical methods using Geographic Information Systems. In: decennial International Conference on Geophysical and Geochemical Exploration for
H, J. (ed.) GIS for the Earth Sciences. Geological Association of Canada, Minerals and Groundwater, Ontario Geological Survey, Toronto, Special
Special Publication, 44, 229–283. Volume 3, 405–416.
G, E.C. & A, F.P. 1988. Spatial and multivariate analysis of H, R.J. 1983. Mapping. In: H, R.J. (ed.) Statistics and Data
geochemical data from metavolcanic rocks in the Ben Nevis area, Ontario. Analysis in Geochemical Prospecting. Handbook of Exploration Geochemistry,
Mathematical Geology, 20, 825–861. 2, 111–205, Elsevier.
G, E.C. & A, F.P. 1992. Spatial relationships of multivariate H, R.J. & E, S.A.M. 1979. Application of a generalized power
data. Mathematical Geology, 24, 731–758. transformation to geochemical data. Mathematical Geology, 11, 45–62.
G, E.C. & K, B.A. 2008. Classification of eruptive phases H, R.J. & M, L. 1979. Computer-based techniques in the
of the Star Kimberlite, Saskatchewan, Canada based on statistical treatment compilation, mapping and interpretation of exploration geochemical data.
of whole-rock geochemical analyses. Applied Geochemistry, 23, 3321–3336, In: H, P.J. (ed.) Geophysics and Geochemistry in the Search for Metallic Ores.
DOI: 10.1016/j.apgeochem.2008.04.027. Proceedings of Exploration ’77 – an international symposium, Ottawa, Canada,
October 1977. Geological Survey of Canada Economic Geology Report Geophysical and Geochemical Exploration for Minerals and Groundwater, Ontario
31, 544–574. Geological Survey, Toronto, Special Volume 3, 300–311.
H, R.J. & S-L, R. 1983. Multivariate analysis, In: M, R.H. 1997. Geochemical exploration in areas affected by
H, R.J. (ed.) Statistics and Data Analysis in Geochemical Prospecting. tropical weathering—an industry perspective. In: G, A.G. (ed.)
Handbook of Exploration Geochemistry, 2, 207–289, Elsevier. Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral
I, T.N. & B, W.R.A. 1979. A Guide to the Chemical Classifica- Exploration, 315–322.
tion of the Common Volcanic Rocks. Canadian Journal of Earth Sciences, 8, MC, M.B., T, L.H. & D, R.N.W. 1997. Till
523–546. geochemical and indicator mineral methods in mineral exploration. In:
I, E.H. & S, R.M. 1989. An Introduction to Applied Geostatistics, G, A.G. (ed.) Proceedings of Exploration ’97: Fourth decennial International
Oxford University Press, New York. Conference on Mineral Exploration, 233–247.
J, J.-M., F, F. & B, J.-P. 1975. Comparison of auto- MQ, J. 1967. Some methods for classification and analysis of multi-
matic classification methods applied to lake geochemical samples. Math- variate observations. 5th Berkeley Symposium on Mathematics, Statistics, and
ematical Geology, 7, 237–266. Probability, 1, 281–298.
J, J.E. 2003. A User’s Guide to Principal Components. Wiley-Interscience, M, M. 1987. Multivariate data analysis. its methods. Chemometrics and
Hoboken, NJ. Intelligent Laboratory Systems, 2. 29–36.
J, L.S. 1975. Geology of Clifford and Ben Nevis Townships, District of M, M. 1989. Computer tools for the integrative interpretation of
Cochrane, Ontario Division of Mines, GR 132. Accompanied by Map geoscience spatial data in mineral exploration. In: A, F.P. &
2283, scale 1 inch to 1/2 mile. B-C, G.F. (eds) Statistical Applications in the Earth Sciences.
J, I.T. 2002. Principal Components Analysis. 2nd edition. Springer, New Geological Survey of Canada Paper 89-9, 135–139.
York. M, M., C, S.C.Y. et al. 1984. The multivariate chemical space,
J̈, K.G., K, J.E. & R, R.A. 1976. Geological Factor and the integration of the chemical, geographical, and geophysical spaces.
Analysis. Elsevier Scientific Publishing Company, Amsterdam. Journal of Geochemical Exploration, 21, 143–148.
J, A.G. & H, C.J. 1978. Mining Geostatistics. Academic Press, M, J.C., S, L.A. & B, L.M. 1992. A FORTRAN
London. program for the calculation of normative composition of clay minerals and
J, A.S. 1984. Geochemical Exploration. The Australian Mineral Foundation pelitic rocks. Computers and Geosciences, 18, 47–61.
Inc, Glenside, South Au. M, W.T., T, P.K. Jr. & B, H. 1979. Stream sediment
K, L. & R, P.J. 1990. Finding Groups in Data: An Introduction geochemistry. In: H, P.J. (ed.) Geophysics and Geochemistry in the Search for
to Cluster Analysis. John Wiley, Hoboken, NJ. Metallic Ores. Proceedings of Exploration ’77 – an international symposium, Ottawa,
K, B.A., L, D.A., MN, D. & MI, D. 1997. Regional Canada, October 1977. Geological Survey of Canada Economic Geology
and detailed geology of the Fort à la Corne kimberlite field, central Saskatchewan. Report 31, 411–434.
Unpublished proprietary report to the Fort à la Corne joint venture. O, J., P, J. & R, M. 1996. Precious-metal-bearing volca-
K, R.A. 1997. Glacial history and ice flow dynamics applied to drift nogenic massive sulfide deposits, Campo Morado, Guerrero, Mexico.
prospecting and geochemical exploration. In: G, A.G. (ed.) Proceedings Exploration Mining Geology, 6, 119–128.
of Exploration ’97: Fourth decennial International Conference on Mineral Explora- P, V. 1989. Cokriging of regionalized compositions. Mathematical
tion, 221–231. Geology, 21, 513–521.
K, T. 1995. Self-Organizing Maps. Springer-Verlag, Heidelberg. P-G, V. & B, A. 2002. Visualization and modeling
K, J.B. 1964. Multidimensional scaling by optimising goodness of fit to of sub-populations of compositional data; statistical methods illustrated by
non-metric hypothesis. Psychometrika, 29, 1–27. means of geochemical data from fumarolic fluids. International Journal of
K, W.J. 1988. Principles of Multivariate Analysis: A User’s Perspective. Earth Sciences, 91, 357–368.
Clarendon Press, Oxford. P-G, V. & E, J.J. 2006. Compositional data and their
K, W.J. & B, T.C. 2007. Extraction of spatial features using analysis. In: B, A., M-F, G. & P-G,
factor methods illustrated on stream sediment data. Mathematical Geology, 39, V. (eds) Compositional Data Analysis in the Geosciences: From Theory to Practice.
69–85. Geological Society, London, Special Publications, 264, 1–10.
K, V. (ed.) 1988. Exploration target selection by integration of geodata using P, E.J. 2004. Multivarilabe geostatistics in S: the gstat package.
statistical and image processing techniques: an example from Central Finland. Computers & Geosicences, 30, 683–691.
Geological Survey of Finland, Report of Investigation 84, Part 2, Atlas. P, M. & J, M. 2004. Normative minerals and alteration indices
K̈, H. 1988. Exploratory data analysis: recent advances for the interpreta- developed for mineral exploration. Journal of Geochemical Exploration, 82,
tion of geochemical data. Journal of Geochemical Exploration, 20, 309–322. 59–77.
L, L. & H, D. 2005. Statistical analysis of water-quality data containing P, C.M. & E, P.A.J. 1993. Remote Geochemical Analysis: Elemental
multiple detection limits: S-language software for regression on order and Mineralogical Composition. Cambridge University Press, Cambridge.
statistics. Computers & Geosciences, 31, 1241–1248. P, J.A., H, M. & R, J. 1989. Regional Geochemistry Based
L, L. & H, D. 2007. Statistical analysis of water-quality data containing on Stream Sediment Sampling. In: G, G.D. (ed.) Proceedings of
multiple detection limits II: S-language software for nonparametric distri- Exploration ’87: Third decennial International Conference on Geophysical and
bution modeling and hypothesis testing. Computers & Geosciences, 33, Geochemical Exploration for Minerals and Groundwater, Ontario Geological
696–704. Survey, Toronto, Special Volume, 3, 384–404.
L, A.A. 1980. Introduction of Exploration Geochemistry. 2nd edn. Applied R D C T, 2008. R: A language and environment for statistical
Publishing, Chicago. computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
L, L. 1976. SELLO, A Fortran IV program for the transformation 3-900051-07-0, URL: http://www.R-project.org
of skewed distributions to normality. Computers & Geosciences, 1, 129–145. R, M. 1999. Applied Exploration Geochemistry: Campo Morado Precious-
L, R.F. & K, G.S. 1975. Some consequences of applying lognormal Metal-Bearing Volcanogenic Massive Sulphide District, Guerrero, Mexico. 19th
theory to pseudolognormal distributions. Mathematical Geology, 7, 117–128. International Geochemical Exploration Symposium, Vancouver, British Columbia,
M, L. 1989. Expert systems and their use as exploration assistants. In: Canada, April 10–16, 1999, Abstract.
G, G.D. (ed.) Proceedings of Exploration ’87: Third decennial International R, C., F, P. & G, R.G. 2005. Background and
Conference on Geophysical and Geochemical Exploration for Minerals and Ground- threshold: Critical comparison of methods of determination. Science of the
water, Ontario Geological Survey, Toronto, Special Volume, 3, 826–834. Total Environment, 346, 1–16.
M-F, J.A., B-V, C. & P-G, V. R, C., F, P., G, R.G. & D, R. 2008. Statistical
1998. A critical approach to non-parametric classification of compositional Data Analysis Explained. Applied Environmental Statistics with R. John Wiley &
data. In: R, A., V, M. & B, H.H. (eds) Advances in Data Science Sons, Chichester.
and Classification. Springer, Berlin, 49–56. R, A.N. 1999. Remote Sensing for the Earth Sciences. Manual of Remote
M-F, J.A., B-V, C. & P-G, V. Sensing, 3. 3rd edn. John Wiley & Sons, New York.
2000. Zero replacement in compositional datasets. In: K, H., R, R, R.A. & J̈, K.G. 1993. Applied Factor Analysis in the Natural
J., G, P. & S, M. (eds) Studies in Classification, Data Analysis, Sciences. Cambridge University Press, Cambridge.
and Knowledge Organization. Springer, Berlin, 155–160. R, J.A. & J, X. 1999. Remote Sensing Digital Image Analysis, an
M, R.H. 1989. Exploration geochemistry in areas of deeply Introduction. 3rd edn. Springer-Verlag, Heidelberg.
weathered terrain: weathered bedrock geochemistry. In: G, G.D. R, N.M.S. 1987. Robust, An interactive Fortran-77 package for explora-
(ed.) Proceedings of Exploration ’87: Third decennial International Conference on tory data analysis using parametric, robust and nonparametric location and
scale estimates, data transformations, normality tests, and outlier assess- S, C.R. 2003. THPLOT.M: a MATLAB function to implement
ment. Computers & Geosciences, 13, 463–494. generalized Thompson-Howarth error analysis using replicate data. Com-
R, N.M.S. 1988. Numerical geology, a source guide, glossary and selective puters & Geosciences, 29, 225–237.
bibliography to geological uses of computers and statistics. In: S, C.R. 2006. On the special application of Thompson-Howarth error
B, S., F, G., N, H.J. & S, A. analysis to geochemical variables exhibiting a nugget effect. Geochemistry:
(eds) Lecture Notes in Earth Sciences, 18. Springer-Verlag, Berlin. Exploration, Environment, Analysis, 6, 357–368.
R, A.W., H, H.E. & W, J.S. 1979. Geochemistry in Mineral S, C.R. & S, A.J. 1987. Anomaly recognition for multi-element
Exploration. 2nd edn. Academic Press. geochemical data: A background characterization approach. Journal of
R, O.M., A, A.A., M, A.A. & Y, A.A. 2000. Geochemical Exploration, 29, 333–353.
MINLITH – A program to calculate the normative mineralogy of S, C.R. & S, A.J. 1989. Comparison of probability plots and
sedimentary rocks: The reliability of results obtained for deposits of old the gap statistic in the selection of thresholds for exploration geochemistry
platforms. Geochemistry International, 38, 388–400. data. Journal of Geochemical Exploration, 32, 355–357.
R, P.J. &  D, K. 1999. A fast algorithm for the
T, M. & H, R.J. 1973. The rapid estimation and control of
minimum covariance determinant estimator. Technometrics 41, 212–223.
precision by duplicate determinations. The Analyst, 98, 153–160.
S, J.W. 1969. A non-linear mapping for data structure analysis. IEEE
Transactions in Computing, C18, 401–409. T, M. & H, R.J. 1976a. Duplicate analysis in practice––Part
S, R.F, P, C.T. & C, R.A. 1993. An objective replace- 1. Theoretical approach and estimation of analytical reproducibility. The
ment method for censored geochemical data. Mathematical Geology, 25, Analyst, 101, 690–698.
59–80. T, M. & H, R.J. 1976b. Duplicate analysis in practice––Part
S, D. 2008. Lattice, Multivariate Data Visualization with R. Springer, New 2. Examination of proposed methods and examples of its use. The Analyst,
York. 101, 699–709.
S, J. 1989. Geochemical exploration in areas of glaciated terrain: T, M. & H, R.J. 1978. A new approach to the estimation of
geological processes. In: G, G.D. (ed.) Proceedings of Exploration ’87: analytical precision. Journal of Geochemical Exploration, 9, 23–30.
Third decennial International Conference on Geophysical and Geochemical Exploration T, J.W. 1977. Exploratory Data Analysis. Addison-Wesley, Reading,
for Minerals and Groundwater, Ontario Geological Survey, Toronto, Special Massachusetts.
Volume, 3, 335.   B, K.G. & T-D, R. 2008. “Compositions”:
S-L, R. 1975. A computer method for dividing a regional a unified R package to analyze compositional data. Computers & Geosciences,
geochemical survey area into homogeneous subareas prior to statistical 34, 320–338.
interpretation. In: E, I.L. & F, W.K. (eds.) Geochemical V, W.N. & R, B.D. 2002. Modern Applied Statistics with S. 4th edn.
Exploration 1974. Elsevier, Amsterdam, 191–217. Springer-Verlag, New York.
S, A.J. 1976. Application of Probability Plots in Mineral Exploration. V, R.K. 1997. Fundamentals of Geological and Environmental Remote Sensing.
Association of Exploration Geochemists, Special Publication, 4. Prentice Hall, Upper Saddle River, NJ.
S, B.W. 1997. The formation of surficial geochemical patterns over buried  E, H., P-G, V. & E, J.J. 2002. Under-
epithermal gold deposits in desert environments: results of a test of partial standing perturbation on the simplex: A simple method to better visualize
extraction techniques. In: G, A.G. (ed.) Proceedings of Exploration ’97: and interpret compositional data in ternary diagrams. Mathematical Geology,
Fourth decennial International Conference on Mineral Exploration, 301–314. 34, 249–258.
S, R.E. 1989. Using Lateritic Surfaces to Advantage in Mineral Explo- V E, H., B-V, C. & P-G, V. 2003.
ration. In: G, G.D. (ed.) Proceedings of Exploration ’87: Third Decennial Composition and discrimination of sandstones; a statistical evaluation of
International Conference on Geophysical and Geochemical Exploration for Minerals different analytical methods. Journal of Sedimentary Research, 73, 47–57.
and Groundwater, Ontario Geological Survey, Toronto, Special Volume, 3, W, H. & B, C. 1989. Caractérisation d’anomalies
312–322.
géochimiques par la géostatistique multivariable. Journal of Geochemical
S, R.E. & P, J.L. 1983. Pisolitic laterite geochemistry in the
Exploration, 32, 437–444.
Golden Grove massive sulphide district, Western Australia. Journal of
W, L., H, J.R. & G, E.C. 1999. Building a lithogeochemical
Geochemical Exploration, 18, 131–164.
S, R.E., P, J.L. & D, J.M. 1987. Dispersion into pisolitic database for GIS analysis; methodology, problems and solutions. Geological Survey
laterite from the Greenbushes mineralized Sn–Ta pegmatite system, of Canada Open File 3788.
Western Australia. Journal of Geochemical Exploration, 28, 251–265. W, L., H, J.F., K, B.K. & MC, M.B.
S, R.E., B, R.D. & B, J.F. 1989. The implications to 2006. Till geochemistry for kimberlite exploration: Using GIS to visualize,
exploration of chalcophile corridors in the Archaean Yilgarn Block, analyze and decide. In: H, J. (ed.) GIS for the Earth Sciences. Geological
Western Australia, as revealed by laterite geochemistry. Journal of Geochemical Association of Canada, Special Publication, 44, 297–316.
Exploration, 32, 169–184. Y, D.G., K, A.N. & D, M.I. 1988. CHEMPET
S, R.E., A, R.R. & A, N.F. 1997. Use and implications of – Calculation for the chemical systematics of igneous rocks based on the
paleoweathering surfaces in mineral exploration. In: G, A.G. (ed.) CIPW norm. Computers & Geosciences, 24, 1–5.
Proceedings of Exploration ’97: Fourth decennial International Conference on Mineral Z, D. 1985. Adjustment of geochemical background by robust multivari-
Exploration, 335–346. ate methods. Journal of Geochemical Exploration, 24, 207–222.
S, C.R. 1987. PROBPLOT, An Interactive Computer Program to Fit Z, D. 1989. ROPCA: a Fortran program for robust principal components
Mixture of Normal (or Log normal) Distribution with Maximum Likeli- analysis. Computers & Geosciences, 15, 59–78.
hood Optimization Procedures. Association of Exploration Geochemists Z, D., C, T. & D, J.C. 1983. Dual extraction of R-mode and
Special Volume 14, 1 diskette. Q-mode factor solutions. Mathematical Geology, 15, 581–606.
Received 25 September 2008; revised typescript accepted 27 January 2009.

