Geostatistics as an Aid to Mapping

Michael Rosenbaum and Mats Söderström

Geostatistics provides an interpolation technique (kriging) controlled by the variogram, a graphical representation of the spatial auto-correlation between pairs of observation points. In practice, the effectiveness of kriging depends upon an appropriate selection of the model variogram parameters, and how representative are the available observations. A measure of the reliability of the interpolation, from the kriging error variance of estimation, makes the use of geostatistics as an aid to mapping of considerable interest. Cokriging uses a similar approach but can deal with a limited number of measurements for the parameter to be interpolated by employing the experimental variogram behaviour of secondary parameters with which it can be correlated. The estimation error is thereby reduced since more information is being used for the estimation of the primary parameter; a two-fold reduction in error of estimation is typical. Such geostatistical techniques are illustrated by their application to the distribution of heavy metals as an aid to agricultural mapping in a situation where an extensive sampling programme cannot be undertaken. A biogeochemical map has thus been produced characterising the likely value of cadmium where zinc has been analysed at more localities.

INTRODUCTION

Map information incorporated within GIS is frequently derived from field observations such as soil properties, vegetation types and climatic measurements. These observations are generally represented as point sources, but areal studies require their values to be estimated at unsampled locations as well. Most GIS include one or more interpolation modules to facilitate such estimation, but comparison of their results reveals significant differences in predicted value. The ability of geostatistics to not only provide an optimum weighted moving average estimation technique for interpolation ("kriging") but also to calculate a measure of its reliability, from the kriging error variance of estimation, makes its use as an aid to such mapping of considerable interest.

The reliability of measurements for a given parameter will vary with the method of sampling and measurement used. It is desirable to quantify such aspects since they contribute to the degree of spatial correlation exhibited by a particular data set. If a model of spatial correlation can be established then it can be used for interpolation, essentially by assigning weights to the available observations dependent upon the distance between the sampling locations and the point of estimation; this is the process known as kriging.

GEOSTATISTICS

Observations taken at irregularly spaced locations require estimation to facilitate spatial description. Geostatistics provides a stochastic interpolation technique which is based upon establishing the auto-correlation between observation points avoiding dependence upon the mean. The measured ("experimental") value of a single variable is then described in terms of the distance of separation ("lag") between the observation points (Isaaks & Srivastava, 1990). This is given by the "experimental variogram", which is the average square of the difference in value between pairs of observations of the parameter a given distance apart, and so is a measure of dissimilarity.

The experimental variogram can be represented by a theoretical model chosen by curve-fitting together with an application of geological common sense. The behaviour near the origin is the most critical for estimation, together with a measure of the overall variance and the distance ("range") over which the local variance continues to exhibit an increase in magnitude.

The geostatistical technique known as kriging can then be applied. Kriging is essentially a weighted moving average technique for estimation whereby the selection of weights is made such that the estimation variance is minimised. This gives the most likely value that the parameter will have at a given location (at a specific point, area, or volume) together with the range within which it is likely to lie, determined using the kriging error variance of estimation. In practice, the effectiveness of kriging depends upon the appropriate selection of the model variogram parameters and how representative the observation points are of the phenomenon.

The functionality of ARC/INFO enables such basic geostatistical methods of interpolation to be calculated. The variogram can be calculated using an AML called SEMIVARIOGRAM which runs in TIN and can be displayed in ARCPLOT. This is based on the data held within an INFO database generated by the GRAPH and BOTH options of the KRIGING command which is available within both TIN and GRID. This enables display of 5 types of model variogram which might fit the experimental values: spherical, circular, exponential, gaussian and linear. The database contains values for location and the parameter magnitude.

Estimation can then proceed using the TIN module KRIGING, which interpolates a lattice of estimated values using the selected model variogram and the original observations. The techniques of Ordinary Kriging and Universal Kriging are both available.

COKRIGING WITHIN ARC/INFO

If only a limited number of observations are available for a particular parameter, knowledge of the experimental variogram behaviour of similar parameters with which it can be correlated can be used to assist the fitting of a more realistic model, although the actual data need not give a particularly good correspondence.

The concept of Cokriging achieves this for a parameter whose estimation is primarily required (the "predictand") which is cross-correlated with one or more secondary parameters (the "covariables") (Stein et al., 1988). The estimation error is thereby reduced since more information is being used for the estimation of the primary parameter; a two-fold reduction in error of estimation would be typical.

At the time of writing the functionality of ARC/INFO does not provide for estimation by Cokriging. However, public domain software is available that allows this technique to be applied in the UNIX environment. A suitable product is the Geostatistics Software Library (GSLIB) produced at Stanford University (Deutsch & Journel, 1993).

The source code for GSLIB is published as a series of modules written in FORTRAN 77 which can be compiled on the user's own machine. The data format follows the ASCII structure introduced by the EPA (Englund & Sparks, 1988). This is essentially tabular, with a number of header lines to describe the project and each column of data.

Thus ARC/INFO TABLES can be used to generate an ASCII report containing the Easting and Northing locations for each observation, together with the magnitude of the parameter. "&system" can then be used to edit the report, add the header lines and save the file, conventionally with a ".DAT" extension.

The GSLIB module for Cokriging can now be executed, so generating two grids of values, one for the cokriged estimates and the other for the cokriging error variance of estimation. These files are written in ASCII, so enabling the header lines to be stripped and then imported back into the ARC/INFO environment as GRID files using the ASCIIGRID module.

Data analysis and display can now be performed on the cokriged estimates within ARC/INFO using conventional procedures, with the cokriging error of estimation providing a measure of estimate quality.

BIOGEOCHEMICAL SURVEY DATA

The geostatistical techniques can be illustrated by their use to describe the distribution of heavy metals as an aid to biogeochemical mapping. The agricultural problem is described in greater detail by Rosenbaum & Söderström (1996), but here attention is focused on the mapping aspects to deduce the likely content of cadmium (Cd). The occurrence of this element in agricultural crops is of concern to both producers and consumers since food is the main route by which this element enters the body. Even at relatively low concentrations, long term exposure to Cd might cause kidney dysfunction (Elinder et al., 1976).

The geology of Skåne, the southernmost province of Sweden (inset within Figure 1), has been most recently mapped by the Swedish Geological Survey (SGU) (Bergström et al., 1988). Towards the north and north-east, Precambrian granite and gneiss predominate. Lower Palaeozoic clastic sediments, including the Alum Shale, outcrop in an elongated area some 25 km across that trends from south-east to north-west. A blanket of glacial till covers most of the area and is generally thin, but reaches over 50 m, for example above the Tertiary limestone in the south-westernmost part of Skåne (Ringberg, 1980).

Figure 1. Location map for the study area of Skåne, southern Sweden, showing cadmium levels and sample locations.


The SGU biogeochemical sampling programme in Skåne has produced a total of 285 samples in which both Cd and zinc (Zn) have been analysed, and an additional 1261 samples which have been analysed for Zn alone, as shown in Figure 1 above. Chemically the behaviour of Cd and Zn are rather similar. Because of the close geochemical association between Zn and Cd, the Zn:Cd ratio has been used as an indicator for the presence of anomalies (Peterson & Alloway, 1979). For most rocks, this ratio lies in the range from 250 to 1000. However, with few exceptions, the Zn:Cd ratio in the Skåne biogeochemical samples is considerably lower than 250, suggesting that Cd is more easily taken up by the sampled vegetation, or perhaps has been added by anthropogenic activity. However, the Zn concentrations recorded from agricultural soil samples in this area are also low compared to other parts of Sweden (Eriksson & Söderström, 1994), hinting at the possibility of a lower Zn content prevailing in Skåne than normally encountered elsewhere.

COKRIGING CADMIUM ON ZINC

The spatial data regarding the geochemistry and geology were managed using ARC/INFO, so enabling the biogeochemical samples and the geological units to be compared.

The sampling density is approximately one sample per 7 sq. km for most elements, but Cd is only analysed in every fifth sample. Hence the maps for Cd are relatively crude when compared with Zn, and can only be used in a very general way. The estimation carried out for Cd using Ordinary Kriging based on Cd variography alone produces a somewhat generalised distribution (Figure 2), although this is the basis that has been employed by the SGU (Ressar & Ohlsson, 1985) using the methodology described in Rose et al. (1979).

Figure 2. Cadmium estimates using Ordinary Kriging.


However, in situations where the number of observations for a variable is limited, as it is here for Cd, there would be an advantage in applying an estimation technique which is able to take into account the distribution of a more extensively sampled variable, such as Zn, with which it can be shown to be correlated. The wider distribution of zinc analyses can reduce the variance error of estimation for Cd by using the knowledge of how zinc varies with cadmium, and then using the values of zinc together with those of cadmium. The use of this geostatistical technique of Cokriging has here been investigated in the context of conducting such an estimation.

The experimental cross-variogram at location xi can then be defined, in a similar way to the variogram, as the average square of the difference in value of predictand variable Cd and covariable Zn between n pairs of observations of Cd and Zn a given distance h apart, given by:

These values for (hzy) are then used to construct a graph of the experimental cross-variogram against lag distance.

The three most commonly applied types of Cokriging are (Deutsch & Journel, 1993): (1) traditional ordinary cokriging, (2) standardised ordinary cokriging, and (3) simple cokriging. With traditional ordinary cokriging, the sum of the weighting coefficients for the predictand variable are set to one and the sums for the covariables are set to zero. This has the effect of severely limiting the influence of the covariables. Standardised cokriging offers a more flexible approach since, according to Isaaks & Srivastava (1990), the means of the covariables are set to be the same as that for the predictand. The weights are then constrained to sum to one. The third approach, of simple cokriging, is not to constrain the weights but, as with simple kriging, to transform the variable to multivariate Gaussian, setting their means to zero and their variance to one.

The technique of simple cokriging has been found to give the most reproducible estimations (Ahmed & de Marsily, 1987) and so simple cokriging has been adopted as the preferred technique for estimating cadmium in the case of the Cd and Zn biogeochemical data in Skåne, the results of the estimations being shown in Figure 3.

Figure 3. Cadmium estimates using Cokriging.


¡Error!Marcador no definido.

COKRIGING AND ORDINARY KRIGING COMPARED

The differences in estimation calculated by the two geostatistical techniques of Cokriging and Ordinary Kriging can be seen in Figure 4. This reveals three distinct areas where the Cokriged estimates differ significantly from the Ordinary Kriged estimates:

Figure 4. Map showing the difference between estimations calculated, equal to the estimates by Ordinary Kriging minus estimates by Cokriging.


1. Near the border of the data collection area

Data is frequently collected within an administrative boundary, for instance a county or state, and the map to be compiled is just required for that bounded area. In the absence of data from neighbouring areas, interpolation will not be possible in the border region and so extrapolation will be needed, but will be prone to unnatural deviations if the sampling density is low. If additional observations are available from one or more covariables in the border region then these will help to stabilise the extrapolation of the predictand to the boundary.

2. Occurrences of extreme values

The occurrences of extreme values, for instance along the northwest-southeast trending ridge of high Cd values seen in the Cokriged estimates (Figure 3 above), can sometimes be smoothed too much where few observations are available, the ridge being somewhat subdued in the map of Ordinary Kriged estimates (Figure 2 above). These differences are brought out in Figure 4 above, the map showing the difference between estimations calculated, equal to the estimates by Ordinary Kriging minus estimates by Cokriging, the success of the Cokriging at capturing the zone of high Cd values being due to the presence of many nearby observations of similarly high Zn.

3. Sparsely sampled areas

Sparsely sampled areas, particularly with the occurrence of extreme values, are vulnerable to being dominated by the extremes. The presence of covariables enables significant zones of high or low predictand values to be more readily distinguished from isolated occurrences.

Taken overall, the correlation coefficient between estimated and observed values for Cd was raised from the Ordinary Kriging case of 0.68 to 0.85 for the Cokriging situation in which the Zn observations were taken into account in addition to the Cd.

The distribution of estimation errors is an important source of information when it comes to assessing the reliability of the interpolated map or to planning additional sampling requirements. The reduction of the estimation error when cokriging is applied is shown in Figure 5. In this case, the ratio of the cokriging error variance of estimation to the ordinary kriging error variance of estimation ranges from 0.32 to 0.87. On average, the reduction is about two-fold. The highest reduction (shown as circles in Figure 5) is found in areas where the observations of the predictand are few and far between; these are where the covariable provides the most useful influence, notably in the south-western corner of the region. The least improvement is found where the majority of observations include data for both predictand and covariable.

Figure 5. Error reduction map showing the cokriging error variance of estimation as a percentage of the ordinary kriging error variance of estimation.


CONCLUSIONS

Cokriging provides a method of estimation which combines kriging of one sparser element with kriging based on the cross-correlation between that element and one or more denser sampled secondary elements. The use of Cokriging based on both Cd and Zn variography has produced an estimate map for Cd which better reflects the local variations recorded for the Zn levels with which the Cd is strongly correlated.

This reduces the kriging error variance of estimation and so enables a more accurate estimate to be made of the less well sampled element (cadmium) when a more extensively sampled element (zinc), with which it shows some degree of correlation, is available. However, the numerical value for the error variance is dependent upon the sample geometry and thus to the sample size and the magnitude of any non-spatially dependant variability.

An improvement of the estimation of the spatial variability of background Cd values in Skåne has been achieved by the use of simple cokriging. Such techniques will therefore be beneficial both for general planning and for agricultural management. The main objective has been to present an improved technique for analysing the results of regional biogeochemical surveying using Cd as an example of an under-sampled variable considered alongside its correlated behaviour with Zn, incorporating both their spatial characteristics. Maps produced in such a manner might form a basis for delimiting areas of high risk where more detailed investigations should be carried out. Within such areas, efforts need to be made to inform farmers of the importance of using adequate farming practises (such as the appropriate selection of crops and crop species, type of fertiliser and provision of adequate liming) and enhance large scale soil maps as a basis for improved farming practice.