European Soil Bureau Research Report No. 4
Integrating GIS and process models for land resource planning
A. K. Bregt
J. Bulens
DLO Winand Staring Centre for integrated Land, Soil and Water Research
(SC-DLO), Postbus 125, NL-6700 AC Wageningen, The Netherlands
Summary
The use of Geographical Information Systems (GIS) for the storage, analysis and presentation of land resource data has diversified rapidly during the last ten years. The major emphasis of the current GIS applications is, however, on the storage, management and presentation of spatial data. Until now, the analysis or modelling applications of GIS have been quite limited in number and complexity.
On the other hand, a large number of scientists around the world have developed computer models for erosion, leaching and crop production, and for the simulation of all kinds of processes.
Most of the current models are good at describing the process under study, but they lack proper tools for the input and management of the spatial data needed to run the models, and have limited facilities for presentation of the results. These are aspects in which GIS has its strength, so a combination of GIS and process models is attractive from both sides.
In this paper the subject of integrating GIS and process models for land resource planning is discussed from three viewpoints. First, the conceptual aspects of the integration, especially the space and time data requirements of the process models, are considered, and the potential of current commercial GIS to meet these requirements is discussed. Second, the subject is viewed from a technical implementation angle. Two main types of integration are distinguished: loose and tight. The advantages, disadvantages and technical consequences of both forms of integration are described. Third, a few practical examples of the integration of GIS and process models in the field of land resource planning are presented.
Introduction
During the last ten years, GIS applications have been developed for a large variety of fields, ranging from land use planning and utility management at local level, to global warming and acid deposition on a global scale. Current GIS applications, however, tend to concentrate on the storage, management and presentation of spatial data, not the analysis or modelling applications. This is partly due to the limited functionality in this regard provided by commercial GIS software houses.
Many scientists, however, have developed process models for pesticide leaching, erosion, hydrological features, acid deposition and crop production, and for the simulation of all kinds of processes. Many current models lack proper tools for data input and management, and have poor presentation facilities. GIS is strong in these areas. The main focus of this paper is on the conceptual aspects of integrating GIS and process models. When referring to GIS in this paper, the meaning of commercial GIS is taken to be as described by Goodchild et al. (1992):
“a database containing a discrete representation of geographical reality in the form of static, two-dimensional geometrical objects and associated attributes, with a functionality largely limited to primitive geometrical operations to create new objects or to compute relationships between objects, and to simple query and summary descriptions.”
Describing the real world
The real world can be considered as a continuum in four dimensions: three dimensions in space and one in time. A great variety of processes take place, which change the status of the world continuously. These processes may have quite different spatial and temporal scales: for instance, daily processes due to the input of solar energy, yearly processes due to seasonal variation in weather, and processes on a time scale of millions of years, like the formation of mountains. Some of the processes are discrete in time (e.g. earthquakes); others have a continuous character. All these processes and their interactions lead to a world which changes from second to second and is so complex that we will never be able to describe all aspects to their full extent.
To achieve an understanding of the world, people have collected data about the status of the world and studied the processes which take place. In addition to pure scientific interest, the main reasons for doing so are that we need this information for the management of our environment today and that we want to know how things might change in the future. It is, however, impossible to get an exact description of our world. An exact description is also not necessary; it is enough that we construct models of our environment which satisfy our information requirements.
In the process of obtaining user-specific information of the world, various steps can be distinguished (Bregt, 1992). These steps are presented in Figure 1.
Figure 1: From real world to user-specific information.
In the first step, data about the real world are collected and processes are described. As we are dealing in this paper with geographical information systems and process models, the storage of data in a database and the implementation of a described process in an algorithm are included in the first step. In the second step (analysis), the data or a combination of data and process model are used in an analysis to derive the requested information. Note that in this step, data alone can be analyzed, but that process models are applied only in combination with data. In the third step, the information obtained is presented (presentation) and may be used to initiate certain actions which influence the world.
The current domain of GIS includes data collection, storage, analysis and presentation. It does not, however, include the application of process models.
Data models
Geographical reality can be described completely by recording all possible attributes at all possible points in space and time, or in other words what (attribute) appears where (space) and when (time). It is obvious that this is impossible in practice, so the activity of capturing reality must involve simplification and generalization. Geographical data have been collected and are currently collected by a large number of organizations all over the world. Data models are the logical framework which we use to represent geographical variation.
Data models are formally defined sets of entities and relationships used to make discrete the continuous complexity of geographic reality (Goodchild, 1992b). It is important not to confuse ‘data model’ with ‘data structure’ and ‘database’. A data model is a conceptual description of geographical data. A data structure is the implementation of a data model in a form the computer can understand. A database contains geographical data stored in a predefined data structure.
For geographical data a variety of data models are used. Reviews of data models in use have been published by Peuquet (1984) and Goodchild (1992b). According to Goodchild (1992b), data models for geographical data can take two broad forms, depending on whether reality is perceived as an empty space populated by objects, or as a set of layers or fields, each defining the variation of one or more variables. Although both data models describe geographical reality in a discrete way, the conceptual view of the earth is different. The first approach describes discrete geographical objects (object), and the second describes continuous geographical phenomena (field). In general terms the object data models are more relevant for data in the areas of utility management, cadastral inventories and social sciences. The field data models are in general more relevant to environmental and physical sciences. In Table 1, the characteristics of the most commonly used data models in commercial GIS packages are presented. For more information see Goodchild (1992b) and Kemp (1993).
The first four field models (cell grids, polygons, TIN and contour) provide a complete coverage of the Earth’s surface. The last two field models (grid points and irregular points) only provide information at some points. In order to obtain a representation of a continuous surface from these, some form of interpolation needs to be applied, for which a large variety of methods are available (Burrough, 1986; Stein, 1991).
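As an illustration of the interpolation step, the following minimal sketch uses inverse distance weighting, which is only one of the many methods available, to estimate values on a regular grid from irregular points. The function name `idw`, the `power` parameter and the sample values are assumptions made for this example and are not taken from the cited sources.

```python
import math


def idw(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from (xi, yi, value) samples
    by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the location coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical irregular point observations: (x, y, attribute value)
points = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0), (10.0, 10.0, 40.0)]

# Interpolate onto a regular 3 x 3 grid of points covering the same area
coords = (0.0, 5.0, 10.0)
grid = [[idw(points, gx, gy) for gx in coords] for gy in coords]
```

At a sample location the observed value is returned unchanged; elsewhere the estimate is a weighted mean in which nearby observations count more heavily.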
Table 1: Characteristics of commonly used data models.
Model category / Model / Dimensions / Measurement scale of attributes
object / point / x,y / qualitative, quantitative
object / line / x,y / qualitative, quantitative
object / area / x,y / qualitative, quantitative
object / volume / x,y,z / qualitative, quantitative
field / cell grids / x,y / qualitative, quantitative
field / polygon / x,y / qualitative, quantitative
field / TIN / x,y / quantitative
field / contour / x,y / quantitative
field / grid points / x,y / qualitative, quantitative
field / irregular points / x,y / qualitative, quantitative
TIN = Triangulated Irregular Network
If we consider the data models in Table 1, we see that in most models in use the z and t dimensions are missing. There are various reasons for this:
- the models were proposed before the introduction of geographical information systems. They originate from a period when the main use of the data model was to visualize geographical data on a map. As a map can display geographical data in two dimensions it is obvious why the two-dimensional models in space (x,y) dominate. When GIS was introduced, the main focus was on the management and presentation of geographical data using digital technology. Hardly any attention was paid in the beginning to the question of how this new technology could be used to improve the description of geographical reality. To quote Openshaw (1987): “Such systems are basically concerned with describing the Earth’s surface rather than analyzing it. Or if you prefer, traditional 19th century geography reinvented and clothed in 20th century digital technology.”
- collection of geographical data is generally an extremely time-consuming activity. Quite often only one realization in time is obtained, and data collection often takes place at the Earth’s surface. For these data the two dimensional, timeless data models fit quite well.
As shown by Kemp (1993), there is quite a complex relationship between data model and data structures. The two main data structures in use are raster and vector. The object data models point, line, area and volume can be implemented in both the vector and the raster data structure. The field models polygons, TINs and contours are implemented in the vector data structure, and for grid points and irregular points both the raster and vector data structure are used. Cell grids are implemented in the raster data structure.
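The difference between the two data structures can be sketched by storing the same area object in both. In this hedged example the vector structure is an ordered list of boundary vertices and the raster structure is a grid of cells that is set to 1 where the cell centre falls inside the area. The function names, the cell size and the coordinates are assumptions chosen for illustration; they do not describe any particular GIS.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(vertices, ncols, nrows, cell_size=1.0):
    """Return a grid (list of rows) with 1 where the cell centre is inside the area.
    Row 0 holds the cells with the lowest y coordinates."""
    return [
        [
            1 if point_in_polygon((col + 0.5) * cell_size, (row + 0.5) * cell_size, vertices) else 0
            for col in range(ncols)
        ]
        for row in range(nrows)
    ]


# Vector data structure: the area as an ordered list of boundary vertices
area_vector = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]

# Raster data structure: the same area as a 5 x 4 grid of unit cells
area_raster = rasterize(area_vector, ncols=5, nrows=4)
```

The vector form keeps the exact boundary, whereas the raster form approximates it at the resolution of the cells.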
Process models
A process model is a mathematical equation or a set of equations which represents the behaviour of a process in the real world. For example, process models exist for the flow of water, crop growth, soil erosion, soil acidification and climate change. As a model is always an approximation of reality, only those aspects of the process are described (modelled) which are relevant for our goal. A large variety of process models has been developed for basically the same real world processes. The reasons for this are:
- due to scientific research our knowledge about the actual processes increases, which results in a continuous improvement of our models;
- for the application of a model input data is required. The amount of input data available differs greatly depending on the scale (local, regional or global) and the area for which the model must be applied. This has resulted in diversity of models tailored towards the amount of input data available.
The stages of creating models for processes following an inductive approach are (Burrough, 1992):
1. observe a relationship between an output value of the process and the values of attributes that can be taken as inputs;
2. make an empirical description of that relationship;
3. test the generality of the empirical description;
4. unravel the physical processes underlying the relationship followed by a description of the process in terms of natural physical or stochastic laws.
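The first three stages can be sketched with a simple least-squares fit. This is a minimal illustration under stated assumptions: the observations are invented, a straight line is assumed as the empirical description, and the root mean square error on independent observations stands in for the test of generality. Stage 4, the physical explanation, cannot be derived from such a fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(xs, ys, a, b):
    """Root mean square error of the fitted line on the given observations."""
    squared = [(a + b * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Stage 1: hypothetical observations of an input attribute and a process output
calib_x = [1.0, 2.0, 3.0, 4.0]
calib_y = [2.1, 3.9, 6.2, 7.8]

# Stage 2: empirical description of the observed relationship
a, b = fit_line(calib_x, calib_y)

# Stage 3: test the generality on observations not used in the fit
valid_x = [5.0, 6.0]
valid_y = [10.1, 11.8]
validation_error = rmse(valid_x, valid_y, a, b)
```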
There seems to be no generally accepted classification of process models. Examples of model types found in the literature (France and Thornley, 1984; Burrough, 1989) are:
- empirical and mechanistic models. An empirical model describes a process based on empiricism, whereas a mechanistic model attempts to give a description with understanding.
- static and dynamic models. A static model does not contain time as a variable. Any specifically time-dependent components of the behaviour of the system are ignored. Since all processes in the world involve change, a static model is always an approximation. It might be a good approximation perhaps because the phenomena are close to equilibrium. A dynamic model, on the other hand, contains the time variable in the equations.
- deterministic and stochastic models. A deterministic model is one that makes definite predictions for quantities (such as crop yield, rainfall), without any associated probability distribution. A stochastic model, on the other hand, contains some random elements or probability distributions. The model can not only predict the expected value of a quantity, but also the variance. The greater the uncertainty in the behaviour of the process, the more important it is to follow a stochastic line. Stochastic models tend to be technically difficult to handle and can quickly become very complex. Another approach in dealing with uncertainty is to use a combination of a deterministic model and Monte Carlo simulation to obtain probability estimates.
- spatial dimensions modelled. We can distinguish between one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) models.
- qualitative and quantitative models. A qualitative model makes predictions on a qualitative level, such as not suitable, suitable or highly suitable. The input for a qualitative model can be both qualitative and quantitative. A quantitative model, on the other hand, produces quantitative output.
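The combination of a deterministic model with Monte Carlo simulation, mentioned under deterministic and stochastic models, can be sketched as follows. The yield function, its coefficients and the normal distribution assumed for rainfall are all hypothetical and serve only to show the technique: the deterministic model is run many times with randomly drawn inputs, and the spread of the outputs gives the probability estimates.

```python
import random
import statistics


def crop_yield(rainfall_mm):
    """Hypothetical deterministic model: yield (t/ha) rises linearly
    with rainfall up to a ceiling of 8 t/ha."""
    return min(8.0, 0.015 * max(rainfall_mm, 0.0))


def monte_carlo(model, mean, sd, runs=5000, seed=42):
    """Run the deterministic model on inputs drawn from a normal
    distribution and return the mean and variance of the outputs."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    outputs = [model(rng.gauss(mean, sd)) for _ in range(runs)]
    return statistics.mean(outputs), statistics.variance(outputs)


# Assumed uncertainty in the input: rainfall with mean 400 mm and sd 80 mm
expected_yield, yield_variance = monte_carlo(crop_yield, mean=400.0, sd=80.0)
```

The model itself stays deterministic; all uncertainty enters through the distribution of the input.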
Models constructed for real-world processes often contain combinations of the model types described above. For example, we may have a dynamic deterministic quantitative one-dimensional model for describing crop growth (Van Diepen et al., 1989) or a static empirical qualitative one-dimensional model for land evaluation (Van Lanen, 1991). For the integration with GIS, the characteristics ‘dimensions’ and ‘static-dynamic’ are of major importance, because they have direct consequences for the data model to be applied. In Table 2, the general data requirements of the different model types are given.
Table 2: General data requirements of model types.
Model type / Data requirements
static 1D / x,y
static 2D / x,y,(z)
static 3D / x,y,z
dynamic 1D / x,y,t
dynamic 2D / x,y,(z),t
dynamic 3D / x,y,z,t
Figure 2: Integration of GIS and process models
For the application of a model, the location or the area for which the obtained results apply is important, so even when a one-dimensional model is used, the minimum data requirement is the location of the input data. In the case of a two-dimensional model, the data requirements depend on the dimensions modelled. When we model space in the x and y dimensions, as, for instance, in erosion models, the data requirements are the location (x and y) of the data. When modelling also includes the z dimension, then x, y and z are required.
Integration
An important step in answering users’ questions is analysis, and in a number of publications (Burrough, 1986; Goodchild, 1992b; Goodchild et al., 1992) this aspect is often stressed as the major benefit of GIS. At the same time, it is concluded that current analysis methods in GIS are limited. As stated before, a combination of GIS and process models opens the door to a large variety of new analyses.
Conceptual constraints
Figure 2 indicates the aspects in which GIS can play a role. The data stored in the GIS and the analytical capabilities of GIS (overlay, interpolation, transformation, etc.) can be used to prepare input data for the process model. The output can be visualized with the GIS.
The possibilities of the GIS/process model integration depend on the ability of GIS to support the data requirements of the model. From a conceptual point of view, the possibilities can be detected by combining Table 1 and Table 2. It can be seen that at this moment only static one- and two-dimensional models are supported by current commercial GIS.
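The combination of the two tables can be sketched as a simple set comparison. Two assumptions are made here and should be read as such: only the field data models of Table 1 are considered, since these are the ones described as relevant to environmental and physical sciences, and the optional (z) of Table 2 is not counted as a requirement. The names `REQUIREMENTS`, `FIELD_MODELS` and `supported_model_types` are introduced for illustration only.

```python
# Table 2: dimensions required per model type (optional z left out)
REQUIREMENTS = {
    "static 1D": {"x", "y"},
    "static 2D": {"x", "y"},
    "static 3D": {"x", "y", "z"},
    "dynamic 1D": {"x", "y", "t"},
    "dynamic 2D": {"x", "y", "t"},
    "dynamic 3D": {"x", "y", "z", "t"},
}

# Table 1: dimensions offered by the field data models
FIELD_MODELS = {
    "cell grids": {"x", "y"},
    "polygon": {"x", "y"},
    "TIN": {"x", "y"},
    "contour": {"x", "y"},
    "grid points": {"x", "y"},
    "irregular points": {"x", "y"},
}


def supported_model_types(requirements, data_models):
    """Return the model types whose required dimensions are all
    offered by at least one of the data models."""
    return [
        model_type
        for model_type, needed in requirements.items()
        if any(needed <= offered for offered in data_models.values())
    ]


supported = supported_model_types(REQUIREMENTS, FIELD_MODELS)
```

Under these assumptions the comparison returns the static one- and two-dimensional model types only, because none of the field data models offers the z or t dimension.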
Implementation alternatives
Until now the GIS/process model integration has been approached from a conceptual point of view. It is also worthwhile looking at the integration from an implementation point of view. Bulens et al. (1991) make a distinction between ad hoc, partial and full integration when integrating GIS and process models. Stuart and Stocks (1993) distinguish between a loose-coupled and a tight-coupled approach. As the difference between the ad hoc and the partial integration of Bulens et al. (1990) represents more a difference in the advanced nature of the interface than a conceptual difference in integration, the classification of Stuart and Stocks (1993) is used here. In the loose-coupled approach of Stuart and Stocks (1993), the process model and GIS are linked loosely through an interface. This interface may consist of simple manual transfer of ASCII data files. In general the data are selected from GIS and stored in ASCII files; the files are used as input for the model, sometimes together with other data (time series). The output from the model is imported into GIS for presentation. The loose coupling is flexible and a large number of models can be integrated.
On the other hand, this approach has some drawbacks. The integration involves a lot of interactive work and the fit between model and GIS is, at best, reasonable. In situations where many scenarios need to be executed, the manual procedures may become a large obstacle. A more sophisticated form of the loose-coupled approach involves the development of programs which ease the transfer of data. The process model might have been changed to read the data directly from the database, or a menu might have been developed which combines operations. The development of special tool boxes, containing functions or procedures which support the integration, also falls into this category. In other words, an application has been developed which controls the integration. Disregarding the additional programming required to ease the integration, the main characteristic of the loose-coupled approach is that GIS and process model remain separate items.
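The file-based loose coupling described above can be sketched as three separate steps linked only by ASCII files. Everything in this example is a stand-in: a dictionary plays the role of the GIS attribute table, the erosion index is a placeholder relationship rather than a published model, and the file names and column names are hypothetical.

```python
import csv
import os
import tempfile

# Stand-in for a GIS attribute table: map unit -> attributes (hypothetical values)
gis_table = {
    "unit_1": {"slope_pct": 2.0, "rainfall_mm": 700.0},
    "unit_2": {"slope_pct": 8.0, "rainfall_mm": 750.0},
}


def export_from_gis(table, path):
    """Step 1: select data from the GIS and store them in an ASCII file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unit", "slope_pct", "rainfall_mm"])
        for unit, attrs in table.items():
            writer.writerow([unit, attrs["slope_pct"], attrs["rainfall_mm"]])


def run_model(in_path, out_path):
    """Step 2: the separate process model reads ASCII input and writes ASCII output."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(["unit", "erosion_index"])
        for row in csv.DictReader(fin):
            # Placeholder relationship, assumed for illustration only
            index = float(row["slope_pct"]) * float(row["rainfall_mm"]) / 1000.0
            writer.writerow([row["unit"], index])


def import_into_gis(table, path):
    """Step 3: import the model output into the GIS for presentation."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            table[row["unit"]]["erosion_index"] = float(row["erosion_index"])


with tempfile.TemporaryDirectory() as workdir:
    model_input = os.path.join(workdir, "model_input.txt")
    model_output = os.path.join(workdir, "model_output.txt")
    export_from_gis(gis_table, model_input)
    run_model(model_input, model_output)
    import_into_gis(gis_table, model_output)
```

The model function knows nothing about the GIS and could be replaced by any other program that reads and writes the same files, which is the flexibility of the loose coupling. Wrapping the three calls in one script corresponds to the more sophisticated form, in which an application controls the transfer.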