SPIN!, IST-99-10536, 15.06.1999
Part B
B1. Title. Spatial Mining for Data of Public Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for:
IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods
B3. Objectives
B4. Contribution to programme/key action objectives
B5. Innovations
State of the Art
Technological & Scientific Advances
Distribution of Workload on work packages
Introduction to workpackages
Risk management
Pert diagram
Work package description
C2. Contents for part C
C3. Community added value and contribution to EU policies
C4. Contribution to Community social objectives
C5. Project management
C6. Description of the consortium
C7. Description of the participants
GMD – German National Research Center for Information Technology
Department of Informatics of the University of Bari
School of Geography at the University of Leeds
The Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS)
Dialogis Software & Services GmbH, St. Augustin, Germany
Professional GEO Systems B.V. (PGS), Amsterdam
GeoForschungsZentrum, Potsdam, Germany
Manchester Metropolitan University/MIMAS
C8. Economic development and scientific and technological prospects
Appendix – Publications of partners cited in part B
References partner P1 – GMD
References partner P2 – University of Bari
References partner P3 – IITP, Russian Academy of Sciences
References partner P4 – Leeds
References partner P5 – Dialogis
References partner P6 – PGS
B3. Objectives
To develop an integrated interactive internet-enabled spatial data mining system. Data mining systems (DMS) and geographical information systems (GIS) are complementary tools for describing, transforming, analysing and modelling data about real world systems. Most contemporary GIS facilitate only very basic spatial analysis and data mining functionality and many are confined to simplistic analysis that involves comparing maps or descriptive statistical displays like histograms and pie charts. There is growing demand for integrated geographical or spatial data mining systems (SDMS) from public and private sector organisations who need both enhanced decision making capabilities and innovative solutions to a wide range of different problems. An integrated, user friendly SDMS operable over the internet offers exciting new possibilities for all manner of geographical research and spatial decision making. Thus the overall objective of SPIN! is to develop a state of the art, fully functional, truly integrated, internet-enabled, easily extendable and modifiable GIS-DMS platform, SPIN - a comprehensive and intuitive SDMS for data of public interest. In recent years, a number of project partners have developed the technological components and scientific tools that are needed to develop the kernel of this type of SDMS. During this project these individual efforts and the associated expertise and experience will be united in a joint European effort. SPIN! Consortium partners from statistical offices and seismic research centres will use the system in applied research and provide feedback to direct the development efforts. The applications of SPIN will clearly demonstrate the generic utility and additional benefits that this type of SDMS will have over existing technologies. Industrial partners will develop a business model for web-based information brokering with georeferenced statistical data, and estimate the likely economic impacts of the technology. 
The following scenarios describe some of the wide-ranging potential benefits that statistical analysts, environmental decision makers, seismic data experts, biodiversity researchers and other public and private sector users can expect from such a system, and introduce some of the main features that SPIN will include.
To improve knowledge discovery by providing an enhanced capability to visualise data mining results in spatial, temporal and attribute dimensions. Imagine a statistical officer has to prepare a report describing unusual aspects of African demography and how they inter-relate with socio-economic conditions and the physical environment. Suppose the officer initially applies a data mining technique to classify all countries by death rate and life expectancy, and one of the resulting subgroups, with an unusually high death rate and low life expectancy, contains only 51 countries in all, 40 of them African. Suppose the officer creates a statistical display of all the classified groups (Fig. 1) and then decides to map the geographical distribution of the unusual subgroup, distinguishing between African countries and those elsewhere (Fig. 2). The geographical distribution of the subgroup shown by the map may initiate ideas for further analysis. For instance, the analyst may wish to select sets of countries from the map to take a closer look at their demography and at other geographical variables that describe socio-economic and environmental conditions. In addition, the officer may wish to discover which demographic attributes best characterise each continent at different points in time, and to investigate which groups of demographic attributes have interesting spatio-temporal co-distributions and inter-relationships with other socio-economic and environmental variables. All of this analysis, some of which is quite complex, could clearly be performed more quickly and easily if an integrated SDMS with a linked display component and reporting system were available. It would be a major benefit if the maps and other data displays were generated automatically by a knowledge base of statistical display and thematic data mapping, and if they were linked so that the information the officer is focusing on during the analysis is simultaneously highlighted in all the relevant displays.
This type of linked GIS style display component will be developed as a fundamental part of the integrated visualisation component of SPIN, which would facilitate this kind of statistical analysis (see partner P1, publication 3).
Figure 1. Descriptions of interesting subgroups
Figure 2. Visualisation of the subgroup.
To develop new and integrated ways of revealing complex patterns in spatio-temporally referenced data that could not be discovered using existing methods. Suppose an environmental decision maker is asked to look for relations between lung cancer and environmental pollution. What may be desired initially is some kind of exploratory spatial data analysis (ESDA) technique that automatically detects unusual spatial clustering of lung cancer incidence, both in the entire data set and for specific time periods. Additional spatial and aspatial analysis methods might then be used to try to explain any unusual spatial clustering patterns observed, using a range of other spatio-temporal and aspatio-temporal variables. In SPIN, exploratory spatio-temporal pattern analysis techniques derived from existing ESDA tools will be integrated with a wide variety of temporal, spatial and aspatial analysis methods. Partner P4 has developed a suite of ESDA tools that detect unusual clusters of incidence and produce mappable output that reveals the clustering pattern. Temporal versions of these tools and outputs will be developed, along with the mechanisms for exporting the results of the analysis into other temporal, spatial and aspatial data mining techniques. Having all the tools available in one integrated SDMS would allow the decision maker to perform an in-depth spatio-temporal analysis quickly and thereby help develop understanding of the geographical processes and inter-relationships that may result in an increased risk of contracting lung cancer. The analytical speed-up will allow the decision maker to generate and test more hypotheses regarding the observed spatial, temporal and spatio-temporal patterns, and to investigate more advanced hypotheses about causal relationships.
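To make the idea of automatically detecting unusual clustering concrete, the following is a minimal illustrative sketch, not the ESDA tools developed by partner P4. It assumes incidence counts and populations are available per region, and it flags regions whose count is improbably high under a Poisson model with a single overall rate. The function names, the data layout and the threshold `alpha` are assumptions made for illustration only.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_unusual_regions(regions, alpha=0.01):
    """Flag regions with an unusually high case count.

    `regions` maps a region name to (cases, population). The expected
    count assumes the same overall rate everywhere (a simplification:
    the region's own cases contribute to that rate).
    """
    total_cases = sum(cases for cases, _ in regions.values())
    total_population = sum(pop for _, pop in regions.values())
    overall_rate = total_cases / total_population
    flagged = []
    for name, (cases, population) in regions.items():
        expected = overall_rate * population
        if poisson_upper_tail(cases, expected) < alpha:
            flagged.append(name)
    return flagged
```

Running the same check separately on the counts for each time period would give a crude temporal view. A real cluster detector would also search over neighbouring regions and correct for multiple testing, both of which this sketch omits.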
To enhance decision making capabilities by developing interactive GIS techniques that provide an integrated exploratory and statistical basis for investigating spatial patterns. Seismic data experts regularly use GIS to help them spot geoenvironmental data patterns related to seismic activity. However, the complexity of geoenvironmental processes and the noise in the spatial patterns of these variables make it very difficult to compare seismic maps objectively with other geoenvironmental maps and to identify interesting patterns and relationships. To reduce the likelihood of becoming overly subjective, a seismologist may wish first to classify and select groups of areas with similar geoenvironmental characteristics, and then to perform statistical tests that investigate general differences in the localised distributions of selected areas belonging to the same geoenvironmental group in the classification. An interactive version of SPIN will clearly aid the seismologist in the process of classifying and selecting these areas and in performing the statistical tests. With this analysis task simplified, the user can focus on looking for interesting patterns and on testing a greater number of alternative hypotheses.
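One simple way to test whether two selected groups of areas differ is a permutation test. The sketch below is illustrative only and is not a method specified by the proposal. It assumes each area is summarised by a single numeric value (for example a seismic activity index), and the function name and parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    The group labels are shuffled repeatedly; the p-value is the share
    of shuffles whose absolute mean difference is at least as large as
    the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)
```

A permutation test makes no distributional assumption about the values, which suits noisy geoenvironmental data. It does treat areas as exchangeable, so spatial autocorrelation between neighbouring areas would need separate handling.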
To deepen the understanding of spatio-temporal patterns by visual simulation. Imagine a biodiversity researcher wants to investigate the migratory flight route of a flock of storks travelling from Europe to Africa. Suppose the researcher uses a global positioning system (GPS) to track the progress of these birds and wishes to simulate the migration visually, in order to provide an overview of the migratory route, show the speed over different parts of the journey and identify areas where the storks rested along the way. SPIN will provide the capability to develop and play back this type of simulation over the internet. The same technique can be applied in many other areas: for example, logistics companies may want to use it to help keep track of orders and optimise transport routes, and transport planners may want it to aid the development of integrated transport networks.
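The quantities such a simulation would display can be derived directly from the GPS fixes. The sketch below is a minimal illustration under stated assumptions: each fix is a timestamp in hours with latitude and longitude in degrees, distances are great-circle distances on a spherical Earth, and a segment counts as a rest when its average speed falls below a hypothetical threshold. All names and the threshold are illustrative and not part of the proposed system.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass
class Fix:
    t_hours: float  # time of the GPS fix, in hours
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_speeds(track):
    """Average speed in km/h between each pair of consecutive fixes."""
    speeds = []
    for a, b in zip(track, track[1:]):
        dt = b.t_hours - a.t_hours
        if dt <= 0:
            raise ValueError("fixes must be in strictly increasing time order")
        speeds.append(haversine_km(a.lat, a.lon, b.lat, b.lon) / dt)
    return speeds


def rest_segments(track, max_speed_kmh=2.0):
    """Indices of segments slow enough to count as a rest."""
    return [i for i, v in enumerate(segment_speeds(track))
            if v < max_speed_kmh]
```

A playback component could then colour each segment of the route by its speed and mark the rest segments on the map.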
To publish and disseminate geographical data mining services over the internet. Suppose the various analysts described above (i.e. the statistical officer, the environmental decision maker, the seismic data expert and the biodiversity researcher) want to distribute their results quickly and cost effectively to encourage similar applications and promote world-wide scientific exchange of their research. Furthermore, suppose they want to publish both the conclusions and the details of their entire geographical data mining investigation so that other similar research can extend, generalise and build on their analyses. Imagine also that these researchers want to enable others to access and use the same analysis tools that were available to them. To realise all of this, they would probably need a relatively automatic way to plug their specific application into a Java-based, internet-enabled SDMS. This would then enable anyone with a standard web browser to replicate and perform similar analyses wherever and whenever desired (see partner P4, publications 2 and 9; partner P1, publications 1 and 2; partner P3, publications 1 and 2). The proposed SDMS, SPIN, will provide this type of capability in an integrated, organised fashion.
B4. Contribution to programme/key action objectives
The proposal contributes to the IST programme objective of building key, user-friendly applications that enable the potential of the information society in several ways:
- Merging data mining and GIS based technology offers exciting new possibilities for spatial data research that is applicable in a wide variety of problem domains. Much expert geographical analysis has been restricted by prescribing in advance, and then exclusively following, either a statistical or a GIS based approach. When both approaches have been applied, error-prone and cumbersome data transfer between different applications has been necessary. Nonetheless, useful information has been extracted from georeferenced data much more effectively by employing both approaches together. An integrated SPIN will facilitate such analysis and help to develop understanding of a wide range of geographical processes faster, enhancing research and decision making in diverse application areas.
- SPIN will provide a user friendly interface to advanced data mining functionality, GIS and exploratory spatial data analysis tools that can be accessed via the internet.
- The system will enable quick and cost effective dissemination of information via the internet and enhance web-based research capabilities.
The objective of nurturing emergent technologies is supported by the development of an innovative business model. A web-based brokering service is proposed that is designed to add value to the dissemination of data and information providing a key to the commercialisation of the software and the service it facilitates.
The proposal contributes to CPA4 (New indicators and statistical methods) by developing new tools for extracting information from data by adapting data mining functions specifically for spatial analysis. This includes adapting methods from Bayesian statistics, machine learning and other adaptive techniques so they can be launched from an integrated environment, which assists experimental comparison of their relative strengths and weaknesses.
A further contribution to CPA4 derives from developing technology for the user-friendly dissemination of statistical data. SPIN will enable the dissemination of interactive statistical maps and provide data mining services over the internet, where the users need nothing but a standard web-browser such as Netscape or Internet Explorer. Many of the problems relevant to this use of SPIN will be addressed in an application that aims to facilitate the analysis of census data over the internet. The proposed web-based brokering service aims to go even further by enhancing the user-friendly and cost-effective dissemination of data.
The proposed system will be generic and easily adaptable to diverse application areas and the research is specifically relevant to the following key actions of the cross-programmatic action (CPA) of the IST programme:
- Key Action I.4: Systems and services for citizen administration; systems enhancing the efficiency and user-friendliness of administrations. This is addressed in work package WP9 by the application to develop user friendly dissemination of statistical data.
- Key Action I.5: Intelligent environmental monitoring and management systems; environmental risk and emergency management systems (in conjunction with hazards and earth observation). These are addressed in work package WP8 by an application of the proposed system to the analysis of seismic and volcano data.
- Key Action II.3.2: New methods of work and electronic commerce. New market mediation systems, to develop innovative market place concepts and technologies. This will be addressed in the web-based brokering application in work package WP9.
- Key Action II.4.3: Digital object transfer. This will be addressed by a specific task within work package WP2 that aims to develop efficient and appropriate means of distributing data and maps over the internet.
- Key Action III.1: The future priority action line concerning geographic information is also clearly addressed.
B5. Innovations
State of the Art
Contemporary GIS are monolithic closed systems that can be difficult to use and are usually very expensive. In the last few years a new generation of GIS has been emerging that enables interactive, dynamic maps to be disseminated via the internet (see partner P1, publications 1 and 3; partner P4, publication 4; partner P3, publications 10 and 11). So far, most of these systems are confined to projecting descriptive statistical displays, such as histograms or pie charts, onto geographical space (maps). As decision making and inference using these projected map displays is not always straightforward, data mining offers great potential benefits. The range of application areas is huge and includes, for example, statistical analysis, urban planning, environmental decision making and geomarketing.
Largely unconnected to GIS research, a wide range of analysis techniques now commonly referred to as data mining functions has been developed. These data mining functions are extensions of analytical techniques known for decades and have been packaged in various ways to form a large number of essentially very similar data mining systems (DMS). Some DMS provide user friendly interfaces and visual programming environments that non-experts can use to help automate the search for hidden patterns in large databases. Interest in DMS has boomed in recent years, partly as a result of the packaged nature of the technology and improving graphical user interfaces, but mainly because commercial enterprises urgently need to make returns on often large investments in data warehouses. Since the GIS revolution in the early 1980s there has been an explosion of geographically referenced information, forming a rapidly expanding geocyberspace (see partner P4, publication 1) in which much of the data is also temporally referenced. Commercial enterprises and government organisations have been swamped by this data explosion and have few tools for extracting useful information that can be applied in decision making contexts to solve problems and improve their operation. By combining the strengths of GIS and DMS, the proposed SDMS, SPIN, will offer greater functionality than either and should be a considerable help to decision makers and spatial analysts charged with backing up their intuitive insights using real world data. The integrated components not currently present in either GIS or DMS include exploratory spatial data analysis methods that search for geographical patterns and relationships in complex space-time-attribute domains.
Extending and integrating GIS and DMS to develop an internet-enabled geographical data mining system is a logical progression for spatial data analysis technology. This development is poised to play a major role in the proposed terms of reference 1999-2003 of the Commission on Visualisation and Virtual Environments of the International Cartographic Association (MacEachren and Kraak 1999 [1]), and a great deal of research effort can be expected to be needed to this end in the coming years. DMS and GIS are complex tools with wide-ranging functionality and capabilities, so the SPIN! Consortium does not propose to start from scratch, but to build on existing tools. Many of these existing tools have been developed by various partners during 4th framework research, and many have passed the prototype stage and have well established user communities. One major advantage of the SPIN! Consortium is that the software developers will have access to the source code of all the various module components, which facilitates a seamless integration of all the technology in SPIN. (This would not be possible if the system were to be developed on top of third party proprietary products.) The system will be based on open standards such as Java and TCP/IP. The proposed evolutionary prototype development approach has many benefits. Users will be able to provide feedback on SPIN prototype requirements and performance throughout the project (starting from day one), and progressive prototype versions of the system will guide the development effort towards fulfilling user expectations by the end of the project. The early development of prototypes is known to be one of the most effective counter-measures for limiting the risks of such software development.