Semantic Web Adapter

for

Geospatial Data

Shantnu Jain

December 4, 2006

Department of Geographic Information Sciences

The University of Texas at Dallas

Richardson, TX 75083, U.S.A.

Under the Guidance of

Dr. Latifur Khan

Asst. Professor

Department of Computer Science

The University of Texas at Dallas

Richardson, TX 75083, U.S.A.

Acknowledgements

I am highly grateful to Dr. Latifur Khan, Asst. Professor at the Department of Computer Science, University of Texas at Dallas, for providing this challenging opportunity to work in the area of Geospatial Semantic Web research and for his continuous guidance and support during my project. I would also like to express my gratitude to Dr. Bhavani Thuraisingham, Professor at the Department of Computer Science, UT Dallas, for her support during my work in the Data and Applications Security Laboratory, UT Dallas.

I am also thankful to Mr. Alam Ashraful, doctoral student, and Mr. Ganesh Subbaiah, Master's student, at the Department of Computer Science, UT Dallas. Last but not least, I express my special thanks to Dr. Ronald Briggs, Professor of Geography and Political Economy and Director of the Graduate Program in Geographic Information Sciences, Dr. Fang Qiu, Assoc. Professor, and Dr. Kevin Curtin, Asst. Professor at the Department of Geographical Information Systems, UT Dallas, for all their help and support during my graduate coursework.

Shantnu Jain

Candidate for M.S. in Geographic Information Systems

UT Dallas

December 2006

TABLE OF CONTENTS

Abstract

1 Introduction

1.1 Semantic Web

1.2 Components of Semantic Web

1.3 Research Question

2 Literature Review and Overview of Related Work

2.1 Literature Review

2.2 Related Work

3 System Overview

4 Project Overview

4.1 Semantic Web Adapter: Converting Shape File to RDF

4.1.1 What is Shapefile?

4.1.2 What is Resource Description Framework (RDF)?

4.2 Architecture and Working of Semantic Web Adapter

5 Results

6 Potential Applications

7 Conclusion

8 Directions for Future Research

9 References

10 Appendix

Abstract

With the growth of the World Wide Web has come the insight that currently available methods for finding and using information on the web are often insufficient. In order to move the Web from a data repository to an information resource, a totally new way of organizing information is needed. The advent of the Semantic Web promises better retrieval methods by incorporating the data’s semantics and exploiting those semantics during the search process. Such a development needs special attention from the geospatial perspective so that the particularities of geospatial meaning are captured appropriately. The creation of the Geospatial Semantic Web requires the development of multiple spatial and terminological ontologies, each with a formal semantics; the representation of those semantics such that they are available both to machines for processing and to people for understanding; and the processing of geospatial queries against these ontologies, together with the evaluation of the retrieval results based on how well they match the semantics of the expressed information need. This will lead to a new framework for geospatial information retrieval based on the semantics of spatial and terminological ontologies. By explicitly representing the role of semantics in the different components of the information retrieval process (people, interfaces, search systems, and information resources), the Geospatial Semantic Web will enable users to retrieve more precisely the data they need, based on the semantics associated with these data. This project presents a novel way of achieving semantic interoperability: a tool called the Geo Semantic Mediator that converts geospatial data in shapefile format into RDF documents, RDF being the backbone of the Semantic Web.

1 Introduction

The growth of the World Wide Web has resulted in a multitude of information, which requires special integration techniques to provide more intelligent data services for the user. Today’s information retrieval methods are typically limited to keyword searches, offering no support for any deeper structures that might lie hidden in the data or that people typically use to reason; therefore, users may often miss critical information when searching the Web. At the same time, the structure of the posted data is flat, which increases the difficulty of interpreting the data consistently. Higher-level computational operations that need to compare, query, analyze, combine, or integrate data cannot be carried out due to the lack of methods that make compatible information available. There would be a much higher potential for exploiting the Web if tools were available that better match human reasoning and could integrate different formats. In this vein, the research community has begun an effort to investigate foundations for the next stage of the Web, called the Semantic Web [1, 2]. RDF is the W3C standard for encoding metadata for the Semantic Web. In this project we propose a conversion mediator that converts shapefiles to RDF documents in order to achieve data-level semantics in cases where the underlying data model requires suitable annotation. This Semantic Mediator is a first step towards meeting that challenge.

1.1 Semantic Web

The Semantic Web is a project to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web. The Semantic Web is a vision of web pages that are understandable by computers, so that they can search websites and perform actions in a standardized way [5].

Currently, web pages are based primarily on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms.

The HTML of a web page can make simple, document-level assertions such as the title of the document. But there is no capability within the HTML itself to unambiguously assert that, say, item number X586172 is a Gizmo with a retail price of €199, or that it is a consumer product. The Semantic Web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, thereby facilitating automated information gathering and research by computers.
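To make the Gizmo example concrete, the following is a minimal sketch of how those assertions could be written as RDF triples and printed in the line-based N-Triples syntax. The `http://example.org/shop/` namespace and the property names `retailPrice` and `priceCurrency` are hypothetical, chosen only for illustration; the sketch uses plain Python rather than any RDF library.

```python
# Hypothetical namespace and property names, used only for illustration.
EX = "http://example.org/shop/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Quote a literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    text = f'"{escaped}"'
    return f"{text}^^<{datatype}>" if datatype else text


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


item = iri(EX + "item/X586172")
triples = [
    (item, iri(RDF_TYPE), iri(EX + "Gizmo")),
    (item, iri(RDF_TYPE), iri(EX + "ConsumerProduct")),
    (item, iri(EX + "retailPrice"), literal("199", XSD_DECIMAL)),
    (item, iri(EX + "priceCurrency"), literal("EUR")),
]

print(to_ntriples(triples))
```

Each printed line is one subject-predicate-object statement, which is the kind of unambiguous assertion that HTML alone cannot express.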

1.2 Components of Semantic Web

XML, XML Schema, RDF, OWL, SWRL

The Semantic Web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema and OWL. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the Semantic Web:

Figure 1: Semantic Web Layers

W3C Semantic Web Layers

  • XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
  • XML Schema is a language for restricting the structure of XML documents.
  • RDF is a simple data model for referring to objects ("resources") and how they are related. An RDF-based model can be represented in XML syntax.
  • RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization-hierarchies of such properties and classes.
  • OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
  • SWRL is a proposal for a Semantic Web rules language, combining sublanguages of OWL (OWL DL and OWL Lite) with those of the Rule Markup Language.
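As a small illustration of the point that an RDF-based model can be represented in XML syntax, the sketch below builds an RDF/XML description of one resource with Python's standard `xml.etree.ElementTree` module. The resource URI and the `http://example.org/terms#` vocabulary are assumptions made for the example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary for illustration

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def describe(resource_uri, properties):
    """Build an rdf:RDF element holding one rdf:Description of a resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_uri},
    )
    # Each property becomes a child element whose text is the literal value.
    for name, value in properties.items():
        prop = ET.SubElement(description, f"{{{EX_NS}}}{name}")
        prop.text = str(value)
    return root


doc = describe(
    "http://example.org/place/Richardson",
    {"name": "Richardson", "state": "TX"},
)
print(ET.tostring(doc, encoding="unicode"))
```

XML supplies only the surface syntax here; the meaning comes from reading each child element as a property of the described resource.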

The intent is to enhance the usability and usefulness of the Web and its interconnected resources through:

  • Documents "marked up" with semantic information (an extension of the HTML <meta> tags used in today's Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.)
  • common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of 'the Author of the page' won't be confused with Author in the sense of a book that is the subject of a book review).
  • automated agents to perform tasks for users of the Semantic Web using this metadata
  • Web-based services (often with agents of their own) to supply information specifically to agents (for example, a Trust service that an agent could ask if some online store has a history of poor service or spamming).

The primary facilitators of this technology are URIs (which identify resources) along with XML and namespaces. These, together with a bit of logic, form RDF, which can be used to say anything about anything. Besides RDF, many other technologies, such as Topic Maps and pre-web artificial intelligence technologies, are likely to contribute to the Semantic Web. Work on this project is based mainly on RDF.

1.3 Research Question

This project is an attempt to answer the following question: Is it possible to convert the industry-standard geospatial format “Shapefile” into the Semantic Web standard format “RDF”?

2 Literature Review and Overview of Related Work

2.1 Literature Review

Geospatial Web Services need to adopt Semantic Web technologies in order to become more clearly understandable, in machine-readable form, to the wider Web, as well as to make accessible the many large and diverse geo datasets coming online at a frightening pace [7]. OWL-S and the technologies for working with it show promise; however, there is a need to enable both the processing of geospatial relationships and the description of tightly coupled service content in order for Geospatial Semantic Web Services to become a reality [7].

Service-oriented architecture has provided a foundation for geospatial data integration by enabling clients to connect to geospatial service providers. However, security, data interoperability, and the absence of a sophisticated semantic model are major drawbacks of the current approaches. A new approach has been suggested that handles these problems through a unified data and service semantic model for the geospatial domain [8].

The enormous variety of encodings of geospatial semantics makes it particularly challenging to process requests for geospatial information. Work in the area of GIS interoperability [9], [10] and the work led by the Open GIS Consortium addressed some basic issues, primarily related to the geometry of geospatial features.

2.2 Related Work

Based on the concept of the Semantic Web, the classic example in the geospatial domain is SPIRIT (Spatially-Aware Information Retrieval on the Internet), which has been engaged in the design and implementation of a spatially-aware search engine to find documents and datasets on the web relating to places or regions referred to in a query [6]. SPIRIT is a combined research project from different schools, headed by the School of Computer Science at Cardiff University.

The two critical issues addressed by SPIRIT are as follows:

  • Developing a spatially-aware search engine displaying knowledge of the structure and terminology of geographical space and application-specific concepts.
  • Automatic semantic annotation of Internet resources with metadata on the geographical context of resource content.

The project has created software tools and techniques that can be used to produce search engines and websites that display intelligence in the recognition of geographical terminology.

SPIRIT is a European Commission funded research project dedicated to the development of a geographically-aware web search engine. The result is an experimental prototype that demonstrates novel geographical search engine functionality using a small sample of the web that dates back several years. The size of the SPIRIT web, with its 900,000 geo-tagged documents, is less than 1/8000th of the current web as accessed by Google. Consequently, the prototype search engine will often fail to find web resources that might be relevant to a particular search. SPIRIT provides geographical coverage for a sample of web pages for the UK, France, Germany and Switzerland. SPIRIT can interpret spatial terminology such as near, north, outside and within a specified distance, and it will distinguish between different places with the same name. Furthermore, it indexes all types of web pages that contain geographical references and is not confined to sites that refer to organizations listed in business directories or yellow pages. More information about the SPIRIT project is available on the SPIRIT website [11].

3 System Overview

Figure 2: DAGIS* System Diagram [3]

*DAGIS is the geospatial integration research thesis project being carried out by Ganesh Subbiah, an M.S. Computer Science student at the Data and Application Security Lab, Department of Computer Science, UTD.

DAGIS Components [3]

Terminologies shown in the diagram above are explained below:

Client: A client module is the agent program that places the service query to the Service Requestor module.

Service Requestor: The Service Requestor automatically discovers the relevant services for the service query placed by the client. Automatic Discovery involves performing a capability search by the requestor on the Matchmaker to find the appropriate web service with the capabilities to provide a relevant response to the user query. In the enactment phase, the service requestor does backward reasoning to find the appropriate inputs that are needed to be posed

Matchmaker: The Matchmaker module is the registry of services where the Service Providers register or publish their service descriptions. In our design, we employ the OWL-S/UDDI Matchmaker architecture [11]. In this architecture the UDDI registry is embedded with OWL-S profile descriptions. The Service Provider services register with the Matchmaker by advertising through OWL-S profile descriptions. Capability Search is employed in this architecture: the search is done by matching the Inputs, Outputs, Preconditions and Effects (IOPEs) of a service against the service requestor’s IOPEs. The result of the match depends on the degree of similarity in the concepts of the capabilities involved in the match. The degree of a match is described using the OWL relations.
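The idea of grading a match by the relation between concepts can be sketched with a toy class hierarchy. The degree names used here ("exact", "plug-in", "subsumes", "fail") follow a scheme commonly used in the OWL-S matchmaking literature and are an assumption, since the text above does not name the degrees; the hierarchy itself is hypothetical. A real system would obtain subsumption from an OWL reasoner rather than from a hand-written parent table.

```python
# Toy concept hierarchy (child -> parent); hypothetical, for illustration only.
PARENT = {
    "River": "WaterBody",
    "Lake": "WaterBody",
    "WaterBody": "GeoFeature",
    "Road": "GeoFeature",
}


def ancestors(concept):
    """Return every strict superclass of a concept, nearest first."""
    found = []
    while concept in PARENT:
        concept = PARENT[concept]
        found.append(concept)
    return found


def match_degree(requested, advertised):
    """Grade how well an advertised output concept fits a requested one."""
    if requested == advertised:
        return "exact"
    if advertised in ancestors(requested):
        return "plug-in"   # advertised concept is more general than requested
    if requested in ancestors(advertised):
        return "subsumes"  # advertised concept is more specific than requested
    return "fail"          # the two concepts are unrelated in the hierarchy


for pair in [("River", "River"), ("River", "WaterBody"),
             ("WaterBody", "River"), ("River", "Road")]:
    print(pair, "->", match_degree(*pair))
```

Ranking candidate services by such degrees gives the matchmaker a simple way to prefer closer matches when several advertisements are returned.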

Reasoner/Inference Engine: OWL-DL reasoners are required during the matchmaking process to compute the level of match for composition of the matched services. In this project, the Pellet reasoner is used through the OWL-S API. The reasoner will also handle matching the security requirements and capabilities of the requested service and the matched services.

Data Flow Design

Figure 3: DAGIS Flow Diagram [3]

In the DAGIS framework, the user poses a query. This query needs to be disambiguated to find all the geospatial and non-geospatial operations involved in it. The Service Requestor then performs Automatic Service Discovery for relevant features using the OWL-S/UDDI Matchmaker. During this matchmaking process a set of web services is matched and Service Composition is done on the selected services. The composite service is then executed and the result is provided to the user.

4 Project Overview

The main motivation for this work was to provide relevant information in response to a query that involves both geospatial and non-geospatial information across different data sources, which are in different data formats, with minimal human intervention. Providing better query mechanisms through automated reasoning is difficult with current GIS and search engines. The OWL-S specification ontology was applied to existing web services available on the World Wide Web, and the OWL-S API [3, 4] is being used to execute these semantic web services. This project is part of the geospatial integration research project at the Data and Application Security Lab, Department of Computer Science, UTD, in which my work was confined to converting shapefiles to RDF through a semantic web adapter, which is explained in the next section.

4.1 Semantic Web Adapter: Converting Shape File to RDF

The Semantic Web Adapter is a tool to convert the GIS industry-standard format “Shapefile” to the Semantic Web standard format “RDF”. Before moving further, let us take a look at what a shapefile is and what RDF is.

4.1.1 What is Shapefile?

The ESRI Shapefile is a popular geospatial vector data format for geographic information systems software. It is developed and regulated by ESRI as a (mostly) open specification for data interoperability among ESRI and other software products. A "shapefile" commonly refers to a collection of files with ".shp", ".shx", ".dbf", and other extensions on a common prefix name. The actual shapefile relates specifically to files with the ".shp" extension; however, this file alone is incomplete for distribution, as it depends on the other supporting files. Shapefiles spatially describe points, polylines, and polygons.
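As an illustration of what the ".shp" file carries, the sketch below decodes the fixed 100-byte header of a main file using Python's `struct` module. The field offsets follow the published ESRI Shapefile Technical Description as the editor understands it (big-endian file code 9994 and file length, little-endian version, shape type, and bounding box); the function name and the synthetic header bytes are made up for the example, and only a few shape type codes are listed.

```python
import struct

# Subset of shape type codes; Z and M variants are omitted in this sketch.
SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine",
               5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header):
    """Decode the fixed 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp main file header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])        # big-endian
    if file_code != 9994:
        raise ValueError("not a shapefile main file")
    (length_words,) = struct.unpack(">i", header[24:28])   # 16-bit words
    version, shape_code = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "file_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_code, f"code {shape_code}"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header for demonstration; a real one would be read from disk,
# e.g. open(path, "rb").read(100).
demo_header = (
    struct.pack(">i", 9994) + bytes(20)            # file code + unused fields
    + struct.pack(">i", 50)                        # length: 50 words = 100 bytes
    + struct.pack("<ii", 1000, 5)                  # version, Polygon
    + struct.pack("<4d", -97.0, 32.5, -96.5, 33.0) # Xmin, Ymin, Xmax, Ymax
    + bytes(32)                                    # Z and M ranges (zeroed)
)
print(parse_shp_header(demo_header))
```

The attribute values for each shape live in the companion ".dbf" file, which is why the ".shp" file on its own is not enough.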

4.1.2 What is Resource Description Framework (RDF)?

On the Semantic Web (SemWeb), computers do the browsing (and searching, and querying, and...) for us. The SemWeb enables computers to seek out knowledge distributed throughout the Web, mesh it, and then take action based on it. Take an analogy: the current web is a decentralized platform for distributed presentations, while the SemWeb is a decentralized platform for distributed knowledge. Resource Description Framework (RDF) is the W3C standard for encoding knowledge.

There is, of course, knowledge on the current web, but it is off limits to computers. Consider a Wikipedia page, which might convey a lot of information to the human reader; to the computer displaying the page, however, all there is to see is presentation markup. To the extent that computers make sense of HTML, images, Flash, etc., it is almost always for the purpose of creating a presentation for the end user. The real content, the knowledge the files are conveying to the human, is opaque to the computer.
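By contrast, once facts are stated as subject-predicate-object triples, a program can query them directly. The following is a minimal sketch of that idea applied to geospatial attribute records; it is not the adapter described in this report. The `http://example.org/geo/` namespace, the record layout, and both function names are hypothetical.

```python
BASE = "http://example.org/geo/"  # hypothetical namespace for illustration


def record_to_triples(feature_id, attributes, point=None):
    """Turn one attribute record (and optional point) into RDF-style triples."""
    subject = f"{BASE}feature/{feature_id}"
    triples = [(subject, f"{BASE}attr/{key}", value)
               for key, value in attributes.items()]
    if point is not None:
        x, y = point
        triples.append((subject, f"{BASE}attr/x", x))
        triples.append((subject, f"{BASE}attr/y", y))
    return triples


def subjects_where(triples, predicate, value):
    """Return the subjects that have the given predicate/value pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


# Made-up records standing in for rows of an attribute table.
graph = []
graph += record_to_triples(1, {"NAME": "Dallas", "STATE": "TX"}, point=(-96.80, 32.78))
graph += record_to_triples(2, {"NAME": "Austin", "STATE": "TX"}, point=(-97.74, 30.27))
graph += record_to_triples(3, {"NAME": "Tulsa", "STATE": "OK"})

# The knowledge is now explicit: ask which features are in Texas.
print(subjects_where(graph, BASE + "attr/STATE", "TX"))
```

Because every fact has the same three-part shape, triples produced from different sources can be pooled into one list and queried together.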