Data Intensive Approach to
Semantics

a. Project Summary

-Semantic Technology, whose objective is to enable computer systems to provide novel intelligent services, or to enhance existing ones, by using the meaning of data, is regarded as an important research topic. However, many obstacles remain to using Semantic Technology in real-world applications. The goal of this proposal is to present a general, theoretical methodology for extracting useful implicit semantics from the relational model and for realizing semantically enabled intelligent applications.

b. Background

-Semantics is about shared meanings, associations, and rules about the uses of things; utilizing semantics means making computer systems able to work with the meaning of data. If computer systems can process the meaning of data, they can provide intelligent and enhanced services more efficiently and effectively.

-So far, many semantic technology projects have focused on high-level reasoning, artificial intelligence, natural language processing, and mathematical propositions. These approaches lack the performance and scalability needed for applications in real-world enterprise environments, which require massive, intensive data processing. On the other hand, some systems have successfully enabled intelligent services by extracting hidden implicit semantics through data-intensive analysis and empirical knowledge.

-The PageRank algorithm, one of the reasons for Google’s success, calculates the importance of web documents from the back-links that point to them. In other words, it extends the implicit meaning of links among web documents into a measure of the documents’ importance. Collaborative Filtering, as used by Amazon.com, is another successful case of using the implicit semantics of data. It predicts a user’s preferences from the previous transaction logs of that user and of similar users. With Collaborative Filtering, the system does not have to ‘understand’ the user’s preferences, yet it can still make effective recommendations.
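The idea of turning link structure into importance can be illustrated with a minimal sketch of the standard power-iteration formulation of PageRank. In this formulation each back-link is weighted by the rank of the page it comes from, rather than simply counted. The damping factor of 0.85, the iteration count, and the four-page toy graph are assumptions chosen for illustration; this is not a description of any production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly along its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A, B, and D link to C; C links back to A.
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In the toy graph, C collects the most back-links and ends up with the highest score, while A ranks second because its single back-link comes from the highly ranked C. The system never inspects the content of any page; importance is derived from the link data alone.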

-Another good example is the Korean Government Public Procurement Ontology System. The previous system was a typical product search system built on the relational model, which did not consider the semantics of data. We analyzed not only the schema but also the instance-level data, and implemented a practical ontology search system that operates on large real-world data while balancing inferential power, performance, and scalability.

-The successful cases described above share a common factor: they extract and utilize semantics through data-centric analysis. We need a generalized theoretical methodology for extracting, managing, and querying the implicit and explicit semantics embedded in the relational model; it will be a key to practical semantically enabled applications in the real world.

c. Research Objectives

-Most real-world enterprise data are stored in the “tables” of the relational model, with no consideration of how their semantics might be used. Finding the hidden semantics in these data and enriching the data with them can enable more intelligent and meaningful services. The goal of this proposal is to define a general methodology to extract, manage, and query the semantics within data in the relational model, so that the semantics can be put to practical use. The top considerations for this research are as follows: (1) The analysis should focus on correct data utilization based on actual data analysis and refinement methods, discarding elements that are not practically usable, such as complex data representation techniques. (2) The vocabulary used in the system should be formal and controlled. (3) The concept hierarchies of fundamental entities should not be too deep, and should keep only what is essential, including concrete entity realizations. (4) Complex rules should be decomposed into simpler features using reliable methods such as normalization or statistical analysis. (5) Performance and scalability should be guaranteed through continuous database tuning, indexing, and simple inference procedures.
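One way to picture instance-level extraction of implicit semantics is to search table rows for candidate functional dependencies, the same notion that underlies normalization. The sketch below is only an illustration under stated assumptions, not the methodology this proposal will define: the function name `candidate_dependencies`, the `products` table, and its columns are all hypothetical.

```python
from itertools import permutations


def candidate_dependencies(rows):
    """Return (X, Y) column pairs where each X value maps to a single Y value.

    rows is a list of dicts sharing the same keys (one dict per table row).
    A pair found on sample data is only a candidate; more rows may refute it.
    """
    if not rows:
        return []
    columns = list(rows[0])
    found = []
    for x, y in permutations(columns, 2):
        seen = {}  # remembers the first Y value observed for each X value
        holds = True
        for row in rows:
            if seen.setdefault(row[x], row[y]) != row[y]:
                holds = False  # the same X value appeared with two Y values
                break
        if holds:
            found.append((x, y))
    return found


# Hypothetical product table (illustrative data only).
products = [
    {"product_id": 1, "category": "laptop", "division": "IT equipment"},
    {"product_id": 2, "category": "laptop", "division": "IT equipment"},
    {"product_id": 3, "category": "monitor", "division": "IT equipment"},
    {"product_id": 4, "category": "desk", "division": "furniture"},
]
deps = candidate_dependencies(products)
for x, y in deps:
    print(f"{x} -> {y}")
```

In this toy data, `category` determines `division` but not the reverse, which hints at a shallow two-level concept hierarchy that the schema never states explicitly. Checking only single-column pairs keeps the procedure simple; a fuller analysis would also consider multi-column determinants and statistical tolerance for dirty data.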

d. Expected Impact of Research

-We expect the proposed research to have far-reaching implications. (1) A company with a large-scale database in a practical enterprise environment can enrich that database semantically, which allows it to offer a variety of intelligent services flexibly and quickly. (2) By defining a methodology for extracting implicit semantic information within traditional database design, this research will act as a bridge between the paradigm of traditional database design and the newer paradigm of ontology design, represented by RDF and OWL. In other words, it enables a smooth transition from the traditional database world to a more intelligent semantic world.
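The bridge from tables to RDF can be pictured with a small sketch that turns one relational row into subject-predicate-object triples: the row becomes a subject, each column a predicate, and each value an object. The `http://example.org/` namespace, the function name `row_to_triples`, and the sample row are assumptions made for illustration; plain Python tuples stand in for an RDF library, and the mapping scheme is a simplification rather than the method this research will produce.

```python
BASE = "http://example.org/"  # hypothetical namespace for illustration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(table, key_column, row):
    """Map one relational row to a list of (subject, predicate, object) triples."""
    # The primary-key value identifies the row as a resource.
    subject = f"{BASE}{table}/{row[key_column]}"
    # State that the row is an instance of a class named after the table.
    triples = [(subject, RDF_TYPE, f"{BASE}{table}")]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key is already in the subject; NULLs yield no triple
        triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical row with one NULL column.
triples = row_to_triples(
    "product",
    "product_id",
    {"product_id": 4, "category": "desk", "division": None},
)
for triple in triples:
    print(triple)
```

Once rows are expressed as triples, relationships discovered by data analysis (for example, that one category belongs to a broader division) can be added as further triples without altering the original table design.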
