adaptive IE tool

Fabio Ciravegna, Department of Computer Science, University of Sheffield



Information Extraction, Knowledge Management and the Semantic Web

Knowledge management (KM) is a key source of competitive advantage. The success or failure of a company can depend on its ability to find the right information at the right time and to integrate new information correctly with existing structured knowledge, so as to facilitate communication and knowledge sharing and to support knowledge-based organisations. The vast majority of information is textual; tools that structure textual data on the basis of its content are therefore one of the fundamental steps in managing information successfully.

Information Extraction from texts (IE) is an automatic method for locating important facts in electronic documents for subsequent use, e.g. for document annotation or for information storage (such as populating an ontology with instances). IE can support document annotation either automatically (unsupervised extraction of information) or semi-automatically (e.g. by highlighting information to help human annotators locate relevant facts in documents).

So far, IE has seen only limited application to the Web, and especially to the Semantic Web (SW). It has generally been regarded as a technology that provides generic support for annotation. When IE is used to support more specific tasks, experts generally have to be involved in the development cycle in order to develop grammars or knowledge bases. A crucial aspect of creating the Semantic Web, however, is to enable users who are not logic experts to create machine-readable Web content. Meanwhile, much emphasis has so far been placed on building tools for manual annotation of documents. The main problem with manual annotation is that it is a difficult, slow and tedious process that involves high costs and very often a large number of errors. Only recently has the problem of producing automatic or semi-automatic methods for annotating documents become a focus for the research community.

Manual annotation of documents by naive Web users is quite unlikely to be correct, or even to be performed at all. The semantics of ontologies can be opaque to a layman: selecting the correct concepts, or even the correct ontology, could be out of reach for such users. According to some researchers, the SW will be very dynamic and based on a great number of small ontological components. These components will be continuously extended, merged or created in a distributed manner. The annotation services associated with them will therefore have to be constantly adjusted or revised to follow these changes. This places a number of obvious constraints and requirements on the technology that supports annotation, in terms of usability, portability and maintainability.

Thus, in order to use IE to support annotation for the SW, we have to develop a new kind of technology in which usability (e.g. by naive Web users), the cost of new applications, portability and maintainability are central design concerns. Application areas such as Knowledge Management currently pose similar requirements for IE; interestingly, these are areas in which the SW could play (or is already playing) a major role.


Last updated: November 24, 2002