adaptive IE tool

Fabio Ciravegna, Department of Computer Science, University of Sheffield

AMILCARE

An Adaptive Information Extraction Tool for the Semantic Web

Amilcare can work in three modes of operation: training mode, test mode and production mode.

The training mode is used to induce rules, that is, to learn how to perform IE in a specific application scenario. The input in training mode is: (1) a scenario (e.g. an ontology in the Semantic Web (SW)); (2) a training corpus annotated with the information to be extracted. The output of the training phase is a set of rules able to reproduce the annotation on texts of the same type.

The test mode is used to test the induced rules on an unseen tagged corpus, in order to understand how well they perform for a specific application. When running in test mode, Amilcare first removes all the annotations from the corpus, then re-annotates the corpus using the induced rules. Finally, the new annotations are automatically compared with the original ones and the results are presented to the user. The output of the test phase is: (1) the corpus re-annotated by the system; (2) a set of accuracy statistics on the test corpus: recall, precision and details of the mistakes the system makes. During testing it is possible to retrain the learner with different system parameters in order to tune its accuracy (e.g. to obtain more recall and/or more precision). Tuning takes a fraction of the time required for training.

The production mode is used when an application is released. Amilcare annotates the provided documents. If a user is available to revise its results, the learner uses the user's corrections to retrain.

The training, test and production modes can be interleaved to produce annotation based on active learning. In active learning, user annotation and system annotation are interleaved in order to minimize the amount of user annotation, which greatly reduces the burden of document annotation. This adaptive methodology meets a number of the requirements imposed by SW usage scenarios.
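
The comparison step of the test mode described above can be illustrated with a minimal Python sketch. It is not Amilcare's own code or API: the `Span` type, the `score` function, the tag names and the character offsets are all hypothetical, and the sketch assumes exact-match scoring (an annotation counts as correct only if its tag and both boundaries agree with the original), which the text above does not specify.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """One annotation: a tag applied to a character range (hypothetical type)."""
    tag: str
    start: int
    end: int


def score(gold: Set[Span], predicted: Set[Span]) -> Dict[str, object]:
    """Compare system annotations with the original (gold) annotations.

    Assumes exact matching of tag and boundaries; a real scorer may be
    more lenient, e.g. by accepting partial overlaps.
    """
    correct = gold & predicted
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return {
        "precision": precision,
        "recall": recall,
        # Details of the mistakes: annotations the system missed,
        # and annotations it produced that are not in the original corpus.
        "missed": sorted(gold - predicted),
        "spurious": sorted(predicted - gold),
    }


# Original annotations of a test document (illustrative values only).
gold = {Span("speaker", 0, 14), Span("stime", 30, 37), Span("location", 50, 58)}
# Annotations produced by the induced rules after the originals were removed.
predicted = {Span("speaker", 0, 14), Span("stime", 30, 37), Span("etime", 40, 47)}

result = score(gold, predicted)
print(f"precision={result['precision']:.2f} recall={result['recall']:.2f}")
print("missed:", result["missed"])
print("spurious:", result["spurious"])
```

In this illustrative run two of the three system annotations match the originals, so precision and recall are both 2/3. Retraining with different parameters, as described above, would shift the balance between the `missed` and `spurious` lists.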

Last updated: November 24, 2002