adaptive IE tool

Fabio Ciravegna, Department of Computer Science, University of Sheffield



Amilcare’s default architecture includes a connection to Annie, Gate’s shallow IE system, which performs tokenization, part-of-speech tagging, gazetteer lookup and named entity recognition. Any other preprocessor can be connected via the API. The preprocessor is also the only language-dependent module; the rest of the system is language independent (experiments were performed in English and Italian).

It is possible to influence the preprocessing phase performed by Gate, for example by excluding it or by customizing the type of analysis done. To some extent it is also possible to modify the resources used by Gate’s modules. Disabling the preprocessor as a whole is particularly useful when Amilcare must be run many times on the same corpus. Because the preprocessor is quite slow in the current implementation, especially when run on many texts, it can be run just once on a given corpus and then disabled for all subsequent runs.
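
The "preprocess once, then disable" workflow can be illustrated with a minimal Python sketch. This is not Amilcare's or Gate's API: the names `slow_preprocess` and `CachedPreprocessor`, the on-disk JSON cache and the whitespace tokenizer are all hypothetical stand-ins, chosen only to show the idea of storing the preprocessor's output on the first pass and reusing it on later passes.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def slow_preprocess(text):
    """Hypothetical stand-in for the preprocessor: whitespace tokenization only."""
    return [{"token": tok, "index": i} for i, tok in enumerate(text.split())]


class CachedPreprocessor:
    """Runs the preprocessor once per text and reuses the stored result afterwards."""

    def __init__(self, cache_dir, preprocess=slow_preprocess):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.preprocess = preprocess
        self.calls = 0  # how many times the real preprocessor actually ran

    def _cache_path(self, text):
        # Key the cache on the text content so identical documents share an entry.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def annotate(self, text):
        path = self._cache_path(text)
        if path.exists():
            # Preprocessing is effectively disabled: load the stored annotations.
            return json.loads(path.read_text(encoding="utf-8"))
        self.calls += 1
        annotations = self.preprocess(text)
        path.write_text(json.dumps(annotations), encoding="utf-8")
        return annotations


# Illustrative two-document corpus (made-up text, not from any real dataset).
corpus = ["Seminar at 3 pm in room 101", "Speaker to be announced"]
pre = CachedPreprocessor(tempfile.mkdtemp())
first_pass = [pre.annotate(doc) for doc in corpus]   # preprocessor runs here
second_pass = [pre.annotate(doc) for doc in corpus]  # served from the cache
```

In this sketch the second pass never calls the preprocessor, which mirrors the situation described above where repeated runs over the same corpus skip the slow step.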

Last updated: November 24, 2002