adaptive IE tool

Fabio Ciravegna, Department of Computer Science, University of Sheffield
[F.Ciravegna@dcs.shef.ac.uk]
DEVELOPMENT CYCLE WITH AMILCARE
Amilcare comprehensively supports the user throughout the whole
application development cycle, from design to delivery and on to
post-marketing assistance, via its unique set of tools. Human-computer
interaction experts and information extraction experts worked together on
the design of the user-support tools.
The application development cycle is shown in the next figure.
Application development is divided into the following steps:
- Application design: the goal of this step is to define a template,
i.e., a kind of form that the system must fill with the extracted
information. Amilcare provides a set of tools to help the user identify
the correct application settings: a graphical interface for highlighting
information in text examples, a set of methods for the semi-automatic
organization of information into templates and (in future releases)
unsupervised methods to help identify the information present in the
relevant documents. Because choosing a representative set of texts may be
difficult, a number of statistical tools are provided for checking the
representativeness of the corpus selected by the user, so as to avoid the
(not infrequent) problem of poor example selection.
- System training: in this phase the system learns how to extract
information for a particular application by analysing a number of
user-defined examples (i.e. a set of documents annotated with the
information to be extracted). A simple graphical interface allows
information to be highlighted with the mouse. Because providing examples
can be tedious, Amilcare provides facilities for reducing the quantity of
texts to be tagged via active learning, a strategy that may reduce the
number of training examples needed by up to 80%.
- Result validation: a fundamental step in application development is
tuning the results to the specific application needs. Given that a 100%
accurate information extraction process is beyond the reach of current
technology, it is necessary to balance the ability to find information
(recall) against the correctness of the information identified
(precision), so as to find the right mix of the two. Amilcare provides a
set of tools for monitoring results, both qualitatively (inspecting the
system results on a set of test texts with error highlighting) and
statistically (accuracy, precision, recall). Amilcare’s tuning interface
is designed to bridge the user’s qualitative view (“you are not capturing
enough information”) and the numerical concepts the system is able to
manipulate (e.g. moving error thresholds in order to obtain higher
recall). The CPU time needed for retuning is 1/10 of the initial learning
time.
- Application delivery: once the system performance has been tuned to the
application needs, the information extraction engine can be delivered as
a black-box module to be integrated into the user environment. A powerful
API allows text feeding and result extraction.
- Post-marketing monitoring: Amilcare provides tools that are fundamental
once the application has been delivered to the final user. They allow the
user to statistically compare the training corpus, and the results
obtained at training/testing time, with the corpus received for analysis
and the results obtained on it. This is fundamental because the kind of
texts received can change over time (e.g. initially only very short texts
were received, but then long texts start to appear), and the user must be
sure that such a change (which may not be noticed by the system
administrator) does not affect the system performance. Moreover, Amilcare
is also able to statistically monitor its accuracy on new texts by
measuring the statistical distribution of identified information across
texts, and to issue a warning if that distribution differs radically from
the one observed on the training corpus.
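
The active learning mentioned under system training can be illustrated with uncertainty sampling, one common form of the strategy. The page does not say which strategy Amilcare actually uses, so the following is only a minimal sketch under that assumption. The function name select_for_annotation, the document ids and the confidence scores are all hypothetical.

```python
from typing import Dict, List


def select_for_annotation(confidence: Dict[str, float], k: int) -> List[str]:
    """Return the k document ids the current model is least confident about.

    `confidence` maps a document id to a hypothetical score in [0, 1]
    describing how sure the current extractor is about that document.
    Ties are broken by document id so the result is deterministic.
    """
    ranked = sorted(confidence, key=lambda doc_id: (confidence[doc_id], doc_id))
    return ranked[:k]


# Hypothetical pool of untagged documents with model confidence scores.
pool = {"doc_a": 0.97, "doc_b": 0.41, "doc_c": 0.88, "doc_d": 0.52}

# Ask the user to tag only the two documents the model is least sure about.
to_tag = select_for_annotation(pool, k=2)
```

The idea is that the user tags the documents the system finds hardest, rather than the whole corpus, which is how the number of examples to annotate can shrink.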
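
The precision/recall trade-off described under result validation can be made concrete with a small example. This is a sketch, not Amilcare's tuning code: it assumes each candidate extraction carries a confidence score and that an acceptance threshold plays the role of the "error threshold" mentioned above. The function name, the scores and the gold count are hypothetical.

```python
from typing import List, Tuple


def precision_recall(
    candidates: List[Tuple[float, bool]], total_gold: int, threshold: float
) -> Tuple[float, float]:
    """Compute precision and recall when accepting candidates >= threshold.

    `candidates` holds (confidence, is_correct) pairs; `total_gold` is the
    number of items that should have been extracted from the test texts.
    Convention used here: precision is 1.0 when nothing is accepted.
    """
    accepted = [ok for conf, ok in candidates if conf >= threshold]
    true_pos = sum(accepted)  # True counts as 1
    precision = true_pos / len(accepted) if accepted else 1.0
    recall = true_pos / total_gold if total_gold else 0.0
    return precision, recall


# Hypothetical system output on a test set containing 5 gold items.
candidates = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False),
]

strict = precision_recall(candidates, total_gold=5, threshold=0.85)
balanced = precision_recall(candidates, total_gold=5, threshold=0.50)
lenient = precision_recall(candidates, total_gold=5, threshold=0.0)
```

Lowering the threshold accepts more candidates: in this toy data recall rises from 0.4 to 0.8 while precision falls from 1.0 to about 0.57, which is the kind of movement a user asking for "more information" is trading for.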
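
The distribution check described under post-marketing monitoring could look roughly like the following. The page does not specify which statistics Amilcare computes, so this sketch assumes a simple one: the relative frequency of each extracted slot type, compared between corpora with total variation distance. The slot names, the function names and the 0.2 warning threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def slot_distribution(docs: Iterable[List[str]]) -> Dict[str, float]:
    """Relative frequency of each extracted slot type across a corpus."""
    counts = Counter(slot for doc in docs for slot in doc)
    total = sum(counts.values())
    return {slot: n / total for slot, n in counts.items()} if total else {}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def drift_warning(
    train_docs: Iterable[List[str]],
    new_docs: Iterable[List[str]],
    max_distance: float = 0.2,  # assumed threshold, to be tuned per application
) -> bool:
    """True if the new corpus differs from training by more than max_distance."""
    distance = total_variation(
        slot_distribution(train_docs), slot_distribution(new_docs)
    )
    return distance > max_distance


# Each inner list names the slots extracted from one document (hypothetical).
train = [["speaker", "location", "time"], ["speaker", "time"], ["location", "time"]]
similar = [["speaker", "time"], ["location", "time", "speaker"]]
shifted = [["time"], ["time"], ["time", "speaker"]]

warn_similar = drift_warning(train, similar)
warn_shifted = drift_warning(train, shifted)
```

With this data the similar corpus stays under the threshold, while the shifted corpus, in which locations are no longer found at all, triggers the warning.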
Last updated: November 24, 2002