AmilcareApi
Class API

java.lang.Object
  |
  +--AmilcareApi.API

public class API
extends java.lang.Object

Title: API for Amilcare
Description: Adaptive system for information extraction from text
Copyright: Copyright (c) 2000
Company: Dept. Computer Science Univ. of Sheffield


Field Summary
static AmilcareApi.IEResults[][] taggingResults
           
static AmilcareApi.IEResults[][] templateResults
           
 
Constructor Summary
API()
           
 
Method Summary
static void extractInformation()
          used to run Amilcare's rules on the current test corpus
static void extractInformation(boolean showAnimation)
           
static java.lang.String getAmilcarePath()
          it returns Amilcare's operating directory
static int getAnnotationAccuracy(java.lang.String tagName)
          it returns the F-measure (a reasoned average of precision and recall) obtained for the annotation tagName (F is a number 0
static int getAnnotationActualMatches(java.lang.String tagName)
          it returns the number of matches (either correct, missing or partial) for the annotation tagName
static int getAnnotationCorrectMatches(java.lang.String tagName)
          it returns the number of correct matches for the annotation tagName
static java.lang.String[] getAnnotationList()
          it returns an array of Strings that are the tags used by Amilcare for the current scenario
static int getAnnotationMissingMatches(java.lang.String tagName)
          it returns the number of missing matches for the annotation tagName
static int getAnnotationPartialMatches(java.lang.String tagName)
          it returns the number of partial matches for the annotation tagName.
static int getAnnotationPossibleMatches(java.lang.String tagName)
          it returns the number of instances present in the corpus for the annotation tagName
static int getAnnotationPrecision(java.lang.String tagName)
          it returns the precision obtained for the annotation tagName (prec is a number 0
static int getAnnotationRecall(java.lang.String tagName)
          it returns the recall obtained for the annotation tagName (rec is a number 0
static int getAnnotationWrongMatches(java.lang.String tagName)
          it returns the number of wrong matches for the annotation tagName
static boolean getGazEnabled()
          checks if the gazetteer is currently enabled
static boolean getNercEnabled()
          checks if the namedEntityRecognizer is currently enabled
static boolean getPosEnabled()
          checks if the pos tagger is currently enabled
static boolean getPreEnabled()
          checks if the preprocessor is currently enabled
static java.lang.String getPreprocessedCorpusName()
          The connection between Gate (preprocessing) and Amilcare is done through a file.
static java.lang.String getRuleFileDir()
          it returns the directory in which the rule files are stored by Amilcare
static java.lang.String getScenarioFile()
          returns the scenario file currently loaded
static java.lang.String getScenarioName()
          it returns the current scenario mnemonic name
static boolean getSplitCorpusInTwo()
           
static int getStatus()
          it returns the status Amilcare is in.
static AmilcareApi.IEResults[][] getTaggingResults()
          It returns the tagging results as generated by Amilcare for external use.
static AmilcareApi.IEResults[][] getTemplateResults()
          It returns the Template results as generated by Amilcare for external use.
static java.lang.String getTestCorpusName()
          gets the current Test corpus name
static java.lang.String getTrainingCorpusName()
          gets the current training corpus name
static void init()
           
static void learnRules()
          used to learn rules from the current training corpus given the current scenario
static void learnRules(boolean showAnimation)
           
static void loadScenario(java.lang.String fileName)
          it reads a scenario file and sets Amilcare to (learn how to) extract information for that scenario. A scenario file is composed of a set of lines whose order is important.
static void main(java.lang.String[] args)
          it tests the whole API
static void preprocessCorpus()
          it preprocesses the test or training corpus file as defined by the scenario
static void printTaggingResults()
          it prints the tagging results in a formatted way
static void printTemplateResults()
          it prints the template results in a formatted way
static void resetResults()
          resets the system results so that in case of error the old results are not returned as new.
static void setAmilcarePath(java.lang.String value)
          sets Amilcare's working directory.
static void setAnnotationList(java.util.Collection annotList)
          it sets the annotation list to be used by Amilcare.
static void setDialogBoxesActive(boolean bool)
          tells Amilcare whether the dialog boxes for errors are welcome (bool=true) or unwelcome (bool=false).
static void setGazEnabled(boolean enable)
          it enables or disables the gazetteer in the preprocessing stage
static void setNercEnabled(boolean enable)
          it enables or disables the NamedEntityRecognizer in Annie in the preprocessing stage
static void setPosEnabled(boolean enable)
          it enables or disables the pos tagger in the preprocessing stage
static void setPreEnabled(boolean enable)
          it enables or disables the Gate-based preprocessing stage
static void setPreprocessedCorpusName(java.lang.String fileName)
          The connection between Gate (preprocessing) and Amilcare is done through a file.
static void setRuleFileDir(java.lang.String dirName)
          it sets the directory in which the rule files are stored by Amilcare
static void setScenarioFile(java.lang.String filename)
          sets the scenario file without loading it
static void setScenarioName(java.lang.String name)
          it sets the scenario mnemonic name.
static void setSplitCorpusInTwo(boolean value)
          it asks Amilcare to select half the corpus for learning and to use the other half for testing.
static void setSplitterEnabled(boolean enable)
          it enables or disables the sentence splitter in the preprocessing stage
static void setTestCorpusName(java.lang.String fileName)
          sets the current Test corpus file to the parameter
static void setTrainingCorpusName(java.lang.String fileName)
          sets the current training corpus file to the parameter
static void stop()
          used to stop Amilcare when running.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

taggingResults

public static AmilcareApi.IEResults[][] taggingResults

templateResults

public static AmilcareApi.IEResults[][] templateResults
Constructor Detail

API

public API()
Method Detail

resetResults

public static void resetResults()
resets the system results so that in case of error the old results are not returned as new.


init

public static void init()

getTaggingResults

public static AmilcareApi.IEResults[][] getTaggingResults()
It returns the tagging results as generated by Amilcare for external use.

Returns:
an array whose elements are the texts in the corpus (if only one text was provided, it returns a one-element array). Each text is represented by an array of IEResults structures. Caveat: some of the elements can be null (both at text level and at IEResults level).
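
Because of the null caveat above, callers may want to walk the returned array defensively. The following is a minimal sketch, not part of Amilcare: the class NullSafeResultsSketch and the method flattenNonNull are hypothetical names, and a String[][] stands in for IEResults[][] so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeResultsSketch {

    /**
     * Flattens a jagged results array into a list, skipping null texts
     * (outer level) and null entries (inner level).
     */
    static <T> List<T> flattenNonNull(T[][] results) {
        List<T> kept = new ArrayList<>();
        if (results == null) {
            return kept; // nothing was produced at all
        }
        for (T[] text : results) {
            if (text == null) {
                continue; // a whole text may be missing
            }
            for (T item : text) {
                if (item != null) {
                    kept.add(item); // keep only real entries
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in data: the second text is null and the first has a null entry.
        String[][] demo = { {"speaker", null}, null, {"stime"} };
        System.out.println(flattenNonNull(demo));
    }
}
```

Run as is, the demo prints [speaker, stime]. With the real API the same loop would presumably be applied to the array returned by getTaggingResults().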

printTaggingResults

public static void printTaggingResults()
it prints the tagging results in a formatted way


getTemplateResults

public static AmilcareApi.IEResults[][] getTemplateResults()
It returns the Template results as generated by Amilcare for external use.

Returns:
an array whose elements are the texts in the corpus (if only one text was provided, it returns a one-element array). Each text is represented by an array of IEResults structures. Caveat: some of the elements can be null (both at text level and at IEResults level).

printTemplateResults

public static void printTemplateResults()
it prints the template results in a formatted way


setSplitterEnabled

public static void setSplitterEnabled(boolean enable)
                               throws java.lang.Exception
it enables or disables the sentence splitter in the preprocessing stage

Throws:
java.lang.Exception - if the module is not available

setPosEnabled

public static void setPosEnabled(boolean enable)
                          throws java.lang.Exception
it enables or disables the pos tagger in the preprocessing stage

Throws:
java.lang.Exception - if the module is not available

getPosEnabled

public static boolean getPosEnabled()
checks if the pos tagger is currently enabled


setGazEnabled

public static void setGazEnabled(boolean enable)
                          throws java.lang.Exception
it enables or disables the gazetteer in the preprocessing stage

Throws:
java.lang.Exception - if the module is not available

getGazEnabled

public static boolean getGazEnabled()
checks if the gazetteer is currently enabled


setNercEnabled

public static void setNercEnabled(boolean enable)
                           throws java.lang.Exception
it enables or disables the NamedEntityRecognizer in Annie in the preprocessing stage

Throws:
java.lang.Exception - if the module is not available

getNercEnabled

public static boolean getNercEnabled()
checks if the namedEntityRecognizer is currently enabled


setDialogBoxesActive

public static void setDialogBoxesActive(boolean bool)
tells Amilcare whether the dialog boxes for errors are welcome (bool=true) or unwelcome (bool=false, i.e. they are disruptive with respect to the external application's strategy). If unwelcome, the errors and all the messages are written to the standard output, and it is the duty of the calling system to tell the user about the problem.


setPreEnabled

public static void setPreEnabled(boolean enable)
                          throws java.lang.Exception
it enables or disables the Gate-based preprocessing stage

Throws:
java.lang.Exception - if the module is not available

getPreEnabled

public static boolean getPreEnabled()
checks if the preprocessor is currently enabled


getRuleFileDir

public static java.lang.String getRuleFileDir()
it returns the directory in which the rule files are stored by Amilcare


setRuleFileDir

public static void setRuleFileDir(java.lang.String dirName)
it sets the directory in which the rule files are stored by Amilcare


loadScenario

public static void loadScenario(java.lang.String fileName)
it reads a scenario file and sets Amilcare to (learn how to) extract information for that scenario. A scenario file is composed of a set of lines whose order is important:
1. name of the scenario (e.g. seminarAnnouncement)
2. a list of tags (in angle brackets, one per line) introduced by the tag <__++SLOTS> and ended by </__++SLOTS>. For example:
<__++SLOTS>
<speaker>
<stime>
<etime>
<location>
</__++SLOTS>
3. the type of task to be performed (e.g. Information Tagging and Template Filling)
4. reserved for future use
5. the name of the training corpus (e.g. D:\Fabio\Java\Tagging\Interface\Data\Corpus\1seminar.xml)
6. reserved for future use
7. the name of the test corpus (e.g. D:\Fabio\Java\Tagging\Interface\Data\Corpus\1seminar.xml)
8. the number of texts to be analysed from the corpus (e.g.: 10)
9. the random seed to be used in n-fold cross-validation experiments (e.g. 0) (advanced use only -- put 0 if unsure)
10. true or false, according to whether the end of line is an interesting feature to be used by Amilcare (advanced users only -- leave false if unsure)
11. the directory where Amilcare will put the induced rules (e.g. D:\Fabio\Java\Tagging\Interface\Data\Rules\)
12. reserved for future use
13. reserved for future use
14. reserved for future use
15. an int: the length of the window for rule patterns (e.g. 4). Please note that this is the window on either side of the tag: if 4 is specified, the rule will use 4 words to the left and 4 to the right (i.e. 8 words)!
16. a double indicating the maximum error threshold for rules (e.g. 0.3333333333333333) (advanced use only -- set 0.33333 if unsure)
17. a double for the minimum error threshold for rules (e.g. 0.03333333333333333) (advanced use only -- set 0.033333 if unsure)
18. an int: the minimum number of matches for a tagging rule to be accepted (e.g. 1) (advanced use: put 1 if unsure)
19. a double: the balance between precision and recall for tagging rules (advanced use: put 1.0 if unsure)
20. an int: the minimum number of matches for a correction rule to be accepted (e.g. 1) (advanced use: put 1 if unsure)
21. a double: the balance between precision and recall for correction rules (advanced use: put 1.0 if unsure)
22. the name of the file where the preprocessed corpus is stored (as returned by getPreprocessedCorpusName)

Returns:
nothing; as a side effect it calls setScenarioFile
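
The line order described for the scenario file can be sketched as a small builder that emits the 22 items in sequence. This is only an illustration under stated assumptions: the class ScenarioFileSketch and its method are hypothetical, the reserved items are written as empty lines (the documentation does not say what they should contain), and the task-type string and file paths in main are placeholder values, not values confirmed by Amilcare.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScenarioFileSketch {

    /** Builds the scenario lines in the documented order (items 1 to 22). */
    static List<String> buildScenarioLines(String name, List<String> tags, String taskType,
            String trainingCorpus, String testCorpus, String ruleDir, String preprocessedFile) {
        List<String> lines = new ArrayList<>();
        lines.add(name);                    // 1. scenario name
        lines.add("<__++SLOTS>");           // 2. tag list, one tag per line
        for (String tag : tags) {
            lines.add("<" + tag + ">");
        }
        lines.add("</__++SLOTS>");
        lines.add(taskType);                // 3. type of task
        lines.add("");                      // 4. reserved (assumed empty)
        lines.add(trainingCorpus);          // 5. training corpus
        lines.add("");                      // 6. reserved (assumed empty)
        lines.add(testCorpus);              // 7. test corpus
        lines.add("10");                    // 8. number of texts to analyse
        lines.add("0");                     // 9. random seed
        lines.add("false");                 // 10. end of line as a feature
        lines.add(ruleDir);                 // 11. directory for induced rules
        lines.add("");                      // 12. reserved (assumed empty)
        lines.add("");                      // 13. reserved (assumed empty)
        lines.add("");                      // 14. reserved (assumed empty)
        lines.add("4");                     // 15. window length on each side of the tag
        lines.add("0.33333");               // 16. maximum error threshold
        lines.add("0.033333");              // 17. minimum error threshold
        lines.add("1");                     // 18. minimum matches, tagging rules
        lines.add("1.0");                   // 19. precision/recall balance, tagging rules
        lines.add("1");                     // 20. minimum matches, correction rules
        lines.add("1.0");                   // 21. precision/recall balance, correction rules
        lines.add(preprocessedFile);        // 22. preprocessed corpus file
        return lines;
    }

    public static void main(String[] args) {
        // All argument values below are placeholders chosen for illustration.
        List<String> lines = buildScenarioLines(
                "seminarAnnouncement",
                Arrays.asList("speaker", "stime", "etime", "location"),
                "Information Tagging and Template Filling",
                "corpus/train.xml", "corpus/test.xml", "rules/", "corpus/preprocessed.tmp");
        System.out.println(String.join(System.lineSeparator(), lines));
    }
}
```

The numeric defaults are the "if unsure" values suggested in the list above. The resulting lines could be written to a file and its path passed to loadScenario.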

getScenarioFile

public static java.lang.String getScenarioFile()
returns the scenario file currently loaded

Returns:
a complete file path

setScenarioFile

public static void setScenarioFile(java.lang.String filename)
sets the scenario file without loading it


getTrainingCorpusName

public static java.lang.String getTrainingCorpusName()
gets the current training corpus name


setTrainingCorpusName

public static void setTrainingCorpusName(java.lang.String fileName)
sets the current training corpus file to the parameter


getTestCorpusName

public static java.lang.String getTestCorpusName()
gets the current Test corpus name


setTestCorpusName

public static void setTestCorpusName(java.lang.String fileName)
sets the current Test corpus file to the parameter


getPreprocessedCorpusName

public static java.lang.String getPreprocessedCorpusName()
The connection between Gate (preprocessing) and Amilcare is done through a file. This file is rewritten automatically each time Amilcare is run, either for training or for IE. If the file is given a different name each time, it can be reused in order to save Gate's processing time. This is particularly useful when the same document may be processed more than once. This method retrieves the name of the file to which Amilcare writes the intermediate results.


setPreprocessedCorpusName

public static void setPreprocessedCorpusName(java.lang.String fileName)
The connection between Gate (preprocessing) and Amilcare is done through a file. This file is rewritten automatically each time Amilcare is run, either for training or for IE. If the file is given a different name each time, it can be reused in order to save Gate's processing time. This is particularly useful when the same document may be processed more than once. This method sets the name of the file to which Amilcare writes the intermediate results. The calling program is expected to check the filename for validity!
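
Since the caller is expected to validate the file name, and a distinct name per corpus allows reuse, a caller might do something like the following before calling setPreprocessedCorpusName. This is a rough sketch: the class and method names and the ".preprocessed" suffix are hypothetical, and the validity check shown is only one plausible interpretation of "valid".

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PreprocessedNameSketch {

    /** Derives a per-corpus file name so that each corpus keeps its own preprocessed copy. */
    static String preprocessedNameFor(String corpusFile) {
        Path name = Paths.get(corpusFile).getFileName();
        if (name == null) {
            throw new IllegalArgumentException("not a file name: " + corpusFile);
        }
        String base = name.toString();
        int dot = base.lastIndexOf('.');
        String stem = dot > 0 ? base.substring(0, dot) : base;
        return stem + ".preprocessed"; // hypothetical suffix
    }

    /**
     * One possible validity check: the name is a syntactically valid path,
     * is not itself a directory, and its parent directory exists and is writable.
     */
    static boolean isUsableFileName(String fileName) {
        if (fileName == null || fileName.trim().isEmpty()) {
            return false;
        }
        try {
            Path path = Paths.get(fileName).toAbsolutePath();
            Path parent = path.getParent();
            return parent != null
                    && Files.isDirectory(parent)
                    && Files.isWritable(parent)
                    && !Files.isDirectory(path);
        } catch (InvalidPathException e) {
            return false; // the string cannot be parsed as a path on this platform
        }
    }

    public static void main(String[] args) {
        String name = preprocessedNameFor("corpus/1seminar.xml");
        System.out.println(name + " usable: " + isUsableFileName(name));
    }
}
```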


getAnnotationList

public static java.lang.String[] getAnnotationList()
it returns an array of Strings that are the tags used by Amilcare for the current scenario


setAnnotationList

public static void setAnnotationList(java.util.Collection annotList)
it sets the annotation list to be used by Amilcare.


extractInformation

public static void extractInformation()
used to run Amilcare's rules on the current test corpus


extractInformation

public static void extractInformation(boolean showAnimation)

preprocessCorpus

public static void preprocessCorpus()
it preprocesses the test or training corpus file as defined by the scenario


learnRules

public static void learnRules()
used to learn rules from the current training corpus given the current scenario


learnRules

public static void learnRules(boolean showAnimation)

stopAmilcare

public static void stopAmilcare()
used to stop Amilcare when running. Please take into account that stopping can take a while


getScenarioName

public static java.lang.String getScenarioName()
it returns the current scenario mnemonic name


setScenarioName

public static void setScenarioName(java.lang.String name)
it sets the scenario mnemonic name.


setSplitCorpusInTwo

public static void setSplitCorpusInTwo(boolean value)
it asks Amilcare to select half the corpus for learning and to use the other half for testing. The selection is pseudo-random (currently Amilcare selects every other text for learning and all the others for testing; this strategy is not to be taken for granted!). Amilcare still honours the maximum number of texts to be considered (i.e. for learning and for testing it considers texts from half the corpus up to the maximum number of texts).
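
The alternating selection described above can be pictured with the following sketch. It is not Amilcare's code: the names are hypothetical, and two details are assumptions, namely that even-numbered texts go to learning and that the maximum number of texts is applied to each half separately.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCorpusSketch {

    /** Picks every other text index starting at offset, up to maxTexts indices. */
    static List<Integer> pick(int corpusSize, int maxTexts, int offset) {
        List<Integer> picked = new ArrayList<>();
        for (int i = offset; i < corpusSize && picked.size() < maxTexts; i += 2) {
            picked.add(i);
        }
        return picked;
    }

    /** Assumption: texts 0, 2, 4, ... are used for learning. */
    static List<Integer> learningIndices(int corpusSize, int maxTexts) {
        return pick(corpusSize, maxTexts, 0);
    }

    /** Assumption: texts 1, 3, 5, ... are used for testing. */
    static List<Integer> testingIndices(int corpusSize, int maxTexts) {
        return pick(corpusSize, maxTexts, 1);
    }

    public static void main(String[] args) {
        System.out.println(learningIndices(7, 3)); // [0, 2, 4]
        System.out.println(testingIndices(7, 3));  // [1, 3, 5]
    }
}
```

As the description warns, the actual strategy may change, so callers should not rely on which texts end up in which half.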


getSplitCorpusInTwo

public static boolean getSplitCorpusInTwo()

getAnnotationCorrectMatches

public static int getAnnotationCorrectMatches(java.lang.String tagName)
it returns the number of correct matches for the annotation tagName


getAnnotationMissingMatches

public static int getAnnotationMissingMatches(java.lang.String tagName)
it returns the number of missing matches for the annotation tagName


getAnnotationPartialMatches

public static int getAnnotationPartialMatches(java.lang.String tagName)
it returns the number of partial matches for the annotation tagName. A partial match is defined here as a match for which either the start or the end is correct, but not both (a match that overlaps in any other way is not considered partial)
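
The definition of a partial match can be restated as a small classification function. This is a sketch of the definition only, with hypothetical names; how Amilcare represents positions internally is not documented here, so plain integer offsets are assumed.

```java
public class MatchKindSketch {

    enum MatchKind { CORRECT, PARTIAL, WRONG }

    /**
     * Compares an expected annotation span with the span actually produced.
     * Both boundaries equal: correct. Exactly one boundary equal: partial.
     * Anything else, including other kinds of overlap: wrong.
     */
    static MatchKind classify(int expectedStart, int expectedEnd, int actualStart, int actualEnd) {
        boolean startOk = expectedStart == actualStart;
        boolean endOk = expectedEnd == actualEnd;
        if (startOk && endOk) {
            return MatchKind.CORRECT;
        }
        if (startOk || endOk) {
            return MatchKind.PARTIAL;
        }
        return MatchKind.WRONG;
    }

    public static void main(String[] args) {
        System.out.println(classify(10, 20, 10, 20)); // CORRECT
        System.out.println(classify(10, 20, 10, 25)); // PARTIAL
        System.out.println(classify(10, 20, 12, 18)); // WRONG
    }
}
```

The last call shows a span that lies inside the expected one but shares neither boundary, which under this definition is not partial.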


getAnnotationWrongMatches

public static int getAnnotationWrongMatches(java.lang.String tagName)
it returns the number of wrong matches for the annotation tagName


getAnnotationPossibleMatches

public static int getAnnotationPossibleMatches(java.lang.String tagName)
it returns the number of instances present in the corpus for the annotation tagName


getAnnotationActualMatches

public static int getAnnotationActualMatches(java.lang.String tagName)
it returns the number of matches (either correct, missing or partial) for the annotation tagName


getAnnotationPrecision

public static int getAnnotationPrecision(java.lang.String tagName)
it returns the precision obtained for the annotation tagName (prec is a number 0

getAnnotationRecall

public static int getAnnotationRecall(java.lang.String tagName)
it returns the recall obtained for the annotation tagName (rec is a number 0

getAnnotationAccuracy

public static int getAnnotationAccuracy(java.lang.String tagName)
it returns the F-measure (a reasoned average of precision and recall) obtained for the annotation tagName (F is a number 0
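
For readers unfamiliar with these measures, the textbook definitions can be computed from the match counts as sketched below. This is an assumption-laden illustration with hypothetical names: it uses the standard formulas (precision = correct / actual, recall = correct / possible, F = harmonic mean of the two) on a 0 to 1 scale, and does not claim to reproduce how Amilcare weights partial matches or scales its int results.

```java
public class ScoreSketch {

    /** Fraction of produced matches that are correct; 0 when nothing was produced. */
    static double precision(int correct, int actual) {
        return actual == 0 ? 0.0 : (double) correct / actual;
    }

    /** Fraction of instances in the corpus that were found; 0 when the corpus has none. */
    static double recall(int correct, int possible) {
        return possible == 0 ? 0.0 : (double) correct / possible;
    }

    /** Balanced F-measure: the harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        double p = precision(8, 10);  // 8 correct out of 10 produced
        double r = recall(8, 16);     // 8 correct out of 16 present in the corpus
        System.out.println("precision=" + p + " recall=" + r + " F=" + fMeasure(p, r));
    }
}
```

With the real API, the counts would presumably come from getAnnotationCorrectMatches, getAnnotationActualMatches and getAnnotationPossibleMatches.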

setAmilcarePath

public static void setAmilcarePath(java.lang.String value)
sets Amilcare's working directory. It is equivalent to setting the AmilcarePath in the java command (java -d amilcarepath=value). To be effective it must be set AFTER Amilcare's init!


getAmilcarePath

public static java.lang.String getAmilcarePath()
it returns Amilcare's operating directory


getStatus

public static int getStatus()
it returns the status Amilcare is in. It is used to produce a progress bar in the interface


main

public static void main(java.lang.String[] args)
it tests the whole API