Review of the 4th Annual CLUK Colloquium

held on 10th-11th January, 2001, University of Sheffield

by Paul Johnson, University of Sunderland

This was the fourth CLUK colloquium: a forum for research students involved in the fields of computational linguistics and natural language processing within the UK. In essence, it is intended as a place for these students to meet, discuss and explore ideas, and perhaps even promote collaborative projects. Moreover, it allows these students, still early in their academic careers, to present their findings in a friendly and supportive environment, and gives them a platform from which to gain the attention of the wider research community.

The colloquium consisted of 13 presentations, of which 3 were by guest speakers, and a single poster session of 7 posters. These took place over a period of 2 days, with each of the research students' presentations lasting half an hour (including questions), the invited speakers' an hour, and a further hour allocated to the poster session. The timing was mostly kept to schedule, with only one running noticeably overtime at the very end of the colloquium. The poster session also overran by a fraction of an hour, but since this was because of its popularity, and it was rounding off the end of the first day, it didn't cause any undue or unwelcome disruption. Otherwise, the running of the colloquium went very smoothly, and an informal and pleasant atmosphere was created.

Although detailed instructions for arriving were provided, the accompanying maps were unfortunately vague and incomplete. All other aspects of the event's organisation were very good though.

Around 30 people were in attendance, consisting, on the whole, of those with work to present. Enthusiasm was high throughout, and nearly everyone attended every presentation.

Having missed the first session of the day, I am unable to comment on the first guest speaker's talk (Dr Massimo Poesio of Edinburgh University on Centering Theory). However, the other two were highly polished and informative.

Prof. Donia Scott of Brighton University talked about reference architectures for natural language generation systems: why one was needed, how it would help future work to use existing and current research more easily, and how it would allow for easy comparisons between different systems. She went on to describe the collaborative RAGS (Reference Architecture for Generation Systems) project in which she has been involved: how they attempted to define a reference architecture, the difficulties they encountered, and the refocusing from a rigid architecture of modules to the more flexible one of tasks.

Keith Preston from British Telecom talked on the commercial application of language technology, touching on a similar theme to Prof. Scott in his suggestion that in the future, research should attempt to make greater use and reuse of existing knowledge, as opposed to frequently "reinventing the wheel". He suggested that the field of natural language processing should attempt to match the success of the computer hardware industry, where computational processing power has doubled approximately every 18 months since the sixties. This, he stressed, was not due to a single technology constantly being improved upon, but rather to a parallel process by which a new technology (building upon past research) has tended to emerge just as past technology reaches the limits of its capabilities.

As to the remaining presentations by the research students, quality varied, though the majority tended towards the more professional. Among them were: Jon Herring (Brighton), discussing a project that attempts to show how historical information about a language could theoretically be used to help in the mapping of its various phonemes into graphemes; Darren Pearce (Sussex), on collocation extraction from the WordNet corpus, in which restrictions are placed on the possible synonym substitutions that could usually be made; Paul Clough (Sheffield), on the measuring of text reuse (either word-for-word copying, or through the simple manipulation of a text's surface forms), specifically with respect to the domain of journalism (where text reuse is a frequent occurrence); and David Roberts (Leeds), exploring the possibility of automating the process of indexing articles from journals (focusing on medical journals for the purpose of his research) using a controlled vocabulary and an appropriate base of indexing rules.

This was the first time a poster session was included in CLUK, and the general consensus appeared to be that it was a success, allowing a number of works to be presented that otherwise would not have been on show. Among those presenting a poster were: Gabriela Cavaglia and Adam Kilgarriff (Brighton), investigating the possibility and potential of using the web as a huge corpus for a variety of language-based studies; Jennifer Pedler (London), addressing the problem experienced by dyslexics in detecting spelling errors where the mistakes are also real words (as these aren't generally picked up by spell checkers) by constructing a corpus of these errors, which it is proposed will form the basis of a specialised spell checker; and Debora Field (UMIST), on the implementation of a common framework for both the generation and analysis of the semantics of discourse, based on the concept that speech is a planned activity.

From my own point of view it was a useful and informative experience. As one of the presenters in the poster session, I had the opportunity to present my research (the use of multimedia in second language learning) to those interested or with helpful advice to offer. Most significantly, it brought to my attention a moderately related project of which I was previously unaware, and provided some useful suggestions on my intended experimental methodology.

A future colloquium was discussed for 2002 (no venue set, though Leeds was put forward on a provisional basis), and suggestions were made that additional half-day workshops should be introduced in the future between the annual CLUKs. It was generally agreed, though, that for future meetings a larger and wider audience would be beneficial. Attendance from more established researchers would provide more authoritative knowledge and advice for those still working towards their PhD theses, whilst more students would allow for a greater dissemination of ideas and knowledge.