Machine Translation

Translation has always been of fundamental importance, both for accessing information in a foreign language and for communicating with speakers of other languages. The Internet has grown to the point where the majority of documents on the World Wide Web are in languages other than English, and the majority of users are not native English speakers. The need for translation is therefore growing, and automatic systems are required to meet the massive demand for translation services. Modern machine translation techniques take a data-driven approach: they learn how to translate from raw data in the form of parallel texts, that is, texts that are translations of one another in a pair of languages. These models are typically very simple in form, which has the dual benefit of simplifying the learning process and ensuring that the models can be easily ported to new language pairs. A pressing research question is how best to incorporate deeper linguistic analysis into data-driven translation models, which could address many of their systematic errors.

Our contribution   The NLP group has done pioneering work in translation, from rule-based methods in the early days of Artificial Intelligence to modern-day statistical systems. The group currently works on the automatic induction of translation models from parallel corpora for both phrase-based and syntax-based transducers, the development of decoders for finding the best translation, and the acquisition of parallel translation corpora.

People

Trevor Cohn, Rob Gaizauskas, Lucia Specia, Yorick Wilks

Projects

Current and Recent
  • ACCURAT: Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
  • TrendMiner: Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams
Past