The University of Sheffield
Natural Language Processing Group

Machine Translation and Text Adaptation

Translation has always been a task of fundamental importance, both for accessing information in a foreign language and for communicating with speakers of other languages. The majority of the content on the World Wide Web is in languages other than English. Similarly, the majority of users are not English speakers. The sheer volume of this content and its dynamic nature make it impossible for humans to translate it all manually. The need for automatic systems to deal with the massive demand for translation services is therefore evident, either as a fully automated solution or as a way of supporting human translators.

Modern machine translation techniques use a data-driven approach: they learn how to translate from raw data in the form of parallel texts, that is, texts which are translations of one another in a pair of languages. These models are typically very simple in form and make very strong locality assumptions. This has the dual benefit of simplifying the learning process and ensuring that the models can be easily ported to new language pairs. A pressing research question is how best to incorporate wider context, including multimodal information (such as images), into these models, which could address many of their systematic errors.

A very closely related area is that of "monolingual translation", where translation models are built to adapt text from its original form into a different version in the same language. One concrete example is Text Simplification. In this case, models are built to translate texts into simpler variants, for example by replacing rare words with better-known counterparts and by splitting long sentences with complex syntax into multiple, shorter sentences.
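As a rough illustration of the lexical side of simplification, the sketch below replaces rare words with more frequent alternatives. It is a toy example under stated assumptions, not the group's method: the function `simplify`, the `synonyms` dictionary, the tiny corpus and the `rare_threshold` value are all hypothetical, and a real system would learn substitutions from data rather than use a hand-written table.

```python
from collections import Counter


def simplify(sentence, synonyms, freqs, rare_threshold=2):
    """Replace rare words with a more frequent candidate, if one is known.

    `synonyms` maps a word to candidate replacements (hypothetical table);
    `freqs` holds word counts from some reference corpus.
    """
    output = []
    for word in sentence.split():
        key = word.lower()
        if freqs[key] < rare_threshold and key in synonyms:
            # Pick the candidate that is most frequent in the corpus.
            best = max(synonyms[key], key=lambda w: freqs[w])
            # Only substitute when the candidate is actually more common.
            if freqs[best] > freqs[key]:
                word = best
        output.append(word)
    return " ".join(output)


# Toy reference corpus used only to estimate word frequencies.
corpus = "the cat sat on the mat the cat used the big mat".split()
freqs = Counter(token.lower() for token in corpus)

# Hand-written substitution candidates (an assumption for this sketch).
synonyms = {"utilised": ["used"], "enormous": ["big", "large"]}

simplified = simplify("the cat utilised the enormous mat", synonyms, freqs)
print(simplified)
```

Frequency is used here as a simple stand-in for how well known a word is; sentence splitting, the other operation mentioned above, needs syntactic analysis and is not covered by this sketch.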

Our contribution

The NLP group has done pioneering work in translation, from rule-based methods in the early days of Artificial Intelligence to modern-day statistical and neural systems. The group currently focuses on the automatic induction of translation models from parallel corpora using additional context models from textual and non-textual sources, such as visual information. The group has also recently started work on Text Adaptation, including lexical and sentence simplification of English texts targeted at non-native speakers of English.


Rob Gaizauskas, Lucia Specia, Yorick Wilks


  • Barista: Non-Parametric Models of Phrase-based Machine Translation
  • Expert: EXPloiting Empirical appRoaches to Translation
  • QTLaunchpad: Preparation and Launch of a Large-Scale Action for Quality Translation Technology
  • SLaTr: A Joint Model of Spoken Language Translation
  • TaaS: Terminology as a Service
  • TrendMiner: Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams
  • ACCURAT: Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation