Automatic Post-editing (APE) is an area of research aiming at exploring methods for learning from human post-edited data and applying the results to produce better Machine Translation (MT) output. Among its purposes:

- To improve MT output by exploiting information unavailable to the decoder, or by performing deeper text analysis that is too expensive at the decoding stage
- To cope with systematic errors of an MT system whose decoding process is not accessible
- To provide professional translators with improved MT output, in order to reduce human post-editing effort
- To adapt the output of a general-purpose MT system to the lexicon/style requested in a specific application domain

The usefulness of an APE system depends on how much it can add to a process which has already explored the best available methods and has produced the best possible output. Indeed, there are doubts as to whether it is a better strategy to implement an APE system or to fine-tune the MT system that produced the output with newly-available post-edited data. This and other questions will be discussed in the final sections of this article.

An APE project may focus on specific types of errors, it may dedicate part of the research to text analysis, it may focus on the editing process, or it may implement specific linguistic resources. It is important to note, at this stage, that APE can only be considered a useful addition to an MT workflow if it exploits elements that have been omitted from the MT process, either by applying alternative technological approaches or by incorporating new training data. The capacity to present results that fulfil each of these purposes depends on the technological approach, which may be limited by the available tools.

A comparative study of APE methods by Chatterjee et al. (2015b) proposes that an APE project should also try to answer three questions:

- Does APE yield consistent quality improvements across different language pairs?
- What is the relation between the original MT output quality and the APE results?
- Which of the analysed APE methods has the highest potential?

These questions underline the research dimension of an APE project, since the capacity of a method to improve the quality of the MT output is only valid if it can generalise beyond its dataset. For research purposes, it is also important that publications examine the reasons why even minor improvements are achieved, so that different approaches can be reused in new contexts and projects.

This review pays special attention to the APE shared tasks held at WMT conferences since 2015, because these are privileged forums in which research teams test the most advanced technologies available each year, under comparable conditions. However, the article extends its reach to cover uses of APE beyond these shared tasks, namely by focusing on methods of evaluating APE, the connections between APE and other applications of MT, commercial uses of APE, and the connection to human post-editing, besides analysing the challenges and role of APE in a global context.

The development of systems for MT, APE, and MT Quality Estimation (QE) shares some similarities, since all of these projects involve the implementation of full MT systems, or of some of their components. The most important input for all these systems is parallel data, also called the ‘gold standard reference’, which contains aligned segments or sentences in two languages: one being the human translation (or target, TGT) of the other (the source, SRC). To establish the difference between these three types of systems, adapted to three different tasks, we will not dig further into their similarities.

In general, MT development requires parallel data (Footnote 1) to learn translation models (used to translate phrases, words or sub-word units from the SRC into the TGT language) and monolingual data to learn a language model (used to adapt the translation to the properties of the TGT language). In neural machine translation (NMT), the translation and language models are usually trained jointly on the parallel data. From these models, an MT system generates and outputs a suggested TGT translation for each new SRC sentence. There is extensive research into testing new types of models and architectures in order to obtain better outputs, and MT systems may integrate optimisers and auto-correction components that fine-tune the results.

APE is focused on learning how to edit and improve the quality of the output of an MT system. To do that, rather than just providing the SRC data and the MT output, an APE system requires triplets for each segment, composed of: the SRC segment, its MT output, and the human post-edited (PE) version of that output.
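As a concrete illustration, the segment triplets that an APE system learns from, together with a rough proxy for post-editing effort, can be sketched in a few lines of Python. This is a minimal sketch, not the article's method: the names (`ApeTriplet`, `word_edit_distance`) are illustrative, and word-level edit distance between the MT output and its post-edit is only one assumed way of quantifying how much editing was needed.

```python
from dataclasses import dataclass


@dataclass
class ApeTriplet:
    """One APE training example (illustrative structure, not from the article)."""
    src: str  # source-language (SRC) segment
    mt: str   # raw machine-translated output to be corrected
    pe: str   # human post-edited version, used as the learning target


def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over whitespace tokens: a rough proxy for the
    number of edits needed to turn `hyp` into `ref`."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances for the empty hypothesis prefix
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cost = 0 if hw == rw else 1  # substitution is free on a match
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1]


triplet = ApeTriplet(
    src="Das Haus ist klein .",
    mt="The house is tiny .",
    pe="The house is small .",
)
print(word_edit_distance(triplet.mt, triplet.pe))  # prints 1 (one substitution)
```

Normalising this count by the length of the post-edited reference would give a TER-like score, which is the usual shape of effort metrics in this area.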