Title
Knowledge Distillation for Quality Estimation
Date Issued
01 January 2021
Access level
Metadata-only access
Resource Type
conference paper
Author(s)
University of Sheffield
Publisher(s)
Association for Computational Linguistics (ACL)
Abstract
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations while using 8x fewer parameters.
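The abstract describes training a small, shallow student model to mimic the sentence-level quality scores of a large teacher QE model, using augmented data scored by the teacher rather than human labels. The following is a minimal sketch of that general distillation setup, not the authors' actual architecture or training recipe: the student model, its dimensions, and all names (ShallowQEStudent, distillation_step) are hypothetical stand-ins.

import torch
import torch.nn as nn

# Hypothetical shallow student: embeds concatenated source/MT subword ids,
# encodes them with a small BiLSTM, and regresses one quality score.
class ShallowQEStudent(nn.Module):
    def __init__(self, vocab_size=32000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.regressor = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)
        enc, _ = self.encoder(emb)
        pooled = enc.mean(dim=1)            # simple mean pooling
        return self.regressor(pooled).squeeze(-1)

def distillation_step(student, optimizer, token_ids, teacher_scores):
    """One distillation step: the student regresses the teacher's
    sentence-level quality predictions (MSE), so the augmented data
    needs no human quality labels."""
    student.train()
    pred = student(token_ids)
    loss = nn.functional.mse_loss(pred, teacher_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = ShallowQEStudent()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    # Dummy batch standing in for augmented source-MT pairs already
    # scored by a large teacher QE model.
    token_ids = torch.randint(1, 32000, (8, 64))
    teacher_scores = torch.rand(8)          # teacher-predicted quality
    print(distillation_step(student, optimizer, token_ids, teacher_scores))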
Start page
5091
End page
5099
Language
English
OECD Knowledge area
Linguistics
Computer Science
Scopus EID
2-s2.0-85123937079
ISBN
9781954085541
Resource of which it is part
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
ISBN of the container
978-1-954085-54-1
Conference
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Sponsor(s)
This work was supported by funding from the Bergamot project (EU H2020 Grant No. 825303).
Sources of information
Directorio de Producción Científica
Scopus