Archive | 2021

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

Abstract


Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is even more astonishing considering the similarities between the architectures. This paper sheds some light on the embedding spaces they create, using average cosine similarity, contextuality metrics and measures for representational similarity for comparison, revealing that BERT and NMT encoder representations look significantly different from one another. In order to address this issue, we propose a supervised transformation from one into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.

Pages 337-347
DOI 10.18653/v1/2021.acl-srw.35
Language English
