Appl. Intell. | 2021

Adversarial training with Wasserstein distance for learning cross-lingual word embeddings

 
 
 
 

Abstract


Recent studies have managed to learn cross-lingual word embeddings in a completely unsupervised manner through generative adversarial networks (GANs). These GANs-based methods enable the alignment of two monolingual embedding spaces approximately, but the performance on the embeddings of low-frequency words (LFEs) is still unsatisfactory. The existing solution is to set up the low sampling rates for the embeddings of LFEs based on word-frequency information. However, such a solution has two shortcomings. First, this solution relies on the word-frequency information that is not always available in real scenarios. Second, the uneven sampling may cause the models to overlook the distribution information of LFEs, thereby negatively affecting their performance. In this study, we propose a novel unsupervised GANs-based method that effectively improves the quality of LFEs, circumventing the above two issues. Our method is based on the observation that LFEs tend to be densely clustered in the embedding space. In these dense embedding points, obtaining fine-grained alignment through adversarial training is difficult. We use this idea to introduce a noise function that can disperse the dense embedding points to a certain extent. In addition, we train a Wasserstein critic network to encourage the noise-adding embeddings and the original embeddings to have similar semantics. We test our approach on two common evaluation tasks, namely, bilingual lexicon induction and cross-lingual word similarity. Experimental results show that the proposed model has stronger or competitive performance compared with the supervised and unsupervised baselines.

Volume 51
Pages 7666-7678
DOI 10.1007/S10489-020-02136-X
Language English
Journal Appl. Intell.

Full Text