IEEE Transactions on Multimedia | 2021

Semantic Example Guided Image-to-Image Translation


Abstract
Many image-to-image (I2I) translation problems are inherently multi-modal: a single input may have many valid counterparts. Prior works have proposed multi-modal networks that build a many-to-many mapping between two visual domains. However, most of them are guided by sampled noise, while others encode the reference image into a latent vector, which discards its semantic information. In this work, we aim to control the output semantically through a reference image. Given a reference image and an input from another domain, we first perform semantic matching between the two and generate an auxiliary image, which explicitly encourages the semantic characteristics of the reference to be preserved. A deep network is then used for I2I translation, and the final outputs are expected to be semantically similar to both the input and the reference. However, little paired data exists that satisfies this dual similarity, so we train within a self-supervised framework. We improve the quality and diversity of the outputs by employing non-local blocks and a multi-task architecture. We assess the proposed method through extensive qualitative and quantitative evaluations and present comparisons with several state-of-the-art models.
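
The non-local blocks mentioned above refer to the self-attention operation of Wang et al. (2018), which lets every spatial position aggregate features from all other positions. The sketch below is a minimal PyTorch version of the standard embedded-Gaussian form, given for illustration only; it is not the authors' implementation, and the module name, layer sizes, and reduction factor are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local block (generic sketch)."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = channels // reduction
        # 1x1 convolutions produce query/key/value embeddings.
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)
        self.g = nn.Conv2d(channels, inter, kernel_size=1)
        self.out = nn.Conv2d(inter, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, inter)
        k = self.phi(x).flatten(2)                    # (b, inter, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, inter)
        attn = F.softmax(q @ k, dim=-1)               # affinities between all position pairs
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        # Residual connection lets the block start near an identity mapping.
        return x + self.out(y)

# Usage: insert into a generator's feature pipeline, e.g.
# block = NonLocalBlock(256); y = block(torch.randn(1, 256, 32, 32))
```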

Volume 23
Pages 1654-1665
DOI 10.1109/TMM.2020.3001536
Language English
Journal IEEE Transactions on Multimedia

Full Text