
Publication


Featured research published by Zhe Gan.


Computer Vision and Pattern Recognition | 2017

Semantic Compositional Networks for Visual Captioning

Zhe Gan; Chuang Gan; Xiaodong He; Yunchen Pu; Kenneth Tran; Jianfeng Gao; Lawrence Carin; Li Deng

A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. The SCN extends each weight matrix of the LSTM to an ensemble of tag-dependent weight matrices. The degree to which each member of the ensemble is used to generate an image caption is tied to the image-dependent probability of the corresponding tag. In addition to captioning images, we also extend the SCN to generate captions for video clips. We qualitatively analyze semantic composition in SCNs, and quantitatively evaluate the algorithm on three benchmark datasets: COCO, Flickr30k, and Youtube2Text. Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.
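The tag-weighted ensemble above can be sketched in a few lines: each semantic tag owns its own weight matrix, and the matrices are mixed according to the image-dependent tag probabilities before the LSTM step. This is a simplified illustration, not the paper's implementation (which factorizes the ensemble to keep the parameter count manageable); the dimensions, seed, and tag probabilities here are made up.

```python
import random

random.seed(0)

K = 3  # number of semantic tags (toy example)
D = 4  # hidden/input dimension (toy example)

# One weight matrix per tag; the SCN keeps such an ensemble for every
# LSTM weight matrix (a single D x D matrix stands in for all of them here).
tag_matrices = [[[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(D)]
                for _ in range(K)]

def compose_weights(tag_probs, matrices):
    """Mix the ensemble into one weight matrix, weighting each member
    by the image-dependent probability of its tag."""
    d = len(matrices[0])
    W = [[0.0] * d for _ in range(d)]
    for p, M in zip(tag_probs, matrices):
        for i in range(d):
            for j in range(d):
                W[i][j] += p * M[i][j]
    return W

# Detected tag probabilities for one image (e.g. "dog", "grass", "ball").
s = [0.9, 0.6, 0.1]
W = compose_weights(s, tag_matrices)  # caption decoding would use W in the LSTM
```

An image full of high-probability tags thus gets an LSTM whose weights lean toward those tags' matrices, which is what ties semantic composition to caption generation.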


Computer Vision and Pattern Recognition | 2017

StyleNet: Generating Attractive Visual Captions with Styles

Chuang Gan; Zhe Gan; Xiaodong He; Jianfeng Gao; Li Deng

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, measured in both automatic and human evaluation metrics on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions.
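The factored LSTM can be sketched as a three-way factorization of a weight matrix, W = U · S · V, where U and V are shared across all data and only the diagonal factor S is style-specific, so swapping S at runtime switches the caption style. A toy illustration with made-up dimensions and factor values (not the paper's trained parameters):

```python
import random

random.seed(1)

D = 3  # toy dimension

def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Shared factors U and V are learned from all data; only the diagonal
# style factor S is swapped to change caption style at runtime.
U = [[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(D)]
V = [[random.gauss(0.0, 0.1) for _ in range(D)] for _ in range(D)]
style_factors = {
    "factual":  [1.0, 1.0, 1.0],
    "romantic": [1.3, 0.7, 1.1],   # hypothetical learned values
    "humorous": [0.6, 1.4, 0.9],   # hypothetical learned values
}

def factored_weight(style):
    """Recompose W = U . S . V for the requested style."""
    S = [[style_factors[style][i] if i == j else 0.0 for j in range(D)]
         for i in range(D)]
    return matmul(matmul(U, S), V)

W_romantic = factored_weight("romantic")  # plug into the LSTM for this style
```

Because only S is trained on the stylized monolingual text, the shared factors can keep carrying the factual image-to-caption mapping.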


Meeting of the Association for Computational Linguistics | 2017

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Zhe Gan; Chunyuan Li; Changyou Chen; Yunchen Pu; Qinliang Su; Lawrence Carin

Recurrent neural networks (RNNs) have shown promising performance for language modeling. However, traditional training of RNNs using back-propagation through time often suffers from overfitting. One reason for this is that stochastic optimization (used for large training sets) does not provide good estimates of model uncertainty. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (also appropriate for large training sets) to learn weight uncertainty in RNNs. It yields a principled Bayesian learning algorithm, adding gradient noise during training (enhancing exploration of the model-parameter space) and model averaging when testing. Extensive experiments on various RNN models and across a broad range of applications demonstrate the superiority of the proposed approach relative to stochastic optimization.
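The core update can be sketched with stochastic gradient Langevin dynamics, the simplest SG-MCMC algorithm: take a gradient step on the log-posterior, inject Gaussian noise scaled to the step size, and average the collected samples at test time. A toy 1-D example with a standard-normal target; the RNN, minibatching, and the specific SG-MCMC variants used in the paper are abstracted away.

```python
import math
import random

random.seed(2)

def grad_log_post(theta):
    # Gradient of log N(theta | 0, 1): a stand-in for a minibatch
    # gradient of the model's log-posterior.
    return -theta

eps = 0.1       # step size
theta = 3.0     # deliberately bad initialization
samples = []
for t in range(5000):
    noise = random.gauss(0.0, math.sqrt(eps))            # injected gradient noise
    theta = theta + 0.5 * eps * grad_log_post(theta) + noise
    if t >= 1000:                                        # discard burn-in
        samples.append(theta)

# Model averaging at test time: use the sample mean (more generally,
# average the predictions made by each sampled weight vector).
posterior_mean = sum(samples) / len(samples)
```

The noise is what turns a plain SGD trajectory into draws from the posterior, so the retained samples capture weight uncertainty rather than a single point estimate.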


Computer Vision and Pattern Recognition | 2016

Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification

Chunyuan Li; Andrew Stevens; Changyou Chen; Yunchen Pu; Zhe Gan; Lawrence Carin

Learning the representation of shape cues in 2D & 3D objects for recognition is a fundamental task in computer vision. Deep neural networks (DNNs) have shown promising performance on this task. Due to the large variability of shapes, accurate recognition relies on good estimates of model uncertainty, ignored in traditional training of DNNs, typically learned via stochastic optimization. This paper leverages recent advances in stochastic gradient Markov Chain Monte Carlo (SG-MCMC) to learn weight uncertainty in DNNs. It yields principled Bayesian interpretations for the commonly used Dropout/DropConnect techniques and incorporates them into the SG-MCMC framework. Extensive experiments on 2D & 3D shape datasets and various DNN models demonstrate the superiority of the proposed approach over stochastic optimization. Our approach yields higher recognition accuracy when used in conjunction with Dropout and Batch-Normalization.
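The Bayesian reading of Dropout/DropConnect can be illustrated with test-time model averaging: each sampled mask over the weights acts like one posterior sample, and averaging predictions over many masks approximates the posterior predictive. A toy linear model stands in for a DNN here; the weights, inputs, and keep probability are made up.

```python
import random

random.seed(3)

# Toy "network": a single linear layer. Under the SG-MCMC view in the
# paper, each DropConnect mask corresponds to one posterior weight sample.
w = [0.5, -1.2, 0.8]
x = [1.0, 2.0, 3.0]
p_keep = 0.8

def predict_with_dropconnect(w, x, p_keep):
    """One prediction under a freshly sampled mask over the weights
    (inverted scaling keeps the expectation equal to the full model)."""
    return sum((wi if random.random() < p_keep else 0.0) / p_keep * xi
               for wi, xi in zip(w, x))

# Test-time model averaging: average predictions over many sampled masks
# instead of making a single deterministic pass.
preds = [predict_with_dropconnect(w, x, p_keep) for _ in range(2000)]
avg = sum(preds) / len(preds)
```

The spread of `preds` around `avg` is a crude uncertainty estimate, which is the quantity the abstract argues shape recognition needs.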


International Conference on Acoustics, Speech, and Signal Processing | 2017

Character-Level Deep Conflation for Business Data Analytics

Zhe Gan; Prabhdeep Singh; Ameet Joshi; Xiaodong He; Jianshu Chen; Jianfeng Gao; Li Deng

Connecting different text attributes associated with the same entity (conflation) is important in business data analytics, since it can help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity can differ substantially, for reasons such as misspelling. It is therefore critical to develop a conflation model that truly understands the semantic meaning of the strings and matches them at the semantic level. To this end, we develop a character-level deep conflation model that encodes the input text strings at the character level into finite-dimensional feature vectors, which are then used to compute the cosine similarity between the text strings. The model is trained in an end-to-end manner using backpropagation and stochastic gradient descent to maximize the likelihood of the correct association. Specifically, we propose two variants of the deep conflation model, based on a long short-term memory (LSTM) recurrent neural network (RNN) and a convolutional neural network (CNN), respectively. Both models perform well on a real-world business analytics dataset and significantly outperform the baseline bag-of-characters (BoC) model.
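The matching step can be sketched as: encode each string into a fixed vector from its characters, then rank candidates by cosine similarity. A bag of character trigrams stands in for the paper's trained LSTM/CNN encoders, and the example strings are made up, but the pipeline shape is the same.

```python
import math
from collections import Counter

def char_ngrams(s, n=3):
    """Character n-grams with boundary markers, case-folded."""
    s = "#" + s.lower() + "#"
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def encode(s):
    # Bag-of-character-trigrams vector: a simple stand-in for the
    # character-level LSTM/CNN encoders used in the paper.
    return Counter(char_ngrams(s))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv)

# A misspelled/abbreviated variant should score above an unrelated string.
sim_match = cosine(encode("Microsoft Corporation"), encode("Microsfot Corp"))
sim_other = cosine(encode("Microsoft Corporation"), encode("Duke University"))
```

Character-level encoding is what gives robustness to misspellings: the corrupted string still shares most of its trigrams with the correct one, so the vectors stay close.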


Neural Information Processing Systems | 2016

Variational Autoencoder for Deep Learning of Images, Labels and Captions

Yunchen Pu; Zhe Gan; Ricardo Henao; Xin Yuan; Chunyuan Li; Andrew Stevens; Lawrence Carin


International Conference on Artificial Intelligence and Statistics | 2015

Learning Deep Sigmoid Belief Networks with Data Augmentation

Zhe Gan; Ricardo Henao; David E. Carlson; Lawrence Carin


International Conference on Machine Learning | 2015

Scalable Deep Poisson Factor Analysis for Topic Modeling

Zhe Gan; Changyou Chen; Ricardo Henao; David E. Carlson; Lawrence Carin


Neural Information Processing Systems | 2015

Deep Temporal Sigmoid Belief Networks for Sequence Modeling

Zhe Gan; Chunyuan Li; Ricardo Henao; David E. Carlson; Lawrence Carin


International Conference on Artificial Intelligence and Statistics | 2016

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization

Changyou Chen; David E. Carlson; Zhe Gan; Chunyuan Li; Lawrence Carin

Collaboration


Dive into Zhe Gan's collaboration.

Top Co-Authors