Zichao Yang
Carnegie Mellon University
Publications
Featured research published by Zichao Yang.
computer vision and pattern recognition | 2016
Zichao Yang; Xiaodong He; Jianfeng Gao; Li Deng; Alexander J. Smola
This paper presents stacked attention networks (SANs) that learn to answer natural language questions about images. SANs use the semantic representation of a question as a query to search for the regions in an image that are related to the answer. We argue that image question answering (QA) often requires multiple steps of reasoning. Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively. Experiments conducted on four image QA data sets demonstrate that the proposed SANs significantly outperform previous state-of-the-art approaches. Visualization of the attention layers illustrates how the SAN locates, layer by layer, the relevant visual clues that lead to the answer.
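A minimal PyTorch sketch of one such attention layer, under our reading of the architecture (the class name and the dimensions d, k are ours, not the paper's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SANLayer(nn.Module):
    """One attention layer of a SAN: the question vector queries the grid
    of image-region features and is refined by the attended summary."""
    def __init__(self, d, k):
        super().__init__()
        self.w_image = nn.Linear(d, k, bias=False)
        self.w_quest = nn.Linear(d, k)
        self.w_score = nn.Linear(k, 1)

    def forward(self, v, u):
        # v: (batch, regions, d) image features; u: (batch, d) query vector
        h = torch.tanh(self.w_image(v) + self.w_quest(u).unsqueeze(1))
        p = F.softmax(self.w_score(h).squeeze(-1), dim=1)  # attention over regions
        v_att = (p.unsqueeze(-1) * v).sum(dim=1)           # attended image summary
        return u + v_att                                   # refined query for the next layer
```

Stacking two or more such layers implements the "query the image multiple times" idea: u1 = layer1(v, u0), u2 = layer2(v, u1), and a classifier over u2 predicts the answer.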
north american chapter of the association for computational linguistics | 2016
Zichao Yang; Diyi Yang; Chris Dyer; Xiaodong He; Alexander J. Smola; Eduard H. Hovy
We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word and sentence level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
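A minimal PyTorch sketch of the two-level structure (module names, the GRU encoders, and the hidden size d are our assumptions; d must be even here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Attention pooling used at both levels: score each element against
    a learned context vector and return the weighted sum."""
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)
        self.context = nn.Parameter(torch.randn(d))

    def forward(self, h):                        # h: (batch, seq, d)
        u = torch.tanh(self.proj(h))
        a = F.softmax(u @ self.context, dim=1)   # (batch, seq) attention weights
        return (a.unsqueeze(-1) * h).sum(dim=1)  # (batch, d)

class HAN(nn.Module):
    def __init__(self, vocab_size, d):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.word_rnn = nn.GRU(d, d // 2, bidirectional=True, batch_first=True)
        self.sent_rnn = nn.GRU(d, d // 2, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(d)
        self.sent_attn = AttentionPool(d)

    def forward(self, docs):                      # docs: (batch, sents, words) token ids
        b, s, w = docs.shape
        h, _ = self.word_rnn(self.embed(docs.view(b * s, w)))
        sents = self.word_attn(h).view(b, s, -1)  # one vector per sentence
        h, _ = self.sent_rnn(sents)
        return self.sent_attn(h)                  # document representation
```

A linear classifier on the returned document vector completes the model.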
international conference on computer vision | 2015
Zichao Yang; Marcin Moczulski; Misha Denil; Nando de Freitas; Alexander J. Smola; Le Song; Ziyu Wang
The fully connected layers of deep convolutional neural networks typically contain over 90% of the network parameters. Reducing the number of parameters while preserving predictive performance is critically important for training big models in distributed systems and for deployment on embedded devices. In this paper, we introduce a novel Adaptive Fastfood transform to reparameterize the matrix-vector multiplication of fully connected layers. Reparameterizing a fully connected layer with d inputs and n outputs with the Adaptive Fastfood transform reduces the storage and computational costs from O(nd) to O(n) and O(n log d), respectively. Using the Adaptive Fastfood transform in convolutional networks results in what we call a deep fried convnet. These convnets are end-to-end trainable, and enable us to attain substantial reductions in the number of parameters without affecting prediction accuracy on the MNIST and ImageNet datasets.
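A minimal NumPy sketch of the underlying Fastfood product S H G Π H B x, which the adaptive version makes learnable (all names are ours; normalization constants are assumed absorbed into the learned diagonal S):

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform; len(x) must be a power of 2.
    This is the O(d log d) step that replaces a dense matrix multiply."""
    d, h = len(x), 1
    while h < d:
        for i in range(0, d, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def adaptive_fastfood(x, S, G, B, perm):
    """Approximate W @ x as S H G Pi H B x with O(d) parameters:
    three learned diagonal vectors and one fixed permutation."""
    y = fwht(B * x)        # H B x   (B * x allocates a fresh array)
    y = y[perm]            # Pi H B x
    y = fwht(G * y)        # H G Pi H B x
    return S * y           # learned output scaling

d = 8
rng = np.random.default_rng(0)
S, G, B = rng.standard_normal((3, d))
perm = rng.permutation(d)
print(adaptive_fastfood(rng.standard_normal(d), S, G, B, perm))
```

For a rectangular layer with n > d outputs, several such blocks are stacked, keeping the parameter count linear rather than quadratic.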
empirical methods in natural language processing | 2016
Zhiting Hu; Zichao Yang; Ruslan Salakhutdinov; Eric P. Xing
Regulating deep neural networks (DNNs) with human-structured knowledge has been shown to be of great benefit for improved accuracy and interpretability. We develop a general framework that enables learning knowledge and its confidence jointly with the DNNs, so that a vast amount of fuzzy knowledge can be incorporated and automatically optimized with little manual effort. We apply the framework to sentence sentiment analysis, augmenting a DNN with massive linguistic constraints on discourse and polarity structures. Our model substantially enhances performance while using less training data, and shows improved interpretability. The principled framework can also be applied to posterior regularization for regulating other statistical models.
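A schematic PyTorch sketch of the kind of knowledge-distillation objective such a framework builds on: a teacher distribution biases the student toward labels that satisfy the soft constraints, weighted by a confidence lam that the full framework learns jointly. The exact form and all names here are our simplification, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def teacher_distribution(logits, rule_scores, lam):
    """q(y|x) proportional to p(y|x) * exp(lam * rule_scores[y]), where
    rule_scores[y] measures how well label y satisfies the soft knowledge
    and lam is its confidence."""
    return F.softmax(logits + lam * rule_scores, dim=-1)

def distill_loss(logits, labels, rule_scores, lam, pi=0.5):
    """Balance imitating the ground truth and imitating the teacher."""
    q = teacher_distribution(logits, rule_scores, lam).detach()
    ce_true = F.cross_entropy(logits, labels)
    ce_teacher = -(q * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    return (1 - pi) * ce_true + pi * ce_teacher
```

In the paper's joint setting, the confidences are optimized together with the network rather than held fixed as in this sketch.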
empirical methods in natural language processing | 2017
Zichao Yang; Phil Blunsom; Chris Dyer; Wang Ling
We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g., language models that are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks show that our model variants outperform models based on deterministic attention.
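One way to read the core idea, as a PyTorch sketch of a single decoding step: a latent switch z decides between generating from the vocabulary and referring to an external table, and the word distribution marginalizes over z. All tensor names and the gating form are our assumptions:

```python
import torch
import torch.nn.functional as F

def next_word_dist(h, W_vocab, table_keys, table_word_ids, vocab_size, w_gate):
    """P(w) = P(z=gen) * P_vocab(w) + P(z=ref) * P_copy(w), where P_copy
    scatters attention over table rows onto the words they contain.
    Shapes: h (d,), W_vocab (vocab, d), table_keys (rows, d),
    table_word_ids (rows,) long, w_gate (d,)."""
    p_ref = torch.sigmoid(h @ w_gate)               # scalar P(z = refer)
    p_vocab = F.softmax(W_vocab @ h, dim=0)         # (vocab,) generate distribution
    attn = F.softmax(table_keys @ h, dim=0)         # (rows,) attention over the table
    p_copy = torch.zeros(vocab_size).scatter_add(0, table_word_ids, attn)
    return (1 - p_ref) * p_vocab + p_ref * p_copy
```

Because z is marginalized rather than fixed, the model can be trained by maximum likelihood even when reference decisions are unobserved.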
international conference on machine learning | 2017
Zhiting Hu; Zichao Yang; Xiaodan Liang; Ruslan Salakhutdinov; Eric P. Xing
international conference on artificial intelligence and statistics | 2015
Zichao Yang; Andrew Gordon Wilson; Alexander J. Smola; Le Song
international conference on machine learning | 2017
Zichao Yang; Zhiting Hu; Ruslan Salakhutdinov; Taylor Berg-Kirkpatrick
international conference on learning representations | 2018
Zhiting Hu; Zichao Yang; Ruslan Salakhutdinov; Eric P. Xing
conference of the european chapter of the association for computational linguistics | 2017
Zichao Yang; Zhiting Hu; Yuntian Deng; Chris Dyer; Alexander J. Smola