Saizheng Zhang
Université de Montréal
Publications
Featured research published by Saizheng Zhang.
European Conference on Machine Learning | 2015
Dong-Hyun Lee; Saizheng Zhang; Asja Fischer; Yoshua Bengio
Back-propagation has been the workhorse of recent successes of deep learning, but it relies on infinitesimal effects (partial derivatives) to perform credit assignment. This could become a serious issue as one considers deeper and more non-linear functions, e.g., the extreme case of non-linearity where the relation between parameters and cost is actually discrete. Inspired by the biological implausibility of back-propagation, a few approaches have been proposed in the past that could play a similar credit-assignment role. In this spirit, we explore a novel approach to credit assignment in deep networks that we call target propagation. The main idea is to compute targets rather than gradients at each layer. Like gradients, they are propagated backwards. In a way that is related to but different from previously proposed proxies for back-propagation, which rely on a backwards network with symmetric weights, target propagation relies on auto-encoders at each layer. Unlike back-propagation, it can be applied even when units exchange stochastic bits rather than real numbers. We show that a linear correction for the imperfection of the auto-encoders, called difference target propagation, is very effective in making target propagation actually work, leading to results comparable to back-propagation for deep networks with discrete and continuous units and for denoising auto-encoders, and achieving state-of-the-art results for stochastic networks.
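To make the target-setting idea concrete, below is a minimal NumPy sketch of difference target propagation's backward pass, which sets each layer's target as h_i + g_i(target_{i+1}) - g_i(h_{i+1}) and then applies purely local layer updates. The layer sizes, tanh units, learning rates, and toy data are illustrative assumptions, not the paper's experimental setup, and the training of the auto-encoder decoders is omitted.

```python
# Minimal NumPy sketch of difference target propagation (assumed toy setup).
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]                      # input, two hidden layers, output (assumed)
W = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]
V = [rng.standard_normal((b, a)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]  # decoders g_i

def f(h, i):                                # forward mapping of layer i
    return np.tanh(h @ W[i])

def g(h, i):                                # approximate inverse (auto-encoder decoder)
    return np.tanh(h @ V[i])

x, y = rng.standard_normal((32, 8)), rng.standard_normal((32, 4))

# Forward pass, keeping every layer's activation.
hs = [x]
for i in range(len(W)):
    hs.append(f(hs[-1], i))

# Top target: nudge the output toward lower (squared-error) loss.
lr_target = 0.1
targets = [None] * len(hs)
targets[-1] = hs[-1] - lr_target * (hs[-1] - y)

# Difference target propagation: propagate targets downward with a linear
# correction for the imperfect inverse,
#   target_i = h_i + g_i(target_{i+1}) - g_i(h_{i+1}).
for i in reversed(range(1, len(hs) - 1)):
    targets[i] = hs[i] + g(targets[i + 1], i) - g(hs[i + 1], i)

# Each layer then gets a purely local update pushing its output toward its
# target (gradient of the local squared error through the tanh); the decoders
# V[i] would be trained as denoising auto-encoders in the same local fashion.
lr = 0.01
for i in range(1, len(hs)):
    pre, out, tgt = hs[i - 1], hs[i], targets[i]
    grad = pre.T @ ((out - tgt) * (1 - out ** 2)) / len(pre)
    W[i - 1] -= lr * grad
```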
Conference of the International Speech Communication Association | 2016
Ying Zhang; Mohammad Pezeshki; Philemon Brakel; Saizheng Zhang; César Laurent; Yoshua Bengio; Aaron C. Courville
Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems that combine CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved state-of-the-art results on various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling that combines hierarchical CNNs with CTC directly, without recurrent connections. Evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient but also competitive with existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations given appropriate context information.
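As a rough illustration of the recipe, the following PyTorch sketch stacks 1-D convolutions over acoustic features and trains them with the CTC loss, with no recurrent connections. The feature dimension, layer widths, phone inventory, and toy batch are assumptions chosen for brevity, not the paper's TIMIT configuration.

```python
# Minimal PyTorch sketch: a purely convolutional acoustic model trained with CTC.
import torch
import torch.nn as nn

n_feats, n_classes = 40, 62          # assumed: filterbank features, 61 phones + blank

model = nn.Sequential(               # hierarchical 1-D convolutions over time
    nn.Conv1d(n_feats, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, n_classes, kernel_size=1),        # per-frame class scores
)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 8 utterances of 200 frames, each labelled with 30 phone symbols.
feats = torch.randn(8, n_feats, 200)
labels = torch.randint(1, n_classes, (8, 30))
input_lengths = torch.full((8,), 200, dtype=torch.long)
label_lengths = torch.full((8,), 30, dtype=torch.long)

logits = model(feats)                                   # (batch, classes, time)
log_probs = logits.permute(2, 0, 1).log_softmax(-1)     # CTC expects (time, batch, classes)
loss = ctc(log_probs, labels, input_lengths, label_lengths)
opt.zero_grad(); loss.backward(); opt.step()
```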
Neural Computation | 2017
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu
We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with respect to output units that have received an outside driving force pushing them away from the stationary point. Backpropagated error gradients correspond to temporal derivatives with respect to the activation of hidden units. These lead to a weight update proportional to the product of the presynaptic firing rate and the temporal rate of change of the postsynaptic firing rate. Simulations and a theoretical argument suggest that this rate-based update rule is consistent with those associated with spike-timing-dependent plasticity. The ideas presented in this article could be an element of a theory for explaining how brains perform credit assignment in deep hierarchies as efficiently as backpropagation does, with neural computation corresponding to both approximate inference in continuous-valued latent variables and error backpropagation, at the same time.
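The resulting learning rule can be stated compactly: the weight change is proportional to the presynaptic firing rate times the temporal derivative of the postsynaptic firing rate. The NumPy sketch below simulates that rule on toy leaky dynamics; the dynamics, the sigmoid rate function, and all constants are assumptions for illustration, not the paper's energy-based model.

```python
# Minimal NumPy sketch of the rate-based update: dW_ij ∝ r_pre_j * d(r_post_i)/dt.
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 10
W = rng.standard_normal((n_post, n_pre)) * 0.1
rho = lambda u: 1.0 / (1.0 + np.exp(-u))        # firing rate as a function of activation

u_post = np.zeros(n_post)                       # postsynaptic activations (state)
r_pre = rng.random(n_pre)                       # presynaptic firing rates (held fixed here)
dt, tau, eta = 0.1, 1.0, 0.01

r_post_prev = rho(u_post)
for step in range(100):
    # Leaky integration of the feed-forward drive (a stand-in for the paper's
    # energy-based latent-variable dynamics).
    u_post += (dt / tau) * (W @ r_pre - u_post)
    r_post = rho(u_post)

    # Rate-based, STDP-consistent rule: weight change proportional to the
    # presynaptic rate times the rate of change of the postsynaptic rate.
    dr_post_dt = (r_post - r_post_prev) / dt
    W += eta * np.outer(dr_post_dt, r_pre)
    r_post_prev = r_post
```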
Meeting of the Association for Computational Linguistics | 2017
Xingdi Yuan; Tong Wang; Caglar Gulcehre; Alessandro Sordoni; Philip Bachman; Saizheng Zhang; Sandeep Subramanian; Adam Trischler
We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.
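The training procedure amounts to a two-phase loop: teacher-forced maximum likelihood, then policy-gradient fine-tuning against a scalar reward. The PyTorch sketch below shows that loop on a toy GRU decoder with a stand-in reward function; in the paper one of the rewards is a question-answering system's performance, which is not reproduced here, and for brevity the sampling step conditions on the reference prefix rather than decoding autoregressively.

```python
# Minimal PyTorch sketch of MLE pre-training followed by REINFORCE fine-tuning
# (toy decoder and stand-in reward; all sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 1000, 128
embed = nn.Embedding(vocab, hidden)
gru = nn.GRU(hidden, hidden, batch_first=True)
out = nn.Linear(hidden, vocab)
params = list(embed.parameters()) + list(gru.parameters()) + list(out.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def decoder_logits(tokens):                 # (batch, time) -> (batch, time, vocab)
    h, _ = gru(embed(tokens))
    return out(h)

def question_reward(sampled):               # stand-in for e.g. a QA model's score
    return torch.rand(sampled.size(0))

questions = torch.randint(0, vocab, (16, 12))     # toy target questions

# Phase 1: teacher forcing / maximum likelihood.
logits = decoder_logits(questions[:, :-1])
mle_loss = F.cross_entropy(logits.reshape(-1, vocab), questions[:, 1:].reshape(-1))
opt.zero_grad(); mle_loss.backward(); opt.step()

# Phase 2: policy-gradient fine-tuning (REINFORCE with a mean-reward baseline).
logits = decoder_logits(questions[:, :-1])        # simplification: reference-prefix context
dist = torch.distributions.Categorical(logits=logits)
sampled = dist.sample()                           # sampled question tokens
log_prob = dist.log_prob(sampled).sum(dim=1)      # log-probability of the sample
reward = question_reward(sampled)
pg_loss = -((reward - reward.mean()) * log_prob).mean()
opt.zero_grad(); pg_loss.backward(); opt.step()
```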
Archive | 2017
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu
We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing-dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed. The new rule changes a synaptic weight in proportion to the product of the presynaptic firing rate and the temporal rate of change of activity on the postsynaptic side. These quantities are interesting for studying theoretical explanations of synaptic change from a machine learning perspective. In particular, if neural dynamics moved neural activity towards reducing some objective function, then this STDP rule would correspond to stochastic gradient descent on that objective function.
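Written out, the rule and its stochastic-gradient reading take the following form. The notation here is an assumption made for illustration: rho_j is the presynaptic firing rate, s_i the postsynaptic activity, and J the objective that the neural dynamics are assumed to descend.

```latex
\[
  \Delta W_{ij} \;\propto\; \rho_j \,\frac{d s_i}{dt},
  \qquad
  \frac{d s_i}{dt} \;\propto\; -\frac{\partial J}{\partial s_i}
  \;\Longrightarrow\;
  \Delta W_{ij} \;\propto\; -\rho_j \,\frac{\partial J}{\partial s_i}
  \;\approx\; -\frac{\partial J}{\partial W_{ij}}
\]
% Under these assumptions (postsynaptic drive roughly linear in W_ij * rho_j),
% the STDP-like rule behaves as stochastic gradient descent on J, as argued above.
```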
Neural Information Processing Systems | 2016
Alex Lamb; Anirudh Goyal; Ying Zhang; Saizheng Zhang; Aaron C. Courville; Yoshua Bengio
Neural Information Processing Systems | 2016
Yuhuai Wu; Saizheng Zhang; Ying Zhang; Yoshua Bengio; Ruslan Salakhutdinov
Neural Information Processing Systems | 2016
Saizheng Zhang; Yuhuai Wu; Tong Che; Zhouhan Lin; Roland Memisevic; Ruslan Salakhutdinov; Yoshua Bengio
arXiv: Computation and Language | 2017
Iulian Vlad Serban; Chinnadhurai Sankar; Mathieu Germain; Saizheng Zhang; Zhouhan Lin; Sandeep Subramanian; Taesup Kim; Michael Pieper; Sarath Chandar; Nan Rosemary Ke; Sai Rajeswar Mudumba; Alexandre de Brébisson; Jose Sotelo; Dendi Suhubdy; Vincent Michalski; Alexandre Nguyen; Joelle Pineau; Yoshua Bengio
arXiv: Neural and Evolutionary Computing | 2015
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu