Saizheng Zhang
Université de Montréal
Publications
Featured research published by Saizheng Zhang.
European Conference on Machine Learning | 2015
Dong-Hyun Lee; Saizheng Zhang; Asja Fischer; Yoshua Bengio
Back-propagation has been the workhorse of recent successes of deep learning, but it relies on infinitesimal effects (partial derivatives) to perform credit assignment. This could become a serious issue as one considers deeper and more non-linear functions, e.g., the extreme case of non-linearity where the relation between parameters and cost is actually discrete. Inspired by the biological implausibility of back-propagation, a few approaches have been proposed in the past that could play a similar credit-assignment role. In this spirit, we explore a novel approach to credit assignment in deep networks that we call target propagation. The main idea is to compute targets rather than gradients at each layer. Like gradients, they are propagated backwards. In a way that is related to but different from previously proposed proxies for back-propagation, which rely on a backwards network with symmetric weights, target propagation relies on auto-encoders at each layer. Unlike back-propagation, it can be applied even when units exchange stochastic bits rather than real numbers. We show that a linear correction for the imperfection of the auto-encoders, called difference target propagation, is very effective in making target propagation actually work, leading to results comparable to back-propagation for deep networks with discrete and continuous units and for denoising auto-encoders, and achieving state-of-the-art results for stochastic networks.
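To make the target-setting idea concrete, below is a minimal NumPy sketch of difference target propagation's backward pass, which sets each layer's target as h_i + g_i(target_{i+1}) - g_i(h_{i+1}) and then applies purely local layer updates. The layer sizes, tanh units, learning rates, and toy data are illustrative assumptions, not the paper's experimental setup, and the training of the auto-encoder decoders is omitted.

```python
# Minimal NumPy sketch of difference target propagation (assumed toy setup).
import numpy as np

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 4]                      # input, two hidden layers, output (assumed)
W = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]
V = [rng.standard_normal((b, a)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]  # decoders g_i

def f(h, i):                                # forward mapping of layer i
    return np.tanh(h @ W[i])

def g(h, i):                                # approximate inverse (auto-encoder decoder)
    return np.tanh(h @ V[i])

x, y = rng.standard_normal((32, 8)), rng.standard_normal((32, 4))

# Forward pass, keeping every layer's activation.
hs = [x]
for i in range(len(W)):
    hs.append(f(hs[-1], i))

# Top target: nudge the output toward lower (squared-error) loss.
lr_target = 0.1
targets = [None] * len(hs)
targets[-1] = hs[-1] - lr_target * (hs[-1] - y)

# Difference target propagation: propagate targets downward with a linear
# correction for the imperfect inverse,
#   target_i = h_i + g_i(target_{i+1}) - g_i(h_{i+1}).
for i in reversed(range(1, len(hs) - 1)):
    targets[i] = hs[i] + g(targets[i + 1], i) - g(hs[i + 1], i)

# Each layer then gets a purely local update pushing its output toward its
# target (gradient of the local squared error through the tanh); the decoders
# V[i] would be trained as denoising auto-encoders in the same local fashion.
lr = 0.01
for i in range(1, len(hs)):
    pre, out, tgt = hs[i - 1], hs[i], targets[i]
    grad = pre.T @ ((out - tgt) * (1 - out ** 2)) / len(pre)
    W[i - 1] -= lr * grad
```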
Conference of the International Speech Communication Association | 2016
Ying Zhang; Mohammad Pezeshki; Philemon Brakel; Saizheng Zhang; César Laurent; Yoshua Bengio; Aaron C. Courville
Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems that combine CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved state-of-the-art results on various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling that combines hierarchical CNNs with CTC directly, without recurrent connections. Evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient but also competitive with existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations given appropriate context information.
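As a rough illustration of the recipe, the following PyTorch sketch stacks 1-D convolutions over acoustic features and trains them with the CTC loss, with no recurrent connections. The feature dimension, layer widths, phone inventory, and toy batch are assumptions chosen for brevity, not the paper's TIMIT configuration.

```python
# Minimal PyTorch sketch: a purely convolutional acoustic model trained with CTC.
import torch
import torch.nn as nn

n_feats, n_classes = 40, 62          # assumed: filterbank features, 61 phones + blank

model = nn.Sequential(               # hierarchical 1-D convolutions over time
    nn.Conv1d(n_feats, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, n_classes, kernel_size=1),        # per-frame class scores
)
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 8 utterances of 200 frames, each labelled with 30 phone symbols.
feats = torch.randn(8, n_feats, 200)
labels = torch.randint(1, n_classes, (8, 30))
input_lengths = torch.full((8,), 200, dtype=torch.long)
label_lengths = torch.full((8,), 30, dtype=torch.long)

logits = model(feats)                                   # (batch, classes, time)
log_probs = logits.permute(2, 0, 1).log_softmax(-1)     # CTC expects (time, batch, classes)
loss = ctc(log_probs, labels, input_lengths, label_lengths)
opt.zero_grad(); loss.backward(); opt.step()
```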
Neural Computation | 2017
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu
We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with respect to output units that have received an outside driving force pushing them away from the stationary point. Backpropagated error gradients correspond to temporal derivatives with respect to the activation of hidden units. These lead to a weight update proportional to the product of the presynaptic firing rate and the temporal rate of change of the postsynaptic firing rate. Simulations and a theoretical argument suggest that this rate-based update rule is consistent with those associated with spike-timing-dependent plasticity. The ideas presented in this article could be an element of a theory for explaining how brains perform credit assignment in deep hierarchies as efficiently as backpropagation does, with neural computation corresponding to both approximate inference in continuous-valued latent variables and error backpropagation, at the same time.
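The resulting learning rule can be stated compactly: the weight change is proportional to the presynaptic firing rate times the temporal derivative of the postsynaptic firing rate. The NumPy sketch below simulates that rule on toy leaky dynamics; the dynamics, the sigmoid rate function, and all constants are assumptions for illustration, not the paper's energy-based model.

```python
# Minimal NumPy sketch of the rate-based update: dW_ij ∝ r_pre_j * d(r_post_i)/dt.
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 10
W = rng.standard_normal((n_post, n_pre)) * 0.1
rho = lambda u: 1.0 / (1.0 + np.exp(-u))        # firing rate as a function of activation

u_post = np.zeros(n_post)                       # postsynaptic activations (state)
r_pre = rng.random(n_pre)                       # presynaptic firing rates (held fixed here)
dt, tau, eta = 0.1, 1.0, 0.01

r_post_prev = rho(u_post)
for step in range(100):
    # Leaky integration of the feed-forward drive (a stand-in for the paper's
    # energy-based latent-variable dynamics).
    u_post += (dt / tau) * (W @ r_pre - u_post)
    r_post = rho(u_post)

    # Rate-based, STDP-consistent rule: weight change proportional to the
    # presynaptic rate times the rate of change of the postsynaptic rate.
    dr_post_dt = (r_post - r_post_prev) / dt
    W += eta * np.outer(dr_post_dt, r_pre)
    r_post_prev = r_post
```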
Meeting of the Association for Computational Linguistics | 2017
Xingdi Yuan; Tong Wang; Caglar Gulcehre; Alessandro Sordoni; Philip Bachman; Saizheng Zhang; Sandeep Subramanian; Adam Trischler
We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we fine-tune the model using policy gradient techniques to maximize several rewards that measure question quality. Most notably, one of these rewards is the performance of a question-answering system. We motivate question generation as a means to improve the performance of question answering systems. Our model is trained and evaluated on the recent question-answering dataset SQuAD.
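The training procedure amounts to a two-phase loop: teacher-forced maximum likelihood, then policy-gradient fine-tuning against a scalar reward. The PyTorch sketch below shows that loop on a toy GRU decoder with a stand-in reward function; in the paper one of the rewards is a question-answering system's performance, which is not reproduced here, and for brevity the sampling step conditions on the reference prefix rather than decoding autoregressively.

```python
# Minimal PyTorch sketch of MLE pre-training followed by REINFORCE fine-tuning
# (toy decoder and stand-in reward; all sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 1000, 128
embed = nn.Embedding(vocab, hidden)
gru = nn.GRU(hidden, hidden, batch_first=True)
out = nn.Linear(hidden, vocab)
params = list(embed.parameters()) + list(gru.parameters()) + list(out.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def decoder_logits(tokens):                 # (batch, time) -> (batch, time, vocab)
    h, _ = gru(embed(tokens))
    return out(h)

def question_reward(sampled):               # stand-in for e.g. a QA model's score
    return torch.rand(sampled.size(0))

questions = torch.randint(0, vocab, (16, 12))     # toy target questions

# Phase 1: teacher forcing / maximum likelihood.
logits = decoder_logits(questions[:, :-1])
mle_loss = F.cross_entropy(logits.reshape(-1, vocab), questions[:, 1:].reshape(-1))
opt.zero_grad(); mle_loss.backward(); opt.step()

# Phase 2: policy-gradient fine-tuning (REINFORCE with a mean-reward baseline).
logits = decoder_logits(questions[:, :-1])        # simplification: reference-prefix context
dist = torch.distributions.Categorical(logits=logits)
sampled = dist.sample()                           # sampled question tokens
log_prob = dist.log_prob(sampled).sum(dim=1)      # log-probability of the sample
reward = question_reward(sampled)
pg_loss = -((reward - reward.mean()) * log_prob).mean()
opt.zero_grad(); pg_loss.backward(); opt.step()
```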
Archive | 2017
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu
We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing-dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed. The new rule changes a synaptic weight in proportion to the product of the presynaptic firing rate and the temporal rate of change of activity on the postsynaptic side. These quantities are interesting for studying theoretical explanations of synaptic change from a machine learning perspective. In particular, if neural dynamics moved neural activity towards reducing some objective function, then this STDP rule would correspond to stochastic gradient descent on that objective function.
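Written out, the rule and its stochastic-gradient reading take the following form. The notation here is an assumption made for illustration: rho_j is the presynaptic firing rate, s_i the postsynaptic activity, and J the objective that the neural dynamics are assumed to descend.

```latex
\[
  \Delta W_{ij} \;\propto\; \rho_j \,\frac{d s_i}{dt},
  \qquad
  \frac{d s_i}{dt} \;\propto\; -\frac{\partial J}{\partial s_i}
  \;\Longrightarrow\;
  \Delta W_{ij} \;\propto\; -\rho_j \,\frac{\partial J}{\partial s_i}
  \;\approx\; -\frac{\partial J}{\partial W_{ij}}
\]
% Under these assumptions (postsynaptic drive roughly linear in W_ij * rho_j),
% the STDP-like rule behaves as stochastic gradient descent on J, as argued above.
```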
Neural Information Processing Systems | 2016
Alex Lamb; Anirudh Goyal; Ying Zhang; Saizheng Zhang; Aaron C. Courville; Yoshua Bengio
Neural Information Processing Systems | 2016
Yuhuai Wu; Saizheng Zhang; Ying Zhang; Yoshua Bengio; Ruslan Salakhutdinov
Neural Information Processing Systems | 2016
Saizheng Zhang; Yuhuai Wu; Tong Che; Zhouhan Lin; Roland Memisevic; Ruslan Salakhutdinov; Yoshua Bengio
arXiv: Computation and Language | 2017
Iulian Vlad Serban; Chinnadhurai Sankar; Mathieu Germain; Saizheng Zhang; Zhouhan Lin; Sandeep Subramanian; Taesup Kim; Michael Pieper; Sarath Chandar; Nan Rosemary Ke; Sai Rajeswar Mudumba; Alexandre de Brébisson; Jose Sotelo; Dendi Suhubdy; Vincent Michalski; Alexandre Nguyen; Joelle Pineau; Yoshua Bengio
arXiv: Neural and Evolutionary Computing | 2015
Yoshua Bengio; Thomas Mesnard; Asja Fischer; Saizheng Zhang; Yuhuai Wu