
Publication


Featured research published by Yuzong Liu.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Submodular Subset Selection for Large-Scale Speech Training Data

Kai Wei; Yuzong Liu; Katrin Kirchhoff; Chris D. Bartels; Jeff A. Bilmes

We address the problem of subselecting a large set of acoustic data to train automatic speech recognition (ASR) systems. To this end, we apply a novel data selection technique based on constrained submodular function maximization. Though NP-hard, the combinatorial optimization problem can be approximately solved by a simple and scalable greedy algorithm with constant-factor guarantees. We evaluate our approach by subselecting data from 1300 hours of conversational English telephone data to train two types of large-vocabulary speech recognizers: one with Gaussian mixture model (GMM) based acoustic models, and another based on deep neural networks (DNNs). We show that training data can be reduced significantly, and that our technique outperforms both random selection and a previously proposed selection method utilizing comparable resources. Notably, using the submodular selection method, the DNN system using only about 5% of the training data achieves performance on par with the GMM system using 100% of the training data; with the baseline subset selection methods, the DNN system is unable to match this performance.
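
The greedy algorithm referenced above is not spelled out in the abstract; as a minimal illustrative sketch, here is cardinality-constrained greedy maximization of a facility-location function, a standard monotone submodular objective for data subset selection. The random similarity matrix stands in for the acoustic similarities the paper would compute, and all names here are hypothetical.

```python
import numpy as np

def greedy_facility_location(sim: np.ndarray, budget: int) -> list[int]:
    """Greedily pick `budget` items maximizing the facility-location
    function f(S) = sum_i max_{j in S} sim[i, j], which is monotone
    submodular, so greedy is within (1 - 1/e) of optimal."""
    n = sim.shape[0]
    selected: list[int] = []
    best = np.zeros(n)  # best[i] = similarity of i to its closest selected item
    for _ in range(budget):
        # Marginal gain of adding j: sum over i of max(0, sim[i, j] - best[i]).
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        if selected:
            gains[selected] = -np.inf  # never re-pick an item
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected

# Toy usage: a random Gaussian-kernel similarity matrix over 50 "utterances".
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 8))
sim = np.exp(-np.linalg.norm(x[:, None] - x[None, :], axis=-1))
print(greedy_facility_location(sim, budget=5))
```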


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Submodular feature selection for high-dimensional acoustic score spaces

Yuzong Liu; Kai Wei; Katrin Kirchhoff; Yisong Song; Jeff A. Bilmes

We apply methods for selecting subsets of dimensions from high-dimensional score spaces, and subsets of data for training, using submodular function optimization. Submodular functions provide theoretical performance guarantees while simultaneously retaining extremely fast and scalable optimization via an accelerated greedy algorithm. We evaluate this approach on two applications: data subset selection for phone recognizer training, and semi-supervised learning for phone segment classification. Interestingly, the first application uses submodularity twice: first for score space sub-selection and then for data subset selection. Our approach is computationally efficient but still consistently outperforms a number of baseline methods.
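
The "accelerated greedy algorithm" mentioned above is usually Minoux's lazy greedy; a minimal sketch follows. It assumes only that the objective passed in is monotone submodular; the toy set-cover objective at the end is purely illustrative and not the paper's acoustic score-space function.

```python
import heapq
from typing import Callable

def lazy_greedy(f: Callable[[frozenset], float], ground: list, budget: int) -> list:
    """Minoux's accelerated greedy: marginal gains of a submodular f can
    only shrink as the set grows, so stale gains kept in a max-heap are
    valid upper bounds and most re-evaluations can be skipped."""
    selected: list = []
    S, fS = frozenset(), f(frozenset())
    # Heap entries: (-gain, item, round in which the gain was computed).
    heap = [(-(f(frozenset([v])) - fS), v, 0) for v in ground]
    heapq.heapify(heap)
    rnd = 0
    while heap and len(selected) < budget:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == rnd:                       # gain is fresh: accept it
            selected.append(v)
            S = S | {v}
            fS = f(S)
            rnd += 1
        else:                                  # stale: recompute, push back
            heapq.heappush(heap, (-(f(S | {v}) - fS), v, rnd))
    return selected

# Toy monotone submodular objective: coverage of "phones" by utterances.
sets = {"a": {1, 2}, "b": {2, 3, 4}, "c": {4, 5}, "d": {1, 5}}
cover = lambda S: float(len(set().union(*[sets[v] for v in S])))
print(lazy_greedy(cover, list(sets), budget=2))
```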


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Unsupervised submodular subset selection for speech data

Kai Wei; Yuzong Liu; Katrin Kirchhoff; Jeff A. Bilmes

We conduct a comparative study on selecting subsets of acoustic data for training phone recognizers. The data selection problem is approached as a constrained submodular optimization problem. Previous applications of this approach required transcriptions or acoustic models trained in a supervised way. In this paper we develop and evaluate a novel and entirely unsupervised approach, and apply it to TIMIT data. Results show that our method consistently outperforms a number of baseline methods while being computationally very efficient and requiring no labeling.
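
The abstract does not spell out the unsupervised objective; a common feature-based choice in this line of work is f(S) = Σ_f w_f √(c_f(S)), where c_f(S) measures how much of an unsupervised acoustic "feature" f (e.g., a cluster posterior, requiring no transcriptions) the set S covers. A hypothetical sketch, with random posteriors standing in for real cluster occupancies:

```python
import numpy as np

def feature_based_greedy(counts: np.ndarray, weights: np.ndarray,
                         budget: int) -> list[int]:
    """Greedy maximization of f(S) = sum_f weights[f] * sqrt(c_f(S)),
    where counts[i, f] is how strongly utterance i activates
    unsupervised acoustic cluster f (no labels required)."""
    acc = np.zeros(counts.shape[1])          # accumulated feature mass of S
    selected: list[int] = []
    for _ in range(budget):
        # Marginal gain of each candidate under the concave sqrt.
        gains = (weights * (np.sqrt(acc + counts) - np.sqrt(acc))).sum(axis=1)
        if selected:
            gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        acc += counts[j]
    return selected

# Hypothetical usage: random cluster posteriors for 100 utterances.
rng = np.random.default_rng(1)
counts = rng.random((100, 16))               # 100 utterances, 16 clusters
print(feature_based_greedy(counts, np.ones(16), budget=10))
```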


IEEE Spoken Language Technology Workshop (SLT) | 2014

Graph-based semi-supervised acoustic modeling in DNN-based speech recognition

Yuzong Liu; Katrin Kirchhoff

This paper describes the combination of two recent machine learning techniques for acoustic modeling in speech recognition: deep neural networks (DNNs) and graph-based semi-supervised learning (SSL). While DNNs have been shown to be powerful supervised classifiers and have achieved considerable success in speech recognition, graph-based SSL can exploit valuable complementary information derived from the manifold structure of the unlabeled test data. Previous work on graph-based SSL in acoustic modeling has been limited to frame-level classification tasks and has not been compared to, or integrated with, state-of-the-art DNN/HMM recognizers. This paper represents the first integration of graph-based SSL with DNN-based speech recognition and analyzes its effect on word recognition performance. The approach is evaluated on two small-vocabulary speech recognition tasks and shows a significant improvement in HMM state classification accuracy as well as a consistent reduction in word error rate over a state-of-the-art DNN/HMM baseline.
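
For readers unfamiliar with graph-based SSL, the sketch below shows the generic building block, label propagation over a similarity graph with the labeled nodes clamped, which is how manifold structure constrains the classification of unlabeled frames. This is a textbook variant for illustration, not necessarily the exact propagation objective used in the paper.

```python
import numpy as np

def label_propagation(W: np.ndarray, Y: np.ndarray, labeled: np.ndarray,
                      iters: int = 50) -> np.ndarray:
    """W: symmetric nonnegative similarity matrix over all samples.
    Y: one-hot rows for labeled samples, zeros elsewhere.
    Repeatedly replace each node's label distribution by the weighted
    average of its neighbors', clamping the labeled nodes."""
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = D_inv * (W @ F)      # propagate along graph edges
        F[labeled] = Y[labeled]  # clamp known labels
    return F                     # rows ~ class posteriors for every sample

# Tiny demo: two clusters on a line, one labeled point per cluster.
x = np.concatenate([np.linspace(0, 1, 5), np.linspace(5, 6, 5)])
W = np.exp(-(x[:, None] - x[None, :]) ** 2)
Y = np.zeros((10, 2)); Y[0, 0] = 1; Y[9, 1] = 1
print(label_propagation(W, Y, np.array([0, 9])).argmax(axis=1))
```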


IEEE Transactions on Audio, Speech, and Language Processing | 2016

Graph-Based Semisupervised Learning for Acoustic Modeling in Automatic Speech Recognition

Yuzong Liu; Katrin Kirchhoff

In this paper, we investigate how to apply graph-based semisupervised learning to acoustic modeling in speech recognition. Graph-based semisupervised learning is a widely used transductive semisupervised learning method in which labeled and unlabeled data are jointly represented as a weighted graph; the resulting graph structure is then used as a constraint during the classification of unlabeled data points. We investigate suitable graph-based learning algorithms for speech data and evaluate two different frameworks for integrating graph-based learning into state-of-the-art, deep neural network (DNN)-based speech recognition systems. The first framework utilizes graph-based learning in parallel with a DNN classifier within a lattice-rescoring framework, whereas the second framework relies on an embedding of graph neighborhood information into continuous space using an autoencoder. We demonstrate significant improvements in frame-level phonetic classification accuracy and consistent reductions in word error rate on large-vocabulary conversational speech recognition tasks.
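
In the first framework, the graph-based learner runs in parallel with the DNN and the two must be combined before lattice rescoring. The linear interpolation of per-frame state posteriors below is an assumed combination rule for illustration; the weight lam is a hypothetical parameter that would be tuned on held-out data.

```python
import numpy as np

def interpolate_posteriors(p_dnn: np.ndarray, p_graph: np.ndarray,
                           lam: float = 0.7) -> np.ndarray:
    """Mix per-frame state posteriors from the DNN and the graph-based
    learner (both of shape [frames, states]) before rescoring lattice
    arcs; renormalize so each frame remains a distribution."""
    p = lam * p_dnn + (1.0 - lam) * p_graph
    return p / p.sum(axis=1, keepdims=True)
```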


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2015

Acoustic modeling with neural graph embeddings

Yuzong Liu; Katrin Kirchhoff

Graph-based learning (GBL) is a form of semi-supervised learning that has been successfully exploited in acoustic modeling in the past. It utilizes manifold information in speech data that is represented as a joint similarity graph over training and test samples. Typically, GBL is used at the output level of an acoustic classifier; however, this setup is difficult to scale to large data sets, and the graph-based learner is not optimized jointly with other components of the speech recognition system. In this paper we explore a different approach where the similarity graph is first embedded into continuous space using a neural autoencoder. Features derived from this encoding are then used at the input level to a standard DNN-based speech recognizer. We demonstrate improved scalability and performance compared to the standard GBL approach as well as significant improvements in word error rate on a medium-vocabulary Switchboard task.
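
A toy NumPy version of the central idea, compressing each sample's similarity-graph row through an autoencoder bottleneck and using the code as extra DNN input features, might look as follows. The data, layer sizes, and training details are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row of X is one sample's slice of the joint train/test similarity
# graph (toy random data here); the autoencoder compresses it to a
# 16-dimensional embedding later appended to the acoustic features.
n, dim, code = 200, 200, 16
X = rng.random((n, dim))

W1 = rng.normal(0, 0.1, (dim, code)); b1 = np.zeros(code)   # encoder
W2 = rng.normal(0, 0.1, (code, dim)); b2 = np.zeros(dim)    # decoder

lr = 0.05
for epoch in range(200):
    H = np.tanh(X @ W1 + b1)            # bottleneck embedding
    Xhat = H @ W2 + b2                  # linear reconstruction
    err = Xhat - X                      # gradient of 0.5 * ||Xhat - X||^2
    gW2 = H.T @ err / n
    gH = (err @ W2.T) * (1 - H ** 2)    # backprop through tanh
    gW1 = X.T @ gH / n
    W2 -= lr * gW2; b2 -= lr * err.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gH.mean(axis=0)

embeddings = np.tanh(X @ W1 + b1)       # graph-embedding features
print(embeddings.shape)                 # (200, 16)
```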


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Novel Front-End Features Based on Neural Graph Embeddings for DNN-HMM and LSTM-CTC Acoustic Modeling

Yuzong Liu; Katrin Kirchhoff

In this paper we investigate neural graph embeddings as front-end features for various deep neural network (DNN) architectures for speech recognition. Neural graph embedding features are produced by an autoencoder that maps graph structures defined over speech samples to a continuous vector space. The resulting feature representation is then used to augment the standard acoustic features at the input level of a DNN classifier. We compare two different neural graph embedding methods, one based on a local neighborhood graph encoding, and another based on a global similarity graph encoding. They are evaluated in DNN-HMM-based and LSTM-CTC-based ASR systems on a 110-hour Switchboard conversational speech recognition task. Significant improvements in word error rates are achieved by both methods in the DNN-HMM system, and by global graph embeddings in the LSTM-CTC system.
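
The two encodings contrasted above could plausibly be realized as below; the top-k truncation and the normalization are illustrative assumptions rather than the paper's exact preprocessing.

```python
import numpy as np

def local_neighborhood_encoding(sim_row: np.ndarray, k: int = 10) -> np.ndarray:
    """Local variant: keep only a sample's k strongest graph neighbors,
    zeroing the rest, before feeding the autoencoder."""
    out = np.zeros_like(sim_row)
    top = np.argsort(sim_row)[-k:]
    out[top] = sim_row[top]
    return out

def global_similarity_encoding(sim_row: np.ndarray) -> np.ndarray:
    """Global variant: use the full, normalized similarity row, so the
    embedding reflects the sample's position in the whole graph."""
    return sim_row / max(float(sim_row.sum()), 1e-12)
```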


Computer Speech & Language | 2017

SVitchboard-II and FiSVer-I: Crafting high quality and low complexity conversational English speech corpora using submodular function optimization

Yuzong Liu; Rishabh K. Iyer; Katrin Kirchhoff; Jeff A. Bilmes

We introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional automatic speech recognition (ASR) research requires considerable computational resources and has slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities in order to facilitate the development of novel speech recognition techniques on smaller but still acoustically rich, diverse, and hence interesting corpora. We select these corpora to maximize an acoustic quality criterion while limiting the vocabulary size (from 10 words up to 10,000 words), where both “acoustic quality” and vocabulary size are aptly measured via various submodular functions. We also survey numerous submodular functions that could be useful to measure both “acoustic quality” and “corpus complexity” and offer guidelines on when and why a scientist may wish to use one vs. another. The corpus selection process itself is naturally performed using various state-of-the-art submodular function optimization procedures, including submodular level-set constrained submodular optimization (SCSC/SCSK), difference-of-submodular (DS) optimization, and unconstrained submodular minimization (SFM), all of which are fully defined herein. While the focus of this paper is on the resultant speech corpora, and the survey of possible objectives, a consequence of the paper is a thorough empirical comparison of the relative merits of these modern submodular optimization procedures. We provide baseline word recognition results on all of the resultant speech corpora for both Gaussian mixture model (GMM) and deep neural network (DNN)-based systems, and we have released all of the corpora definitions and Kaldi training recipes for free in the public domain.
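
The constrained selection problem above, maximizing acoustic quality subject to a vocabulary-size limit, can be illustrated with a deliberately simplified greedy sketch; treating the vocabulary cap as a hard feasibility check is a crude stand-in for the SCSC/SCSK machinery the paper actually uses, and all data below are hypothetical.

```python
import numpy as np

def vocab_limited_greedy(quality: np.ndarray, vocab: list[set[str]],
                         max_vocab: int) -> list[int]:
    """Pick utterances in order of a modular 'acoustic quality' score,
    skipping any whose words would push the induced vocabulary past
    `max_vocab` (a toy surrogate for the paper's submodular-level-set
    constrained optimization)."""
    selected: list[int] = []
    seen: set[str] = set()
    for i in np.argsort(-quality):             # best quality first
        grown = seen | vocab[i]
        if len(grown) <= max_vocab:            # vocabulary budget respected
            selected.append(int(i))
            seen = grown
    return selected

# Hypothetical usage with four fake utterances.
quality = np.array([3.0, 2.5, 2.0, 1.0])
vocab = [{"yes", "no"}, {"no", "maybe"}, {"hello"}, {"yes"}]
print(vocab_limited_greedy(quality, vocab, max_vocab=3))  # -> [0, 1, 3]
```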


North American Chapter of the Association for Computational Linguistics (NAACL) | 2013

Using Document Summarization Techniques for Speech Data Subset Selection

Kai Wei; Yuzong Liu; Katrin Kirchhoff; Jeff A. Bilmes


Conference of the International Speech Communication Association (INTERSPEECH) | 2013

Graph-based semi-supervised learning for phone and segment classification

Yuzong Liu; Katrin Kirchhoff

Collaboration


Dive into Yuzong Liu's collaborations.

Top Co-Authors

Jeff A. Bilmes, University of Washington
Kai Wei, University of Washington
Yisong Song, University of Washington