Wei-Ning Hsu
Massachusetts Institute of Technology
Publications
Featured research published by Wei-Ning Hsu.
North American Chapter of the Association for Computational Linguistics | 2016
Mitra Mohtarami; Yonatan Belinkov; Wei-Ning Hsu; Yu Zhang; Tao Lei; Kfir Bar; Scott Cyphers; James R. Glass
Community question answering platforms need to automatically rank answers and questions with respect to a given question. In this paper, we present approaches for the Answer Selection and Question Retrieval tasks of SemEval-2016 (Task 3). We develop a bag-of-vectors approach with various vector- and text-based features, and different neural network approaches, including CNNs and LSTMs, to capture the semantic similarity between questions and answers for ranking purposes. Our evaluation demonstrates that our approaches significantly outperform the baselines.
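The bag-of-vectors idea above can be illustrated with a minimal sketch: average the word vectors of a text into one fixed-size vector, then rank candidate answers by cosine similarity to the question. The function names and the `embeddings` lookup are hypothetical stand-ins, not the paper's actual feature set.

```python
import numpy as np

def bag_of_vectors(tokens, embeddings):
    """Average the word vectors of the known tokens into one
    fixed-size representation (assumes at least one token is
    present in the embedding table)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity with a small epsilon against zero norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_answers(question, answers, embeddings):
    """Score each candidate answer against the question and return
    (score, answer) pairs, best first."""
    q = bag_of_vectors(question, embeddings)
    scored = [(cosine(q, bag_of_vectors(a, embeddings)), a) for a in answers]
    return sorted(scored, reverse=True)
```

In the paper this representation is one feature among several; the neural models (CNNs, LSTMs) learn the similarity function instead of using a fixed cosine.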
Conference of the International Speech Communication Association | 2016
Wei-Ning Hsu; Yu Zhang; Ann Lee; James R. Glass
Deep neural network models have achieved considerable success in a wide range of fields. Several architectures have been proposed to alleviate the vanishing gradient problem and hence enable training of very deep networks. In the speech recognition area, convolutional neural networks, recurrent neural networks, and fully connected deep neural networks have been shown to be complementary in their modeling capabilities. A model combining all three, called a CLDNN, yields the best performance to date. In this paper, we extend the CLDNN model by introducing a highway connection between LSTM layers, which enables direct information flow from cells of lower layers to cells of upper layers. With this design, we are able to better exploit the advantages of a deeper structure. Experiments on the GALE Chinese Broadcast Conversation/News Speech dataset indicate that our model outperforms all previous models and achieves a new benchmark of 22.41% character error rate on the dataset.
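The highway connection described above can be sketched very simply: a learned depth gate decides how much of the lower layer's cell state flows directly into the upper layer's cell state. This is a simplified illustration, not the paper's exact formulation; the parameter names `W_d` and `b_d` are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_cell_update(c_lower, c_candidate, x, W_d, b_d):
    """Depth-gated ('highway') combination between stacked LSTM layers.

    c_lower     -- cell state of the layer below (same time step)
    c_candidate -- this layer's ordinary LSTM cell update
    The depth gate d, computed from the layer input x, controls how
    much of c_lower is passed straight through to this layer's cell.
    """
    d = sigmoid(W_d @ x + b_d)        # depth gate in (0, 1)
    return c_candidate + d * c_lower  # direct cell-to-cell path
```

When the gate saturates at 0 the model reduces to an ordinary stacked LSTM; when it opens, gradients can flow across layers without passing through the usual nonlinearities.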
International Conference on Acoustics, Speech, and Signal Processing | 2015
Cheng-Tao Chung; Wei-Ning Hsu; Cheng-Yi Lee; Lin-Shan Lee
This paper presents a novel approach for enhancing the multiple sets of acoustic patterns automatically discovered from a given corpus. In previous work, it was proposed that different HMM configurations (number of states per model, number of distinct models) for the acoustic patterns form a two-dimensional space. Multiple sets of acoustic patterns automatically discovered with HMM configurations properly located at different points over this two-dimensional space were shown to be complementary to one another, jointly capturing the characteristics of the given corpus. By representing the given corpus as sequences of acoustic patterns over the different HMM sets, the pattern indices in these sequences can be relabeled for context consistency across the different sequences. Good improvements were observed in preliminary spoken term detection (STD) experiments performed with such enhanced patterns on both TIMIT and Mandarin Broadcast News.
Spoken Language Technology Workshop | 2016
Tuka Alhanai; Wei-Ning Hsu; James R. Glass
The Arabic language, with over 300 million speakers, has significant diversity and breadth, which makes building an automated system to understand what is said challenging. This paper describes an Arabic Automatic Speech Recognition system developed on a 1,200-hour speech corpus that was made available for the 2016 Arabic Multi-genre Broadcast (MGB) Challenge. A range of Deep Neural Network (DNN) topologies were modeled, including feed-forward, convolutional, time-delay, recurrent Long Short-Term Memory (LSTM), Highway LSTM (H-LSTM), and Grid LSTM (GLSTM) networks. The best performance came from a sequence-discriminatively trained GLSTM neural network. The best overall Word Error Rate (WER) was 18.3% (p < 0.001) on the development set, after combining hypotheses of 3- and 5-layer sequence-discriminatively trained GLSTM models that had been rescored with a 4-gram language model.
Spoken Language Technology Workshop | 2016
Wei-Ning Hsu; Yu Zhang; James R. Glass
Recurrent neural networks (RNNs) are naturally suitable for speech recognition because of their ability to utilize dynamically changing temporal information. Deep RNNs have been argued to be able to model temporal relationships at different time granularities, but suffer from the vanishing gradient problem. In this paper, we extend stacked long short-term memory (LSTM) RNNs by using grid LSTM blocks that formulate computation along not only the temporal dimension but also the depth dimension, in order to alleviate this issue. Moreover, we prioritize the depth dimension over the temporal one to provide the depth dimension with more updated information, since its output is used for classification. We call this model the prioritized Grid LSTM (pGLSTM). Extensive experiments on four large datasets (AMI, HKUST, GALE, and MGB) indicate that the pGLSTM outperforms alternative deep LSTM models, beating stacked LSTMs with 4% to 7% relative improvement, and achieves new benchmarks among uni-directional models on all datasets.
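The prioritization described above can be sketched as an ordering of updates: the temporal LSTM is updated first, and the depth LSTM then conditions on the already-updated temporal output. This is a simplified sketch under assumed parameter shapes, not the paper's exact equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One generic LSTM update; W maps [x; h] to the four gates."""
    n = h.shape[0]
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = z[:n], z[n:2*n], z[2*n:3*n], z[3*n:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def pglstm_step(x, h_t, c_t, h_d, c_d, W_t, b_t, W_d, b_d):
    """Prioritized grid LSTM step (sketch).

    A grid LSTM keeps separate memories along the temporal (t) and
    depth (d) dimensions. Here the temporal update runs first, and
    the depth update then sees the *new* temporal hidden state h_t2,
    so the depth output (used for classification) gets the most
    up-to-date information.
    """
    h_t2, c_t2 = lstm_step(np.concatenate([x, h_d]), h_t, c_t, W_t, b_t)
    h_d2, c_d2 = lstm_step(np.concatenate([x, h_t2]), h_d, c_d, W_d, b_d)
    return h_t2, c_t2, h_d2, c_d2
```

An unprioritized grid LSTM would compute both updates from the pre-update states; swapping the order of the two `lstm_step` calls' inputs is the entire difference.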
National Conference on Artificial Intelligence | 2015
Wei-Ning Hsu; Hsuan-Tien Lin
Neural Information Processing Systems | 2017
Wei-Ning Hsu; Yu Zhang; James R. Glass
International Conference on Computational Linguistics | 2016
Salvatore Romeo; Giovanni Da San Martino; Alberto Barrón-Cedeño; Alessandro Moschitti; Yonatan Belinkov; Wei-Ning Hsu; Yu Zhang; Mitra Mohtarami; James R. Glass
Conference of the International Speech Communication Association | 2017
Wei-Ning Hsu; Yu Zhang; James R. Glass
arXiv: Computation and Language | 2016
Wei-Ning Hsu; Yu Zhang; James R. Glass