Publication

Featured research published by William Hartmann.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

A Direct Masking Approach to Robust ASR

William Hartmann; Arun Narayanan; Eric Fosler-Lussier; DeLiang Wang

Recently, much work has been devoted to the computation of binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly; the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception; we demonstrate the effectiveness of directly using the masked data on both small- and large-vocabulary datasets. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing feature techniques. We also investigate the reasons why other researchers may not have come to this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing feature ASR.
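The pipeline the abstract describes can be sketched as follows: apply the binary mask directly to the spectrogram, compute cepstral-style features, and then variance-normalize them. This is a minimal illustration, not the paper's implementation; the function name, the feature dimensions, and the DCT-based cepstrum are assumptions.

```python
import numpy as np

def direct_masking_features(spectrogram, binary_mask, n_ceps=13):
    """Direct masking sketch: zero out noise-dominant time-frequency units,
    then compute cepstral-style features with variance normalization.

    spectrogram: (frames, bins) non-negative power values
    binary_mask: (frames, bins) 0/1 mask marking speech-dominant units
    """
    # The "direct" use of the mask: simply multiply, leaving missing energy as zeros
    masked = spectrogram * binary_mask
    # Log compression with a small floor so masked (zero) units stay finite
    log_spec = np.log(np.maximum(masked, 1e-10))
    # DCT-II across frequency yields cepstral coefficients
    n_bins = log_spec.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))
    ceps = log_spec @ dct_basis.T
    # Mean and variance normalization; the paper identifies variance
    # normalization as a significant factor in making direct masking competitive
    ceps -= ceps.mean(axis=0, keepdims=True)
    ceps /= ceps.std(axis=0, keepdims=True) + 1e-8
    return ceps
```

The key point the sketch mirrors is that no spectral reconstruction step fills in the masked energy; normalization alone makes the zeroed features usable.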


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Investigations into the incorporation of the Ideal Binary Mask in ASR

William Hartmann; Eric Fosler-Lussier

While much work has been dedicated to exploring how best to incorporate the Ideal Binary Mask (IBM) in automatic speech recognition (ASR) for noisy signals, we demonstrate that the simple use of masked speech can outperform standard spectral reconstruction methods. We explore the effects of both the accuracy of the mask estimation and the strength of the language model on our results. The relative performance of these techniques is directly tied to the accuracy of the estimated mask. Although the use of masked speech fails when significant numbers of errors are present, the maximum performance for spectral reconstruction techniques also drops significantly. This implies improvements in mask estimation can provide greater gains in ASR performance than improvements in the incorporation of the IBM in ASR. Previous work may have ignored the direct use of masked speech due to its poor performance on tasks without a strong language model.
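For reference, the Ideal Binary Mask itself is defined from the local SNR of each time-frequency unit: a unit is kept when the clean-speech energy exceeds the noise energy by a local criterion. A minimal sketch, assuming access to the separate clean and noise power spectrograms (which is what makes the mask "ideal"):

```python
import numpy as np

def ideal_binary_mask(clean_power, noise_power, lc_db=0.0):
    """Ideal Binary Mask: 1 where the local SNR exceeds a criterion (in dB),
    0 elsewhere. Requires oracle clean and noise power spectrograms."""
    snr_db = 10.0 * np.log10(clean_power / np.maximum(noise_power, 1e-12) + 1e-12)
    return (snr_db > lc_db).astype(np.float32)
```

In practice the mask must be estimated, and as the abstract notes, the accuracy of that estimate determines whether direct masking or spectral reconstruction wins.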


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Sage: The New BBN Speech Processing Platform

Roger Hsiao; Ralf Meermeier; Tim Ng; Zhongqiang Huang; Maxwell Jordan; Enoch Kan; Tanel Alumäe; Jan Silovsky; William Hartmann; Francis Keith; Omer Lang; Man-Hung Siu; Owen Kimball

To capitalize on the rapid development of Speech-to-Text (STT) technologies and the proliferation of open source machine learning toolkits, BBN has developed Sage, a new speech processing platform that integrates technologies from multiple sources, each of which has particular strengths. In this paper, we describe the design of Sage, which allows the easy interchange of STT components from different sources. We also describe our approach for fast prototyping with new machine learning toolkits, and a framework for sharing STT components across different applications. Finally, we report Sage’s state-of-the-art performance on different STT tasks.


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Comparison of Multiple System Combination Techniques for Keyword Spotting

William Hartmann; Le Zhang; Kerri Barnes; Roger Hsiao; Stavros Tsakalidis; Richard M. Schwartz

System combination is a common approach to improving results for both speech transcription and keyword spotting—especially in the context of low-resourced languages where building multiple complementary models requires less computational effort. Using state-of-the-art CNN and DNN acoustic models, we analyze the performance, cost, and trade-offs of four system combination approaches: feature combination, joint decoding, hitlist combination, and a novel lattice combination method. Previous work has focused solely on accuracy comparisons. We show that joint decoding, lattice combination, and hitlist combination perform comparably, significantly better than feature combination. However, for practical systems, earlier combination reduces computational cost and storage requirements. Results are reported on four languages from the IARPA Babel dataset.
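Of the four approaches compared, hitlist combination is the latest-stage one: each system produces a list of scored keyword detections, and detections of the same keyword at overlapping times are merged. A simplified sketch, with the merging-by-time-bucket scheme and score averaging as illustrative assumptions rather than the paper's exact method:

```python
from collections import defaultdict

def combine_hitlists(hitlists, tol=0.5):
    """Combine keyword-spotting hit lists from multiple systems.

    hitlists: one list per system, each containing (keyword, start_time, score)
    tuples. Hits of the same keyword within `tol` seconds are treated as one
    event; scores are averaged over all systems, with systems that missed the
    hit contributing 0, so hits found by every system rank highest.
    """
    n_systems = len(hitlists)
    buckets = defaultdict(list)  # (keyword, quantized start) -> per-system scores
    for hits in hitlists:
        for kw, start, score in hits:
            buckets[(kw, round(start / tol))].append(score)
    combined = [(kw, q * tol, sum(scores) / n_systems)
                for (kw, q), scores in buckets.items()]
    # Rank by combined score, best first
    return sorted(combined, key=lambda h: -h[2])
```

Earlier combination points (features, joint decoding) avoid running several full decoders, which is the cost/storage trade-off the abstract highlights.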


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Two-Stage Data Augmentation for Low-Resourced Speech Recognition

William Hartmann; Tim Ng; Roger Hsiao; Stavros Tsakalidis; Richard M. Schwartz

Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data. Additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work that focuses on a single technique, we combine multiple, complementary augmentation approaches. The first stage adds noise and perturbs the speed of additional copies of the original audio. The data is further augmented in a second stage, where a novel fMLLR-based augmentation is applied to bottleneck features to further improve performance. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We present an analysis exploring why these techniques are beneficial.
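The first stage described above (speed perturbation plus additive noise on the waveform) can be sketched as below. The linear-interpolation resampling, the function name, and the default factors are illustrative assumptions; the fMLLR-based second stage operates later, on bottleneck features, and is not shown.

```python
import numpy as np

def stage_one_augment(audio, speed_factors=(0.9, 1.1), noise_snr_db=15.0, rng=None):
    """First-stage augmentation sketch: return speed-perturbed and noisy
    copies of the original waveform (a 1-D float array)."""
    rng = np.random.default_rng(0) if rng is None else rng
    copies = []
    for f in speed_factors:
        # Speed perturbation by resampling the time axis: factor < 1 slows
        # the audio down (more samples), factor > 1 speeds it up (fewer)
        n_out = int(len(audio) / f)
        idx = np.linspace(0, len(audio) - 1, n_out)
        copies.append(np.interp(idx, np.arange(len(audio)), audio))
    # Additive white noise scaled to a target SNR relative to the signal power
    sig_pow = np.mean(audio ** 2)
    noise = rng.standard_normal(len(audio))
    noise *= np.sqrt(sig_pow / (10 ** (noise_snr_db / 10)) / np.mean(noise ** 2))
    copies.append(audio + noise)
    return copies
```

Each copy is then treated as an additional training utterance, multiplying the effective amount of audio for the low-resource language.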


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

ASR-driven top-down binary mask estimation using spectral priors

William Hartmann; Eric Fosler-Lussier

Typical mask estimation algorithms use low-level features to estimate the interfering noise or instantaneous SNR. We propose a simple top-down approach to mask estimation. The estimated mask is based on a specific hypothesis of the underlying speech without using information about the interference or the instantaneous SNR. In this pilot study, we observe a 9% reduction in word error over a baseline recognition system on the Aurora4 corpus, though much greater gains could theoretically be achieved through improvements to the model selection process. We also present SNR improvement results showing our method performs as well as a standard MMSE-based method, demonstrating that speech recognition can aid speech enhancement. Thus, the relationship between recognition and enhancement need not be one way: linguistic information can play a significant role in speech enhancement.
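The top-down idea can be illustrated with a toy version: instead of estimating noise bottom-up, pick the closest clean spectral prior for each noisy frame (standing in for an ASR hypothesis) and keep the time-frequency units that the prior says should carry speech energy. The nearest-prior selection, the dB threshold, and all names here are illustrative assumptions, not the paper's model-selection process.

```python
import numpy as np

def top_down_mask(noisy_spec, spectral_priors, keep_db=-6.0):
    """Top-down mask sketch.

    noisy_spec:      (frames, bins) noisy power spectrogram
    spectral_priors: (n_priors, bins) clean spectral prototypes
    Keeps units within `keep_db` of the chosen prior's per-frame peak.
    """
    log_noisy = np.log(np.maximum(noisy_spec, 1e-10))
    log_priors = np.log(np.maximum(spectral_priors, 1e-10))
    # Choose the nearest prior per frame in log-spectral distance;
    # in the paper this role is played by the recognizer's hypothesis
    d = ((log_noisy[:, None, :] - log_priors[None, :, :]) ** 2).sum(-1)
    best = log_priors[d.argmin(axis=1)]
    # Keep units within keep_db of the chosen prior's frame maximum
    # (keep_db in dB converted to natural-log units)
    thresh = best.max(axis=1, keepdims=True) + keep_db * np.log(10) / 10
    return (best >= thresh).astype(np.float32)
```

Note that the mask depends only on the hypothesized speech, never on a noise estimate, which is what lets recognition drive enhancement.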


Conference of the International Speech Communication Association (INTERSPEECH) | 2015

Enhancing low resource keyword spotting with automatically retrieved web documents

Le Zhang; Damianos Karakos; William Hartmann; Roger Hsiao; Richard M. Schwartz; Stavros Tsakalidis


Conference of the International Speech Communication Association (INTERSPEECH) | 2014

Comparing decoding strategies for subword-based keyword spotting in low-resourced languages

William Hartmann; Viet Bac Le; Abdelkhalek Messaoudi; Lori Lamel; Jean-Luc Gauvain


Conference of the International Speech Communication Association (INTERSPEECH) | 2014

Developing STT and KWS systems using limited language resources

Viet Bac Le; Lori Lamel; Abdelkhalek Messaoudi; William Hartmann; Jean-Luc Gauvain; Julien Despres; Anindya Roy


Conference of the International Speech Communication Association (INTERSPEECH) | 2015

Exploring minimal pronunciation modeling for low resource languages

Marelie H. Davel; Etienne Barnard; Charl Johannes van Heerden; William Hartmann; Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis

Collaboration

Dive into William Hartmann's collaborations.

Top Co-Authors

Roger Hsiao | Carnegie Mellon University

Jean-Luc Gauvain | Centre national de la recherche scientifique

Lori Lamel | Centre national de la recherche scientifique