Publication


Featured research published by Lorin Netsch.


international conference on acoustics, speech, and signal processing | 1989

Speaker verification over long distance telephone lines

Jayant M. Naik; Lorin Netsch; George R. Doddington

The authors present the results of speaker-verification technology development for use over long-distance telephone lines. A description is given of two large speech databases that were collected to support the development of new speaker verification algorithms. Also discussed are the results of discriminant analysis techniques which improve the discrimination between true speakers and imposters. A comparison is made of the performance of two speaker-verification algorithms, one using template-based dynamic time warping, and the other, hidden Markov modeling.
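
As an assumed illustration (not the authors' implementation), the template-based dynamic time warping comparison mentioned above can be sketched in a few lines of Python: a stored template of per-frame feature vectors is aligned to a claimant utterance and the length-normalized alignment cost is thresholded.

    import numpy as np

    def dtw_distance(template, utterance):
        # Minimal dynamic time warping between two feature sequences (one row
        # per frame); real systems add path constraints and better local costs.
        n, m = len(template), len(utterance)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(template[i - 1] - utterance[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # skip a template frame
                                     cost[i, j - 1],      # skip an input frame
                                     cost[i - 1, j - 1])  # match frames
        return cost[n, m] / (n + m)  # length-normalized alignment cost

    # Toy verification decision: accept if the claimant's utterance aligns to the
    # enrolled template with a cost below some threshold (values are made up).
    rng = np.random.default_rng(0)
    enrolled = rng.normal(size=(40, 12))                   # 40 frames x 12 features
    claimant = enrolled + rng.normal(scale=0.1, size=(40, 12))
    print(dtw_distance(enrolled, claimant) < 0.5)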


Journal of the Acoustical Society of America | 2003

Enrollment and modeling method and apparatus for robust speaker dependent speech models

Lorin Netsch; Barbara Wheatley

A method and apparatus for speech recognition and for generating speech recognition models are provided, including the generation of unique phonotactic garbage models (15) that identify speech using, for example, English-language constraints, in addition to noise, silence, and other non-speech models (11), and specific word models for speech recognition.


Journal of the Acoustical Society of America | 2000

Speech recognition using middle-to-middle context hidden Markov models

Charles T. Hemphill; Lorin Netsch; Christopher M. Kribs

This is a speech recognition method for modeling adjacent word context, comprising: dividing a first word or period of silence into two portions; dividing a second word or period of silence, adjacent to the first word, into two portions; and combining the last portion of the first word or period of silence with the first portion of the second word or period of silence to make an acoustic model. The method includes constructing a grammar to restrict the acoustic models to the middle-to-middle context.
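
A minimal sketch of the idea, with hypothetical pronunciations and model naming rather than the patented implementation, pairs the last portion of each word or silence with the first portion of the next word to name a boundary-spanning model:

    # Assumed toy pronunciations; the real system's lexicon and naming differ.
    lexicon = {
        "<sil>": ["sil"],
        "call":  ["k", "ao", "l"],
        "home":  ["hh", "ow", "m"],
    }

    def halves(phones):
        # Divide a word (or period of silence) into two portions.
        mid = max(1, len(phones) // 2)
        return phones[:mid], phones[mid:]

    def middle_to_middle_units(word_sequence):
        # Combine the last portion of each word with the first portion of the
        # next word to form one acoustic model spanning the word boundary.
        units = []
        for left, right in zip(word_sequence, word_sequence[1:]):
            _, left_tail = halves(lexicon[left])
            right_head, _ = halves(lexicon[right])
            units.append((left, right, left_tail + right_head))
        return units

    for left, right, model in middle_to_middle_units(["<sil>", "call", "home", "<sil>"]):
        print(f"{left}|{right} -> {'-'.join(model)}")
    # A grammar would then require consecutive units to agree on the word they
    # share, restricting recognition to valid middle-to-middle contexts.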


international conference on acoustics, speech, and signal processing | 2010

Emerging ITU-T standard G.711.0 — lossless compression of G.711 pulse code modulation

Noboru Harada; Yutaka Kamamoto; Takehiro Moriya; Yusuke Hiwasaki; Michael A. Ramalho; Lorin Netsch; Jacek Stachurski; Lei Miao; Herve Marcel Taddei; Fengyan Qi

The ITU-T Recommendation G.711 is the benchmark standard for narrowband telephony. It has been successful for many decades because of its proven voice quality, ubiquity and utility. A new ITU-T recommendation, denoted G.711.0, has recently been established, defining a lossless compression for G.711 packet payloads typically found in IP networks. This paper presents a brief overview of technologies employed within the G.711.0 standard and summarizes the compression and complexity results. It is shown that G.711.0 provides greater than 50% average compression in typical service provider environments while keeping low computational complexity for the encoder/decoder pair (1.0 WMOPS average, <1.7 WMOPS worst case) and low memory footprint (about 5k octets RAM, 5.7k octets ROM, and 3.6k program memory measured in number of basic operators).


international conference on acoustics, speech, and signal processing | 1992

Speaker verification using temporal decorrelation post-processing

Lorin Netsch; George R. Doddington

A text-dependent method of speaker verification processing which utilizes the statistical correlation between measured features of speech across whole words is described. The correlation is used in a linear discriminant analysis to define uncorrelated word-level features as a metric. Initial results indicate that this method can significantly reduce the amount of storage necessary for speaker-specific speech information. Furthermore, this method shows promise of improved verification performance compared to methods based on hidden Markov model (HMM) state-level observation metrics. Since the linear discriminant analysis yields features which are decorrelated over entire words, this method should be more robust to signal distortions which are consistent over the entire utterance.
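
The word-level linear discriminant analysis can be illustrated with a generic Fisher-discriminant sketch over whole-word feature vectors; the data and recipe below are assumptions for illustration, not the paper's training procedure.

    import numpy as np

    def discriminant_directions(X, labels, n_components=1):
        # Fisher discriminant directions over whole-word feature vectors
        # (one row per utterance); a generic sketch, not the paper's recipe.
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        d = X.shape[1]
        mean_all = X.mean(axis=0)
        Sw = np.zeros((d, d))   # within-class scatter
        Sb = np.zeros((d, d))   # between-class scatter
        for c in np.unique(labels):
            Xc = X[labels == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            diff = (mc - mean_all)[:, None]
            Sb += len(Xc) * (diff @ diff.T)
        # Directions maximizing between-class over within-class scatter.
        eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw + 1e-6 * np.eye(d)) @ Sb)
        order = np.argsort(eigvals.real)[::-1]
        return eigvecs.real[:, order[:n_components]]

    # Toy usage: project true-speaker and impostor word-level vectors onto the
    # discriminant direction and compare distances in the projected space.
    rng = np.random.default_rng(1)
    true_spk = rng.normal(loc=0.0, size=(50, 10))
    impostor = rng.normal(loc=1.0, size=(50, 10))
    X = np.vstack([true_spk, impostor])
    y = np.array([0] * 50 + [1] * 50)
    W = discriminant_directions(X, y)
    print((X @ W).shape)   # (100, 1)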


international conference on acoustics, speech, and signal processing | 1997

Speaker-independent name dialing with out-of-vocabulary rejection

Coimbatore S. Ramalingam; Lorin Netsch; Yu-Hung Kao

We propose a system for speaker-independent name dialing in which a name enrolled by a user can be used by other members in a family or co-workers in an office. We use speaker-independent sub-word models during enrollment; the recognized sub-word string is later used during recognition. We also present a mechanism for rejecting out-of-vocabulary (OOV) phrases. The best in-vocabulary (IV) correct and OOV rejection performance for other speakers is 90%/60% (IV/OOV) on a database containing eighteen speakers. If the orthography is known, the best performance is 96%/65%.
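
A generic way to realize score-based OOV rejection is to threshold the margin between the best in-vocabulary hypothesis and a background model; this is only a sketch of the general idea, with made-up scores, not the paper's specific mechanism.

    def accept_name(best_iv_log_likelihood, background_log_likelihood, threshold=10.0):
        # Generic score-based OOV rejection: accept the best in-vocabulary
        # hypothesis only if it beats a background model by a margin.
        return (best_iv_log_likelihood - background_log_likelihood) > threshold

    # Hypothetical scores: an enrolled name scores well against its model,
    # an out-of-vocabulary phrase scores close to the background model.
    print(accept_name(-120.0, -155.0))   # True  -> dial the recognized name
    print(accept_name(-150.0, -155.0))   # False -> reject as out-of-vocabulary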


international conference on acoustics, speech, and signal processing | 1999

Speaker-dependent name dialing in a car environment with out-of-vocabulary rejection

Coimbatore S. Ramalingam; Yifan Gong; Lorin Netsch; Wallace Anderson; John J. Godfrey; Yu-Hung Kao

We describe a system for name dialing in the car and present results under three driving conditions using real-life data. The names are enrolled in the parked-car condition (engine off), and we describe two approaches for endpointing them, energy-based and recognition-based schemes, which result in word-based and phone-based models, respectively. We outline a simple algorithm to reject out-of-vocabulary names. PMC is used for noise compensation. When tested on an internally collected twenty-speaker database, for a list size of 50 and a hand-held microphone, the performance averaged over all driving conditions and speakers was 98%/92% (IV accuracy/OOV rejection); for the hands-free data, it was 98%/80%.
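
The energy-based endpointing scheme can be illustrated with a toy sketch that marks the first and last frames whose energy rises a fixed margin above an estimated noise floor; the frame length, margin, and sampling rate below are assumed values, not the paper's settings.

    import numpy as np

    def energy_endpoints(signal, frame_len=200, margin_db=10.0):
        # Toy energy-based endpointing: return the sample range between the
        # first and last frames whose energy exceeds the noise floor by margin_db.
        n_frames = len(signal) // frame_len
        frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
        energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        noise_floor = np.median(energy_db[:5])        # assume leading frames are noise
        active = np.flatnonzero(energy_db > noise_floor + margin_db)
        if active.size == 0:
            return None
        return active[0] * frame_len, (active[-1] + 1) * frame_len

    # Toy usage (8 kHz assumed): silence, a 0.25 s tone standing in for speech, silence.
    rng = np.random.default_rng(2)
    sig = 0.01 * rng.normal(size=8000)
    sig[3000:5000] += np.sin(2 * np.pi * 440 * np.arange(2000) / 8000.0)
    print(energy_endpoints(sig))   # roughly (3000, 5000)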


international conference on acoustics, speech, and signal processing | 2010

Fractional-bit and value-location lossless encoding in G.711.0 coder

Jacek Stachurski; Lorin Netsch

The paper describes two lossless coding tools employed in the new ITU-T G.711.0 Recommendation: fractional-bit and value-location encoding. Instead of encoding each sample individually as done in G.711, the fractional-bit coding tool identifies the total number of signal levels that exist within an input frame and then combines several samples for joint encoding with fractional bits per sample. The value-location tool encodes the positions of all values within an input frame that differ from a reference value. The method efficiently represents an input frame as a sum of value-location code vectors that are sequentially encoded using Rice, binary, or explicit location encoding. The presented results illustrate how the described coding techniques were adopted for use within the new ITU-T G.711.0 standard.
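
The fractional-bit idea, joint encoding of several small-alphabet samples in a single code word, can be sketched as base-n packing; this illustrates the principle only and is not the G.711.0 bitstream format.

    import math

    def pack_samples(samples, n_levels):
        # Jointly encode several samples drawn from n_levels values as one
        # integer (base-n_levels digits), i.e. log2(n_levels) bits per sample.
        code = 0
        for s in samples:
            assert 0 <= s < n_levels
            code = code * n_levels + s
        return code

    def unpack_samples(code, n_levels, count):
        samples = []
        for _ in range(count):
            samples.append(code % n_levels)
            code //= n_levels
        return samples[::-1]

    # Example: five samples from a 3-level alphabet fit in 8 bits (3**5 = 243),
    # about 1.6 bits per sample instead of 2 bits for per-sample encoding.
    samples = [2, 0, 1, 1, 2]
    code = pack_samples(samples, 3)
    bits = math.ceil(math.log2(3 ** 5))
    print(code, bits, unpack_samples(code, 3, 5))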


advanced video and signal based surveillance | 2013

Sound source localization for video surveillance camera

Jacek Stachurski; Lorin Netsch; Randy Cole

While video analytics used in surveillance applications performs well in normal conditions, it may not work as accurately under adverse circumstances. Taking advantage of the complementary aspects of video and audio can lead to a more effective analytics framework resulting in increased system robustness. For example, sound scene analysis may indicate potential security risks outside field-of-view, pointing the camera in that direction. This paper presents a robust low-complexity method for two-microphone estimation of sound direction. While the source localization problem has been studied extensively, a reliable low-complexity solution remains elusive. The proposed direction estimation is based on the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method. The novel aspects of our approach include band-selective processing and inter-frame filtering of the GCC-PHAT objective function prior to peak detection. The audio bandwidth, microphone spacing, angle resolution, processing delay and complexity can all be adjusted depending on the application requirements. The described algorithm can be used in a multi-microphone configuration for spatial sound localization by combining estimates from microphone pairs. It has been implemented as a real-time demo on a modified TI DM8127 IP camera. The default 16 kHz audio sampling frequency requires about 5 MIPS processing power in our fixed-point implementation. The test results show robust sound direction estimation under a variety of background noise conditions.
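
A basic GCC-PHAT delay estimate for a two-microphone pair can be sketched as below; the sampling rate, microphone spacing, and angle mapping are assumed values, and the paper's band-selective processing and inter-frame filtering of the objective function are omitted.

    import numpy as np

    def gcc_phat_delay(sig, ref, fs, max_tau=None):
        # Delay of sig relative to ref (positive if sig arrives later),
        # estimated with the basic GCC-PHAT method.
        n = 2 * max(len(sig), len(ref))
        S = np.fft.rfft(sig, n)
        R = np.fft.rfft(ref, n)
        cross = S * np.conj(R)
        cross /= np.abs(cross) + 1e-12                 # PHAT weighting
        cc = np.fft.irfft(cross, n)
        max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        return (np.argmax(np.abs(cc)) - max_shift) / fs

    # Toy usage: the source reaches microphone 2 four samples (0.25 ms) later
    # than microphone 1; with an assumed 10 cm spacing the delay maps to an angle.
    fs, c, d = 16000, 343.0, 0.10
    rng = np.random.default_rng(3)
    src = rng.normal(size=fs)                          # 1 s of wideband source
    mic1 = src
    mic2 = np.concatenate((np.zeros(4), src[:-4]))
    tau = gcc_phat_delay(mic2, mic1, fs, max_tau=d / c)
    print(tau, np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0))))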


Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications | 1994

Enhanced voice services in the telecommunication network using the Texas Instruments multiserve

Lorin Netsch; R. Rajasekaran; Barry Price

The paper presents efforts that Texas Instruments is pursuing to place enhanced voice services in the telecommunications network. The authors describe the capabilities of the Texas Instruments multiserve platform, which is a system designed to implement enhanced telecommunication services. The paper discusses an example of some of the technology challenges involved in the design of the system. The authors provide results of performance evaluation of the platform on important voice service tasks.
