Publication


Featured research published by Xianxian Zhang.


IEEE Transactions on Speech and Audio Processing | 2003

CSA-BF: a constrained switched adaptive beamformer for speech enhancement and recognition in real car environments

Xianxian Zhang; John H. L. Hansen

While a number of studies have investigated various speech enhancement and processing schemes for in-vehicle speech systems, little research has been performed using actual voice data collected in noisy car environments. In this paper, we propose a new constrained switched adaptive beamforming algorithm (CSA-BF) for speech enhancement and recognition in real moving-car environments. The proposed algorithm consists of a speech/noise constraint section, a speech adaptive beamformer, and a noise adaptive beamformer. We investigate CSA-BF performance in comparison to classic delay-and-sum beamforming (DASB) in realistic car conditions using a corpus of data recorded in various car noise environments from across the U.S. After analyzing the experimental results and considering the range of complex noise situations in the car environment using the CU-Move corpus, we formulate the three specific processing stages of the CSA-BF algorithm. The method is evaluated and shown to simultaneously decrease word error rate (WER) for speech recognition by up to 31% and improve speech quality, via the SEGSNR measure, by up to 5.5 dB on average.
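The delay-and-sum beamformer (DASB) used above as the comparison baseline can be sketched as follows. This is a minimal illustration, not the paper's implementation; the array geometry, sample rate, and steering angle are assumptions chosen for the example.

```python
import numpy as np

def delay_and_sum(channels, delays_samples):
    """Classic delay-and-sum beamformer (DASB): time-align each
    microphone channel toward the steering direction, then average.
    `channels` is (num_mics, num_samples); `delays_samples` holds the
    integer steering delay (in samples) for each mic."""
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for m in range(num_mics):
        # Advance each channel by its steering delay so the target
        # signal adds coherently while diffuse noise averages down.
        out += np.roll(channels[m], -int(delays_samples[m]))
    return out / num_mics

# Illustrative setup: a 4-mic linear array, far-field source.
fs = 16000.0             # sample rate (Hz), assumed
c = 343.0                # speed of sound (m/s)
spacing = 0.05           # mic spacing (m), assumed
angle = np.deg2rad(30)   # assumed arrival angle from broadside
delays = np.round(np.arange(4) * spacing * np.sin(angle) / c * fs).astype(int)
signals = np.random.randn(4, 1600)  # stand-in for recorded channels
enhanced = delay_and_sum(signals, delays)
```

With identical, already-aligned channels, the output reduces to the input signal itself, which is a quick sanity check on the averaging step.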


Archive | 2005

CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation

John H. L. Hansen; Xianxian Zhang; Murat Akbacak; Umit H. Yapanel; Bryan L. Pellom; Wayne H. Ward; Pongtep Angkititrakul

In this chapter, we present our recent advances in the formulation and development of an in-vehicle hands-free route navigation system. The system comprises a multi-microphone array processing front-end, an environmental sniffer (for noise analysis), a robust speech recognition system, and a dialog manager and information servers. We also present our recently completed speech corpus for in-vehicle interactive speech systems for route planning and navigation. The corpus consists of five domains: digit strings, route navigation expressions, street and location sentences, phonetically balanced sentences, and a route navigation dialog in a Wizard-of-Oz scenario. Data from a total of 500 speakers was collected across the United States of America during a six-month period from April to September 2001. While previous attempts at in-vehicle speech systems have generally focused on isolated command words to set radio frequencies, temperature control, etc., the CU-Move system is focused on natural conversational interaction between the user and the in-vehicle system. After presenting our proposed in-vehicle speech system, we consider advances in multi-channel array processing, environmental noise sniffing and tracking, new and more robust acoustic front-end representations and built-in speaker normalization for robust ASR, and our back-end dialog navigation information retrieval sub-system connected to the WWW. Results are presented in each sub-section, with a discussion at the end of the chapter.


international conference on acoustics, speech, and signal processing | 2003

CSA-BF: novel constrained switched adaptive beamforming for speech enhancement & recognition in real car environments

Xianxian Zhang; John H. L. Hansen

While a number of studies have investigated various speech enhancement and processing schemes for in-vehicle speech systems, little research has been performed using actual voice data collected in noisy car environments. We propose a new constrained switched adaptive beamforming algorithm (CSA-BF) for speech enhancement and recognition in real moving-car environments. The proposed algorithm consists of a speech/noise constraint section, a speech adaptive beamformer, and a noise adaptive beamformer. We investigate CSA-BF performance in comparison to classic delay-and-sum beamforming (DASB) in realistic car environments using a large quantity of data recorded in various car noise environments from across the United States. After analyzing the experimental results and considering the range of complex noise situations in the car environment using the CU-Move corpus, we formulate the CSA-BF algorithm. The method is shown to simultaneously decrease word error rate (WER) for speech recognition by up to 31% and improve speech quality, via the segmental signal-to-noise ratio (SEGSNR), by up to 5.5 dB on average.


Speech Communication | 2010

Analysis of CFA-BF: Novel combined fixed/adaptive beamforming for robust speech recognition in real car environments

John H. L. Hansen; Xianxian Zhang

Among the many speech enhancement and processing schemes investigated for in-vehicle speech systems, delay-and-sum beamforming (DASB) and adaptive beamforming are two typical methods, each with its own advantages and disadvantages. In this paper, we propose a novel combined fixed/adaptive beamforming solution (CFA-BF), based on previous work, for speech enhancement and recognition in real moving-car environments, which seeks to take advantage of both methods. The CFA-BF scheme consists of two steps: source location calibration and target signal enhancement. The first step pre-records the transfer functions between the speaker and the microphone array from different potential source positions using adaptive beamforming in quiet environments; the second step uses this pre-recorded information to enhance the desired speech while the car is driven on the road. An evaluation using extensive actual car speech data from the CU-Move Corpus shows that the method can decrease WER for speech recognition by up to 30% over a single-channel scenario and improve speech quality, via the SEGSNR measure, by up to 1 dB on average.
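The two-step scheme described above (offline calibration per source position, then run-time enhancement with the stored filters) can be sketched as follows. This is only an illustration of the calibrate-then-apply structure: the per-channel least-squares weight fit used here is an assumption for the example, not the paper's adaptive-beamforming calibration, and the position labels are hypothetical.

```python
import numpy as np

class CFABFSketch:
    """Illustrative two-step scheme in the spirit of CFA-BF: filters
    for each candidate source position are calibrated offline in a
    quiet environment, then the stored filter for the active position
    is applied to the noisy multi-channel recording at run time."""

    def __init__(self):
        self.filters = {}  # position label -> per-mic weight vector

    def calibrate(self, position, clean_channels, reference):
        # Fit one scalar weight per mic so the weighted sum of the
        # quiet-environment recordings approximates the reference
        # speech signal (simple least-squares stand-in for the
        # adaptive-beamforming calibration).
        A = clean_channels.T                       # (samples, mics)
        w, *_ = np.linalg.lstsq(A, reference, rcond=None)
        self.filters[position] = w

    def enhance(self, position, noisy_channels):
        # Apply the pre-recorded weights for this source position.
        return self.filters[position] @ noisy_channels

# Hypothetical usage: calibrate for the driver seat, then enhance.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)                 # stand-in reference speech
z = rng.standard_normal(500)                 # uncorrelated interference
chans = np.vstack([x + z, 2.0 * x])          # two synthetic mic channels
bf = CFABFSketch()
bf.calibrate("driver", chans, x)
recovered = bf.enhance("driver", chans)
```

The key design point carried over from the abstract is that calibration is done once, in quiet, per seating position, so the run-time step is just a cheap fixed filter lookup and apply.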


international conference on acoustics, speech, and signal processing | 2004

Speech enhancement based on a combined multi-channel array with constrained iterative and auditory masked processing

Xianxian Zhang; John H. L. Hansen; K.A. Rehar

While a number of studies have investigated various speech enhancement and noise suppression schemes, most consider either a single-channel or an array processing framework. Clearly there are potential advantages in leveraging the strengths of array processing solutions, which suppress noise arriving from directions other than the speaker's, together with those of single-channel methods that include speech spectral constraints or psychoacoustically motivated processing. In this paper, we propose to integrate a combined fixed/adaptive beamforming algorithm (CFA-BF) for speech enhancement with two single-channel methods: speech spectral constrained iterative processing (Auto-LSP) and an auditory masked threshold based method using equivalent rectangular bandwidth filtering (GMMSE-AMTERB). After formulating the method, we evaluate performance on a subset of the TIMIT corpus with four real noise sources. We demonstrate a consistent level of noise suppression and voice communication quality improvement using the proposed method, reflected by an overall average 26 dB increase in SegSNR over the original degraded audio corpus.
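The segmental SNR (SegSNR) figure reported throughout these abstracts averages per-frame SNRs between a clean reference and the processed signal. A minimal sketch follows; the frame length and the conventional clamping of frame SNRs to roughly [-10, 35] dB are common-practice assumptions, not parameters taken from these papers.

```python
import numpy as np

def segsnr(clean, processed, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Segmental SNR: mean of per-frame SNRs (in dB) between the clean
    reference and the processed signal, with each frame's SNR clamped
    to [floor_db, ceil_db] so silent or pathological frames do not
    dominate the average."""
    n = min(len(clean), len(processed)) // frame_len * frame_len
    snrs = []
    for start in range(0, n, frame_len):
        c = clean[start:start + frame_len]
        e = c - processed[start:start + frame_len]   # residual error
        num = np.sum(c ** 2)
        den = np.sum(e ** 2) + 1e-12                 # avoid divide-by-zero
        snr = 10.0 * np.log10(num / den + 1e-12)
        snrs.append(np.clip(snr, floor_db, ceil_db))
    return float(np.mean(snrs))
```

A perfectly reconstructed signal hits the clamp ceiling in every frame, while outputting silence scores near 0 dB, which brackets the measure's useful range.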


2nd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2005 | 2007

Speaker Source Localization Using Audio-Visual Data and Array Processing Based Speech Enhancement for In-Vehicle Environments

Xianxian Zhang; John H. L. Hansen; Kazuya Takeda; Toshiki Maeno; Kathryn H. Arehart

Interactive systems for in-vehicle applications have as their central goal the need to improve driver safety while allowing drivers effective control of vehicle functions, access to on-board or remote information, and safe hands-free human communications. Human-computer interaction for in-vehicle systems requires effective audio capture, tracking of who is speaking, environmental noise suppression, and robust processing for applications such as route navigation, hands-free mobile communications, and human-to-human communications for hearing-impaired subjects. In this chapter, we discuss driver safety in the context of two interactive speech processing frameworks for in-vehicle systems. First, we consider integrating audio-visual processing for detecting the primary speech of a driver using a route navigation system. Integrating both visual and audio content allows us to reject unintended speech before it is submitted for speech recognition within the route dialog system. Second, we consider a combined multi-channel array processing scheme based on a combined fixed and adaptive array processing scheme (CFA-BF) with spectral constrained iterative Auto-LSP and auditory masked GMMSE-AMT-ERB processing for speech enhancement. The combined scheme takes advantage of the strengths offered by array processing methods in noisy environments, as well as the speed and efficiency of single-channel methods. We evaluate the audio-visual localization scheme for route navigation dialogs and show speech accuracy improved by up to 40% using the CIAIR in-vehicle data corpus from Nagoya, Japan. For the combined array processing and speech enhancement methods, we demonstrate consistent levels of noise suppression and voice communication quality improvement using a subset of the TIMIT corpus with four real noise sources, with an overall average 26 dB increase in SegSNR over the original degraded audio corpus.


conference of the international speech communication association | 2002

High performance digit recognition in real car environments.

Umit H. Yapanel; Xianxian Zhang; John H. L. Hansen


conference of the international speech communication association | 2003

CFA-BF: a novel combined fixed/adaptive beamforming for robust speech recognition in real car environments.

Xianxian Zhang; John H. L. Hansen


conference of the international speech communication association | 2004

Audio-visual speaker localization for car navigation systems

Xianxian Zhang; Kazuya Takeda; John H. L. Hansen; Toshiki Maeno


conference of the international speech communication association | 2005

In-set/out-of-set speaker identification based on discriminative speech frame selection

Xianxian Zhang; John H. L. Hansen

Collaboration


Dive into Xianxian Zhang's collaboration.

Top Co-Authors

John H. L. Hansen | University of Texas at Dallas
Kathryn H. Arehart | University of Colorado Boulder
Jessica Rossi-Katz | University of Colorado Boulder
Umit H. Yapanel | University of Colorado Boulder
Bryan L. Pellom | University of Colorado Boulder
K.A. Rehar | University of Colorado Boulder
Murat Akbacak | University of Texas at Dallas