Publication


Featured research published by Shengkui Zhao.


International Conference on Acoustics, Speech, and Signal Processing | 2015

A learning-based approach to direction of arrival estimation in noisy and reverberant environments

Xiong Xiao; Shengkui Zhao; Xionghu Zhong; Douglas L. Jones; Eng Siong Chng; Haizhou Li

This paper presents a learning-based approach to the task of direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods such as the classic least square (LS) method rely on strong assumptions about signal models and on accurate estimates of the time delay of arrival (TDOA). They work well only in relatively clean conditions and suffer from noise and reverberation distortions. In this paper, we propose a learning-based approach that learns from a large amount of simulated noisy and reverberant microphone array input for robust DOA estimation. Specifically, we extract features from the generalized cross correlation (GCC) vectors and use a multilayer perceptron neural network to learn the nonlinear mapping from these features to the DOA. One advantage of the learning-based method is that as more training data becomes available, the DOA estimation becomes more accurate. Experimental results on simulated data show that the proposed learning-based method produces much better results than the state-of-the-art LS method. Testing on real data recorded in meeting rooms shows improved root-mean-square error (RMSE) compared to the LS method.
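As a rough illustration of the GCC features underpinning this approach, the sketch below computes a GCC-PHAT vector for a two-channel toy signal; the peak lag recovers the inter-channel delay. The signal, delay, and FFT size here are illustrative assumptions, not the paper's setup, and the paper feeds the whole GCC vector to a multilayer perceptron rather than simply peak-picking.

```python
import numpy as np

def gcc_phat(x, y, n_fft=256):
    """GCC-PHAT: whiten the cross-spectrum so only phase (i.e., delay) information remains."""
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(y, n_fft)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: unit magnitude, phase only
    cc = np.fft.irfft(cross, n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))  # center zero lag

rng = np.random.default_rng(0)
s = rng.standard_normal(256)
x = s                                        # reference microphone
y = np.roll(s, 5)                            # second microphone hears the source 5 samples later
cc = gcc_phat(x, y)
est_delay = int(np.argmax(cc)) - 128         # lag of the GCC peak, relative to zero lag
print(est_delay)                             # 5
```

In a learning-based pipeline, the full `cc` vector (not just its peak) would form the input features, letting the network absorb noise- and reverberation-induced distortions of the correlation shape.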


Signal Processing | 2014

Fast communication: Underdetermined direction of arrival estimation using acoustic vector sensor

Shengkui Zhao; Tigran Saluev; Douglas L. Jones

This paper presents a new approach to the estimation of the two-dimensional (2D) direction of arrival (DOA) of more sources than sensors using an acoustic vector sensor (AVS). The approach is based on the Khatri-Rao (KR) product and exploits the subspace characteristics of the time-variant covariance matrices of uncorrelated quasi-stationary source signals. An AVS measures both the acoustic pressure and the pressure gradients of the sound field, so the DOAs are determined in both the horizontal and vertical planes. The identifiability of the presented KR-AVS approach is studied through both theoretical analysis and computer simulations. Simulations demonstrate that the 2D DOAs of six speech sources are successfully estimated, and that the KR-AVS approach achieves lower root-mean-square error (RMSE) than alternative geometries: the non-uniform linear array, the 2D L-shaped array, and the 2D triangular array.
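The core identity behind KR-product methods can be checked numerically. In the sketch below (toy dimensions and a random steering matrix standing in for the actual AVS manifold), vectorizing a noise-free frame covariance R = A diag(p) A^H gives vec(R) = (conj(A) ⊙ A) p, where ⊙ is the column-wise Kronecker (Khatri-Rao) product; stacking such vectors across quasi-stationary frames is what creates the enlarged virtual array that can resolve more sources than sensors.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of two matrices with equal column counts."""
    return np.stack([np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])], axis=1)

rng = np.random.default_rng(1)
M, K = 4, 2                                  # 4 sensor outputs, 2 uncorrelated sources (toy sizes)
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))  # random stand-in steering matrix
p = np.array([2.0, 0.5])                     # source powers in one quasi-stationary frame
R = A @ np.diag(p) @ A.conj().T              # noise-free frame covariance

lhs = R.reshape(-1, order="F")               # vec(R): stack columns
rhs = khatri_rao(A.conj(), A) @ p            # (conj(A) ⊙ A) p, the virtual-array model
print(np.allclose(lhs, rhs))                 # True
```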


Conference on Industrial Electronics and Applications | 2012

A real-time 3D sound localization system with miniature microphone array for virtual reality

Shengkui Zhao; Saima Ahmed; Yun Liang; Kyle Rupnow; Deming Chen; Douglas L. Jones

This paper presents a real-time three-dimensional (3D) wideband sound localization system designed with a miniature XYZO microphone array. Unlike conventional microphone arrays for sound localization, which use only omnidirectional microphones, the presented array combines bidirectional (pressure-gradient) and omnidirectional microphones. The array therefore has a significantly reduced size and is known as the world's smallest microphone array design for 3D sound source localization in air. In this paper, we describe the 3D array configuration and perform array calibration. For 3D sound localization, we study the array output model of the XYZO array, the widely known direction-of-arrival (DOA) estimation methods, and the direction search in 3D space. To achieve real-time processing at 1° search resolution, we accelerate the parallel computations on a GPU platform with CUDA programming, achieving a 130X speedup over a multi-threaded CPU implementation. The performance of the proposed system is studied under various reverberation lengths and signal-to-noise ratios. We also present a real-time 3D sound localization demo showing the system's suitability for virtual reality applications.
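The direction search that the GPU accelerates can be sketched as a brute-force steered-response power scan over a grid of candidate directions. The example below uses a toy two-microphone array and a 1° azimuth grid (stand-ins for the XYZO array and the full 3D grid) and recovers a simulated 60° source direction; every value here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
fs, c = 16000.0, 343.0                          # sample rate and speed of sound
mics = np.array([[0.0, 0.0], [0.05, 0.0]])      # toy 2-mic array, 5 cm apart
true_az = 60.0
src = np.array([np.cos(np.radians(true_az)), np.sin(np.radians(true_az))])
delays = mics @ src / c                         # far-field arrival delays per mic (seconds)

n = 1024
freqs = np.fft.rfftfreq(n, 1 / fs)
S = np.fft.rfft(rng.standard_normal(n))         # wideband source spectrum
X = S[None, :] * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])  # per-mic spectra

# brute-force 1-degree steered-response power scan: the computation the GPU parallelizes
best_az, best_p = -1, -1.0
for az in range(181):
    dvec = np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
    tau = mics @ dvec / c
    steered = (X * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])).sum(axis=0)
    power = float(np.sum(np.abs(steered) ** 2))
    if power > best_p:
        best_az, best_p = az, power
print(best_az)                                   # recovers the 60-degree source azimuth
```

Each candidate direction is independent of the others, which is why this scan maps naturally onto massively parallel GPU threads.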


Design, Automation, and Test in Europe | 2012

Real-time implementation and performance optimization of 3D sound localization on GPUs

Yun Liang; Zheng Cui; Shengkui Zhao; Kyle Rupnow; Yihao Zhang; Douglas L. Jones; Deming Chen

Real-time 3D sound localization is an important technology for applications such as camera steering systems, robot audition, and gunshot direction finding. Adding the third dimension significantly increases the computational requirements: real-time 3D sound localization continuously processes large volumes of data for each candidate 3D direction and acoustic frequency range, and these demands outpace current CPU capabilities. This paper develops a real-time implementation of 3D sound localization on graphics processing units (GPUs). Massively parallel GPU architectures are shown to be well suited to 3D sound localization. We optimize various aspects of the GPU implementation, such as the number of threads per thread block, register allocation per thread, and memory data layout. Experiments indicate that our GPU implementation achieves speedups of 501X and 130X over single-threaded and multi-threaded CPU implementations respectively, enabling real-time operation of 3D sound localization.


International Conference on Acoustics, Speech, and Signal Processing | 2014

Robust DOA estimation of multiple speech sources

Nguyen Thi Ngoc Tho; Shengkui Zhao; Douglas L. Jones

It is challenging to determine the directions of arrival (DOAs) of speech signals when there are fewer sensors than sources, particularly in noisy and reverberant environments. The coherence test by Mohan et al. exploits the time-frequency sparseness of non-stationary speech signals to select the more relevant time-frequency bins for DOA estimation. Requiring no prior knowledge about the incoming sources, this work proposes a combination of noise-floor tracking, onset detection, and a coherence test to robustly identify time-frequency bins in which only one source is dominant. The largest eigenvectors of the covariance matrices corresponding to these bins are then clustered, and the DOAs of the sources are estimated from the cluster centroids. Simulation and experimental results show that this method localizes 8 sources with small errors using only 3 omnidirectional microphones, and that it is robust to background noise and reverberation.
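A minimal sketch of the eigenvector step, under assumed toy values: when a set of time-frequency bins is dominated by a single source, the largest eigenvector of the sample covariance aligns with that source's steering vector. The paper then clusters such eigenvectors across bins and reads DOAs off the centroids; that clustering step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 3, 200                                   # 3 microphones, 200 snapshots from one bin set
a = np.exp(1j * np.array([0.0, 1.2, 2.4]))      # toy steering vector of the dominant source
s = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = a[:, None] * s[None, :] + noise             # single-source-dominant observations
R = X @ X.conj().T / N                          # sample covariance of these bins
vals, vecs = np.linalg.eigh(R)
v = vecs[:, -1]                                 # largest eigenvector
sim = abs(np.vdot(a, v)) / (np.linalg.norm(a) * np.linalg.norm(v))
print(round(sim, 3))                            # close to 1: aligned with the steering vector
```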


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction

Shengkui Zhao; Xiong Xiao; Zhaofeng Zhang; Thi Ngoc Tho Nguyen; Xionghu Zhong; Bo Ren; Longbiao Wang; Douglas L. Jones; Eng Siong Chng; Haizhou Li

This paper presents a robust speech recognition system using a microphone array for the 3rd CHiME Challenge. A minimum variance distortionless response (MVDR) beamformer with adaptive microphone gains is proposed for robust beamforming. Two microphone gain estimation methods are studied, both using the speech-dominant time-frequency bins. A multichannel noise reduction (MCNR) postprocessing stage is also proposed to further reduce the interference in the MVDR-processed signal. Experimental results on the CHiME-3 challenge show that both the proposed MVDR beamformer with microphone gains and the MCNR postprocessing improve speech recognition performance significantly. With a state-of-the-art deep neural network (DNN) based acoustic model, our system achieves a word error rate (WER) of 11.67% on the real test data of the evaluation set.
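A minimal numerical sketch of the MVDR weight computation the system builds on (with a random toy steering vector and simulated noise covariance, not the challenge data): w = Rn^{-1} d / (d^H Rn^{-1} d), which by construction passes the target direction undistorted while minimizing noise power.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4                                              # number of microphones (assumed)
d = np.exp(1j * 2 * np.pi * rng.random(M))         # toy steering vector for the target
N = rng.standard_normal((M, 200)) + 1j * rng.standard_normal((M, 200))
Rn = N @ N.conj().T / 200 + 1e-3 * np.eye(M)       # simulated noise covariance, diagonally loaded
w = np.linalg.solve(Rn, d)
w /= d.conj() @ w                                  # w = Rn^{-1} d / (d^H Rn^{-1} d)
print(abs(w.conj() @ d))                           # 1.0: target passes undistorted
```

The paper's adaptive microphone gains would enter by scaling the channels of d and the observations before this computation.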


Circuits, Systems, and Signal Processing | 2014

New Variable Step-Sizes Minimizing Mean-Square Deviation for the LMS-Type Algorithms

Shengkui Zhao; Douglas L. Jones; Suiyang Khoo; Zhihong Man

The least-mean-square-type (LMS-type) algorithms are known as simple and effective adaptation algorithms. However, they exhibit a trade-off between convergence rate and steady-state performance. In this paper, we investigate a new variable step-size approach that achieves both a fast convergence rate and low steady-state misadjustment. By approximating the optimal step size that minimizes the mean-square deviation, we derive variable step sizes for both the time-domain normalized LMS (NLMS) algorithm and the transform-domain LMS (TDLMS) algorithm. The proposed variable step sizes are simple quotient forms of filtered versions of the quadratic error and are very effective for the NLMS and TDLMS algorithms. Computer simulations in an adaptive system-modeling framework demonstrate superior performance compared to existing popular variable step-size versions of the NLMS and TDLMS algorithms.
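For context, the sketch below runs a plain NLMS system-identification loop with a fixed step size; the paper's contribution is to replace the fixed mu with a time-varying step size approximating the minimizer of the mean-square deviation (that exact formula is not reproduced here). The 3-tap system and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
h = np.array([0.8, -0.4, 0.2])                 # unknown 3-tap system (assumed for illustration)
w = np.zeros(3)                                # adaptive filter weights
x_buf = np.zeros(3)                            # tap-delay line of recent inputs
mu, eps = 1.0, 1e-8                            # fixed step size; the paper makes mu time-varying
for _ in range(2000):
    x_buf = np.roll(x_buf, 1)
    x_buf[0] = rng.standard_normal()           # white excitation
    desired = h @ x_buf + 1e-3 * rng.standard_normal()   # system output plus observation noise
    e = desired - w @ x_buf                    # a-priori error
    w += mu * e * x_buf / (x_buf @ x_buf + eps)          # NLMS update
print(np.round(w, 2))                          # converges to h
```

A large fixed mu converges fast but leaves high steady-state misadjustment, while a small mu does the opposite; a variable step size aims to get both regimes right.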


International Conference on Acoustics, Speech, and Signal Processing | 2015

Large region acoustic source mapping using movable arrays

Shengkui Zhao; Thi Ngoc Tho Nguyen; Douglas L. Jones

Mapping environmental noise with high resolution over a large region (such as a city) is prohibitively expensive with current approaches, which use either a large, dense array spanning the entire region of interest or sequential noise measurements at thousands of locations on a dense grid. We instead propose a new acoustic measurement scheme using a small movable array (for example, mounted on a vehicle driving along the streets of a city) to rapidly acquire measurements at many different locations. A multiple-point sparse-constrained deconvolution approach for the mapping of acoustic sources (MPSC-DAMAS) and a multiple-point covariance matrix fitting (MP-CMF) approach are developed to accurately estimate the locations and powers of stationary noise sources across the region of interest. Computer simulations of large-region acoustic mapping demonstrate that the proposed approaches achieve superior resolution and much lower power estimation errors than the state-of-the-art SC-DAMAS and CMF approaches.


IEEE Journal of Selected Topics in Signal Processing | 2015

ITEM: Immersive Telepresence for Entertainment and Meetings—A Practical Approach

Viet Anh Nguyen; Jiangbo Lu; Shengkui Zhao; Dung T. Vu; Hongsheng Yang; Douglas L. Jones; Minh N. Do

This paper presents an immersive telepresence system for entertainment and meetings (ITEM). The system aims to provide a radically new video communication experience by seamlessly merging participants into the same virtual space, allowing natural interaction among them and with shared collaborative content. With the goal of making a system that is scalable and flexible for various business solutions as well as easily accessible to mass consumers, our design and realization address challenges across the whole pipeline of media processing, communication, and display. In particular, we focus on the system aspects that maximize the end-user experience, optimize system and network resources, and enable various teleimmersive (TI) application scenarios. We also present a few key technologies: fast object-based video coding for real-world data, and spatialized audio capture with 3-D sound localization for group teleconferencing. Our effort is to investigate and optimize the key system components and to provide efficient end-to-end optimization and integration driven by user needs and preferences. Extensive experiments show that the developed system runs reliably and comfortably in real time with a minimal setup (e.g., a webcam or a color-plus-depth camera, an optional microphone array, and a laptop or desktop connected to the Internet) for TI communication. With this minimal deployment requirement, we present a variety of interesting applications and user experiences created with ITEM.


International Conference on Acoustics, Speech, and Signal Processing | 2017

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

Xiong Xiao; Shengkui Zhao; Douglas L. Jones; Eng Siong Chng; Haizhou Li

Acoustic beamforming plays a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCMs) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve the performance of MVDR beamforming in ASR tasks. In this paper, we focus on TF mask estimation using recurrent neural networks (RNNs). Specifically, our methods include training the RNN to estimate the speech and noise masks independently, training the RNN to minimize the ASR cost function directly, and performing multiple passes to iteratively improve the mask estimation. The proposed methods are evaluated both individually and in combination on the CHiME-4 challenge. The results show that the proposed methods improve ASR performance individually and also work complementarily. The overall system achieves a word error rate of 8.9% with the 6-microphone configuration, much better than the 12.0% achieved by the state-of-the-art MVDR implementation.
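A toy sketch of the mask-to-SCM pipeline this work builds on: an oracle binary mask (standing in for the RNN's predicted mask) weights per-frame outer products to form the noise SCM, from which MVDR weights follow. The array size, steering vector, and noise level are illustrative assumptions, not the CHiME-4 setup.

```python
import numpy as np

rng = np.random.default_rng(5)
M, T = 2, 500
d = np.array([1.0, np.exp(-1j * 0.7)])             # toy per-frequency steering vector (assumed)
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
n = 0.3 * (rng.standard_normal((T, M)) + 1j * rng.standard_normal((T, M)))
speech_active = rng.random(T) < 0.5                # frames where speech is present
Y = np.where(speech_active[:, None], s[:, None] * d, 0) + n   # observed STFT frames

mask = speech_active.astype(float)                 # oracle TF mask; the RNN would predict this
outer = Y[:, :, None] * Y.conj()[:, None, :]       # per-frame outer products y y^H
Rn = ((1 - mask)[:, None, None] * outer).sum(0) / (1 - mask).sum()   # mask-weighted noise SCM
w = np.linalg.solve(Rn, d)
w /= d.conj() @ w                                  # MVDR weights from the mask-derived SCM
print(abs(w.conj() @ d))                           # 1.0: distortionless toward the target
```

Better masks concentrate the SCM averages on genuinely noise-only (or speech-dominant) frames, which is why improved mask estimation translates directly into better beamforming and lower WER.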

Collaboration


Dive into Shengkui Zhao's collaborations.

Top Co-Authors

Eng Siong Chng, Nanyang Technological University
Haizhou Li, National University of Singapore
Xiong Xiao, Nanyang Technological University
Zhihong Man, Swinburne University of Technology
Xionghu Zhong, Nanyang Technological University
Viet Anh Nguyen, Nanyang Technological University