Y. X. Zou
Peking University
Publication
Featured research published by Y. X. Zou.
international conference on acoustics, speech, and signal processing | 2015
Weiqiao Zheng; Y. X. Zou; Christian Ritz
Accurate direction-of-arrival (DOA) estimation based on clustering the inter-sensor data ratios (ISDRs) of a single acoustic vector sensor (AVS), referred to as AVS-ISDR, relies on reliable extraction of time-frequency points with high local signal-to-noise ratio (HLSNR-TFPs), and its performance degrades in noisy environments. This paper investigates deep neural networks (DNNs) trained with noisy-clean speech pairs under different SNR levels and noise types to improve the performance of AVS-ISDR in noisy conditions. The DNNs are trained to learn characteristics reflecting the level of speech information at different TFPs, which helps to generate a reliable spectral mask for obtaining a noise-reduced spectrum. Correspondingly, a robust DOA estimation algorithm named AVS-DNN-ISDR has been developed. Experimental results verify that the proposed DNN-based spectral mask improves the extraction of reliable HLSNR-TFPs at different SNR levels. Results from simulations and real AVS recordings further validate that AVS-DNN-ISDR achieves high DOA estimation accuracy even when the SNR is lower than 0 dB.
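The role of the spectral mask can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: a 2-D AVS model in which, at a time-frequency point dominated by one source, the ratio of the u-channel to the pressure channel is approximately cos(theta) and the ratio of the v-channel to the pressure channel is approximately sin(theta); a mask value in [0, 1] per point; and a plain average in place of the clustering step used by AVS-ISDR. The function name `estimate_azimuth_deg` and the threshold value are hypothetical.

```python
import math

def estimate_azimuth_deg(u_over_o, v_over_o, mask, mask_threshold=0.5):
    """Estimate a single-source azimuth from masked inter-sensor ratios.

    u_over_o, v_over_o: per-point ratios (assumed ~cos(theta), ~sin(theta)).
    mask: per-point reliability in [0, 1] (stands in for the DNN mask).
    Returns the azimuth in degrees, or None if no point passes the mask.
    """
    # Keep only the time-frequency points the mask marks as reliable.
    kept = [(u, v) for u, v, m in zip(u_over_o, v_over_o, mask)
            if m >= mask_threshold]
    if not kept:
        return None
    # Average the retained ratios (a simplification of clustering).
    mean_u = sum(u for u, _ in kept) / len(kept)
    mean_v = sum(v for _, v in kept) / len(kept)
    return math.degrees(math.atan2(mean_v, mean_u))
```

With several sources, the averaging step would have to be replaced by clustering, as the abstract describes; the sketch only shows how a mask removes unreliable points before the ratios are used.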
international conference on digital signal processing | 2015
Jiaer Chen; Yong Qing Wang; Y. X. Zou
This paper proposes an approach that adaptively eliminates redundant images for Wireless Capsule Endoscopy (WCE) video summarization by considering the temporal correlation and feature similarity between adjacent WCE frames. Color and texture features, generated by an HSV color histogram model and the Gray Level Co-occurrence Matrix, are taken into account. Frames from different WCE videos may have different dynamic information ranges. Hence, a data-driven threshold termed the W-parametric mean value threshold (W-MVT) is developed to improve the robustness of the proposed method. By sequentially comparing the color-texture feature similarity of adjacent WCE frames with the W-MVT, temporally correlated images with sufficient similarity are grouped into the same clip. Finally, to account for the gradually varying content within one clip, an adaptive K-means clustering algorithm is adopted to keep key frames while further removing redundant frames. Experimental results show that the two evaluation indicators, F-measure and compression ratio, reach 81.94% and 80.31%, respectively, which validates the effectiveness of the proposed WCE redundant image elimination (WCE-RIE) method.
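The sequential grouping step can be sketched as follows. This is an illustration under assumptions, not the paper's definition of the W-MVT: the threshold is taken here to be a weight `w` times the mean adjacent-frame similarity of the video, similarity is measured by histogram intersection, and the names `histogram_intersection` and `group_into_clips` are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two histograms that each sum to 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def group_into_clips(features, w=1.0):
    """Group frame indices into clips of mutually similar adjacent frames.

    features: one normalized feature histogram per frame, in temporal order.
    w: weight on the mean similarity (assumed form of a data-driven threshold).
    """
    if not features:
        return []
    # Similarity between each frame and its successor.
    sims = [histogram_intersection(features[i], features[i + 1])
            for i in range(len(features) - 1)]
    if not sims:
        return [[0]]
    # Threshold derived from this video's own similarity statistics.
    threshold = w * sum(sims) / len(sims)
    clips = [[0]]
    for i, sim in enumerate(sims):
        if sim >= threshold:
            clips[-1].append(i + 1)   # similar enough: extend current clip
        else:
            clips.append([i + 1])     # dissimilar: start a new clip
    return clips

frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(group_into_clips(frames))  # [[0, 1], [2, 3]]
```

Because the threshold is computed from each video's own similarities, a video with uniformly low frame-to-frame similarity is not split at every frame, which is the motivation the abstract gives for a data-driven threshold.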
international conference on signal and information processing | 2014
Tao Ma; Y. X. Zou; Zhiqiang Xiang; Lei Li; Yi Li
Wireless capsule endoscopy (WCE) is a promising technology for gastrointestinal disease detection. Since one WCE video of a patient contains more than 50,000 frames, it is necessary to classify the whole frame set of the digestive tract into subsets corresponding to the esophagus, stomach, small intestine, and colon, which helps physicians review and diagnose rapidly and accurately. Digestive organ classification in WCE is a challenging task because WCE images are difficult to represent with features. This paper presents a new method of WCE organ classification that combines a proposed locality-constraint-based vector sparse coding (LCVSC) algorithm with a support vector machine classifier. Experimental results validate the effectiveness of the proposed method, which achieves good classification performance.
international conference on digital signal processing | 2016
Dong Wang; X. S. Guo; Y. X. Zou
Device-free localization (DFL) aims to locate targets that carry no emitting devices by monitoring the received signals of preset wireless devices. Research has shown that the localization accuracy of conventional DFL algorithms decreases in the presence of noise and outliers. To tackle this problem, this paper first proposes studying DFL via sparse representation, with target localization formulated as a sparse representation classification (SRC) problem. Specifically, an overcomplete sample dictionary is constructed from received signal strength measurements, and the target is located by the SRC method. To suppress the adverse impact of noise and outliers, the DFL-SRC problem is formulated in the signal subspace. Two DFL algorithms, termed SDSRC and SSDSRC, are derived. Experimental results with real recorded data and simulated interference demonstrate that SDSRC and SSDSRC outperform the nonlinear optimization approach with outlier link rejection in terms of localization accuracy and robustness to noise and outliers.
international conference on acoustics, speech, and signal processing | 2016
Y. H. Jin; Y. X. Zou
For mobile speech applications, speaker DOA estimation accuracy, robustness to interference, and compact physical size are three key factors. Considering size, we previously used an acoustic vector sensor (AVS) and proposed a DOA estimation algorithm [1] that offers high accuracy when the SNR exceeds 15 dB but deteriorates under nonspeech interference (NSI). This paper develops a robust speaker DOA estimation algorithm. It is achieved by deriving the inter-sensor data ratio model of an AVS in the bispectrum domain (BISDR) and exploiting the favorable properties of the bispectrum, such as its zero value for Gaussian processes and the different distributions of speech and NSI. Specifically, a reliable bispectrum mask is generated to guarantee that the speaker DOA cues derived from the BISDR are robust to NSI, based on speech sparsity and the large bispectrum amplitude of the captured signals. Extensive experiments demonstrate that the proposed algorithm improves performance under various NSI conditions, even when the signal-to-interference ratio (SIR) is below 0 dB.
ieee international conference on multimedia big data | 2016
Z. Q. Xiang; X. L. Huang; Y. X. Zou
Big traffic data analysis for intelligent transportation is attracting increasing attention. Because vehicles in the same class differ in design while vehicles in different classes share similar shapes and textures, vehicle classification remains a challenge. Unlike traditional methods that classify vehicles into only two or three types from one viewpoint, this paper proposes a novel method using local and structural features for vehicle classification in real-time traffic systems; it categorizes vehicles into more specific types and is robust to changes in viewpoint. Specifically, local features are obtained using the scale-invariant feature transform (SIFT), and an efficient L2-norm sparse coding technique is used to reduce computational cost. In addition, vehicle building structures are extracted as structural features. Finally, a linear support vector machine (SVM) is used as the classifier. Performance is evaluated on real vehicle images extracted from surveillance videos taken from different viewpoints, and five vehicle classes (SUV, truck, van, bus, car) are considered. Experimental results show that the proposed method obtains an average accuracy of 95.95% in real time, which validates its effectiveness.
international conference on digital signal processing | 2015
S. J. Liu; Xuyuan Xu; Y. X. Zou
A time-interleaved analog-to-digital converter (TIADC) is a promising solution for high-speed, high-resolution analog-to-digital conversion. Blind timing skew estimation (TSE) is one of the key techniques for implementing online timing skew mismatch compensation digitally. Assuming that the input signal of the TIADC is spectrally sparse, this paper develops an efficient blind TSE algorithm that employs the all-phase Fast Fourier Transform (ApFFT) technique to obtain an accurate phase spectrum estimate, resulting in the ApFFT-SS-BLTSE algorithm. Experimental results show that, compared to the SS-BLTSE algorithm in [1], the proposed ApFFT-SS-BLTSE algorithm has a lower computational cost and offers high TSE accuracy for both narrowband and wideband source signals in the presence of additive noise.
international conference on signal and information processing | 2013
Lei Li; Y. X. Zou; Yi Li
Wireless capsule endoscopy (WCE) is an innovative solution for gastrointestinal disease detection. The image quality of WCE is not satisfactory for medical applications, since some images are dark and low in contrast. WCE image enhancement is a challenging task, mainly because of the diversity of WCE images across patients and the need to preserve the local fine details of the images. Hence, it is difficult to apply traditional image enhancement methods based on global information to WCE images. In this paper, motivated by the capability of the anisotropic diffusion method to preserve local fine details, a new adaptive contrast anisotropic diffusion method is developed to enhance WCE images. The Hessian matrix is used to obtain a better representation of the contrast information of WCE images, while the diffusion parameter of the diffusion coefficient function is determined automatically according to local characteristics. Experimental results demonstrate the effectiveness of the proposed image enhancement method.
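For background, the detail-preserving behavior of anisotropic diffusion can be seen in a minimal sketch of one classic Perona-Malik update. This is the standard baseline rather than the adaptive, Hessian-based method of the paper: the diffusion parameter `kappa` is a fixed constant here, whereas the paper determines it automatically from local characteristics. The function name and default values are illustrative assumptions.

```python
import math

def perona_malik_step(img, kappa=0.1, dt=0.2):
    """Apply one explicit Perona-Malik diffusion step to a 2-D image.

    img: list of rows of floats (grayscale intensities).
    kappa: fixed diffusion parameter (the paper adapts this locally).
    dt: step size; values up to 0.25 are stable for 4 neighbours.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            flux = 0.0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    diff = img[rr][cc] - img[r][c]
                    # Diffusion coefficient: near 1 for small differences
                    # (smooth noise), near 0 for large ones (keep edges).
                    g = math.exp(-(diff / kappa) ** 2)
                    flux += g * diff
            out[r][c] = img[r][c] + dt * flux
    return out
```

A small isolated fluctuation is pulled toward its neighbours, while a step much larger than `kappa` is left almost untouched. A single global `kappa` cannot suit both dark and bright regions, which is the limitation the adaptive method is meant to address.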
ieee international conference on multimedia big data | 2016
Chun Wang; Y. X. Zou; Shihan Liu; Wei Shi; Weiqiao Zheng
Playback attack detection (PAD) is essentially a binary classification task that distinguishes authentic recordings from playback recordings. In the PAD problem, the difference in acoustic features between authentic and playback recordings comes mainly from the recording channel and the ambient noise. Motivated by the excellent performance of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) in modeling speaker characteristics and of the GMM supervector (GSV) in characterizing speech utterances, this paper proposes an efficient learning-based smartphone PAD system using the GSV feature and a kernel support vector machine (SVM), termed GSV-SVM-PAD. To support evaluation of the proposed PAD system, a playback attack detection database (PADD) is designed, in which more than 14,000 utterances from 100 speakers have been recorded. Extensive experimental results show that the proposed GSV-SVM-PAD system performs well and yields the lowest number of error classifications (NEC) as the number of tested speakers and utterances increases. Moreover, the NEC of GSV-SVM-PAD is below 5 when 7694 utterances from 100 speakers are tested.
asia pacific signal and information processing association annual summit and conference | 2016
Shahab Pasha; Christian Ritz; Y. X. Zou
This paper proposes a novel approach to detecting multiple, simultaneous talkers in multi-party meetings using localisation of active speech sources recorded with an ad-hoc microphone array. Cues indicating the relative distance between sources and microphones are derived from speech signals and room impulse responses recorded by each of the microphones distributed at unknown locations within a room. Multiple active sources are localised by analysing a surface formed from these cues and derived at different locations within the room. The number of localised active sources per frame or utterance is then counted to estimate when multiple sources are active. The proposed approach does not require prior information about the number and locations of sources or microphones. Synchronisation between microphones is also not required. A meeting scenario with competing speakers is simulated, and results show that simultaneously active sources can be detected with an average accuracy of 75% and that the number of active sources is counted accurately 65% of the time.