Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ruohua Zhou is active.

Publication


Featured research published by Ruohua Zhou.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Music Onset Detection Based on Resonator Time Frequency Image

Ruohua Zhou; Marco Mattavelli; Giorgio Zoia

This paper describes a new method for music onset detection. The novelty of the approach consists mainly of two elements: the time-frequency processing and the detection stages. The resonator time-frequency image (RTFI) is the basic time-frequency analysis tool. The time-frequency processing part transforms the RTFI energy spectrum into more natural energy-change and pitch-change cues, which then serve as inputs to the detection stage. Two detection algorithms have been developed: an energy-based algorithm and a pitch-based one. The energy-based algorithm exploits energy-change cues and performs particularly well for the detection of hard onsets. The pitch-based algorithm exploits stable pitch cues for onset detection in polyphonic music and achieves much better performance than the energy-based algorithm on soft onsets. Results for both algorithms have been obtained on a large music dataset.
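
For readers who want a concrete picture of the energy-based stage, here is a minimal Python sketch of energy-change onset detection. It substitutes a generic STFT for the paper's RTFI front end, and the function name, hop size, and threshold are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def energy_onsets(x, sr, hop=512, nfft=2048, delta=0.1):
    """Toy energy-based onset detector: half-wave-rectified spectral
    energy difference followed by peak picking. The paper uses the
    RTFI energy spectrum; an STFT stands in for it here."""
    f, t, Z = stft(x, fs=sr, nperseg=nfft, noverlap=nfft - hop)
    E = np.log1p(np.abs(Z) ** 2)                  # compressed energy spectrum
    diff = np.diff(E, axis=1)                     # frame-to-frame change
    detection = np.maximum(diff, 0.0).sum(axis=0) # energy-change cue
    peaks, _ = find_peaks(detection, height=delta * detection.max())
    return t[peaks + 1]                           # onset times in seconds
```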


EURASIP Journal on Advances in Signal Processing | 2009

A computationally efficient method for polyphonic pitch estimation

Ruohua Zhou; Joshua D. Reiss; Marco Mattavelli; Giorgio Zoia

This paper presents a computationally efficient method for polyphonic pitch estimation. The method employs the fast Resonator Time-Frequency Image (RTFI) as the basic time-frequency analysis tool. The approach is composed of two main stages. First, a preliminary pitch estimation is obtained by means of a simple peak-picking procedure in the pitch energy spectrum. This spectrum is calculated from the original RTFI energy spectrum according to harmonic grouping principles. Incorrect estimates are then removed according to spectral irregularity and knowledge of the harmonic structures of notes played on commonly used music instruments. The new approach is compared with a variety of other frame-based polyphonic pitch estimation methods, and the results demonstrate its high performance and computational efficiency.
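
A minimal sketch of the harmonic-grouping idea follows, assuming a generic magnitude spectrum in place of the RTFI and omitting the spectral-irregularity pruning stage; pitch_salience, estimate_pitches, and all parameter values are hypothetical.

```python
import numpy as np

def pitch_salience(mag, freqs, f0_grid, n_harm=6):
    """Toy harmonic grouping: the salience of a candidate f0 is the
    summed spectral energy at its first n_harm harmonics. The paper
    derives its 'pitch energy spectrum' from the RTFI instead."""
    sal = np.zeros(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harm + 1):
            k = np.argmin(np.abs(freqs - h * f0))   # nearest spectral bin
            sal[i] += mag[k] ** 2
    return sal

def estimate_pitches(mag, freqs, f0_grid, thresh=0.5):
    """Preliminary estimates: local maxima of the salience curve above
    a relative threshold. The paper's second stage, which prunes
    errors via spectral irregularity, is omitted here."""
    f0_grid = np.asarray(f0_grid)
    sal = pitch_salience(mag, freqs, f0_grid)
    peaks = [i for i in range(1, len(sal) - 1)
             if sal[i - 1] < sal[i] > sal[i + 1]
             and sal[i] > thresh * sal.max()]
    return f0_grid[peaks]
```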


Multimedia Signal Processing | 2004

A multi-timbre chord/harmony analyzer based on signal processing and neural networks

Giorgio Zoia; Ruohua Zhou; Daniel Mlynek

The automatic analysis of a polyphonic sound is still a very challenging task, not only for computational reasons but also because of the lack of suitable techniques, restrictions in the application field, and sometimes unrealistic goals. Remarkable progress has been made in the last decade, but a practical and generic solution to this problem is still hard to find. In this paper, we propose a rather general solution for a chord/harmony analyzer that is able to provide good results for different instruments and polyphonic sounds. It is based on a combination of signal processing and neural networks. The sound is analyzed in both the time and frequency domains by an original hybrid technique. An innovative approach is then used to classify the chords and track their evolution over time. The overall method aims to implement a general-purpose listening machine whose approximate results and approach are nevertheless general enough to allow the implementation of very useful applications.
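
As a rough illustration of combining a spectral front end with a classifier, the sketch below folds spectral energy into a 12-bin chroma vector and matches it against triad templates. The paper instead uses a trained neural network on its own hybrid time/frequency analysis, so treat this only as a simplified stand-in.

```python
import numpy as np

def chroma(mag, freqs, fref=27.5):
    """Fold spectral energy into 12 pitch classes (a crude stand-in
    for the paper's hybrid time/frequency front end)."""
    c = np.zeros(12)
    valid = freqs > 30.0                        # ignore sub-audio bins
    semis = 12 * np.log2(freqs[valid] / fref)   # semitones above A0
    for p, m in zip(np.round(semis).astype(int) % 12, mag[valid]):
        c[p] += m ** 2
    return c / (c.sum() + 1e-12)

# Major/minor triad templates over the 12 pitch classes.
TEMPLATES = {}
for root in range(12):
    for name, ivls in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
        t = np.zeros(12)
        t[[(root + i) % 12 for i in ivls]] = 1 / 3
        TEMPLATES[f"{root}:{name}"] = t

def classify_chord(c):
    """Nearest-template chord label for a chroma vector c (the paper
    performs this classification with a neural network)."""
    return max(TEMPLATES, key=lambda k: float(TEMPLATES[k] @ c))
```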


Information Sciences, Signal Processing and Their Applications | 2007

A new time-frequency representation for music signal analysis: Resonator time-frequency image

Ruohua Zhou; Marco Mattavelli

Most music-related tasks require a joint time-frequency analysis because a music signal varies with time. Existing time-frequency analysis approaches show serious limitations when applied to music signal processing. This paper presents an original frequency-dependent time-frequency analysis tool called the Resonator Time-Frequency Image (RTFI). The RTFI is implemented by a first-order complex resonator bank, which makes it computationally efficient. Different music analysis tasks may have different time-frequency resolution requirements. With the RTFI, one can select different time-frequency resolutions, such as uniform, constant-Q, or ear-like analysis, simply by setting different parameters, so the RTFI generalizes all of these analyses in one framework. An example of using the RTFI for music onset detection is given.
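
The abstract's key implementation detail, a first-order complex resonator bank, can be sketched directly. The recursion below is a plausible reading of such a bank, with a frequency-dependent decay giving constant-Q-like resolution (a fixed decay would give uniform resolution); the gain normalization and the parameter q are simplifications, not the paper's exact formulation.

```python
import numpy as np

def rtfi_like(x, sr, freqs, q=30.0):
    """Toy first-order complex resonator bank. Each channel is the
    one-pole recursion y[n] = a*y[n-1] + (1-r)*x[n] with pole
    a = r*exp(j*2*pi*f/sr), giving unit gain at the center frequency.
    Here the decay r depends on f (constant-Q-like bandwidth ~ f/q);
    a fixed r would give uniform analysis."""
    out = np.empty((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        r = np.exp(-2 * np.pi * f / (q * sr))     # per-channel decay
        a = r * np.exp(2j * np.pi * f / sr)       # complex pole
        y = 0.0 + 0.0j
        for n, xn in enumerate(x):
            y = a * y + (1 - r) * xn
            out[i, n] = y
    return 20 * np.log10(np.abs(out) + 1e-12)     # energy spectrum in dB
```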


International Conference on Acoustics, Speech, and Signal Processing | 2014

Language recognition system using language branch discriminative information

Xianliang Wang; Yulong Wan; Lin Yang; Ruohua Zhou; Yonghong Yan

This paper presents our study of using language branch discriminative information effectively for language recognition. A language branch variability (LBV) method based on factor analysis techniques is proposed. In the LBV method, the language branch variability factor is obtained by concatenating low-dimensional factors from the language branch variability spaces. Language models are trained within language branches and between languages. Experiments on the NIST 2011 Language Recognition Evaluation (LRE) 30 s, 10 s, and 3 s tasks show that the proposed LBV method provides stable improvement over the state-of-the-art total variability (TV) approach. On the 30-second task, it yields relative improvements of 14.6% in equal error rate (EER) and 12.9% in minimum decision cost (minDCF), and on the new metrics of the NIST 2011 LRE it leads to relative improvements of 7.2%-17.7%.
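
The following sketch illustrates the concatenation step in spirit only: each language-branch subspace yields a low-dimensional factor via a ridge-style point estimate, and the per-branch factors are concatenated. The paper's actual factor analysis operates on UBM statistics; lbv_factor, branch_bases, and noise are hypothetical names and parameters.

```python
import numpy as np

def lbv_factor(supervec, branch_bases, noise=1.0):
    """Toy LBV extraction: for each language-branch subspace T (one
    (dim, rank) basis per branch), take a ridge/MAP point estimate of
    the low-dimensional factor, then concatenate the per-branch
    factors into one language-branch variability vector."""
    factors = []
    for T in branch_bases:
        A = T.T @ T + noise * np.eye(T.shape[1])   # regularized normal eqns
        factors.append(np.linalg.solve(A, T.T @ supervec))
    return np.concatenate(factors)
```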


IET Biometrics | 2014

Voice biometrics using linear Gaussian model

Hai Yang; Yunfei Xu; Houjun Huang; Ruohua Zhou; Yonghong Yan

This study introduces a linear Gaussian model-based framework for voice biometrics. The model works with discrete-time linear dynamical systems. The motivation is to apply linear Gaussian modelling to voice biometrics and to show that its accuracy is comparable with other state-of-the-art methods such as probabilistic linear discriminant analysis (PLDA) and the two-covariance model. An expectation-maximisation algorithm is derived to train the model, and a Bayesian solution is used to calculate the log-likelihood ratio score for all speaker trials. The approach performed well on the core-extended conditions of the NIST 2010 Speaker Recognition Evaluation and is competitive with Gaussian PLDA in terms of the normalised decision cost function.
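
As a worked illustration of Bayesian log-likelihood ratio scoring in a linear Gaussian model, here is the standard two-covariance score, which the abstract names as a comparable baseline. B and W would be trained with EM; this closed form is textbook material rather than the paper's exact model.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def two_cov_llr(x1, x2, B, W):
    """Toy verification score under a two-covariance linear Gaussian
    model: speaker mean y ~ N(0, B), observation x|y ~ N(y, W).
    The LLR compares 'same speaker' (shared y, so the two vectors
    co-vary through B) against 'different speakers' (independent y)."""
    d = len(x1)
    z = np.concatenate([x1, x2])
    same = np.vstack([np.hstack([B + W, B]),
                      np.hstack([B, B + W])])
    diff = np.vstack([np.hstack([B + W, np.zeros((d, d))]),
                      np.hstack([np.zeros((d, d)), B + W])])
    return (mvn.logpdf(z, mean=np.zeros(2 * d), cov=same)
            - mvn.logpdf(z, mean=np.zeros(2 * d), cov=diff))
```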


Applied Mechanics and Materials | 2013

Automatic Transcription of Piano Music Using Audio-Vision Fusion

Yulong Wan; Zhigang Wu; Ruohua Zhou; Yonghong Yan

Over the last decade, many sophisticated and application-specific methods have been proposed for the transcription of polyphonic music. However, performance seems to have reached a limit. This paper describes a high-performance piano transcription system with two main contributions. First, a new onset detection method is proposed that uses a matched filter on a specific energy envelope, which proves very suitable for piano music. Second, a computer-vision method is proposed to enhance audio-only piano transcription by recognizing the player's hands on the keyboard. We carried out comparative experiments for both onset detection and the overall system on the MAPS database and a video database. The results were compared with the best piano transcription system in MIREX 2008, which retained the best performance on the piano subset through MIREX 2012. The results show that our system substantially outperforms this state-of-the-art method.
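
A minimal sketch of matched-filter onset detection on an energy envelope follows. The attack/decay template, its lengths, and the threshold are invented for illustration; the paper's filter is matched specifically to piano energy envelopes.

```python
import numpy as np
from scipy.signal import fftconvolve, find_peaks

def matched_filter_onsets(env, sr_env, attack=0.01, decay=0.1, thresh=0.3):
    """Toy matched-filter onset detector: correlate a frame-rate energy
    envelope with a piano-like attack template (sharp rise, exponential
    decay) and peak-pick the response."""
    n_a = max(1, int(attack * sr_env))
    n_d = max(1, int(decay * sr_env))
    template = np.concatenate([np.linspace(0.0, 1.0, n_a),
                               np.exp(-np.arange(n_d) / (0.3 * n_d))])
    template -= template.mean()                   # zero-mean for correlation
    resp = fftconvolve(env, template[::-1], mode="same")
    peaks, _ = find_peaks(resp, height=thresh * resp.max())
    return peaks / sr_env                         # approximate onset times (s)
```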


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization

Dongwen Ying; Ruohua Zhou; Junfeng Li; Yonghong Yan

Signal subspace methods have been widely exploited to localize multiple speech sources. However, most signal subspace methods cannot count the number of sources and do not exploit the sparsity of speech in the frequency domain. This paper presents a grid-search window-dominant signal subspace (GS-WDSS) method and a closed-form WDSS (CF-WDSS) method for localizing short-term speech sources. The methods rest on the generalized sparsity assumption that each window of time-adjacent frequency bins is dominated by one source, as opposed to the conventional assumption that each individual bin is dominated by one source. Under the generalized assumption, the principal eigenvector of the spatial correlation matrix of each window spans the signal subspace of the window-dominant source, and the direction of arrival (DOA) of that source is estimated from the principal eigenvector. The DOAs and the number of sources are finally summarized from the DOA histogram of all dominant sources. The conventional assumption is a special case of the generalized one; using the generalized assumption significantly improves the estimation of the window-dominant DOAs at the cost of an acceptable masking effect. The superiority of the proposed methods is verified by simulated and real experiments.
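
To make the window-dominant idea concrete, the sketch below performs a toy grid search: one spatial covariance and principal eigenvector per frequency window, matched against plane-wave steering vectors, with the winners accumulated in a DOA histogram. The linear-array geometry, window width, and angle grid are assumptions, not the paper's setup.

```python
import numpy as np

def wdss_doa_histogram(X, mic_x, freqs_hz, c=343.0, win=4, n_angles=180):
    """Toy grid-search WDSS. X is an STFT tensor (mics, freqs, frames)
    and mic_x the mic positions along a linear array (meters). Each
    window of `win` adjacent bins is assumed dominated by one source;
    its covariance's principal eigenvector is matched against steering
    vectors on an angle grid, and winning angles are histogrammed."""
    angles = np.linspace(0.0, np.pi, n_angles)
    hist = np.zeros(n_angles)
    for t in range(X.shape[2]):
        for k0 in range(0, X.shape[1] - win, win):
            xs = X[:, k0:k0 + win, t]                 # (mics, win)
            R = xs @ xs.conj().T                      # spatial covariance
            _, V = np.linalg.eigh(R)
            e = V[:, -1]                              # principal eigenvector
            f = freqs_hz[k0:k0 + win].mean()
            tau = mic_x[:, None] * np.cos(angles)[None, :] / c
            A = np.exp(-2j * np.pi * f * tau)         # steering matrix
            hist[np.argmax(np.abs(A.conj().T @ e))] += 1
    return angles, hist         # histogram peaks give DOAs and source count
```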


International Symposium on Chinese Spoken Language Processing | 2016

Robust multiple speech source localization based on phase difference regression

Zhaoqiong Huang; Ge Zhan; Dongwen Ying; Ruohua Zhou; Jielin Pan; Yonghong Yan

Spatial aliasing is a challenging issue faced by most multiple speech source localization methods. Small arrays are widely used to avoid or mitigate spatial aliasing, but they weaken the coherence at low frequencies and degrade localization performance. This paper proposes a phase difference regression method for multiple speech source localization on a planar array. A time delay histogram is first applied to classify the frequency bins into clusters corresponding to speech sources, and phase difference regression is then conducted on each cluster. Since the phase difference error is limited to the range [−π, π], the proposed method avoids ambiguity in the period number of the phase. Although the conventional regression method accounts for the period number, it brings no significant advantage over the proposed method. The experimental results confirm the superiority of the proposed method on large arrays.
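
The wrapping trick in the abstract can be shown in a few lines: score candidate delays by the wrapped phase residual, so every error stays in [−π, π] and no period number is needed. The grid search over delays and all parameter values are illustrative.

```python
import numpy as np

def regress_delay(phase_diff, freqs_hz, tau_max=1e-3, n_grid=2000):
    """Toy phase-difference regression for one cluster of bins: search
    a delay grid and score each candidate by the wrapped residual
    between observed and predicted inter-microphone phase differences.
    Wrapping keeps each error in (-pi, pi], avoiding the phase period
    number entirely."""
    taus = np.linspace(-tau_max, tau_max, n_grid)
    pred = 2 * np.pi * np.outer(taus, freqs_hz)              # (taus, bins)
    err = np.angle(np.exp(1j * (phase_diff[None, :] - pred)))  # wrapped
    cost = (err ** 2).sum(axis=1)                            # least squares
    return taus[np.argmin(cost)]                             # delay in seconds
```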


International Conference on Intelligent Human-Machine Systems and Cybernetics | 2016

Characterization Vector Extraction Using Neural Network for Speaker Recognition

Wenchao Wang; Qingsheng Yuan; Ruohua Zhou; Yonghong Yan

State-of-the-art speaker recognition systems now use the i-vector framework to map the UBM supervector to a low-dimensional vector. In this paper, we propose a new method to perform the same mapping while retaining more speaker information. The method, inspired by the bottleneck idea, is based on a standard artificial neural network. The low-dimensional vector extracted with the new method is more speaker-dependent and is effective on interview microphone speech. Our experiments focus on the comparison between standard i-vectors and the proposed vectors. The results indicate that the equal error rate and the minimum detection cost are both improved by the new method.
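
A minimal sketch of the bottleneck idea: a feed-forward network trained to classify speakers, whose narrow middle layer is read out as the low-dimensional speaker vector. Training is omitted, and the layer sizes and names are hypothetical; the paper's exact network and features are not specified here.

```python
import numpy as np

def bottleneck_embedding(supervec, weights):
    """Toy bottleneck extractor: run the supervector through the layers
    up to and including the narrow bottleneck, and return the
    bottleneck activations as the speaker vector (the i-vector
    alternative described in the abstract)."""
    h = supervec
    for W, b in weights:
        h = np.tanh(W @ h + b)        # hidden layers; bottleneck is last
    return h

# Illustrative shapes: 10000-dim supervector -> 512 -> 64-dim bottleneck.
rng = np.random.default_rng(0)
weights = [(rng.standard_normal((512, 10000)) * 0.01, np.zeros(512)),
           (rng.standard_normal((64, 512)) * 0.01, np.zeros(64))]
vec = bottleneck_embedding(rng.standard_normal(10000), weights)
```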

Collaboration


Dive into Ruohua Zhou's collaborations.

Top Co-Authors

Yonghong Yan (Chinese Academy of Sciences)
Giorgio Zoia (École Polytechnique Fédérale de Lausanne)
Xianliang Wang (Chinese Academy of Sciences)
Houjun Huang (Chinese Academy of Sciences)
Yulong Wan (Chinese Academy of Sciences)
Yunfei Xu (Chinese Academy of Sciences)
Marco Mattavelli (École Polytechnique Fédérale de Lausanne)
Dongwen Ying (Chinese Academy of Sciences)
Hai Yang (Chinese Academy of Sciences)
Lin Yang (Chinese Academy of Sciences)