Publication


Featured research published by Ryo Mukai.


IEEE Transactions on Speech and Audio Processing | 2004

A robust and precise method for solving the permutation problem of frequency-domain blind source separation

Hiroshi Sawada; Ryo Mukai; Shoko Araki; Shoji Makino

Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.
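
To illustrate the interfrequency-correlation idea mentioned in the abstract, the following is a minimal sketch, not the authors' integrated method: it ignores DOA estimation and harmonics, and simply picks, for each frequency bin, the permutation whose signal envelopes correlate best with the already aligned neighboring bin. The function names `correlation` and `align_permutations` and the `envelopes[f][i]` data layout are assumptions made for this example.

```python
from itertools import permutations
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant envelope carries no correlation information
    return cov / sqrt(var_a * var_b)


def align_permutations(envelopes):
    """Align ICA outputs across frequency bins by envelope correlation.

    envelopes[f][i] is the envelope (a list over time frames) of separated
    output i in frequency bin f (hypothetical layout for this sketch).
    Returns perms, where perms[f][k] is the output index in bin f that is
    assigned to source k. Bin 0 is taken as the reference ordering.
    """
    n_src = len(envelopes[0])
    perms = [tuple(range(n_src))]
    for f in range(1, len(envelopes)):
        # Envelopes of the previous bin, already in aligned source order.
        prev = [envelopes[f - 1][i] for i in perms[-1]]
        # Exhaustive search over permutations; fine for a few sources only.
        best = max(
            permutations(range(n_src)),
            key=lambda p: sum(
                correlation(prev[k], envelopes[f][p[k]]) for k in range(n_src)
            ),
        )
        perms.append(best)
    return perms
```

Because each bin is compared only with its neighbor, a single wrong decision in this sketch propagates to all later bins, which is one reason the abstract combines the correlation approach with DOA estimation.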


IEEE Transactions on Speech and Audio Processing | 2003

The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

Shoko Araki; Ryo Mukai; Shoji Makino; Tsuyoki Nishikawa; Hiroshi Saruwatari

Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T>P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.


Signal Processing | 2007

Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors

Shoko Araki; Hiroshi Sawada; Ryo Mukai; Shoji Makino

This paper presents a new method for blind sparse source separation. Some sparse source separation methods, which rely on source sparseness and an anechoic mixing model, have already been proposed. These methods utilize level ratios and phase differences between sensor observations as their features, and they separate signals by classifying them. However, some of the features cannot form clusters with a well-known clustering algorithm, e.g., the k-means. Moreover, most previous methods utilize a linear sensor array (or only two sensors), and therefore they cannot separate symmetrically positioned sources. To overcome such problems, we propose a new feature that can be clustered by the k-means algorithm and that can be easily applied to more than three sensors arranged non-linearly. We have obtained promising results for two- and three-dimensionally distributed speech separation with non-linear/non-uniform sensor arrays in a real room even in underdetermined situations. We also investigate the way in which the performance of such methods is affected by room reverberation, which may cause the sparseness and anechoic assumptions to collapse.
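
The conventional approach that the abstract describes (classifying time-frequency points by features such as the phase difference between sensor observations) can be sketched as follows. This is a simplified two-sensor illustration, not the new feature proposed in the paper: it uses only the phase difference, ignores level ratios and phase wrapping, and uses a deterministic centroid initialization chosen for this example. The names `phase_difference`, `kmeans_1d`, and `binary_masks` are hypothetical.

```python
import cmath


def phase_difference(x1: complex, x2: complex) -> float:
    """Phase of sensor 2 relative to sensor 1 at one time-frequency point."""
    return cmath.phase(x2 * x1.conjugate())


def kmeans_1d(values, k, iterations=20):
    """Plain k-means on scalar features.

    Centroids start evenly spread between the minimum and maximum value
    (an assumption made here so that the sketch is deterministic).
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every feature value.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def binary_masks(labels, k):
    """One 0/1 mask per cluster, indexed like the input points."""
    return [[1 if lab == c else 0 for lab in labels] for c in range(k)]
```

Multiplying each mask with the observed time-frequency points keeps only the points assigned to one cluster, which is how sparseness-based methods extract one source at a time.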


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Grouping Separated Frequency Components by Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation

Hiroshi Sawada; Shoko Araki; Ryo Mukai; Shoji Makino

This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Blind Extraction of Dominant Target Sources Using ICA and Time-Frequency Masking

Hiroshi Sawada; Shoko Araki; Ryo Mukai; Shoji Makino

This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.


EURASIP Journal on Advances in Signal Processing | 2003

Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures

Shoko Araki; Shoji Makino; Yoichi Hinamoto; Ryo Mukai; Tsuyoki Nishikawa; Hiroshi Saruwatari

Frequency-domain blind source separation (BSS) is shown to be equivalent to two sets of frequency-domain adaptive beamformers (ABFs) under certain conditions. The zero search of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABFs. The unmixing matrix of the BSS and the filter coefficients of the ABFs converge to the same solution if the two source signals are ideally independent. If they are dependent, this results in a bias for the correct unmixing filter coefficients. Therefore, the performance of the BSS is limited to that of the ABF if the ABF can use exact geometric information. This understanding gives an interpretation of BSS from a physical point of view.


International Conference on Acoustics, Speech, and Signal Processing | 2002

Polar coordinate based nonlinear function for frequency-domain blind source separation

Hiroshi Sawada; Ryo Mukai; Shoko Araki; Shoji Makino

This paper presents a new type of nonlinear function for independent component analysis to process complex-valued signals, which is used in frequency-domain blind source separation. The new function is based on the polar coordinates of a complex number, whereas the conventional one is based on the Cartesian coordinates. The new function is derived from the probability density function of frequency-domain signals that are assumed to be independent of the phase. We show that the difference between the two types of functions is in the assumed densities of independent components. Experimental results for separating speech signals show that the new nonlinear function behaves better than the conventional one.
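
The contrast between the two coordinate systems can be sketched as below. The specific forms are assumptions for illustration (a `tanh` applied to the real and imaginary parts for the Cartesian case, and a `tanh` applied to the magnitude with the phase left unchanged for the polar case); the abstract does not state the exact functions, and the `gain` parameter is hypothetical.

```python
import cmath
import math


def cartesian_nonlinearity(y: complex) -> complex:
    """Apply tanh separately to the real and imaginary parts."""
    return complex(math.tanh(y.real), math.tanh(y.imag))


def polar_nonlinearity(y: complex, gain: float = 1.0) -> complex:
    """Apply tanh to the magnitude only and keep the phase of y."""
    return cmath.rect(math.tanh(gain * abs(y)), cmath.phase(y))
```

A property worth noting in this sketch is that the polar form commutes with phase rotation, so its output magnitude does not depend on the phase of the input, whereas the Cartesian form treats the real and imaginary axes as special directions.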


International Conference on Acoustics, Speech, and Signal Processing | 2004

Underdetermined blind separation for speech in real environments with sparseness and ICA

Shoko Araki; Shoji Makino; Audrey Blin; Ryo Mukai; Hiroshi Sawada

In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of T_R = 130 and 200 ms.


International Conference on Acoustics, Speech, and Signal Processing | 2005

Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

Shoko Araki; Shoji Makino; Hiroshi Sawada; Ryo Mukai

Musical noise is a typical problem with blind source separation using a time-frequency mask. We report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms.


International Conference on Acoustics, Speech, and Signal Processing | 2006

DOA Estimation for Multiple Sparse Sources with Normalized Observation Vector Clustering

Shoko Araki; Hiroshi Sawada; Ryo Mukai; Shoji Makino

This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace-based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional method based on independent component analysis allows M ≥ N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 × 4, 3 × 5, and 4 × 5 (#sensors × #speech sources) in a room (RT60 = 120 ms).

Collaboration


Dive into Ryo Mukai's collaborations.

Top Co-Authors

Shoko Araki

Nippon Telegraph and Telephone

Hiroshi Sawada

Nippon Telegraph and Telephone

Takayuki Kurozumi

Nippon Telegraph and Telephone

Tsuyoki Nishikawa

Nara Institute of Science and Technology

Hidehisa Nagano

Nippon Telegraph and Telephone

Takahito Kawanishi

Nippon Telegraph and Telephone