Publication


Featured research published by Ryoichi Takashima.


Spoken Language Technology Workshop | 2012

Exemplar-based voice conversion in noisy environment

Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of source exemplars and target exemplars having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing it with a conventional Gaussian Mixture Model (GMM)-based method.
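A minimal numpy sketch of the conversion step described above, assuming placeholder magnitude spectrograms and exemplar dictionaries; the activation (weight) estimation uses a standard multiplicative update for the KL divergence with the dictionary held fixed, which stands in for whatever solver the authors actually used.

```python
import numpy as np

def estimate_activations(V, W, n_iter=200, eps=1e-10):
    """Estimate non-negative activations H such that V ~= W @ H,
    with W held fixed (multiplicative KL-divergence updates)."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

# Hypothetical parallel dictionaries: columns are exemplar spectra.
# W_src / W_tgt share column indices (same texts); W_noise comes from the input.
D, K_speech, K_noise, T = 257, 500, 60, 300
W_src   = np.abs(np.random.rand(D, K_speech))
W_tgt   = np.abs(np.random.rand(D, K_speech))
W_noise = np.abs(np.random.rand(D, K_noise))
V_noisy = np.abs(np.random.rand(D, T))          # noisy source spectrogram (magnitude)

# Decompose the noisy input over [speech exemplars | noise exemplars] ...
H = estimate_activations(V_noisy, np.hstack([W_src, W_noise]))
H_speech = H[:K_speech]                          # keep only the speech activations

# ... and rebuild the converted spectrogram from the *target* exemplars.
V_converted = W_tgt @ H_speech
```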


International Conference on Acoustics, Speech, and Signal Processing | 2013

Individuality-preserving voice conversion for articulation disorders based on non-negative matrix factorization

Ryo Aihara; Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

We present in this paper a voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. The movements of such speakers are limited by their athetoid symptoms, and their consonants are often unstable or unclear, which makes it difficult for them to communicate. In this paper, exemplar-based spectral conversion using Non-negative Matrix Factorization (NMF) is applied to a voice with an articulation disorder. To preserve the speaker's individuality, we used a combined dictionary constructed from the source speaker's vowels and the target speaker's consonants. Experimental results indicate that the performance of NMF-based VC is considerably better than that of conventional GMM-based VC.
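A short sketch of how such a combined (individuality-preserving) dictionary could be assembled, assuming hypothetical aligned exemplar matrices and a vowel/consonant mask; the actual frame alignment and labeling in the paper are not shown here.

```python
import numpy as np

# Hypothetical aligned exemplar matrices (columns are frames of the same texts):
# W_src from the source speaker, W_tgt from the target speaker, plus a boolean
# mask marking which aligned frames are vowels.
D, K = 257, 500
W_src = np.abs(np.random.rand(D, K))
W_tgt = np.abs(np.random.rand(D, K))
is_vowel = np.random.rand(K) < 0.4

# Individuality-preserving output dictionary: keep the source speaker's vowel
# exemplars, replace the consonant exemplars with the target speaker's.
W_out = np.where(is_vowel[None, :], W_src, W_tgt)

# Conversion then follows the exemplar-based scheme above: estimate activations H
# against W_src and reconstruct the converted spectrogram as W_out @ H.
```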


EURASIP Journal on Audio, Speech, and Music Processing | 2014

A preliminary demonstration of exemplar-based voice conversion for articulation disorders using an individuality-preserving dictionary

Ryo Aihara; Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

We present in this paper a voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. The movements of such speakers are limited by their athetoid symptoms, and their consonants are often unstable or unclear, which makes it difficult for them to communicate. In this paper, exemplar-based spectral conversion using nonnegative matrix factorization (NMF) is applied to a voice with an articulation disorder. To preserve the speaker's individuality, we used an individuality-preserving dictionary constructed from the source speaker's vowels and the target speaker's consonants. Using this dictionary, we can create a natural and clear voice that preserves the speaker's individuality. Experimental results indicate that the performance of NMF-based VC is considerably better than that of conventional GMM-based VC.


Brain & Development | 2014

Speech intonation in children with autism spectrum disorder

Yasushi Nakai; Ryoichi Takashima; Tetsuya Takiguchi; Satoshi Takada

OBJECTIVE: The prosody of children with autism spectrum disorder (ASD) has several abnormal features. We assessed the speech tone of children with ASD and of children with typical development (TD) using a new quantitative acoustic analysis. METHODS: The study participants consisted of 63 children (26 with ASD and 37 with TD), divided into four groups based on their developmental features and age. We quantitatively assessed the variability of the fundamental frequency (F0) pattern using the pitch coefficient of variation (CV), taking into account the different mean F0 of each word. RESULTS: (1) No significant difference was observed between the ASD and TD groups at pre-school age; however, the TD group exhibited a significantly greater pitch CV than the ASD group at school age. (2) For the pitch CV, range, and standard deviation computed over the whole speech of each participant, no significant differences were observed across participant type and age. (3) No significant correlation was found between the pitch CV of each word and the Japanese Autism Screening Questionnaire total score, or between the pitch CV of each word and the intelligence quotient level in the ASD group; a significant correlation was observed between the pitch CV of each word and the social reciprocal interaction score. CONCLUSIONS: We assessed the speech tone of children with ASD using a new quantitative method and detected monotonous speech in school-aged children with ASD. The extent of monotonous speech was related to the extent of social reciprocal interaction in children with ASD.
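A minimal sketch of the per-word pitch CV measure described above, assuming a placeholder F0 contour and word boundaries; the CV is computed within each word so that each word is normalized by its own mean F0, as the abstract indicates.

```python
import numpy as np

def per_word_pitch_cv(f0, word_spans):
    """Coefficient of variation (std / mean) of F0 within each word,
    so each word is normalized by its own mean F0."""
    cvs = []
    for start, end in word_spans:
        seg = f0[start:end]
        seg = seg[seg > 0]              # keep voiced frames only
        if len(seg) > 1 and seg.mean() > 0:
            cvs.append(seg.std() / seg.mean())
    return np.array(cvs)

# Placeholder F0 contour (Hz) and word boundaries as frame indices.
f0 = np.random.uniform(150, 300, size=500)
word_spans = [(0, 120), (120, 260), (260, 500)]
print(per_word_pitch_cv(f0, word_spans).mean())   # mean pitch CV over words
```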


EURASIP Journal on Advances in Signal Processing | 2009

Single-channel talker localization based on discrimination of acoustic transfer functions

Tetsuya Takiguchi; Yuji Sumida; Ryoichi Takashima; Yasuo Ariki

This paper presents a sound source (talker) localization method using only a single microphone, where a Gaussian Mixture Model (GMM) of clean speech is introduced to estimate the acoustic transfer function from a user's position. The new method is able to carry out this estimation without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the likelihood of training data uttered from a given position, where cepstral parameters are used to effectively represent clean speech. Using the estimated frame sequence data, a GMM of the acoustic transfer function is created to deal with the influence of the room impulse response. Then, for each test dataset, we find the maximum-likelihood (ML) GMM from among the estimated GMMs corresponding to each position. The effectiveness of this method has been confirmed by talker localization experiments performed in a room environment.
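A simplified sketch of this pipeline with scikit-learn, assuming placeholder cepstral data. Since convolution with the room impulse response is approximately additive in the cepstral domain, the sketch estimates the acoustic transfer function per frame by subtracting the mean of the most likely clean-speech Gaussian component; this rough per-frame approximation stands in for the utterance-level maximum-likelihood estimation used in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Clean-speech GMM in the cepstral domain (placeholder training frames).
clean_gmm = GaussianMixture(n_components=16, covariance_type='diag')
clean_gmm.fit(np.random.randn(5000, 13))

def estimate_atf_frames(observed, clean_gmm):
    """Rough per-frame estimate of the acoustic transfer function:
    subtract the mean of the most likely clean-speech component."""
    comp = clean_gmm.predict(observed)
    return observed - clean_gmm.means_[comp]

# One ATF-GMM per candidate position, trained on speech uttered from there.
position_data = {p: np.random.randn(2000, 13) for p in ['A', 'B', 'C']}
position_gmms = {}
for pos, cep in position_data.items():
    atf = estimate_atf_frames(cep, clean_gmm)
    position_gmms[pos] = GaussianMixture(n_components=4,
                                         covariance_type='diag').fit(atf)

# Localization: pick the position whose ATF-GMM gives the highest likelihood.
test_atf = estimate_atf_frames(np.random.randn(300, 13), clean_gmm)
best = max(position_gmms, key=lambda p: position_gmms[p].score(test_atf))
```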


International Conference on Acoustics, Speech, and Signal Processing | 2010

HMM-based separation of acoustic transfer function for single-channel sound source localization

Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper presents a sound source (talker) localization method using only a single microphone, where an HMM (Hidden Markov Model) of clean speech is introduced to estimate the acoustic transfer function from a user's position. The new method is able to carry out this estimation without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the likelihood of training data uttered from a given position, where cepstral parameters are used to effectively represent clean speech. Using the estimated frame sequence data, a GMM (Gaussian Mixture Model) of the acoustic transfer function is created to deal with the influence of the room impulse response. Then, for each test data set, we find the maximum-likelihood GMM from among the estimated GMMs corresponding to each position. The effectiveness of this method has been confirmed by talker localization experiments performed in a room environment.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Prediction of unlearned position based on local regression for single-channel talker localization using acoustic transfer function

Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper presents a sound-source (talker) localization method using only a single microphone. In our previous work, we discussed a single-channel sound-source localization method based on the discrimination of the acoustic transfer function. However, that method requires training the acoustic transfer function for each possible position in advance, and it is difficult to estimate a position that has not been pre-trained. In order to estimate such unlearned positions, this paper discusses a single-channel talker localization method based on a regression model that predicts the position from the acoustic transfer function. For training the regression model, we use a local regression approach, which trains the model using only the training samples that are similar to the evaluation data. Considering both linear and non-linear regression models, the effectiveness of this method has been confirmed by sound-source localization experiments performed in different room environments.
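A minimal sketch of the local (linear) regression idea, assuming placeholder acoustic-transfer-function features and position targets; the similarity criterion here is simply the k nearest training samples in Euclidean distance, which is one plausible reading of "training samples that are similar to the evaluation data".

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def local_regression_predict(x_eval, X_train, y_train, k=20):
    """Fit a linear regression only on the k training samples closest to the
    evaluation feature, then predict its (possibly unlearned) position."""
    dists = np.linalg.norm(X_train - x_eval, axis=1)
    nearest = np.argsort(dists)[:k]
    model = LinearRegression().fit(X_train[nearest], y_train[nearest])
    return model.predict(x_eval[None, :])[0]

# Placeholder data: rows are acoustic-transfer-function features; targets are
# the positions (e.g. an angle in degrees) of the trained positions.
X_train = np.random.randn(200, 13)
y_train = np.random.uniform(0, 180, size=200)
x_eval  = np.random.randn(13)
print(local_regression_predict(x_eval, X_train, y_train))
```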


Journal of the Acoustical Society of America | 2013

Dimensional feature weighting utilizing multiple kernel learning for single-channel talker location discrimination using the acoustic transfer function

Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper presents a method for discriminating the location of a sound source (talker) using only a single microphone. In previous work, a single-channel approach for discriminating the location of the sound source was discussed, where the acoustic transfer function from a user's position is estimated using a hidden Markov model of clean speech in the cepstral domain. In this paper, each cepstral dimension of the acoustic transfer function is newly weighted in order to obtain the cepstral dimensions that carry information useful for classifying the user's position. The paper then proposes a feature-weighting method for the cepstral parameters using multiple kernel learning, defining a base kernel for each cepstral dimension of the acoustic transfer function. The user's position is then trained and classified by a support vector machine. The effectiveness of this method has been confirmed by sound source (talker) localization experiments performed in different room environments.
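A simplified sketch of the per-dimension base kernels combined into one SVM kernel, assuming placeholder ATF cepstra and labels. The kernel weights here are uniform placeholders standing in for the weights that multiple kernel learning would actually optimize; the MKL optimization itself is not shown.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel_1d(a, b, gamma=1.0):
    """RBF base kernel built from a single cepstral dimension."""
    diff = a[:, None] - b[None, :]
    return np.exp(-gamma * diff ** 2)

def combined_kernel(A, B, weights, gamma=1.0):
    """Weighted sum of per-dimension base kernels (MKL-style combination)."""
    return sum(w * rbf_kernel_1d(A[:, d], B[:, d], gamma)
               for d, w in enumerate(weights))

# Placeholder ATF cepstra and position labels; uniform weights are a stand-in
# for the dimension weights learned by multiple kernel learning.
X_train = np.random.randn(300, 13)
y_train = np.random.randint(0, 3, size=300)
weights = np.ones(13) / 13

svm = SVC(kernel='precomputed')
svm.fit(combined_kernel(X_train, X_train, weights), y_train)

X_test = np.random.randn(50, 13)
pred = svm.predict(combined_kernel(X_test, X_train, weights))
```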


Journal of the Acoustical Society of America | 2010

Monaural sound-source-direction estimation using the acoustic transfer function of a parabolic reflection board

Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper presents a sound-source-direction estimation method using only a single microphone with a parabolic reflection board. A simple signal-power-based method using a parabolic antenna has been proposed in the radar field, but such a method is not effective for finding the direction of a talking person due to the varying power of the uttered speech signals. In this paper, the sound-source-direction estimation method focuses on the acoustic transfer function instead of the signal power. The use of the parabolic reflection board leads to a difference between the acoustic transfer functions of the target direction and the non-target directions, where the parabolic reflector and its associated microphone rotate together and observe the speech at each angle. The acoustic transfer function is estimated from the observed speech using the statistics of clean speech signals. Its effectiveness has been confirmed by monaural sound-source-direction estimation experiments in a room environment.


International Symposium on Intelligent Signal Processing and Communication Systems | 2009

Gradient-based acoustic features for speech recognition

Takashi Muroi; Ryoichi Takashima; Tetsuya Takiguchi; Yasuo Ariki

This paper proposes a novel feature extraction method for speech recognition based on gradient features on a 2-D time-frequency matrix. Widely used MFCC features lack temporal dynamics. In addition, ΔMFCC is an indirect expression of temporal frequency changes. To extract the temporal dynamics more directly, we propose local gradient features in an area around a reference position. The gradient-based features were originally proposed as HOG (Histograms of Oriented Gradients) and applied to human body detection in image recognition. In this paper, we expand the application to include gradient-based acoustic features in speech recognition. The novel acoustic features were evaluated on a word-speech recognition task, and the results showed a significant improvement for clean speech and even for noisy speech when coupled with MFCC.
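A minimal sketch of a HOG-style descriptor computed on a local time-frequency patch, assuming a placeholder log-spectrogram; the patch size, bin count, and normalization are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def gradient_orientation_histogram(patch, n_bins=8):
    """HOG-style descriptor for a local time-frequency patch: histogram of
    gradient orientations weighted by gradient magnitude."""
    gt = np.gradient(patch, axis=1)           # gradient along time
    gf = np.gradient(patch, axis=0)           # gradient along frequency
    mag = np.hypot(gt, gf)
    ang = np.arctan2(gf, gt) % np.pi          # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-10)

# Placeholder log-mel spectrogram; a descriptor is taken from the patch around
# each reference frame and could be appended to MFCCs as extra features.
spec = np.random.randn(40, 200)               # (mel bands, frames)
half = 4
features = [gradient_orientation_histogram(spec[:, t - half:t + half + 1])
            for t in range(half, spec.shape[1] - half)]
```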

Collaboration


Top co-authors of Ryoichi Takashima:

Hisashi Kawai (National Institute of Information and Communications Technology)
Sheng Li (National Institute of Information and Communications Technology)
Toshiaki Imada (University of Washington)