Ryota Miyauchi
Japan Advanced Institute of Science and Technology
Publications
Featured research published by Ryota Miyauchi.
PLOS ONE | 2009
Souta Hidaka; Yuko Manaka; Wataru Teramoto; Yoichi Sugita; Ryota Miyauchi; Jiro Gyoba; Yôiti Suzuki; Yukio Iwaya
Background: Audition provides important cues to stimulus motion, although vision may provide the most salient information. It has been reported that a sound of fixed intensity tends to be judged as decreasing in intensity after adaptation to looming visual stimuli, or as increasing in intensity after adaptation to receding visual stimuli. This audiovisual interaction in motion aftereffects indicates that there are multimodal contributions to motion perception at early levels of sensory processing. However, there has been no report that sounds can induce the perception of visual motion.
Methodology/Principal Findings: A visual stimulus blinking at a fixed location was perceived to be moving laterally when the flash onset was synchronized with an alternating left-right sound source. This illusory visual motion was strengthened with increasing retinal eccentricity (2.5 deg to 20 deg) and occurred more frequently when the onsets of the auditory and visual stimuli were synchronized.
Conclusions/Significance: We clearly demonstrated that the alternation of sound location induces illusory visual motion when vision cannot provide accurate spatial information. The present findings strongly suggest that the neural representations of auditory and visual motion processing can bias each other, yielding the best estimates of external events in a complementary manner.
Intelligent Information Hiding and Multimedia Signal Processing | 2011
Masashi Unoki; Ryota Miyauchi
Multimedia signal processing has recently faced serious social issues such as digital rights management, secure authentication, malicious attacks, and tampering with digital audio/speech signals. Reversible watermarking is a technique that enables these signals to be authenticated and then restored to their original form by removing the watermarks from them. We previously proposed an inaudible digital-audio watermarking approach based on cochlear delay (CD). Here, we investigated how this approach could be developed into reversible watermarking by considering blind detection of inaudible watermarks and the reversibility of audio watermarking. We evaluated the inaudibility and reversibility of the proposed approach by carrying out three objective tests (PEAQ, LSD, and bit detection or SNR). The results revealed that reversible watermarking based on CD could be accomplished.
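A minimal sketch of the cochlear-delay embedding idea is given below: each frame is passed through one of two first-order all-pass IIR filters whose group delay grows toward low frequencies, with the choice of filter encoding the watermark bit. The coefficients b0/b1, the frame length, and the helper names are illustrative assumptions, not the authors' exact implementation, and blind detection is omitted.

```python
# Hedged sketch: embedding one watermark bit per frame with first-order
# all-pass (cochlear-delay-like) IIR filters, in the spirit of the CD-based
# method described above. b0/b1 and the frame length are illustrative.
import numpy as np
from scipy.signal import lfilter

def cd_allpass(x, b):
    # H(z) = (-b + z^-1) / (1 - b z^-1): unity magnitude response, with
    # group delay that grows toward low frequencies (cochlear-delay-like).
    return lfilter([-b, 1.0], [1.0, -b], x)

def embed_bits(signal, bits, frame_len=4096, b0=0.795, b1=0.865):
    out = np.copy(signal).astype(float)
    for i, bit in enumerate(bits):
        start, stop = i * frame_len, (i + 1) * frame_len
        if stop > len(signal):
            break
        b = b1 if bit else b0               # choice of filter encodes the bit
        out[start:stop] = cd_allpass(signal[start:stop], b)
    return out

# usage (illustrative): fs = 44100; x = np.random.randn(fs); y = embed_bits(x, [1, 0, 1, 1])
```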
Neuroscience Letters | 2010
Wataru Teramoto; Yuko Manaka; Souta Hidaka; Yoichi Sugita; Ryota Miyauchi; Shuichi Sakamoto; Jiro Gyoba; Yukio Iwaya; Yôiti Suzuki
The alternation of sounds in the left and right ears induces motion perception of a static visual stimulus (SIVM: Sound-Induced Visual Motion). In that case, binaural cues were of considerable benefit in perceiving the locations and movements of the sounds. The present study investigated how a spectral cue, another important cue for sound localization and motion perception, contributed to the SIVM. In the experiments, two alternating sound sources aligned in the vertical plane were presented, synchronized with a static visual stimulus. We found that the proportion of the SIVM and the magnitude of the perceived movements of the static visual stimulus increased with increasing retinal eccentricity (1.875-30 degrees), indicating the influence of the spectral cue on the SIVM. These findings suggest that the SIVM can be generalized to the whole two-dimensional audio-visual space, and strongly imply that there are common neural substrates for auditory and visual motion perception in the brain.
IEICE Technical Report | 2007
Masashi Unoki; Ryota Miyauchi; Chin-Tuan Tan
The frequency selectivity of the auditory system is often conceptualized as a bank of bandpass auditory filters. Over the past 30 years, many simultaneous masking experiments using notched-noise maskers have been carried out to define the shape of these filters (e.g., Glasberg and Moore 1990; Patterson and Nimmo-Smith 1980; Rosen and Baker 1994). The studies of Glasberg and Moore (2000) and Baker and Rosen (2006) are notable inasmuch as they measured the human auditory filter shape over most of the range of frequencies and levels encountered in everyday hearing. The advantage of using notched-noise masking is that one can avoid off-frequency listening and investigate filter asymmetry. However, the derived filter shapes are also affected by suppression. The tunings of auditory filters derived from forward masking data are apparently sharper than those derived from simultaneous masking data, especially when signal levels are low. The tuning of a filter is commonly believed to be affected by cochlear nonlinearity, such as suppression. In past studies, the tunings of auditory filters derived from simultaneous masking data were wider than those derived from nonsimultaneous (forward) masking data (Moore and Glasberg 1978; Glasberg and Moore 1982; Oxenham and Shera 2003). Heinz et al. (2002) showed that tuning is generally sharpest when stimuli are at low levels and that suppression may affect tuning estimates more at high characteristic frequencies (CFs) than at low CFs. If the suggestion of Heinz et al. (2002) holds, i.e., if the effect of suppression varies with frequency, comparing the filter bandwidths derived from simultaneous and forward masking experiments should reveal this. In this study, we attempt to estimate filter tuning using both simultaneous and forward masking experiments with a notched-noise masker to investigate how suppression affects estimates of frequency selectivity across signal frequencies, signal levels, notch conditions (symmetric and asymmetric), and signal delays. This study extends the study of Unoki and Tan (2005).
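For context, the sketch below illustrates the standard power-spectrum model with a rounded-exponential (roex) filter, the usual framework for deriving auditory filter shapes from notched-noise thresholds. The slope p, the detection efficiency K, the noise band edges, and the function names are illustrative assumptions, not fitted values from this report.

```python
# Hedged sketch of the power-spectrum model with a symmetric roex(p) filter:
# the masked threshold is assumed to sit a fixed ratio K above the masker
# power passed by the auditory filter centered on the signal. Numbers are
# illustrative only.
import numpy as np
from scipy.integrate import quad

def roex(g, p):
    # roex(p) weighting as a function of normalized frequency deviation g.
    return (1.0 + p * g) * np.exp(-p * g)

def masker_power(fc, p, n0, notch_g, band_g=0.4):
    # Noise power passed by the filter for a symmetric notch of half-width
    # notch_g, with noise bands band_g wide on each side (g = |f - fc| / fc).
    per_side, _ = quad(lambda g: roex(g, p), notch_g, notch_g + band_g)
    return 2.0 * n0 * fc * per_side

def predicted_threshold_db(fc, p, k_db, n0, notch_g):
    # Threshold = K * (masker power at the filter output), in dB.
    return k_db + 10.0 * np.log10(masker_power(fc, p, n0, notch_g))

fc, p, n0 = 2000.0, 25.0, 1e-6      # center frequency, filter slope, noise density
for notch in (0.0, 0.1, 0.2, 0.3):
    print(notch, round(predicted_threshold_db(fc, p, k_db=0.0, n0=n0, notch_g=notch), 1))
# As the notch widens, less masker passes the filter and the predicted
# threshold drops; fitting p (and K) to measured thresholds gives the
# bandwidth, with ERB ~ 4 * fc / p for the symmetric roex(p).
```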
Intelligent Information Hiding and Multimedia Signal Processing | 2012
Masashi Unoki; Ryota Miyauchi
Multimedia signal processing has recently faced serious social issues such as malicious attacks on, and tampering with, digital audio/speech signals. Fragile speech watermarking is a technique that enables tampering with the original signals to be detected. We previously proposed an inaudible digital-audio watermarking approach based on cochlear delay. Here, we investigated how this approach could be developed into fragile watermarking by considering its robustness against meaningful processing and its fragility against malicious modifications. We evaluated the proposed method of detecting tampering with speech by carrying out three objective tests (PESQ, LSD, and bit detection), robustness tests on speech coding, and a fragility test on malicious modifications. The results revealed that the proposed approach could detect both the positions of tampering and the forms it took.
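A minimal sketch of how frame-wise watermark bits can localize tampering is shown below, under the assumption that each frame carries one known bit; the blind detector is abstracted into a hypothetical detect_bit() callback rather than the CD-based detector used in the paper.

```python
# Hedged sketch of tampering localization: frames whose detected bit
# disagrees with the expected watermark sequence are flagged as
# (potentially) tampered. detect_bit is a user-supplied detector.
import numpy as np

def locate_tampering(signal, expected_bits, detect_bit, frame_len=4096):
    tampered_frames = []
    for i, expected in enumerate(expected_bits):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        if len(frame) < frame_len:
            break
        if detect_bit(frame) != expected:
            tampered_frames.append(i)     # frame index maps to a time position
    return tampered_frames
```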
Conference of the International Speech Communication Association | 2016
Zhi Zhu; Ryota Miyauchi; Yukiko Araki; Masashi Unoki
It has been reported that vocal emotion recognition is challenging for cochlear implant (CI) listeners because of the limited spectral cues provided by CI devices. Owing to the way CIs encode sound, temporal modulation information is provided as a primary cue. Previous studies have revealed that the modulation components of speech are important for speech intelligibility. However, it is unclear whether modulation information can contribute to vocal emotion recognition. We investigated the relationship between human perception of vocal emotion and the modulation spectral features of emotional speech. For human perception, we carried out a vocal-emotion recognition experiment using noise-vocoder simulations with normal-hearing listeners to predict the responses of CI listeners. For the modulation spectral features, we used auditory-inspired processing (auditory filterbank, temporal envelope extraction, modulation filterbank) to obtain the modulation spectrogram of the emotional speech signals. Ten types of modulation spectral features were then extracted from the modulation spectrogram. As a result, the modulation spectral centroid, modulation spectral kurtosis, and modulation spectral tilt exhibited trends similar to the results of human perception. This suggests that these modulation spectral features may be important cues for vocal emotion recognition with noise-vocoded speech.
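A simplified sketch of this analysis chain is given below. As assumptions, a Butterworth band-pass bank stands in for the auditory filterbank, an FFT of the Hilbert envelope stands in for the modulation filterbank, and the band edges and centroid definition are illustrative rather than the exact features used in the paper.

```python
# Hedged sketch of modulation-spectral analysis: band-pass filtering,
# temporal envelope extraction, envelope spectrum, and a centroid feature.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrogram(x, fs, band_edges):
    mod_spec = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))          # temporal envelope of the band
        env = env - np.mean(env)             # remove DC before the FFT
        mod_spec.append(np.abs(np.fft.rfft(env)))
    mod_freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return np.array(mod_spec), mod_freqs

def modulation_spectral_centroid(mod_spec, mod_freqs, f_max=64.0):
    keep = mod_freqs <= f_max                # modulation range of interest
    power = mod_spec[:, keep].mean(axis=0)
    return np.sum(mod_freqs[keep] * power) / np.sum(power)

# usage (illustrative):
# fs = 16000; x = np.random.randn(fs)
# bands = [(100, 300), (300, 700), (700, 1500), (1500, 3000), (3000, 6000)]
# ms, mf = modulation_spectrogram(x, fs, bands)
# print(modulation_spectral_centroid(ms, mf))
```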
Intelligent Information Hiding and Multimedia Signal Processing | 2010
Masashi Unoki; Toshizo Kosugi; Atsushi Haniu; Ryota Miyauchi
We investigated how our proposed approach to inaudible digital-audio watermarking based on cochlear delay (CD) could be implemented with a more efficient architecture to further reduce its embedding limitations. We also improved the approach by controlling the parameter b of time-varying CD filters and by designing a cascade architecture for these filters. The results revealed that the improved method could be used to reduce the sound distortion caused by watermarking. We also found that the embedding limitations of the improved method included those of the parallel architecture.
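A minimal sketch of the cascade idea appears below: several mild first-order all-pass (CD-like) sections are applied in series so that their group delays accumulate. The coefficients are illustrative placeholders, not the controlled, time-varying values from the paper.

```python
# Hedged sketch of a cascade of first-order all-pass (CD-like) sections.
import numpy as np
from scipy.signal import lfilter

def allpass(x, b):
    return lfilter([-b, 1.0], [1.0, -b], x)

def cascade(x, bs=(0.6, 0.6)):
    for b in bs:                 # series connection of the sections
        x = allpass(x, b)
    return x
```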
Journal of the Acoustical Society of America | 2016
Zhi Zhu; Ryota Miyauchi; Yukiko Araki; Masashi Unoki
Chatterjee et al. (2015) reported that cochlear implant (CI) listeners have difficulty recognizing vocal emotions because of the limited spectral cues provided by CI devices. Research on vocal emotion perception in CI listeners has studied ways to simulate their responses by presenting noise-vocoded speech (NVS), as a CI simulation, to normal-hearing (NH) listeners. However, it is still unclear whether the results of CI simulations with NH listeners are reliable with regard to CI listeners. This study aims to clarify whether CI listeners perceive vocal emotion in the same way as NH listeners listening to NVS do. Vocal-emotion recognition experiments were carried out by having both NH and CI listeners listen to original emotional speech and its NVS. The results for CI listeners revealed that they recognized sadness and hot anger more easily than joy and cold anger in both the original emotional speech and NVS conditions. Moreover, the results for NH listeners with NVS showed the same trend. The results sugg...
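For readers unfamiliar with the stimuli, the sketch below shows a basic noise vocoder of the kind used as a CI simulation: the speech is split into a few frequency bands, each band's temporal envelope modulates band-limited noise, and the bands are summed. The band layout and envelope cutoff are illustrative assumptions, not the specific simulation conditions of this study.

```python
# Hedged sketch of a basic noise vocoder (CI simulation).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges, env_cutoff=64.0):
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    for lo, hi in band_edges:
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))    # smoothed envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
        out += env * carrier                                 # envelope-modulated noise
    return out / (np.max(np.abs(out)) + 1e-12)

# usage (illustrative): y = noise_vocode(x, 16000, [(100, 600), (600, 1500), (1500, 4000)])
```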
Intelligent Information Hiding and Multimedia Signal Processing | 2012
Nhut Minh Ngo; Masashi Unoki; Ryota Miyauchi; Yôiti Suzuki
This paper proposes an audio data-hiding scheme for amplitude-modulation (AM) radio broadcasting systems. The digital-audio watermarking method based on cochlear delay (CD) that we previously proposed is employed in this scheme to send an inaudible message. We investigate the feasibility of data hiding in the AM domain by applying the method of CD-based inaudible watermarking. The proposed scheme embeds the original and watermarked signals as the lower and upper sidebands of the AM signal using double-sideband with carrier (DSB-WC) modulation and then transmits it to the receivers. Dedicated receivers in the proposed scheme can recover both the original and watermarked signals by demodulating the two sidebands and can then extract the messages from the watermarked signal, with the aid of the original signal, using CD-based watermark detection. The results obtained from computer simulations revealed that the proposed scheme can transmit messages as watermarks in AM radio systems and correctly extract the messages from the AM signals. The results also indicated that the sound quality of the demodulated signals could be kept high not only with the proposed scheme but also with traditional AM radio receivers. This means that the proposed scheme can act as a hidden-message transmitter while retaining low-level compatibility with AM radio systems.
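A minimal sketch of the sideband layout described above is given below, using the phasing (Hilbert-transform) method for single-sideband generation: the original signal occupies the lower sideband, the watermarked signal the upper sideband, and a carrier is added. The carrier frequency, sample rate, amplitudes, and function names are illustrative assumptions; the paper's actual modulator may differ.

```python
# Hedged sketch: independent lower/upper sidebands plus carrier, built with
# the phasing (Hilbert) method for single-sideband modulation.
import numpy as np
from scipy.signal import hilbert

def dsb_wc_two_sidebands(x_orig, x_wm, fs, fc, carrier_amp=1.0):
    t = np.arange(len(x_orig)) / fs
    c, s = np.cos(2 * np.pi * fc * t), np.sin(2 * np.pi * fc * t)
    lo_h = np.imag(hilbert(x_orig))      # Hilbert transform of the original
    hi_h = np.imag(hilbert(x_wm))        # Hilbert transform of the watermarked signal
    lsb = x_orig * c + lo_h * s          # lower sideband (original)
    usb = x_wm * c - hi_h * s            # upper sideband (watermarked)
    return carrier_amp * c + lsb + usb

# usage (illustrative): am = dsb_wc_two_sidebands(x_orig, x_watermarked, fs=48000, fc=12000.0)
```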
Journal of the Acoustical Society of America | 2018
Zhi Zhu; Ryota Miyauchi; Yukiko Araki; Masashi Unoki
Previous studies on vocal-emotion recognition with noise-vocoded speech showed that temporal modulation cues provided by the temporal envelope play an important role in the perception of vocal emotion. To clarify which features of the temporal envelope contribute to the perception of vocal emotion, a method based on the mechanism of modulation frequency analysis in the auditory system is necessary. In this study, auditory-based modulation spectral features were used to account for the perceptual data collected from vocal-emotion recognition experiments using noise-vocoded speech. First, the modulation spectrogram of the emotional noise-vocoded speech was calculated using an auditory-based modulation filterbank. Then, ten types of modulation spectral features were extracted from the modulation spectrograms. Finally, the modulation spectral features and the perceptual data were compared to investigate the contribution of the temporal envelope to the perception of vocal emotion with noise-vocoded speech. The results showed that there were high correlations between the modulation spectral features and the perceptual data. Therefore, the modulation spectral features should be useful for accounting for the perceptual processing of vocal emotion with noise-vocoded speech. [Work supported by JSPS KAKENHI Grant Number JP. 17J08312, and Grant-in-Aid for Scientific Research on Innovative Areas (No. 18H05004) from MEXT, Japan.]
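The final comparison step amounts to correlating each modulation spectral feature with the perceptual results. A tiny sketch is shown below with clearly labeled placeholder numbers, not data from the study, to show the form of the computation.

```python
# Placeholder sketch of the feature-vs-perception comparison: one feature
# value and one recognition rate per emotion category, correlated with
# Pearson's r. The numbers are arbitrary placeholders, not study data.
import numpy as np

feature = np.array([3.1, 4.6, 2.8, 5.0, 3.9])            # e.g. modulation spectral centroid per category
recognition = np.array([0.42, 0.71, 0.38, 0.80, 0.55])   # recognition rate per category (placeholder)
r = np.corrcoef(feature, recognition)[0, 1]
print(f"Pearson r = {r:.2f}")
```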