Publication


Featured research published by Satoshi Hongo.


Speech Communication | 2011

Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication

Junfeng Li; Shuichi Sakamoto; Satoshi Hongo; Masato Akagi; Yôiti Suzuki

Speech enhancement has been researched extensively for many years to provide high-quality speech communication in the presence of background noise and concurrent interference signals. Human listening is robust against these acoustic interferences using only two ears, but state-of-the-art two-channel algorithms perform poorly. Motivated by psychoacoustic studies of binaural hearing (equalization-cancellation (EC) theory), in this paper we propose a two-stage binaural speech enhancement with Wiener filter (TS-BASE/WF) approach that is a two-input two-output system. In the proposed TS-BASE/WF, interference signals are first estimated by equalizing and cancelling the target signal in a way inspired by the EC theory; a time-variant Wiener filter is then applied to enhance the target signal given the noisy mixture signals. The main advantages of the proposed TS-BASE/WF are (1) effectiveness in dealing with non-stationary multiple-source interference signals, and (2) success in preserving binaural cues after processing. These advantages were confirmed by comprehensive objective and subjective evaluations in different acoustical spatial configurations in terms of speech enhancement and binaural cue preservation.
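
The two stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a frontal target (so the target is identical at both ears and equalization is trivial), works on a single STFT frame, and uses a simple subtraction-based noise estimate without any compensation. The function name `ts_base_wf_frame` and all parameters are hypothetical.

```python
import numpy as np

def ts_base_wf_frame(left, right, eps=1e-12):
    """Process one complex STFT frame per ear (hypothetical sketch).

    Assumes the target is frontal, i.e. identical in both channels.
    """
    left = np.asarray(left, dtype=complex)
    right = np.asarray(right, dtype=complex)
    # Stage 1 (EC-inspired): with a frontal target no equalization is needed,
    # so subtracting the channels cancels the target and leaves an
    # interference-only reference signal.
    interference_ref = 0.5 * (left - right)
    noise_psd = np.abs(interference_ref) ** 2
    # Stage 2: Wiener-style gain computed per frame (hence time-variant),
    # using the noisy power averaged over both ears.
    noisy_psd = 0.5 * (np.abs(left) ** 2 + np.abs(right) ** 2)
    gain = np.clip(1.0 - noise_psd / (noisy_psd + eps), 0.0, 1.0)
    # Applying the same real-valued gain to both channels leaves the
    # interaural level and phase differences of the input unchanged.
    return gain * left, gain * right, gain
```

Because one real gain is shared by both outputs, the ratio between the left and right spectra is the same before and after processing, which is one simple way a two-input two-output system can keep binaural cues intact.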


Workshop on Applications of Signal Processing to Audio and Acoustics | 2009

Two-stage binaural speech enhancement with Wiener filter based on equalization-cancellation model

Junfeng Li; Shuichi Sakamoto; Satoshi Hongo; Masato Akagi; Yôiti Suzuki

The equalization-cancellation (EC) model has been extensively studied for expressing binaural masking level difference (BMLD) in psychoacoustics. Little research has focused on applying this psychoacoustic model to speech processing applications, such as speech enhancement. In this paper, we propose a two-stage binaural speech enhancement with Wiener filter (TS-BASE/WF) based on the EC model. In the proposed TS-BASE/WF, interfering signals are first estimated by equalizing and cancelling the target signal based on the EC model, and a time-variant Wiener filter is then applied to enhance the target signal given noisy mixture signals. The main advantages of the proposed TS-BASE/WF are: (1) effectiveness in dealing with non-stationary multiple-source interfering signals; (2) success in localizing the target sound source after processing. These advantages were confirmed by comprehensive experiments in different spatial scenarios in terms of speech enhancement and sound localization.


Signal Processing | 2008

Adaptive β-order generalized spectral subtraction for speech enhancement

Junfeng Li; Shuichi Sakamoto; Satoshi Hongo; Masato Akagi; Yôiti Suzuki

The performance degradation of speech communication systems in noisy environments has inspired increasing research on speech enhancement and noise reduction. As a well-known single-channel noise reduction technique, spectral subtraction (SS) has been widely used for speech enhancement. However, the spectral order β in SS is always fixed to a constant, which limits performance to a certain degree. In this paper, we first analyze the performance of the β-order generalized spectral subtraction (GSS) in terms of the gain function to highlight its dependence on the value of the spectral order β. A data-driven optimization scheme is then introduced to quantitatively determine how β changes with the input signal-to-noise ratio (SNR). Based on the analysis results and considering the non-uniform effect of real-world noise on the speech signal, we propose an adaptive β-order GSS in which the spectral order β is adaptively updated, frame by frame, according to the local SNR in each critical band, following a sigmoid function. The performance of the proposed adaptive β-order GSS is finally evaluated objectively by segmental SNR (SEGSNR) and log-spectral distance (LSD), and subjectively by spectrograms and mean opinion score (MOS), using comprehensive experiments in various noise conditions. Experimental results show that the proposed algorithm yields an average SEGSNR increase of 2.99 dB and an average LSD reduction of 2.71 dB, which are much larger improvements than those obtained with the competing SS algorithms. The superiority of the proposed algorithm is also demonstrated by the highest MOS ratings obtained from the listening tests.
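
As a rough illustration of the idea in this abstract, the sketch below computes a β-order generalized spectral subtraction gain and maps local SNR to β through a sigmoid. It is a sketch under stated assumptions, not the paper's method: the gain form G = (1 - (N/Y)^β)^(1/β) is the common textbook GSS gain, and the sigmoid constants, the direction of the SNR-to-β mapping, the spectral floor, and the names `sigmoid_beta` and `gss_gain` are all hypothetical.

```python
import numpy as np

def sigmoid_beta(local_snr_db, beta_min=0.5, beta_max=2.0,
                 slope=0.2, center_db=5.0):
    """Map local SNR in dB to a spectral order (all constants are assumed)."""
    snr = np.asarray(local_snr_db, dtype=float)
    s = 1.0 / (1.0 + np.exp(-slope * (snr - center_db)))
    return beta_min + (beta_max - beta_min) * s

def gss_gain(noisy_mag, noise_mag, beta, floor=0.05):
    """Generalized spectral subtraction gain with a spectral floor."""
    noisy_mag = np.maximum(np.asarray(noisy_mag, dtype=float), 1e-12)
    ratio = (np.asarray(noise_mag, dtype=float) / noisy_mag) ** beta
    # Clamp at zero before taking the root so the gain stays real.
    gain = np.maximum(1.0 - ratio, 0.0) ** (1.0 / beta)
    return np.maximum(gain, floor)

# Example: three bands with the same noise level but rising signal level.
noisy = np.array([1.0, 2.0, 4.0])
noise = np.array([1.0, 1.0, 1.0])
snr_db = 10.0 * np.log10(np.maximum(noisy**2 - noise**2, 1e-12) / noise**2)
beta = sigmoid_beta(snr_db)
gain = gss_gain(noisy, noise, beta)
```

With β fixed at 2 this reduces to power spectral subtraction, and with β fixed at 1 to magnitude spectral subtraction; letting β follow the local SNR per band is what makes the scheme adaptive.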


Archive | 2011

Effects of microphone arrangements on the accuracy of a spherical microphone array (SENZI) in acquiring high-definition 3D sound space information

Jun'ichi Kodama; Shuichi Sakamoto; Satoshi Hongo; Takuma Okamoto; Yukio Iwaya; Yôiti Suzuki

We propose a three-dimensional sound space sensing system using a microphone array on a solid, human-head-sized sphere with numerous microphones, which is called SENZI (Symmetrical object with ENchased ZIllion microphones). It can acquire 3D sound space information accurately for recording and/or transmission to a distant place. Moreover, once recorded, the information might be reproduced accurately for any listener at any time. This study investigated the effects of microphone arrangement and the number of controlled directions on the accuracy of the sound space information acquired by SENZI. Results of a computer simulation indicated that the microphones should be arranged at an interval that is equal to or narrower than 5.7° to avoid the effect of spatial aliasing and that the number of controlled directions should be set densely at intervals of less than 5° when the microphone array radius is 85 mm.
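
A back-of-the-envelope check helps put the reported microphone interval in perspective. The sketch below assumes the interval is an angle in degrees on the sphere surface and applies the half-wavelength spacing rule of thumb for free-field arrays. The abstract does not say this rule was used, and a rigid sphere scatters sound, so the result is only an order-of-magnitude estimate. The function names and the speed of sound (343 m/s) are assumptions.

```python
import math

def arc_spacing_m(radius_m, interval_deg):
    """Arc length between neighbouring microphones on a sphere surface."""
    return radius_m * math.radians(interval_deg)

def half_wavelength_limit_hz(spacing_m, speed_of_sound=343.0):
    """Highest frequency whose half wavelength still spans the spacing."""
    return speed_of_sound / (2.0 * spacing_m)

spacing = arc_spacing_m(0.085, 5.7)        # 85 mm radius, 5.7 degree interval
limit = half_wavelength_limit_hz(spacing)  # rule-of-thumb aliasing limit
print(f"spacing = {spacing * 1000:.2f} mm, limit = {limit / 1000:.1f} kHz")
```

Under these assumptions the spacing comes out at roughly 8.5 mm and the limit on the order of 20 kHz, which is close to the upper edge of the audible range.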


Journal of the Acoustical Society of America | 2008

A two‐stage binaural speech enhancement approach for hearing aids with preserving binaural benefits in noisy environments

Junfeng Li; Shuichi Sakamoto; Satoshi Hongo; Masato Akagi; Yôiti Suzuki

Speech enhancement is one of the most crucial functions, if not the most crucial, in hearing aids, as hearing‐impaired people have great difficulty in understanding speech in noisy environments. In this paper, we propose a two‐stage binaural speech enhancement approach for hearing aids, which consists of interference estimation by pre‐trained adaptive filters and speech enhancement using Wiener filters. The main focus is the theoretical analysis of this system and experimental comparisons with traditional binaural speech enhancement approaches. The comparisons consider two aspects: interference suppression performance and the ability to preserve the binaural cues that give rise to the listener's own “binaural gain.” Finally, we give a general discussion of the proposed binaural speech enhancement algorithm, from theory, through implementation, to evaluation.


International Symposium on Chinese Spoken Language Processing | 2008

The Improved TS-BASE Approaches with Interference Compensation and Their Evaluations for Speech Enhancement

Junfeng Li; Shuichi Sakamoto; Satoshi Hongo; Masato Akagi; Yôiti Suzuki

We previously proposed a two-stage binaural speech enhancement with Wiener filter (TS-BASE/WF) approach for hearing aids in adverse environments [6]. In TS-BASE/WF, the interfering signal is estimated by cancelling the target signal through an adaptive filter in the first stage, and a time-variant Wiener filter is applied to enhance the target signal in the second stage. In this paper, we introduce an interference compensation approach, which is applied to the adaptive-filter output, to further improve the estimation accuracy of the interfering signal. The performance of TS-BASE with different speech enhancers is then investigated in different conditions. Experimental results show that the improved TS-BASE algorithms with interference compensation outperform the original TS-BASE algorithms, and that TS-BASE/WF gives higher speech enhancement performance than the improved TS-BASE algorithms with other speech enhancers.


Journal of the Acoustical Society of America | 2006

A new speech enhancement method for two‐input two‐output hearing aids

Junfeng Li; Shuichi Sakamoto; Yôiti Suzuki; Satoshi Hongo

Human beings have the ability to pick up a speech signal in noisy environments, which is known as the cocktail‐party effect. This ability is, however, often degraded in persons with impaired hearing. Therefore, a good method to restore it would be very useful for improving speech intelligibility in ambient noise for hearing‐impaired persons. In this lecture, multi‐input two‐output speech enhancement techniques are first summarized. Subsequently, a new two‐input two‐output speech enhancement method is proposed that is based on the frequency domain binaural model (FDBM) [Nakashima et al., Acoust. Sci. Technol. 24, 172–178 (2003)]. In the proposed method, the interaural differences are first calculated from the noisy observations and employed to determine the sound‐source locations. A speech absence probability (SAP) estimator is then developed using knowledge of estimated source locations. It is further integrated into the original FDBM to improve the interference‐suppression capability. Effecti...


Journal of the Acoustical Society of America | 2006

Binaural speech enhancement by complex wavelet transform based on interaural level and argument differences

Satoshi Hongo; Yôiti Suzuki

Binaural information might enhance speech signals in noisy environments. Most previous studies in this area have implemented signal processing in the time and frequency domains. For this study, an enhancement method using complex wavelet transform (CWT) was proposed. The CWT has a scale domain whose bandwidth is inversely proportional to the scale level and may therefore be well compared to auditory filters. The proposed processing procedure is the following: (1) By computing CWT for a sound signal at every direction of arrival (DOA), a database (DB) of the scale domain wavelet coefficients is prepared for every DOA. (2) For binaural input signals from an unknown direction, the scale domain wavelet coefficients are calculated using CWT. (3) The DOA of the input signal is estimated as the direction for which interaural level and argument differences calculated from wavelet coefficients are the most similar to those in the DB. (4) The input signal is segregated similarly to FDBM (Nakashima et al., 2003) based ...


International Conference on Wavelet Analysis and Pattern Recognition | 2012

Binaural speech enhancement method by wavelet transform based on interaural level and argument differences

Satoshi Hongo; Shuichi Sakamoto; Yōiti Suzuki

Human binaural processing might enhance signal sounds in noisy environments. Binaural speech enhancement with two outputs combines the merits of the signal processing itself and of human binaural processing. Most previous studies in this area have implemented signal processing in the time and frequency domains. The use of wavelet transform (WT) appears to be promising because it has a scale domain whose bandwidth is inversely proportional to the scale level. Therefore, it might well be compared to auditory filters. In this paper, a new binaural speech enhancement algorithm applying the complex wavelet transform is proposed. Objective and subjective evaluation experiments, with a directional target signal and an interference sound source generated by convolving HRTFs, were conducted to demonstrate the effectiveness of the proposed algorithm.


Journal of the Acoustical Society of America | 2012

Realization of sound space information acquisition system using a 252ch spherical microphone array

Shuichi Sakamoto; Jumpei Matsunaga; Satoshi Hongo; Takuma Okamoto; Yukio Iwaya; Yôiti Suzuki

Sensing of high-definition 3D sound-space information is important to realize total 3D spatial sound technology. Nevertheless, conventional methods cannot sense comprehensive 3D sound-space information at a listening point properly and precisely so that the information can be reproduced simultaneously for many individual remote listeners facing in different directions. To cope with this problem, we proposed a sensing method of 3D sound-space information based on symmetrically and densely arranged microphones called SENZI (Symmetrical object with ENchased Zillion microphones) (Sakamoto et al., 2008). In the system using SENZI, sensed signals from the respective microphones are simply weighted and summed to synthesize a listener's HRTF, reflecting the listener's facing direction. This method is expected to sense 3D sound-space information comprehensively in accordance with the head motion of listeners who are listening in remote places. Dynamic cues provided by the listener's motion are important to render ...

Collaboration


Dive into Satoshi Hongo's collaborations.

Top Co-Authors

Junfeng Li

Japan Advanced Institute of Science and Technology

Masato Akagi

Japan Advanced Institute of Science and Technology

Yukio Iwaya

Tohoku Gakuin University

Takuma Okamoto

National Institute of Information and Communications Technology
