
Publications


Featured research published by Yuma Koizumi.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Pinpoint extraction of distant sound source based on DNN mapping from multiple beamforming outputs to prior SNR

Kenta Niwa; Yuma Koizumi; Tomoko Kawase; Kazunori Kobayashi; Yusuke Hioka

We propose a method for estimating the prior signal-to-noise ratio (SNR), which is used to calculate the Wiener filter for distant sound source extraction, from the output signals of beamforming via statistical mapping based on a deep neural network (DNN). Since informative features for estimating the prior SNR are contained in multiple beamforming outputs, the SNR can be estimated accurately by this DNN-based mapping. The proposed method was applied to a large microphone array whose design was optimized to form effective directivity patterns for extracting distant sound sources. Experimental results showed that the target source was clearly extracted with the proposed method.
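
For readers unfamiliar with the pipeline, the final step the abstract describes is standard: the estimated prior SNR is converted into a Wiener gain per time-frequency bin. A minimal numpy sketch, in which estimate_prior_snr is a hypothetical placeholder for the paper's trained DNN mapping:

```python
import numpy as np

def wiener_gain(prior_snr):
    """Standard Wiener gain G = xi / (1 + xi), with xi the prior SNR (linear scale)."""
    return prior_snr / (1.0 + prior_snr)

def estimate_prior_snr(beam_spectra, noise_floor=1e-3):
    """Hypothetical stand-in for the paper's trained DNN: it would map the
    spectra of multiple beamforming outputs to a per-bin prior-SNR estimate.
    Here we just average the beams' power as a placeholder."""
    power = np.mean(np.abs(beam_spectra) ** 2, axis=0)
    return np.maximum(power / noise_floor - 1.0, 0.0)

# Toy usage: 4 beams x 257 frequency bins for one frame.
rng = np.random.default_rng(0)
beams = rng.normal(size=(4, 257)) + 1j * rng.normal(size=(4, 257))
xi = estimate_prior_snr(beams)
enhanced = wiener_gain(xi) * beams[0]  # apply the gain to a reference beam
```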


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

DNN-based source enhancement self-optimized by reinforcement learning using sound quality measurements

Yuma Koizumi; Kenta Niwa; Yusuke Hioka; Kazunori Kobayashi; Yoichi Haneda

We investigated whether a deep neural network (DNN)-based source enhancement function can be self-optimized by reinforcement learning (RL). A DNN is a powerful tool for describing the relationship between two sets of variables and can be useful for designing a source enhancement function. By training the DNN on a huge amount of training data, the sound quality of the output signals is improved. However, collecting a huge amount of training data is often difficult in practice. To use limited training data efficiently, we focus on “self-optimization” of the DNN-based source enhancement function via RL, which is commonly utilized in developing game-playing computers. As the reward for RL, quantitative metrics that reflect a human's perceptual score, e.g., perceptual evaluation methods for audio source separation (PEASS), are utilized. To investigate whether the sound quality is improved by RL-based source enhancement, subjective tests were conducted. They confirmed that the output sound quality of the RL-based source enhancement function improved as the number of iterations increased and finally outperformed the conventional method.
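
A highly simplified sketch of the reward-driven loop, assuming nothing about the paper's actual RL algorithm: the DNN is replaced by a single gain parameter, PEASS by a toy reward, and the RL update by hill climbing. Only the overall pattern (perturb, score with a perceptual reward, keep improvements) is illustrative:

```python
import numpy as np

def perceptual_reward(output):
    """Placeholder for a perceptual metric such as PEASS; here, just negative
    distance to a hidden 'ideal' signal so the loop has something to optimize."""
    ideal = np.sin(np.linspace(0, 2 * np.pi, output.size))
    return -np.mean((output - ideal) ** 2)

def enhance(x, params):
    """Stand-in 'enhancement function': a single gain. The paper uses a DNN."""
    return params * x

# Simple hill climbing as a stand-in for the paper's RL procedure: perturb the
# parameter, keep the perturbation if the perceptual reward improves.
rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 2 * np.pi, 128)) + 0.3 * rng.normal(size=128)
params, best = 1.0, -np.inf
for _ in range(200):
    cand = params + 0.05 * rng.normal()
    r = perceptual_reward(enhance(x, cand))
    if r > best:
        params, best = cand, r
print(f"reward after self-optimization: {best:.4f}")
```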


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Integrated approach of feature extraction and sound source enhancement based on maximization of mutual information

Yuma Koizumi; Kenta Niwa; Yusuke Hioka; Kazunori Kobayashi; Hitoshi Ohmuro

We investigated informative acoustic feature extraction based on dimension reduction for collecting target sources on a noisy sports field. Although a Wiener filter is often used for sound source enhancement, it is difficult to design the Wiener filter accurately using spatial cues alone because the noise on a sports field (e.g., cheering from spectators) arrives from the same direction as the target source. A statistical approach is used to estimate the Wiener filter from pre-trained acoustic feature models. However, it is unknown which acoustic features are informative, i.e., provide a powerful clue for clear extraction of the target source. For this study, we developed a method for optimizing a projection matrix for dimension reduction by maximizing the mutual information between acoustic features and the Wiener filter. Through experiments using two directional microphones on a mock sports field, we confirmed that the proposed method outperformed previous methods in terms of both noise reduction and the quality of the recovered sound sources.
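
A toy sketch of the idea under a joint-Gaussian simplification, where MI(w^T x; y) = -0.5*log(1 - rho^2), so maximizing the projection/target correlation maximizes this MI proxy; the paper instead derives the objective for a full projection matrix without this assumption, and the finite-difference ascent here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy data: 20-dim acoustic features x, scalar Wiener-filter target y that
# depends only on the first two dimensions.
x = rng.normal(size=(1000, 20))
y = x[:, 0] - 0.5 * x[:, 1] + 0.1 * rng.normal(size=1000)

def gaussian_mi(w, x, y):
    """Under a joint-Gaussian assumption, MI(w^T x; y) = -0.5*log(1 - rho^2)."""
    rho = np.corrcoef(x @ w, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

# Crude finite-difference ascent on the MI proxy (the paper derives an
# analytic, differentiable objective instead).
w = rng.normal(size=20)
for _ in range(300):
    grad = np.array([(gaussian_mi(w + 1e-4 * np.eye(20)[i], x, y)
                      - gaussian_mi(w, x, y)) / 1e-4 for i in range(20)])
    w += 0.5 * grad
print("learned weights (first 3 dims):", np.round(w[:3] / np.abs(w).max(), 2))
```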


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Binaural sound generation corresponding to omnidirectional video view using angular region-wise source enhancement

Kenta Niwa; Yuma Koizumi; Kazunori Kobayashi; Hisashi Uematsu

Web applications for watching omnidirectional video through head-mounted displays (HMDs) or smartphones have become widely available. The goal of this study was to generate binaural sounds corresponding to the user's viewpoint. Assuming that a microphone array is used for sound recording, an enhanced signal can be extracted for each angular region. By convolving head-related transfer functions (HRTFs) with the enhanced signals and re-synthesizing them, binaural sounds corresponding to the user's viewpoint can be generated virtually. In this paper, we propose a method for achieving angular region-wise source enhancement by generating a multichannel Wiener filter based on the power spectral density (PSD)-estimation-in-beamspace method. To measure users' sound localization while watching omnidirectional video through an HMD, we used a system that generates binaural sounds corresponding to the user's viewpoint in real time. Through subjective tests, we confirmed that sound localization corresponding to the user's viewpoint can be obtained when applying source enhancement with angular regions of about 40 degrees.
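
The re-synthesis step is straightforward to sketch: convolve each angular region's enhanced signal with the HRTF pair for that region's direction (selected according to the current viewpoint) and sum. A minimal version with random placeholder signals and HRTFs instead of measured ones:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(region_signals, hrtf_left, hrtf_right):
    """Convolve each angular region's enhanced signal with the HRTF pair for
    that region's direction, then sum over regions to form the binaural pair."""
    left = sum(fftconvolve(s, h) for s, h in zip(region_signals, hrtf_left))
    right = sum(fftconvolve(s, h) for s, h in zip(region_signals, hrtf_right))
    return left, right

# Toy usage: 9 regions (about 40 degrees each over 360), random placeholder
# signals and 128-tap placeholder HRTFs.
rng = np.random.default_rng(3)
regions = [rng.normal(size=16000) for _ in range(9)]
hl = [rng.normal(size=128) * np.exp(-np.arange(128) / 32) for _ in range(9)]
hr = [rng.normal(size=128) * np.exp(-np.arange(128) / 32) for _ in range(9)]
left, right = render_binaural(regions, hl, hr)
```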


Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2015

Informative acoustic feature selection on microphone array Wiener filtering for collecting target source on sports ground

Yuma Koizumi; Kenta Niwa; Yusuke Hioka; Kazunori Kobayashi; Hitoshi Ohmuro

We propose a Wiener filter design method for collecting target sources on a noisy sports field. Because the noise on a sports field, e.g., cheering from the audience, arrives from the same direction as the target source, it is difficult to design a Wiener filter accurately using spatial cues alone. This study focused on a combination of spatial cues and acoustic feature modeling. The Wiener filter in our method was designed using a Gaussian-mixture-model-based mapping function with informative acoustic features selected automatically from observations pre-enhanced using spatial cues. Through experiments using two directional microphones on a mock sports field, it was confirmed that the proposed method outperformed previous methods that used either spatial cues or acoustic feature modeling alone.
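
A sketch of a GMM-based mapping function on toy data, assuming the usual MMSE formulation: fit a GMM on the joint feature/target vectors, then estimate E[y|x] as a responsibility-weighted sum of per-component conditional means. Feature selection and the spatial pre-enhancement are omitted, and all data are invented:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Toy joint data: 2-D acoustic feature x, scalar Wiener-gain target y.
rng = np.random.default_rng(4)
x = rng.normal(size=(2000, 2))
y = 1.0 / (1.0 + np.exp(-(x[:, 0] + x[:, 1])))  # gain in [0, 1]
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(np.column_stack([x, y]))

def gmm_regress(xq, gmm, dx=2):
    """MMSE estimate E[y|x] from a joint GMM: responsibility-weighted sum of
    per-component conditional means (the classic GMM mapping function)."""
    resp = np.zeros((len(xq), gmm.n_components))
    cond = np.zeros((len(xq), gmm.n_components))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mux, muy = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        resp[:, k] = gmm.weights_[k] * multivariate_normal(mux, Sxx).pdf(xq)
        cond[:, k] = (muy + (xq - mux) @ np.linalg.solve(Sxx, Sxy)).ravel()
    resp /= resp.sum(axis=1, keepdims=True)
    return (resp * cond).sum(axis=1)

print(np.round(gmm_regress(x[:5], gmm), 3), np.round(y[:5], 3))
```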


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Supervised source enhancement composed of nonnegative auto-encoders and complementarity subtraction

Kenta Niwa; Yuma Koizumi; Tomoko Kawase; Kazunori Kobayashi; Yusuke Hioka

A method for constructing deep neural networks (DNNs) for accurate supervised source enhancement is proposed. Previous studies attempted to use DNNs to estimate the power spectral densities (PSDs) of sound sources, which are used to estimate Wiener filters for source enhancement, from the outputs of multiple beamformers. Although performance improved, accurate PSD estimation could not be guaranteed since the trained DNNs were treated as black boxes. The proposed DNN construction method uses non-negative auto-encoders and complementarity subtraction. This study also reveals that auto-encoders whose weights are non-negative correspond to non-negative matrix factorization (NMF), which decomposes source PSDs into non-negative spectral bases and their activations. It further introduces a complementarity subtraction method for estimating PSDs accurately. Through several experiments, it was confirmed that the signal-to-interference-plus-noise ratio improved by approximately 12 dB on datasets captured in various noisy and reverberant rooms.
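
The NMF correspondence the abstract mentions is easy to illustrate. Below is plain NMF with multiplicative updates (Euclidean cost) on a toy PSD-like matrix; the paper's non-negative auto-encoder performs an equivalent decomposition, with complementarity subtraction built on top:

```python
import numpy as np

def nmf(V, rank, iters=200, eps=1e-9):
    """Plain NMF with multiplicative updates (Euclidean cost): V ~ W @ H with
    W, H >= 0, i.e., non-negative spectral bases (W) and activations (H)."""
    rng = rng_local = np.random.default_rng(5)
    F, T = V.shape
    W = rng.uniform(0.1, 1.0, (F, rank))
    H = rng.uniform(0.1, 1.0, (rank, T))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy PSD-like matrix: 64 frequency bins x 100 frames from 3 hidden bases.
rng = np.random.default_rng(6)
V = np.abs(rng.normal(size=(64, 3))) @ np.abs(rng.normal(size=(3, 100)))
W, H = nmf(V, rank=3)
print("relative reconstruction error:",
      np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```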


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Intra-note segmentation via sticky HMM with DP emission

Yuma Koizumi; Katunobu Itou

This paper presents an intra-note segmentation method for monophonic recordings based on acoustic feature variation; each musical note is separated into onset, steady, and offset states. The task of intra-note segmentation from audio signals is to detect change points of acoustic features. In the proposed method, a Markov process is assumed for the state transitions, and the time-varying acoustic features are represented by three Dirichlet processes (DPs), one emitted by each state. To express this generative process, a sticky hidden Markov model (HMM) with DP emissions is employed. This modeling allows us to estimate the state transitions automatically while avoiding the model selection problem, by assuming a countably infinite number of possible acoustic features in musical notes. Experimental results show that the detection accuracies of the onset-to-steady and steady-to-offset transitions improved by 2.3 points and 20.7 points, respectively, over a previous method.
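
A drastically simplified sketch of the segmentation idea: a three-state left-to-right HMM with "sticky" (heavily weighted) self-transitions and single-Gaussian emissions, decoded by Viterbi. The paper instead learns the stickiness and emissions nonparametrically with DP priors; all numbers below are invented for illustration:

```python
import numpy as np

def viterbi(log_obs, log_A, log_pi):
    """Most likely state path for an HMM given per-frame emission log-likelihoods."""
    T, S = log_obs.shape
    delta = log_pi + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# 3 states: onset / steady / offset. Sticky self-transitions (large diagonal)
# discourage spurious switching.
log_A = np.log(np.array([[0.95, 0.05, 1e-6],
                         [1e-6, 0.95, 0.05],
                         [1e-6, 1e-6, 1.00]]))
log_pi = np.log(np.array([0.98, 0.01, 0.01]))

# Toy 1-D feature (e.g., frame energy): rising onset, flat steady, decay.
feat = np.concatenate([np.linspace(0, 1, 20), np.ones(40), np.linspace(1, 0, 30)])
means = np.array([0.5, 1.0, 0.4])          # per-state Gaussian emission means
log_obs = -0.5 * ((feat[:, None] - means) / 0.2) ** 2
states = viterbi(log_obs, log_A, log_pi)
print("segment boundaries at frames:", np.flatnonzero(np.diff(states)) + 1)
```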


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

On relationships between amplitude and phase of short-time Fourier transform

Suehiro Shimauchi; Shinya Kudo; Yuma Koizumi; Ken'ichi Furuya

The relationships between the amplitude and phase of the short-time Fourier transform (STFT) are investigated. By choosing a Gaussian window for the STFT, we reveal that the group delay and instantaneous frequency of each signal segment, both of which are derived from the phase by definition, can also be explicitly linked with the amplitude. As a result, the amplitude and phase can be linked through the group delay or instantaneous frequency without making any assumptions about the phase property of the target signals, e.g., minimum, maximum, or linear phase. The theoretical basis is also confirmed in numerical simulations.
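
The amplitude link itself is the paper's contribution and is not reproduced here, but the phase-derived quantities are easy to compute numerically. A sketch that estimates instantaneous frequency from the frame-to-frame phase advance of a Gaussian-window STFT (the usual phase-vocoder calculation), verified on a pure tone:

```python
import numpy as np
from scipy.signal import stft

fs, f0 = 16000, 440.0
n = np.arange(fs)
x = np.sin(2 * np.pi * f0 * n / fs)

nperseg, hop = 512, 64
f, t, Z = stft(x, fs=fs, window=("gaussian", 64.0), nperseg=nperseg,
               noverlap=nperseg - hop)

# Instantaneous frequency: phase advance per hop minus the expected advance
# for each bin, wrapped to (-pi, pi], then converted to Hz.
dphi = np.diff(np.angle(Z), axis=1)
expected = 2 * np.pi * f[:, None] * hop / fs
dev = np.angle(np.exp(1j * (dphi - expected)))
inst_freq = f[:, None] + dev * fs / (2 * np.pi * hop)

k = np.argmin(np.abs(f - f0))
print(f"median instantaneous frequency at bin {k}: "
      f"{np.median(inst_freq[k]):.1f} Hz (true: {f0} Hz)")
```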


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Informative Acoustic Feature Selection to Maximize Mutual Information for Collecting Target Sources

Yuma Koizumi; Kenta Niwa; Yusuke Hioka; Kazunori Kobayashi; Hitoshi Ohmuro

An informative acoustic-feature-selection method for collecting target sources in noisy environments is proposed. Wiener filtering is a powerful framework for sound-source enhancement. For Wiener-filter estimation, statistical-mapping functions, such as deep neural network based or Gaussian mixture model based mappings, have been used. In this framework, it is essential to find informative acoustic features that provide effective cues for Wiener-filter estimation. In this study, we measured the informativeness of acoustic features using mutual information between acoustic features and supervised Wiener-filter parameters, e.g., prior signal-to-noise ratios, and developed a method for automatically selecting informative acoustic features from a large number of feature candidates. To automatically select optimum features, we derived a differentiable objective function in proportion to mutual information based on the kernel method. Since the higher order correlations between acoustic features and Wiener-filter parameters are calculated using the kernel method, the statistical dependence of these variables is accurately calculated; thus, only meaningful acoustic features are selected. Through several experiments conducted on a mock sports field, we confirmed that the signal-to-distortion ratio score improved when various types of target sources were surrounded by loud cheering noise.
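
The paper derives its own differentiable kernel-based objective; as a stand-in, the Hilbert-Schmidt Independence Criterion (HSIC) below is a related kernel dependence measure that likewise captures higher-order correlations, so it can rank features that a linear correlation would miss. All data and parameters are invented for illustration:

```python
import numpy as np

def rbf_gram(v, gamma=1.0):
    """RBF kernel Gram matrix for a 1-D variable."""
    return np.exp(-gamma * (v[:, None] - v[None, :]) ** 2)

def hsic(x, y, gamma=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n-1)^2 with centering H."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x, gamma), rbf_gram(y, gamma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy feature ranking: y depends nonlinearly on feature 0 and not on feature 1,
# so a linear correlation would miss it but the kernel score should not.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)
scores = [hsic(X[:, j], y) for j in range(2)]
print("dependence scores per feature:", np.round(scores, 4))
```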


Journal of the Acoustical Society of America | 2012

Synthesis of performance expression of bowed string instruments using “Expression Mark Functions”

Yuma Koizumi; Katunobu Itou

This paper proposes a method for synthesizing the performance expression of bowed string instruments. To reflect a creator's intention in a synthetic sound, a physical model is attractive because the sound can be synthesized from the imagined performance. However, because some issues in the physics of bowed string instruments remain unresolved, synthesizing bowed string sounds using only a physical model is difficult. In this paper, a method using transfer functions for expression marks is proposed. The “expression mark functions” are estimated from recorded performance sounds, using the spectrum of the string motion and an inverse filter of the resonant properties of the instrument. Bowed string motion is a triangular wave called the Helmholtz wave, which can be determined from the bowing position. Changes in the waveform due to expressive performance were estimated by non-negative matrix factorization. The resonant properties of an instrument were measured as a TSP response using a “direct conduction speaker”. The expression mark funct...
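
The Helmholtz motion mentioned in the abstract is simple to sketch: an idealized asymmetric triangular wave whose rise/fall split follows the relative bowing position. This is only the excitation; the paper's expression mark functions and measured resonances are not modeled here:

```python
import numpy as np

def helmholtz_wave(f0, beta, fs=16000, dur=1.0):
    """Idealized Helmholtz string motion at the bowing point: an asymmetric
    triangular displacement wave whose rise/fall split is set by the relative
    bowing position beta (0 < beta < 1)."""
    t = np.arange(int(fs * dur)) / fs
    phase = (t * f0) % 1.0
    up = phase < beta
    wave = np.where(up, phase / beta, (1.0 - phase) / (1.0 - beta))
    return 2.0 * wave - 1.0  # center around zero

# Toy usage: a 220 Hz string bowed at 1/8 of its length. A real system would
# then filter this excitation by the instrument's measured (TSP) response.
x = helmholtz_wave(f0=220.0, beta=0.125)
print(x.shape, float(x.min()), float(x.max()))
```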

Collaboration


Dive into Yuma Koizumi's collaborations.

Top Co-Authors

Yusuke Hioka (University of Canterbury)

Noboru Harada (Nippon Telegraph and Telephone)

Yoichi Haneda (University of Electro-Communications)