Yusuke Hiwasaki
Spacelabs Healthcare
Publication
Featured research published by Yusuke Hiwasaki.
IEICE Transactions on Information and Systems | 2006
Yusuke Hiwasaki; Hitoshi Ohmuro; Takeshi Mori; Sachiko Kurihara; Akitoshi Kataoka
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it is highly interoperable with existing terminals, so that costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multi-point remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker is selected from among the remote locations and its signal is redistributed with the mixed lower-band signal. Subjective evaluations show that the speech quality can be maintained even when the speech signals are only partially mixed.
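The partial-mixing idea can be sketched in a few lines. This is an illustrative toy, not the paper's exact algorithm: the energy-based main-speaker selection and the per-site signal lists are assumptions of this sketch; only the structural point (mix the lower band, forward one site's higher band) comes from the abstract.

```python
# Hedged sketch of partial mixing for a multi-point conference: only the
# lower-band signals are mixed sample-by-sample; the higher band of one
# selected "main speaker" site is forwarded unmixed alongside the mix.
# The energy-based selection below is an illustrative assumption.

def partial_mix(lower_bands, higher_bands):
    """lower_bands / higher_bands: lists of per-site sample lists."""
    # Mix only the lower band: one addition per sample per site.
    mixed_lower = [sum(samples) for samples in zip(*lower_bands)]
    # Choose the main speaker as the site with the largest lower-band energy.
    energies = [sum(s * s for s in lb) for lb in lower_bands]
    main = energies.index(max(energies))
    # Pair the mixed lower band with the main speaker's higher band.
    return mixed_lower, higher_bands[main]
```

Because the higher band is never mixed, the per-frame cost grows with the number of sites only in the lower band, which is the complexity saving the abstract describes.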
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Shoichi Koyama; Ken'ichi Furuya; Yusuke Hiwasaki; Yoichi Haneda
For transmission of a physical sound field in a large area, it is necessary to transform received signals of a microphone array into driving signals of a loudspeaker array to reproduce the sound field. We propose a method for transforming these signals by using planar or linear arrays of microphones and loudspeakers. A continuous transform equation is analytically derived based on the physical equation of wave propagation in the spatio-temporal frequency domain. By introducing spatial sampling, the uniquely determined transform filter, called a wave field reconstruction filter (WFR filter), is derived. Numerical simulations show that the WFR filter can achieve the same performance as that obtained using the conventional least squares (LS) method. However, since the proposed WFR filter is represented as a spatial convolution, it has many advantages in filter design, filter size, computational cost, and filter stability over the transform filter designed by the LS method.
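The filter-size advantage of a convolutional transform can be illustrated with a short sketch. The filter taps below are made up for illustration; the actual WFR filter is derived analytically in the spatio-temporal frequency domain. The point shown is only structural: a spatial convolution costs a fixed number of multiply-adds per output (the filter length), whereas a dense LS-designed matrix costs one multiply-add per microphone per output.

```python
# Toy spatial convolution along the array axis, applied to one time
# snapshot of microphone signals. The taps in h are illustrative, not a
# derived WFR filter; zero-padding at the array ends is an assumption.

def spatial_convolution(mic_row, h):
    """Convolve one snapshot of mic signals with a short spatial filter h."""
    half = len(h) // 2
    out = []
    for n in range(len(mic_row)):
        acc = 0.0
        for k, hk in enumerate(h):
            m = n + k - half
            if 0 <= m < len(mic_row):  # zero-padded array boundary
                acc += hk * mic_row[m]
        out.append(acc)
    return out
```

Each driving signal touches only `len(h)` neighbors, which is why the convolutional filter wins on filter size, computational cost, and stability over a full LS transform matrix.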
international conference on acoustics, speech, and signal processing | 2001
Jes Thyssen; Yang Gao; Adil Benyassine; Eyal Shlomot; Carlo Murgia; Huan-Yu Su; Kazunori Mano; Yusuke Hiwasaki; Hiroyuki Ehara; Kazutoshi Yasunaga; Claude Lamblin; Balazs Kovesi; Joachim Stegmann; Hong-Goo Kang
This paper presents the 4 kbit/s speech coding candidate submitted by AT&T, Conexant, Deutsche Telekom, France Telecom, Matsushita, and NTT for the ITU-T 4 kbit/s selection phase. The algorithm was developed jointly based on the qualification version of Conexant. This paper focuses on the development carried out during the collaboration in order to narrow the gap to the requirements in an attempt to provide toll quality at 4 kbit/s. This objective is currently being verified in independent subjective tests coordinated by ITU-T and carried out in multiple languages. Subjective tests carried out during the development indicate that the collaboration work has been successful in improving the quality, and that meeting a majority of the requirements in the extensive selection phase test is a realistic goal.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Shoichi Koyama; Ken'ichi Furuya; Yusuke Hiwasaki; Yoichi Haneda
It has been possible to reproduce point sound sources between listeners and a loudspeaker array by using the focused-source method. However, this method requires physical parameters of the sound sources to be reproduced, such as source positions, directions, and original signals. This makes it difficult to apply the method to real-time reproduction systems, because decomposing received signals into such parameters is not a trivial task. This paper proposes a method for recreating virtual sound sources in front of a planar or linear loudspeaker array. The method is based on wave field synthesis but extended to include the inverse wave propagator often used in acoustical holography. Virtual sound sources can be placed between listeners and a loudspeaker array even when only the received signals of a microphone array aligned with the loudspeaker array are known. Numerical simulation results are presented to compare the proposed and focused-source methods. A system consisting of linear microphone and loudspeaker arrays was constructed, and measurement experiments were conducted in an anechoic room. Comparing the sound field reproduced using the proposed method with that using the focused-source method, it was found that the proposed method could reproduce the sound field with almost the same accuracy.
IEEE Communications Magazine | 2009
Yusuke Hiwasaki; Hitoshi Ohmuro
In March 2008 the ITU-T approved a new wideband speech codec called ITU-T G.711.1. This Recommendation extends G.711, the most widely deployed speech codec, to 7 kHz audio bandwidth and is optimized for voice over IP applications. The most important feature of this codec is that the G.711.1 bitstream can be transcoded into a G.711 bitstream by simple truncation. G.711.1 operates at 64, 80, and 96 kb/s, and is designed to achieve very short delay and low complexity. ITU-T evaluation results show that the codec fulfils all the requirements defined in the terms of reference. This article presents the codec requirements and design constraints, describes how standardization was conducted, and reports on the codec performance and its initial deployment.
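The truncation property described above can be sketched directly. The per-frame layer sizes used here (a 40-octet G.711 core plus two 10-octet enhancement layers over a 5-ms frame, matching the 64/80/96 kb/s rates) follow from simple arithmetic on the stated bit rates, but treat the exact framing as an assumption of this sketch rather than a normative description of the Recommendation.

```python
# Illustrative sketch of G.711.1's layered bitstream: the G.711 core octets
# come first, followed by enhancement layers, so "transcoding" down to
# plain G.711 is simple truncation. Layer sizes assume a 5-ms frame:
# 64 kb/s -> 40 octets core; each 16 kb/s enhancement layer -> 10 octets.

CORE_OCTETS = 40    # 64 kb/s core over 5 ms
LAYER1_OCTETS = 10  # +16 kb/s -> 80 kb/s
LAYER2_OCTETS = 10  # +16 kb/s -> 96 kb/s

def truncate_to_g711(frame_96k):
    """Drop the enhancement layers; the remainder is a G.711 payload."""
    return frame_96k[:CORE_OCTETS]
```

No signal processing is needed at the gateway, which is what makes the codec interoperable with the installed base of G.711 terminals.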
international conference on acoustics, speech, and signal processing | 2010
Noboru Harada; Yutaka Kamamoto; Takehiro Moriya; Yusuke Hiwasaki; Michael A. Ramalho; Lorin Netsch; Jacek Stachurski; Lei Miao; Herve Marcel Taddei; Fengyan Qi
The ITU-T Recommendation G.711 is the benchmark standard for narrowband telephony. It has been successful for many decades because of its proven voice quality, ubiquity and utility. A new ITU-T Recommendation, denoted G.711.0, has recently been established, defining a lossless compression for G.711 packet payloads typically found in IP networks. This paper presents a brief overview of the technologies employed within the G.711.0 standard and summarizes the compression and complexity results. It is shown that G.711.0 provides greater than 50% average compression in typical service-provider environments while keeping low computational complexity for the encoder/decoder pair (1.0 WMOPS average, <1.7 WMOPS worst case) and a low memory footprint (about 5k octets of RAM, 5.7k octets of ROM, and 3.6k of program memory, measured in number of basic operators).
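The principle that makes lossless compression of G.711 payloads possible can be shown with a toy example; this is not the G.711.0 toolset, just the generic predict-then-code idea. Speech samples vary slowly, so a simple predictor turns them into small residuals that an entropy coder could pack into fewer bits (the bit counting below is a rough stand-in for an actual entropy coder).

```python
# Toy illustration of lossless coding of predictable payloads (generic
# predict-and-code, NOT the actual G.711.0 algorithms): a first-order
# predictor yields small, cheap-to-code residuals, and the original
# samples are exactly recoverable by summing the residuals back up.

def residuals(samples):
    prev = 0
    out = []
    for x in samples:
        out.append(x - prev)  # predict "same as last sample"
        prev = x
    return out

def rough_bits(values):
    # Crude cost model: 1 sign bit + magnitude bits per value.
    return sum(1 + abs(v).bit_length() for v in values)
```

On any slowly varying input the residual stream costs far fewer bits than the raw samples, while remaining exactly invertible, which is the essence of lossless payload compression.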
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Shoichi Koyama; Ken’ichi Furuya; Yusuke Hiwasaki; Yoichi Haneda; Yôiti Suzuki
For sound field reproduction that includes height (with-height reproduction), it is more efficient to record and reproduce the sound field with lower resolution in elevation than in azimuth, owing to the spatial resolution of human auditory perception. We propose a sound field reproduction method using horizontally arranged cylindrical arrays of microphones and loudspeakers, which is based on the wave field reconstruction (WFR) filter. With cylindrical array configurations, it is possible to reproduce sound waves arriving from upper and lower directions with a smaller number of array elements at angular positions. The WFR filter is analytically derived in the cylindrical harmonic domain and allows direct transformation from the received signals of the microphones into the driving signals of the loudspeakers. A model in which microphones are mounted on a rigid cylindrical baffle is introduced to stabilize the WFR filter. Numerical simulation results indicated that the reproduction accuracy in the neighboring region along the central axis of the cylindrical array was better preserved when using the proposed method than when using the method with planar arrays.
international conference on acoustics, speech, and signal processing | 2008
Yusuke Hiwasaki; Takeshi Mori; Shigeaki Sasaki; Hitoshi Ohmuro; Akitoshi Kataoka
This paper describes an ITU-T G.711 embedded wideband speech coder, submitted as a candidate to the qualification phase of the ITU-T G.711 wideband extension standardization. The codec generates a bitstream comprising three layers: a G.711 core layer with noise shaping, a time-domain weighted vector-quantized narrowband enhancement layer, and an MDCT-based higher-band enhancement layer. Through subjective evaluations, the coder was found to meet all tested requirements and objectives set in the terms of reference, with a low computational complexity of 10 WMOPS.
workshop on applications of signal processing to audio and acoustics | 2011
Shoichi Koyama; Ken'ichi Furuya; Yusuke Hiwasaki; Yoichi Haneda
In this paper, we propose a novel method of sound field reproduction using a microphone array and loudspeaker array. Our objective is to obtain the driving signals of a planar or linear loudspeaker array only from the sound pressure distribution acquired by the planar or linear microphone array. In this study, we derive a formulation of the transform from the received signals of the microphone array to the driving signals of the loudspeaker array. The transform is achieved by means of a filter in the spatio-temporal frequency domain. Numerical simulation results are presented to compare the proposed method with the method based on the conventional least-squares algorithm. The reproduction accuracies were found to be almost the same; however, the filter size and amount of calculation required for the proposed method were much smaller than those for the least-squares-based one.
workshop on applications of signal processing to audio and acoustics | 2013
Shoichi Koyama; Ken'ichi Furuya; Yusuke Hiwasaki; Yoichi Haneda
Sound field reproduction methods calculate driving signals of loudspeakers to reproduce a desired sound field. In common recording and reproduction systems, only sound pressures at multiple positions in the recording room are known as the desired sound field; therefore, algorithms that transform sound pressures into driving signals (SP-DS conversion) are necessary. Although several SP-DS conversion methods have been proposed, they do not take into account a priori information about the recorded sound field. However, approximate positions of the sound sources can be obtained by using the received signals of the microphones or other sensor data. We propose an SP-DS conversion method based on maximum a posteriori (MAP) estimation for planar or linear array configurations of microphones and loudspeakers. The basis functions and their coefficients for representing the driving signals of the loudspeakers are optimized based on the prior information about the source positions. Numerical simulation results indicate that the proposed method achieves higher reproduction accuracy than current SP-DS conversion methods, especially at frequencies above the spatial Nyquist frequency.
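The role of the prior in MAP estimation can be illustrated with a minimal scalar sketch; the paper's actual formulation operates on array signals in the spatio-temporal frequency domain, so treat everything below as a generic illustration of the estimator, not the proposed method.

```python
# Generic scalar MAP estimate for a linear observation y = a*x + noise
# with a Gaussian prior on x centered at x0 and precision lam. The MAP
# estimate minimizes (y - a*x)^2 + lam*(x - x0)^2, giving the closed form
# (a*y + lam*x0) / (a*a + lam): a blend of the data fit and the prior.

def map_estimate(a, y, lam, x0):
    """MAP estimate of x; lam = 0 recovers least squares, large lam -> x0."""
    return (a * y + lam * x0) / (a * a + lam)
```

With `lam = 0` this reduces to the ordinary least-squares solution, and as `lam` grows the estimate is pulled toward the prior mean `x0`; this is the mechanism by which prior knowledge of source positions can regularize an SP-DS conversion.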