Shozo Makino | Researchain

Publication

Featured research published by Shozo Makino.

International Conference on Acoustics, Speech, and Signal Processing | 2011

Bit rate reduction of the MELP coder using Lempel-Ziv segment quantization

Minoru Kohata; Motoyuki Suzuki; Akinori Ito; Shozo Makino

We previously proposed a new segment quantization method called Lempel-Ziv segment quantization (LZSQ), which is a modified version of Lempel-Ziv (LZ) coding that can be applied to a continuous information source. In the present paper, LZSQ is applied to the mixed excitation linear prediction (MELP) coder, a standardized vocoder-type speech coder that operates at 2.4 kbit/s, in order to reduce the bit rate below 2.4 kbit/s while preserving the quality of the coded speech. LZSQ is applied to six coding parameters of the MELP coder to reduce the total bit rate as much as possible. As a result, the total bit rate of the modified MELP coder was reduced to 1.57 kbit/s, while its subjective quality remained equivalent to that of the unmodified MELP coder.
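The bit rates quoted in this abstract can be put in per-frame terms with a little arithmetic. The sketch below assumes the 22.5 ms frame length of the standard 2.4 kbit/s MELP coder, which the abstract does not state; the function names are illustrative.

```python
# Back-of-the-envelope check of the bit rates quoted in the abstract.
# FRAME_MS is an assumption (the standard MELP frame length);
# it is not stated in the abstract itself.
FRAME_MS = 22.5


def bits_per_frame(rate_kbps: float, frame_ms: float = FRAME_MS) -> float:
    """Average number of bits spent on one frame at the given bit rate."""
    return rate_kbps * frame_ms  # kbit/s * ms = bits


def relative_reduction(old_kbps: float, new_kbps: float) -> float:
    """Fraction of the original bit rate that was saved."""
    return (old_kbps - new_kbps) / old_kbps


baseline = bits_per_frame(2.4)           # 54 bits per frame
modified = bits_per_frame(1.57)          # about 35.3 bits per frame on average
saving = relative_reduction(2.4, 1.57)   # roughly a 35% reduction
```

Under that assumption, the modified coder spends about 35 bits per frame on average instead of 54, a saving of roughly one third.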

Intelligent Information Hiding and Multimedia Signal Processing | 2010

Improvement of Packet Loss Concealment for MP3 Audio Based on Switching of Concealment Method and Estimation of MDCT Signs

Akinori Ito; Kiyoshi Konno; Masashi Ito; Shozo Makino

This paper describes packet loss concealment methods for MP3 audio. The proposed methods are based on estimation of the modified discrete cosine transform (MDCT) coefficients of the lost packets. The MDCT coefficients of lower dimensions are estimated by switching between two concealment methods: the sign correction method and the correlation-based method. The methods are switched based on redundant side information, calculated subband by subband, to reduce MDCT prediction errors. Next, a method for improving the estimation of MDCT coefficients of higher dimensions is proposed. This method estimates the absolute value and the sign of an MDCT coefficient independently. A subjective evaluation experiment showed that both improvements, for lower and for higher dimensions, improved the subjective audio quality.

2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications Workshops | 2011

Utterance Classification for Combination of Multiple Simple Dialog Systems

Seong-Jun Hahm; Akinori Ito; Kentaro Awano; Masashi Ito; Shozo Makino

This paper describes an utterance classification method for combining multiple dialog systems. To reduce the effort of developing spoken dialog systems, several dialog systems have been proposed that do not require complicated dialog descriptions. However, these systems are so simple that they accept only very limited types of dialogs. We propose building a spoken dialog system that accepts more flexible dialogs by combining these simple dialog systems. The combination of dialog systems is based on utterance classification. We conducted an utterance classification experiment, and 77.1% of the utterances, including out-of-task utterances, were correctly classified.
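The idea of routing each utterance to one of several simple dialog systems, or rejecting it as out-of-task, can be sketched as follows. This is a minimal illustration only: the keyword-overlap scoring, the task names, and the keyword sets are all hypothetical stand-ins and are not the classifier used in the paper.

```python
# Hypothetical keyword sets for two simple dialog systems.
# Neither the tasks nor the keywords come from the paper.
TASK_KEYWORDS = {
    "weather": {"weather", "rain", "sunny", "forecast"},
    "schedule": {"meeting", "schedule", "calendar", "appointment"},
}


def classify_utterance(utterance, task_keywords=TASK_KEYWORDS):
    """Pick the task whose keywords overlap most with the utterance.

    Returns "out-of-task" when no task keyword appears at all.
    Keyword overlap is an assumed stand-in for a real classifier.
    """
    words = set(utterance.lower().split())
    scores = {task: len(words & kws) for task, kws in task_keywords.items()}
    best_task = max(scores, key=scores.get)
    return best_task if scores[best_task] > 0 else "out-of-task"


route_a = classify_utterance("Will it rain tomorrow")
route_b = classify_utterance("Add a meeting to my calendar")
route_c = classify_utterance("Tell me a joke")
```

Once an utterance is labeled, it would be handed to the dialog system responsible for that task, while out-of-task utterances are handled separately.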

International Conference Natural Language Processing | 2010

Document expansion using relevant web documents for spoken document retrieval

Ryo Masumura; Akinori Ito; Yu Uno; Masashi Ito; Shozo Makino

Recently, automatic indexing of spoken documents using a speech recognizer has attracted attention. However, generating an index from an automatic transcription is problematic because the transcription contains many recognition errors and out-of-vocabulary (OOV) words. To solve this problem, we propose a document expansion method using Web documents. To recover important keywords that are included in the spoken document but lost through recognition errors, we acquire Web documents relevant to the spoken document. Then, an index of the spoken document is generated by combining the index generated from the automatic transcription with the index generated from the Web documents. We propose a method for retrieving relevant documents, and the experimental results show that the retrieved Web documents contained many OOV words. Next, we propose a method for combining the recognized index and the Web index. The experimental results show that the index generated by document expansion was closer to an index from the manual transcription than the index generated by the conventional method. Finally, we conducted a spoken document retrieval experiment, and the document-expansion-based index gave better retrieval precision than the conventional indexing method.
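One way to picture the index-combination step is as a weighted merge of two term-weight tables. The sketch below is illustrative only: the abstract does not say how the two indexes are combined, so the linear interpolation, the `web_weight` parameter, the function names, and the toy token lists are all assumptions.

```python
from collections import Counter


def build_index(tokens):
    """Relative term frequencies for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def combine_indexes(recognized, web, web_weight=0.3):
    """Linearly interpolate two term-weight dicts (assumed scheme).

    Terms missing from one index contribute a weight of 0 from it,
    so words lost by the recognizer can re-enter via the Web index.
    """
    terms = set(recognized) | set(web)
    return {
        t: (1 - web_weight) * recognized.get(t, 0.0) + web_weight * web.get(t, 0.0)
        for t in terms
    }


# Toy data: "vocoder" plays the role of a word missing from the transcription.
asr_tokens = ["speech", "recognition", "speech", "model"]
web_tokens = ["speech", "vocoder", "vocoder", "model"]
combined = combine_indexes(build_index(asr_tokens), build_index(web_tokens))
```

In this toy case the term "vocoder" gets a nonzero weight in the combined index even though it never appears in the recognized tokens, which is the effect document expansion aims for.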

Computer Speech & Language | 2014