Publication


Featured research published by Akemi Iida.


Speech Communication | 2003

A corpus-based speech synthesis system with emotion

Akemi Iida; Nick Campbell; Fumito Higuchi; Michiaki Yasumura

We propose a new approach to synthesizing emotional speech by a corpus-based concatenative speech synthesis system (ATR CHATR) using speech corpora of emotional speech. In this study, neither emotion-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created per emotion to synthesize speech with the appropriate emotion by simple switching between the emotional corpora. This is made possible by the normalization procedure incorporated in CHATR that transforms its standard predicted prosody range according to the source database in use. We evaluate our approach by creating three kinds of emotional speech corpus (anger, joy, and sadness) from recordings of a male and a female speaker of Japanese. The acoustic characteristics of each corpus are different and the emotions identifiable. The acoustic characteristics of each emotional utterance synthesized by our method show clear correlations to those of each corpus. Perceptual experiments using synthesized speech confirmed that our method can synthesize recognizably emotional speech. We further evaluated the method's intelligibility and the overall impression it gives to listeners. The results show that the proposed method can synthesize speech with high intelligibility and gives a favorable impression. With these encouraging results, we have developed a workable text-to-speech system with emotion to support the immediate needs of nonspeaking individuals. This paper describes the proposed method, the design and acoustic characteristics of the corpora, and the results of the perceptual evaluations.
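The central mechanism here is selecting an emotion simply by switching the source corpus, combined with CHATR's normalization of its standard predicted prosody range to the corpus in use. The sketch below is a minimal illustration of that idea only; CHATR exposes no such Python API, and the names EmotionalCorpus and normalize_prosody, as well as the numeric values, are hypothetical stand-ins.

```python
# Minimal, self-contained sketch of emotion-by-corpus-switching with
# prosody-range normalization. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class EmotionalCorpus:
    emotion: str          # "anger", "joy", or "sadness"
    f0_mean: float        # corpus-level prosodic statistics (Hz)
    f0_range: float

def normalize_prosody(predicted_f0, neutral_mean, neutral_range, corpus):
    """Shift and scale the synthesizer's standard predicted F0 contour
    into the prosodic range of the emotional corpus currently in use."""
    return [corpus.f0_mean + corpus.f0_range * (f - neutral_mean) / neutral_range
            for f in predicted_f0]

corpora = {
    "joy":     EmotionalCorpus("joy",     220.0, 80.0),
    "anger":   EmotionalCorpus("anger",   200.0, 90.0),
    "sadness": EmotionalCorpus("sadness", 160.0, 40.0),
}

# A standard (neutral) predicted F0 contour for one phrase, in Hz.
predicted_f0 = [180.0, 190.0, 175.0, 165.0]

# The emotion is chosen simply by switching which corpus is loaded.
target_f0 = normalize_prosody(predicted_f0, 175.0, 50.0, corpora["sadness"])
print(target_f0)
```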


International Journal of Speech Technology | 2003

Speech Database Design for a Concatenative Text-to-Speech Synthesis System for Individuals with Communication Disorders

Akemi Iida; Nick Campbell

ATR's CHATR is a corpus-based text-to-speech (TTS) synthesis system that selects concatenation units from a natural speech database. The system's approach enables us to create a voice output communication aid (VOCA) using the voices of individuals who are anticipating the loss of phonatory functions. The advantage of CHATR is that individuals can use their own voice for communication even after vocal loss. This paper reports on a case study of the development of a VOCA using recordings of Japanese read speech (i.e., oral reading) from an individual with amyotrophic lateral sclerosis (ALS). In addition to using the individual's speech, we designed a speech database that could reproduce the characteristics of natural utterances in both general and specific situations. We created three speech corpora in Japanese to synthesize ordinary daily speech (i.e., in a normal speaking style): (1) a phonetically balanced sentence set, to ensure that the system was able to synthesize all speech sounds; (2) readings of manuscripts written by the same individual, for synthesizing talks regularly given, as a source of natural intonation, articulation, and voice quality; and (3) words and short phrases, to provide daily vocabulary entries for reproducing natural utterances in predictable situations. By combining one or more corpora, we were able to create four kinds of source database for CHATR synthesis. Using each source database, we synthesized speech from six test sentences. We selected the source database to use by observing the selected units of synthesized speech and by performing perceptual experiments in which we presented the speech to 20 Japanese native speakers. Analyzing the results of both observations and evaluations, we selected a source database compiled from all corpora. Incorporating CHATR, the selected source database, and an input acceleration function, we developed a VOCA for the individual to use in his daily life. We also created emotional speech source databases designed for loading separately into the VOCA in addition to the compiled speech database.
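To make the database design concrete, the following sketch shows how three recorded corpora could be combined into alternative source databases for unit selection. The file names and the specific combinations are illustrative placeholders; the abstract does not spell out which four configurations were built.

```python
# Hedged sketch of combining corpora into candidate source databases.
# Contents and combination names are placeholders, not the actual layout.
balanced    = ["utt_b001.wav", "utt_b002.wav"]   # phonetically balanced sentences
manuscripts = ["utt_m001.wav"]                   # readings of the speaker's own manuscripts
phrases     = ["utt_p001.wav"]                   # daily words and short phrases

source_databases = {
    "balanced_only":        balanced,
    "balanced+manuscripts": balanced + manuscripts,
    "balanced+phrases":     balanced + phrases,
    "all_corpora":          balanced + manuscripts + phrases,
}

# The study selected the database compiled from all corpora after observing
# the units chosen during synthesis and running perceptual tests.
selected = source_databases["all_corpora"]
print(len(selected), "units available for concatenation")
```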


Asia Pacific Computer and Human Interaction | 1998

Emotional speech as an effective interface for people with special needs

Akemi Iida; Nick Campbell; Michiaki Yasumura

The paper describes an application concept of an affective communication system for people with disabilities and elderly people, summarizes the universal nature of emotion and its vocal expression, and reports on the work of designing a corpus database of emotional speech for speech synthesis in the proposed system. Three corpora of emotional speech (joy, anger, and sadness) have been designed and tested for use with CHATR, the concatenative speech synthesis system at ATR. Each text corpus was designed to bring out a speaker's emotion. The results of the perceptual experiments proved to be significant, as did the results for CHATR-synthesized speech. This indicates that the subjects successfully identified the emotion types of the synthesized speech from implicit phonetic information, and hence this study demonstrates the validity of using a corpus of emotional speech as a database for a concatenative speech synthesis system.


Journal of the Acoustical Society of America | 2008

Developing a bilingual communication aid for a Japanese ALS patient using voice conversion technique

Akemi Iida; Shimpei Kajima; Keiichi Yasu; John M. Kominek; Yasuhiro Aikawa; Takayuki Arai

A bilingual communication aid for a Japanese amyotrophic lateral sclerosis (ALS) patient has been developed. Our previous research showed that a corpus-based speech synthesis method was ideal for synthesizing speech with a voice quality identifiable as the patient's own. However, such a system requires recording a large amount of speech, which is a burden for the patient. In this study, a voice conversion technique was applied so that a smaller amount of recording is needed for synthesis. An English speech synthesis system with the patient's voice was developed using Festival, a corpus-based speech synthesizer, together with a voice conversion technique. Two methods for Japanese speech synthesis were attempted using the HTS toolkit. The first used an acoustic model built from all 503 recordings of the patient. The second used an acoustic model built from 503 wave files whose voice had been converted from a native speaker's to the patient's. The latter method requires fewer recordings from the patient. The result of the perceptual experiment showed that the voice synthesized with the latter was perceived as closer in voice quality to the patient's natural speech. Lastly, a GUI on Windows was developed for the patient to synthesize speech by typing in text.
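The two Japanese training configurations compared in this abstract can be summarized in a short sketch. HTS is trained with its own toolchain, so the build_acoustic_model function and the file names below are hypothetical placeholders that only make the data flow of the comparison explicit.

```python
# Hedged sketch of the two Japanese training configurations compared above.
# All functions and file names are illustrative placeholders.

def build_acoustic_model(wavefiles, label):
    """Placeholder for HMM acoustic-model training over a set of wave files."""
    return {"label": label, "num_utterances": len(wavefiles)}

# Method 1: train directly on the 503 utterances recorded by the patient.
patient_recordings = [f"patient_{i:03d}.wav" for i in range(503)]
model_direct = build_acoustic_model(patient_recordings, "direct")

# Method 2: record the same 503 utterances from a healthy native speaker,
# convert that voice toward the patient's, then train on the converted audio.
donor_recordings = [f"donor_{i:03d}.wav" for i in range(503)]
converted = [f"converted_{name}" for name in donor_recordings]   # voice conversion step
model_converted = build_acoustic_model(converted, "voice-converted")

# The perceptual test in the paper favored the voice-converted model,
# which also demands far less recording effort from the patient.
print(model_direct)
print(model_converted)
```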


Journal of the Acoustical Society of America | 2006

Building an English speech synthetic voice using a voice transformation model from a Japanese male voice

Akemi Iida; Shimpei Kajima; Keiichi Yasu; Takayuki Arai; Tsutomu Sugawara

This work reports the development of an English synthetic voice using a voice transformation model for a Japanese amyotrophic lateral sclerosis patient, as part of a project to develop a bilingual communication aid for this patient. The patient, who had a tracheotomy 3 years ago and has difficulty in speaking, wishes to speak in his own voice in his native language and in English. A Japanese speech synthesis system was developed using ATR CHATR 6 years ago, and the authors have worked on developing a diphone-based synthesis using the Festival speech synthesis system and FestVox by having the patient read the diphone list. However, it was not an easy task for the patient to phonate and, moreover, to pronounce words in a foreign language. We therefore used a voice transformation model in Festival to develop the patient's English synthetic voice, which enables text-to-speech synthesis. We trained using 30 sentences read by the patient and those synthesized with an existing Festival diphone voice create...
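As a rough illustration of the voice transformation idea only (not Festival's actual transformation model), the sketch below learns a frame-wise linear mapping from a source voice to the patient's voice from 30 parallel sentences. The feature extraction is a random placeholder standing in for real spectral analysis, and real systems would also time-align the parallel utterances.

```python
# Illustrative sketch of learning a source-to-target voice mapping from
# 30 parallel sentences; all file names and features are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def extract_features(wav_name, n_frames=200, dim=24):
    """Placeholder for per-frame spectral feature extraction (e.g. cepstra)."""
    return rng.standard_normal((n_frames, dim))

# 30 parallel sentences: patient recordings vs. the same text synthesized
# with an existing diphone voice.
src = np.vstack([extract_features(f"diphone_{i:02d}.wav") for i in range(30)])
tgt = np.vstack([extract_features(f"patient_{i:02d}.wav") for i in range(30)])

# A single linear mapping src -> tgt stands in for the trained
# transformation model; richer models (e.g. GMM-based) are used in practice.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# Applying the mapping to new source frames approximates the target voice.
converted = extract_features("new_sentence.wav") @ W
print(converted.shape)
```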


ISCA Workshop on Speech and Emotion | 2000

A Speech Synthesis System with Emotion for Assisting Communication

Akemi Iida; Nick Campbell; Soichiro Iga; Fumito Higuchi; Michiaki Yasumura


Conference of the International Speech Communication Association | 1998

Acoustic nature and perceptual testing of corpora of emotional speech.

Akemi Iida; Nick Campbell; Soichiro Iga; Fumito Higuchi; Michiaki Yasumura


SSW | 2001

A database design for a concatenative speech synthesis system for the disabled.

Akemi Iida; Nick Campbell


IPSJ SIG Notes | 2008

Development of a Japanese and English Speech Synthesis System Based on HMM Using Voice Conversion for People with Speech Communication Disorders

Shimpei Kajima; Akemi Iida; Keiichi Yasu; Yasuhiro Aikawa; Takayuki Arai; Tsutomu Sugawara


Conference of the International Speech Communication Association | 2001

Communication aid for non-vocal people using corpus-based concatenative speech synthesis.

Akemi Iida; Nick Campbell; Michiaki Yasumura

Collaboration


Dive into Akemi Iida's collaborations.

Top Co-Authors

Kiyoaki Aikawa

Tokyo University of Technology