
Publication


Featured research published by Shuichi Itahashi.


International Conference on Spoken Language Processing | 1996

Segmentation of spoken dialogue by interjections, disfluent utterances and pauses

Kazuyuki Takagi; Shuichi Itahashi

This paper attempts to segment spontaneous speech from human-to-human dialogues into relatively large units, i.e., sub-phrasal units delimited by interjections, disfluent utterances, and pauses. A spontaneous speech model incorporating prosody was developed, in which three kinds of speech segment models and the transition probabilities among them were specified. Segmentation experiments showed that 87.6% of segment boundaries were located correctly within 50 ms and 81.2% within 30 ms, a 10.1-point improvement over the initial model without prosodic information.
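A minimal sketch of the boundary-accuracy measure reported above: a predicted segment boundary counts as correct when it falls within a time tolerance (e.g. 50 ms or 30 ms) of a reference boundary. The function name and the boundary times are illustrative, not the paper's data.

```python
def boundary_accuracy(predicted, reference, tolerance_s):
    """Fraction of reference boundaries matched by some prediction
    within +/- tolerance_s seconds."""
    matched = 0
    for ref in reference:
        if any(abs(pred - ref) <= tolerance_s for pred in predicted):
            matched += 1
    return matched / len(reference) if reference else 0.0

reference = [0.50, 1.20, 2.75, 4.10]   # reference boundary times (s), made up
predicted = [0.52, 1.24, 2.74, 3.90]   # hypothetical model output (s)

print(boundary_accuracy(predicted, reference, 0.050))  # 0.75
print(boundary_accuracy(predicted, reference, 0.030))  # 0.5
```

Tightening the tolerance from 50 ms to 30 ms drops one near-miss boundary (1.24 s vs. 1.20 s), which mirrors how the paper's two accuracy figures differ.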


Journal of the Acoustical Society of America | 1996

Design and data collection for a spoken dialog database in the Real World Computing (RWC) program

Kazuyo Tanaka; Satoru Hayamizu; Yoichi Yamashita; Kiyohiro Shikano; Shuichi Itahashi; Ryu-ichi Oka

The RWC program is constructing substantial databases for advancing and evaluating research and development conducted under the program and in related domains. This presentation describes the motivation for this effort, a basic design of spoken dialog databases, and the current status of the data collection work. In the first stage, some fundamental data collection was carried out to determine several environmental conditions and data-filing specifications. Two topics were selected for the dialogs: one was dialogs between car dealers and customers, and the other dialogs between travel agents and customers. Professional dealers and agents were employed to lend realism to the conversations. To date, 60 samples of dialogs have been recorded, and 48 of them have been filed onto CD-ROMs containing about 10 h of speech waveforms with transcriptions and labeling-related information. The speech data are almost completely spontaneous but are of good quality in the acoustic-phonetic sense. The CD‐ROMs are r...


2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009

Utilization of acoustical feature in visualization of multiple speech corpora

Kimiko Yamakawa; Hideaki Kikuchi; Tomoko Matsui; Shuichi Itahashi

The purpose of this study is to visualize the similarities among multiple speech corpora. To help users easily utilize various speech corpora, we previously reported a visualization method based on corpus attributes using multidimensional scaling (MDS), for which we had proposed eight attributes as speech corpus features. However, these attributes contained no acoustical feature of the corpora, and acoustical features are important for some intended uses of a corpus. In this paper, we propose a new attribute, the acoustical feature of speech corpora, in addition to the conventional attributes. The visualization results indicate that the method using the new attribute better visualizes the similarities between multiple speech corpora, which will facilitate efficient searching for the specific corpus that fits a user's needs. Based on these results, we built a corpus search system that corpus users can use as a benchmark for corpus selection. The outline and potential of this system are described.
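The MDS step mentioned above can be sketched as classical (metric) multidimensional scaling: a corpus-by-corpus dissimilarity matrix is double-centered and eigendecomposed to produce 2-D coordinates for plotting. The dissimilarity values below are invented for illustration; the paper's actual attributes and distances are not reproduced here.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Embed points from a symmetric distance matrix D into `dims` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]  # take the largest `dims`
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale          # n x dims coordinates

# Toy dissimilarities among four hypothetical speech corpora:
# corpora 0/1 are similar to each other, as are 2/3.
D = np.array([[0.0, 1.0, 4.0, 4.2],
              [1.0, 0.0, 3.8, 4.0],
              [4.0, 3.8, 0.0, 1.2],
              [4.2, 4.0, 1.2, 0.0]])

coords = classical_mds(D)
print(coords.shape)  # (4, 2)
```

In a plot of `coords`, the two similar pairs land close together, which is exactly the kind of at-a-glance grouping the visualization method aims for.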


2011 International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011

Interactive visualization and search system for speech corpora

Shuichi Itahashi; Tomoko Kajiyama; Kimiko Yamakawa; Yuichi Ishimoto; Tomoko Matsui

We have already reported a corpus similarity visualization method based on corpus attributes using multidimensional scaling, which makes it easy for users to utilize various speech corpora. In this paper, we present a revised visualization method based on a ring structure like a planisphere. Using only a mouse, a user can choose appropriate search keys for each of the multiple attributes and can easily filter information by adjusting the keys. Retrieved results are displayed inside the rings, and the user can filter and browse them in real time. This will facilitate efficient searching for the specific corpus that fits a user's needs.


Journal of the Acoustical Society of America | 2015

Investigating the effectiveness of Globalvoice CALL software in native Japanese English speech

Hiroyuki Obari; Hiroaki Kojima; Shuichi Itahashi

Many variables exist in evaluating the speech of non-native speakers of English. Previous research indicates that rhythmic accent and pauses are more salient than segmental features in making English utterances intelligible; in this regard, prosodic features such as intonation and rhythm are crucial to comprehensible speech. One of the main goals of English education in Japan is to help Japanese students speak English more intelligibly so that they can be clearly understood in international communication. In this talk, several key parameters, such as speech duration, speech power, and F0 (pitch), are introduced to help determine to what extent Japanese students are able to improve their English pronunciation through the use of Globalvoice CALL software. Globalvoice CALL enables students to input English words or sentences to practice their pronunciation and prosody in order to reduce their Japanese-accented speech. The purpose of this paper is to inves...


2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA) | 2014

Revised catalogue specifications of speech corpora with user-friendly visualization and search system

Shuichi Itahashi; Tomoko Ohsuga; Yuichi Ishimoto; Hiroaki Kojima; Kiyotaka Uchimoto; Shunsuke Kozawa

It is well known that speech corpora are indispensable to speech research; to meet this demand, several data centers have been set up worldwide that serve as repositories for various speech corpora. However, they use different specification systems for their corpora, so it is difficult for users to compare and select suitable corpora. It would be more convenient for users if each data center used a common specification system to describe its corpora. Based on this idea, we previously proposed a set of specification attributes and items as a first step towards standardization, but the scale of the retrieval system was limited. This paper introduces a revised version of the speech corpora specification attributes and items, connected with the large-scale metadata database “SHACHI” and combined with the “Concentric Ring View (CRV) System” to improve the user interface.


Journal of the Acoustical Society of America | 2006

Marking up Japanese Map Task Dialogue Corpus

Tomoko Ohsuga; Shuichi Itahashi; Syun Tutiya

This presentation reports an outline of the markup of the Japanese Map Task Dialogue Corpus. The project was conducted by an independent group of researchers from Chiba University and other institutions. The corpus contains 128 dialogues by 64 Japanese native speakers. The basic design of the dialogues and recordings conforms to the original HCRC Map Task Corpus of the University of Edinburgh. Two speakers participate in each map task dialogue: an instruction giver, who has a map with a route, and an instruction follower, who has a map without one. The giver verbally instructs the follower to draw the route on his/her map. The corpus is marked up according to the XML-based TEI (Text Encoding Initiative) P5 format, which was developed to provide interchangeable and shareable text data. The transcriptions in TEI format are viewed as “tags” that describe the start and end times of utterances, the duration of pauses, nonverbal events, and the synchronization of overlapping utterances. Th...


Recent Research Towards Advanced Man-Machine Interface Through Spoken Language | 1996

Considerations on a Common Speech Database

Shuichi Itahashi

As information processing technology develops, the associated input/output modalities have changed from being totally dependent on the characteristics of the machine to accommodating the characteristics of human beings. Speech is the principal human input/output modality. To promote speech processing studies, large amounts of speech data of various kinds, spoken by many people, are required. To develop speech processing systems, it is necessary to compare and estimate the performance of various analysis, synthesis, and recognition methods. The best way to do so is to analyze, synthesize, and recognize common speech data with each method and compare the results. A collection of speech data used for this purpose is generally called a speech database or speech corpus. This chapter discusses various aspects that should be considered when speech databases are created, including choosing the word sets and speakers, presentation of text and utterance timing, recording medium, microphone, editing, and labeling. It also covers utilization of speech databases, such as recognition performance indices, choosing vocabulary subsets, and controlled distribution of the database.


Journal of the Acoustical Society of America | 1988

Formant frequency estimation by moment calculation of the speech spectrum

Kazuyuki Takagi; Shuichi Itahashi

Moment calculation is applied to extract the formant frequencies of a speech spectrum. Three kinds of first-order moments divide a spectrum into four frequency regions. The centers of gravity of the first three regions are calculated to give the 0th-order estimates of the 1st, 2nd, and 3rd formant frequencies. Then the upper and lower bounds of each region are modified so that the estimated frequency comes closer to the major peak of the spectrum, utilizing the second-order and third-order moments, which represent the variance and skewness of the spectral pattern. The process repeats until the k-th estimate equals the (k − 1)-th estimate. This modification improves the estimation precision significantly. An experiment with model spectra generated by an all-pole model gave an estimation precision of 3% using formant frequencies typical of the five Japanese vowels. Speech materials spoken by five male and five female speakers were used for this experiment. The speech waveform was sampled at a rate o...
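The core idea above, that the center of gravity (first-order moment) of a spectral region approximates the frequency of its dominant peak, can be illustrated on a synthetic spectrum. The spectrum, peak positions, and region bounds below are assumptions for demonstration only; they are not the paper's data or its full iterative refinement.

```python
import numpy as np

def region_centroid(freqs, power, lo, hi):
    """First-order moment (center of gravity) of the spectrum in [lo, hi) Hz."""
    mask = (freqs >= lo) & (freqs < hi)
    return np.sum(freqs[mask] * power[mask]) / np.sum(power[mask])

freqs = np.linspace(0.0, 4000.0, 4001)  # 1 Hz grid
# Synthetic spectrum: Gaussian peaks near typical vowel formant positions
power = (np.exp(-((freqs - 700.0) / 80.0) ** 2)
         + 0.6 * np.exp(-((freqs - 1200.0) / 100.0) ** 2)
         + 0.4 * np.exp(-((freqs - 2600.0) / 120.0) ** 2))

# Centroid of an assumed first region [0, 1000) Hz lands near the 700 Hz peak
f1_estimate = region_centroid(freqs, power, 0.0, 1000.0)
print(f1_estimate)
```

In the paper's method, higher-order moments (variance and skewness) are then used to shift each region's bounds and re-center the estimate on the spectral peak; the snippet shows only the 0th-order estimation step.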


The Journal of The Acoustical Society of Japan (e) | 1999

JNAS: Japanese Speech Corpus for Large Vocabulary Continuous Speech Recognition Research

Katunobu Itou; Mikio Yamamoto; Kazuya Takeda; Toshiyuki Takezawa; Tatsuo Matsuoka; Tetsunori Kobayashi; Kiyohiro Shikano; Shuichi Itahashi

Collaboration


Dive into Shuichi Itahashi's collaborations.

Top Co-Authors

Kazuyuki Takagi
University of Electro-Communications

Toshiyuki Takezawa
National Institute of Advanced Industrial Science and Technology

Kimiko Yamakawa
Aichi Shukutoku University

Satoshi Nakamura
Nara Institute of Science and Technology

Hiroaki Kojima
National Institute of Advanced Industrial Science and Technology

Hiroyuki Obari
Aoyama Gakuin University