Publication


Featured research published by Shoichi Matsunaga.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2003

Speech-based and video-supported indexing of multimedia broadcast news

Yoshihiko Hayashi; Katsutoshi Ohtsuki; Katsuji Bessho; Osamu Mizuno; Yoshihiro Matsuo; Shoichi Matsunaga; Minoru Hayashi; Takaaki Hasegawa; Naruhiro Ikeda

This paper describes an automatic content indexing system for news programs, with a special emphasis on its segmentation process. The process can successfully segment an entire news program into topic-centered news stories; the primary tool is a linguistic topic segmentation algorithm. Experiments show that the resulting speech-based segments are fairly accurate, and scene change points supplied by an external video processor can be of help in improving segmentation effectiveness.
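
The abstract does not state how the scene-change points are combined with the speech-based segments. One plausible fusion rule, shown here purely as a hedged sketch (the function and the tolerance value are assumptions, not the authors' method), is to snap each speech-derived topic boundary to the nearest video scene change within a small tolerance window:

```python
# Hypothetical sketch: refine speech-based topic boundaries using video
# scene-change points. The snapping rule and 3-second tolerance are
# assumptions; the paper only states that scene changes can help.

def refine_boundaries(speech_bounds, scene_changes, tolerance=3.0):
    """Snap each topic boundary (in seconds) to a nearby scene change."""
    refined = []
    for b in speech_bounds:
        nearest = min(scene_changes, key=lambda s: abs(s - b), default=None)
        if nearest is not None and abs(nearest - b) <= tolerance:
            refined.append(nearest)  # trust the visual cue
        else:
            refined.append(b)        # keep the speech-based boundary
    return refined

# Boundaries from the topic segmenter, cuts from the video processor.
print(refine_boundaries([62.0, 184.5, 301.2], [60.8, 180.0, 301.0]))
# -> [60.8, 184.5, 301.0]
```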


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Automatic indexing of multimedia content by integration of audio, spoken language, and visual information

Katsutoshi Ohtsuki; Katsuji Bessho; Yoshihiro Matsuo; Shoichi Matsunaga; Yoshihiko Hayashi

This paper describes an automatic multimedia content indexing system that includes acoustic segmentation, automatic speech recognition, topic segmentation, and video indexing features. The system is intended for indexing of multimedia news programs. Speech segments extracted from news content are delivered to the speech recognition module. The speech recognition result is segmented into topics using a segmentation algorithm based on word conceptual vectors. The indexing results derived from audio and speech information are integrated with video indexing results to extract the story structure. Experimental results show that topic segmentation using word conceptual vectors is superior to the conventional method using local word co-occurrence frequencies, and that the integrated segmentation provides better news story structures than would be possible with any single type of information.
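
The word conceptual vectors themselves cannot be reconstructed from the abstract, but the segmentation principle, comparing the lexical content of adjacent windows and cutting where similarity drops, can be sketched. In the toy Python below, plain bag-of-words counts stand in for the conceptual vectors, and the window size and threshold are assumed values:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment(words, window=50, threshold=0.1):
    """Propose a topic boundary wherever the similarity between the
    preceding and following window of words falls below the threshold."""
    boundaries = []
    for i in range(window, len(words) - window + 1, window):
        left = Counter(words[i - window:i])
        right = Counter(words[i:i + window])
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries
```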


IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Topic extraction based on continuous speech recognition in broadcast-news speech

Katsutoshi Ohtsuki; Shoichi Matsunaga; Tatsuo Matsuoka; Sadaoki Furui

This paper reports on topic extraction from Japanese broadcast-news speech. Using continuous speech recognition, we extract several topic words from each news item; a combination of multiple topic words represents the content of the news more precisely and more flexibly than a single word or a single category. A topic extraction model gives the degree of relevance between each topic word and each word appearing in articles, and the topic words with the highest total relevance scores over all words in an article are extracted. We trained the model on five years of newspaper text, using the frequencies of topic words taken from headlines and of words in article bodies; relevance is computed from statistical measures, namely mutual information or the χ² value. In topic extraction experiments on recognized broadcast-news speech, we extracted five topic words with the χ²-based model and found that 75% of them agreed with topic words chosen by human subjects.
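
In outline, the extraction step reduces to scoring every candidate topic word by its summed relevance to the words of the article and keeping the top five. The Python below is a toy illustration with an invented two-entry relevance table; the real model was trained from five years of newspaper text:

```python
from collections import defaultdict

# relevance[topic_word][article_word]: a chi-squared or mutual-information
# score from training; the tiny table here is an invented stand-in.
relevance = {
    "election": {"vote": 8.2, "party": 5.1, "poll": 6.7},
    "earthquake": {"quake": 9.4, "magnitude": 7.8, "vote": 0.1},
}

def extract_topics(article_words, model, k=5):
    """Return the k topic words with the highest total relevance to the
    words of a (recognized) news article."""
    scores = defaultdict(float)
    for topic, table in model.items():
        for w in article_words:
            scores[topic] += table.get(w, 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(extract_topics(["vote", "party", "poll"], relevance, k=1))
# -> ['election']
```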


Systems and Computers in Japan | 1986

Speech recognition based on top‐down and bottom‐up phoneme recognition

Shoichi Matsunaga; Kiyohiro Shikano

This paper discusses a speech recognition system that integrates top-down and bottom-up phoneme recognition. The system is based on phoneme recognition, with the top-down and bottom-up processes combined through a shared table called a blackboard. In top-down processing, segmentation and scoring are performed for every phoneme over the entire speech interval; in bottom-up processing, they are performed only for intervals in which phoneme segmentation can be carried out reliably. Under this scheme the two recognition processes cooperate while remaining independent. Linguistic and acoustic processing are structured hierarchically and combined through the blackboard, avoiding duplicated processing of the same interval. To evaluate the system, spoken-word recognition experiments with dictionaries of 100 and 643 city names, and a continuous speech recognition experiment on 235 minimal phrases uttered by two speakers, were performed. The results show that the recognition performance of conventional top-down processing is almost maintained, while processing time is reduced to one-half to one-third for word recognition and to less than one-fourth for minimal-phrase recognition.
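
A minimal sketch of the blackboard idea follows; the data structures and scores are assumptions for illustration, not the system's actual implementation. Bottom-up processing posts only phoneme segments it can place reliably, and top-down processing reuses those entries so that the same interval is never scored twice:

```python
# Hypothetical blackboard sketch. Keys are (start_frame, end_frame)
# intervals; values are phonemes recognized bottom-up with high
# confidence. The 0.9 threshold and scoring shortcut are assumptions.

blackboard = {}

def bottom_up(segments):
    """Post only phoneme segments whose segmentation is reliable."""
    for start, end, phoneme, confidence in segments:
        if confidence > 0.9:
            blackboard[(start, end)] = phoneme

def top_down(phoneme_seq, intervals, acoustic_score):
    """Score a word hypothesis, reusing blackboard entries when present
    instead of re-running acoustic scoring on the same interval."""
    total = 0.0
    for (start, end), phoneme in zip(intervals, phoneme_seq):
        if blackboard.get((start, end)) == phoneme:
            total += 1.0  # confirmed bottom-up: accept cheaply
        else:
            total += acoustic_score(start, end, phoneme)
    return total
```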


Systems and Computers in Japan | 1988

Reduction of Word and Minimal Phrase Candidates for Speech Recognition Based on Phoneme Recognition

Shoichi Matsunaga; Masaki Kohda

This paper discusses candidate selection in speech recognition based on phoneme recognition. The method uses the phoneme recognition results for those parts of the speech input that can be segmented with high reliability. Candidates are selected using the order of the recognized phonemes or phoneme chains, together with information about the initial and final phonemes. Because only reliably segmented portions are used, candidate reduction is most effective for clearly uttered speech and correspondingly weaker otherwise, so the recognition rate is degraded little by the selection. The method is first applied to word recognition, where the selection is applied to every word in the dictionary. In a recognition experiment with a dictionary of 643 city names and 100 city names uttered by 50 speakers as input, the word candidates were reduced to 16 percent while maintaining almost the same recognition performance as without candidate reduction. The method is then applied to phrase recognition: the location of the phoneme to be rejected is estimated during candidate selection in hypothesis derivation, and the syntax tree is backtracked on that basis. In an experiment on 235 phrases uttered by two speakers, the phrase candidates were reduced to 21 percent of those without candidate selection.
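
The order-based part of the selection is essentially an ordered-subsequence test: a dictionary word survives only if the reliably recognized phonemes occur in its phoneme sequence in the same order. A toy Python sketch follows (the dictionary entries and reliable phonemes are invented, and the paper's additional initial- and final-phoneme information is omitted here):

```python
def is_subsequence(reliable, word_phonemes):
    """True if the reliable phonemes appear in the word's phoneme
    sequence in the same order; gaps are allowed because only part of
    the utterance is segmented with high reliability."""
    it = iter(word_phonemes)
    return all(p in it for p in reliable)

def select_candidates(reliable_phonemes, dictionary):
    """Keep only the words consistent with the reliable evidence."""
    return [w for w, ph in dictionary.items()
            if is_subsequence(reliable_phonemes, ph)]

dictionary = {
    "nagasaki": list("nagasaki"),
    "nagoya": list("nagoya"),
    "sapporo": list("sapporo"),
}
# Suppose only /g/ and /s/ were segmented reliably, in that order.
print(select_candidates(["g", "s"], dictionary))
# -> ['nagasaki']
```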


International Journal of Pattern Recognition and Artificial Intelligence | 1994

Dictation machine based on Japanese character source modeling

Kiyohiro Shikano; Tomokazu Yamada; Takeshi Kawabata; Shoichi Matsunaga; Sadaoki Furui; Toshiyuki Hanazawa

This paper describes a phonetic typewriter and a dictation machine that exploit the statistical structure of phoneme and character sequences. Syllable and character trigrams are used for language source modeling; the trigram probabilities are estimated from a large text database, and the resulting models are combined with the HMM-LR continuous speech recognition system. The phonetic typewriter was tested on 274 phrases uttered by one male speaker. The syllable source model achieves a 94.9% phoneme recognition rate at a test-set phoneme perplexity of 3.9; without it, the phoneme recognition rate is only 73.2%. A character-based trigram model was also evaluated: it reduces the syllable perplexity to 7.7, compared with 10.5 for the syllable source model, and achieves a 78.5% character transcription rate on the same 274 phrase utterances. These results show that syllable and character source models are very effective for realizing a Japanese dictation machine.
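
As a hedged illustration of the source-modeling idea (toy training text; the paper's models were estimated from a large text database and combined with HMM-LR decoding, which is omitted here), a character trigram model assigns P(c | h) by counting, and test-set perplexity is the inverse geometric mean of those probabilities:

```python
import math
from collections import Counter

def train_trigram(text):
    """Maximum-likelihood character trigram model with '#' padding."""
    padded = "##" + text
    tri = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    bi = Counter(padded[i:i + 2] for i in range(len(padded) - 2))
    return lambda h, c: tri[h + c] / bi[h] if bi[h] else 0.0

def perplexity(text, prob):
    """Per-character perplexity of the model on held-out text."""
    padded = "##" + text
    logp = 0.0
    for i in range(2, len(padded)):
        p = max(prob(padded[i - 2:i], padded[i]), 1e-9)  # floor unseen events
        logp += math.log(p)
    return math.exp(-logp / (len(padded) - 2))

model = train_trigram("abcabcabd")
print(perplexity("abcabc", model))  # low perplexity on familiar text
```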


Archive | 2003

Acoustic signal discriminating method, acoustic signal discriminating device, and acoustic signal discriminating program

Yoshihiko Hayashi; Shoichi Matsunaga; Osamu Mizuno


Archive | 2009

Information judgment aiding method, sound information judging method, sound information judgment aiding device, sound information judging device, sound information judgment aiding system, and program

Sueharu Miyahara; Senya Kiyasu; Shoichi Matsunaga; Yu Takigawa


Archive | 2000

Language model generating method, voice recognition method and its program recording medium

Takaaki Hori; Takeshi Kawabata; Shoichi Matsunaga; Katsutoshi Ohtsuki


Archive | 1998

Japanese broadcast news transcription and topic detection

Sadaoki Furui; K. Takagi; Atsushi Iwasaki; Katsutoshi Ohtsuki; Tatsuo Matsuoka; Shoichi Matsunaga

Collaboration


Top co-authors of Shoichi Matsunaga and their affiliations:

Yoshihiko Hayashi (Nippon Telegraph and Telephone)
Osamu Mizuno (Nippon Telegraph and Telephone)
Sadaoki Furui (Tokyo Institute of Technology)
Atsunori Ogawa (Nippon Telegraph and Telephone)
Katsuji Bessho (Nippon Telegraph and Telephone)
Katsutoshi Ohtsuki (Nippon Telegraph and Telephone)
Kiyohiro Shikano (Nara Institute of Science and Technology)