
Publication


Featured research published by Yoshiaki Itoh.


Journal of Information Processing | 2009

Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data

Tomoyosi Akiba; Kiyoaki Aikawa; Yoshiaki Itoh; Tatsuya Kawahara; Hiroaki Nanjo; Hiromitsu Nishizaki; Norihito Yasuda; Yoichi Yamashita; Katunobu Itou

Lectures are one of the most valuable genres of audiovisual data. Although spoken document processing is a promising technology for utilizing lectures in various ways, it is difficult to evaluate because evaluation requires subjective judgment and/or the verification of large quantities of evaluation data. In this paper, a test collection for the evaluation of spoken lecture retrieval is reported. The test collection consists of the target spoken documents of about 2,700 lectures (604 hours) taken from the Corpus of Spontaneous Japanese (CSJ), 39 retrieval queries, the relevant passages in the target documents for each query, and the automatic transcription of the target speech data. This paper also reports the retrieval performance on the constructed test collection obtained by applying a standard spoken document retrieval (SDR) method, which serves as a baseline for forthcoming SDR studies using the test collection.
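The abstract does not spell out the baseline's weighting scheme, but a "standard SDR method" over automatic transcripts is commonly a TF-IDF style ranking of transcribed passages. A minimal sketch of such a baseline (the function name and weighting details are our assumptions, not the paper's):

```python
import math
from collections import Counter

def tfidf_rank(query_terms, passages):
    """Rank transcribed passages against a query using plain TF-IDF
    weights with cosine-style length normalization."""
    N = len(passages)
    df = Counter()
    for p in passages:
        df.update(set(p))                       # document frequency per term
    idf = lambda t: math.log((N + 1) / (df[t] + 1)) + 1.0
    scored = []
    for i, p in enumerate(passages):
        tf = Counter(p)
        score = sum(tf[t] * idf(t) ** 2 for t in query_terms)
        norm = math.sqrt(sum((c * idf(t)) ** 2 for t, c in tf.items())) or 1.0
        scored.append((score / norm, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

Real SDR baselines additionally handle recognition errors, passage windowing, and morphological analysis for Japanese; the sketch only shows the ranking core.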


Multimedia Systems | 2005

An algorithm for similar utterance section extraction for managing spoken documents

Yoshiaki Itoh; Kazuyo Tanaka; Shi-wook Lee

This paper proposes a new, efficient algorithm for extracting similar sections between two time-sequence data sets. The algorithm, called Relay Continuous Dynamic Programming (Relay CDP), realizes fast matching between arbitrary sections in the reference pattern and the input pattern and enables the extraction of similar sections in a frame-synchronous manner. In addition, Relay CDP is extended to two types of applications that handle spoken documents. The first application is the extraction of repeated utterances in a presentation or a news speech, because repeated utterances are assumed to be important parts of the speech. These repeated utterances can be regarded as labels for information retrieval. The second application is flexible spoken document retrieval. A phonetic model is introduced to cope with the speech of different speakers. The new algorithm allows a user to query by a natural utterance and searches spoken documents for any partial matches to the query utterance. We present a detailed explanation of Relay CDP, experimental results for the extraction of similar sections, and results for the two applications using Relay CDP.
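Relay CDP builds on Continuous DP (CDP), which matches a reference pattern against a stream frame-synchronously and reports a match whenever the normalized cumulative distance at the last reference frame drops below a threshold. A minimal sketch of that underlying recurrence, assuming scalar features for brevity (Relay CDP itself additionally relays paths so that arbitrary sub-sections can match; this sketch only matches the whole reference):

```python
import numpy as np

def continuous_dp(reference, stream, threshold=1.0):
    """Frame-synchronous Continuous DP: for each incoming stream frame,
    update the best cumulative distance ending at each reference frame,
    and report the stream position when the full reference is matched
    cheaply enough (normalized distance below threshold)."""
    R = len(reference)
    ref = np.asarray(reference, float)
    g = np.full(R, float("inf"))   # g[i]: best cost of a path ending at reference frame i
    matches = []
    for t, x in enumerate(stream):
        d = np.abs(ref - x)        # local frame distances (scalar features)
        g_new = np.full(R, float("inf"))
        g_new[0] = d[0]            # a candidate match may start at any input frame
        for i in range(1, R):
            g_new[i] = d[i] + min(g[i], g[i - 1], g_new[i - 1])
        if g_new[-1] / R < threshold:   # end of reference reached -> report match end
            matches.append(t)
        g = g_new
    return matches
```

Here the stream is scanned once and a match is reported at its end frame, which is what makes the method frame-synchronous.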


Multimedia Signal Processing | 2008

Highlight scene extraction of sports broadcasts using sports news programs

Yoshiaki Itoh; Shigenobu Sakaki; Kazunori Kojima; Masaaki Ishigame

This paper proposes a new approach for extracting highlight scenes from sports broadcasts by using sports news programs. To extract the highlight scenes reliably, we identify identical or similar sections between a sports broadcast and the sports news programs that cover it. To extract identical or similar sections between two video data sets efficiently, we developed a two-step method that combines Relay CDP and active search. We evaluated this method in terms of highlight-scene extraction accuracy and computation time through experiments using actual broadcast data sets.
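Active search is a histogram-pruning technique: because shifting a window by one frame can raise a histogram-intersection score by at most 1/n, windows that score far below the threshold allow provably safe skips ahead. A toy sketch of that pruning idea over symbol sequences (illustrative only, not the authors' exact two-step system):

```python
from collections import Counter

def active_search(query, stream, threshold):
    """Find windows of `stream` whose symbol histogram matches the
    query's by histogram intersection, skipping positions that
    provably cannot reach the threshold."""
    n = len(query)
    qh = Counter(query)
    hits, t = [], 0
    while t + n <= len(stream):
        sh = Counter(stream[t:t + n])          # recomputed per step for brevity;
        sim = sum(min(c, sh[k]) for k, c in qh.items()) / n
        if sim >= threshold:
            hits.append(t)
            t += 1
        else:
            # one shift gains at most 1/n, so this skip cannot miss a hit
            t += max(1, int((threshold - sim) * n))
    return hits
```

The real method maintains the window histogram incrementally, so each step is O(1) rather than O(n).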


Multimedia Signal Processing | 2010

Time-space acoustical feature for fast video copy detection

Yoshiaki Itoh; Masahiro Erokuumae; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka

We propose a new time-space acoustical feature for fast video copy detection, in which a number of video streams are searched for a given video segment, for example to find illegal copies on Internet video sites. We extract a small number of feature vectors from acoustically salient points, namely the local maxima and minima in the time sequence of the acoustic power envelope of the video data. Because the volume of a video stream differs across recording environments, the feature encodes the relative values between these feature points; we therefore call it the time-space acoustical feature. The features can be obtained quickly compared with representative features such as MFCC, and they require a short matching time because both the number and the dimension of the feature vectors are small. The accuracy and computation time of the proposed method are evaluated using recorded TV movie programs as input data and 30-second to 3-minute DVD segments as reference data, assuming that the copyright holder of a movie searches video streams for illegal copies. We confirmed that the proposed method completed all processing within the computation time required merely for conventional feature extraction, with an F-measure of 93.2% for 3-minute video segment detection.
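The key invariance claim is that relative values between extremal points of the power envelope survive a global volume change. A small sketch of that idea, with our own simplified definitions (time gap plus power ratio between neighbouring extrema; the paper's actual feature is richer):

```python
def extrema(power):
    """Indices of local maxima and minima of a power envelope."""
    return [i for i in range(1, len(power) - 1)
            if (power[i] - power[i - 1]) * (power[i] - power[i + 1]) > 0]

def time_space_feature(power):
    """Describe each pair of neighbouring extrema by the time gap and
    the power *ratio*, so a global volume (gain) change cancels out."""
    pts = extrema(power)
    return [(b - a, power[b] / power[a]) for a, b in zip(pts, pts[1:])]
```

Because only ratios and time gaps are kept, recording the same audio at a different volume yields the same feature sequence.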


Spoken Language Technology Workshop | 2008

Open vocabulary spoken document retrieval by subword sequence obtained from speech recognizer

Go Kuriki; Yoshiaki Itoh; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee

We present a method for open-vocabulary retrieval based on a spoken document retrieval (SDR) system using subword models. The present paper proposes a new approach to open-vocabulary SDR using subword models that does not require subword recognition. Instead, subword sequences are obtained from the phone sequence output by a speech recognizer: for speech containing an out-of-vocabulary (OOV) word, the recognizer outputs a word sequence whose phone sequence is considered to be similar to that of the OOV word. When OOV words are provided in a query, the proposed system is able to retrieve the target section by comparing the phone sequences of the query and the word sequence generated by the speech recognizer.
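Comparing the query's phone sequence against the phone sequence of the recognizer's word output amounts to approximate substring matching; edit distance with a free starting point is one standard way to score it. An illustrative sketch (not the paper's exact scoring):

```python
def min_substring_edit(query, text):
    """Minimum edit distance between `query` and ANY substring of
    `text`: the first DP row is all zeros, so a match may start at
    any position in `text`."""
    m, n = len(query), len(text)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if query[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,   # substitute / match
                         prev[j] + 1,          # query phone unmatched
                         cur[j - 1] + 1)       # extra text phone
        prev = cur
    return min(prev)
```

Ranking spoken documents by this score over their recognizer-derived phone sequences lets a query containing an OOV word still find near-matching sections.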


EURASIP Journal on Audio, Speech, and Music Processing | 2008

Automatic music boundary detection using short segmental acoustic similarity in a music piece

Yoshiaki Itoh; Akira Iwabuchi; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee

The present paper proposes a new approach for detecting music boundaries, such as the boundary between music pieces or the boundary between a music piece and a speech section, for automatic segmentation of musical video data and retrieval of a designated music piece. The proposed approach is able to capture each music piece using acoustic similarity defined for short-term segments in the music piece. The short segmental acoustic similarity is obtained by means of a new algorithm called segmental continuous dynamic programming, or segmental CDP. The location of each music piece and its music boundaries are then identified by referring to multiple similar segments and their location information, avoiding oversegmentation within a music piece. The performance of the proposed method is evaluated for music boundary detection using actual music datasets. The present paper demonstrates that the proposed method enables accurate detection of music boundaries for both the evaluation data and a real broadcast music program.
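One simple way to realize "referring to multiple similar segments and their location information" is to mark every frame covered by a within-piece similar segment pair and take uncovered runs as boundary candidates. This is our illustrative reading of the idea, not the paper's exact procedure:

```python
def music_boundaries(similar_pairs, length):
    """Boundary candidates from segmental-CDP output: frames inside a
    music piece tend to be covered by within-piece similar segment
    pairs, so maximal runs of frames covered by no pair are taken as
    boundary candidates (midpoint of each run is reported)."""
    cover = [0] * length
    for pair in similar_pairs:            # each pair: ((s1, e1), (s2, e2))
        for s, e in pair:
            for t in range(s, e + 1):
                cover[t] += 1
    bounds, t = [], 0
    while t < length:
        if cover[t] == 0:
            start = t
            while t < length and cover[t] == 0:
                t += 1
            bounds.append((start + t - 1) // 2)   # midpoint of uncovered run
        else:
            t += 1
    return bounds
```

Counting coverage (rather than merely flagging it) would also let one suppress weakly supported boundaries, which matches the paper's goal of avoiding oversegmentation.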


Multimedia Signal Processing | 2007

Music Boundary Detection Using Similarity in a Music Selection

Yoshiaki Itoh; A. Iwabuchi; K. Kojima; Masaaki Ishigame; K. Tanaka; Shi-wook Lee

This paper proposes a new method of extracting music boundaries, such as a boundary between musical selections or a boundary between a musical selection and speech, for automatic segmentation of video data and other applications. The method utilizes acoustic similarity within a music selection. Similar partial sections are first extracted by means of a new algorithm called Segmental Continuous Dynamic Programming, or Segmental CDP. The music boundary is identified by reference to multiple similar sections and their location information, as extracted by Segmental CDP. The performance of the proposed method is evaluated for music boundary extraction using actual music data sets. The study demonstrates that the proposed method extracts music boundaries well for both evaluation data and a real broadcast music program.


Conference of the International Speech Communication Association | 2010

Constructing Japanese Test Collections for Spoken Term Detection

Yoshiaki Itoh; Hiromitsu Nishizaki; Xinhui Hu; Hiroaki Nanjo; Tomoyosi Akiba; Tatsuya Kawahara; Seiichi Nakagawa; Tomoko Matsui; Yoichi Yamashita; Kiyoaki Aikawa


Conference of the International Speech Communication Association | 2006

Open-vocabulary spoken document retrieval based on new subword models and subword phonetic similarity.

Kohei Iwata; Yoshiaki Itoh; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee


Conference of the International Speech Communication Association | 2006

Two-stage vocabulary-free spoken document retrieval - subword identification and re-recognition of the identified sections.

Yoshiaki Itoh; Takayuki Otake; Kohei Iwata; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee

Collaboration


Dive into Yoshiaki Itoh's collaborations.

Top Co-Authors

Masaaki Ishigame
Iwate Prefectural University

Kazunori Kojima
Iwate Prefectural University

Shi-wook Lee
National Institute of Advanced Industrial Science and Technology

Kiyoaki Aikawa
Tokyo University of Technology

Tomoyosi Akiba
Toyohashi University of Technology