

Publication


Featured research published by Hiromitsu Nishizaki.


Journal of Information Processing | 2013

Spoken Term Detection Using Phoneme Transition Network from Multiple Speech Recognizers' Outputs

Satoshi Natori; Yuto Furuya; Hiromitsu Nishizaki; Yoshihiro Sekiguchi

Spoken term detection (STD) that considers the out-of-vocabulary (OOV) problem has generated significant interest in the field of spoken document processing. This study describes STD with false detection control using phoneme transition networks (PTNs) derived from the outputs of multiple speech recognizers. PTNs are similar to subword-based confusion networks (CNs), which are originally derived from a single speech recognizer. Because a PTN-formed index is based on the outputs of multiple speech recognizers, it is robust to recognition errors. Therefore, a PTN should also be more robust to recognition errors in an STD task than a CN-formed index from a single speech recognition system. Our PTN-formed index was evaluated on a test collection. The experiment showed that the PTN-based approach effectively detected OOV terms and improved the F-measure from 0.370 to 0.639 compared with a baseline approach. Furthermore, we applied two false detection control parameters to the calculation of the detection score: one based on a majority voting scheme and the other on a measure of the ambiguity of the CN. By introducing these parameters, STD performance improved to an F-measure of 0.736, compared with 0.639 without any parameters.
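The majority-voting control parameter can be sketched as follows, assuming (for illustration only) that the recognizers' phoneme sequences are already aligned to equal length; the paper builds a full phoneme transition network rather than this simplified position-wise vote.

```python
from collections import Counter

def majority_vote_confidence(phoneme_hypotheses):
    """For each aligned position, return the most common phoneme and
    the fraction of recognizers that agree on it."""
    n_systems = len(phoneme_hypotheses)
    confidences = []
    for position in zip(*phoneme_hypotheses):
        top_phoneme, count = Counter(position).most_common(1)[0]
        confidences.append((top_phoneme, count / n_systems))
    return confidences

# Phoneme outputs of three hypothetical recognizers for one utterance span
hyps = [["a", "k", "i"], ["a", "g", "i"], ["a", "k", "i"]]
votes = majority_vote_confidence(hyps)
# position 1: "k" wins with only 2/3 agreement, so a detection relying on
# that arc would receive a lower majority-vote confidence than the others
```

A detection score can then be discounted wherever the recognizers disagree, which is one plausible way to suppress false detections.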


North American Chapter of the Association for Computational Linguistics | 2004

An empirical study on multiple LVCSR model combination by machine learning

Takehito Utsuro; Yasuhiro Kodama; Tomohiro Watanabe; Hiromitsu Nishizaki; Seiichi Nakagawa

This paper proposes applying machine learning techniques to the task of combining the outputs of multiple LVCSR models. The proposed technique has advantages over voting schemes such as ROVER, especially when the majority of the participating models are not reliable. In this machine learning framework, features such as the IDs of the models that output the hypothesized word are useful for improving the word recognition rate. Experimental results show that the combination achieves a relative word error reduction of up to 39% against the best-performing single model and up to 23% against ROVER. We further show empirically that the technique performs better when the LVCSR models to be combined are chosen to cover as many correctly recognized words as possible, rather than in descending order of their word correct rates.
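The value of model-ID features can be illustrated with a toy weighted vote, in which per-model weights stand in for the trained classifier; all model names and weights below are illustrative, not taken from the paper.

```python
def score_word(word, model_outputs, model_weights):
    """Score a hypothesized word by which LVCSR models emitted it.
    Binary "model i output this word" features are weighted by
    per-model weights standing in for a trained classifier."""
    return sum(weight for model_id, weight in model_weights.items()
               if model_outputs.get(model_id) == word)

# Three hypothetical LVCSR models disagree on one word slot
outputs = {"lvcsr_a": "speech", "lvcsr_b": "speech", "lvcsr_c": "peach"}
weights = {"lvcsr_a": 0.4, "lvcsr_b": 0.3, "lvcsr_c": 0.9}

best = max({"speech", "peach"}, key=lambda w: score_word(w, outputs, weights))
# A plain ROVER-style majority vote would pick "speech"; the weighted
# combination trusts the more reliable model and picks "peach".
```

This is the scenario the abstract highlights: when most participating models are unreliable, learned weights can overrule the majority.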


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Entropy-based false detection filtering in spoken term detection tasks

Satoshi Natori; Yuto Furuya; Hiromitsu Nishizaki; Yoshihiro Sekiguchi

This paper describes spoken term detection (STD) and inexistent STD (iSTD) methods that use term detection entropy based on a phoneme transition network (PTN)-formed index. Our previously reported STD method uses a PTN derived from multiple automatic speech recognizers (ASRs) as an index. A PTN is almost the same as a subword-based confusion network, which is derived from the output of a single ASR. In our previous study, the PTN was very effective in detecting query terms; however, it generated many false detection errors. In this study, we focus on the entropy of the PTN-formed index. Entropy is used to filter out false detection candidates in the second pass of the STD process. The proposed method was evaluated using the Japanese standard test set for the STD and iSTD tasks. The experimental results for the STD task show that entropy-based filtering is effective in improving STD in the high-recall range. In addition, entropy-based filtering was also demonstrated to work well for the iSTD task.
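The entropy-based second pass can be sketched as follows; the per-node arc probabilities and the threshold are illustrative, and the paper's exact definition of term detection entropy may differ.

```python
import math

def node_entropy(arc_probs):
    """Shannon entropy (bits) of the phoneme-arc distribution at one
    network node; a high value means the node is ambiguous."""
    return -sum(p * math.log2(p) for p in arc_probs if p > 0)

def filter_detections(detections, threshold):
    """Second-pass filter: drop candidates whose mean per-node entropy
    is at or above the threshold."""
    return [d for d in detections
            if sum(node_entropy(n) for n in d["nodes"]) / len(d["nodes"]) < threshold]

# Two candidates for the same term: confident nodes vs. ambiguous nodes
cands = [
    {"term": "onsei", "nodes": [[0.9, 0.1], [1.0]]},
    {"term": "onsei", "nodes": [[0.25, 0.25, 0.25, 0.25], [0.5, 0.5]]},
]
kept = filter_detections(cands, threshold=1.0)
# only the low-entropy (confident) detection survives the filter
```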


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Re-ranking of spoken term detections using CRF-based triphone detection models

Naoki Sawada; Satoshi Natori; Hiromitsu Nishizaki

Conventional spoken term detection (STD) techniques, which use a text-based matching approach based on automatic speech recognition (ASR) systems, are not robust against speech recognition errors. This paper proposes a conditional random field (CRF)-based re-ranking approach that recomputes the detection scores produced by a phoneme-based dynamic time warping (DTW) STD approach. In the re-ranking approach, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models built on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework; therefore, they can detect each triphone composing a query term with a detection probability. In an experimental evaluation on the Japanese OOV test collection, the CRF-based approach alone could not outperform the conventional DTW-based approach we previously proposed; however, it worked well in the re-ranking (second-pass) process for the detections from the DTW-based approach, improving the F-measure of STD performance by 2.4%.
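The second-pass re-ranking can be sketched as a score interpolation, assuming (illustratively) that a CRF model supplies a detection probability per candidate; the interpolation weight and the exact combination used in the paper may differ.

```python
def rerank(detections, weight=0.5):
    """Recompute each first-pass DTW detection score by interpolating
    it with a CRF-derived triphone detection probability, then sort.
    Both scores are assumed to lie in [0, 1], higher = better."""
    for d in detections:
        d["score"] = (1 - weight) * d["dtw_score"] + weight * d["crf_prob"]
    return sorted(detections, key=lambda d: d["score"], reverse=True)

candidates = [
    {"term": "onkyo", "dtw_score": 0.80, "crf_prob": 0.20},  # likely false alarm
    {"term": "onkyo", "dtw_score": 0.70, "crf_prob": 0.90},
]
ranked = rerank(candidates)
# the detection supported by the CRF model rises to the top
```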


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Evaluation of the usefulness of spoken term detection in an electronic note-taking support system

Chifuyu Yonekura; Yuto Furuya; Satoshi Natori; Hiromitsu Nishizaki; Yoshihiro Sekiguchi

The usefulness of a spoken term detection (STD) technique in an electronic note-taking support system was assessed through a subjective evaluation experiment. In this experiment, subjects recorded electronic notes using the system while listening to a lecture. They then answered questions related to the lecture while browsing the recorded notes, and the response time required to correctly answer the questions was measured. When browsing the notes, half of the subjects used the STD technique and half did not. The experimental results show that the subjects who used the STD technique answered all questions faster than those who did not, indicating that the STD technique worked well in the electronic note-taking system.


Conference of the International Speech Communication Association | 2016

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection.

Naoki Sawada; Hiromitsu Nishizaki

This paper describes a novel correct phoneme sequence estimation method that uses a recurrent neural network (RNN)-based framework for spoken term detection (STD). In an automatic speech recognition (ASR)-based STD framework, ASR performance (word or subword error rate) affects STD performance; therefore, it is important to reduce ASR errors to obtain good STD results. In this study, we use an RNN-based phoneme estimator, which estimates the correct phoneme sequence of an utterance from several phoneme-based transcriptions produced by multiple ASR systems, as a post-processing step to reduce phoneme errors. On two types of test speech corpora, the proposed phoneme estimator obtained phoneme-based N-best transcriptions with fewer phoneme recognition errors than the N-best transcriptions from the best single ASR system we prepared. In addition, the STD system with the RNN-based phoneme estimator drastically improved STD performance on two test collections compared with our previously proposed STD system with a conditional random field (CRF)-based phoneme estimator.


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015

Score normalization using phoneme-based entropy for spoken term detection

Hiromitsu Nishizaki; Naoki Sawada

This study investigates and demonstrates the effectiveness of using the entropy of a query term for score normalization in spoken term detection (STD). It is important to normalize the scores of detected terms because a single decision threshold for detected candidates is commonly set for all query terms. A query term whose phoneme-based entropy is higher than the average for the query set is likely to be difficult to recognize correctly with automatic speech recognition; thus, it cannot be detected with high accuracy if the same threshold is applied to all query terms. Therefore, we propose a score normalization method in which a calibrated matching score between a query term and an index made from the target spoken documents is dynamically calculated using the phoneme-based entropy of the query term, within a dynamic time warping (DTW)-based STD framework. We evaluated this framework on an STD task. The results indicate that it worked quite well and significantly improved STD performance compared with the baseline STD system under a pooling-based evaluation framework.
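The idea can be sketched as follows; the calibration formula below (shifting the DTW cost by the query's entropy relative to the query-set average) is a hypothetical stand-in for the paper's exact calculation, and the weight alpha is illustrative.

```python
import math
from collections import Counter

def phoneme_entropy(phonemes):
    """Shannon entropy (bits) of the phoneme unigram distribution
    within a single query term."""
    total = len(phonemes)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(phonemes).values())

def normalize_cost(dtw_cost, query_phonemes, mean_entropy, alpha=0.2):
    """Hypothetical calibration: lower the matching cost of queries
    whose entropy exceeds the query-set average, so one shared decision
    threshold treats easy and hard queries more fairly."""
    return dtw_cost - alpha * (phoneme_entropy(query_phonemes) - mean_entropy)

# A repetitive (easy) query vs. a higher-entropy (hard) query
easy = normalize_cost(0.5, ["a", "a", "a"], mean_entropy=1.0)
hard = normalize_cost(0.5, ["a", "k", "i"], mean_entropy=1.0)
# the hard query's cost is reduced relative to the easy query's
```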


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2014

Selection of best match keyword using spoken term detection for spoken document indexing

Kentaro Domoto; Takehito Utsuro; Naoki Sawada; Hiromitsu Nishizaki

This paper presents a novel keyword selection-based spoken document indexing framework that selects the best-matching keyword from query candidates using spoken term detection (STD) for spoken document retrieval. Our method first creates a keyword set containing keywords that are likely to appear in a spoken document. Next, STD is conducted with all the keywords as query terms, yielding a detection result: a set of keywords and their detection intervals in the spoken document. For keywords with competitive intervals, we rank them based on the STD matching cost and select the best one with the longest duration among the competitive detections. This final output of the STD process serves as an index word for the spoken document. The proposed framework was evaluated on lecture speeches as spoken documents in an STD task. The results show that our framework is quite effective in preventing false detection errors and in annotating spoken documents with keyword indices.
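The selection rule for competitive detections can be sketched as follows, with the grouping of overlapping intervals simplified to a single competitive group and the cost margin chosen purely for illustration.

```python
def select_best(detections, cost_margin=0.05):
    """Among detections competing for the same span, keep those within
    cost_margin of the lowest matching cost (lower cost = better match),
    then pick the one with the longest duration."""
    best_cost = min(d["cost"] for d in detections)
    competitive = [d for d in detections if d["cost"] <= best_cost + cost_margin]
    return max(competitive, key=lambda d: d["end"] - d["start"])

# Two competing keyword detections over overlapping intervals (seconds)
cands = [
    {"keyword": "kikai gakushuu", "cost": 0.21, "start": 3.2, "end": 4.1},
    {"keyword": "kikai",          "cost": 0.20, "start": 3.2, "end": 3.7},
]
best = select_best(cands)
# the longer keyword wins because its cost is within the margin
```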


NTCIR | 2011

Overview of the IR for Spoken Documents Task in NTCIR-9 Workshop

Tomoyosi Akiba; Hiromitsu Nishizaki; Kiyoaki Aikawa; Tatsuya Kawahara; Tomoko Matsui


NTCIR | 2013

Overview of the NTCIR-10 SpokenDoc-2 Task.

Tomoyosi Akiba; Hiromitsu Nishizaki; Kiyoaki Aikawa; Xinhui Hu; Yoshiaki Itoh; Tatsuya Kawahara; Seiichi Nakagawa; Hiroaki Nanjo; Yoichi Yamashita

Collaboration


Dive into Hiromitsu Nishizaki's collaborations.

Top Co-Authors


Seiichi Nakagawa

Toyohashi University of Technology


Naoki Sawada

University of Yamanashi


Tomoyosi Akiba

Toyohashi University of Technology


Kiyoaki Aikawa

Tokyo University of Technology


Tomohiro Watanabe

Toyohashi University of Technology
