Shi-wook Lee
National Institute of Advanced Industrial Science and Technology
Publication
Featured research published by Shi-wook Lee.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Shi-wook Lee; Kazuyo Tanaka; Yoshiaki Itoh
The paper describes subword-based approaches for open-vocabulary spoken document retrieval. First, the feasibility of subword units in spoken document retrieval is investigated, and our previously proposed sub-phonetic segment units are compared to typical subword units such as syllables, phonemes, and triphones. Next, we explore the linear combination of retrieval scores from multiple subword representations to improve retrieval performance. Experimental evaluation on open-vocabulary spoken document retrieval tasks demonstrates that our proposed sub-phonetic segment units are more effective than typical subword units, and that the linear combination of multiple subword representations yields a consistent improvement in the F-measure.
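As a simple illustration of the linear score combination (not the authors' exact formulation), the sketch below sums per-representation retrieval scores with fixed weights; the representation names, weights, and document scores are all hypothetical.

```python
# Minimal sketch: linearly combine retrieval scores from multiple subword
# representations. Weights and representation names are hypothetical.

def combine_scores(scores_by_unit: dict[str, dict[str, float]],
                   weights: dict[str, float]) -> dict[str, float]:
    """Return a combined score per document: sum over units of w_u * score_u(doc)."""
    combined: dict[str, float] = {}
    for unit, doc_scores in scores_by_unit.items():
        w = weights.get(unit, 0.0)
        for doc_id, score in doc_scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * score
    return combined

# Example: scores from three subword representations for two documents.
scores = {
    "syllable": {"doc1": 0.62, "doc2": 0.40},
    "triphone": {"doc1": 0.55, "doc2": 0.58},
    "sps":      {"doc1": 0.71, "doc2": 0.35},  # sub-phonetic segments
}
weights = {"syllable": 0.3, "triphone": 0.3, "sps": 0.4}
ranked = sorted(combine_scores(scores, weights).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)
```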
Multimedia Systems | 2005
Yoshiaki Itoh; Kazuyo Tanaka; Shi-wook Lee
This paper proposes a new, efficient algorithm for extracting similar sections between two time sequence data sets. The algorithm, called Relay Continuous Dynamic Programming (Relay CDP), realizes fast matching between arbitrary sections in the reference pattern and the input pattern and enables the extraction of similar sections in a frame synchronous manner. In addition, Relay CDP is extended to two types of applications that handle spoken documents. The first application is the extraction of repeated utterances in a presentation or a news speech because repeated utterances are assumed to be important parts of the speech. These repeated utterances can be regarded as labels for information retrieval. The second application is flexible spoken document retrieval. A phonetic model is introduced to cope with the speech of different speakers. The new algorithm allows a user to query by natural utterance and searches spoken documents for any partial matches to the query utterance. We present herein a detailed explanation of Relay CDP and the experimental results for the extraction of similar sections and report results for two applications using Relay CDP.
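To make the matching idea concrete, here is a minimal sketch of a plain one-pass continuous DP (subsequence DTW) that extracts the best-matching section of an input pattern for a given reference pattern. It illustrates continuous DP in general, not the authors' Relay CDP, and the feature dimensions and toy data are arbitrary.

```python
# Simplified continuous DP (subsequence DTW) sketch: find the section of `inp`
# that best matches `ref`. This is a plain CDP illustration, not Relay CDP.
import numpy as np

def continuous_dp(ref: np.ndarray, inp: np.ndarray):
    """ref: (M, D) reference frames, inp: (T, D) input frames.
    Returns (length-normalised score, end frame) of the best-matching section."""
    M, T = len(ref), len(inp)
    D = np.full((M, T), np.inf)
    # Local distance: Euclidean distance between frames.
    dist = np.linalg.norm(ref[:, None, :] - inp[None, :, :], axis=2)
    for t in range(T):
        D[0, t] = dist[0, t]              # a match may start at any input frame
        for m in range(1, M):
            prev = D[m - 1, t]            # vertical transition
            if t > 0:
                prev = min(prev, D[m, t - 1], D[m - 1, t - 1])
            D[m, t] = dist[m, t] + prev
    end = int(np.argmin(D[-1]))
    return D[-1, end] / M, end

# Toy usage: a copy of the reference is embedded in random input frames.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 12))
inp = np.concatenate([rng.normal(size=(30, 12)), ref + 0.01, rng.normal(size=(25, 12))])
print(continuous_dp(ref, inp))
```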
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013
Kazuma Konno; Yoshiaki Itoh; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee
In spoken term detection, the retrieval of OOV (Out-Of-Vocabulary) query terms is very important because query terms are likely to be OOV terms. To improve retrieval performance for OOV query terms, the paper proposes a re-scoring method applied after the candidate segments are determined. Each candidate segment has a matching score and a segment number. Because highly ranked candidates are usually reliable, and because users are assumed to select query terms that are specific to the target documents and appear frequently in them, we give higher priority to candidate segments included in highly ranked documents by adjusting their matching scores. We conducted performance evaluation experiments for the proposed method using the open test collections for SpokenDoc-2 in NTCIR-10. The results showed that the proposed method improved retrieval performance by more than 7.0 points for two test sets in the test collections, demonstrating its effectiveness.
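A hedged sketch of the rescoring idea follows: candidate segments that fall inside highly ranked documents receive a score bonus. The bonus function and parameter values are illustrative, not the paper's exact adjustment.

```python
# Sketch: boost the matching score of candidate segments found in highly
# ranked documents. The rank-based bonus and alpha value are illustrative.

def rescore(candidates, doc_rank, alpha=0.1):
    """candidates: list of (segment_id, doc_id, score); doc_rank: doc_id -> rank
    (1 = best). Returns candidates re-sorted after adding a rank-based bonus."""
    rescored = []
    for seg_id, doc_id, score in candidates:
        rank = doc_rank.get(doc_id)
        bonus = alpha / rank if rank is not None else 0.0
        rescored.append((seg_id, doc_id, score + bonus))
    return sorted(rescored, key=lambda c: c[2], reverse=True)

candidates = [("seg1", "docA", 0.80), ("seg2", "docB", 0.82), ("seg3", "docA", 0.78)]
doc_rank = {"docA": 1, "docB": 7}   # docA is highly ranked for this query
print(rescore(candidates, doc_rank))
```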
Conference of the International Speech Communication Association | 2016
Shi-wook Lee; Kazuyo Tanaka; Yoshiaki Itoh
This paper proposes a sequence-to-frame dynamic time warping (DTW) combination approach to improve out-of-vocabulary (OOV) spoken term detection (STD) performance. The goal of this paper is twofold: first, we propose a method that directly adopts the posterior probabilities of a deep neural network (DNN) and a Gaussian mixture model (GMM) as the similarity distance for sequence-to-frame DTW. Second, we investigate combinations of diverse schemes in GMM and DNN, with different subword units and acoustic models, estimate the complementarity of the combined systems in terms of performance gap and correlation, and discuss the performance gain of the combined systems. The results of evaluations of the combined systems on an OOV spoken term detection task show that the performance gain of DNN-based systems is better than that of GMM-based systems. However, the performance gain obtained by combining DNN- and GMM-based systems is insignificant, even though DNN and GMM are highly heterogeneous. This is because the performance gap between DNN-based systems and GMM-based systems is quite large. On the other hand, score fusion of two heterogeneous subword units, triphones and sub-phonetic segments, in DNN-based systems provides significantly improved performance.
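As a rough illustration of the first idea (using posteriors as the local similarity for sequence-to-frame DTW), the following sketch treats the query as a sequence of subword-state indices and uses the negative log posterior of each state at each frame as the local cost. The state inventory, toy posteriors, and length normalization are assumptions, not the paper's exact setup.

```python
# Sketch of sequence-to-frame DTW where the local cost is the negative log
# posterior of the query's state at each document frame (toy data throughout).
import numpy as np

def seq_to_frame_dtw(query_states, posteriors):
    """query_states: list of state indices (length N).
    posteriors: (T, S) frame-level posterior probabilities from a DNN or GMM.
    Returns a length-normalised DTW cost (lower is better)."""
    N, T = len(query_states), posteriors.shape[0]
    cost = -np.log(posteriors[:, query_states] + 1e-10).T   # (N, T) local costs
    D = np.full((N, T), np.inf)
    D[0, :] = cost[0, :]                  # the match may begin at any frame
    for t in range(1, T):
        for n in range(1, N):
            D[n, t] = cost[n, t] + min(D[n, t - 1], D[n - 1, t - 1], D[n - 1, t])
    return D[-1].min() / N

rng = np.random.default_rng(1)
post = rng.dirichlet(np.ones(40), size=200)   # 200 frames, 40 states (toy DNN output)
print(seq_to_frame_dtw([3, 17, 17, 25, 8], post))
```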
Conference of the International Speech Communication Association | 2016
Masato Obara; Kazunori Kojima; Kazuyo Tanaka; Shi-wook Lee; Yoshiaki Itoh
Spoken term detection (STD) has recently received much attention in the speech processing community. Query-by-Example (QbE) STD, in which a query is issued as a speech signal, has also become an important topic. This paper proposes a rescoring method using a posteriorgram, a sequence of posterior probabilities obtained from a deep neural network (DNN), which is computed for both the spoken query and the spoken documents. Because direct matching between two posteriorgrams requires significant computation time, we first apply a conventional STD method that performs matching at the subword or state level, where a subword corresponds to an acoustic model and the states are those of the acoustic model's hidden Markov model. Both the spoken query and the spoken documents are converted to subword sequences using an automatic speech recognizer. After obtaining candidate scores by subword/state matching, matching at the frame level using the posteriorgram is performed with continuous dynamic programming (CDP) verification for the top N candidates acquired by the subword/state matching. The score of the subword/state matching and the score of the posteriorgram matching are integrated and rescored using a weighting coefficient. To reduce computation time, rescoring is applied only to the top candidates. Evaluation experiments were carried out using open test collections (the SpokenDoc tasks of the NTCIR-10 workshop), and the results demonstrated the effectiveness of the proposed method.
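A minimal sketch of the integration step described above, assuming a coarse subword/state-level score per candidate and a stand-in posteriorgram_match function for the frame-level CDP verification; the weight, top-N value, and toy scores are placeholders.

```python
# Sketch of the two-pass flow: a fast subword/state-level search produces
# ranked candidates, only the top N are rescored with a (costly) frame-level
# posteriorgram match, and the two scores are combined with a weight.

def integrate(candidates, posteriorgram_match, top_n=50, w=0.5):
    """candidates: list of (utt_id, coarse_score), sorted best-first."""
    rescored = []
    for i, (utt_id, coarse) in enumerate(candidates):
        if i < top_n:
            fine = posteriorgram_match(utt_id)          # expensive, top-N only
            score = w * coarse + (1.0 - w) * fine
        else:
            score = coarse                              # keep the cheap score
        rescored.append((utt_id, score))
    return sorted(rescored, key=lambda c: c[1], reverse=True)

# Toy usage: pretend the frame-level matcher returns a fixed score per utterance.
fine_scores = {"utt1": 0.9, "utt2": 0.4}
print(integrate([("utt1", 0.7), ("utt2", 0.75), ("utt3", 0.6)],
                lambda u: fine_scores.get(u, 0.0), top_n=2, w=0.5))
```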
International Conference on Speech and Computer | 2013
Yoshiaki Itoh; Hiroyuki Saito; Kazuyo Tanaka; Shi-wook Lee
Spoken term detection (STD) is one of the key technologies for spoken document processing. This paper describes a method for realizing pseudo real-time spoken term detection using pre-retrieval results. Pre-retrieval results for all combinations of syllable bigrams are prepared beforehand. The retrieval time depends on the number of candidate sections in the pre-retrieval results; therefore, the paper proposes a method to control the retrieval time through this number. A few top candidates are obtained in almost real time by limiting the number of candidate sections to a small value. While a user is confirming these candidate sections, the system can conduct the rest of the retrieval by gradually increasing the number of candidate sections. The paper demonstrates that the proposed method enables pseudo real-time spoken term detection through evaluation experiments using an actual presentation speech corpus, the Corpus of Spontaneous Japanese (CSJ).
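The following sketch illustrates the pre-retrieval idea under simplifying assumptions: a syllable-bigram inverted index built beforehand, with retrieval time controlled by capping the number of candidate sections examined. The data structures and syllable strings are hypothetical.

```python
# Sketch: pre-retrieval with a syllable-bigram inverted index; the number of
# candidate sections examined is capped to control retrieval time.
from collections import defaultdict

def build_bigram_index(documents):
    """documents: dict doc_id -> list of syllables. Returns bigram -> [(doc_id, pos)]."""
    index = defaultdict(list)
    for doc_id, syllables in documents.items():
        for i in range(len(syllables) - 1):
            index[(syllables[i], syllables[i + 1])].append((doc_id, i))
    return index

def pre_retrieve(query_syllables, index, max_candidates=100):
    """Collect candidate sections for the query's bigrams, capped at max_candidates."""
    candidates = []
    for i in range(len(query_syllables) - 1):
        candidates.extend(index.get((query_syllables[i], query_syllables[i + 1]), []))
        if len(candidates) >= max_candidates:
            return candidates[:max_candidates]    # few candidates -> near real-time
    return candidates

docs = {"talk1": ["ko", "o", "pa", "su"], "talk2": ["pa", "su", "wa", "a"]}
index = build_bigram_index(docs)
print(pre_retrieve(["pa", "su"], index, max_candidates=2))
```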
Spoken Language Technology Workshop | 2008
Go Kuriki; Yoshiaki Itoh; Kazunori Kojima; Masaaki Ishigame; Kazuyo Tanaka; Shi-wook Lee
We present a method for open-vocabulary retrieval based on a spoken document retrieval (SDR) system using subword models. The paper proposes a new approach to open-vocabulary SDR with subword models that does not require subword recognition. Instead, subword sequences are obtained from the phone sequence output by the speech recognizer: when an utterance contains an out-of-vocabulary (OOV) word, the recognizer outputs a word sequence whose phone sequence is considered to be similar to that of the OOV word. When OOV words are provided in a query, the proposed system is able to retrieve the target section by comparing the phone sequences of the query and the word sequence generated by the speech recognizer.
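As an illustration of the comparison step, a plain Levenshtein distance between the query's phone sequence and the phone sequence of the recognizer's output might look like the sketch below; the phone strings are toy values, and the paper's actual matching may differ in detail.

```python
# Sketch: compare the OOV query's phone sequence with the phone sequence of
# the recognizer's word output using edit distance (illustrative only).

def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (pa != pb))  # substitution / match
    return dp[-1]

query_phones = ["k", "a", "s", "u", "m", "i"]            # OOV query (toy)
recognized   = ["k", "a", "s", "u", "m", "i", "g", "a"]  # phones of recognized words
print(edit_distance(query_phones, recognized))
```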
Spoken Language Technology Workshop | 2006
Ken Sadohara; Shi-wook Lee; Hiroaki Kojima
The goal of the present paper is to explore the feasibility of a topic segmentation method without using large vocabulary continuous speech recognition (LVCSR). The proposed method is domain-independent in the sense that it is not constrained by vocabulary and does not require training data. For a sequence of sub-word units obtained using a continuous sub-word recognizer, the proposed method merges similar adjacent parts of the sequence in an agglomerative manner to produce a hierarchical cluster tree. The proposed method uses a string kernel to efficiently compute the similarity between two strings of sub-word units based on the frequencies of any sub-strings appearing in the strings. By carefully excluding the influence of the sub-strings that are irrelevant to the topic of interest, topically coherent clusters are formed without linguistic knowledge. An empirical study on a Japanese news speech corpus shows that the method performs better than a topic segmenter using LVCSR.
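A minimal sketch of the string-kernel idea follows, assuming a simple n-gram spectrum kernel (inner product of substring frequency vectors); the paper's kernel and its careful exclusion of topic-irrelevant substrings are more elaborate than this.

```python
# Sketch: n-gram spectrum kernel between two strings of sub-word units.
# Similarity is a normalised inner product over substring (n-gram) frequencies.
from collections import Counter
from math import sqrt

def ngram_counts(units, n_max=3):
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts

def spectrum_kernel(x, y, n_max=3):
    """Normalised inner product of n-gram frequency vectors of two unit strings."""
    cx, cy = ngram_counts(x, n_max), ngram_counts(y, n_max)
    dot = sum(cx[g] * cy[g] for g in cx if g in cy)
    norm = sqrt(sum(v * v for v in cx.values())) * sqrt(sum(v * v for v in cy.values()))
    return dot / norm if norm else 0.0

seg1 = ["a", "k", "i", "a", "k", "i", "t", "a"]   # toy sub-word unit strings
seg2 = ["a", "k", "i", "t", "a", "k", "e", "n"]
print(round(spectrum_kernel(seg1, seg2), 3))
```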
Journal of the Acoustical Society of America | 2016
Yoshino Shimizu; Eitaro Iwasaki; Shi-wook Lee; Kazuyo Tanaka; Kazunori Kojima; Yoshiaki Itoh
We propose a new method for integrating multiple search results to improve the search accuracy of spoken term detection (STD). A typical STD system prepares two types of recognition results for spoken documents: if a query consists of in-vocabulary (IV) terms, the results from a word-based recognizer are used, and if a query includes out-of-vocabulary (OOV) terms, the results from a subword-based recognizer are used. This paper proposes a method for integrating these two search results. Each utterance has a similarity score included in the search results. Conventionally, the scores of the two results for an utterance have been integrated linearly using a constant weighting factor. Our preliminary experiments showed that the search accuracy using the subword-based results was higher for some IV queries and, conversely, that using the word-based results was higher for some OOV queries. In the proposed method, the similarity scores of the two search results are compared for the same utterance and a higher weighting factor is given to ...
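Since the abstract is truncated before the exact weighting rule, the sketch below only illustrates the general setup it describes: a per-utterance linear integration of word-based and subword-based similarity scores with a weighting factor. The constant weight is a placeholder, not the paper's rule.

```python
# Sketch: linear integration of word-based and subword-based search scores
# per utterance. The weighting factor here is a placeholder.

def integrate_results(word_scores, subword_scores, weight=0.5):
    """word_scores / subword_scores: dict utt_id -> similarity score."""
    merged = {}
    for utt in set(word_scores) | set(subword_scores):
        w_s = word_scores.get(utt, 0.0)
        s_s = subword_scores.get(utt, 0.0)
        merged[utt] = weight * w_s + (1.0 - weight) * s_s
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

print(integrate_results({"utt1": 0.8, "utt2": 0.3}, {"utt1": 0.5, "utt2": 0.7}))
```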
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015
Ryota Konno; Kazunori Kojima; Kazuyo Tanaka; Shi-wook Lee; Yoshiaki Itoh
In spoken-term detection (STD), the detection of out-of-vocabulary (OOV) query terms is crucial because query terms are likely to be OOV terms. This paper proposes a rescoring method that uses the posterior probabilities output by a deep neural network (DNN) to improve detection accuracy for OOV query terms. Conventional STD methods for OOV query terms search the subword sequences of speech data, obtained using an automatic speech recognizer, for the query's subword sequence. In the proposed method, detailed matching is performed using the probabilities output by the DNN. A pseudo query at the frame or state level is generated so that it can be aligned with the obtained frame-level probabilities. To reduce the computational burden of the DNN, we apply the proposed method only to the top candidate utterances, which can be found quickly by a conventional STD method. Experiments were conducted to evaluate the performance of the proposed method, using the open test collections for the SpokenDoc tasks of the NTCIR-9 and NTCIR-10 workshops as benchmarks. The proposed method improved the mean average precision by 5 to 20 points, surpassing the best accuracy obtained at the workshops. These results demonstrate the effectiveness of the proposed method.
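One ingredient described above is the pseudo query. The sketch below shows one way such a frame-level pseudo query could be built from a state sequence (one-hot rows repeated for an assumed duration) and compared with a DNN posteriorgram frame; the duration and distance choices are assumptions, not the paper's exact formulation.

```python
# Sketch: build a frame-level pseudo-query posteriorgram from a state sequence
# and compute a frame distance against a DNN posteriorgram (toy data).
import numpy as np

def pseudo_query(state_seq, n_states, frames_per_state=3):
    """Return a (len(state_seq) * frames_per_state, n_states) one-hot posteriorgram."""
    q = np.zeros((len(state_seq) * frames_per_state, n_states))
    for i, s in enumerate(state_seq):
        q[i * frames_per_state:(i + 1) * frames_per_state, s] = 1.0
    return q

def frame_distance(query_frame, doc_frame):
    """Negative log of the inner product between posterior vectors (smaller = closer)."""
    return -np.log(float(query_frame @ doc_frame) + 1e-10)

rng = np.random.default_rng(2)
q = pseudo_query([4, 9, 9, 1], n_states=30)
doc_post = rng.dirichlet(np.ones(30), size=100)     # toy DNN posteriorgram
print(frame_distance(q[0], doc_post[0]))
```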
Collaboration
National Institute of Advanced Industrial Science and Technology