

Publication


Featured research published by Takafumi Koshinaka.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Speech modeling based on committee-based active learning

Yuzo Hamanaka; Koichi Shinoda; Sadaoki Furui; Tadashi Emori; Takafumi Koshinaka

We propose a committee-based active learning method for large vocabulary continuous speech recognition. In this approach, multiple recognizers are prepared beforehand, and the recognition results they produce are used for selecting utterances. A progressive search method aligns the hypothesized sentences, and voting entropy serves as the measure for selecting utterances. We apply our method not only to acoustic models but also to language models and their combination. Evaluated on 190 hours of speech data from the Corpus of Spontaneous Japanese, the method proved significantly better than random selection: it required only 63 hours of data to achieve a word accuracy of 74%, while standard training (i.e., random selection) required 97 hours. It also outperformed conventional uncertainty sampling, which uses word posterior probabilities as the confidence measure for selecting sentences.
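
As a rough illustration of the selection criterion described above, the sketch below computes voting entropy over committee hypotheses and ranks utterances by disagreement. It is a minimal sketch, assuming the hypotheses have already been aligned to a common length (the paper uses a progressive search for alignment); all function names are illustrative.

```python
import math
from collections import Counter

def voting_entropy(aligned_hyps):
    """Average per-position voting entropy over word hypotheses from a
    committee of K recognizers. `aligned_hyps` is a list of K word
    sequences already aligned to a common length."""
    k = len(aligned_hyps)
    n = len(aligned_hyps[0])
    h = 0.0
    for pos in range(n):
        votes = Counter(hyp[pos] for hyp in aligned_hyps)
        h -= sum((c / k) * math.log2(c / k) for c in votes.values())
    return h / n  # normalize by utterance length

def select_for_labeling(pool, budget):
    """Pick the `budget` utterances on which the committee disagrees most.
    `pool` is a list of (utterance_id, aligned_hypotheses) pairs."""
    ranked = sorted(pool, key=lambda item: voting_entropy(item[1]), reverse=True)
    return [utt_id for utt_id, _ in ranked[:budget]]
```

For example, an utterance on which all recognizers agree scores zero entropy and is never selected, while one with split votes at many positions ranks near the top of the labeling queue.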


International Conference on Acoustics, Speech, and Signal Processing | 2009

Online speaker clustering using incremental learning of an ergodic hidden Markov model

Takafumi Koshinaka; Kentaro Nagatomo; Koichi Shinoda

A novel online speaker clustering method suitable for real-time applications is proposed. Using an ergodic hidden Markov model, it employs incremental learning based on a variational Bayesian framework and makes probabilistic (non-deterministic) decisions for each input utterance, directly taking into account the history of preceding utterances. This enables more robust cluster estimation and more precise classification of utterances than conventional online methods provide. Experiments on meeting-speech data show that the proposed method produces 70–80% fewer errors than a conventional method.
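
A full variational Bayesian ergodic HMM is beyond a short example, but the sketch below shows the core idea of incremental, probabilistic cluster decisions: each utterance embedding receives a posterior over existing clusters plus a "new speaker" option, and cluster statistics are updated online. This is a simplified stand-in, not the paper's model; the distance-based scores, penalty term, and update rule are all assumptions.

```python
import numpy as np

def online_soft_cluster(utterances, new_cluster_penalty=8.0):
    """Sequential soft speaker clustering over fixed-length utterance
    embeddings. Each utterance gets a posterior over existing clusters
    plus a 'new speaker' option, and cluster means are updated
    incrementally."""
    means, counts, posteriors = [], [], []
    for x in utterances:
        # Log-scores: negative squared distance to each cluster mean,
        # plus a fixed penalty for opening a new cluster (an assumption).
        scores = np.array([-np.sum((x - m) ** 2) for m in means]
                          + [-new_cluster_penalty])
        post = np.exp(scores - scores.max())
        post /= post.sum()                      # probabilistic decision
        k = int(np.argmax(post))
        if k == len(means):                     # open a new cluster
            means.append(x.astype(float).copy())
            counts.append(1.0)
        else:                                   # soft incremental mean update
            counts[k] += post[k]
            means[k] += (post[k] / counts[k]) * (x - means[k])
        posteriors.append(post)
    return posteriors
```

Returning the full posterior for each utterance, rather than a hard label, is what allows later utterances to revise the effective clustering as more history accumulates.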


International Conference on Acoustics, Speech, and Signal Processing | 2005

An HMM-based text segmentation method using variational Bayes approach and its application to LVCSR for broadcast news

Takafumi Koshinaka; Ken-ichi Iso; Akitoshi Okumura

Recent progress in large vocabulary continuous speech recognition (LVCSR) has raised the possibility of applying information retrieval techniques to the resulting text. This paper presents a novel unsupervised text segmentation method. Assuming that a text stream is generated by a left-to-right hidden Markov model (HMM), text segmentation can be formulated as model parameter estimation and model selection over the text stream. The formulation is derived within the variational Bayes framework, which is expected to work well with highly sparse data such as text. The effectiveness of the proposed method is demonstrated through a series of experiments in which broadcast news programs are automatically transcribed and segmented into separate news stories.
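
To make the segmentation-as-model-selection idea concrete, the sketch below segments a sentence stream by dynamic programming, scoring each candidate segment by its unigram log-likelihood minus a fixed per-segment penalty. The penalty is an illustrative stand-in for the paper's variational lower bound, which also governs how many segments (HMM states) are selected.

```python
import math
from collections import Counter

def segment_text(sentences, penalty=20.0):
    """Segment a stream of sentences into topic segments by dynamic
    programming over all possible boundary placements."""
    n = len(sentences)

    def seg_score(i, j):
        # Unigram log-likelihood of sentences[i:j] under its own
        # word distribution, minus a per-segment penalty.
        words = [w for s in sentences[i:j] for w in s.split()]
        counts, total = Counter(words), len(words)
        ll = sum(c * math.log(c / total) for c in counts.values())
        return ll - penalty

    best = [0.0] + [-math.inf] * n   # best[j]: best score of sentences[:j]
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + seg_score(i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    # Backtrack to recover segment boundaries (sentence indices).
    bounds, j = [], n
    while j > 0:
        j = back[j]
        if j > 0:
            bounds.append(j)
    return sorted(bounds)
```

Raising the penalty yields fewer, longer segments; in the paper this trade-off is handled automatically by Bayesian model selection rather than by a hand-tuned constant.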


International Conference on Acoustics, Speech, and Signal Processing | 2016

Domain adaptation using maximum likelihood linear transformation for PLDA-based speaker verification

Qiongqiong Wang; Hitoshi Yamamoto; Takafumi Koshinaka

While i-vector/PLDA frameworks that employ huge amounts of development data have achieved significant success in speaker recognition, collecting a sufficiently large amount of data is infeasible for every real application. This paper proposes a method for supervised domain adaptation of PLDA in i-vector-based speaker recognition systems, given abundant resource-rich mismatched data and a small amount of matched data, under two assumptions: (1) between-speaker and within-speaker covariances depend on the domain; (2) features in one domain can be transformed into another domain by an affine transformation. Maximum likelihood linear transformation (MLLT) is used to infer the relationship between the two domains' datasets when training PLDA. The proposed method improves performance over that achieved without adaptation, and with a score fusion technique it outperforms a conventional method based on linear combination.
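
The second assumption above, an affine map between domains, can be illustrated with a simple moment-matching construction. The paper estimates the transform by maximum likelihood within PLDA training; the sketch below instead matches global means and covariances (a whitening/coloring argument) and is only an illustrative stand-in.

```python
import numpy as np
from scipy.linalg import sqrtm

def affine_domain_map(out_domain, in_domain):
    """Estimate an affine transform y = A x + b mapping out-of-domain
    i-vectors into the in-domain space by matching global means and
    covariances. `out_domain` and `in_domain` are (N, d) arrays."""
    mu_o, mu_i = out_domain.mean(axis=0), in_domain.mean(axis=0)
    cov_o = np.cov(out_domain, rowvar=False)
    cov_i = np.cov(in_domain, rowvar=False)
    # A = cov_i^{1/2} cov_o^{-1/2}, so that A cov_o A^T = cov_i.
    A = np.real(sqrtm(cov_i)) @ np.linalg.inv(np.real(sqrtm(cov_o)))
    b = mu_i - A @ mu_o
    return A, b

# Out-of-domain vectors mapped with (A, b) can then augment the small
# matched set when training the PLDA back-end.
```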


International Conference on Acoustics, Speech, and Signal Processing | 2013

Anomaly detection of motors with feature emphasis using only normal sounds

Yumi Ono; Yoshifumi Onishi; Takafumi Koshinaka; Soichiro Takata; Osamu Hoshuyama

This paper proposes an anomaly detection method for the sounds of motors in operation that does not require any abnormal signals for training. It is based on feature emphasis and effectively detects anomalies that appear in only a small subset of features. To emphasize those features, the method optimally estimates the contribution rate of each feature to the dissimilarity score between an observed signal and the distribution of normal signals. We evaluated the method using sound data recorded from PCs and fans in operation. The evaluation demonstrates that the proposed method emphasizes a small subset of narrow frequency ranges and achieves an error reduction rate of up to 76%.
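
The feature-emphasis idea can be sketched as a weighted dissimilarity against a model fitted on normal sounds only. The paper optimally estimates per-feature contribution rates; keeping only the top-k normalized deviations, as done below, is a simplified stand-in that mimics emphasizing the few frequency bands where an anomaly appears.

```python
import numpy as np

def anomaly_score(x, normal_mean, normal_std, top_k=5):
    """Dissimilarity between an observed feature vector and the normal
    model, emphasizing a small subset of features."""
    z = ((x - normal_mean) / normal_std) ** 2     # per-feature deviation
    weights = np.zeros_like(z)
    weights[np.argsort(z)[-top_k:]] = 1.0         # emphasized features
    return float(np.sum(weights * z))

# Fit on normal sounds only, e.g.:
# normal_mean = normal_feats.mean(axis=0)
# normal_std = normal_feats.std(axis=0) + 1e-8
```

An unweighted score would dilute a narrow-band fault across many unaffected frequency bins; concentrating the score on the deviant subset is what makes such anomalies detectable.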


Spoken Language Technology Workshop | 2012

A noise-robust speech recognition method composed of weak noise suppression and weak Vector Taylor Series Adaptation

Shuji Komeiji; Takayuki Arakawa; Takafumi Koshinaka

This paper proposes a noise-robust speech recognition method composed of weak noise suppression (NS) and weak Vector Taylor Series adaptation (VTSA). The proposed method compensates for the defects of NS and VTSA while retaining their advantages: the weak NS reduces the over-suppression distortion that can accompany noise-suppressed speech, and the weak VTSA avoids over-adaptation by offsetting the part of the acoustic-model adaptation that corresponds to the already-suppressed noise. Evaluation results on the AURORA2 database show that the proposed method achieves word accuracy as much as 1.2 points higher (87.4%) than a method with VTSA alone (86.2%), which in turn is consistently better than its counterpart with NS alone.
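
The "weak" suppression idea can be illustrated with spectral subtraction scaled down by an under-subtraction factor. The paper does not specify this particular front end; spectral subtraction, and the `alpha` and `floor` values below, are assumptions used only to show the principle.

```python
import numpy as np

def weak_noise_suppression(noisy_power, noise_power, alpha=0.5, floor=0.05):
    """'Weak' noise suppression: subtract only a fraction `alpha` of the
    estimated noise power spectrum, limiting the distortion that full
    suppression can introduce. The residual noise would then be handled
    by a correspondingly weak VTS acoustic-model adaptation (not shown)."""
    cleaned = noisy_power - alpha * noise_power
    return np.maximum(cleaned, floor * noisy_power)   # spectral floor
```

The design intuition is a division of labor: the front end removes only as much noise as it can without distorting speech, and the model adaptation accounts for exactly the noise that remains.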


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2016

Fast and accurate personal authentication using ear acoustics

Takayuki Arakawa; Takafumi Koshinaka; Shohei Yano; Hideki Irisawa; Ryoji Miyahara; Hitoshi Imaoka

This paper presents a biometric personal-authentication method that exploits the acoustic characteristics of human ears. It transmits a probe signal into the ear and receives its reflection, which carries identity information determined by the shape of the ear canal. Based on a study of effective and efficient acoustic feature representations, and using audio equipment suited to acquiring features with low within-individual variability, the proposed method achieves a promising equal error rate of 0.97% with only 12 feature components. A prototype system for Android smartphones is also presented.
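
One plausible way to realize such a compact feature is to estimate the ear's transfer function from the probe/reflection pair and keep a few cepstral coefficients. The paper reports 12 feature components; the exact feature design, FFT size, and decision threshold below are assumptions, not the paper's specification.

```python
import numpy as np

def ear_feature(probe, reflection, n_feat=12, n_fft=1024):
    """Compact acoustic feature from an ear-canal reflection: estimate the
    ear's magnitude transfer function against the probe signal and keep
    the first few real-cepstrum coefficients."""
    P = np.fft.rfft(probe, n_fft)
    R = np.fft.rfft(reflection, n_fft)
    h = np.abs(R) / (np.abs(P) + 1e-10)       # magnitude transfer function
    cep = np.fft.irfft(np.log(h + 1e-10))     # real cepstrum
    return cep[:n_feat]

def verify(feat, enrolled, threshold=0.15):
    """Accept if the cosine distance to the enrolled template is small
    (the threshold here is illustrative)."""
    cos = feat @ enrolled / (np.linalg.norm(feat) * np.linalg.norm(enrolled))
    return (1.0 - cos) < threshold
```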


Archive | 2008

Pronunciation variation rule extraction apparatus, pronunciation variation rule extraction method, and pronunciation variation rule extraction program

Takafumi Koshinaka


Archive | 2007

Speech recognition apparatus, speech recognition method, and speech recognition program

Tasuku Kitade; Takafumi Koshinaka


Archive | 2007

Speech recognition dictionary compilation assisting system, speech recognition dictionary compilation assisting method, and speech recognition dictionary compilation assisting program

Takafumi Koshinaka

