Dean Luo
University of Tokyo
Publication
Featured research published by Dean Luo.
IEEE Automatic Speech Recognition and Understanding Workshop | 2009
Masayuki Suzuki; Nobuaki Minematsu; Dean Luo; Keikichi Hirose
Automatic estimation of pronunciation proficiency poses a specific difficulty. How well a learner controls the vocal organs can be estimated from the spectral envelopes of input utterances, but envelope patterns are also easily affected by speaker differences. To develop a pedagogically sound method for automatic estimation, the envelope changes caused by linguistic factors must be properly separated from those caused by extra-linguistic factors. To this end, in our previous study [1], we proposed a mathematically guaranteed and linguistically valid speaker-invariant representation of pronunciation, called speech structure. Since that proposal, we have also examined the representation for ASR [2], [3], [4] and, through that work, have learned how better to apply speech structures to various tasks. In this paper, we revisit the proficiency estimation experiment of [1] and, using our recently proposed techniques for the structures, repeat it under new conditions. Here, we use smaller units of structural analysis, speaker-invariant substructures, and relative structural distances between a learner and a teacher. Results show improved correlations between human and machine ratings and far greater robustness to speaker differences than the widely used GOP scores. We further demonstrate that the proposed representation can classify learners purely by pronunciation proficiency, unaffected by their age and gender.
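The idea of comparing a learner with a teacher through relative distances, rather than through absolute spectral features, can be illustrated with a toy sketch. The abstract does not define how the structure is computed, so everything here is an assumption made for illustration: each sound unit is reduced to a hypothetical centroid vector, the "structure" is taken to be the set of pairwise Euclidean distances between centroids, and `structure` and `structural_distance` are invented names. In this simplified form the distances are unchanged only by rotation and translation of the feature space, which stands in for a speaker difference.

```python
import numpy as np

def structure(centroids):
    """Vector of pairwise Euclidean distances between unit centroids.

    Assumption: Euclidean distance between centroids is a stand-in; the
    abstract does not say which distance measure the authors use.
    """
    c = np.asarray(centroids, dtype=float)
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(c), k=1)  # each unordered pair once
    return dist[upper]

def structural_distance(learner, teacher):
    """Root-mean-square difference between two structures (hypothetical score)."""
    s, t = structure(learner), structure(teacher)
    return float(np.sqrt(np.mean((s - t) ** 2)))

# Toy data: three sound units in a 2-D feature space.
teacher = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Same relative layout, but rotated by 90 degrees and shifted,
# mimicking a different speaker with identical pronunciation.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
same_pronunciation = teacher @ rotation.T + np.array([3.0, -2.0])

# Different relative layout: the second unit is placed too close to the first.
different_pronunciation = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 1.0]])

print(structural_distance(same_pronunciation, teacher))
print(structural_distance(different_pronunciation, teacher))
```

The first score is numerically zero because only the absolute positions changed, while the second is clearly positive because the relative layout of the units differs.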
IET Signal Processing | 2013
Yu Qiao; Dean Luo; Nobuaki Minematsu
Automatic phoneme segmentation of a speech sequence is a basic problem in speech engineering. This study investigates unsupervised phoneme segmentation that uses no prior information on the linguistic content of an input sequence and no acoustic models. The authors formulate unsupervised segmentation as an optimisation problem by means of maximum likelihood, and show that the optimal segmentation corresponds to minimising the coding length of the input sequence. Under different assumptions, five objective functions are developed, namely the log determinant, rate distortion (RD), Bayesian log determinant, Mahalanobis distance and Euclidean distance objectives. The authors prove that the optimal segmentations are transformation-invariant, introduce a time-constrained agglomerative clustering algorithm to find the optimal segmentations, and propose an efficient implementation of the algorithm using integration functions. Experiments on the TIMIT database compare the five objective functions. The results show that RD achieves the best performance, and that the proposed method outperforms previous unsupervised segmentation methods.
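A minimal sketch of time-constrained agglomerative clustering may help make the procedure concrete. It is not the authors' implementation. It assumes the simplest of the five objectives (Euclidean distance, taken here as the within-segment sum of squared distances to the segment mean), assumes that the "integration functions" can be approximated by cumulative sums, and takes the number of segments as a given input. The names `segment_cost` and `agglomerative_segment` are hypothetical.

```python
import numpy as np

def segment_cost(prefix_sum, prefix_sq, start, end):
    """Sum of squared distances to the mean for frames [start, end).

    Computed in constant time from cumulative sums (assumed stand-in for
    the abstract's "integration functions").
    """
    n = end - start
    total = prefix_sum[end] - prefix_sum[start]
    squares = prefix_sq[end] - prefix_sq[start]
    return squares - (total @ total) / n

def agglomerative_segment(frames, num_segments):
    """Greedily merge adjacent segments until num_segments remain.

    Returns boundary indices, e.g. [0, 4, 7, 12] for three segments.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, dim = frames.shape
    prefix_sum = np.vstack([np.zeros(dim), np.cumsum(frames, axis=0)])
    prefix_sq = np.concatenate([[0.0], np.cumsum((frames ** 2).sum(axis=1))])

    def cost(a, b):
        return segment_cost(prefix_sum, prefix_sq, a, b)

    # Start with every frame as its own segment.
    bounds = list(range(num_frames + 1))
    while len(bounds) - 1 > num_segments:
        best_index, best_increase = None, np.inf
        # Time constraint: only neighbouring segments may be merged.
        for i in range(1, len(bounds) - 1):
            a, b, c = bounds[i - 1], bounds[i], bounds[i + 1]
            increase = cost(a, c) - cost(a, b) - cost(b, c)
            if increase < best_increase:
                best_index, best_increase = i, increase
        del bounds[best_index]  # removing a boundary merges two segments
    return bounds

# Toy input: three stationary stretches of 4, 3 and 5 frames.
toy_frames = np.array([[0, 0]] * 4 + [[5, 5]] * 3 + [[10, 10]] * 5)
toy_bounds = agglomerative_segment(toy_frames, num_segments=3)
print(toy_bounds)  # [0, 4, 7, 12]
```

This version rescans all boundaries after every merge, which keeps the code short but is quadratic in the number of frames; a priority queue over candidate merges would be the usual way to speed it up.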
Conference of the International Speech Communication Association | 2009
Dean Luo; Yu Qiao; Nobuaki Minematsu; Yutaka Yamauchi; Keikichi Hirose
Conference of the International Speech Communication Association | 2010
Dean Luo; Yu Qiao; Nobuaki Minematsu; Yutaka Yamauchi; Keikichi Hirose
Conference of the International Speech Communication Association | 2008
Dean Luo; Naoya Shimomura; Nobuaki Minematsu; Yutaka Yamauchi; Keikichi Hirose
International Symposium on Chinese Spoken Language Processing | 2008
Dean Luo; Nobuaki Minematsu; Yutaka Yamauchi; Keikichi Hirose
Conference of the International Speech Communication Association | 2006
Chiharu Tsurutani; Yutaka Yamauchi; Nobuaki Minematsu; Dean Luo; Kazutaka Maruyama; Keikichi Hirose
Symposium on Languages, Applications and Technologies | 2009
Dean Luo; Nobuaki Minematsu; Yutaka Yamauchi; Keikichi Hirose
Symposium on Languages, Applications and Technologies | 2009
Masayuki Suzuki; Dean Luo; Nobuaki Minematsu; Keikichi Hirose
Conference of the International Speech Communication Association | 2013
Chiharu Tsurutani; Dean Luo