Du Limin
Chinese Academy of Sciences
Publication
Featured research published by Du Limin.
international conference on signal processing | 1998
Zhang Xin; Xu Yanjun; Du Limin
In this paper we propose a new method for inferring face location and scale from color information. First, facial region segmentation is performed using the vertical projection of the gray-level pixel values. Second, a horizontal projection is used to localize the vertical edges of the eyes. Within the borders of the eyes, we use a gradient-decomposed Hough transform to find the eye centres in the low-brightness region. Once the eyes are reliably detected, the position of the mouth can be found from a naturally constant biometric relation. The two projection curves are calculated from the color information. Computer simulation shows that the algorithm is robust.
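The projection step in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `projection_profiles` and `darkest_band` are hypothetical, the image is assumed to be a plain list of rows of gray levels, and the search for the darkest band is an assumed stand-in for locating a low-brightness region such as the eye line.

```python
def projection_profiles(image):
    """Return (row_sums, col_sums) for a 2-D grid of gray-level values.

    row_sums[i] is the sum of the pixels in row i; col_sums[j] is the sum
    of the pixels in column j. `image` is a list of equal-length rows.
    """
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def darkest_band(profile, width):
    """Return the start index of the `width`-long window with the smallest sum.

    Assumption for illustration: a dark feature shows up as a dip in the
    projection curve, so the lowest-sum window marks it.
    """
    total = sum(profile[:width])
    best_start, best_total = 0, total
    for start in range(1, len(profile) - width + 1):
        # Slide the window one step: add the new sample, drop the old one.
        total += profile[start + width - 1] - profile[start - 1]
        if total < best_total:
            best_start, best_total = start, total
    return best_start
```

For a bright image with a single dark row, `darkest_band(row_sums, 1)` returns the index of that row; the same call on `col_sums` would locate a dark column.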
international conference on signal processing | 2004
Xie Lingyun; Du Limin
The efficiency of the search algorithm is the key problem in large vocabulary continuous speech recognition (LVCSR) systems. This paper explores this issue and presents a dynamic pruning method based on the Viterbi beam search algorithm. At each time frame, the proposed method adjusts the beam width according to the current state of the search process. The experimental results show that it reduces the computational complexity without degrading the recognition rate much, which is advantageous for implementation on real-time systems.
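The abstract does not say which signal drives the beam-width adjustment, so the following is only a sketch under an assumed rule: the beam narrows when more hypotheses survive than a target count and widens when fewer do. The names `prune`, `adapt_beam`, `target`, and `step` are hypothetical, and scores are assumed to be log-probabilities (higher is better).

```python
def prune(scores, beam):
    """Keep only hypotheses whose score is within `beam` of the best one.

    `scores` maps a hypothesis (e.g. an HMM state id) to its log-probability
    at the current frame.
    """
    best = max(scores.values())
    return {hyp: s for hyp, s in scores.items() if s >= best - beam}


def adapt_beam(beam, n_active, target, step=0.5, min_beam=1.0, max_beam=20.0):
    """Assumed update rule: steer the active-hypothesis count toward `target`."""
    if n_active > target:
        beam -= step   # too many survivors: tighten the beam
    elif n_active < target:
        beam += step   # too few survivors: relax the beam
    # Clamp so the beam never collapses or grows without bound.
    return min(max_beam, max(min_beam, beam))
```

In a decoder loop, each frame would call `prune` on the newly extended hypotheses and then pass the survivor count to `adapt_beam` to obtain the beam for the next frame.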
international conference on signal processing | 1996
Hu Hongtao; Du Limin
A pre-classification method for Chinese voiceless consonant speech, and its digital signal processing algorithm implemented with the short-time Fourier transform, are proposed. The important features of the spike fill of stops and the C-V boundary (C denotes a voiceless consonant) are first detected and marked automatically. Then, the Chinese voiceless consonants are divided into stops and non-stops according to these distinctive features. The stops are further divided into unaspirated stops and aspirated-fricative stops. Testing on a data set of 910 C-V syllables from a database of 1267 Chinese all-syllable tokens spoken by a male speaker shows that the algorithm performs well, with a 92.4% average correct rate. The proposed method and algorithm are of value for improving the mechanism and performance of a Chinese speech recognition system.
ieee region 10 conference | 2002
Deng Haojiang; Du Limin; Wang Shoujue
The backpropagation neural network (BPNN) has been researched and applied, but its training time can be excessive. To address this problem, the structure and training algorithm of priority ordered BP neural networks are proposed. The neurons of the output layer have priority ordered interconnections, and during training the training data tails off gradually, so the algorithm may converge rapidly because the complexity of the performance function decreases. Compared with the conventional BPNN, the priority ordered BPNN requires far fewer total training epochs, and its performance function converges more rapidly in a text-independent speaker identification task.
international conference on signal processing | 1998
Hu Hongtao; Du Limin
Based on its physical mechanism and enhancement of the voiced/unvoiced feature, a new method for automatic segmentation of Chinese continuous speech into voiced/unvoiced units is presented, using the wavelet transform. The accuracy and the precision are improved by using the proposed multi-threshold detector.
international conference on signal processing | 2004
Deng Haojiang; Du Limin; Wan Hongjie
In this paper, a text-independent speaker recognition system based on adapted GMMs was established, and the speaker-independent background model and the speaker-dependent models of cohort speaker sets were used to normalize the likelihood score. Approaches that combine likelihood scores in the score domain, using linear and SVM (support vector machine) methods, were proposed. Speaker verification experiments over telephone channels showed that, for a system based on the likelihood ratio of adapted GMMs, combining likelihood scores can improve the verification performance of the baseline system that uses a universal background model (UBM). In particular, score combination using an SVM achieved the best performance.
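The linear variant of score combination can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the cohort normalizer is assumed to be the mean cohort log-likelihood, the fusion is assumed to be a convex combination with a hypothetical `weight`, and the names `fused_score` and `accept` are invented for the example.

```python
def fused_score(ll_target, ll_ubm, ll_cohort, weight=0.5):
    """Combine UBM-normalized and cohort-normalized log-likelihood ratios.

    ll_target : log-likelihood of the utterance under the claimed speaker model
    ll_ubm    : log-likelihood under the universal background model
    ll_cohort : log-likelihoods under each cohort speaker model
    weight    : assumed fusion weight in [0, 1] for the UBM-normalized score
    Returns (llr_ubm, llr_cohort, fused).
    """
    llr_ubm = ll_target - ll_ubm
    # Assumption: normalize by the mean cohort log-likelihood.
    llr_cohort = ll_target - sum(ll_cohort) / len(ll_cohort)
    fused = weight * llr_ubm + (1.0 - weight) * llr_cohort
    return llr_ubm, llr_cohort, fused


def accept(score, threshold):
    """Accept the identity claim when the fused score reaches the threshold."""
    return score >= threshold
```

An SVM-based combination would replace the fixed `weight` with a decision function learned from the pair `(llr_ubm, llr_cohort)` on labelled target and impostor trials.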
international symposium on intelligent multimedia video and speech processing | 2001
Yan Zhaoli; Du Limin; Feng Ji; Xie Lingyun; Fu Jun
To realize speaker-free speech recognition with DSP, a 2-channel recording system was developed and used to set up a speech library. Three filters, namely a high-pass filter, an LMS adaptive filter, and a combination of the two, were adopted to separate a speaker's voice from a noisy background, and their noise cancellation effectiveness was evaluated. Notably, 1/f noise is shown to be a very important factor in the effectiveness of an adaptive filter. After prior cancellation of 1/f noise at frequencies lower than 1 Hz in this study, the output signal of the adaptive filter improved significantly. This is of value not only for speech processing systems, but also for other adaptive filter systems.
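A two-channel LMS noise canceller of the kind the abstract mentions can be sketched in a few lines. This is a textbook LMS update, not the authors' DSP code: the tap count, step size `mu`, the synthetic signals in `make_demo_signals`, and the 2-tap noise path are all assumptions chosen for illustration, and the high-pass stage for 1/f noise is omitted.

```python
import math
import random


def lms_noise_canceller(primary, reference, n_taps=4, mu=0.01):
    """Subtract an adaptive estimate of the noise from the primary channel.

    primary   : samples of speech plus noise
    reference : samples of the noise alone (second channel)
    Returns (output, weights); output is the error signal, which serves
    as the cleaned speech estimate.
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps           # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))   # noise estimate
        e = d - y                                          # cleaned sample
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output, weights


def make_demo_signals(n, seed=0):
    """Synthetic stand-ins: a sine as 'speech', white noise as the reference."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    speech = [0.5 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
    # Hypothetical 2-tap acoustic path from the noise source to the primary mic.
    primary = [speech[i] + 0.8 * noise[i] + (0.3 * noise[i - 1] if i else 0.0)
               for i in range(n)]
    return speech, noise, primary
```

Calling `lms_noise_canceller(primary, noise)` on the demo signals should leave an output much closer to `speech` than `primary` is, once the weights have had a few hundred samples to adapt.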
Science China-technological Sciences | 2001
Zhou Zhi; Du Limin; Xu Yanjun
The perception of human languages is inherently a multi-modal process, in which audio information can be compensated by visual information to improve recognition performance. This phenomenon has been researched in English, German, Spanish and other languages, but it has not yet been reported for Chinese. In our experiment, 14 syllables (/ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu/), extracted from the Chinese audiovisual bimodal speech database CAVSR-1.0, were pronounced by 10 subjects. The audio-only, audiovisual, and visual-only stimuli were recognized by 20 observers. The audio-only and audiovisual stimuli were both presented under 5 conditions: no noise, and SNRs of 0 dB, −8 dB, −12 dB, and −16 dB. The experimental results are analyzed and the following conclusions for Chinese speech are reached. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinction. In noisy environments, audio information can be compensated remarkably well by visual information, and as a result recognition performance is greatly improved.
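The noise conditions above are specified by signal-to-noise ratio. As a generic illustration of how a noise track can be scaled to hit a target SNR before mixing (the abstract does not describe how the stimuli were actually prepared, and the function names are hypothetical), consider:

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def scale_noise_for_snr(signal, noise, snr_db):
    """Return `noise` scaled so that signal power / noise power matches snr_db."""
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


def measured_snr_db(signal, noise):
    """SNR in decibels between a signal and a noise track."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))
```

At −16 dB, for example, the scaled noise carries about 40 times the power of the speech, which is the regime where the abstract reports the largest benefit from visual information.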
international conference on signal processing | 1998
Li Guoqiang; Du Limin; Hou Ziqiang
We integrate regressive prediction techniques into on-line speaker adaptive learning. Based on the perceptual theory of Chinese speech sounds, we build a criterion for avoiding collection to speed up the adaptation process. Moreover, the use of multinomial regression further improves the predictive precision.
international conference on signal processing | 1998
Xu Yanjun; Du Limin; Hou Ziqiang; Jin Guichang
Phase-based stereo matching algorithms are state-of-the-art stereo techniques, with the advantages of high accuracy, dense disparity maps and a parallel computing structure. To handle the usual phase wraparound problem in phase-based methods, one approach is to adopt a coarse-to-fine strategy, which is efficient but not robust; another is to exploit a scale-adaptive strategy, which is robust but inefficient. In this paper we put forward a grouped scale-adaptive strategy, which combines the advantages of the two previous strategies in a hybrid version. Computer simulation shows that this stereo matching algorithm can obtain accurate disparity estimates robustly and efficiently.