Hung-Yan Gu
National Taiwan University of Science and Technology
Publications
Featured research published by Hung-Yan Gu.
International Journal of Computational Linguistics & Chinese Language Processing | 2009
Hung-Yan Gu; Sung-Feng Tsai
Approximating a spectral envelope via regularized discrete cepstrum coefficients has been proposed by previous researchers. In this paper, we study two problems encountered in practice when adopting this approach to estimate the spectral envelope: first, which spectral peaks should be selected, and second, which frequency-axis scaling function should be adopted. After experimentation, we propose feasible solutions to both problems and combine them with the methods for regularizing and computing discrete cepstrum coefficients to form a spectral-envelope estimation scheme. Measurements of spectral-envelope approximation error verify that this scheme is much better than the original one. Furthermore, we have applied the scheme to building a system for voice timbre transformation, whose performance demonstrates the effectiveness of the proposed spectral-envelope estimation scheme.
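The regularized discrete-cepstrum fit described above can be sketched as a regularized least-squares problem: the log amplitudes of the selected spectral peaks are fitted with a truncated cosine basis, and a penalty on high-order coefficients keeps the envelope smooth. The function names, the penalty weighting, and the default order below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def discrete_cepstrum(freqs, amps, order=20, reg=1e-4):
    """Fit discrete cepstrum coefficients to spectral peaks via
    regularized least squares. `freqs` are normalized frequencies
    in [0, 0.5]; `amps` are linear peak amplitudes."""
    freqs = np.asarray(freqs, dtype=float)
    # Basis: log|S(f)| ~ c0 + 2 * sum_k c_k * cos(2*pi*k*f)
    M = np.ones((len(freqs), order + 1))
    for k in range(1, order + 1):
        M[:, k] = 2.0 * np.cos(2.0 * np.pi * k * freqs)
    target = np.log(np.asarray(amps, dtype=float))
    # Regularization penalizes high-order (rapidly varying) terms
    R = np.diag([0.0] + [8.0 * (np.pi * k) ** 2 for k in range(1, order + 1)])
    return np.linalg.solve(M.T @ M + reg * R, M.T @ target)

def envelope(c, f_grid):
    """Evaluate the fitted log spectral envelope on a frequency grid."""
    f_grid = np.asarray(f_grid, dtype=float)
    order = len(c) - 1
    M = np.ones((len(f_grid), order + 1))
    for k in range(1, order + 1):
        M[:, k] = 2.0 * np.cos(2.0 * np.pi * k * f_grid)
    return M @ c
```

With a smooth target envelope and light regularization, the fitted curve passes close to the selected peaks while staying smooth between them.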
Robotics and Autonomous Systems | 2011
Chyi-Yeu Lin; Li-Chieh Cheng; Chang-Kuo Tseng; Hung-Yan Gu; Kuo-Liang Chung; Chin-Shyurng Fahn; Kai-Jay Lu; Chih-Cheng Chang
This research aims to devise an anthropomorphic robotic head, with a human-like face and a sheet of artificial skin, that can read a randomly composed simplified musical notation and sing the corresponding song. The face robot's artificial facial skin can express a number of facial expressions via motions driven by internal servo motors. Two cameras, one installed inside each eyeball, provide the vision capability for reading simplified musical notation. Computer vision techniques are used to interpret the notation and the lyrics of the corresponding songs, and voice synthesis techniques enable the face robot to sing by enunciating synthesized sounds. The robot's mouth patterns change automatically to match the emotions conveyed by the lyrics. Experiments show that the face robot can successfully read and then accurately sing an arbitrarily assigned song.
International Journal of Advanced Robotic Systems | 2013
Chyi-Yeu Lin; Li-Chieh Cheng; Chun-Chia Huang; Li-Wen Chuang; Wei-Chung Teng; Chung-Hsien Kuo; Hung-Yan Gu; Kuo-Liang Chung; Chin-Shyurng Fahn
The purpose of this research is to develop multi-talented humanoid robots, based on technologies featuring high computing and control abilities, to perform onstage. Applying robot technologies to theatrical performance has been a worldwide trend over the last decade. The more robot performers resemble human beings, the easier it becomes for audiences to bond emotionally with robotic performances. Although any kind of robot can act as a theatrical performer when suitably programmed, humanoid robots can play a wider range of characters because of their resemblance to human beings. Thus, developing theatrical humanoid robots is becoming very important in the field of robot theatre. However, theatrical humanoid robots need the same versatile abilities as their human counterparts, instead of merely posing or performing motion demonstrations onstage; otherwise audiences quickly become bored. The four theatrical robots developed in this research successfully appeared in a public performance, participating in five programs, and were well received by most of the audience.
International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology | 2009
Hung-Yan Gu; Chang-Yi Wu
In this paper, an ANN-based spectrum-progression model (SPM) is proposed. This model is intended to improve the fluency of synthetic Mandarin speech when only a small training corpus is available. To construct the model, each target syllable is first matched with its reference syllable using DTW. Each warping path, i.e. spectrum-progression path, is then time-normalized to fixed dimensions and used to train the ANN-based SPM. After training, the SPM is used together with other modules, such as text analysis, prosody parameter generation, and signal sample generation, to synthesize Mandarin speech. Perception tests conducted on the synthetic speech show that the proposed SPM can indeed improve the fluency level noticeably.
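The DTW matching and path-normalization steps described above can be illustrated with a minimal sketch. The helper names are hypothetical, and the paper's local distance and path constraints may differ; the point is that the warping path between target and reference frames is resampled to a fixed length so it can serve as ANN input/output.

```python
import numpy as np

def dtw_path(target, reference):
    """Minimal DTW between two feature sequences (frames x dims).
    Returns the warping path as (target_idx, reference_idx) pairs,
    i.e. a spectrum-progression path."""
    n, m = len(target), len(reference)
    dist = np.linalg.norm(target[:, None, :] - reference[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end of both sequences
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def normalize_path(path, dims=16):
    """Resample the path's reference indices at `dims` equally spaced
    target positions, giving a fixed-size vector for the ANN."""
    path = np.asarray(path, dtype=float)
    xs = np.linspace(path[0, 0], path[-1, 0], dims)
    return np.interp(xs, path[:, 0], path[:, 1])
```

For identical sequences the path is the diagonal, and normalization simply resamples it.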
International Congress on Image and Signal Processing | 2011
Hung-Yan Gu; Sung-Fung Tsai
In this paper, the idea of segmental GMMs is proposed for voice conversion. To apply this idea to on-line voice conversion, we have developed an automatic GMM selection algorithm based on dynamic programming. In addition, to map a vector of DCC (discrete cepstrum coefficients) with only one Gaussian mixture, we have designed a mixture selection algorithm. To evaluate the segmental-GMM idea, three voice conversion systems are constructed and used to conduct listening tests. The results show that the proposed segmental GMMs can indeed improve performance in both timbre similarity and voice quality.
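A dynamic-programming selection of one Gaussian mixture per frame can be sketched as a small Viterbi search: each frame pays the negative log-likelihood of its chosen mixture, plus a penalty for switching mixtures between consecutive frames. This is an illustrative reconstruction under assumed diagonal-covariance Gaussians, not the paper's exact selection algorithm; the names and the switch cost are assumptions.

```python
import numpy as np

def select_mixtures(frames, means, variances, weights, switch_cost=1.0):
    """Viterbi-style mixture selection: per-frame Gaussian NLL plus a
    fixed cost for changing mixture between adjacent frames."""
    frames = np.asarray(frames, dtype=float)
    T, K = len(frames), len(means)
    # Negative log-likelihood of each frame under each diagonal Gaussian
    nll = np.empty((T, K))
    for k in range(K):
        diff = frames - means[k]
        nll[:, k] = 0.5 * np.sum(
            diff ** 2 / variances[k] + np.log(2 * np.pi * variances[k]),
            axis=1) - np.log(weights[k])
    cost = nll[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # trans[j, k]: cost of being in mixture j at t-1, moving to k
        trans = cost[:, None] + switch_cost * (1.0 - np.eye(K))
        back[t] = np.argmin(trans, axis=0)
        cost = trans[back[t], np.arange(K)] + nll[t]
    # Backtrack the cheapest mixture sequence
    seq = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t][seq[-1]]))
    return seq[::-1]
```

With well-separated mixtures, the search tracks whichever Gaussian fits each frame, switching only when the likelihood gain outweighs the switch cost.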
International Symposium on Chinese Spoken Language Processing | 2004
Hung-Yan Gu; Kuo-Hsian Wang
In synthetic Mandarin speech, discontinuity of formant traces at syllable boundaries is a key factor that lowers the fluency level. Therefore, we study a method integrating acoustic and articulatory knowledge to solve this discontinuity problem. First, representative trisyllable contexts are selected and their signals recorded; the signal of the middle syllable of each trisyllable pronunciation is then extracted to make a synthesis unit. To select a synthesis unit among multiple candidates, a distance function is defined to measure the spectral similarity between two synthesis units to be concatenated. In addition, several linking-restriction rules are derived from articulatory knowledge to prevent certain synthesis units from being linked into a sequence. A globally best synthesis-unit sequence is then searched for with a dynamic-programming-based algorithm. When this method is applied, the formant traces at syllable boundaries become smoother, and subjective evaluation shows that the fluency of synthetic Mandarin speech is indeed improved considerably.
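The dynamic-programming search over candidate synthesis units can be sketched as follows, with the linking-restriction rules encoded as forbidden unit pairs and the spectral-similarity distance abstracted into a caller-supplied function. All names here are hypothetical illustrations of the approach, not the paper's implementation.

```python
def best_unit_sequence(candidates, concat_dist, forbidden=frozenset()):
    """Pick one synthesis unit per syllable position, minimizing the
    summed concatenation distance. `candidates` is a list (one entry
    per syllable) of candidate-unit lists; `concat_dist(u, v)` scores
    joining u to v; `forbidden` holds (u, v) pairs ruled out by
    linking-restriction rules."""
    INF = float("inf")
    prev_cost = [0.0] * len(candidates[0])
    back = []
    for t in range(1, len(candidates)):
        cur_cost, cur_back = [], []
        for v in candidates[t]:
            best, arg = INF, -1
            for j, u in enumerate(candidates[t - 1]):
                if (u, v) in forbidden:
                    continue  # linking-restriction rule: skip this join
                c = prev_cost[j] + concat_dist(u, v)
                if c < best:
                    best, arg = c, j
            cur_cost.append(best)
            cur_back.append(arg)
        back.append(cur_back)
        prev_cost = cur_cost
    # Backtrack the globally best sequence of candidate indices
    idx = min(range(len(prev_cost)), key=lambda k: prev_cost[k])
    seq = [idx]
    for bk in reversed(back):
        seq.append(bk[seq[-1]])
    seq.reverse()
    return [candidates[t][i] for t, i in enumerate(seq)]
```

A forbidden pair never appears adjacent in the result, and among the remaining sequences the cheapest total join cost wins.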
International Conference on Machine Learning and Cybernetics | 2008
Hung-Yan Gu; Zheng-Fu Lin
In this paper, the vibrato parameters of sung syllables are analyzed using the short-time Fourier transform and the method of analytic signals. After the vibrato parameter values for all training syllables are obtained, they are used to train an artificial neural network (ANN) for each type of vibrato parameter. These ANN models are then used to generate vibrato parameter values, which, together with other musical information, control a harmonic-plus-noise model (HNM) to synthesize singing voice signals. Subjective perception tests conducted on the synthetic singing voice show that singing synthesized with the ANN-generated vibrato parameters is noticeably more natural than singing synthesized with fixed vibrato parameters.
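How vibrato parameters such as rate and extent shape an F0 contour can be illustrated with a minimal sketch: a flat pitch modulated by a sinusoid. Expressing the extent in semitones and the framing rate used below are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def vibrato_f0(base_f0, rate_hz, extent_semitones, dur_s, fs=200):
    """Apply sinusoidal vibrato to a flat F0 contour.
    `rate_hz` is the vibrato rate, `extent_semitones` the peak pitch
    deviation; the contour is sampled at `fs` frames per second."""
    t = np.arange(int(dur_s * fs)) / fs
    semitone_dev = extent_semitones * np.sin(2 * np.pi * rate_hz * t)
    # Convert the semitone deviation to a multiplicative factor on F0
    return base_f0 * 2.0 ** (semitone_dev / 12.0)
```

For a 220 Hz note with a 0.5-semitone extent, the contour oscillates between roughly 214 Hz and 226 Hz at the vibrato rate.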
Advanced Robotics and its Social Impacts | 2007
Wei-chen Lee; Hung-Yan Gu; Kuo-Liang Chung; Chyi-Yeu Lin; Chin-Shyurng Fahn; Yah-Syun Lai; Chih-Cheng Chang; Chia-Lun Tsai; Kai-Jay Lu; Huang-Liang Liau; Mao-Kuo Hsu
The objective of this research is to construct a two-wheeled robot that can autonomously read music and sing songs with a vocal voice. A musical notation editor was created so that users can easily produce a hardcopy of a song and show it to the robot for richer interaction. The robot reads the music with its vision system and musical notation recognition program, and then sings the song with its voice synthesis system autonomously. Experimental results showed that the accuracy of singing the Mandarin song "Little Bee" is about 95% on average, which demonstrates that the mobile robot is promising for entertainment applications.
International Workshop on Cellular Neural Networks and their Applications | 2005
Hung-Yan Gu; Hai-Ching Tsai
In this paper, a pitch-contour model adaptation method is developed for the integrated synthesis of Mandarin, Min-Nan, and Hakka speech. The goal is to avoid the repeated effort of constructing an independent model for each language. In practice, our approach is to construct just one working pitch-contour model from Min-Nan training sentences and then adapt this model to generate pitch contours for the other two languages. Since the adaptation does not modify the model parameters, no training data from the other two languages are required. To test the proposed method, we inserted the developed program modules into a previously built speech synthesis system to synthesize speech in the three languages. Initial evaluation shows that the generated pitch contours for Mandarin and Hakka are perceptually acceptable and that the degradation in intonation is very small. Therefore, our approach is feasible in practice.
Journal of Information Science and Engineering | 2014
Hung-Yan Gu; Zheng-Fu Lin
In this paper, the vibrato parameters of sung syllables are analyzed using the short-time Fourier transform and the method of analytic signals. After the vibrato parameter values for all training syllables are obtained, they are used to train an artificial neural network (ANN) for each type of vibrato parameter. These ANN models are then used to generate vibrato parameter values, which, together with other musical information, control a harmonic-plus-noise model (HNM) to synthesize singing voice signals. Subjective perception tests conducted on the synthetic singing voice show that singing synthesized with the ANN-generated vibrato parameters is noticeably more natural than singing synthesized with fixed vibrato parameters.