Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Lianhong Cai is active.

Publication


Featured research published by Lianhong Cai.


International Conference on Multimedia and Expo | 2002

Music type classification by spectral contrast feature

Dan-ning Jiang; Lie Lu; Hong-Jiang Zhang; Jianhua Tao; Lianhong Cai

Automatic music type classification is very helpful for the management of digital music databases. In this paper, the octave-based spectral contrast feature is proposed to represent the spectral characteristics of a music clip. It represents the relative spectral distribution rather than the average spectral envelope. Experiments show that the octave-based spectral contrast feature performs well in music type classification. A further comparison demonstrates that it discriminates among music types better than the mel-frequency cepstral coefficients (MFCCs) often used in previous music type classification systems.
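
For readers who want to experiment, here is a minimal sketch of the octave-based spectral contrast idea; it is not the authors' implementation, and the band edges, window, and alpha fraction of peaks/valleys are illustrative choices.

```python
import numpy as np

def spectral_contrast(frame, sr, n_bands=6, alpha=0.02):
    """Per-octave difference between log spectral peaks and valleys."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    edges = 200.0 * 2.0 ** np.arange(n_bands + 1)  # octave band edges in Hz
    contrast = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.sort(spectrum[(freqs >= lo) & (freqs < hi)])
        if band.size == 0:                         # band above Nyquist
            contrast.append(0.0)
            continue
        k = max(1, int(alpha * band.size))         # strongest/weakest bins
        valley = np.log(band[:k].mean() + 1e-10)   # average over the valleys
        peak = np.log(band[-k:].mean() + 1e-10)    # average over the peaks
        contrast.append(peak - valley)             # relative distribution
    return np.array(contrast)

frame = np.random.randn(1024)                      # one frame of a music clip
print(spectral_contrast(frame, sr=22050))
```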


International Conference on Multimedia and Expo | 2003

Highlight sound effects detection in audio stream

Rui Cai; Lie Lu; Hong-Jiang Zhang; Lianhong Cai

This paper addresses the problem of detecting highlight sound effects in an audio stream, which is very useful in video summarization and highlight extraction. Unlike research on audio segmentation and classification, the task here is only to locate the highlight sound effects within the stream. An extensible framework is proposed; in the current system three sound effects are considered: laughter, applause, and cheering, which are closely tied to highlight events in entertainment, sports, meetings, and home videos. HMMs are used to model these sound effects, and a method based on log-likelihood scores is used to make the final decision. A sound effect attention model is also proposed to extend the general audio attention model for highlight extraction and video summarization. Evaluations on a 2-hour audio database showed very encouraging results.
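
A minimal sketch of the detection step under stated assumptions: one GaussianHMM per effect (via hmmlearn), random stand-ins for real MFCC training data, and an illustrative rejection threshold; the paper's exact models and decision rule are not reproduced here.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

effects = ["laughter", "applause", "cheer"]
models = {}
for name in effects:                       # one HMM per highlight sound effect
    m = GaussianHMM(n_components=4, covariance_type="diag", n_iter=20)
    m.fit(np.random.randn(500, 13))        # stand-in for real MFCC features
    models[name] = m

def detect(window, threshold=-20.0):
    """Label a feature window by best per-frame log-likelihood, or reject."""
    scores = {n: m.score(window) / len(window) for n, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "background"

print(detect(np.random.randn(100, 13)))    # a sliding window from the stream
```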


IEEE Transactions on Audio, Speech, and Language Processing | 2006

A flexible framework for key audio effects detection and auditory context inference

Rui Cai; Lie Lu; Alan Hanjalic; Hong-Jiang Zhang; Lianhong Cai

Key audio effects are those special effects that play critical roles in human perception of an auditory context in audiovisual materials. Based on key audio effects, high-level semantic inference can be carried out to facilitate various content-based analysis applications, such as highlight extraction and video summarization. In this paper, a flexible framework is proposed for key audio effect detection in a continuous audio stream, as well as for the semantic inference of an auditory context. In the proposed framework, key audio effects and background sounds are comprehensively modeled with hidden Markov models, and a Grammar Network is proposed to connect the various models and fully explore the transitions among them. Moreover, a set of new spectral features is employed to improve the representation of each audio effect and the discrimination among effects. The framework makes it convenient to add or remove target audio effects for different applications. Based on the obtained key-effect sequence, a Bayesian-network-based approach is proposed to further discover the high-level semantics of an auditory context by integrating prior knowledge and statistical learning. Evaluations on 12 hours of audio data indicate that the proposed framework achieves satisfactory results, both on key audio effect detection and on auditory context inference.
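
The context-inference step can be illustrated with a toy naive-Bayes stand-in for the paper's Bayesian network: pick the auditory context that best explains the detected key-effect sequence. All priors and conditional probabilities below are invented for illustration.

```python
import numpy as np

contexts = ["sports", "meeting", "home-video"]
prior = {"sports": 0.3, "meeting": 0.3, "home-video": 0.4}
p_effect = {                                # P(effect | context), invented
    "sports":     {"cheer": 0.6, "applause": 0.3, "laughter": 0.1},
    "meeting":    {"cheer": 0.1, "applause": 0.6, "laughter": 0.3},
    "home-video": {"cheer": 0.2, "applause": 0.2, "laughter": 0.6},
}

def infer(effect_sequence):
    """Return the MAP context for an observed key-effect sequence."""
    scores = {}
    for c in contexts:
        logp = np.log(prior[c])             # start from the context prior
        for e in effect_sequence:
            logp += np.log(p_effect[c][e])  # accumulate effect evidence
        scores[c] = logp
    return max(scores, key=scores.get)

print(infer(["applause", "applause", "laughter"]))  # -> "meeting"
```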


International Conference on Multimedia and Expo | 2004

Speech emotion classification with the combination of statistic features and temporal features

Dan-ning Jiang; Lianhong Cai

For classifying speech emotion, most previous systems used either statistical features or temporal features exclusively. However, these two feature representations capture different aspects of emotion and should be combined. This work proposes a classification scheme that combines them. In the scheme, GMMs and HMMs are first used to model the statistical features and the temporal features, respectively. The GMM and HMM likelihoods are then used as features in a second stage, where a weighted Bayesian classifier and an MLP perform the final classification. Experiments on a Chinese speech corpus demonstrate that the scheme greatly improves classification accuracy. More detailed analysis indicates that the two feature representations complement each other effectively in the classification.
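
A rough sketch of the likelihood-fusion idea, assuming hmmlearn and scikit-learn: per-emotion GMM and HMM log-likelihoods are stacked into a feature vector for a second-stage classifier (an MLP here; the paper also uses a weighted Bayesian classifier). All data and dimensions are placeholders.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier

emotions = ["anger", "joy", "sadness", "neutral"]
gmms = {e: GaussianMixture(n_components=4).fit(np.random.randn(300, 10))
        for e in emotions}                  # models of the statistic features
hmms = {e: GaussianHMM(n_components=3).fit(np.random.randn(300, 12))
        for e in emotions}                  # models of the temporal features

def likelihood_vector(stat_feats, temp_feats):
    """Stack per-emotion GMM and HMM log-likelihoods into one vector."""
    g = [gmms[e].score_samples(stat_feats).mean() for e in emotions]
    h = [hmms[e].score(temp_feats) / len(temp_feats) for e in emotions]
    return np.array(g + h)

X = np.stack([likelihood_vector(np.random.randn(1, 10),
                                np.random.randn(50, 12)) for _ in range(40)])
y = np.random.choice(emotions, size=40)     # placeholder emotion labels
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500).fit(X, y)
print(clf.predict(X[:3]))
```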


International Conference on Image Processing | 2013

Interpretable aesthetic features for affective image classification

Xiaohui Wang; Jia Jia; Jiaming Yin; Lianhong Cai

Images not only depict content but also convey emotions such as excitement or sadness. Affective image classification is a useful and active topic in fields such as computer vision and multimedia. Current research usually treats the relationship between images and emotions as a black box: it extracts traditional low-level visual features such as SIFT and wavelet textures and feeds them directly into various classification algorithms. However, these visual features are not interpretable, so one cannot tell why a given set of features induces a particular emotion, and because image emotion is highly subjective, classification accuracy with such features has long been unsatisfactory. We propose interpretable aesthetic features, inspired by art theory, that are intuitive, discriminative, and easily understandable. Affective image classification based on these features achieves higher accuracy than the state of the art, and the features can also intuitively explain why an image tends to convey a certain emotion. We also develop an emotion-guided image gallery to demonstrate the proposed feature collection.
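
To give a flavor of what "interpretable" means here, a hypothetical sketch of a few simple, human-readable image statistics; the paper's actual art-theory-based feature set is far richer than this.

```python
import numpy as np

def aesthetic_features(img):
    """A few human-readable color statistics of an HxWx3 uint8 RGB image."""
    rgb = img.astype(float) / 255.0
    brightness = rgb.mean()                          # overall lightness
    saturation = (rgb.max(2) - rgb.min(2)).mean()    # crude colorfulness
    warm_ratio = (rgb[..., 0] > rgb[..., 2]).mean()  # share of warm pixels
    return {"brightness": brightness, "saturation": saturation,
            "warm_ratio": warm_ratio}

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(aesthetic_features(img))
```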


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Emotional Audio-Visual Speech Synthesis Based on PAD

Jia Jia; Shen Zhang; Fanbo Meng; Yongxin Wang; Lianhong Cai

Audio-visual speech synthesis is the core function for realizing face-to-face human-computer communication. While considerable effort has been devoted to enabling people to talk with computers as they do with each other, how to integrate emotional expression into audio-visual speech synthesis remains largely an open problem. In this paper, we adopt the Pleasure-Displeasure, Arousal-Nonarousal, and Dominance-Submissiveness (PAD) three-dimensional emotional space, in which emotions can be described and quantified along three dimensions. Based on this definition, we propose a unified model for emotional speech conversion using a Boosting-Gaussian mixture model (Boosting-GMM), as well as a facial expression synthesis model, and we present an emotional audio-visual speech synthesis approach built on them. Specifically, we take the text and the target PAD values as input and employ a text-to-speech (TTS) engine to first generate neutral speech. The Boosting-GMM then converts the neutral speech to emotional speech while the facial expression is synthesized simultaneously. Finally, the acoustic features of the emotional speech are used to modulate the facial expression in the audio-visual output. We designed three objective and five subjective experiments to evaluate each model and the overall approach. Experimental results on audio-visual emotional speech datasets show that the proposed approach effectively and efficiently synthesizes natural and expressive emotional audio-visual speech. Analysis of the results also reveals a mutually reinforcing relationship between the audio and video information.
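
The conversion step can be illustrated with a toy joint-density GMM regression, the plain building block underlying GMM-based conversion; the paper's Boosting-GMM refinement is not reproduced, and all data here are synthetic placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

dx = 5                                                   # acoustic feature dim
src = np.random.randn(1000, dx)                          # "neutral" features
tgt = 1.2 * src + 0.3 + 0.1 * np.random.randn(1000, dx)  # "emotional" features
gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(np.hstack([src, tgt]))                           # joint density p(x, y)

def convert(x):
    """Conditional mean E[y | x] under the joint GMM."""
    resp = np.array([w * multivariate_normal.pdf(x, m[:dx], c[:dx, :dx])
                     for w, m, c in zip(gmm.weights_, gmm.means_,
                                        gmm.covariances_)])
    resp /= resp.sum()                                   # component posteriors
    y = np.zeros(dx)
    for r, m, c in zip(resp, gmm.means_, gmm.covariances_):
        y += r * (m[dx:] + c[dx:, :dx] @ np.linalg.solve(c[:dx, :dx], x - m[:dx]))
    return y

print(convert(np.zeros(dx)))   # should land near 0.3 for this toy mapping
```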


International Conference on Biometrics | 2006

Multi-level fusion of audio and visual features for speaker identification

Zhiyong Wu; Lianhong Cai; Helen M. Meng

This paper explores the fusion of audio and visual evidence through a multi-level hybrid fusion architecture based on dynamic Bayesian networks (DBNs), which combines model-level and decision-level fusion to achieve higher performance. For model-level fusion, a new audio-visual correlative model (AVCM) based on a DBN is proposed, which describes both the inter-correlations and the loose timing synchronicity between the audio and video streams. Experiments on the CMU database and on our own database demonstrate that the method improves the accuracy of audio-visual bimodal speaker identification at all acoustic signal-to-noise ratios (SNRs) from 0 dB to 30 dB under varying acoustic conditions.
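
As a minimal illustration of the decision-level half of such a hybrid (the DBN model-level fusion is not sketched here), one can combine per-speaker audio and visual scores with an SNR-dependent weight; all values below are invented.

```python
import numpy as np

def fuse(audio_ll, visual_ll, snr_db):
    """Trust audio more as the SNR improves; return the best speaker index."""
    w = np.clip(snr_db / 30.0, 0.0, 1.0)  # 0 dB -> lean on video, 30 dB -> audio
    combined = w * audio_ll + (1.0 - w) * visual_ll
    return int(np.argmax(combined))

audio_ll = np.array([-120.0, -95.0, -110.0])   # per-speaker audio scores
visual_ll = np.array([-80.0, -85.0, -70.0])    # per-speaker visual scores
print(fuse(audio_ll, visual_ll, snr_db=10))
```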


The Visual Computer | 2013

Affective image adjustment with a single word

Xiaohui Wang; Jia Jia; Lianhong Cai

We present a complete system that automatically adjusts image color to match a desired emotion. It is more convenient for users, especially non-professionals, to adjust an image with a semantic input, for example, to make it lovelier. The algorithm is fully automatic, requiring no user interaction: the inputs are simply the original image and an affective word (e.g., lovely). To achieve this, we solve several non-trivial problems. First, to find color themes (templates of colors) that reflect the expression of the affective word, we draw on theoretical and empirical concepts from classic art theory and build a relation model between color themes and affective words that allows efficient selection of candidate themes; we further propose a novel strategy to select the most suitable theme among the candidates. Second, to adjust the image colors, we propose a Radial Basis Function (RBF) based interpolation method, which proves more effective in many scenarios, as evidenced in our experiments. We also evaluate the system with comprehensive user studies, and the results confirm its capability.
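
A small sketch of the RBF interpolation idea under simplifying assumptions: a few anchor colors and their theme targets define a smooth color mapping applied to every pixel. The anchors and the kernel width are illustrative choices, not the system's.

```python
import numpy as np

src = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.8, 0.2, 0.2]])  # anchors
dst = np.array([[0.0, 0.0, 0.2], [1.0, 0.95, 0.8], [0.9, 0.4, 0.3]]) # targets

def rbf_map(pixels, src, dst, sigma=0.5):
    """Map Nx3 colors by Gaussian-RBF interpolation of the anchor shifts."""
    K = np.exp(-((src[:, None] - src[None]) ** 2).sum(-1) / (2 * sigma ** 2))
    w = np.linalg.solve(K, dst - src)              # weights for the color shift
    d2 = ((pixels[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    phi = np.exp(-d2 / (2 * sigma ** 2))           # kernel values per pixel
    return np.clip(pixels + phi @ w, 0.0, 1.0)

img = np.random.rand(4, 4, 3)                      # toy image in [0, 1]
out = rbf_map(img.reshape(-1, 3), src, dst).reshape(img.shape)
print(out.shape)
```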


International Conference on Biometrics | 2007

A new approach to fake finger detection based on skin elasticity analysis

Jia Jia; Lianhong Cai; Kaifu Zhang; Dawei Chen

This work introduces a new approach to fake finger detection based on the analysis of human skin elasticity. When a user puts a finger on the scanner surface, a sequence of fingerprint images describing the finger's deformation process is captured. Two features representing skin elasticity are then extracted from the image sequence: 1) the correlation coefficient between the fingerprint area and the signal intensity; and 2) the standard deviation of the fingerprint area's extension along the x and y axes. Finally, the Fisher Linear Discriminant is used to distinguish real finger skin from other materials such as gelatin. Experiments on a dataset of real and fake fingers show that the proposed approach and features are effective for fake finger detection.
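
A minimal sketch of the final classification step, with fabricated feature values: two elasticity features per sample fed to Fisher's linear discriminant via scikit-learn.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
real = rng.normal([0.8, 1.0], 0.1, size=(50, 2))  # elastic live skin
fake = rng.normal([0.3, 0.4], 0.1, size=(50, 2))  # stiffer gelatin replicas
X = np.vstack([real, fake])
y = np.array([1] * 50 + [0] * 50)                 # 1 = live finger

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[0.75, 0.9], [0.35, 0.45]]))   # expect [1 0]
```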


International Conference on Pattern Recognition | 2004

Face pose estimation and its application in video shot selection

Zhiguang Yang; Haizhou Ai; Bo Wu; Shihong Lao; Lianhong Cai

This paper introduces a face pose estimation method and its application to video shot selection for face image preprocessing. The pose estimator is learned by a boosting regression algorithm called SquareLev.R, which learns poses from simple Haar-type features. It consists of two tree-structured subsystems, one for the left-right angle and one for the up-down angle. As a specific application in video-based face recognition, the best-shot selection problem is discussed, resulting in a real-time system that automatically selects the most frontal face from a video sequence.
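
An illustrative stand-in for the pose regressor, assuming scikit-learn: random Haar-like rectangle-difference features computed from an integral image, fed to a generic boosted regressor for a single pose angle (SquareLev.R itself is not reproduced here, and all images and angles are synthetic).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def haar_features(img, rects):
    """Left-minus-right rectangle sums computed from an integral image."""
    ii = img.cumsum(0).cumsum(1)
    def rect_sum(y0, x0, y1, x1):
        return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    feats = []
    for y0, x0, y1, x1 in rects:
        xm = (x0 + x1) // 2                       # split the rectangle in two
        feats.append(rect_sum(y0, x0, y1, xm) - rect_sum(y0, xm, y1, x1))
    return np.array(feats)

rects = []                                        # random Haar-type rectangles
for _ in range(30):
    y0, x0 = rng.integers(0, 6, 2)
    y1, x1 = y0 + rng.integers(2, 6), x0 + rng.integers(2, 6)
    rects.append((int(y0), int(x0), int(y1), int(x1)))

faces = rng.random((200, 12, 12))                 # stand-in face crops
angles = rng.uniform(-90, 90, 200)                # stand-in left-right angles
X = np.stack([haar_features(f, rects) for f in faces])
model = GradientBoostingRegressor().fit(X, angles)
print(model.predict(X[:2]))
```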

Collaboration


Dive into Lianhong Cai's collaborations.

Top Co-Authors

Helen M. Meng

The Chinese University of Hong Kong
