Publication


Featured research published by Luhong Liang.


EURASIP Journal on Advances in Signal Processing | 2002

Dynamic Bayesian networks for audio-visual speech recognition

Ara V. Nefian; Luhong Liang; Xiaobo Pi; Xiaoxing Liu; Kevin P. Murphy

The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in its audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare their performance with that of existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models as well as the FHMM.
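The model difference the abstract turns on can be written out. In a two-chain CHMM the audio and visual state chains $q_t^a, q_t^v$ each condition on the previous states of both chains; the sketch below shows the standard factorizations of the two models, not the paper's full parameterization:

```latex
% Coupled HMM: cross-chain transition dependence
P(q_t^a, q_t^v \mid q_{t-1}^a, q_{t-1}^v)
  = P(q_t^a \mid q_{t-1}^a, q_{t-1}^v)\,
    P(q_t^v \mid q_{t-1}^a, q_{t-1}^v)

% Factorial HMM: independent chains, coupled only through
% the joint observation model
P(q_t^a, q_t^v \mid q_{t-1}^a, q_{t-1}^v)
  = P(q_t^a \mid q_{t-1}^a)\, P(q_t^v \mid q_{t-1}^v)
```

Both factorizations let the two chains occupy different states at the same time (state asynchrony); the CHMM additionally correlates their transitions, which is the property the abstract highlights.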


International Conference on Multimedia and Expo | 2003

A detector tree of boosted classifiers for real-time object detection and tracking

Rainer Lienhart; Luhong Liang; Alexander Kuranov

This paper presents a novel tree classifier for complex object detection tasks, together with a general framework for real-time object tracking in videos that uses the tree classifier. A boosted training algorithm with a clustering-and-splitting step constructs branches in the nodes recursively, if and only if doing so improves the discriminative power over a single monolithic node classifier at a lower computational complexity. A mouth tracking system integrating the tree classifier under the proposed framework is built and tested on the XM2FDB database. Experimental results show that its detection accuracy is equal to or better than that of single or multiple cascade classifiers, while being computationally less demanding.


International Conference on Acoustics, Speech, and Signal Processing | 2002

A coupled HMM for audio-visual speech recognition

Ara V. Nefian; Luhong Liang; Xiaobo Pi; Liu Xiaoxiang; Crusoe Mao; Kevin P. Murphy

In recent years, several speech recognition systems that use visual information together with audio have shown a significant increase in performance over standard speech recognition systems. The use of visual features is justified by both the bimodality of speech generation and by the need for features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled HMM allow us to model the state asynchrony of the audio and visual observation sequences while still preserving their natural correlation over time. The experimental results show that the coupled HMM outperforms the multistream HMM in audio-visual speech recognition.


International Conference on Multimedia and Expo | 2002

Speaker independent audio-visual continuous speech recognition

Luhong Liang; Xiaoxing Liu; Yibao Zhao; Xiaobo Pi; Ara V. Nefian

The growing number of multimedia applications that require robust speech recognition has generated considerable interest in the study of audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified by both the bimodality of speech generation and the need for features that are invariant to acoustic noise perturbation. The speaker-independent audio-visual continuous speech recognition system presented here relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region. The visual and acoustic observation sequences are then integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM can model the audio and visual state asynchrony while preserving their natural correlation over time. Experimental results show that the current system, tested on the XM2VTS database, reduces the error rate of the audio-only speech recognition system by over 55% at an SNR of 0 dB.


Computer Vision and Pattern Recognition | 2009

Fast car detection using image strip features

Wei Zheng; Luhong Liang

This paper presents a fast method for detecting multi-view cars in real-world scenes. Cars are artificial objects with wide variations in appearance, but they have relatively consistent structural characteristics built from a few basic local elements. Inspired by this, we propose a novel set of image strip features to describe the appearance of those elements. The new features represent various types of lines and arcs with edge-like and ridge-like strip patterns, which significantly enrich simple features such as Haar-like features and edgelet features. They can also be calculated efficiently using the integral image. Moreover, we develop a new complexity-aware criterion for the RealBoost algorithm to balance the discriminative capability and the efficiency of the selected features. Experimental results on widely used single-view and multi-view car datasets show that our approach is fast and performs well.
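The abstract notes that the strip features "can be calculated efficiently using the integral image." A minimal sketch of that primitive, the summed-area table, is below; the strip features themselves combine several such rectangle sums in ways the paper defines and this sketch does not.

```python
import numpy as np

def integral_image(img):
    # Cumulative sums along both axes give the summed-area table:
    # ii[y, x] = sum of img[0:y+1, 0:x+1].
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1+1, x0:x1+1] in O(1) via four table look-ups,
    # regardless of the rectangle's size.
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total
```

The constant-time rectangle sum is what makes evaluating thousands of candidate features per detection window affordable.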


Signal Processing: Image Communication | 2010

No-reference perceptual image quality metric using gradient profiles for JPEG2000

Luhong Liang; Shiqi Wang; Jianhua Chen; Siwei Ma; Debin Zhao; Wen Gao

No-reference measurement of perceptual image quality is a crucial and challenging issue in modern image processing applications. One of the major difficulties is that some inherent features of natural images and some artifacts can be rather ambiguous. In this paper, we tackle this problem using statistical information on image gradient profiles and propose a novel quality metric for JPEG2000 images. The key part of the metric is a histogram representing the sharpness distribution of the gradient profiles, from which we establish a blur metric that is insensitive to inherently blurred structures in natural images. A ringing metric is then built from the ringing visibility of regions associated with the gradient profiles. Finally, a combination model optimized through extensive experiments is developed to predict perceived image quality. The proposed metric achieves performance competitive with state-of-the-art no-reference metrics on public datasets and is robust to various image contents.
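The core idea, using the width of a gradient profile as a sharpness cue, can be illustrated in one dimension. This is an illustrative simplification only: `ramp_width` and its monotone-run rule are my stand-in, not the profile definition the paper uses.

```python
def ramp_width(profile, i):
    # Length of the monotone intensity ramp of `profile` containing
    # the edge between samples i and i+1. A wide ramp indicates a
    # blurry edge, a narrow one a sharp edge.
    step = profile[i + 1] - profile[i]
    lo, hi = i, i + 1
    while lo > 0 and (profile[lo] - profile[lo - 1]) * step > 0:
        lo -= 1
    while hi < len(profile) - 1 and (profile[hi + 1] - profile[hi]) * step > 0:
        hi += 1
    return hi - lo
```

A histogram of such widths over all detected edges would play the role of the sharpness distribution described above: JPEG2000 blur shifts the histogram toward wider profiles.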


International Conference on Image Processing | 2001

Face detection based on template matching and support vector machines

Haizhou Ai; Luhong Liang; Guangyou Xu

A face detection algorithm integrating template matching and support vector machines (SVM) is presented. Two types of templates, eyes-in-whole and the face itself, are used for coarse filtering, and an SVM classifier is used for classification. A bootstrap method collects non-face samples for SVM training within a subspace constrained by template matching, which greatly reduces the complexity of training the SVM. Comparative experimental results demonstrate its effectiveness.


Visual Communications and Image Processing | 2010

An efficient coding scheme for surveillance videos captured by stationary cameras

Xianguo Zhang; Luhong Liang; Qian Huang; Yazhou Liu; Tiejun Huang; Wen Gao

In this paper, a new scheme is presented to improve the coding efficiency of sequences captured by stationary (or static) cameras for video surveillance applications. We introduce two novel kinds of frames, the background frame and the difference frame, to represent the foreground and background of input frames without object detection, tracking, or segmentation. The background frame is built using a background modeling procedure and is periodically updated during encoding. The difference frame is calculated from the input frame and the background frame. A sequence structure is proposed to generate high-quality background frames and to efficiently code difference frames without delay, so that surveillance videos can be compressed simply by encoding the background and difference frames in the traditional manner. In practice, the H.264/AVC encoder JM 16.0 is employed as a built-in coding module to encode those frames. Experimental results on eight indoor and outdoor surveillance videos show that the proposed scheme achieves a 0.12 dB to 1.53 dB gain in PSNR over the JM 16.0 anchor specially configured for surveillance videos.
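The two frame types can be sketched as follows. The exponential running-average update and the +128 offset for the signed residual are my assumptions for illustration; the paper's actual background modeling procedure and sequence structure are more elaborate.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    # Exponential running average: a common background-modeling
    # choice for stationary cameras (illustrative, not the paper's).
    return (1.0 - alpha) * bg + alpha * frame

def difference_frame(frame, bg):
    # Signed foreground residual, shifted into the 8-bit range so a
    # standard encoder (e.g. H.264/AVC) can compress it directly.
    diff = frame.astype(np.int16) - np.rint(bg).astype(np.int16)
    return np.clip(diff + 128, 0, 255).astype(np.uint8)

def reconstruct(diff, bg):
    # Decoder side: add the background back onto the residual.
    restored = diff.astype(np.int16) - 128 + np.rint(bg).astype(np.int16)
    return np.clip(restored, 0, 255).astype(np.uint8)
```

Because most surveillance pixels match the background, the difference frame is nearly flat and cheap to encode, which is where the scheme's rate savings come from.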


Lecture Notes in Computer Science | 2003

A Bayesian approach to audio-visual speaker identification

Ara V. Nefian; Luhong Liang; Tieyan Fu; Xiao Xing Liu

In this paper we describe a text-dependent audio-visual speaker identification approach that combines face recognition with an audio-visual speech-based identification system. The temporal sequences of audio and visual observations obtained from the acoustic speech and the shape of the mouth are modeled using a set of coupled hidden Markov models (CHMM), one for each phoneme-viseme pair and for each person in the database. The use of the CHMM in our system is justified by its ability to describe the natural audio and visual state asynchrony as well as their conditional dependence over time. Next, the likelihood obtained for each person in the database is combined with the face recognition likelihood obtained using an embedded hidden Markov model (EHMM). Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only and video-only speaker identification at all acoustic signal-to-noise ratios (SNR) from 5 to 30 dB.
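The final fusion step, combining each person's CHMM speech score with their EHMM face score, can be sketched as a weighted sum of log-likelihoods. The convex-combination rule and the weight `w` are hypothetical stand-ins for whatever combination the paper actually uses.

```python
def fuse(av_loglik, face_loglik, w=0.7):
    # Late fusion: convex combination of the audio-visual speech
    # log-likelihood and the face-recognition log-likelihood.
    return w * av_loglik + (1.0 - w) * face_loglik

def identify(scores, w=0.7):
    # scores maps each enrolled person to an (av, face) pair of
    # log-likelihoods; pick the person with the best fused score.
    return max(scores, key=lambda p: fuse(*scores[p], w=w))
```

In a real system the weight would typically be tuned per noise condition, leaning on the face score as the acoustic SNR drops.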


Computer Vision and Pattern Recognition | 2011

A unified framework for locating and recognizing human actions

Yuelei Xie; Hong Chang; Zhe Li; Luhong Liang; Xilin Chen; Debin Zhao

In this paper, we present a pose-based approach for locating and recognizing human actions in videos. In our method, human poses are detected and represented based on the deformable part model. To our knowledge, this is the first work exploring the effectiveness of deformable part models in combining human detection and pose estimation for action recognition. Compared with previous methods, ours has three main advantages. First, our method does not rely on any assumptions about video preprocessing quality, such as satisfactory foreground segmentation or reliable tracking. Second, we propose a novel compact representation of human pose that works together with human detection and can well represent the spatial and temporal structure of an action. Third, with human detection taken into account in our framework, our method is able to locate and recognize multiple actions in the same scene. Experiments on benchmark datasets and recorded cluttered videos verify the efficacy of our method.

Collaboration


Dive into Luhong Liang's collaborations.

Top Co-Authors


Debin Zhao

Harbin Institute of Technology


Xilin Chen

Chinese Academy of Sciences
