Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Xiaoyue Jiang is active.

Publication


Featured research published by Xiaoyue Jiang.


European Conference on Computer Vision | 2010

Correlation-based intrinsic image extraction from a single image

Xiaoyue Jiang; Andrew J. Schofield; Jeremy L. Wyatt

Intrinsic images represent the underlying properties of a scene such as illumination (shading) and surface reflectance. Extracting intrinsic images is a challenging, ill-posed problem. Human performance on tasks such as shadow detection and shape-from-shading is improved by adding colour and texture to surfaces. In particular, when a surface is painted with a textured pattern, correlations between local mean luminance and local luminance amplitude promote the interpretation of luminance variations as illumination changes. Based on this finding, we propose a novel feature, local luminance amplitude, to separate illumination and reflectance, and a framework to integrate this cue with hue and texture to extract intrinsic images. The algorithm uses steerable filters to separate images into frequency and orientation components and constructs shading and reflectance images from weighted combinations of these components. Weights are determined by correlations between corresponding variations in local luminance, local amplitude, colour and texture. The intrinsic images are further refined by ensuring the consistency of local texture elements. We test this method on surfaces photographed under different lighting conditions. The effectiveness of the algorithm is demonstrated by the correlation between our intrinsic images and ground truth shading and reflectance data. Luminance amplitude was found to be a useful cue. Results are also presented for natural images.
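
The luminance/amplitude correlation cue lends itself to a compact illustration. The Python sketch below renders only that single cue, not the authors' full algorithm (which additionally uses steerable filters, hue, and texture); the Gaussian scales `sigma` and `win_sigma` and the windowed-correlation form are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_mean_and_amplitude(lum, sigma=4.0):
    """Local mean luminance and local luminance amplitude (RMS deviation)."""
    mean = gaussian_filter(lum, sigma)
    amp = np.sqrt(np.maximum(gaussian_filter((lum - mean) ** 2, sigma), 0.0))
    return mean, amp

def shading_correlation(lum, sigma=4.0, win_sigma=12.0):
    """Windowed correlation between local mean and local amplitude.
    A high positive correlation is the cue that a luminance variation is
    shading (illumination) rather than a reflectance change."""
    mean, amp = local_mean_and_amplitude(lum, sigma)

    def w(x):  # local averaging window
        return gaussian_filter(x, win_sigma)

    mm, aa = w(mean), w(amp)
    cov = w(mean * amp) - mm * aa
    var_m = w(mean ** 2) - mm ** 2
    var_a = w(amp ** 2) - aa ** 2
    return cov / np.sqrt(np.maximum(var_m * var_a, 1e-12))
```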


Journal of Vision | 2010

What is second-order vision for? Discriminating illumination versus material changes

Andrew J. Schofield; Paul B. Rock; Peng Sun; Xiaoyue Jiang; Mark A. Georgeson

The human visual system is sensitive to second-order modulations of the local contrast (CM) or amplitude (AM) of a carrier signal. Second-order cues are detected independently of first-order luminance (LM) signals; however, it is not clear why vision should benefit from second-order sensitivity. Analysis of the first- and second-order content of natural images suggests that these cues tend to occur together, but their phase relationship varies. We have shown that in-phase combinations of LM and AM are perceived as a shaded corrugated surface, whereas the anti-phase combination can be seen as corrugated when presented alone or as a flat material change when presented in a plaid containing the in-phase cue. We now extend these findings using new stimulus types and a novel haptic matching task. We also introduce a computational model based on initially separate first- and second-order channels that are combined within orientation and subsequently across orientations to produce a shading signal. Contrast gain control allows the LM + AM cue to suppress responses to the LM - AM cue when presented in a plaid. Thus, the model sees LM - AM as flat in these circumstances. We conclude that second-order vision plays a key role in disambiguating the origin of luminance changes within an image.
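
For readers unfamiliar with the LM/AM notation, the following hedged Python sketch constructs one oriented component of such a stimulus: a noise carrier whose mean luminance (LM) and amplitude (AM) are modulated by the same sinusoid, in phase (`phase=0`) or in anti-phase (`phase=np.pi`). All parameter values are assumptions, not the stimuli actually used in the study.

```python
import numpy as np

def lm_am_component(size=256, cycles=4, m_lm=0.2, m_am=0.5, phase=0.0, seed=0):
    """One oriented LM/AM component: phase=0 gives LM+AM (in phase),
    phase=np.pi gives LM-AM (anti-phase)."""
    rng = np.random.default_rng(seed)
    carrier = rng.choice([-1.0, 1.0], size=(size, size))  # binary noise carrier
    x = np.linspace(0.0, 2.0 * np.pi * cycles, size, endpoint=False)
    lm = m_lm * np.sin(x)[None, :]                 # first-order luminance modulation
    am = 1.0 + m_am * np.sin(x + phase)[None, :]   # second-order amplitude modulation
    c0 = 0.2                                       # baseline carrier contrast
    return 0.5 * (1.0 + lm + c0 * am * carrier)    # image around mean luminance 0.5
```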


International Conference on Natural Computation | 2005

A novel immune quantum-inspired genetic algorithm

Ying Li; Yanning Zhang; Yinglei Cheng; Xiaoyue Jiang; Rongchun Zhao

A new algorithm, the immune quantum-inspired genetic algorithm (IQGA), is proposed by introducing immune concepts and methods into quantum-inspired genetic algorithm (QGA). In application to the knapsack problem, which is a well-known combinatorial optimization problem, the proposed algorithm performs better than the conventional GA (CGA), the immune GA (IGA) and QGA.
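
As a rough illustration of the quantum-inspired representation involved, the sketch below encodes each gene as a qubit pair (alpha, beta) and "observes" a classical bitstring for the knapsack problem. The penalty scheme and all constants are assumptions, and the rotation-gate and immune (vaccination) operators of the full IQGA are only indicated in comments.

```python
import numpy as np

def observe(qpop, rng):
    """Collapse each qubit to a bit: P(bit = 1) = beta**2."""
    return (rng.random(qpop.shape[:2]) < qpop[..., 1] ** 2).astype(int)

def knapsack_fitness(bits, weights, values, capacity):
    """Simple penalty scheme (an assumption): infeasible solutions score 0."""
    w = bits @ weights
    v = bits @ values
    return np.where(w <= capacity, v, 0.0)

rng = np.random.default_rng(0)
n_items, pop_size = 20, 30
weights = rng.integers(1, 20, n_items)
values = rng.integers(1, 30, n_items)
capacity = weights.sum() // 2

# Each gene is a qubit (alpha, beta) starting in equal superposition.
qpop = np.full((pop_size, n_items, 2), 1.0 / np.sqrt(2.0))
bits = observe(qpop, rng)
fit = knapsack_fitness(bits, weights, values, capacity)
# A full IQGA would now rotate each qubit toward the best observed solution
# (quantum rotation gate) and apply an immune operator (vaccination with
# problem-specific heuristics) before the next observation.
```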


British Machine Vision Conference | 2011

Shadow Detection Based on Colour Segmentation and Estimated Illumination

Xiaoyue Jiang; Andrew J. Schofield; Jeremy L. Wyatt

In this paper we show how to improve the detection of shadows in natural scenes using a novel combination of colour and illumination features. Detecting shadows is useful because they provide information about both light sources and the shapes of the objects thereby illuminated. Recent shadow detection methods use supervised machine learning techniques with input from colour and texture features extracted directly from the original images (e.g. Lalonde et al. ECCV 2010, Zhu et al. CVPR 2010). It seems sensible to augment these with estimates of scene illumination, as can be obtained with an intrinsic image extraction algorithm. Intrinsic image extraction separates the illumination and reflectance components of a scene, and the resulting illumination maps contain robust intensity-change features at shadow boundaries. In this paper, we make two main contributions. First, we improve upon existing methods for extracting illumination maps. Second, we show how to use these illumination maps together with colour segmentation to extend Lalonde et al.'s approach to shadow detection. Illumination maps are extracted using a steerable filter framework based on global and local correlations in low and high frequency bands respectively. The illumination and colour features so extracted are then input to a decision tree trained to detect shadow edges using AdaBoost. We tested variations of our proposed approach on two public databases of natural scenes. This study showed that our approach improves on that of Lalonde et al. both in terms of sensitivity to shadow edges and rejection of false positives. Following Lalonde et al., we show that our detection results are further improved by imposing an edge-continuity constraint via a conditional random field (CRF) model.
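
The learning stage maps naturally onto off-the-shelf boosting. The sketch below (a hedged Python illustration, not the paper's code) trains AdaBoost over shallow decision trees on precomputed per-edge features; the `.npy` file names and the feature design are hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical inputs: one row per candidate edge, combining
# colour-segmentation and illumination-map features.
X = np.load("shadow_edge_features.npy")
y = np.load("shadow_edge_labels.npy")      # 1 = shadow edge, 0 = other edge

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=100)
clf.fit(X, y)
shadow_prob = clf.predict_proba(X)[:, 1]   # per-edge scores for the CRF stage
```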


European Conference on Computer Vision | 2008

Learning from Real Images to Model Lighting Variations for Face Images

Xiaoyue Jiang; Yuk On Kong; Jianguo Huang; Rongchun Zhao; Yanning Zhang

For robust face recognition, the problem of lighting variation is considered one of the greatest challenges. Since the nine points of light (9PL) subspace is an appropriate low-dimensional approximation to the illumination cone, it has yielded good face recognition results under a wide range of difficult lighting conditions. However, building the 9PL subspace for a subject requires nine gallery images under specific lighting conditions, which are not always available in practice. Instead, we propose a statistical model for performing face recognition under variable illumination. Through this model, the nine basis images of a face can be recovered via maximum-a-posteriori (MAP) estimation with only one gallery image of that face. Furthermore, the training procedure requires only some real images and avoids tedious processing such as singular value decomposition (SVD) or the use of geometric (3D) or albedo information of a surface. With the recovered nine-dimensional lighting subspace, recognition experiments were performed extensively on three publicly available databases, which include images under single and multiple distant point light sources. Our approach yields better results than current methods. Even under extreme lighting conditions, the estimated subspace can still represent lighting variation well, and the recovered subspace retains the main characteristics of the 9PL subspace. Thus, the proposed algorithm can be applied to recognition under variable lighting conditions.
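
Under Gaussian assumptions, recovering the basis images from one gallery image reduces to a linear-Gaussian MAP (posterior-mean) estimate. The sketch below shows only that generic estimate; the observation matrix `L`, the prior `(mu, Sigma)` learned from real images, and the isotropic noise model are shape-level assumptions rather than the paper's exact formulation.

```python
import numpy as np

def map_basis_images(i_obs, L, mu, Sigma, noise_var=1e-2):
    """Posterior mean of the stacked basis images b given one gallery image:
    i_obs = L @ b + eps, eps ~ N(0, noise_var * I), prior b ~ N(mu, Sigma)."""
    S = L @ Sigma @ L.T + noise_var * np.eye(L.shape[0])  # observation covariance
    return mu + Sigma @ L.T @ np.linalg.solve(S, i_obs - L @ mu)
```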


International Conference on Signal Processing | 2004

An optimal algorithm of multisensor image fusion based on wavelet transform

Yinglei Cheng; Rongchun Zhao; Bing Wang; Xiaoyue Jiang

The wavelet transform is widely used for image fusion, which aims to preserve spatial and spectral information at the same time. In the proposed method, the low-frequency components of the multispectral image and the high-frequency components of the high-resolution image are first enhanced through the wavelet transform and then fused. The proposed method optimizes traditional wavelet-transform fusion techniques. Experimental results of merging SAR and TM image data of the same scene are presented, showing that this approach reduces blocking effects while effectively preserving spectral information.
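
The low-frequency/high-frequency split described above can be sketched with PyWavelets. This is a hedged illustration of the generic substitution scheme, not the paper's optimized algorithm; the wavelet choice (`db4`), the decomposition level, and the rule of taking all detail bands from the high-resolution image are assumptions.

```python
import pywt  # PyWavelets

def wavelet_fuse(ms, hr, wavelet="db4", level=2):
    """Fuse a (registered, same-size) multispectral band with a
    high-resolution image: keep the MS approximation (spectral content)
    and the HR detail coefficients (spatial content)."""
    ms_coeffs = pywt.wavedec2(ms, wavelet, level=level)
    hr_coeffs = pywt.wavedec2(hr, wavelet, level=level)
    fused = [ms_coeffs[0]] + list(hr_coeffs[1:])
    return pywt.waverec2(fused, wavelet)
```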


Asian Conference on Computer Vision | 2009

Perception-based lighting adjustment of image sequences

Xiaoyue Jiang; Ping Fan; Ilse Ravyse; Hichem Sahli; Jianguo Huang; Rongchun Zhao; Yanning Zhang

In this paper, we propose a two-step algorithm to reduce lighting differences between frames of an image sequence. First, the lighting parameters of a perceptual lighting model are initialized using an entropy measure. Then the difference between two successive frames is used as a cost function for further optimization of these lighting parameters. By applying the proposed lighting model optimization to an image sequence, neighboring frames become similar in brightness and contrast while features are enhanced. The effectiveness of the proposed approach is illustrated on the detection and tracking of facial features.
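
A minimal sketch of the two-step scheme, assuming a simple gamma-style mapping as a stand-in for the paper's perceptual lighting model: step 1 picks the parameter that maximizes histogram entropy, step 2 refines it to minimize the difference to the previously adjusted frame. The bounds and the mean-squared-error cost are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def entropy(img, bins=64):
    """Shannon entropy of the intensity histogram (img values in [0, 1])."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0].astype(float)
    p /= p.sum()
    return -(p * np.log2(p)).sum()

def adjust_frame(frame, prev_adjusted=None):
    frame = np.clip(frame, 0.0, 1.0)
    # Step 1: entropy-based initialization of the lighting parameter.
    g = minimize_scalar(lambda g: -entropy(frame ** g),
                        bounds=(0.2, 5.0), method="bounded").x
    if prev_adjusted is not None:
        # Step 2: refine by minimizing the difference to the previous frame.
        cost = lambda gg: np.mean((frame ** gg - prev_adjusted) ** 2)
        g = minimize_scalar(cost, bounds=(0.5 * g, 2.0 * g),
                            method="bounded").x
    return frame ** g
```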


Archive | 2007

Audio Visual Speech Recognition and Segmentation Based on DBN Models

Dongmei Jiang; Guoyun Lv; Ilse Ravyse; Xiaoyue Jiang; Yanning Zhang; Hichem Sahli; Rongchun Zhao

Automatic speech recognition is of great importance in human-machine interfaces. Despite extensive effort over decades, acoustic-based recognition systems remain too inaccurate for the vast majority of real applications, especially those in noisy environments, e.g. a crowded environment. The use of visual features in audio-visual speech recognition is motivated by the speech formation mechanism and the natural ability of humans to reduce audio ambiguities using visual cues. Moreover, the visual information provides complementary cues that cannot be corrupted by the acoustic noise of the environment. However, problems such as the selection of the optimal set of visual features and the optimal models for audio-visual integration remain challenging research topics.

In recent years, the most common model-fusion methods for audio-visual speech recognition have been Multi-stream Hidden Markov Models (MSHMMs), such as the product HMM and the coupled HMM. In these models, audio and visual features are fed into two or more parallel HMMs with different topologies. These MSHMMs describe the correlation of audio and visual speech to some extent and allow asynchrony within speech units. Compared with the single-stream HMM, system performance is improved, especially in noisy speech environments. At the same time, problems remain due to an inherent limitation of the HMM structure: at some nodes, such as phones, syllables, or words, constraints are imposed that limit the asynchrony between the audio and visual streams to the phone (or syllable, word) level. Since phones are the basic modeling units in large-vocabulary continuous speech recognition, the audio and visual streams are forced to be synchronized at phone boundaries, which is inconsistent with the fact that visual activity often precedes the audio signal, even by 120 ms.

Besides audio-visual speech recognition to improve the word recognition rate in noisy environments, the task of segmenting audio-visual speech units (such as phones or visemes) also requires a more suitable speech model that describes the inherent correlation and asynchrony of audio and visual speech.


Asian Conference on Computer Vision | 2006

Perception-based lighting balance for face detection

Xiaoyue Jiang; Pei Sun; Rong Xiao; Rongchun Zhao

For robust face detection, lighting is considered one of the greatest challenges. The three-step face detection framework provides a practical method for real-time face detection. In this framework, the last step can employ computationally expensive methods to remove false alarms, and de-lighting methods are usually applied at this stage. Modeling lighting variation precisely is complex. Commonly used simplified lighting models fail under non-uniform lighting conditions because they cannot account for cast shadows, shading, and highlights, which are the main variations caused by non-uniform lighting. Motivated by the adaptation capacity of the human visual system, we propose a perception-based mapping method (PMM) to balance the influence of non-uniform lighting. Experimental results indicate that with PMM as the lighting filter, false positives caused by lighting variation can be removed more accurately in face detection tasks. PMM performs especially well under extreme lighting conditions.
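
As a loose illustration of perception-based balancing (a stand-in, not the paper's PMM, whose exact form is not given here), the sketch below applies a Naka-Rushton-style adaptive response: luminance is compressed relative to a smoothed local adaptation level, which flattens slowly varying illumination while preserving local structure. The spatial scale `sigma` is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def perceptual_balance(lum, sigma=16.0, eps=1e-6):
    """Divide luminance by a local adaptation level (smoothed luminance),
    compressing non-uniform illumination while keeping local contrast."""
    adaptation = gaussian_filter(lum, sigma)
    return lum / (lum + adaptation + eps)
```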


Advances in Multimedia | 2006

DBN-based models for audio-visual speech analysis and recognition

Ilse Ravyse; Dongmei Jiang; Xiaoyue Jiang; Guoyun Lv; Yunshu Hou; Hichem Sahli; Rongchun Zhao

We present an audio-visual automatic speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system consists of three components: (i) a visual module, (ii) an acoustic module, and (iii) a Dynamic Bayesian Network-based recognition module. The visual module locates and tracks the speaker's head and mouth movements, and extracts relevant speech features represented by contour information and 3D deformations of lip movements. The acoustic module extracts noise-robust features, i.e. the Mel Filterbank Cepstrum Coefficients (MFCCs). Finally, we propose two models based on Dynamic Bayesian Networks (DBN) to either consider the audio and video streams separately or to integrate the features from the audio and visual streams. We also compare the proposed DBN-based system with a classical Hidden Markov Model. The novelty of the developed framework is the persistence of the audiovisual speech signal characteristics from the extraction step through the learning step. Experiments on continuous audiovisual speech show that the segmentation boundaries of phones in the audio stream and visemes in the video stream are close to manual segmentation boundaries.
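
The acoustic front end is standard enough to sketch. Below is a hedged MFCC extraction using librosa with common 25 ms / 10 ms framing at 16 kHz; these settings are assumptions, not necessarily those used by this system.

```python
import librosa

def mfcc_features(wav_path, n_mfcc=13):
    """13 MFCCs with 25 ms windows and a 10 ms hop at 16 kHz."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)
```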

Collaboration


Dive into Xiaoyue Jiang's collaboration.

Top Co-Authors

Rongchun Zhao (Northwestern Polytechnical University)
Yanning Zhang (Northwestern Polytechnical University)
Hichem Sahli (Vrije Universiteit Brussel)
Guoyun Lv (Northwestern Polytechnical University)
Jianguo Huang (Northwestern Polytechnical University)
Tuo Zhao (Northwestern Polytechnical University)
Ilse Ravyse (Vrije Universiteit Brussel)