Yimo Guo
University of Oulu
Publications
Featured research published by Yimo Guo.
Pattern Recognition | 2012
Yimo Guo; Guoying Zhao; Matti Pietikäinen
In this paper, a feature extraction method is developed for texture description. To obtain discriminative patterns, we present a learning framework which is formulated into a three-layered model. It can estimate the optimal pattern subset of interest by simultaneously considering the robustness, discriminative power and representation capability of features. This model is generalized and can be integrated with existing LBP variants such as conventional LBP, rotation invariant patterns, local patterns with anisotropic structure, completed local binary pattern (CLBP) and local ternary pattern (LTP) to derive new image features for texture classification. The derived descriptors are extensively compared with other widely used approaches and evaluated on two publicly available texture databases (Outex and CUReT) for texture classification, two medical image databases (Hela and Pap-smear) for protein cellular classification and disease classification, and a neonatal facial expression database (infant COPE database) for facial expression classification. Experimental results demonstrate that the obtained descriptors lead to state-of-the-art classification performance.
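Not the paper's three-layered model, but a minimal sketch of the underlying pattern-selection idea: rank LBP codes by their occurrence frequency over the training set and keep the smallest subset that covers a target share of all occurrences, a rough proxy for representation capability and robustness. The function name, the coverage threshold, and the random example data are illustrative only.

```python
import numpy as np

def select_dominant_patterns(histograms, coverage=0.9):
    """Pick a subset of LBP codes as 'patterns of interest'.

    histograms : (n_images, n_codes) array of per-image LBP occurrence
                 counts from the training set.
    coverage   : keep the fewest codes whose summed occurrence frequency
                 reaches this fraction of all pixels.
    Returns the indices of the retained codes.
    """
    freq = histograms.sum(axis=0).astype(float)
    freq /= freq.sum()                      # global occurrence frequency per code
    order = np.argsort(freq)[::-1]          # most frequent codes first
    cum = np.cumsum(freq[order])
    k = int(np.searchsorted(cum, coverage)) + 1
    return np.sort(order[:k])               # retained pattern subset

# Example: 8-neighbour LBP gives 256 codes; keep codes covering 90% of pixels.
rng = np.random.default_rng(0)
hists = rng.integers(0, 100, size=(20, 256))
subset = select_dominant_patterns(hists, coverage=0.9)
print(len(subset), "codes retained")
```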
British Machine Vision Conference | 2011
Yimo Guo; Guoying Zhao; Matti Pietikäinen
Texture classification can be described as the problem of classifying images according to textural cues, that is, categorizing a texture image obtained under a certain illumination and viewpoint condition as belonging to one of the pre-learned texture classes. It therefore mainly involves two steps: image representation or description, and classification. In this paper, we focus on the feature extraction part, which aims to extract effective patterns to distinguish different textures. Among various feature extraction methods, local features such as LBP [4], SIFT [2] and the Histogram of Oriented Gradients (HOG) [1] have performed well in real-world applications. Representative methods also include grey level difference or co-occurrence statistics [10], and methods based on multi-channel filtering or wavelet decomposition [3, 5, 7]. To learn representative structural configurations from texture images, Varma et al. proposed texton methods based on the filter response space and the local image patch space [8, 9]. In this paper we present the descriptor MiC, which encodes image microscopic configuration by a linear configuration model. The final local configuration pattern (LCP) feature integrates both the microscopic features, represented by the optimal model parameters, and the local features, represented by pattern occurrences. To be specific, the microscopic features capture the image microscopic configuration, which embodies image configuration and pixel-wise interaction relationships, by a linear model. The optimal model parameters are estimated by an efficient least squares estimator. To achieve rotation invariance, which is a desired property for texture features, the Fourier transform is applied to the estimated parameter vectors. Finally, the transformed vectors are concatenated with local pattern occurrences to construct LCPs. As this framework is unsupervised, it avoids the generalization problem suffered by other statistical learning methods. To model the image configuration with respect to each pattern, we estimate optimal weights, associated with the intensities of neighboring pixels, to linearly reconstruct the central pixel intensity. This can be expressed by:
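(The abstract breaks off before the formula. A plausible form of the elided least-squares expression, written under the assumption that $g_c$ denotes the central pixel intensity, $g_0, \dots, g_{P-1}$ the intensities of its $P$ neighbors, and $a_0, \dots, a_{P-1}$ the configuration weights to be estimated, is

$$E(a_0, \dots, a_{P-1}) = \sum_{(x,y) \in \Omega_t} \Big( g_c(x,y) - \sum_{i=0}^{P-1} a_i \, g_i(x,y) \Big)^2,$$

minimized over the set $\Omega_t$ of all image positions at which pattern $t$ occurs; the optimal weights then serve as the microscopic features of that pattern.)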
Asian Conference on Computer Vision | 2010
Yimo Guo; Guoying Zhao; Matti Pietikäinen; Zhengguang Xu
This paper proposes a novel method to deal with the representation issue in texture classification. A learning framework for the image descriptor is designed based on the Fisher separation criterion (FSC) to learn the most reliable and robust dominant pattern types, considering intra-class similarity and inter-class distance. Image structures are thus described by a new FSC-based learning (FBL) encoding method. Unlike previous hand-crafted encoding methods, such as LBP and SIFT, a supervised learning approach is used to learn an encoder from training samples. We find that such a learning technique can largely improve the discriminative ability and automatically achieve a good trade-off between discriminative power and efficiency. The commonly used texture descriptor, the local binary pattern (LBP), is taken as an example in the paper, from which we derive the FBL-LBP descriptor. We benchmark its performance by classifying textures in the Outex_TC_0012 database for rotation invariant texture classification, the KTH-TIPS2 database for material categorization, and the Columbia-Utrecht (CUReT) database for classification under different views and illuminations. The promising results verify its robustness to image rotation, illumination changes, and noise. Furthermore, to validate the generalization to other problems, we extend the application to face recognition and evaluate the proposed FBL descriptor on the FERET face database. The encouraging results show that this descriptor is highly discriminative.
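As a rough illustration of a Fisher-style selection criterion (not the paper's exact FSC formulation), the sketch below scores each pattern type by the ratio of between-class to within-class scatter of its occurrence frequency; patterns with high scores would then form the learned codebook. All names and the smoothing constant are assumptions.

```python
import numpy as np

def fisher_separation_scores(histograms, labels):
    """Score each pattern type by a Fisher-style separation criterion.

    histograms : (n_images, n_codes) per-image pattern occurrence counts.
    labels     : (n_images,) class label of each training image.
    Returns a (n_codes,) array: between-class scatter of the pattern's
    frequency divided by its within-class scatter (higher = more reliable).
    """
    labels = np.asarray(labels)
    X = histograms / histograms.sum(axis=1, keepdims=True)  # normalise to frequencies
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        between += len(Xc) * (mc - overall_mean) ** 2
        within += ((Xc - mc) ** 2).sum(axis=0)
    return between / (within + 1e-12)       # small constant avoids division by zero

# One could then keep, e.g., the patterns whose score exceeds the median
# as the learned encoding codebook.
```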
European Conference on Computer Vision | 2012
Yimo Guo; Guoying Zhao; Matti Pietikäinen
In this paper, we propose a new scheme that formulates the dynamic facial expression recognition problem as a longitudinal atlas construction and deformable groupwise image registration problem. The main contributions of this method include: 1) we model human facial feature changes during the facial expression process by a diffeomorphic image registration framework; 2) the subject-specific longitudinal change information of each facial expression is captured by building an expression growth model; 3) longitudinal atlases of each facial expression are constructed by performing groupwise registration among all the corresponding expression image sequences of different subjects. The constructed atlases reflect the overall facial feature changes of each expression across the population and suppress the bias due to inter-personal variations. The proposed method was extensively evaluated on the Cohn-Kanade, MMI, and Oulu-CASIA VIS dynamic facial expression databases and was compared with several state-of-the-art facial expression recognition approaches. Experimental results demonstrate that our method consistently achieves the highest recognition accuracies on all the databases among the methods under comparison.
IEEE Transactions on Image Processing | 2016
Yimo Guo; Guoying Zhao; Matti Pietikäinen
In this paper, a new dynamic facial expression recognition method is proposed. Dynamic facial expression recognition is formulated as a longitudinal groupwise registration problem. The main contributions of this method lie in the following aspects: 1) subject-specific facial feature movements of different expressions are described by a diffeomorphic growth model; 2) a salient longitudinal facial expression atlas is built for each expression by a sparse groupwise image registration method, which can describe the overall facial feature changes among the whole population and can suppress the bias due to large inter-subject facial variations; and 3) both the image appearance information in the spatial domain and the topological evolution information in the temporal domain are used to guide recognition by a sparse representation method. The proposed framework has been extensively evaluated on five databases for different applications: the extended Cohn-Kanade, MMI, FERA, and AFEW databases for dynamic facial expression recognition, and the UNBC-McMaster database for spontaneous pain expression monitoring. This framework is also compared with several state-of-the-art dynamic facial expression recognition methods. The experimental results demonstrate that the recognition rates of the new method are consistently higher than those of the other methods under comparison.
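The sparse representation step is described only at a high level; below is a minimal, generic sketch of sparse-representation-based classification: sparse coding of a query descriptor over a training dictionary, then assignment to the class with the smallest reconstruction residual. The Lasso solver, parameter values, and variable names are placeholders, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(D, train_labels, y, alpha=0.01):
    """Sparse-representation classification of a query descriptor y.

    D            : (n_features, n_train) dictionary; each column is the
                   descriptor of one training sequence/atlas.
    train_labels : (n_train,) class of each dictionary column.
    y            : (n_features,) query descriptor.
    """
    train_labels = np.asarray(train_labels)
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(D, y)
    x = lasso.coef_                               # sparse code for the query
    residuals = {}
    for c in np.unique(train_labels):
        xc = np.where(train_labels == c, x, 0.0)  # keep coefficients of class c only
        residuals[c] = np.linalg.norm(y - D @ xc)
    return min(residuals, key=residuals.get)      # class with smallest residual
```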
IEEE Transactions on Image Processing | 2013
Yimo Guo; Guoying Zhao; Ziheng Zhou; Matti Pietikäinen
Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, which plays an important role in computer vision and graphics. However, generating high-quality synthesis results remains a challenging problem. Considering the two key factors that affect synthesis performance, frame representation and blending artifacts, we improve the synthesis from two aspects: 1) an effective frame representation is designed to capture both the image appearance information in the spatial domain and the longitudinal information in the temporal domain; 2) artifacts that degrade the synthesis quality are significantly suppressed on the basis of a diffeomorphic growth model. The proposed video texture synthesis approach has two major stages: a video stitching stage and a transition smoothing stage. In the first stage, a video texture synthesis model is proposed to generate an infinite video flow. To find similar frames for stitching video clips, we present a new spatial-temporal descriptor that provides an effective representation for different types of dynamic textures. In the second stage, a smoothing method is proposed to improve the synthesis quality, especially with respect to temporal continuity. It establishes a diffeomorphic growth model to emulate the local dynamics around stitched frames. The proposed approach is thoroughly tested on public databases and videos from the Internet, and is evaluated both qualitatively and quantitatively.
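The stitching stage relies on finding visually similar frames; the sketch below illustrates that generic step with a deliberately simple per-frame descriptor (a grey-level histogram) standing in for the paper's spatial-temporal descriptor. Function names and the top_k parameter are hypothetical.

```python
import numpy as np

def frame_descriptors(frames, bins=64):
    """Placeholder per-frame descriptor: a normalised grey-level histogram."""
    descs = []
    for f in frames:                        # each frame: 2-D grey-scale array
        h, _ = np.histogram(f, bins=bins, range=(0, 256), density=True)
        descs.append(h)
    return np.asarray(descs)

def candidate_transitions(frames, top_k=10):
    """Return the top_k most similar (i, j) frame pairs, i != j, usable as
    stitching points when looping or extending the video."""
    D = frame_descriptors(frames)
    dist = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # exclude trivial self-matches
    idx = np.dstack(np.unravel_index(np.argsort(dist, axis=None), dist.shape))[0]
    return [tuple(p) for p in idx[:top_k]]
```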
IEEE Transactions on Circuits and Systems for Video Technology | 2012
Ziheng Zhou; Guoying Zhao; Yimo Guo; Matti Pietikäinen
An image-based visual speech animation system is presented in this paper. A video model is proposed to preserve the video dynamics of a talking face. The model represents a video sequence by a low-dimensional continuous curve embedded in a path graph and establishes a map from the curve to the image domain. When selecting video segments for synthesis, we loosen the traditional requirement of using the triphone as the unit, allowing segments to contain longer stretches of natural talking motion. Dense videos are sampled from the segments, concatenated, and downsampled to train a video model that enables efficient time alignment and motion smoothing for the final video synthesis. Different viseme definitions are used to investigate the impact of visemes on the video realism of the animated talking face. The system is built on a public database and tested both objectively and subjectively.
International Conference on Image Processing | 2009
Yimo Guo; Guoying Zhao; Jie Chen; Matti Pietikäinen; Zhengguang Xu
Dynamic textures are image sequences with visual pattern repetition in time and space, such as smoke, flames, and moving objects. Dynamic texture synthesis aims to provide a continuous and infinitely varying stream of images by operating on dynamic textures. The previous video texture method provides high-quality visual results, but its representation does not fully exploit the temporal correlation among frames; we therefore develop a novel spatial-temporal descriptor for frame description, accompanied by a similarity measure, on the basis of the video texture method. Compared with the previous approach, our method considers both the spatial and temporal domains of video sequences in the representation and, moreover, combines local and global descriptions on each spatial-temporal plane. Experimental results show that the proposed method achieves better performance in the synthesis of both natural scenes and human motion. In particular, it is robust to noise when remodeling videos into an infinite time domain.
Computer Analysis of Images and Patterns | 2009
Yimo Guo; Guoying Zhao; Jie Chen; Matti Pietikäinen; Zhengguang Xu
A new local feature based image representation method is proposed. It is derived from the local Gabor phase difference pattern (LGPDP). The method represents images by exploiting the relationships of Gabor phase between a pixel and its neighbors. There are two main contributions: 1) a novel phase difference measure is defined; 2) new encoding rules to reflect the Gabor phase difference information are designed. Together, these allow the method to describe Gabor phase differences more precisely than the conventional LGPDP. Moreover, it can discard the useless information and redundancy produced near quadrant boundaries, which commonly exist in the LGPDP. It is shown that the proposed method brings higher discriminative ability to Gabor phase based patterns. Experiments are conducted on the FRGC version 2.0 and USTB Ear databases to evaluate its validity and generalizability. The proposed method is also compared with several state-of-the-art approaches. It is observed that our method achieves the highest recognition rates among them.
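The exact encoding rules are the paper's contribution and are not reproduced here; the sketch below only illustrates the general idea of turning per-pixel Gabor phase differences to the 8 neighbors into a compact local code. The thresholding scheme and names are simplifications assumed for illustration.

```python
import numpy as np

def phase_difference_pattern(phase, threshold=np.pi / 2):
    """Encode, for every pixel, the Gabor phase differences to its 8
    neighbours as an 8-bit code (bit set where |difference| < threshold).

    phase : 2-D array of Gabor phase values in [-pi, pi] for one
            filter scale/orientation.
    """
    H, W = phase.shape
    code = np.zeros((H - 2, W - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = phase[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = phase[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        diff = np.angle(np.exp(1j * (neigh - center)))   # wrap to (-pi, pi]
        code |= ((np.abs(diff) < threshold).astype(np.uint8) << bit)
    return code   # histogram these codes per region to form the descriptor
```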
EURASIP Journal on Image and Video Processing | 2014
Rocio A. Lizarraga-Morales; Yimo Guo; Guoying Zhao; Matti Pietikäinen; Raúl Enrique Sánchez-Yáñez
In this paper, we study the use of local spatiotemporal patterns in a non-parametric dynamic texture synthesis method. Given a finite sample video of a texture in motion, dynamic texture synthesis may create a new video sequence, perceptually similar to the input, with an enlarged frame size and longer duration. In general, non-parametric techniques select and copy regions from the input sample to serve as building blocks, pasting them together one at a time onto the outcome. In order to minimize possible discontinuities between adjacent blocks, the proper representation and selection of such pieces become key issues. In previous synthesis methods, the block description has been based only on the intensities of pixels, ignoring the texture structure and dynamics. Furthermore, a seam optimization between neighboring blocks has been a fundamental step in order to avoid discontinuities. In our synthesis approach, we propose to use local spatiotemporal cues extracted with the local binary pattern from three orthogonal planes (LBP-TOP) operator, which allows us to include the appearance and motion of the dynamic texture in the video characterization. This improved representation leads to better fitting and matching between adjacent blocks, and therefore the spatial similarity, temporal behavior, and continuity of the input can be successfully preserved. Moreover, the proposed method is simpler than other approximations since no additional seam optimization is needed to obtain smooth transitions between video blocks. The experiments show that the use of the LBP-TOP representation outperforms other methods, without generating visible discontinuities or annoying artifacts. The results are evaluated using a double-stimulus continuous quality scale methodology, which is reproducible and objective. We also present results for the use of our method in video completion tasks. Additionally, we show that the proposed technique is easily extendable to achieve synthesis in both the spatial and temporal domains.
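LBP-TOP itself is a published operator (LBP histograms computed on the XY, XT and YT planes of a video volume); the sketch below is a simplified version that, for brevity, samples only the central slice of each plane instead of accumulating codes over every position as the full operator does. Names and the block layout are illustrative.

```python
import numpy as np

def basic_lbp(img):
    """8-neighbour LBP codes for the interior pixels of a 2-D array."""
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    H, W = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

def lbp_top_histogram(block):
    """Simplified LBP-TOP descriptor of a video block (T, H, W): LBP
    histograms from the central XY, XT and YT slices, concatenated."""
    T, H, W = block.shape
    planes = [block[T // 2],          # central XY plane
              block[:, H // 2, :],    # central XT plane
              block[:, :, W // 2]]    # central YT plane
    hists = [np.bincount(basic_lbp(p).ravel(), minlength=256) for p in planes]
    return np.concatenate([h / max(h.sum(), 1) for h in hists])
```

Comparing such descriptors between candidate blocks (e.g. by histogram distance) is what allows appearance and motion, rather than raw pixel intensities alone, to drive the block selection.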