Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Mingli Song is active.

Publication


Featured research published by Mingli Song.


IEEE Transactions on Image Processing | 2013

Discovering Discriminative Graphlets for Aerial Image Categories Recognition

Luming Zhang; Yahong Han; Yi Yang; Mingli Song; Shuicheng Yan; Qi Tian

Recognizing aerial image categories is useful for scene annotation and surveillance. Local features have been demonstrated to be robust to image transformations, including occlusion and clutter. However, the geometric property of an aerial image (i.e., the topology and relative displacement of local features), which is key to discriminating aerial image categories, cannot be effectively represented by state-of-the-art generic visual descriptors. To solve this problem, we propose a recognition model that mines graphlets from aerial images, where graphlets are small connected subgraphs reflecting both the geometric property and the color/texture distribution of an aerial image. More specifically, each aerial image is decomposed into a set of basic components (e.g., road and playground), and a region adjacency graph (RAG) is accordingly constructed to model their spatial interactions. Aerial image category recognition can subsequently be cast as RAG-to-RAG matching. Based on graph theory, RAG-to-RAG matching is conducted by comparing all their respective graphlets. Because the number of graphlets is huge, we derive a manifold embedding algorithm to measure different-sized graphlets, after which we select graphlets with highly discriminative and low-redundancy topologies. By quantizing the selected graphlets from each aerial image into a feature vector, we use a support vector machine to discriminate aerial image categories. Experimental results indicate that our method outperforms several state-of-the-art object/scene recognition models, and the visualized graphlets show that our approach discovers discriminative patterns.
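
The RAG construction and graphlet enumeration described in the abstract can be illustrated with a small sketch. This is a minimal toy, assuming the segmented image is given as a 2-D grid of region labels; the function names are hypothetical, and a real graphlet miner would also attach color/texture attributes to each node.

```python
from itertools import combinations

def build_rag(label_grid):
    """Build a region adjacency graph from a 2-D grid of region labels.
    Nodes are region labels; an edge joins two regions that share a
    4-connected boundary."""
    adj = {}
    rows, cols = len(label_grid), len(label_grid[0])
    for r in range(rows):
        for c in range(cols):
            a = label_grid[r][c]
            adj.setdefault(a, set())
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = label_grid[rr][cc]
                    if a != b:
                        adj.setdefault(b, set())
                        adj[a].add(b)
                        adj[b].add(a)
    return adj

def enumerate_graphlets(adj, k):
    """Enumerate all connected k-node subsets of the RAG (graphlets)."""
    nodes = sorted(adj)
    graphlets = []
    for subset in combinations(nodes, k):
        s = set(subset)
        # check connectivity by a traversal restricted to the subset
        seen, stack = {subset[0]}, [subset[0]]
        while stack:
            n = stack.pop()
            for m in adj[n] & s:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        if seen == s:
            graphlets.append(subset)
    return graphlets
```

On a tiny grid with three regions that all touch one another, `build_rag` yields a triangle, so every pair and the full triple are valid graphlets.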


IEEE Transactions on Image Processing | 2013

Probabilistic Graphlet Transfer for Photo Cropping

Luming Zhang; Mingli Song; Qi Zhao; Xiao Liu; Jiajun Bu; Chun Chen

As one of the most basic photo manipulation processes, photo cropping is widely used in the printing, graphic design, and photography industries. In this paper, we introduce graphlets (i.e., small connected subgraphs) to represent a photo's aesthetic features, and propose a probabilistic model to transfer aesthetic features from the training photo onto the cropped photo. In particular, by segmenting each photo into a set of regions, we construct a region adjacency graph (RAG) to represent the global aesthetic feature of each photo. Graphlets are then extracted from the RAGs, and these graphlets capture the local aesthetic features of the photos. Finally, we cast photo cropping as a candidate-searching procedure on the basis of a probabilistic model, and infer the parameters of the cropped photos using Gibbs sampling. The proposed method is fully automatic. Subjective evaluations have shown that it is preferred over a number of existing approaches.
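
The candidate-searching view of cropping can be sketched as follows. This toy replaces the paper's graphlet features and Gibbs sampling with a mean-intensity feature and an exhaustive sliding-window search; `best_crop` and its arguments are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_crop(img, train_feats, crop_hw, stride=1):
    """Exhaustive candidate search (a stand-in for Gibbs sampling):
    slide a fixed-size window over the image and keep the crop whose
    mean-intensity feature is closest to any training-photo feature."""
    h, w = crop_hw
    best, best_score = None, np.inf
    for r in range(0, img.shape[0] - h + 1, stride):
        for c in range(0, img.shape[1] - w + 1, stride):
            feat = img[r:r + h, c:c + w].mean()
            score = np.abs(train_feats - feat).min()
            if score < best_score:
                best_score, best = score, (r, c)
    return best
```

With a training feature of 1.0 (a bright exemplar), the search locks onto the bright quadrant of a toy image.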


Computer Vision and Pattern Recognition | 2013

Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation

Luming Zhang; Mingli Song; Zicheng Liu; Xiao Liu; Jiajun Bu; Chun Chen

Weakly supervised image segmentation is a challenging problem in computer vision. In this paper, we present a new weakly supervised image segmentation algorithm that learns the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small graph whose nodes are superpixels and which encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use a Gaussian mixture model (GMM) to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to that of fully supervised segmentation models.
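
Learning a GMM over embedded graphlet vectors boils down to expectation-maximization. A minimal 1-D, two-component sketch of the idea is below (deterministic initialization from the data extremes); in practice one would fit a multivariate mixture over the equal-length embedding vectors, e.g., with a library implementation.

```python
import numpy as np

def fit_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    x = np.asarray(x, float)
    mu = np.array([x.min(), x.max()])       # deterministic init
    var = np.full(2, x.var() + 1e-6)
    pi = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        d2 = (x[:, None] - mu) ** 2
        p = pi * np.exp(-0.5 * d2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var
```

On two well-separated clusters, the fitted means recover the cluster centers and the mixing weights are roughly equal.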


Computer Vision and Pattern Recognition | 2004

Audio-visual based emotion recognition - a new approach

Mingli Song; Jiajun Bu; Chun Chen; Nan Li

Emotion recognition is one of the latest challenges in intelligent human/computer communication. Most previous work on emotion recognition focused on extracting emotions from visual or audio information separately. This paper presents a novel approach that uses both the visual and audio information in video clips to recognize human emotion. Facial feature tracking compliant with facial animation parameters (FAPs), based on an active appearance model, is performed on the video to generate two vector streams representing the facial expression and visual speech features. Combined with these visual vectors, an audio vector is extracted in terms of low-level features. A tripled hidden Markov model is then introduced to perform the recognition; it allows state asynchrony between the audio and visual observation sequences while preserving their natural correlation over time. Experimental results show that this approach outperforms using visual or audio information alone.


IEEE Transactions on Multimedia | 2014

Weakly Supervised Photo Cropping

Luming Zhang; Mingli Song; Yi Yang; Qi Zhao; Chen Zhao; Nicu Sebe

Photo cropping is widely used in the printing industry, photography, and cinematography. Conventional photo cropping methods suffer from three drawbacks: 1) the semantics used to describe photo aesthetics are determined by the experience of model designers and specific data sets; 2) image global configurations, an essential cue for capturing photo aesthetics, are not well preserved in the cropped photo; and 3) multi-channel visual features from an image region contribute differently to human aesthetics, but state-of-the-art photo cropping methods cannot automatically weight them. Owing to recent progress in the image retrieval community, image-level semantics, i.e., photo labels obtained without much human supervision, can be efficiently and effectively acquired. Thus, we propose weakly supervised photo cropping, where a manifold embedding algorithm is developed to incorporate image-level semantics and image global configurations with graphlets, i.e., small connected subgraphs. After manifold embedding, a Bayesian network (BN) is proposed that incorporates the testing photo into the framework derived from the multi-channel post-embedding graphlets of the training data, the importance of which is determined automatically. Based on the BN, photo cropping can be cast as searching for the candidate cropped photo that maximally preserves graphlets from the training photos, and the optimal cropping parameter is inferred by Gibbs sampling. Subjective evaluations demonstrate that: 1) our approach outperforms several representative photo cropping methods, including our previous cropping model guided by semantics-free graphlets, and 2) the visualized graphlets explicitly capture photo semantics and global spatial configurations.


IEEE Transactions on Circuits and Systems for Video Technology | 2008

Bayesian Tensor Approach for 3-D Face Modeling

Dacheng Tao; Mingli Song; Xuelong Li; Jialie Shen; Jimeng Sun; Xindong Wu; Christos Faloutsos; Stephen J. Maybank

Effectively modeling a collection of three-dimensional (3-D) faces is an important task in various applications, especially facial expression-driven ones, e.g., expression generation, retargeting, and synthesis. These 3-D faces naturally form a set of second-order tensors, with one modality for identity and the other for expression. The number of these second-order tensors is three times the number of vertices used for 3-D face modeling. As for algorithms, Bayesian data modeling, which is a natural data analysis tool, has been widely applied with great success; however, it works only for vector data. Therefore, there is a gap between tensor-based representation and vector-based data analysis tools. Aiming at bridging this gap and generalizing conventional statistical tools to tensors, this paper proposes a decoupled probabilistic algorithm named Bayesian tensor analysis (BTA). Theoretically, BTA can automatically determine a suitable dimensionality for each modality of the tensor data. With BTA, a collection of 3-D faces can be well modeled. Empirical studies on expression retargeting also justify the advantages of BTA.
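
BTA itself is probabilistic, but the underlying idea of analyzing tensor data mode by mode can be illustrated with a plain (non-Bayesian) higher-order SVD sketch: unfold the tensor along each mode, take the leading left-singular vectors as that mode's basis, and project onto a core. All function names here are hypothetical.

```python
import numpy as np

def mode_unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_bases(T, ranks):
    """Leading left-singular vectors of each mode unfolding (HOSVD)."""
    return [np.linalg.svd(mode_unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def multi_mode_mult(T, mats, transpose=False):
    """Multiply T by a matrix (or its transpose) along every mode."""
    for m, U in enumerate(mats):
        M = U.T if transpose else U
        T = np.moveaxis(np.tensordot(M, T, axes=(1, m)), 0, m)
    return T
```

For a rank-(1,1,1) tensor, projecting onto the rank-1 bases and mapping back reconstructs it exactly, which is a quick sanity check of the mode conventions.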


IEEE Transactions on Image Processing | 2012

Probabilistic Exposure Fusion

Mingli Song; Dacheng Tao; Chun Chen; Jiajun Bu; Jiebo Luo; Chengqi Zhang

The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced by the local adaptation. The proposed scheme is novel in capturing an HDR scene with a standard dynamic range (SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR image series, we first calculate the image luminance levels that maximize the visible contrasts, and then the scene gradients embedded in these images. Next, we synthesize an SDR image using a probabilistic model that preserves the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme also functions as a tone mapping of an HDR image to an SDR image, and it is superior to both global and local tone mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is preferred over a number of existing approaches.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Color to Gray: Visual Cue Preservation

Mingli Song; Dacheng Tao; Chun Chen; Xuelong Li; Chang Wen Chen

Both commercial and scientific applications often need to transform color images into gray-scale images, e.g., to reduce the publication cost of printing color images or to help color-blind people see the visual cues of color images. However, conventional color-to-gray algorithms are not ready for practical applications because they encounter the following problems: 1) visual cues are not well defined, so it is unclear how to preserve important cues in the transformed gray-scale images; 2) some algorithms have extremely high computational cost; and 3) some require human-computer interaction to achieve a reasonable transformation. To solve, or at least reduce, these problems, we propose a new algorithm based on a probabilistic graphical model, with the assumption that the image is defined over a Markov random field. Thus, the color-to-gray procedure can be regarded as a labeling process that preserves the newly well-defined visual cues of a color image in the transformed gray-scale image. Visual cues are measurements that can be extracted from a color image by a perceiver. They indicate the state of some properties of the image that the perceiver is interested in perceiving. Different people may perceive different cues from the same color image; three cues are defined in this paper, namely, color spatial consistency, image structure information, and color channel perception priority. We cast color-to-gray conversion as a visual cue preservation procedure based on a probabilistic graphical model, and optimize the model via an integral minimization problem. We apply the new algorithm to both natural color images and artificial pictures, and demonstrate that the proposed approach outperforms representative conventional algorithms in terms of effectiveness and efficiency. In addition, it requires no human-computer interaction.
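
The notion of contrast-preserving decolorization can be illustrated with a much cruder stand-in for the paper's MRF labeling: search a small grid of fixed channel weights and keep the one whose gray-level differences best match the color differences of neighboring pixels. Everything here, including the 1/sqrt(3) scaling of the target contrast, is an illustrative assumption.

```python
import numpy as np

def color_to_gray(img):
    """Pick channel weights (summing to 1) from a coarse grid so that
    gray differences between consecutive pixels approximate their color
    (Euclidean) differences, then apply the weights."""
    img = np.asarray(img, float)
    pix = img.reshape(-1, 3)
    i, j = np.arange(len(pix) - 1), np.arange(1, len(pix))
    # target contrast: scaled Euclidean color distance between neighbors
    target = np.linalg.norm(pix[i] - pix[j], axis=1) / np.sqrt(3)
    best_w, best_err = None, np.inf
    grid = np.linspace(0, 1, 5)
    for wr in grid:
        for wg in grid:
            wb = 1 - wr - wg
            if wb < 0:
                continue
            w = np.array([wr, wg, wb])
            gray_diff = np.abs((pix[i] - pix[j]) @ w)
            err = np.abs(gray_diff - target).sum()
            if err < best_err:
                best_err, best_w = err, w
    return img @ best_w
```

A pure-red/pure-green pair, which collapses to identical values under naive equal-weight averaging, keeps a visible gray difference under this search.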


Computer Vision and Pattern Recognition | 2014

Semi-supervised Coupled Dictionary Learning for Person Re-identification

Xiao Liu; Mingli Song; Dacheng Tao; Xingchen Zhou; Chun Chen; Jiajun Bu

The desirability of being able to search for specific persons in surveillance videos captured by different cameras has increasingly motivated interest in the problem of person re-identification, which is a critical yet under-addressed challenge in multi-camera tracking systems. The main difficulty of person re-identification arises from the variations in human appearances from different camera views. In this paper, to bridge the human appearance variations across cameras, two coupled dictionaries that relate to the gallery and probe cameras are jointly learned in the training phase from both labeled and unlabeled images. The labeled training images carry the relationship between features from different cameras, and the abundant unlabeled training images are introduced to exploit the geometry of the marginal distribution for obtaining robust sparse representation. In the testing phase, the feature of each target image from the probe camera is first encoded by the sparse representation and then recovered in the feature space spanned by the images from the gallery camera. The features of the same person from different cameras are similar following the above transformation. Experimental results on publicly available datasets demonstrate the superiority of our method.
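
The testing-phase pipeline described above (encode the probe feature, recover it in the gallery feature space, then match) can be sketched as follows. This toy assumes the coupled dictionaries are already learned and substitutes ridge regression for sparse coding; all names are hypothetical.

```python
import numpy as np

def encode(D, x, lam=0.1):
    """Ridge-regularized code of x over dictionary D
    (a simple stand-in for sparse coding)."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)

def reidentify(D_probe, D_gallery, probe_feat, gallery_feats):
    """Transfer a probe-camera feature into the gallery feature space
    via the coupled dictionaries, then return the index of the nearest
    gallery feature."""
    code = encode(D_probe, probe_feat)
    transferred = D_gallery @ code
    dists = np.linalg.norm(gallery_feats - transferred, axis=1)
    return int(np.argmin(dists))
```

With toy coupled dictionaries sharing the same codes, a probe feature built from code e1 is matched to the gallery feature built from e1, and likewise for e2.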


Computer Vision and Pattern Recognition | 2013

Semi-supervised Node Splitting for Random Forest Construction

Xiao Liu; Mingli Song; Dacheng Tao; Zicheng Liu; Luming Zhang; Chun Chen; Jiajun Bu

Node splitting is an important issue in Random Forest construction, but robust splitting requires a large number of training samples. Existing solutions fail to properly partition the feature space if there are insufficient training data. In this paper, we present semi-supervised splitting to overcome this limitation by splitting nodes with the guidance of both labeled and unlabeled data. In particular, we derive a nonparametric algorithm to obtain an accurate quality measure of a split by incorporating abundant unlabeled data. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. The proposed algorithm is compared with state-of-the-art supervised and semi-supervised algorithms for typical computer vision applications such as object categorization and image segmentation. Experimental results on publicly available datasets demonstrate the superiority of our method.
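
The idea of scoring a split with both labeled and unlabeled data can be illustrated in one dimension: combine the labeled information gain with a penalty when the threshold cuts through a dense region of unlabeled points. The Gaussian kernel density here is a crude stand-in for the paper's nonparametric measure, and the function names and bandwidth are illustrative assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_quality(x_lab, y_lab, x_unlab, thresh, bandwidth=0.5):
    """Score a 1-D axis-aligned split: labeled information gain minus
    the unlabeled-data density at the threshold, so low-density cuts
    are preferred."""
    left = y_lab[x_lab <= thresh]
    right = y_lab[x_lab > thresh]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    n = len(y_lab)
    gain = (entropy(y_lab)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
    # Gaussian kernel density of unlabeled points at the threshold
    density = np.mean(np.exp(-0.5 * ((x_unlab - thresh) / bandwidth) ** 2))
    return gain - density
```

On two labeled clusters with unlabeled points concentrated around them, a cut in the empty gap scores higher than a cut inside a cluster, which is the behavior the semi-supervised measure is after.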

Collaboration


Dive into Mingli Song's collaborations.

Top Co-Authors

Luming Zhang
Hefei University of Technology

Yezhou Yang
Arizona State University

Li Sun
Zhejiang University

Na Li
Zhejiang University