
Publication


Featured research published by Caiming Xiong.


European Conference on Computer Vision | 2012

Streaming hierarchical video segmentation

Chenliang Xu; Caiming Xiong; Jason J. Corso

The use of video segmentation as an early processing step in video analysis lags behind the use of image segmentation for image analysis, despite many available video segmentation methods. A major reason for this lag is simply that videos are an order of magnitude bigger than images; yet most methods require all voxels in the video to be loaded into memory, which is clearly prohibitive for even medium length videos. We address this limitation by proposing an approximation framework for streaming hierarchical video segmentation motivated by data stream algorithms: each video frame is processed only once and does not change the segmentation of previous frames. We implement the graph-based hierarchical segmentation method within our streaming framework; our method is the first streaming hierarchical video segmentation method proposed. We perform thorough experimental analysis on a benchmark video data set and longer videos. Our results indicate the graph-based streaming hierarchical method outperforms other streaming video segmentation methods and performs nearly as well as the full-video hierarchical graph-based method.
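The streaming constraint described above — each frame is processed once and earlier segmentations are frozen — can be sketched in a few lines. This is a hypothetical nearest-mean matcher, not the paper's graph-based hierarchical method; the threshold `tau` and the per-segment running means are invented purely for illustration.

```python
# Toy sketch of the streaming constraint (hypothetical, not the paper's
# graph-based method): each frame is seen exactly once, and labels already
# assigned to earlier frames are never revisited.

def stream_segment(frames, tau=10.0):
    """frames: list of per-frame pixel-intensity lists; returns per-frame labels."""
    segments = {}        # label -> (running mean, count), carried across frames
    next_label = 0
    all_labels = []
    for frame in frames:
        labels = []
        for px in frame:
            # match the pixel to the closest existing segment, if close enough
            best = min(segments.items(),
                       key=lambda kv: abs(kv[1][0] - px),
                       default=None)
            if best is not None and abs(best[1][0] - px) <= tau:
                lab, (mean, n) = best
                segments[lab] = ((mean * n + px) / (n + 1), n + 1)
            else:
                lab = next_label
                next_label += 1
                segments[lab] = (px, 1)
            labels.append(lab)
        all_labels.append(labels)  # frozen: never changed by later frames
    return all_labels
```

Because segment statistics are carried forward but past labels are immutable, memory stays proportional to the number of segments rather than the number of voxels — the property the abstract identifies as essential for long videos.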


ACM Transactions on Graphics | 2009

From image parsing to painterly rendering

Kun Zeng; Mingtian Zhao; Caiming Xiong; Song-Chun Zhu

We present a semantics-driven approach for stroke-based painterly rendering, based on recent image parsing techniques [Tu et al. 2005; Tu and Zhu 2006] in computer vision. Image parsing integrates segmentation for regions, sketching for curves, and recognition for object categories. In an interactive manner, we decompose an input image into a hierarchy of its constituent components in a parse tree representation with occlusion relations among the nodes in the tree. To paint the image, we build a brush dictionary containing a large set (760) of brush examples of four shape/appearance categories, which are collected from professional artists, then we select appropriate brushes from the dictionary and place them on the canvas guided by the image semantics included in the parse tree, with each image component and layer painted in various styles. During this process, the scene and object categories also determine the color blending and shading strategies for inhomogeneous synthesis of image details. Compared with previous methods, this approach benefits from richer meaningful image semantic information, which leads to better simulation of painting techniques of artists using the high-quality brush dictionary. We have tested our approach on a large number (hundreds) of images and it produced satisfactory painterly effects.


Computer Vision and Pattern Recognition | 2017

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

Jiasen Lu; Caiming Xiong; Devi Parikh; Richard Socher

Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably from the language model alone, e.g., "sign" after "behind a red stop" or "phone" following "talking on a cell". In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state of the art by a significant margin.
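The sentinel mechanism can be sketched as one extra slot in the attention softmax: the sentinel's share of the softmax mass plays the role of "do not look at the image". The region and sentinel scores below are placeholders; in the paper they are derived from the decoder LSTM state, so treat this as an assumed, simplified shape of the computation.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def adaptive_context(region_feats, region_scores, sentinel_feat, sentinel_score):
    """Attention over image regions plus one sentinel slot.

    Returns the mixed context vector and beta, the weight on the sentinel
    (beta near 1 means "rely on the language model, not the image").
    """
    weights = softmax(region_scores + [sentinel_score])
    alpha, beta = weights[:-1], weights[-1]
    dim = len(sentinel_feat)
    ctx = [beta * sentinel_feat[d] +
           sum(a * f[d] for a, f in zip(alpha, region_feats))
           for d in range(dim)]
    return ctx, beta
```

For a non-visual word like "of", a high sentinel score drives beta toward 1 and the context is dominated by the sentinel; a low sentinel score returns attention to the image regions.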


Computer Vision and Pattern Recognition | 2015

Joint action recognition and pose estimation from video

Bruce Xiaohan Nie; Caiming Xiong; Song-Chun Zhu

Action recognition and pose estimation from video are closely related tasks for understanding human motion; most methods, however, learn separate models and combine them sequentially. In this paper, we propose a framework to integrate training and testing of the two tasks. A spatial-temporal And-Or graph model is introduced to represent action at three scales. Specifically, the action is decomposed into poses, which are further divided into mid-level ST-parts and then parts. The hierarchical structure of our model captures the geometric and appearance variations of pose at each frame, and lateral connections between ST-parts at adjacent frames capture the action-specific motion information. The model parameters for the three scales are learned discriminatively, and action labels and poses are efficiently inferred by dynamic programming. Experiments demonstrate that our approach achieves state-of-the-art accuracy in action recognition while also improving pose estimation.


Empirical Methods in Natural Language Processing | 2017

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Kazuma Hashimoto; Caiming Xiong; Yoshimasa Tsuruoka; Richard Socher

Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference with the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks spanning tagging, parsing, relatedness, and entailment.


Computer Vision and Pattern Recognition | 2014

Actionness Ranking with Lattice Conditional Ordinal Random Fields

Wei Chen; Caiming Xiong; Ran Xu; Jason J. Corso

Action analysis in image and video has been attracting more and more attention in computer vision. Recognizing specific actions in video clips has been the main focus. We move in a new, more general direction in this paper and ask the critical fundamental question: what is action, how is action different from motion, and in a given image or video where is the action? We study the philosophical and visual characteristics of action, which lead us to define actionness: intentional bodily movement of biological agents (people, animals). To solve the general problem, we propose the lattice conditional ordinal random field model that incorporates local evidence as well as neighboring order agreement. We implement the new model in the continuous domain and apply it to scoring actionness in both image and video datasets. Our experiments demonstrate not only that our new model can outperform the popular ranking SVM but also that indeed action is distinct from motion.


Knowledge Discovery and Data Mining | 2012

Random forests for metric learning with implicit pairwise position dependence

Caiming Xiong; David M. Johnson; Ran Xu; Jason J. Corso

Metric learning makes it possible to learn semantically meaningful distances for complex distributions of data using label or pairwise constraint information. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well. Those that learn multiple metrics throughout the feature space have demonstrated superior accuracy, but at a severe cost to computational efficiency. Here, we adopt a new angle on the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and incorporate both absolute pairwise position and standard relative position into the representation. We have implemented and tested our method against state-of-the-art global and multi-metric methods on a variety of data sets. Overall, the proposed method outperforms both types of methods in terms of accuracy (consistently ranked first) and is an order of magnitude faster than state-of-the-art multi-metric methods (16x faster in the worst case).
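The core idea — a distance function backed by an ensemble evaluated on a pairwise representation that keeps both relative and absolute position — can be sketched with hand-made decision stumps standing in for trained trees. The feature map and the voting rule below are illustrative assumptions, not the paper's learned forest.

```python
# Illustrative sketch (not the paper's implementation): the distance between
# x and y is the vote of an ensemble evaluated on a pairwise feature that
# keeps both relative position |x - y| and absolute position (x + y) / 2.
# The stumps are hand-made stand-ins for trained random-forest trees.

def pairwise_feature(x, y):
    rel = [abs(a - b) for a, b in zip(x, y)]    # relative position
    mid = [(a + b) / 2 for a, b in zip(x, y)]   # absolute position
    return rel + mid

def forest_distance(x, y, stumps):
    """stumps: list of (feature_index, threshold); each votes 'far' if the
    pairwise feature exceeds its threshold. Distance = fraction of far votes."""
    phi = pairwise_feature(x, y)
    votes = sum(1 for i, t in stumps if phi[i] > t)
    return votes / len(stumps)
```

Because the absolute-position half of the feature participates in every split, the same ensemble can behave like a different metric in different regions of the feature space, which is the adaptation the abstract describes.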


Multimedia Tools and Applications | 2009

Marker-less registration based on template tracking for augmented reality

Liang Lin; Yongtian Wang; Yue Liu; Caiming Xiong; Kun Zeng

Accurate 3D registration is a key issue in Augmented Reality (AR) applications, particularly where no markers are placed manually. In this paper, an efficient marker-less registration algorithm is presented for both outdoor and indoor AR systems. The algorithm first calculates the correspondences among frames using fixed-region tracking, and then estimates the motion parameters of the projective transformation following the homography of the tracked region. To achieve illumination-insensitive tracking, the illumination parameters are solved jointly with the motion parameters at each step. Based on the perspective motion parameters of the tracked region, the 3D registration (the camera's pose and position) can be calculated with calibrated intrinsic parameters. A marker-less AR system using this algorithm is described, and its architecture and workflow are also presented. Quantitative comparative experiments demonstrate the correctness of the theoretical analysis and the robustness of the registration algorithm.


Theriogenology | 2010

The role of brain-derived neurotrophic factor in mouse oocyte maturation in vitro involves activation of protein kinase B

Lin Zhang; Yuan Liang; Y. Liu; Caiming Xiong

Brain-derived neurotrophic factor (BDNF) can promote developmental competence in mammalian oocytes during in vitro maturation, but the signal transduction pathways are not clear. In this study, we investigated (using western blots) the effects of BDNF on the phosphorylation of protein kinase B (PKB) and mitogen-activated protein kinase (MAPK) in mouse oocytes and cumulus cells cultured in vitro. Treatment with BDNF enhanced phosphorylation of PKB in oocytes at 2h (P=0.0006) and 3h (P<0.0001) of in vitro maturation, compared with control oocytes. However, the pan-specific tyrosine kinase (Trk) inhibitor K252a together with BDNF completely inhibited phosphorylation of PKB in the oocytes. Furthermore, BDNF increased phosphorylation of MAPK in oocytes at 16h of in vitro maturation (P=0.0041), but K252a together with BDNF did not reduce phosphorylation of MAPK in the oocytes. For cumulus cells, BDNF significantly prolonged the phosphorylation of PKB and MAPK and increased the total amounts of PKB and MAPK proteins after 16h of in vitro maturation. However, BDNF did not affect apoptosis of the cumulus cells during oocyte maturation in vitro. In conclusion, the PKB pathway is likely to be one signaling cascade activated by BDNF in combination with the TrkB receptor, whereas the MAPK pathway is not involved. These findings may have relevance for BDNF-induced promotion of developmental capacity of in vitro-matured oocytes.


Remote Sensing | 2015

Accurate Annotation of Remote Sensing Images via Active Spectral Clustering with Little Expert Knowledge

Gui-Song Xia; Zifeng Wang; Caiming Xiong; Liangpei Zhang

It is a challenging problem to efficiently interpret the large volumes of remotely sensed image data being collected in the current age of remote sensing “big data”. Although human visual interpretation can yield accurate annotation of remote sensing images, it demands considerable expert knowledge and is always time-consuming, which strongly hinders its efficiency. Alternatively, intelligent approaches (e.g., supervised classification and unsupervised clustering) can speed up the annotation process through the application of advanced image analysis and data mining technologies. However, high-quality expert-annotated samples are still a prerequisite for intelligent approaches to achieve accurate results. Thus, how to efficiently annotate remote sensing images with little expert knowledge is an important and inevitable problem. To address this issue, this paper introduces a novel active clustering method for the annotation of high-resolution remote sensing images. More precisely, given a set of remote sensing images, we first build a graph based on these images and then gradually optimize the structure of the graph using a cut-collect process, which relies on a graph-based spectral clustering algorithm and pairwise constraints that are incrementally added via active learning. The pairwise constraints are simply similarity/dissimilarity relationships between the most uncertain pairwise nodes on the graph, which can be easily determined by non-expert human oracles. Furthermore, we also propose a strategy to adaptively update the number of classes in the clustering algorithm. In contrast with existing methods, our approach can achieve high accuracy in the task of remote sensing image annotation with relatively little expert knowledge, thereby greatly lightening the workload burden and reducing the requirements regarding expert knowledge. Experiments on several datasets of remote sensing images show that our algorithm achieves state-of-the-art performance in the annotation of remote sensing images and demonstrates high potential in many practical remote sensing applications.
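The active-querying loop described above can be caricatured on 1-D data: cluster by a threshold, treat the points nearest the decision boundary as the most uncertain, and let a non-expert oracle answer "same cluster as this anchor?" questions. Everything here (the threshold split, the anchors, the oracle) is a made-up miniature of the cut-collect process, not the paper's graph-based spectral algorithm.

```python
# Hypothetical miniature of active pairwise querying on 1-D data. A real
# system would use spectral clustering on a graph; here a plain threshold
# split stands in, so only the query-the-uncertain-pair idea is shown.

def active_assign(points, threshold, oracle, n_queries=1):
    """oracle(a, b) -> True if a and b belong to the same cluster."""
    labels = {p: int(p > threshold) for p in points}
    anchors = {0: min(points), 1: max(points)}   # confident representatives
    # points closest to the boundary are the most uncertain -> query them
    uncertain = sorted(points, key=lambda p: abs(p - threshold))[:n_queries]
    for p in uncertain:
        labels[p] = 0 if oracle(p, anchors[0]) else 1
    return labels
```

The point of the design is that each oracle answer is a cheap yes/no similarity judgment on one pair, which is exactly the kind of question a non-expert can answer reliably.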

Collaboration


Dive into Caiming Xiong's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Song-Chun Zhu

University of California


Wei Chen

University at Buffalo


Gang Chen

University at Buffalo


Ran Xu

University at Buffalo
