Publication


Featured research published by Yuncheng Li.


International Conference on Internet Multimedia Computing and Service | 2015

Using user generated online photos to estimate and monitor air pollution in major cities

Yuncheng Li; Jifei Huang; Jiebo Luo

With the rapid development of the economy in China over the past decade, air pollution has become an increasingly serious problem in major cities and has caused grave public health concerns. Recently, a number of studies have dealt with air quality and air pollution. Among them, some attempt to predict and monitor air quality from different sources of information, ranging from deployed physical sensors to social media. These methods are either too expensive or unreliable, prompting us to search for a novel and effective way to sense air quality. In this study, we propose to employ state-of-the-art computer vision techniques to analyze photos that can be easily acquired from online social media. Next, we establish the correlation between the haze level computed directly from photos and the official PM 2.5 record of the city where, and the time when, each photo was taken. Our experiments based on both synthetic and real photos have shown the promise of this image-based approach to estimating and monitoring air pollution.
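For illustration, below is a minimal haze-scoring sketch in Python. It assumes a dark channel prior style estimator, a common choice for single-image haze assessment; the paper itself only states that it uses state-of-the-art computer vision techniques, so its exact features may differ. Per-photo scores like this could then be correlated with official PM 2.5 readings matched by city and timestamp.

```python
# Hypothetical sketch: estimate a relative haze score for one photo using the
# dark channel prior; not necessarily the features used in the paper.
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over RGB channels, followed by a local minimum filter."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    out = np.empty_like(min_rgb)
    for i in range(min_rgb.shape[0]):
        for j in range(min_rgb.shape[1]):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def haze_score(img_uint8):
    """Return a score roughly in [0, 1]; larger values indicate a hazier photo."""
    img = img_uint8.astype(np.float64) / 255.0
    dc = dark_channel(img)
    airlight = np.percentile(img.reshape(-1, 3), 99.9, axis=0).max()
    transmission = 1.0 - dc / max(airlight, 1e-6)
    return float(1.0 - transmission.mean())
```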


Computer Vision and Pattern Recognition | 2016

TGIF: A New Dataset and Benchmark on Animated GIF Description

Yuncheng Li; Yale Song; Liangliang Cao; Joel R. Tetreault; Larry Goldberg; Alejandro Jaimes; Jiebo Luo

With the recent popularity of animated GIFs on social media, there is a need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high-quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd workers. We show that there is an unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned on our animated GIF description dataset can be helpful for automatic movie description.
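As a concrete reference for the simplest of the three baselines, here is a hedged nearest-neighbor sketch in Python: a test GIF is described with the caption of its most similar training GIF in some feature space. The feature extraction step is assumed and not specified here.

```python
# Minimal nearest-neighbor captioning sketch: reuse the caption of the closest
# training GIF by cosine similarity. Feature extraction is assumed upstream.
import numpy as np

def nn_caption(test_feat, train_feats, train_captions):
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    query = test_feat / np.linalg.norm(test_feat)
    idx = int(np.argmax(train @ query))
    return train_captions[idx]
```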


International Conference on Multimodal Interfaces | 2013

A Markov logic framework for recognizing complex events from multimodal data

Young Chol Song; Henry A. Kautz; James F. Allen; Mary D. Swift; Yuncheng Li; Jiebo Luo; Ce Zhang

We present a general framework for complex event recognition that is well suited for integrating information that varies widely in detail and granularity. Consider the scenario of an agent in an instrumented space performing a complex task while describing what he is doing in a natural manner. The system takes in a variety of information, including objects and gestures recognized from RGB-D data and descriptions of events extracted from recognized and parsed speech. The system outputs a complete reconstruction of the agent's plan, explaining actions in terms of more complex activities and filling in unobserved but necessary events. We show how to use Markov Logic (a probabilistic extension of first-order logic) to create a model in which observations can be partial, noisy, and refer to future or temporally ambiguous events; complex events are composed from simpler events in a manner that exposes their structure for inference and learning; and uncertainty is handled in a sound probabilistic manner. We demonstrate the effectiveness of the approach for tracking kitchen activities in the presence of noisy and incomplete observations.
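For context, the standard Markov Logic Network distribution is shown below in its general textbook form (this is background, not a formula quoted from the paper): each first-order formula F_i carries a weight w_i, and a possible world is scored by how many of its groundings are satisfied.

```latex
% General Markov Logic Network distribution: n_i(x) is the number of true
% groundings of formula F_i in world x, and Z normalizes over all worlds.
P(X = x) = \frac{1}{Z}\exp\!\Big(\sum_i w_i \, n_i(x)\Big),
\qquad
Z = \sum_{x'} \exp\!\Big(\sum_i w_i \, n_i(x')\Big)
```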


IEEE Transactions on Multimedia | 2017

Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data

Yuncheng Li; Liangliang Cao; Jiang Zhu; Jiebo Luo

Composing fashion outfits involves a deep understanding of fashion standards while incorporating creativity in choosing multiple fashion items (e.g., jewelry, bag, pants, dress). On fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on their appearance and metadata. We propose to leverage outfit popularity on fashion-oriented websites to supervise the scoring component. The scoring component is a multimodal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. In order to train and evaluate the proposed composition system, we have collected a large-scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although fashion outfit scoring and composition are rather challenging, we have achieved an AUC of 85% for the scoring component and an accuracy of 77% for a constrained composition task.
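A minimal sketch of the set-scoring idea in PyTorch follows: items are encoded individually, pooled into a permutation-invariant set representation, and mapped to one compatibility score. The layer sizes, pooling choice, and overall structure are assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical outfit scorer: encode each item, pool the set, output one score.
import torch
import torch.nn as nn

class OutfitScorer(nn.Module):
    def __init__(self, item_dim=512, hidden=256):
        super().__init__()
        self.item_encoder = nn.Sequential(nn.Linear(item_dim, hidden), nn.ReLU())
        self.set_scorer = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 1))

    def forward(self, items):            # items: (num_items, item_dim) embeddings
        encoded = self.item_encoder(items)
        pooled = encoded.mean(dim=0)     # permutation-invariant set pooling
        return self.set_scorer(pooled)   # scalar compatibility score

# Usage: score = OutfitScorer()(torch.randn(4, 512))
```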


ACM Multimedia | 2015

Pinterest Board Recommendation for Twitter Users

Xitong Yang; Yuncheng Li; Jiebo Luo

A pinboard on Pinterest is an emerging medium for engaging online social media users, on which users post online images about specific topics. Despite its significance, there is little previous work that specifically facilitates information discovery based on pinboards. This paper proposes a novel pinboard recommendation system for Twitter users. In order to associate content from the two social media platforms, we propose to use multi-label classification to map Twitter user followees to pinboard topics, and visual diversification to recommend pinboards given a user's topics of interest. A preliminary experiment on a dataset with 2,000 users validated the proposed system.
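A hedged sketch of the multi-label mapping step, using a generic one-vs-rest classifier from scikit-learn; the features, topic labels, and classifier choice here are placeholders rather than the paper's setup.

```python
# Illustrative multi-label mapping from followee features to pinboard topics.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MultiLabelBinarizer

rng = np.random.default_rng(0)
X = rng.random((200, 50))                                 # placeholder followee features
labels = [["travel"] if x[0] > 0.5 else ["food", "diy"] for x in X]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                             # users x topics indicator matrix

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
topic_scores = clf.predict_proba(X[:1])                   # ranked topics -> candidate boards
```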


International Conference on Pattern Recognition | 2016

Learning effective Gait features using LSTM

Yang Feng; Yuncheng Li; Jiebo Luo

Human gait is an important biometric feature for person identification in surveillance videos because it can be collected at a distance without subject cooperation. Most existing gait recognition methods are based on the Gait Energy Image (GEI). Although the spatial information in one gait sequence can be well represented by GEI, the temporal information is lost. To solve this problem, we propose a new feature learning method for gait recognition. Not only can the learned feature preserve temporal information in a gait sequence, but it can also be applied to cross-view gait recognition. Heatmaps extracted by a convolutional neural network (CNN) based pose estimation method are used to describe the gait information in one frame. To model a gait sequence, an LSTM recurrent neural network is naturally adopted. Our LSTM model can be trained with unlabeled data, where the identity of the subject in a gait sequence is unknown. When labeled data are available, our LSTM works as a frame-to-frame view transformation model (VTM). Experiments on a gait benchmark demonstrate the efficacy of our method.
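A minimal PyTorch sketch of encoding a sequence of pose heatmaps with an LSTM is shown below; the heatmap resolution, hidden size, and use of the last hidden state as the gait feature are assumptions for illustration.

```python
# Hypothetical gait-feature encoder: an LSTM over flattened per-frame heatmaps.
import torch
import torch.nn as nn

class GaitLSTM(nn.Module):
    def __init__(self, heatmap_dim=64 * 64, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(heatmap_dim, hidden, batch_first=True)

    def forward(self, heatmaps):         # (batch, frames, H*W) flattened pose heatmaps
        outputs, _ = self.lstm(heatmaps)
        return outputs[:, -1]            # last hidden state as the sequence-level feature

# feature = GaitLSTM()(torch.randn(2, 30, 64 * 64))   # 2 sequences of 30 frames
```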


IEEE Transactions on Image Processing | 2017

Adaptive Greedy Dictionary Selection for Web Media Summarization

Yang Cong; Ji Liu; Gan Sun; Quanzeng You; Yuncheng Li; Jiebo Luo

Initializing an effective dictionary is an indispensable step for sparse representation. In this paper, we focus on the dictionary selection problem, with the objective of selecting a compact subset of bases from the original training data instead of learning a new dictionary matrix as dictionary learning models do. We first design a new dictionary selection model via the l2,0 norm. For model optimization, we propose two methods: one is the standard forward-backward greedy algorithm, which is not suitable for large-scale problems; the other is based on the gradient cues at each forward iteration and speeds up the process dramatically. In comparison with state-of-the-art dictionary selection models, our model is not only more effective and efficient but can also control the sparsity. To evaluate the performance of our new model, we select two practical web media summarization problems: 1) we build a new dataset consisting of around 500 users, 3,000 albums, and 1 million images, and achieve effective assisted albuming based on our model; and 2) by formulating the video summarization problem as a dictionary selection issue, we employ our model to extract keyframes from a video sequence in a more flexible way. Generally, our model outperforms the state-of-the-art methods on both tasks.
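To make the greedy idea concrete, here is a toy forward-selection sketch in Python that repeatedly adds the training sample (atom) giving the largest drop in least-squares reconstruction error. It illustrates the general forward greedy scheme only; it is not the paper's l2,0 model or its accelerated gradient-based variant.

```python
# Toy forward greedy dictionary selection over the columns of a data matrix.
import numpy as np

def greedy_select(X, k):
    """X: (dim, n) data matrix; return indices of k selected columns (atoms)."""
    selected = []
    for _ in range(k):
        best_idx, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            D = X[:, selected + [j]]                     # candidate dictionary
            coef, *_ = np.linalg.lstsq(D, X, rcond=None) # reconstruct all data
            err = np.linalg.norm(X - D @ coef)
            if err < best_err:
                best_idx, best_err = j, err
        selected.append(best_idx)
    return selected
```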


Pattern Recognition | 2017

Multi-modal deep feature learning for RGB-D object detection

Xiangyang Xu; Yuncheng Li; Gangshan Wu; Jiebo Luo

We present an approach for RGB-D object detection that exploits both modality-correlated and modality-specific relationships between RGB and depth images. A shared weights strategy and a parameter-free correlation layer are introduced to extract the modality-correlated representations, and the proposed approach can simultaneously generate RGB-D region proposals and perform region-wise RGB-D object recognition. Specifically, we present a novel multi-modal deep feature learning architecture for RGB-D object detection. The current paradigm for object detection typically consists of two stages: objectness estimation and region-wise object recognition. Most existing RGB-D object detection approaches treat the two stages separately by extracting RGB and depth features individually, thus ignoring the correlated relationship between the two modalities. In contrast, our proposed method is designed to take full advantage of both depth and color cues by exploiting both modality-correlated and modality-specific features and jointly performing RGB-D objectness estimation and region-wise object recognition. The shared weights strategy and the parameter-free correlation layer are exploited to carry out RGB-D-correlated objectness estimation and region-wise recognition in conjunction with RGB-specific and depth-specific procedures. The parameters of these three networks are simultaneously optimized via end-to-end multi-task learning. The multi-modal RGB-D objectness estimation results and RGB-D object recognition results are both boosted by a late-fusion ensemble. To validate the effectiveness of the proposed approach, we conduct extensive experiments on two challenging RGB-D benchmark datasets, NYU Depth v2 and SUN RGB-D. The experimental results show that, by introducing the modality-correlated feature representation, the proposed multi-modal RGB-D object detection approach is substantially superior to state-of-the-art competitors. Moreover, compared to the expensive deep architecture (VGG16) preferred by state-of-the-art methods, our approach, built upon the more lightweight AlexNet architecture, performs slightly better.
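As a rough illustration of what a parameter-free correlation layer can look like, the sketch below forms modality-correlated features by an element-wise interaction of weight-shared RGB and depth feature maps; the specific operation is an assumption, since the paper defines its own layer.

```python
# Hypothetical parameter-free correlation layer over two modality feature maps.
import torch

def correlation_layer(rgb_feat, depth_feat):
    """rgb_feat, depth_feat: (batch, channels, H, W) from weight-shared backbones."""
    return rgb_feat * depth_feat         # element-wise interaction, no learnable weights

# correlated = correlation_layer(torch.randn(1, 256, 14, 14), torch.randn(1, 256, 14, 14))
```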


International Conference on Image Processing | 2013

Task-relevant object detection and tracking

Yuncheng Li; Jiebo Luo

Within the context of the Learning from Narrated Demonstration framework, a key vision component is to detect the task-relevant object for further processing. In this paper, we take advantage of the fact that the task-relevant object is often connected to the supervisor's hand, and recast the problem as handheld object detection and tracking. Achieving robust handheld object detection and tracking has its own challenges, including arbitrary object appearance, viewpoint, and non-rigid deformation. We propose a robust vision system that integrates speech information to perform handheld object detection via a CRF, together with MeanShift-based tracking. Extensive evaluation on five sets of data has demonstrated the validity and robustness of the proposed system.
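The tracking component can be illustrated with OpenCV's standard MeanShift API, as in the sketch below; the video path, the initial box (which in the paper would come from the detection stage), and the hue-histogram back-projection are placeholders for illustration.

```python
# Minimal MeanShift tracking sketch with OpenCV; detection/initialization is assumed.
import cv2

cap = cv2.VideoCapture("demo.mp4")                  # placeholder video path
ok, frame = cap.read()
x, y, w, h = 200, 150, 80, 80                       # placeholder detected object box
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv2.meanShift(back_proj, track_window, term_crit)
```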


International Conference on Big Data | 2015

User-curated image collections: Modeling and recommendation

Yuncheng Li; Tao Mei; Yang Cong; Jiebo Luo

Most state-of-the-art image retrieval and recommendation systems predominantly focus on individual images. In contrast, socially curated image collections, which condense distinctive yet coherent images into one set, are largely overlooked by the research communities. In this paper, we aim to design a novel recommendation system that can provide users with image collections relevant to their personal preferences and interests. To this end, two key issues need to be addressed, i.e., image collection modeling and similarity measurement. For image collection modeling, we consider each image collection as a whole in a group sparse reconstruction framework and extract concise collection descriptors given the pretrained dictionaries. We then consider image collection recommendation as a dynamic similarity measurement problem in response to a user's clicked image set, and employ a metric learner to measure the similarity between the image collection and the clicked image set. As there is no previous work directly comparable to this study, we implement several competitive baselines and related methods for comparison. The evaluations on a large-scale Pinterest dataset have validated the effectiveness of our proposed methods for modeling and recommending image collections.
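For the similarity measurement step, a hedged sketch is given below: once a collection descriptor and a clicked-set descriptor are available, a learned Mahalanobis-style metric can score their similarity. The descriptor extraction (group sparse coding) and the metric learning procedure itself are omitted and assumed here.

```python
# Illustrative similarity between a collection descriptor and a clicked-set
# descriptor under an assumed, already-learned Mahalanobis-style metric M.
import numpy as np

def mahalanobis_similarity(collection_desc, clicked_desc, M):
    """M: positive semi-definite matrix produced by the metric learner."""
    diff = collection_desc - clicked_desc
    return float(-diff @ M @ diff)       # larger = more similar

# With an identity metric this reduces to negative squared Euclidean distance:
# mahalanobis_similarity(np.ones(128), np.zeros(128), np.eye(128))
```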

Collaboration


Dive into Yuncheng Li's collaborations.

Top Co-Authors

Jiebo Luo

University of Rochester

Yu Wang

University of Rochester

Xitong Yang

University of Rochester
