Publication


Featured research published by Yingbin Zheng.


European Conference on Computer Vision | 2012

Learning hybrid part filters for scene recognition

Yingbin Zheng; Yu-Gang Jiang; Xiangyang Xue

This paper introduces a new image representation for scene recognition, where an image is described based on the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models trained by latent SVM [1]. Since different objects may contain similar parts, we describe a method that uses a semantic hierarchy to automatically determine and merge filters shared by multiple objects. The merged hybrid filters are then applied to new images. Our proposed representation, called Hybrid-Parts, is generated by pooling the response maps of the hybrid filters. In contrast to previous scene recognition approaches that adopted object-level detections as feature inputs, we harness filter responses of object parts, which enable a richer and finer-grained representation. The use of the hybrid filters is important for obtaining a more compact representation, compared to directly using all the original part filters. Through extensive experiments on several scene recognition benchmarks, we demonstrate that Hybrid-Parts outperforms recent state-of-the-art methods, and combining it with standard low-level features such as the GIST descriptor can lead to further improvements.
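The pooling step described above can be sketched roughly as follows. This is only an illustrative assumption (max-pooling over a spatial grid); the paper's hybrid filters are learned with latent-SVM deformable part models, which is not reproduced here:

```python
import numpy as np

def pool_response_maps(response_maps, grid=2):
    """Max-pool each filter's response map over a grid x grid spatial
    partition and concatenate the results into one feature vector.
    `response_maps` has shape (n_filters, H, W)."""
    n_filters, H, W = response_maps.shape
    feats = []
    for f in range(n_filters):
        for i in range(grid):
            for j in range(grid):
                cell = response_maps[f,
                                     i * H // grid:(i + 1) * H // grid,
                                     j * W // grid:(j + 1) * W // grid]
                feats.append(cell.max())
    return np.array(feats)

# 5 hypothetical hybrid filters applied to a 32x32 response grid
maps = np.random.rand(5, 32, 32)
feat = pool_response_maps(maps, grid=2)
print(feat.shape)  # (20,) = 5 filters * 2 * 2 cells
```

The resulting vector can then be fed to a standard classifier, optionally concatenated with low-level features such as GIST.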


Pattern Recognition Letters | 2012

A simplified multi-class support vector machine with reduced dual optimization

Xisheng He; Zhe Wang; Cheng Jin; Yingbin Zheng; Xiangyang Xue

Support vector machine (SVM) was initially designed for binary classification. To extend SVM to the multi-class scenario, a number of classification models have been proposed, such as the one by Crammer and Singer (2001). However, the number of variables in Crammer and Singer's dual problem is the product of the number of samples (l) and the number of classes (k), which produces a large computational complexity. This paper presents a simplified multi-class SVM (SimMSVM) that reduces the size of the resulting dual problem from l × k to l by introducing a relaxed classification error bound. The experimental results demonstrate that the proposed SimMSVM approach can greatly speed up the training process while maintaining competitive classification accuracy.
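The size reduction stated in the abstract can be made concrete with a small sketch. Only the variable counts are shown; the relaxed error bound that enables the reduction is not reproduced here:

```python
def dual_problem_sizes(n_samples, n_classes):
    """Number of dual variables in Crammer-Singer's multi-class SVM
    (one variable per sample per class) versus the simplified SimMSVM
    (one variable per sample), per the abstract."""
    return n_samples * n_classes, n_samples

cs_vars, sim_vars = dual_problem_sizes(n_samples=10000, n_classes=20)
print(cs_vars, sim_vars)  # 200000 10000
```

For a 20-class problem the dual shrinks by a factor of k = 20, which is where the reported training speed-up comes from.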


ACM Multimedia | 2010

Semantic video indexing by fusing explicit and implicit context spaces

Yingbin Zheng; Renzhong Wei; Hong Lu; Xiangyang Xue

This paper addresses the problem of context-based concept fusion (CBCF) for concept detection and semantic video indexing. We introduce a novel framework based on constructing context spaces of concepts, such that the contextual correlations are used to improve the performance of concept detectors. Different from traditional CBCF approaches, we present two kinds of such context spaces: an explicit context space for modeling the correlation of pairwise concepts, and an implicit context space for representing latent themes trained from a set of concepts. The final concept detection scores are then directly fused from the explicit and implicit context spaces. Experiments are presented on the TRECVID 2006 benchmark, and comparisons with several state-of-the-art approaches demonstrate the effectiveness of the proposed framework.
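The final fusion step can be sketched as a simple combination of the three score sources. The linear form and the weights are assumptions for illustration only, not the paper's exact fusion rule:

```python
def fuse_context_scores(base, explicit, implicit, w=(0.5, 0.25, 0.25)):
    """Fuse a concept's baseline detection score with the scores
    refined in the explicit (pairwise-correlation) and implicit
    (latent-theme) context spaces, using hypothetical linear weights."""
    return w[0] * base + w[1] * explicit + w[2] * implicit

# e.g. baseline detector says 0.8, context spaces say 0.6 and 0.4
print(fuse_context_scores(0.8, 0.6, 0.4))  # 0.65
```

In practice such weights would be tuned per concept on a validation set.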


Asian Conference on Computer Vision | 2009

Incorporating spatial correlogram into bag-of-features model for scene categorization

Yingbin Zheng; Hong Lu; Cheng Jin; Xiangyang Xue

This paper presents a novel approach to representing the codebook vocabulary in the bag-of-features model for scene categorization. The traditional bag-of-features model describes an image as a histogram of the occurrence rates of the codebook vocabulary. In our approach, the spatial correlogram between codewords is incorporated to approximate local geometric information. This works by augmenting the traditional vocabulary histogram with the distance distribution of pairwise interest regions. We also combine this correlogram representation with spatial pyramid matching to describe both local and global geometric correspondences. Experimental results show that the correlogram representation can outperform the histogram scheme for the bag-of-features model, and the combination with spatial pyramid matching further improves categorization effectiveness.


Computer Vision and Pattern Recognition | 2009

Content and context-based multi-label image annotation

Hong Lu; Yingbin Zheng; Xiangyang Xue; Y. Zhang

In this paper, we propose a multi-label image annotation framework that incorporates the content and context information of images. Specifically, images are annotated at the regional scale, independently of the sizes of blocks. Confidences of content-based block and image annotation are then obtained. In addition, spatial features combining the block annotation confidence and the spatial context are proposed for the main concepts, corresponding to the concepts being annotated, and the auxiliary concepts, corresponding to the concepts that co-occur frequently with the main concepts in the images. The proposed spatial feature can incorporate the position of a concept and the spatial context between concepts. Experiments on an expanded set of Corel dataset categories demonstrate the effectiveness of the proposed method.


International Conference on Image Processing | 2010

How context helps: A discriminative codeword selection method for object detection

Renzhong Wei; Hong Lu; Yingbin Zheng; Lei Cen; Cheng Jin; Xiangyang Xue; Weiguo Wu

We first propose in this paper to localize objects in images based on models learned from weakly labeled images; this task is termed region of interest (ROI) detection. Local features such as SIFT or HOG are extracted, and discriminative words selected from the clustered codewords are used to model the objects. How to find these discriminative words is therefore important. Existing ROI detection methods consider information from the foreground objects by selecting the words that appear more often in images belonging to one specific class. Since information from the background/context is also helpful for object detection and classification, we propose to select discriminative words that appear more in the foreground/object and less in the background/context. The second task is to assign the class label (the object, in this setting) to a given image and to give the position of the object appearing in the image; this task is termed object detection. A common way to perform this task after ROI detection is to extract features from the detected regions rather than from the whole image. Since the discriminative words extracted during ROI detection have good discriminative ability, we propose to use these words for object detection. Experimental results on the PASCAL VOC 2006 dataset and a larger dataset containing 29 classes demonstrate the effectiveness of the proposed method.
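The selection criterion "more in the foreground, less in the background" can be sketched as a simple frequency ratio. This is a hypothetical scoring function for illustration; the paper's exact criterion may differ:

```python
def discriminative_score(fg_counts, bg_counts, eps=1e-6):
    """Score each codeword by how much more often it occurs in
    foreground/object regions than in background/context regions,
    using normalized frequency ratios."""
    fg_total = sum(fg_counts) + eps
    bg_total = sum(bg_counts) + eps
    return [(f / fg_total) / (b / bg_total + eps)
            for f, b in zip(fg_counts, bg_counts)]

# word 0 is frequent in the foreground, word 1 in the background
scores = discriminative_score([30, 5, 10], [2, 40, 10])
print(max(range(3), key=lambda i: scores[i]))  # 0
```

Words with the highest scores would then form the object model used for both ROI detection and the subsequent object detection step.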


Conference on Image and Video Retrieval | 2008

Fudan University: hierarchical video retrieval with adaptive multi-modal fusion

Zichen Sun; Yuanzheng Song; Yingbin Zheng; Hui Yu; Cheng Jin; Hong Lu; Xiangyang Xue

This paper describes Fudan University's interactive video retrieval system. The system uses an adaptive multi-modal fusion method and enables the user to browse the results hierarchically across different levels of temporal granularity.


ACM Multimedia | 2011

Refining local descriptors by embedding semantic information for visual categorization

Yingbin Zheng; Renzhong Wei; Hong Lu; Xiangyang Xue

Local descriptor extraction and vector quantization are important components of the widely used Bag-of-Features (BoF) model for visual categorization. This paper proposes a simple and efficient approach to refining the local descriptors for vector quantization by embedding semantic information. The original local descriptors are integrated through a sequence of category-independent and category-dependent bases. In particular, the category-dependent basis is learned by minimizing the joint loss over local descriptors from different categories with a shared regularization penalty, which can be formulated as a linear programming problem. The transformed descriptors are further quantized and aggregated into the visual vocabulary. Experiments are performed on the PASCAL VOC 2007 benchmark, and quantitative comparisons with several state-of-the-art approaches demonstrate the effectiveness of our proposed approach.


Archive | 2013

An Adaptive and Link-Based Method for Video Scene Clustering and Visualization

Hong Lu; Kai Chen; Yingbin Zheng; Zhuohong Cai; Xiangyang Xue

In this paper we propose to adaptively cluster video shots into scenes in a PageRank manner and to visualize video content based on the clustered scenes. The clustering method has been compared with state-of-the-art methods, and experimental results demonstrate its effectiveness. For visualization, the importance of the shots in each scene can be obtained and incorporated into the visualization parameters. The visualization results of the test videos are shown at both global and detailed levels.
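The PageRank-style shot importance can be sketched as power iteration on a shot-similarity graph. The similarity matrix, damping factor, and iteration count below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def shot_importance(similarity, damping=0.85, iters=50):
    """Rank shots by PageRank-style power iteration on a
    shot-similarity graph; higher-ranked shots can be emphasized
    in the scene visualization."""
    n = similarity.shape[0]
    # Row-normalize similarities into transition probabilities.
    P = similarity / similarity.sum(axis=1, keepdims=True)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (P.T @ r)
    return r

# Toy 3-shot example: shots 0 and 1 are mutually similar
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
print(shot_importance(sim))
```

Because the transition matrix is row-stochastic, the ranks always sum to 1, so they can be used directly as visualization weights.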


ACM Multimedia | 2012

A fast video event recognition system and its application to video search

Yu-Gang Jiang; Qi Dai; Yingbin Zheng; Xiangyang Xue; Jie Liu; Dong Wang

Techniques for recognizing complex events in diverse Internet videos are important in many applications. State-of-the-art video event recognition approaches normally involve modules that demand extensive computation, which prevents their application to large-scale problems. In this demonstration, we present a fast video event recognition system, which requires just a few seconds to process a general YouTube video of a few minutes' duration. The development of this system is grounded on several important findings from a large set of empirical studies, in which we systematically evaluated many technical options for each critical module of a present-day video event recognition framework. Pooling the insights gained from this study leads to a sped-up event recognition system that is 220 times faster than a decent baseline while still maintaining a high degree of recognition accuracy. We also demonstrate the technical feasibility of using event recognition results as the sole clue for video search, where the similarity of videos is determined based on the consistency of the event recognition confidence scores. We showcase this capability using an Internet video dataset containing about 10 thousand YouTube videos. Very promising results were observed.
