
Publication


Featured research published by Hongxun Yao.


Computer Vision and Pattern Recognition | 2016

Hedged Deep Tracking

Yuankai Qi; Shengping Zhang; Lei Qin; Hongxun Yao; Qingming Huang; Jongwoo Lim; Ming-Hsuan Yang

In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, since features from a given CNN layer characterize an object of interest from only one aspect or one level, the performance of trackers trained with features from a single layer (usually the second-to-last layer) can be further improved. In this paper, we propose a novel CNN-based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to combine several CNN-based trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-the-art trackers.
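The combination step can be pictured with the standard multiplicative-weights (Hedge) update, in which each per-layer tracker is treated as an expert and experts with smaller recent loss gain relative weight. This is a minimal sketch of the generic Hedge update, not the paper's adaptive variant; the losses and learning rate `eta` are illustrative.

```python
import math

def hedge_update(weights, losses, eta=1.0):
    """One round of the multiplicative-weights (Hedge) update: each
    expert's weight is scaled by exp(-eta * loss), then the weights
    are renormalized to sum to one."""
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Three hypothetical per-layer trackers start with equal weight; the
# second incurs the smallest loss this round, so its weight grows.
weights = hedge_update([1 / 3, 1 / 3, 1 / 3], [0.9, 0.1, 0.5])
```

Running the update repeatedly concentrates weight on the trackers whose layers have been most reliable on recent frames.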


IEEE Transactions on Image Processing | 2012

Task-Dependent Visual-Codebook Compression

Rongrong Ji; Hongxun Yao; Wei Liu; Xiaoshuai Sun; Qi Tian

A visual codebook serves as a fundamental component in many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images. Each image is then represented as a high-dimensional bag-of-words histogram. Such a highly redundant image description is inefficient in both storage and retrieval, since only a few bins are nonzero and they are sparsely distributed. Furthermore, most existing codebooks are built solely from the visual statistics of local descriptors, without considering the supervision labels coming from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to handle the above two problems. First, we propose to learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we propose to adapt our codebook compression to the subsequent recognition or classification tasks. This is achieved by introducing a label constraint kernel (LCK) into our compression loss function. In particular, our LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression in three computer vision tasks: 1) object recognition on PASCAL Visual Object Classes 07; 2) near-duplicate image retrieval on UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook has shown superior performance over several state-of-the-art supervised and unsupervised codebooks.
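The sparsity mechanism behind Lasso-style compression can be illustrated with soft thresholding, the proximal operator of the L1 penalty, which drives small coefficients exactly to zero. This toy sketch shows only the shrinkage effect, not the paper's full compression function; the histogram values and threshold are made up.

```python
def soft_threshold(coeffs, lam):
    """Proximal operator of the L1 penalty: shrink every coefficient
    toward zero by lam, zeroing out those with magnitude below lam."""
    out = []
    for c in coeffs:
        if c > lam:
            out.append(c - lam)
        elif c < -lam:
            out.append(c + lam)
        else:
            out.append(0.0)
    return out

# A toy 8-bin bag-of-words histogram: L1 shrinkage keeps only the few
# strong codewords, mimicking how Lasso prunes a redundant codebook.
hist = [0.02, 0.40, 0.01, 0.00, 0.35, 0.03, 0.15, 0.04]
compressed = soft_threshold(hist, 0.05)
```

Only three of the eight bins survive, which is the storage and retrieval saving the abstract alludes to.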


Applied Mathematics and Computation | 2007

An image fragile watermark scheme based on chaotic image pattern and pixel-pairs

Shaohui Liu; Hongxun Yao; Wen Gao; Yongliang Liu

Fragile watermarking techniques for digital content have been studied in the past few years. Fragile watermarks are used to determine whether a piece of watermarked digital content has been tampered with, and to distinguish tampered areas from non-tampered areas without referring to the original content. In this paper, a general framework for fragile watermarking is proposed, and then a novel fragile watermarking scheme for image authentication is presented. The embedding process starts by computing the difference image between the host image and its chaotic pattern, followed by mapping the difference image into a binary image. The binary image is then inserted into the least-significant-bit (LSB) bitplane of the host image. In addition, a chaotic map is used to generate the chaotic pattern image, which can serve as a secret key to improve the security of the watermarking algorithm. By employing a permutation transform and the chaotic image pattern, the positional correspondence between pixels in the watermarked image and the watermark is broken. Simulation results and performance analysis show that the presented method is fast, secure, and capable of detecting and localizing modifications.
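The role of the chaotic pattern can be sketched with a logistic map driving LSB embedding on a one-dimensional pixel row. This simplified sketch skips the paper's difference-image and permutation steps: a secret seed generates a key-dependent bit pattern, embedding writes it into the LSBs, and verification flags any pixel whose LSB no longer matches.

```python
def logistic_sequence(x0, n, r=3.99):
    """Iterate the logistic map x <- r*x*(1-x); the seed x0 acts as
    the secret key for the watermark pattern."""
    seq, x = [], x0
    for _ in range(n):
        x = r * x * (1 - x)
        seq.append(x)
    return seq

def embed(pixels, key=0.37):
    """Write a key-dependent chaotic bit into each pixel's LSB."""
    bits = [1 if v > 0.5 else 0 for v in logistic_sequence(key, len(pixels))]
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def detect_tampering(pixels, key=0.37):
    """Return the indices whose LSB disagrees with the expected chaotic
    bit, i.e., the localized tampered positions."""
    bits = [1 if v > 0.5 else 0 for v in logistic_sequence(key, len(pixels))]
    return [i for i, (p, b) in enumerate(zip(pixels, bits)) if (p & 1) != b]

row = [120, 64, 200, 33, 90, 181, 77, 250]
marked = embed(row)
marked[3] ^= 1            # simulate tampering with one pixel
tampered = detect_tampering(marked)
```

Because the pattern depends on the seed, an attacker without the key cannot recompute the expected bits, which is the security argument behind using the chaotic map as a secret key.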


IEEE Transactions on Multimedia | 2013

Learning to Distribute Vocabulary Indexing for Scalable Visual Search

Rongrong Ji; Ling-Yu Duan; Jie Chen; Lexing Xie; Hongxun Yao; Wen Gao

In recent years, there has been an ever-increasing research focus on the Bag-of-Words based near-duplicate visual search paradigm with inverted indexing. One fundamental yet unexplored challenge is how to maintain the large indexing structures within a single server subject to its memory constraint, which makes it extremely hard to scale up to millions or even billions of images. In this paper, we propose to parallelize the near-duplicate visual search architecture to index millions of images over multiple servers, including the distribution of both the visual vocabulary and the corresponding indexing structure. We optimize the distribution of vocabulary indexing from a machine learning perspective, which provides a "memory-light" search paradigm that leverages the computational power across multiple servers to reduce search latency. In particular, our solution addresses two essential issues: "what to distribute" and "how to distribute". "What to distribute" is addressed by a "lossy" vocabulary Boosting, which discards both frequent and indiscriminative words prior to distribution. "How to distribute" is addressed by learning an optimal distribution function, which maximizes the uniformity of assigning the words of a given query to multiple servers. We validate the distributed vocabulary indexing scheme in a real-world location search system over 10 million landmark images. Compared to the state-of-the-art alternatives of single-server search [5], [6], [16] and distributed search [23], our scheme yields a significant gain of about 200% speedup at comparable precision by distributing only 5% of the words. We also report excellent robustness even when some of the servers crash.
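The "how to distribute" question can be pictured with a much simpler stand-in for the learned distribution function: a greedy heuristic that assigns each vocabulary word, heaviest posting list first, to the currently lightest-loaded server. The word frequencies and server count below are invented for illustration.

```python
import heapq

def distribute_words(word_freq, n_servers):
    """Greedy load balancing: pop the lightest server from a min-heap,
    give it the next-heaviest word, and push it back with its new load,
    so inverted-index memory is spread evenly across machines."""
    heap = [(0, s) for s in range(n_servers)]   # (load, server id)
    heapq.heapify(heap)
    assignment = {}
    for word, freq in sorted(word_freq.items(), key=lambda kv: -kv[1]):
        load, server = heapq.heappop(heap)
        assignment[word] = server
        heapq.heappush(heap, (load + freq, server))
    return assignment

# Hypothetical posting-list sizes for four words over two servers.
assignment = distribute_words({"w1": 4, "w2": 3, "w3": 2, "w4": 1}, 2)
```

The paper's learned function additionally optimizes for per-query uniformity, so that the words of a single query fan out across servers rather than piling onto one.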


ACM Multimedia | 2007

Trajectory based event tactics analysis in broadcast sports video

Guangyu Zhu; Qingming Huang; Changsheng Xu; Yong Rui; Shuqiang Jiang; Wen Gao; Hongxun Yao

Most existing approaches to event detection in sports video are oriented toward a general audience: the extracted events are presented to viewers without further analysis. However, professionals, such as soccer coaches, are more interested in the tactics used in the events. In this paper, we present a novel approach to extract tactic information from goal events in broadcast soccer video and present each goal event in a tactic mode to coaches and sports professionals. We first extract goal events with far-view shots based on the analysis and alignment of web-casting text and broadcast video. For a detected goal event, we employ a multi-object detection and tracking algorithm to obtain the player and ball trajectories in the shot. Compared with existing work, we propose an effective tactic representation called the aggregate trajectory, which is constructed from multiple trajectories using a novel analysis of the temporal-spatial interaction among the players and the ball. The interactive relationship with play region information and hypothesis testing for trajectory temporal-spatial distribution are exploited to analyze the tactic patterns in a hierarchical coarse-to-fine framework. The experimental results on data from the 2006 FIFA World Cup are promising and demonstrate that our approach is effective.


ACM Multimedia | 2009

Mining city landmarks from blogs by graph modeling

Rongrong Ji; Xing Xie; Hongxun Yao; Wei-Ying Ma

Recent years have witnessed great prosperity in community-contributed multimedia. Discovering, extracting, and summarizing knowledge from these data enables us to make better sense of the world. In this paper, we report our work on mining famous city landmarks from blogs for personalized tourist suggestions. Our main contribution is a graph modeling framework that discovers city landmarks by mining blog photo correlations with community supervision. This modeling fuses context, content, and community information in a style that simulates both static (PageRank) and dynamic (HITS) ranking models to highlight representative data from the consensus of blog users. As a preliminary step, we identify the geographical locations of page contents to harvest city sight photos from Web blogs, based on which we structure these photos into a Scene-View hierarchy within each city. Our graph modeling consists of two phases. First, within a given scene, we present a PhotoRank algorithm to discover its representative views; it adapts PageRank to model context and content photo correlations for graph-based popularity propagation. Second, among the scenes within each city, we present a Landmark-HITS model to discover city landmarks, which integrates author correlations to infer scene popularity in a semi-supervised reinforcement manner. On top of this graph modeling, we further provide personalized tourist suggestions through collaborative filtering of tourism logs and author correlations. Based on a real-world dataset from Windows Live Spaces blogs containing nearly 400,000 sight photos, we have deployed our framework in a VisualTourism system, with comparisons to state-of-the-art methods. We also investigate how city popularity, user location (e.g., Asian or European blog users), and sequential events (e.g., the Olympic Games) influence our landmark discovery results and the tourist suggestion tendencies.
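The PageRank-style popularity propagation behind PhotoRank can be sketched with plain power iteration over a small directed photo graph. The graph below is invented for illustration; the paper's actual edge weights come from context and content correlations between blog photos.

```python
def pagerank(out_links, d=0.85, iters=50):
    """Power-iteration PageRank: each node repeatedly shares its score
    equally among its out-links, damped by factor d, until the score
    vector stabilizes."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [
            (1 - d) / n
            + d * sum(rank[i] / len(out_links[i])
                      for i in range(n) if j in out_links[i])
            for j in range(n)
        ]
    return rank

# Photo 0 is linked by every other photo, so it should emerge as the
# most "representative" view of the scene.
out_links = [[1], [0], [0], [0, 1]]
scores = pagerank(out_links)
```

The photo accumulating the most propagated popularity becomes the representative view of its scene, which is the role PhotoRank plays inside a scene.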


International Conference on Image Processing | 2008

Dynamic background modeling and subtraction using spatio-temporal local binary patterns

Shengping Zhang; Hongxun Yao; Shaohui Liu

Traditional background modeling and subtraction methods rest on the strong assumption that scenes have static structure with limited perturbation, so they perform poorly in dynamic scenes. In this paper, we present a solution to this problem. We first extend local binary patterns from the spatial domain to the spatio-temporal domain and present a new online dynamic texture extraction operator, named spatio-temporal local binary patterns (STLBP). We then present a novel and effective method for dynamic background modeling and subtraction using STLBP. In the proposed method, each pixel is modeled as a group of STLBP dynamic texture histograms that combine spatial texture and temporal motion information. Experimental results show that, compared with traditional methods, the proposed method adapts quickly to changes in the dynamic background. It achieves accurate detection of moving objects and suppresses most of the false detections caused by dynamic changes in natural scenes.
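The operator's core idea, thresholding spatial and temporal neighbours against the centre pixel and packing the comparisons into a binary code, can be sketched on tiny 3x3 grayscale frames. The exact neighbourhood and bit layout here are illustrative, not the paper's precise STLBP definition.

```python
def stlbp_code(prev, cur, nxt, y, x):
    """Threshold the 8 spatial neighbours in the current frame plus the
    co-located pixels in the previous and next frames against the centre
    pixel, packing the comparison results into a 10-bit code."""
    c = cur[y][x]
    neighbours = [
        cur[y-1][x-1], cur[y-1][x], cur[y-1][x+1], cur[y][x+1],
        cur[y+1][x+1], cur[y+1][x], cur[y+1][x-1], cur[y][x-1],  # spatial
        prev[y][x], nxt[y][x],                                   # temporal
    ]
    code = 0
    for bit, v in enumerate(neighbours):
        if v >= c:
            code |= 1 << bit
    return code

# Three toy 3x3 frames: a flat current frame, a previous frame that is
# darker at the centre, and a next frame that is brighter there.
prev = [[5, 5, 5], [5, 4, 5], [5, 5, 5]]
cur  = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
nxt  = [[5, 5, 5], [5, 6, 5], [5, 5, 5]]
code = stlbp_code(prev, cur, nxt, 1, 1)
```

Histograms of such codes over a pixel's neighbourhood capture texture and motion jointly, which is what lets the background model tolerate dynamic scenes like waving trees or rippling water.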


ACM Multimedia | 2014

Exploring Principles-of-Art Features For Image Emotion Recognition

Sicheng Zhao; Yue Gao; Xiaolei Jiang; Hongxun Yao; Tat-Seng Chua; Xiaoshuai Sun

Images can evoke emotions in humans. Most previous work on image emotion analysis has mainly used elements-of-art-based low-level visual features. However, these features are vulnerable and not invariant to different arrangements of elements. In this paper, we investigate the concept of principles-of-art and its influence on image emotions. Principles-of-art-based emotion features (PAEF) are extracted to classify and score image emotions in order to understand the relationship between artistic principles and emotions. PAEF are the unified combination of representation features derived from different principles, including balance, emphasis, harmony, variety, gradation, and movement. Experiments on the International Affective Picture System (IAPS), a set of artistic photographs, and a set of peer-rated abstract paintings demonstrate the superiority of PAEF for affective image classification and regression (with about a 5% improvement in classification accuracy and a 0.2 decrease in mean squared error) compared to state-of-the-art approaches. We then utilize PAEF to analyze the emotions of master paintings, with promising results.


IEEE Transactions on Multimedia | 2009

Event Tactic Analysis Based on Broadcast Sports Video

Guangyu Zhu; Changsheng Xu; Qingming Huang; Yong Rui; Shuqiang Jiang; Wen Gao; Hongxun Yao

Most existing approaches to sports video analysis have concentrated on semantic event detection. Sports professionals, however, are more interested in tactic analysis to help improve their performance. In this paper, we propose a novel approach to extract tactic information from attack events in broadcast soccer video and present the events in a tactic mode to coaches and sports professionals. We extract attack events with far-view shots using the analysis and alignment of web-casting text and broadcast video. For a detected event, two tactic representations, the aggregate trajectory and the play region sequence, are constructed based on multi-object trajectories and field locations in the event shots. Based on the multi-object trajectories tracked in the shot, a weighted graph is constructed via an analysis of the temporal-spatial interaction among the players and the ball. Using the Viterbi algorithm, the aggregate trajectory is computed from the weighted graph. The play region sequence is obtained by identifying the active field locations in the event based on line detection and a competition network. The interactive relationship of the aggregate trajectory with the play region information, together with hypothesis testing for trajectory temporal-spatial distribution, is employed to discover tactic patterns in a hierarchical coarse-to-fine framework. Extensive experiments on the 2006 FIFA World Cup show that the proposed approach is highly effective.
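The Viterbi step can be pictured as dynamic programming over a layered graph of candidate positions per frame: for each node, keep the best-scoring path that reaches it, then read off the overall winner. The layers and the distance-based edge score below are invented for illustration; the paper's weights come from temporal-spatial interaction analysis.

```python
def viterbi_path(layers, edge_score):
    """Dynamic programming over a layered graph: for each node in each
    layer, remember the highest-scoring path that reaches it, then
    return the best full path."""
    best = {n: (0.0, [n]) for n in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for n in layer:
            p = max(best, key=lambda q: best[q][0] + edge_score(q, n))
            nxt[n] = (best[p][0] + edge_score(p, n), best[p][1] + [n])
        best = nxt
    winner = max(best, key=lambda n: best[n][0])
    return best[winner][1]

# Candidate x-positions per frame; the score favours smooth motion, so
# the path through position 1 beats the detour through position 5.
layers = [[0], [1, 5], [2]]
path = viterbi_path(layers, lambda a, b: -abs(a - b))
```

Because only the best path per node survives each layer, the search is linear in the number of frames rather than exponential in the number of candidates.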


Computer Vision and Pattern Recognition | 2010

Visual tracking via weakly supervised learning from multiple imperfect oracles

Bineng Zhong; Hongxun Yao; Sheng Chen; Rongrong Ji; Xiaotong Yuan; Shaohui Liu; Wen Gao

Long-term persistent tracking in ever-changing environments is a challenging task that often requires solving difficult object appearance update problems. To solve them, most top-performing methods rely on online learning-based algorithms. Unfortunately, one inherent problem of online learning-based trackers is drift, a gradual adaptation of the tracker to non-targets. To alleviate this problem, we consider visual tracking in a novel weakly supervised learning scenario where (possibly noisy) labels, but no ground truth, are provided by multiple imperfect oracles (i.e., trackers), some of which may be mediocre. A probabilistic approach is proposed to simultaneously infer the most likely object position and the accuracy of each tracker. Moreover, an online evaluation strategy for the trackers and a heuristic training data selection scheme are adopted to make the inference more effective and faster. Consequently, the proposed method can avoid the pitfalls of purely single-tracker approaches and obtain reliably labeled samples to incrementally update each tracker (if it is an appearance-adaptive tracker) to capture appearance changes. Extensive comparative experiments on challenging video sequences demonstrate the robustness and effectiveness of the proposed method.
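One simple way to picture jointly inferring the object position and each tracker's accuracy is an alternating scheme: estimate the consensus as an accuracy-weighted mean of the trackers' outputs, then re-score each tracker by its agreement with that consensus. This is a heuristic sketch in the spirit of the paper's probabilistic inference, not its actual model; the tracker outputs are invented.

```python
def fuse_trackers(positions, iters=10):
    """Alternate between (1) consensus = accuracy-weighted mean of the
    trackers' position estimates and (2) accuracy proportional to the
    inverse distance of each tracker from the consensus."""
    n = len(positions)
    acc = [1.0 / n] * n                      # normalized accuracies
    for _ in range(iters):
        cx = sum(a * x for a, (x, _) in zip(acc, positions))
        cy = sum(a * y for a, (_, y) in zip(acc, positions))
        dist = [((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
                for x, y in positions]
        acc = [1.0 / (1e-6 + d) for d in dist]
        total = sum(acc)
        acc = [a / total for a in acc]
    return (cx, cy), acc

# Three trackers agree near (10, 10); a fourth has drifted to (50, 50)
# and should end up with the lowest inferred accuracy.
positions = [(10, 10), (11, 10), (10, 11), (50, 50)]
consensus, accuracy = fuse_trackers(positions)
```

The drifted oracle is automatically down-weighted, which is the intuition behind letting the ensemble, rather than any single tracker, supply the labels for appearance updates.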

Collaboration


Dive into Hongxun Yao's collaboration.

Top Co-Authors

Xiaoshuai Sun (Harbin Institute of Technology)
Wen Gao (Chinese Academy of Sciences)
Shaohui Liu (Harbin Institute of Technology)
Sicheng Zhao (Harbin Institute of Technology)
Pengfei Xu (Harbin Institute of Technology)
Shengping Zhang (Harbin Institute of Technology)
Yanhao Zhang (Harbin Institute of Technology)
Xianming Liu (Harbin Institute of Technology)
Qingming Huang (Chinese Academy of Sciences)