Hanning Zhou | Researchain

Publication

Featured researches published by Hanning Zhou.

international conference on computer vision | 2003

Tracking articulated hand motion with eigen dynamics analysis

Hanning Zhou; Huang

This paper introduces the concept of eigen-dynamics and proposes an eigen-dynamics analysis (EDA) method to learn the dynamics of natural hand motion from labelled sets of motion captured with a data glove. The result is parameterized with a high-order stochastic linear dynamic system (LDS) consisting of five lower-order LDSs, each corresponding to one eigen-dynamics. Based on the EDA model, we construct a dynamic Bayesian network (DBN) to analyze the generative process of an image sequence of natural hand motion. Using the DBN, a hand tracking system is implemented. Experiments on both synthesized and real-world data demonstrate the robustness and effectiveness of these techniques.

computer vision and pattern recognition | 2004

Static Hand Gesture Recognition based on Local Orientation Histogram Feature Distribution Model

Hanning Zhou; Dennis Lin; Thomas S. Huang

This paper proposes a bottom-up approach for static hand gesture recognition. By extending the local orientation histogram feature, we make it applicable to the human hand, an object with very little texture. The key steps are augmenting the local orientation histogram feature vector with its relative image coordinates, and clustering the augmented vectors to find a compact yet descriptive representation of the hand shape. The recognition result is given by collective voting of all the local orientation histogram features extracted from the hand region of the test image. The matching score is evaluated by retrieval in the image feature database of the training hand gestures. Locality-sensitive hashing is used to reduce the computational cost of retrieval. The experimental results on labelled real-world images demonstrate the superior accuracy and efficiency of our algorithm.
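
The augment-then-vote idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`augment`, `classify`), the coordinate weight, and the toy feature values are hypothetical, and a brute-force nearest-neighbour search stands in for the locality-sensitive hashing mentioned in the abstract.

```python
from collections import Counter

def augment(hist, x, y, weight=1.0):
    """Append relative image coordinates (assumed normalised) to a histogram."""
    return tuple(hist) + (weight * x, weight * y)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def classify(test_features, database):
    """Each test feature votes for the label of its nearest database entry.

    database is a list of (augmented_feature, gesture_label) pairs.
    Brute-force search is used here purely for clarity; it is an
    assumption standing in for an approximate index such as LSH.
    """
    votes = Counter()
    for feature in test_features:
        _, label = min(database, key=lambda entry: sq_dist(feature, entry[0]))
        votes[label] += 1
    return votes.most_common(1)[0][0]

# Toy database: two-bin histograms plus (x, y) position, made-up values.
database = [
    (augment([0.9, 0.1], 0.2, 0.2), "fist"),
    (augment([0.8, 0.2], 0.3, 0.7), "fist"),
    (augment([0.1, 0.9], 0.8, 0.2), "palm"),
    (augment([0.2, 0.8], 0.7, 0.8), "palm"),
]
test = [
    augment([0.85, 0.15], 0.25, 0.25),
    augment([0.75, 0.25], 0.3, 0.6),
    augment([0.2, 0.8], 0.8, 0.3),
]
result = classify(test, database)
```

Appending the coordinates means two patches with similar orientation content but different positions on the hand no longer collapse to the same descriptor, which is the point of the augmentation step.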

international conference on pattern recognition | 2006

Unusual Event Detection via Multi-camera Video Mining

Hanning Zhou; Don Kimber

This paper describes a framework for detecting unusual events in surveillance videos. Most surveillance systems consist of multiple video streams, but traditional event detection systems treat individual video streams independently or combine them at the feature extraction level through geometric reconstruction. Our framework combines multiple video streams at the inference level, with a coupled hidden Markov model (CHMM). We use two-stage training to bootstrap a set of usual events, and train a CHMM over the set. By thresholding the likelihood of a test segment being generated by the model, we build an unusual event detector. We evaluate the performance of our detector through qualitative and quantitative experiments on two sets of real-world videos.
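
The likelihood-thresholding step can be illustrated with a much simpler model than the one in the abstract. The sketch below assumes a single-chain discrete HMM (not a coupled HMM), hand-picked parameters, and an arbitrary threshold; all names and numbers are hypothetical and serve only to show how a low likelihood under a "usual events" model flags a segment.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; obs must be non-empty."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # segment is impossible under the model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total

def is_unusual(obs, model, threshold):
    """Flag a segment whose per-frame log-likelihood falls below threshold."""
    return log_likelihood(obs, *model) / len(obs) < threshold

# Hypothetical "usual events" model: two states, three observation symbols.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.15, 0.05], [0.15, 0.8, 0.05]]
model = (start, trans, emit)

usual_segment = [0, 0, 0, 1, 1, 1]
odd_segment = [2, 2, 2, 2, 2, 2]  # symbol 2 is rare in both states
flags = (is_unusual(usual_segment, model, -2.0),
         is_unusual(odd_segment, model, -2.0))
```

Dividing by the segment length keeps the threshold comparable across segments of different durations; that normalisation is a design choice of this sketch, not something stated in the abstract.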

acm multimedia | 2001

Bi-level video: video communication at very low bit rates

Jiang Li; Gang Chen; Jizheng Xu; Yong Wang; Hanning Zhou; Keman Yu; King To Ng; Heung-Yeung Shum

The rapid development of wired and wireless networks tremendously facilitates communications between people. However, most of the current wireless networks still work in low bandwidths, and mobile devices still suffer from weak computational power, short battery lifetime and limited display capability. We developed a very low bit-rate bi-level video coding technique, which can be used in video communications almost anywhere, anytime on any device. The spirit of this method is that rather than giving highest priority to the basic colors of an image as in conventional DCT-based compression methods, we give preference to the outline features of scenes when we have limited bandwidths. These features can be represented by bi-level image sequences that are converted from gray-scale image sequences. By analyzing the temporal correlation between successive frames and flexibilities in the scene presentation using bi-level images, we achieve very high ratios with our bi-level video compression scheme. Experiments show that in low bandwidths, our method provides clearer shape, smoother motion, shorter initial latency and much cheaper computational cost than do DCT-based methods. Our method is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones that possess small display screens and light computational power, and work in low bandwidth wireless networks. We have built PC and Pocket PC versions of bi-level video phone systems, which typically provide QCIF-size video with a frame rate of 5-15 fps for a 9.6 Kbps bandwidth.
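
To get a feel for the figures quoted above, the following sketch works out the per-frame bit budget and shows a naive gray-scale to bi-level conversion. It assumes that 9.6 Kbps means 9600 bits per second and that QCIF is 176x144 pixels; the fixed threshold of 128 is an illustrative assumption, since the abstract does not say how the conversion is actually done.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # standard QCIF frame size in pixels

def bits_per_frame(bandwidth_bps, fps):
    """Average number of bits available for each frame."""
    return bandwidth_bps / fps

def to_bilevel(gray_rows, threshold=128):
    """Naive fixed-threshold conversion of 0-255 gray values to 0/1."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray_rows]

# An uncompressed bi-level QCIF frame needs one bit per pixel.
raw_bits = QCIF_WIDTH * QCIF_HEIGHT

for fps in (5, 15):
    budget = bits_per_frame(9600, fps)
    ratio = raw_bits / budget  # compression needed relative to raw bi-level
    print(f"{fps} fps: {budget:.0f} bits/frame, ratio {ratio:.1f}:1")

tiny = to_bilevel([[0, 200], [128, 127]])
```

Under these assumptions each frame gets between 640 and 1920 bits, so even the one-bit-per-pixel representation still has to be compressed by roughly an order of magnitude, which is where the temporal correlation between frames matters.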

international conference on computer vision | 2005

Okapi-Chamfer matching for articulate object recognition

Hanning Zhou; Thomas S. Huang

Recent years have witnessed the rise of many effective text information retrieval systems. By treating local visual features as terms, training images as documents and input images as queries, we formulate the problem of object recognition as one of text retrieval. Our formulation opens up the opportunity to integrate some powerful text retrieval tools with computer vision techniques. In this paper, we propose to improve the efficiency of articulated object recognition with an Okapi-Chamfer matching algorithm. The algorithm is based on the inverted index technique, a widely used way to organize a collection of text documents efficiently. With the inverted index, only documents that contain query terms are accessed and used for matching. To enable inverted indexing in an image database, we build a lexicon of local visual features by clustering the features extracted from the training images. Given a query image, we extract visual features and quantize them based on the lexicon, and then look up the inverted index to identify the subset of training images with non-zero matching score. To evaluate the matching scores in the subset, we combine the modified Okapi weighting formula with the Chamfer distance. The performance of the Okapi-Chamfer matching algorithm is evaluated on a hand posture recognition system. We test the system with both synthesized and real-world images. Quantitative results demonstrate the accuracy and efficiency of our system.
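
The retrieval side of this formulation can be sketched as an inverted index over quantized visual words, scored with a term-weighting formula. The sketch below uses a common textbook form of Okapi BM25, not the modified formula from the paper, and it leaves out the Chamfer distance entirely; the image names, visual-word ids, and the parameters `k1` and `b` are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """docs maps image id -> list of visual-word ids.

    Returns (inverted index: word -> {image id: term frequency},
             lengths: image id -> number of visual words).
    """
    index = defaultdict(dict)
    lengths = {}
    for doc_id, words in docs.items():
        lengths[doc_id] = len(words)
        for word, tf in Counter(words).items():
            index[word][doc_id] = tf
    return index, lengths

def bm25_scores(query, index, lengths, k1=1.2, b=0.75):
    """Score only the images that share at least one word with the query."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for word in set(query):
        postings = index.get(word, {})
        if not postings:
            continue  # word never seen in training images
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return dict(scores)

# Toy training set: three images described by made-up visual-word ids.
docs = {"fist": [1, 1, 2, 3], "palm": [4, 5, 5, 6], "point": [1, 4, 7, 7]}
index, lengths = build_index(docs)
scores = bm25_scores([1, 2, 2], index, lengths)
best = max(scores, key=scores.get)
```

Only images reachable through the postings of the query words are ever touched, so the image with no shared words never receives a score at all; that is the efficiency gain the abstract attributes to the inverted index.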

International Gesture Workshop | 2003

Recovering Articulated Motion with a Hierarchical Factorization Method

Hanning Zhou; Thomas S. Huang

Recovering articulated human motion is an important task in many applications, including surveillance and human-computer interaction. In this paper, a hierarchical factorization method is proposed for recovering articulated human motion (such as hand gestures) from a sequence of images captured under weak perspective projection. It is robust against feature points missing due to self-occlusion and against various kinds of observation noise. The accuracy of our algorithm is verified by experiments on synthetic data.

Pattern Recognition Letters | 2010

Semi-supervised learning for text-line detection

Zongyi Liu; Hanning Zhou; Ning Yang

Automatically detecting text-lines in document images has long been studied. However, most researchers today focus on boosting the detection rate rather than on noise removal. In this paper, we propose a semi-supervised learning framework that aims to segment Manhattan-layout documents with significant levels of noise. The algorithm consists of three steps: first, an initial segmentation process uses the seed filling algorithm; second, an iterative grouping process uses the projection profiles to estimate the vertical border of page contents; third, a noise-removal step inside the page content uses online training and classification. We test our algorithm using two databases. The first is the University of Washington (UW)-III database, with 1,600 images of different input qualities, which has been widely used by the Document Analysis Research (DAR) communities to measure segmentation algorithm performance. The second is the NILE database, created by sampling from 320 journal pages in East Asian, East European and Middle Eastern languages. The results show that our framework achieves competitive performance in both page-frame-level segmentation and text-line-level segmentation, and is particularly strong at filtering noise. They also show that our algorithm is more adaptive to language variations.
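
A projection profile, as used in the second step, is simply the count of foreground pixels along each row or column. The sketch below is a generic illustration rather than the paper's iterative grouping process: it assumes a binary image stored as lists of 0/1 values, and the function names and the `min_count` parameter are hypothetical.

```python
def horizontal_profile(image):
    """Number of foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]

def find_runs(profile, min_count=1):
    """Return (first, last) index pairs of consecutive entries >= min_count."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i  # a run of content begins
        elif value < min_count and start is not None:
            runs.append((start, i - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

# Toy page: two bands of ink separated by an empty row.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
profile = horizontal_profile(page)
bands = find_runs(profile)
```

Raising `min_count` makes the run detection ignore rows that contain only a few stray pixels, which hints at why profiles are useful on noisy scans; the first and last runs give a crude estimate of where the page content begins and ends.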

acm multimedia | 2001

Portrait video phone

Jiang Li; Keman Yu; Gang Chen; Yong Wang; Hanning Zhou; Jizheng Xu; King To Ng; Kaibo Wang; Lijie Wang; Heung-Yeung Shum

As the Internet and wireless networks develop rapidly, demand is emerging for communicating anywhere, anytime on any device. However, most of the current wireless networks still work in low bandwidths, and mobile devices still suffer from weak computational power, short battery lifetime and limited display capability. We developed portrait video phone systems that can run on PCs and Pocket PCs at very low bit rates through the Internet. The core technology that portrait video phones employ is the so-called portrait video (or bi-level video) codec. The portrait video codec first converts a full-color video into a black/white image sequence and then compresses it into a black/white portrait-like video. Portrait video possesses clearer shape, smoother motion, shorter initial latency, and cheaper computational cost than MPEG2, MPEG4 and H.263 at low bandwidths. Typically the portrait video phone provides QCIF-size video with a frame rate of 5-15 fps for a 9.6 Kbps video bandwidth. The portrait video is so small that it can even be transmitted through an HTTP proxy as text. Experiments show that the portrait video phones work well on ordinary GSM wireless telecommunication networks.

multimedia signal processing | 2006

On Redirecting Documents with a Mobile Camera

Qiong Liu; Paul McEvoy; Don Kimber; Patrick Chiu; Hanning Zhou

This paper presents a method for facilitating document redirection in a physical environment via a mobile camera. With this method, a user is able to move documents among electronic devices, post a paper document to a selected public display, or make a printout of a whiteboard with simple point-and-capture operations. More specifically, the user can move a document from its source to a destination by capturing a source image and a destination image in consecutive order. The system uses SIFT (scale invariant feature transform) features of captured images to identify the devices a user is pointing to, and issues corresponding commands associated with the identified devices. Unlike RF/IR based remote controls, this method uses the visual features of an object as an always-available identifier for many tasks, and is therefore easy to deploy. For evaluation, we present experiments on identifying three public displays and a document scanner in a conference room.

document recognition and retrieval | 2011

Segmenting texts from outdoor images taken by mobile phones using color features

Zongyi Liu; Hanning Zhou

Recognizing text in low-resolution images taken by mobile phones has wide applications. It has been shown that good image binarization can substantially improve the performance of OCR engines. In this paper, we present a framework to segment text from outdoor images taken by mobile phones using color features. The framework consists of three steps: (i) an initial process including image enhancement, binarization and noise filtering, where we binarize the input images in each RGB channel and apply component-level noise filtering; (ii) grouping components into blocks using color features, where we compute the component similarities by dynamically adjusting the weights of the RGB channels and merge groups hierarchically; and (iii) block selection, where we use run-length features and choose the Support Vector Machine (SVM) as the classifier. We tested the algorithm using 13 outdoor images taken by an old-style LG-64693 mobile phone with 640x480 resolution. We compared the segmentation results with Tsars algorithm, a state-of-the-art camera text detection algorithm, and show that our algorithm is more robust, particularly in terms of false alarm rates. In addition, we evaluated the impact of our algorithm on ABBYY FineReader, one of the most popular commercial OCR engines on the market.
