

Publication


Featured research published by Cha Zhang.


International Conference on Image Processing | 2001

Efficient feature extraction for 2D/3D objects in mesh representation

Cha Zhang; Tsuhan Chen

Meshes are the dominant representation for 3D models because they fit well with graphics rendering hardware. Features such as volume, moments, and Fourier transform coefficients need to be calculated efficiently from the mesh representation. We propose an algorithm that calculates these features without transforming the mesh into other representations such as the volumetric representation. To calculate a feature for a mesh, we show that we can first compute it for each elementary shape, such as a triangle or a tetrahedron, and then add up all the values for the mesh. The algorithm is simple and efficient, with many potential applications.
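The per-element idea can be illustrated with the volume feature: each triangle, together with the origin, defines a signed tetrahedron, and for a closed, consistently oriented mesh the signed volumes sum to the enclosed volume. A minimal sketch of this divergence-style trick (illustrative code, not the authors' implementation):

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Sum the signed volumes of tetrahedra formed by each triangle and
    the origin; for a closed, consistently oriented mesh the contributions
    outside the surface cancel, leaving the enclosed volume."""
    total = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        total += np.dot(a, np.cross(b, c)) / 6.0  # signed tetrahedron volume
    return total

# Unit cube (vertex index = 4*x + 2*y + z), 12 outward-oriented triangles.
V = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
F = [(0, 2, 6), (0, 6, 4), (1, 5, 7), (1, 7, 3), (0, 1, 3), (0, 3, 2),
     (4, 6, 7), (4, 7, 5), (0, 4, 5), (0, 5, 1), (2, 3, 7), (2, 7, 6)]
```

Moments and other polynomial features decompose over the same per-tetrahedron sums, which is what makes the single pass over triangles sufficient.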


Signal Processing: Image Communication | 2004

A survey on image-based rendering—representation, sampling and compression

Cha Zhang; Tsuhan Chen

Image-based rendering (IBR) has attracted a lot of research interest recently. In this paper, we survey the various techniques developed for IBR, including representation, sampling and compression. The goal is to provide an overview of research for IBR in a complete and systematic manner. We observe that essentially all the IBR representations are derived from the plenoptic function, which is seven dimensional and difficult to handle. We classify various IBR representations into two categories based on how the plenoptic function is simplified, namely restraining the viewing space and introducing source descriptions. In the former category, we summarize six common assumptions that were often made in various approaches and discuss how the dimension of the plenoptic function can be reduced based on these assumptions. In the latter category, we further categorize the methods based on what kind of source description was introduced, such as scene geometry, texture map or reflection model. Sampling and compression are then discussed for each of the two categories.


IEEE Transactions on Multimedia | 2002

An active learning framework for content-based information retrieval

Cha Zhang; Tsuhan Chen

We propose a general active learning framework for content-based information retrieval. We use this framework to guide hidden annotations in order to improve the retrieval performance. For each object in the database, we maintain a list of probabilities, each indicating the probability of this object having one of the attributes. During training, the learning algorithm samples objects in the database and presents them to the annotator to assign attributes. For each sampled object, each probability is set to one or zero depending on whether or not the corresponding attribute is assigned by the annotator. For objects that have not been annotated, the learning algorithm estimates their probabilities with biased kernel regression. A knowledge gain criterion is then defined to determine, among the objects that have not been annotated, the one about which the system is most uncertain; that object is presented to the annotator as the next sample to be assigned attributes. During retrieval, the list of probabilities serves as a feature vector for calculating the semantic distance between two objects, or between the user query and an object in the database. The overall distance between two objects is a weighted sum of the semantic distance and the low-level feature distance. The algorithm is tested on both synthetic databases and real databases of 3D models. In both cases, the retrieval performance of the system improves rapidly with the number of annotated samples. Furthermore, we show that active learning outperforms learning based on random sampling.
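The sampling loop described above can be sketched as follows, using a plain Nadaraya-Watson kernel estimate as a stand-in for the paper's biased kernel regression, and treating "most uncertain" as the probability closest to 0.5 (the bandwidth and names are illustrative):

```python
import numpy as np

def estimate_probabilities(X_unlabeled, X_labeled, y_labeled, bandwidth=1.0):
    """Kernel-regression estimate of an attribute probability for each
    unannotated object, from the 0/1 annotations of labeled objects."""
    probs = []
    for x in X_unlabeled:
        d2 = np.sum((X_labeled - x) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weights
        probs.append(np.dot(w, y_labeled) / (w.sum() + 1e-12))
    return np.array(probs)

def most_uncertain(probs):
    """Index of the unannotated object whose estimated probability is
    closest to 0.5, i.e. the one to hand to the annotator next."""
    return int(np.argmin(np.abs(probs - 0.5)))
```

In the full framework this selection is driven by the knowledge gain criterion over all attributes; the 0.5-distance rule here is the simplest single-attribute proxy.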


European Conference on Computer Vision | 2010

3D deformable face tracking with a commodity depth camera

Qin Cai; David Gallup; Cha Zhang; Zhengyou Zhang

Recently, there has been an increasing number of depth cameras available at commodity prices. These cameras can usually capture both color and depth images in real time, with limited resolution and accuracy. In this paper, we study the problem of 3D deformable face tracking with such commodity depth cameras. A regularized maximum likelihood deformable model fitting (DMF) algorithm is developed, with special emphasis on handling the noisy input depth data. In particular, we present a maximum likelihood solution that can accommodate sensor noise represented by an arbitrary covariance matrix, which allows more elaborate modeling of the sensor's accuracy. Furthermore, an ℓ1 regularization scheme is proposed based on the semantics of the deformable face model, which is shown to be very effective in improving the tracking results. To track facial movement in subsequent frames, feature points in the texture images are matched across frames and integrated into the DMF framework seamlessly. The effectiveness of the proposed method is demonstrated on multiple sequences with ground-truth information.


IEEE Transactions on Multimedia | 2008

Maximum Likelihood Sound Source Localization and Beamforming for Directional Microphone Arrays in Distributed Meetings

Cha Zhang; Dinei A. F. Florêncio; Demba Elimane Ba; Zhengyou Zhang

In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework of these two techniques, and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over six hours of recorded meetings.
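The steered-response-power family the framework relates to can be sketched in the frequency domain: delay-compensate each microphone for a candidate direction and score the energy of the aligned sum. This is a simplified far-field sketch, not the paper's maximum likelihood formulation:

```python
import numpy as np

def srp_ssl(signals, mic_positions, candidate_dirs, fs, c=343.0):
    """Score each candidate source direction by the energy of the
    delay-aligned sum of the microphone signals (far-field model);
    return the direction with the highest steered response power."""
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    best_dir, best_power = None, -np.inf
    for d in candidate_dirs:
        delays = mic_positions @ d / c          # relative far-field delays (s)
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        power = np.sum(np.abs(np.sum(spectra * steer, axis=0)) ** 2)
        if power > best_power:
            best_dir, best_power = d, power
    return best_dir
```

The paper's contribution is to derive the weighting of this steered sum from a maximum likelihood model, which also accommodates reverberation and unknown directional microphone patterns.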


International Conference on Multimedia and Expo | 2011

Calibration between depth and color sensors for commodity depth cameras

Cha Zhang; Zhengyou Zhang

Commodity depth cameras have created many interesting new applications in the research community recently. These applications often require the calibration information between the color and the depth cameras. Traditional checkerboard based calibration schemes fail to work well for the depth camera, since its corner features cannot be reliably detected in the depth image. In this paper, we present a maximum likelihood solution for the joint depth and color calibration based on two principles. First, in the depth image, points on the checkerboard shall be co-planar, and the plane is known from color camera calibration. Second, additional point correspondences between the depth and color images may be manually specified or automatically established to help improve calibration accuracy. Uncertainty in depth values has been taken into account systematically. The proposed algorithm is reliable and accurate, as demonstrated by extensive experimental results on simulated and real-world examples.
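The first principle can be illustrated as a point-to-plane cost: checkerboard points from the depth image, mapped into the color frame by candidate extrinsics (R, t), should lie on the board plane n·x = d recovered from color-camera calibration. This is a sketch of the constraint only, not the paper's full maximum likelihood objective:

```python
import numpy as np

def coplanarity_cost(depth_points, R, t, plane_n, plane_d):
    """Sum of squared point-to-plane distances after mapping depth-camera
    points into the color frame; driving this to zero for many board
    poses enforces the checkerboard co-planarity constraint."""
    x = depth_points @ R.T + t          # apply candidate extrinsics
    return float(np.sum((x @ plane_n - plane_d) ** 2))
```

Minimizing this cost over (R, t), together with the point-correspondence terms and per-point depth uncertainty weights, is what the full calibration does.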


Computer Vision and Pattern Recognition | 2009

Boosted multi-task learning for face verification with applications to web image and video search

Xiaogang Wang; Cha Zhang; Zhengyou Zhang

Face verification has many potential applications including filtering and ranking image/video search results on celebrities. Since these images/videos are taken under uncontrolled environments, the problem is very challenging due to dramatic lighting and pose variations, low resolutions, compression artifacts, etc. In addition, the available number of training images for each celebrity may be limited, hence learning individual classifiers for each person may cause overfitting. In this paper, we propose two ideas to meet the above challenges. First, we propose to use individual bins, instead of whole histograms, of Local Binary Patterns (LBP) as features for learning, which yields significant performance improvements and computation reduction in our experiments. Second, we present a novel Multi-Task Learning (MTL) framework, called Boosted MTL, for face verification with limited training data. It jointly learns classifiers for multiple people by sharing a few boosting classifiers in order to avoid overfitting. The effectiveness of Boosted MTL and LBP bin features is verified with a large number of celebrity images/videos from the web.
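The bin-level feature idea rests on the standard LBP operator: each pixel gets an 8-bit code from thresholding its neighbors against the center, and instead of comparing whole code histograms, single bin counts serve as weak-learner features. A textbook sketch (the paper's exact operator and bins may differ):

```python
import numpy as np

def lbp_codes(gray):
    """Basic 8-neighbor Local Binary Pattern: threshold each interior
    pixel's eight neighbors against the center and pack the results
    into an 8-bit code per pixel."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= center).astype(np.uint8) << bit
    return code

def bin_feature(code, b):
    """A single-bin feature: how often LBP code b occurs in the patch."""
    return int(np.sum(code == b))
```

Boosting over such scalar bin features lets the learner pick only the discriminative bins, which is the source of the reported accuracy and speed gains.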


International Conference on Multimodal Interfaces | 2015

Image based Static Facial Expression Recognition with Multiple Deep Network Learning

Zhiding Yu; Cha Zhang

We report our image based static facial expression recognition method for the Emotion Recognition in the Wild Challenge (EmotiW) 2015. We focus on the sub-challenge of the SFEW 2.0 dataset, where one seeks to automatically classify a set of static images into 7 basic emotions. The proposed method contains a face detection module based on the ensemble of three state-of-the-art face detectors, followed by a classification module with the ensemble of multiple deep convolutional neural networks (CNN). Each CNN model is initialized randomly and pre-trained on a larger dataset provided by the Facial Expression Recognition (FER) Challenge 2013. The pre-trained models are then fine-tuned on the training set of SFEW 2.0. To combine multiple CNN models, we present two schemes for learning the ensemble weights of the network responses: by minimizing the log likelihood loss, and by minimizing the hinge loss. Our proposed method achieves state-of-the-art results on the FER dataset. It also achieves 55.96% and 61.29% on the validation and test sets of SFEW 2.0, respectively, surpassing the challenge baselines of 35.96% and 39.13% by significant margins.
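The first weighting scheme amounts to minimizing the log likelihood loss of the weighted average of per-network class posteriors over simplex-constrained weights. The finite-difference, exponentiated-gradient solver below is an illustrative stand-in, not the authors' optimizer:

```python
import numpy as np

def log_loss(w, probs, labels):
    """Negative log likelihood of the weighted average of per-network
    class posteriors. probs has shape (models, samples, classes)."""
    w = np.maximum(w, 1e-12)
    w = w / w.sum()
    mix = np.einsum('m,msc->sc', w, probs)
    return -np.mean(np.log(mix[np.arange(len(labels)), labels] + 1e-12))

def learn_weights(probs, labels, steps=200, lr=0.5, eps=1e-5):
    """Fit ensemble weights on held-out data by minimizing log_loss,
    using finite-difference gradients and exponentiated-gradient
    updates to keep the weights positive and summing to one."""
    m = probs.shape[0]
    w = np.full(m, 1.0 / m)
    for _ in range(steps):
        base = log_loss(w, probs, labels)
        grad = np.array([
            (log_loss(w + eps * np.eye(m)[i], probs, labels) - base) / eps
            for i in range(m)])
        w = w * np.exp(-lr * grad)
        w /= w.sum()
    return w
```

On a toy validation set where one network is informative and the other guesses uniformly, the learned weight concentrates on the informative network, as intended.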


International Conference on Multimedia and Expo | 2012

Virtual View Reconstruction Using Temporal Information

Shujie Liu; Philip A. Chou; Cha Zhang; Zhengyou Zhang; Chang Wen Chen

The most significant problem in generating virtual views from a limited number of video camera views is handling areas that have become dis-occluded by shifting the virtual view away from the camera view. We propose using temporal information to address this problem, based on the notion that dis-occluded areas may have been seen by some camera in some previous frames. We formulate the problem as one of estimating the underlying state of the object in a stochastic dynamical system, given a sequence of observations. We apply the formulation to improving the visual quality of virtual views generated from a single “color plus depth” camera, and show that our algorithm achieves better results than depth image based rendering using standard inpainting.
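In toy form, the temporal idea reduces to filling dis-occluded pixels from the most recent earlier frame in which they were visible, instead of inpainting them spatially. The stochastic state estimation in the paper is far richer than this sketch; here NaN simply marks unseen pixels:

```python
import numpy as np

def fill_from_history(virtual, history):
    """Fill NaN (dis-occluded) pixels of the current virtual view from the
    newest previous frame that observed them; earlier frames only fill
    what the later ones could not."""
    out = virtual.copy()
    holes = np.isnan(out)
    for frame in history:                 # ordered newest first
        fill = holes & ~np.isnan(frame)
        out[fill] = frame[fill]
        holes &= ~fill
    return out
```

The dynamical-system formulation replaces this hard newest-wins rule with a probabilistic estimate of the underlying scene state across the observation sequence.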


Workshop on Applications of Computer Vision | 2014

Improving multiview face detection with multi-task deep convolutional neural networks

Cha Zhang; Zhengyou Zhang

Multiview face detection is a challenging problem due to dramatic appearance changes under various pose, illumination and expression conditions. In this paper, we present a multi-task deep learning scheme to enhance the detection performance. More specifically, we build a deep convolutional neural network that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem. We show that such a multi-task learning scheme can further improve the classifier's accuracy. On the challenging FDDB data set, our detector achieves over 3% improvement in detection rate at the same false positive rate compared with other state-of-the-art methods.
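The joint objective described above can be sketched as a sum of three head losses over a shared trunk: a logistic loss for the face/nonface decision, softmax cross-entropy for pose, and squared error for landmark positions. The task weights below are illustrative hyperparameters, not values from the paper:

```python
import numpy as np

def multitask_loss(face_logit, pose_logits, landmarks,
                   face_label, pose_label, landmark_target,
                   w_pose=1.0, w_lm=1.0):
    """Per-sample multi-task training loss: face/nonface logistic loss
    (face_label in {-1, +1}), softmax cross-entropy over pose classes,
    and squared landmark regression error, combined with task weights."""
    face = np.log1p(np.exp(-face_label * face_logit))
    z = pose_logits - pose_logits.max()          # numerically stable softmax
    pose = -z[pose_label] + np.log(np.sum(np.exp(z)))
    lm = np.sum((landmarks - landmark_target) ** 2)
    return face + w_pose * pose + w_lm * lm
```

Backpropagating this combined loss through the shared convolutional layers is what lets the auxiliary pose and landmark tasks regularize the face/nonface classifier.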
