Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Chia-Chih Chen is active.

Publication


Featured research published by Chia-Chih Chen.


Computer Vision and Pattern Recognition | 2012

View invariant human action recognition using histograms of 3D joints

Lu Xia; Chia-Chih Chen; Jake K. Aggarwal

In this paper, we present a novel approach for human action recognition with histograms of 3D joint locations (HOJ3D) as a compact representation of postures. We extract the 3D skeletal joint locations from Kinect depth maps using Shotton et al.'s method [6]. The HOJ3D descriptors computed from action depth sequences are reprojected using LDA and then clustered into k posture visual words, which represent the prototypical poses of actions. The temporal evolutions of these visual words are modeled by discrete hidden Markov models (HMMs). In addition, due to the design of our spherical coordinate system and the robust 3D skeleton estimation from Kinect, our method demonstrates significant view invariance on our 3D action dataset. Our dataset is composed of 200 3D sequences of 10 indoor activities performed by 10 individuals from varied views. Our method runs in real time and achieves superior results on this challenging 3D action dataset. We also tested our algorithm on the MSR Action 3D dataset, where it outperforms Li et al. [25] in most cases.
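As a rough illustration of the posture representation, the following NumPy sketch computes an HOJ3D-style histogram for one frame of skeletal joints; the reference joint, bin counts, and axis convention are assumptions, and the paper's carefully aligned spherical coordinate system for view invariance is not reproduced here:

```python
import numpy as np

def hoj3d(joints, center_idx=0, n_azimuth=12, n_elevation=7):
    """HOJ3D-style posture descriptor from one frame of skeletal joints.

    joints: (N, 3) array of 3D joint positions from a Kinect skeleton.
    The joint at center_idx anchors a spherical coordinate system; the
    bin counts here are illustrative, not the paper's exact partition.
    """
    rel = joints - joints[center_idx]            # center on reference joint
    rel = np.delete(rel, center_idx, axis=0)     # drop the reference itself
    r = np.linalg.norm(rel, axis=1) + 1e-8
    azimuth = np.arctan2(rel[:, 1], rel[:, 0])   # angle in the x-y plane
    elevation = np.arcsin(np.clip(rel[:, 2] / r, -1.0, 1.0))
    hist, _, _ = np.histogram2d(
        azimuth, elevation, bins=[n_azimuth, n_elevation],
        range=[[-np.pi, np.pi], [-np.pi / 2, np.pi / 2]])
    return (hist / hist.sum()).ravel()           # normalized histogram
```

In the full pipeline, these per-frame descriptors would be reprojected with LDA, clustered into k posture visual words, and the resulting word sequences fed to the discrete HMMs.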


Computer Vision and Pattern Recognition | 2011

Human detection using depth information by Kinect

Lu Xia; Chia-Chih Chen; Jake K. Aggarwal

Conventional human detection is mostly done in images taken by visible-light cameras. These methods imitate the detection process that humans use. They use features based on gradients, such as histograms of oriented gradients (HOG), or extract interest points in the image, such as scale-invariant feature transform (SIFT) keypoints. In this paper, we present a novel human detection method using depth information captured by the Kinect for Xbox 360. We propose a model-based approach, which detects humans using a 2-D head contour model and a 3-D head surface model. We propose a segmentation scheme to segment the human from his/her surroundings and extract the whole contour of the figure based on our detection point. We also explore a tracking algorithm based on our detection results. The methods are tested on a database captured with the Kinect in our lab and achieve superior results.
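As a sketch of the 2-D contour-matching idea, the following chamfer-style scoring matches a head-and-shoulders template against a depth edge map; the template, scoring rule, and placement handling are assumptions rather than the paper's exact 2-D/3-D head models:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_head_scores(edge_map, template_points):
    """Score candidate head locations on a depth edge map.

    edge_map: boolean (H, W) array, True at depth discontinuities.
    template_points: (M, 2) integer array of (row, col) offsets tracing
    a head-and-shoulders contour (a hypothetical template).
    Lower scores mark positions where the template lies near edges.
    """
    dist = distance_transform_edt(~edge_map)   # distance to nearest edge
    h, w = edge_map.shape
    scores = np.full((h, w), np.inf)
    for r in range(h):
        for c in range(w):
            rr = template_points[:, 0] + r
            cc = template_points[:, 1] + c
            inside = (rr >= 0) & (rr < h) & (cc >= 0) & (cc < w)
            if inside.all():                   # only fully visible placements
                scores[r, c] = dist[rr, cc].mean()
    return scores
```

Local minima of the score map would serve as head detection points from which the figure's contour is then extracted.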


Computer Vision and Pattern Recognition | 2011

A large-scale benchmark dataset for event recognition in surveillance video

Sangmin Oh; Anthony Hoogs; A. G. Amitha Perera; Naresh P. Cuntoor; Chia-Chih Chen; Jong Taek Lee; Saurajit Mukherjee; Jake K. Aggarwal; Hyungtae Lee; Larry S. Davis; Eran Swears; Xiaoyang Wang; Qiang Ji; Kishore K. Reddy; Mubarak Shah; Carl Vondrick; Hamed Pirsiavash; Deva Ramanan; Jenny Yuen; Antonio Torralba; Bi Song; Anesco Fong; Amit K. Roy-Chowdhury; Mita Desai

We introduce a new large-scale video dataset designed to assess the performance of diverse visual event recognition algorithms, with a focus on continuous visual event recognition (CVER) in outdoor areas with wide coverage. Previous datasets for action recognition are unrealistic for real-world surveillance because they consist of short clips showing one action by one individual [15, 8]. Datasets have been developed for movies [11] and sports [12], but these actions and scene conditions do not transfer well to surveillance video. Our dataset consists of many outdoor scenes with actions performed naturally by non-actors in continuously captured videos of the real world. The dataset includes large numbers of instances for 23 event types distributed throughout 29 hours of video. This data is accompanied by detailed annotations, including both moving object tracks and event examples, which provide a solid basis for large-scale evaluation. Additionally, we propose different evaluation modes for visual recognition tasks and evaluation metrics, along with our preliminary experimental results. We believe that this dataset will stimulate diverse aspects of computer vision research and help advance CVER tasks in the years ahead.


International Conference on Pattern Recognition | 2010

An overview of contest on semantic description of human activities (SDHA) 2010

Michael S. Ryoo; Chia-Chih Chen; Jake K. Aggarwal; Amit K. Roy-Chowdhury

This paper summarizes the results of the 1st Contest on Semantic Description of Human Activities (SDHA), held in conjunction with ICPR 2010. SDHA 2010 consists of three challenges: the High-level Human Interaction Recognition Challenge, the Aerial View Activity Classification Challenge, and the Wide-Area Activity Search and Recognition Challenge. The challenges are designed to encourage participants to test existing methodologies and develop new approaches for complex human activity recognition scenarios in realistic environments. We introduce three new public datasets through these challenges and discuss the results of the state-of-the-art activity recognition systems designed and implemented by the contestants. A methodology using spatio-temporal voting [19] successfully classified segmented videos in the UT-Interaction datasets, but had difficulty correctly localizing activities in continuous videos. Both the method using local features [10] and the HMM-based method [18] successfully recognized actions from low-resolution videos (i.e., the UT-Tower dataset). We compare their results in this paper.


Advanced Video and Signal Based Surveillance | 2007

Detection of abandoned objects in crowded environments

Medha Bhargava; Chia-Chih Chen; Michael S. Ryoo; Jake K. Aggarwal

With concerns about terrorism and global security on the rise, it has become vital to have in place efficient threat detection systems that can recognize potentially dangerous situations and alert the authorities to take appropriate action. Of particular significance is the case of unattended objects in mass transit areas. This paper describes a general framework that recognizes the event of someone leaving a piece of baggage unattended in forbidden areas. Our approach involves the recognition of four sub-events that characterize the activity of interest. When an unaccompanied bag is detected, the system analyzes its history to determine its most likely owner(s), where the owner is defined as the person who brought the bag into the scene before leaving it unattended. Through subsequent frames, the system keeps a lookout for the owner, whose presence in or disappearance from the scene defines the status of the bag, and decides the appropriate course of action. The system was successfully tested on the i-LIDS dataset.
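A minimal sketch of the owner-association step, assuming per-person track histories are available; the nearest-track rule and names below are illustrative stand-ins for the paper's history analysis:

```python
import numpy as np

def most_likely_owner(bag_pos, person_tracks, t_appear, window=30):
    """Trace a newly detected static bag back to its most likely owner.

    bag_pos: (x, y) image position where the bag first appeared.
    person_tracks: dict mapping person_id -> {frame: (x, y)}.
    The owner is taken to be the person closest to the bag around the
    frame it appeared; this distance rule is an illustrative proxy for
    "the person who brought the bag into the scene".
    """
    best_id, best_dist = None, np.inf
    for pid, track in person_tracks.items():
        for t in range(t_appear - window, t_appear + 1):
            if t in track:
                d = np.hypot(track[t][0] - bag_pos[0],
                             track[t][1] - bag_pos[1])
                if d < best_dist:
                    best_id, best_dist = pid, d
    return best_id
```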


International Conference on Pattern Recognition | 2010

Human Shadow Removal with Unknown Light Source

Chia-Chih Chen; Jake K. Aggarwal

In this paper, we present a shadow removal technique which effectively eliminates a human shadow cast by a light source from an unknown direction. A multi-cue shadow descriptor is proposed to characterize the distinctive properties of shadows. We employ a 3-stage process to detect and then remove shadows. Our algorithm improves shadow detection accuracy by imposing a spatial constraint between the foreground subregions of the human and the shadow. We collect a dataset containing 81 human-shadow images for evaluation. Both descriptor ROC curves and qualitative results demonstrate the superior performance of our method.
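To give a flavor of one cue such a descriptor can use, the sketch below flags foreground pixels whose luminance drops while chromaticity is roughly preserved against the background; this single classic cue and its thresholds are assumptions, not the paper's full multi-cue descriptor:

```python
import numpy as np

def shadow_mask(frame, background, lum_lo=0.4, lum_hi=0.9, chroma_tol=0.04):
    """Flag pixels that look like cast shadow on the background.

    frame, background: float RGB images in [0, 1] of identical shape.
    Shadow pixels darken the background (luminance ratio in a band
    below 1) while roughly preserving normalized chromaticity.
    """
    eps = 1e-6
    lum_f = frame.sum(axis=2) + eps
    lum_b = background.sum(axis=2) + eps
    ratio = lum_f / lum_b                    # shadows: darker but not black
    chroma_f = frame / lum_f[..., None]      # normalized chromaticity
    chroma_b = background / lum_b[..., None]
    chroma_ok = np.abs(chroma_f - chroma_b).max(axis=2) < chroma_tol
    return (ratio > lum_lo) & (ratio < lum_hi) & chroma_ok
```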


Machine Vision Applications | 2009

Detection of object abandonment using temporal logic

Medha Bhargava; Chia-Chih Chen; Michael S. Ryoo; Jake K. Aggarwal

This paper describes a novel framework for a smart threat detection system that uses computer vision to capture, exploit, and interpret the temporal flow of events related to the abandonment of an object. Our approach uses contextual information along with an analysis of the causal progression of events to decide whether or not an alarm should be raised. When an unattended object is detected, the system traces it back in time to determine and record who its most likely owner(s) may be. Through subsequent frames, the system searches the scene for the owner and issues an alert if no match is found for the owner over a given period of time. Our algorithm has been successfully tested on two benchmark datasets (PETS 2006 Benchmark Data, 2006; i-LIDS Dataset for AVSS, 2007) and yielded results substantially more accurate than those of similar systems developed by other academic and industrial research groups.
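A minimal sketch of the causal alarm rule, assuming a per-frame owner-match signal from the tracker; the timeout value and names are illustrative:

```python
def abandonment_alarm(owner_present, drop_frame, timeout=250):
    """Causal, temporal-logic-style abandonment check.

    owner_present: per-frame booleans, True when the recorded owner is
    matched in the scene. After the drop, an uninterrupted absence of
    `timeout` frames raises the alarm.
    """
    absent_run = 0
    for t in range(drop_frame, len(owner_present)):
        absent_run = 0 if owner_present[t] else absent_run + 1
        if absent_run >= timeout:
            return t          # frame index at which the alarm fires
    return None               # no alarm within the observed video
```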


International Conference on Pattern Recognition | 2008

Recognition of box-like objects by fusing cues of shape and edges

Chia-Chih Chen; Jake K. Aggarwal

Boxes are the universal choice for packing, storage, and transportation. In this paper, we propose a template-based algorithm for the recognition of box-like objects which is invariant to scale, rotation, and translation, as well as robust to patterned surfaces and moderate occlusions. The algorithm first over-segments the input image to partition objects into pieces. Based on the smoothness property of surface texture, candidates for the component segments of boxes are selected. Guided by a template-trained linear discriminant analysis (LDA) classifier, box-like segments are reassembled from these segments of interest. For each box-like segment, we estimate its probability of being a 2D projection of a 3D box model from the extracted contour and inner edges. Experimental results demonstrate high box detection accuracy and reliable recovery of 2D box models.
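A minimal scikit-learn sketch of the classifier-guided step, assuming generic per-segment features; the placeholder training data and feature set below are assumptions, whereas the paper trains its LDA on template-derived features:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Placeholder training set: rows are hypothetical per-segment features
# (e.g., area, texture smoothness, convexity). A real system would use
# features extracted from labeled box / non-box segments.
X_train = rng.random((200, 4))
y_train = rng.integers(0, 2, 200)          # 1 = box-like segment

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

def box_likeness(segment_features):
    """Probability that a candidate segment is box-like; scores like
    this guide the reassembly of over-segmented pieces into boxes."""
    return lda.predict_proba(np.atleast_2d(segment_features))[0, 1]
```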


Computer Vision and Pattern Recognition | 2011

Recognizing human-vehicle interactions from aerial video without training

Jong Taek Lee; Chia-Chih Chen; Jake K. Aggarwal

We propose a novel framework to recognize human-vehicle interactions from aerial video. In this scenario, the object resolution is low, the visual cues are vague, and the detection and tracking of objects are consequently less reliable. Methods that require accurate tracking of objects or exact matching of event definitions are better avoided. To address these issues, we present a temporal logic based approach which does not require training from event examples. At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle relationships against pre-specified event definitions in a piecewise fashion. With special interest in recognizing a person getting into and out of a vehicle, we have tested our method on a subset of the VIRAT Aerial Video dataset [11] and achieved superior results. Our framework can be easily extended to recognize other types of human-vehicle interactions.
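A minimal sketch of the piecewise, semantic-level verification, assuming per-frame human-vehicle relationship labels produced by the low-level stage; the stage labels and event definition below are illustrative, not the paper's exact predicates:

```python
def matches_event(relationships, event_stages):
    """Piecewise verification: does the observed time series of
    human-vehicle relationships pass through the event's stages in order?

    relationships: per-frame labels, e.g. ["far", "approaching",
        "beside_door", "beside_door", "inside", ...]
    event_stages: an ordered, pre-specified event definition.
    """
    stage = 0
    for rel in relationships:
        if rel == event_stages[stage]:
            stage += 1
            if stage == len(event_stages):
                return True   # all stages observed in order
    return False

# Hypothetical definition of "person getting into a vehicle".
GETTING_INTO_VEHICLE = ["approaching", "beside_door", "inside"]
```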


Advanced Video and Signal Based Surveillance | 2011

AVSS 2011 demo session: A large-scale benchmark dataset for event recognition in surveillance video

Sangmin Oh; Anthony Hoogs; A. G. Amitha Perera; Naresh P. Cuntoor; Chia-Chih Chen; Jong Taek Lee; Saurajit Mukherjee; Jake K. Aggarwal; Hyungtae Lee; Larry S. Davis; Eran Swears; Xiaoyang Wang; Qiang Ji; Kishore K. Reddy; Mubarak Shah; Carl Vondrick; Hamed Pirsiavash; Deva Ramanan; Jenny Yuen; Antonio Torralba; Bi Song; Anesco Fong; Amit K. Roy-Chowdhury; Mita Desai

Summary form only given. This demonstration presents the large-scale benchmark dataset for continuous visual event recognition (CVER) in surveillance video introduced above: continuously captured outdoor video of natural, non-acted activity, with large numbers of instances for 23 event types across 29 hours of footage, accompanied by detailed annotations of moving object tracks and event examples.

Collaboration


Dive into Chia-Chih Chen's collaborations.

Top Co-Authors

Jake K. Aggarwal
University of Texas at Austin

Jong Taek Lee
University of Texas at Austin

Michael S. Ryoo
University of Texas at Austin

Antonio Torralba
Massachusetts Institute of Technology

Bi Song
University of California