Publication


Featured research published by Youjie Zhou.


European Conference on Computer Vision | 2014

Pose Locality Constrained Representation for 3D Human Pose Reconstruction

Xiaochuan Fan; Kang Zheng; Youjie Zhou; Song Wang

Reconstructing 3D human poses from a single 2D image is an ill-posed problem unless a model of the human body is taken into account. Explicitly enforcing physiological constraints is known to be non-convex and usually leads to difficulty in finding an optimal solution. An attractive alternative is to learn a prior model of the human body from a set of human pose data. In this paper, we develop a new approach, namely pose locality constrained representation (PLCR), to model the 3D human body and use it to improve 3D human pose reconstruction. In this approach, the human pose space is first hierarchically divided into lower-dimensional pose subspaces by subspace clustering. After that, a block-structural pose dictionary is constructed by concatenating the basis poses from all the pose subspaces. Finally, PLCR utilizes the block-structural pose dictionary to explicitly encourage pose locality in human-body modeling: nonzero coefficients are only assigned to basis poses from a small number of pose subspaces that are close to each other in the pose-subspace hierarchy. We integrate PLCR into a matching-pursuit based 3D human-pose reconstruction algorithm and show that the proposed PLCR-based algorithm outperforms the state-of-the-art algorithm that uses the standard sparse representation and physiological regularity in reconstructing a variety of human poses from both synthetic data and real images.
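
To make the block-structural dictionary and the pose-locality constraint concrete, here is a minimal Python sketch of the idea, not the authors' implementation: it clusters training poses into subspaces, keeps a small PCA basis per subspace, and reconstructs a query by restricting coefficients to the few subspaces nearest the query. The data layout (flattened 3D joint coordinates) and all parameter values are illustrative assumptions; the actual method also uses a subspace hierarchy and a matching-pursuit solver driven by 2D projections.

```python
# Illustrative sketch (not the PLCR implementation from the paper).
# Assumes `train_poses` is an (N, 3*J) array of flattened 3D joint coordinates
# and that the query is itself a (noisy) 3D pose rather than a 2D projection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def build_block_dictionary(train_poses, n_subspaces=8, dims_per_block=5):
    """Divide the pose space into subspaces and keep a small basis per block."""
    km = KMeans(n_clusters=n_subspaces, n_init=10).fit(train_poses)
    blocks = []
    for c in range(n_subspaces):
        members = train_poses[km.labels_ == c]
        pca = PCA(n_components=min(dims_per_block, len(members))).fit(members)
        blocks.append((km.cluster_centers_[c], pca.mean_, pca.components_))
    return blocks

def reconstruct(query, blocks, n_active=2):
    """Pose locality: only the subspaces nearest the query get nonzero coefficients."""
    dists = [np.linalg.norm(query - center) for center, _, _ in blocks]
    active = np.argsort(dists)[:n_active]
    basis = np.vstack([blocks[i][2] for i in active])   # stacked basis poses
    mean = np.mean([blocks[i][1] for i in active], axis=0)
    coef, *_ = np.linalg.lstsq(basis.T, query - mean, rcond=None)
    return mean + basis.T @ coef
```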


IEEE Transactions on Image Processing | 2013

3D Materials Image Segmentation by 2D Propagation: A Graph-Cut Approach Considering Homomorphism

Jarrell W. Waggoner; Youjie Zhou; Jeff P. Simmons; Marc De Graef; Song Wang

Segmentation propagation, similar to tracking, is the problem of transferring a segmentation of an image to a neighboring image in a sequence. This problem is of particular importance to materials science, where the accurate segmentation of a series of 2D serial-sectioned images of multiple, contiguous 3D structures has important applications. Such structures may have distinct shape, appearance, and topology, which can be considered to improve segmentation accuracy. For example, some materials images may have structures with a specific shape or appearance in each serial section slice, which only changes minimally from slice to slice, and some materials may exhibit specific inter-structure topology that constrains their neighboring relations. Some of these properties have been individually incorporated to segment specific materials images in prior work. In this paper, we develop a propagation framework for materials image segmentation where each propagation is formulated as an optimal labeling problem that can be efficiently solved using the graph-cut algorithm. Our framework makes three key contributions: 1) a homomorphic propagation approach, which considers the consistency of region adjacency in the propagation; 2) incorporation of shape and appearance consistency in the propagation; and 3) a local non-homomorphism strategy to handle newly appearing and disappearing substructures during this propagation. To show the effectiveness of our framework, we conduct experiments on various 3D materials images, and compare the performance against several existing image segmentation methods.
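
As an illustration of the graph-cut labeling machinery described above, the following is a hedged sketch of a single propagation step reduced to the binary case, using the PyMaxflow library. The paper's framework is multi-label and additionally enforces homomorphism (region-adjacency consistency), shape/appearance consistency, and local non-homomorphism, none of which appear here; the grayscale-mean appearance model and the `smooth` weight are assumptions made for the example.

```python
# Binary simplification of one propagation step (the actual framework is
# multi-label and adds homomorphism/shape/appearance terms).  Needs PyMaxflow.
import numpy as np
import maxflow

def propagate_binary(prev_mask, next_slice, smooth=2.0):
    """prev_mask: boolean segmentation of slice k; next_slice: float grayscale
    image of slice k+1.  Returns a boolean foreground mask for slice k+1."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(next_slice.shape)
    g.add_grid_edges(nodes, smooth)             # pairwise smoothness term
    # Appearance model estimated inside/outside the previous slice's mask,
    # assuming neighbouring slices overlap well.
    fg_mean = next_slice[prev_mask].mean()
    bg_mean = next_slice[~prev_mask].mean()
    cost_fg = (next_slice - fg_mean) ** 2       # cost of labelling foreground
    cost_bg = (next_slice - bg_mean) ** 2       # cost of labelling background
    g.add_grid_tedges(nodes, cost_fg, cost_bg)  # terminal (unary) edges
    g.maxflow()
    return g.get_grid_segments(nodes)           # True = foreground (sink side)
```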


International Conference on Computer Vision | 2015

Co-Interest Person Detection from Multiple Wearable Camera Videos

Yuewei Lin; Kareem Abdelfatah; Youjie Zhou; Xiaochuan Fan; Hongkai Yu; Hui Qian; Song Wang

Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even if a group of people with similar appearance are present in the view. More specifically, we detect a set of persons in each frame as candidates for the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns across the different videos and high spatio-temporal consistency within each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm.
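
The following sketch illustrates only the underlying intuition of matching motion patterns across synchronized videos, not the paper's CRF model or its inference: it compares frame-to-frame displacement profiles of candidate tracks in two videos and returns the most correlated pair. The track representation and scoring are assumptions made for the example.

```python
# Simplified pairwise illustration (the paper uses a CRF over all videos).
# Each candidate is a track: an array of (x, y) image positions, shape (T, 2),
# with frames assumed temporally synchronized across videos.
import numpy as np

def motion_pattern(track):
    """Frame-to-frame displacement magnitude, z-normalized."""
    d = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return (d - d.mean()) / (d.std() + 1e-8)

def pick_cip(candidates_a, candidates_b):
    """Return the candidate index pair whose motion patterns correlate best."""
    best, best_pair = -np.inf, None
    for i, ta in enumerate(candidates_a):
        for j, tb in enumerate(candidates_b):
            score = np.dot(motion_pattern(ta), motion_pattern(tb)) / (len(ta) - 1)
            if score > best:
                best, best_pair = score, (i, j)
    return best_pair, best
```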


Computer Vision and Pattern Recognition | 2016

Groupwise Tracking of Crowded Similar-Appearance Targets from Low-Continuity Image Sequences

Hongkai Yu; Youjie Zhou; Jeff P. Simmons; Craig Przybyla; Yuewei Lin; Xiaochuan Fan; Yang Mi; Song Wang

Automatic tracking of large-scale crowded targets is of particular importance in many applications, such as crowded people/vehicle tracking in video surveillance, fiber tracking in materials science, and cell tracking in biomedical imaging. This problem becomes very challenging when the targets show similar appearance and the inter-slice/inter-frame continuity is low due to sparse sampling, camera motion, and target occlusion. The main challenge comes from the association step, which aims at matching the predictions and the observations of the multiple targets. In this paper we propose a new groupwise method to explore the target group information and employ the within-group correlations for association and tracking. In particular, the within-group association is modeled by a nonrigid 2D thin-plate transform, and a sequence of group shrinking, group growing, and group merging operations is then developed to refine the composition of each group. We apply the proposed method to track large-scale fibers from microscopy material images and compare its performance against several other multi-target tracking methods. We also apply the proposed method to track crowded people in videos with poor inter-frame continuity.
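
Below is a minimal sketch, under assumed interfaces, of the within-group association idea: fit a nonrigid thin-plate transform from a few confident anchor matches, warp the remaining group members, and associate them to detections by minimum-cost assignment. It omits the group shrinking, growing, and merging operations, and it uses SciPy's thin-plate-spline RBF interpolator as a stand-in for the paper's transform estimation.

```python
# Minimal sketch of groupwise association (not the full method).
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import linear_sum_assignment

def associate_group(prev_pts, curr_pts, anchor_prev, anchor_curr):
    """prev_pts/curr_pts: (N, 2)/(M, 2) target positions in consecutive frames;
    anchor_prev/anchor_curr: (K, 2) already-matched pairs (K >= 3, not collinear)
    used to fit the nonrigid 2D thin-plate transform."""
    tps = RBFInterpolator(anchor_prev, anchor_curr, kernel='thin_plate_spline')
    warped = tps(prev_pts)                       # predicted positions, (N, 2)
    cost = np.linalg.norm(warped[:, None, :] - curr_pts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)     # Hungarian matching
    return list(zip(rows, cols))                 # index pairs (prev, curr)
```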


IEEE Transactions on Image Processing | 2015

Multiscale Superpixels and Supervoxels Based on Hierarchical Edge-Weighted Centroidal Voronoi Tessellation

Youjie Zhou; Lili Ju; Song Wang

Superpixels and supervoxels play an important role in many computer vision applications, such as image segmentation, object recognition, and video analysis. In this paper, we propose a new hierarchical edge-weighted centroidal Voronoi tessellation (HEWCVT) method for generating superpixels/supervoxels at multiple scales. In this method, we model the problem as a multilevel clustering process: superpixels/supervoxels at one level are clustered to obtain larger superpixels/supervoxels at the next level. At the finest scale, the initial clustering is conducted directly on pixels/voxels. The clustering energy involves both color similarities and boundary smoothness of superpixels/supervoxels. The resulting superpixels/supervoxels can be naturally represented by a hierarchical tree that describes their nesting relations across different scales. We first investigate the performance of the obtained superpixels/supervoxels under different parameter settings, then evaluate and compare the proposed method with several state-of-the-art superpixel/supervoxel methods on standard image and video datasets. Both quantitative and qualitative results show that the proposed HEWCVT method achieves superior or comparable performance relative to other methods.
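
The sketch below shows only the finest clustering level in a simplified form: a k-means style clustering of pixels in joint color/position space. This is not the actual HEWCVT energy (the edge-weighted boundary-smoothness term and the higher levels built by re-clustering superpixels are omitted), and the position weight and cluster count are illustrative assumptions.

```python
# Sketch of the finest clustering level only (a simplification of HEWCVT).
import numpy as np
from sklearn.cluster import KMeans

def one_level_superpixels(image, n_clusters=200, pos_weight=0.5):
    """image: (H, W, 3) float array in [0, 1].  Returns an (H, W) label map."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalize positions so colour and position live on comparable scales.
    position = np.stack([ys, xs], axis=-1).reshape(-1, 2) / max(h, w)
    feats = np.hstack([image.reshape(-1, 3), pos_weight * position])
    labels = KMeans(n_clusters=n_clusters, n_init=4).fit_predict(feats)
    return labels.reshape(h, w)
```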


Machine Vision and Applications | 2014

Graph-cut based interactive segmentation of 3D materials-science images

Jarrell W. Waggoner; Youjie Zhou; Jeff P. Simmons; Marc De Graef; Song Wang

Segmenting materials images is a laborious and time-consuming process, and automatic image segmentation algorithms usually contain imperfections and errors. Interactive segmentation is a growing topic in image processing and computer vision that seeks a balance between fully automatic methods and fully manual segmentation processes. By allowing minimal and simple interaction from the user in an otherwise automatic algorithm, interactive segmentation can reduce the time taken to segment an image while achieving better segmentation results. Given the specialized structure of materials images and the level of segmentation quality required, we present an interactive segmentation framework for materials images that makes three key contributions: (1) a multi-labeling approach that can handle a large number of structures while still quickly and conveniently allowing manual addition and removal of segments in real time, (2) multiple extensions to the interactive tools that increase the simplicity of the interaction, and (3) a web interface for using the interactive tools in a client/server architecture. We give a full formulation of each of these contributions and show example results from their application.
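
As a small illustration of how user input can drive a multi-label energy, the sketch below pins the unary costs of scribbled pixels so that the chosen label becomes mandatory there. This is a generic device for interactive labeling, not the paper's implementation or its web interface; the data layout is an assumption for the example.

```python
# Fold user scribbles into a per-pixel multi-label cost volume (illustrative).
import numpy as np

def apply_scribbles(unary, scribbles, big=1e9):
    """unary: (H, W, L) per-pixel label costs; scribbles: list of
    (row_indices, col_indices, label) triples drawn by the user."""
    unary = unary.copy()
    for rows, cols, label in scribbles:
        unary[rows, cols, :] = big      # forbid all labels...
        unary[rows, cols, label] = 0.0  # ...except the one the user chose
    return unary
```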


European Conference on Computer Vision | 2014

Video-Based Action Detection Using Multiple Wearable Cameras

Kang Zheng; Yuewei Lin; Youjie Zhou; Dhaval Salvi; Xiaochuan Fan; Dazhou Guo; Zibo Meng; Song Wang

This paper develops a new approach for video-based action detection in which a set of temporally synchronized videos is taken by multiple wearable cameras from different and varying views, and the goal is to accurately localize the starting and ending time of each instance of the actions of interest in such videos. Compared with traditional approaches based on fixed-camera videos, this new approach incorporates the visual attention of the camera wearers and allows action detection over a larger area, although it brings new challenges such as unconstrained camera motion. In this approach, we leverage the multi-view information and the temporal synchronization of the input videos for more reliable action detection. Specifically, we detect and track the focal character in each video and conduct action recognition only for the focal character in each temporal sliding window. To more accurately localize the starting and ending time of actions, we develop a strategy that may merge temporally adjacent sliding windows when detecting durative actions, and non-maximally suppress temporally adjacent sliding windows when detecting momentary actions. Finally, we propose a voting scheme to integrate the detection results from multiple videos for more accurate action detection. For the experiments, we collect a new dataset of multiple wearable-camera videos that reflects complex scenarios in practice.
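
The post-processing steps described above (merging temporally adjacent windows for durative actions, temporal non-maximum suppression for momentary actions, and cross-video voting) can be sketched as follows; the window representation and interfaces are assumptions made for illustration, not the authors' code.

```python
# Illustrative post-processing for sliding-window action detection.
from collections import Counter

def merge_windows(windows):
    """windows: list of (start, end, score).  Merge temporally overlapping ones."""
    merged = []
    for s, e, sc in sorted(windows):
        if merged and s <= merged[-1][1]:
            ps, pe, psc = merged[-1]
            merged[-1] = (ps, max(pe, e), max(psc, sc))
        else:
            merged.append((s, e, sc))
    return merged

def temporal_nms(windows):
    """Keep only the highest-scoring window within each overlapping run."""
    kept = []
    for s, e, sc in sorted(windows, key=lambda w: -w[2]):
        if all(e <= ks or s >= ke for ks, ke, _ in kept):
            kept.append((s, e, sc))
    return kept

def vote(per_video_labels):
    """per_video_labels: action labels detected for the same time window."""
    return Counter(per_video_labels).most_common(1)[0][0]
```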


IEEE Transactions on Systems, Man, and Cybernetics | 2017

Cross-Domain Recognition by Identifying Joint Subspaces of Source Domain and Target Domain

Yuewei Lin; Jing Chen; Yu Cao; Youjie Zhou; Lingfeng Zhang; Yuan Yan Tang; Song Wang

This paper introduces a new method to solve the cross-domain recognition problem. Different from traditional domain adaptation methods, which rely on a global domain shift for all classes between the source and target domains, the proposed method is more flexible and captures individual class variations across domains. By adopting the natural and widely used assumption that data samples from the same class should lie on an intrinsic low-dimensional subspace, even if they come from different domains, the proposed method circumvents the limitation of the global domain shift and solves cross-domain recognition by finding the joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct a subspace for each class. We then construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to belong to the same class. The corresponding class label is then assigned by minimizing a cost function that reflects the overlap and topological-structure consistency between subspaces across the source and target domains and within the anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the joint subspaces. Subsequently, one-versus-rest support vector machine classifiers are trained using the data samples belonging to the same joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: 1) an object recognition dataset for computer vision tasks and 2) a sentiment classification dataset for natural language processing tasks. Comparison results demonstrate that the proposed method outperforms the comparison methods on both datasets.
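
A hedged sketch of the per-class subspace idea follows: each source class gets a low-dimensional PCA subspace, and an unlabeled target sample is scored by its reconstruction residual against each subspace. This omits the anchor-subspace construction, the topology-consistency cost, and the final one-versus-rest SVMs; the subspace dimension is an illustrative assumption.

```python
# Simplified per-class subspace scoring (not the full joint-subspace method).
import numpy as np
from sklearn.decomposition import PCA

def fit_class_subspaces(X_src, y_src, dim=10):
    """Fit one low-dimensional PCA subspace per labeled source class."""
    subspaces = {}
    for c in np.unique(y_src):
        Xc = X_src[y_src == c]
        subspaces[c] = PCA(n_components=min(dim, len(Xc) - 1)).fit(Xc)
    return subspaces

def residual_scores(x, subspaces):
    """Distance from sample x to each class subspace (smaller = closer)."""
    scores = {}
    for c, pca in subspaces.items():
        recon = pca.inverse_transform(pca.transform(x[None, :]))[0]
        scores[c] = np.linalg.norm(x - recon)
    return scores
```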


IEEE Transactions on Circuits and Systems for Video Technology | 2017

Visual-Attention-Based Background Modeling for Detecting Infrequently Moving Objects

Yuewei Lin; Yan Tong; Yu Cao; Youjie Zhou; Song Wang

Motion is one of the most important cues to separate foreground objects from the background in a video. Using a stationary camera, it is usually assumed that the background is static, while the foreground objects are moving most of the time. However, in practice, the foreground objects may show infrequent motions, such as abandoned objects and sleeping persons. Meanwhile, the background may contain frequent local motions, such as waving trees and/or grass. Such complexities may prevent the existing background subtraction algorithms from correctly identifying the foreground objects. In this paper, we propose a new approach that can detect the foreground objects with frequent and/or infrequent motions. Specifically, we use a visual-attention mechanism to infer a complete background from a subset of frames and then propagate it to the other frames for accurate background subtraction. Furthermore, we develop a feature-matching-based local motion stabilization algorithm to identify frequent local motions in the background for reducing false positives in the detected foreground. The proposed approach is fully unsupervised, without using any supervised learning for object detection and tracking. Extensive experiments on a large number of videos have demonstrated that the proposed approach outperforms the state-of-the-art motion detection and background subtraction methods in comparison.
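
For orientation only, the sketch below shows the overall shape of the pipeline with a plain temporal median in place of the paper's visual-attention-based background inference, and without the feature-matching-based local motion stabilization; the frame layout and threshold are assumptions.

```python
# Toy background model: median over a chosen subset of frames, propagated to
# all frames (a stand-in for the visual-attention-based inference in the paper).
import numpy as np

def detect_foreground(frames, bg_frame_ids, threshold=30.0):
    """frames: (T, H, W) grayscale video; bg_frame_ids: indices of the frames
    used to build the background model.  Returns (T, H, W) boolean masks."""
    frames = frames.astype(np.float32)           # avoid uint8 wraparound
    background = np.median(frames[bg_frame_ids], axis=0)
    return np.abs(frames - background) > threshold
```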


Workshop on Applications of Computer Vision | 2015

Topology-Preserving Multi-label Image Segmentation

Jarrell W. Waggoner; Youjie Zhou; Jeff P. Simmons; Marc De Graef; Song Wang

Enforcing a specific topology in image segmentation is an important but challenging problem that has attracted much attention in the computer vision community. Most recent works on topology-constrained image segmentation focus on binary segmentation, where the topology is often described by the connectivity of both foreground and background. In this paper, we develop a new multi-labeling method to enforce topology in multi-label image segmentation. In this case, we not only require each segment to be a connected region (intra-segment topology), but also require specific adjacency relations between each pair of segments (inter-segment topology). We develop our method in the context of segmentation propagation, where a segmented template image defines the topology and our goal is to propagate the segmentation to a target image while preserving that topology. Our method requires good spatial-structure continuity between the template and the target so that the template segmentation can serve as a good initialization for segmenting the target. In addition, we focus on multi-label segmentation where a segment and its adjacent segments form a ring structure, which is among the most complex types of inter-segment topology for 2D structures. We apply the proposed method to segment 3D metallic image volumes for the underlying grain structures and achieve better results than several comparison methods. Finally, we also apply the proposed method to interactive segmentation and stereo matching applications.
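
The inter-segment topology discussed above can be represented as a region adjacency graph; the sketch below extracts that graph from a label map and checks whether a propagated segmentation preserves the template's adjacencies. This is an illustrative check, not the paper's topology-preserving labeling algorithm.

```python
# Region adjacency graph of a label map, and a simple topology-preservation check.
import numpy as np

def adjacency_graph(labels):
    """labels: (H, W) integer segment map.  Returns the set of adjacent label pairs."""
    pairs = set()
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        pairs.update((int(x), int(y)) for x, y in zip(a[diff], b[diff]))
    return {tuple(sorted(p)) for p in pairs}

def preserves_topology(template_labels, target_labels):
    """True if the target keeps exactly the template's inter-segment adjacencies."""
    return adjacency_graph(template_labels) == adjacency_graph(target_labels)
```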

Collaboration


Top co-authors of Youjie Zhou:

Song Wang, University of South Carolina
Jeff P. Simmons, Air Force Research Laboratory
Yuewei Lin, University of South Carolina
Hongkai Yu, University of South Carolina
Yu Cao, University of South Carolina
Jarrell W. Waggoner, University of South Carolina
Lili Ju, University of South Carolina
Marc De Graef, Carnegie Mellon University
Xiaochuan Fan, University of South Carolina
Craig Przybyla, Air Force Research Laboratory