Publications


Featured research published by Steven Zhiying Zhou.


IEEE Virtual Reality Conference | 2010

A real-time multi-cue hand tracking algorithm based on computer vision

Zhigeng Pan; Yang Li; Mingmin Zhang; Chao Sun; Kangde Guo; Xing Tang; Steven Zhiying Zhou

Although hand tracking algorithms have been widely used in virtual reality and HCI systems, hand tracking remains a challenging problem in vision-based research. Due to the robustness and real-time requirements of VR applications, most hand tracking algorithms require special devices to achieve satisfactory results. In this paper, we propose an easy-to-use and inexpensive approach to track the hands accurately with a single ordinary webcam. An outstretched hand is detected by contour- and curvature-based detection techniques to initialize the tracking region. Robust multi-cue hand tracking is then achieved with velocity-weighted features and a color cue. Experiments show that the proposed multi-cue hand tracking approach achieves continuous real-time results even against cluttered backgrounds. The approach fulfills the speed and accuracy requirements of frontal-view vision-based human-computer interaction.
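
The detection stage lends itself to a compact OpenCV sketch. The following is a minimal illustration, not the authors' implementation: skin color is gated in YCrCb (the thresholds are assumptions), the largest contour is taken as the hand, and convexity defects stand in for the curvature analysis that flags an outstretched hand.

```python
# Minimal sketch of contour- and curvature-based outstretched-hand detection,
# assuming a single webcam frame in BGR; not the paper's exact pipeline.
import cv2
import numpy as np

def detect_outstretched_hand(frame_bgr, min_fingers=4):
    # Coarse skin segmentation in YCrCb space (thresholds are assumptions).
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)

    # Convexity defects approximate the high-curvature valleys between fingers.
    hull = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull)
    if defects is None:
        return None
    deep_valleys = sum(1 for i in range(defects.shape[0])
                       if defects[i, 0, 3] / 256.0 > 20)  # depth threshold assumed

    # An outstretched hand shows several deep inter-finger valleys;
    # its bounding box then initializes the tracking region.
    if deep_valleys >= min_fingers - 1:
        return cv2.boundingRect(hand)
    return None
```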


Computer Vision and Pattern Recognition | 2013

Image Matting with Local and Nonlocal Smooth Priors

Xiaowu Chen; Dongqing Zou; Steven Zhiying Zhou; Qinping Zhao; Ping Tan

In this paper we propose a novel alpha matting method with local and nonlocal smooth priors. We observe that manifold-preserving edit propagation [4] essentially introduces a nonlocal smooth prior on the alpha matte. This nonlocal smooth prior and the well-known local smooth prior from the matting Laplacian complement each other, so we combine them with a simple data term from color sampling in a graph model for natural image matting. Our method has a closed-form solution and can be solved efficiently. Compared with state-of-the-art methods, our method produces more accurate results according to the evaluation on standard benchmark datasets.
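
The closed-form solution can be illustrated with a short sparse linear solve. In this sketch, the local matting Laplacian L_local, the nonlocal smoothness matrix L_nonlocal, and the color-sampled estimate alpha_hat are assumed to be precomputed (building them is the hard part); lam and gamma are illustrative weights, not the paper's values.

```python
# Sketch of combining a local and a nonlocal smooth prior with a data term:
# solve (L_local + lam*L_nonlocal + gamma*D) alpha = gamma*D*alpha_hat.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_matte(L_local, L_nonlocal, alpha_hat, confidence, lam=1.0, gamma=0.1):
    """L_local, L_nonlocal: (n, n) sparse Laplacians.
    alpha_hat : (n,) alpha estimate from color sampling.
    confidence: (n,) per-pixel confidence of alpha_hat (data-term weights)."""
    D = sp.diags(confidence)                      # diagonal data-term weights
    A = L_local + lam * L_nonlocal + gamma * D    # sparse, SPD if inputs are
    b = gamma * (D @ alpha_hat)
    alpha = spla.spsolve(A.tocsc(), b)            # one sparse linear solve
    return np.clip(alpha, 0.0, 1.0)
```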


International Symposium on Mixed and Augmented Reality | 2010

Positioning, tracking and mapping for outdoor augmentation

Jayashree Karlekar; Steven Zhiying Zhou; Weiquan Lu; Zhi Chang Loh; Yuta Nakayama; Daniel Hii

This paper presents a novel approach for user positioning, robust tracking, and online 3D mapping for outdoor augmented reality applications. As the coarse user pose obtained from GPS and orientation sensors is not sufficient for augmented reality applications, a sub-meter-accurate user pose is estimated by a one-step silhouette matching approach. Silhouette matching between the rendered 3D model and the camera data is carried out with shape context descriptors, as they are invariant to translation, scale, and rotational errors, giving rise to a non-iterative registration approach. Once the user is correctly positioned, further tracking is carried out with camera data alone. Drift associated with vision-based approaches is minimized by combining different feature modalities. Robust visual tracking is maintained by fusing frame-to-frame and model-to-frame feature matches: frame-to-frame tracking is accomplished with corner matching, while edges are used for model-to-frame registration. Results from the individual feature trackers are fused using a pose estimate obtained from an extended Kalman filter (EKF) and a weighted M-estimator. In scenarios where dense 3D models of the environment are not available, online incremental 3D mapping and tracking is proposed to track the user in unprepared environments. Incremental mapping prepares the 3D point cloud of the outdoor environment for tracking.
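
As a rough illustration of the fusion step, the sketch below combines the two pose measurements around an EKF prediction using iteratively reweighted least squares with a Huber weight. The 6-vector pose parameterization and all thresholds are assumptions; the paper's actual EKF and M-estimator details are not reproduced here.

```python
# Hedged sketch of fusing frame-to-frame and model-to-frame pose estimates
# around an EKF prediction with a Huber M-estimator (IRLS).
import numpy as np

def huber_weight(r, delta=1.0):
    # Downweight residuals beyond the Huber threshold.
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / a)

def fuse_poses(pose_pred, pose_f2f, pose_m2f, delta=0.05, iters=10):
    """pose_*: 6-vectors (3 rotation, 3 translation) -- a simplification."""
    measurements = np.stack([pose_f2f, pose_m2f])      # (2, 6)
    fused = pose_pred.copy()                           # start at EKF prediction
    for _ in range(iters):
        residuals = measurements - fused               # (2, 6)
        w = huber_weight(residuals, delta)             # per-component weights
        fused = (w * measurements).sum(0) / w.sum(0)   # weighted mean
    return fused
```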


International Conference on Computer Vision | 2013

Perspective Motion Segmentation via Collaborative Clustering

Zhuwen Li; Jiaming Guo; Loong Fah Cheong; Steven Zhiying Zhou

This paper addresses real-world challenges in the motion segmentation problem, including perspective effects, missing data, and an unknown number of motions. It first formulates 3-D motion segmentation from two perspective views as a subspace clustering problem, utilizing the epipolar constraint of an image pair. It then combines the point correspondence information across multiple image frames via a collaborative clustering step, in which tight integration is achieved by a mixed-norm optimization scheme. For model selection, we propose an over-segment-and-merge approach, where the merging step is based on a property of the ℓ1-norm of the mutual sparse representation of two over-segmented groups. The resulting algorithm deals with incomplete trajectories and perspective effects substantially better than state-of-the-art two-frame and multi-frame methods. Experiments on a 62-clip dataset show the significant superiority of the proposed idea in both segmentation accuracy and model selection.
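
The merging test can be sketched as follows: if the trajectories of one over-segmented group admit a low-cost (small ℓ1-norm) sparse representation over the other group, the two groups likely lie in the same subspace and should be merged. The lasso solver and the threshold tau below are stand-ins, not the paper's optimization scheme.

```python
# Illustrative sketch of the l1-based merge test between two over-segmented
# groups, with columns of X1, X2 holding trajectory feature vectors.
import numpy as np
from sklearn.linear_model import Lasso

def mutual_l1_cost(X1, X2, alpha=0.01):
    """Average l1-norm of sparse codes of X1's columns over dictionary X2."""
    costs = []
    for x in X1.T:
        coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        coder.fit(X2, x)
        costs.append(np.abs(coder.coef_).sum())
    return np.mean(costs)

def should_merge(X1, X2, tau=1.0):
    # Symmetric test: cheap mutual representation suggests a shared subspace.
    return 0.5 * (mutual_l1_cost(X1, X2) + mutual_l1_cost(X2, X1)) < tau
```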


International Conference on Computer Vision | 2013

Video Co-segmentation for Meaningful Action Extraction

Jiaming Guo; Zhuwen Li; Loong Fah Cheong; Steven Zhiying Zhou

Given a pair of videos containing a common action, our goal is to simultaneously segment the videos to extract this common action. As a preprocessing step, we first remove background trajectories by motion-based figure-ground segmentation. To remove the remaining background and extraneous actions, we propose a trajectory co-saliency measure, which captures the notion that trajectories recurring in all the videos should have their mutual saliency boosted. This requires a trajectory matching process that can compare trajectories of different lengths that are not necessarily spatiotemporally aligned, yet remains discriminative despite significant intra-class variation in the common action. We further leverage graph matching to enforce geometric coherence between regions, reducing feature ambiguity and matching errors. Finally, to classify the trajectories into common action and action outliers, we formulate the problem as binary labeling of a Markov random field, in which the data term is measured by the trajectory co-saliency and the smoothness term by the spatiotemporal consistency between trajectories. To evaluate the performance of our framework, we introduce a dataset containing clips of both animal and human actions. Experimental results show that the proposed method performs well in common action extraction.
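
The final labeling step can be sketched as energy minimization over trajectories, with unary costs from co-saliency and pairwise costs from spatiotemporal consistency. The sketch below uses iterated conditional modes (ICM) as a simple stand-in optimizer where exact inference would use graph cuts; the neighbor lists, weights, and lam are assumed inputs.

```python
# Minimal sketch of the binary MRF labeling: common action (1) vs outlier (0).
import numpy as np

def label_trajectories(co_saliency, neighbors, weights, lam=0.5, iters=20):
    """co_saliency: (n,) in [0,1]; neighbors/weights: per-node adjacency lists."""
    labels = (co_saliency > 0.5).astype(int)             # threshold init
    unary = np.stack([co_saliency, 1.0 - co_saliency])   # cost of label 0 / 1
    for _ in range(iters):
        changed = False
        for i in range(len(labels)):
            # Pairwise cost: penalize disagreeing with consistent neighbors.
            smooth = np.array([
                sum(w for j, w in zip(neighbors[i], weights[i]) if labels[j] != lab)
                for lab in (0, 1)
            ])
            best = int(np.argmin(unary[:, i] + lam * smooth))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels
```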


Computer Vision and Pattern Recognition | 2015

Simultaneous video defogging and stereo reconstruction

Zhuwen Li; Ping Tan; Robby T. Tan; Danping Zou; Steven Zhiying Zhou; Loong Fah Cheong

We present a method to jointly estimate scene depth and recover the clear latent image from a foggy video sequence. In our formulation, the depth cues from stereo matching and the fog information reinforce each other, producing superior results to conventional stereo or defogging algorithms. We first improve the photo-consistency term to explicitly model the appearance change due to scattering effects. A matting Laplacian prior on fog transmission imposes a detail-preserving smoothness constraint on the scene depth. We further enforce ordering consistency between scene depth and fog transmission at neighboring points. These novel constraints are formulated together in an MRF framework, which is optimized iteratively by introducing auxiliary variables. Experimental results on real videos demonstrate the strength of our method.
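
The scattering-aware photo-consistency idea can be illustrated with the standard fog model I = J·t + A·(1 − t), where t = exp(−β·d) is the transmission at depth d and A is the airlight. In this sketch β and A are assumed known, whereas the paper estimates everything jointly within the MRF.

```python
# Sketch: under fog, two views of the same surface point should agree on the
# recovered latent radiance J, not on the raw foggy intensities.
import numpy as np

def latent_radiance(intensity, depth, airlight, beta):
    t = np.exp(-beta * depth)        # transmission from scene depth
    t = np.maximum(t, 1e-3)          # guard against division blow-up
    return (intensity - airlight * (1.0 - t)) / t

def photo_consistency(i_left, d_left, i_right, d_right, airlight, beta):
    # Lower is better: compare defogged radiances of matched pixels.
    j_left = latent_radiance(i_left, d_left, airlight, beta)
    j_right = latent_radiance(i_right, d_right, airlight, beta)
    return np.abs(j_left - j_right)
```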


International Symposium on Mixed and Augmented Reality | 2013

Diminished reality using appearance and 3D geometry of internet photo collections

Zhuwen Li; Yuxi Wang; Jiaming Guo; Loong Fah Cheong; Steven Zhiying Zhou

This paper presents a new system-level framework for diminished reality, leveraging for the first time both the appearance and the 3D information provided by large photo collections on the Internet. Recent computer vision techniques have made it possible to automatically reconstruct 3D structure-from-motion point clouds from large, unordered photo collections. Using these point clouds and a prior provided by GPS, reasonably accurate six-degree-of-freedom camera poses can be obtained, allowing localization. Once the camera (and hence the user) is correctly localized, photos depicting scenes visible from the user's viewpoint can be used to remove unwanted objects indicated by the user in the video sequence. Existing methods based on texture synthesis produce undesirable artifacts and video inconsistency when the background is heterogeneous, and the task becomes even harder for these methods when the background contains complex structures. Methods based on plane warping, on the other hand, fail when the background has arbitrary shape. Unlike these methods, our algorithm copes with these problems by making use of Internet photos, registering them in 3D space and obtaining the 3D scene structure in an offline process. We carefully design the various components of the online phase to meet both the speed and quality requirements of the task. Experiments on real collected data demonstrate the superiority of our system.
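
One core online step, rendering a registered photo into the user's view, can be approximated with the plane-induced homography H = K_u (R − t·nᵀ/d) K_s⁻¹. This is a deliberate simplification: real backgrounds are rarely a single plane, and the paper relies on the full offline 3D registration instead.

```python
# Hedged sketch: warp a 3D-registered internet photo into the user's current
# view via a plane-induced homography, then paste it over the masked object.
import cv2
import numpy as np

def warp_source_to_user(src_img, K_src, K_user, R, t, n, d, user_shape):
    """R, t: relative pose source->user; n, d: scene plane in the source frame."""
    H = K_user @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_src)
    return cv2.warpPerspective(src_img, H, (user_shape[1], user_shape[0]))

def diminish(user_img, mask, warped_src):
    # Replace the unwanted-object pixels with the warped background photo.
    out = user_img.copy()
    out[mask > 0] = warped_src[mask > 0]
    return out
```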


International Symposium on Mixed and Augmented Reality | 2010

MTMR: A conceptual interior design framework integrating Mixed Reality with the Multi-Touch tabletop interface

Dong Wei; Steven Zhiying Zhou; Du Xie

This paper introduces a conceptual interior design framework, Multi-Touch Mixed Reality (MTMR), which integrates mixed reality with a multi-touch tabletop interface to provide both an intuitive, efficient interface for collaborative design and an augmented 3D view to users at the same time. Under this framework, multiple designers can carry out design work simultaneously on the top view displayed on the tabletop, while live video of the ongoing design work is captured, augmented by overlaying virtual 3D furniture models on their 2D virtual counterparts, and shown on a vertical screen in front of the tabletop. Meanwhile, the remote client's camera view of the physical room is augmented with the interior design layout in real time; that is, as the designers place, move, and modify the virtual furniture models on the tabletop, the client sees the corresponding life-size 3D virtual furniture models residing, moving, and changing in the physical room through the camera view on his or her screen. By adopting MTMR, which we argue may also apply to other kinds of collaborative work, the designers can expect a good working experience in terms of naturalness and intuitiveness, while the client can be involved in the design process and view the design result without moving heavy furniture around. By presenting MTMR, we hope to bring reliable and precise freehand interaction to mixed reality systems via multi-touch input on tabletop interfaces.


Computer Vision and Pattern Recognition | 2014

SCAMS: Simultaneous Clustering and Model Selection

Zhuwen Li; Loong Fah Cheong; Steven Zhiying Zhou

While clustering has been well studied in the past decade, model selection has drawn less attention. This paper addresses both problems jointly with an indicator matrix formulation, in which the clustering cost is penalized by a Frobenius inner product term and the group number is estimated by rank minimization. As affinity graphs generally contain positive edge values, a sparsity term is further added to avoid the trivial solution. Rather than adopting the conventional convex relaxation approach wholesale, we represent the original problem more faithfully by taking full advantage of the particular structure of the optimization problem, and solve it efficiently using the Alternating Direction Method of Multipliers. The highly constrained nature of the optimization gives our algorithm the robustness to handle the varying and often imperfect input affinity matrices arising from different applications and different group numbers. Evaluations on synthetic data as well as two real-world problems show the superiority of the method across a large variety of settings.
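
The ADMM machinery can be illustrated on a fully convex stand-in of the objective, with the rank term relaxed to a nuclear norm; the paper itself works with a tighter, non-relaxed structure. The splitting below alternates a soft-thresholding step (linear cost plus sparsity) with singular value thresholding, and the weights alpha, beta, rho are illustrative.

```python
# Simplified ADMM sketch of a relaxed SCAMS-style objective:
#   min_Z  <C, Z> + beta*||Z||_1 + alpha*||Z||_*
# via the splitting Z = W.
import numpy as np

def soft(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def scams_relaxed(C, alpha=1.0, beta=0.1, rho=1.0, iters=200):
    n = C.shape[0]
    Z = W = U = np.zeros((n, n))
    for _ in range(iters):
        Z = soft(W - U - C / rho, beta / rho)   # linear + l1 proximal step
        W = svt(Z + U, alpha / rho)             # nuclear-norm proximal step
        U = U + Z - W                           # dual ascent on Z = W
    # The group count is read off as the numerical rank of Z.
    return Z, np.linalg.matrix_rank(Z, tol=1e-3)
```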


International Symposium on Mixed and Augmented Reality | 2009

Consistent real-time lighting for virtual objects in augmented reality

Ryan Christopher Yeoh; Steven Zhiying Zhou

We present a technique for rendering realistic shadows of virtual objects in a mixed reality environment by recovering the light source distribution of a scene in real time, through the segmentation and analysis of a known occluding object's shadows. A fiducial marker provides information about the position of the occluding object and the plane of the surface on which shadows are cast, and serves as the origin of a marker coordinate system. A new shadow segmentation approach is carried out on the shadow image and is able to recover geometric information on multiple faint shadows. Using normalized iterative reinforcement, noise and artifacts can be suppressed in the final shadow map. The scene's light source distribution is then extrapolated using geometric data from both the occluding object and its cast shadows. Virtual light sources in a game engine mimic the real light sources to achieve consistent illumination and increase the realism of the augmented reality scene.
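
The extrapolation step has a simple geometric core: in marker coordinates, each occluder point paired with its segmented shadow point yields one direction toward the light. The sketch below assumes a distant directional light, and the example coordinates are made up for illustration.

```python
# Minimal sketch of back-projecting light directions from occluder-shadow
# pairs expressed in the marker coordinate system (ground plane z = 0).
import numpy as np

def light_direction(occluder_tip, shadow_point):
    """Both points are 3-vectors in marker coordinates; the shadow has z ~ 0."""
    d = np.asarray(occluder_tip, float) - np.asarray(shadow_point, float)
    return d / np.linalg.norm(d)     # unit vector pointing toward the light

# One direction per segmented shadow blob; multiple faint shadows give
# multiple virtual lights for the game engine to mimic.
tips = [np.array([0.0, 0.0, 0.10])]       # 10 cm occluder tip (assumed)
shadows = [np.array([0.06, 0.02, 0.0])]   # detected shadow endpoint (assumed)
lights = [light_direction(t, s) for t, s in zip(tips, shadows)]
```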

Collaboration


Dive into Steven Zhiying Zhou's collaborations.

Top Co-Authors

All co-authors below are affiliated with the National University of Singapore.

Loong Fah Cheong
Zhuwen Li
Jiaming Guo
Yuta Nakayama
Daniel Hii
Dong Wei
Jayashree Karlekar
Weiquan Lu
Cong Ning
Stefan Winkler