Network


Zhou Ren's latest external collaborations at the country level.

Hotspot


Dive into the research topics where Zhou Ren is active.

Publication


Featured research published by Zhou Ren.


IEEE Transactions on Multimedia | 2013

Robust Part-Based Hand Gesture Recognition Using Kinect Sensor

Zhou Ren; Junsong Yuan; Jingjing Meng; Zhengyou Zhang

Recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition, and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, which makes hand gesture recognition very challenging. This paper focuses on building a robust part-based hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, the Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. As it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (0.0750 s per frame on average), robust to hand articulations, distortions, and orientation or scale changes, and able to work in uncontrolled environments (cluttered backgrounds and varying lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.
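
As a concrete illustration of the matching idea, here is a minimal sketch of an EMD-style dissimilarity between finger signatures, solved as a transportation linear program with SciPy. The signature format, the unmatched-mass penalty, and all names are assumptions for illustration, not the paper's exact FEMD definition.

```python
# Hedged sketch: EMD between two "finger signatures", each a set of finger
# masses plus a ground-distance matrix between finger pairs. Not the exact
# FEMD of the paper, which uses finger-specific costs and penalties.
import numpy as np
from scipy.optimize import linprog

def emd(wa, wb, cost):
    """Earth Mover's Distance via a transportation LP.
    wa: (m,) finger masses of hand A; wb: (n,) masses of hand B;
    cost: (m, n) ground distances between finger pairs."""
    m, n = cost.shape
    c = cost.ravel()                       # flow variables f_ij, row-major
    A_ub, b_ub = [], []
    for i in range(m):                     # supply: sum_j f_ij <= wa[i]
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_ub.append(row); b_ub.append(wa[i])
    for j in range(n):                     # demand: sum_i f_ij <= wb[j]
        col = np.zeros(m * n); col[j::n] = 1
        A_ub.append(col); b_ub.append(wb[j])
    total = min(wa.sum(), wb.sum())        # move all of the smaller mass
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.ones((1, m * n)), b_eq=[total])
    return res.fun / total

def femd_like(wa, wb, cost, alpha=0.5):
    # Penalize unmatched mass (e.g., a missing finger), as FEMD-style
    # metrics do, so hands with different finger counts stay separable.
    return emd(wa, wb, cost) + alpha * abs(wa.sum() - wb.sum())
```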


ACM Multimedia | 2011

Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera

Zhou Ren; Junsong Yuan; Zhengyou Zhang

Recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking and body gesture recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, which makes hand gesture recognition very challenging. This paper focuses on building a robust hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric for measuring hand dissimilarity, called the Finger-Earth Mover's Distance (FEMD). As it matches only the fingers rather than the whole hand shape, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate the accuracy, efficiency, and robustness of our hand gesture recognition system.
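
To make the finger-matching setup concrete, here is a minimal, hypothetical sketch of one way to extract finger segments from a hand contour before FEMD matching, using a distance-to-palm-center curve with a simple threshold; the papers describe more robust decompositions.

```python
# Hedged sketch: treat contour points far from the palm center as fingers.
# The threshold and the contour representation are illustrative assumptions.
import numpy as np

def finger_segments(contour, palm_center, rel_thresh=0.6):
    """contour: (n, 2) ordered boundary points; returns (start, end) index
    pairs of contour runs that stick out far enough to count as fingers."""
    d = np.linalg.norm(np.asarray(contour, float) - palm_center, axis=1)
    above = d > rel_thresh * d.max()
    cuts = np.flatnonzero(np.diff(above.astype(int)))  # run boundaries
    bounds = np.concatenate(([0], cuts + 1, [len(d)]))
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if above[s]]
```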


ACM Multimedia | 2011

Robust hand gesture recognition with kinect sensor

Zhou Ren; Jingjing Meng; Junsong Yuan; Zhengyou Zhang

Hand gesture based Human-Computer Interaction (HCI) is one of the most natural and intuitive ways for people and machines to communicate, since it closely mimics how humans interact with each other. In this demo, we present a hand gesture recognition system built on the Kinect sensor, which operates robustly in uncontrolled environments and is insensitive to hand variations and distortions. Our system consists of two major modules, namely hand detection and gesture recognition. Unlike traditional vision-based hand gesture recognition methods that use color markers for hand detection, our system uses both the depth and color information from the Kinect sensor to detect the hand shape, which ensures robustness in cluttered environments. In addition, to guarantee robustness to input variations and the distortions caused by the low resolution of the Kinect sensor, we apply a novel shape distance metric called the Finger-Earth Mover's Distance (FEMD) for hand gesture recognition. Consequently, our system operates accurately and efficiently. In this demo, we demonstrate the performance of our system in two real-life applications: arithmetic computation and a rock-paper-scissors game.
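
The hand detection module described above can be approximated, in its simplest depth-only form, by the sketch below; the band width and the nearest-object assumption are illustrative, and the actual system additionally fuses color information.

```python
# Hedged sketch: assume the hand is the object closest to the Kinect and
# keep every pixel within a fixed depth band behind the nearest reading.
import numpy as np

def segment_hand(depth_mm, band_mm=120):
    valid = depth_mm > 0             # Kinect reports 0 where depth is unknown
    nearest = depth_mm[valid].min()  # closest valid surface, assumed the hand
    return valid & (depth_mm <= nearest + band_mm)
```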


International Conference on Information, Communications and Signal Processing (ICICS) | 2011

Depth camera based hand gesture recognition and its applications in Human-Computer-Interaction

Zhou Ren; Jingjing Meng; Junsong Yuan

Among the various forms of Human-Computer Interaction (HCI), hand gesture based HCI might be the most natural and intuitive way for people and machines to communicate, since it closely mimics how humans interact with each other. Its intuitiveness and naturalness have spawned many applications in exploring large and complex data, computer games, virtual reality, health care, etc. Although the market for hand gesture based HCI is huge, building a robust hand gesture recognition system remains a challenging problem for traditional vision-based approaches, which are greatly limited by the quality of the input from optical sensors. [16] proposed a novel dissimilarity metric for hand gesture recognition using the Kinect sensor, called the Finger-Earth Mover's Distance (FEMD). In this paper, we compare the speed and accuracy of FEMD against a traditional correspondence-based shape matching algorithm, Shape Context. We then introduce several HCI applications built on top of an accurate and robust hand gesture recognition system based on FEMD. This hand gesture recognition system performs robustly despite variations in hand orientation, scale, or articulation. Moreover, it works well in uncontrolled environments with background clutter. We demonstrate that this robust hand gesture recognition system can be a key enabler for numerous hand gesture based HCI systems.
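
For reference, the Shape Context baseline compared against here assigns each contour point a log-polar histogram of the relative positions of all other points; the sketch below is a straightforward, unoptimized rendition with illustrative bin counts.

```python
# Hedged sketch of Shape Context descriptors for a set of contour points.
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]        # pairwise offsets
    dist = np.linalg.norm(diff, axis=2)
    angle = np.arctan2(diff[..., 1], diff[..., 0])  # in [-pi, pi)
    mean_d = dist[dist > 0].mean()                  # scale normalization
    r_edges = mean_d * np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    desc = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dist[i, j]) - 1
            if 0 <= r_bin < n_r:                    # ignore out-of-range points
                t_bin = int((angle[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
                desc[i, r_bin, t_bin] += 1
    return desc.reshape(n, -1)                      # one histogram per point
```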


International Conference on Computer Vision | 2011

Minimum near-convex decomposition for robust shape representation

Zhou Ren; Junsong Yuan; Chunyuan Li; Wenyu Liu

Shape decomposition is a fundamental problem for part-based shape representation. We propose a novel shape decomposition method called Minimum Near-Convex Decomposition (MNCD), which decomposes arbitrary 2D and 3D shapes into a minimum number of "near-convex" parts. With the degree of near-convexity being a user-specified parameter, our decomposition is robust to large local distortions and shape deformation. The shape decomposition is formulated as a combinatorial optimization problem that minimizes the number of non-intersecting cuts. Two major perception rules are also imposed on our scheme to improve the visual naturalness of the decomposition. The globally optimal solution of this challenging discrete optimization problem is obtained by a dynamic subgradient-based branch-and-bound search. Both theoretical analysis and experimental results show that our approach outperforms the state of the art without introducing redundant parts. Finally, we also show the superiority of our method in the application of hand gesture recognition.
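
The notion of near-convexity that the decomposition minimizes over can be made concrete with a small test: a part is near-convex with tolerance ψ if no boundary point lies farther than ψ from the part's convex hull. The sketch below implements only this test, not the paper's branch-and-bound search over cuts.

```python
# Hedged sketch: near-convexity check for a 2D part boundary.
import numpy as np
from scipy.spatial import ConvexHull

def point_segment_dist(p, a, b):
    ab, ap = b - a, p - a
    t = np.clip(ap @ ab / max(ab @ ab, 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def is_near_convex(boundary_pts, psi):
    pts = np.asarray(boundary_pts, dtype=float)
    hull_pts = pts[ConvexHull(pts).vertices]
    edges = list(zip(hull_pts, np.roll(hull_pts, -1, axis=0)))
    # Concavity of a point = distance to the nearest convex-hull edge.
    concavity = [min(point_segment_dist(p, a, b) for a, b in edges) for p in pts]
    return max(concavity) <= psi
```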


Computer Vision and Pattern Recognition | 2017

Deep Reinforcement Learning-Based Image Captioning with Embedding Reward

Zhou Ren; Xiaoyu Wang; Ning Zhang; Xutao Lv; Li-Jia Li

Image captioning is a challenging problem owing to the complexity of understanding image content and the diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework that generates captions using a sequential recurrent prediction model. In this paper, however, we introduce a novel decision-making framework for image captioning. We utilize a policy network and a value network to collaboratively generate captions. The policy network serves as local guidance by providing the confidence of predicting the next word according to the current state. The value network serves as global, lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal from predicting the correct words towards generating captions similar to the ground-truth captions. We train both networks using an actor-critic reinforcement learning model, with a novel reward defined by visual-semantic embedding. Extensive experiments and analyses on the Microsoft COCO dataset show that the proposed framework outperforms state-of-the-art approaches across different evaluation metrics.
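
The interplay of the two networks can be sketched as a lookahead decoding step: candidates proposed by the policy are re-ranked by a convex combination of the policy's log-probability and the value estimate of the extended caption. Here `policy_net`, `value_net`, and the weighting are hypothetical stand-ins, not the paper's exact formulation.

```python
# Hedged sketch: one decoding step guided jointly by policy and value.
import numpy as np

def decode_step(state, policy_net, value_net, lam=0.4, beam=3):
    """state: list of word ids so far; policy_net(state) -> (|V|,) probs;
    value_net(state) -> scalar estimate of the eventual caption quality."""
    log_p = np.log(policy_net(state) + 1e-12)
    candidates = np.argsort(log_p)[-beam:]          # top words by the policy
    scores = [lam * log_p[w] + (1 - lam) * value_net(state + [int(w)])
              for w in candidates]                  # local + lookahead score
    return int(candidates[int(np.argmax(scores))])
```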


ACM Multimedia | 2016

Joint Image-Text Representation by Gaussian Visual-Semantic Embedding

Zhou Ren; Hailin Jin; Zhe L. Lin; Chen Fang; Alan L. Yuille

How to jointly represent images and texts is important for tasks involving both modalities. Visual-semantic embedding models have recently been proposed and shown to be effective. The key idea is that by learning a mapping from images into a semantic text space, the algorithm is able to learn a compact and effective joint representation. However, existing approaches simply map each text concept to a single point in the semantic space. Mapping to a density distribution instead provides many interesting advantages, including better capturing the uncertainty about each text concept and enabling a better geometric interpretation of relations between concepts, such as inclusion and intersection. In this work, we present a novel Gaussian Visual-Semantic Embedding (GVSE) model, which leverages visual information to model text concepts as Gaussian distributions in the semantic space. Experiments on two tasks, image classification and text-based image retrieval, on the large-scale MIT Places205 dataset demonstrate the superiority of our method over existing approaches, with higher accuracy and better robustness.
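
A toy numeric example shows why densities help: a broad concept can subsume a tight one, and an image embedding is scored by its log-density under each concept's Gaussian. The dimensions, means, and variances below are made up for illustration.

```python
# Hedged sketch: concepts as diagonal Gaussians in a joint embedding space.
import numpy as np

def log_density(x, mu, var):
    """Log-density of x under N(mu, diag(var))."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

concepts = {
    "dog":    (np.zeros(4), np.full(4, 0.5)),  # tight distribution
    "animal": (np.zeros(4), np.full(4, 2.0)),  # broader one "includes" dog
}
image_emb = np.array([0.3, -0.2, 0.1, 0.0])
scores = {c: log_density(image_emb, mu, var) for c, (mu, var) in concepts.items()}
print(max(scores, key=scores.get))  # the image is scored against each concept
```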


Data Compression Conference | 2010

Arbitrary Directional Edge Encoding Schemes for the Operational Rate-Distortion Optimal Shape Coding Framework

Zhongyuan Lai; Junhuan Zhu; Zhou Ren; Wenyu Liu; Baolan Yan

We present two edge encoding schemes, namely an 8-sector scheme and a 16-sector scheme, for the operational rate-distortion (ORD) optimal shape coding framework. Unlike the traditional 8-direction scheme, which can only encode edges whose angles are integer multiples of π/4, our proposals can encode edges with arbitrary angles. We partition the digital coordinate plane into 8 and 16 sectors, respectively, and design corresponding differential schemes to encode the short and long components of each vertex. Experimental results demonstrate that, under the same distortion thresholds, our two proposals greatly reduce the number of encoded vertices and therefore save 10%~20% of the bits for the basic ORD optimal algorithms and 10%~30% for all the ORD optimal algorithms, respectively. Moreover, the reconstructed contours are more compact than those produced with the traditional 8-direction edge encoding scheme.
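
The core quantization step can be sketched in a few lines: instead of snapping an edge to one of 8 fixed directions, the plane is split into 8 or 16 sectors, and each edge records its sector index plus its long and short components. The differential coding of those components follows the paper and is omitted here.

```python
# Hedged sketch: sector quantization of an arbitrary-angle edge.
import numpy as np

def encode_edge(dx, dy, n_sectors=8):
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    sector = int(angle / (2 * np.pi / n_sectors))  # sector the edge falls in
    long_c, short_c = max(abs(dx), abs(dy)), min(abs(dx), abs(dy))
    return sector, long_c, short_c

print(encode_edge(3, 1, 8), encode_edge(3, 1, 16))  # same edge, finer sectors
```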


arXiv: Computer Vision and Pattern Recognition | 2018

Adversarial Attacks and Defences Competition

Alexey Kurakin; Ian J. Goodfellow; Samy Bengio; Yinpeng Dong; Fangzhou Liao; Ming Liang; Tianyu Pang; Jun Zhu; Xiaolin Hu; Cihang Xie; Jianyu Wang; Zhishuai Zhang; Zhou Ren; Alan L. Yuille; Sangxia Huang; Yao Zhao; Yuzhe Zhao; Zhonglin Han; Junjiajia Long; Yerkebulan Berdibekov; Takuya Akiba; Seiya Tokui; Motoki Abe

To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several of the top-placing teams.
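
As background for what the attack tracks produced, the canonical way to generate an adversarial example is the fast gradient sign method (FGSM), sketched below for a generic differentiable classifier in PyTorch; the epsilon value is illustrative.

```python
# Hedged sketch: FGSM, the classic one-step adversarial attack.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    """Perturb input x (in [0, 1]) to increase the classification loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by eps in the sign of the loss gradient.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```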


International Conference on Computer Vision | 2015

Scene-Domain Active Part Models for Object Representation

Zhou Ren; Chaohui Wang; Alan L. Yuille

In this paper, we are interested in enhancing the expressivity and robustness of part-based models for object representation in the common scenario where the training data consist of 2D images alone. To this end, we propose scene-domain active part models (SDAPM), which reconstruct and characterize the 3D geometric statistics between object parts in the 3D scene domain using only 2D training data in the image domain. On top of this, we explicitly model and handle occlusions in SDAPM. Together with the developed learning and inference algorithms, such a model provides rich object descriptions, including 2D object and part localization, 3D landmark shape, and camera viewpoint, which offers an effective representation for various image understanding tasks, such as object and part detection and the estimation of 3D landmark shape and viewpoint from images. Experiments on the above tasks show that SDAPM outperforms previous part-based models, demonstrating the potential of the proposed technique.
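
The geometric link between the 3D scene domain and 2D training images can be illustrated with a weak-perspective projection of a 3D landmark layout; the rotation, scale, and landmark values below are toy assumptions, not the model's learned parameters.

```python
# Hedged sketch: weak-perspective projection of 3D part landmarks into 2D.
import numpy as np

def weak_perspective(points_3d, R, s, t):
    """points_3d: (n, 3); R: (3, 3) rotation; s: scale; t: (2,) translation."""
    return s * (points_3d @ R.T)[:, :2] + t

theta = np.deg2rad(30)                       # hypothetical camera yaw
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
landmarks = np.random.randn(10, 3)           # toy 3D landmark shape
uv = weak_perspective(landmarks, R, s=2.0, t=np.array([64.0, 48.0]))
```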

Collaboration


Dive into Zhou Ren's collaborations.

Top Co-Authors

Alan L. Yuille, Johns Hopkins University
Jingjing Meng, Nanyang Technological University
Xiaoyu Wang, University of Missouri
Cihang Xie, Johns Hopkins University
Zhishuai Zhang, Johns Hopkins University