
Publication


Featured research published by Yongkang Wong.


ACM Multimedia | 2013

Temporal encoded F-formation system for social interaction detection

Tian Gan; Yongkang Wong; Daqing Zhang; Mohan S. Kankanhalli

In the context of a social gathering, such as a cocktail party, the memorable moments are generally captured by professional photographers or by the participants. The latter case is often undesirable, because many participants would rather enjoy the event than be occupied with photo-taking. Motivated by this scenario, we propose the use of a set of cameras to take photos automatically. Instead of performing dense analysis on all cameras for photo capturing, we first detect the occurrence and location of social interactions via F-formation detection. In the sociology literature, the F-formation is a concept used to define social interactions, where each detection requires only the spatial location and orientation of each participant. This information can be obtained robustly with additional Kinect depth sensors. In this paper, we propose an extended F-formation system for robust detection of interactions and interactants. The extended F-formation system employs a heat-map based feature representation for each individual, namely the Interaction Space (IS), to model their location, orientation, and temporal information. Using the temporally encoded IS of each detected interactant, we propose a best-view camera selection framework that selects the best-view camera for each detected social interaction. The extended F-formation system is evaluated with synthetic data on multiple scenarios. To demonstrate the effectiveness of the proposed system, we conducted a user study comparing our best-view camera ranking with human rankings on real-world data.
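As an illustration of the heat-map style Interaction Space (IS) described above, the following minimal Python sketch accumulates a temporally decayed IS for each person from position and body orientation; the grid size, Gaussian spread, reach, and decay factor are illustrative assumptions, not the paper's parameters.

import numpy as np

GRID = 100          # illustrative grid resolution (cells per side)
SIGMA = 3.0         # spread of the interaction space in front of a person
DECAY = 0.9         # temporal decay factor for encoding history

def interaction_space(pos, angle, grid=GRID, sigma=SIGMA, reach=10.0):
    """Heat map peaking at a point in front of a person facing `angle`."""
    ys, xs = np.mgrid[0:grid, 0:grid]
    cx = pos[0] + reach * np.cos(angle)   # IS centre lies ahead of the person
    cy = pos[1] + reach * np.sin(angle)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def update_temporal_is(prev_is, pos, angle):
    """Blend the current IS with the decayed history (temporal encoding)."""
    return DECAY * prev_is + (1 - DECAY) * interaction_space(pos, angle)

# Two people facing each other: overlapping interaction spaces suggest an F-formation.
is_a = interaction_space((40, 50), 0.0)       # facing +x
is_b = interaction_space((60, 50), np.pi)     # facing -x
overlap = (is_a * is_b).max()
print("peak IS overlap:", round(float(overlap), 3))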


European Conference on Computer Vision | 2016

Marker-Less 3D Human Motion Capture with Monocular Image Sequence and Height-Maps

Yu Du; Yongkang Wong; Yonghao Liu; Feilin Han; Yilin Gui; Zhen Wang; Mohan S. Kankanhalli; Weidong Geng

The recovery of 3D human pose from a monocular camera is an inherently ill-posed problem, due to the large number of possible 3D configurations that project to the same 2D image. Aiming to improve the accuracy of 3D motion reconstruction, we introduce additional built-in knowledge, namely a height-map, into the scheme for reconstructing 3D pose/motion from a single-view calibrated camera. The proposed framework makes two major contributions. First, the RGB image and its computed height-map are combined to detect 2D joint landmarks with a dual-stream deep convolutional network. Second, we formulate a new objective function to estimate 3D motion from the detected 2D joints in the monocular image sequence, which enforces temporal coherence constraints on both the camera and the 3D poses. Experiments on the HumanEva, Human3.6M, and MCAD datasets show that our method outperforms state-of-the-art algorithms on both 2D joint localization and 3D motion recovery. Moreover, the evaluation results on HumanEva indicate that the performance of our single-view approach is comparable to that of its multi-view deep learning counterpart.
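The dual-stream idea (an RGB stream plus a height-map stream whose features are fused to predict joint locations) can be sketched in a few lines of PyTorch; the layer sizes, joint count, and concatenation-based fusion below are assumptions for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class Stream(nn.Module):
    """A small convolutional branch; one per input modality."""
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
    def forward(self, x):
        return self.net(x)

class DualStreamPose(nn.Module):
    """RGB stream + height-map stream, fused to predict 2D joint heat maps."""
    def __init__(self, num_joints=14):
        super().__init__()
        self.rgb_stream = Stream(3)       # RGB image
        self.height_stream = Stream(1)    # height-map, assumed single channel
        self.head = nn.Conv2d(128, num_joints, 1)  # per-joint heat maps
    def forward(self, rgb, height):
        feats = torch.cat([self.rgb_stream(rgb), self.height_stream(height)], dim=1)
        return self.head(feats)

model = DualStreamPose()
rgb = torch.randn(1, 3, 64, 64)
height = torch.randn(1, 1, 64, 64)
print(model(rgb, height).shape)  # torch.Size([1, 14, 16, 16])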


ACM Multimedia | 2015

Multi-modal & Multi-view & Interactive Benchmark Dataset for Human Action Recognition

Ning Xu; Anan Liu; Weizhi Nie; Yongkang Wong; Fuwu Li; Yuting Su

Human action recognition is one of the most active research areas in both the computer vision and machine learning communities. Several methods for human action recognition have been proposed in the literature, and promising results have been achieved on popular datasets. However, comparison of existing methods is often limited by differences in datasets, experimental settings, feature representations, and so on. In particular, no existing human action dataset allows concurrent analysis of three popular scenarios, namely single view, cross view, and cross domain. In this paper, we introduce a Multi-modal & Multi-view & Interactive (M2I) dataset, designed for evaluating human action recognition under the multi-view scenario. The dataset consists of 1760 action samples, including 9 person-person interaction actions and 13 person-object interaction actions. Moreover, we evaluate three representative methods for single-view, cross-view, and cross-domain human action recognition on this dataset with the proposed evaluation protocol. The experiments demonstrate that the dataset is extremely challenging due to large intra-class variation, multiple similar actions, and significant view differences. This benchmark provides a solid basis for evaluating the task and will benefit related computer vision and machine learning research.
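The three evaluation scenarios can be viewed as different train/test splits over camera view and dataset domain; the sketch below only illustrates how such splits might be organised, using made-up sample records rather than the actual M2I protocol.

# Illustrative organisation of the three evaluation scenarios as train/test splits.
# Sample records (video_id, action, view, domain) are made up, not from M2I.
samples = [
    ("v001", "handshake", "front", "M2I"),
    ("v002", "handshake", "side",  "M2I"),
    ("v003", "push",      "front", "M2I"),
    ("v004", "push",      "side",  "otherDataset"),
]

def make_split(samples, train_pred, test_pred):
    """Return (train, test) lists of samples selected by the two predicates."""
    return ([s for s in samples if train_pred(s)],
            [s for s in samples if test_pred(s)])

# Single view:  train and test clips come from the same camera view.
# Cross view:   train on one view, test on a different view.
# Cross domain: train on one dataset (domain), test on another.
cross_view = make_split(samples,
                        lambda s: s[2] == "front",
                        lambda s: s[2] == "side")
cross_domain = make_split(samples,
                          lambda s: s[3] == "M2I",
                          lambda s: s[3] == "otherDataset")
print("cross-view train/test sizes:", len(cross_view[0]), len(cross_view[1]))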


ACM Multimedia | 2015

Multi-sensor Self-Quantification of Presentations

Tian Gan; Yongkang Wong; Bappaditya Mandal; Vijay Chandrasekhar; Mohan S. Kankanhalli

Presentations have been an effective means of delivering information to groups for ages. Over the past few decades, technological advancements have revolutionized the way humans deliver presentations. Despite that, presentation quality varies widely and is affected by many factors. Conventional presentation evaluation usually requires painstaking manual analysis by experts. Although expert feedback can certainly help speakers improve their presentation skills, manual evaluation is costly and often inaccessible to most people. In this work, we propose a novel multi-sensor self-quantification framework for presentations. Utilizing conventional ambient sensors (i.e., static cameras and a Kinect sensor) and emerging wearable egocentric sensors (i.e., Google Glass), we first analyze the efficacy of each type of sensor against various nonverbal assessment rubrics, and then present our multi-sensor presentation analytics framework. The proposed framework is evaluated on a new presentation dataset, the NUS Multi-Sensor Presentation (NUSMSP) dataset, which consists of 51 presentations covering a diverse set of topics and was recorded with ambient static cameras, a Kinect sensor, and Google Glass. In addition to the multi-sensor analytics, we conducted a user study with the speakers to verify the effectiveness of the system-generated analytics, which received positive and promising feedback.
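A rough sketch of the multi-sensor analytics idea, fusing per-sensor scores over a set of nonverbal rubrics; the sensor names, rubric names, and simple averaging rule are placeholders, not the paper's rubrics or fusion method.

# Illustrative fusion of per-sensor rubric scores into one analytics summary.
# Rubric names, sensor names, and weights are placeholders.
scores = {
    "static_camera": {"body_language": 0.7, "audience_attention": 0.5},
    "kinect":        {"body_language": 0.8, "gesture_frequency": 0.6},
    "google_glass":  {"eye_contact": 0.4, "audience_attention": 0.6},
}

def fuse(scores):
    """Average each rubric over the sensors that can observe it."""
    fused, counts = {}, {}
    for sensor_scores in scores.values():
        for rubric, value in sensor_scores.items():
            fused[rubric] = fused.get(rubric, 0.0) + value
            counts[rubric] = counts.get(rubric, 0) + 1
    return {r: fused[r] / counts[r] for r in fused}

print(fuse(scores))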


International Conference on Industrial Informatics | 2006

Intelligent Sensor Monitoring For Industrial Underwater Applications

Yongkang Wong; Lek-Heng Ngoh; Wai-Choong Wong; Winston Khoon Guan Seah

Recently, sensor networks have been proposed for underwater industrial applications, such as the lucrative business of seismic imaging of underwater oil wells. Underwater sensing systems present a far more challenging problem to solve, given the additional communication bandwidth constraints and the sparse deployment of underwater sensor nodes. We present a wakeup schedule designed for underwater monitoring applications and support our scheme with simulation results.


Journal of Electrical Engineering & Technology | 2015

Human Action Recognition Based on Local Action Attributes

Jing Zhang; Hong Lin; Weizhi Nie; Lekha Chaisorn; Yongkang Wong; Mohan S. Kankanhalli

Human action recognition has received much interest in the computer vision community. Most existing methods focus either on constructing robust descriptors in the temporal domain or on computational methods that exploit the discriminative power of such descriptors. In this paper we explore the idea of using local action attributes to form an action descriptor, where an action is no longer characterized by motion changes in the temporal domain but by local semantic descriptions of the action. We propose a novel framework that introduces local action attributes to represent an action for the final categorization. The local action attributes are defined for each body part and are independent of the global action. The resulting attribute descriptor is used to jointly model human actions and achieve robust performance. In addition, we study the impact of using local body-part and global low-level features for the aforementioned attributes. Experiments on the KTH dataset and the MV-TJU dataset show that our local action attribute based descriptor improves action recognition performance.
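A minimal sketch of an attribute-based action descriptor: per-body-part attribute scores are concatenated and fed to an off-the-shelf classifier; the body parts, attribute counts, random data, and the linear SVM are assumptions for illustration only.

import numpy as np
from sklearn.svm import LinearSVC

# Placeholder body parts and attribute count; the real attribute vocabulary is paper-specific.
BODY_PARTS = ["head", "torso", "arms", "legs"]
ATTRIBUTES_PER_PART = 5

def attribute_descriptor(part_scores):
    """Concatenate per-part attribute scores into one action descriptor."""
    return np.concatenate([part_scores[p] for p in BODY_PARTS])

rng = np.random.default_rng(0)
# Fake training data: 40 clips, each with per-part attribute scores and an action label.
X = np.stack([
    attribute_descriptor({p: rng.random(ATTRIBUTES_PER_PART) for p in BODY_PARTS})
    for _ in range(40)
])
y = rng.integers(0, 3, size=40)      # three action classes

clf = LinearSVC().fit(X, y)          # attribute descriptor -> action label
print(clf.predict(X[:5]))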


Wireless Communications and Networking Conference | 2007

A Combinatorics-Based Wakeup Scheme for Target Tracking in Wireless Sensor Networks

Yongkang Wong; L. H. Ngoh; Wai-Choong Wong; Winston Khoon Guan Seah

Sensor networks deployed for target tracking and intrusion detection have very specific requirements in terms of sensing coverage, timeliness of data, and network lifetime. Much existing work focuses on wakeup schemes that only attempt to prolong network lifetime but fails to address coverage, timeliness of data delivery, target tracking continuity, and deployment cost all at the same time. Given any maximum target speed, our solution, based on a combinatorial approach, guarantees delay bounds and node lifetime bounds, which are critical for any practical deployment of a target tracking application. We further support our solution with simulation results.


International Conference on Distributed Smart Cameras | 2014

Scalable Decision-Theoretic Coordination and Control for Real-time Active Multi-Camera Surveillance

Prabhu Natarajan; Trong Nghia Hoang; Yongkang Wong; Kian Hsiang Low; Mohan S. Kankanhalli

This paper presents an overview of our decision-theoretic multi-agent approach for controlling and coordinating multiple active cameras in surveillance. In this approach, a surveillance task is modeled as a stochastic optimization problem, where the active cameras are controlled and coordinated to achieve the desired surveillance goal in the presence of uncertainty. We enumerate the practical issues in active camera surveillance and discuss how they are addressed in our decision-theoretic approach. We focus on two novel surveillance tasks: maximizing the number of targets observed by the active cameras with guaranteed image resolution, and improving fairness in the observation of multiple targets. We then give an overview of our decision-theoretic frameworks, based on Markov Decision Processes and Partially Observable Markov Decision Processes, for coordinating active cameras in uncertain and partially occluded environments.
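A toy value-iteration example in the spirit of the MDP formulation: choose which zone a camera should cover so as to maximize the expected reward for observing a target; the state space, transition model, and reward below are invented purely for illustration and are not the paper's formulation.

import numpy as np

# Toy MDP: states = which zone a target occupies, actions = which zone the camera covers.
# Transition probabilities and rewards are invented purely for illustration.
n_zones = 3
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])          # P[s, s'] : target movement between zones
gamma = 0.9

def reward(state, action):
    return 1.0 if state == action else 0.0   # reward when the camera covers the target's zone

V = np.zeros(n_zones)
for _ in range(100):                          # value iteration
    Q = np.array([[reward(s, a) + gamma * P[s] @ V for a in range(n_zones)]
                  for s in range(n_zones)])
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)
print("camera zone to cover, per target zone:", policy)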


Pattern Recognition Letters | 2017

A multi-stream convolutional neural network for sEMG-based gesture recognition in muscle-computer interface

Wentao Wei; Yongkang Wong; Yu Du; Yu Hu; Mohan S. Kankanhalli; Weidong Geng

In muscle-computer interfaces (MCI), deep learning is a promising technology for building classifiers that recognize gestures from surface electromyography (sEMG) signals. Motivated by the observation that a small group of muscles plays a significant role in specific hand movements, we propose a multi-stream convolutional neural network (CNN) framework to improve gesture recognition accuracy by learning the correlation between individual muscles and specific gestures with a "divide-and-conquer" strategy. Its pipeline consists of two stages, namely the multi-stream decomposition stage and the fusion stage. During the multi-stream decomposition stage, the original sEMG image is decomposed into equal-sized patches (streams) according to the layout of electrodes over the muscles, and for each stream representative features are learned independently by a CNN. During the fusion stage, the features learned from all streams are fused into a unified feature map, which is then fed into a fusion network to recognize gestures. Evaluations on three benchmark sEMG databases show that the proposed multi-stream CNN framework outperforms state-of-the-art methods on sEMG-based gesture recognition.
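The two-stage pipeline (per-patch streams followed by fusion) can be sketched in PyTorch as follows; the patch layout, channel sizes, and gesture count are assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """Per-patch (per-muscle-region) feature extractor."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, x):
        return self.net(x).flatten(1)        # (batch, 32)

class MultiStreamSEMG(nn.Module):
    """Decompose the sEMG 'image' into patches, run a CNN per stream, then fuse."""
    def __init__(self, n_streams=4, n_gestures=8):
        super().__init__()
        self.streams = nn.ModuleList([StreamCNN() for _ in range(n_streams)])
        self.fusion = nn.Sequential(
            nn.Linear(32 * n_streams, 64), nn.ReLU(),
            nn.Linear(64, n_gestures),
        )
    def forward(self, x):
        # x: (batch, 1, H, W); split along width into equal-sized patches (streams).
        patches = x.chunk(len(self.streams), dim=3)
        feats = torch.cat([s(p) for s, p in zip(self.streams, patches)], dim=1)
        return self.fusion(feats)

model = MultiStreamSEMG()
emg = torch.randn(2, 1, 16, 32)              # fake sEMG frames on a 16 x 32 electrode grid
print(model(emg).shape)                       # torch.Size([2, 8])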


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015

Multi-Camera Saliency

Yan Luo; Ming Jiang; Yongkang Wong; Qi Zhao

A significant body of literature on saliency modeling predicts where humans look in a single image or video. Beyond the scientific goal of understanding how information from multiple visual sources is fused to identify regions of interest in a holistic manner, there are tremendous engineering applications of multi-camera saliency due to the widespread deployment of cameras. This paper proposes a principled framework that smoothly integrates visual information from multiple views into a global scene map, and employs a saliency algorithm incorporating high-level features to identify the most important regions by fusing visual information. The proposed method has the following key distinguishing features compared with its counterparts: (1) the saliency detection is global (salient regions in one local view may not be important in a global context), (2) it does not require a special camera deployment or overlapping fields of view, and (3) the key saliency algorithm is effective in highlighting interesting object regions even though no single detector is used. Experiments on several datasets confirm the effectiveness of the proposed framework.
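One way to realize the "multiple views to a global scene map" step is to warp per-view saliency maps with per-camera homographies and fuse them; the homographies and the max-fusion rule below are placeholders, as the paper's actual integration method may differ.

import numpy as np
import cv2

# Fuse per-view saliency maps into a global scene map.
# The homographies below are placeholders; in practice they would come from calibration.
GLOBAL_SIZE = (200, 200)   # (width, height) of the global scene map

def to_global(saliency, homography, size=GLOBAL_SIZE):
    """Warp one camera's saliency map into global scene coordinates."""
    return cv2.warpPerspective(saliency, homography, size)

def fuse_global(saliency_maps, homographies):
    """Max-fuse warped maps so regions salient in any view stand out globally."""
    warped = [to_global(s, h) for s, h in zip(saliency_maps, homographies)]
    return np.max(np.stack(warped), axis=0)

# Two fake per-view saliency maps and simple homographies, just to show the flow.
view_a = np.random.rand(100, 100).astype(np.float32)
view_b = np.random.rand(100, 100).astype(np.float32)
H_a = np.eye(3, dtype=np.float32)
H_b = np.array([[1, 0, 80], [0, 1, 80], [0, 0, 1]], dtype=np.float32)  # shift view B
global_map = fuse_global([view_a, view_b], [H_a, H_b])
print(global_map.shape)    # (200, 200)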

Collaboration


Dive into Yongkang Wong's collaborations.

Top Co-Authors

Mohan S. Kankanhalli
National University of Singapore

Junnan Li
National University of Singapore

Qi Zhao
University of Minnesota

Wai-Choong Wong
National University of Singapore