Jingjing Meng
Nanyang Technological University
Publications
Featured research published by Jingjing Meng.
IEEE Transactions on Multimedia | 2013
Zhou Ren; Junsong Yuan; Jingjing Meng; Zhengyou Zhang
The recently developed depth sensors, e.g., the Kinect sensor, have provided new opportunities for human-computer interaction (HCI). Although great progress has been made by leveraging the Kinect sensor, e.g., in human body tracking, face recognition and human action recognition, robust hand gesture recognition remains an open problem. Compared to the entire human body, the hand is a smaller object with more complex articulations and is more easily affected by segmentation errors, which makes hand gesture recognition a very challenging problem. This paper focuses on building a robust part-based hand gesture recognition system using the Kinect sensor. To handle the noisy hand shapes obtained from the Kinect sensor, we propose a novel distance metric, Finger-Earth Mover's Distance (FEMD), to measure the dissimilarity between hand shapes. Because it matches only the finger parts rather than the whole hand, it can better distinguish hand gestures with slight differences. Extensive experiments demonstrate that our hand gesture recognition system is accurate (a 93.2% mean accuracy on a challenging 10-gesture dataset), efficient (an average of 0.0750 s per frame), robust to hand articulations, distortions and orientation or scale changes, and can work in uncontrolled environments (cluttered backgrounds and varying lighting conditions). The superiority of our system is further demonstrated in two real-life HCI applications.
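As a rough sketch of the transport-cost idea behind FEMD (illustrative only — the paper's actual metric matches finger segments of a time-series hand-shape signature with a penalty for unmatched fingers; the signature format and penalty term below are simplified assumptions):

```python
def emd_1d(sig_a, sig_b):
    """1-D Earth Mover's Distance between two weighted point sets of
    equal total weight, via a single left-to-right sweep that carries
    the surplus mass between consecutive positions."""
    events = sorted([(p, w) for p, w in sig_a] + [(p, -w) for p, w in sig_b])
    cost, surplus, prev = 0.0, 0.0, events[0][0]
    for pos, w in events:
        cost += abs(surplus) * (pos - prev)
        surplus += w
        prev = pos
    return cost


def femd_toy(fingers_a, fingers_b, penalty=0.5):
    """Toy FEMD-style dissimilarity (illustrative, not the paper's exact
    formulation).  Here a 'finger' is a (position, weight) pair, e.g.
    angle around the palm center and finger length; transport cost is
    computed on the mass-normalized signatures, and the difference in
    total finger mass (unmatched fingers) is penalized separately."""
    wa = sum(w for _, w in fingers_a)
    wb = sum(w for _, w in fingers_b)
    norm_a = [(p, w / wa) for p, w in fingers_a]
    norm_b = [(p, w / wb) for p, w in fingers_b]
    return emd_1d(norm_a, norm_b) + penalty * abs(wa - wb)
```

Because only the finger parts enter the signature, two gestures that differ in a single raised finger yield a large penalty term even when their palm regions are nearly identical.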
acm multimedia | 2011
Zhou Ren; Jingjing Meng; Junsong Yuan; Zhengyou Zhang
Hand gesture based Human-Computer-Interaction (HCI) is one of the most natural and intuitive ways to communicate between people and machines, since it closely mimics how humans interact with each other. In this demo, we present a hand gesture recognition system built on the Kinect sensor, which operates robustly in uncontrolled environments and is insensitive to hand variations and distortions. Our system consists of two major modules, namely, hand detection and gesture recognition. Unlike traditional vision-based hand gesture recognition methods that use color markers for hand detection, our system uses both the depth and color information from the Kinect sensor to detect the hand shape, which ensures robustness in cluttered environments. In addition, to guarantee robustness to input variations and the distortions caused by the low resolution of the Kinect sensor, we apply a novel shape distance metric called Finger-Earth Mover's Distance (FEMD) for hand gesture recognition. Consequently, our system operates accurately and efficiently. In this demo, we demonstrate the performance of our system in two real-life applications, arithmetic computation and a rock-paper-scissors game.
international conference on information and communication security | 2011
Zhou Ren; Jingjing Meng; Junsong Yuan
Of the various forms of Human-Computer-Interaction (HCI), hand gesture based HCI might be the most natural and intuitive way to communicate between people and machines, since it closely mimics how humans interact with each other. Its intuitiveness and naturalness have spawned many applications in exploring large and complex data, computer games, virtual reality, health care, etc. Although the market for hand gesture based HCI is huge, building a robust hand gesture recognition system remains a challenging problem for traditional vision-based approaches, which are greatly limited by the quality of the input from optical sensors. [16] proposed a novel dissimilarity metric for hand gesture recognition using the Kinect sensor, called Finger-Earth Mover's Distance (FEMD). In this paper, we compare the performance, in terms of speed and accuracy, of FEMD and a traditional correspondence-based shape matching algorithm, Shape Context. We then introduce several HCI applications built on top of an accurate and robust hand gesture recognition system based on FEMD. This hand gesture recognition system performs robustly despite variations in hand orientation, scale or articulation. Moreover, it works well in uncontrolled environments with cluttered backgrounds. We demonstrate that this robust hand gesture recognition system can be a key enabler for numerous hand gesture based HCI systems.
computer vision and pattern recognition | 2012
Yuning Jiang; Jingjing Meng; Junsong Yuan
Accurate matching of local features plays an essential role in visual object search. Instead of matching individual features separately, using spatial context, e.g., bundling a group of co-located features into a visual phrase, has been shown to enable more discriminative matching. Despite previous work, it remains a challenging problem to extract appropriate spatial context for matching. We propose a randomized approach to deriving visual phrases, in the form of spatial random partition. By averaging the matching scores over multiple randomized visual phrases, our approach offers three benefits: 1) the aggregation of the matching scores over a collection of visual phrases of varying sizes and shapes provides robust local matching; 2) object localization is achieved by simple thresholding on the voting map, which is more efficient than subimage search; and 3) our algorithm lends itself to easy parallelization and allows a flexible trade-off between accuracy and speed by adjusting the number of partition times. Both theoretical studies and experimental comparisons with state-of-the-art methods validate the advantages of our approach.
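The partition-and-average scheme can be sketched as follows (a toy version with hypothetical names and uniform random grid edges; in the real system each cell's score comes from matching its bundled local features against the query):

```python
import numpy as np

def random_partition_votes(h, w, feat_xy, feat_scores,
                           n_partitions=20, max_splits=3, rng=None):
    """Toy spatial random partition voting.  Each round partitions the
    image into a random grid; every cell receives the summed matching
    scores of the features it contains, and that score is voted back
    onto the cell's pixels.  Averaging over rounds yields a per-pixel
    confidence map; thresholding the map localizes the matched object."""
    rng = rng if rng is not None else np.random.default_rng(0)
    votes = np.zeros((h, w))
    for _ in range(n_partitions):
        x_edges = np.concatenate(([0], np.sort(rng.integers(1, w, size=max_splits)), [w]))
        y_edges = np.concatenate(([0], np.sort(rng.integers(1, h, size=max_splits)), [h]))
        for y0, y1 in zip(y_edges[:-1], y_edges[1:]):
            for x0, x1 in zip(x_edges[:-1], x_edges[1:]):
                cell_score = sum(s for (x, y), s in zip(feat_xy, feat_scores)
                                 if x0 <= x < x1 and y0 <= y < y1)
                votes[y0:y1, x0:x1] += cell_score
    return votes / n_partitions
```

A matched feature always shares a cell with its own pixel, so pixels on the object accumulate consistently high averages, while background pixels co-occur with matches only occasionally.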
eurographics | 2008
Jingjing Meng; Junsong Yuan; Mat Hans; Ying Wu
Mining frequently occurring temporal motion patterns (motion motifs) is important for understanding, organizing and retrieving motion data. However, without any a priori knowledge of the motifs, such as their lengths, contents, locations and total number, it remains a challenging problem due to the enormous computational cost involved in analyzing huge motion databases. Moreover, since the same motion motif can exhibit different temporal and spatial variations, existing data mining methods cannot be directly applied to motion data. In this paper, we propose an efficient motif discovery method which can handle both spatial and temporal variations of motion data. We translate the motif discovery problem into finding continuous paths in a matching trellis, where each continuous path corresponds to an instance of a motif. A tree-growing method is introduced to search for the continuous paths constrained by a branching factor, and to accommodate intra-pattern variations of motifs. By using locality-sensitive hashing (LSH) to find the approximate matches and build the trellis, the overall complexity of our algorithm is only sub-quadratic in the size of the database, with linear memory cost. Experimental results on a data set of 32,260 frames show that our method can effectively discover meaningful motion motifs regardless of their spatial and temporal variations.
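The continuous-path semantics can be sketched as follows (illustrative only: the paper builds the trellis with LSH and bounds the search with a branching factor, whereas this exhaustive toy version is exponential in the worst case):

```python
def grow_paths(trellis, branch=2, min_len=3):
    """Toy tree-growing search over a matching trellis.  trellis[t] holds
    the database-frame indices that approximately match query frame t
    (e.g. the LSH hits).  A path extends when the next match advances by
    1..branch frames, which absorbs temporal variation; maximal paths of
    length >= min_len are reported as motif instances."""
    finished = []
    active = [[m] for m in trellis[0]]
    for matches in trellis[1:]:
        grown = []
        for path in active:
            ext = [m for m in matches if 1 <= m - path[-1] <= branch]
            if ext:
                grown.extend(path + [m] for m in ext)
            elif len(path) >= min_len:
                finished.append(path)       # path can no longer grow
        grown.extend([m] for m in matches)  # matches may start new paths
        active = grown
    finished.extend(p for p in active if len(p) >= min_len)
    return finished
```

For example, matches at database frames 10, 11, 12 followed by an unrelated hit at frame 50 yield the single motif instance [10, 11, 12]; the branching factor also lets a path skip a frame to tolerate speed differences.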
acm multimedia | 2010
Jingjing Meng; Junsong Yuan; Yuning Jiang; Nitya Narasimhan; Venu Vasudevan; Ying Wu
Searching for small objects (e.g., logos) in images is a critical yet challenging problem. It becomes more difficult when target objects differ significantly from the query object due to changes in scale, viewpoint or style, not to mention partial occlusion or cluttered backgrounds. With the goal of retrieving and accurately locating the small object in images, we formulate object search as the problem of finding the subimages with the largest mutual information with respect to the query object. Each image is characterized by a collection of local features. Instead of using only the query object for matching, we propose discriminative matching using both positive and negative queries to obtain the mutual information score. The user can verify the retrieved subimages and improve the search results incrementally. Our experiments on a challenging logo database of 10,000 images highlight the effectiveness of this approach.
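The positive-versus-negative scoring idea can be sketched as follows (a toy proxy, not the paper's exact mutual-information estimator; the function and parameter names are illustrative):

```python
import math
from collections import Counter

def mi_score(region_words, pos_words, neg_words, eps=1e-6):
    """Toy discriminative matching score: each visual word in a candidate
    subimage contributes the log-likelihood ratio of its frequency under
    the positive query versus the negative queries, so words typical of
    the query object raise the score while words typical of the
    background or of confusable objects lower it."""
    pos, neg = Counter(pos_words), Counter(neg_words)
    n_pos = max(sum(pos.values()), 1)
    n_neg = max(sum(neg.values()), 1)
    return sum(math.log((pos[w] / n_pos + eps) / (neg[w] / n_neg + eps))
               for w in region_words)
```

Ranking subimages by such a score rewards regions dense in query-specific words, which is what lets a small logo stand out against a cluttered background.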
computer vision and pattern recognition | 2016
Jingjing Meng; Hongxing Wang; Junsong Yuan; Yap-Peng Tan
We propose to summarize a video into a few key objects by selecting representative object proposals generated from video frames. This representative selection problem is formulated as a sparse dictionary selection problem, i.e., choosing a few representative object proposals to reconstruct the whole proposal pool. Compared with existing sparse dictionary selection based representative selection methods, our new formulation can incorporate object proposal priors and a locality prior in the feature space when selecting representatives. Consequently, it can better locate key objects and suppress outlier proposals. We convert the optimization problem into a proximal gradient problem and solve it with the fast iterative shrinkage-thresholding algorithm (FISTA). Experiments on synthetic data and real benchmark datasets show promising results of our key object summarization approach in video content mining and search. Comparisons with existing representative selection approaches such as K-medoid, sparse dictionary selection and density based selection validate that our formulation can better capture the key video objects despite appearance variations, cluttered backgrounds and camera motions.
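A minimal sketch of sparse dictionary selection solved by FISTA (simplified: the object-proposal and locality priors described in the paper are omitted, and the function name is illustrative):

```python
import numpy as np

def select_representatives(X, lam=0.1, n_iter=200):
    """Minimize ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2 over C with
    FISTA.  Columns of X are item features (here, object proposals); the
    row-sparsity penalty leaves only a few rows of C nonzero, and those
    rows mark the columns of X selected as representatives.
    Returns the row energies of C as representativeness scores."""
    n = X.shape[1]
    G = X.T @ X
    L = 2 * np.linalg.norm(G, 2)            # Lipschitz constant of the gradient
    C = np.zeros((n, n))
    Y, t = C.copy(), 1.0
    for _ in range(n_iter):
        V = Y - 2 * (G @ Y - G) / L         # gradient step on ||X - X Y||_F^2
        norms = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
        C_new = np.maximum(1 - (lam / L) / norms, 0) * V   # row-wise group shrinkage
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        Y = C_new + ((t - 1) / t_new) * (C_new - C)        # momentum step
        C, t = C_new, t_new
    return np.linalg.norm(C, axis=1)
```

The group (row-wise) shrinkage is what forces whole rows to zero, so thresholding the returned energies directly yields the few key proposals.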
IEEE Transactions on Image Processing | 2015
Yuning Jiang; Jingjing Meng; Junsong Yuan; Jiebo Luo
Searching visual objects in large image or video data sets is a challenging problem, because it requires efficient matching and accurate localization of query objects that often occupy a small part of an image. Although spatial context has been shown to help produce more reliable detection than methods that match local features individually, how to extract appropriate spatial context remains an open problem. Instead of using fixed-scale spatial context, we propose a randomized approach to deriving spatial context, in the form of spatial random partition. The effect of spatial context is achieved by averaging the matching scores over multiple random patches. Our approach offers three benefits: 1) the aggregation of the matching scores over multiple random patches provides robust local matching; 2) the matched objects can be directly identified on the pixelwise confidence map, which results in efficient object localization; and 3) our algorithm lends itself to easy parallelization and also allows a flexible tradeoff between accuracy and speed through adjusting the number of partition times. Both theoretical studies and experimental comparisons with the state-of-the-art methods validate the advantages of our approach.
international conference on image processing | 2011
Yuning Jiang; Jingjing Meng; Junsong Yuan
We propose a new grid-based image representation for discriminative visual object search, with the goal of efficiently locating the query object in a large image collection. After extracting local invariant features, we partition the image into non-overlapping rectangular grid cells. Each grid cell bundles the local features within it and is characterized by a histogram of visual words. Given both positive and negative queries, each grid cell is assigned a mutual information score to match and locate the query object. This new image representation offers two benefits for efficient object search: 1) as each grid cell bundles local features, the spatial contextual information enhances discriminative matching; and 2) it enables faster object localization by searching for the visual object at the grid level. To evaluate our approach, we perform experiments on the very challenging BelgaLogos logo database [1] of 10,000 images. The comparison with state-of-the-art methods highlights the effectiveness of our approach in both accuracy and speed.
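The bundling step can be sketched as follows (parameter names are illustrative assumptions; the scoring of cells against positive/negative queries is a separate step):

```python
from collections import Counter

def grid_histograms(features, img_w, img_h, nx=4, ny=4):
    """Sketch of the grid-based representation: local features given as
    (x, y, visual_word) triples are bundled into non-overlapping
    rectangular grid cells, and each occupied cell is summarized by a
    histogram of visual words.  Cells, rather than individual features,
    are then matched and scored against the query."""
    cells = {}
    for x, y, word in features:
        cx = min(int(x * nx / img_w), nx - 1)   # clamp features on the border
        cy = min(int(y * ny / img_h), ny - 1)
        cells.setdefault((cx, cy), Counter())[word] += 1
    return cells
```

Because each cell carries a whole histogram, a match requires several co-located words to agree, which is the source of the extra discriminative power over feature-by-feature matching.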
IEEE Transactions on Multimedia | 2016
Jingjing Meng; Junsong Yuan; Jiong Yang; Gang Wang; Yap-Peng Tan
Given a specific object as the query, object instance search aims to not only retrieve the images or frames that contain the query, but also locate all its occurrences. In this work, we explore the use of spatio-temporal cues to improve the quality of object instance search in videos. To this end, we formulate the problem as a spatio-temporal trajectory search problem, where a trajectory is a sequence of bounding boxes that locate the object instance in each frame. The goal is to find the top-K trajectories that are most likely to contain the target object. Despite the large number of trajectory candidates, we build on a recent spatio-temporal search algorithm for event detection to efficiently find the optimal spatio-temporal trajectories in large video volumes, with complexity linear in the video volume size. We solve the key bottleneck in applying this approach to object instance search by leveraging a randomized approach to enable fast scoring of arbitrary bounding boxes in the video volume. In addition, we present a new dataset for video object instance search. Experimental results on a 73-hour video dataset demonstrate that our approach improves the performance of video object instance search and localization over state-of-the-art search and tracking methods.
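The trajectory notion can be sketched as a frame-by-frame dynamic program (heavily simplified: the paper's max-path search achieves complexity linear in the video volume, whereas this toy version compares every box pair between adjacent frames, and all names are illustrative):

```python
def best_trajectory(frames, max_shift=20):
    """frames[t] is a list of (x, y, score) candidate boxes for frame t.
    A box links to the best-scoring trajectory ending at a box in the
    previous frame within max_shift pixels, so the trajectory stays
    spatio-temporally smooth.  Returns (total_score, [(x, y), ...])
    for the top-1 trajectory."""
    best = [(s, [(x, y)]) for x, y, s in frames[0]]
    for boxes in frames[1:]:
        new_best = []
        for x, y, s in boxes:
            links = [(ps, pp) for ps, pp in best
                     if abs(pp[-1][0] - x) <= max_shift
                     and abs(pp[-1][1] - y) <= max_shift]
            if links:
                ps, pp = max(links, key=lambda e: e[0])
                new_best.append((ps + s, pp + [(x, y)]))
            else:
                new_best.append((s, [(x, y)]))  # start a fresh trajectory
        best = new_best
    return max(best, key=lambda e: e[0])
```

Aggregating box scores along a smooth trajectory is what suppresses isolated high-scoring false positives that a per-frame ranking would return.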