Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Thanh-Hai Tran is active.

Publication


Featured research published by Thanh-Hai Tran.


IEEE International Conference on Automatic Face & Gesture Recognition | 2015

A new hand representation based on kernels for hand posture recognition

Van-Toi Nguyen; Thi-Lan Le; Thanh-Hai Tran; Rémy Mullot; Vincent Courboulay

Hand posture recognition is an extremely active research topic in Computer Vision and Robotics, with applications ranging from automatic sign language recognition to human-system interaction. Recently, a new descriptor for object representation based on the kernel method (KDES) has been proposed. While this descriptor has been shown to be efficient for hand posture representation, applying KDES unchanged to hand posture recognition has some drawbacks. This paper proposes three improvements to KDES that make it more robust to scale change, rotation, and differences in object structure. First, the gradient vector inside the gradient kernel is normalized, making the gradient KDES invariant to rotation. Second, patches with adaptive size are created, making the hand representation more robust to changes in scale. Finally, for pooling patch-level features, a new pyramid structure better suited to the structure of the hand is proposed. These improvements are tested on three datasets; the results show an increase in recognition rate from 84.4% (the original method) to 91.2%.
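As a hedged illustration of the first improvement, the sketch below re-expresses per-pixel gradient orientations relative to the patch's dominant orientation, which makes any descriptor built from them invariant to in-plane rotation. The function name and the magnitude-weighted circular mean are our assumptions, not details taken from the paper.

```python
import numpy as np

def rotation_normalized_orientations(patch):
    """Sketch: express gradient orientations of a 2-D grayscale patch
    relative to the patch's dominant orientation, so a descriptor built
    from (mag, rel) is invariant to in-plane rotation. Illustrative only;
    the paper's exact normalization may differ."""
    gy, gx = np.gradient(patch.astype(float))   # gradients along rows, cols
    mag = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)
    # Dominant orientation: magnitude-weighted circular mean (assumption).
    dominant = np.arctan2((mag * np.sin(theta)).sum(),
                          (mag * np.cos(theta)).sum())
    # Re-express each gradient angle relative to the dominant one, in [-pi, pi).
    rel = np.mod(theta - dominant + np.pi, 2 * np.pi) - np.pi
    return mag, rel
```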


European Conference on Computer Vision | 2014

A Visual SLAM System on Mobile Robot Supporting Localization Services to Visually Impaired People

Quoc-Hung Nguyen; Hai Vu; Thanh-Hai Tran; David Van Hamme; Peter Veelaert; Wilfried Philips; Quang-Hoang Nguyen

This paper describes a Visual SLAM system developed on a mobile robot to support localization services for visually impaired people. The proposed system aims to provide services in small or mid-scale environments, such as inside a building or a school campus, where conventional positioning data such as GPS or Wi-Fi signals are often unavailable. Toward this end, we adapt and improve existing vision-based techniques to handle the issues raised by indoor environments. We first design an image acquisition system to collect visual data. On one hand, a robust visual odometry method is adjusted to precisely create the routes in the environment. On the other hand, we utilize the Fast Appearance-Based Mapping (FAB-MAP) algorithm, which is arguably the most successful approach for matching places in large scenarios. To better estimate the robot's location, we utilize a Kalman filter that combines the matching result of the current observation with the estimation of the robot's state based on its kinematic model. The experimental results confirm that the proposed system is feasible for navigating visually impaired people in indoor environments.
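To make the filtering step concrete, here is a minimal sketch assuming a one-dimensional state (position along a mapped route): the prediction uses the kinematic model, and the update uses the position of the place matched by appearance. The class name and noise constants are illustrative assumptions, not values from the paper.

```python
class RouteKalman:
    """Minimal scalar Kalman filter along a mapped route (a sketch, not
    the authors' exact formulation). All noise values are assumed."""
    def __init__(self, x0=0.0, p0=1.0, q=0.05, r=0.5):
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def predict(self, velocity, dt):
        self.x += velocity * dt         # kinematic model of the robot
        self.p += self.q                # grow state uncertainty

    def update(self, matched_position):
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (matched_position - self.x)  # fuse place-match result
        self.p *= (1.0 - k)
        return self.x
```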


The National Foundation for Science and Technology Development (NAFOSTED) Conference on Information and Computer Science | 2014

An Efficient Combination of RGB and Depth for Background Subtraction

Van-Toi Nguyen; Hai Vu; Thanh-Hai Tran

This paper describes a new method for background subtraction using RGB and depth data from a Microsoft Kinect sensor. In the first step of the proposed method, noise is removed from the depth data using a proposed noise model. This denoising procedure improves the performance of background subtraction and also avoids the major limitations of RGB, particularly under illumination changes. Background subtraction is then solved by combining RGB and depth features rather than using RGB or depth data alone. The fundamental idea of our combination strategy is that when the depth measurement is reliable, background subtraction from depth takes priority; otherwise, RGB is used as the alternative. The proposed method is evaluated on a public benchmark dataset that suffers from common background subtraction problems such as shadows, reflections, and camouflage. The experimental results show better performance compared with the state of the art. Furthermore, the proposed method succeeds on a challenging task, extracting a human fall-down event from an RGB-D image sequence. The resulting foreground segmentation is therefore feasible for further tasks such as tracking and recognition.
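The combination rule lends itself to a very small per-pixel sketch. Assuming unmeasured Kinect pixels are encoded as 0 (a common convention, not confirmed by the paper), it might look like this:

```python
import numpy as np

def combine_rgbd_foreground(fg_rgb, fg_depth, depth, invalid=0):
    """Sketch of the combination rule: where the depth measurement is
    valid (reliable), trust the depth-based foreground mask; elsewhere
    fall back to the RGB-based mask. fg_rgb and fg_depth are boolean
    masks of the same shape; `invalid` marks unmeasured pixels."""
    depth_reliable = depth != invalid
    return np.where(depth_reliable, fg_depth, fg_rgb)
```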


Multimedia Tools and Applications | 2017

Developing a way-finding system on mobile robot assisting visually impaired people in an indoor environment

Quoc-Hung Nguyen; Hai Vu; Thanh-Hai Tran; Quang-Hoan Nguyen

A way-finding system in an indoor environment consists of several components: localization, representation, path planning, and interaction. For each component, numerous relevant techniques have been proposed. However, deploying feasible techniques, particularly in real scenarios, remains challenging. In this paper, we describe a functional way-finding system deployed on a mobile robot to assist visually impaired (VI) people. The proposed system deploys state-of-the-art techniques adapted to the practical issues at hand. First, we adapt an outdoor visual odometry technique to indoor use by placing manual markers or stickers on ground planes; the main purpose is to build reliable travel routes in the environment. Second, we propose a procedure to define and optimize the landmark/representative scenes of the environment, which handles its repetitive and ambiguous structures. To interact with VI people, we deploy a convenient interface on a smartphone. Our evaluations cover three different indoor scenarios and thirteen subjects. The experimental results show that VI people, particularly VI pupils, can find the right way to requested targets.
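The paper does not spell out its path-planning component, but a common choice over marker-based travel routes is Dijkstra's algorithm on a landmark graph. The sketch below is that generic baseline, with a hypothetical graph encoding, not the authors' implementation.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra over a route graph of landmarks (a generic sketch of the
    path-planning component). `graph` maps each node to a list of
    (neighbor, distance) pairs built from the travel routes."""
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None  # unreachable target
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```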


Computer Methods and Programs in Biomedicine | 2017

Continuous detection of human fall using multimodal features from Kinect sensors in scalable environment

Thanh-Hai Tran; Thi-Lan Le; Van-Nam Hoang; Hai Vu

BACKGROUND AND OBJECTIVES: Automatic detection of human falls is a key problem in video surveillance and home monitoring. Existing methods using unimodal data (RGB, depth, or skeleton) may suffer from inadequate lighting conditions or unreliable measurements. In addition, most proposed methods are constrained to a small space and an off-line video stream.

METHODS: In this study, we overcome these issues by combining multimodal features (skeleton and RGB) from a Kinect sensor to benefit from the characteristics of each data type. If a skeleton is available, we apply a rule-based technique using the vertical velocity and the height above the floor plane of the human body center. Otherwise, we compute a motion map from a continuous gray-scale image sequence, represent it with an improved kernel descriptor, and feed it to a linear Support Vector Machine. This combination speeds up the proposed system and avoids missed detections outside the measurable range of the Kinect sensor. We then deploy this method with multiple Kinects to handle large environments, using a client-server architecture with late fusion techniques.

RESULTS: We evaluated the method on several freely available datasets for fall detection. Compared to recent methods, our method has a lower false alarm rate while keeping the highest accuracy. We also validated our system on-line using multiple Kinects in a large lab-based environment, where it obtained an accuracy of 91.5% at an average frame rate of 10 fps.

CONCLUSIONS: The proposed method using multimodal features obtained better results than using unimodal features. Its on-line deployment on multiple Kinects shows its potential to be applied in real living spaces.
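The skeleton branch of the method reduces to a two-condition rule; a minimal sketch follows, with threshold values that are illustrative assumptions rather than the paper's calibrated ones.

```python
def skeleton_fall_rule(center_height, prev_height, dt,
                       v_thresh=-1.5, h_thresh=0.4):
    """Sketch of the skeleton branch: flag a fall when the body center
    drops fast (vertical velocity below v_thresh, in m/s) and ends close
    to the floor plane (height below h_thresh, in m). Threshold values
    are illustrative assumptions, not the paper's."""
    vertical_velocity = (center_height - prev_height) / dt
    return vertical_velocity < v_thresh and center_height < h_thresh
```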


International Conference on Computer Vision Systems | 2015

How Good Is Kernel Descriptor on Depth Motion Map for Action Recognition

Thanh-Hai Tran; Van-Toi Nguyen

This paper presents a new method for action recognition using depth data. Each depth sequence is represented by depth motion maps from three projection views (front, side, and top) to exploit different aspects of the motion. However, unlike state-of-the-art works that extract local binary patterns or histograms of oriented gradients, we describe an action with a gradient kernel descriptor. The proposed method is evaluated on two benchmark datasets, MSRAction3D and MSRGestures3D, and obtains performance very competitive with the best state-of-the-art methods. Our best recognition rate is 91.57% on MSRAction3D and 100% on MSRGestures3D, whereas [1] achieved 93.77% and 94.60%, respectively.
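For readers unfamiliar with depth motion maps, here is a simplified sketch of the standard formulation the paper builds on: binarize each depth frame into an occupancy grid, project it onto the front, side, and top planes, and accumulate the absolute differences of consecutive projections. The depth-binning parameters are assumptions.

```python
import numpy as np

def depth_motion_maps(depth_seq, z_bins=64, z_max=4000.0):
    """Simplified sketch of DMM computation; z_bins and z_max (mm) are
    illustrative assumptions, not the paper's settings."""
    def project(d):
        z = np.clip(d / z_max * (z_bins - 1), 0, z_bins - 1).astype(int)
        occ = np.zeros(d.shape + (z_bins,), dtype=bool)
        rows, cols = np.indices(d.shape)
        valid = d > 0                      # 0 marks unmeasured pixels
        occ[rows[valid], cols[valid], z[valid]] = True
        return {"front": occ.any(axis=2),  # XY plane
                "side":  occ.any(axis=1),  # YZ plane
                "top":   occ.any(axis=0)}  # XZ plane

    prev = project(depth_seq[0].astype(float))
    dmms = {k: 0.0 for k in prev}
    for frame in depth_seq[1:]:
        curr = project(frame.astype(float))
        for k in dmms:  # accumulate motion energy per view
            dmms[k] = dmms[k] + np.abs(curr[k].astype(float)
                                       - prev[k].astype(float))
        prev = curr
    return dmms
```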


International Symposium on Information and Communication Technology | 2015

Recognition of hand gestures from cyclic hand movements using spatial-temporal features

Huong Giang Doan; Hai Vu; Thanh-Hai Tran

Dynamic hand gesture recognition has been studied for a long time, yet it remains a challenging field because of the lack of techniques feasible for deployment in Human-Computer Interaction (HCI) applications. In this paper, we propose a new type of gesture that presents a cyclic pattern of hand shapes during a movement. By mapping commands (e.g., turning devices on/off, increasing volume or changing channels) to the output of a gesture recognition system, the main purpose of the proposed gestures is to provide a natural and feasible way to control appliances in a smart home, such as televisions, lights, fans, and doors. The proposed gestures are represented by both hand shapes and movement directions. Thanks to the cyclic pattern of hand shapes during a command, hand gestures are more easily segmented from the video stream. We then address several challenges of the proposed gestures: the non-synchronized phases of the gestures, the change of hand shapes along the temporal dimension, and the direction of hand movements. These issues are addressed using combinations of spatial and temporal features extracted from consecutive frames of a gesture. The proposed algorithms are evaluated on several subjects; on average, the proposed method obtains accuracy rates of 96% for segmenting a dynamic hand gesture and 95% for recognizing a command.
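One way to exploit the cyclic pattern for segmentation, sketched below entirely under our own assumptions (cosine similarity of per-frame hand-shape features against the opening shape, with a hand-picked threshold), is to declare a gesture wherever the shape departs from and then returns to its starting configuration. The paper's actual spatial-temporal features are richer than this.

```python
import numpy as np

def segment_cyclic_gesture(shape_features, sim_thresh=0.9):
    """Sketch: segment gestures from a stream of per-frame hand-shape
    feature vectors (NumPy arrays) by thresholding cosine similarity
    with the first frame's shape. Threshold and feature choice are
    illustrative assumptions."""
    ref = shape_features[0] / np.linalg.norm(shape_features[0])
    sims = [float(f @ ref) / np.linalg.norm(f) for f in shape_features]
    segments, inside, start = [], False, 0
    for i, s in enumerate(sims):
        if not inside and s < sim_thresh:    # shape departs: gesture begins
            inside, start = True, i
        elif inside and s >= sim_thresh:     # shape returns: gesture ends
            segments.append((start, i))
            inside = False
    return segments
```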


knowledge and systems engineering | 2016

Accurate object localization using RFID and Microsoft Kinect sensor

Thi-Son Nguyen; Thanh-Hai Tran; Hai Vu

Object localization is the first requirement for many applications such as navigation, obstacle avoidance, and object grasping. In this paper, we present a new method that combines two localization techniques: RFID (Radio Frequency Identification) based and RGB-D camera based. In our method, each RFID tag, with a unique ID, is assigned to one object. Based on the RSSI (Received Signal Strength Indication) received by the RFID readers, we compute a coarse localization of the object. This localization result is then projected onto the image captured by a Kinect sensor to limit the region of search (RoS). If the Kinect sensor provides depth in this RoS, the depth distribution of the RoS is computed and used to narrow the RoS further. Finally, the object position is refined by applying a HOG-SVM detector [1] to the RoS of the RGB image. The benefit of combining RFID and RGB-D is twofold: it avoids both false positives and false negatives that arise when using RGB-D information alone, and it reduces the computational time. We have evaluated our method in a real scene with the object at different positions. The combination of RFID and RGB-D reduces the localization error from 1.02 m to 0.16 m on average compared with using RFID alone. The HOG-SVM detector applied to the RoS obtained higher precision (100%) than when applied to the whole RGB image (72.86%), while keeping the same recall (98.96%). It also reduced the computational time from 1.038 s per image to 0.39 s.
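The coarse RFID step can be illustrated with the standard log-distance path-loss model and least-squares trilateration. The sketch below is generic, and its constants (reference RSSI at 1 m, path-loss exponent) are assumptions to be calibrated, not the paper's values.

```python
import numpy as np

def rssi_to_distance(rssi, rssi_at_1m=-45.0, path_loss_exp=2.2):
    """Log-distance path-loss model: estimated distance in meters from
    an RSSI reading. Both constants are illustrative assumptions."""
    return 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exp))

def coarse_position(readers, distances):
    """Least-squares trilateration from >= 3 reader positions (x, y) and
    their estimated distances; a generic stand-in for the paper's coarse
    localization, whose exact method is not reproduced here."""
    (x0, y0), d0 = readers[0], distances[0]
    A, b = [], []
    for (x, y), d in zip(readers[1:], distances[1:]):
        A.append([2 * (x - x0), 2 * (y - y0)])
        b.append(d0**2 - d**2 + x**2 - x0**2 + y**2 - y0**2)
    sol, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                              rcond=None)
    return sol  # (x, y) estimate used to seed the image region of search
```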


International Conference on Control, Automation, Robotics and Vision | 2016

Indoor navigation assistance system for visually impaired people using multimodal technologies

Trung-Kien Dao; Thanh-Hai Tran; Thi-Lan Le; Hai Vu; Viet-Tung Nguyen; Dang-Khoa Mac; Ngoc-Diep Do; Thanh-Thuy Pham

In this paper, a complete indoor navigation assistance system for visually impaired people is introduced. Multiple technologies are integrated into a single system to provide a precise, safe, and friendly navigation service. First, the environment is modeled and represented. After that, the user's location is determined by combining Wi-Fi and vision information. This combination offers benefits over single-technology systems in setup cost, computational time, and accuracy. Finally, the interaction between users and the system is performed in natural Vietnamese language with the support of Vietnamese voice synthesis and recognition. The proposed system has been successfully deployed in a school for visually impaired pupils. Evaluation against various criteria with visually impaired pupils reveals the feasibility of the solution.
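The paper does not give the fusion formula, but a standard way to combine two position estimates is inverse-variance weighting; the sketch below illustrates that generic rule under our own naming.

```python
import numpy as np

def fuse_positions(p_wifi, var_wifi, p_vision, var_vision):
    """Sketch: combine a Wi-Fi position estimate with a vision-based one
    by inverse-variance weighting (a standard fusion rule; the paper's
    exact formula is not reproduced here). Each p_* is an (x, y) array;
    each var_* is a scalar variance expressing that modality's confidence."""
    w_wifi, w_vision = 1.0 / var_wifi, 1.0 / var_vision
    fused = w_wifi * np.asarray(p_wifi) + w_vision * np.asarray(p_vision)
    return fused / (w_wifi + w_vision)
```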


Engineering Applications of Artificial Intelligence | 2016

A combination of user-guide scheme and kernel descriptor on RGB-D data for robust and realtime hand posture recognition

Huong-Giang Doan; Van-Toi Nguyen; Hai Vu; Thanh-Hai Tran

This paper presents a robust and real-time hand posture recognition system. Its key elements are a user-guide scheme and a kernel-based hand posture representation. We first describe a three-stage scheme to train an end-user. This scheme aims to adapt to environmental conditions (e.g., background images, distance from the device to the hand/human body) as well as to learn appearance-based features such as hand-skin color. Thanks to the proposed user-guide scheme, we can precisely estimate heuristic parameters that play an important role in detecting and segmenting hand regions. From the segmented hand regions, we build a kernel-based hand representation with three levels of features. Whereas the pixel-level and patch-level features are conventional, we construct an image-level feature that represents a hand pyramid structure. These representations feed a multi-class Support Vector Machine classifier. We evaluate the proposed system in terms of learning time versus robustness and real-time performance. On average, the proposed system requires 14 s up front to guide an end-user, and the hand posture recognition rate reaches 91.2% accuracy. The performance of the proposed system is comparable with state-of-the-art methods (e.g., Pisharady et al., 2012), but it runs in real time: recognizing a posture takes only 0.15 s, significantly faster than the work of Pisharady et al. (2012), which required approximately 2 min. The proposed method is therefore feasible to embed in smart devices, particularly consumer electronics for home automation such as televisions, game consoles, or lighting systems.
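The classification stage, image-level kernel-descriptor features into a multi-class SVM, can be sketched with scikit-learn. The linear kernel and the one-vs-one multi-class scheme implied by SVC are assumptions; the paper's exact configuration is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def train_posture_classifier(image_level_features, labels):
    """Sketch: fit a multi-class SVM on image-level feature vectors.
    Kernel choice and C are illustrative assumptions."""
    clf = SVC(kernel="linear", C=1.0)  # one-vs-one multi-class by default
    clf.fit(np.asarray(image_level_features), labels)
    return clf

def predict_posture(clf, feature):
    """Classify a single image-level feature vector."""
    return clf.predict(np.asarray(feature).reshape(1, -1))[0]
```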

Collaboration


Dive into Thanh-Hai Tran's collaboration.

Top Co-Authors


Hai Vu

Hanoi University of Science and Technology


Thi-Lan Le

Hanoi University of Science and Technology


Quoc-Hung Nguyen

Hanoi University of Science and Technology


Van-Toi Nguyen

University of La Rochelle


Huong-Giang Doan

Centre national de la recherche scientifique


Anh-Tuan Pham

Hanoi University of Science and Technology


Huong Giang Doan

Hanoi University of Science and Technology
