Hanlin Goh
Agency for Science, Technology and Research
Publications
Featured research published by Hanlin Goh.
European Conference on Computer Vision | 2012
Hanlin Goh; Nicolas Thome; Matthieu Cord; Joo-Hwee Lim
Recently, the coding of local features (e.g. SIFT) for image categorization tasks has been extensively studied. Incorporated within the Bag of Words (BoW) framework, these techniques optimize the projection of local features into the visual codebook, leading to state-of-the-art performance on many benchmark datasets. In this work, we propose a novel visual codebook learning approach using the restricted Boltzmann machine (RBM) as our generative model. Our contribution is three-fold. Firstly, we steer the unsupervised RBM learning using a regularization scheme, which decomposes into a combined prior for the sparsity of each feature's representation as well as the selectivity of each codeword. The codewords are then fine-tuned to be discriminative through supervised learning from top-down labels. Secondly, we evaluate the proposed method on the Caltech-101 and 15-Scenes datasets, either matching or outperforming state-of-the-art results. The codebooks are compact and inference is fast. Finally, we introduce an original method to visualize the codebooks and decipher what each visual codeword encodes.
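A minimal Python sketch of the sparsity-and-selectivity idea described above (not the authors' exact formulation): given a batch of RBM hidden activation probabilities, a cross-entropy penalty pushes the per-example means (sparsity) and the per-codeword means (selectivity) toward a low target rate. The name `target_rate` and the 5% default are illustrative assumptions.

```python
import numpy as np

def sparsity_selectivity_penalty(hidden_probs, target_rate=0.05):
    """Hedged sketch: cross-entropy penalty pushing RBM hidden activations
    toward a low target rate, averaged two ways.

    hidden_probs: (n_examples, n_codewords) matrix of activation probabilities.
    Sparsity    -> each example should activate few codewords (low row means).
    Selectivity -> each codeword should respond to few examples (low column means).
    """
    eps = 1e-8
    p = np.clip(hidden_probs, eps, 1.0 - eps)

    row_means = p.mean(axis=1)   # one value per example
    col_means = p.mean(axis=0)   # one value per codeword

    def cross_entropy(mean_act):
        t = target_rate
        return -(t * np.log(mean_act) + (1.0 - t) * np.log(1.0 - mean_act)).mean()

    return cross_entropy(row_means) + cross_entropy(col_means)


# Example: 128 SIFT descriptors encoded by a 256-codeword RBM.
activations = np.random.rand(128, 256)
print(sparsity_selectivity_penalty(activations))
```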
IEEE Transactions on Neural Networks | 2014
Hanlin Goh; Nicolas Thome; Matthieu Cord; Joo-Hwee Lim
In this paper, we propose a hybrid architecture that combines the image modeling strengths of the bag of words framework with the representational power and adaptability of deep learning architectures. Local gradient-based descriptors, such as SIFT, are encoded via a hierarchical coding scheme composed of spatial aggregating restricted Boltzmann machines (RBMs). For each coding layer, we regularize the RBM by encouraging representations to fit both sparse and selective distributions. Supervised fine-tuning is used to enhance the quality of the visual representation for the categorization task. We performed a thorough experimental evaluation using three image categorization data sets. The hierarchical coding scheme achieved competitive categorization accuracies of 79.7% and 86.4% on the Caltech-101 and 15-Scenes data sets, respectively. The visual representations learned are compact and model inference is fast, as compared with sparse coding methods. The low-level descriptor representations learned using this method are generic features that we empirically found to be transferable between different image data sets. Further analysis reveals the significance of supervised fine-tuning when the architecture has two layers of representations as opposed to a single layer.
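As an illustration of the spatial aggregation step, the following hedged sketch max-pools encoded local descriptors over a regular spatial grid and concatenates the cell vectors; the paper's spatial aggregating RBMs are more involved, and the grid size used here is an arbitrary assumption.

```python
import numpy as np

def spatial_max_pool(codes, positions, image_size, grid=(2, 2)):
    """Hedged sketch of spatial aggregation: max-pool local feature codes
    over a regular spatial grid, then concatenate the cell descriptors.

    codes:      (n_descriptors, code_dim) encoded local features (e.g. RBM outputs)
    positions:  (n_descriptors, 2) (x, y) keypoint locations in pixels
    image_size: (width, height)
    """
    w, h = image_size
    gx, gy = grid
    pooled = np.zeros((gx * gy, codes.shape[1]))
    cell_x = np.minimum((positions[:, 0] * gx / w).astype(int), gx - 1)
    cell_y = np.minimum((positions[:, 1] * gy / h).astype(int), gy - 1)
    cell_idx = cell_y * gx + cell_x
    for c in range(gx * gy):
        mask = cell_idx == c
        if mask.any():
            pooled[c] = codes[mask].max(axis=0)
    return pooled.ravel()


# Example: 500 descriptors with 256-d codes in a 640x480 image.
codes = np.random.rand(500, 256)
positions = np.random.rand(500, 2) * [640, 480]
print(spatial_max_pool(codes, positions, (640, 480)).shape)  # (1024,)
```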
International Conference on Image Processing | 2011
Hanlin Goh; Lukasz Kusmierz; Joo-Hwee Lim; Nicolas Thome; Matthieu Cord
Our objective is to learn invariant color features directly from data via unsupervised learning. In this paper, we introduce a method to regularize restricted Boltzmann machines during training to obtain features that are sparse and topographically organized. Upon analysis, the features learned are Gabor-like and demonstrate a coding of orientation, spatial position, frequency and color that varies smoothly with the topography of the feature map. There is also differentiation between monochrome and color filters, with some exhibiting color-opponent properties. We also found that the learned representation is more invariant to affine image transformations and changes in illumination color.
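A rough sketch of one way to encourage topographic organization (in the spirit of the regularizer described, not its exact form): hidden units are laid out on a 2D map and locally pooled squared activations are penalized, so neighboring units tend to learn smoothly varying features. The map shape and neighborhood size are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def topographic_penalty(hidden_probs, map_shape, neighborhood=3):
    """Hedged sketch of a topographic regularizer: lay the hidden units out on
    a 2D map and penalize the square root of locally pooled squared activations.

    hidden_probs: (n_examples, n_hidden) activations, with n_hidden == prod(map_shape)
    """
    penalty = 0.0
    for h in hidden_probs:
        grid = h.reshape(map_shape) ** 2
        pooled = uniform_filter(grid, size=neighborhood, mode='wrap')
        penalty += np.sqrt(pooled + 1e-8).sum()
    return penalty / len(hidden_probs)


# Example: a 16x16 map of hidden units over a batch of 32 image patches.
acts = np.random.rand(32, 256)
print(topographic_penalty(acts, (16, 16)))
```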
International Conference on Multimedia and Expo | 2008
Yiqun Li; Joo-Hwee Lim; Hanlin Goh
A two-stage cascaded classification approach with an optimal candidate selection scheme is proposed to recognize places using images taken by camera phones. An optimal acceptance threshold is chosen to maximize the probability of accepting more positives and rejecting more negatives at the first stage, so that an optimal number of candidates is selected. The first classifier is trained using simple color and texture features. The second classifier is trained using the scale-invariant feature transform (SIFT). For a query image, a number of matching candidates are selected using k nearest neighbors at the first stage and passed on to the second stage for a refining classification that selects the best matching result. The search range is narrowed down dynamically at the second stage depending on the output of the first stage. Experimental results show that this method is promising, improving recognition accuracy while reducing computation time.
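A hedged sketch of the two-stage cascade described above: a cheap k-nearest-neighbor pass with an acceptance threshold shortlists candidates, and only those candidates are re-scored with a more expensive matcher. The feature dimensions, threshold value, and stand-in second-stage distance are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def two_stage_classify(query_cheap, query_expensive,
                       db_cheap, db_expensive, labels,
                       k=10, accept_threshold=1.5):
    """Hedged cascade sketch: shortlist with cheap features, refine the
    shortlist with a stand-in distance on 'expensive' features."""
    # Stage 1: k nearest neighbors on cheap color/texture features.
    d1 = np.linalg.norm(db_cheap - query_cheap, axis=1)
    candidates = np.argsort(d1)[:k]
    candidates = candidates[d1[candidates] < accept_threshold]
    if len(candidates) == 0:
        return None  # rejected at the first stage

    # Stage 2: refine only the shortlisted candidates.
    d2 = np.linalg.norm(db_expensive[candidates] - query_expensive, axis=1)
    return labels[candidates[np.argmin(d2)]]


# Toy usage: 100 database images, 8-d cheap features, 64-d expensive features.
rng = np.random.default_rng(0)
db_c, db_e = rng.random((100, 8)), rng.random((100, 64))
labels = np.arange(100) % 5
print(two_stage_classify(db_c[3], db_e[3], db_c, db_e, labels))
```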
Computer Vision and Pattern Recognition | 2008
Tat-Jun Chin; Hanlin Goh; Joo-Hwee Lim
We investigate the task of efficiently training classifiers to build a robust place recognition system. We advocate an approach which involves densely capturing the facades of buildings and landmarks with video recordings to greedily accumulate as much visual information as possible. Our contributions include (1) a preprocessing step that effectively exploits the temporal continuity intrinsic in the video sequences to dramatically increase training efficiency, (2) training sparse classifiers discriminatively with the resulting data using the AdaBoost principle for place recognition, and (3) methods to speed up recognition using scaled kd-trees and to perform geometric validation on the results. Compared to straightforwardly applying scene recognition methods, our method not only allows a much faster training phase but also yields more accurate classifiers. The sparsity of the classifiers also ensures good potential for recognition at high frame rates. We show extensive experimental results to validate our claims.
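For the recognition speed-up, the sketch below shows standard kd-tree accelerated descriptor matching with a Lowe-style ratio test (SciPy's `cKDTree`); the paper's scaled kd-trees and AdaBoost-trained sparse classifiers are not reproduced here, and the ratio value is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(query_desc, db_desc, ratio=0.8):
    """Hedged sketch of kd-tree accelerated matching: for each query
    descriptor, keep the nearest database descriptor only if it is clearly
    closer than the second nearest (ratio test)."""
    tree = cKDTree(db_desc)
    dists, idx = tree.query(query_desc, k=2)
    keep = dists[:, 0] < ratio * dists[:, 1]
    return np.flatnonzero(keep), idx[keep, 0]


# Example: match 200 query SIFT descriptors against a database of 5000.
q = np.random.rand(200, 128)
db = np.random.rand(5000, 128)
print(len(match_descriptors(q, db)[0]), "putative matches")
```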
International Conference on Pattern Recognition | 2008
Tat-Jun Chin; Hanlin Goh; Ngan Meng Tan
Using integral images for fast computation of sums over rectangular areas is very popular in computer vision. However, the method does not extend naturally to rotations at arbitrary angles. We propose a novel solution to elegantly compute integral images at generic angles. Our method is exact in the sense that no approximations are used to derive it, and it is vulnerable only to the unavoidable aliasing effects of discretization. Detailed experiments show that our method is more accurate than previously proposed ideas. We also demonstrate its usefulness by detecting 2D barcodes embedded in images.
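For context, the classic axis-aligned integral image and its constant-time rectangle sum are sketched below; the paper's contribution is the exact generalization of this construction to rectangles rotated by arbitrary angles, which is not shown here.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum along both axes: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] from four integral-image lookups
    (standard axis-aligned case only)."""
    total = ii[y1, x1]
    if y0 > 0:
        total -= ii[y0 - 1, x1]
    if x0 > 0:
        total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total


# Sanity check against a brute-force sum.
img = np.random.rand(100, 120)
ii = integral_image(img)
print(np.isclose(rect_sum(ii, 10, 20, 40, 60), img[10:41, 20:61].sum()))
```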
International Conference on Multimedia Retrieval | 2017
Jie Lin; Olivier Morère; Antoine Veillard; Ling-Yu Duan; Hanlin Goh; Vijay Chandrasekhar
This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each requiring solutions specific to the hashing problem. In-depth evaluation shows that our scheme outperforms state-of-the-art methods over several benchmark datasets for both Fisher Vectors and Deep Convolutional Neural Network features, by up to 8.5% over other schemes. The retrieval performance with 256-bit hashes is close to that of the uncompressed floating point features -- a remarkable 512x compression.
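The sketch below illustrates the general compact-binary-hashing setting rather than DeepHash itself: global descriptors are thresholded at their per-dimension medians into bit vectors and ranked by Hamming distance. The 256-bit size and median thresholding are illustrative assumptions.

```python
import numpy as np

def binarize(features, thresholds=None):
    """Hedged sketch (not DeepHash itself): threshold each dimension of a
    global descriptor at its database median to get a compact binary hash."""
    if thresholds is None:
        thresholds = np.median(features, axis=0)
    return (features > thresholds).astype(np.uint8), thresholds

def hamming_rank(query_bits, db_bits):
    """Rank database hashes by Hamming distance to the query."""
    dists = (db_bits != query_bits).sum(axis=1)
    return np.argsort(dists), dists


# Example: hash 1000 database descriptors to 256 bits and query one of them.
db = np.random.randn(1000, 256)
db_bits, th = binarize(db)
q_bits, _ = binarize(db[42:43], th)
order, dists = hamming_rank(q_bits[0], db_bits)
print(order[0], dists[order[0]])  # expect index 42 at distance 0
```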
International Conference on Acoustics, Speech, and Signal Processing | 2008
Tat-Jun Chin; Hanlin Goh; Joo-Hwee Lim
We investigate the task of efficiently modeling a scene to build a robust place recognition system. We propose an approach which involves densely capturing a place with video recordings to greedily cover as many viewpoints of the place as possible. Our contribution is a framework to (1) effectively exploit the temporal continuity intrinsic in the video sequences to reduce the amount of data to process without losing the unique visual information which describes a place, and (2) train discriminative classifiers with the reduced data for place recognition. We show that our method is more efficient and effective than straightforwardly applying scene or object category recognition methods on the video frames.
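A minimal sketch of exploiting temporal continuity for data reduction, under the assumption that consecutive frames with near-identical feature vectors add little new visual information; the distance threshold and feature representation are illustrative, not the paper's actual criterion.

```python
import numpy as np

def select_keyframes(frame_features, min_distance=0.5):
    """Hedged sketch: walk through the video in order and keep a frame only
    when its feature vector has drifted far enough from the last kept frame,
    discarding near-duplicate viewpoints."""
    kept = [0]
    for i in range(1, len(frame_features)):
        if np.linalg.norm(frame_features[i] - frame_features[kept[-1]]) > min_distance:
            kept.append(i)
    return kept


# Example: 300 frames with slowly drifting 64-d features.
feats = np.cumsum(np.random.randn(300, 64) * 0.02, axis=0)
print(len(select_keyframes(feats)), "keyframes kept out of 300")
```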
IEEE International Conference on Cognitive Informatics | 2009
Hanlin Goh; Joo-Hwee Lim; Chai Quek
In this paper, we construct a neural-inspired computational model based on the representational capabilities of receptive fields. The proposed model, known as Shape Encoding Receptive Fields (SERF), is able to perform fast and accurate classification and regression of multi-dimensional data. A SERF is a histogram structure that encodes the shape of multi-dimensional data relative to its center, in a manner similar to the neural coding of sensory stimuli by receptive fields. The bins of this histogram represent local regions in an n-dimensional space. During the training phase, an ensemble of K SERF structures is initialized and data is summarized into the corresponding bins of each SERF structure. The collection of local data summaries makes each SERF a coarse nonlinear data predictor over the entire feature space. The output prediction for an unknown query is computed by the weighted aggregation of the hypotheses of the ensemble of K SERFs. In our series of experiments, we demonstrate the model's ability to perform fast and accurate data prediction.
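A loose Python sketch of a SERF-like predictor, under simplifying assumptions: each structure quantizes a point's offset from a random center into a bin, stores per-bin target means during training, and the ensemble aggregates hypotheses weighted by bin occupancy. The actual SERF encoding is richer than this; all names and parameter values here are illustrative.

```python
import numpy as np
from collections import defaultdict

class SerfSketch:
    """Hedged sketch of one SERF-like predictor: bin training points by their
    quantized offset from a random center and store the mean target per bin."""
    def __init__(self, center, bin_width=0.25):
        self.center = center
        self.bin_width = bin_width
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def _bin(self, x):
        return tuple(np.floor((x - self.center) / self.bin_width).astype(int))

    def fit(self, X, y):
        for xi, yi in zip(X, y):
            b = self._bin(xi)
            self.sums[b] += yi
            self.counts[b] += 1

    def predict_one(self, x):
        b = self._bin(x)
        n = self.counts.get(b, 0)
        return (self.sums[b] / n, n) if n else (0.0, 0)


def ensemble_predict(serfs, x):
    """Aggregate the ensemble, weighting each SERF's hypothesis by how much
    training data fell into the queried bin."""
    preds, weights = zip(*(s.predict_one(x) for s in serfs))
    total = sum(weights)
    return sum(p * w for p, w in zip(preds, weights)) / total if total else 0.0


# Example: regress y = sum(x) with K = 10 SERF-like predictors.
rng = np.random.default_rng(0)
X = rng.random((2000, 3))
y = X.sum(axis=1)
serfs = [SerfSketch(rng.random(3)) for _ in range(10)]
for s in serfs:
    s.fit(X, y)
print(ensemble_predict(serfs, np.array([0.3, 0.6, 0.1])))  # roughly 1.0
```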
International Symposium on Neural Networks | 2008
Hanlin Goh; Joo-Hwee Lim; Chai Quek
Fuzzy associative conjuncted maps (FASCOM) is a fuzzy neural network that represents information by conjuncting fuzzy sets and associates them through a combination of unsupervised and supervised learning. The network first quantizes input and output feature maps using fuzzy sets. These are subsequently conjuncted to form antecedents and consequences, and associated to form fuzzy if-then rules. The associations are learnt through a learning process consisting of three consecutive phases. First, an unsupervised phase initializes the fuzzy membership functions that partition each feature map based on information density. Next, a supervised Hebbian learning phase encodes the synaptic weights of the input-output associations. Finally, a supervised error-reduction phase fine-tunes the network and discovers the varying influence of an input dimension across the output feature space. FASCOM was benchmarked against other prominent architectures using data from three nonlinear data estimation tasks and a real-world road traffic density prediction problem. The promising results show significant improvements over the state of the art for all four data prediction tasks.
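A hedged sketch of the first (unsupervised) phase only: fuzzy set centers are placed at quantiles of the observed feature values, so densely populated regions of the feature map receive a finer partition, and Gaussian memberships are then computed for an input. The number of sets and the membership shape are assumptions, not FASCOM's exact formulation.

```python
import numpy as np

def density_based_fuzzy_sets(values, n_sets=5):
    """Hedged sketch: place fuzzy set centers at quantiles of the observed
    feature values, so dense regions get closer centers and narrower sets."""
    centers = np.quantile(values, np.linspace(0, 1, n_sets))
    widths = np.gradient(centers)  # wider sets where data is sparse
    return centers, np.maximum(widths, 1e-6)

def memberships(x, centers, widths):
    """Gaussian membership of a scalar input in each fuzzy set."""
    return np.exp(-0.5 * ((x - centers) / widths) ** 2)


# Example: partition a skewed traffic-density-like feature into 5 fuzzy sets.
values = np.random.gamma(shape=2.0, scale=1.5, size=10_000)
centers, widths = density_based_fuzzy_sets(values)
print(np.round(memberships(2.0, centers, widths), 3))
```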