Andros Tjandra
Nara Institute of Science and Technology
Publications
Featured research published by Andros Tjandra.
International Conference on Advanced Computer Science and Information Systems | 2013
Erdefi Rakun; Mirna Adriani; I. Wayan Wiprayoga; Ken Danniswara; Andros Tjandra
The Sign System for Indonesian Language (SIBI) is a rather complex sign language. It has four components that distinguish the meaning of a sign, and it follows the syntax and grammar of the Indonesian language. This paper proposes a model for recognizing SIBI words using the Microsoft Kinect as the input sensor; the model is part of an automatic SIBI-to-text translation system. The features for each word are extracted from the skeleton and color-depth data produced by the Kinect. The skeleton features encode the angles between human joints and the Cartesian axes. Color images are converted to grayscale, and their features are extracted using the Discrete Cosine Transform (DCT) with a Cross-Correlation (CC) operation. Depth-image features are extracted with the MATLAB regionprops function, which returns region properties. Generalized Learning Vector Quantization (GLVQ) and the Random Forest (RF) training algorithm from the WEKA data-mining toolkit are used as the model's classifiers. Experiments with different scenarios show that the highest accuracy (96.67%) is obtained by combining 30 skeleton frames with 20 region-property image frames, classified by Random Forest.
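The skeleton features described above are angles between joints and the Cartesian axes. A minimal numpy sketch of that idea (function and joint names are hypothetical, not from the paper):

```python
import numpy as np

def axis_angles(joint_a, joint_b):
    """Angles (degrees) between the bone vector a->b and the x, y, z axes.
    For a unit vector v, cos(theta_i) = v . e_i = v[i] for each axis e_i."""
    v = np.asarray(joint_b, float) - np.asarray(joint_a, float)
    v = v / np.linalg.norm(v)
    return np.degrees(np.arccos(np.clip(v, -1.0, 1.0)))

# Example: a bone pointing along +x makes 0, 90, 90 degrees with x, y, z.
angles = axis_angles([0, 0, 0], [1, 0, 0])
print(np.round(angles, 1))  # [ 0. 90. 90.]
```

Stacking such angles over the selected frames would give a fixed-length feature vector per word, in the spirit of the 30-frame skeleton features above.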
International Conference on Acoustics, Speech, and Signal Processing | 2015
Andros Tjandra; Sakriani Sakti; Graham Neubig; Tomoki Toda; Mirna Adriani; Satoshi Nakamura
This paper explores the use of auditory features based on cochleograms, two-dimensional speech features derived from gammatone filters, within the convolutional neural network (CNN) framework. Furthermore, we propose several ways to combine cochleogram features with log-mel filterbank or spectrogram features. In particular, we combine them at the low and high levels of the CNN, which we refer to as low-level and high-level feature combination. For comparison, we also construct a similar configuration with a deep neural network (DNN). Performance was evaluated within a hybrid neural network - hidden Markov model (NN-HMM) system on the TIMIT phoneme sequence recognition task. The results reveal that the cochleogram-spectrogram feature combination provides significant advantages. The best accuracy was obtained with a high-level combination of two-dimensional cochleogram-spectrogram features using a CNN, achieving up to an 8.2% relative phoneme error rate (PER) reduction over single CNN features, or a 19.7% relative PER reduction over single DNN features.
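The low-level versus high-level distinction can be sketched in numpy: low-level combination feeds both feature maps into one network as separate input channels, while high-level combination concatenates the representations each branch produces just before the classifier. Shapes and names here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Hypothetical time x frequency feature maps for one utterance window
coch = np.random.rand(40, 100)  # cochleogram
spec = np.random.rand(40, 100)  # spectrogram

# Low-level combination: stack as channels of a single CNN input (C, H, W)
low_level = np.stack([coch, spec], axis=0)

# High-level combination: extract features per branch (identity here as a
# stand-in for each sub-network), then concatenate before the classifier
high_level = np.concatenate([coch.flatten(), spec.flatten()])

print(low_level.shape, high_level.shape)  # (2, 40, 100) (8000,)
```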
IEEE Automatic Speech Recognition and Understanding Workshop | 2015
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura; Mirna Adriani
Many successful methods for training deep neural networks (DNNs) rely on an unsupervised pretraining algorithm. Pretraining is particularly effective when the number of labeled training samples is small, because it initializes the parameters in an appropriate range near a good local minimum for subsequent discriminative fine-tuning. However, while the improvement is impressive, training a DNN remains difficult because its objective function is highly non-convex in the parameters. To avoid parameter settings that generalize poorly, robust generative modelling is necessary. This paper explores an alternative generative model for pretraining DNN-based acoustic models: Stochastic Gradient Variational Bayes (SGVB) within an autoencoder framework, called the Variational Bayes Autoencoder (VBAE). It performs efficient approximate inference and learning in directed probabilistic graphical models. During fine-tuning, the probabilistic encoder parameters with latent-variable components are used for discriminative training of the acoustic model. We investigate the performance of DNN-based acoustic models pretrained with the proposed VBAE in comparison with widely used pretraining algorithms such as the Restricted Boltzmann Machine (RBM) and the Stacked Denoising Autoencoder (SDAE). The results reveal that VBAE pretraining with Gaussian latent variables gives the best performance.
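Two ingredients of SGVB with Gaussian latent variables can be sketched directly: the reparameterization trick that makes sampling differentiable, and the analytic KL regularizer. This is a generic numpy illustration of those formulas, not the paper's VBAE implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Analytic KL( N(mu, diag(sigma^2)) || N(0, I) ), the SGVB regularizer:
    0.5 * sum( sigma^2 + mu^2 - 1 - log sigma^2 )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu, log_var = np.zeros(4), np.zeros(4)
z = sample_latent(mu, log_var)
print(z.shape, kl_to_standard_normal(mu, log_var))  # (4,) 0.0
```

After pretraining, the encoder mapping an input to (mu, log_var) would supply the initial weights for the discriminative acoustic model, as described above.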
International Joint Conference on Neural Networks | 2016
Andros Tjandra; Sakriani Sakti; Ruli Manurung; Mirna Adriani; Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a powerful scheme for modeling temporal and sequential data, but they need to capture long-term dependencies in a dataset and to represent the information from the inputs expressively in their hidden layers. For modeling long-term dependencies, gating mechanisms help an RNN remember and forget previous information. Representing the hidden layers of an RNN with more expressive operations (i.e., tensor products) helps it learn a more complex relationship between the current input and the previous hidden state. In this paper, we propose a novel RNN architecture that combines gating mechanisms and the tensor product in a single model. By combining these two concepts, our proposed models learn long-term dependencies through gating units and obtain a more expressive, direct interaction between the input and hidden layers through a tensor product with 3-dimensional array (tensor) weight parameters. We take the Long Short-Term Memory (LSTM) RNN and the Gated Recurrent Unit (GRU) RNN and incorporate a tensor product into their formulations; the resulting models are called the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN). Experiments on word-level and character-level language modeling tasks show that our proposed models significantly improve performance compared to our baseline models.
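The tensor-product interaction can be sketched as a bilinear form per hidden unit: in addition to the usual linear terms, each unit k scores x^T T[k] h_prev through a 3-D weight tensor. This is a generic numpy illustration of that idea, with made-up dimensions and names rather than the exact LSTMRNTN/GRURNTN equations:

```python
import numpy as np

rng = np.random.default_rng(1)
in_dim, hid_dim = 3, 4

# 3-D tensor weight: one bilinear form per hidden unit (illustrative shapes)
T = rng.standard_normal((hid_dim, in_dim, hid_dim)) * 0.1
W_x = rng.standard_normal((hid_dim, in_dim)) * 0.1
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1

def tensor_cell(x, h_prev):
    """One candidate-activation step with a bilinear input-hidden interaction:
    bilinear[k] = x^T T[k] h_prev, added to the standard linear terms."""
    bilinear = np.einsum('kij,i,j->k', T, x, h_prev)
    return np.tanh(W_x @ x + W_h @ h_prev + bilinear)

h = tensor_cell(np.ones(in_dim), np.zeros(hid_dim))
print(h.shape)  # (4,)
```

In the gated variants, a term like this would feed the gate or candidate activations in place of the purely linear pre-activation.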
International Symposium on Neural Networks | 2017
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems. However, most state-of-the-art RNNs have millions of parameters and require substantial computational resources for training and for predicting on new data. This paper proposes an alternative RNN model that significantly reduces the number of parameters by representing the weight matrices in the Tensor Train (TT) format. We implement the TT-format representation for several RNN architectures, such as the simple RNN and the Gated Recurrent Unit (GRU). We compare and evaluate the proposed TT-format RNNs against their uncompressed counterparts on sequence classification and sequence prediction tasks. Our TT-format RNNs preserve performance while reducing the number of RNN parameters by a factor of up to 40.
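The TT idea is that a large weight matrix, reshaped into a tensor, is stored as a chain of small cores, and the matrix-vector product is computed core by core without ever forming the dense matrix. A minimal two-core numpy sketch (dimensions and rank are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(2)

# Weight matrix W of shape (m1*m2, n1*n2) stored as two TT cores:
# W[(i1,i2),(j1,j2)] = sum_r G1[i1,j1,r] * G2[r,i2,j2]
m1, m2, n1, n2, r = 4, 4, 4, 4, 2
G1 = rng.standard_normal((m1, n1, r))
G2 = rng.standard_normal((r, m2, n2))
# Storage: m1*n1*r + r*m2*n2 = 64 numbers vs. 256 for the dense matrix.

def tt_matvec(x):
    """y = W x computed directly from the cores (two-core sketch)."""
    x4 = x.reshape(n1, n2)
    tmp = np.einsum('ijr,jl->irl', G1, x4)                     # contract j1
    return np.einsum('irl,rkl->ik', tmp, G2).reshape(m1 * m2)  # contract j2, r

# Check against the dense matrix reconstructed from the cores
W = np.einsum('ijr,rkl->ikjl', G1, G2).reshape(m1 * m2, n1 * n2)
x = rng.standard_normal(n1 * n2)
print(np.allclose(tt_matvec(x), W @ x))  # True
```

With more cores and larger mode sizes, the same contraction pattern yields the large compression factors reported above.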
2015 International Conference on Technology, Informatics, Management, Engineering & Environment (TIME-E) | 2015
Erdefi Rakun; Mohammad Ivan Fanany; I. Wayan W Wisesa; Andros Tjandra
SIBI (Sistem Isyarat Bahasa Indonesia) is the sign language commonly used in Indonesia. SIBI, which follows the grammatical structure of the Indonesian language, is a complex and unique sign language. A method to recognize SIBI gestures rapidly, precisely, and efficiently needs to be developed for a SIBI machine translation system. The ultimate goal is a feature extraction method whose feature set is space-efficient while retaining the capability to recognize the different types of SIBI gestures. There are four types of SIBI gestures: root, affix, inflectional, and function-word gestures. This paper proposes a heuristic Hidden Markov Model and a feature extraction system to separate an inflectional gesture into its constituents: prefix, suffix, and root. The separation reduces the size of the feature set, which would otherwise be as large as the product of the prefix, suffix, and root-word feature sets of the inflectional word gestures.
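One way an HMM can segment a gesture sequence into constituent parts is Viterbi decoding over a left-to-right prefix/root/suffix state topology. The following is a generic textbook Viterbi sketch under that assumption; the states, observations, and probabilities are invented for illustration and are not the paper's model:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most-likely state path for a discrete HMM, computed in the log domain.
    pi: initial probs, A: transition matrix, B: emission matrix."""
    T_len, n = len(obs), len(pi)
    delta = np.zeros((T_len, n))          # best log-prob ending in each state
    back = np.zeros((T_len, n), dtype=int)
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T_len):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T_len - 1, 0, -1):                # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy left-to-right topology: states 0=prefix, 1=root, 2=suffix
pi = np.array([1.0, 1e-9, 1e-9]); pi /= pi.sum()
A = np.array([[.6, .4, 1e-9], [1e-9, .6, .4], [1e-9, 1e-9, 1.0]])
B = np.array([[.8, .1, .1], [.1, .8, .1], [.1, .1, .8]])
print(viterbi([0, 0, 1, 1, 2], pi, A, B))  # [0, 0, 1, 1, 2]
```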
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2017
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura
arXiv: Computation and Language | 2017
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura
International Joint Conference on Natural Language Processing | 2017
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura
International Symposium on Neural Networks | 2018
Andros Tjandra; Sakriani Sakti; Satoshi Nakamura