Orhan Firat
Middle East Technical University
Publication
Featured research published by Orhan Firat.
workshop on statistical machine translation | 2015
Sébastien Jean; Orhan Firat; Kyunghyun Cho; Roland Memisevic; Yoshua Bengio
Neural machine translation (NMT) systems have recently achieved results comparable to the state of the art on a few translation tasks, including English→French and English→German. The main purpose of the Montreal Institute for Learning Algorithms (MILA) submission to WMT’15 is to evaluate this new approach on a greater variety of language pairs. Furthermore, the human evaluation campaign may help us and the research community to better understand the behaviour of our systems. We use the RNNsearch architecture, which adds an attention mechanism to the encoder-decoder. We also leverage some of the recent developments in NMT, including the use of large vocabularies, unknown word replacement and, to a limited degree, the inclusion of monolingual language models.
empirical methods in natural language processing | 2016
Orhan Firat; Baskaran Sankaran; Yaser Al-Onaizan; Fatos T. Yarman-Vural; Kyunghyun Cho
In this paper, we propose a novel finetuning algorithm for the recently introduced multi-way, multilingual neural machine translation model that enables zero-resource machine translation. We empirically show that, when used together with novel many-to-one translation strategies, this finetuning algorithm allows the multi-way, multilingual model to translate a zero-resource language pair (1) as well as a single-pair neural translation model trained with up to 1M direct parallel sentences of the same language pair and (2) better than a pivot-based translation strategy, while keeping only one additional copy of attention-related parameters.
ieee international conference on cognitive informatics and cognitive computing | 2013
Orhan Firat; Mete Ozay; Itir Onal; İlke Öztekin; Fatos T. Yarman Vural
We propose a statistical learning model for classifying cognitive processes based on distributed patterns of neural activation in the brain, acquired via functional magnetic resonance imaging (fMRI). In the proposed learning machine, local meshes are formed around each voxel. The distance between voxels in the mesh is determined using the concept of functional neighborhood. To define the functional neighborhood, the similarities between the time series recorded for the voxels are measured and functional connectivity matrices are constructed. Then, the local mesh for each voxel is formed by including the functionally closest neighboring voxels in the mesh. The relationship between the voxels within a mesh is estimated using a linear regression model. These relationship vectors, called Functional Connectivity aware Local Relational Features (FC-LRF), are then used to train a statistical learning machine. The proposed method was tested on a recognition memory experiment, including data pertaining to the encoding and retrieval of words belonging to ten different semantic categories. Two popular classifiers, namely k-Nearest Neighbor and Support Vector Machine, are trained to predict the semantic category of the item being retrieved, based on activation patterns during encoding. The classification performance of the Functional Mesh Learning model, which ranges from 62% to 68%, is superior to that of the classical multi-voxel pattern analysis (MVPA) methods, which ranges from 40% to 48%, for ten semantic categories.
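The mesh construction described in this abstract can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and an ordinary least-squares fit without an intercept; the abstract specifies neither, and the function name `local_mesh_features` and its parameters are hypothetical.

```python
import numpy as np

def local_mesh_features(ts, seed, p):
    """Sketch of one functional local mesh (assumed details, not the paper's code).

    ts   : array of shape (n_voxels, n_timepoints), one time series per voxel
    seed : index of the voxel at the centre of the mesh
    p    : number of functionally closest neighbours to include
    """
    # Functional connectivity matrix: Pearson correlation between voxel time series.
    connectivity = np.corrcoef(ts)

    # Rank the other voxels by similarity to the seed and keep the p closest.
    similarity = connectivity[seed].copy()
    similarity[seed] = -np.inf  # never pick the seed as its own neighbour
    neighbors = np.argsort(similarity)[-p:][::-1]

    # Regress the seed time series on its neighbours' time series.
    X = ts[neighbors].T  # shape (n_timepoints, p)
    weights, *_ = np.linalg.lstsq(X, ts[seed], rcond=None)
    return neighbors, weights
```

Under these assumptions, the weight vector returned for each seed voxel plays the role of that voxel's relational feature, and the vectors for all voxels would be passed on to a classifier such as k-NN or an SVM.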
international conference on pattern recognition | 2014
Orhan Firat; Gulcan Can; Fatos T. Yarman Vural
The performance of object recognition and classification on remote sensing imagery is highly dependent on the quality of the extracted features, the amount of labelled data and the priors defined for contextual models. In this study, we examine the representation learning opportunities for remote sensing. First, we address the localization of contextual cues for complex object detection using disentangling factors learnt from a small amount of labelled data. The complex object, which consists of several sub-parts, is further represented under the Conditional Markov Random Fields framework. As a second task, end-to-end target detection using convolutional sparse auto-encoders (CSA) and a large amount of unlabelled data is analysed. The proposed methodologies are tested on a complex airfield detection problem using Conditional Random Fields, and on the recognition of dispersal areas, park areas, taxi routes and airplanes using CSA. The method is also tested on the detection of dry docks in harbours. The performance of the proposed method is compared with standard feature engineering methods and found to be competitive with currently used rule-based and supervised methods.
international conference of the ieee engineering in medicine and biology society | 2013
Itir Onal; Mete Ozay; Orhan Firat; Ilke Öztekin; Fatos T. Yarman Vural
In this study, we propose a new method for analyzing and representing the distribution of discriminative information for data acquired via functional Magnetic Resonance Imaging (fMRI). For this purpose, we form a spatially local mesh of varying size around each voxel, called the seed voxel. The relationship between each seed voxel and its neighbors is estimated using a linear regression model by minimizing the squared error. Then, we estimate the optimal mesh size that represents the connections between each seed voxel and its surroundings by minimizing Akaike's Final Prediction Error (FPE) with respect to the mesh size. The degree of locality is represented by the optimum mesh size. Our results indicate that the local mesh size with the highest discriminative power varies across individual participants. The proposed method was tested on an fMRI study consisting of item recognition (IR) and judgment of recency (JOR) tasks. For each participant, the estimated arc weights of each local mesh with different mesh sizes are used to classify the type of memory judgment (i.e., IR or JOR). Classification accuracy for each participant was derived using the k-Nearest Neighbor (k-NN) method. The results indicate that the proposed local mesh model with optimal mesh size can successfully represent discriminative information for neuroimaging data.
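As a rough illustration of the mesh-size selection step, the sketch below fits a least-squares model for each candidate mesh size and keeps the size with the lowest FPE. It uses the common textbook form FPE = V (1 + d/N) / (1 - d/N), where V is the mean squared residual, d the number of regression weights and N the number of samples; whether the paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
import numpy as np

def final_prediction_error(residuals, n_params):
    """Akaike's FPE in its common form: V * (1 + d/N) / (1 - d/N)."""
    n = residuals.size
    mse = np.mean(residuals ** 2)
    return mse * (1 + n_params / n) / (1 - n_params / n)

def select_mesh_size(seed_ts, neighbor_ts, sizes):
    """Pick the mesh size with the lowest FPE.

    seed_ts     : array of shape (n_timepoints,), the seed voxel's time series
    neighbor_ts : array of shape (n_neighbors, n_timepoints), assumed to be
                  ordered from spatially nearest to farthest
    sizes       : candidate mesh sizes, each smaller than n_timepoints
    """
    scores = {}
    for p in sizes:
        X = neighbor_ts[:p].T  # the p nearest neighbours as regressors
        weights, *_ = np.linalg.lstsq(X, seed_ts, rcond=None)
        scores[p] = final_prediction_error(seed_ts - X @ weights, p)
    best = min(scores, key=scores.get)
    return best, scores
```

The FPE term penalizes larger meshes, so a neighbor is only worth adding when it reduces the residual by more than the penalty for one extra weight.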
signal processing and communications applications conference | 2012
Orhan Firat; Mete Ozay; Itir Onal; Ilke Öztekin; Fatos T. Yarman Vural
The major goal of this study is to model the memory process using neural activation patterns in the brain. To achieve this goal, neural activation was acquired using functional Magnetic Resonance Imaging (fMRI) during memory encoding and retrieval. fMRI measurements with known class labels are used to train a learning system for each class. The most important component of this learning system is the feature space. In this project, an original feature space for the fMRI data is proposed. This feature space is defined by a mesh network which models the relationship between voxels. In the suggested mesh network, the distance between voxels is determined using physical and functional neighborhood concepts. For the functional neighborhood, the similarities between the time series recorded from the voxels are measured. The proposed method is applied to a data set with 10 classes for the encoding and retrieval processes, and the classifier is trained with the learning algorithms to predict the class to which the data belong.
Computer Speech & Language | 2017
Caglar Gulcehre; Orhan Firat; Kelvin Xu; Kyunghyun Cho; Yoshua Bengio
Recent advances in end-to-end neural machine translation models have achieved promising results on high-resource language pairs such as En-Fr and En-De. One of the major factors behind these successes is the availability of high-quality parallel corpora. We explore two strategies for leveraging abundant monolingual data for neural machine translation. We observe improvements both by combining the scores of a neural language model trained only on target monolingual data with those of the neural machine translation model, and by fusing the hidden states of these two models. We obtain up to 2 BLEU improvement over hierarchical and phrase-based baselines on a low-resource language pair, Turkish-English. Our method was initially motivated by tasks with less parallel data, but we also show that it extends to high-resource language pairs such as the Cs-En and De-En translation tasks, where we obtain 0.39 and 0.47 BLEU improvements over the neural machine translation baselines, respectively.
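The first strategy, combining the scores of the two models, can be sketched as a weighted sum of log-probabilities for each candidate next word. The weight `beta`, the function name and the toy probabilities are illustrative assumptions, not values from the paper.

```python
import math

def combine_scores(tm_logprobs, lm_logprobs, beta=0.2):
    """Score each candidate word as log p_TM(w) + beta * log p_LM(w).

    tm_logprobs : dict mapping candidate word -> translation-model log-probability
    lm_logprobs : dict mapping the same words -> language-model log-probability
    beta        : hypothetical weight controlling the language model's influence
    """
    return {w: tm_logprobs[w] + beta * lm_logprobs[w] for w in tm_logprobs}

# Toy example: the translation model slightly prefers "bank",
# while the language model strongly prefers "shore" in this context.
tm = {"bank": math.log(0.5), "shore": math.log(0.4), "money": math.log(0.1)}
lm = {"bank": math.log(0.1), "shore": math.log(0.8), "money": math.log(0.1)}

scores = combine_scores(tm, lm, beta=0.5)
best_word = max(scores, key=scores.get)
```

With `beta=0` the ranking reduces to that of the translation model alone, so the weight directly sets how much the monolingual data can override the parallel-data model.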
signal processing and communications applications conference | 2012
Orhan Firat; Okan Tarhan Tursun; Fatos T. Yarman Vural
In the literature, many target-specific methods are available for target detection in satellite images. Yet for many targets, intra-class variance is high, which decreases detection performance after generalization. The airfield is one of the targets with high intra-class variance in satellite images. This variance is caused by the different compositions observed in airfields. Hence, approaches which aim at detecting airfields in specific regions and compositions are either unsuccessful or inapplicable to images taken from different regions. Context invariants make it possible to generalize target detection algorithms to varying target compositions and regions. In this study, context invariants are proposed for airfield region-of-interest detection, and it is observed that context invariance plays an important role in developing robust and reliable algorithms for varying regions, climates and compositions.
Computer Speech & Language | 2017
Orhan Firat; Kyunghyun Cho; Baskaran Sankaran; Fatos T. Yarman Vural; Yoshua Bengio
The first attention-based neural MT model for multi-way, multilingual translation is proposed. The multi-way, multilingual model is tested on more than 8 languages (En, Fr, Cz, De, Ru, Fi, Tr and Uz). It achieves translation quality comparable to single-pair NMT models with fewer parameters. A single attention mechanism supports alignment between multiple pairs and directions. It outperforms a conventional SMT system on low-resource translation tasks. We propose multi-way, multilingual neural machine translation. The proposed approach enables a single neural translation model to translate between multiple languages, with a number of parameters that grows only linearly with the number of languages. This is made possible by having a single attention mechanism that is shared across all language pairs. We train the proposed multi-way, multilingual model on ten language pairs from WMT15 simultaneously and observe clear performance improvements over models trained on only one language pair. We empirically evaluate the proposed model on low-resource language translation tasks. In particular, we observe that the proposed multilingual model outperforms strong conventional statistical machine translation systems on Turkish-English and Uzbek-English by incorporating the resources of other language pairs.
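The linear-growth claim can be made concrete by counting model components rather than individual parameters. The sketch below assumes one encoder per source language, one decoder per target language and a single shared attention module, and compares this with training a separate model for every directed language pair; the function name and the component-level accounting are illustrative assumptions.

```python
def component_counts(n_languages):
    """Count encoders, decoders and attention modules under two designs.

    Assumes every language can act as both source and target.
    """
    n = n_languages
    directed_pairs = n * (n - 1)

    # One independent model (encoder + decoder + attention) per directed pair.
    single_pair = {
        "encoders": directed_pairs,
        "decoders": directed_pairs,
        "attention": directed_pairs,
    }

    # One encoder and one decoder per language, one attention shared by all pairs.
    multi_way = {"encoders": n, "decoders": n, "attention": 1}
    return single_pair, multi_way

single_pair, multi_way = component_counts(8)
```

Under these assumptions the per-pair design grows quadratically with the number of languages, while the shared-attention design adds only one encoder and one decoder per new language.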
international conference on machine learning | 2015
Orhan Firat; Emre Aksan; Ilke Öztekin; Fatos T. Yarman Vural
Functional magnetic resonance imaging (fMRI) produces a low number of samples in high-dimensional vector spaces, which is hardly adequate for brain decoding tasks. In this study, we propose a combination of autoencoding and a temporal convolutional neural network architecture which aims to reduce the feature dimensionality while improving classification performance. The proposed network learns temporal representations of voxel intensities at each layer of the network by leveraging unlabeled fMRI data with regularized autoencoders. The learned temporal representations capture the temporal regularities of the fMRI data and are observed to be an expressive bank of activation patterns. Then a temporal convolutional neural network with spatial pooling layers reduces the dimensionality of the learned representations. By employing the proposed method, raw input fMRI data is mapped to a low-dimensional feature space where the final classification is conducted. In addition, a simple decorrelated representation approach is proposed for tuning the model hyper-parameters. The proposed method is tested on a ten-class recognition memory experiment with nine subjects. Results support the efficiency and potential of the proposed model, compared to the baseline multi-voxel pattern analysis techniques.