Radu Tudor Ionescu | Researchain

Publication

Featured research published by Radu Tudor Ionescu.

International Conference on Neural Information Processing | 2013

Challenges in Representation Learning: A Report on Three Machine Learning Contests

Ian J. Goodfellow; Dumitru Erhan; Pierre Carrier; Aaron C. Courville; Mehdi Mirza; Ben Hamner; Will Cukierski; Yichuan Tang; David Thaler; Dong-Hyun Lee; Yingbo Zhou; Chetan Ramaiah; Fangxiang Feng; Ruifan Li; Xiaojie Wang; Dimitris Athanasakis; John Shawe-Taylor; Maxim Milakov; John Park; Radu Tudor Ionescu; Marius Popescu; Cristian Grozea; James Bergstra; Jingjing Xie; Lukasz Romaszko; Bing Xu; Zhang Chuang; Yoshua Bengio

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.

Neural Networks | 2015

Challenges in representation learning

Ian J. Goodfellow; Dumitru Erhan; Pierre Carrier; Aaron C. Courville; Mehdi Mirza; Benjamin Hamner; William Cukierski; Yichuan Tang; David Thaler; Dong-Hyun Lee; Yingbo Zhou; Chetan Ramaiah; Fangxiang Feng; Ruifan Li; Xiaojie Wang; Dimitris Athanasakis; John Shawe-Taylor; Maxim Milakov; John Park; Radu Tudor Ionescu; Marius Popescu; Cristian Grozea; James Bergstra; Jingjing Xie; Lukasz Romaszko; Bing Xu; Zhang Chuang; Yoshua Bengio

The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.

Empirical Methods in Natural Language Processing | 2014

Can characters reveal your native language? A language-independent approach to native language identification

Radu Tudor Ionescu; Marius Popescu; Aoife Cahill

A common approach in text mining tasks such as text categorization, authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. In this work, an approach that uses character n-grams as features is proposed for the task of native language identification. Instead of doing standard feature selection, the proposed approach combines several string kernels using multiple kernel learning. Kernel Ridge Regression and Kernel Discriminant Analysis are independently used in the learning stage. The empirical results obtained in all the experiments conducted in this work indicate that the proposed approach achieves state of the art performance in native language identification, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral. In the cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state of the art system by 32.3%.
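The idea of comparing texts through character n-grams can be sketched with a simple spectrum-style string kernel. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the kernel variant (a product of n-gram counts, summed over a range of n-gram lengths) is an assumption, and the multiple kernel learning and Kernel Ridge Regression stages mentioned in the abstract are omitted.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous character n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(s, t, n):
    """Similarity of s and t: sum over shared n-grams of the product of their counts."""
    counts_s, counts_t = char_ngrams(s, n), char_ngrams(t, n)
    # Counter returns 0 for n-grams missing from t, so only shared n-grams contribute.
    return sum(count * counts_t[gram] for gram, count in counts_s.items())


def blended_spectrum_kernel(s, t, n_min, n_max):
    """Sum the spectrum kernel over all n-gram lengths in [n_min, n_max] (assumed blending)."""
    return sum(spectrum_kernel(s, t, n) for n in range(n_min, n_max + 1))


if __name__ == "__main__":
    essay_a = "I am agree with this statement"
    essay_b = "I agree with the statement"
    print(blended_spectrum_kernel(essay_a, essay_b, 3, 5))
```

Because the features are raw character sequences, no tokenizer, tagger, or other language-specific resource is needed, which is what makes an approach of this kind language independent.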

PLOS ONE | 2012

An Efficient Rank Based Approach for Closest String and Closest Substring

Liviu P. Dinu; Radu Tudor Ionescu

This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.
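The abstract does not define rank distance, so the sketch below assumes the usual string formulation: each character is annotated with its occurrence number, positions are counted from 1, shared annotated characters contribute the absolute difference of their positions, and unmatched ones contribute their own position. The function names are hypothetical, and the genetic algorithm itself is not shown; a function like this could serve as the basis of its fitness function.

```python
from collections import defaultdict


def annotate(s):
    """Map each (character, k-th occurrence) pair to its 1-based position in s."""
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions


def rank_distance(u, v):
    """Rank distance between strings u and v under the assumed definition."""
    pos_u, pos_v = annotate(u), annotate(v)
    total = 0
    for key in pos_u.keys() | pos_v.keys():
        if key in pos_u and key in pos_v:
            # Annotated character present in both strings: compare positions.
            total += abs(pos_u[key] - pos_v[key])
        else:
            # Present in only one string: pay its position there.
            total += pos_u.get(key, 0) + pos_v.get(key, 0)
    return total


if __name__ == "__main__":
    print(rank_distance("abbab", "abbbac"))  # prints 8
```

Unlike Hamming distance, this measure is defined for strings of different lengths, and it can be computed in linear time, which matters when a fitness function is evaluated many times per generation.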

Pattern Recognition Letters | 2015

PQ kernel

Radu Tudor Ionescu; Marius Popescu

We describe an efficient algorithm to compute the PQ kernel in the dual form. We provide open source implementations of the PQ kernel in Matlab and C/C++. We leverage the use of the PQ kernel for large vocabularies. We present extensive object recognition experiments using various kernels.

Computer vision researchers have developed various learning methods based on the bag of words model for image related tasks, including image categorization and image retrieval. In this model, images are represented as histograms of visual words from a vocabulary that is obtained by clustering local image descriptors. Next, a classifier is trained on the data. Most often, the learning method is a kernel-based one. Various kernels, such as the linear kernel, the intersection kernel, the χ² kernel or the Jensen-Shannon kernel, can be plugged into the kernel method. Recent results indicate that the novel PQ kernel of Ionescu and Popescu [8] seems to improve the accuracy over most of the state of the art kernels. The PQ kernel is inspired from a set of rank correlation statistics specific for ordinal data, that are based on counting concordant and discordant pairs among two variables. This paper describes an algorithm to compute the PQ kernel in O(n log n) time, based on merge sort. Matlab and C/C++ implementations are provided for future development and use at http://pq-kernel.herokuapp.com. Extensive object recognition experiments are conducted to compare the PQ kernel with other state of the art kernels on two benchmark data sets. The PQ kernel has the best results on both data sets, even when a spatial pyramid representation is used. In conclusion, the PQ kernel can be used to obtain a better pairwise similarity between visual word histograms, which, in turn, improves the object recognition accuracy of the bag of visual words system.
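Counting concordant and discordant pairs in O(n log n) time with merge sort can be sketched as follows. This is not the authors' Matlab or C/C++ code: it uses the standard tie-aware counting scheme known from Kendall's tau computations, all function names are hypothetical, and the final kernel value (taken here as P minus Q, up to a constant factor) is an assumption, since the abstract does not state the formula.

```python
from collections import Counter


def count_inversions(values):
    """Return (sorted copy, number of pairs i < j with values[i] > values[j])."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is strictly smaller than every remaining left element.
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def tied_pairs(keys):
    """Number of unordered pairs that share the same key."""
    return sum(c * (c - 1) // 2 for c in Counter(keys).values())


def concordant_discordant(x, y):
    """Count concordant (P) and discordant (Q) pairs between histograms x and y."""
    n = len(x)
    pairs = sorted(zip(x, y))  # sort by x, break ties by y
    _, q = count_inversions([b for _, b in pairs])
    total = n * (n - 1) // 2
    # Pairs tied in x or in y are neither concordant nor discordant.
    p = total - tied_pairs(x) - tied_pairs(y) + tied_pairs(zip(x, y)) - q
    return p, q


def concordant_discordant_naive(x, y):
    """O(n^2) reference used only to check the fast version."""
    p = q = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            p += sign > 0
            q += sign < 0
    return p, q


def pq_kernel_value(x, y):
    """Assumed kernel form: P - Q, up to a constant factor."""
    p, q = concordant_discordant(x, y)
    return p - q
```

Tie handling matters here because visual word histograms typically contain many repeated values, especially zeros for large vocabularies.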

Computational Linguistics | 2016

String kernels for native language identification: Insights from behind the curtains

Radu Tudor Ionescu; Marius Popescu; Aoife Cahill

The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using multiple kernel learning. Despite the fact that the approach based on string kernels performs so well, several questions about this method remain unanswered. First, it is not clear why such a simple approach can compete with far more complex approaches that take words, lemmas, syntactic information, or even semantics into account. Second, although the approach is designed to be language independent, all experiments to date have been on English. This work is an extensive study that aims to systematically present the string kernel approach and to clarify the open questions mentioned above. A broad set of native language identification experiments were conducted to compare the string kernels approach with other state-of-the-art methods. The empirical results obtained in all of the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in NLI, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the results obtained on both the Arabic and the Norwegian corpora demonstrate that the proposed approach is language independent. In the Arabic native language identification task, string kernels show an increase of more than 17% over the best accuracy reported so far. The results of string kernels on Norwegian native language identification are also significantly better than the state-of-the-art approach.
In addition, in a cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%. To gain additional insights about the string kernels approach, the features selected by the classifier as being more discriminating are analyzed in this work. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths. The features captured by the model typically include stems, function words, and word prefixes and suffixes, which have the potential to generalize over purely word-based features. By analyzing the discriminating features, this article offers insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences. The goal of the current study is to give a full view of the string kernels approach and shed some light on why this approach works so well.

Natural Hazards | 2015

Flood risk perception along the Lower Danube river, Romania

Iuliana Armaş; Radu Tudor Ionescu; Cristina Nenciu Posner

Risk can be seen as both objective, quantifiable, and subjective, constructed at an individual level. This paper focuses on the latter and aims to explore flood perceptions in relation to socio-demographic variables and various economic measures. The data were drawn from four villages on the banks of the Danube using quantitative questionnaires, villages data sheet and in-depth semi-structured interviews. This mixed method approach allowed for ecologically sound findings. Inequality of income and capital are linked with variations of some risk perception dimensions such as disaster temporal proximity, perceived resilience, and also with a reluctance to think about the future and the dangers it might pose. Past floods are associated with most dimensions tested, including income, inequality, and whether the next flood appears to be imminent. Lower-income households expect some form of assistance not from the community, the church, or local authorities, but from the government. This highlights erosion of social values, or inter-household monetisation, as the other major issue, alongside inequality, faced by rural populations living on the banks of one of Europe’s greatest rivers.

Symbolic and Numeric Algorithms for Scientific Computing | 2013

Local Rank Distance

Radu Tudor Ionescu

Researchers have developed a wide variety of methods for string data, that can be applied with success in different fields such as computational biology, natural language processing and so on. Such methods range from clustering techniques used to analyze the phylogenetic trees of different organisms, to kernel methods used to identify authorship or native language from text. Results of such methods are not perfect and can always be improved. Some of these methods are based on a distance or similarity measure for strings, such as Hamming, Levenshtein, Kendall-tau, rank distance, or string kernel. This paper aims to introduce a new distance measure, termed Local Rank Distance (LRD), inspired from the recently introduced Local Patch Dissimilarity for images. Designed to conform to more general principles and adapted to DNA strings, LRD comes to improve over state of the art methods for phylogenetic analysis. This paper shows two applications of LRD. The first application is the phylogenetic analysis of mammals. Experiments show that phylogenetic trees produced by LRD are better or at least similar to those reported in the literature. The second application is to identify native language of English learners. By working at character level, the proposed method is completely language independent and theory neutral. In conclusion, LRD can be used as a general approach to measure string similarity, despite being designed for DNA.

International Conference on Neural Information Processing | 2012

Local patch dissimilarity for images

Liviu P. Dinu; Radu Tudor Ionescu; Marius Popescu

This paper aims to introduce a new distance measure for images, called Local Patch Dissimilarity. This new distance measure is inspired from rank distance which is a distance measure for strings. The distance measure introduced in this paper is based on patches. There are many other patch-based techniques used in image processing. Patches contain contextual information and have advantages in terms of generalization. An algorithm that computes the Local Patch Dissimilarity between two images is presented in this work. Experiments show that the extension of rank distance to images has very good results in image classification, more precisely in handwritten digit recognition.

Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) | 2017

Learning to Identify Arabic and German Dialects using Multiple Kernels.

Radu Tudor Ionescu; Andrei M. Butnaru

We present a machine learning approach for the Arabic Dialect Identification (ADI) and the German Dialect Identification (GDI) Closed Shared Tasks of the DSL 2017 Challenge. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided only for the Arabic data. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Our approach is shallow and simple, but the empirical results obtained in the shared tasks prove that it achieves very good results. Indeed, we ranked on the first place in the ADI Shared Task with a weighted F1 score of 76.32% (4.62% above the second place) and on the fifth place in the GDI Shared Task with a weighted F1 score of 63.67% (2.57% below the first place).
