Julio Cesar Duarte
Pontifical Catholic University of Rio de Janeiro
Publication
Featured research published by Julio Cesar Duarte.
Journal of the Brazilian Computer Society | 2008
Ruy Luiz Milidiú; Cícero Nogueira dos Santos; Julio Cesar Duarte
We present Entropy Guided Transformation Learning (ETL) models for three Portuguese Language Processing tasks: Part-of-Speech Tagging, Noun Phrase Chunking and Named Entity Recognition. For Part-of-Speech Tagging, we separately use the Mac-Morpho Corpus and the Tycho Brahe Corpus. For Noun Phrase Chunking, we use the SNR-CLIC Corpus. For Named Entity Recognition, we separately use three corpora: HAREM, MiniHAREM and LearnNEC06. For each one of the tasks, the ETL modeling phase is quick and simple. ETL requires only the training set and no handcrafted templates. ETL also simplifies the incorporation of new input features, such as capitalization information, which are successfully used in the ETL-based systems. Using the ETL approach, we obtain performance competitive with the state of the art in all six corpus-based tasks. These results indicate that ETL is a suitable approach for the construction of Portuguese corpus-based systems.
Ibero-American Conference on AI | 2006
Maria Claudia de Freitas; Julio Cesar Duarte; Cícero Nogueira dos Santos; Ruy Luiz Milidiú; Raúl P. Rentería; Violeta Quental
Appositives are structures composed of semantically related noun phrases. In Natural Language Processing, the identification of appositives contributes to the building of semantic lexicons, noun phrase coreference resolution and information extraction from texts. In this paper, we present an appositive identifier for the Portuguese language. We describe experimental results obtained by applying two machine learning techniques: Transformation-Based Learning (TBL) and Hidden Markov Models (HMM). The results obtained with these two techniques are compared with those of a full syntactic parser, PALAVRAS. The TBL-based system outperformed the other methods. This suggests that a machine learning approach can be beneficial for appositive identification, and also that TBL performs well for this language task.
Current Topics in Artificial Intelligence | 2007
Ruy Luiz Milidiú; Julio Cesar Duarte; Cícero Nogueira dos Santos
Transformation Based Learning (TBL) is a Machine Learning algorithm frequently used in Natural Language Processing. TBL uses rule templates to identify error-correcting patterns. A critical requirement in TBL is the availability of a problem domain expert to build these rule templates. In this work, we propose an evolutionary approach based on Genetic Algorithms to automate the template selection process. We show some empirical evidence that our approach provides template sets with almost the same quality as human-built templates.
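To make the role of rule templates concrete, here is a minimal Python sketch of a TBL learning loop on a single tagged sequence. The templates, the tag names and every function name are illustrative assumptions, not taken from the paper, and the genetic template selection itself is not shown. A template lists relative positions; instantiating it at an error yields a candidate rule "change tag `old` to `new` when the tags at those positions match".

```python
from collections import Counter

# Hypothetical templates: each is a tuple of relative positions
# whose current tags form the context of a rule.
TEMPLATES = [(-1,), (1,), (-1, 1)]


def context(tags, i, template):
    """Current tags at the template's offsets (None outside the sequence)."""
    return tuple(tags[i + d] if 0 <= i + d < len(tags) else None
                 for d in template)


def best_rule(current, gold):
    """Instantiate every template at every error and score the rules."""
    candidates = set()
    for i, (cur, ref) in enumerate(zip(current, gold)):
        if cur != ref:
            for t in TEMPLATES:
                candidates.add((t, context(current, i, t), cur, ref))
    scores = Counter()
    for rule in candidates:
        t, ctx, old, new = rule
        for i, (cur, ref) in enumerate(zip(current, gold)):
            if cur == old and context(current, i, t) == ctx:
                if ref == new:
                    scores[rule] += 1   # the rule fixes an error
                elif ref == old:
                    scores[rule] -= 1   # the rule breaks a correct tag
    if not candidates:
        return None, 0
    rule = max(candidates, key=lambda r: scores[r])
    return rule, scores[rule]


def apply_rule(tags, rule):
    """Apply one rule to all positions at once."""
    t, ctx, old, new = rule
    return [new if tag == old and context(tags, i, t) == ctx else tag
            for i, tag in enumerate(tags)]


def learn(baseline, gold, min_score=1):
    """Greedily add the best rule until no rule reduces the error."""
    current, rules = list(baseline), []
    while True:
        rule, score = best_rule(current, gold)
        if rule is None or score < min_score:
            return rules, current
        rules.append(rule)
        current = apply_rule(current, rule)


gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]
baseline = ["DET", "NOUN", "NOUN", "DET", "NOUN"]
rules, final = learn(baseline, gold)
```

In this toy run a single learned rule repairs the one baseline error. Choosing which templates go into `TEMPLATES` is the step that the abstract proposes to automate.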
Processing of the Portuguese Language | 2006
Ruy Luiz Milidiú; Cícero Nogueira dos Santos; Julio Cesar Duarte; Raúl P. Rentería
Semi-supervised learning is frequently used when we have a small labeled training set but a large set of unlabeled samples. In this paper, we combine Hidden Markov Models and Transformation Based Learning in a semi-supervised learning approach. Self-training and Co-training are the two semi-supervised techniques that we apply to our scheme in order to classify Portuguese noun phrases. Our main goal here is to show that we can achieve effective noun phrase extraction using fewer tagged examples by applying a semi-supervised technique. Our models show a good improvement with a small labeled corpus and little improvement with a large one.
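The self-training idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a nearest-centroid classifier on one-dimensional points stands in for the HMM and TBL models of the paper, and the function names, the `min_margin` threshold and the data are all hypothetical.

```python
def centroid_fit(points, labels):
    """Fit a nearest-centroid classifier on 1-D points."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def centroid_predict(model, x):
    """Return (label, margin); a larger margin means higher confidence."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - model[ranked[1]]) - abs(x - model[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly add confidently labeled samples to the training set."""
    points = [x for x, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroid_fit(points, labels)
        scored = [(x, centroid_predict(model, x)) for x in pool]
        accepted = [(x, lab) for x, (lab, m) in scored if m >= min_margin]
        if not accepted:
            break  # nothing confident enough is left
        for x, lab in accepted:
            points.append(x)
            labels.append(lab)
        pool = [x for x, (_, m) in scored if m < min_margin]
    return centroid_fit(points, labels)


model = self_train([(0.0, "A"), (10.0, "B")], [1.0, 2.0, 8.0, 9.0, 5.2])
```

Co-training follows the same loop but uses two classifiers that label samples for each other.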
Journal of the Brazilian Computer Society | 2007
Ruy Luiz Milidiú; Julio Cesar Duarte; Cícero Nogueira dos Santos
Transformation Based Learning (TBL) is a Machine Learning technique frequently used in some Natural Language Processing (NLP) tasks. TBL uses rule templates to identify error-correcting patterns. A critical requirement in TBL is the availability of a problem domain expert to build these rule templates. In this work, we propose an evolutionary approach based on Genetic Algorithms to automate the template generation process. Additionally, we report our findings on five experiments with useful NLP tasks. We observe that our approach provides template sets with a mean loss of performance of 0.5% when compared to human-built templates.
Data Compression Conference | 2003
Ruy Luiz Milidiú; Eduardo Sany Laber; Lorenza Moreno; Julio Cesar Duarte
Summary form only given. Prefix codes allow text to be decoded without ambiguity, since they are variable-length codes in which no codeword is a prefix of another. The problem of improving the decoding speed has received special attention in the data compression community. A scheme is proposed that employs length-restricted codes to generate the codewords and table look-up for decoding. In order to reduce bit manipulation, lexical expansion is introduced to decode more than one symbol in a single decoding step.
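The following Python sketch illustrates table look-up decoding for a length-restricted prefix code. The code table `CODE` and all function names are assumptions made for the example, not the scheme from the paper, and lexical expansion is not shown. Because no codeword is longer than `MAX_LEN` bits, a table indexed by every `MAX_LEN`-bit window can return the next symbol and its codeword length in a single look-up.

```python
# Hypothetical prefix code whose longest codeword has 3 bits.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
MAX_LEN = max(len(word) for word in CODE.values())


def build_table(code, max_len):
    """Map every max_len-bit window to (symbol, codeword length)."""
    table = {}
    for symbol, word in code.items():
        pad = max_len - len(word)
        # Every window that starts with `word` decodes to `symbol`.
        for tail in range(2 ** pad):
            suffix = format(tail, "b").zfill(pad) if pad else ""
            table[word + suffix] = (symbol, len(word))
    return table


def encode(text, code):
    """Concatenate the codewords of the symbols in `text`."""
    return "".join(code[ch] for ch in text)


def decode(bits, table, max_len):
    """Decode a bit string with one table look-up per symbol."""
    out = []
    pos = 0
    while pos < len(bits):
        # Pad the last window with zeros so it is always max_len bits wide.
        window = bits[pos:pos + max_len].ljust(max_len, "0")
        symbol, length = table[window]
        out.append(symbol)
        pos += length
    return "".join(out)


TABLE = build_table(CODE, MAX_LEN)
decoded = decode(encode("abacad", CODE), TABLE, MAX_LEN)
```

Restricting the maximum codeword length keeps the table at `2 ** MAX_LEN` entries, which is the reason the sketch assumes a length-restricted code.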
International Conference on Machine Learning and Cybernetics | 2009
Ruy Luiz Milidiú; Julio Cesar Duarte
Boosting is a machine learning technique that combines several weak classifiers to improve the overall accuracy. A well-known algorithm based on boosting is AdaBoost. Boosting At Start (BAS) is a boosting framework that generalizes AdaBoost by allowing any initial weight distribution. BAS Committee is a scheme that uses feature clustering to determine the best weight assignments in the BAS framework. One of the drawbacks of BAS Committee is its final step, which uses a simple Majority Voting approach over the chosen classifiers. Entropy Guided Transformation Learning (ETL) is a machine learning strategy that combines Decision Trees and Transformation Based Learning, avoiding the explicit need for template design. Here, we present ETL Voting BAS Committee, a scheme that combines ETL and BAS Committee in order to determine the best combination of the classifiers in the ensemble. Moreover, since no extra assumption is made, ETL Voting is generic and can be used in any committee approach. Our empirical findings indicate that the BAS performance can be improved with a new combination of the classifiers determined by ETL Voting.
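As a rough illustration of boosting from an arbitrary initial weight distribution, the Python sketch below runs an AdaBoost-style loop over decision stumps on one feature. It is a sketch under assumptions: the function names and the data are hypothetical, and neither the feature clustering of BAS Committee nor ETL Voting is implemented here.

```python
import math


def stump_predict(threshold, polarity, x):
    """Decision stump on one feature: returns +1 or -1."""
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Pick the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(threshold, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def boost(xs, ys, initial_weights, rounds=5):
    """AdaBoost-style loop that starts from any positive weight vector."""
    total = sum(initial_weights)
    weights = [w / total for w in initial_weights]
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the stumps in the ensemble."""
    score = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
ensemble = boost(xs, ys, initial_weights=[3, 1, 1, 1, 1, 3])
```

Passing a uniform `initial_weights` vector recovers the usual AdaBoost starting point; any other positive vector corresponds to the generalization described in the abstract.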
Meeting of the Association for Computational Linguistics | 2008
Ruy Luiz Milidiú; Cícero Nogueira dos Santos; Julio Cesar Duarte
Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial | 2007
Ruy Luiz Milidiú; Julio Cesar Duarte; Roberto Cavalcante
The European Symposium on Artificial Neural Networks | 2009
Ruy Luiz Milidiú; Julio Cesar Duarte