Lucelene Lopes
Pontifícia Universidade Católica do Rio Grande do Sul
Publication
Featured research published by Lucelene Lopes.
international congress on big data | 2013
Joaquim Assunção; Paulo Fernandes; Lucelene Lopes; Silvio Normey
Some top data mining algorithms, such as ensemble classifiers, may be inefficient on very large data sets. This paper makes an initial proposal of a distributed ensemble classifier algorithm for Big Data, based on the popular Random Forests. The proposed algorithm aims to improve efficiency through a distributed processing model called MapReduce. At the same time, it aims to reduce the impact of randomness by following an algorithm called Stochastic Aware Random Forests (SARF).
Knowledge Based Systems | 2016
Lucelene Lopes; Paulo Fernandes; Renata Vieira
This paper proposes a new relevance index for terms extracted from domain corpora. We call it term frequency, disjoint corpora frequency (tf-dcf), and it is based on the absolute frequency of each term tempered by its frequency in other (contrasting) corpora. The conceptual differences and the mathematical computation of the proposed index are discussed with respect to other similar approaches that also take contrasting corpora into account. To illustrate the efficiency of our index, this paper evaluates tf-dcf against those approaches. Finally, further experiments analyze how tf-dcf behaves according to the characteristics of the contrasting corpora.
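The abstract above describes tf-dcf only in words: absolute frequency in the domain corpus, tempered by frequency in contrasting corpora. A minimal sketch of that idea follows. The exact penalty used here (a product of `1 + log2(1 + count)` over the contrasting corpora) is an assumption for illustration, since the abstract does not state the formula; the function name `tf_dcf` and the toy counts are hypothetical as well.

```python
import math


def tf_dcf(term, domain_counts, contrasting_counts):
    """Score a term: domain frequency divided by a penalty per contrasting corpus.

    ASSUMPTION: the penalty form 1 + log2(1 + count) is illustrative only;
    the abstract does not give the exact formula.
    """
    tf = domain_counts.get(term, 0)
    penalty = 1.0
    for counts in contrasting_counts:
        # A term absent from a contrasting corpus contributes a factor of 1.
        penalty *= 1.0 + math.log2(1.0 + counts.get(term, 0))
    return tf / penalty


# Hypothetical toy counts: one domain corpus and two contrasting corpora.
domain = {"stent": 12, "patient": 40}
contrast = [{"patient": 7}, {"patient": 1, "court": 9}]

scores = {term: tf_dcf(term, domain, contrast) for term in domain}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data "patient" is more frequent than "stent" in the domain corpus, but it also occurs in both contrasting corpora, so its score is reduced and "stent" ranks first.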
Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) on | 2013
Vinicius H. Ferreira; Lucelene Lopes; Renata Vieira; Maria José Bocorny Finatto
This paper presents an automatic method to extract domain-specific non-taxonomic relations from previously processed Brazilian Portuguese corpora. The proposed method is detailed and exemplified through an experiment with five corpora. The obtained relations can be visualized and handled through an intuitive web interface, and the results were evaluated by human analysis. The results show the positive performance of the extraction method and its prospects for different kinds of linguistic applications.
acm symposium on applied computing | 2010
Paulo Fernandes; Lucelene Lopes; Duncan D. Ruiz
The use of ensemble classifiers, e.g., Bagging and Boosting, is widespread in machine learning. However, most studies in this area are based on empirical comparisons that pay too little attention to the randomness of these methods. This paper describes the dangers of experiments with ensemble classifiers by analyzing the efficiency of Bagging and Boosting methods over 32 different data sets. The experiments show that variations due to randomness are often larger than the advantages of one method over another reported in the literature. The main contribution of this paper is the claim, supported by statistical analysis, that no empirical comparison of ensemble classifiers can be scientifically done without paying attention to the random choices taken.
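The point about randomness can be made concrete by comparing the gap between two methods with the spread each method shows across random seeds. The sketch below is only an illustration of that comparison, not the statistical analysis used in the paper: the accuracy values are made up, the names `compare_over_seeds`, `bagging_acc`, and `boosting_acc` are hypothetical, and the "twice the spread" threshold is an arbitrary rule of thumb chosen for the example.

```python
from statistics import mean, stdev


def compare_over_seeds(acc_a, acc_b):
    """Return (gap, spread, conclusive) for two lists of per-seed accuracies.

    ASSUMPTION: 'conclusive' uses an arbitrary threshold (gap larger than
    twice the larger per-method standard deviation); it is not a formal test.
    """
    gap = mean(acc_a) - mean(acc_b)
    spread = max(stdev(acc_a), stdev(acc_b))
    return gap, spread, abs(gap) > 2 * spread


# Made-up accuracies of two methods on one data set, one value per random seed.
bagging_acc = [0.81, 0.84, 0.79, 0.83]
boosting_acc = [0.80, 0.82, 0.83, 0.78]

gap, spread, conclusive = compare_over_seeds(bagging_acc, boosting_acc)
```

With these illustrative numbers the mean gap is one percentage point while the seed-to-seed spread is roughly twice that, so a single-seed comparison could favor either method.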
Journal of the Brazilian Computer Society | 2015
Lucelene Lopes; Renata Vieira
Background: This paper presents a policy to choose cutoff points to identify potentially relevant terms in a given domain. Term extraction methods usually generate term lists ordered according to a relevance criterion, and the literature offers an abundance of relevance indices. However, very few studies turn their attention to how many terms should be kept, i.e., to a cutoff policy. Methods: Our proposed policy provides an estimation of the portion of this list that preserves a good balance between recall and precision, adopting a refined term extraction and the tf-dcf relevance index. Results: A practical study was conducted based on terms extracted from a Brazilian Portuguese corpus, and the results were quantitatively analyzed according to a previously defined reference list. Conclusions: Even though different extraction procedures and different relevance indices could bring a different outcome, our policy seems to deliver a good balance for the method adopted in our experiments, and it is likely to generalize to other methods.
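The trade-off that a cutoff policy has to balance can be sketched by computing precision and recall of a ranked term list at different cutoff points against a reference list. This is a generic illustration, not the policy proposed in the paper; the function name `precision_recall_at` and the toy terms are hypothetical.

```python
def precision_recall_at(ranked_terms, reference, k):
    """Precision and recall of the top-k ranked terms against a reference set."""
    kept = ranked_terms[:k]
    hits = sum(1 for term in kept if term in reference)
    precision = hits / len(kept) if kept else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical ranked extraction output and reference list.
ranked = ["a", "b", "c", "d", "e"]
reference = {"a", "c", "e", "z"}

# Evaluate every possible cutoff from 1 to the full list length.
curve = {k: precision_recall_at(ranked, reference, k) for k in range(1, len(ranked) + 1)}
```

Keeping more terms can only raise recall, while precision typically falls as lower-ranked terms are admitted, which is why the choice of cutoff matters.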
processing of the portuguese language | 2012
Lucelene Lopes; Renata Vieira
This paper presents the evaluation of a set of heuristics to improve the quality of terms extracted from an annotated domain corpus written in Portuguese. The proposed heuristics start from part-of-speech and grammatical function annotation of texts, identifying the nouns and noun phrases that are the best candidates to be considered terms of the domain. These nouns and noun phrases are submitted to a set of approximate rules (heuristics) that may discard some, accept others (with or without removing words), or even discover implicit terms that can be inferred. The effectiveness of these heuristics is verified through a corpus experiment, on the basis of a reference list for which the usual metrics are computed.
modeling, analysis, and simulation on computer and telecommunication systems | 2013
Paulo Fernandes; Lucelene Lopes; Sencer Yeralan
This paper describes a method to obtain symbolic solutions of large stochastic models using Gauss-Jordan elimination. Such a solution is an efficient alternative to standard simulations, and it allows fast and exact solution of very large and complex models that are hard to handle even with iterative numerical methods. The proposed method assumes the system is described as a structured (modular) Markovian system, with discrete states for each system module and transitions among those states ruled by Markovian processes. The mathematical representation of such a system is a Kronecker (tensor) formula, i.e., a tensor formulation of small matrices representing the transitions of each system module and the occasional dependencies among modules. Preliminary results indicate the expected efficiency of the proposed solution.
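To illustrate what a Kronecker (tensor) representation looks like in the simplest case, the sketch below combines two small module matrices into the matrix of the whole system with a Kronecker sum. It assumes continuous-time rates and fully independent modules, so it leaves out the dependencies among modules that the abstract mentions; the matrices `q1` and `q2` and the helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def identity(n):
    """n-by-n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron_sum(a, b):
    """Kronecker sum: kron(a, I) + kron(I, b), for square matrices a and b."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[l + r for l, r in zip(row_l, row_r)] for row_l, row_r in zip(left, right)]


# Hypothetical two-state modules; each row sums to zero (rate matrices).
q1 = [[-1, 1], [2, -2]]
q2 = [[-3, 3], [4, -4]]

# 4x4 matrix over the product state space (state of module 1, state of module 2).
q = kron_sum(q1, q2)
```

The full matrix grows with the product of the module sizes, while the Kronecker form only stores the small per-module matrices, which is what makes the structured description attractive for large models.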
ACM Transactions on Software Engineering and Methodology | 2016
Ricardo M. Czekster; Paulo Fernandes; Lucelene Lopes; Afonso Sales; Alan R. Santos; Thais Webber
Measuring productivity in globally distributed projects is crucial to improving team performance. These measures often indicate whether a given project is moving forward or starting to show undesired behaviors. In this paper we are interested in showing how analytical models can deliver insights into the behavior of specific distributed software collaboration projects. We present a model for distributed projects using the stochastic automata networks (SAN) formalism to estimate, for instance, the required level of coordination for specific project configurations. We focus our attention on the level of interaction among project participants and its close relation to the team's productivity. The models are parameterized for different scenarios and solved using numerical methods to obtain exact solutions. We vary the team's expertise and support levels to measure the impact on overall project performance. As results, we present our derived productivity index for all scenarios and state the implications found, in order to analyze popular preconceptions in the GSD area, confirming some and refuting others. Finally, we foresee ways to extend the models to represent more intricate behaviors and communication patterns that are usually present in globally distributed software projects.
Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) on | 2013
Paulo Fernandes; Luis Otávio de Colla Furquim; Lucelene Lopes
This paper proposes a method to enhance lexica by processing domain-specific corpora. The proposed method relies on the identification of the most relevant unknown terms in each domain corpus. The innovative points of the proposed approach are to automatically detect unknown terms using MTMDD technology to handle lexical structures, and to automatically rank and identify domain-specific terms using the Gini and tf-dcf indices. The proposed method is tested on six corpora in order to illustrate its benefits.
web intelligence | 2016
Lucelene Lopes; Paulo Rodrigues Fernandes; Renata Vieira
This paper presents a novel version of ExATO, a term extractor originally designed to extract relevant terms from corpora in Portuguese. This new version handles not only corpora in Portuguese but also texts in English. This extension is likely to offer the same level of quality already achieved for Portuguese. In this paper, we analyze results on parallel corpora with respect to the intrinsic differences between the Portuguese and English languages, and also the usage environment of ExATO for Portuguese and English corpora. A brief comparison of ExATO with another similar tool is presented to illustrate the higher quality of ExATO extraction from English corpora.