Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Wellington Santos Martins is active.

Publication


Featured research published by Wellington Santos Martins.


Bioinformation | 2009

WebSat ‐ A web software for microsatellite marker development

Wellington Santos Martins; Divino César Soares Lucas; Kelligton Fabricio de Souza Neves; David J. Bertioli

Simple sequence repeats (SSR), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. We have developed a simple-to-use web software, called WebSat, for microsatellite molecular marker prediction and development. WebSat is accessible through the Internet and requires no program installation. Although a web solution, it makes use of Ajax techniques, providing a rich, responsive user interface. WebSat allows the submission of sequences, the visualization of microsatellites and the design of primers suitable for their amplification. The program allows full control of parameters and easy export of the resulting data, thus facilitating the development of microsatellite markers.

Availability: The web tool may be accessed at http://purl.oclc.org/NET/websat/
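The SSR-detection step at the core of a tool like WebSat can be illustrated with a short standalone script. This is a minimal sketch, assuming perfect tandem repeats of 1-6 bp motifs and a 12 bp minimum tract length; these thresholds are illustrative and are not WebSat's actual defaults.

import re

# Minimal sketch of microsatellite (SSR) detection. The motif sizes (1-6 bp)
# and the 12 bp minimum tract length are illustrative assumptions, not
# WebSat's actual parameters.
MIN_TRACT_LEN = 12

def is_primitive(motif):
    # Reject motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
    for k in range(1, len(motif)):
        if len(motif) % k == 0 and motif == motif[:k] * (len(motif) // k):
            return False
    return True

def find_ssrs(sequence):
    """Yield (start, end, motif, copies) for perfect tandem repeats."""
    sequence = sequence.upper()
    for motif_len in range(1, 7):
        pattern = re.compile(r"([ACGT]{%d})\1+" % motif_len)
        for m in pattern.finditer(sequence):
            motif = m.group(1)
            tract = m.end() - m.start()
            if tract >= MIN_TRACT_LEN and is_primitive(motif):
                yield m.start(), m.end(), motif, tract // motif_len

if __name__ == "__main__":
    demo = "GGCATATATATATATATGGCTTTTTTTTTTTTTAC"
    for hit in find_ssrs(demo):
        print(hit)  # reports the AT and poly-T tracts in the toy sequence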


BMC Microbiology | 2007

The transcriptome analysis of early morphogenesis in Paracoccidioides brasiliensis mycelium reveals novel and induced genes potentially associated to the dimorphic process

Karinne P Bastos; Alexandre M. Bailão; Clayton Luiz Borges; Fabrícia P. de Faria; Maria Ss Felipe; Mirelle Garcia Silva; Wellington Santos Martins; Rogério Bento Fiúza; Maristela Pereira; Célia Ma Soares

Background: Paracoccidioides brasiliensis is a human pathogen with a broad distribution in Latin America. The fungus is thermally dimorphic, with two distinct forms corresponding to completely different lifestyles. Upon elevation of the temperature to that of the mammalian body, the fungus adopts a yeast-like form that is exclusively associated with its pathogenic lifestyle. We describe an expressed sequence tag (EST) analysis to assess the expression profile of the mycelium-to-yeast transition. To identify P. brasiliensis sequences differentially expressed during conversion, we performed a large-scale comparative analysis between P. brasiliensis ESTs identified in the transition transcriptome and databases.

Results: Our analysis was based on 1107 ESTs from a transition cDNA library of P. brasiliensis. A total of 639 consensus sequences were assembled. Genes of primary metabolism, energy, protein synthesis and fate, cellular transport, and biogenesis of cellular components were represented in the transition cDNA library. A considerable number of genes (7.51%) had not been previously reported for P. brasiliensis in public databases. Gene expression analysis using in silico EST subtraction revealed that numerous genes were more highly expressed during the transition phase than in the mycelial ESTs [1]. Classes of differentially expressed sequences were selected for further analysis, including genes related to the synthesis/remodeling of the cell wall/membrane: thirty-four genes from this family were induced. Ten genes related to signal transduction were increased. Twelve genes encoding putative virulence factors manifested increased expression. The in silico approach was validated by northern blot and semi-quantitative RT-PCR.

Conclusion: The developmental program of P. brasiliensis is characterized by significant differential positive modulation of cell wall/membrane-related transcripts and signal transduction proteins, suggesting that the related processes are important contributors to dimorphism. Putative virulence factors are also more highly expressed during the transition, suggesting adaptation of the incoming parasitic yeast phase to the host. These genes provide ideal candidates for further studies directed at understanding fungal morphogenesis and its regulation.
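The in silico EST subtraction mentioned above is, in essence, a per-contig comparison of EST counts between two libraries of different sizes. Below is a minimal sketch assuming a 2x2 Fisher's exact test on the counts; the test choice, thresholds, contig names and library totals are illustrative assumptions, not the paper's actual procedure or data.

from scipy.stats import fisher_exact

# Sketch of an in silico EST subtraction: a contig is flagged when its EST
# counts differ between two libraries, given the libraries' total sizes.
# Fisher's exact test is an illustrative choice, not necessarily the paper's.

def est_subtraction(counts_a, counts_b, total_a, total_b, alpha=0.05):
    """counts_*: {contig_id: EST count in that library}; total_*: library sizes."""
    flagged = []
    for contig in set(counts_a) | set(counts_b):
        a = counts_a.get(contig, 0)
        b = counts_b.get(contig, 0)
        _, p = fisher_exact([[a, total_a - a], [b, total_b - b]])
        if p < alpha:
            flagged.append((contig, a, b, p))
    return sorted(flagged, key=lambda item: item[3])

if __name__ == "__main__":
    # Toy values only; not data from the paper.
    transition = {"cell_wall_contig": 25, "hsp_contig": 4}
    mycelium = {"cell_wall_contig": 3, "hsp_contig": 5}
    print(est_subtraction(transition, mycelium, total_a=1000, total_b=4000))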


Nucleic Acids Research | 2006

New softwares for automated microsatellite marker development

Wellington Santos Martins; Daniel de Sousa; Karina Proite; Patricia M. Guimarães; Márcio C. Moretzsohn; David J. Bertioli

Microsatellites are small repeated sequence motifs that are highly polymorphic and abundant in the genomes of eukaryotes. Often they are the molecular markers of choice. To aid the development of microsatellite markers we have developed a module that integrates a program for the detection of microsatellites (TROLL) with the sequence assembly and analysis software, the Staden Package. The module has easily adjustable parameters for microsatellite lengths and base pair quality control. Starting with large datasets of unassembled sequence data in the form of chromatograms and/or text data, it enables the creation of a compact database consisting of the processed and assembled microsatellite-containing sequences. For the final phase of primer design, we developed a program that accepts the multi-sequence ‘experiment file’ format as input and produces a list of primer pairs for the amplification of microsatellite markers. The program can take into account the quality values of consensus bases, improving the success rate of primer pairs in PCR. The software is freely available and simple to install on both Windows and Unix-based operating systems. Here we demonstrate the software by developing primer pairs for 427 new candidate markers for peanut.
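The quality-aware step described above, using consensus base qualities to improve primer success, can be illustrated by discarding SSRs whose flanking regions contain low-quality bases before they are handed to a primer-design tool such as Primer3. This is a sketch under assumed values: the 150 bp flank size and Phred 20 cutoff are not the module's actual parameters.

# Sketch of quality-aware target selection: only SSRs whose flanking regions
# meet a minimum consensus base quality are passed on to primer design.
# The 150 bp flank and Phred 20 cutoff are illustrative assumptions.

FLANK = 150
MIN_PHRED = 20

def flanks_ok(qualities, ssr_start, ssr_end):
    """qualities: per-base Phred scores of the consensus sequence."""
    left = qualities[max(0, ssr_start - FLANK):ssr_start]
    right = qualities[ssr_end:ssr_end + FLANK]
    if not left or not right:
        return False  # SSR too close to a contig end to design flanking primers
    return min(left) >= MIN_PHRED and min(right) >= MIN_PHRED

def select_targets(ssrs, qualities):
    """Keep only SSRs (start, end, motif, copies) with well-supported flanks."""
    return [s for s in ssrs if flanks_ok(qualities, s[0], s[1])]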


BMC Genetics | 2011

The characterization of a new set of EST-derived simple sequence repeat (SSR) markers as a resource for the genetic analysis of Phaseolus vulgaris

Robertha Av Garcia; Priscila N Rangel; Claudio Brondani; Wellington Santos Martins; Leonardo Cunha Melo; Monalisa Sampaio Carneiro; Tereza Co Borba; Rosana Pv Brondani

Background: Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested.

Results: From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs was tested, and 82% were transferable to at least one other species. Amplification rates were highest for species of the genus Phaseolus (63.7%), followed by Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%). The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 x Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregating loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregating loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM.

Conclusions: A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species of great economic and genomic value. Their diversity was comparable to that of genomic SSRs, and they were incorporated into the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research.
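The PIC values reported above summarize how informative each marker is. A minimal sketch of the standard PIC computation (Botstein et al.'s formula) from the allele frequencies observed at a single locus:

def pic(freqs):
    """Polymorphism Information Content from allele frequencies summing to 1."""
    homo = sum(p * p for p in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return 1.0 - homo - cross

if __name__ == "__main__":
    # Four equally frequent alleles give PIC ~0.70; two give 0.375.
    print(round(pic([0.25] * 4), 3), round(pic([0.5, 0.5]), 3))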


Conference on Information and Knowledge Management | 2015

Parallel Lazy Semi-Naive Bayes Strategies for Effective and Efficient Document Classification

Felipe Viegas; Marcos André Gonçalves; Wellington Santos Martins; Leonardo Cristian Rocha

Automatic Document Classification (ADC) is the basis of many important applications such as spam filtering and content organization. Naive Bayes (NB) approaches are a widely used classification paradigm due to their simplicity, efficiency, absence of parameters and effectiveness. However, they do not achieve competitive effectiveness when compared to other modern statistical learning methods, such as SVMs. This is related to some characteristics of real document collections, such as class imbalance, feature sparseness and strong relationships among attributes. In this paper, we investigate whether relaxing the NB feature independence assumption (a.k.a. Semi-NB approaches) can improve its effectiveness in large text collections. We propose four new Lazy Semi-NB strategies that exploit different ideas for alleviating the NB independence assumption. By being lazy, our solutions focus only on the most important features for classifying a given test document, overcoming issues that arise when Semi-NB is applied to ADC, such as bias towards larger classes, overfitting and lack of generalization. We demonstrate that our Lazy Semi-NB proposals can produce superior effectiveness compared to state-of-the-art ADC classifiers such as SVM and kNN. Moreover, to overcome efficiency issues in combining Semi-NB and lazy strategies, we take advantage of current manycore GPU architectures and present a massively parallelized version of the Semi-NB approaches. Our experimental results show that speedups of up to 63.36 times can be obtained when compared to serial solutions, making our proposals very practical in real situations.
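The "lazy" aspect of these proposals, building a per-document model restricted to the test document's own terms, can be sketched with a plain multinomial Naive Bayes. This covers only the lazy feature restriction; the Semi-NB dependency modelling and the GPU parallelization are not shown, and the training data below are toy values.

import math
from collections import defaultdict

# Sketch of lazy NB: for each test document, a multinomial Naive Bayes model
# is built on the fly using only that document's terms. The Semi-NB relaxation
# of feature independence is not reproduced here.

def lazy_nb_classify(test_terms, training_docs, alpha=1.0):
    """training_docs: list of (label, {term: count}); test_terms: {term: count}."""
    class_term_counts = defaultdict(lambda: defaultdict(int))
    class_totals = defaultdict(int)
    class_docs = defaultdict(int)
    for label, terms in training_docs:
        class_docs[label] += 1
        for t, c in terms.items():
            if t in test_terms:  # lazy: ignore terms absent from the test document
                class_term_counts[label][t] += c
                class_totals[label] += c
    vocab = len(test_terms)
    n_docs = len(training_docs)
    best, best_score = None, float("-inf")
    for label in class_docs:
        score = math.log(class_docs[label] / n_docs)
        for t, c in test_terms.items():
            p = (class_term_counts[label][t] + alpha) / (class_totals[label] + alpha * vocab)
            score += c * math.log(p)  # Laplace-smoothed term likelihood
        if score > best_score:
            best, best_score = label, score
    return best

if __name__ == "__main__":
    train = [("spam", {"free": 3, "win": 2}), ("ham", {"meeting": 2, "report": 1})]
    print(lazy_nb_classify({"free": 2, "win": 1}, train))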


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015

An Efficient and Scalable MetaFeature-based Document Classification Approach based on Massively Parallel Computing

Sérgio D. Canuto; Marcos André Gonçalves; Wisllay M. V. dos Santos; Thierson Couto Rosa; Wellington Santos Martins

The unprecedented growth of available data has stimulated the development of new methods for organizing and extracting useful knowledge from this immense amount of data. Automatic Document Classification (ADC) is one such method; it uses machine learning techniques to build models capable of automatically associating documents with well-defined semantic classes. ADC is the basis of many important applications such as language identification, sentiment analysis, recommender systems and spam filtering, among others. Recently, the use of meta-features has been shown to substantially improve the effectiveness of ADC algorithms. In particular, the use of meta-features that make combined use of local information (through kNN-based features) and global information (through category centroids) has produced promising results. However, the generation of these meta-features is very costly in terms of both memory consumption and runtime, since the kNN algorithm must be called repeatedly. We take advantage of the current manycore GPU architecture and present a massively parallel version of the kNN algorithm for high-dimensional and sparse datasets (which is the case for ADC). Our experimental results show that we can obtain speedups of up to 15x while reducing memory consumption by more than 5,000x when compared to a state-of-the-art parallel baseline. This opens up the possibility of applying meta-feature-based classification to large collections of documents that would otherwise take too much time or require an expensive computational platform.
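The costly kNN step that the paper offloads to the GPU is, at its core, a sparse similarity search. A serial sketch using an inverted index over sparse document vectors (assumed L2-normalized, so dot products equal cosine similarities) is shown below; the GPU decomposition itself is not reproduced, and the vectors are toy values.

from collections import defaultdict
import heapq

# CPU-side sketch of sparse kNN: cosine similarity between sparse, L2-normalized
# document vectors accumulated through an inverted index, so only shared terms
# contribute to each dot product.

def build_index(docs):
    """docs: list of {term: weight} sparse vectors, assumed L2-normalized."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def knn(query_vec, index, k=5):
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, w in index.get(term, ()):
            scores[doc_id] += qw * w  # accumulate partial dot products
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

if __name__ == "__main__":
    docs = [{"gpu": 0.8, "parallel": 0.6}, {"bean": 1.0}, {"gpu": 0.6, "knn": 0.8}]
    idx = build_index(docs)
    print(knn({"gpu": 1.0}, idx, k=2))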


Conference on Information and Knowledge Management | 2014

On Efficient Meta-Level Features for Effective Text Classification

Sérgio D. Canuto; Thiago Salles; Marcos André Gonçalves; Leonardo C. da Rocha; Gabriel Ramos; Luiz Alberto Oliveira Gonçalves; Thierson Couto Rosa; Wellington Santos Martins

This paper addresses the problem of automatically learning to classify texts by exploiting information derived from meta-level features (i.e., features derived from the original bag-of-words representation). We propose new meta-level features derived from the class distribution, the entropy and the within-class cohesion observed in the k nearest neighbors of a given test document x, as well as from the distribution of distances of x to these neighbors. The set of proposed features is capable of transforming the original feature space into a new one that is potentially smaller and more informed. Experiments performed with several standard datasets demonstrate that the proposed meta-level features are not only much more effective than the traditional bag-of-words representation but also superior to other state-of-the-art meta-level features previously proposed in the literature. Moreover, the proposed meta-features can be computed about three times faster than the existing meta-level ones, making our proposal much more scalable. We also demonstrate that the combination of our meta-features and the original feature set produces significant improvements compared to each feature set used in isolation.
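A minimal sketch of how such meta-level features can be assembled from a test document's k nearest neighbors is shown below: per-class neighbor share, a distance-based cohesion proxy, and the entropy of the neighborhood label distribution. The paper's actual feature set is richer, so this only illustrates the general recipe, and the neighbor list in the example is a toy value.

import math
from collections import Counter

def meta_features(neighbors, classes):
    """neighbors: list of (class_label, distance) for the k nearest training docs."""
    k = len(neighbors)
    counts = Counter(label for label, _ in neighbors)
    dist_by_class = {c: [d for l, d in neighbors if l == c] for c in classes}
    feats = []
    for c in classes:
        share = counts.get(c, 0) / k  # class distribution among the neighbors
        mean_d = (sum(dist_by_class[c]) / len(dist_by_class[c])
                  if dist_by_class[c] else 1.0)  # distance-based cohesion proxy
        feats.extend([share, mean_d])
    probs = [counts[c] / k for c in classes if counts[c]]
    feats.append(-sum(p * math.log(p) for p in probs))  # neighborhood label entropy
    return feats

if __name__ == "__main__":
    nbrs = [("sports", 0.1), ("sports", 0.2), ("politics", 0.4)]
    print(meta_features(nbrs, ["sports", "politics"]))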


Web Information Systems Engineering | 2012

Improving on-demand learning to rank through parallelism

Daniel Xavier de Sousa; Thierson Couto Rosa; Wellington Santos Martins; Rodrigo M. Silva; Marcos André Gonçalves

Traditional Learning to Rank (L2R) is usually conducted in batch mode, in which a single ranking function is learned to order results for future queries. This approach is not flexible, since future queries may differ considerably from those present in the training set and, consequently, the learned function may not work properly. Ideally, a distinct ranking function should be learned on demand for each query. Nevertheless, on-demand L2R may significantly degrade the query processing time, as the ranking function has to be learned on the fly before it can be applied. In this paper we present a parallel implementation of an on-demand L2R technique that drastically reduces the response time of the previous serial implementation. Our implementation uses thousands of GPU threads to learn a ranking function for each query and takes advantage of a reduced training set obtained through active learning. Experiments with the LETOR benchmark show that our approach achieves a mean speedup of 127x in query processing time compared to the sequential version, while producing very competitive ranking effectiveness.
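The on-demand idea can be sketched serially: for each incoming query, a small ranking function is fit from that query's (reduced) training sample and immediately applied to its candidate documents. The pointwise least-squares learner below is an assumption for illustration, not the paper's actual ranking model, and the GPU parallelization is not shown.

import numpy as np

# Serial sketch of on-demand L2R: a separate ranking function is learned for
# each query from a small per-query training sample, then used to order that
# query's candidates. The learner here (linear least squares) is illustrative.

def rank_on_demand(train_X, train_y, cand_X):
    """train_X: (n, d) features, train_y: relevance labels, cand_X: (m, d) candidates."""
    w, *_ = np.linalg.lstsq(train_X, train_y, rcond=None)
    scores = cand_X @ w
    return np.argsort(-scores)  # candidate indices, best first

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 5))
    y = X @ np.array([2.0, 0.0, 1.0, 0.0, 0.5]) + 0.1 * rng.random(50)
    cands = rng.random((10, 5))
    print(rank_on_demand(X, y, cands))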


BMC Bioinformatics | 2013

SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

Wellington Santos Martins; Welton Couto Carmo; Thierson Couto Rosa; Thiago Fernando Rangel

Background: Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded-tree approach consists of adding missing species at random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distances between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate phylogenetic uncertainty through a simulation procedure. Because of the computational burden involved, the analyses are of limited applicability unless this procedure is efficiently implemented.

Results: In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees, so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms both for randomly expanding trees and for calculating distance matrices. The source code, written in C++, is freely available and may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/

Conclusion: We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
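The expanded-tree step described above can be sketched directly: a missing species is attached by splitting a randomly chosen branch inside its clade, and pairwise (patristic) distances are then read off the expanded tree. The tree representation (child -> (parent, branch length)), the uniform split point and the new tip's branch length below are simplifying assumptions for illustration, not SUNPLIN's actual data structures.

import random

def expand(tree, clade_nodes, new_species):
    """tree: {node: (parent, branch_length)}; the root has parent None."""
    target = random.choice([n for n in clade_nodes if tree[n][0] is not None])
    parent, blen = tree[target]
    split = random.uniform(0.0, blen)
    joint = "node_" + new_species
    tree[joint] = (parent, split)            # new internal node on the old branch
    tree[target] = (joint, blen - split)
    tree[new_species] = (joint, split)       # tip branch length chosen arbitrarily here
    return tree

def dist(tree, a, b):
    """Patristic distance between two tips, via their paths to the root."""
    def path(n):
        out, d = {}, 0.0
        while tree[n][0] is not None:
            out[n] = d
            d += tree[n][1]
            n = tree[n][0]
        out[n] = d
        return out
    pa, pb = path(a), path(b)
    # The lowest common ancestor gives the smallest combined path length.
    return min((pa[n] + pb[n] for n in pa if n in pb), default=0.0)

if __name__ == "__main__":
    t = {"root": (None, 0.0), "A": ("root", 1.0), "B": ("root", 1.0)}
    expand(t, ["A", "B"], "C")
    print(round(dist(t, "A", "B"), 3), round(dist(t, "A", "C"), 3))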


International Conference on Enterprise Information Systems | 2017

fgssjoin: A GPU-based Algorithm for Set Similarity Joins.

Rafael David Quirino; Sidney R. Junior; Leonardo Andrade Ribeiro; Wellington Santos Martins

Set similarity join is a core operation for text data integration, cleaning and mining. Most state-of-the-art solutions rely on inherently sequential, CPU-based algorithms. In this paper we propose a parallel algorithm for the set similarity join problem that harnesses the power of GPU systems through filtering techniques and divide-and-conquer strategies and scales well with data size. Experiments show substantial speedups over the fastest algorithms in the literature.
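A serial sketch of the filtering idea behind set similarity joins is shown below, using the standard prefix filter for a Jaccard threshold followed by verification of the surviving candidate pairs; fgssjoin's GPU decomposition of these steps is not reproduced. Records are assumed to be token lists sorted by a consistent global token order, and the example records are toy values.

import math
from collections import defaultdict

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def ssjoin(records, t=0.6):
    """Prefix-filtered self-join: records are token lists sorted by a global order."""
    index = defaultdict(list)  # token -> ids of records whose prefix contains it
    result = []
    for rid, rec in enumerate(records):
        prefix_len = len(rec) - math.ceil(t * len(rec)) + 1
        candidates = set()
        for token in rec[:prefix_len]:
            candidates.update(index[token])
            index[token].append(rid)
        for cid in candidates:  # verification on the full sets
            if jaccard(rec, records[cid]) >= t:
                result.append((cid, rid))
    return result

if __name__ == "__main__":
    recs = [["data", "set", "join"], ["data", "set", "join", "gpu"], ["text", "mining"]]
    print(ssjoin(recs, t=0.7))  # only the first two records are similar enough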

Collaboration


Dive into Wellington Santos Martins's collaborations.

Top Co-Authors

Thierson Couto Rosa
Universidade Federal de Goiás

Marcos André Gonçalves
Universidade Federal de Minas Gerais

Maristela Pereira
Universidade Federal de Goiás

Clayton Luiz Borges
Universidade Federal de Goiás

Rafael David Quirino
Universidade Federal de Goiás

Daniel Xavier de Sousa
Universidade Federal de Minas Gerais

Sidney Ribeiro-Junior
Universidade Federal de Goiás

Sérgio D. Canuto
Universidade Federal de Minas Gerais