
Publication


Featured research published by Luis Mateus Rocha.


arXiv: Biological Physics | 2003

Singular Value Decomposition and Principal Component Analysis

Michael E. Wall; Andreas Rechtsteiner; Luis Mateus Rocha

This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
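The relation between SVD and covariance-based PCA mentioned in the abstract can be sketched as follows. The notation is an assumption made for illustration and is not necessarily the chapter's own: X is an n × p data matrix whose columns have been mean-centered, and X = U Σ Vᵀ is its thin SVD.

```latex
X = U \Sigma V^{\top}
\quad\Longrightarrow\quad
C = \frac{1}{n-1} X^{\top} X
  = \frac{1}{n-1} V \Sigma U^{\top} U \Sigma V^{\top}
  = V \left( \frac{\Sigma^{2}}{n-1} \right) V^{\top},
\qquad
X V = U \Sigma .
```

Under these assumptions the right singular vectors (columns of V) are the eigenvectors of the covariance matrix C, the eigenvalues are λ_k = σ_k² / (n − 1), and the principal component scores X V equal U Σ. This is why a description in terms of one method carries over to the other.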


BMC Bioinformatics | 2011

The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

Martin Krallinger; Miguel Vazquez; Florian Leitner; David Salgado; Andrew Chatr-aryamontri; Andrew Winter; Livia Perfetto; Leonardo Briganti; Luana Licata; Marta Iannuccelli; Luisa Castagnoli; Gianni Cesareni; Mike Tyers; Gerold Schneider; Fabio Rinaldi; Robert Leaman; Graciela Gonzalez; Sérgio Matos; Sun Kim; W. John Wilbur; Luis Mateus Rocha; Hagit Shatkay; Ashish V. Tendulkar; Shashank Agarwal; Feifan Liu; Xinglong Wang; Rafal Rak; Keith Noto; Charles Elkan; Zhiyong Lu

Background: Determining the usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose the BCIII-ACT corpus was provided, which includes training, development and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, together with the recorded human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.

Results: A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations done by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, the best MCC score 0.55. In case of competitive systems with an acceptable recall (above 35%) the macro-averaged precision ranged between 50% and 80%, with a maximum F-Score of 55%.

Conclusions: The results of the ACT task of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than 1/4 of the time when compared to unranked results. Detecting associations between full text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and different granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
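The abstract reports both accuracy and the Matthews Correlation Coefficient (MCC) on an unbalanced collection. A minimal sketch of why the two can diverge is given below. The function names and the confusion-matrix counts are hypothetical, chosen only for illustration, and are not taken from the BioCreative III results.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal is empty (the usual convention
    for an undefined denominator).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all items that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Hypothetical, invented counts for an unbalanced collection:
    # 130 relevant abstracts against 870 non-relevant ones.
    tp, tn, fp, fn = 60, 830, 40, 70
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("MCC:     ", mcc(tp, tn, fp, fn))
```

Because the negative class dominates, correct rejections inflate accuracy, while MCC also penalizes the missed and spurious positives. This is one reason a balanced measure is useful when the real class imbalance is preserved.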


Archive | 1998

Selected Self-Organization and the Semiotics of Evolutionary Systems

Luis Mateus Rocha

Heinz von Foerster (1965, 1969, 1977) equated the ability of an organization to classify its environment with the notion of eigenbehavior. He postulated the existence of some stable structures (eigenvalues) which are maintained in the operations of an organization’s dynamics. Following Piaget (von Foerster, 1977), he observed that any specific instance of observation of such an organization will still be the result of an indefinite succession of cognitive/sensory-motor operations. This reiterated the constructivist position that observables do not refer directly to real world objects, but are instead the result of an infinite cascade of cognitive and sensory-motor operations in some environment/subject coupling. Eigenvalues are self-defining, or self-referent, through the imbedding dynamics — implying a complementary relationship (circularity, closure) between eigen-values and cognitive/sensory-motor operators: one implies, or defines, the other. “Eigenvalues represent the externally observable manifestations of the (introspectively accessible) cognitive (operations)” (ibid., p. 278, italics added). Further, “Ontologically, Eigenvalues and objects, and likewise, ontogenetically, stable behavior and the manifestation of a subject’s ‘grasp’ of an object cannot be distinguished” (ibid., p. 280). Eigenbehavior is thus used to define the behavior of autonomous, cognitive systems, which through the closure (self-referential recursion) of the sensory-motor interactions in their nervous systems, give rise to perceptual regularities as objects (Varela, 1979, ch. 13).


PLOS ONE | 2015

Extraction of pharmacokinetic evidence of drug-drug interactions from the literature.

Artemy Kolchinsky; Anália Lourenço; Heng-Yi Wu; Lang Li; Luis Mateus Rocha

Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.


PLOS ONE | 2015

Computational Fact Checking from Knowledge Networks.

Giovanni Luca Ciampaglia; Prashant Shiralkar; Luis Mateus Rocha; Johan Bollen; Filippo Menczer; Alessandro Flammini

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
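The idea of scoring a claim by the best path between two concept nodes can be sketched as below. The abstract does not specify the proximity metric, so this sketch assumes one plausible choice: a path is penalized by the sum of log-degrees of its intermediate nodes, so that routes through very generic, highly connected concepts count for less. The function names, the support formula 1 / (1 + cost), and the toy graph are all assumptions for illustration, not the authors' implementation.

```python
import heapq
import math


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_support(graph, source, target):
    """Return (support, path) for the cheapest source-target path.

    Assumed cost of a path: sum of log(degree) over intermediate nodes.
    Assumed support: 1 / (1 + cost); 0.0 if no path exists.
    """
    if source == target:
        return 1.0, [source]
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:  # Dijkstra; all step costs are non-negative
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        if node == target:
            break
        # Passing through `node` makes it an intermediate node,
        # unless it is the source itself.
        step = 0.0 if node == source else math.log(len(graph[node]))
        for neighbor in graph[node]:
            new_cost = cost + step
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return 1.0 / (1.0 + best[target]), path


if __name__ == "__main__":
    # Toy graph: "generic" is a hub, "specific" is a low-degree concept.
    toy = build_graph([
        ("s", "specific"), ("specific", "o"),
        ("s", "generic"), ("generic", "o"),
        ("generic", "x1"), ("generic", "x2"), ("generic", "x3"),
    ])
    print(path_support(toy, "s", "o"))
```

In the toy graph both routes from "s" to "o" have two hops, but the route through the low-degree node is preferred, which is the intuition behind treating specific connections as stronger evidence than generic ones.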


BioSystems | 2001

Evolution with material symbol systems.

Luis Mateus Rocha

Pattee's semantic closure principle is used to study the characteristics and requirements of evolving material symbol systems. By contrasting agents that reproduce via genetic variation with agents that reproduce via self-inspection, we reach the conclusion that symbols are necessary to attain open-ended evolution, but only if the phenotypes of agents are the result of a material, self-organization process. This way, a study of the inter-dependencies of symbol and matter is presented. This study is based first on a theoretical treatment of symbolic representations, and secondly on simulations of simple agents with matter-symbol inter-dependencies. The agent-based simulations use evolutionary algorithms with indirectly encoded phenotypes. The indirect encoding is based on Fuzzy Development programs, which are procedures for combining fuzzy sets in such a way as to model self-organizing development processes.


Artificial Life | 2005

Material Representations: From the Genetic Code to the Evolution of Cellular Automata

Luis Mateus Rocha; Wim Hordijk

We present a new definition of the concept of representation for cognitive science that is based on a study of the origin of structures that are used to store memory in evolving systems. This study consists of novel computer experiments in the evolution of cellular automata to perform nontrivial tasks as well as evidence from biology concerning genetic memory. Our key observation is that representations require inert structures to encode information used to construct appropriate dynamic configurations for the evolving system. We propose criteria to decide if a given structure is a representation by unpacking the idea of inert structures that can be used as memory for arbitrary dynamic configurations. Using a genetic algorithm, we evolved cellular automata rules that can perform nontrivial tasks related to the density task (or majority classification problem) commonly used in the literature. We present the particle catalogs of the new rules following the computational mechanics framework. We discuss if the evolved cellular automata particles may be seen as representations according to our criteria. We show that while they capture some of the essential characteristics of representations, they lack an essential one. Our goal is to show that artificial life can be used to shed new light on the computation-versus-dynamics debate in cognitive science, and indeed function as a constructive bridge between the two camps. Our definitions of representation and cellular automata experiments are proposed as a complementary approach, with both dynamics and informational modes of explanation.
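For readers unfamiliar with the density task (majority classification) named in the abstract, a minimal sketch follows. It does not use the rules evolved in the paper. Instead it swaps in the hand-designed Gacs-Kurdyumov-Levin (GKL) rule, a well-known radius-3 rule for this task, purely as an illustration. The function names are hypothetical.

```python
def majority(a, b, c):
    """Majority vote of three binary values."""
    return 1 if a + b + c >= 2 else 0


def gkl_step(cells):
    """One synchronous update of the GKL rule on a circular lattice.

    A cell in state 0 takes the majority of itself and its neighbors
    1 and 3 sites to the left; a cell in state 1 uses the neighbors
    1 and 3 sites to the right.
    """
    n = len(cells)
    nxt = []
    for i, state in enumerate(cells):
        if state == 0:
            nxt.append(majority(state, cells[(i - 1) % n], cells[(i - 3) % n]))
        else:
            nxt.append(majority(state, cells[(i + 1) % n], cells[(i + 3) % n]))
    return nxt


def run(cells, steps):
    """Iterate the rule for a fixed number of steps."""
    for _ in range(steps):
        cells = gkl_step(cells)
    return cells


def classified_correctly(initial, steps):
    """True if the lattice settles to all 1s when 1s were the initial
    majority, or to all 0s otherwise. Use an odd lattice length so the
    initial majority is never tied."""
    final = run(list(initial), steps)
    target = 1 if 2 * sum(initial) > len(initial) else 0
    return all(cell == target for cell in final)


if __name__ == "__main__":
    lattice = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # a single 1 among 0s
    print(gkl_step(lattice))
    print(classified_correctly(lattice, steps=1))
```

The task is hard because each cell sees only a local neighborhood yet the lattice as a whole must decide a global property. No single rule of this kind is known to classify every initial configuration correctly, which is what makes it a common benchmark for evolved rules.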


Systems Research | 1996

Eigenbehavior and symbols

Luis Mateus Rocha

In this paper I sketch a rough taxonomy of self-organization which may be of relevance in the study of cognitive and biological systems. I frame the problem both in terms of the language Heinz von Foerster used to formulate much of second-order cybernetics as well as the language of current theories of self-organization and complexity. In particular, I defend the position that, on the one hand, self-organization alone is not rich enough for our intended simulations, and on the other, that genetic selection in biology and symbolic representation in cognitive science alone leave out the very important (self-organizing) characteristics of particular embodiments of evolving and learning systems. I propose the acceptance of the full concept of symbol with its syntactic, semantic, and pragmatic dimensions. I argue that the syntax should be treated operationally in second-order cybernetics.


BMC Bioinformatics | 2005

Protein annotation as term categorization in the gene ontology using word proximity networks

Karin Verspoor; Judith D. Cohn; Cliff Joslyn; Susan M. Mniszewski; Andreas Rechtsteiner; Luis Mateus Rocha; Tiago Simas

Background: We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a protein's document neighborhood into the GO.

Results: The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results.

Conclusion: The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.


European Conference on Artificial Life | 1995

Contextual Genetic Algorithms: Evolving Developmental Rules

Luis Mateus Rocha

A genetic algorithm scheme with a stochastic genotype/phenotype relation is proposed. The mechanisms responsible for this intermediate level of uncertainty are inspired by the biological system of RNA editing found in a variety of organisms. In biological systems, RNA editing represents a significant and potentially regulatory step in gene expression. The artificial algorithm presented here proposes the evolution of such regulatory steps as an aid to modeling the differentiated development of artificial organisms according to environmental, contextual constraints. This mechanism of genetic string editing is then used to define a genetic algorithm scheme, with good scaling and evolutionary properties, in which phenotypes are represented by mathematical structures based on fuzzy set and evidence theories.

Collaboration


Dive into Luis Mateus Rocha's collaborations.

Top Co-Authors

Artemy Kolchinsky

Indiana University Bloomington

Cliff Joslyn

Pacific Northwest National Laboratory

Johan Bollen

Indiana University Bloomington

Tiago Simas

Indiana University Bloomington

Manuel Marques-Pita

Instituto Gulbenkian de Ciência
