Luis Tari | Researchain

Publication

Featured research published by Luis Tari.

Bioinformatics | 2010

Discovering drug–drug interactions

Luis Tari; Saadat Anwar; Shanshan Liang; James Cai; Chitta Baral

Motivation: Identifying drug–drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack supporting scientific evidence, and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. By extracting various facts about drug metabolism, the approach captures not only the DDIs that are explicitly mentioned in text but also potential interactions that can be inferred by reasoning. Results: Our approach found several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence, and our analysis determined that 81.3% of them are correct. This suggests that our approach can uncover potential DDIs together with scientific evidence explaining the mechanism of the interactions.
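The reasoning step described in this abstract can be illustrated with a minimal sketch. The single rule used here (a drug that inhibits an enzyme may interact with another drug metabolized by that enzyme) is an assumption chosen for illustration, not the paper's actual rule base, and the drug and enzyme names are placeholders rather than extracted data.

```python
from itertools import product

# Hypothetical extracted facts (placeholders, not data from the paper).
inhibits = {("drug_a", "enzyme_x")}
metabolized_by = {("drug_b", "enzyme_x"), ("drug_c", "enzyme_y")}


def infer_interactions(inhibits, metabolized_by):
    """Assumed rule: if P inhibits E and V is metabolized by E, flag (P, V, E)."""
    found = set()
    for (perpetrator, enz1), (victim, enz2) in product(inhibits, metabolized_by):
        if enz1 == enz2 and perpetrator != victim:
            found.add((perpetrator, victim, enz1))
    return found


potential_ddis = infer_interactions(inhibits, metabolized_by)
```

Each inferred triple keeps the shared enzyme, so a flagged interaction carries the fact that explains it.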

Journal of Biomedical Informatics | 2009

Fuzzy c-means clustering with prior biological knowledge

Luis Tari; Chitta Baral; Seungchan Kim

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and uses Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method can assign genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far more biologically meaningful clusters even when only a small percentage of Gene Ontology annotations is used. In addition, our experiments indicate that the use of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
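For readers unfamiliar with the underlying algorithm, the following is a minimal sketch of plain fuzzy c-means, the base method named in the abstract. It does not include the Gene Ontology prior that GO Fuzzy c-means adds; the function name, the fuzzifier default `m=2.0`, and the toy points are assumptions for illustration.

```python
def fcm(points, centers, m=2.0, iters=50):
    """Plain fuzzy c-means on coordinate tuples (no prior knowledge term)."""
    dims = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            memberships.append(
                [1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                 for di in dists]
            )
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        centers = new_centers
    return centers, memberships


# Toy one-dimensional "expression" values forming two obvious groups.
points = [(0.0,), (0.1,), (5.0,), (5.1,)]
centers, memberships = fcm(points, [(0.0,), (1.0,)])
```

Every row of `memberships` sums to one, which is what lets a gene belong partially to several clusters at once.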

IEEE Transactions on Knowledge and Data Engineering | 2012

Incremental Information Extraction Using Relational Databases

Luis Tari; Phan Huy Tu; Jörg Hakenberg; Yi Chen; Tran Cao Son; Graciela Gonzalez; Chitta Baral

Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be reapplied from scratch to the entire text corpus even though only a small part of the corpus might be affected. In this paper, we describe a novel approach for information extraction in which extraction needs are expressed in the form of database queries, which are evaluated and optimized by database systems. Using database queries for information extraction enables generic extraction and minimizes reprocessing of data by performing incremental extraction to identify which part of the data is affected by the change of components or goals. Furthermore, our approach provides automated query generation components so that casual users do not have to learn the query language in order to perform extraction. To demonstrate the feasibility of our incremental extraction approach, we performed experiments to highlight two important aspects of an information extraction system: efficiency and quality of extraction results. Our experiments show that in the event of deployment of a new module, our incremental extraction approach reduces the processing time by 89.64 percent as compared to a traditional pipeline approach. By applying our methods to a corpus of 17 million biomedical abstracts, our experiments show that the query performance is efficient for real-time applications. Our experiments also revealed that our approach achieves high quality extraction results.
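The idea of expressing an extraction goal as a database query, and reprocessing only what a new module affects, can be sketched as follows. This uses plain SQL through Python's `sqlite3` module rather than the paper's own query language, and the schema, table names, module names, and sentences are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence(id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE mention(sentence_id INTEGER, kind TEXT, name TEXT, module TEXT);
""")
conn.executemany("INSERT INTO sentence VALUES (?, ?)", [
    (1, "geneA interacts with geneB"),
    (2, "drugX inhibits geneC"),
])
conn.executemany("INSERT INTO mention VALUES (?, ?, ?, ?)", [
    (1, "gene", "geneA", "gene_tagger_v1"),
    (1, "gene", "geneB", "gene_tagger_v1"),
    (2, "gene", "geneC", "gene_tagger_v1"),
])

# Extraction goal as a query: sentences that mention both a drug and a gene.
# The module filter lets the same goal run over all rows or only new ones.
GOAL = """
SELECT d.sentence_id, d.name, g.name
FROM mention d JOIN mention g ON d.sentence_id = g.sentence_id
WHERE d.kind = 'drug' AND g.kind = 'gene' AND d.module LIKE ?
"""

before = conn.execute(GOAL, ("%",)).fetchall()

# A new module is deployed: only its output is inserted, nothing is re-parsed.
conn.execute(
    "INSERT INTO mention VALUES (2, 'drug', 'drugX', 'drug_tagger_v1')"
)
# Incremental step: evaluate the goal only against the new module's rows.
after = conn.execute(GOAL, ("drug_tagger_v1",)).fetchall()
```

Because earlier results are stored, the second query touches only the rows produced by the newly deployed module instead of the whole corpus.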

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2010

Efficient Extraction of Protein-Protein Interactions from Full-Text Articles

Jörg Hakenberg; Robert Leaman; Nguyen Ha Vo; Siddhartha Jonnalagadda; Ryan Sullivan; Christopher M. Miller; Luis Tari; Chitta Baral; Graciela Gonzalez

Proteins and their interactions govern virtually all cellular processes, such as regulation, signaling, metabolism, and structure. Most experimental findings pertaining to such interactions are discussed in research papers, which, in turn, get curated by protein interaction databases. Authors, editors, and publishers benefit from efforts to alleviate the tasks of searching for relevant papers, evidence for physical interactions, and proper identifiers for each protein involved. The BioCreative II.5 community challenge addressed these tasks in a competition-style assessment to evaluate and compare different methodologies, to raise awareness of the increasing accuracy of automated methods, and to guide future implementations. In this paper, we present our approaches for protein named-entity recognition, including normalization, and for extraction of protein-protein interactions from full text. Our overall goal is to identify efficient individual components, and we compare various compositions to handle a single full-text article in between 10 seconds and 2 minutes. We propose strategies to transfer document-level annotations to the sentence level, which allows for the creation of a more fine-grained training corpus; we use this corpus to automatically derive around 5,000 patterns. We rank sentences by relevance to the task of finding novel interactions with physical evidence, using a sentence classifier built from this training corpus. Heuristics for paraphrasing sentences help to further remove unnecessary information that might interfere with patterns, such as additional adjectives, clauses, or bracketed expressions. In BioCreative II.5, we achieved an f-score of 22 percent for finding protein interactions, and 43 percent for mapping proteins to UniProt IDs; disregarding species, f-scores are 30 percent and 55 percent, respectively. On average, our best-performing setup required around 2 minutes per full text. All data and pattern sets as well as Java classes that extend third-party software are available as supplementary information (see Appendix).

North American Chapter of the Association for Computational Linguistics | 2009

Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text

Siddhartha Jonnalagadda; Luis Tari; Jörg Hakenberg; Chitta Baral; Graciela Gonzalez

The complexity of sentences characteristic of biomedical articles poses a challenge to natural language parsers, which are typically trained on large-scale corpora of non-technical text. We propose a text simplification process, bioSimplify, that seeks to reduce the complexity of sentences in biomedical abstracts in order to improve the performance of syntactic parsers on the processed sentences. Syntactic parsing is typically one of the first steps in a text mining pipeline. Thus, any improvement in performance would have a ripple effect over all processing steps. We evaluated our method using a corpus of biomedical sentences annotated with syntactic links. Our empirical results show an improvement of 2.90% for the Charniak-McClosky parser and of 4.23% for the Link Grammar parser when processing simplified sentences rather than the original sentences in the corpus.

North American Chapter of the Association for Computational Linguistics | 2009

Molecular event extraction from Link Grammar parse trees

Jörg Hakenberg; Illés Solt; Domonkos Tikk; Luis Tari; Astrid Rheinländer; Nguyen Quang Long; Graciela Gonzalez; Ulf Leser

We present an approach for extracting molecular events from literature based on a deep parser, using a query language for parse trees. Detected events range from gene expression to protein localization, and cover a multitude of different entity types, including genes/proteins, binding sites, and locations. Furthermore, our approach is capable of recognizing negation and the speculative character of extracted statements. We first parse documents using Link Grammar (BioLG) and store the parse trees in a database. Events are extracted using a newly developed query language that traverses the BioLG linkages between trigger terms, arguments, and events. The concrete queries are learnt from an annotated corpus. On BioNLP Shared Task data, we achieve an overall F1-measure of 29.6%.

PLOS ONE | 2012

Identifying Novel Drug Indications through Automated Reasoning

Luis Tari; Nguyen Ha Vo; Shanshan Liang; Jagruti Patel; Chitta Baral; James Cai

Background: With the large amount of pharmacological and biological knowledge available in the literature, finding novel indications for existing drugs using in silico approaches has become increasingly feasible. Typical literature-based approaches generate new hypotheses in the form of protein-protein interaction networks by linking concepts based on their co-occurrences within abstracts. However, approaches of this kind tend to generate too many hypotheses, and identifying new drug indications from large networks can be a time-consuming process. Methodology: In this work, we developed a method that acquires the necessary facts from literature and knowledge bases, and identifies new drug indications through automated reasoning. This is achieved by encoding the molecular effects caused by drug-target interactions and links to various diseases and drug mechanisms as domain knowledge in AnsProlog, a declarative language that is useful for automated reasoning, including reasoning with incomplete information. Unlike other literature-based approaches, our approach is more fine-grained, especially in identifying indirect relationships for drug indications. Conclusion/Significance: To evaluate the capability of our approach in inferring novel drug indications, we applied our method to 943 drugs from DrugBank and asked if any of these drugs have potential anti-cancer activities based on information on their targets and molecular interaction types alone. A total of 507 drugs were found to have the potential to be used for cancer treatments. Among the potential anti-cancer drugs, 67 out of 81 drugs (a recall of 82.7%) are indeed known cancer drugs. In addition, 144 out of 289 drugs (a recall of 49.8%) are non-cancer drugs that are currently tested in clinical trials for cancer treatments. These results suggest that our method is able to infer drug indications (original or alternative) based on their molecular targets and interactions alone and has the potential to discover novel drug indications for existing drugs.
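The recall figures quoted in this abstract follow directly from the reported counts. The short check below only reproduces that arithmetic; the helper name is an assumption introduced here.

```python
def recall_percent(found, total):
    """Recall as a percentage: items recovered divided by items that exist."""
    return round(100.0 * found / total, 1)


known_cancer_recall = recall_percent(67, 81)      # known cancer drugs recovered
trial_drug_recall = recall_percent(144, 289)      # drugs in cancer clinical trials
```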

Pacific Symposium on Biocomputing | 2004

Understanding the global properties of functionally-related gene networks using the gene ontology

Luis Tari; Chitta Baral; Partha Dasgupta

The global behavior of interactions between genes can be investigated by forming the network of functionally related genes using annotations based on the Gene Ontology. We define two genes to be connected when the pair of genes is involved in the same biological process. There has been other work on the analysis of different kinds of cellular and metabolic networks, such as gene coexpression networks, in which genes are paired when they are found to be coexpressed in microarray experiments. We observe that our functionally related gene networks for humans, fruit flies, worms and yeast all exhibit the small-world property, and that all except the worm network also exhibit the scale-free property.
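The network construction defined in this abstract (two genes are connected when they share a biological process) can be sketched in a few lines. The annotations below are placeholders, not Gene Ontology data, and the sketch only computes the raw quantities usually inspected for these properties: node degrees, whose distribution is examined for scale-free behavior, and the local clustering coefficient, which tends to be high in small-world networks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: gene -> set of biological processes.
annotations = {
    "g1": {"p1"},
    "g2": {"p1", "p2"},
    "g3": {"p1"},
    "g4": {"p2"},
    "g5": {"p3"},
}


def build_network(annotations):
    """Connect two genes when they share at least one biological process."""
    neighbors = defaultdict(set)
    for a, b in combinations(sorted(annotations), 2):
        if annotations[a] & annotations[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = sorted(neighbors[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return 2.0 * links / (k * (k - 1))


network = build_network(annotations)
degrees = {gene: len(network[gene]) for gene in annotations}
```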

Data Integration in the Life Sciences | 2005

Collaborative curation of data from bio-medical texts and abstracts and its integration

Chitta Baral; Hasan Davulcu; Mutsumi Nakamura; Prabhdeep Singh; Luis Tari; Lian Yu

We propose an inexpensive and scalable approach for curation that takes advantage of automatic information extraction methods as a starting point, and is based on the premise that if there are a lot of articles, then there must be a lot of readers and authors of these articles. Thus we provide a mechanism by which the readers of the articles can participate and collaborate in the curation of information.

Computational Intelligence | 2011

Molecular event extraction from Link Grammar parse trees in the BioNLP'09 Shared Task

Jörg Hakenberg; Illés Solt; Domonkos Tikk; Vãu Há Nguyên; Luis Tari; Quang Long Nguyen; Chitta Baral; Ulf Leser

The BioNLP’09 Shared Task deals with extracting information on molecular events, such as gene expression and protein localization, from natural language text. Information in this benchmark is given as tuples including protein names, trigger terms for each event, and possible other participants such as binding sites. We address all three tasks of BioNLP’09: event detection, event enrichment, and recognition of negation and speculation. Our method for the first two tasks is based on a deep parser; we store the parse tree of each sentence in a relational database schema. From the training data, we collect the dependencies connecting any two relevant terms of a known tuple, that is, the shortest paths linking these two constituents. We encode all such linkages in a query language to retrieve similar linkages from unseen text. For the third task, we rely on a hierarchy of hand‐crafted regular expressions to recognize speculation and negated events. In this paper, we add extensions regarding a post‐processing step that handles ambiguous event trigger terms, as well as an extension of the query language to relax linkage constraints. On the BioNLP Shared Task test data, we achieve an overall F1‐measure of 32%, 29%, and 30% for the successive Tasks 1, 2, and 3, respectively.
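The step of collecting the shortest path that links two relevant terms can be sketched with a breadth-first search over an undirected graph of word links. The edge list below is a hand-written toy linkage, not real BioLG output, and the function name is an assumption for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over undirected links; returns a list of tokens."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two terms are not connected


# Toy linkage for "ProteinA induces expression of GeneB" (hypothetical).
edges = [
    ("ProteinA", "induces"),
    ("induces", "expression"),
    ("expression", "of"),
    ("of", "GeneB"),
]
path = shortest_path(edges, "induces", "GeneB")
```

A path collected this way from training data could then serve as a template to match against linkages in unseen sentences.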
