José M. Castaño | Researchain

Publication

Featured research published by José M. Castaño.

Pacific Symposium on Biocomputing | 2001

Robust relational parsing over biomedical literature: extracting inhibit relations.

James Pustejovsky; José M. Castaño; Jason Zhang; Maciej Kotecki; Brent H. Cochran

We describe the design of a robust parser for identifying and extracting biomolecular relations from the biomedical literature. Separate automata over distinct syntactic domains were developed for extraction of nominal-based relational information versus verbal-based relations. This allowed us to optimize the grammars separately for each module, regardless of any specific relation, resulting in significantly better performance. A unique feature of this system is the use of text-based anaphora resolution to enhance the results of argument binding in relational extraction. We demonstrate the performance of our system on inhibition relations, and present our initial results measured against an annotated text used as a gold standard for evaluation purposes. The results represent a significant improvement over previously published results on extracting such relations from Medline: Precision was 90%, Recall 57%, and Partial Recall 22%. These results demonstrate the effectiveness of a corpus-based linguistic approach to information extraction over Medline.

Intelligent Systems in Molecular Biology | 2005

Adaptive String Similarity Metrics for Biomedical Reference Resolution

Ben Wellner; José M. Castaño; James Pustejovsky

In this paper we present the evaluation of a set of string similarity metrics used to resolve the mapping from strings to concepts in the UMLS MetaThesaurus. String similarity is conceived as a single component in a full Reference Resolution System that would resolve such a mapping. Given this qualification, we obtain positive results achieving 73.6 F-measure (76.1 precision and 71.4 recall) for the task of assigning the correct UMLS concept to a given string. Our results demonstrate that adaptive string similarity methods based on Conditional Random Fields outperform standard metrics in this domain.

Journal of Logic, Language and Information | 2004

Global Index Grammars and Descriptive Power

José M. Castaño

We review the properties of Global Index Grammars (GIGs), a grammar formalism that uses a stack of indices associated with productions and has restricted context-sensitive power. We show how the control of the derivation is performed and how this affects the descriptive power of the formalism, both in the string languages and in the structural descriptions that GIGs can generate.

International Conference on Computational Linguistics | 2003

GIGs: restricted context-sensitive descriptive power in bounded polynomial-time

José M. Castaño

We present Global Index Grammars, a grammar formalism that uses a stack of indices associated with its productions. This formalism has restricted context-sensitive descriptive power. The recognition problem for this class of grammars is polynomial: the time complexity of the algorithm presented here is O(n^6).

Meeting of the Association for Computational Linguistics | 2016

A Machine Learning Approach to Clinical Terms Normalization.

José M. Castaño; Maria Laura Gambarte; Hee Joon Park; Maria del Pilar Avila Williams; David Pérez-Rey; Fernando Campos; Daniel R. Luna; Sonia E. Benítez; Hernán Berinsky; Sofía Zanetti

We propose a machine learning approach for semantic recognition and normalization of clinical term descriptions. The clinical terms considered here are noisy descriptions in Spanish written by health care professionals in our electronic health record system. These description terms cover clinical findings, family history, and suspected diseases, among other categories of concepts. Descriptions are usually very short texts with high lexical variability, including synonymy, acronyms, abbreviations and typographical errors. Mapping description terms to normalized descriptions requires medical expertise, which makes it difficult to develop a rule-based knowledge engineering approach. To build a training dataset, we use those descriptions that have previously been matched by terminologists to the hospital thesaurus database. We generate a set of feature vectors based on pairs of descriptions involving their individual and joint characteristics. We propose an unsupervised learning approach to discover term equivalence classes including synonyms, abbreviations, acronyms and frequent typographical errors. We evaluate different combinations of features to train MaxEnt and XGBoost models. Our system achieves an F1 score of 89% on the Hospital Italiano de Buenos Aires (HIBA) problem list.

Meeting of the Association for Computational Linguistics | 2003

On the Applicability of Global Index Grammars

José M. Castaño

We investigate Global Index Grammars (GIGs), a grammar formalism that uses a stack of indices associated with productions and has restricted context-sensitive power. We discuss some of the structural descriptions that GIGs can generate compared with those generated by LIGs. We also show how GIGs can represent structural descriptions corresponding to HPSG (Pollard and Sag, 1994) schemas.

International Conference on Implementation and Application of Automata | 2003

LR parsing for global index languages (GILs)

José M. Castaño

We present here Global Index Grammars (GIGs) and the characterizing 2 Stack automaton model (LR-2PDA). We present the techniques to construct an LR parsing table for deterministic Global Index Grammars. GILs include languages which are beyond the power of Linear Indexed Grammars/Tree Adjoining Grammars. GILs generalize properties of CF Languages in a straightforward way and their descriptive power is relevant at least for natural language and molecular biology phenomena.

International Conference on Implementation and Application of Automata | 2011

Variable and clause ordering in an FSA approach to propositional satisfiability

José M. Castaño; Rodrigo Castaño

We use a finite-state automaton (FSA) construction approach to address the problem of propositional satisfiability (SAT). We use a very simple translation from formulas in conjunctive normal form (CNF) to regular expressions and use regular expressions to construct an FSA. As a consequence of the FSA construction, we obtain an ALL-SAT solver and model counter. We compare how several variable ordering (state ordering) heuristics affect the running time of the FSA construction. We also present a strategy for clause ordering (automata composition). We compare against the running times of state-of-the-art model counters and BDD-based SAT solvers, and we show that this FSA approach obtains state-of-the-art performance on some hard unsatisfiable benchmarks. This work brings up many questions on the possible use of automata to address SAT.

Archive | 2006

Aligning Ontologies and Integrating Textual Evidence for Pathway Analysis of Microarray Data

Banu Gopalan; Christian Posse; Antonio Sanfilippo; Mary P. Stenzel-Poore; Susan Stevens; José M. Castaño; Nathaniel Beagley; Roderick M. Riensche; Bob Baddeley; Roger P. Simon; James Pustejovsky

Expression arrays are introducing a paradigmatic change in biology by shifting experimental approaches from single gene studies to genome-level analysis, monitoring the expression levels of several thousands of genes in parallel. The massive amounts of data obtained from microarrays need to be integrated and interpreted to infer biological meaning within the context of information-rich pathways. In this paper, we present a methodology that integrates textual information with annotations from cross-referenced ontologies to map genes to pathways in a semi-automated way. We illustrate this approach and compare it favorably to other tools by analyzing the gene expression changes underlying the biological phenomena related to stroke. Stroke is the third leading cause of death and a major disabler in the United States. Through years of study, researchers have amassed a significant knowledge base about stroke, and this knowledge, coupled with new technologies, is providing a wealth of new scientific opportunities. The potential for neuroprotective stroke therapy is enormous. However, the roles of neurogenesis, angiogenesis, and other proliferative responses in the recovery process following ischemia and the molecular mechanisms that lead to these processes still need to be uncovered. Improved annotation of genomic and proteomic data, including annotation of pathways in which genes and proteins are involved, is required to facilitate their interpretation and clinical application. While our approach is not aimed at replacing existing curated pathway databases, it reveals multiple hidden relationships that are not evident with the way these databases analyze functional groupings of genes from the Gene Ontology.

Finite State Methods and Natural Language Processing | 2005

Tagging with Delayed Disambiguation

José M. Castaño; James Pustejovsky

We discuss problems inherent in domain-specific tagging (biomedical domain) and their relevance to tagging issues in general. We present a novel approach to this problem which we call tagging with delayed disambiguation (TDD). This approach uses a modified, statistically-driven lexicon together with a small set of morphological, heuristic, and chunking rules which are implemented using finite state machinery. These rules make use of both delayed disambiguation and the concept of tag underspecification as an ordered sequence of tags.
