Marco Antonio Valenzuela-Escárcega

Publication

Featured research published by Marco Antonio Valenzuela-Escárcega.

meeting of the association for computational linguistics | 2015

A Domain-independent Rule-based Framework for Event Extraction

Marco Antonio Valenzuela-Escárcega; Gus Hahn-Powell; Mihai Surdeanu; Thomas Hicks

We describe the design, development, and API of ODIN (Open Domain INformer), a domain-independent, rule-based event extraction (EE) framework. The proposed EE approach is: simple (most events are captured with simple lexico-syntactic patterns), powerful (the language can capture complex constructs, such as events taking other events as arguments, and regular expressions over syntactic graphs), robust (to recover from syntactic parsing errors, syntactic patterns can be freely mixed with surface, token-based patterns), and fast (the runtime environment processes 110 sentences/second in a real-world domain with a grammar of over 200 rules). We used this framework to develop a grammar for the biochemical domain, which approached human performance. Our EE framework is accompanied by a web-based user interface for the rapid development of event grammars and visualization of matches. The ODIN framework and the domain-specific grammars are available as open-source code.

north american chapter of the association for computational linguistics | 2015

Two Practical Rhetorical Structure Theory Parsers

Mihai Surdeanu; Tom Hicks; Marco Antonio Valenzuela-Escárcega

We describe the design, development, and API for two discourse parsers for Rhetorical Structure Theory. The two parsers use the same underlying framework, but one uses features that rely on dependency syntax, produced by a fast shift-reduce parser, whereas the other uses a richer feature space, including both constituent- and dependency-syntax and coreference information, produced by the Stanford CoreNLP toolkit. Both parsers obtain state-of-the-art performance, and use a very simple API consisting of, minimally, two lines of Scala code. We accompany this code with a visualization library that runs the two parsers in parallel, and displays the two generated discourse trees side by side, which provides an intuitive way of comparing the two parsers.

conference on computational natural language learning | 2017

Tell Me Why: Using Question Answering as Distant Supervision for Answer Justification

Rebecca Sharp; Mihai Surdeanu; Peter Jansen; Marco Antonio Valenzuela-Escárcega; Peter Clark; Michael Hammond

For many applications of question answering (QA), being able to explain why a given model chose an answer is critical. However, the lack of labeled data for answer justifications makes learning this difficult and expensive. Here we propose an approach that uses answer ranking as distant supervision for learning how to select informative justifications, where justifications serve as inferential connections between the question and the correct answer while often containing little lexical overlap with either. We propose a neural network architecture for QA that reranks answer justifications as an intermediate (and human-interpretable) step in answer selection. Our approach is informed by a set of features designed to combine both learned representations and explicit features to capture the connection between questions, answers, and answer justifications. We show that with this end-to-end approach we are able to significantly improve upon a strong IR baseline in both justification ranking (+9% rated highly relevant) and answer selection (+6% P@1).
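The answer-selection gain above is reported as P@1 (precision at 1), the fraction of questions whose top-ranked candidate is the correct answer. A minimal sketch of that metric follows; the function name, the dictionary layout, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def precision_at_1(ranked: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Fraction of questions whose top-ranked candidate equals the gold answer."""
    if not ranked:
        return 0.0
    hits = sum(
        1
        for qid, candidates in ranked.items()
        if candidates and candidates[0] == gold.get(qid)
    )
    return hits / len(ranked)


# Hypothetical toy data: candidate answers per question, best first.
ranked = {
    "q1": ["B", "A", "C"],
    "q2": ["A", "D"],
    "q3": ["C", "B"],
    "q4": ["D", "A"],
}
gold = {"q1": "B", "q2": "D", "q3": "C", "q4": "A"}

score = precision_at_1(ranked, gold)
print(f"P@1 = {score:.2f}")
```

Only the first candidate of each ranking is inspected, so improving the order of lower-ranked candidates does not change the score.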

Database | 2018

Large-scale automated machine reading discovers new cancer-driving mechanisms

Marco Antonio Valenzuela-Escárcega; Özgün Babur; Gus Hahn-Powell; Dane Bell; Thomas Hicks; Enrique Noriega-Atala; Xia Wang; Mihai Surdeanu; Emek Demir; Clayton T. Morrison

PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biological data analysis algorithms that rely on curated models helps identify and explain a large number of previously unidentified mutually exclusive altered signaling pathways in seven different cancer types. This work shows that combining human-curated ‘big mechanisms’ with extracted ‘big data’ can lead to a causal, predictive understanding of cellular processes and unlock important downstream applications.

meeting of the association for computational linguistics | 2017

Swanson linking revisited: Accelerating literature-based discovery across domains using a conceptual influence graph

Gus Hahn-Powell; Marco Antonio Valenzuela-Escárcega; Mihai Surdeanu

We introduce a modular approach for literature-based discovery consisting of a machine reading and knowledge assembly component that together produce a graph of influence relations (e.g., “A promotes B”) from a collection of publications. A search engine is used to explore direct and indirect influence chains. Query results are substantiated with textual evidence, ranked according to their relevance, and presented in both a table-based view and a network graph visualization. Our approach operates in both domain-specific settings, where there are knowledge bases and ontologies available to guide reading, and in multi-domain settings where such resources are absent. We demonstrate that this deep reading and search system reduces the effort needed to uncover “undiscovered public knowledge”, and that with the aid of this tool a domain expert was able to drastically reduce her model building time from months to two days.
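To make the idea of direct and indirect influence chains concrete, here is a minimal sketch that stores influence relations as a directed graph and uses breadth-first search to find the shortest chain between two concepts. The graph layout, the `find_chain` function, the `max_hops` parameter, and the toy relations are all assumptions for illustration; the abstract does not describe the system's actual data structures or search method.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Each concept maps to a list of (relation, target concept) pairs.
Graph = Dict[str, List[Tuple[str, str]]]
Triple = Tuple[str, str, str]  # (subject, relation, object)


def find_chain(
    graph: Graph, source: str, target: str, max_hops: int = 3
) -> Optional[List[Triple]]:
    """Return the shortest influence chain from source to target, or None."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            return path
        if len(path) >= max_hops:
            continue  # do not extend chains beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


# Hypothetical toy graph in the style of "A promotes B".
graph: Graph = {
    "A": [("promotes", "B")],
    "B": [("inhibits", "C")],
}

direct = find_chain(graph, "A", "B")    # one-hop (direct) chain
indirect = find_chain(graph, "A", "C")  # two-hop (indirect) chain
print(direct)
print(indirect)
```

A chain of length one corresponds to a direct relation stated in the literature, while longer chains connect concepts that no single relation links.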

meeting of the association for computational linguistics | 2016

This before That: Causal Precedence in the Biomedical Domain

Gus Hahn-Powell; Dane Bell; Marco Antonio Valenzuela-Escárcega; Mihai Surdeanu

Causal precedence between biochemical interactions is crucial in the biomedical domain, because it transforms collections of individual interactions, e.g., bindings and phosphorylations, into the causal mechanisms needed to inform meaningful search and inference. Here, we analyze causal precedence in the biomedical domain as distinct from open-domain, temporal precedence. First, we describe a novel, hand-annotated text corpus of causal precedence in the biomedical domain. Second, we use this corpus to investigate a battery of models of precedence, covering rule-based, feature-based, and latent representation models. The highest-performing individual model achieved a micro F1 of 43 points, approaching the best performers on the simpler temporal-only precedence tasks. Feature-based and latent representation models each outperform the rule-based models, but their performance is complementary to one another. We apply a sieve-based architecture to capitalize on this lack of overlap, achieving a micro F1 score of 46 points.
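One common reading of a sieve-based architecture is an ordered cascade: each model either assigns a label or abstains, and the first decision is kept. The sketch below illustrates that reading together with a micro F1 that ignores the negative class. The two stand-in models, the label names, and the toy examples are hypothetical, and the paper's actual sieve ordering and scoring details are not given in the abstract.

```python
from typing import Callable, List, Optional, Sequence

Model = Callable[[str], Optional[str]]  # returns a label, or None to abstain
NONE = "None"  # negative class: no precedence relation


def sieve_predict(models: Sequence[Model], item: str) -> str:
    """Apply models in order and keep the first non-abstaining decision."""
    for model in models:
        label = model(item)
        if label is not None:
            return label
    return NONE


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1 over all labels except the negative class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g != NONE)
    fp = sum(1 for g, p in pairs if p != NONE and p != g)
    fn = sum(1 for g, p in pairs if g != NONE and p != g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical stand-ins for a rule-based and a feature-based model.
def rule_model(text: str) -> Optional[str]:
    return "precedes" if "before" in text else None


def feature_model(text: str) -> Optional[str]:
    return "precedes" if "then" in text else None


items = ["A before B", "A then B", "A and B", "A unlike B"]
gold = ["precedes", "precedes", NONE, "precedes"]
pred = [sieve_predict([rule_model, feature_model], item) for item in items]
f1 = micro_f1(gold, pred)
print(pred, round(f1, 2))
```

In this toy run the second model recovers a case the first one abstains on, which is the kind of complementary coverage a sieve is meant to exploit.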

language resources and evaluation | 2016