Publication


Featured research published by Olga Babko-Malaya.


Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006 | 2006

Issues in Synchronizing the English Treebank and PropBank

Olga Babko-Malaya; Ann Bies; Ann Taylor; Szu-ting Yi; Martha Palmer; Mitch Marcus; Seth Kulick; Libin Shen

The PropBank primarily adds semantic role labels to the syntactic constituents in the parsed trees of the Treebank. The goal is for automatic semantic role labeling to be able to use the domain of locality of a predicate in order to find its arguments. In principle, this is exactly what is wanted, but in practice the PropBank annotators often make choices that do not actually conform to the Treebank parses. As a result, the syntactic features extracted by automatic semantic role labeling systems are often inconsistent and contradictory. This paper discusses in detail the types of mismatches between the syntactic bracketing and the semantic role labeling that can be found, and our plans for reconciling them.


Meeting of the Association for Computational Linguistics | 2005

A Parallel Proposition Bank II for Chinese and English

Martha Palmer; Nianwen Xue; Olga Babko-Malaya; Jinying Chen; Benjamin Snyder

The Proposition Bank (PropBank) project is aimed at creating a corpus of text annotated with information about semantic propositions. The second phase of the project, PropBank II adds additional levels of semantic annotation which include eventuality variables, co-reference, coarse-grained sense tags, and discourse connectives. This paper presents the results of the parallel PropBank II project, which adds these richer layers of semantic annotation to the first 100K of the Chinese Treebank and its English translation. Our preliminary analysis supports the hypothesis that this additional annotation reconciles many of the surface differences between the two languages.


Society | 2013

Characterizing Communities of Practice in Emerging Science and Technology Fields

Olga Babko-Malaya; D. Hunter; G. Amis; Adam Meyers; P. Thomas; James Pustejovsky; Marc Verhagen

Emerging fields in science and technology are of great interest to innovation researchers, but such fields are often difficult to identify and characterize. This paper outlines a system for identifying a key element of emerging fields: their community of practice, consisting of active scientists and researchers. The system does not simply count these human actors and the interactions between them. Rather, guided by actant network theory, it also examines other non-human actors with which they interact, such as organizations, publications and terminologies. Using quantitative indicators inspired by actant network theory, and derived from features extracted from the full text and metadata of scientific publications and patents, the system attempts to identify communities of practice associated with emerging fields in science and technology. This paper outlines details of these features and indicators, describes how these indicators are combined using Bayesian models, and reports the results of applying these indicators to document sets associated with emerging scientific and technological fields. The results reported in this paper show that system outputs generally agree with subject matter expert judgments with respect to determining the existence of communities of practice, and appear to offer interesting insights into the development of emerging fields.


Society | 2013

Modeling Debate within a Scientific Community

Olga Babko-Malaya; James Pustejovsky; Marc Verhagen; Adam Meyers

There is growing interest in automating the detection and tracking of new and significant developments in science and technology as they emerge within a given community. A significant component of detecting such patterns of emergence is identifying the presence of a debate in the scientific community. Such debate often reflects disagreements or uncertainties over technologies or concepts that are actively being discussed and developed. In this paper, we present an algorithm for recognizing debate in large document collections. We distinguish three styles of debate over a document collection: (i) silent debate, (ii) active disagreement, and (iii) topical uncertainty. Our algorithm employs a number of indicators found in the metadata and full text of publications and patents to identify the presence of these types of debate in the community. The paper outlines the details of these features and indicators and reports the results of applying them to data from several fields classified by subject matter experts (SMEs); system outputs show high agreement with the SMEs' judgments.


Frontiers in Research Metrics and Analytics | 2018

The Termolator: Terminology Recognition based on Chunking, Statistical and Search-based Scores

Adam Meyers; Yifan He; Zachary Glass; John Ortega; Shasha Liao; Angus Grieve-Smith; Ralph Grishman; Olga Babko-Malaya

The Termolator is a high-performing, open-source terminology extraction system available on GitHub. The Termolator combines several different approaches to achieve superior coverage and precision. The in-line term component identifies potential instances of terminology using a chunking procedure similar to noun group chunking, but favoring chunks that contain out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. The distributional component ranks these term chunks according to several metrics, including: (a) a set of metrics that favor term chunks that are relatively more frequent in a "foreground" corpus about a single topic than in a "background" or multi-topic corpus; (b) a well-formedness score based on linguistic features; and (c) a relevance score that measures how often terms appear in articles and patents in a Yahoo web search. We analyse the contributions made by each of these components and show that all modules contribute to the system's performance, both in the number and in the quality of the terms identified. This paper expands upon previous publications about this research and describes some of the improvements made since the system's initial release. This study also includes a comparison with Termostat (Drouin 2003), another terminology extraction system available online. We found that the two systems achieve comparable results when applied to small amounts of data: about 50% precision for a single foreground file (Einstein's Theory of Relativity). However, with 500 patent files as the foreground, Termolator performed significantly better than Termostat. For 500 refrigeration patents, Termolator achieved 70% precision vs. Termostat's 52%; for 500 semiconductor patents, Termolator achieved 79% precision vs. Termostat's 51%.
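The abstract does not give the exact formulas behind the foreground/background metrics in (a). As a rough illustration only, the Python sketch below ranks candidate term chunks by a smoothed log ratio of their relative frequency in a foreground corpus versus a background corpus. The function name `relative_frequency_scores`, the add-k smoothing, and the toy term lists are all hypothetical assumptions; this is not Termolator's actual implementation.

```python
import math
from collections import Counter


def relative_frequency_scores(foreground_terms, background_terms, smoothing=1.0):
    """Hypothetical metric: log ratio of smoothed relative frequencies.

    Positive scores mean a term is relatively more frequent in the
    foreground (single-topic) corpus than in the background corpus.
    """
    fg = Counter(foreground_terms)
    bg = Counter(background_terms)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))
    scores = {}
    for term, count in fg.items():
        # Add-k smoothing so a term absent from the background never divides by zero
        p_fg = (count + smoothing) / (fg_total + smoothing * vocab_size)
        p_bg = (bg[term] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[term] = math.log(p_fg / p_bg)
    return scores


# Toy example (invented data): domain words outrank generic ones
foreground = ["refrigerant", "compressor", "refrigerant", "the", "valve"]
background = ["the", "the", "valve", "market", "the"]
scores = relative_frequency_scores(foreground, background)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['refrigerant', 'compressor', 'valve', 'the']
```

A log ratio is used here so that terms equally frequent in both corpora score zero and generic words that dominate the background score negatively; a real system would combine such a score with the well-formedness and relevance scores described above.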


Advances in Social Networks Analysis and Mining | 2013

Towards explanation of scientific and technological emergence

James R. Michaelis; Deborah L. McGuinness; Cynthia Chang; Daniel Hunter; Olga Babko-Malaya

Analysts who want to quickly identify new and emerging scientific advancements face numerous challenges as the breadth, depth, and volume of scientific literature increase. Network analysis and mining are key to success in this task. The ARBITER system seeks to identify indicators of emergence and to analyze corpora of full text and metadata in order to identify emerging science topics and explain its reasoning and conclusions. In this paper, we describe a network-modeling framework used in the ARBITER system, describe our novel hybrid approach combining probabilistic foundations with semantic technology, and introduce our explanation infrastructure. We include a discussion of some challenges and opportunities related to explaining hybrid approaches to indicator-based analysis and emergence detection.


TAGRF '06 Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms | 2006

Semantic interpretation of unrealized syntactic material in LTAG

Olga Babko-Malaya

This paper presents an LTAG-based analysis of gapping and VP ellipsis, which proposes that resolution of the elided material is part of a general disambiguation procedure that is also responsible for resolving underspecified representations of scope.


Applications of Social Media and Social Network Analysis | 2015

Explaining Scientific and Technical Emergence Forecasting

James R. Michaelis; Deborah L. McGuinness; Cynthia Chang; John S. Erickson; Daniel Hunter; Olga Babko-Malaya

In decision support systems, such as those designed to predict scientific and technical emergence from analysis of data collections, presenting provenance lineage records as human-readable explanations has been shown to be an effective strategy for helping users interpret results. This work focuses on the development of a novel infrastructure for explaining hybrid intelligence systems that include probabilistic models (in the form of Bayes nets) and for presenting the corresponding evidence. Our design leverages Semantic Web technologies, including a family of ontologies, to represent and explain emergence forecasts of entity prominence. Our infrastructure design has been driven by two goals: first, to provide technology that supports transparency into indicator-based forecasting systems; second, to provide analysts with context-aware mechanisms for drilling down into the evidence underlying presented indicators. The driving use case for our explanation infrastructure has been a specific analysis system designed to automate the forecasting of trends in science and technology based on collections of published patents and scientific journal articles.


Intelligence and Security Informatics | 2013

Flexible creation of indicators of scientific and technological emergence: Emerging phenomena and big data

Olga Babko-Malaya; Daniel Hunter; Andy Seidel; Fotios Barlos

This paper describes ARBITER, a system for characterizing scientific and technological fields and detecting emergent fields. ARBITER processes large collections of technoscientific publications and patents to extract full-text and metadata features relevant to rich characterizations of emergent fields. The paper describes how ARBITER uses these indicators flexibly to infer a wide variety of patterns of interest, using customizable models that capture the users' understanding of what is important in emergence.


North American Chapter of the Association for Computational Linguistics | 2004

Different Sense Granularities for Different Applications

Martha Palmer; Olga Babko-Malaya; Hoa Trang Dang

Collaboration

Top Co-Authors

Martha Palmer

University of Colorado Boulder

Ann Bies

University of Pennsylvania

Cynthia Chang

Rensselaer Polytechnic Institute