Publications


Featured research published by Nathan Schneider.


Computational Linguistics | 2014

Frame-Semantic Parsing

Dipanjan Das; Desai Chen; André F. T. Martins; Nathan Schneider; Noah A. Smith

Frame semantics is a linguistic theory that has been instantiated for English in the FrameNet lexicon. We solve the problem of frame-semantic parsing using a two-stage statistical model that takes lexical targets (i.e., content words and phrases) in their sentential contexts and predicts frame-semantic structures. Given a target in context, the first stage disambiguates it to a semantic frame. This model uses latent variables and semi-supervised learning to improve frame disambiguation for targets unseen at training time. The second stage finds the target's locally expressed semantic arguments. At inference time, a fast exact dual decomposition algorithm collectively predicts all the arguments of a frame at once in order to respect declaratively stated linguistic constraints, resulting in qualitatively better structures than naïve local predictors. Both components are feature-based and discriminatively trained on a small set of annotated frame-semantic parses. On the SemEval 2007 benchmark data set, the approach, along with a heuristic identifier of frame-evoking targets, outperforms the prior state of the art by significant margins. Additionally, we present experiments on the much larger FrameNet 1.5 data set. We have released our frame-semantic parser as open-source software.
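The two-stage architecture can be pictured as a small pipeline: frame disambiguation feeds argument identification. Below is a minimal sketch of that control flow; the toy lexicon, the scoring stubs, and the role label are hypothetical stand-ins, not the paper's actual models.

```python
# Minimal sketch of a two-stage frame-semantic parsing pipeline
# (hypothetical toy lexicon and stub scorers, not the actual system).

# A toy frame lexicon mapping lexical targets to candidate frames.
FRAME_LEXICON = {
    "buy": ["Commerce_buy"],
    "cake": ["Food"],
}

def disambiguate_frame(target, tokens):
    """Stage 1: pick a semantic frame for the target in context.
    A real model scores candidates with learned features; here we
    simply take the first candidate."""
    candidates = FRAME_LEXICON.get(target, [])
    return candidates[0] if candidates else None

def identify_arguments(frame, target_index, tokens):
    """Stage 2: find spans filling the frame's roles.  A real system
    scores every (role, span) pair and decodes them jointly under
    linguistic constraints (e.g., arguments must not overlap); this
    stub attaches the next word under one hypothetical role."""
    if target_index + 1 < len(tokens):
        return {"Theme": (target_index + 1, target_index + 1)}
    return {}

tokens = ["I", "buy", "ingredients"]
frame = disambiguate_frame("buy", tokens)    # -> "Commerce_buy"
args = identify_arguments(frame, 1, tokens)  # -> {"Theme": (2, 2)}
print(frame, args)
```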


Empirical Methods in Natural Language Processing | 2014

A Dependency Parser for Tweets

Lingpeng Kong; Nathan Schneider; Swabha Swayamdipta; Archna Bhatia; Chris Dyer; Noah A. Smith

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
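Unlabeled attachment accuracy (UAS), the metric quoted above, is simply the fraction of tokens whose predicted head matches the gold head. A minimal sketch:

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Fraction of tokens assigned the correct head (UAS).
    Heads are one index per token; 0 can denote the root."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Toy example: 4 tokens, 3 heads predicted correctly -> 0.75.
print(unlabeled_attachment_score([2, 0, 2, 3], [2, 0, 2, 2]))
```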


North American Chapter of the Association for Computational Linguistics | 2015

A Corpus and Model Integrating Multiword Expressions and Supersenses

Nathan Schneider; Noah A. Smith

This paper introduces a task of identifying and semantically classifying lexical expressions in running text. We investigate the online reviews genre, adding semantic supersense annotations to a 55,000-word English corpus that was previously annotated for multiword expressions. The noun and verb supersenses apply to full lexical expressions, whether single- or multiword. We then present a sequence tagging model that jointly infers lexical expressions and their supersenses. Results show that even with our relatively small training corpus in a noisy domain, the joint task can be performed, attaining 70% class labeling F1.
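The joint task can be encoded as one tag per token that combines MWE position with a supersense label, in the spirit of BIO tagging schemes. The tags and decoder below are illustrative, not the paper's exact tagset.

```python
# Sketch: one tag per token, combining MWE position (O/B/I) with a
# noun/verb supersense.  Illustrative tags, not the paper's tagset.
tokens = ["She", "made", "up", "a", "story"]
tags = ["O", "B-v.creation", "I-v.creation", "O", "B-n.communication"]

def decode(tokens, tags):
    """Group B-/I- runs into lexical expressions with a supersense."""
    expressions, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                expressions.append((current, label))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                expressions.append((current, label))
            current, label = [], None
    if current:
        expressions.append((current, label))
    return expressions

# -> [(['made', 'up'], 'v.creation'), (['story'], 'n.communication')]
print(decode(tokens, tags))
```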


Conference on Computational Natural Language Learning | 2015

Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks

Lizhen Qu; Gabriela Ferraro; Liyuan Zhou; Weiwei Hou; Nathan Schneider; Timothy Baldwin

Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of five popular word embedding methods in the context of four sequence labelling tasks: POS-tagging, syntactic chunking, NER and MWE identification. A particular focus of the paper is analysing the effects of task-based updating of word representations. We show that when using word embeddings as features, as few as several hundred training instances are sufficient to achieve competitive results, and that word embeddings lead to improvements on OOV words and out-of-domain data. Perhaps more surprisingly, our results indicate there is little difference between the different word embedding methods, and that simple Brown clusters are often competitive with word embeddings across all tasks we consider.
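In feature-based sequence labelling, a pretrained embedding is simply concatenated onto the token's discrete feature vector; "task-based updating" means the embedding table itself is also adjusted during training. A minimal sketch with a toy embedding table (hypothetical vectors, not one of the five methods evaluated):

```python
import numpy as np

# Toy pretrained table; real tables come from word2vec, GloVe, etc.
EMB = {"the": np.array([0.1, 0.3]), "cat": np.array([0.7, 0.2])}
UNK = np.zeros(2)  # shared vector for out-of-vocabulary words

def features(tokens, i):
    """Feature vector for token i: its embedding concatenated with a
    simple discrete feature (capitalization)."""
    vec = EMB.get(tokens[i].lower(), UNK)
    is_cap = np.array([float(tokens[i][0].isupper())])
    return np.concatenate([vec, is_cap])

print(features(["The", "cat"], 0))  # -> [0.1  0.3  1. ]

# Task-based updating would additionally backpropagate into EMB,
# nudging the vectors themselves toward the labelling task.
```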


Linguistic Annotation Workshop | 2015

A Hierarchy with, of, and for Preposition Supersenses

Nathan Schneider; Vivek Srikumar; Jena D. Hwang; Martha Palmer

English prepositions are extremely frequent and extraordinarily polysemous. In some usages they contribute information about spatial, temporal, or causal roles/relations; in other cases they are institutionalized, somewhat arbitrarily, as case markers licensed by a particular governing verb, verb class, or syntactic construction. To facilitate automatic disambiguation, we propose a general-purpose, broad-coverage taxonomy of preposition functions that we call supersenses: these are coarse and unlexicalized so as to be tractable for efficient manual annotation, yet capture crucial semantic distinctions. Our resource, including extensive documentation of the supersenses, many example sentences, and mappings to other lexical resources, will be publicly released.

Prepositions are perhaps the most beguiling yet pervasive lexicosyntactic class in English. They are everywhere; their functional versatility is dizzying and largely idiosyncratic (1). They are nearly invisible, yet indispensable for situating the where, when, why, and how of events. In a way, prepositions are the bastard children of lexicon and grammar, rising to the occasion almost whenever a noun-noun or verb-noun relation is needed and neither subject nor object is appropriate. Consider the many uses of the word to, just a few of which are illustrated in (1):

(1) a. My cake is to die for.
    b. If you want I can treat you to some.
    c. How about this: you go to the store
    d. to buy ingredients.
    e. Then if you give the recipe to me
    f. I'm happy to make the batter
    g. and put it in the oven for 30 to 40 minutes
    h. so you'll arrive to the sweet smell of chocolate.
    i. That sounds good to me.
    j. That's all there is to it.

(Though infinitival to is traditionally not considered a preposition, we allow it to be labeled with a supersense if the infinitival clause serves as a PURPOSE, as in (1d), or FUNCTION. See §2.)

Sometimes a preposition specifies a relationship between two entities or quantities, as in (1g). In other scenarios it serves a case-marking sort of function, marking a complement or adjunct—principally to a verb (1b–1e, 1h, 1i), but also to an argument-taking noun or adjective (1f). Further, it is not always possible to separate the semantic contribution of the preposition from that of other words in the sentence. As amply demonstrated in the literature, prepositions play a key role in multiword expressions (Baldwin and Kim, 2010), as in (1a, 1b, 1j).

An adequate descriptive annotation scheme for prepositions must deal with these messy facts. Following a brief discussion of existing approaches to preposition semantics (§1), this paper offers a new approach to characterizing their functions at a coarse-grained level. Our scheme is intended to apply to almost all preposition tokens, though some are excluded on the grounds that they belong to a larger multiword expression or are purely syntactic (§2). The rest of the paper is devoted to our coarse semantic categories, supersenses (§3). Many of these categories are based on previous proposals—primarily, Srikumar and Roth (2013a) (so-called preposition relations) and VerbNet (thematic roles; Bonial et al., 2011; Hwang, 2014, appendix C)—but we organize them into a hierarchy and motivate a number of new or altered categories that make the scheme more robust.

Because prepositions are so frequent, so polysemous, and so crucial in establishing relations, we believe that a wide variety of NLP applications (including knowledge base construction, reasoning about events, summarization, paraphrasing, and translation) stand to benefit from automatic disambiguation of preposition supersenses. (Supersense inventories have also been described for nouns and verbs (Ciaramita and Altun, 2006; Schneider et al., 2012; Schneider and Smith, 2015) and adjectives (Tsvetkov et al., 2014). Other inventories characterize semantic functions expressed via morphosyntax: e.g., tense/aspect (Reichart and Rappoport, 2010) and definiteness (Bhatia et al., 2014, also hierarchical).)

A wiki documenting our scheme in detail can be accessed at http://tiny.cc/prepwiki. It maps fine-grained preposition senses to our supersenses, along with numerous examples. The wiki is conducive to browsing and to exporting the structure and examples for use elsewhere (e.g., in an annotation tool). From our experience with pilot annotations, we believe that the scheme is fairly stable and broadly applicable.
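Organizing supersenses into a hierarchy lets an annotator or model back off from a fine category to its ancestors. A minimal sketch of that lookup, using a small child-to-parent map; the labels here are an illustrative fragment, not the paper's actual inventory.

```python
# A tiny fragment of a supersense hierarchy as a child -> parent map.
# Illustrative labels only; the paper's full inventory is larger.
PARENT = {
    "Destination": "Goal",
    "Recipient": "Goal",
    "Goal": "Path",
    "Path": "Place",
    "Purpose": "Explanation",
}

def ancestors(label):
    """Walk from a supersense up to its root."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

# -> ['Recipient', 'Goal', 'Path', 'Place']
print(ancestors("Recipient"))
```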


International Joint Conference on Natural Language Processing | 2015

Frame-Semantic Role Labeling with Heterogeneous Annotations

Meghana Kshirsagar; Sam Thomson; Nathan Schneider; Jaime G. Carbonell; Noah A. Smith; Chris Dyer

We consider the task of identifying and labeling the semantic arguments of a predicate that evokes a FrameNet frame. This task is challenging because there are only a few thousand fully annotated sentences for supervised training. Our approach augments an existing model with features derived from FrameNet and PropBank and with partially annotated exemplars from FrameNet. We observe a 4% absolute increase in F1 versus the original model.
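One way to picture the augmentation: for each candidate (role, span) pair, the base feature set is extended with "guide" features derived from the other resources. The sketch below uses hypothetical feature names and values, not the paper's actual feature templates.

```python
# Sketch: augmenting base features for a candidate argument span with
# guide features from heterogeneous resources (hypothetical names).

def base_features(span, role):
    return {f"span_len={span[1] - span[0] + 1}": 1.0,
            f"role={role}": 1.0}

def guide_features(span, role, propbank_label, exemplar_match):
    """Extra features from other annotations: e.g., the PropBank
    label a separate SRL system assigned to the same span, or whether
    a FrameNet exemplar supports this (role, span) pairing."""
    feats = {}
    if propbank_label:
        feats[f"pb={propbank_label}&role={role}"] = 1.0
    if exemplar_match:
        feats["exemplar_support"] = 1.0
    return feats

feats = base_features((2, 4), "Buyer")
feats.update(guide_features((2, 4), "Buyer", "ARG0", True))
print(feats)
```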


North American Chapter of the Association for Computational Linguistics | 2016

SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)

Nathan Schneider; Dirk Hovy; Anders Johannsen; Marine Carpuat

This task combines the labeling of multiword expressions and supersenses (coarse-grained classes) in an explicit, yet broad-coverage paradigm for lexical semantics. Nine systems participated; the best scored 57.7% F1 in a multi-domain evaluation setting, indicating that the task remains largely unresolved. An error analysis reveals that a large number of instances in the data set are either hard cases, which no systems get right, or easy cases, which all systems correctly solve.
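The hard/easy error analysis described above can be computed directly from system outputs: an instance is "easy" if every system labels it correctly and "hard" if none does. A minimal sketch:

```python
def partition_instances(gold, system_outputs):
    """Split instance indices into easy (all systems correct),
    hard (no system correct), and mixed (everything else)."""
    easy, hard, mixed = [], [], []
    for i, g in enumerate(gold):
        verdicts = [out[i] == g for out in system_outputs]
        if all(verdicts):
            easy.append(i)
        elif not any(verdicts):
            hard.append(i)
        else:
            mixed.append(i)
    return easy, hard, mixed

gold = ["A", "B", "C"]
sys1 = ["A", "B", "X"]
sys2 = ["A", "X", "Y"]
print(partition_instances(gold, [sys1, sys2]))  # -> ([0], [2], [1])
```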


International Conference on Computational Linguistics | 2014

CMU: Arc-Factored, Discriminative Semantic Dependency Parsing

Sam Thomson; Brendan O'Connor; Jeffrey Flanigan; David Bamman; Jesse Dodge; Swabha Swayamdipta; Nathan Schneider; Chris Dyer; Noah A. Smith

We present an arc-factored statistical model for semantic dependency parsing, as defined by the SemEval 2014 Shared Task 8 on Broad-Coverage Semantic Dependency Parsing. Our entry in the open track placed second in the competition.
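In an arc-factored model, a graph's score decomposes into a sum of per-arc scores; because the targets are dependency graphs rather than trees, each arc can in the simplest variant be predicted independently whenever its score is positive. A minimal sketch with a toy score table standing in for a learned linear model (not the actual CMU features):

```python
# Arc-factored sketch: score each candidate arc independently and
# keep the positive ones.  Toy scores stand in for a learned model.

def arc_score(head, dep, tokens):
    """Stand-in for w . f(head, dep, sentence): a learned linear
    scorer over arc features.  Here: a fixed toy table."""
    toy = {("ate", "He"): 1.2, ("ate", "fish"): 0.8,
           ("fish", "He"): -0.5}
    return toy.get((tokens[head], tokens[dep]), -1.0)

def parse(tokens):
    """Keep every arc with positive score (graphs, not trees, so no
    single-head constraint is enforced in this simplest variant)."""
    arcs = []
    for h in range(len(tokens)):
        for d in range(len(tokens)):
            if h != d and arc_score(h, d, tokens) > 0:
                arcs.append((tokens[h], tokens[d]))
    return arcs

print(parse(["He", "ate", "fish"]))  # -> [('ate', 'He'), ('ate', 'fish')]
```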


Meeting of the Association for Computational Linguistics | 2016

A Corpus of Preposition Supersenses

Nathan Schneider; Jena D. Hwang; Vivek Srikumar; Meredith Green; Abhijit Suresh; Kathryn Conger; Tim O'Gorman; Martha Palmer

We present the first corpus annotated with preposition supersenses, unlexicalized categories for semantic functions that can be marked by English prepositions (Schneider et al., 2015). The preposition supersenses are organized hierarchically and designed to facilitate comprehensive manual annotation. Our dataset is publicly released on the web.


Meeting of the Association for Computational Linguistics | 2014

Simplified Dependency Annotations with GFL-Web

Michael T. Mordowanec; Nathan Schneider; Chris Dyer; Noah A. Smith

We present GFL-Web, a web-based interface for syntactic dependency annotation with the lightweight FUDG/GFL formalism. Syntactic attachments are specified in GFL notation and visualized as a graph. A one-day pilot of this workflow with 26 annotators established that even novices were, with a bit of training, able to rapidly annotate the syntax of English Twitter messages. The open-source tool is easily installed and configured; it is available at: https://github.com/Mordeaux/gfl_web
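The annotate-then-visualize loop is easy to picture: annotators write attachments, and the tool renders them back as a graph. A text-only sketch of that rendering step follows; the input format here is a simplified (dependent, head) list, not actual GFL syntax.

```python
# Sketch: turn a list of (dependent, head) attachments into a crude
# text rendering, the way an annotation tool might echo a parse back.
# The input format is a simplified stand-in, not actual GFL syntax.

def render(attachments):
    """Print each head with its dependents, one level deep."""
    children = {}
    for dep, head in attachments:
        children.setdefault(head, []).append(dep)
    for head, deps in children.items():
        print(f"{head} <- {', '.join(deps)}")

# "ppl gotta love the weekend" (tweet-style English)
render([("ppl", "love"), ("gotta", "love"),
        ("the", "weekend"), ("weekend", "love")])
# love <- ppl, gotta, weekend
# weekend <- the
```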

Collaboration


Dive into Nathan Schneider's collaborations.

Top Co-Authors

Noah A. Smith (University of Washington)
Chris Dyer (Carnegie Mellon University)
Tim O'Gorman (University of Colorado Boulder)
Jena D. Hwang (University of Colorado Boulder)
Desai Chen (Massachusetts Institute of Technology)
Hannah Rohde (University of Edinburgh)
Brendan O'Connor (Carnegie Mellon University)