Amy Weinberg
University of Maryland, College Park
Publications
Featured research published by Amy Weinberg.
Natural Language Engineering | 2005
Rebecca Hwa; Philip Resnik; Amy Weinberg; Clara I. Cabezas; Okan Kolak
Broad-coverage, high-quality parsers are available for only a handful of languages. A prerequisite for developing broad-coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor-intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
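The projection step described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simplified one-to-one word alignment (real alignments are many-to-many) and represents dependencies as (head, dependent) index pairs:

```python
# Sketch: direct projection of English dependency edges through a word
# alignment to the second language. Illustrative only; the function name
# and data shapes are assumptions, not the authors' code.
def project_dependencies(english_deps, alignment):
    """english_deps: list of (head_idx, dep_idx) over English tokens.
    alignment: dict mapping an English token index to a foreign token
    index (one-to-one here for simplicity)."""
    projected = []
    for head, dep in english_deps:
        # An edge projects only if both endpoints are aligned.
        if head in alignment and dep in alignment:
            projected.append((alignment[head], alignment[dep]))
    return projected

# English "the dog barks": barks -> dog, dog -> the
deps = [(2, 1), (1, 0)]
align = {0: 0, 1: 1, 2: 2}
print(project_dependencies(deps, align))  # [(2, 1), (1, 0)]
```

The projected edges form the "noisy annotations" on which a stochastic parser for the second language can then be trained.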
Meeting of the Association for Computational Linguistics | 2002
Rebecca Hwa; Philip Resnik; Amy Weinberg; Okan Kolak
Recently, statistical machine translation models have begun to take advantage of higher level linguistic structures such as syntactic dependencies. Underlying these models is an assumption about the directness of translational correspondence between sentences in the two languages; however, the extent to which this assumption is valid and useful is not well understood. In this paper, we present an empirical study that quantifies the degree to which syntactic dependencies are preserved when parses are projected directly from English to Chinese. Our results show that although the direct correspondence assumption is often too restrictive, a small set of principled, elementary linguistic transformations can boost the quality of the projected Chinese parses by 76% relative to the unimproved baseline.
Journal of Psycholinguistic Research | 1993
Amy Weinberg
This paper introduces the Minimal Commitment theory, a variety of deterministic parsing in which the parser builds representations that leave immediate dominance and precedence relations unspecified. Psycholinguistic justification for this approach comes from its ability to provide a cross-linguistically valid theory of garden-path sentences in English and Japanese.
Cognition | 1983
Amy Weinberg
This paper examines the question of whether and how the grammars proposed by linguists may be said to be ‘realized’ in adequate models of human sentence processing. We first review the assumptions guiding the so-called Derivational Theory of Complexity (DTC) experiments. Recall that the DTC experiments were taken to show that the theory of transformational grammar (TG) known as the Standard Theory was only a partially adequate model for human parsing. In particular, it was assumed (see Fodor et al., 1974) that the DTC experiments demonstrated that while the parser actually used the structural descriptions implicit in a transformational derivation, the computations it used bore little resemblance to the transformations proposed by a TG. The crucial assumptions behind the DTC were that (1) the processing model (or ‘parser’) performs operations in a linear, serial fashion; and (2) the parser incorporates a grammar written in more or less the same format as the competence grammar. If we assume strict seriality, then it also seems easier to embed an Extended Lexical Grammar, such as the model proposed in Bresnan (1978) (as opposed to a TG), into a parsing model. Therefore, this assumption plays an important role in Bresnan's critique of TG as an adequate part of a theory of language use. Both Fodor, Bever and Garrett (1974) and Bresnan (1978) attempt to make the grammatical rules compatible with the psycholinguistic data and with assumption (1) by proposing models that limit the amount of active computation performed on-line. They do this by eliminating the transformational component. However, we show that on-line computation need not be associated with added reaction-time complexity.
That is, we show that a parser that relates deep structure to surface structure by transformational rules (or, more accurately, by parsing rules tailored very closely after those of a transformational model) can be made to comport with the relevant psycholinguistic data, simply by varying assumption (1). In particular, we show that by embedding TG in a parallel computational architecture, an architecture that can be justified as a reasonable one for language use, one can capture the sentence-processing complexity differences noted by DTC experimenters. Assumption (2) is also relevant to the evaluation of competing grammars as theories of language use. First we show that Bresnan (1978) must relax this assumption in order to make Extended Lexical Grammar compatible with the psycholinguistic results. Secondly, we analyze Tyler and Marslen-Wilson's (1977) and Tyler's (1980) claim that their experiments show that one cannot instantiate a TG in a model of parsing without varying assumption (2). This is because they insist that their experiments support an ‘interactive model’ of parsing that, they believe, is incompatible with the ‘Autonomy of Syntax’ thesis. We show that the Autonomy Thesis bears no relation to their ‘interactive model’. Therefore, adopting this model is no barrier to the direct incorporation of a TG in a parser. Moreover, we show why meeting assumption (2), a condition that we dub the ‘Type Transparency Hypothesis’, is not an absolute criterion for judging the utility of a grammatical theory for the construction of a theory of parsing. We claim that the grammar need not be viewed as providing a parsing algorithm directly or transparently (assumption 2 above). Nevertheless, we insist that the theory of grammar figures centrally in the development of a model of language use even if Type Transparency is weakened in the ways that we suggest.
Taken together, these considerations will be shown to bear on the comparative evaluation of candidate parsing models that incorporate transformational grammar, extended-lexical grammar, or the Tyler and Marslen-Wilson proposals.
Machine Translation | 1995
Bonnie J. Dorr; Joseph Garman; Amy Weinberg
Our goal is to construct large-scale lexicons for interlingual MT of English, Arabic, Korean, and Spanish. We describe techniques that predict salient linguistic features of a non-English word using the features of its English gloss (i.e., translation) in a bilingual dictionary. While not exact, owing to inexact glosses and language-to-language variations, these techniques can augment an existing dictionary with reasonable accuracy, thus saving significant time. We have conducted two experiments that demonstrate the value of these techniques. The first tested the feasibility of building a database of thematic grids for over 6500 Arabic verbs based on a mapping between English glosses and the syntactic codes in the Longman Dictionary of Contemporary English (LDOCE) (Procter, 1978). We show that it is more efficient and less error-prone to hand-verify the automatically constructed grids than it would be to build the thematic grids by hand from scratch. The second experiment tested the automatic classification of verbs into a richer semantic typology based on (Levin, 1993), from which we can derive a more refined set of thematic grids. In this second experiment, we show that a brute-force, non-robust technique provides 72% accuracy for semantic classification of LDOCE verbs; we then show that it is possible to approach this yield with a more robust technique based on fine-tuned statistical correlations. We further suggest the possibility of raising this yield by taking into account linguistic factors such as polysemy and positive and negative constraints on the syntax-semantics relation. We conclude that, while human intervention will always be necessary for the construction of a semantic classification from LDOCE, such intervention is significantly minimized as more knowledge about the syntax-semantics relation is introduced.
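The core gloss-based prediction can be sketched as a table lookup: look up the syntactic codes of a word's English gloss and transfer the corresponding thematic grid to the non-English word. The codes and grids below are illustrative stand-ins, not the actual LDOCE code inventory or the authors' mapping:

```python
# Sketch of gloss-based thematic-grid prediction. The code-to-grid table
# is a hypothetical simplification of LDOCE-style syntactic codes.
GRID_FOR_CODE = {
    "T1": ("agent", "theme"),          # transitive
    "I":  ("agent",),                  # intransitive
    "D1": ("agent", "theme", "goal"),  # ditransitive
}

def predict_grids(gloss, gloss_codes):
    """Assign the English gloss's grid(s) to the non-English verb."""
    codes = gloss_codes.get(gloss, [])
    return {GRID_FOR_CODE[c] for c in codes if c in GRID_FOR_CODE}

# A (hypothetical) Arabic verb glossed as English "give":
codes = {"give": ["D1", "T1"]}
print(sorted(predict_grids("give", codes)))
```

Because glosses are inexact, the predicted grids are candidates for hand-verification rather than final entries, which is the workflow the first experiment evaluates.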
North American Chapter of the Association for Computational Linguistics | 2003
Douglas W. Oard; David S. Doermann; Bonnie J. Dorr; Daqing He; Philip Resnik; Amy Weinberg; William Byrne; Sanjeev Khudanpur; David Yarowsky; Anton Leuski; Philipp Koehn; Kevin Knight
This paper describes an effort to rapidly develop language resources and component technology to support searching Cebuano news stories using English queries. Results from the first 60 hours of the exercise are presented.
North American Chapter of the Association for Computational Linguistics | 2006
Burcu Karagol-Ayan; David S. Doermann; Amy Weinberg
For a language with limited resources, a dictionary may be one of the few available electronic resources. To make effective use of the dictionary for translation, however, users must be able to access it using the root form of a morphologically deformed variant found in the text. Stemming and data-driven methods, however, are not suitable when data is sparse. We present algorithms for discovering morphemes from limited, noisy data obtained by scanning a hard-copy dictionary. Our approach is based on the novel application of the longest common substring and string edit distance metrics. Results show that these algorithms can in fact segment words into roots and affixes from the limited data contained in a dictionary, and extract affixes. This in turn allows non-native speakers to perform multilingual tasks for applications where response must be rapid and their knowledge is limited. In addition, this analysis can feed other NLP tools requiring lexicons.
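The longest-common-substring step can be sketched as follows. Given two inflected variants, the shared substring is taken as a candidate root and the residues as candidate affixes. This is a minimal illustration, not the paper's algorithm, which also uses string edit distance and must cope with OCR noise:

```python
# Sketch: candidate root/affix segmentation via longest common substring.
def longest_common_substring(a, b):
    # Standard dynamic program: dp[i][j] is the length of the longest
    # common suffix of a[:i] and b[:j].
    best, best_end = 0, 0
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best:
                    best, best_end = dp[i][j], i
    return a[best_end - best:best_end]

def split_root_affix(word1, word2):
    """Return (candidate root, residue of word1, residue of word2)."""
    root = longest_common_substring(word1, word2)
    return root, word1.replace(root, "", 1), word2.replace(root, "", 1)

print(split_root_affix("walking", "walked"))  # ('walk', 'ing', 'ed')
```

Affixes recovered this way across many word pairs can be aggregated into an affix inventory, which is what lets the dictionary be queried by root form.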
North American Chapter of the Association for Computational Linguistics | 2000
Mari J. B. Olsen; David R. Traum; Carol Van Ess-Dykema; Amy Weinberg; Ron Dolan
Machine translation between any two languages requires the generation of information that is implicit in the source language. In translating from Chinese to English, tense and other temporal information must be inferred from other grammatical and lexical cues. Moreover, Chinese multiple-clause sentences may contain inter-clausal relations (temporal or otherwise) that must be explicit in English (e.g., by means of a discourse marker). Perfective and imperfective grammatical aspect markers can provide cues to temporal structure, but such information is not present in every sentence. We report on a project to use the lexical aspect features of (a)telicity reflected in the Lexical Conceptual Structure of the input text to suggest tense and discourse structure in the English translation of a Chinese newspaper corpus.
Cognition | 1985
Amy Weinberg
In a recent reply to our article in Cognition (1984), E. Stabler criticized us on two main points: our construal of transformational grammar (TG) as a “first level” theory, and our claim that a first-level construal of TG is crucial for its psychological relevance. We’d like to dispute both of these points, and in so doing re-emphasize the psychological relevance of modern transformational grammar. First, some basic housekeeping: we must clear up the record by explaining our original position, which, contra Stabler’s interpretation, does not commit us to viewing a grammar as a first-level approximation of a parser. Second, we outline Stabler’s reasons for thinking that grammars should be first-level theories of this kind. We continue by examining Stabler’s claim that linguists judge their theories by the inherently nonpsychological criterion of relative simplicity. We argue that the psychological notion of learnability is actually the touchstone of comparison. In fact, given this criterion we shall see that one can understand just why linguistic grammars should form the abstract foundation of psychological parsing models.
Cognition | 1983
Amy Weinberg
Now, making psychologists ill does not seem like much of a research topic. Nor, to our mind, does the revival of the DTC: Dr. Garnham has missed our point. We came to bury (or at least expose the flaws of) the DTC, not to praise it. To review, we showed that if Transformational Grammar (TG) is embedded in an alternative parsing model, it can be made compatible with the reaction-time results of Slobin (1966) and others. But such compatibility (at least with Slobin’s results) is impossible if the underlying parsing model is assumed to operate in a strict serial fashion (as assumed by the DTC). Our conclusion was simply that, given the psycholinguistic evidence and a choice between changing either seriality or grammar, the assumption of seriality should give way. Transformational Grammar could be reconciled with the known experimental results. Unfortunately, the demise of the DTC encouraged many psycholinguists to look elsewhere for knowledge representations appropriate for human parsing. Many approaches gave up the identification of the parser’s knowledge representation with that representation designed to explain how children acquire language. In contrast, we believe that it is premature to abandon the