Shuly Wintner
University of Haifa
Publications
Featured research published by Shuly Wintner.
Language Resources and Evaluation | 2008
Alon Itai; Shuly Wintner
We describe a suite of standards, resources and tools for computational encoding and processing of Modern Hebrew texts. These include an array of XML schemas for representing linguistic resources; a variety of text corpora, raw, automatically processed and manually annotated; lexical databases, including a broad-coverage monolingual lexicon, a bilingual dictionary and a WordNet; and morphological processors which can analyze, generate and disambiguate Hebrew word forms. The resources are developed under centralized supervision, so that they are compatible with each other. They are freely available and many of them have already been used for several applications, both academic and industrial.
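As a hedged illustration of the kind of XML encoding such a suite might use, the sketch below parses a hypothetical morphological-analysis record with Python's standard library. The element and attribute names (token, analysis, surface, lemma, pos) are invented for the example; they are not the project's actual schemas.

```python
# Hypothetical sketch: reading one morphological-analysis record encoded
# in XML. The element and attribute names are invented for illustration;
# they are not the project's actual schemas.
import xml.etree.ElementTree as ET

record = """
<token surface="spr">
  <analysis lemma="sefer" pos="noun" number="singular"/>
  <analysis lemma="safar" pos="verb" tense="past"/>
</token>
"""

root = ET.fromstring(record)
for analysis in root.findall("analysis"):
    print(root.get("surface"), dict(analysis.attrib))
```

The record illustrates why such encodings are useful for Hebrew: a single unvocalized surface form can carry several competing analyses, which downstream disambiguation tools then select among.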
Empirical Methods in Natural Language Processing | 2011
Gennadi Lembersky; Noam Ordan; Shuly Wintner
We investigate the differences between language models compiled from original target-language texts and those compiled from texts manually translated to the target language. Corroborating established observations of Translation Studies, we demonstrate that the latter are significantly better predictors of translated sentences than the former, and hence fit the reference set better. Furthermore, translated texts yield better language models for statistical machine translation than original texts.
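A minimal sketch of the comparison, assuming toy corpora and a simple add-one-smoothed bigram model in place of the large corpora and standard n-gram toolkits used in the paper: train one model on original text and one on translated text, then compare perplexity on a reference set of translations.

```python
# Minimal sketch: compare how well two bigram language models, one
# trained on "original" text and one on "translated" text, fit a
# reference set. Toy data; the paper uses large corpora and standard
# language-modeling toolkits.
import math
from collections import Counter

def train(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(model, sentences):
    unigrams, bigrams = model
    vocab = len(unigrams) + 1                   # add-one smoothing
    log_prob, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

original_lm = train(["the committee approved the proposal",
                     "members debated the budget at length"])
translated_lm = train(["the committee has approved the proposal",
                       "the members have debated the budget"])
reference = ["the committee has approved the budget"]
print(perplexity(original_lm, reference), perplexity(translated_lm, reference))
```

On this toy data the model trained on "translated" text assigns the reference lower perplexity, mirroring in miniature the effect the paper establishes at scale.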
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition | 2007
Kenji Sagae; Eric Davis; Alon Lavie; Brian MacWhinney; Shuly Wintner
Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To date, we have produced a corpus of over 65,000 words with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for English CHILDES data. The parser and the manually annotated data are freely available for research purposes.
Journal of Child Language | 2010
Kenji Sagae; Eric Davis; Alon Lavie; Brian MacWhinney; Shuly Wintner
Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes.
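To make the annotation scheme concrete, here is a toy sketch of a labeled dependency structure over a short child utterance. The relation labels (SUBJ, OBJ, DET, ROOT) are generic stand-ins, not necessarily the project's actual tagset.

```python
# Illustrative sketch of a labeled dependency structure of the kind the
# CHILDES annotation project produces. Relation labels are generic
# stand-ins, not the project's actual tagset.
from dataclasses import dataclass

@dataclass
class Dependency:
    dependent: int   # 1-based index of the dependent token
    head: int        # 1-based index of its head (0 = root)
    relation: str    # grammatical-relation label

tokens = ["I", "want", "the", "ball"]
arcs = [
    Dependency(1, 2, "SUBJ"),   # "I" is the subject of "want"
    Dependency(2, 0, "ROOT"),   # "want" heads the utterance
    Dependency(3, 4, "DET"),    # "the" is the determiner of "ball"
    Dependency(4, 2, "OBJ"),    # "ball" is the object of "want"
]
for arc in arcs:
    head = tokens[arc.head - 1] if arc.head else "ROOT"
    print(f"{tokens[arc.dependent - 1]} --{arc.relation}--> {head}")
```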
Journal of Linguistics | 2000
Shuly Wintner
This paper suggests an analysis of Modern Hebrew noun phrases in the framework of HPSG. It focuses on the peculiar properties of the definite article, including the requirement for definiteness agreement among various elements in the noun phrase, definiteness inheritance in construct-state nominals, the fact that the article does not combine with constructs, and the similarities between construct-state nouns and adjectives. Central to our analysis is the assumption that the Hebrew definite article is an affix, rather than a clitic or a stand-alone word. Several arguments, from all levels of linguistic representation, are provided to justify this claim. Adopting the lexical hypothesis, we conclude that the article combines with nominals in the lexicon, and is no longer available for syntactic processes. This leads to an analysis of noun phrases as NPs, rather than as DPs; we show that such a view is compatible with accepted criteria for headedness. We provide an HPSG analysis that covers the above-mentioned phenomena, correctly predicting the location of the definite article in constructs, accounting for definiteness agreement and definiteness inheritance constraints, and yielding similar structures for the two major ways of expressing genitive relations in Hebrew.
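As a toy illustration of the definiteness-agreement constraint the paper analyzes (an attributive adjective in a Hebrew noun phrase must match its noun in definiteness, as in ha-bayit ha-gadol, 'the big house'), the sketch below encodes definiteness as a lexical feature. The dict-based "feature structures" are a drastic simplification of the paper's HPSG machinery.

```python
# Toy encoding of definiteness agreement in Hebrew noun phrases.
# Dicts stand in for HPSG feature structures; forms are transliterated.
def agrees(noun, adjective):
    return noun["definite"] == adjective["definite"]

ha_bayit = {"form": "ha-bayit", "definite": True}  # 'the house' (article attached in the lexicon)
ha_gadol = {"form": "ha-gadol", "definite": True}  # 'the big'
gadol = {"form": "gadol", "definite": False}       # 'big'

print(agrees(ha_bayit, ha_gadol))  # True:  'ha-bayit ha-gadol', the big house
print(agrees(ha_bayit, gadol))     # False: 'ha-bayit gadol' is not a definite NP
                                   # (it reads as a clause: 'the house is big')
```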
Artificial Intelligence Review | 2004
Shuly Wintner
This paper reviews the current state of the art in Natural Language Processing for Hebrew, both theoretical and practical. The Hebrew language, like other Semitic languages, poses special challenges for developers of programs for natural language processing: the writing system, rich morphology, unique word formation process of roots and patterns, and lack of linguistic corpora that document language usage all contribute to making computational approaches to Hebrew challenging. The paper briefly reviews the field of computational linguistics and the problems it addresses, describes the special difficulties inherent to Hebrew (as well as to other Semitic languages), surveys a wide variety of past and ongoing work, and attempts to characterize future needs and possible solutions.
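The root-and-pattern word formation mentioned above can be illustrated with a toy interleaving function. The transliterations and glosses below are simplified; real Hebrew forms also undergo phonological changes (for example b/v spirantization) that this sketch ignores.

```python
# Toy illustration of Semitic root-and-pattern word formation: the
# consonants of a triliteral root fill the 'C' slots of a pattern.
# Transliterations are simplified and ignore phonological alternations.
def apply_pattern(root, pattern):
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)

print(apply_pattern("ktb", "CaCaC"))   # 'katab' ~ 'he wrote'
print(apply_pattern("ktb", "miCCaC"))  # 'miktab' ~ 'letter'
print(apply_pattern("ktb", "CoCeC"))   # 'koteb' ~ 'writes / writer'
```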
Literary and Linguistic Computing | 2015
Vered Volansky; Noam Ordan; Shuly Wintner
Much research in translation studies indicates that translated texts are ontologically different from original non-translated ones. Translated texts, in any language, can be considered a dialect of that language, known as ‘translationese’. Several characteristics of translationese have been proposed as universal in a series of hypotheses. In this work, we test these hypotheses using a computational methodology that is based on supervised machine learning. We define several classifiers that implement various linguistically informed features, and assess the degree to which different sets of features can distinguish between translated and original texts. We demonstrate that some feature sets are indeed good indicators of translationese, thereby corroborating some hypotheses, whereas others perform much worse (sometimes at chance level), indicating that some ‘universal’ assumptions have to be reconsidered. In memoriam: Miriam Shlesinger, 1947–2012
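A sketch of the methodology under simplified assumptions: function-word counts stand in for the paper's much richer linguistically informed feature sets, and scikit-learn (my choice here, not named in the paper) provides the supervised classifier.

```python
# Sketch of translationese classification with one simple feature
# family: function-word frequencies. The paper evaluates many richer
# feature sets; the texts and labels below are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "as"]

texts = [
    "it is the case that the report was approved by the committee",
    "the committee approved the report and moved on",
    "as it was said before the meeting started on time",
    "we started the meeting on time and finished early",
]
labels = [1, 0, 1, 0]  # 1 = translated, 0 = original (toy labels)

clf = make_pipeline(
    CountVectorizer(vocabulary=FUNCTION_WORDS),  # count function words only
    LogisticRegression(),
)
clf.fit(texts, labels)
print(clf.predict(["it is said that the vote was held as planned"]))
```

Function words are a natural feature choice for this task because they are topic-independent, so the classifier picks up on style rather than content.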
Natural Language Engineering | 2008
Shlomo Yona; Shuly Wintner
Morphological analysis is a crucial component of several natural language processing tasks, especially for languages with a highly productive morphology, where stipulating a full lexicon of surface forms is not feasible. This paper describes HAMSAH (HAifa Morphological System for Analyzing Hebrew), a morphological processor for Modern Hebrew, based on finite-state linguistically motivated rules and a broad coverage lexicon. The set of rules comprehensively covers the morphological, morpho-phonological and orthographic phenomena that are observable in contemporary Hebrew texts. Reliance on finite-state technology facilitates the construction of a highly efficient, completely bidirectional system for analysis and generation.
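To hint at what bidirectionality means here, the toy sketch below runs one drastically simplified prefix rule in both directions. The transliterated, unvocalized forms and the two-entry lexicon are invented for illustration and bear no relation to the coverage of the actual system.

```python
# Toy sketch of bidirectional morphological processing: the same rule
# table supports analysis (surface -> tags) and generation (tags ->
# surface). Real systems compile large rule sets into finite-state
# networks; this fragment only illustrates the two directions.
PREFIXES = {"h": "DEF", "w": "CONJ", "b": "PREP"}
LEXICON = {"spr": "book/N", "dbr": "word/N"}

def analyze(surface):
    """Return all segmentations into prefix letters + a lexical stem."""
    analyses = []
    for i in range(len(surface) + 1):
        prefix, stem = surface[:i], surface[i:]
        if all(ch in PREFIXES for ch in prefix) and stem in LEXICON:
            analyses.append([PREFIXES[ch] for ch in prefix] + [LEXICON[stem]])
    return analyses

def generate(tags, stem):
    """Inverse direction: produce a surface form from an analysis."""
    inverse = {v: k for k, v in PREFIXES.items()}
    return "".join(inverse[t] for t in tags) + stem

print(analyze("whspr"))                  # [['CONJ', 'DEF', 'book/N']]
print(generate(["CONJ", "DEF"], "spr"))  # 'whspr', 'and the book'
```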
Empirical Methods in Natural Language Processing | 2011
Yulia Tsvetkov; Shuly Wintner
We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.
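As an illustration of one feature family, the sketch below computes pointwise mutual information for adjacent word pairs, a standard association measure for multiword-expression candidates. The paper combines many such linguistically motivated features in a Bayesian network, which this fragment does not attempt to reproduce.

```python
# Pointwise mutual information (PMI) over adjacent word pairs: one
# standard association feature for multiword-expression candidates.
# The corpus is a toy placeholder.
import math
from collections import Counter

def pmi_scores(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total_uni, total_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / total_bi
        p_indep = (unigrams[w1] / total_uni) * (unigrams[w2] / total_uni)
        scores[(w1, w2)] = math.log2(p_joint / p_indep)
    return scores

corpus = "he kicked the bucket and then he kicked the ball".split()
for pair, score in sorted(pmi_scores(corpus).items(), key=lambda x: -x[1]):
    print(pair, round(score, 2))
```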
Computational Linguistics | 2009
Shuly Wintner
One of the most thought-provoking proposals I have heard recently came from Lori Levin during the discussion that concluded the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics. Lori proposed that we should form an ACL Special Interest Group on Linguistics. At first blush, I found the idea weird: Isn’t it a little like the American Academy of Pediatrics forming a SIG on Medicine (or on Children)? Second thoughts, however, revealed the appropriateness of the idea: In essence, linguistics is altogether missing in contemporary natural language engineering research. In the following pages I want to call for the return of linguistics to computational linguistics.

The last two decades were marked by a complete paradigm shift in computational linguistics. Frustrated by the inability of applications based on explicit linguistic knowledge to scale up to real-world needs, and, perhaps more deeply, frustrated with the dominating theories in formal linguistics, we looked instead to corpora that reflect language use as our sources of (implicit) knowledge. With the shift in methodology came a subtle change in the goals of our entire enterprise. Two decades ago, a computational linguist could be interested in developing NLP applications, or in formalizing (and reasoning about) linguistic processes. These days, it is the former only. A superficial look at the papers presented in our main conferences reveals that the vast majority of them are engineering papers, discussing engineering solutions to practical problems. Virtually none addresses fundamental issues in linguistics.

There’s nothing wrong with engineering work, of course. Every school of technology has departments of engineering in areas as diverse as Chemical Engineering, Mechanical Engineering, Aeronautical Engineering, or Biomedical Engineering; there’s no reason why there shouldn’t also be a discipline of Natural Language Engineering. But in the more established disciplines, engineering departments conduct research that is informed by some well-defined branch of science. Chemical engineers study chemistry; electrical engineers study physics; aeronautical engineers study dynamics; and biomedical engineers study biology, physiology, medical sciences, and so on.

The success of engineering is also in part due to the choice of the “right” mathematics. The theoretical development of several scientific areas, notably physics, went alongside mathematical developments. Physics could not have accounted for natural phenomena without such mathematical infrastructure. For example, the development of (partial) differential equations went hand in hand with some of the greatest achievements in physics, and this branch of mathematics later turned out to be applicable also to chemistry, electrical engineering, and economics, among many other scientific fields.