Yoad Winter
Utrecht University
Publications
Featured research published by Yoad Winter.
Journal of Logic, Language and Information | 2000
Joost Zwarts; Yoad Winter
This paper introduces a compositional semantics of locative prepositional phrases which is based on a vector space ontology. Model-theoretic properties of prepositions like monotonicity and conservativity are defined in this system in a straightforward way. These notions are shown to describe central inferences with spatial expressions and to account for the grammaticality of preposition modification. Model-theoretic constraints on the set of possible prepositions in natural language are specified, similar to the semantic universals of Generalized Quantifier Theory.
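As a rough illustration of the kind of vector-space denotation the abstract describes (the rendering below is a simplified assumption, not the paper's exact definition), a locative preposition can be modeled as mapping a reference object A to a set of vectors pointed from A:

$$[\![\textit{outside}]\!](A) \;=\; \{\, v : \mathrm{start}(v) \in A,\ \mathrm{end}(v) \notin A \,\}$$

On such a rendering, a set of vectors W counts as upward monotone if it is closed under lengthening (whenever $v \in W$ and $w$ extends $v$ in the same direction, $w \in W$); properties of this sort are what the paper uses to characterize inferences with spatial expressions and the acceptability of preposition modification.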
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001
Ron Bekkerman; Ran El-Yaniv; Naftali Tishby; Yoad Winter
We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently introduced information bottleneck method, which generates a more efficient word-cluster representation of documents. Combined with the classification power of an SVM, this method yields high performance text categorization that can outperform other recent methods in terms of categorization accuracy and representation efficiency. Comparing the accuracy of our method with other techniques, we observe significant dependency of the results on the data set. We discuss the potential reasons for this dependency.
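A minimal sketch of the two-stage pipeline the abstract describes, under loudly flagged assumptions: scikit-learn's KMeans over class-conditional word distributions stands in here for the information bottleneck clustering, LinearSVC stands in for the SVM, and the function name and toy documents are hypothetical.

```python
# Sketch: word-cluster document representation + SVM text categorization.
# Assumption: KMeans over class-conditional word distributions is a simplified
# stand-in for information-bottleneck distributional clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def word_cluster_features(docs, labels, n_clusters=50):
    vec = CountVectorizer()
    X = vec.fit_transform(docs)                                 # documents x words
    labels = np.array(labels)
    classes = sorted(set(labels))
    # Each word is described by its (smoothed) distribution over class labels.
    counts = np.vstack([np.asarray(X[labels == c].sum(axis=0)) for c in classes])
    profiles = counts.T + 1e-9
    profiles = profiles / profiles.sum(axis=1, keepdims=True)   # words x classes
    km = KMeans(n_clusters=min(n_clusters, profiles.shape[0]), n_init=10)
    word2cluster = km.fit_predict(profiles)
    # Map word counts to counts over word clusters.
    M = np.zeros((profiles.shape[0], km.n_clusters))
    M[np.arange(len(word2cluster)), word2cluster] = 1.0
    return np.asarray(X @ M)

docs = ["stocks fell sharply", "the team won the match", "shares and bonds rose"]
labels = ["finance", "sports", "finance"]
features = word_cluster_features(docs, labels, n_clusters=2)
clf = LinearSVC().fit(features, labels)                         # the SVM classifier
print(clf.predict(features))
```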
Natural Language Semantics | 2000
Yoad Winter
Sentences with multiple occurrences of plural definites give rise to certain effects suggesting that distributivity should be modeled by polyadic operations. Yet in this paper it is argued that the simpler treatment of distributivity using unary universal quantification should be retained. Seemingly polyadic effects are claimed to be restricted to definite NPs. This fact is accounted for by the special anaphoric (dependent) use of definites. Further evidence concerning various plurals, island constraints, and cumulative quantification is shown to support this claim. In addition, it is shown that the evidence against a simple atomic version of unary distributivity is not decisive either. In the (uncommon) cases where distributivity with definites is not strictly atomic, the definites can be analyzed as dependent on implicit quantifiers.
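For orientation, the unary distributivity operator the abstract defends is standardly rendered (a textbook formulation assumed here, not a quotation from the paper) as universal quantification over the atomic parts of a plurality:

$$D(P) \;=\; \lambda X.\ \forall x\,[\, x \in \mathrm{atoms}(X) \rightarrow P(x)\,]$$

so that a sentence like "the girls smiled" is analyzed by applying $D(\mathbf{smile})$ to the plural individual denoted by "the girls", quantifying over the individual girls rather than invoking a polyadic operation over tuples.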
Linguistic Inquiry | 2002
Yoad Winter
This article analyzes the interactions of semantic number, morphological number, and quantification. It argues that the traditional typology of distributive and collective predicates is unsuitable for a truth-conditional theory of plurality. A new test is proposed for classifying the semantic number of predicates according to their behavior with singular/plural quantificational noun phrases such as every/all student(s) and no teacher(s). Predicates that are (in)sensitive to such number variations are called atom/set predicates, respectively, and it is shown that this distinction cuts across the traditional distributive/collective typology. The processes that govern the semantic number of sentences are reanalyzed in these terms.
Computational Linguistics | 2002
Yoad Winter
Since the early work of Montague, Boolean semantics and its subfield of generalized quantifier theory have become the model-theoretic foundation for the study of meaning in natural languages. This book uses this framework to develop a new semantic theory of central linguistic phenomena involving coordination, plurality, and scope. The proposed theory makes use of the standard Boolean interpretation of conjunction, a choice-function account of indefinites, and a novel semantics of plurals that is not based on the distributive/collective distinction. The key to unifying these mechanisms is a version of Montagovian semantics that is augmented by flexibility principles: semantic operations that have no counterpart in phonology. This is the first book to cover these areas in a way that is both linguistically comprehensive and formally explicit. On one hand, it addresses questions of primarily linguistic concern: the semantic functions of words like "and" and "or" in different languages, the interpretation of indefinites and their scope, and the semantic typology of noun phrases and predicates. On the other hand, it addresses formal questions that are motivated by the treatment of these linguistic problems: the use of Boolean algebras in linguistics, the proper formalization of choice functions within generalized quantifier theory, and the extension of this theory to the domain of plurality. While primarily intended for readers with a background in theoretical linguistics, the book will also be of interest to researchers and advanced students in logic, computational linguistics, philosophy of language, and artificial intelligence.
Meeting of the Association for Computational Linguistics | 2005
Roy Bar-Haim; Khalil Sima'an; Yoad Winter
A major architectural decision in designing a disambiguation model for segmentation and Part-of-Speech (POS) tagging in Semitic languages concerns the choice of the input-output terminal symbols over which the probability distributions are defined. In this paper we develop a segmenter and a tagger for Hebrew based on Hidden Markov Models (HMMs). We start out from a morphological analyzer and a very small morphologically annotated corpus. We show that a model whose terminal symbols are word segments (= morphemes) is advantageous over a word-level model for the task of POS tagging. However, for segmentation alone, the morpheme-level model has no significant advantage over the word-level model. Error analysis shows that neither model is adequate for resolving a common type of segmentation ambiguity in Hebrew -- whether or not a word in a written text is prefixed by a definiteness marker. Hence, we propose a morpheme-level model where the definiteness morpheme is treated as a possible feature of morpheme terminals. This model exhibits the best overall performance, both in POS tagging and in segmentation. Despite the small size of the annotated corpus available for Hebrew, the results achieved using our best model are on par with recent results on Modern Standard Arabic.
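A toy sketch of the kind of HMM decoding the abstract relies on, over morpheme-level terminals. The tag set, the probabilities, and the transliterated example word are invented here for illustration and are not taken from the paper.

```python
# Toy Viterbi decoder over morpheme-level terminals (illustrative numbers only).
from math import log

def viterbi(segments, tags, trans, emit, start):
    """Return the most probable tag sequence for a list of word segments."""
    V = [{t: (log(start.get(t, 1e-12)) + log(emit[t].get(segments[0], 1e-12)), [t])
          for t in tags}]
    for seg in segments[1:]:
        layer = {}
        for t in tags:
            # Pick the best previous tag, scoring the transition into t as well.
            best_prev, (score, path) = max(
                ((p, V[-1][p]) for p in tags),
                key=lambda kv: kv[1][0] + log(trans[kv[0]].get(t, 1e-12)))
            layer[t] = (score + log(trans[best_prev].get(t, 1e-12))
                        + log(emit[t].get(seg, 1e-12)), path + [t])
        V.append(layer)
    return max(V[-1].values(), key=lambda sp: sp[0])[1]

# Hypothetical mini-model: a word segmented into b + byt ("in" + "house").
tags = ["PREP", "NOUN"]
start = {"PREP": 0.6, "NOUN": 0.4}
trans = {"PREP": {"NOUN": 0.9, "PREP": 0.1}, "NOUN": {"NOUN": 0.5, "PREP": 0.5}}
emit = {"PREP": {"b": 0.8}, "NOUN": {"byt": 0.7}}
print(viterbi(["b", "byt"], tags, trans, emit, start))   # ['PREP', 'NOUN']
```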
Natural Language Engineering | 2008
Roy Bar-Haim; Khalil Sima'an; Yoad Winter
Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a part-of-speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level, the output tags must be complex, and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at the segment level, the input itself must encode the different alternative segmentations of the words, while the output consists of standard POS tags. Comparing these two alternatives is not trivial, as the choice between them may have global effects on the grammatical model. Moreover, intermediate levels of tokenization between these two extremes are conceivable, and, as we aim to show, beneficial. To the best of our knowledge, the problem of tokenization for POS tagging of Semitic languages has not been addressed before in full generality. In this paper, we study this problem for the purpose of POS tagging of Modern Hebrew texts. After extensive error analysis of the two simple tokenization models, we propose a novel, linguistically motivated, intermediate tokenization model that gives better performance for Hebrew over the two initial architectures. Our study is based on the well-known hidden Markov models (HMMs). We start out from a manually devised morphological analyzer and a very small annotated corpus, and describe how to adapt an HMM-based POS tagger for both tokenization architectures. We present an effective technique for smoothing the lexical probabilities using an untagged corpus, and a novel transformation for casting the segment-level tagger in terms of a standard, word-level HMM implementation. The results obtained using our model are on par with the best published results on Modern Standard Arabic, despite the much smaller annotated corpus available for Modern Hebrew.
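To make the tokenization choice concrete, the schematic snippet below contrasts the two architectures compared in the abstract for a single, hypothetical transliterated word form; the tag names and the example are illustrative assumptions, not the paper's data.

```python
# Schematic contrast between the two tokenization architectures discussed above.

# Word-level tokenization: one input token; the output tag is complex and
# encodes both the segmentation and the POS of every segment.
word_level_input = ["bbyt"]                        # unsegmented surface word
word_level_output = ["PREP+DEF+NOUN"]              # one complex tag per word

# Segment-level tokenization: the input must encode the alternative
# segmentations of the word (a lattice), and the output is a plain POS tag
# per segment of the chosen analysis.
segment_level_input = [
    ["b", "byt"],                                  # analysis 1: preposition + noun
    ["b", "h", "byt"],                             # analysis 2: with a definiteness marker
]
segment_level_output = ["PREP", "DEF", "NOUN"]     # tags for the chosen analysis
```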
Logic Journal of the IGPL / Bulletin of the IGPL | 2003
Yaroslav Fyodorov; Yoad Winter; Nissim Francez
This paper develops a version of Natural Logic – an inference system that works directly on natural language syntactic representations, with no intermediate translation to logical formulae. Following work by Sánchez, we develop a small fragment that computes semantic order relations between derivation trees in Categorial Grammar. The proposed system has the following new characteristics: (i) It uses orderings between derivation trees as purely syntactic units, derivable by a formal calculus. (ii) The system is extended for conjunctive phenomena like coordination and relative clauses. This allows a simple account of non-monotonic expressions that are reducible to conjunctions of monotonic ones. (iii) A decision procedure for provability is developed for a fragment of Natural Logic.
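As a concrete example of the semantic order relations such a system derives (a standard generalized-quantifier fact, used here only for illustration), the determiner every is downward monotone in its first argument and upward monotone in its second, so from the lexical ordering dog ≤ animal one may conclude

$$\textit{every animal barked} \;\le\; \textit{every dog barked} \qquad\text{and}\qquad \textit{some dog barked} \;\le\; \textit{some animal barked},$$

where ≤ is the entailment ordering between expressions that the calculus computes directly on derivation trees.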
Meeting of the Association for Computational Linguistics | 2007
Saib Mansour; Khalil Sima'an; Yoad Winter
We propose an enhanced Part-of-Speech (POS) tagger of Semitic languages that treats Modern Standard Arabic (henceforth Arabic) and Modern Hebrew (henceforth Hebrew) using the same probabilistic model and architectural setting. We start out by porting an existing Hidden Markov Model POS tagger for Hebrew to Arabic by exchanging a morphological analyzer for Hebrew with Buckwalter's (2002) morphological analyzer for Arabic. This gives state-of-the-art accuracy (96.12%), comparable to Habash and Rambow's (2005) analyzer-based POS tagger on the same Arabic datasets. However, further improvement of such analyzer-based tagging methods is hindered by the incomplete coverage of the standard morphological analyzer (Bar-Haim et al., 2005). To overcome this coverage problem we supplement the output of Buckwalter's analyzer with synthetically constructed analyses that are proposed by a model which uses character information (Diab et al., 2004) in a way that is similar to Nakagawa's (2004) system for Chinese and Japanese. A version of this extended model that (unlike Nakagawa) incorporates synthetically constructed analyses also for known words achieves 96.28% accuracy on the standard Arabic test set.
Journal of Logic, Language and Information | 2006
Anna Zamansky; Nissim Francez; Yoad Winter
This paper develops an inference system for natural language within the ‘Natural Logic’ paradigm as advocated by van Benthem (1997), Sánchez (1991) and others. The system that we propose is based on the Lambek calculus and works directly on the Curry-Howard counterparts for syntactic representations of natural language, with no intermediate translation to logical formulae. The Lambek-based system we propose extends the system by Fyodorov et al. (2003), which is based on the Ajdukiewicz/Bar-Hillel (AB) calculus (Bar-Hillel, 1964). This enables the system to deal with new kinds of inferences, involving relative clauses, non-constituent coordination, and meaning postulates that involve complex expressions. Basing the system on the Lambek calculus leads to problems with non-normalized proof terms, which are treated by using normalization axioms.
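For readers unfamiliar with the underlying calculus, a standard categorial derivation of a simple transitive sentence (textbook material, assumed here rather than taken from the paper) has the form

$$np,\ (np\backslash s)/np,\ np \;\vdash\; s,$$

with the Curry-Howard proof term $((\textit{loves}\;\textit{mary})\;\textit{john})$ serving as the representation on which order relations and inferences are computed.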