Saif Mohammad
University of Toronto
Publication
Featured research published by Saif Mohammad.
North American Chapter of the Association for Computational Linguistics | 2009
Saif Mohammad; Bonnie J. Dorr; Melissa Egan; Ahmed Hassan; Pradeep Muthukrishan; Vahed Qazvinian; Dragomir R. Radev; David M. Zajic
The number of research publications in various disciplines is growing exponentially. Researchers and scientists are increasingly finding themselves in the position of having to quickly understand large amounts of technical material. In this paper we present the first steps in producing an automatically generated, readily consumable, technical survey. Specifically we explore the combination of citation information and summarization techniques. Even though prior work (Teufel et al., 2006) argues that citation text is unsuitable for summarization, we show that in the framework of multi-document survey creation, citation texts can play a crucial role.
Empirical Methods in Natural Language Processing | 2008
Saif Mohammad; Bonnie J. Dorr; Graeme Hirst
Knowing the degree of antonymy between words has widespread applications in natural language processing. Manually-created lexicons have limited coverage and do not include most semantically contrasting word pairs. We present a new automatic and empirical measure of antonymy that combines corpus statistics with the structure of a published thesaurus. The approach is evaluated on a set of closest-opposite questions, obtaining a precision of over 80%. Along the way, we discuss what humans consider antonymous and how antonymy manifests itself in utterances.
Empirical Methods in Natural Language Processing | 2006
Saif Mohammad; Graeme Hirst
We propose a framework to derive the distance between concepts from distributional measures of word co-occurrences. We use the categories in a published thesaurus as coarse-grained concepts, allowing all possible distance values to be stored in a concept-concept matrix roughly 0.01% the size of that created by existing measures. We show that the newly proposed concept-distance measures outperform traditional distributional word-distance measures in the tasks of (1) ranking word pairs in order of semantic distance, and (2) correcting real-word spelling errors. In the latter task, of all the WordNet-based measures, only that proposed by Jiang and Conrath outperforms the best distributional concept-distance measures.
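As a rough illustration of the idea in this abstract, the sketch below pools word co-occurrence counts into one profile per thesaurus category and compares categories with cosine distance. It is not the authors' implementation: the function names, the toy thesaurus, the toy counts, and the choice of cosine as the distributional measure are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List


def concept_profiles(
    thesaurus: Dict[str, List[str]],
    cooccurrences: Dict[str, Dict[str, int]],
) -> Dict[str, Counter]:
    """Pool the co-occurrence counts of every word listed under a category.

    `thesaurus` maps a category (coarse concept) to its words;
    `cooccurrences` maps a word to counts of the words it co-occurs with.
    Both structures are hypothetical stand-ins for real resources.
    """
    profiles: Dict[str, Counter] = {}
    for concept, words in thesaurus.items():
        profile: Counter = Counter()
        for word in words:
            profile.update(cooccurrences.get(word, {}))
        profiles[concept] = profile
    return profiles


def cosine_distance(p: Counter, q: Counter) -> float:
    """Return 1 - cosine similarity; 1.0 if either profile is empty."""
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Toy data (assumed): "bank" is listed under two categories.
thesaurus = {"FINANCE": ["bank", "money"], "RIVER": ["bank", "shore"]}
cooccurrences = {
    "bank": {"loan": 2, "water": 1},
    "money": {"loan": 3},
    "shore": {"water": 4},
}
profiles = concept_profiles(thesaurus, cooccurrences)
distance = cosine_distance(profiles["FINANCE"], profiles["RIVER"])
```

Because distances are stored per pair of categories rather than per pair of words, the matrix grows with the number of thesaurus categories instead of the vocabulary size, which is the source of the size reduction the abstract mentions.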
Empirical Methods in Natural Language Processing | 2009
Yuval Marton; Saif Mohammad; Philip Resnik
Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus-thesaurus hybrid method that uses soft constraints to generate word-sense-aware distributional profiles (DPs) from coarser concept DPs (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word's co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domain-specific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.
International Conference on Computational Linguistics | 2003
Saif Mohammad; Ted Pedersen
This paper describes and evaluates a simple modification to the Brill Part-of-Speech Tagger. In its standard distribution, the Brill Tagger allows manual assignment of a part-of-speech tag to a word prior to tagging. However, the tagger may change that tag to another one during processing. We suggest a change that guarantees that the pre-tag remains unchanged and ensures that it is used throughout the tagging process. Our method of guaranteed pre-tagging is appropriate when the tag of a word is known for certain, and is intended to help improve the accuracy of tagging by providing a reliable anchor or seed around which to tag.
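The idea of guaranteed pre-tagging can be illustrated with a minimal sketch. This is not the Brill Tagger's code: the function name `apply_rules`, the simplified rule format, and the example tags are assumptions for illustration only. Positions marked as locked are skipped by every transformation rule, yet their tags still serve as context when neighbouring tokens are retagged.

```python
from typing import List, Tuple

# Hypothetical contextual rule: change `from_tag` to `to_tag`
# when the previous token carries `prev_tag`.
Rule = Tuple[str, str, str]  # (from_tag, to_tag, prev_tag)


def apply_rules(tags: List[str], locked: List[bool], rules: List[Rule]) -> List[str]:
    """Apply each rule left to right without altering locked (pre-tagged) positions."""
    result = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(result)):
            if locked[i]:
                continue  # guaranteed pre-tag: never overwritten
            if result[i] == from_tag and result[i - 1] == prev_tag:
                result[i] = to_tag
    return result


initial = ["DT", "NN", "NN"]
rules = [("NN", "VB", "DT")]

# Without locking, the rule retags the second token.
unlocked = apply_rules(initial, [False, False, False], rules)

# With the second token pre-tagged, its tag survives the same rule.
pretagged = apply_rules(initial, [False, True, False], rules)
```

In this toy run, `unlocked` becomes `["DT", "VB", "NN"]`, while `pretagged` stays `["DT", "NN", "NN"]` because the locked position is never rewritten.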
Meeting of the Association for Computational Linguistics | 2007
Saif Mohammad; Graeme Hirst; Philip Resnik
Words in the context of a target word have long been used as features by supervised word-sense classifiers. Mohammad and Hirst (2006a) proposed a way to determine the strength of association between a sense or concept and co-occurring words (the distributional profile of a concept, or DPC) without the use of manually annotated data. We implemented an unsupervised naive Bayes word sense classifier using these DPCs that was best or within one percentage point of the best unsupervised systems in the Multilingual Chinese-English Lexical Sample Task (task #5) and the English Lexical Sample Task (task #17). We also created a simple PMI-based classifier to attempt the English Lexical Substitution Task (task #10); however, its performance was poor.
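A naive Bayes sense classifier driven by concept profiles could look roughly like the sketch below. It is an illustration under stated assumptions, not the system described in the abstract: the function name `classify`, the add-one smoothing, the uniform priors, and the toy counts are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def classify(
    context: List[str],
    dpc: Dict[str, Dict[str, int]],
    prior: Dict[str, float],
    vocab_size: int,
) -> Optional[str]:
    """Pick the sense maximizing log P(sense) + sum of log P(word | sense).

    `dpc` maps each candidate sense to co-occurrence counts (its profile).
    Add-one smoothing is an assumption of this sketch.
    """
    best_sense, best_score = None, -math.inf
    for sense, counts in dpc.items():
        total = sum(counts.values())
        score = math.log(prior[sense])
        for word in context:
            score += math.log((counts.get(word, 0) + 1) / (total + vocab_size))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy profiles (assumed) for two senses of "bank".
dpc = {
    "FINANCE": {"loan": 5, "water": 1},
    "RIVER": {"loan": 2, "water": 5},
}
prior = {"FINANCE": 0.5, "RIVER": 0.5}

river_choice = classify(["water", "shore"], dpc, prior, vocab_size=2)
finance_choice = classify(["loan"], dpc, prior, vocab_size=2)
```

No sense-annotated training data appears anywhere in the sketch: the only inputs are the profiles and priors, which matches the unsupervised setting the abstract describes.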
Conference of the European Chapter of the Association for Computational Linguistics | 2006
Saif Mohammad; Graeme Hirst
Empirical Methods in Natural Language Processing | 2007
Saif Mohammad; Iryna Gurevych; Graeme Hirst; Torsten Zesch
Conference on Computational Natural Language Learning | 2004
Saif Mohammad; Ted Pedersen
Archive | 2008
Saif Mohammad