Publication


Featured research published by Ted Pedersen.


international conference on computational linguistics | 2002

An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

Satanjeev Banerjee; Ted Pedersen

This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm. Rather than using a standard dictionary as the source of glosses for our approach, the lexical database WordNet is employed. This provides a rich hierarchy of semantic relations that our algorithm can exploit. This method is evaluated using the English lexical sample data from the SENSEVAL-2 word sense disambiguation exercise, and attains an overall accuracy of 32%. This represents a significant improvement over the 16% and 23% accuracy attained by variations of the Lesk algorithm used as benchmarks during the SENSEVAL-2 comparative exercise among word sense disambiguation systems.
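
As a rough illustration of the gloss-overlap idea, the sketch below uses NLTK's WordNet interface rather than the authors' original implementation, and it counts single-word overlaps instead of the squared phrasal overlaps scored in the paper; the example sentence is invented.

```python
# Minimal Lesk-style disambiguator over WordNet glosses (a sketch, not the
# authors' Adapted Lesk implementation). Each candidate sense is scored by the
# word overlap between its "extended gloss" (its own gloss plus the glosses of
# directly related synsets) and the words of the surrounding context.
from nltk.corpus import wordnet as wn

def extended_gloss(synset):
    """Words from the gloss of a synset and of its hypernyms, hyponyms, and meronyms."""
    related = [synset] + synset.hypernyms() + synset.hyponyms() + synset.part_meronyms()
    words = set()
    for s in related:
        words.update(s.definition().lower().split())
    return words

def adapted_lesk(target, context_words):
    """Return the sense of `target` whose extended gloss overlaps the context most."""
    context = {w.lower() for w in context_words}
    best_sense, best_score = None, -1
    for sense in wn.synsets(target):
        score = len(extended_gloss(sense) & context)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical example: disambiguate "bank" in a financial context.
print(adapted_lesk("bank", "I deposited my paycheck at the bank branch".split()))
```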


international conference on computational linguistics | 2003

Using measures of semantic relatedness for word sense disambiguation

Siddharth Patwardhan; Satanjeev Banerjee; Ted Pedersen

This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps, which can be viewed as a measure of semantic relatedness. We evaluate a variety of measures of semantic relatedness when applied to word sense disambiguation by carrying out experiments using the English lexical sample data of SENSEVAL-2. We find that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.
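
A minimal sketch of disambiguation by relatedness, assuming NLTK's WordNet interface and its Jiang-Conrath implementation as stand-ins for the authors' Perl code; the target and context words are invented.

```python
# Pick the noun sense of a target word that is most related, by the
# Jiang-Conrath measure, to the senses of its context words (a sketch only).
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # information-content counts from the Brown corpus

def relatedness(s1, s2):
    """Jiang-Conrath relatedness; falls back to 0 when the measure is undefined."""
    try:
        return s1.jcn_similarity(s2, brown_ic)
    except Exception:
        return 0.0

def disambiguate(target, context_words):
    """Return the sense of `target` with the highest total relatedness to the context."""
    best_sense, best_score = None, -1.0
    for sense in wn.synsets(target, pos=wn.NOUN):
        score = sum(
            max((relatedness(sense, c) for c in wn.synsets(w, pos=wn.NOUN)), default=0.0)
            for w in context_words
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate("bank", ["money", "deposit", "loan"]))
```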


north american chapter of the association for computational linguistics | 2004

WordNet::Similarity: measuring the relatedness of concepts

Ted Pedersen; Siddharth Patwardhan; Jason Michelizzi

WordNet::Similarity is a freely available software package that makes it possible to measure the semantic similarity and relatedness between a pair of concepts (or synsets). It provides six measures of similarity, and three measures of relatedness, all of which are based on the lexical database WordNet. These measures are implemented as Perl modules which take as input two concepts, and return a numeric value that represents the degree to which they are similar or related.
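
WordNet::Similarity itself is a Perl package; as a rough stand-in, the snippet below shows the comparable measures that NLTK exposes for a pair of synsets (NLTK provides the six similarity measures, but not the three gloss- and vector-based relatedness measures the package also offers).

```python
# Compare two WordNet concepts under several similarity measures using NLTK
# (an approximation of what the Perl package computes, not its actual API).
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
cat, dog = wn.synset('cat.n.01'), wn.synset('dog.n.01')

print("path:", cat.path_similarity(dog))
print("lch :", cat.lch_similarity(dog))
print("wup :", cat.wup_similarity(dog))
print("res :", cat.res_similarity(dog, brown_ic))
print("jcn :", cat.jcn_similarity(dog, brown_ic))
print("lin :", cat.lin_similarity(dog, brown_ic))
```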


international conference on computational linguistics | 2003

The design, implementation, and use of the Ngram statistics package

Satanjeev Banerjee; Ted Pedersen

The Ngram Statistics Package (NSP) is a flexible and easy-to-use software tool that supports the identification and analysis of Ngrams, sequences of N tokens in online text. We have designed and implemented NSP to be easy to customize to particular problems and yet remain general enough to serve a broad range of needs. This paper provides an introduction to NSP while raising some general issues in Ngram analysis, and summarizes several applications where NSP has been successfully employed. NSP is written in Perl and is freely available under the GNU Public License.
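
NSP is written in Perl; a rough Python analogue of one typical use, ranking the bigrams of a text by the log-likelihood ratio, might look like the following (the toy text is invented).

```python
# Rank bigrams in a token stream by the log-likelihood ratio, using NLTK's
# collocation tools as a stand-in for the Ngram Statistics Package.
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

tokens = "the quick brown fox jumps over the lazy dog the quick fox".split()
finder = BigramCollocationFinder.from_words(tokens)

# score_ngrams returns (bigram, score) pairs sorted from most to least significant.
for bigram, score in finder.score_ngrams(BigramAssocMeasures.likelihood_ratio)[:5]:
    print(bigram, round(score, 2))
```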


north american chapter of the association for computational linguistics | 2003

An evaluation exercise for word alignment

Rada Mihalcea; Ted Pedersen

This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts. The shared task included Romanian-English and English-French sub-tasks, and drew the participation of seven teams from around the world.


international conference on computational linguistics | 2005

Name discrimination by clustering similar contexts

Ted Pedersen; Amruta Purandare; Anagha Kulkarni

It is relatively common for different people or organizations to share the same name. Given the increasing amount of information available online, this results in the ever-growing possibility of finding misleading or incorrect information due to confusion caused by an ambiguous name. This paper presents an unsupervised approach that resolves name ambiguity by clustering the instances of a given name into groups, each of which is associated with a distinct underlying entity. The features we employ to represent the context of an ambiguous name are statistically significant bigrams that occur in the same context as the ambiguous name. From these features we create a co-occurrence matrix where the rows and columns represent the first and second words in bigrams, and the cells contain their log-likelihood scores. Then we represent each of the contexts in which an ambiguous name appears with a second order context vector. This is created by taking the average of the vectors from the co-occurrence matrix associated with the words that make up each context. This creates a high dimensional "instance by word" matrix that is reduced to its most significant dimensions by Singular Value Decomposition (SVD). The different "meanings" of a name are discriminated by clustering these second order context vectors with the method of Repeated Bisections. We evaluate this approach by conflating pairs of names found in a large corpus of text to create ambiguous pseudo-names. We find that our method is significantly more accurate than the majority classifier, and that the best results are obtained by having a small amount of local context to represent the instance, along with a larger amount of context for identifying features, or vice versa.
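
The sketch below loosely approximates this pipeline with off-the-shelf components: raw word co-occurrence counts stand in for log-likelihood-selected bigrams, truncated SVD reduces the second-order context vectors, and k-means replaces the repeated-bisections clustering used in the paper; the contexts are invented pseudo-name examples.

```python
# Cluster contexts of an ambiguous name via second-order context vectors
# (an approximation of the method described above, not the authors' system).
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def second_order_vectors(contexts, dim=2):
    vocab = sorted({w for c in contexts for w in c.split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Word-by-word co-occurrence counts within each context.
    cooc = np.zeros((len(vocab), len(vocab)))
    for c in contexts:
        words = c.split()
        for w1 in words:
            for w2 in words:
                if w1 != w2:
                    cooc[index[w1], index[w2]] += 1
    # Each context becomes the average of the co-occurrence vectors of its words,
    # and the resulting matrix is reduced to its strongest dimensions with SVD.
    vectors = np.array([np.mean([cooc[index[w]] for w in c.split()], axis=0)
                        for c in contexts])
    return TruncatedSVD(n_components=dim).fit_transform(vectors)

# Invented contexts for two different people who share the name "Miller".
contexts = ["george miller wordnet psychology princeton",
            "wordnet lexical database princeton psychology",
            "george miller band rock guitar music",
            "miller rock band music tour guitar"]
print(KMeans(n_clusters=2, n_init=10).fit_predict(second_order_vectors(contexts)))
```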


north american chapter of the association for computational linguistics | 2001

A decision tree of bigrams is an accurate predictor of word sense

Ted Pedersen

This paper presents a corpus-based approach to word sense disambiguation where a decision tree assigns a sense to an ambiguous word based on the bigrams that occur nearby. This approach is evaluated using the sense-tagged corpora from the 1998 SENSEVAL word sense disambiguation exercise. It is more accurate than the average results reported for 30 of 36 words, and is more accurate than the best results for 19 of 36 words.
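
A toy sketch of the idea, with scikit-learn standing in for the decision-tree learner used in the paper and an invented handful of instances for the ambiguous word "line".

```python
# Represent each instance of an ambiguous word by the bigrams occurring nearby,
# then let a decision tree assign the sense (illustrative data, not SENSEVAL).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

instances = ["wait in a long line at the store",
             "the phone line was busy all day",
             "stand in line for the new product",
             "a bad line meant the call dropped"]
senses = ["queue", "cord", "queue", "cord"]

# Binary bigram features drawn from the words around the target.
vectorizer = CountVectorizer(ngram_range=(2, 2), binary=True)
X = vectorizer.fit_transform(instances)

tree = DecisionTreeClassifier().fit(X, senses)
test = vectorizer.transform(["the line went dead during the phone call"])
print(tree.predict(test))
```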


north american chapter of the association for computational linguistics | 2009

WordNet::SenseRelate::AllWords - A Broad Coverage Word Sense Tagger that Maximizes Semantic Relatedness

Ted Pedersen; Varada Kolhatkar

WordNet::SenseRelate::AllWords is a freely available open source Perl package that assigns a sense to every content word (known to WordNet) in a text. It finds the sense of each word that is most related to the senses of surrounding words, based on measures found in WordNet::Similarity. This method is shown to be competitive with results from recent evaluations including Senseval-2 and Senseval-3.
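
The package builds on WordNet::Similarity in Perl; as a loose Python analogue, this sketch tags each noun in a small window with the sense most related, by Wu-Palmer similarity, to the senses of its neighbouring words.

```python
# Assign every word the WordNet sense most related to its neighbours' senses
# (a simplified, noun-only sketch of the all-words tagging idea).
from nltk.corpus import wordnet as wn

def tag_senses(words, window=2):
    tags = {}
    for i, w in enumerate(words):
        neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        best, best_score = None, -1.0
        for sense in wn.synsets(w, pos=wn.NOUN):
            score = sum(
                max((sense.wup_similarity(n) or 0.0 for n in wn.synsets(v, pos=wn.NOUN)),
                    default=0.0)
                for v in neighbours
            )
            if score > best_score:
                best, best_score = sense, score
        tags[w] = best
    return tags

print(tag_senses(["bank", "money", "interest", "loan"]))
```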


meeting of the association for computational linguistics | 2005

Word Alignment for Languages with Scarce Resources

Joel D. Martin; Rada Mihalcea; Ted Pedersen

This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the ACL 2005 Workshop on Building and Using Parallel Texts. The shared task included English-Inuktitut, Romanian-English, and English-Hindi sub-tasks, and drew the participation of ten teams from around the world with a total of 50 systems.


Computational Linguistics | 2008

Empiricism is not a matter of faith

Ted Pedersen

“Hurrah, this is it!” you exclaim as you set down the most recent issue of Computational Linguistics. “This Zigglebottom Tagger is exactly what I need!” A gleeful smile crosses your face as you imagine how your system will improve once you replace your tagger from graduate school with the clearly superior Zigglebottom method. You rub your hands together and page through the article looking for a way to obtain the tagger, but nothing is mentioned. That doesn’t dampen your enthusiasm, so you search the Web, but still nothing turns up. You persist though; those 17 pages of statistically significant results really are impressive. So you e-mail Zigglebottom asking for the tagger. Some days, or perhaps weeks, later, you get a hesitant reply saying: “We’re planning to release a demo version soon, stay tuned...” Or perhaps: “We don’t normally do this, but we can send you a copy (informally) once we clean it up a bit...” Or maybe: “We can’t actually give you the tagger, but you should be able to re-implement it from the article. Just let us know if you have any questions...”

Still having faith, and lacking any better alternative, you decide to re-implement the Zigglebottom Tagger. Despite three months of on-and-off effort, the end result provides just the same accuracy as your old tagger, which is nowhere near that reported in the article. Feeling sheepish, you conclude you must have misunderstood something, or maybe there’s a small detail missing from the article. So you contact Zigglebottom again and explain your predicament. He eventually responds: “We’ll look into this right away and get back to you...” A year passes.

You have the good fortune to bump into Zigglebottom at the Annual Meeting of the Association for Computational Linguistics (ACL). You angle for a seat next to him during a night out, and you buy him a few beers before you politely resume your quest for the tagger. Finally, he confesses rather glumly: “My student Pifflewhap was the one who did the implementation and ran the experiments, and if he’d only respond to my e-mail I could ask him to tell you how to get it working, but he’s graduated now and is apparently too busy to reply.” After a few more beers, Zigglebottom finally agrees to give you the tagger: “I’ll send you the version of the code I have, no promises though!” And true to his word, what he sends is incomplete and undocumented. It doesn’t compile easily, and it’s engineered so that a jumble of programs must be run in an undisclosed kabbalistic sequence known only to (perhaps) the elusive Pifflewhap. You try your best to make it work every now and then...

Collaboration


Ted Pedersen's top co-authors and their affiliations.

Ying Liu

University of Texas at Dallas

Anagha Kulkarni

Carnegie Mellon University

Rebecca F. Bruce

University of North Carolina at Asheville

Thamar Solorio

University of Alabama at Birmingham
