Satanjeev Banerjee | Researchain

Publication

Featured research published by Satanjeev Banerjee.

international conference on computational linguistics | 2002

An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

Satanjeev Banerjee; Ted Pedersen

This paper presents an adaptation of Lesk's dictionary-based word sense disambiguation algorithm. Rather than using a standard dictionary as the source of glosses for our approach, the lexical database WordNet is employed. This provides a rich hierarchy of semantic relations that our algorithm can exploit. This method is evaluated using the English lexical sample data from the SENSEVAL-2 word sense disambiguation exercise, and attains an overall accuracy of 32%. This represents a significant improvement over the 16% and 23% accuracy attained by variations of the Lesk algorithm used as benchmarks during the SENSEVAL-2 comparative exercise among word sense disambiguation systems.
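The gloss-overlap idea behind Lesk-style disambiguation can be illustrated with a minimal sketch. It is not the adapted algorithm from the paper: it counts shared single words only, and the stopword list, the sense labels, and the glosses are made up for illustration rather than taken from WordNet.

```python
# Hypothetical stopword list, chosen only for this toy example.
STOPWORDS = frozenset({"a", "an", "the", "of", "in", "or", "and", "that", "to"})


def gloss_overlap(gloss_a: str, gloss_b: str) -> int:
    """Count distinct non-stopword words shared by two glosses."""
    words_a = set(gloss_a.lower().split()) - STOPWORDS
    words_b = set(gloss_b.lower().split()) - STOPWORDS
    return len(words_a & words_b)


def pick_sense(candidate_glosses: dict, context_glosses: list) -> str:
    """Return the candidate sense whose gloss overlaps most with the context glosses."""
    scores = {
        sense: sum(gloss_overlap(gloss, ctx) for ctx in context_glosses)
        for sense, gloss in candidate_glosses.items()
    }
    return max(scores, key=scores.get)


# Toy glosses (invented, not WordNet entries).
BANK_SENSES = {
    "bank_financial": "an institution that accepts deposits of money and lends money",
    "bank_river": "sloping land beside a body of water",
}
RIVER_CONTEXT = ["a natural stream of water flowing to the sea"]

print(pick_sense(BANK_SENSES, RIVER_CONTEXT))
```

With these toy glosses the only shared content word is "water", so the river sense is selected.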

international conference on computational linguistics | 2003

Using measures of semantic relatedness for word sense disambiguation

Siddharth Patwardhan; Satanjeev Banerjee; Ted Pedersen

This paper generalizes the Adapted Lesk Algorithm of Banerjee and Pedersen (2002) to a method of word sense disambiguation based on semantic relatedness. This is possible since Lesk's original algorithm (1986) is based on gloss overlaps, which can be viewed as a measure of semantic relatedness. We evaluate a variety of measures of semantic relatedness when applied to word sense disambiguation by carrying out experiments using the English lexical sample data of SENSEVAL-2. We find that the gloss overlaps of Adapted Lesk and the semantic distance measure of Jiang and Conrath (1997) result in the highest accuracy.

international conference on computational linguistics | 2003

The design, implementation, and use of the Ngram statistics package

Satanjeev Banerjee; Ted Pedersen

The Ngram Statistics Package (NSP) is a flexible and easy-to-use software tool that supports the identification and analysis of Ngrams, sequences of N tokens in online text. We have designed and implemented NSP to be easy to customize to particular problems and yet remain general enough to serve a broad range of needs. This paper provides an introduction to NSP while raising some general issues in Ngram analysis, and summarizes several applications where NSP has been successfully employed. NSP is written in Perl and is freely available under the GNU Public License.
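As an illustration of what counting Ngrams means, here is a minimal sketch in Python. NSP itself is a Perl package with its own programs and options; the function name `count_ngrams` and whitespace tokenization are assumptions made for this example only.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Whitespace tokenization is a simplifying assumption.
tokens = "to be or not to be".split()
bigrams = count_ngrams(tokens, 2)
print(bigrams.most_common(1))
```

In this six-token example there are five bigrams in total, and ("to", "be") is the only one that occurs twice.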

international conference on acoustics, speech, and signal processing | 2010

Using the Amazon Mechanical Turk for transcription of spoken language

Matthew Marge; Satanjeev Banerjee; Alexander I. Rudnicky

We investigate whether Amazon's Mechanical Turk (MTurk) service can be used as a reliable method for transcription of spoken language data. Utterances with varying speaker demographics (native and non-native English, male and female) were posted on the MTurk marketplace together with standard transcription guidelines. Transcriptions were compared against transcriptions carefully prepared in-house through conventional (manual) means. We found that transcriptions from MTurk workers were generally quite accurate. Further, when transcripts for the same utterance produced by multiple workers were combined using the ROVER voting scheme, the accuracy of the combined transcript rivaled that observed for conventional transcription methods. We also found that accuracy is not particularly sensitive to payment amount, implying that high quality results can be obtained at a fraction of the cost and turnaround time of conventional methods.
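The abstract does not say how transcripts were scored against the in-house reference. One common choice for this kind of comparison is word error rate (WER), sketched below as an assumption rather than as the paper's procedure. The function name and the example sentences are hypothetical, and this is not an implementation of ROVER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A lower value means the crowd transcript is closer to the reference; identical transcripts score 0.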

international conference on human computer interaction | 2005

The necessity of a meeting recording and playback system, and the benefit of topic-level annotations to meeting browsing

Satanjeev Banerjee; Carolyn Penstein Rosé; Alexander I. Rudnicky

Much work in the area of Computer Supported Cooperative Work (CSCW) has targeted the problem of supporting meetings between collaborators who are non-collocated, enabling meetings to transcend boundaries of space. In this paper, we explore the beginnings of a proposed solution for allowing meetings to transcend time as well. The need for such a solution is motivated by a user survey in which busy professionals are questioned about meetings they have either missed or forgotten the important details about after the fact. Our proposed solution allows these professionals to transcend time in a sense by revisiting a recorded meeting that has been structured for quick retrieval of sought information. Such a solution supports complete recovery of prior discussions, allowing needed information to be retrieved quickly, and thus potentially facilitating the effective continuation of discussions from the past. We evaluate the proposed solution with a formal user study in which we measure the impact of the proposed structural annotations on retrieval of information. The results of the study show that participants took significantly less time to retrieve the answers when they had access to discourse structure based annotation than in a control condition in which they had access only to unannotated video recordings (p < 0.01, effect size 0.94 standard deviations).

meeting of the association for computational linguistics | 2007

UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness

Siddharth Patwardhan; Satanjeev Banerjee; Ted Pedersen

In this paper we describe an unsupervised WordNet-based Word Sense Disambiguation system, which participated (as UMND1) in the SemEval-2007 Coarse-grained English Lexical Sample task. The system disambiguates a target word by using WordNet-based measures of semantic relatedness to find the sense of the word that is semantically most strongly related to the senses of the words in the context of the target word. We briefly describe this system, the configuration options used for the task, and present some analysis of the results.
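The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the relatedness scores are an invented lookup table rather than a WordNet-based measure, the sense labels are hypothetical, and taking the best-matching sense of each context word before summing is one plausible aggregation, not necessarily the configuration used by UMND1.

```python
# Invented relatedness scores between sense labels (not real WordNet values).
TOY_SCORES = {
    frozenset({"bank#river", "water#liquid"}): 0.8,
    frozenset({"bank#finance", "water#liquid"}): 0.1,
    frozenset({"bank#river", "fish#animal"}): 0.6,
    frozenset({"bank#finance", "fish#animal"}): 0.05,
}


def toy_relatedness(sense_a: str, sense_b: str) -> float:
    """Look up a symmetric relatedness score; unknown pairs score 0."""
    return TOY_SCORES.get(frozenset({sense_a, sense_b}), 0.0)


def disambiguate(target_senses, context_senses, relatedness):
    """Pick the target sense most related, in total, to the context words.

    context_senses holds one list of candidate senses per context word.
    """
    def score(sense):
        return sum(
            max(relatedness(sense, ctx_sense) for ctx_sense in senses)
            for senses in context_senses
            if senses
        )
    return max(target_senses, key=score)


print(disambiguate(
    ["bank#finance", "bank#river"],
    [["water#liquid"], ["fish#animal"]],
    toy_relatedness,
))
```

Swapping `toy_relatedness` for a different scoring function changes the measure without touching the selection logic, which mirrors the idea of treating relatedness as a pluggable component.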

meeting of the association for computational linguistics | 2005

SenseRelate::TargetWord - A Generalized Framework for Word Sense Disambiguation

Siddharth Patwardhan; Satanjeev Banerjee; Ted Pedersen

Many words in natural language have different meanings when used in different contexts. SenseRelate::TargetWord is a Perl package that disambiguates a target word in context by finding the sense that is most related to its neighbors according to a WordNet::Similarity measure of relatedness.

intelligent user interfaces | 2007

Segmenting meetings into agenda items by extracting implicit supervision from human note-taking

Satanjeev Banerjee; Alexander I. Rudnicky

Splitting a meeting into segments such that each segment contains discussions on exactly one agenda item is useful for tasks such as retrieval and summarization of agenda item discussions. However, accurate topic segmentation of meetings is a difficult task. In this paper, we investigate the idea of acquiring implicit supervision from human meeting participants to solve the segmentation problem. Specifically, we have implemented and tested a note-taking interface that gives value to users by helping them organize and retrieve their notes easily, but that also extracts a segmentation of the meeting based on note-taking behavior. We show that the segmentation so obtained achieves a Pk value of 0.212, which improves upon an unsupervised baseline by 45% relative, and compares favorably with a current state-of-the-art algorithm. Most importantly, we achieve this performance without any features or algorithms in the classic sense.
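Pk is a standard segmentation error measure in which lower is better. A minimal sketch of how it is usually computed follows, assuming each unit of the meeting (for example an utterance) carries a segment label. The function signature, the default window size of half the mean reference segment length, and the example labels are assumptions for illustration and are not taken from the paper.

```python
def pk(reference, hypothesis, k=None):
    """Fraction of windows of width k on which the two segmentations disagree.

    reference and hypothesis are equal-length sequences of segment labels,
    one label per unit of the meeting.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)
    if k is None:
        # Conventional default: half the mean reference segment length.
        num_segments = 1 + sum(
            reference[i] != reference[i - 1] for i in range(1, n)
        )
        k = max(1, round(n / (2 * num_segments)))
    if n <= k:
        raise ValueError("sequence is too short for window size k")
    disagreements = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in range(n - k)
    )
    return disagreements / (n - k)


reference = [0, 0, 0, 0, 1, 1, 1, 1]   # one boundary in the middle
no_boundary = [0] * 8                  # hypothesis that misses it
print(pk(reference, reference), pk(reference, no_boundary))
```

A hypothesis identical to the reference scores 0, while the hypothesis that misses the single boundary is penalized on every window that straddles it.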

robot and human interactive communication | 2004

A research platform for multi-agent dialogue dynamics

Thomas K. Harris; Satanjeev Banerjee; Alexander I. Rudnicky; June Sison; Kerry Bodine; Alan W. Black

Dialogue agents are often designed with the tacit assumption that at any time, there is only one agent and one human, and that their communication channel is exclusive. We are interested in evaluating complications that arise when multiple heterogeneous dialogue agents interact with one or more human interlocutors, and their communication channel is necessarily shared. In this work, we describe a multi-agent dialogue test-bed that we have implemented to study such dialogue coordination issues. We also describe an initial pilot study that uses this test-bed to experiment with a very simple addressee disambiguation algorithm.

language and technology conference | 2006

SmartNotes: Implicit Labeling of Meeting Data through User Note-Taking and Browsing

Satanjeev Banerjee; Alexander I. Rudnicky

We have implemented SmartNotes, a system that automatically acquires labeled meeting data as users take notes during meetings and browse the notes afterwards. Such data can enable meeting understanding components such as topic and action item detectors to automatically improve their performance over a sequence of meetings. The SmartNotes system consists of a laptop-based note-taking application and a web-based note retrieval system. We shall demonstrate the functionalities of this system, and will also demonstrate the labeled data obtained during typical meetings and browsing sessions.