Ayan Bandyopadhyay | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Ayan Bandyopadhyay is active.

Explore More

Publication

Featured researches published by Ayan Bandyopadhyay.

web science | 2012

Query expansion for microblog retrieval

Ayan Bandyopadhyay; Kripabandhu Ghosh; Prasenjit Majumder; Mandar Mitra

The extreme brevity of Microblog posts (such as ‘tweets’) exacerbates the well-known vocabulary mismatch problem when retrieving tweets in response to user queries. In this study, we explore various query expansion approaches as a way to address this problem. We use the Web as a source of query expansion terms. We also tried a variation of a standard pseudo-relevance feedback method. Results on the TREC 2011 Microblog test data (TWEETS11 corpus) are very promising – significant improvements are obtained over a baseline retrieval strategy that uses no query expansion. Since many of the TREC queries were oriented towards the news genre, we also tried using only news sites (BBC and NYTIMES) in the hope that these would be a cleaner, less noisy source for expansion terms. This turned out to be counter-productive.

ACM Transactions on Asian Language Information Processing | 2010

The FIRE 2008 Evaluation Exercise

Prasenjit Majumder; Mandar Mitra; Dipasree Pal; Ayan Bandyopadhyay; Samaresh Maiti; Sukomal Pal; Deboshree Modak; Sucharita Sanyal

The aim of the Forum for Information Retrieval Evaluation (FIRE) is to create an evaluation framework in the spirit of TREC (Text REtrieval Conference), CLEF (Cross-Language Evaluation Forum), and NTCIR (NII Test Collection for IR Systems), for Indian language Information Retrieval. The first evaluation exercise conducted by FIRE was completed in 2008. This article describes the test collections used at FIRE 2008, summarizes the approaches adopted by various participants, discusses the limitations of the datasets, and outlines the tasks planned for the next iteration of FIRE.

international acm sigir conference on research and development in information retrieval | 2008

Text collections for FIRE

Prasenjit Majumder; Mandar Mitra; Dipasree Pal; Ayan Bandyopadhyay; Samaresh Maiti; Sukanya Mitra; Aparajita Sen; Sukomal Pal

The aim of the Forum for Information Retrieval Evaluation (FIRE) is to create a Cranfield-like evaluation framework in the spirit of TREC, CLEF and NTCIR, for Indian Language Information Retrieval. For the first year, six Indian languages have been selected: Bengali, Hindi, Marathi, Punjabi, Tamil, and Telugu. This poster describes the tasks as well as the document and topic collections that are to be used at the FIRE workshop.

FIRE | 2013

Overview of FIRE 2011

Sauparna Palchowdhury; Prasenjit Majumder; Dipasree Pal; Ayan Bandyopadhyay; Mandar Mitra

We provide an overview of FIRE 2011, the third evaluation exercise conducted by the Forum for Information Retrieval Evaluation (FIRE). Our main focus is on the Adhoc task. We describe how the FIRE 2011 test collections were constructed. We also provide a brief overview of the approaches adopted by the Adhoc task participants.

international acm sigir conference on research and development in information retrieval | 2016

Retrievability of Code Mixed Microblogs

Debasis Ganguly; Ayan Bandyopadhyay; Mandar Mitra; Gareth J. F. Jones

Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is a frequent trend among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval effectiveness due to the mixing of different vocabularies with different collection statistics within a single collection of documents. In this paper, we investigate the indexing and retrieval strategies for a mixed collection of documents, comprising of code-mixed and the monolingual documents. In particular, we address three alternative modes of indexing, namely (a) a single index for the two sub-collections; (b) a separate index for each sub-collection; and (c) a clustered index with two individual sub-collection statistics coupled with the overall one. We make use of the expected retrievability scores of the two classes of documents to empirically show that indexing strategies (a) and (b) mostly retrieve the monolingual documents at top ranks with standard retrieval approaches. Our experiments show that, by contrast, the clustered index (c) is able to alleviate this problem by improving the retrievability of the code-mixed documents.

Information Systems Frontiers | 2018

An Embedding Based IR Model for Disaster Situations

Ayan Bandyopadhyay; Debasis Ganguly; Mandar Mitra; Sanjoy Kumar Saha; Gareth J. F. Jones

Twitter (http://twitter.com) is one of the most popular social networking platforms. Twitter users can easily broadcast disaster-specific information, which, if effectively mined, can assist in relief operations. However, the brevity and informal nature of tweets pose a challenge to Information Retrieval (IR) researchers. In this paper, we successfully use word embedding techniques to improve ranking for ad-hoc queries on microblog data. Our experiments with the ‘Social Media for Emergency Relief and Preparedness’ (SMERP) dataset provided at an ECIR 2017 workshop show that these techniques outperform conventional term-matching based IR models. In addition, we show that, for the SMERP task, our word embedding based method is more effective if the embeddings are generated from the disaster specific SMERP data, than when they are trained on the large social media collection provided for the TREC (http://trec.nist.gov/) 2011 Microblog track dataset.

Advances in Focused Retrieval | 2009

Indian Statistical Institute at INEX 2008 Adhoc Track

Sukomal Pal; Mandar Mitra; Debasis Ganguly; Samaresh Maiti; Ayan Bandyopadhyay; Aparajita Sen; Sukanya Mitra

This paper describes the work that we did at Indian Statistical Institute towards XML retrieval for INEX 2008. Besides the Vector Space Model (VSM) that we have been using since INEX 2006, this year we implemented the Language Modeling (LM) approach in our text retrieval system (SMART) to retrieve XML elements against the INEX Adhoc queries. Like last year, we considered Content-Only (CO) queries and submitted three runs for the FOCUSED sub-task. Two runs are based on the Vector Space Model and one uses the Language Model. One of the VSM-based runs (VSMfbElts0.4) retrieves sub-document-level elements. Both the other runs (VSMfb and LM-nofb-0.20) retrieve elements only at the whole-document level. We applied blind feedback for both the VSM-based runs; no query expansion was used in the LM-based run. In general, the relative performance of our document-level runs is respectable (ranked 15/61 and 22/61 according to the official metric). Though our element retrieval run does reasonably (ranked 16/61 by iP[0.01]) according to the early-precision metrics, we think there is plenty of scope to improve our element retrieval strategy. Our immediate next task is therefore to focus on how to improve true element-level retrieval.

CLEF (Online Working Notes/Labs/Workshop) | 2012