Debasis Ganguly | Researchain

Publication

Featured research published by Debasis Ganguly.

Cross Language Evaluation Forum | 2017

Overview of the CLEF 2017 Personalised Information Retrieval Pilot Lab (PIR-CLEF 2017)

Gabriella Pasi; Gareth J. F. Jones; Stefania Marrara; Camilla Sanvitto; Debasis Ganguly; Procheta Sen

The Personalised Information Retrieval Pilot Lab (PIR-CLEF 2017) provides a forum for exploring the evaluation of personalised approaches to information retrieval (PIR). The Pilot Lab provides a preliminary edition of a Lab task dedicated to personalised search. The PIR-CLEF 2017 Pilot Task is the first evaluation benchmark based on the Cranfield paradigm, with the potential benefit of producing evaluation results that are easily reproducible. The task is based on search sessions over a subset of the ClueWeb12 collection, undertaken by 10 users following a clearly defined and novel methodology. The collection provides data gathered from the activities undertaken by each participant during the search sessions, including details of relevant documents as marked by the searchers. The intention of the collection is to allow research groups working on PIR to both experiment with and provide feedback about our proposed PIR evaluation methodology, with the aim of launching a more formal PIR Lab at CLEF 2018.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2018

Procrastination is the Thief of Time: Evaluating the Effectiveness of Proactive Search Systems

Procheta Sen; Debasis Ganguly; Gareth J. F. Jones

Users of current search systems actively interact with the system to complete their search task. This can encompass formulating and reformulating a series of queries expressing evolving or different information needs. We believe that the next generation of search systems will see a shift towards proactive understanding of user intent based on analysis of user activities. Such a proactive search system could start recommending documents that are likely to help users accomplish their tasks without requiring them to explicitly submit queries to the system. We propose a framework to evaluate such a search system. The key idea behind our proposed metric is to aggregate a correlation measure over a search session between the expected outcome, which in this case refers to the list of documents retrieved with a true user query, and the predicted outcome, which refers to the list of documents recommended by a proactive search system. Experiments on the AOL query log data show that the ranking of two sample proactive IR systems induced by our metric conforms to the expected ranking between these systems.
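The session-level aggregation described in this abstract can be illustrated with a minimal sketch. The abstract does not name the correlation measure, so the Jaccard overlap of the top-k documents used here is an assumption, as are the function names `jaccard_at_k` and `session_score` and the plain averaging over session steps.

```python
from typing import List, Sequence


def jaccard_at_k(expected: Sequence[str], predicted: Sequence[str], k: int = 10) -> float:
    """Overlap between the top-k expected and top-k predicted document ids.

    Assumption: Jaccard overlap stands in for the unspecified correlation measure.
    """
    a, b = set(expected[:k]), set(predicted[:k])
    if not a and not b:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(a | b)


def session_score(expected_lists: List[Sequence[str]],
                  predicted_lists: List[Sequence[str]],
                  k: int = 10) -> float:
    """Average the per-step agreement over one search session.

    expected_lists[i]: documents retrieved with the user's true i-th query.
    predicted_lists[i]: documents the proactive system recommended at step i.
    """
    if len(expected_lists) != len(predicted_lists):
        raise ValueError("both lists must cover the same session steps")
    if not expected_lists:
        return 0.0
    scores = [jaccard_at_k(e, p, k) for e, p in zip(expected_lists, predicted_lists)]
    return sum(scores) / len(scores)


# Hypothetical two-step session: system A anticipates the user better than system B.
expected = [["d1", "d2", "d3"], ["d4", "d5"]]
system_a = [["d1", "d2", "d9"], ["d4", "d5"]]
system_b = [["d7", "d8", "d9"], ["d4", "d6"]]
print(session_score(expected, system_a), session_score(expected, system_b))
```

Under this sketch, a system whose recommendations are closer to what the true queries would have retrieved receives the higher session score, which is the kind of system ranking the abstract refers to.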

Cross Language Evaluation Forum | 2018

Evaluation of Personalised Information Retrieval at CLEF 2018 (PIR-CLEF)

Gabriella Pasi; Gareth J. F. Jones; Keith Curtis; Stefania Marrara; Camilla Sanvitto; Debasis Ganguly; Procheta Sen

The series of Personalised Information Retrieval (PIR-CLEF) Labs at CLEF is intended as a forum for the exploration of methodologies for the repeatable evaluation of personalised information retrieval (PIR). The PIR-CLEF 2018 Lab is the first full edition of this series after the successful pilot edition at CLEF 2017, and provides a Lab task dedicated to personalised search, while the workshop at the conference will form the basis of further discussion of strategies for the evaluation of PIR and suggestions for improving the activities of the PIR-CLEF Lab. The PIR-CLEF 2018 Task is the first PIR evaluation benchmark based on the Cranfield paradigm, with the potential benefits of producing evaluation results that are easily reproducible. The task is based on search sessions over a subset of the ClueWeb12 collection, undertaken by volunteer searchers using a methodology developed in the CLEF 2017 pilot edition of PIR-CLEF. The PIR-CLEF test collection provides a detailed set of data gathered during the activities undertaken by each subject during the search sessions, including their search queries and details of relevant documents as marked by the searchers. The PIR-CLEF 2018 workshop is intended to review the design and construction of the collection, and to consider the topic of reproducible evaluation of PIR more generally with the aim of improving future editions of the evaluation benchmark.

Conference on Information and Knowledge Management | 2018

Using Word Embeddings for Information Retrieval: How Collection and Term Normalization Choices Affect Performance

Dwaipayan Roy; Debasis Ganguly; Sumit Bhatia; Srikanta Bedathur; Mandar Mitra

Neural word embedding approaches, due to their ability to capture semantic meanings of vocabulary terms, have recently gained the attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. It has been observed that these approaches are sensitive to various choices made during the learning of word embeddings and their usage, often leading to poor reproducibility. We study the effect of varying the following two parameters, viz., i) the term normalization and ii) the choice of training collection, on ad hoc retrieval performance with word2vec and fastText embeddings. We present quantitative estimates of the similarity of word vectors obtained under different settings, and use an embedding-based query expansion task to understand the effects of these parameters on IR effectiveness.
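To make the embedding-based query expansion task concrete, here is a minimal sketch that adds the nearest neighbours of the query centroid to the query. The toy two-dimensional vectors, the case-folding `normalize_term` helper, and the centroid-based neighbour selection are all illustrative assumptions; the paper's actual expansion method and its word2vec/fastText settings are not given in the abstract.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalize_term(term: str) -> str:
    """Stand-in for a term normalization choice; here only case folding (assumption)."""
    return term.lower()


def expand_query(query_terms: List[str],
                 embeddings: Dict[str, List[float]],
                 n_expansions: int = 2) -> List[str]:
    """Append the n terms whose vectors are closest to the query centroid."""
    query = [normalize_term(t) for t in query_terms]
    known = [embeddings[t] for t in query if t in embeddings]
    if not known:
        return query  # nothing to expand from
    dim = len(known[0])
    centroid = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    candidates = [(cosine(centroid, vec), term)
                  for term, vec in embeddings.items() if term not in query]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by term
    return query + [term for _, term in candidates[:n_expansions]]


# Hypothetical embeddings; real ones would be trained on a chosen collection.
toy_embeddings = {
    "flood": [1.0, 0.0],
    "water": [0.9, 0.1],
    "rain": [0.8, 0.3],
    "bank": [0.0, 1.0],
}
print(expand_query(["Flood"], toy_embeddings))
```

Changing either the normalization helper or the collection the vectors were trained on changes which neighbours are selected, which is the sensitivity the study sets out to measure.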

WWW '18 Companion of The Web Conference 2018 | 2018

WWW'18 Workshop on Exploitation of Social Media for Emergency Relief and Preparedness: Chairs' Welcome & Organization

Marie-Francine Moens; Gareth J. F. Jones; Saptarshi Ghosh; Debasis Ganguly; Tanmoy Chakraborty; Kripabandhu Ghosh

User-generated content on online social media (OSM) platforms has become an important source of real-time information during emergency events. The SMERP workshop series aims to provide a forum for researchers working on utilizing OSM for emergency preparedness and aiding post-emergency relief operations. The workshop aims to bring together researchers from diverse fields - Information Retrieval, Data Mining and Machine Learning, Natural Language Processing, Social Network Analysis, Computational Social Science, Human Computer Interaction - who can potentially contribute to utilizing social media for emergency relief and preparedness. The first SMERP workshop was held in April 2017 in conjunction with the ECIR 2017 conference. This 2nd SMERP Workshop with The Web Conference 2018 includes two keynote talks, a peer-reviewed research paper track, and a panel discussion.

Pattern Recognition Letters | 2018

A Fast Partitional Clustering Algorithm based on Nearest Neighbours Heuristics

Debasis Ganguly

K-means, along with its several variants, is the most widely used family of partitional clustering algorithms. Generally speaking, this family of algorithms starts by initializing a number of data points as cluster centres, and then iteratively refines these cluster centres based on the current partition of the dataset. Given a set of cluster centres, inducing the partition over the dataset involves finding the nearest (or most similar) cluster centre for each data point, which is an O(NK) operation, N and K being the number of data points and the number of clusters, respectively. In our proposed approach, we avoid the explicit computation of these distances for the case of sparse vectors, e.g. documents, by utilizing a fundamental operation, namely TOP(x), which gives a list of the top most similar vectors with respect to the vector x. A standard way to store sparse vectors and retrieve the top most similar ones given a query vector is with the help of the inverted list data structure. In our proposed method, we use the TOP(x) function to first select cluster centres that are likely to be dissimilar to each other. Secondly, to obtain the partition during each iteration of K-means, we avoid the explicit computation of the pair-wise similarities between the centroid and the non-centroid vectors. Thirdly, we avoid recomputation of the cluster centroids by adopting a centrality-based heuristic. We demonstrate the effectiveness of our proposed algorithm on the TREC-2011 Microblog dataset, a large collection of about 14M tweets. Our experiments demonstrate that our proposed method is about 35x faster and produces more effective clusters in comparison to the standard K-means algorithm.
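The TOP(x) operation over an inverted list can be sketched as follows. This is only an illustration of the retrieval primitive, not the paper's clustering algorithm: the names `build_inverted_index` and `top`, the unnormalized inner-product scoring, and the toy documents are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]  # term -> weight
Postings = Dict[str, List[Tuple[int, float]]]  # term -> [(doc_id, weight), ...]


def build_inverted_index(vectors: List[SparseVec]) -> Postings:
    """Map each term to the list of (doc_id, weight) pairs that contain it."""
    index: Postings = defaultdict(list)
    for doc_id, vec in enumerate(vectors):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def top(x: SparseVec, index: Postings, k: int = 2) -> List[Tuple[int, float]]:
    """Return the k vectors with the highest inner product with x.

    Only postings of terms present in x are visited, so vectors sharing
    no term with x are never scored.
    """
    scores: Dict[int, float] = defaultdict(float)
    for term, weight_x in x.items():
        for doc_id, weight_d in index.get(term, []):
            scores[doc_id] += weight_x * weight_d
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Hypothetical sparse documents; doc 2 shares no term with the query vector.
docs = [
    {"flood": 1.0, "rain": 1.0},
    {"flood": 1.0},
    {"goal": 1.0, "match": 1.0},
]
inverted = build_inverted_index(docs)
print(top({"flood": 1.0, "rain": 0.5}, inverted, k=2))
```

Because scoring touches only the postings of the query's own terms, the cost depends on how many vectors share terms with x rather than on the full N x K comparison, which is the saving the abstract relies on for sparse data.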

Information Systems Frontiers | 2018

An Embedding Based IR Model for Disaster Situations

Ayan Bandyopadhyay; Debasis Ganguly; Mandar Mitra; Sanjoy Kumar Saha; Gareth J. F. Jones

Twitter (http://twitter.com) is one of the most popular social networking platforms. Twitter users can easily broadcast disaster-specific information, which, if effectively mined, can assist in relief operations. However, the brevity and informal nature of tweets pose a challenge to Information Retrieval (IR) researchers. In this paper, we successfully use word embedding techniques to improve ranking for ad-hoc queries on microblog data. Our experiments with the ‘Social Media for Emergency Relief and Preparedness’ (SMERP) dataset provided at an ECIR 2017 workshop show that these techniques outperform conventional term-matching based IR models. In addition, we show that, for the SMERP task, our word embedding based method is more effective if the embeddings are generated from the disaster specific SMERP data, than when they are trained on the large social media collection provided for the TREC (http://trec.nist.gov/) 2011 Microblog track dataset.

Information Systems Frontiers | 2018

Exploitation of Social Media for Emergency Relief and Preparedness: Recent Research and Trends

Saptarshi Ghosh; Kripabandhu Ghosh; Debasis Ganguly; Tanmoy Chakraborty; Gareth J. F. Jones; Marie-Francine Moens; Muhammad Imran

Online Social Media, such as Twitter, Facebook and WhatsApp, are important sources of real-time information related to emergency events, including natural calamities, man-made disasters, epidemics, and so on. There has been a lot of recent work on designing information systems that would be useful for aiding post-disaster relief operations, as well as for pre-disaster preparedness. A special issue on “Exploitation of Social Media for Emergency Relief and Preparedness” was conducted for the journal Information Systems Frontiers. The objective of this special issue was to present a platform for dissemination of the empirical results of various technologies for extracting vital and actionable information from social media content in disaster situations. The papers included in this issue are expected to be the stepping stones for future explorations and technical innovations towards technologies meant for utilizing various online and offline information sources for enhancing pre-disaster preparedness and post-disaster relief operations.

Information Retrieval Journal | 2018

A non-parametric topical relevance model

Debasis Ganguly; Gareth J. F. Jones

An information retrieval (IR) system can often fail to retrieve relevant documents due to the incomplete specification of information need in the user’s query. Pseudo-relevance feedback (PRF) aims to improve IR effectiveness by exploiting potentially relevant aspects of the information need present in the documents retrieved in an initial search. Standard PRF approaches utilize the information contained in these top ranked documents from the initial search with the assumption that documents as a whole are relevant to the information need. However, in practice, documents are often multi-topical where only a portion of the documents may be relevant to the query. In this situation, exploitation of the topical composition of the top ranked documents, estimated with statistical topic modeling based approaches, can potentially be a useful cue to improve PRF effectiveness. The key idea behind our PRF method is to use the term-topic and the document-topic distributions obtained from topic modeling over the set of top ranked documents to re-rank the initially retrieved documents. The objective is to improve the ranks of documents that are primarily composed of the relevant topics expressed in the information need of the query. Our RF model can further be improved by making use of non-parametric topic modeling, where the number of topics can grow according to the document contents, thus giving the RF model the capability to adjust the number of topics based on the content of the top ranked documents. We empirically validate our topic model based RF approach on two document collections of diverse length and topical composition characteristics: (1) ad-hoc retrieval using the TREC 6-8 and the TREC Robust ’04 datasets, and (2) tweet retrieval using the TREC Microblog ’11 dataset. Results indicate that our proposed approach increases MAP by up to 9% in comparison to the results obtained with an LDA based language model (for initial retrieval) coupled with the relevance model (for feedback). Moreover, the non-parametric version of our proposed approach is shown to be more effective than its parametric counterpart due to its advantage of adapting the number of topics, improving results by up to 5.6% of MAP compared to the parametric version.
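The re-ranking idea can be sketched with hand-written topic distributions. The abstract does not give the scoring formula, so the query likelihood used here, log P(q|d) = sum over query terms t of log sum over topics k of P(t|k) P(k|d), is one plausible reading and an assumption, as are the function names, the smoothing floor, and the toy probabilities. A real system would estimate the distributions with a topic model fitted on the top ranked documents.

```python
import math
from typing import Dict, List


def topical_log_likelihood(query_terms: List[str],
                           term_topic: Dict[str, List[float]],
                           doc_topics: List[float],
                           floor: float = 1e-9) -> float:
    """log P(q|d) under the assumed mixture: sum_t log sum_k P(t|k) * P(k|d).

    term_topic[t][k] is P(t|k); doc_topics[k] is P(k|d).
    Terms missing from term_topic contribute only the smoothing floor.
    """
    total = 0.0
    for term in query_terms:
        per_topic = term_topic.get(term, [0.0] * len(doc_topics))
        prob = sum(p_tk * p_kd for p_tk, p_kd in zip(per_topic, doc_topics))
        total += math.log(max(prob, floor))
    return total


def rerank(initial_ranking: List[str],
           query_terms: List[str],
           term_topic: Dict[str, List[float]],
           doc_topic: Dict[str, List[float]]) -> List[str]:
    """Reorder the initially retrieved documents by topical query likelihood."""
    return sorted(
        initial_ranking,
        key=lambda d: topical_log_likelihood(query_terms, term_topic, doc_topic[d]),
        reverse=True,
    )


# Hypothetical two-topic model: topic 0 is disaster-related, topic 1 is not.
toy_term_topic = {"flood": [0.5, 0.01], "rescue": [0.4, 0.02]}
toy_doc_topic = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}
print(rerank(["d2", "d1"], ["flood", "rescue"], toy_term_topic, toy_doc_topic))
```

In this toy case the document dominated by the query's topic moves ahead of the one that only partly covers it, matching the stated objective of promoting documents primarily composed of relevant topics.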

Companion of The Web Conference 2018 - WWW '18 | 2018

Contextual Word Embedding: A Case Study in Clustering Tweets about Emergency Situations

Debasis Ganguly; Kripabandhu Ghosh

Effective clustering of short documents, such as tweets, is difficult because of the lack of sufficient semantic context. Word embedding is a technique that is effective in addressing this lack of semantic context. However, the process of word vector embedding, in turn, relies on the availability of sufficient contexts to learn the word associations. To get around this problem, we propose a novel word vector training approach that leverages topically similar tweets to better learn the word associations. We test our proposed word embedding approach by clustering a collection of tweets on disasters. We observe that the proposed method improves clustering effectiveness by up to 14%.
