Debapriyo Majumdar | Researchain

Publication

Featured research published by Debapriyo Majumdar.

very large data bases | 2008

TopX: efficient and versatile top-k query processing for semistructured data

Martin Theobald; Holger Bast; Debapriyo Majumdar; Ralf Schenkel; Gerhard Weikum

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper are fourfold: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
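
The early-termination idea in this abstract can be illustrated with a minimal sketch of a generic threshold-style top-k scan over score-sorted index lists. This is not the TopX algorithm itself: the function name `threshold_top_k`, the input layout, summation as the monotonic aggregation function, and non-negative scores are all assumptions made for illustration.

```python
import heapq


def threshold_top_k(index_lists, k):
    """Generic threshold-style top-k scan (illustrative, not TopX itself).

    index_lists: dict mapping a query term to a list of (doc_id, score)
    pairs sorted by score in descending order; scores are assumed >= 0.
    Returns (results, depth): the top-k (doc_id, total_score) pairs under
    summation, and the number of rows read before the scan stopped.
    """
    if k <= 0:
        return [], 0
    # Random-access tables: term -> {doc_id: score}.
    lookup = {term: dict(postings) for term, postings in index_lists.items()}
    seen = set()
    top = []  # min-heap of (total_score, doc_id) holding the current best k
    depth = 0
    max_depth = max((len(p) for p in index_lists.values()), default=0)
    while depth < max_depth:
        threshold = 0  # upper bound on the total score of any unseen doc
        for postings in index_lists.values():
            if depth >= len(postings):
                continue  # an exhausted list adds nothing to the bound
            doc_id, score = postings[depth]
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            total = sum(lookup[t].get(doc_id, 0) for t in index_lists)
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        depth += 1
        # Safe to stop: no unseen document can beat the current k-th score.
        if len(top) == k and top[0][0] >= threshold:
            break
    results = sorted(((d, s) for s, d in top), key=lambda x: (-x[1], x[0]))
    return results, depth


lists = {
    "xml": [("d1", 9), ("d2", 8), ("d3", 1), ("d4", 0)],
    "retrieval": [("d2", 7), ("d3", 6), ("d1", 2), ("d4", 1)],
}
best, rows_read = threshold_top_k(lists, k=1)
print(best, rows_read)
```

In this toy input the scan stops after reading two of the four rows, because the sum of the scores at the current scan position already bounds every unseen document below the best total found so far.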

international acm sigir conference on research and development in information retrieval | 2005

Why spectral retrieval works

Holger Bast; Debapriyo Majumdar

We argue that the ability to identify pairs of related terms is at the heart of what makes spectral retrieval work in practice. Schemes such as latent semantic indexing (LSI) and its descendants have this ability in the sense that they can be viewed as computing a matrix of term-term relatedness scores which is then used to expand the given documents (not the queries). For almost all existing spectral retrieval schemes, this matrix of relatedness scores depends on a fixed low-dimensional subspace of the original term space. We instead vary the dimension and study for each term pair the resulting curve of relatedness scores. We find that it is actually the shape of this curve which is indicative of the term-pair relatedness, and not any of the individual relatedness scores on the curve. We derive two simple, parameterless algorithms that detect this shape and that consistently outperform previous methods on a number of test collections. Our curves also shed light on the effectiveness of three fundamental types of variations of the basic LSI scheme.

EWMF'05/KDO'05 Proceedings of the 2005 joint international conference on Semantics, Web and Mining | 2005

Discovering a term taxonomy from term similarities using principal component analysis

Holger Bast; Georges Dupret; Debapriyo Majumdar; Benjamin Piwowarski

We show that eigenvector decomposition can be used to extract a term taxonomy from a given collection of text documents. So far, methods based on eigenvector decomposition, such as latent semantic indexing (LSI) or principal component analysis (PCA), were only known to be useful for extracting symmetric relations between terms. We give a precise mathematical criterion for distinguishing between four kinds of relations of a given pair of terms of a given collection: unrelated (car – fruit), symmetrically related (car – automobile), asymmetrically related with the first term being more specific than the second (banana – fruit), and asymmetrically related in the other direction (fruit – banana). We give theoretical evidence for the soundness of our criterion, by showing that in a simplified mathematical model the criterion does the apparently right thing. We applied our scheme to the reconstruction of a selected part of the Open Directory Project (ODP) hierarchy, with promising results.

computing and combinatorics conference | 2006

Sequences characterizing k-trees

Zvi Lotker; Debapriyo Majumdar; N. S. Narayanaswamy; Ingmar Weber

A non-decreasing sequence of n integers is the degree sequence of a 1-tree (i.e., an ordinary tree) on n vertices if and only if there are at least two 1’s in the sequence, and the sum of the elements is 2(n–1). We generalize this result in the following ways. First, a natural generalization of this statement is a necessary condition for k-trees, and we show that it is not sufficient for any k > 1. Second, we identify non-trivial sufficient conditions for the degree sequences of 2-trees. We also show that these sufficient conditions are almost necessary using bounds on the partition function p(n) and probabilistic methods. Third, we generalize the characterization of degrees of 1-trees in an elegant and counter-intuitive way to yield integer sequences that characterize k-trees, for all k.
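
The 1-tree characterization quoted at the start of this abstract can be checked directly. The sketch below covers only ordinary trees (k = 1), not the paper's k-tree results. It assumes that all entries are positive integers and that n is at least 2, which the abstract leaves implicit. The function names are hypothetical, and the construction uses the standard Prüfer-code decoding rather than anything taken from the paper.

```python
import heapq


def is_tree_degree_sequence(degrees):
    """Check the 1-tree condition: at least two 1's and sum equal to 2(n-1).

    Assumes n >= 2 and requires every degree to be a positive integer.
    Given positivity and the sum condition, the "two 1's" test is implied;
    it is kept to mirror the statement in the abstract.
    """
    n = len(degrees)
    if n < 2:
        return False
    return (
        all(d >= 1 for d in degrees)
        and sum(degrees) == 2 * (n - 1)
        and sum(1 for d in degrees if d == 1) >= 2
    )


def tree_from_degrees(degrees):
    """Build one tree realizing the sequence via Prüfer-code decoding.

    Vertex v appears degrees[v] - 1 times in the code, so the decoded
    tree gives vertex v exactly degrees[v] neighbours.
    """
    if not is_tree_degree_sequence(degrees):
        raise ValueError("not the degree sequence of a tree")
    code = [v for v, d in enumerate(degrees) for _ in range(d - 1)]
    remaining = list(degrees)
    leaves = [v for v, d in enumerate(remaining) if d == 1]
    heapq.heapify(leaves)
    edges = []
    for v in code:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((leaf, v))
        remaining[v] -= 1
        if remaining[v] == 1:  # v has no further occurrences in the code
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the final edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


print(is_tree_degree_sequence([1, 1, 1, 3]))
print(tree_from_degrees([1, 1, 1, 3]))
```

The constructive half makes the "if" direction concrete: any sequence that passes the check yields an explicit edge list with n - 1 edges and the requested degrees.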

very large data bases | 2006

IO-Top-k: index-access optimized top-k query processing

Holger Bast; Debapriyo Majumdar; Ralf Schenkel; Martin Theobald; Gerhard Weikum

Lecture Notes in Computer Science | 2006

Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis

Holger Bast; Georges Dupret; Debapriyo Majumdar; Benjamin Piwowarski; Markus Ackermann; Bettina Berendt; Marko Grobelnik; Andreas Hotho; Dunja Mladenic; Giovanni Semeraro; Myra Spiliopoulou; Gerd Stumme; Vojtech Svátek; Maarten van Someren

Untitled Event | 2006