Senjuti Basu Roy | Researchain

Publication

Featured research published by Senjuti Basu Roy.

very large data bases | 2009

Group recommendation: semantics and efficiency

Sihem Amer-Yahia; Senjuti Basu Roy; Ashish Chawla; Gautam Das; Cong Yu

We study the problem of group recommendation. Recommendation is an important information exploration paradigm that retrieves interesting items for users based on their profiles and past activities. Single user recommendation has received significant attention in the past due to its extensive use in Amazon and Netflix. How to recommend to a group of users who may or may not share similar tastes, however, is still an open problem. The need for group recommendation arises in many scenarios: a movie for friends to watch together, a travel destination for a family to spend a holiday break, and a good restaurant for colleagues to have a working lunch. Intuitively, items that are ideal for recommendation to a group may be quite different from those for individual members. In this paper, we analyze the desiderata of group recommendation and propose a formal semantics that accounts for both item relevance to a group and disagreements among group members. We design and implement algorithms for efficiently computing group recommendations. We evaluate our group recommendation method through a comprehensive user study conducted on Amazon Mechanical Turk and demonstrate that incorporating disagreements is critical to the effectiveness of group recommendation. We further evaluate the efficiency and scalability of our algorithms on the MovieLens data set with 10M ratings.
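
The idea of scoring an item by both its relevance to the group and the disagreement among members can be illustrated with a minimal sketch. The abstract does not give the formal semantics, so everything here is an assumption for illustration: ratings are normalized to [0, 1], group relevance is the average predicted rating, disagreement is the average pairwise rating difference, and the weights `w_rel` and `w_dis` are hypothetical parameters.

```python
from itertools import combinations


def group_relevance(ratings):
    """Average predicted rating of the group members (assumed in [0, 1])."""
    return sum(ratings) / len(ratings)


def pairwise_disagreement(ratings):
    """Average absolute rating difference over all pairs of members."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def consensus_score(ratings, w_rel=0.5, w_dis=0.5):
    """Weighted mix of relevance and agreement (1 - disagreement)."""
    return w_rel * group_relevance(ratings) + w_dis * (1.0 - pairwise_disagreement(ratings))


def recommend(predicted, k=1, w_rel=0.5, w_dis=0.5):
    """Return the k items with the highest consensus score."""
    ranked = sorted(
        predicted,
        key=lambda item: consensus_score(predicted[item], w_rel, w_dis),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted ratings for a group of three users.
predicted = {
    "polarizing movie": [1.0, 0.0, 1.0],
    "safe movie": [0.6, 0.6, 0.6],
}
print(recommend(predicted))                        # disagreement counts
print(recommend(predicted, w_rel=1.0, w_dis=0.0))  # relevance only
```

With disagreement weighted in, the item everyone rates moderately outranks the item that splits the group; with relevance alone, the ordering flips.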

conference on information and knowledge management | 2008

Minimum-effort driven dynamic faceted search in structured databases

Senjuti Basu Roy; Haidong Wang; Gautam Das; Ullas Nambiar; Mukesh K. Mohania

In this paper, we propose minimum-effort driven navigational techniques for enterprise database systems based on the faceted search paradigm. Our proposed techniques dynamically suggest facets for drilling down into the database such that the cost of navigation is minimized. At every step, the system asks the user a question or a set of questions on different facets and depending on the user response, dynamically fetches the next most promising set of facets, and the process repeats. Facets are selected based on their ability to rapidly drill down to the most promising tuples, as well as on the ability of the user to provide desired values for them. Our facet selection algorithms also work in conjunction with any ranked retrieval model where a ranking function imposes a bias over the user preferences for the selected tuples. Our methods are principled as well as efficient, and our experimental study validates their effectiveness on several application scenarios.
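
As a rough illustration of choosing the next facet to minimize navigation effort, the sketch below greedily picks the facet that leaves the fewest tuples on average after the user answers. This is a simplification and not the paper's algorithm: it assumes every tuple is equally likely to be the target and that the user can always answer, and the function names and sample data are hypothetical.

```python
from collections import Counter


def expected_remaining(tuples, facet):
    """Expected number of tuples left after the user picks a value for `facet`.

    If the target tuple is uniformly random, value v is chosen with
    probability n_v / n and leaves n_v tuples, giving sum(n_v^2) / n.
    """
    counts = Counter(t[facet] for t in tuples)
    n = len(tuples)
    return sum(c * c for c in counts.values()) / n


def next_facet(tuples, facets):
    """Greedily choose the facet with the smallest expected remaining set."""
    return min(facets, key=lambda f: expected_remaining(tuples, f))


# Hypothetical car database.
cars = [
    {"make": "A", "color": "red"},
    {"make": "B", "color": "red"},
    {"make": "C", "color": "red"},
    {"make": "D", "color": "blue"},
]
print(next_facet(cars, ["make", "color"]))
```

Asking about `make` separates every tuple here, while `color` leaves three candidates in the most common case, so the greedy choice is `make`.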

international world wide web conferences | 2010

Facetedpedia: dynamic generation of query-dependent faceted interfaces for wikipedia

Chengkai Li; Ning Yan; Senjuti Basu Roy; Lekhendro Lisham; Gautam Das

This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration in Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with other faceted retrieval systems, Facetedpedia is fully automatic and dynamic in both facet generation and hierarchy construction, and the facets are based on the rich semantic information from Wikipedia. The essence of our approach is to build upon the collaborative vocabulary in Wikipedia, more specifically the intensive internal structures (hyperlinks) and folksonomy (category system). Given the sheer size and complexity of this corpus, the space of possible choices of faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by users' navigational cost, and metrics for ranking interfaces (each with k facets) by both their average pairwise similarities and average navigational costs. We thus develop faceted interface discovery algorithms that optimize the ranking metrics. Our experimental evaluation and user study verify the effectiveness of the system.

international conference on management of data | 2010

Constructing and exploring composite items

Senjuti Basu Roy; Sihem Amer-Yahia; Ashish Chawla; Gautam Das; Cong Yu

Nowadays, online shopping has become a daily activity. Web users purchase a variety of items ranging from books to electronics. The large supply of online products calls for sophisticated techniques to help users explore available items. We propose to build composite items which associate a central item with a set of packages, formed by satellite items, and help users explore them. For example, a user shopping for an iPhone (i.e., the central item) with a price budget can be presented with both the iPhone and a package of other items that match well with the iPhone (e.g., {Belkin case, Bose sounddock, Kroo USB cable}) as a composite item, whose total price is within the user's budget. We define and study the problem of effective construction and exploration of large sets of packages associated with a central item, and design and implement efficient algorithms for solving the problem in two stages: summarization, a technique which picks k representative packages for each central item; and visual effect optimization, which helps the user find diverse composite items quickly by minimizing overlap between packages presented to the user in a ranked order. We conduct an extensive set of experiments on Yahoo! Shopping data sets to demonstrate the efficiency and effectiveness of our algorithms.
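
The budget constraint on a composite item can be made concrete with a small sketch that enumerates the satellite packages a user could afford alongside the central item. It is a brute-force illustration only, not the summarization or visual effect optimization stages described above; the prices, the `max_size` cap, and the function name are hypothetical.

```python
from itertools import combinations


def packages_within_budget(central_price, satellites, budget, max_size=3):
    """List satellite packages whose price, plus the central item, fits the budget.

    `satellites` maps item name -> price. Packages are frozensets of names.
    """
    remaining = budget - central_price
    result = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(satellites), size):
            if sum(satellites[name] for name in combo) <= remaining:
                result.append(frozenset(combo))
    return result


# Hypothetical prices for a phone and three accessories.
satellites = {"case": 20, "dock": 300, "cable": 10}
for package in packages_within_budget(200, satellites, budget=250):
    print(sorted(package))
```

Exhaustive enumeration grows exponentially with the number of satellite items, which is one reason a summarization step that keeps only k representative packages is useful.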

international conference on data engineering | 2011

Interactive itinerary planning

Senjuti Basu Roy; Gautam Das; Sihem Amer-Yahia; Cong Yu

Planning an itinerary when traveling to a city involves substantial effort in choosing Points-of-Interest (POIs), deciding in which order to visit them, and accounting for the time it takes to visit each POI and transit between them. Several online services address different aspects of itinerary planning but none of them provides an interactive interface where users give feedback and iteratively construct their itineraries based on personal interests and time budget. In this paper, we formalize interactive itinerary planning as an iterative process where, at each step: (1) the user provides feedback on POIs selected by the system, (2) the system recommends the best itineraries based on all feedback so far, and (3) the system further selects a new set of POIs, with optimal utility, to solicit feedback for, at the next step. This iterative process stops when the user is satisfied with the recommended itinerary. We show that computing an itinerary is NP-complete even for simple itinerary scoring functions, and that POI selection is NP-complete. We develop heuristics and optimizations for a specific case where the score of an itinerary is proportional to the number of desired POIs it contains. Our extensive experiments show that our algorithms are efficient and return high-quality itineraries.
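
For the special case where an itinerary's score is the number of desired POIs it contains, the problem can be sketched with an exhaustive search that is only practical for a handful of POIs. This is not the paper's heuristic; the POI names, visit times, travel times, and function names are hypothetical, and the search is exponential, in line with the hardness result stated above.

```python
from itertools import permutations


def itinerary_time(route, visit_time, travel_time):
    """Total time: time spent at each POI plus transit between consecutive POIs."""
    total = sum(visit_time[p] for p in route)
    total += sum(travel_time[(a, b)] for a, b in zip(route, route[1:]))
    return total


def best_itinerary(pois, desired, visit_time, travel_time, budget):
    """Return (route, score) maximizing the number of desired POIs within budget."""
    best, best_score = (), 0
    for size in range(1, len(pois) + 1):
        for route in permutations(pois, size):
            if itinerary_time(route, visit_time, travel_time) > budget:
                continue
            score = sum(1 for p in route if p in desired)
            if score > best_score:
                best, best_score = route, score
    return best, best_score


# Hypothetical city with three POIs, times in hours.
pois = ["museum", "park", "tower"]
visit_time = {"museum": 2, "park": 1, "tower": 2}
travel_time = {(a, b): 1 for a in pois for b in pois if a != b}
print(best_itinerary(pois, {"museum", "tower"}, visit_time, travel_time, budget=5))
```

In an interactive setting, the `desired` set would be updated from user feedback at each step and the search rerun.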

very large data bases | 2015

Task assignment optimization in knowledge-intensive crowdsourcing

Senjuti Basu Roy; Ioanna Lykourentzou; Saravanan Thirumuruganathan; Sihem Amer-Yahia; Gautam Das

We present SmartCrowd, a framework for optimizing task assignment in knowledge-intensive crowdsourcing (KI-C). SmartCrowd distinguishes itself by formulating, for the first time, the problem of worker-to-task assignment in KI-C as an optimization problem, by proposing efficient adaptive algorithms to solve it, and by accounting for human factors, such as worker expertise, wage requirements, and availability, inside the optimization process. We present rigorous theoretical analyses of the task assignment optimization problem and propose optimal and approximation algorithms with guarantees, which rely on index pre-computation and adaptive maintenance. We perform extensive performance and quality experiments using real and synthetic data to demonstrate that the SmartCrowd approach is necessary to achieve efficient, high-quality task assignments under a guaranteed cost budget.

very large data bases | 2013

A probabilistic optimization framework for the empty-answer problem

Davide Mottin; Alice Marascu; Senjuti Basu Roy; Gautam Das; Themis Palpanas; Yannis Velegrakis

We propose a principled optimization-based interactive query relaxation framework for queries that return no answers. Given an initial query that returns an empty answer set, our framework dynamically computes and suggests alternative queries with fewer conditions than those the user has initially requested, in order to help the user arrive at a query with a non-empty answer, or at a query for which no matter how many additional conditions are ignored, the answer will still be empty. Our proposed approach for suggesting query relaxations is driven by a novel probabilistic framework based on optimizing a wide variety of application-dependent objective functions. We describe optimal and approximate solutions of different optimization problems using the framework. We analyze these solutions, experimentally verify their efficiency and effectiveness, and illustrate their advantage over the existing approaches.
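
Query relaxation itself, dropping conditions until the answer is non-empty, can be shown with a small non-interactive sketch. It omits the probabilistic model and the objective functions the framework optimizes; it simply finds the relaxations that drop the fewest conditions. The data and function names are hypothetical.

```python
from itertools import combinations


def matches(row, conditions):
    """True if the row satisfies every attribute = value condition."""
    return all(row.get(attr) == value for attr, value in conditions.items())


def minimal_relaxations(rows, query):
    """Return the relaxed queries that drop the fewest conditions and are non-empty."""
    attrs = sorted(query)
    for dropped in range(0, len(attrs) + 1):
        found = []
        for keep in combinations(attrs, len(attrs) - dropped):
            relaxed = {a: query[a] for a in keep}
            if any(matches(r, relaxed) for r in rows):
                found.append(relaxed)
        if found:
            return found
    return []


# Hypothetical car database and a query with an empty answer.
rows = [
    {"make": "VW", "color": "red", "abs": "yes"},
    {"make": "BMW", "color": "blue", "abs": "yes"},
]
query = {"make": "VW", "color": "blue", "abs": "yes"}
print(minimal_relaxations(rows, query))
```

An interactive variant would propose one dropped condition at a time and let the user accept or reject it, which is where a model of user preferences becomes relevant.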

international conference on management of data | 2011

Location-aware type ahead search on spatial databases: semantics and efficiency

Senjuti Basu Roy; Kaushik Chakrabarti

Users often search spatial databases like yellow page data using keywords to find businesses near their current location. Typing the entire query is cumbersome and prone to errors, especially from mobile phones. We address this problem by introducing type-ahead search functionality on spatial databases. Like keyword search on spatial data, type-ahead search needs to be location-aware, i.e., with every letter being typed, it needs to return spatial objects whose names (or descriptions) are valid completions of the query string typed so far, and which rank highest in terms of proximity to the user's location and other static scores. Existing solutions for type-ahead search cannot be used directly as they are not location-aware. We show that a straightforward combination of existing techniques for performing type-ahead search with those for performing proximity search performs poorly. We propose a formal model for query processing cost and develop novel techniques that optimize that cost. Our empirical evaluations on real and synthetic datasets demonstrate the effectiveness of our techniques. To the best of our knowledge, this is the first work on location-aware type-ahead search.
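
The semantics of location-aware type-ahead search can be stated as a naive linear scan: keep objects whose names complete the typed prefix, then rank them by distance to the user. This sketch only defines the expected output and is not the indexing technique the paper develops; it ignores static scores, and the objects and function name are hypothetical.

```python
import math


def type_ahead(objects, prefix, user_xy, k=3):
    """Return up to k names that start with `prefix`, nearest to the user first.

    `objects` is a list of (name, (x, y)) pairs; matching is case-insensitive.
    """
    prefix = prefix.lower()
    matches = [o for o in objects if o[0].lower().startswith(prefix)]
    matches.sort(key=lambda o: math.dist(o[1], user_xy))
    return [name for name, _ in matches[:k]]


# Hypothetical businesses with planar coordinates.
objects = [
    ("Starbucks", (5, 5)),
    ("Staples", (1, 1)),
    ("Subway", (0, 0)),
]
print(type_ahead(objects, "sta", (0, 0), k=2))  # ['Staples', 'Starbucks']
```

A scan like this touches every object on every keystroke, which motivates index structures that handle the prefix and proximity constraints together.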

knowledge discovery and data mining | 2013

The Microsoft academic search dataset and KDD Cup 2013

Senjuti Basu Roy; Martine De Cock; Vani Mandava; Swapna Savanna; Brian Dalessandro; Claudia Perlich; William Cukierski; Ben Hamner

KDD Cup 2013 challenged participants to tackle the problem of author name ambiguity in a digital library of scientific publications. The competition consisted of two tracks, which were based on large-scale datasets from a snapshot of Microsoft Academic Search, taken in January 2013 and including 250K authors and 2.5M papers. Participants were asked to determine which papers in an author profile are truly written by a given author (track 1), as well as to identify duplicate author profiles (track 2). Track 1 and track 2 were launched respectively on April 18 and April 20, 2013, with a common final submission deadline on June 12, 2013. For track 1 a training dataset with correct labels was disclosed at the start of the competition. This track was the most popular one, attracting submissions from 561 different teams. Track 2, which was formulated as an unsupervised learning task, received submissions from 241 participants. This paper presents details about the problem definitions, the datasets, the evaluation metrics and the results.

international conference on big data | 2013

Big data solutions for predicting risk-of-readmission for congestive heart failure patients

Kiyana Zolfaghar; Naren Meadem; Ankur Teredesai; Senjuti Basu Roy; Si Chi Chin; Brian Muckian

Developing holistic predictive modeling solutions for risk prediction is extremely challenging in healthcare informatics. Risk prediction involves integration of clinical factors with socio-demographic factors, health conditions, disease parameters, hospital care quality parameters, and a variety of variables specific to each health care provider, making the task increasingly complex. Unsurprisingly, many such factors need to be extracted independently from different sources and integrated back to improve the quality of predictive modeling. Such sources are typically voluminous, diverse, and vary significantly over time. Therefore, distributed and parallel computing tools, collectively termed big data, have to be developed. In this work, we study big data driven solutions to predict the 30-day risk of readmission for congestive heart failure (CHF) incidents. First, we extract useful factors from the National Inpatient Dataset (NIS) and augment them with our patient dataset from Multicare Health System (MHS). Then, we develop scalable data mining models to predict risk of readmission using the integrated dataset. We demonstrate the effectiveness and efficiency of the open-source predictive modeling framework we used, describe the results from various modeling algorithms we tested, and compare the performance against baseline non-distributed, non-parallel, non-integrated small data results previously published to demonstrate comparable accuracy over millions of records.
