Saravanan Thirumuruganathan

Publication

Featured research published by Saravanan Thirumuruganathan.

very large data bases | 2015

Task assignment optimization in knowledge-intensive crowdsourcing

Senjuti Basu Roy; Ioanna Lykourentzou; Saravanan Thirumuruganathan; Sihem Amer-Yahia; Gautam Das

We present SmartCrowd, a framework for optimizing task assignment in knowledge-intensive crowdsourcing (KI-C). SmartCrowd distinguishes itself by formulating, for the first time, the problem of worker-to-task assignment in KI-C as an optimization problem, by proposing efficient adaptive algorithms to solve it, and by accounting for human factors, such as worker expertise, wage requirements, and availability, inside the optimization process. We present rigorous theoretical analyses of the task assignment optimization problem and propose optimal and approximation algorithms with guarantees, which rely on index pre-computation and adaptive maintenance. We perform extensive performance and quality experiments using real and synthetic data to demonstrate that the SmartCrowd approach is necessary to achieve efficient, high-quality task assignments under a guaranteed cost budget.

very large data bases | 2015

Worker skill estimation in team-based tasks

Habibur Rahman; Saravanan Thirumuruganathan; Senjuti Basu Roy; Sihem Amer-Yahia; Gautam Das

Many emerging applications, such as collaborative editing, multi-player games, or fan-subbing, require forming a team of experts to accomplish a task together. Existing research has investigated how to assign workers to such team-based tasks to ensure the best outcome, assuming that the skills of individual workers are known. In this work, we investigate how to estimate individual workers' skills based on the outcomes of the team-based tasks they have undertaken. We consider two popular skill aggregation functions and estimate the skill of the workers, where skill is either a deterministic value or a probability distribution. We propose efficient solutions for worker skill estimation using continuous and discrete optimization techniques. We present comprehensive experiments and validate the scalability and effectiveness of our proposed solutions using multiple real-world datasets.
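The estimation idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that a team's outcome is the sum of its members' deterministic skills and that skills are fitted by plain gradient descent on the squared error. The abstract does not name its two aggregation functions or its solvers, so the function `estimate_skills`, its parameters, and the toy data are all hypothetical.

```python
def estimate_skills(num_workers, observations, lr=0.1, steps=2000):
    """Fit per-worker skills so that each team outcome is close to the
    sum of its members' skills (additive aggregation is an assumption).

    observations is a list of (team, outcome) pairs, where team is a
    tuple of worker indices.
    """
    skills = [0.0] * num_workers
    for _ in range(steps):
        grad = [0.0] * num_workers
        for team, outcome in observations:
            # Residual of the additive model for this team.
            residual = sum(skills[w] for w in team) - outcome
            for w in team:
                grad[w] += 2.0 * residual
        # One gradient-descent step on the summed squared residuals.
        skills = [s - lr * g for s, g in zip(skills, grad)]
    return skills


# Toy data: hidden skills 0.2, 0.5, 0.9 observed only through team totals.
observations = [((0, 1), 0.7), ((1, 2), 1.4), ((0, 2), 1.1)]
estimated = estimate_skills(3, observations)
```

With three workers and three distinct pairs, the additive system has a unique solution, so the fit recovers the hidden skills. With fewer or overlapping teams the problem becomes under-determined and would need extra assumptions.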

very large data bases | 2012

Who tags what?: an analysis framework

Mahashweta Das; Saravanan Thirumuruganathan; Sihem Amer-Yahia; Gautam Das; Cong Yu

The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components: user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we explore common analysis tasks and propose a dual mining framework for social tagging behavior mining. This framework is centered around two opposing measures, similarity and diversity, applied to one or more tagging components, and therefore enables a wide range of analysis scenarios, such as characterizing similar users tagging diverse items with similar tags, or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and that they are NP-Complete in general. We design efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly outperform the exact brute-force approach without compromising analysis result quality.

international conference on data mining | 2015

Task Assignment Optimization in Collaborative Crowdsourcing

Habibur Rahman; Senjuti Basu Roy; Saravanan Thirumuruganathan; Sihem Amer-Yahia; Gautam Das

A number of emerging applications, such as collaborative document editing, sentence translation, and citizen journalism, require workers with complementary skills and expertise to form groups and collaborate on complex tasks. While existing research has investigated task assignment for knowledge-intensive crowdsourcing, it often ignores collaboration among workers, which is central to the success of such tasks. Research in behavioral psychology has indicated that large groups hinder successful collaboration. Taking that into consideration, our work is one of the first to investigate and formalize the notion of collaboration among workers and to present theoretical analyses of the hardness of optimizing task assignment. We propose efficient approximation algorithms with provable theoretical guarantees and demonstrate the superiority of our algorithms through a comprehensive set of experiments using real-world and synthetic datasets. Finally, we run a real-world collaborative sentence translation application on Amazon Mechanical Turk, which we hope provides a template for evaluating collaborative crowdsourcing tasks on micro-task-based crowdsourcing platforms.

very large data bases | 2015

Walk, not wait: faster sampling over online social networks

Azade Nazi; Zhuojie Zhou; Saravanan Thirumuruganathan; Nan Zhang; Gautam Das

In this paper, we introduce a novel, general-purpose technique for faster sampling of nodes over an online social network. Specifically, unlike traditional random walks, which wait for the sampling distribution to converge to a predetermined target distribution (a waiting process that incurs a high query cost), we develop WALK-ESTIMATE, which starts with a much shorter random walk and then proactively estimates the sampling probability of the node taken before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. We present a novel backward random walk technique that provides provably unbiased estimates of the sampling probability, and demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real-world online social networks.
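The acceptance-rejection step mentioned in this abstract can be sketched in isolation. The example below is not WALK-ESTIMATE itself: it assumes the sampling probability `p` of each node is already known exactly (in the paper it is estimated by a backward random walk), and the helper names `rejection_correct` and `draw`, as well as the three-node toy distribution, are hypothetical.

```python
import random
from collections import Counter


def rejection_correct(draw, p, q, scale, rng):
    """Return one sample distributed according to the target q.

    draw(rng) yields a node v with probability p[v]; scale must be at
    least max(q[v] / p[v]) so that every acceptance probability is <= 1.
    """
    while True:
        v = draw(rng)
        # Accept v with probability q[v] / (scale * p[v]).
        if rng.random() < q[v] / (scale * p[v]):
            return v


# Toy setting: a biased sampler p that should be corrected to uniform q.
p = {"a": 0.6, "b": 0.3, "c": 0.1}
q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
scale = max(q[v] / p[v] for v in p)


def draw(rng):
    """Stand-in for a short random walk that ends at a node with probability p."""
    return rng.choices(list(p), weights=list(p.values()))[0]


rng = random.Random(0)
counts = Counter(rejection_correct(draw, p, q, scale, rng) for _ in range(30000))
frequencies = {v: counts[v] / 30000 for v in p}
```

Each accepted node appears with probability proportional to `p[v]` times `q[v] / p[v]`, which is `q[v]`, so the empirical frequencies approach the uniform target even though the underlying sampler is biased. The price is rejected draws: the overall acceptance rate here is `1 / scale`.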

very large data bases | 2012

MapRat: meaningful explanation, interactive exploration and geo-visualization of collaborative ratings

Saravanan Thirumuruganathan; Mahashweta Das; Shrikant Desai; Sihem Amer-Yahia; Gautam Das; Cong Yu

Collaborative rating sites such as IMDB and Yelp have become rich resources that users consult to form judgments about and choose from among competing items. Most of these sites provide either a plethora of information for users to interpret all by themselves or a simple overall aggregate. Such aggregates (e.g., average rating over all users who have rated an item, aggregates along pre-defined dimensions, etc.) cannot help a user quickly decide the desirability of an item. In this paper, we build a system, MapRat, that allows a user to explore multiple carefully chosen aggregate analytic details over a set of user demographics that meaningfully explain the ratings associated with item(s) of interest. MapRat allows a user to systematically explore, visualize, and understand user rating patterns of input item(s) so as to make an informed decision quickly. In the demo, participants are invited to explore collaborative movie ratings for popular movies.

very large data bases | 2015

Aggregate estimations over location based services

Weimo Liu; Farhadur Rahman; Saravanan Thirumuruganathan; Nan Zhang; Gautam Das

Location-based services (LBS) have become very popular in recent years. They range from map services (e.g., Google Maps) that store geographic locations of points of interest, to online social networks (e.g., WeChat, Sina Weibo, FourSquare) that leverage user geographic locations to enable various recommendation functions. The public query interfaces of these services may be abstractly modeled as a kNN interface over a database of two-dimensional points on a plane: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point. In this paper, we consider the problem of obtaining approximate estimates of SUM and COUNT aggregates by only querying such databases via their restrictive public interfaces. We distinguish between interfaces that return location information of the returned tuples (e.g., Google Maps), and interfaces that do not return location information (e.g., Sina Weibo). For both types of interfaces, we develop aggregate estimation algorithms that are based on novel techniques for precisely computing or approximately estimating the Voronoi cell of tuples. We discuss a comprehensive set of real-world experiments for testing our algorithms, including experiments on Google Maps, WeChat, and Sina Weibo.
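The connection between Voronoi cells and COUNT estimation can be shown with a small sketch. It assumes a top-1 nearest-neighbor interface over a discrete grid of query locations: a uniformly random query returns tuple t with probability equal to the fraction of locations in t's Voronoi cell, so the reciprocal of that fraction is an unbiased COUNT estimate. The cell sizes below are computed by brute force over the whole grid, which is for illustration only; the paper's techniques for obtaining them through the restricted interface are not reproduced, and all function names are hypothetical.

```python
import random


def nearest(points, x, y):
    """Stand-in for a top-1 kNN interface: index of the point closest to (x, y)."""
    return min(
        range(len(points)),
        key=lambda i: (points[i][0] - x) ** 2 + (points[i][1] - y) ** 2,
    )


def cell_sizes(points, grid):
    """Brute-force Voronoi cell sizes, counted in grid locations (illustration only)."""
    sizes = [0] * len(points)
    for gx in range(grid):
        for gy in range(grid):
            sizes[nearest(points, gx, gy)] += 1
    return sizes


def estimate_count(points, grid, samples, rng):
    """Average of (grid area / cell size) over uniformly random query locations."""
    sizes = cell_sizes(points, grid)
    total = 0.0
    for _ in range(samples):
        hit = nearest(points, rng.randrange(grid), rng.randrange(grid))
        # The returned tuple was hit with probability sizes[hit] / grid**2.
        total += (grid * grid) / sizes[hit]
    return total / samples
```

For example, `estimate_count([(2, 3), (15, 4), (8, 10), (3, 16), (17, 15)], 20, 20000, random.Random(1))` should come out close to 5, the true number of points. The sketch requires the points to sit on distinct grid locations so that every cell is non-empty. A SUM estimate follows the same pattern by weighting each hit with the tuple's attribute value.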

very large data bases | 2013

Rank discovery from web databases

Saravanan Thirumuruganathan; Nan Zhang; Gautam Das

Many web databases are only accessible through a proprietary search interface which allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple - not just the top-ranked ones - is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results which demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, which illustrates the effectiveness of our proposed techniques.

very large data bases | 2014

An expressive framework and efficient algorithms for the analysis of collaborative tagging

Mahashweta Das; Saravanan Thirumuruganathan; Sihem Amer-Yahia; Gautam Das; Cong Yu

The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components: user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we develop a dual mining framework to explore tagging behavior. This framework is centered around two opposing measures, similarity and diversity, applied to one or more tagging components, and therefore enables a wide range of analysis scenarios, such as characterizing similar users tagging diverse items with similar tags or diverse users tagging similar items with diverse tags. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and that they are NP-Complete in general. We design four sets of efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly outperform the exact brute-force approach without compromising analysis result quality.

international conference on management of data | 2017

A Cost-based Optimizer for Gradient Descent Optimization

Zoi Kaoudi; Jorge-Arnulfo Quiane-Ruiz; Saravanan Thirumuruganathan; Sanjay Chawla; D. Agrawal

As the use of machine learning (ML) permeates diverse application domains, there is an urgent need to support a declarative framework for ML. Ideally, a user will specify an ML task in a high-level and easy-to-use language and the framework will invoke the appropriate algorithms and system configurations to execute it. An important observation towards designing such a framework is that many ML tasks can be expressed as mathematical optimization problems, which take a specific form. Furthermore, these optimization problems can be efficiently solved using variations of the gradient descent (GD) algorithm. Thus, to decouple a user specification of an ML task from its execution, a key component is a GD optimizer. We propose a cost-based GD optimizer that selects the best GD plan for a given ML task. To build our optimizer, we introduce a set of abstract operators for expressing GD algorithms and propose a novel approach to estimate the number of iterations a GD algorithm requires to converge. Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders-of-magnitude speed-ups.
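The idea of choosing among GD plans by cost can be sketched with a deliberately simple model: total cost is the estimated number of iterations multiplied by the work per iteration. This is an assumed toy cost model, not the optimizer described in the paper. The `GDPlan` class, the `plan_cost` and `choose_plan` functions, and all the numbers are hypothetical, and the iteration estimates are taken as given rather than derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDPlan:
    name: str
    points_per_iteration: int  # data points touched by one update
    estimated_iterations: int  # assumed to come from some iteration estimator


def plan_cost(plan, cost_per_point=1.0):
    """Toy cost model: iterations x points per iteration x unit cost."""
    return plan.estimated_iterations * plan.points_per_iteration * cost_per_point


def choose_plan(plans):
    """Pick the plan with the lowest estimated total cost."""
    return min(plans, key=plan_cost)


# Hypothetical plans for a dataset of 100,000 points.
plans = [
    GDPlan("batch", 100_000, 100),
    GDPlan("mini-batch", 1_000, 2_000),
    GDPlan("stochastic", 1, 5_000_000),
]
best = choose_plan(plans)
```

Under these made-up numbers the mini-batch plan wins, because batch GD pays for a full pass per iteration while stochastic GD needs far more iterations. The interesting part of a real optimizer is estimating the iteration counts, which this sketch leaves out.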
