Vinay Setty
Max Planck Society
Publication
Featured research published by Vinay Setty.
international middleware conference | 2012
Vinay Setty; Maarten van Steen; Roman Vitenberg; Spyros Voulgaris
We propose PolderCast, a P2P topic-based Pub/Sub system that is (a) fault-tolerant and robust, (b) scalable with respect to the number of nodes interested in a topic and the number of topics that nodes are interested in, and (c) fast in terms of dissemination latency while (d) attaining a low communication overhead. This combination of properties is provided by an implementation that blends deterministic propagation over maintained rings with probabilistic dissemination following a limited number of random shortcuts. The rings are constructed and maintained using gossiping techniques. The random shortcuts are provided by two distinct peer-sampling services: Cyclon generates purely random links while Vicinity produces interest-induced random links. We analyze PolderCast and survey it in the context of existing approaches. We evaluate PolderCast experimentally using real-world workloads from Twitter and Facebook traces. We use the well-known Scribe [5] as a baseline in a number of experiments. Robustness with respect to node churn is evaluated through traces from the Skype superpeer network. We show that the experimental results corroborate all of the above properties in settings of up to 10K nodes, 10K topics, and 5K topics per node.
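The blend of ring propagation and random shortcuts described in this abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not PolderCast's actual protocol: the names `disseminate` and `random_shortcuts` are hypothetical, the ring is given as a static list (no gossip-based maintenance, no churn), and the shortcuts are drawn uniformly at random rather than by Cyclon or Vicinity.

```python
import random


def random_shortcuts(ring, fanout, rng):
    """Give every node `fanout` uniformly random shortcut links (assumed model)."""
    return {
        node: rng.sample([m for m in ring if m != node], fanout)
        for node in ring
    }


def disseminate(ring, shortcuts, origin):
    """Flood one message from `origin`.

    Each node forwards to its ring successor, its ring predecessor,
    and its shortcut links. Returns {node: hop of first receipt}.
    """
    n = len(ring)
    pos = {node: i for i, node in enumerate(ring)}
    received = {origin: 0}
    frontier = [origin]
    hop = 0
    while frontier:
        hop += 1
        next_frontier = []
        for node in frontier:
            i = pos[node]
            targets = [ring[(i + 1) % n], ring[(i - 1) % n]]
            targets += shortcuts.get(node, [])
            for target in targets:
                if target not in received:
                    received[target] = hop
                    next_frontier.append(target)
        frontier = next_frontier
    return received


ring = list(range(100))
rng = random.Random(0)
ring_only = disseminate(ring, {}, origin=0)
with_shortcuts = disseminate(ring, random_shortcuts(ring, 2, rng), origin=0)
print(max(ring_only.values()), max(with_shortcuts.values()))
```

In this model the ring links alone guarantee that every node is reached, which mirrors the deterministic part of the design, while the shortcuts can only reduce the number of hops.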
international conference on distributed computing systems | 2014
Vinay Setty; Roman Vitenberg; Gunnar Kreitz; Guido Urdaneta; Maarten van Steen
Publish/subscribe (pub/sub) is a popular communication paradigm in the design of large-scale distributed systems. A fundamental challenge in deploying pub/sub systems on a data center or a cloud infrastructure is efficient and cost-effective resource allocation that would allow delivery of notifications to all subscribers. In this paper, we provide answers to the following three fundamental questions: Given a pub/sub workload, (1) what is the minimum amount of resources needed to satisfy all the subscribers, (2) what is a cost-effective way to allocate resources for the given workload, and (3) what is the cost of hosting it on a public Infrastructure-as-a-Service (IaaS) provider like Amazon EC2. To answer these questions, we formulate a problem coined Minimum Cost Subscriber Satisfaction (MCSS). We prove MCSS to be NP-hard and provide an efficient heuristic solution based on a combination of optimizations. We evaluate the solution experimentally using real traces from Spotify and Twitter along with a pricing model from Amazon. We show the impact of each optimization using a naive solution as the baseline. Using a variety of practical scenarios for each dataset, we also show that our solution scales well for millions of subscribers and runs fast.
web search and data mining | 2017
Vinay Setty; Abhijit Anand; Arunav Mishra; Avishek Anand
We deal with the problem of ranking news events on a daily basis for large news corpora, an essential building block for news aggregation. News ranking has been addressed in the literature before but with individual news articles as the unit of ranking. However, estimating event importance accurately requires models to quantify current-day event importance as well as its significance in the historical context. Consequently, in this paper we show that a cluster of news articles representing an event is a better unit of ranking as it provides an improved estimation of popularity, source diversity and authority cues. In addition, events facilitate quantifying their historical significance by linking them with long-running topics and recent chains of events. Our main contribution in this paper is to provide effective models for improved news event ranking. To this end, we propose novel event mining and feature generation approaches for improving estimates of event importance. Finally, we conduct an extensive evaluation of our approaches on two large real-world news corpora, each of which spans more than a year with up to tens of thousands of daily news articles. Our evaluations are large-scale and based on clean, human-curated ground truth from the Wikipedia Current Events Portal. Experimental comparison with a state-of-the-art news ranking technique based on language models demonstrates the effectiveness of our approach.
international conference on computer communications | 2014
Vinay Setty; Gunnar Kreitz; Guido Urdaneta; Roman Vitenberg; M.R. van Steen
Publish/subscribe (pub/sub) is a popular communication paradigm in the design of large-scale distributed systems. A provider of a pub/sub service (whether centralized, peer-assisted, or based on a federated organization of cooperatively managed servers) commonly faces a fundamental challenge: given limited resources, how to maximize the satisfaction of subscribers? We provide, to the best of our knowledge, the first formal treatment of this problem by introducing two metrics that capture subscriber satisfaction in the presence of limited resources. This allows us to formulate matters as two new flavors of maximum coverage optimization problems. Unfortunately, both variants of the problem prove to be NP-hard. By subsequently providing formal approximation bounds and heuristics, we show, however, that efficient approximations can be attained. We validate our approach using real-world traces from Spotify and show that our solutions can be executed periodically in real-time in order to adapt to workload variations.
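For readers unfamiliar with maximum coverage, the classical greedy heuristic for the standard problem is sketched below. It is not the paper's algorithm and does not use its two satisfaction metrics: the function name `greedy_max_coverage`, the candidate sets, and the unit-cost budget are all assumptions made for illustration. For the standard problem, this greedy rule is known to achieve a (1 - 1/e) approximation; the bounds for the paper's variants are given in the paper itself.

```python
def greedy_max_coverage(candidates, budget):
    """Pick at most `budget` candidate sets, each time taking the one
    that covers the most subscribers not yet covered.

    candidates: dict mapping a (hypothetical) resource name to the set
                of subscribers it would satisfy.
    Returns (chosen names in pick order, set of covered subscribers).
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    for _ in range(budget):
        best = max(
            remaining,
            key=lambda name: len(remaining[name] - covered),
            default=None,
        )
        # Stop early if nothing is left or nothing adds new coverage.
        if best is None or not (remaining[best] - covered):
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


candidates = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {4, 5, 6, 7},
    "t4": {1},
}
chosen, covered = greedy_max_coverage(candidates, budget=2)
print(chosen, sorted(covered))  # ['t3', 't1'] [1, 2, 3, 4, 5, 6, 7]
```

Because each pick only needs the marginal gain of the remaining candidates, a heuristic of this shape is cheap enough to rerun periodically as the workload changes.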
web science | 2016
Hang Zhang; Vinay Setty
The use of social media platforms to express opinions and discuss various topics has become increasingly popular. Consequently, a huge volume of social media data is generated by users across all these platforms; for example, users comment on a variety of content items such as news articles, videos, and images. These comments are often noisy and sparse, so identifying sub-topics within them to explore social media is a challenge. In this paper, we develop an effective way to distill sub-topics from all the comments related to a textual query and apply two different diversification techniques to select comments. We conduct experiments to validate our idea using seven years of Reddit comments, with news events from the Wikipedia Current Events Portal as queries.
international conference on distributed computing systems | 2016
Md. Yusuf Sarwar Uddin; Vinay Setty; Ye Zhao; Roman Vitenberg; Nalini Venkatasubramanian
In recent years, notification services for social networks, mobile apps, messaging systems and other electronic services have become truly ubiquitous. When new content becomes available, the service sends an instant notification to the user. When the content is produced in massive quantities, and it includes both large-size media and a lot of meta-information, it gives rise to a major challenge of selecting content to notify about and information to include in such notifications. We tackle three important challenges in realizing rich notification delivery: (1) content and presentation utility modeling, (2) notification selection and (3) scheduling of delivery. We consider a number of progressive presentation levels for the content. Since utility is subjective and hard to model, we rely on real data and user surveys. We model the content utility by learning from large-scale real-world data collected from the Spotify music streaming service. For the utility of the presentation levels we rely on user surveys. Blending these two techniques together, we derive the utility of notifications with different presentation levels. We then model the selection and delivery of rich notifications as an optimization problem with the goal of maximizing the utility of notifications under resource budget constraints. We validate our system with large-scale simulations driven by real-world de-identified traces obtained from Spotify. With the help of several baseline approaches we show that our solution is adaptive and resource efficient.
distributed event-based systems | 2014
Nils Peder Korsveien; Vinay Setty; Roman Vitenberg
We propose a tool for visualizing a variety of performance metrics in topic-based publish/subscribe systems, ranging from dissemination of publications to overlay properties. The tool can be used for gaining insight into the system performance and for comparing different pub/sub systems.
web science | 2015
Jaspreet Singh; Abhijit Anand; Vinay Setty; Avishek Anand
A significant portion of today's news articles are part of long-running stories. To better understand the context of these stories, journalists, social scientists, and other scholars use news collections to find temporal and topical insights. However, these insights are devoid of user impressions derived from click-through data and query logs, and they are only reliable if the collection is complete and consistent. In this work we introduce the notion of combining user impressions from Wikipedia with news-collection-based insights for long-running news story exploration and outline promising new research directions. We also demonstrate our initial attempts with a prototype system called NewsEX.
very large data bases | 2018
Christian Aebeloe; Gabriela Montoya; Vinay Setty; Katja Hose
Vast amounts of world knowledge are now accessible through Knowledge Graphs (KGs) in RDF format and can be queried using SPARQL. Yet, finding paths between nodes in such graphs is not part of the official SPARQL 1.1 standard; only the simpler functionality of checking reachability is supported, i.e., assessing whether two nodes are connected based on certain conditions formalized as property paths but without providing information on how they are actually connected. To close this gap in functionality, we present JEDI, a system that extends a popular SPARQL engine, Jena, with the ability to compute paths connecting entities in a KG. JEDI shows the k most relevant results to the user, where relevance is assessed as a trade-off between path length and diversification of the intermediate nodes in the path. Furthermore, our solution is not limited to a single property path pattern but supports queries containing multiple property path patterns. While JEDI supports arbitrary KGs, for demonstration purposes some predefined KGs, such as YAGO and DBLP, will be used. PVLDB Reference Format: Christian Aebeloe, Gabriela Montoya, Vinay Setty, Katja Hose. Discovering Diversified Paths in Knowledge Bases. PVLDB, 11 (12): 2002-2005, 2018. DOI: https://doi.org/10.14778/3229863.3236245
international acm sigir conference on research and development in information retrieval | 2018
Vinay Setty; Katja Hose
Representation of news events as latent feature vectors is essential for several tasks, such as news recommendation and news event linking. However, representations proposed in the past fail to capture the complex network structure of news events. In this paper we propose Event2Vec, a novel way to learn latent feature vectors for news events using a network. We use recently proposed network embedding techniques, which have proven to be very effective for various prediction tasks in networks. However, events involve different classes of nodes, such as named entities and temporal information, and general-purpose network embeddings are agnostic to these event semantics. To address this problem, we propose biased random walks that are tailored to capture the neighborhoods of news events in event networks. We then show that these learned embeddings are effective for news event recommendation and news event linking tasks, comparing against strong baselines such as vanilla Node2Vec and other state-of-the-art graph-based event ranking techniques.
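The idea of a random walk biased by node class can be sketched as follows. The abstract does not specify how Event2Vec biases its walks, so everything here is an assumption made for illustration: the function `biased_walk`, the per-type weights, and the tiny graph are hypothetical, and the rule used (next neighbor chosen with probability proportional to the weight of its node type) is one simple possibility rather than the paper's scheme.

```python
import random


def biased_walk(adj, node_type, type_weight, start, length, rng):
    """Walk up to `length` nodes from `start`.

    At each step the next neighbor is drawn with probability
    proportional to type_weight[node_type[neighbor]] (default 1.0).
    The walk stops early at a node with no neighbors. At least one
    neighbor of each visited node must have a positive weight.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        weights = [type_weight.get(node_type[v], 1.0) for v in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical event network: two events sharing one named entity.
adj = {
    "event_a": ["event_b", "entity_x"],
    "event_b": ["event_a", "entity_x"],
    "entity_x": ["event_a", "event_b"],
}
node_type = {"event_a": "event", "event_b": "event", "entity_x": "entity"}

# Favor event nodes three to one over entity nodes.
walk = biased_walk(
    adj, node_type, {"event": 3.0, "entity": 1.0},
    start="event_a", length=6, rng=random.Random(42),
)
print(walk)
```

Walks generated this way would typically be fed to a skip-gram style model to learn the embeddings, as in Node2Vec-like pipelines; that step is omitted here.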