Chamikara Jayalath
Purdue University
Publications
Featured research published by Chamikara Jayalath.
IEEE Transactions on Computers | 2014
Chamikara Jayalath; Julian James Stephen; Patrick Eugster
Efficiently analyzing big data is a major issue in our current era. Examples of analysis tasks include identification or detection of global weather patterns, economic changes, social phenomena, or epidemics. The cloud computing paradigm, along with software tools such as implementations of the popular MapReduce framework, offers a response to the problem by distributing computations among large sets of nodes. In many scenarios, however, input data are geographically distributed (geo-distributed) across data centers, and straightforwardly moving all data to a single data center before processing it can be prohibitively expensive. The above-mentioned tools are designed to work within a single cluster or data center and perform poorly or not at all when deployed across data centers. This paper deals with executing sequences of MapReduce jobs on geo-distributed data sets. We analyze possible ways of executing such jobs, and propose data transformation graphs that can be used to determine schedules for job sequences which are optimized with respect to either execution time or monetary cost. We introduce G-MR, a system for executing such job sequences, which implements our optimization framework. We present empirical evidence in Amazon EC2 and VICCI of the benefits of G-MR over common, naïve deployments for processing geo-distributed data sets. Our evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
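The abstract describes choosing a schedule for a job sequence by searching a data transformation graph. As a rough illustration only, the sketch below models that idea as a shortest-path search: a state is "output of the first i jobs sits at location loc", and edges either run the next job in place or copy the data elsewhere first. The cost tables, the "dist" label for data left geo-distributed, and the function name `cheapest_schedule` are all hypothetical assumptions; this is not G-MR's actual graph or algorithm.

```python
import heapq
from itertools import count


def cheapest_schedule(num_jobs, exec_cost, copy_cost, start="dist"):
    """Cheapest way to run a chain of num_jobs jobs, found with Dijkstra.

    Hypothetical model: a state (stage, loc) means the output of the first
    `stage` jobs (stage 0 = raw input) currently sits at location `loc`;
    the label "dist" stands for data left geo-distributed.
    exec_cost[stage][loc]: cost of running job stage+1 on data held at loc.
    copy_cost[stage][(src, dst)]: cost of moving stage data from src to dst.
    """
    tie = count()  # tie-breaker so the heap never compares plan lists
    frontier = [(0, next(tie), 0, start, [])]
    settled = set()
    while frontier:
        cost, _, stage, loc, plan = heapq.heappop(frontier)
        if stage == num_jobs:
            return cost, plan  # first finished state popped is the cheapest
        if (stage, loc) in settled:
            continue
        settled.add((stage, loc))
        # Edge type 1: run the next job where the data currently is.
        if loc in exec_cost[stage]:
            step = f"run job {stage + 1} at {loc}"
            heapq.heappush(frontier, (cost + exec_cost[stage][loc], next(tie),
                                      stage + 1, loc, plan + [step]))
        # Edge type 2: move the current data to another location first.
        for (src, dst), c in copy_cost[stage].items():
            if src == loc:
                step = f"copy stage-{stage} data {src}->{dst}"
                heapq.heappush(frontier, (cost + c, next(tie),
                                          stage, dst, plan + [step]))
    return None  # no complete schedule exists
```

For example, with `exec_cost = [{"dist": 10, "US": 6}, {"dist": 8, "US": 3}]` and `copy_cost = [{("dist", "US"): 6}, {("dist", "US"): 1}]`, the cheapest plan (total 14) runs job 1 on the distributed data, copies its smaller output to "US", and runs job 2 there. Minimizing time instead of money would only change which numbers go into the cost tables.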
ACM Transactions on Computer Systems | 2013
K. R. Jayaram; Patrick Eugster; Chamikara Jayalath
Content-based publish/subscribe (CPS) is an appealing abstraction for building scalable distributed systems, e.g., message boards, intrusion detectors, or algorithmic stock trading platforms. Recently, CPS extensions have been proposed for location-based services like vehicular networks, mobile social networking, and so on. Although current CPS middleware systems are dynamic in the way they support the joining and leaving of publishers and subscribers, they fall short in supporting subscription adaptations. These are becoming increasingly important across many CPS applications. In algorithmic high frequency trading, for instance, stock price thresholds that are of interest to a trader change rapidly, and gains directly hinge on the reaction time to relevant fluctuations rather than fixed values. In location-aware applications, a subscription is a function of the subscriber location (e.g. GPS coordinates), which inherently changes during motion. The common solution for adapting a subscription consists of a resubscription, where a new subscription is issued and the superseded one canceled. This incurs substantial overhead in CPS middleware systems, and leads to missed or duplicated events during the transition. In this article, we explore the concept of parametric subscriptions for capturing subscription adaptations. We discuss desirable and feasible guarantees for corresponding support, and propose novel algorithms for updating routing mechanisms effectively and efficiently in classic decentralized CPS broker overlay networks. Compared to resubscriptions, our algorithms significantly improve the reaction time to subscription updates without hampering throughput or latency under high update rates. We also propose and evaluate approximation techniques to detect and mitigate pathological cases of high frequency subscription oscillations, which could significantly decrease the throughput of CPS systems thereby affecting other subscribers. 
We analyze the benefits of our support through implementations of our algorithms in two CPS systems, and by evaluating our algorithms on two different application scenarios.
ACM/IFIP/USENIX International Conference on Middleware | 2010
K. R. Jayaram; Chamikara Jayalath; Patrick Eugster
Subscription adaptations are becoming increasingly important across many content-based publish/subscribe (CPS) applications. In algorithmic high frequency trading, for instance, stock price thresholds that are of interest to a trader change rapidly, and gains directly hinge on the reaction time to relevant fluctuations. The common solution to adapt a subscription consists of a re-subscription, where a new subscription is issued and the superseded one canceled. This is ineffective, leading to missed or duplicate events during the transition. In this paper, we introduce the concept of parametric subscriptions to support subscription adaptations. We propose novel algorithms for updating routing mechanisms effectively and efficiently in classic CPS broker overlay networks. Compared to re-subscriptions, our algorithms significantly improve the reaction time to subscription updates and can sustain higher throughput in the presence of high update rates. We convey our claims through implementations of our algorithms in two CPS systems, and by evaluating them on two different real-world applications.
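To make the contrast with re-subscription concrete, the following toy sketch shows a subscription whose parameter (a price threshold) is updated in place, so the subscription keeps its identity and there is no cancel/reissue window. It runs on a single in-memory broker and is only an assumption-based illustration: the class names `ParametricSubscription` and `Broker` are hypothetical, and it does not model the routing updates across a broker overlay that the papers actually address.

```python
from dataclasses import dataclass


@dataclass
class ParametricSubscription:
    sub_id: int
    attribute: str
    params: dict  # hypothetical parameter set, e.g. {"threshold": 100.0}

    def matches(self, event):
        # Predicate "attribute > threshold", evaluated with current params.
        value = event.get(self.attribute)
        return value is not None and value > self.params["threshold"]


class Broker:
    """Single-broker toy; no overlay routing is modeled."""

    def __init__(self):
        self.subs = {}
        self.delivered = []  # (sub_id, event) pairs

    def subscribe(self, sub):
        self.subs[sub.sub_id] = sub

    def update_params(self, sub_id, **new_params):
        # In-place parameter update: same subscription object, so events are
        # never matched against "no subscription" or against two copies.
        self.subs[sub_id].params.update(new_params)

    def publish(self, event):
        for sub in self.subs.values():
            if sub.matches(event):
                self.delivered.append((sub.sub_id, event))
```

A trader's subscription `ParametricSubscription(1, "price", {"threshold": 100.0})` receives a price of 101.0; after `update_params(1, threshold=105.0)` a price of 103.0 is filtered out while 106.0 is delivered.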
IEEE International Conference on Cloud Computing Technology and Science | 2014
Chamikara Jayalath; Julian James Stephen; Patrick Eugster
Integration of applications, data-centers, and programming abstractions in the cloud-of-clouds poses many challenges to system engineers. Different cloud providers offer different communication abstractions, and applications exhibit different communication patterns. By abstracting from hardware addresses and lower-level communication, the publish/subscribe paradigm seems like an adequate abstraction for supporting communication across clouds, as it supports many-to-many communication between publishers and subscribers, of which one-to-one or one-to-many can be viewed as special cases. In particular, content-based publish/subscribe (CPS) systems provide an expressive abstraction that matches well with the key-value pair model of many established cloud storage and computing systems, and decentralized overlay-based CPS implementations scale up well. However, CPS systems perform poorly at small scale, e.g., one-to-one or one-to-many communication. This holds especially for multi-send scenarios, which we refer to as entourages; these may range from a channel between a publisher and a single subscriber to a broadcast between a publisher and a handful of subscribers. These scenarios are common in cloud computing, where cheap hardware is exploited for parallelism (efficiency) and redundancy (fault-tolerance). With CPS, multi-send messages go over several hops before their destinations are even identified via predicate matching, resulting in increased latency, especially when destinations are located in different data-centers or zones. Topic-based publish/subscribe (TPS) systems support communication at small scale more efficiently, but still route messages over multiple hops and conversely lack the flexibility of CPS systems. In this paper, we propose CPS protocols for cloud-of-clouds communication that can dynamically identify entourages of publishers and corresponding subscribers. Our CPS protocols dynamically connect the publishers with their entourages through überlays.
These überlays can transmit messages from a publisher to its corresponding subscribers with low latency. Our experiments show that our protocols make the CPS abstraction viable and beneficial for many applications. We introduce a CPS system named Atmosphere that leverages our CPS protocols and illustrate how Atmosphere has allowed us to implement, with little effort, versions of the popular HDFS and ZooKeeper systems which operate efficiently across data-centers.
International Conference on Distributed Computing Systems | 2013
Chamikara Jayalath; Patrick Eugster
Big data processing undoubtedly represents a major challenge of this era. While several programming models and supporting systems have been proposed to deal with such data in so-called “cloud” infrastructures, they all exhibit the same limitation: all data is assumed to be located in one datacenter. This limitation results from cloud vendors promoting the abstraction of omnipresent computing and storage resources. When dealing with data distributed across datacenters, programmers currently have two options: (1) copying all data to a single datacenter, which easily becomes tedious if done manually as the original dataset is updated, leads to repetitive copying if performed as part of a program, and is sometimes impossible; (2) writing multiple variants of the same program, with consolidation occurring at different points varying by characteristics of the task (e.g., input sub-dataset sizes), which is laborious and does not help in determining the most appropriate variant for a given run. This paper introduces geo-distributed data structures and operations for expressing data processing tasks taking place across datacenters. We describe the design and implementation of such data structures and operations for the PigLatin language. We illustrate the performance benefits of our geo-distributed data structures and operations through several benchmarks, showing up to 2× faster response times.
International Conference on Computer Communications | 2015
William Culhane; Kirill Kogan; Chamikara Jayalath; Patrick Eugster
Aggregation of computed sets of results fundamentally underlies the distillation of information in many of today's big data applications. To this end, many systems have been introduced that allow users to obtain aggregate results by aggregating along communication structures such as trees, but they do not focus on improving performance by optimizing the underlying structure that performs the aggregation. We consider two cases of the problem: aggregation of (1) single blocks of data and (2) streaming input. For each case we determine which metric of “fast” completion is the most relevant, and we mathematically model the resulting systems based on aggregation trees to optimize that metric. Our assumptions and model are laid out in depth. From our model we determine how to create a provably ideal aggregation tree (i.e., with optimal fan-in) using only limited information about the aggregation function being applied. Experiments in the Amazon Elastic Compute Cloud (EC2) confirm the validity of our models in practice.
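The trade-off behind choosing a fan-in can be seen with a deliberately simple cost model: a larger fan-in makes the tree shallower but makes each aggregator handle more inputs. The sketch below assumes, hypothetically, that an aggregator processes its inputs one after another at a fixed cost per input plus a fixed per-level overhead, and that levels run in sequence; it then brute-forces the best fan-in. This linear model and the names `tree_depth`, `completion_time`, and `best_fan_in` are illustrative assumptions, not the model derived in the paper.

```python
def tree_depth(n_leaves, fan_in):
    """Levels needed so a tree with the given fan-in covers n_leaves inputs."""
    depth, covered = 0, 1
    while covered < n_leaves:
        covered *= fan_in
        depth += 1
    return depth


def completion_time(n_leaves, fan_in, per_input, per_level):
    # Hypothetical linear model: each aggregator handles its fan_in inputs
    # one after another (per_input each) plus a fixed per_level overhead,
    # and the levels of the tree complete one after another.
    return tree_depth(n_leaves, fan_in) * (fan_in * per_input + per_level)


def best_fan_in(n_leaves, per_input, per_level):
    # Brute force over all fan-ins from 2 (binary tree) to n_leaves (flat).
    return min(range(2, n_leaves + 1),
               key=lambda k: completion_time(n_leaves, k, per_input, per_level))
```

Under this toy model, 64 leaves with unit per-input and per-level costs give a best fan-in of 4 (depth 3, time 15), beating both a binary tree (time 18) and a flat single-level aggregation (time 65).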
International Middleware Conference | 2013
Chamikara Jayalath; Julian James Stephen; Patrick Eugster
As demonstrated by the emergence of paradigms like fog computing [1] or cloud-of-clouds [2], the landscape of third-party computation is moving beyond straightforward single datacenter-based cloud computing. However, building applications that execute efficiently across data-centers and clouds is tedious due to the variety of communication abstractions provided, and variations in latencies within and between datacenters.
Middleware (ODP) | 2010
K. R. Jayaram; Chamikara Jayalath; Patrick Eugster
IEEE International Conference on Cloud Computing Technology and Science | 2014
William Culhane; Kirill Kogan; Chamikara Jayalath; Patrick Eugster
IEEE Computer | 2017
Patrick Eugster; Chamikara Jayalath; Kirill Kogan; Julian James Stephen