Jaliya Ekanayake | Researchain

Publication

Featured research published by Jaliya Ekanayake.

high performance distributed computing | 2010

Twister: a runtime for iterative MapReduce

Jaliya Ekanayake; Hui Li; Bingjing Zhang; Thilina Gunarathne; Seung-Hee Bae; Judy Qiu; Geoffrey C. Fox

The MapReduce programming model has simplified the implementation of many data parallel applications. The simplicity of the programming model and the quality of services provided by many implementations of MapReduce have attracted a lot of enthusiasm among distributed computing communities. From years of experience in applying MapReduce to various scientific applications, we identified a set of extensions to the programming model and improvements to its architecture that will expand the applicability of MapReduce to more classes of applications. In this paper, we present the programming model and the architecture of Twister, an enhanced MapReduce runtime that supports iterative MapReduce computations efficiently. We also show performance comparisons of Twister with other similar runtimes such as Hadoop and DryadLINQ for large scale data parallel applications.

ieee international conference on escience | 2008

MapReduce for Data Intensive Scientific Analyses

Jaliya Ekanayake; Shrideep Pallickara; Geoffrey C. Fox

Most scientific data analyses comprise analyzing voluminous data collected from various instruments. Efficient parallel/concurrent algorithms and frameworks are the key to meeting the scalability and performance requirements entailed in such scientific data analyses. The recently introduced MapReduce technique has gained a lot of attention from the scientific community for its applicability in large parallel data analyses. Although there are many evaluations of the MapReduce technique using large textual data collections, there have been only a few evaluations for scientific data analyses. The goals of this paper are twofold. First, we present our experience in applying the MapReduce technique for two scientific data analyses: (i) high energy physics data analyses; (ii) K-means clustering. Second, we present CGL-MapReduce, a streaming-based MapReduce implementation and compare its performance with Hadoop.
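As an illustration of how K-means clustering can be expressed in the MapReduce style mentioned above, here is a minimal single-process sketch. It is not code from CGL-MapReduce or Hadoop; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans`) and the in-memory "shuffle" are assumptions made for this example only. Each map task assigns the points of one data partition to their nearest centroid and emits partial sums, the reduce step averages those sums into new centroids, and a driver repeats the two phases until the centroids stop moving.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (coordinate-wise sum, point count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(partition: Sequence[Point],
               centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
    """Map task: emit (centroid index, partial sum and count) for one partition."""
    partial: Dict[int, Partial] = {}
    for point in partition:
        idx = nearest(point, centroids)
        if idx in partial:
            sums, count = partial[idx]
            partial[idx] = ([a + b for a, b in zip(sums, point)], count + 1)
        else:
            partial[idx] = (list(point), 1)
    return list(partial.items())


def kmeans_reduce(values: Sequence[Partial]) -> Point:
    """Reduce task: merge partial sums for one centroid into its new position."""
    total = list(values[0][0])
    count = values[0][1]
    for sums, n in values[1:]:
        total = [a + b for a, b in zip(total, sums)]
        count += n
    return tuple(x / count for x in total)


def kmeans(partitions: Sequence[Sequence[Point]],
           centroids: Sequence[Point],
           max_iters: int = 20,
           tol: float = 1e-9) -> List[Point]:
    """Driver: repeat map and reduce phases until the centroids converge."""
    centroids = list(centroids)
    for _ in range(max_iters):
        grouped: Dict[int, List[Partial]] = defaultdict(list)
        for part in partitions:  # map phase (parallel in a real runtime)
            for idx, value in kmeans_map(part, centroids):
                grouped[idx].append(value)  # stand-in for the shuffle step
        new_centroids = list(centroids)  # empty clusters keep their old centroid
        for idx, values in grouped.items():  # reduce phase
            new_centroids[idx] = kmeans_reduce(values)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Calling `kmeans(partitions, initial_centroids)` with a list of point partitions returns the final centroids. In this sketch only the small centroid list changes between iterations, while the point partitions stay fixed; that is the property an iterative runtime can exploit by keeping the partitions in place across iterations.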

international conference on cloud computing | 2009

High Performance Parallel Computing with Clouds and Cloud Technologies

Jaliya Ekanayake; Geoffrey C. Fox

Infrastructure services (Infrastructure-as-a-service), provided by cloud vendors, allow any user to provision a large number of compute instances fairly easily. Whether leased from public clouds or allocated from private clouds, utilizing these virtual resources to perform data/compute intensive analyses requires employing different parallel runtimes to implement such applications. Among many parallelizable problems, most “pleasingly parallel” applications can be performed using MapReduce technologies such as Hadoop, CGL-MapReduce, and Dryad, in a fairly easy manner. However, many scientific applications, which have complex communication patterns, still require the low latency communication mechanisms and rich set of communication constructs offered by runtimes such as MPI. In this paper, we first discuss large scale data analysis using different MapReduce implementations, and then we present a performance analysis of high performance parallel applications on virtualized resources.

BMC Bioinformatics | 2010

Hybrid cloud and cluster computing paradigms for life science applications

Judy Qiu; Jaliya Ekanayake; Thilina Gunarathne; Jong Youl Choi; Seung-Hee Bae; Hui Li; Bingjing Zhang; Tak-Lon Wu; Yang Ruan; Saliya Ekanayake; Adam Hughes; Geoffrey C. Fox

Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods: We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

international conference on web services | 2006

Axis2, Middleware for Next Generation Web Services

Srinath Perera; Chathura Herath; Jaliya Ekanayake; Eran Chinthaka; Ajith Ranabahu; Deepal Jayasinghe; Sanjiva Weerawarana; Glen Daniels

Axis2, the next generation of Apache Web services middleware, is an effort to re-architect the Apache Web services stack to incorporate the changes in Web services. Among many improvements, Axis2 provides first class messaging and SOAP extension support together with a novel lightweight streaming based XML processing model. The architecture is built on top of a simple and extensible core that provides the basic abstractions for the rest of the system. We present the design and the thought process behind the key abstractions by breaking down the architecture into three topics: the XML processing model, the extensible SOAP processing model, and the messaging framework. This paper explains the overall architecture while concentrating on the three topics, and demonstrates how they all fit together to yield Axis2.

international conference on cluster computing | 2009

Granules: A lightweight, streaming runtime for cloud computing with support, for Map-Reduce

Shrideep Pallickara; Jaliya Ekanayake; Geoffrey C. Fox

Cloud computing has gained significant traction in recent years. The Map-Reduce framework is currently the most dominant programming model in cloud computing settings. In this paper, we describe Granules, a lightweight, streaming-based runtime for cloud computing which incorporates support for the Map-Reduce framework. Granules provides rich lifecycle support for developing scientific applications with support for iterative, periodic and data driven semantics for individual computations and pipelines. We describe our support for variants of the Map-Reduce framework. The paper presents a survey of related work in this area. Finally, this paper describes our performance evaluation of various aspects of the system, including (where possible) comparisons with other comparable systems.

international conference on e-science | 2009

DryadLINQ for Scientific Analyses

Jaliya Ekanayake; Thilina Gunarathne; Geoffrey C. Fox; Atilla Soner Balkir; Christophe Poulain; Nelson Araujo; Roger S. Barga

Applying high level parallel runtimes to data/compute intensive applications is becoming increasingly common. The simplicity of the MapReduce programming model and the availability of open source MapReduce runtimes such as Hadoop are attracting more users to the MapReduce programming model. Recently, Microsoft has released DryadLINQ for academic use, allowing users to experience a new programming model and a runtime that is capable of performing large scale data/compute intensive analyses. In this paper, we present our experience in applying DryadLINQ for a series of scientific data analysis applications, identify their mapping to the DryadLINQ programming model, and compare their performance with Hadoop implementations of the same applications.

ieee international conference on high performance computing data and analytics | 2008

Parallel data mining from multicore to cloudy grids

Geoffrey C. Fox; Seung-Hee Bae; Jaliya Ekanayake; Xiaohong Qiu; Huapeng Yuan

We describe a suite of data mining tools that cover clustering, information retrieval and the mapping of high dimensional data to low dimensions for visualization. Preliminary applications are given to particle physics, bioinformatics and medical informatics. The data vary in dimension from low (2-20), high (thousands) to undefined (sequences with dissimilarities but not vectors defined). We use deterministic annealing to provide more robust algorithms that are relatively insensitive to local minima. We discuss the structure of the algorithms and their mapping to parallel architectures of different types, and look at the performance of the algorithms on three classes of system: multicore, cluster and Grid using a MapReduce style algorithm. Each approach is suitable in different application scenarios. We stress that data analysis/mining of large datasets can be a supercomputer application.

international conference on cloud computing | 2009

Biomedical Case Studies in Data Intensive Computing

Geoffrey C. Fox; Xiaohong Qiu; Scott Beason; Jong Youl Choi; Jaliya Ekanayake; Thilina Gunarathne; Mina Rho; Haixu Tang; Neil Devadasan; Gilbert C. Liu

Many areas of science are seeing a data deluge coming from new instruments, myriads of sensors and exponential growth in electronic records. We take two examples: one is the analysis of gene sequence data (35339 Alu sequences) and the other a study of medical information (over 100,000 patient records) in Indianapolis and their relationship to Geographic and Information System and Census data available for 635 Census Blocks in Indianapolis. We look at initial processing (such as Smith Waterman dissimilarities), clustering (using robust deterministic annealing) and Multi Dimensional Scaling to map high dimension data to 3D for convenient visualization. We show how scaling pipelines can be produced that can be implemented using either cloud technologies or MPI, and we compare the two. This study illustrates challenges in integrating data exploration tools with a variety of different architectural requirements and natural programming models. We present preliminary results for an end to end study of two complete applications.

ieee international conference on escience | 2008

An Overview of the Granules Runtime for Cloud Computing

Shrideep Pallickara; Jaliya Ekanayake; Geoffrey C. Fox

In this paper we present a short introduction to the Granules system, which is a lightweight streaming-based runtime for cloud computing. This paper provides a summary of the capabilities supported by the runtime.
