Thilina Gunarathne | Researchain

Publication

Featured research published by Thilina Gunarathne.

High Performance Distributed Computing | 2010

Twister: a runtime for iterative MapReduce

Jaliya Ekanayake; Hui Li; Bingjing Zhang; Thilina Gunarathne; Seung-Hee Bae; Judy Qiu; Geoffrey C. Fox

The MapReduce programming model has simplified the implementation of many data parallel applications. The simplicity of the programming model and the quality of services provided by many implementations of MapReduce attract a lot of enthusiasm among distributed computing communities. From years of experience in applying MapReduce to various scientific applications, we identified a set of extensions to the programming model and improvements to its architecture that will expand the applicability of MapReduce to more classes of applications. In this paper, we present the programming model and the architecture of Twister, an enhanced MapReduce runtime that supports iterative MapReduce computations efficiently. We also show performance comparisons of Twister with other similar runtimes such as Hadoop and DryadLINQ for large scale data parallel applications.
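As a rough illustration of what an iterative MapReduce computation looks like, the sketch below runs one-dimensional KMeans as a repeated map and reduce over data partitions that stay in memory between iterations. It is not Twister's API: the function names (`map_task`, `reduce_task`, `iterative_mapreduce`), the sample data, and the convergence tolerance are all assumptions made for this example.

```python
from collections import defaultdict

def map_task(points, centroids):
    """Assign each point to its nearest centroid; emit {centroid index: [sum, count]}."""
    partial = defaultdict(lambda: [0.0, 0])
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        partial[idx][0] += p
        partial[idx][1] += 1
    return dict(partial)

def reduce_task(partials, centroids):
    """Combine the partial sums from all map tasks into new centroids."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for idx, (s, n) in partial.items():
            totals[idx][0] += s
            totals[idx][1] += n
    # Keep the old centroid if no point was assigned to it.
    return [totals[i][0] / totals[i][1] if i in totals else c
            for i, c in enumerate(centroids)]

def iterative_mapreduce(partitions, centroids, max_iters=50, tol=1e-9):
    """Driver loop: the partitions are reused every iteration; only centroids change."""
    for _ in range(max_iters):
        partials = [map_task(part, centroids) for part in partitions]
        new_centroids = reduce_task(partials, centroids)
        if max(abs(a - b) for a, b in zip(new_centroids, centroids)) < tol:
            return new_centroids
        centroids = new_centroids
    return centroids

# Hypothetical input: two partitions of static data and two starting centroids.
partitions = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
result = iterative_mapreduce(partitions, [0.0, 5.0])
```

The point of the sketch is the shape of the loop: the large input is loaded once, while only the small, changing state (the centroids) moves between iterations.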

IEEE International Conference on Cloud Computing Technology and Science | 2010

MapReduce in the Clouds for Science

Thilina Gunarathne; Tak-Lon Wu; Judy Qiu; Geoffrey C. Fox

The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very viable alternative to traditional servers and computing clusters. The MapReduce distributed data processing architecture has become the weapon of choice for data-intensive analyses in the clouds and in commodity clusters due to its excellent fault tolerance features, scalability and ease of use. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one’s own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. In this paper, we introduce Azure MapReduce, a novel MapReduce runtime built using the Microsoft Azure cloud infrastructure services. The Azure MapReduce architecture successfully leverages the high latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on-demand alternative to traditional MapReduce clusters. Further, we evaluate the use and performance of MapReduce frameworks, including Azure MapReduce, in cloud environments for scientific applications using sequence assembly and sequence alignment as use cases.

BMC Bioinformatics | 2010

Hybrid cloud and cluster computing paradigms for life science applications

Judy Qiu; Jaliya Ekanayake; Thilina Gunarathne; Jong Youl Choi; Seung-Hee Bae; Hui Li; Bingjing Zhang; Tak-Lon Wu; Yang Ruan; Saliya Ekanayake; Adam Hughes; Geoffrey C. Fox

Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister.

Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications.

Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications.

Methods: We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

Grid Computing Environments | 2011

Apache Airavata: a framework for distributed applications and computational workflows

Suresh Marru; Lahiru Gunathilake; Chathura Herath; Patanachai Tangchaisin; Marlon E. Pierce; Chris A. Mattmann; Raminder Singh; Thilina Gunarathne; Eran Chinthaka; Ross Gardler; Aleksander Slominski; Ate Douma; Srinath Perera; Sanjiva Weerawarana

In this paper, we introduce Apache Airavata, a software framework to compose, manage, execute, and monitor distributed applications and workflows on computational resources ranging from local resources to computational grids and clouds. Airavata builds on general concepts of service-oriented computing, distributed messaging, and workflow composition and orchestration. This paper discusses the architecture of Airavata and its modules, and illustrates how the software can be used as individual components or as an integrated solution to build science gateways or general-purpose distributed application and workflow management systems.

High Performance Distributed Computing | 2010

Cloud computing paradigms for pleasingly parallel biomedical applications

Thilina Gunarathne; Tak-Lon Wu; Judy Qiu; Geoffrey C. Fox

Cloud computing offers exciting new approaches for scientific computing that leverage the hardware and software investments in large scale data centers by major commercial players. Loosely coupled problems are very important in many scientific fields and are on the rise with the ongoing move towards data intensive computing. There exist several approaches to leverage clouds and cloud-oriented data processing frameworks to perform pleasingly parallel computations. In this paper we present two pleasingly parallel biomedical applications, (1) assembly of genome fragments and (2) dimension reduction in the analysis of chemical structures, implemented utilizing the cloud infrastructure service based utility computing models of Amazon AWS and Microsoft Windows Azure as well as the MapReduce based data processing frameworks Apache Hadoop and Microsoft DryadLINQ. We review and compare each of the frameworks and perform a comparative study among them based on performance, efficiency, cost and usability. The cloud service based utility computing model and managed parallelism (MapReduce) exhibited comparable performance and efficiencies for the applications we considered. We analyze the variations in cost between the different platform choices (e.g., EC2 instance types), highlighting the need to select the appropriate platform based on the nature of the computation.

International Conference on e-Science | 2009

DryadLINQ for Scientific Analyses

Jaliya Ekanayake; Thilina Gunarathne; Geoffrey C. Fox; Atilla Soner Balkir; Christophe Poulain; Nelson Araujo; Roger S. Barga

Applying high level parallel runtimes to data/compute intensive applications is becoming increasingly common. The simplicity of the MapReduce programming model and the availability of open source MapReduce runtimes such as Hadoop are attracting more users to the MapReduce programming model. Recently, Microsoft has released DryadLINQ for academic use, allowing users to experience a new programming model and a runtime that is capable of performing large scale data/compute intensive analyses. In this paper, we present our experience in applying DryadLINQ to a series of scientific data analysis applications, identify their mapping to the DryadLINQ programming model, and compare their performance with Hadoop implementations of the same applications.

Future Generation Computer Systems | 2013

Scalable parallel computing on clouds using Twister4Azure iterative MapReduce

Thilina Gunarathne; Bingjing Zhang; Tak-Lon Wu; Judy Qiu

Recent advances in data-intensive computing for science discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment in which scientists can perform data analytics. The challenges to large-scale distributed computations on cloud environments demand innovative computational frameworks that are specifically tailored for cloud characteristics to easily and effectively harness the power of clouds. Twister4Azure is a distributed decentralized iterative MapReduce runtime for Windows Azure Cloud. Twister4Azure extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling fault-tolerant execution of a wide array of data mining and data analysis applications on the Azure cloud. Twister4Azure utilizes the scalable, distributed and highly available Azure cloud services as the underlying building blocks, and employs a decentralized control architecture that avoids single points of failure. Twister4Azure optimizes the iterative computations using multi-level caching of data, cache-aware decentralized task scheduling, hybrid tree-based data broadcasting and hybrid intermediate data communication. This paper presents the Twister4Azure iterative MapReduce runtime and a study of four real world data-intensive scientific applications implemented using Twister4Azure: two iterative applications, Multi-Dimensional Scaling and KMeans Clustering; and two pleasingly parallel applications, BLAST+ sequence searching and SmithWaterman sequence alignment. Performance measurements show comparable or a factor of 2 to 4 better results than the traditional MapReduce runtimes deployed on up to 256 instances and for jobs with tens of thousands of tasks. We also study and present solutions to several factors that affect the performance of iterative MapReduce applications on Windows Azure Cloud.

International Conference on Cloud Computing | 2009

Biomedical Case Studies in Data Intensive Computing

Geoffrey C. Fox; Xiaohong Qiu; Scott Beason; Jong Youl Choi; Jaliya Ekanayake; Thilina Gunarathne; Mina Rho; Haixu Tang; Neil Devadasan; Gilbert C. Liu

Many areas of science are seeing a data deluge coming from new instruments, myriads of sensors and exponential growth in electronic records. We take two examples: one the analysis of gene sequence data (35339 Alu sequences) and the other a study of medical information (over 100,000 patient records) in Indianapolis and their relationship to Geographic and Information System and Census data available for 635 Census Blocks in Indianapolis. We look at initial processing (such as Smith Waterman dissimilarities), clustering (using robust deterministic annealing) and Multi Dimensional Scaling to map high dimension data to 3D for convenient visualization. We show how scaling pipelines can be produced that can be implemented using either cloud technologies or MPI, which are compared. This study illustrates challenges in integrating data exploration tools with a variety of different architectural requirements and natural programming models. We present preliminary results for an end-to-end study of two complete applications.
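For readers unfamiliar with the Smith Waterman step mentioned above, the following is a minimal sketch of the local alignment score that such dissimilarities are typically derived from. The scoring parameters (`match`, `mismatch`, `gap`) and the linear gap penalty are assumptions for illustration; the abstract does not state which scoring scheme the authors used or how they converted scores to dissimilarities.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty,
    keeping only two rows of the dynamic programming table in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never goes below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical sequences sharing the local region "ACGT".
score = smith_waterman_score("GGACGTGG", "TTACGTTT")
```

Computing this score for every pair of sequences is independent work per pair, which is why the step parallelizes well on either cloud frameworks or MPI.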

Utility and Cloud Computing | 2011

Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure

Thilina Gunarathne; Bingjing Zhang; Tak-Lon Wu; Judy Qiu

Recent advancements in data-intensive computing for science discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very attractive environment for scientists to perform such data-intensive computations. The challenges to large scale distributed computations on clouds demand new computation frameworks that are specifically tailored for cloud characteristics in order to easily and effectively harness the power of clouds. Twister4Azure is a distributed decentralized iterative MapReduce runtime for Windows Azure Cloud. It extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a wide array of large-scale iterative data analysis for scientific applications on Azure cloud. This paper presents the applicability of Twister4Azure with highlighted features of fault-tolerance, efficiency and simplicity. We study three data-intensive applications: two iterative scientific applications, Multi-Dimensional Scaling and KMeans Clustering, and one data-intensive pleasingly parallel scientific application, BLAST+ sequence searching. Performance measurements show comparable or a factor of 2 to 4 better results than the traditional MapReduce runtimes deployed on up to 256 instances and for jobs with tens of thousands of tasks.

Proceedings of the 1st International Workshop on Multicore Software Engineering | 2008

Developing a concurrent service orchestration engine in CCR

Wei Lu; Thilina Gunarathne; Dennis Gannon

As Grid application models move towards Web services and the service oriented architecture (SOA), service orchestration is becoming the key to building large-scale systems. Having received significant attention, WS-BPEL is widely adopted as the standard web service orchestration language. As a concurrent workflow language, WS-BPEL introduces a set of complex and sophisticated concurrency and coordination semantics. Meanwhile, the centralized architecture makes the orchestration engine an inherent candidate for a performance bottleneck. Therefore, implementing a correct and highly concurrent WS-BPEL engine presents a significant challenge. The conventional thread based concurrent programming model is inadequate here. Instead, we believe an alternative model, namely the event-driven programming model aided by high level coordination constructs such as join patterns, is more suitable for this case, from the perspective of system performance as well as programmability. In this paper we present the implementation of a high performance WS-BPEL engine prototype, which is built upon the event-driven architecture and join patterns provided by the Microsoft Concurrency and Coordination Runtime (CCR). We focus on how to interpret the concurrency semantics in WS-BPEL by using the event and join patterns, and how to drive the execution of a workflow in a reactive manner. Our experience also shows that the event-driven architecture enables the orchestration engine to efficiently handle massive concurrency on multicore machines.
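The idea of a join pattern in an event-driven model can be sketched in a few lines. The example below is not the CCR API and not the authors' engine: `Port`, `join`, and the message strings are hypothetical names chosen for illustration, and everything runs on a single thread. It shows a handler that fires only once every input port holds a message, much as a workflow step might wait for all of its parallel branches to complete.

```python
from collections import deque

class Port:
    """A message queue that notifies its listeners on every post."""
    def __init__(self):
        self.items = deque()
        self.listeners = []

    def post(self, msg):
        self.items.append(msg)
        for listener in list(self.listeners):
            listener()

def join(ports, handler):
    """Call handler(*msgs) each time every port holds at least one message."""
    def try_fire():
        # Consume one message from each port per firing.
        while all(p.items for p in ports):
            handler(*[p.items.popleft() for p in ports])
    for p in ports:
        p.listeners.append(try_fire)

# Hypothetical usage: two parallel branches report completion on separate ports.
left, right = Port(), Port()
completed = []
join([left, right], lambda a, b: completed.append((a, b)))

left.post("branch-A done")    # nothing fires yet: right is empty
left.post("branch-A2 done")   # still waiting on right
right.post("branch-B done")   # fires with the oldest message from each port
right.post("branch-B2 done")  # fires again with the remaining pair
```

No thread blocks while waiting here: the handler runs in reaction to the post that completes the pattern, which is the property the abstract contrasts with thread based designs.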
