Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Jeyarajan Thiyagalingam is active.

Publication


Featured research published by Jeyarajan Thiyagalingam.


Journal of Computational Science | 2013

Energy-aware software: Challenges, opportunities and strategies

Anne E. Trefethen; Jeyarajan Thiyagalingam

Energy consumption of computing systems has become a major concern. Constrained by cost, environmental concerns and policy, minimising the energy footprint of computing systems is one of the primary goals of many initiatives. As we move towards exascale computing, energy constraints become very real and are a major driver in design decisions. The issue is also apparent at the scale of desktop machines, where many-core and accelerator chips are common and offer a spectrum of opportunities for balancing energy and performance. Conventionally, approaches for reducing energy consumption have been either at the operational level (such as powering down all or part of systems) or at the hardware design level (such as utilising specialised low-energy components). In this paper, we are interested in a different approach: energy-aware software. Measuring the energy consumption of a computer application and understanding where the energy usage lies may reveal opportunities for energy savings through changes to the software. In order to understand the complexities of this approach, we specifically look at multithreaded algorithms and applications. Through an evaluation of a benchmark suite on multiple architectures and in multiple environments, we show how basic parameters, such as threading options, compilers and frequencies, can impact energy consumption. As such, we provide an overview of the challenges that face software developers in this regard. We then offer a view of the directions that need to be taken and possible strategies needed for building energy-aware software.
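
For illustration, a minimal sketch of the kind of measurement such a study relies on: reading the Intel RAPL package-energy counter exposed by Linux's powercap interface before and after a multithreaded kernel, repeated for several thread counts. This is not code from the paper; the sysfs path, the OpenMP workload and the thread counts are assumptions for the example.

```cpp
// Sketch: measure package energy (Intel RAPL via Linux powercap) around an
// OpenMP kernel for several thread counts. Illustrative only; if the counter
// file is unavailable the reported energy is simply zero.
#include <omp.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

static uint64_t read_energy_uj() {
    std::ifstream f("/sys/class/powercap/intel-rapl:0/energy_uj");
    uint64_t uj = 0;
    f >> uj;            // microjoules; the counter wraps at max_energy_range_uj
    return uj;
}

int main() {
    const int n = 1 << 26;
    std::vector<double> a(n, 1.0), b(n, 2.0);
    const int counts[] = {1, 2, 4, 8};

    for (int threads : counts) {
        omp_set_num_threads(threads);
        uint64_t e0 = read_energy_uj();
        double t0 = omp_get_wtime();

        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];          // simple memory-bound kernel

        double t1 = omp_get_wtime();
        uint64_t e1 = read_energy_uj();
        std::cout << threads << " threads: " << (t1 - t0) << " s, "
                  << (e1 - e0) / 1e6 << " J (sum = " << sum << ")\n";
    }
}
```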


Workshop on Declarative Aspects of Multicore Programming | 2011

Breaking the GPU programming barrier with the auto-parallelising SAC compiler

Jing Guo; Jeyarajan Thiyagalingam; Sven-Bodo Scholz

Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices. Directive-based approaches such as hiCUDA or OpenMP variants offer further improvements but have not eliminated the need for expertise on these complex architectures. Similarly, special-purpose programming languages such as Microsoft's Accelerator try to lower the barrier further. They provide the programmer with a special form of GPU data structures and operations on them which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach yet another step further. We generate CUDA code from a MATLAB-like, high-level functional array programming language, Single Assignment C (SaC). To do so, we identify which data structures and operations can be successfully mapped on GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend and it presents the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that for a number of benchmarks speedups between a factor of 5 and 50 can be achieved through our parallelising compiler.
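
As a hedged analogue of what such a compiler does (not SaC output and not the paper's backend), the sketch below shows the kind of elementwise array operation that is identified as GPU-friendly, written in plain C++ with an OpenMP target directive standing in for a generated CUDA kernel; the array sizes and the offload mechanism are assumptions for illustration.

```cpp
// Sketch: a data-parallel elementwise array operation of the sort an
// auto-parallelising array compiler maps onto a GPU. An OpenMP target
// directive stands in for generated CUDA; it falls back to the host CPU
// if no offload support is available.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float* pa = a.data();
    float* pb = b.data();
    float* pc = c.data();

    // High-level view of the operation:  c = a * b + a
    #pragma omp target teams distribute parallel for \
            map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] * pb[i] + pa[i];   // one logical thread per element

    std::printf("c[0] = %f\n", pc[0]);
}
```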


Concurrency and Computation: Practice and Experience | 2006

Is Morton layout competitive for large two-dimensional arrays yet?

Jeyarajan Thiyagalingam; Olav Beckmann; Paul H. J. Kelly

Two-dimensional arrays are generally arranged in memory in row-major order or column-major order. Traversing a row-major array in column-major order, or vice versa, leads to poor spatial locality. With large arrays the performance loss can be a factor of 10 or more. This paper explores the Morton storage layout, which has substantial spatial locality whether traversed in row-major or column-major order. Using a small suite of dense kernels working on two-dimensional arrays, we have carried out an extensive study of the impact of poor array layout and of whether Morton layout can offer an attractive compromise. We show that Morton layout can lead to better performance than the worse of the two canonical layouts; however, the performance of Morton layout compared to the better choice of canonical layout is often disappointing. We further study one simple improvement of the basic Morton scheme: we show that choosing the correct alignment for the base address of an array in Morton layout can sometimes significantly improve the competitiveness of this layout.
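
As a hedged illustration of the layout under study (not code from the paper), a Morton index for a two-dimensional array is obtained by interleaving the bits of the row and column indices; the sketch below assumes a square array whose side is a power of two and coordinates that fit in 16 bits.

```cpp
// Sketch: Morton (Z-order) addressing for a 2D array of doubles. Bits of the
// row and column indices are interleaved, so neighbouring elements stay close
// in memory whether the array is traversed row-wise or column-wise.
#include <cstdint>
#include <vector>

// Spread the lower 16 bits of x so they occupy the even bit positions.
static uint32_t part1by1(uint32_t x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

static uint32_t morton_index(uint32_t row, uint32_t col) {
    return (part1by1(row) << 1) | part1by1(col);   // row bits odd, col bits even
}

int main() {
    const uint32_t n = 1024;                       // power of two for simplicity
    std::vector<double> a(static_cast<std::size_t>(n) * n);

    // Either traversal order touches the same blocked memory layout.
    for (uint32_t i = 0; i < n; ++i)
        for (uint32_t j = 0; j < n; ++j)
            a[morton_index(i, j)] = i + 0.001 * j;
    return 0;
}
```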


Parallel Computing | 2013

Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Gihan R. Mudalige; Michael B. Giles; Jeyarajan Thiyagalingam; I. Z. Reguly; Carlo Bertolli; Paul H. J. Kelly; Anne E. Trefethen

Highlights: Discuss the main design issues in parallelizing unstructured mesh applications. Present OP2 for developing applications for heterogeneous parallel systems. Analyze the performance gained with OP2 for two industrial-representative benchmarks. Compare runtime, scaling and runtime breakdowns of the applications. Present energy consumption of OP2 applications on CPU and GPU clusters.

OP2 is a high-level domain-specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2's recent developments facilitating code generation and execution on distributed-memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems, including a large-scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked, including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and system energy consumption. We demonstrate that an application written once at a high level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
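
One of the design issues mentioned above, updating indirectly referenced data without races, is commonly handled by colouring the iteration set so that no two elements of the same colour touch the same mesh node. The sketch below shows the idea for an edge-based loop; it is illustrative only (OP2 generates such code internally), and the mesh layout, the 64-colour limit and the OpenMP execution are assumptions.

```cpp
// Sketch: race-free parallel execution of an indirect, edge-based mesh loop
// via greedy colouring. Edges of the same colour never share a node, so each
// colour can run in parallel without atomics.
#include <omp.h>
#include <algorithm>
#include <cstdint>
#include <vector>

struct Mesh {
    std::vector<int> edge_to_node;   // two node indices per edge (mapping table)
    std::vector<double> node_val;    // indirectly accessed and incremented
    std::vector<double> edge_flux;   // directly accessed, read-only
};

// Greedy colouring: each edge takes the lowest colour unused at its two nodes.
// A 64-bit mask per node caps this sketch at 64 colours (node degree <= 64).
static std::vector<int> colour_edges(const Mesh& m) {
    const std::size_t nedges = m.edge_to_node.size() / 2;
    std::vector<int> colour(nedges, 0);
    std::vector<uint64_t> used(m.node_val.size(), 0);
    for (std::size_t e = 0; e < nedges; ++e) {
        int n0 = m.edge_to_node[2 * e], n1 = m.edge_to_node[2 * e + 1];
        uint64_t taken = used[n0] | used[n1];
        int c = 0;
        while (taken & (1ULL << c)) ++c;            // first free colour
        colour[e] = c;
        used[n0] |= 1ULL << c;
        used[n1] |= 1ULL << c;
    }
    return colour;
}

int main() {
    Mesh m;
    // ... populate edge_to_node, node_val and edge_flux from a real mesh ...
    std::vector<int> colour = colour_edges(m);
    int ncolours = 0;
    for (int c : colour) ncolours = std::max(ncolours, c + 1);

    const std::size_t nedges = m.edge_to_node.size() / 2;
    for (int c = 0; c < ncolours; ++c) {            // colours run one after another
        #pragma omp parallel for
        for (std::size_t e = 0; e < nedges; ++e) {
            if (colour[e] != c) continue;           // a real scheme groups by colour
            int n0 = m.edge_to_node[2 * e], n1 = m.edge_to_node[2 * e + 1];
            m.node_val[n0] += m.edge_flux[e];       // indirect increments are
            m.node_val[n1] -= m.edge_flux[e];       // now race-free
        }
    }
}
```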


IEEE Transactions on Visualization and Computer Graphics | 2015

Glyph-Based Video Visualization for Semen Analysis

Brian Duffy; Jeyarajan Thiyagalingam; Simon J. Walton; David J. Smith; Anne E. Trefethen; Jackson Kirkman-Brown; Eamonn A. Gaffney; Min Chen

The existing efforts in computer assisted semen analysis have been focused on high speed imaging and automated image analysis of sperm motility. This results in a large amount of data, and it is extremely challenging for both clinical scientists and researchers to interpret, compare and correlate the multidimensional and time-varying measurements captured from video data. In this work, we use glyphs to encode a collection of numerical measurements taken at a regular interval and to summarize spatio-temporal motion characteristics using static visual representations. The design of the glyphs addresses the needs for (a) encoding some 20 variables using separable visual channels, (b) supporting scientific observation of the interrelationships between different measurements and comparison between different sperm cells and their flagella, and (c) facilitating the learning of the encoding scheme by making use of appropriate visual abstractions and metaphors. As a case study, we focus this work on video visualization for computer-aided semen analysis, which has a broad impact on both biological sciences and medical healthcare. We demonstrate that glyph-based visualization can serve as a means of external memorization of video data as well as an overview of a large set of spatiotemporal measurements. It enables domain scientists to make scientific observation in a cost-effective manner by reducing the burden of viewing videos repeatedly, while providing them with a new visual representation for conveying semen statistics.


Archive | 2005

Towards building a generic grid services platform: a components-oriented approach

Jeyarajan Thiyagalingam; Stavros Isaiadis; Vladimir Getov

Grid applications using modern Grid infrastructures benefit from a rich variety of features, because these infrastructures are designed with an exhaustive set of built-in functions. As a result, the notion of a lightweight platform has not yet been addressed properly, and current systems cannot be transplanted, adopted or adapted easily. With the promise of the Grid becoming pervasive, it is time to re-think the design methodology for next-generation Grid infrastructures. Instead of building the underlying platform with an exhaustive, rich set of features, in this chapter we describe an alternative strategy following a component-oriented approach. Having a lightweight, reconfigurable and expandable core platform is the key to our design. We identify and describe the minimal and essential features that a modern Grid system should always offer, and then provide any other functions as pluggable components. These pluggable components can be brought on-line whenever necessary, as demanded implicitly by the application. With support for adaptiveness, we see our approach as a solution towards a flexible, dynamically reconfigurable Grid platform.
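
As a hedged sketch of the pluggable-component idea described above (not the chapter's actual platform), a minimal core can expose a registry into which optional services are installed and instantiated only when an application demands them; the interface and the service name below are invented for illustration.

```cpp
// Sketch: a lightweight core that brings optional components on-line on demand
// instead of shipping every feature built in. Names are illustrative only.
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct Component {                         // minimal contract every plug-in meets
    virtual ~Component() = default;
    virtual void start() = 0;
};

class Core {
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    void register_component(const std::string& name, Factory f) {
        factories_[name] = std::move(f);
    }

    // Instantiated only when first requested by an application.
    Component* acquire(const std::string& name) {
        auto it = running_.find(name);
        if (it != running_.end()) return it->second.get();
        auto fit = factories_.find(name);
        if (fit == factories_.end()) return nullptr;   // feature not installed
        auto comp = fit->second();
        comp->start();
        return (running_[name] = std::move(comp)).get();
    }

private:
    std::map<std::string, Factory> factories_;
    std::map<std::string, std::unique_ptr<Component>> running_;
};

struct MonitoringService : Component {     // an optional, pluggable feature
    void start() override { std::cout << "monitoring online\n"; }
};

int main() {
    Core core;
    core.register_component("monitoring",
                            [] { return std::make_unique<MonitoringService>(); });
    core.acquire("monitoring");            // brought on-line only when demanded
}
```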


Computer Software and Applications Conference | 2008

Advanced Grid Programming with Components: A Biometric Identification Case Study

Thomas D. Weigold; Peter Buhler; Jeyarajan Thiyagalingam; Artie Basukoski; Vladimir Getov

Component-oriented software development has been attracting increasing attention for building complex distributed applications. A new infrastructure supporting this advanced concept is our prototype component framework based on the Grid component model. This paper provides an overview of the component framework and presents a case study where we utilise the component-oriented approach to develop a business process application for a biometric identification system. We then introduce the tools being developed as part of an integrated development environment to enable graphical component-based development of Grid applications. Finally, we report our initial findings and experiences of efficiently using the component framework and set of software tools.


Progress in Biophysics & Molecular Biology | 2014

Visualizing Cardiovascular Magnetic Resonance (CMR) imagery: challenges and opportunities.

Simon J. Walton; Kai Berger; Jeyarajan Thiyagalingam; Brian Duffy; Hui Fang; Cameron Holloway; Anne E. Trefethen; Min Chen

Cardiovascular Magnetic Resonance (CMR) imaging is an essential technique for measuring regional myocardial function. However, it is a time-consuming and cognitively demanding task to interpret, identify and compare various motion characteristics based on watching CMR imagery. In this work, we focus on the problems of visualising imagery resulting from 2D myocardial tagging in CMR. In particular, we provide an overview of the current state of the art of relevant visualization techniques, and a discussion on why the problem is difficult from a perceptual perspective. Finally, we introduce a proof-of-concept multilayered visualization user interface for visualizing CMR data using multiple derived attributes encoded into multivariate glyphs. An initial evaluation of the system by clinicians suggested great potential for this visualisation technology to be adopted in clinical practice in the future.


International Supercomputing Conference | 2013

The Effect of Topology-Aware Process and Thread Placement on Performance and Energy

Albert Solernou; Jeyarajan Thiyagalingam; Mihai C. Duta; Anne E. Trefethen

Design of modern multiprocessor computer systems has become increasingly complex and renders the performance of scientific parallel applications highly sensitive to process and thread scheduling. In particular, Non-Uniform Memory Access (NUMA), a frequent architectural choice, demands knowledge of hardware details as well as skills normally beyond those of the average user in order to minimise memory-access penalties and achieve good application performance. This situation is further complicated by the increasing use of modern heterogeneous systems involving both CPUs and accelerators, where process proximity to the accelerator strongly determines performance.
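
For illustration only (not the paper's own tooling), explicit placement of the kind discussed above can be expressed on Linux by pinning each OpenMP thread to a core with sched_setaffinity and relying on first-touch allocation; the simple thread-to-core mapping below is an assumption and differs between machines (hwloc or OMP_PLACES handle it portably).

```cpp
// Sketch: pin each OpenMP thread to a distinct core so the pages it first
// touches are allocated on the local NUMA node. Linux-specific; the
// "thread i -> core i" mapping is an assumption that varies across systems.
#include <omp.h>
#include <sched.h>
#include <cstdio>
#include <vector>

int main() {
    const long n = 1L << 24;
    std::vector<double> x(n);

    #pragma omp parallel
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(omp_get_thread_num(), &set);       // assumed core ids 0..P-1
        sched_setaffinity(0, sizeof(set), &set);   // pin the calling thread

        // First-touch initialisation: pages end up on this thread's NUMA node.
        #pragma omp for schedule(static)
        for (long i = 0; i < n; ++i)
            x[i] = 1.0;
    }

    // Later loops with the same static schedule reuse node-local memory.
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : sum)
    for (long i = 0; i < n; ++i)
        sum += x[i];
    std::printf("sum = %f\n", sum);
}
```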


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Harnessing the Power of GPUs without Losing Abstractions in SAC and ArrayOL: A Comparative Study

Jing Guo; Wendell Rodrigues; Jeyarajan Thiyagalingam; Frédéric Guyomarc'h; Pierre Boulet; Sven-Bodo Scholz

Over recent years, using Graphics Processing Units (GPUs) has become an effective method for increasing the performance of many applications. However, these performance benefits come at a price. Firstly, extensive programming expertise and intimate knowledge of the underlying hardware are essential for gaining good speedups. Secondly, the expressiveness of GPU-based programs is not sufficient to retain the high-level abstractions of the solutions. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it is still a challenge to effectively utilise these devices while retaining the programming abstractions. To this end, performing a source-to-source transformation, whereby a high-level language is mapped to CUDA or OpenCL, is an attractive option. In particular, it enables one to retain high-level abstractions and to harness the power of GPUs without any expertise in GPGPU programming. In this paper, we compare and analyse two such schemes. One of them is a transformation mechanism for mapping an image/signal processing domain-specific language, ArrayOL, to OpenCL. The other is a transformation route for mapping a high-level general-purpose array processing language, Single Assignment C (SaC), to CUDA. Using a real-world image processing application as a running example, we demonstrate that, despite being general purpose, the array processing language can be used to specify complex array access patterns generically. Performance of the generated CUDA code is comparable to that of the OpenCL code created from the domain-specific language.

Collaboration


Dive into Jeyarajan Thiyagalingam's collaboration.

Top Co-Authors

Vladimir Getov (University of Westminster)
Artie Basukoski (University of Westminster)
Jing Guo (University of Hertfordshire)
Min Chen (Huazhong University of Science and Technology)