
Publication


Featured research published by Ali Akoglu.


Frontiers in Plant Science | 2011

The iPlant Collaborative: Cyberinfrastructure for Plant Biology

Stephen A. Goff; Matthew W. Vaughn; Sheldon J. McKay; Eric Lyons; Ann E. Stapleton; Damian Gessler; Naim Matasci; Liya Wang; Matthew R. Hanlon; Andrew Lenards; Andy Muir; Nirav Merchant; Sonya Lowry; Stephen A. Mock; Matthew Helmke; Adam Kubach; Martha L. Narro; Nicole Hopkins; David Micklos; Uwe Hilgert; Michael Gonzales; Chris Jordan; Edwin Skidmore; Rion Dooley; John Cazes; Robert T. McLay; Zhenyuan Lu; Shiran Pasternak; Lars Koesterke; William H. Piel

The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure, enabling plant scientists to perform complex analyses on large datasets without the need to master the command line or high-performance computational services.


International Parallel and Distributed Processing Symposium | 2009

Sequence alignment with GPU: Performance and design challenges

Gregory M. Striemer; Ali Akoglu

In bioinformatics, alignments are commonly performed in genome and protein sequence analysis for gene identification and evolutionary similarities. There are several approaches for such analysis, each varying in accuracy and computational complexity. Smith-Waterman (SW) is by far the best algorithm for its accuracy in similarity scoring. However, the execution time of this algorithm on general-purpose processor based systems makes it impractical for use by life scientists. In this paper we take Smith-Waterman as a case study to explore the architectural features of Graphics Processing Units (GPUs) and evaluate the challenges the hardware architecture poses, as well as the software modifications needed to map the program architecture onto the GPU. We achieve a 23x speedup against the serial version of the SW algorithm. We further study the effect of memory organization and the instruction set architecture on GPU performance. For that purpose we analyze another implementation on an Intel Quad Core processor that makes use of Intel's SIMD-based SSE2 architecture. We show that if reading blocks of 16 words at a time instead of 4 is allowed, and if 64KB of shared memory as opposed to 16KB is available to the programmer, GPU performance improves significantly, making it comparable to the SIMD-based implementation. We quantify these observations to illustrate the need for studies on extending the instruction set and memory organization for the GPU.
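
For context, the recurrence that such GPU implementations parallelize can be stated in a few lines. The sketch below is a plain serial Python scorer with assumed match/mismatch/gap values and a linear gap penalty; it is not the paper's GPU kernel. Cells on the same anti-diagonal of the scoring matrix are mutually independent, which is the property that exposes the parallelism a GPU can exploit.

```python
# Minimal serial Smith-Waterman local alignment scorer (illustrative only).
# Scoring values (match/mismatch/gap) and the linear gap penalty are assumptions.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # diagonal: match or mismatch
                          H[i - 1][j] + gap,    # up: gap in sequence b
                          H[i][j - 1] + gap)    # left: gap in sequence a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GGTTGACTA", "TGTTACGG"))
```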


International Journal of Reconfigurable Computing | 2012

High performance biological pairwise sequence alignment: FPGA versus GPU versus Cell BE versus GPP

Khaled Benkrid; Ali Akoglu; Cheng Ling; Yang Song; Ying Liu; Xiang Tian

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high-performance, efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance-per-watt criterion and perform better than all other platforms on the performance-per-dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both the performance-per-watt and performance-per-dollar criteria. In general, in order to outperform other technologies on the performance-per-dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.
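
The figures of merit used in this kind of comparison are simple normalizations of speedup by power draw and by cost. The snippet below only illustrates how they are computed; every number in it is a placeholder, not a measurement from the paper.

```python
# Illustrative computation of performance-per-watt and performance-per-dollar.
# All values are placeholder assumptions, not results reported in the study.

platforms = {
    # name: (speedup vs. GPP baseline, power in watts, cost in dollars)
    "GPP":     (1.0,   80.0,  1000.0),
    "GPU":     (30.0, 200.0,  1500.0),
    "Cell BE": (40.0, 100.0,  8000.0),
    "FPGA":    (200.0, 25.0, 10000.0),
}

for name, (speedup, watts, dollars) in platforms.items():
    print(f"{name:8s} perf/W = {speedup / watts:7.3f}   "
          f"perf/$ = {speedup / dollars:8.5f}")
```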


The Journal of Supercomputing | 2012

Cardiac simulation on multi-GPU platform

Venkata Krishna Nimmagadda; Ali Akoglu; Salim Hariri; Talal Moukabary

The cardiac bidomain model is a popular approach to study the electrical behavior of tissues and simulate interactions between the cells by solving partial differential equations. The iterative and data-parallel model is an ideal match for the parallel architecture of Graphics Processing Units (GPUs). In this study, we evaluate the effectiveness of architecture-specific optimizations and fine-grained parallelization strategies, completely port the model to the GPU, and evaluate the performance of single-GPU and multi-GPU implementations. Simulating one action potential duration (350 msec real time) for a 256×256×256 tissue takes 453 hours on a high-end general purpose processor, while it takes 664 seconds on a four-GPU based system including the communication and data transfer overhead. This drastic improvement (a factor of 2460×) will allow clinicians to extend the time scale of simulations from milliseconds to seconds and minutes, and to evaluate hypotheses in time frames that were not previously feasible.
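
The reported speedup factor follows directly from the two runtimes quoted in the abstract, as the quick check below shows (it only converts units and divides; no other assumptions).

```python
# Sanity check of the reported speedup: 453 hours on a general-purpose
# processor versus 664 seconds on a four-GPU system.

cpu_seconds = 453 * 3600        # 1,630,800 seconds
gpu_seconds = 664
print(cpu_seconds / gpu_seconds)  # ~2456, i.e. roughly the 2460x reported
```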


Design Automation Conference | 2011

MO-Pack: many-objective clustering for FPGA CAD

Senthilkumar Thoravi Rajavel; Ali Akoglu

Applications targeting FPGA integrated systems impose strict energy, channel width, and delay constraints. We introduce the first many-objective clustering algorithm, MO-Pack, that targets these performance metrics concurrently. Detailed performance comparisons against state-of-the-art clustering strategies targeting energy (P-T-VPack), delay (T-VPack), channel width (iRAC), and timing and routability (T-RPack) show that MO-Pack achieves its goals without increasing the logic area.
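
A many-objective clustering pass typically scores each candidate Basic Logic Element (BLE) with an attraction function that blends the competing objectives. The sketch below is only in the spirit of such a function; the individual terms, field names, and weights are assumptions made for illustration, not MO-Pack's actual cost formulation.

```python
# Hypothetical attraction function blending delay, energy, and routability
# objectives when absorbing a BLE into a cluster. Terms and weights are
# illustrative assumptions, not the paper's formulation.

def attraction(ble, w_delay=0.4, w_energy=0.3, w_route=0.3):
    delay_term = ble["criticality"]                        # favor timing-critical BLEs
    energy_term = ble["shared_nets"] / ble["total_nets"]   # shared nets cut inter-cluster activity
    route_term = 1.0 - ble["fanout"] / ble["max_fanout"]   # low fanout eases channel-width pressure
    return w_delay * delay_term + w_energy * energy_term + w_route * route_term

candidate = {"criticality": 0.9, "shared_nets": 3, "total_nets": 4,
             "fanout": 2, "max_fanout": 16}
print(round(attraction(candidate), 3))
```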


Cluster Computing | 2009

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA

Ali Akoglu; Gregory M. Striemer

Program development environments have enabled graphics processing units (GPUs) to become an attractive high performance computing platform for the scientific community. A commonly posed problem in computational biology is protein database searching for functional similarities. The most accurate algorithm for sequence alignments is Smith-Waterman (SW). However, due to its computational complexity and rapidly increasing database sizes, the process becomes increasingly time consuming, making cluster-based systems more desirable. Therefore, scalable and highly parallel methods are necessary to make SW a viable solution for life science researchers. In this paper we evaluate how SW fits onto the target GPU architecture by exploring ways to map the program architecture onto the processor architecture. We develop new techniques to reduce the memory footprint of the application while exploiting the memory hierarchy of the GPU. With this implementation, GSW, we overcome the on-chip memory size constraint, achieving 23× speedup compared to a serial implementation. Results show that as the query length increases, our speedup remains nearly constant, indicating the solid scalability of our approach. Additionally, this is a first-of-its-kind implementation that runs purely on the GPU instead of in a CPU-GPU integrated environment, making our design suitable for porting onto a cluster of GPUs.
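
One standard way to shrink the memory footprint of Smith-Waterman scoring is to keep only the previous row (or anti-diagonal) of the matrix instead of the whole table. The sketch below shows that general idea in plain Python; it is not GSW's actual GPU memory layout, and the scoring parameters are assumptions.

```python
# Linear-space Smith-Waterman scoring: only two rows are kept, illustrating
# how the working set can be reduced. Not the paper's GPU data layout.

def sw_score_linear_space(a, b, match=2, mismatch=-1, gap=-1):
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr   # discard the older row; memory stays O(len(b))
    return best

print(sw_score_linear_space("GGTTGACTA", "TGTTACGG"))
```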


ACS/IEEE International Conference on Computer Systems and Applications | 2009

Accelerated discovery through integration of Kepler with data turbine for ecosystem research

Yaser Jararweh; Arjun Hary; Youssif B. Al-Nashif; Salim Hariri; Ali Akoglu; Darrel Jenerette

There is a need for Accelerated Discovery Cycles (ADCs) for integrating experimental and observational data to capture large-scale dynamic ecosystem complexity, to instantly process massive datasets, to test contrasting mechanistic models, and to drive the next set of experiments. The overarching objective is to enable ADCs by coupling recent advances in computational models and cyber-systems with the unique experimental infrastructure of Biosphere 2 (B2), a large-scale earth system science facility now under management by the University of Arizona. In the context of ADCs, there is a need for a software development environment for modeling complex systems and a middleware for streaming data from the field into the models. Kepler is an open-source tool that enables the end user to design scientific workflows in order to manage scientific data and perform complex analysis on the data. Ring Buffered Network Bus (RBNB) Data Turbine is a middleware system that is used to integrate sensor-based environment observing systems with data processing systems. Currently, the integration between Kepler and Data Turbine is limited to reading from the Data Turbine only. In an ADC, multiple hypotheses are tested with different assimilation models. These models run on a distributed computing environment; therefore, the capability to simultaneously read from and write to the Data Turbine is a necessity. In this paper we show how to integrate Kepler with RBNB Data Turbine to achieve this capability. We also exploit the open-source features of the Kepler system and create customized processing models in order to accelerate and automate experiments in ecosystem research. We describe in further detail our implementation approach to enable future studies on Kepler and Data Turbine integration.
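
To make the simultaneous read/write requirement concrete, the sketch below shows a generic thread-safe ring buffer with a producer (sensor stream) and a consumer (workflow actor). It deliberately does not use the actual RBNB Data Turbine or Kepler APIs; the RingChannel class and its methods are hypothetical and only illustrate the access pattern.

```python
# Generic producer/consumer ring-buffer pattern (hypothetical, not RBNB code).

import threading
from collections import deque

class RingChannel:
    def __init__(self, capacity=1024):
        self.buf = deque(maxlen=capacity)   # oldest samples drop off automatically
        self.lock = threading.Lock()

    def write(self, sample):                # producer side (field sensors)
        with self.lock:
            self.buf.append(sample)

    def read_latest(self, n=1):             # consumer side (workflow actor)
        with self.lock:
            return list(self.buf)[-n:]

channel = RingChannel()
writer = threading.Thread(target=lambda: [channel.write(i) for i in range(100)])
writer.start()
writer.join()
print(channel.read_latest(5))   # a downstream model would poll like this
```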


International Journal of Reconfigurable Computing | 2010

Timing-driven nonuniform depopulation-based clustering

Hanyu Liu; Ali Akoglu

Low-cost FPGAs have a comparable number of Configurable Logic Blocks (CLBs) to resource-rich FPGAs but far fewer routing tracks. For CAD tools, this situation increases the difficulty of successfully mapping a circuit onto a low-cost FPGA. Instead of switching to resource-rich FPGAs, designers can employ depopulation-based clustering techniques, which underuse CLBs and hence improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase critical path delay. In this paper, we present a timing-driven nonuniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. T-NDPack adjusts the CLB capacity based on the criticality of the Basic Logic Element (BLE). Results show that T-NDPack reduces minimum channel width by 11.07% while increasing the number of CLBs by 13.28% compared to T-VPack. More importantly, T-NDPack decreases critical path delay by 2.89%.
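
The nonuniform depopulation idea can be pictured as making the fill capacity of a cluster a function of the timing criticality of the logic it absorbs: critical clusters stay densely packed, non-critical ones are depopulated to relieve routing. The thresholds and capacities in the sketch below are illustrative assumptions, not T-NDPack's actual values.

```python
# Sketch of criticality-dependent CLB capacity (values are assumptions).

def cluster_capacity(max_ble_criticality, full_capacity=10):
    if max_ble_criticality > 0.9:        # timing-critical cluster: pack tightly
        return full_capacity
    if max_ble_criticality > 0.5:        # moderately critical: mild depopulation
        return int(0.8 * full_capacity)
    return int(0.6 * full_capacity)      # non-critical: depopulate to aid routability

for crit in (0.95, 0.7, 0.2):
    print(crit, cluster_capacity(crit))
```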


Field-Programmable Technology | 2007

A Highly Parallel FPGA based IEEE-754 Compliant Double-Precision Binary Floating-Point Multiplication Algorithm

Sandeep K. Venishetti; Ali Akoglu

There is increasing demand for fast floating-point arithmetic support to make field programmable gate arrays (FPGAs) a practical option for scientific applications. We propose a new IEEE-754 compliant double-precision floating-point multiplication algorithm that supports denormal numbers, NaN, and exception handling. The solution involves bit-level operations with minimum dependency between partial products through a specialized adder tree structure tailored to exploit the modular and parallel nature of FPGAs. We achieve a maximum operational frequency of 274 MHz for mantissa multiplication and 228 MHz for the overall system on the Xilinx Virtex-4 platform. Our design carries performance benefits similar to ASIC-based algorithms and routing benefits similar to ripple carry array and carry save multipliers. The proposed approach outperforms academic algorithm and IP-core solutions as well as the Xilinx LogiCORE multiplier when no embedded resources are used. The algorithm allows reaching double-double precision with much less performance degradation and pipelining demand than IP-core based approaches.
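
The datapath stages of IEEE-754 double-precision multiplication (split the fields, XOR the signs, add the exponents minus the bias, multiply the significands, normalize) can be modeled in software as below. This sketch handles only normal inputs and truncates instead of rounding to nearest; it is not the paper's FPGA adder-tree design and omits the denormal/NaN/exception handling the paper supports.

```python
# Software model of the stages in IEEE-754 double-precision multiplication.
# Normal inputs only; truncation instead of round-to-nearest (illustrative).

import struct

def fp64_fields(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def fp64_mul(a, b):
    sa, ea, fa = fp64_fields(a)
    sb, eb, fb = fp64_fields(b)
    sign = sa ^ sb                          # sign: XOR of input signs
    ma, mb = (1 << 52) | fa, (1 << 52) | fb  # restore implicit leading 1
    prod = ma * mb                           # 106-bit significand product
    exp = ea + eb - 1023                     # add exponents, remove one bias
    if prod >> 105:                          # product in [2, 4): normalize
        prod >>= 1
        exp += 1
    frac = (prod >> 52) & ((1 << 52) - 1)    # keep top fraction bits (truncate)
    bits = (sign << 63) | (exp << 52) | frac
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

print(fp64_mul(3.5, -2.25), 3.5 * -2.25)     # both print -7.875
```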


Scientific Cloud Computing | 2016

Value-Based Resource Management in High-Performance Computing Systems

Dylan Machovec; Cihan Tunc; Nirmal Kumbhare; Bhavesh Khemka; Ali Akoglu; Salim Hariri; Howard Jay Siegel

We introduce a new metric, Value of Service (VoS), which enables resource management techniques for high-performance computing (HPC) systems to take into consideration the value of the completion time of a task and the value of the energy used to compute that task at a given instant of time. These value functions have a soft threshold, where the value function begins to decrease from its maximum value, and a hard threshold, where the value function goes to zero. Each task has an associated importance factor to express its relative significance among tasks. We define the value of a task as the weighted sum of its value of performance and value of energy, multiplied by its importance factor. We also consider the variation in value for completing a task at different times; the value of energy reduction can change significantly between peak and non-peak periods. We define VoS for a given workload to be the sum of the values of all tasks that are executed during a given period of time. Our system model is based on virtual machines (VMs), where each dynamically arriving task is assigned to a VM with a resource configuration based on the number of homogeneous cores and the amount of memory. Based on VoS, we design, evaluate, and compare different resource management heuristics. This comparison is done over various simulation scenarios and example experiments on an IBM blade server based system.
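
The structure of the VoS metric described above can be sketched directly: a per-metric value function with soft and hard thresholds, an importance-weighted combination per task, and a sum over the workload. The linear decay between thresholds, the equal weights, and the example numbers below are assumptions for illustration, not the paper's exact value functions.

```python
# Minimal sketch of the Value of Service (VoS) calculation (assumed shapes).

def value(metric, soft, hard, v_max=1.0):
    """Value earned for a metric (completion time or energy) with soft/hard thresholds."""
    if metric <= soft:
        return v_max                                   # full value before the soft threshold
    if metric >= hard:
        return 0.0                                     # no value past the hard threshold
    return v_max * (hard - metric) / (hard - soft)     # assumed linear decay in between

def task_value(t, w_perf=0.5, w_energy=0.5):
    v_p = value(t["time"], t["soft_time"], t["hard_time"])
    v_e = value(t["energy"], t["soft_energy"], t["hard_energy"])
    return t["importance"] * (w_perf * v_p + w_energy * v_e)

tasks = [
    {"time": 40, "soft_time": 50, "hard_time": 100,
     "energy": 30, "soft_energy": 25, "hard_energy": 60, "importance": 2.0},
    {"time": 80, "soft_time": 50, "hard_time": 100,
     "energy": 10, "soft_energy": 25, "hard_energy": 60, "importance": 1.0},
]
vos = sum(task_value(t) for t in tasks)   # VoS of the workload over the period
print(round(vos, 3))
```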

Collaboration


Dive into Ali Akoglu's collaborations.

Top Co-Authors

Yang Song (University of Arizona)
Dylan Machovec (Colorado State University)
Gregory M. Striemer (Centre national de la recherche scientifique)