Abdullah Gharaibeh
University of British Columbia
Publications
Featured research published by Abdullah Gharaibeh.
High Performance Distributed Computing | 2008
Samer Al-Kiswany; Abdullah Gharaibeh; Elizeu Santos-Neto; George L. Yuan; Matei Ripeanu
Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis, focusing on distributed storage systems. As a proof of concept, we design StoreGPU, a library that accelerates a number of hashing-based primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to eight-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.
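To make the primitive concrete, here is a minimal CPU-side sketch of hashing-based similarity detection between two files, assuming a fixed block size and a Jaccard measure over block digests. These parameters are illustrative assumptions, not the paper's; StoreGPU's contribution is offloading the hashing stage itself to the GPU, which this sketch does not attempt.

```python
# Minimal sketch: fingerprint files by hashing fixed-size blocks, then compare
# fingerprint sets. Block size and the Jaccard measure are illustrative.
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for fingerprinting

def fingerprints(data: bytes) -> set:
    """Hash every fixed-size block; the set of digests is the file's fingerprint."""
    return {
        hashlib.sha1(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    }

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two fingerprint sets (1.0 = identical block sets)."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0

if __name__ == "__main__":
    x = b"A" * 20_000
    y = b"A" * 16_000 + b"B" * 4_000  # shares most blocks with x
    print(f"similarity: {similarity(x, y):.2f}")
```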
International Parallel and Distributed Processing Symposium | 2013
Abdullah Gharaibeh; Lauro Beltrão Costa; Elizeu Santos-Neto; Matei Ripeanu
Graph processing has gained renewed attention. The increasingly large scale and wealth of connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable information from large-scale graphs. Hybrid systems that host processing units optimized for both fast sequential processing and bulk processing (e.g., GPU-accelerated systems) have the potential to cope with the heterogeneous structure of real graphs and enable high-performance graph processing. Reaching this point, however, poses multiple challenges. The heterogeneity of the processing elements (e.g., GPUs implement a different parallel processing model than CPUs and have much less memory) and the inherent irregularity of graph workloads require careful graph partitioning and load assignment. In particular, the workload generated by a partitioning scheme should match the strength of the processing element the partition is allocated to. This work explores the feasibility of hybrid graph processing and quantifies the performance gains of low-cost partitioning schemes. We propose to partition the workload between the two types of processing elements based on vertex connectivity. We show that such partitioning schemes offer a simple yet efficient way to boost the overall performance of the hybrid system. Our evaluation illustrates that processing a 4-billion-edge graph on a system with one CPU socket and one GPU, while offloading as little as 25% of the edges to the GPU, achieves a 2x performance improvement over state-of-the-art implementations running on a dual-socket symmetric system. Moreover, for the same graph, a hybrid system with dual sockets and dual GPUs is capable of 1.13 billion breadth-first-search traversed edges per second, a performance rate that is competitive with the latest entries in the Graph500 list, yet at a much lower price point.
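A hedged sketch of the connectivity-based partitioning idea follows: order vertices by degree and offload the many low-degree vertices to the GPU until a target fraction of the edges (25% in the configuration the abstract highlights) lands there, leaving the few high-degree vertices to the CPU. The function name and the greedy assignment rule are illustrative, not the paper's implementation.

```python
# Illustrative degree-based partitioning: low-degree vertices (regular,
# fine-grained work) go to the GPU; high-degree vertices stay on the CPU.
def partition_by_degree(adj: dict, gpu_edge_fraction: float = 0.25):
    """adj maps vertex -> list of neighbours. Returns (cpu_vertices, gpu_vertices)."""
    total_edges = sum(len(n) for n in adj.values())
    budget = gpu_edge_fraction * total_edges
    gpu, cpu, placed = set(), set(), 0
    # Lowest-degree vertices first, until the GPU edge budget is exhausted.
    for v in sorted(adj, key=lambda v: len(adj[v])):
        if placed + len(adj[v]) <= budget:
            gpu.add(v)
            placed += len(adj[v])
        else:
            cpu.add(v)
    return cpu, gpu

if __name__ == "__main__":
    graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0], 5: [1]}
    cpu, gpu = partition_by_degree(graph)
    print("CPU:", sorted(cpu), "GPU:", sorted(gpu))
```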
Symposium on Operating Systems Principles | 2010
Samer Al-Kiswany; Abdullah Gharaibeh; Matei Ripeanu
Storage systems in emerging large-scale (a.k.a. peta-scale) computing systems often introduce a performance or scalability bottleneck. To deal with these limitations, we propose a new operational approach: versatile storage, an application-optimized and highly configurable storage system that harnesses node-local resources, is configured and deployed at application deployment time, and has a lifetime tied to the application's lifetime. Our prototype evaluation, using synthetic and application-level benchmarks on a small cluster as well as on a 96K-processor machine, provides evidence that the versatile storage approach can bring valuable benefits to large-scale deployments in terms of storage system performance and scalability.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2010
Abdullah Gharaibeh; Matei Ripeanu
GPUs offer drastically different performance characteristics compared to traditional multicore architectures. To explore the tradeoffs exposed by this difference, we refactor MUMmer, a widely used, highly engineered bioinformatics application that has both CPU- and GPU-based implementations. We synthesize our experience as three high-level guidelines for designing efficient GPU-based applications. First, minimizing communication overheads is as important as optimizing the computation. Second, trading off higher computational complexity for a more compact in-memory representation is a valuable technique to increase overall performance (by enabling higher parallelism levels and reducing transfer overheads). Finally, ensuring that the chosen solution entails low pre- and post-processing overheads is essential to maximize the overall performance gains. Based on these insights, MUMmerGPU++, our GPU-based design of the MUMmer sequence alignment tool, achieves, on realistic workloads, up to a 4x speedup compared to a previous, highly optimized GPU port.
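One concrete instance of the second guideline is replacing a suffix tree with a suffix array: the array is far more compact in memory, at the price of O(m log n) rather than O(m) per query. The naive sketch below is illustrative only; MUMmerGPU++'s actual data structures and GPU kernels are more involved.

```python
# Illustrative compact-representation trade-off: exact pattern search over a
# suffix array via binary search (naive O(n^2 log n) construction for clarity).
import bisect

def suffix_array(text: str) -> list:
    """Indices of all suffixes of `text`, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(text: str, sa: list, pattern: str) -> list:
    """All start positions of `pattern` in `text`, via binary search on the suffix array."""
    suffixes = [text[i:] for i in sa]  # materialized only for clarity
    lo = bisect.bisect_left(suffixes, pattern)
    hi = bisect.bisect_right(suffixes, pattern + "\xff")  # '\xff' sorts after A/C/G/T
    return sorted(sa[lo:hi])

if __name__ == "__main__":
    genome = "ACGTACGGACGT"
    sa = suffix_array(genome)
    print(find(genome, sa, "ACG"))  # [0, 4, 8]
```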
High Performance Distributed Computing | 2009
Abdullah Gharaibeh; Matei Ripeanu
This paper explores the feasibility of a cost-efficient storage architecture that offers the reliability and access performance characteristics of a high-end system. This architecture exploits two opportunities: First, scavenging idle storage from LAN-connected desktops not only offers a low-cost storage space, but also high I/O throughput by aggregating the I/O channels of the participating nodes. Second, the two components of data reliability - durability and availability - can be decoupled to control overall system cost. To capitalize on these opportunities, we integrate two types of components: volatile, scavenged storage and dedicated, yet low-bandwidth, durable storage. On the one hand, the durable storage forms a low-cost back end that enables the system to restore the data the volatile nodes may lose. On the other hand, the volatile nodes provide a high-throughput front end. While integrating these components has the potential to offer a unique combination of high throughput, low cost, and durability, a number of concerns need to be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes) and design choices (e.g., the replica placement scheme) on data availability and the associated system costs (e.g., maintenance traffic). Further, we implement and evaluate a prototype of the proposed architecture: a GridFTP server that aggregates volatile resources. Our evaluation demonstrates an impressive, up to 800 MBps transfer throughput for the new GridFTP service.
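As a back-of-the-envelope illustration of the availability analysis, suppose each of n volatile replicas is independently online with probability p; an object is then readable from the fast front end with probability 1 - (1-p)^n, with the durable back end covering the remaining cases. The independence assumption and the numbers below are illustrative, not the paper's model.

```python
# Toy availability model: analytical formula cross-checked by Monte Carlo.
import random

def analytical(n: int, p: float) -> float:
    """Probability that at least one of n volatile replicas is online."""
    return 1.0 - (1.0 - p) ** n

def simulated(n: int, p: float, trials: int = 100_000) -> float:
    """Monte Carlo estimate of the same quantity."""
    hits = sum(any(random.random() < p for _ in range(n)) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"replicas={n}  analytical={analytical(n, 0.8):.4f}  "
              f"simulated={simulated(n, 0.8):.4f}")
```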
Irregular Applications: Architectures and Algorithms | 2013
Abdullah Gharaibeh; Elizeu Santos-Neto; Lauro Beltrão Costa; Matei Ripeanu
This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph layout to the processing units, such that the algorithmic tasks exercise each of the units where they perform best. However, GPUs have a much higher Thermal Design Power (TDP), so their impact on overall energy consumption is unclear. Our evaluation, using large real-world graphs and synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that a hybrid system is efficient in terms of both time-to-solution and energy.
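The intuition behind the energy result can be made concrete: energy-to-solution is average power times time-to-solution, so a sufficiently large speedup outweighs the GPU's added power draw. All numbers in the sketch below are invented for illustration; the paper's conclusions rest on measurements of real hardware.

```python
# Toy energy-to-solution comparison; all power and time figures are invented.
def energy_J(power_W: float, time_s: float) -> float:
    """Energy consumed = average power x time-to-solution."""
    return power_W * time_s

cpu_only = energy_J(power_W=150, time_s=100)        # assumed CPU-only run
hybrid   = energy_J(power_W=150 + 250, time_s=30)   # assumed GPU adds 250 W but cuts time 3.3x

print(f"CPU-only: {cpu_only:.0f} J   hybrid: {hybrid:.0f} J")
print(f"hybrid uses {hybrid / cpu_only:.0%} of the CPU-only energy")
```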
IEEE Transactions on Parallel and Distributed Systems | 2011
Abdullah Gharaibeh; Samer Al-Kiswany; Matei Ripeanu
This paper explores the feasibility of a storage architecture that offers the reliability and access performance characteristics of a high-end system, yet is cost-efficient. We propose ThriftStore, a storage architecture that integrates two types of components: volatile, aggregated storage and dedicated, yet low-bandwidth, durable storage. On the one hand, the durable storage forms a back end that enables the system to restore the data the volatile nodes may lose. On the other hand, the volatile nodes provide a high-throughput front end. Although integrating these components has the potential to offer a unique combination of high throughput and durability at a low cost, a number of concerns need to be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes) and design choices (e.g., the replica placement scheme) on data availability and the associated system costs (e.g., maintenance traffic). Moreover, to demonstrate the high-throughput properties of the proposed architecture, we prototype a GridFTP server based on ThriftStore. Our evaluation demonstrates an impressive, up to 800 MBps transfer throughput for the new GridFTP service.
Cluster Computing | 2009
Samer Al-Kiswany; Abdullah Gharaibeh; Elizeu Santos-Neto; Matei Ripeanu
Today, Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing these highly parallel devices to support more generic functionality at the operating system or middleware level. This study starts from the hypothesis that generic middleware-level techniques that improve distributed system reliability or performance (such as content addressing, erasure coding, or data similarity detection) can be significantly accelerated using GPU support. We take a first step towards validating this hypothesis and design StoreGPU, a library that accelerates a number of hashing-based middleware primitives popular in distributed storage system implementations. Our evaluation shows that StoreGPU enables up to twenty-five-fold performance gains on synthetic benchmarks as well as on a high-level application: the online similarity detection between large data files.
IEEE Conference on Mass Storage Systems and Technologies | 2014
Abdullah Gharaibeh; Cornel Constantinescu; Maohua Lu; Ramani R. Routray; Anurag Sharma; Prasenjit Sarkar; David Pease; Matei Ripeanu
Deduplication is a commonly used technique in disk-based storage pools. However, deduplication has not been used for tape-based pools: tape characteristics, such as high mount and seek times, combined with the data fragmentation that deduplication introduces, create a toxic combination that leads to unacceptably high retrieval times. This work proposes DedupT, a system that efficiently supports deduplication on tape pools. This paper (i) details the main challenges to enabling efficient deduplication on tape libraries; (ii) presents a class of solutions based on graph modeling of the similarity between data items that enables efficient placement on tapes; and (iii) presents the design and evaluation of novel cross-tape and on-tape chunk placement algorithms that alleviate tape mount time overhead and reduce on-tape data fragmentation. Using 4.5 TB of real-world workloads, we show that DedupT retains at least 95% of the deduplication efficiency. We show that DedupT mitigates major retrieval time overheads and, by reading less data, offers better restore performance than restoring non-deduplicated data.
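To illustrate the graph-modeling idea, the hedged sketch below treats data items as nodes, weights edges by the number of shared chunks, and greedily co-locates the most similar items on the same tape so that deduplicated chunks and the items referencing them stay together. The greedy rule and the capacity model are stand-ins invented for illustration, not DedupT's actual placement algorithms.

```python
# Toy similarity-graph placement: co-locate items that share the most chunks.
from itertools import combinations

def place_on_tapes(item_chunks: dict, tape_capacity: int) -> list:
    """item_chunks maps item -> set of chunk ids. Returns a list of tapes (item lists)."""
    # Edge weight = number of shared chunks between two items.
    edges = sorted(
        ((len(item_chunks[a] & item_chunks[b]), a, b)
         for a, b in combinations(item_chunks, 2)),
        reverse=True,
    )
    tape_of, tapes = {}, []
    for weight, a, b in edges:
        if weight == 0:
            break
        # Co-locate the most similar pairs first, subject to tape capacity.
        if a not in tape_of and b not in tape_of:
            tapes.append([a, b])
            tape_of[a] = tape_of[b] = len(tapes) - 1
        elif a in tape_of and b not in tape_of and len(tapes[tape_of[a]]) < tape_capacity:
            tapes[tape_of[a]].append(b)
            tape_of[b] = tape_of[a]
        elif b in tape_of and a not in tape_of and len(tapes[tape_of[b]]) < tape_capacity:
            tapes[tape_of[b]].append(a)
            tape_of[a] = tape_of[b]
    for item in item_chunks:  # items sharing nothing get their own tape
        if item not in tape_of:
            tapes.append([item])
            tape_of[item] = len(tapes) - 1
    return tapes

if __name__ == "__main__":
    items = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9, 10}, "d": {10, 11}}
    print(place_on_tapes(items, tape_capacity=3))  # [['a', 'b'], ['c', 'd']]
```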
Workshop on Storage Security and Survivability | 2008
Abdullah Gharaibeh; Samer Al-Kiswany; Matei Ripeanu
Scavenged storage systems harness unused disk space from individual workstations, the same way idle CPU cycles are harnessed by desktop grid applications like SETI@home. These systems provide a promising low-cost, high-performance storage solution in certain high-end computing scenarios. However, selecting the security level and designing the security mechanisms for such systems is challenging, as scavenging idle storage opens the door to security threats absent in traditional storage systems that use dedicated nodes under a single administrative domain. Moreover, increased security often comes at the price of performance and scalability. This paper develops a general threat model for systems that use scavenged storage, presents the design of a protocol that addresses these threats and is optimized for throughput, and evaluates the overheads brought by the new security protocol when configured to provide a number of different security properties.
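One security property such a protocol can provide is end-to-end integrity when blocks live on untrusted, scavenged nodes. The sketch below is a minimal illustration of that single property, assuming the client (or a trusted manager) holds a secret key and verifies a per-block HMAC on every read; it is not the paper's protocol, which covers further properties and is engineered for throughput.

```python
# Minimal per-block integrity check for data stored on untrusted donor nodes.
import hmac, hashlib, os

KEY = os.urandom(32)  # held by the client / trusted manager, never by donors

def seal(block_id: str, data: bytes) -> bytes:
    """Tag binds the data to its block id, preventing block-swapping attacks."""
    return hmac.new(KEY, block_id.encode() + b"\x00" + data, hashlib.sha256).digest()

def verify(block_id: str, data: bytes, tag: bytes) -> bool:
    """Constant-time check that the stored block is intact and correctly labeled."""
    return hmac.compare_digest(seal(block_id, data), tag)

if __name__ == "__main__":
    stored = {"blk-7": (b"payload", seal("blk-7", b"payload"))}  # on the untrusted node
    data, tag = stored["blk-7"]
    print(verify("blk-7", data, tag))         # True
    print(verify("blk-7", b"tampered", tag))  # False
```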