Mohammad Hammoud
Carnegie Mellon University
Publication
Featured research published by Mohammad Hammoud.
IEEE International Conference on Cloud Computing Technology and Science | 2011
Mohammad Hammoud; Majd F. Sakr
MapReduce offers a promising programming model for big data processing. Inspired by functional languages, MapReduce allows programmers to write functional-style code which gets automatically divided into multiple map and/or reduce tasks and scheduled over distributed data across multiple machines. Hadoop, an open source implementation of MapReduce, schedules map tasks in the vicinity of their inputs in order to diminish network traffic and improve performance. However, Hadoop schedules reduce tasks at requesting nodes without considering data locality, leading to performance degradation. This paper describes Locality-Aware Reduce Task Scheduler (LARTS), a practical strategy for improving MapReduce performance. LARTS attempts to collocate reduce tasks with the maximum required data computed after recognizing input data network locations and sizes. LARTS adopts a cooperative paradigm seeking good data locality while circumventing scheduling delay, scheduling skew, poor system utilization, and low degree of parallelism. We implemented LARTS in Hadoop-0.20.2. Evaluation results show that LARTS outperforms the native Hadoop reduce task scheduler by an average of 7%, and up to 11.6%.
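The collocation idea in this abstract can be illustrated with a minimal Python sketch. It is not the LARTS implementation: the function name and the input dictionary (bytes of a reducer's input held on each node) are hypothetical, and the cooperative handling of scheduling delay and skew that the abstract mentions is left out.

```python
def pick_node_with_most_data(partition_bytes_by_node):
    """Return the node that already holds the largest share of a reducer's input.

    partition_bytes_by_node is a hypothetical mapping: node name -> bytes of
    this reducer's input partition stored on that node.
    """
    if not partition_bytes_by_node:
        raise ValueError("no partition data reported")
    # Iterate over sorted names so that ties are broken deterministically
    # (the alphabetically first node among the largest wins).
    return max(sorted(partition_bytes_by_node),
               key=lambda node: partition_bytes_by_node[node])


if __name__ == "__main__":
    sizes = {"node-a": 10, "node-b": 30, "node-c": 30}
    print(pick_node_with_most_data(sizes))
```

Placing the reduce task on the returned node means the largest single share of its input never crosses the network.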
International Conference on Cloud Computing | 2012
Mohammad Hammoud; M. Suhail Rehman; Majd F. Sakr
MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. MapReduce automatically parallelizes computation by running multiple map and/or reduce tasks over distributed data across multiple machines. Hadoop is an open source implementation of MapReduce. When Hadoop schedules reduce tasks, it neither exploits data locality nor addresses partitioning skew present in some MapReduce applications. This might lead to increased cluster network traffic. In this paper we investigate the problems of data locality and partitioning skew in Hadoop. We propose Center-of-Gravity Reduce Scheduler (CoGRS), a locality-aware skew-aware reduce task scheduler for saving MapReduce network traffic. In an attempt to exploit data locality, CoGRS schedules each reduce task at its center-of-gravity node, which is computed after considering partitioning skew as well. We implemented CoGRS in Hadoop-0.20.2 and tested it on a private cloud as well as on Amazon EC2. As compared to native Hadoop, our results show that CoGRS minimizes off-rack network traffic by averages of 9.6% and 38.6% on our private cloud and on an Amazon EC2 cluster, respectively. This reflects on job execution times and provides an improvement of up to 23.8%.
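One plausible reading of a "center-of-gravity node" is the node that minimizes the size-weighted network distance to all of a reducer's input partitions. The Python sketch below follows that reading and is an assumption, not the paper's definition. The distance values (0 for the same node, 2 for the same rack, 4 across racks) and all names are illustrative.

```python
def network_distance(a, b, rack_of):
    """Assumed distance model: 0 same node, 2 same rack, 4 off-rack."""
    if a == b:
        return 0
    if rack_of[a] == rack_of[b]:
        return 2
    return 4


def center_of_gravity(partition_bytes_by_node, rack_of):
    """Return the node with the smallest size-weighted distance to the input.

    Larger partitions pull the result toward their node, so skewed
    partitions are accounted for in this sketch.
    """
    def cost(candidate):
        return sum(size * network_distance(candidate, src, rack_of)
                   for src, size in partition_bytes_by_node.items())

    # Sorted candidates give deterministic tie-breaking.
    return min(sorted(rack_of), key=cost)


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
    print(center_of_gravity({"n1": 100, "n2": 10, "n3": 50}, racks))
    print(center_of_gravity({"n1": 10, "n2": 10, "n3": 500}, racks))
```

In the second call, the one large partition on n3 outweighs the two small ones on the other rack, so n3 is chosen.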
Very Large Data Bases | 2015
Mohammad Hammoud; Dania Abed Rabbou; Reza Nouri; Seyed-Mehdi-Reza Beheshti; Sherif Sakr
The Resource Description Framework (RDF) and SPARQL query language are gaining wide popularity and acceptance. In this paper, we present DREAM, a distributed and adaptive RDF system. As opposed to existing RDF systems, DREAM avoids partitioning RDF datasets and partitions only SPARQL queries. By not partitioning datasets, DREAM offers a general paradigm for different types of pattern matching queries, and entirely averts intermediate data shuffling (only auxiliary data are shuffled). Besides, by partitioning queries, DREAM presents an adaptive scheme, which automatically runs queries on various numbers of machines depending on their complexities. Hence, in essence DREAM combines the advantages of the state-of-the-art centralized and distributed RDF systems, whereby data communication is avoided and cluster resources are aggregated. Likewise, it precludes their disadvantages, wherein system resources are limited and communication overhead is typically hindering. DREAM achieves all its goals via employing a novel graph-based, rule-oriented query planner and a new cost model. We implemented DREAM and conducted comprehensive experiments on a private cluster and on the Amazon EC2 platform. Results show that DREAM can significantly outperform three related popular RDF systems.
High Performance Embedded Architectures and Compilers | 2008
Mohammad Hammoud; Sangyeun Cho; Rami G. Melhem
This paper proposes and studies a hardware-based adaptive controlled migration strategy for managing distributed L2 caches in chip multiprocessors. Building on an area-efficient shared cache design, the proposed scheme dynamically migrates cache blocks to cache banks that best minimize the average L2 access latency. Cache blocks are continuously monitored and the locations of the optimal corresponding cache banks are predicted to effectively alleviate the impact of non-uniform cache access latency. By adopting migration alone without replication, the exclusiveness of cache blocks is maintained, thus further optimizing the cache miss rate. Simulation results using a full system simulator demonstrate that the proposed controlled migration scheme outperforms the shared caching strategy and compares favorably with previously proposed replication schemes.
International Symposium on Performance Analysis of Systems and Software | 2007
Sangyeun Cho; Joel R. Martin; Ruibin Xu; Mohammad Hammoud; Rami G. Melhem
This paper proposes a specialized memory structure called CA-RAM (content addressable random access memory) to accelerate search operations present in many important real-world applications. Search operations can occupy a significant portion of total execution time and energy consumption, while posing a difficult performance problem to tackle using traditional memory hierarchy concepts. In essence, CA-RAM is a direct hardware implementation of the well-known hashing technique. Searchable records are stored in CA-RAM at a location determined by a hash function, defined on their search key. After a database has been built, looking up a record in CA-RAM typically involves a single memory access followed by a parallel key matching operation. Compared with a conventional CAM (content addressable memory) solution, CA-RAM capitalizes on dense SRAM and DRAM designs, and achieves comparable search performance while occupying much smaller area and consuming significantly less power. This paper presents detailed design aspects of CA-RAM, to be integrated in future general-purpose and application-specific processors and systems. To further motivate and justify our approach, we present two real examples of using CA-RAM to build a high-performance search accelerator targeting IP address lookup in core routers and trigram lookup in a large speech recognition system.
International Conference on Supercomputing | 2009
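The lookup path described here (hash the key to one location, fetch it in a single access, then match keys across the fetched slots) can be mimicked in software. The Python sketch below is only an analogy: the class name, the row and slot sizes, and the choice to raise on a full row are assumptions. The hardware compares all slots in parallel, where this code loops over them.

```python
class CaRamSketch:
    """Software analogy of hash-indexed rows with per-row key matching."""

    def __init__(self, num_rows, slots_per_row):
        self.num_rows = num_rows
        self.slots_per_row = slots_per_row
        self.rows = [[] for _ in range(num_rows)]

    def _row_index(self, key):
        # The hash of the search key selects exactly one row.
        return hash(key) % self.num_rows

    def insert(self, key, value):
        row = self.rows[self._row_index(key)]
        for i, (stored_key, _) in enumerate(row):
            if stored_key == key:
                row[i] = (key, value)  # overwrite an existing record
                return
        if len(row) >= self.slots_per_row:
            # Overflow handling is out of scope for this sketch.
            raise OverflowError("row is full")
        row.append((key, value))

    def lookup(self, key):
        # One "memory access": fetch the whole row.
        row = self.rows[self._row_index(key)]
        # Key matching over every slot (parallel in hardware, a loop here).
        for stored_key, value in row:
            if stored_key == key:
                return value
        return None


if __name__ == "__main__":
    table = CaRamSketch(num_rows=4, slots_per_row=2)
    table.insert(1, "a")
    table.insert(5, "b")
    print(table.lookup(1), table.lookup(5), table.lookup(9))
```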
Mohammad Hammoud; Sangyeun Cho; Rami G. Melhem
This paper proposes DCC (Dynamic Cache Clustering), a novel distributed cache management scheme for large-scale chip multiprocessors. Using DCC, a per-core cache cluster comprises a number of L2 cache banks, and cache clusters are constructed, expanded, and contracted dynamically to match each core's cache demand. The basic trade-offs of varying the on-chip cache clusters are average L2 access latency and L2 miss rate. DCC uniquely and efficiently optimizes both metrics and continuously tracks a near-optimal cache organization from many possible configurations. Simulation results using a full-system simulator demonstrate that DCC outperforms alternative L2 cache designs.
IEEE Computer Architecture Letters | 2010
Mohammad Hammoud; Sangyeun Cho; Rami G. Melhem
This paper describes dynamic pressure-aware associative placement (DPAP), a novel distributed cache management scheme for large-scale chip multiprocessors. Our work is motivated by the large non-uniform distribution of memory accesses across cache sets in different L2 banks. DPAP decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of local cache sets) granularity, and periodically recorded at the memory controller(s) to guide the placement process. An incoming block is consequently placed at a cache group that exhibits the minimum pressure. Simulation results using a full-system simulator demonstrate that DPAP outperforms the baseline shared NUCA scheme by an average of 8.3% and by as much as 18.9% for the benchmark programs we examined. Furthermore, evaluations showed that DPAP outperforms related cache designs.
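The placement rule in this abstract (send an incoming block to the group with the minimum recorded pressure) reduces to an argmin over per-group counters. The Python sketch below shows only that rule. Modeling pressure as a plain event counter, the class and method names, and the lowest-index tie-break are all assumptions rather than details from the paper.

```python
class PressureTable:
    """Hypothetical per-group pressure counters with min-pressure placement."""

    def __init__(self, num_groups):
        self.pressure = [0] * num_groups

    def record(self, group, amount=1):
        # Accumulate pressure observed at a cache group (assumed: a counter).
        self.pressure[group] += amount

    def pick_group(self):
        # Choose the group with minimum pressure; ties go to the lowest index.
        return min(range(len(self.pressure)), key=self.pressure.__getitem__)

    def reset(self):
        # Start a new collection period.
        self.pressure = [0] * len(self.pressure)


if __name__ == "__main__":
    table = PressureTable(num_groups=3)
    table.record(0, 5)
    table.record(1, 2)
    table.record(2, 7)
    print(table.pick_group())
```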
IEEE Transactions on Learning Technologies | 2015
Khaled Salah; Mohammad Hammoud; Sherali Zeadally
Cloud computing platforms can be highly attractive to conduct course assignments and empower students with valuable and indispensable hands-on experience. In particular, the cloud can offer teaching staff and students (whether local or remote) on-demand, elastic, dedicated, isolated, (virtually) unlimited, and easily configurable virtual machines. As such, employing cloud-based laboratories can have clear advantages over using classical ones, which impose major hindrances against fulfilling pedagogical objectives and do not scale well when the number of students and distant university campuses grows. We show how the cloud paradigm can be leveraged to teach a cybersecurity course. Specifically, we share our experience when using cloud computing to teach a senior course on cybersecurity across two campuses via a virtual classroom equipped with live audio and video. Furthermore, based on this teaching experience, we propose guidelines that can be applied to teach similar computer science and engineering courses. We demonstrate how cloud-based laboratory exercises can greatly help students in acquiring crucial cybersecurity skills as well as cloud computing ones, which are in high demand nowadays. The cloud we used for this course was the Amazon Web Services (AWS) public cloud. However, our presented use cases and approaches are equally applicable to other available cloud platforms such as Rackspace and Google Compute Engine, among others.
International World Wide Web Conferences | 2016
Aisha Hasan; Mohammad Hammoud; Reza Nouri; Sherif Sakr
RDF and the SPARQL query language are gaining wide popularity and acceptance. This demonstration paper presents DREAM, a hybrid RDF system, which combines the advantages and averts the disadvantages of the centralized and distributed RDF schemes. In particular, DREAM avoids partitioning RDF datasets and instead partitions SPARQL queries. By not partitioning datasets, DREAM offers a general paradigm for different types of pattern matching queries and entirely precludes intermediate data shuffling (only auxiliary data are shuffled). By partitioning only queries, DREAM suggests an adaptive scheme, which runs queries on different numbers of machines depending on their complexities. DREAM achieves these goals and significantly outperforms related systems via employing a novel graph-based, rule-oriented query planner and a new cost model. This paper proposes demonstrating DREAM live over the cloud using a friendly graphical user interface (GUI). The GUI allows participants to execute and visualize pre-defined and user-defined (which can be written by participants on-the-fly) SPARQL queries over various real-world and synthetic RDF datasets. Furthermore, participants can empirically compare and contrast DREAM against three state-of-the-art RDF systems.
International Conference on Cloud Computing | 2013
Mohammad Hammoud; Majd F. Sakr
MapReduce is now a pervasive analytics engine on the cloud. Hadoop is an open source implementation of MapReduce and is currently enjoying wide popularity. Hadoop offers a high-dimensional space of configuration parameters, which makes it difficult for practitioners to configure for efficient and cost-effective execution. In this work we observe that MapReduce application performance is highly influenced by map concurrency. Map concurrency is defined in terms of two configurable parameters, the number of available map slots and the number of map tasks running over the slots. We show that some inherent MapReduce characteristics enable well-informed prediction of map concurrency. We propose Map Concurrency Characterization (MC2), a standalone utility program that can predict the best map concurrency for any given MapReduce application. By leveraging the generated predicted information, MC2 can judiciously guide Map phase configuration and, consequently, improve Hadoop performance. Unlike many related schemes, MC2 does not employ simulation, dynamic instrumentation, and/or static analysis of unmodified job code to predict map concurrency. In contrast, MC2 utilizes a simple, yet effective mathematical model, which exploits the MapReduce characteristics that impact map concurrency. We implemented MC2 and conducted comprehensive experiments on a private cloud and on Amazon EC2 using Hadoop 0.20.2. Our results show that MC2 can correctly predict the best map concurrencies for the tested benchmarks and provide up to 2.2X speedup in runtime.