Mazen Kharbutli
Jordan University of Science and Technology
Publication
Featured research published by Mazen Kharbutli.
high-performance computer architecture | 2004
Mazen Kharbutli; Keith Irwin; Yan Solihin; Jaejin Lee
Using alternative cache indexing/hashing functions is a popular technique to reduce conflict misses by achieving a more uniform cache access distribution across the sets in the cache. Although various alternative hashing functions have been demonstrated to eliminate the worst-case conflict behavior, no study has really analyzed the pathological behavior of such hashing functions that often results in performance slowdown. We present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions, prime modulo and prime displacement, that are resistant to pathological behavior and yet are able to eliminate the worst-case conflict behavior in the L2 cache. We show that these two schemes can be implemented in fast hardware using a set of narrow add operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory-intensive applications. For applications that have nonuniform cache accesses, both prime modulo and prime displacement hashing achieve an average speedup of 1.27 compared to traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate using multiple prime displacement hashing functions in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup at the cost of some pathological behavior that slows down four applications by up to 7%.
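The effect of prime modulo indexing on a strided access pattern can be illustrated with a small sketch. This is not the paper's hardware design: it assumes that prime modulo indexing means taking the block address modulo the largest prime not exceeding the number of physical sets, and the function names, set count, and stride below are hypothetical values chosen for illustration.

```python
def largest_prime_at_most(n: int) -> int:
    """Return the largest prime <= n (trial division; fine for small n)."""
    for candidate in range(n, 1, -1):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            return candidate
    raise ValueError("no prime <= n")


def traditional_index(block_addr: int, num_sets: int) -> int:
    # Conventional indexing: with a power-of-two set count this keeps
    # only the low-order bits of the block address.
    return block_addr % num_sets


def prime_modulo_index(block_addr: int, num_sets: int) -> int:
    # Assumed form of prime modulo indexing: use a prime number of sets,
    # leaving the few sets above the prime unused (the fragmentation).
    return block_addr % largest_prime_at_most(num_sets)


def sets_touched(index_fn, num_sets: int, stride: int, accesses: int) -> int:
    """Count the distinct sets hit by a strided sequence of block addresses."""
    return len({index_fn(i * stride, num_sets) for i in range(accesses)})


NUM_SETS = 2048   # hypothetical number of physical sets
STRIDE = 512      # hypothetical power-of-two stride, in blocks
ACCESSES = 1024

trad = sets_touched(traditional_index, NUM_SETS, STRIDE, ACCESSES)
prime = sets_touched(prime_modulo_index, NUM_SETS, STRIDE, ACCESSES)
unused_sets = NUM_SETS - largest_prime_at_most(NUM_SETS)
print(trad, prime, unused_sets)
```

With these values the traditional index maps all 1024 accesses onto 4 sets, while the prime modulus spreads them over 1024 distinct sets at the price of 9 unused sets out of 2048.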
IBM Journal of Research and Development | 2006
Rithin Shetty; Mazen Kharbutli; Yan Solihin; Milos Prvulovic
The ability to detect and pinpoint memory-related bugs in production runs is important because in-house testing may miss bugs. This paper presents HeapMon, a heap memory bug-detection scheme that has a very low performance overhead, is automatic, and is easy to deploy. HeapMon relies on two new techniques. First, it decouples application execution from bug monitoring, which executes as a helper thread on a separate core in a chip multiprocessor system. Second, it associates a filter bit with each cached word to safely and significantly reduce bug checking frequency--by 95% on average. We test the effectiveness of these techniques using existing and injected memory bugs in SPEC®2000 applications and show that HeapMon effectively detects and identifies most forms of heap memory bugs. Our results also indicate that the HeapMon performance overhead is only 5%, on average--orders of magnitude less than existing tools. Its storage overhead is also modest: 3.1% of the cache size and a 32-KB victim cache for on-chip filter bits and 6.2% of the allocated heap memory size for state bits, which are maintained by the helper thread as a software data structure.
Simulation Modelling Practice and Theory | 2014
Yaser Jararweh; Moath Jarrah; Mazen Kharbutli; Zakarea Alshara; Mohammed Noraden Alsaleh; Mahmoud Al-Ayyoub
Cloud computing is an emerging and fast-growing computing paradigm that has gained great interest from both industry and academia. Consequently, many researchers are actively involved in cloud computing research projects. One major challenge facing cloud computing researchers is the lack of a comprehensive cloud computing experimental tool to use in their studies. This paper introduces CloudExp, a modeling and simulation environment for cloud computing. CloudExp can be used to evaluate a wide spectrum of cloud components such as processing elements, data centers, storage, networking, Service Level Agreement (SLA) constraints, web-based applications, Service Oriented Architecture (SOA), virtualization, management and automation, and Business Process Management (BPM). Moreover, CloudExp introduces the Rain workload generator, which emulates real workloads in cloud environments. The MapReduce processing model is also integrated into CloudExp in order to handle the processing of big data problems.
ieee international conference on cloud computing technology and science | 2013
Yaser Jararweh; Zakarea Alshara; Moath Jarrah; Mazen Kharbutli; Mohammad Noraden Alsaleh
Cloud computing is an evolving and fast-growing computing paradigm that has gained great interest from both industry and academia. Consequently, universities are actively integrating cloud computing into their IT curricula. One major challenge facing cloud computing instructors is the lack of a teaching tool to experiment with. This paper introduces TeachCloud, a modelling and simulation environment for cloud computing. TeachCloud can be used to experiment with different cloud components such as processing elements, data centres, storage, networking, service level agreement (SLA) constraints, web-based applications, service oriented architecture (SOA), virtualisation, management and automation, and business process management (BPM). TeachCloud also introduces the MapReduce processing model in order to handle embarrassingly parallel data processing problems. TeachCloud is an extension of CloudSim, a research-oriented simulator used for development and validation in cloud computing.
international conference on computer design | 2010
Rami Sheikh; Mazen Kharbutli
Due to the ever-increasing performance gap between the processor and the main memory, it becomes crucial to bridge that gap by designing an efficient memory hierarchy capable of reducing the average memory access time. The cache replacement algorithm plays a central role in designing an efficient memory hierarchy. Many of the recent studies in cache replacement algorithms have focused on improving L2 cache replacement algorithms by minimizing the miss count. However, depending on the dependency chain, cache miss bursts, and other factors, a processor's ability to partially hide the cost of an L2 cache miss varies; that is, cache miss costs are not uniform. Therefore, a better solution would also account for the aggregate miss cost in designing cache replacement algorithms. Our proposed solution combines the two principles of locality and cost-sensitivity into one algorithm, which we call LACS: Locality-Aware Cost-Sensitive cache replacement algorithm. LACS estimates a cache block's cost from the number of instructions the processor manages to issue during a cache miss on that block and then victimizes cache blocks with low cost and poor locality in order to maximize the overall cache performance. When LACS is evaluated using a uniprocessor architecture model, it speeds up 10 L2 cache performance-constrained SPEC CPU2000 benchmarks by up to 85% and by 15% on average while not slowing down any of the 20 SPEC CPU2000 benchmarks evaluated. When evaluated using a dual-core CMP architecture model, LACS speeds up 6 SPEC CPU2000 benchmark pairs by up to 44% and by 11% on average.
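The idea of victimizing blocks that are both cheap to miss on and unlikely to be reused can be sketched as follows. This is only an illustration of the principle stated in the abstract, not the LACS algorithm itself: the `Block` fields, the single `reused` bit as a locality signal, the fixed threshold, and the LRU fallback are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    issued_during_miss: int  # instructions issued while this block's miss was outstanding
    reused: bool             # hypothetical locality signal: hit at least once since fill
    last_access: int         # timestamp, used to break ties and for the LRU fallback


def choose_victim(cache_set: list[Block], low_cost_threshold: int) -> Block:
    """Prefer a low-cost, not-reused block; otherwise fall back to plain LRU.

    A block counts as low cost when the processor issued many instructions
    during its miss, i.e. most of the miss latency was hidden.
    """
    candidates = [
        b for b in cache_set
        if b.issued_during_miss >= low_cost_threshold and not b.reused
    ]
    pool = candidates or cache_set
    return min(pool, key=lambda b: b.last_access)


example_set = [
    Block(tag=0xA, issued_during_miss=5, reused=False, last_access=1),
    Block(tag=0xB, issued_during_miss=120, reused=False, last_access=3),
    Block(tag=0xC, issued_during_miss=200, reused=True, last_access=2),
    Block(tag=0xD, issued_during_miss=90, reused=False, last_access=4),
]
victim = choose_victim(example_set, low_cost_threshold=64)
print(hex(victim.tag))
```

In this example plain LRU would evict block 0xA, whose miss stalled the processor almost completely, whereas the cost-aware choice evicts block 0xB, a low-cost block that has shown no reuse.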
IEEE Transactions on Computers | 2005
Mazen Kharbutli; Yan Solihin; Jaejin Lee
Using alternative cache indexing/hashing functions is a popular technique to reduce conflict misses by achieving a more uniform cache access distribution across the sets in the cache. Although various alternative hashing functions have been demonstrated to eliminate the worst-case conflict behavior, no study has really analyzed the pathological behavior of such hashing functions that often results in performance slowdown. We present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions, prime modulo and odd-multiplier displacement, that are resistant to pathological behavior and yet are able to eliminate the worst-case conflict behavior in the L2 cache. We show that these two schemes can be implemented in fast hardware using a set of narrow addition operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory-intensive applications. For applications that have nonuniform cache accesses, both prime modulo and odd-multiplier displacement hashing achieve an average speedup of 1.27 compared to traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate using the odd-multiplier displacement function with multiple multipliers in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup at the cost of some pathological behavior that slows down four applications by up to 7 percent.
IEEE Transactions on Computers | 2014
Mazen Kharbutli; Rami Sheikh
The design of an effective last-level cache (LLC) in general, and of an effective cache replacement/partitioning algorithm in particular, is critical to the overall system performance. The processor's ability to hide the LLC miss penalty differs widely from one miss to another. The more instructions the processor manages to issue during the miss, the better it is capable of hiding the miss penalty and the lower the cost of that miss. This nonuniformity in the processor's ability to hide LLC miss latencies, and the resultant nonuniformity in the performance impact of LLC misses, opens up an opportunity for a new cost-sensitive cache replacement algorithm. This paper makes two key contributions. First, it proposes a framework for estimating the costs of cache blocks at run-time based on the processor's ability to (partially) hide their miss latencies. Second, it proposes a simple, low-hardware-overhead, yet effective cache replacement algorithm that is locality-aware and cost-sensitive (LACS). LACS is thoroughly evaluated using a detailed simulation environment. LACS speeds up 12 LLC-performance-constrained SPEC CPU2006 benchmarks by up to 51% and by 11% on average. When evaluated using a dual/quad-core CMP with a shared LLC, LACS significantly outperforms LRU in terms of performance and fairness, achieving improvements of up to 54%.
Network Protocols and Algorithms | 2012
Mazen Kharbutli; Monther Aldwairi; Abdullah Mughrabi
The safeguarding of networks from malicious activities and intrusions continues to be one of the most important aspects of network security. Intrusion Detection Systems (IDSs) play a fundamental role in network protection. Unfortunately, existing IDSs are unable to keep up with the rapid increases in network speeds and attack complexities. Fortunately, parallel computing on multi-core systems can help mitigate this performance gap. In this paper, novel and effective parallel implementations of the Wu-Manber (WM) algorithm for signature-based detection systems are proposed, implemented, and evaluated. The proposed function-parallel and data-parallel algorithms prove to be effective in terms of execution time reduction and load balancing, thus providing swift intrusion detection at increased network bandwidths. The algorithms achieve an optimal load balance and an average speedup of 2x for four cores.
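The data-parallel approach can be sketched by splitting the input into chunks and scanning each chunk on a separate worker. The sketch below is not the paper's implementation: it substitutes Python's built-in `bytes.find` for the Wu-Manber matcher, and the function names, chunking scheme, and sample payload are assumptions made for illustration. Each worker reads slightly past its chunk boundary so that a signature straddling two chunks is still found, and reports only matches that begin inside its own chunk so that none is counted twice.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_chunk(payload: bytes, start: int, end: int,
                  patterns: list[bytes]) -> set[tuple[int, bytes]]:
    """Return (offset, pattern) for every match that begins in [start, end)."""
    hits = set()
    for pat in patterns:
        # Allow the match to run up to len(pat) - 1 bytes past the chunk end.
        limit = min(len(payload), end + len(pat) - 1)
        pos = payload.find(pat, start, limit)
        while pos != -1:
            hits.add((pos, pat))
            pos = payload.find(pat, pos + 1, limit)
    return hits


def parallel_scan(payload: bytes, patterns: list[bytes],
                  workers: int = 4) -> list[tuple[int, bytes]]:
    """Split the payload into equal chunks and scan them concurrently."""
    chunk = max(1, -(-len(payload) // workers))  # ceiling division
    bounds = [(s, min(s + chunk, len(payload)))
              for s in range(0, len(payload), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda b: find_in_chunk(payload, b[0], b[1], patterns), bounds)
        return sorted(set().union(*results))


payload = b"GET /etc/passwd HTTP/1.1 cmd.exe /etc/passwd"
signatures = [b"/etc/passwd", b"cmd.exe"]
matches = parallel_scan(payload, signatures, workers=4)
print(matches)
```

Threads are used here only to show the decomposition. In CPython the global interpreter lock limits the speedup of CPU-bound threads, so a process pool or a native implementation would be needed to observe gains comparable to those reported for multi-core systems.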
ieee jordan conference on applied electrical engineering and computing technologies | 2013
Mazen Kharbutli; Moath Jarrah; Yaser Jararweh
The design of an effective last-level cache (LLC) is crucial to the overall processor performance and, consequently, continues to be the center of substantial research. Unfortunately, LLCs in modern high-performance processors are not used efficiently. One major problem suffered by LLCs is their low hit rates caused by the large fraction of cache blocks that do not get re-accessed after being brought into the LLC following a cache miss. These blocks do not contribute any cache hits and usually induce cache pollution and thrashing. Cache bypassing presents an effective solution to this problem. Cache blocks that are predicted not to be accessed while residing in the cache are not inserted into the LLC following a miss; instead, they bypass the LLC and are only inserted in the higher cache levels. This paper presents a simple, low-hardware-overhead, yet effective cache bypassing algorithm that dynamically chooses, based on past access/bypass patterns, which blocks to insert into the LLC and which to bypass following a miss. Our proposed algorithm is thoroughly evaluated using a detailed simulation environment where its effectiveness, performance-improvement capabilities, and robustness are demonstrated. Moreover, it is shown to outperform the state-of-the-art cache bypassing algorithm in both uniprocessor and multi-core processor settings.
Procedia Computer Science | 2013
Sahel Alouneh; Mazen Kharbutli; Rana AlQurem
With software systems continuously growing in size and complexity, the number and variety of security vulnerabilities in those systems are increasing at an alarming rate. Vulnerabilities in the program's stack are commonly exploited by attackers in the form of stack-based attacks. In this paper, a software-based solution for stack-based vulnerabilities and attacks is proposed and implemented. The proposed solution involves creating a new patch tool that fixes a wide range of stack-related vulnerabilities in existing applications. The basic idea of our approach is to implement a patch tool that makes multiple copies of the return addresses in the stack, and then randomizes the location of all copies in addition to their number. All duplicate copies are updated and checked in parallel such that any mismatch between any of these copies would indicate a possible attack attempt and would trigger an exception. The results of our implementation show high protection against integer overflow and buffer overflow attacks.