Mridha Mohammad Waliullah
Chalmers University of Technology
Publications
Featured research published by Mridha Mohammad Waliullah.
International Parallel and Distributed Processing Symposium | 2008
Mridha Mohammad Waliullah; Per Stenström
Transactional memory systems promise to reduce the burden of exposing thread-level parallelism in programs by relieving programmers from analyzing complex inter-thread dependences in detail. By encapsulating large program code blocks and executing them as atomic blocks, dependence checking is deferred to run-time, at which point one of the conflicting transactions is committed whereas the others have to roll back and re-execute. In current proposals, a checkpoint is taken at the beginning of the atomic block, and all execution can be wasted even if the conflicting access happens at the end of the atomic block. In this paper, we propose a novel scheme that (1) predicts when the first conflicting access occurs and (2) inserts a checkpoint before it is executed. When the prediction is correct, the only execution discarded is the one that has to be re-done. When the prediction is incorrect, the whole transaction has to be re-executed just as before. Overall, we find that our scheme maintains high prediction accuracy and significantly reduces the number of cycles lost to roll-backs; the geometric mean speedup across five applications is 16%.
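A rough software-level illustration of the idea (hypothetical names and structures, not the authors' hardware design): a small predictor remembers which address caused a transaction's previous abort and requests an intermediate checkpoint right before that address is touched again, so a repeated conflict discards only the work done after that point.

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative sketch only: a per-transaction predictor that remembers which
// address caused the previous abort and requests an intermediate checkpoint
// right before that address is accessed again. All names are hypothetical.
struct ConflictPredictor {
    // static transaction id -> address predicted to conflict first
    std::unordered_map<uint64_t, uint64_t> predictedAddr;

    // Called when a transaction aborts; 'addr' is its first conflicting access.
    void recordAbort(uint64_t txnId, uint64_t addr) {
        predictedAddr[txnId] = addr;
    }

    // Called before every transactional access. Returns true if an
    // intermediate checkpoint should be taken before issuing this access.
    bool shouldCheckpoint(uint64_t txnId, uint64_t addr) const {
        auto it = predictedAddr.find(txnId);
        return it != predictedAddr.end() && it->second == addr;
    }
};

// On a later conflict at the predicted address, execution rolls back only to
// the intermediate checkpoint; a mispredicted conflict still rolls back to the
// start of the transaction, exactly as in the baseline scheme.
```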
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2010
Anurag Negi; Mridha Mohammad Waliullah; Per Stenström
Transactional memory (TM) promises to unlock parallelism in software in a safer and easier way than lock-based approaches, but the path to deployment is unclear for several reasons. First of all, since TM has not been deployed in any machine yet, experience of using it is limited. While software transactional memory implementations exist, they are too slow to provide useful experience. Existing hardware transactional memory implementations, on the other hand, can provide the efficiency required, but they demand a significant effort to integrate into cache coherence infrastructures or freeze critical policy parameters. This paper proposes the LV* (lazy versioning and eager/lazy conflict resolution) class of hardware transactional memory protocols. This class of protocols has been designed with ease of deployment in mind. LV* can be integrated with low additional complexity into standard snoopy-cache MESI protocols and can be accommodated in a directory-based cache coherence infrastructure. Since the optimal conflict resolution policy (lazy or eager) depends on the transactional characteristics of workloads, LV* supports a set of conflict resolution policies that range from LazEr, a family of Lazy versioning Eager conflict resolution protocols, to LL-MESI, which provides lazy resolution. We show that LV* can be hosted in a MESI protocol through straightforward extensions and that the flexibility in the choice of conflict resolution strategy has a significant impact on performance.
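As a rough software analogy only (the actual LV* protocols are cache-coherence extensions in hardware, and the names below are hypothetical), the eager/lazy choice comes down to when a detected conflict forces an abort: immediately when the conflicting access is seen, or only when the conflicting writer commits.

```cpp
#include <cstdio>

// Illustrative sketch of the policy space only; the real LV* protocols are
// hardware extensions to MESI, not software objects. Names are hypothetical.
enum class ConflictPolicy {
    Eager,  // LazEr-style: resolve as soon as the conflict is detected
    Lazy    // LL-MESI-style: resolve only at the writer's commit
};

// Decide whether a reader that conflicts with an uncommitted writer must
// abort now, or whether the decision is deferred until the writer commits.
bool abortReaderNow(ConflictPolicy policy, bool writerIsCommitting) {
    switch (policy) {
        case ConflictPolicy::Eager: return true;               // resolve immediately
        case ConflictPolicy::Lazy:  return writerIsCommitting;  // defer to commit time
    }
    return false;
}

int main() {
    std::printf("eager, writer not committing: abort=%d\n",
                abortReaderNow(ConflictPolicy::Eager, false));
    std::printf("lazy,  writer not committing: abort=%d\n",
                abortReaderNow(ConflictPolicy::Lazy, false));
}
```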
Symposium on Computer Architecture and High Performance Computing | 2011
Mridha Mohammad Waliullah; Per Stenström
This paper analyzes the sources of performance losses in hardware transactional memory and investigates techniques to reduce the losses. It dissects the root causes of data conflicts in hardware transactional memory (HTM) systems into four classes: true sharing, false sharing, silent store, and write-write conflicts. These conflicts can cause performance and energy losses due to aborts and extra communication. To quantify the losses, the paper first proposes the 5C cache-miss classification model, which extends the well-established 4C model with a new class of cache misses known as contamination misses. The paper also contributes two techniques for removing data conflicts: one for removal of false sharing conflicts and another for removal of silent store conflicts. In addition, it revisits and adapts a technique that is able to reduce losses due to both true and false conflicts. All of the proposed techniques can be accommodated in a lazy versioning and lazy conflict resolution HTM built on top of a MESI cache-coherence infrastructure with quite modest extensions. Their ability to reduce performance losses is quantitatively established, individually as well as in combination, and performance is improved substantially.
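The four conflict classes can be illustrated with a simplified classifier (hypothetical structures, not the hardware mechanism): given the committing transaction's writes to a block and another transaction's accesses to the same block, the class depends on whether any words overlap, whether both sides wrote, and whether the committed write actually changed the value.

```cpp
#include <cstdint>
#include <vector>

// Illustrative classifier for the four conflict classes named in the paper.
// All structures and the decision order here are hypothetical simplifications.
enum class ConflictClass { TrueSharing, FalseSharing, SilentStore, WriteWrite };

struct Access {
    uint64_t wordAddr;   // word-granularity address inside a cache block
    bool     isWrite;
    uint64_t oldValue;   // value before the write (writes only)
    uint64_t newValue;   // value after the write  (writes only)
};

// 'committerWrites' are the committing transaction's writes to one block;
// 'otherAccesses' are the aborting transaction's accesses to the same block.
ConflictClass classify(const std::vector<Access>& committerWrites,
                       const std::vector<Access>& otherAccesses) {
    for (const Access& w : committerWrites) {
        for (const Access& a : otherAccesses) {
            if (w.wordAddr != a.wordAddr) continue;        // no word overlap yet
            if (a.isWrite) return ConflictClass::WriteWrite;
            if (w.oldValue == w.newValue)                  // value unchanged
                return ConflictClass::SilentStore;
            return ConflictClass::TrueSharing;             // real data dependence
        }
    }
    // Same block, but no overlapping words: only the block granularity clashes.
    return ConflictClass::FalseSharing;
}
```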
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2008
Mridha Mohammad Waliullah; Per Stenström
Transactional memory systems promise to simplify parallel programming by avoiding deadlock, livelock, and serialization problems through optimistic, concurrent execution of code segments that can potentially have data conflicts with each other. Data conflict detection in proposed hardware transactional memory systems is done by associating a read bit with each cache block that is set when the block is speculatively read. However, since the set of blocks that have been speculatively read (the read set) has to be maintained until the transaction commits, a block that has been speculatively read often cannot be replaced. This leads to poor utilization of the private caches in a multi-core system. We propose a new scheme for managing the read set in hardware transactional memory systems. The novel insight is that only the addresses of the speculatively read blocks are needed for conflict detection, not the data. As a result, there is an opportunity to reduce the space needed to keep track of speculatively read blocks by a factor of B/A, where B is the block size and A is the size of a block address. Assuming that B is 32 bytes (256 bits) and A is 32 bits, this yields an eightfold space saving. This paper presents a novel design for leveraging this opportunity and evaluates a concept that uses a Bloom filter to hash the addresses of the read set into a compact structure. We find that the proposed scheme utilizes the private cache more efficiently in a typical system configuration.
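The space argument follows directly from the abstract: a 32-byte block is 256 bits, its 32-bit address is 8 times smaller, and hashing addresses into a Bloom filter compresses the read set further at the cost of possible false positives, which can cause unnecessary aborts but never a missed conflict. A minimal sketch of such a read-set signature, with hypothetical sizes and hash functions:

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Illustrative read-set signature: block addresses of speculatively read
// blocks are hashed into a small bit vector. A membership test can give a
// false positive (a spurious abort) but never a false negative (a missed
// conflict). Sizes and hash functions are hypothetical.
class ReadSetFilter {
    static constexpr size_t kBits = 2048;   // signature size in bits
    std::bitset<kBits> bits_;

    static size_t hash1(uint64_t blockAddr) {
        return (blockAddr * 0x9E3779B97F4A7C15ULL) % kBits;
    }
    static size_t hash2(uint64_t blockAddr) {
        return (blockAddr ^ (blockAddr >> 17)) % kBits;
    }

public:
    void addRead(uint64_t blockAddr) {            // on a speculative read
        bits_.set(hash1(blockAddr));
        bits_.set(hash2(blockAddr));
    }
    bool mayContain(uint64_t blockAddr) const {   // on a remote write/commit
        return bits_.test(hash1(blockAddr)) && bits_.test(hash2(blockAddr));
    }
    void clear() { bits_.reset(); }               // on commit or abort
};
```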
European Conference on Parallel Processing | 2007
Mridha Mohammad Waliullah; Per Stenström
Transactional memory systems offer ease of programming at the cost of runtime performance losses in handling transactions. This paper focuses on starvation effects that show up in systems where unordered transactions are committed on a demand-driven basis; such simple commit arbitration policies are prone to starvation. The design issues for commit arbitration policies are analyzed, and novel policies are proposed that reduce the amount of computation wasted on roll-backs and, most importantly, avoid starvation. We analyze in detail how to incorporate them in a TCC-like transactional memory protocol. The proposed schemes have no impact on common-case performance and add quite modest complexity to the baseline protocol.
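One simple way to make commit arbitration starvation-free, sketched below with hypothetical names rather than the paper's exact policies: when two transactions compete for the commit token, favor the one that has already been aborted more times, so a repeatedly losing transaction eventually wins.

```cpp
#include <cstdint>

// Illustrative commit arbiter only; the paper targets a TCC-like hardware
// protocol, and the names here are hypothetical. The idea: instead of always
// letting the first requester commit, favor the transaction that has been
// aborted more often, so no transaction can starve indefinitely.
struct TxnState {
    uint64_t id;
    uint32_t abortCount;   // how many times this transaction has rolled back
};

// Returns true if the challenger should win the commit token over the
// current requester. Ties fall back to a fixed id order for determinism.
bool challengerWins(const TxnState& requester, const TxnState& challenger) {
    if (challenger.abortCount != requester.abortCount)
        return challenger.abortCount > requester.abortCount;
    return challenger.id < requester.id;
}
```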
International Journal of Parallel Programming | 2014
Mridha Mohammad Waliullah; Per Stenström
This paper analyzes the sources of performance losses in hardware transactional memory and investigates techniques to reduce the losses. It dissects the root causes of data conflicts in hardware transactional memory (HTM) systems into four classes: true sharing, false sharing, silent store, and write-write conflicts. These conflicts can cause performance and energy losses due to aborts and extra communication. To quantify the losses, the paper proposes the 5C cache-miss classification model, which extends the well-established 4C model with a new class of cache misses known as contamination misses. The paper also contributes two techniques for removing data conflicts: one for removal of false sharing conflicts and another for removal of silent store conflicts. In addition, it revisits and adapts a technique that is able to reduce losses due to both true and false conflicts. All of the proposed techniques can be accommodated in a lazy versioning and lazy conflict resolution HTM built on top of a MESI cache-coherence infrastructure with quite modest extensions. Their ability to reduce performance losses is quantitatively established, individually as well as in combination, and both performance and energy consumption are improved substantially.
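The 5C model adds a fifth class on top of the classical 4C misses (cold, capacity, conflict, coherence). The sketch below assumes that contamination misses are misses on blocks lost because of squashed transactional state; this reading and all names are illustrative, not the paper's simulator.

```cpp
// Illustrative sketch of the 5C miss classification; the first four classes
// are the classical 4C model, the fifth is the paper's addition. The input
// flags are assumed to come from a cache simulator and are hypothetical.
enum class MissClass { Cold, Capacity, Conflict, Coherence, Contamination };

struct MissInfo {
    bool firstReference;       // block never referenced before by this core
    bool invalidatedByAbort;   // block was lost because a transaction squashed
    bool invalidatedByRemote;  // block was invalidated by another core's write
    bool fitsInFullyAssoc;     // would have hit in a fully associative cache
};

MissClass classify5C(const MissInfo& m) {
    if (m.firstReference)      return MissClass::Cold;
    if (m.invalidatedByAbort)  return MissClass::Contamination;  // new 5th class
    if (m.invalidatedByRemote) return MissClass::Coherence;
    if (m.fitsInFullyAssoc)    return MissClass::Conflict;       // mapping conflict
    return MissClass::Capacity;
}
```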
High Performance Embedded Architectures and Compilers | 2011
Mridha Mohammad Waliullah
Transactional memory systems promise to reduce the burden of exposing thread-level parallelism in programs by relieving programmers from analyzing complex inter-thread dependences in detail. By encapsulating large program code blocks and executing them as atomic blocks, dependence checking is deferred to run-time. One of the conflicting transactions is then committed whereas the others have to roll back and re-execute. In current proposals, a checkpoint is taken at the beginning of the atomic block, and all execution can be wasted even if the conflicting access happens at the end of the atomic block. In this paper, we propose a novel scheme that (1) predicts when the first conflicting access occurs and (2) inserts a checkpoint before it is executed. When the prediction is correct, the only execution discarded is the one that has to be re-done. When the prediction is incorrect, the whole transaction has to be re-executed just as before. Overall, we find that our scheme maintains high prediction accuracy and significantly reduces the number of cycles lost to roll-backs; the geometric mean speedup across five applications is 16%.
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies | 2010
Anurag Negi; Mridha Mohammad Waliullah; Per Stenström
Transactional memory (TM) promises to unlock parallelism in software in a safer and easier way than lock-based approaches, but the path to deployment is unclear for several reasons. First of all, since TM has not been deployed in any machine yet, experience of using it is limited. While software transactional memory implementations exist, they are too slow to provide useful experience. Existing hardware transactional memory implementations, on the other hand, can provide the efficiency required, but they demand a significant effort to integrate into cache coherence infrastructures or freeze critical policy parameters. This paper proposes the LV* (lazy versioning and eager/lazy conflict resolution) class of hardware transactional memory protocols. This class of protocols has been designed with ease of deployment in mind. LV* can be integrated with low additional complexity into standard snoopy-cache MESI protocols and can be accommodated in a directory-based cache coherence infrastructure. Since the optimal conflict resolution policy (lazy or eager) depends on the transactional characteristics of workloads, LV* supports a set of conflict resolution policies that range from LazEr, a family of Lazy versioning Eager conflict resolution protocols, to LL-MESI, which provides lazy resolution. We show that LV* can be hosted in a MESI protocol through straightforward extensions and that the flexibility in the choice of conflict resolution strategy has a significant impact on performance.
European Conference on Parallel Processing | 2009
Mridha Mohammad Waliullah; Per Stenström