Talal Bonny
University of Sharjah
Publications
Featured research published by Talal Bonny.
Design, Automation, and Test in Europe | 2007
Talal Bonny; Jörg Henkel
Code density is a major requirement in embedded system design since it not only reduces the need for the scarce resource memory but also implicitly improves further important design parameters such as power consumption and performance. In this paper we introduce a novel and efficient hardware-supported approach that belongs to the group of statistical compression schemes, as it is based on canonical Huffman coding. In particular, our scheme is the first to also compress the necessary look-up tables, which can become significant in size if the application is large and/or high compression is desired. Our scheme optimizes the number of generated look-up tables to improve the compression ratio. On average, we achieve compression ratios as low as 49% (already including the overhead of the look-up tables). Thereby, our scheme is entirely orthogonal to approaches that take particularities of a certain instruction set architecture into account. We have conducted evaluations using a representative set of applications and have applied our scheme to three major embedded processor architectures, namely ARM, MIPS and PowerPC.
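To make the look-up-table point concrete, the sketch below shows how canonical Huffman coding assigns codewords from code lengths alone, which is what allows a decoder to store compact per-length information instead of one entry per codeword. This is a generic Python illustration under that assumption, not the paper's hardware scheme; the symbols and lengths are invented.

```python
# Canonical Huffman code assignment: given only the code length of each
# symbol, codewords are assigned in a fixed (canonical) order, so the
# decoder's table can be derived from lengths alone. Illustrative sketch,
# not the paper's table layout.

def canonical_codes(lengths):
    """lengths: dict symbol -> Huffman code length in bits."""
    # Canonical ordering: by code length, then by symbol.
    ordered = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    codes, code, prev_len = {}, 0, 0
    for sym, length in ordered:
        code <<= (length - prev_len)   # left-shift whenever the length grows
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes

# Example with four invented instruction patterns:
print(canonical_codes({"A": 1, "B": 2, "C": 3, "D": 3}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```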
Great Lakes Symposium on VLSI | 2011
M. Affan Zidan; Talal Bonny; Khaled N. Salama
Many database applications, such as sequence comparison, searching, and matching, process large database sequences. We introduce a novel and efficient technique to improve the performance of database applications by using a Hybrid GPU/CPU platform. In particular, our technique addresses the low efficiency that results from running short sequences from a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm. The experimental results show that our Hybrid GPU/CPU technique improves the average performance by a factor of 2.2 and the peak performance by a factor of 2.8 when compared to earlier implementations.
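A minimal sketch of the dispatch idea, under the assumption that short database sequences under-utilize the GPU: sequences below a length threshold are routed to the CPU while the rest go to the GPU, and both batches run concurrently. The threshold value and the align_cpu/align_gpu workers are placeholders, not the paper's API.

```python
# Hybrid GPU/CPU dispatch by sequence length (sketch). The align_cpu and
# align_gpu callables are assumed to return {sequence_id: score} dicts.

from concurrent.futures import ThreadPoolExecutor

LENGTH_THRESHOLD = 256  # illustrative cut-off; tuned per platform in practice

def split_by_length(db_sequences, threshold=LENGTH_THRESHOLD):
    short = [s for s in db_sequences if len(s) < threshold]
    long_ = [s for s in db_sequences if len(s) >= threshold]
    return short, long_

def hybrid_smith_waterman(query, db_sequences, align_cpu, align_gpu):
    """Run the CPU and GPU batches concurrently and merge the scores."""
    short, long_ = split_by_length(db_sequences)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_job = pool.submit(align_cpu, query, short)
        gpu_job = pool.submit(align_gpu, query, long_)
        return {**cpu_job.result(), **gpu_job.result()}
```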
Cairo International Biomedical Engineering Conference | 2010
Talal Bonny; M. Affan Zidan; Khaled N. Salama
Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in bioinformatics. They must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using the existing resources (GPU and CPU) in this efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2 CPUs, and GPU + 3 CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (for a query length of 511 amino acids).
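For reference, GCUPS (giga cell updates per second) counts how many Smith-Waterman matrix cells are filled per second. The sketch below only shows the arithmetic; the database size and runtime are invented for illustration.

```python
# GCUPS: every query residue is scored against every database residue, so the
# number of cell updates is query_len * db_residues. Values below are made up.

def gcups(query_len, db_residues, seconds):
    return (query_len * db_residues) / (seconds * 1e9)

# e.g. a 511-residue query against 10^8 database residues aligned in 5 s:
print(f"{gcups(511, 100_000_000, 5.0):.1f} GCUPS")  # -> 10.2 GCUPS
```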
Great Lakes Symposium on VLSI | 2006
Talal Bonny; Jörg Henkel
The presented work uses code compression to improve the design efficiency of an embedded system. In particular, we present a method and architecture for compressing the look-up tables that are necessary for the decompression process. No other work has yet focused on minimizing the look-up tables, which, as we show, have a significant impact on the total overhead of a hardware-based decompression scheme. We introduce a novel and very efficient hardware-supported approach based on canonical Huffman coding. Using the Lin-Kernighan algorithm, we reduce the look-up table size by up to 45%. As a result, we achieve overall compression ratios as low as 45% (already including the overhead of the look-up tables). Thereby, our scheme is entirely orthogonal to approaches that take particularities of a certain instruction set architecture into account, meaning that compression could be improved further. Factoring in this orthogonality, our scheme forms the basis for efficiency not yet achieved in hardware-supported compression schemes. We have conducted evaluations using a representative set of applications (in terms of size and application domain) and have applied our scheme to three major embedded processor architectures, namely ARM, MIPS and PowerPC. The hardware evaluation shows no performance penalty.
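On the decode side, canonical codes let the look-up structure be organized per code length rather than per codeword. The sketch below illustrates that general idea in Python with an invented table; it is not the hardware architecture proposed in the paper.

```python
# Per-length decoding of canonical Huffman codes: the table stores, for each
# code length, the first canonical code and the symbols in canonical order.

def build_decode_table(codes):
    """codes: dict symbol -> canonical bit-string (e.g. from canonical_codes)."""
    table = {}  # length -> (first_code, [symbols in canonical order])
    for sym, bits in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
        first, syms = table.get(len(bits), (int(bits, 2), []))
        syms.append(sym)
        table[len(bits)] = (first, syms)
    return table

def decode(bitstream, table):
    out, code, length = [], 0, 0
    for bit in bitstream:
        code, length = (code << 1) | int(bit), length + 1
        if length in table:
            first, syms = table[length]
            if 0 <= code - first < len(syms):
                out.append(syms[code - first])
                code, length = 0, 0
    return out

table = build_decode_table({"A": "0", "B": "10", "C": "110", "D": "111"})
print(decode("110010111", table))  # ['C', 'A', 'B', 'D']
```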
Design, Automation, and Test in Europe | 2008
Talal Bonny; Jörg Henkel
Reducing the code size of embedded applications is one of the important constraints in embedded system design. Code compression can provide substantial savings in terms of size. In this paper, we introduce a novel and efficient hardware-supported approach. Our approach investigates the benefits of re-encoding the unused bits (we call them re-encodable bits) in the instruction format for a specific application to improve the compression ratio. Re-encoding those bits may reduce the size of the decoding table by more than 37%. We achieve compression ratios as low as 44% (including all incurred overhead). We have conducted evaluations using a representative set of applications and have applied our approach to two major embedded processors, namely MIPS and ARM.
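One plausible way to picture re-encodable bits, offered purely as an illustration and not as the paper's exact encoding: bit positions that hold the same value in every instruction of a given application carry no information for that binary, so they can be stripped before compression and re-inserted by the decoder.

```python
# Find bit positions that are constant across an application's instruction
# words; those positions can be dropped before compression. Sketch only.

def constant_bit_columns(instructions, width=32):
    """Return {bit_position: constant_value} over all instruction words."""
    columns = {}
    for pos in range(width):
        values = {(word >> pos) & 1 for word in instructions}
        if len(values) == 1:
            columns[pos] = values.pop()
    return columns

def strip_bits(word, constant_cols, width=32):
    """Pack the non-constant bits of one word into a narrower value."""
    packed, out_pos = 0, 0
    for pos in range(width):
        if pos not in constant_cols:
            packed |= ((word >> pos) & 1) << out_pos
            out_pos += 1
    return packed, out_pos  # reduced word and its new width

code = [0x8C010004, 0x8C220008, 0x8C43000C]  # invented 32-bit words
cols = constant_bit_columns(code)
print(f"{len(cols)} of 32 bit positions are constant in this toy example")
```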
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2008
Talal Bonny; Jörg Henkel
Code density is of increasing concern in embedded system design since it reduces the need for the scarce resource memory and also implicitly improves further important design parameters such as power consumption and performance. In this paper we introduce a novel, hardware-supported approach. Besides the code, the lookup tables (LUTs), which can become significant in size if the application is large and/or high compression is desired, are also compressed. Our scheme optimizes the number and size of the generated LUTs to improve the compression ratio. To show the efficiency of our approach, we apply it to two compression schemes: "dictionary-based" and "statistical". We achieve an average compression ratio of 48% (already including the overhead of the LUTs). Thereby, our scheme is orthogonal to approaches that take particularities of a certain instruction set architecture into account. We have conducted evaluations using a representative set of applications and have applied our scheme to three major embedded processor architectures, namely ARM, MIPS, and PowerPC.
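A minimal sketch of the "dictionary-based" half of that comparison, with invented parameters: the most frequent 32-bit instruction words go into a LUT and are replaced in the code stream by short indices, and the compression ratio is measured with the LUT counted as overhead.

```python
# Dictionary-based code compression (sketch): frequent instruction words are
# replaced by indices into a LUT; a one-bit flag marks compressed words.

from collections import Counter

def build_dictionary(instructions, entries=256):
    most_common = Counter(instructions).most_common(entries)
    return {word: index for index, (word, _) in enumerate(most_common)}

def compressed_bits(instructions, dictionary, index_bits=8, word_bits=32):
    bits = 0
    for word in instructions:
        bits += 1 + (index_bits if word in dictionary else word_bits)
    return bits

def compression_ratio(instructions, dictionary, word_bits=32):
    lut_bits = len(dictionary) * word_bits          # LUT counted as overhead
    original = len(instructions) * word_bits
    return (compressed_bits(instructions, dictionary) + lut_bits) / original
```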
Design Automation Conference | 2007
Talal Bonny; Jörg Henkel
The size of embedded software is rising at a rapid pace. It is often challenging and time-consuming to fit the required software functionality within a given hardware resource budget. Code compression is a means to alleviate the problem. In this paper we introduce a novel and efficient hardware-supported approach. Our scheme reduces the size of the generated decoding table by splitting instructions into portions of varying size (called patterns) before Huffman-coding compression is applied. It improves the final compression ratio (including all incurred overhead) by more than 20% compared to known schemes based on Huffman coding. We achieve overall compression ratios as low as 44%. Thereby, our scheme is orthogonal to approaches that take particularities of a certain instruction set architecture into account. We have conducted evaluations using a representative set of applications and have applied our scheme to two major embedded processors, namely ARM and MIPS.
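The splitting idea can be pictured as follows, using a fixed 8/24-bit cut purely for illustration (the paper chooses varying split points per application): each portion stream repeats far more often than whole 32-bit words, so its Huffman table stays small. The entropy helper estimates the lower bound Huffman coding can approach for each stream.

```python
# Split each 32-bit instruction into two pattern streams and estimate how
# compressible each stream is. Fixed split widths are illustrative only.

from collections import Counter
from math import log2

def split_patterns(instructions, high_bits=8, word_bits=32):
    low_bits = word_bits - high_bits
    high = [word >> low_bits for word in instructions]
    low = [word & ((1 << low_bits) - 1) for word in instructions]
    return high, low

def entropy_bits_per_symbol(stream):
    """Lower bound on the average code length Huffman coding can reach."""
    counts = Counter(stream)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())
```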
Journal of Applied Physics | 2016
Anis Allagui; Andrea Espinel Rojas; Talal Bonny; Ahmed S. Elwakil; Mohammad Ali Abdelkareem
In the standard two-electrode configuration employed in electrolytic processes, when the control dc voltage is brought to a critical value, the system undergoes a transition from conventional electrolysis to contact glow discharge electrolysis (CGDE), which has also been referred to as liquid-submerged micro-plasma, glow discharge plasma electrolysis, electrode effect, electrolytic plasma, etc. The light-emitting process is associated with the development of an irregular and erratic current time-series, which has been arbitrarily labelled as "random" and has thus dissuaded further research in this direction. Here, we examine the current time-series signals measured in a cathodic CGDE configuration in a concentrated KOH solution at different dc bias voltages greater than the critical voltage. We show that the signals are, in fact, not random according to the NIST SP 800-22 test suite definition. We also demonstrate that post-processing low-pass filtered sequences requires less time than the native as-measured seq...
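As a pointer to what "not random according to NIST SP 800-22" means operationally, the suite's first test (the monobit frequency test) is sketched below; the paper applies the full suite to the digitized CGDE current signals, not just this one test.

```python
# NIST SP 800-22 monobit (frequency) test: the proportion of ones in the bit
# sequence should be close to 1/2 for a random source. p < 0.01 rejects
# randomness at the suite's default significance level.

from math import erfc, sqrt

def monobit_p_value(bits):
    """bits: sequence of 0/1 values."""
    s = sum(1 if b else -1 for b in bits)      # +1 for a one, -1 for a zero
    return erfc(abs(s) / sqrt(len(bits)) / sqrt(2))
```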
ACM Transactions on Design Automation of Electronic Systems | 2010
Talal Bonny; Jörg Henkel
The size of embedded software is increasing at a rapid pace. It is often challenging and time-consuming to fit the required software functionality within a given hardware resource budget. Code compression is a means to alleviate the problem by providing substantial savings in terms of code size. In this article we introduce a novel and efficient hardware-supported compression technique that is based on Huffman coding. Our technique reduces the size of the generated decoding table, which takes up a large portion of the memory. It combines our previous techniques, the Instruction Splitting Technique and the Instruction Re-encoding Technique, into a new one called the Combined Compression Technique, improving the final compression ratio by taking advantage of both. The Instruction Splitting Technique is instruction set architecture (ISA)-independent. It splits the instructions into portions of varying size (called patterns) before Huffman coding is applied. This technique improves the final compression ratio by more than 20% compared to other known schemes based on Huffman coding. The average compression ratios achieved using this technique are 48% and 50% for ARM and MIPS, respectively. The Instruction Re-encoding Technique is ISA-dependent. It investigates the benefits of re-encoding unused bits (we call them re-encodable bits) in the instruction format for a specific application to improve the compression ratio. Re-encoding those bits can reduce the size of the decoding tables by up to 40%. Using this technique, we improve the final compression ratios, in comparison to the first technique, to 46% and 45% for ARM and MIPS, respectively (including all incurred overhead). The Combined Compression Technique improves the compression ratio to 45% and 42% for ARM and MIPS, respectively. We have conducted evaluations using a representative set of applications and have applied each technique to two major embedded processor architectures, namely ARM and MIPS.
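The compression ratios quoted throughout these papers follow the usual definition: compressed code size plus all decoding-table overhead, divided by the original code size (lower is better). The byte counts in the sketch below are invented purely to show the arithmetic.

```python
# Compression ratio including decoding-table overhead; all sizes in bytes.

def compression_ratio(compressed_code, decoding_tables, original_code):
    return (compressed_code + decoding_tables) / original_code

print(compression_ratio(compressed_code=40_000,
                        decoding_tables=5_000,
                        original_code=100_000))  # 0.45, i.e. 45%
```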
Design Automation Conference | 2009
Talal Bonny; Jörg Henkel
Compressing program code compiled for VLIW processors to reduce the amount of memory is a necessary means to decrease costs. The main disadvantage of any code compression technique is the system performance penalty caused by the extra time required to decode the compressed instructions at run time. In this paper we improve the performance of decoding compressed instructions by using our novel compression technique (LICT: left-uncompressed instruction technique), which can be used in conjunction with any compression algorithm. Furthermore, we adapt the Burrows-Wheeler (BW) approach, previously used in data compression, as a new code compression approach. It significantly reduces the code size compared to state-of-the-art approaches for VLIW processors. Using our LICT in conjunction with the BW algorithm improves the performance markedly (by a factor of 2.5) with little impact on the compression ratio (only a 3% loss in compression ratio).
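For context, the Burrows-Wheeler transform the paper adapts is the standard data-compression transform: sorting all rotations of a block groups similar contexts together so that a subsequent coder compresses the result better. The naive sketch below is generic and uses a sentinel byte; it is not the paper's VLIW-specific adaptation, and production implementations use suffix arrays instead of explicit rotations.

```python
# Naive Burrows-Wheeler transform of a byte block (sketch).

def bwt(data: bytes) -> bytes:
    data += b"\x00"                      # unique end-of-block sentinel
    rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
    return bytes(rot[-1] for rot in rotations)

print(bwt(b"banana"))  # b'annb\x00aa'
```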