Amr G. Wassal
Cairo University
Publication
Featured research published by Amr G. Wassal.
international symposium on circuits and systems | 2012
Eman El Mandouh; Amr G. Wassal
This paper studies the problem of automatic assertion extraction from simulation traces. Previous approaches to the assertion generation problem have focused on a single aspect of automatic assertion extraction, and have often yielded unfavorable results. We propose a framework that combines searching for known assertions via templates with frequent and sequential pattern mining, while constraining the search by some knowledge about the design. These constraints can be automatically extracted using static analysis methods from the Register Transfer Level (RTL) description of the design, or supplied as a user input to the assertion detector. Our experimental results show that this approach helps in the detection of assertion patterns that are typically common and widely used in today's RTL designs.
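As a rough illustration of template-based assertion mining, the sketch below checks one hypothetical template ("a high in cycle t implies b high in cycle t + 1") against a simulation trace. The function name, the `min_support` threshold, and the trace are assumptions made for this example; they are not taken from the paper, whose framework also uses pattern mining and design constraints.

```python
from itertools import permutations

def mine_next_cycle_implications(trace, min_support=2):
    """Return (a, b) pairs where a == 1 at cycle t always implies b == 1 at t + 1.

    `trace` is a list of per-cycle dicts mapping signal name -> 0/1.
    `min_support` is a hypothetical threshold on how often `a` must fire.
    """
    signals = sorted(trace[0].keys())
    found = []
    for a, b in permutations(signals, 2):
        # Cycles where the antecedent fires and a following cycle exists.
        hits = [t for t in range(len(trace) - 1) if trace[t][a] == 1]
        if len(hits) >= min_support and all(trace[t + 1][b] == 1 for t in hits):
            found.append((a, b))
    return found

# Hypothetical 6-cycle trace of a request/acknowledge handshake.
trace = [
    {"req": 1, "ack": 0},
    {"req": 0, "ack": 1},
    {"req": 0, "ack": 0},
    {"req": 1, "ack": 0},
    {"req": 0, "ack": 1},
    {"req": 0, "ack": 0},
]
candidates = mine_next_cycle_implications(trace)
```

Each surviving pair is only a candidate: a trace can support an implication without the design guaranteeing it, which is why constraining the search with design knowledge matters.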
acm sigplan symposium on principles and practice of parallel programming | 2015
Mahmoud Khairy; Mohamed Zahran; Amr G. Wassal
Recent GPUs are equipped with general-purpose L1 and L2 caches in an attempt to reduce memory bandwidth demand and improve the performance of some irregular GPGPU applications. However, due to massive multithreading, GPGPU caches suffer from severe resource contention and low data sharing, which may degrade performance instead. In this work, we propose three techniques to efficiently utilize and improve the performance of GPGPU caches. The first technique aims to dynamically detect and bypass memory accesses that show streaming behavior. In the second technique, we propose dynamic warp throttling via cores sampling (DWT-CS) to alleviate cache thrashing by throttling the number of active warps per core. DWT-CS monitors the misses per kilo-instruction (MPKI) at L1; when it exceeds a specific threshold, all GPU cores are sampled with different numbers of active warps to find the optimal number of warps that mitigates thrashing and achieves the highest performance. Our proposed third technique addresses the problem of GPU cache associativity, since many GPGPU applications suffer from severe associativity stalls and conflict misses. Prior work proposed cache bypassing on associativity stalls. In this work, instead of bypassing, we employ a better cache indexing function, Pseudo Random Interleaving Cache (PRIC), that is based on polynomial modulus mapping, in order to fairly and evenly distribute memory accesses over cache sets. The proposed techniques improve the average performance of streaming and contention applications by 1.2X and 2.3X respectively. Compared to prior work, they achieve 1.7X and 1.5X performance improvement over Cache-Conscious Wavefront Scheduler and Memory Request Prioritization Buffer respectively.
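To make the idea of polynomial modulus mapping concrete, the sketch below contrasts conventional modulo indexing with an index computed as the remainder of the block address, read as a polynomial over GF(2), divided by an irreducible polynomial. The set count (32) and the polynomial x^5 + x^2 + 1 are assumptions chosen for illustration; the abstract does not state the parameters PRIC actually uses.

```python
NUM_SETS = 32        # assumed number of cache sets (2**5)
POLY = 0b100101      # x^5 + x^2 + 1, irreducible over GF(2); illustrative choice

def conventional_index(block_addr):
    """Conventional indexing: the low-order bits of the block address."""
    return block_addr % NUM_SETS

def polynomial_index(block_addr, poly=POLY):
    """Remainder of block_addr divided by poly, using carry-less (XOR) arithmetic."""
    deg = poly.bit_length() - 1
    rem = block_addr
    # Cancel the leading term until the remainder has degree below deg.
    while rem.bit_length() > deg:
        rem ^= poly << (rem.bit_length() - 1 - deg)
    return rem

# A power-of-two stride is a worst case for conventional indexing.
strided = [NUM_SETS * k for k in range(NUM_SETS)]
conv_sets = {conventional_index(a) for a in strided}
poly_sets = {polynomial_index(a) for a in strided}
```

With this stride every access lands in the same set under conventional indexing, while the polynomial remainder spreads the same accesses over all 32 sets.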
international symposium on circuits and systems | 2011
Amr G. Wassal; Ahmed R. Elsherif
Cell search has to be performed by the user equipment in 3GPP Long Term Evolution (LTE) systems to obtain cell identity before connecting to the cell. The cell identity is transmitted using the Primary Synchronization Symbol (PSS) and the Secondary Synchronization Symbol (SSS). The PSS carries one of three possible values for the sector identity whereas the SSS carries one of 168 possible values for the physical cell identity. Straightforward implementation of SSS detection incurs a large area due to the large number of required correlators. This paper presents a complete design and hardware implementation for the detection of the SSS. The fixed-point representation of the proposed system design is optimized to use the smallest word lengths at various system nodes while maintaining the overall performance within acceptable limits. Different architectures are studied and compared, and an all-serial architecture is chosen to minimize the design area and power consumption while achieving an acceptable acquisition time.
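For context, the LTE specification combines the two detected values into a single physical-layer cell identity as 3 × (group identity from the SSS) + (sector identity from the PSS), giving 504 identities in total. The short sketch below shows that arithmetic; the function and parameter names are made up for this example.

```python
def physical_cell_id(group_id, sector_id):
    """Combine the SSS group identity (0..167) and PSS sector identity (0..2)."""
    if not 0 <= group_id <= 167:
        raise ValueError("group_id must be in 0..167")
    if not 0 <= sector_id <= 2:
        raise ValueError("sector_id must be in 0..2")
    return 3 * group_id + sector_id

# Enumerate every combination to confirm the identities are unique.
all_ids = {physical_cell_id(g, s) for g in range(168) for s in range(3)}
```

The 168 candidate group identities are what makes a fully parallel SSS detector expensive, since each candidate needs its own correlation.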
BioSystems | 2015
Nesma ElKalaawy; Amr G. Wassal
Biochemical networks depict the chemical interactions that take place among elements of living cells. They aim to elucidate how cellular behavior and functional properties of the cell emerge from the relationships between its components, i.e. molecules. Biochemical networks are largely characterized by dynamic behavior, and exhibit high degrees of complexity. Hence, the interest in such networks is growing and they have been the target of several recent modeling efforts. Signal transduction pathways (STPs) constitute a class of biochemical networks that receive, process, and respond to stimuli from the environment, as well as stimuli that are internal to the organism. An STP consists of a chain of intracellular signaling processes that ultimately result in generating different cellular responses. This primer presents the methodologies used for the modeling and simulation of biochemical networks, illustrated for STPs. These methodologies range from qualitative to quantitative, and include structural as well as dynamic analysis techniques. We describe the different methodologies, outline their underlying assumptions, and provide an assessment of their advantages and disadvantages. Moreover, publicly and/or commercially available implementations of these methodologies are listed as appropriate. In particular, this primer aims to provide a clear introduction and comprehensive coverage of biochemical modeling and simulation methodologies for the non-expert, with specific focus on relevant literature of STPs.
custom integrated circuits conference | 2011
Ayman Elsayed; Ahmed Elshennawy; Ahmed Elmallah; Ahmed Shaban; Botros George; Mostafa Elmala; Ayman Ismail; Amr G. Wassal; Mostafa M. Sakr; Ahmed Mokhtar; M. Hafez; A. Hamed; M. Saeed; M. Samir; M. Hammad; M. Elkhouly; A. Kamal; M. Rabieah; A. Elghufaili; S. Shaibani; I. Hakami; T. Alanazi
An interface for a MEMS gyroscope is implemented in 0.18µm HVCMOS technology and achieves a low noise floor of 1m°/sec/√Hz over 200Hz BW. Electromechanical ΣΔ force-feedback and a self-clocking scheme based on gyro resonance are implemented. The interface includes on-chip reference generation, decimation, and temperature compensation.
Photomask Technology 2012 | 2012
Amr G. Wassal; Heba Sharaf; Sherif Hammouda
To continue scaling the circuit features down, Double Patterning (DP) technology is needed in 22nm technologies and lower. DP requires decomposing the layout features into two masks for pitch relaxation, such that the spacing between any two features on each mask is greater than the minimum allowed mask spacing. The relaxed pitches of each mask are then processed in two separate exposure steps. In many cases, post-layout decomposition fails to decompose the layout into two masks due to the presence of conflicts. Post-layout decomposition of a standard-cell block can result in native conflicts inside the cells (internal conflicts), or native conflicts on the boundary between two cells (boundary conflicts). Resolving native conflicts requires a redesign and/or multiple iterations for the placement and routing phases to get a clean decomposition. Therefore, DP compliance must be considered in earlier phases, before getting the final placed cell block. The main focus of this paper is generating a library of decomposed standard cells to be used in a DP-aware placer. This library should contain all possible decompositions for each standard cell, i.e., these decompositions consider all possible combinations of boundary conditions. However, the large number of combinations of boundary conditions for each standard cell will significantly increase the processing time and effort required to obtain all possible decompositions. Therefore, an efficient methodology is required to reduce this large number of combinations. In this paper, three different reduction methodologies are proposed to reduce the number of different combinations processed to get the decomposed library. Experimental results show a significant reduction in the number of combinations and decompositions needed for the library processing. To generate and verify the proposed flow and methodologies, a prototype for a placement-aware DP-ready cell-library is developed with an optimized number of cell views.
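Two-mask decomposition is commonly modeled as two-coloring a conflict graph: features are nodes, and an edge joins any two features that sit closer than the minimum same-mask spacing. The sketch below uses that standard graph formulation, which is an assumption here and not necessarily the flow used in the paper; feature indices and the conflict lists are hypothetical.

```python
from collections import deque

def decompose_two_masks(num_features, conflicts):
    """Assign each feature to mask 0 or 1, or return None on a native conflict.

    `conflicts` lists pairs of feature indices that must not share a mask.
    """
    adjacency = [[] for _ in range(num_features)]
    for u, v in conflicts:
        adjacency[u].append(v)
        adjacency[v].append(u)

    mask = [None] * num_features
    for start in range(num_features):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:  # breadth-first two-coloring of one connected component
            u = queue.popleft()
            for v in adjacency[u]:
                if mask[v] is None:
                    mask[v] = 1 - mask[u]
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # odd cycle: no legal two-mask assignment
    return mask

chain = decompose_two_masks(3, [(0, 1), (1, 2)])             # decomposable
triangle = decompose_two_masks(3, [(0, 1), (1, 2), (0, 2)])  # native conflict
```

In this model a native conflict is an odd cycle in the conflict graph, which no mask assignment can fix without changing the layout.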
IEEE Transactions on Nuclear Science | 2011
Ahmed A. Abou-Auf; Hamzah A. Abdel-Aziz; Amr G. Wassal
We developed a cell-level fault model for logic failure induced in standard-cell ASIC devices exposed to total ionizing dose. This fault model is valid for CMOS process technologies that exhibit field-oxide leakage current under total dose. The fault model was represented at the cell level using hardware description languages (HDLs) such as VHDL or Verilog, which consequently allowed for cell-level simulation of ASIC devices under total dose using functional simulation tools normally used within the HDL design flow of ASIC devices. We then developed a methodology to identify worst-case test vectors (WCTV) using commercially available automatic test pattern generation (ATPG) tools targeting the developed fault model. Finally, we experimentally validated the significance of using WCTV in total-dose testing of CMOS ASIC devices.
IEEE Transactions on Nuclear Science | 2010
Ahmed A. Abou-Auf; Hamzah A. Abdel-Aziz; Mostafa M. Abdel-Aziz; Amr G. Wassal; T A Abdul-Rahman
We developed a cell-level fault model for leakage current failure of standard-cell ASIC devices exposed to total ionizing dose. This fault model is valid for CMOS process technologies that exhibit field-oxide leakage current under total dose. The fault model was represented using hardware description languages, which consequently allowed for cell-level simulation of ASIC devices under total dose using functional simulation tools normally used during the design flow of ASIC devices. However, identifying worst-case test vectors (WCTV) using automatic test pattern generation (ATPG) tools that target the developed fault model can lead to prohibitively long search times for large ASIC devices. We developed an innovative search method based on genetic algorithms (GA) which made possible the identification of WCTV for large ASIC devices in a very short time. Finally, we experimentally validated the significance of WCTV in total-dose testing of ASIC devices.
international conference on electronics, circuits, and systems | 2013
Mohamed A. E. Mahmoud; Amr G. Wassal; Alaa El-Rouby; Rafik Guindi
Timing verification is an essential process in nanometer design. Therefore, static timing analysis (STA) is currently the main aspect of performance verification. Traditional STA is based on lookup tables indexed by input slew and output load capacitance. This approach is becoming insufficient to accurately characterize many significant aspects of cell delay, such as process variations, nonlinear waveforms, nonlinear loads, and multiple input switching (MIS). Therefore, the current trend in modern designs is to use current-source-based models (CSM), which model MOSFETs as a transconductance. This paper proposes a CSM for combinational logic cells which can accommodate single input switching (SIS) signals. It can also handle cases where small capacitances are connected at the gate output while fast ramp signals are applied to the gate input. When compared against ELDO, the proposed model produces more accurate stage delays than those obtained from the standard-cell lookup tables.
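The lookup-table approach that the paper contrasts with can be sketched as bilinear interpolation over a two-dimensional delay table indexed by input slew and output load. The axis values, the table entries, and the function name below are hypothetical and in arbitrary units; real cell libraries define their own table formats and interpolation rules.

```python
from bisect import bisect_right

def lookup_delay(slews, loads, table, slew, load):
    """Bilinearly interpolate table[i][j], indexed by slews[i] and loads[j]."""
    def bracket(axis, x):
        # Lower index of the interval containing x, clamped to the table edges.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        frac = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, frac

    i, fs = bracket(slews, slew)
    j, fl = bracket(loads, load)
    low = table[i][j] * (1 - fl) + table[i][j + 1] * fl
    high = table[i + 1][j] * (1 - fl) + table[i + 1][j + 1] * fl
    return low * (1 - fs) + high * fs

# Hypothetical 2x2 characterization table (arbitrary units).
slews = [1.0, 3.0]
loads = [1.0, 5.0]
table = [[10.0, 30.0],
         [20.0, 40.0]]
mid_delay = lookup_delay(slews, loads, table, slew=2.0, load=3.0)
```

A table like this only captures a single ramp input driving a lumped capacitance, which is the limitation that motivates current-source models.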
SPIE Photomask Technology | 2012
Yasmine A. Badr; Amr G. Wassal; Sherif Hammouda
Double Patterning (DP) is still the most viable lithography option for sub-22nm nodes. The two main types of DP are Litho Etch Litho Etch (LELE) and Self-Aligned Double Patterning (SADP). Of those two, SADP has the advantage of lower sensitivity to overlay error. However, SADP imposes a lot of restrictions on the layout. One of the ways to do SADP decomposition is to use an LELE decomposer while prohibiting stitches, and to generate mandrel and trim masks from LELE masks using some Boolean characterization equations. In this paper, we propose an SADP decomposer based on an LELE decomposer that is used to decide which target polygons are mandrel and which are non-mandrel. However, the core of the LELE decomposer has been made SADP-aware, such that it gives less priority to pairs of polygons separated by spacing values that are prohibited by SADP. Then, a mandrel and trim mask generator uses the LELE decomposer output and produces the final mandrel and trim masks. Experimental results show that adding SADP-awareness to the core of the decomposer has decreased the average number of coloring conflicts by 38%. The proposed decomposer is faster than the previous SADP decomposition approaches that use Integer Linear Programming (ILP) and Satisfiability (SAT).