Sherief Reda | Researchain

Publication

Featured research published by Sherief Reda.

international symposium on microarchitecture | 2011

Pack & Cap: adaptive DVFS and thread packing under power caps

Ryan Cochran; Can Hankendi; Ayse Kivilcim Coskun; Sherief Reda

The ability to cap peak power consumption is a desirable feature in modern data centers for energy budgeting, cost management, and efficient power delivery. Dynamic voltage and frequency scaling (DVFS) is a traditional control knob in the tradeoff between server power and performance. Multi-core processors and the parallel applications that take advantage of them introduce new possibilities for control, wherein workload threads are packed onto a variable number of cores and idle cores enter low-power sleep states. This paper proposes Pack & Cap, a control technique designed to make optimal DVFS and thread packing control decisions in order to maximize performance within a power budget. In order to capture the workload dependence of the performance-power Pareto frontier, a multinomial logistic regression (MLR) classifier is built using a large volume of performance counter, temperature, and power characterization data. When queried during runtime, the classifier is capable of accurately selecting the optimal operating point. We implement and validate this method on a real quad-core system running the PARSEC parallel benchmark suite. When varying the power budget during runtime, Pack & Cap meets power constraints 82% of the time even in the absence of a power measuring device. The addition of thread packing to DVFS as a control knob increases the range of feasible power constraints by an average of 21% when compared to DVFS alone and reduces workload energy consumption by an average of 51.6% compared to existing control techniques that achieve the same power range.
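
The selection problem described in this abstract can be illustrated with a small Python sketch. It is not the Pack & Cap controller: the paper trains an MLR classifier on performance-counter, temperature, and power data, whereas this sketch assumes a pre-characterized table and simply picks the highest-throughput point under the cap. The `OperatingPoint` type, the `best_under_cap` helper, and every number in `POINTS` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    freq_ghz: float      # DVFS setting
    active_cores: int    # cores the threads are packed onto
    power_w: float       # made-up characterization value
    throughput: float    # made-up value, arbitrary units


# Hypothetical characterization table for one workload on a quad-core system.
POINTS = [
    OperatingPoint(2.6, 4, 90.0, 100.0),
    OperatingPoint(2.6, 2, 65.0, 60.0),
    OperatingPoint(1.6, 4, 60.0, 70.0),
    OperatingPoint(1.6, 2, 45.0, 42.0),
    OperatingPoint(1.6, 1, 35.0, 22.0),
]


def best_under_cap(points: Sequence[OperatingPoint], cap_w: float) -> Optional[OperatingPoint]:
    """Return the highest-throughput point whose power fits the cap, or None."""
    feasible = [p for p in points if p.power_w <= cap_w]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p.throughput)


def lowest_feasible_cap(points: Sequence[OperatingPoint]) -> float:
    """Smallest power cap that at least one operating point can satisfy."""
    return min(p.power_w for p in points)


# DVFS alone keeps all four cores active; packing adds the lower-power points.
dvfs_only = [p for p in POINTS if p.active_cores == 4]
```

With these made-up numbers, `lowest_feasible_cap(dvfs_only)` is 60 W while `lowest_feasible_cap(POINTS)` is 35 W, which mirrors the abstract's point that thread packing widens the range of power caps that can be met.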

design automation conference | 2005

Power-aware placement

Yongseok Cheon; Pei-Hsin Ho; Andrew B. Kahng; Sherief Reda; Qinke Wang

Lowering power is one of the greatest challenges facing the IC industry today. We present a power-aware placement method that simultaneously performs (1) activity-based register clustering that reduces clock power by placing registers in the same leaf cluster of the clock trees in a smaller area and (2) activity-based net weighting that reduces net switching power by assigning a combination of activity and timing weights to the nets with higher switching rates or more critical timing. The method applies to designs with multiple clocks and gated clocks. We implemented the method and obtained experimental results on 8 real-world designs after placement, routing, extraction and analysis. The power-aware placement method achieved on average 25.3% and 11.4% reduction in net switching power and total power respectively with 2.0% timing, 1.2% cell area and 11.5% runtime impact. This method has been incorporated into a commercial physical design tool.

IEEE Transactions on Very Large Scale Integration Systems | 2009

Maximizing the Functional Yield of Wafer-to-Wafer 3-D Integration

Sherief Reda; Gregory Smith; Larry Smith

Three-dimensional integrated circuit technology with through-silicon vias offers many advantages, including improved form factor, increased circuit performance, robust heterogeneous integration, and reduced costs. Wafer-to-wafer integration supports the highest possible density of through-silicon vias and highest throughput; however, in contrast to die-to-wafer integration, it does not benefit from the ability to bond only tested and diced good die. In wafer-to-wafer integration, wafers are entirely bonded together, which can unintentionally integrate a bad die from one wafer to a good die from another wafer, reducing the yield. In this paper, we propose solutions that maximize the yield of wafer-to-wafer 3-D integration, assuming that the individual die can be tested on the wafers before bonding. We exploit some of the available flexibility in the integration process, and propose wafer assignment algorithms that maximize the number of good 3-D ICs. Our algorithms range from scalable, fast heuristics to optimal methods that exactly maximize the yield of wafer-to-wafer 3-D integration. Using realistic defect models and yield simulations, we demonstrate the effectiveness of our methods up to large numbers of wafer stacks. Our results demonstrate that it is possible to significantly improve the yield in comparison to yield-oblivious wafer assignment methods.
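
As a rough illustration of wafer assignment for two-layer stacks, the Python sketch below pairs wafers from two lots so that as many die positions as possible are good on both layers. It is an assumption-laden toy, not the paper's algorithms: die maps are plain 0/1 lists, the heuristic is a simple greedy pass, and the "optimal" method is an exhaustive search that only scales to a handful of wafers. All function names are hypothetical.

```python
from itertools import permutations


def good_stacks(map_a, map_b):
    """Count die positions that are good on both wafers (1 = good, 0 = bad)."""
    return sum(a & b for a, b in zip(map_a, map_b))


def total_yield(lot_a, lot_b, pairs):
    """Number of good two-layer stacks produced by a list of (i, j) pairings."""
    return sum(good_stacks(lot_a[i], lot_b[j]) for i, j in pairs)


def greedy_assignment(lot_a, lot_b):
    """Fast heuristic: give each wafer in lot_a the best remaining wafer in lot_b."""
    remaining = list(range(len(lot_b)))
    pairs = []
    for i, wafer_a in enumerate(lot_a):
        best = max(remaining, key=lambda j: good_stacks(wafer_a, lot_b[j]))
        remaining.remove(best)
        pairs.append((i, best))
    return pairs


def optimal_assignment(lot_a, lot_b):
    """Exact search over all one-to-one pairings; feasible only for small lots."""
    best_perm = max(
        permutations(range(len(lot_b))),
        key=lambda perm: total_yield(lot_a, lot_b, list(enumerate(perm))),
    )
    return list(enumerate(best_perm))


def oblivious_assignment(lot_a, lot_b):
    """Yield-oblivious baseline: bond wafers in the order they arrive."""
    return [(i, i) for i in range(len(lot_a))]
```

For lots of realistic size, the exhaustive search would need to be replaced by a polynomial-time matching method; the brute-force version is used here only to keep the sketch dependency-free.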

international symposium on physical design | 2005

A semi-persistent clustering technique for VLSI circuit placement

Charles J. Alpert; Andrew B. Kahng; Gi-Joon Nam; Sherief Reda; Paul G. Villarrubia

Placement is a critical component of today's physical synthesis flow with tremendous impact on the final performance of VLSI designs. However, it accounts for a significant portion of the overall physical synthesis runtime. With the complexity and netlist size of today's VLSI designs growing rapidly, clustering for placement can provide an attractive solution to manage affordable placement runtime. Such clustering, however, has to be carefully devised to avoid any adverse impact on the final placement solution quality. In this paper we present a new bottom-up clustering technique, called best-choice, targeted for large-scale placement problems. Our best-choice clustering technique operates directly on a circuit hypergraph and repeatedly clusters the globally best pair of objects. Clustering score manipulation using a priority-queue data structure enables us to identify the best pair of objects whenever clustering is performed. To improve the runtime of priority-queue-based best-choice clustering, we propose a lazy-update technique for faster updates of clustering score with almost no loss of solution quality. We also discuss a number of effective methods for clustering score calculation, balancing cluster sizes, and handling of fixed blocks. The effectiveness of our best-choice clustering methodology is demonstrated by extensive comparisons against other standard clustering techniques such as Edge-Coarsening [12] and First-Choice [13]. All clustering methods are implemented within an industrial placer CPLACE [1] and tested on several industrial benchmarks in a semi-persistent clustering context.
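
A minimal Python sketch of priority-queue clustering with lazy updates, in the spirit of the description above. The details are assumptions rather than the paper's implementation: every net has unit weight, the score of a pair is the sum of 1/|net| over shared nets divided by the merged area, and fixed blocks and cluster-size balancing are ignored. The function name and the `("cluster", k)` ids are hypothetical.

```python
import heapq
from collections import defaultdict


def best_choice_cluster(nets, areas, target_count):
    """Merge objects bottom-up until at most target_count clusters remain.

    nets:  iterable of sets of object ids (a hypergraph, unit net weights)
    areas: dict mapping object id -> area
    Returns a dict mapping each surviving id to the list of original objects.
    """
    nets = [set(n) for n in nets]
    areas = dict(areas)
    members = {u: [u] for u in areas}
    nets_of = defaultdict(set)
    for idx, net in enumerate(nets):
        for u in net:
            nets_of[u].add(idx)

    def best_neighbor(u):
        # Assumed score: sum of 1/|net| over shared nets, over the merged area.
        conn = defaultdict(float)
        for idx in nets_of[u]:
            if len(nets[idx]) >= 2:
                for v in nets[idx] - {u}:
                    conn[v] += 1.0 / len(nets[idx])
        if not conn:
            return None
        v = max(conn, key=lambda x: conn[x] / (areas[u] + areas[x]))
        return conn[v] / (areas[u] + areas[v]), v

    heap, stale, counter = [], set(), 0

    def push(u):
        nonlocal counter
        found = best_neighbor(u)
        if found is not None:
            score, v = found
            # The counter breaks ties so object ids are never compared.
            heapq.heappush(heap, (-score, counter, u, v))
            counter += 1

    for u in list(areas):
        push(u)

    next_id = 0
    while heap and len(members) > target_count:
        _, _, u, v = heapq.heappop(heap)
        if u not in members:
            continue                  # u was merged away; drop its old entry
        if u in stale or v not in members:
            stale.discard(u)
            push(u)                   # lazy update: rescore only when u reaches the top
            continue
        new = ("cluster", next_id)
        next_id += 1
        members[new] = members.pop(u) + members.pop(v)
        areas[new] = areas.pop(u) + areas.pop(v)
        for idx in nets_of.pop(u, set()) | nets_of.pop(v, set()):
            nets[idx] -= {u, v}
            nets[idx].add(new)
            nets_of[new].add(idx)
            stale.update(nets[idx] - {new})   # neighbors are marked, not rescored
        push(new)
    return members
```

The lazy part is that neighbors of a new cluster are only flagged as stale; their scores are recomputed if and when they surface at the top of the queue, which avoids rescoring objects that never become the best pair. A stale entry whose true score has increased may be picked later than it should be, which is one way a small loss in quality can arise.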

design, automation, and test in europe | 2014

ABACUS: A technique for automated behavioral synthesis of approximate computing circuits

Kumud Nepal; Yueting Li; R. Iris Bahar; Sherief Reda

Many classes of applications, especially in the domains of signal and image processing, computer graphics, computer vision, and machine learning, are inherently tolerant to inaccuracies in their underlying computations. This tolerance can be exploited to design approximate circuits that perform within acceptable accuracies but have much lower power consumption and smaller area footprints (and often better run times) than their exact counterparts. In this paper, we propose a new class of automated synthesis methods for generating approximate circuits directly from behavioral-level descriptions. In contrast to previous methods that operate at the Boolean level or use custom modifications, our automated behavioral synthesis method enables a wider range of possible approximations and can operate on arbitrary designs. Our method first creates an abstract synthesis tree (AST) from the input behavioral description, and then applies variant operators to the AST using an iterative stochastic greedy approach to identify the optimal inexact designs in an efficient way. Our method is able to identify the optimal designs that represent the Pareto frontier trade-off between accuracy and power consumption. Our methodology is developed into a tool we call ABACUS, which we integrate with a standard ASIC experimental flow based on industrial tools. We validate our methods on three realistic Verilog-based benchmarks from three different domains - signal processing, computer vision and machine learning. Our tool automatically discovers optimal designs, providing area and power savings of up to 50% while maintaining good accuracy.

design automation conference | 2010

Consistent runtime thermal prediction and control through workload phase detection

Ryan Cochran; Sherief Reda

Elevated temperatures impact the performance, power consumption, and reliability of processors, which rely on integrated thermal sensors to measure runtime thermal behavior. These thermal measurements are typically inputs to a dynamic thermal management system that controls the operating parameters of the processor and cooling system. The ability to predict future thermal behavior allows a thermal management system to optimize a processor's operation so as to prevent the onset of high temperatures. In this paper we propose a new thermal prediction method that leads to consistent results between the thermal models used in prediction and observed thermal sensor measurements, and is capable of accurately predicting temperature behavior with heterogeneous workload assignment on a multicore platform. We devise an off-line analysis algorithm that learns a set of thermal models as a function of operating frequency and globally defined workload phases. We incorporate these thermal models into a dynamic voltage and frequency scaling (DVFS) technique that limits the maximum temperature during runtime. We demonstrate the effectiveness of our proposed system in predicting the thermal behavior of a real quad-core processor in response to different workloads. In comparison to a reactive thermal management technique, our predictive method dramatically reduces the number of thermal violations, the magnitude of thermal cycles, and workload runtimes.

design automation conference | 2010

Thermal monitoring of real processors: techniques for sensor allocation and full characterization

Abdullah Nazma Nowroz; Ryan Cochran; Sherief Reda

The increased power densities of multi-core processors and the variations within and across workloads lead to runtime thermal hot spots, the locations of which change across time and space. Thermal hot spots increase leakage, deteriorate timing, and reduce the mean time to failure. To manage runtime thermal variations, circuit designers embed within-die thermal sensors that acquire temperatures at a few selected locations. The acquired temperatures are then used to guide runtime thermal management techniques. The capabilities of these techniques are essentially bounded by the spatial thermal resolution of the sensor measurements. In this paper we characterize temperature signals of real processors and demonstrate that on-chip thermal gradients lead to sparse signals in the frequency domain. We exploit this observation to (1) devise thermal sensor allocation techniques, and (2) devise signal reconstruction techniques that fully characterize the thermal status of the processor using the limited number of measurements from the thermal sensors. To verify the accuracy of our methods, we compare our temperature characterization results against thermal measurements acquired from a state-of-the-art infrared camera that captures the mid-band infrared emissions from the back of the die of a 45 nm dual-core processor. Our results show that our techniques are capable of accurately characterizing the temperatures of real processors.

international symposium on physical design | 2005

APlace: a general analytic placement framework

Andrew B. Kahng; Sherief Reda; Qinke Wang

We streamline and extend APlace, the general analytic placement engine based on ideas of Naylor et al. [7] and described in [3, 4, 5]. Previous work explored the adaptability of APlace to multiple contexts with good quality of results. For example, the framework was extended to traditional wirelength-driven standard-cell placement in [3, 5], achieving good results in placed HPWL and routed final wire-length. The framework was also extended to top-down multilevel placement, congestion-directed placement, mixed-size placement, timing-driven placement, I/O-core co-placement and constraint handling for mixed-signal contexts [3, 4, 5]. In this work, we have modified the implementation of APlace for speed and scalability. Improvements have been made in clustering, legalization and detailed placement strategies, as well as via a distributable solution framework for both global and detailed placement phases.

design automation conference | 2009

Spectral techniques for high-resolution thermal characterization with limited sensor data

Ryan Cochran; Sherief Reda

Elevated chip temperatures are true limiters to the scalability of computing systems. Excessive runtime thermal variations compromise the performance and reliability of integrated circuits. To address these thermal issues, state-of-the-art chips have integrated thermal sensors that monitor temperatures at a few selected die locations. These temperature measurements are then used by thermal management techniques to appropriately manage chip performance. Thermal sensors and their support circuitry incur design overheads, die area, and manufacturing costs. In this paper, we propose a new direction for full thermal characterization of integrated circuits based on spectral Fourier analysis techniques. Application of these techniques to temperature sensing is based on the observation that die temperature is simply a space-varying signal, and that space-varying signals are treated identically to time-varying signals in signal analysis. We utilize Nyquist-Shannon sampling theory to devise methods that can almost fully reconstruct the thermal status of an integrated circuit during runtime using a minimal number of thermal sensors. We propose methods that can handle uniform and nonuniform thermal sensor placements. We develop an extensive experimental setup and demonstrate the effectiveness of our methods by thermally characterizing a 16-core processor. Our method produces full thermal characterization with an average absolute error of 0.6% using a limited number of sensors.
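
The sampling idea can be sketched in one dimension with plain Python. This is a simplified illustration under strong assumptions, not the paper's method: the temperature profile is a made-up periodic, band-limited function of position x in [0, 1), the sensors are uniformly spaced, and the sensor count is odd. The `temperature`, `spectrum`, and `reconstruct` names are hypothetical.

```python
import cmath
import math


def temperature(x):
    """Made-up band-limited die temperature profile (highest harmonic: 2)."""
    return 60.0 + 5.0 * math.cos(2 * math.pi * x) + 2.0 * math.sin(4 * math.pi * x)


def spectrum(samples):
    """Normalized DFT of uniformly spaced sensor readings."""
    m = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * n / m) for n, s in enumerate(samples)) / m
        for k in range(m)
    ]


def reconstruct(coeffs, x):
    """Evaluate the band-limited interpolant at any position x in [0, 1).

    Assumes an odd number of coefficients so the signed frequencies are symmetric.
    """
    m = len(coeffs)
    total = 0j
    for k, c in enumerate(coeffs):
        freq = k if k <= m // 2 else k - m   # map DFT bin to a signed frequency
        total += c * cmath.exp(2j * math.pi * freq * x)
    return total.real


def max_error(num_sensors, num_probes=200):
    """Worst-case reconstruction error over a dense grid of probe positions."""
    readings = [temperature(n / num_sensors) for n in range(num_sensors)]
    coeffs = spectrum(readings)
    return max(
        abs(reconstruct(coeffs, i / num_probes) - temperature(i / num_probes))
        for i in range(num_probes)
    )
```

Because the highest harmonic in this toy profile is 2, five uniformly spaced sensors satisfy the sampling condition and `max_error(5)` is at floating-point rounding level, whereas three sensors alias the second harmonic and `max_error(3)` is several degrees.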

design, automation, and test in europe | 2009

Analyzing the impact of process variations on parametric measurements: novel models and applications

Sherief Reda; Sani R. Nassif

In this paper we propose a novel statistical framework to model the impact of process variations on semiconductor circuits through the use of process-sensitive test structures. Based on multivariate statistical assumptions, we propose the use of the expectation-maximization algorithm to estimate any missing test measurements and to calculate accurately the statistical parameters of the underlying multivariate distribution. We also propose novel techniques to validate our statistical assumptions and to identify any outliers in the measurements. Using the proposed model, we analyze the impact of the systematic and random sources of process variations to reveal their spatial structures. We utilize the proposed model to develop a novel application that significantly reduces the volume, time, and costs of the parametric test measurements procedure without compromising its accuracy. We extensively verify our models and results on measurements collected from more than 300 wafers and over 25 thousand die fabricated at a state-of-the-art facility. We prove the accuracy of our proposed statistical model and demonstrate its applicability towards reducing the volume and time of parametric test measurements by about 2.5 to 6.1 times with absolutely no impact on test quality.
