Publication


Featured research published by Wajahat Qadeer.


international symposium on computer architecture | 2013

Convolution engine: balancing efficiency & flexibility in specialized computing

Wajahat Qadeer; Rehan Hameed; Ofer Shacham; Preethi Venkatesan; Christos Kozyrakis; Mark Horowitz

This paper focuses on the trade-off between flexibility and efficiency in specialized computing. We observe that specialized units achieve most of their efficiency gains by tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the kernels. Hence, by identifying key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications. We present an example, the Convolution Engine (CE), specialized for the convolution-like data-flow that is common in computational photography, image processing, and video processing applications. CE achieves energy efficiency by capturing data reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We quantify the tradeoffs in efficiency and flexibility and demonstrate that CE is within a factor of 2-3x of the energy and area efficiency of custom units optimized for a single kernel. CE improves energy and area efficiency by 8-15x over a SIMD engine for most applications.
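
The abstract's point about "a large number of operations per memory access" can be illustrated with a minimal software sketch. This is not the Convolution Engine itself: it is a plain 1D convolution that keeps recent samples in a small shift-register-like window, so each input sample is fetched once and reused across several multiply-accumulates. The function name `conv1d_with_reuse` and the load/MAC counters are hypothetical, introduced only for illustration.

```python
def conv1d_with_reuse(signal, taps):
    """Causal 1D convolution, y[n] = sum_k taps[k] * signal[n - k],
    with zero padding before the first sample.

    Returns (outputs, loads, macs) so the ratio of arithmetic
    operations to memory fetches can be inspected.
    """
    window = [0] * len(taps)  # stands in for a hardware shift register
    loads = 0                 # samples fetched from "memory"
    macs = 0                  # multiply-accumulate operations performed
    outputs = []
    for sample in signal:
        loads += 1
        # Shift the newest sample in; older samples stay local and are reused.
        window = [sample] + window[:-1]
        acc = 0
        for value, coeff in zip(window, taps):
            acc += value * coeff
            macs += 1
        outputs.append(acc)
    return outputs, loads, macs


if __name__ == "__main__":
    out, loads, macs = conv1d_with_reuse([1, 2, 3, 4], [1, 0, -1])
    print(out, loads, macs)
```

With a 3-tap filter, every fetched sample feeds three multiply-accumulates, so the operations-per-load ratio grows with the filter size. This is the kind of data reuse the abstract describes, shown here in software only.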


IEEE Micro | 2010

Rethinking Digital Design: Why Design Must Change

Ofer Shacham; Omid Azizi; Megan Wachs; Wajahat Qadeer; Zain Asgar; Kyle Kelley; John P. Stevenson; Stephen Richardson; Mark Horowitz; Benjamin C. Lee; Alex Solomatnikov; Amin Firoozshahian

Because of technology scaling, power dissipation is today's major performance limiter. Moreover, the traditional way to achieve power efficiency, application-specific designs, is prohibitively expensive. These power and cost issues necessitate rethinking digital design. To reduce design costs, we need to stop building chip instances, and start making chip generators instead. Domain-specific chip generators are templates that codify designer knowledge and design trade-offs to create different application-optimized chips.


design automation conference | 2012

Avoiding game over: bringing design to the next level

Ofer Shacham; Megan Wachs; Andrew Danowitz; Sameh Galal; John S. Brunhaver; Wajahat Qadeer; Sabarish Sankaranarayanan; Artem Vassiliev; Stephen Richardson; Mark Horowitz

Technology scaling has created a catch-22: technology now can do almost anything we want, but the NRE design costs are so high that almost no one can afford to use it. Our current situation is reminiscent of the 1980s, when only a few companies could afford to produce custom silicon. Synthesis and placement and routing tools changed this, by providing modular tools with well-defined interfaces that codified designer knowledge about the physical design of chips. Now we need a new set of tools that can codify designer knowledge about how to construct software, hardware, and validation to again enable application designers to produce chips. Researchers are developing methodologies that allow users to create hardware constructors, or generators. These include Genesis 2, which extends SystemVerilog and enables the designer to encode hierarchical system construction procedurally. To demonstrate some of the capabilities that these languages and tools provide, we describe FPGen, a complete floating point generator written in Genesis 2, that also generates the needed validation collateral and hints for the backend processes.


Communications of The ACM | 2015

Convolution engine: balancing efficiency and flexibility in specialized computing

Wajahat Qadeer; Rehan Hameed; Ofer Shacham; Preethi Venkatesan; Christos Kozyrakis; Mark Horowitz

General-purpose processors, while tremendously versatile, pay a huge cost for their flexibility by wasting over 99% of the energy in programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the algorithms. Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications within that domain. We present the Convolution Engine (CE), a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer vision, and video processing. The CE achieves energy efficiency by capturing data-reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We demonstrate that the CE is within a factor of 2-3× of the energy and area efficiency of custom units optimized for a single kernel. The CE improves energy and area efficiency by 8-15× over data-parallel Single Instruction Multiple Data (SIMD) engines for most image processing applications.


conference on multimedia computing and networking | 2005

Managing heterogeneous wireless environments via Hotspot servers

Tajana Simunic; Wajahat Qadeer; Giovanni De Micheli

Wireless communication today supports heterogeneous wireless devices with a number of different wireless network interfaces (WNICs). A large fraction of communication is infrastructure based, so wireless access points and hotspot servers have become more ubiquitous. Battery lifetime is still a critical issue, with WNICs typically consuming a large fraction of the overall power budget in a mobile device. In this work we present a new technique for managing power consumption and QoS in diverse wireless environments using Hotspot servers. We introduce a resource manager module at both the Hotspot server and the client. The resource manager schedules communication bursts between it and each client. The schedulers decide what WNIC to employ for communication, when to communicate data and how to minimize power dissipation while maintaining an acceptable QoS based on the application needs. We present two new scheduling policies derived from the well-known earliest deadline first (EDF) and rate monotonic (RM) [26] algorithms. The resource manager and the schedulers have been implemented in HP's Hotspot server [14]. Our measurement and simulation results show a significant improvement in power dissipation and QoS of Bluetooth and 802.11b for applications such as MP3, MPEG4, WWW, and email.
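
For readers unfamiliar with earliest deadline first, the following is a minimal sketch of the textbook preemptive EDF rule that the abstract says the new policies are derived from. It is not the paper's scheduler: it ignores WNIC selection, power, and QoS entirely. The function name `edf_schedule`, the job tuple layout, and the unit-length time slots are assumptions made for illustration, and job names are assumed to be unique.

```python
import heapq


def edf_schedule(jobs):
    """Simulate preemptive EDF on one resource with unit-length time slots.

    jobs: list of (name, release, deadline, duration) with integer times
          and duration >= 1.
    Returns the job name run in each slot (None when the resource is idle).
    """
    pending = sorted(jobs, key=lambda job: job[1])  # ordered by release time
    ready = []     # min-heap of [deadline, name, remaining_slots]
    timeline = []
    t = 0
    i = 0
    while i < len(pending) or ready:
        # Admit every job that has been released by time t.
        while i < len(pending) and pending[i][1] <= t:
            name, _release, deadline, duration = pending[i]
            heapq.heappush(ready, [deadline, name, duration])
            i += 1
        if ready:
            job = ready[0]            # the earliest deadline runs next
            timeline.append(job[1])
            job[2] -= 1
            if job[2] == 0:
                heapq.heappop(ready)  # job finished
        else:
            timeline.append(None)
        t += 1
    return timeline


if __name__ == "__main__":
    demo = [("mp3", 0, 4, 2), ("email", 0, 10, 3), ("video", 1, 3, 1)]
    print(edf_schedule(demo))
```

In the demo, the hypothetical "video" job arrives at time 1 with the tightest deadline and preempts "mp3", which then resumes before "email" runs. This is the behavior that makes EDF a natural starting point for deadline-driven communication bursts.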


international symposium on microarchitecture | 2009

Using a configurable processor generator for computer architecture prototyping

Alex Solomatnikov; Amin Firoozshahian; Ofer Shacham; Zain Asgar; Megan Wachs; Wajahat Qadeer; Stephen Richardson; Mark Horowitz

Building hardware prototypes for computer architecture research is challenging. Unfortunately, development of the required software tools (compilers, debuggers, runtime) is even more challenging, which means these systems rarely run real applications. To overcome this issue, when developing our prototype platform, we used the Tensilica processor generator to produce a customized processor and corresponding software tools and libraries. While this base processor was very different from the streamlined custom processor we initially imagined, it allowed us to focus on our main objective - the design of a reconfigurable CMP memory system - and to successfully tape out an 8-core CMP chip with only a small group of designers. One person was able to handle processor configuration and hardware generation, support of a complete software tool chain, as well as developing the custom runtime software to support three different programming models. Having a sophisticated software tool chain not only allowed us to run more applications on our machine, it once again pointed out the need to use optimized code to get an accurate evaluation of architectural features.


international symposium on computer architecture | 2010

Understanding sources of inefficiency in general-purpose chips

Rehan Hameed; Wajahat Qadeer; Megan Wachs; Omid Azizi; Alex Solomatnikov; Benjamin C. Lee; Stephen Richardson; Christos Kozyrakis; Mark Horowitz


Communications of The ACM | 2011

Understanding sources of inefficiency in general-purpose chips

Rehan Hameed; Wajahat Qadeer; Megan Wachs; Omid Azizi; Alex Solomatnikov; Benjamin C. Lee; Stephen Richardson; Christos Kozyrakis; Mark Horowitz


design automation conference | 2007

Chip multi-processor generator

Alex Solomatnikov; Amin Firoozshahian; Wajahat Qadeer; Ofer Shacham; Kyle Kelley; Zain Asgar; Megan Wachs; Rehan Hameed; Mark Horowitz


Archive | 2014

Low power programmable image processor

Rehan Hameed; Wajahat Qadeer; Christoforos E. Kozyrakis; Mark Horowitz
