Publications


Featured research published by Dara Kusic.


international conference on autonomic computing | 2008

Power and Performance Management of Virtualized Computing Environments Via Lookahead Control

Dara Kusic; Jeffrey O. Kephart; James E. Hanson; Nagarajan Kandasamy; Guofei Jiang

There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher server utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a server cluster managed by the controller conserves, on average, 26% of the power required by a system without dynamic control while still maintaining QoS goals.
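To make the lookahead idea concrete, here is a minimal sketch in C of a limited-lookahead provisioning decision: the controller enumerates candidate numbers of active servers over a short horizon and picks the first move of the cheapest sequence, trading off power cost, a QoS penalty, and the cost of switching machines on or off. All constants, the cost model, and the forecast are invented for illustration; this is not the authors' controller or the Trade6 testbed.

/* Hypothetical limited-lookahead provisioning sketch (not the paper's code).
 * Enumerates candidate active-server counts over a short horizon and picks
 * the first move of the cheapest sequence. All constants are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MAX_SERVERS 8
#define HORIZON     2      /* lookahead depth */

static double power_cost(int servers)  { return 150.0 * servers; }

static double qos_penalty(int servers, double load) {
    double capacity = 100.0 * servers;          /* requests/s each server handles */
    return load > capacity ? 5.0 * (load - capacity) : 0.0;
}

static double switch_cost(int from, int to) { return 40.0 * abs(to - from); }

/* Recursively score the cheapest sequence of decisions over the remaining horizon. */
static double best_cost(int current, int depth, const double *forecast) {
    if (depth == HORIZON) return 0.0;
    double best = INFINITY;
    for (int next = 1; next <= MAX_SERVERS; next++) {
        double c = power_cost(next) + qos_penalty(next, forecast[depth])
                 + switch_cost(current, next)
                 + best_cost(next, depth + 1, forecast);
        if (c < best) best = c;
    }
    return best;
}

int main(void) {
    double forecast[HORIZON] = {320.0, 540.0};  /* predicted workload, requests/s */
    int current = 2, choice = current;
    double best = INFINITY;
    for (int next = 1; next <= MAX_SERVERS; next++) {
        double c = power_cost(next) + qos_penalty(next, forecast[0])
                 + switch_cost(current, next)
                 + best_cost(next, 1, forecast);
        if (c < best) { best = c; choice = next; }
    }
    printf("provision %d servers (predicted cost %.1f)\n", choice, best);
    return 0;
}

Only the first decision of the winning sequence is applied; at the next control step the forecast is refreshed and the search repeats, which is the receding-horizon pattern the abstract refers to.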


international conference on autonomic computing | 2006

Risk-Aware Limited Lookahead Control for Dynamic Resource Provisioning in Enterprise Computing Systems

Dara Kusic; Nagarajan Kandasamy

Utility or on-demand computing, a provisioning model where a service provider makes computing infrastructure available to customers as needed, is becoming increasingly common in enterprise computing systems. Realizing this model requires making dynamic, and sometimes risky, resource provisioning and allocation decisions in an uncertain operating environment to maximize revenue while reducing operating cost. This paper develops an optimization framework wherein the resource provisioning problem is posed as one of sequential decision making under uncertainty and solved using a limited lookahead control scheme. The proposed approach accounts for the switching costs incurred during resource provisioning and explicitly encodes risk in the optimization problem. Simulations using workload traces from the Soccer World Cup 1998 web site show that a computing system managed by our controller generates up to 20% more revenue than a system without dynamic control while incurring low control overhead.
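One hedged illustration of how risk can be encoded in such an objective (the scenario model, revenue function, and constants below are invented for the sketch, not taken from the paper) is a mean-variance style score: a provisioning choice is penalized when its revenue varies widely across the forecast workload scenarios.

/* Illustrative risk-aware scoring of a provisioning decision (sketch only).
 * Revenue is evaluated over several workload scenarios; the score is the
 * expected revenue minus a risk-preference weight times its variance. */
#include <stdio.h>

#define SCENARIOS 3

static double revenue(int servers, double load) {
    double served = load < 100.0 * servers ? load : 100.0 * servers;
    return 0.05 * served          /* income per served request */
         - 1.5  * servers;        /* operating cost per server */
}

static double risk_aware_score(int servers, const double load[SCENARIOS],
                               const double prob[SCENARIOS], double risk_weight) {
    double mean = 0.0, var = 0.0;
    for (int s = 0; s < SCENARIOS; s++)
        mean += prob[s] * revenue(servers, load[s]);
    for (int s = 0; s < SCENARIOS; s++) {
        double d = revenue(servers, load[s]) - mean;
        var += prob[s] * d * d;
    }
    return mean - risk_weight * var;   /* risk-averse when risk_weight > 0 */
}

int main(void) {
    double load[SCENARIOS] = {200.0, 400.0, 900.0};   /* hypothetical forecasts */
    double prob[SCENARIOS] = {0.3, 0.5, 0.2};
    for (int servers = 2; servers <= 8; servers += 2)
        printf("%d servers -> score %.2f\n", servers,
               risk_aware_score(servers, load, prob, 0.05));
    return 0;
}

A larger risk weight steers the controller toward conservative configurations whose revenue is stable across scenarios, which is the intuition behind "risk-aware" provisioning.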


field programmable gate arrays | 2005

An FPGA-based VLIW processor with custom hardware execution

Raymond R. Hoare; Dara Kusic; Joshua Fazekas; John Foster

The capability and heterogeneity of new FPGA (Field Programmable Gate Array) devices continue to increase with each new line of devices, and programming them efficiently is becoming increasingly difficult. However, FPGAs continue to be utilized for algorithms traditionally targeted to embedded DSP microprocessors, such as signal and image processing applications. This paper presents an architecture that combines VLIW (Very Long Instruction Word) processing with the capability to introduce application-specific customized instructions and complex hardware functions. To support this architecture, a compilation and design automation flow is described for programs written in C. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. We show that our combined VLIW with hardware functions achieves as much as a 230X speedup, and 63X on average, for computational kernels across a set of benchmarks. This allows for an overall speedup of 30X, and 12X on average, for signal processing benchmarks from MediaBench.
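For context, the kind of C kernel such a flow targets is a tight multiply-accumulate loop like the FIR filter below. This is a generic example, not one of the paper's benchmarks; the inner loop is the sort of region that a hardware-function extraction flow could map onto the FPGA's DSP blocks.

/* Generic FIR filter kernel in C: the multiply-accumulate inner loop is the
 * kind of region a hardware-function extraction flow would map onto the
 * FPGA's DSP blocks. Example only; not taken from the paper's benchmarks. */
#define TAPS 16

void fir(const short *in, short *out, const short coeff[TAPS], int n) {
    for (int i = TAPS - 1; i < n; i++) {
        int acc = 0;
        for (int t = 0; t < TAPS; t++)
            acc += coeff[t] * in[i - t];   /* multiply-accumulate */
        out[i] = (short)(acc >> 15);       /* rescale fixed-point result */
    }
}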


EURASIP Journal on Advances in Signal Processing | 2006

Rapid VLIW processor customization for signal processing applications using combinational hardware functions

Raymond R. Hoare; Dara Kusic; Joshua Fazekas; John Foster; Shen Chih Tung; Michael L. McCloud

This paper presents an architecture that combines VLIW (very long instruction word) processing with the capability to introduce application-specific customized instructions and highly parallel combinational hardware functions for the acceleration of signal processing applications. To support this architecture, a compilation and design automation flow is described for algorithms written in C. The key contributions of this paper are as follows: (1) a 4-way VLIW processor implemented in an FPGA, (2) large speedups through hardware functions, (3) a hardware/software interface with zero overhead, (4) a design methodology for implementing signal processing applications on this architecture, and (5) tractable design automation techniques for extracting and synthesizing hardware functions. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA. The Stratix II device was selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. Using the MediaBench benchmark suite, we tested our methodology and architecture to accelerate software. Our combined VLIW processor with hardware functions was compared to software executing on a RISC processor, specifically the soft-core embedded NIOS II processor. For software kernels converted into hardware functions, we show a hardware performance multiplier of up to 230 times that of software, with an average of 63 times faster. For the entire application, in which only a portion of the software is converted to hardware, the performance improvement is as much as 30X over the nonaccelerated application, with a 12X improvement on average.


Cluster Computing | 2007

Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems

Dara Kusic; Nagarajan Kandasamy

Utility or on-demand computing, a provisioning model where a service provider makes computing infrastructure available to customers as needed, is becoming increasingly common in enterprise computing systems. Realizing this model requires making dynamic, and sometimes risky, resource provisioning and allocation decisions in an uncertain operating environment to maximize revenue while reducing operating cost. This paper develops an optimization framework wherein the resource provisioning problem is posed as one of sequential decision making under uncertainty and solved using a limited lookahead control scheme. The proposed approach accounts for the switching costs incurred during resource provisioning and explicitly encodes risk in the optimization problem. Simulations using workload traces from the Soccer World Cup 1998 web site show that a computing system managed by our controller generates up to 20% more profit than a system without dynamic control while incurring low control overhead.


ACM Transactions on Embedded Computing Systems | 2006

Reducing power while increasing performance with SuperCISC

Raymond R. Hoare; Dara Kusic; Gayatri Mehta; Joshua Fazekas; John Foster

Multiprocessor Systems-on-Chips (MPSoCs) have become a popular architectural technique to increase performance. However, MPSoCs may lead to undesirable power consumption characteristics for computing systems that have strict power budgets, such as PDAs, mobile phones, and notebook computers. This paper presents the super-complex instruction-set computing (SuperCISC) embedded processor architecture and, in particular, investigates the performance and power consumption of this device compared to traditional processor-architecture-based execution. SuperCISC is a heterogeneous, multicore processor architecture designed to exceed the performance of traditional embedded processors while maintaining a reduced power budget compared to low-power embedded processors. At the heart of the SuperCISC processor is a multicore VLIW (Very Long Instruction Word) containing several homogeneous execution cores/functional units. In addition, complex and heterogeneous combinational hardware function cores are tightly integrated with the core VLIW engine, providing an opportunity for improved performance and reduced energy consumption. Our SuperCISC processor core has been synthesized for both a 90-nm Stratix II Field Programmable Gate Array (FPGA) and a 160-nm standard-cell Application-Specific Integrated Circuit (ASIC) fabrication process from OKI, each operating at approximately 167 MHz for the VLIW core. We examine several reasons for the speedup and power improvement achieved by the SuperCISC architecture, including predicated control flow, cycle compression, and a reduction in arithmetic power consumption, which we call power compression. Finally, testing our SuperCISC processor with multimedia and signal-processing benchmarks, we show how the SuperCISC processor can provide performance improvements ranging from 7X to 160X, with an average of 60X, while also providing orders-of-magnitude power improvements for the computational kernels. The power improvements for our benchmark kernels range from just over 40X to over 400X, with an average savings exceeding 130X. By combining these power and performance improvements, our total energy improvements all exceed 1000X. As these savings are limited to the computational kernels of the applications, which often consume approximately 90% of the execution time, we expect our savings to approach the ideal application improvement of 10X.
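As a rough illustration of the predicated control flow the abstract mentions (a generic if-conversion example, not drawn from the paper), a data-dependent branch can be rewritten as straight-line code whose result is selected by predicates, which is the form that maps naturally onto a combinational hardware function.

/* If-conversion sketch: the branchy saturation step becomes a predicated,
 * branch-free form that a combinational datapath can evaluate in one pass.
 * Generic example, not the paper's code. */

/* Branchy version: control flow depends on the data. */
int saturate_branchy(int x, int lo, int hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Predicated version: both comparisons are always computed and the result
 * is chosen by selects, so there are no data-dependent jumps to stall on. */
int saturate_predicated(int x, int lo, int hi) {
    int below = x < lo;                    /* predicate bits */
    int above = x > hi;
    return below ? lo : (above ? hi : x);  /* typically lowers to conditional selects */
}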


IEEE Transactions on Network and Service Management | 2011

Combined Power and Performance Management of Virtualized Computing Environments Serving Session-Based Workloads

Dara Kusic; Nagarajan Kandasamy; Guofei Jiang

This paper develops an online resource provisioning framework for combined power and performance management in a virtualized computing environment serving session-based workloads. We pose this management problem as one of sequential optimization under uncertainty and solve it using limited lookahead control (LLC), a form of model-predictive control. The approach accounts for the switching costs incurred when provisioning virtual machines and explicitly encodes the risk of provisioning resources in an uncertain and dynamic operating environment. We experimentally validate the control framework on a server cluster supporting three online services. When managed using LLC, our cluster setup saves, on average, 41% in power-consumption costs over a twenty-four hour period when compared to a system operating without dynamic control. Finally, we use trace-based simulations to analyze LLC performance on server clusters larger than our testbed and show how concepts from approximation theory can be used to further reduce the computational burden of controlling large systems.


international conference on autonomic computing | 2007

Approximation Modeling for the Online Performance Management of Distributed Computing Systems

Dara Kusic; Nagarajan Kandasamy; Guofei Jiang

This paper develops a hierarchical control framework to solve performance management problems in distributed computing systems. To reduce the control overhead, concepts from approximation theory are used in the construction of the dynamical models that predict system behavior, and in the solution of the associated control equations themselves. Using a dynamic resource provisioning problem as a case study, we show that a computing system managed by the proposed control framework using approximation models realizes profit gains that are, in the best case, within 1% of a controller using an exact parametric model of the system.
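One way to picture the approximation idea (a toy sketch with made-up numbers, not the paper's actual models) is to replace a detailed parametric performance model with a small interpolation table fitted offline, so the controller can evaluate candidate decisions cheaply at run time.

/* Toy approximation model: predicted response time as a piecewise-linear
 * interpolation over a small table of (utilization, response time) samples,
 * standing in for a more expensive parametric/queueing model. Illustrative
 * numbers only. */
#include <stdio.h>

#define POINTS 5

static const double util[POINTS] = {0.1, 0.3, 0.5, 0.7, 0.9};
static const double resp[POINTS] = {12.0, 15.0, 22.0, 40.0, 140.0}; /* ms */

static double approx_response_time(double u) {
    if (u <= util[0]) return resp[0];
    if (u >= util[POINTS - 1]) return resp[POINTS - 1];
    for (int i = 1; i < POINTS; i++) {
        if (u <= util[i]) {
            double w = (u - util[i - 1]) / (util[i] - util[i - 1]);
            return resp[i - 1] + w * (resp[i] - resp[i - 1]);
        }
    }
    return resp[POINTS - 1];  /* not reached */
}

int main(void) {
    for (double u = 0.2; u < 1.0; u += 0.2)
        printf("utilization %.1f -> ~%.1f ms\n", u, approx_response_time(u));
    return 0;
}

The controller then queries this cheap surrogate inside its optimization loop; the paper's result is that the profit lost by using such approximations instead of the exact parametric model can be kept within about 1% in the best case.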


IEEE Transactions on Circuits and Systems II: Express Briefs | 2006

A VLIW Processor With Hardware Functions: Increasing Performance While Reducing Power

Raymond R. Hoare; Dara Kusic; Justin Stander; Gayatri Mehta; Joshua Fazekas

This brief presents a heterogeneous multicore embedded processor architecture designed to exceed the performance of traditional embedded processors while reducing the power consumed compared to low-power embedded processors. At the heart of this architecture is a multicore very long instruction word (VLIW) processor containing homogeneous execution cores/functional units. Additionally, heterogeneous combinational hardware function cores are tightly integrated with the VLIW core, providing an opportunity for improved performance and reduced energy consumption. Our processor has been synthesized for both a 90-nm Stratix II field-programmable gate array and a 160-nm cell-based application-specific integrated circuit from OKI, each operating at a core frequency of 167 MHz. For selected multimedia and signal processing benchmarks, we show how this processor provides kernel performance improvements averaging 179X over an Intel StrongARM and 36X over an Intel XScale, leading to application speedups averaging 30X over the StrongARM and 10X over the XScale.


international conference on electronics circuits and systems | 2004

A 64-way VLIW/SIMD FPGA architecture and design flow

Raymond R. Hoare; Ivan S. Kourtev; Joshua Fazekas; Dara Kusic; John Foster; Sedric Boddie; Ahmed Muaydh

Current FPGA architectures are heterogeneous, containing tens of thousands of logic elements and hundreds of embedded multipliers and memory units. Efficiently utilizing these resources, however, requires hardware designers and complex computer-aided design tools. The paper describes several multiprocessor architectures implemented on an FPGA, including a 64-way single instruction, multiple data (SIMD) architecture and a variable-size very long instruction word (VLIW) architecture. The design and synthesis of the target architectures are presented and compared for scalability and achievable parallelism. The performance and chip utilization of a shared register file are examined for different numbers of VLIW processing elements. The associated compilation flow, based on the Trimaran VLIW compiler, produces explicitly parallel instructions from C code. Benchmarks from the MediaBench suite are used to test the parallel performance of both the software and hardware components.

Collaboration


Dive into Dara Kusic's collaborations.

Top Co-Authors

Joshua Fazekas
University of Pittsburgh

John Foster
University of Pittsburgh

Gayatri Mehta
University of North Texas