Margaret Martonosi | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Margaret Martonosi is active.

Explore More

Publication

Featured researches published by Margaret Martonosi.

international symposium on computer architecture | 2000

Wattch: a framework for architectural-level power analysis and optimizations

David M. Brooks; Vivek Tiwari; Margaret Martonosi

Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities. This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level. Wattch is 1000X or more faster than existing layout-level power tools, and yet maintains accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. This paper presents several validations of Wattchs accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process. We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

high performance computer architecture | 2001

Dynamic thermal management for high-performance microprocessors

David M. Brooks; Margaret Martonosi

With the increasing clock rate and transistor count of todays microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chips rated maximum power dissipation. However system designers still must design thermal heat sinks to withstand the worse-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma and we consider the effects of hardware and software implementations. With approximate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

international symposium on computer architecture | 2001

Cache decay: exploiting generational behavior to reduce cache leakage power

Stefanos Kaxiras; Zhigang Hu; Margaret Martonosi

Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakages proportion of total chip power will increase significantly. This paper examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chips area and transistor counts, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of “dead time” before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce LI cache leakage energy by 4x in SPEC2000 applications without impacting performance. Because our decay-based techniques have notions of competitive on-line algorithms at their roots, their energy usage can be theoretically bounded at within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce LI cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.

international symposium on microarchitecture | 2006

An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget

Canturk Isci; Alper Buyuktosunoglu; C.-Y. Chen; Pradip Bose; Margaret Martonosi

Chip-level power and thermal implications will continue to rule as one of the primary design constraints and performance limiters. The gap between average and peak power actually widens with increased levels of core integration. As such, if per-core control of power levels (modes) is possible, a global power manager should be able to dynamically set the modes suitably. This would be done in tune with the workload characteristics, in order to always maintain a chip-level power that is below the specified budget. Furthermore, this should be possible without significant degradation of chip-level throughput performance. We analyze and validate this concept in detail in this paper. We assume a per-core DVFS (dynamic voltage and frequency scaling) knob to be available to such a conceptual global power manager. We evaluate several different policies for global multi-core power management. In this analysis, we consider various different objectives such as prioritization and optimized throughput. Overall, our results show that in the context of a workload comprised of SPEC benchmark threads, our best architected policies can come within 1% of the performance of an ideal oracle, while meeting a given chip-level power budget. Furthermore, we show that these global dynamic management policies perform significantly better than static management, even if static scheduling is given oracular knowledge

international symposium on microarchitecture | 2003

Runtime power monitoring in high-end processors: methodology and empirical data

Canturk Isci; Margaret Martonosi

With power dissipation becoming an increasingly vexing problem across many classes of computer systems, measuring power dissipation of real, running systems has become crucial for hardware and software system research and design. Live power measurements are imperative for studies requiring execution times too long for simulation, such as thermal analysis. Furthermore, as processors become more complex and include a host of aggressive dynamic power management techniques, per-component estimates of power dissipation have become both more challenging as well as more important. In this paper we describe our technique for a coordinated measurement approach that combines real total power measurement with performance-counter-based, per-unit power estimation. The resulting tool offers live total power measurements for Intel Pentium 4 processors, and also provides power breakdowns for 22 of the major CPU subunits over minutes of SPEC2000 and desktop workload execution. As an example application, we use the generated component power breakdowns to identify program power phase behaviour. Overall, this paper demonstrates a processor power measurement and estimation methodology and also gives experiences and empirical application results that can provide a basis for future power-aware research.

international conference on embedded networked sensor systems | 2004

Hardware design experiences in ZebraNet

Pei Zhang; Christopher M. Sadler; S. A. Lyon; Margaret Martonosi

The enormous potential for wireless sensor networks to make a positive impact on our society has spawned a great deal of research on the topic, and this research is now producing environment-ready systems. Current technology limits coupled with widely-varying application requirements lead to a diversity of hardware platforms for different portions of the design space. In addition, the unique energy and reliability constraints of a system that must function for months at a time without human intervention mean that demands on sensor network hardware are different from the demands on standard integrated circuits. This paper describes our experiences designing sensor nodes and low level software to control them. In the ZebraNet system we use GPS technology to record fine-grained position data in order to track long term animal migrations [14]. The ZebraNet hardware is composed of a 16-bit TI microcontroller, 4 Mbits of off-chip flash memory, a 900 MHz radio, and a low-power GPS chip. In this paper, we discuss our techniques for devising efficient power supplies for sensor networks, methods of managing the energy consumption of the nodes, and methods of managing the peripheral devices including the radio, flash, and sensors. We conclude by evaluating the design of the ZebraNet nodes and discussing how it can be improved. Our lessons learned in developing this hardware can be useful both in designing future sensor nodes and in using them in real systems.

international symposium on computer architecture | 2006

Techniques for Multicore Thermal Management: Classification and New Exploration

James Donald; Margaret Martonosi

Power density continues to increase exponentially with each new technology generation, posing a major challenge for thermal management in modern processors. Much past work has examined microarchitectural policies for reducing total chip power, but these techniques alone are insufficient if not aimed at mitigating individual hotspots. The industrys current trend has been toward multicore architectures, which provide additional opportunities for dynamic thermal management. This paper explores various thermal management techniques that exploit the distributed nature of multicore processors. We classify these techniques in terms of core throttling policy, whether that policy is applied locally to a core or to the processor as a whole, and process migration policies. We use Turandot and a HotSpot-based thermal simulator to simulate a variety of workloads under thermal duress on a 4-core PowerPCTMprocessor. Using benchmarks from the SPEC 2000 suite we characterize workloads in terms of instruction throughput as well as their effective duty cycles. Among a variety of options we find that distributed controltheoretic DVFS alone improves throughput by 2.5X under our test conditions. Our final design involves a PI-based core thermal controller and an outer control loop to decide process migrations. This policy avoids all thermal emergencies and yields an average of 2.6X speedup over the baseline across all workloads.

acm special interest group on data communication | 2005

Erasure-coding based routing for opportunistic networks

Yong Wang; Sushant Jain; Margaret Martonosi; Kevin R. Fall

Routing in Delay Tolerant Networks (DTN) with unpredictable node mobility is a challenging problem because disconnections are prevalent and lack of knowledge about network dynamics hinders good decision making. Current approaches are primarily based on redundant transmissions. They have either high overhead due to excessive transmissions or long delays due to the possibility of making wrong choices when forwarding a few redundant copies. In this paper, we propose a novel forwarding algorithm based on the idea of erasure codes. Erasure coding allows use of a large number of relays while maintaining a constant overhead, which results in fewer cases of long delays.We use simulation to compare the routing performance of using erasure codes in DTN with four other categories of forwarding algorithms proposed in the literature. Our simulations are based on a real-world mobility trace collected in a large outdoor wild-life environment. The results show that the erasure-coding based algorithm provides the best worst-case delay performance with a fixed amount of overhead. We also present a simple analytical model to capture the delay characteristics of erasure-coding based forwarding, which provides insights on the potential of our approach.

acm sigplan symposium on principles and practice of parallel programming | 2003

Impala: a middleware system for managing autonomic, parallel sensor systems

Ting Liu; Margaret Martonosi

Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and in some cases, actuate control mechanisms in response. Like traditional parallel systems, communication between nodes is of fundamental importance, but is typically accomplished via wireless transceivers. One further key attribute of sensor networks is that they are almost always long running systems, intended to operate in situ, with minimal direct human intervention, for months or years. This requirement for long-running autonomy mandates careful design of the runtime system that manages applications on each node, to ensure reliability and ease of upgrades over the life of the system.This paper describes Impala, a middleware architecture that enables application modularity, adaptivity, and repair-ability in wireless sensor networks. Impala allows software updates to be received via the nodes wireless transceiver and to be applied to the running system dynamically. In addition, Impala also provides an interface for on-the-fly application adaptation in order to improve the performance, energy-efficiency, and reliability of the software system. Impala has been designed to be a part of the ZebraNet mobile sensor network, but we are also prototyping it within HP/Compaq iPAQ Pocket PC handhelds. We present performance data for both real system measurements of the Pocket PC version as well as simulations of a full mobile sensor system deployment. Overall, Impala is a lightweight runtime system that can greatly improve system reliability, performance, and energy-efficiency. The ideas introduced here for sensor networks have applicability more broadly in other long-running autonomous parallel systems as well.

international conference on embedded networked sensor systems | 2006

Data compression algorithms for energy-constrained devices in delay tolerant networks

Christopher M. Sadler; Margaret Martonosi

Sensor networks are fundamentally constrained by the difficulty and energy expense of delivering information from sensors to sink. Our work has focused on garnering additional significant energy improvements by devising computationally-efficient lossless compression algorithms on the source node. These reduce the amount of data that must be passed through the network and to the sink, and thus have energy benefits that are multiplicative with the number of hops the data travels through the network.Currently, if sensor system designers want to compress acquired data, they must either develop application-specific compression algorithms or use off-the-shelf algorithms not designed for resource-constrained sensor nodes. This paper discusses the design issues involved with implementing, adapting, and customizing compression algorithms specifically geared for sensor nodes. While developing Sensor LZW (S-LZW) and some simple, but effective, variations to this algorithm, we show how different amounts of compression can lead to energy savings on both the compressing node and throughout the network and that the savings depends heavily on the radio hardware. To validate and evaluate our work, we apply it to datasets from several different real-world deployments and show that our approaches can reduce energy consumption by up to a factor of 4.5X across the network.

Explore More