David Meisner
University of Michigan
                                 Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by David Meisner.
architectural support for programming languages and operating systems | 2009
David Meisner; Brian T. Gold; Thomas F. Wenisch
Data center power consumption is growing to unprecedented levels: the EPA estimates U.S. data centers will consume 100 billion kilowatt hours annually by 2011. Much of this energy is wasted in idle systems: in typical deployments, server utilization is below 30%, but idle servers still consume 60% of their peak power draw. Typical idle periods though frequent--last seconds or less, confounding simple energy-conservation approaches. In this paper, we propose PowerNap, an energy-conservation approach where the entire system transitions rapidly between a high-performance active state and a near-zero-power idle state in response to instantaneous load. Rather than requiring fine-grained power-performance states and complex load-proportional operation from each system component, PowerNap instead calls for minimizing idle power and transition time, which are simpler optimization goals. Based on the PowerNap concept, we develop requirements and outline mechanisms to eliminate idle power waste in enterprise blade servers. Because PowerNap operates in low-efficiency regions of current blade center power supplies, we introduce the Redundant Array for Inexpensive Load Sharing (RAILS), a power provisioning approach that provides high conversion efficiency across the entire range of PowerNaps power demands. Using utilization traces collected from enterprise-scale commercial deployments, we demonstrate that, together, PowerNap and RAILS reduce average server power consumption by 74%.
international symposium on computer architecture | 2011
David Meisner; Christopher M. Sadler; Luiz André Barroso; Wolf-Dietrich Weber; Thomas F. Wenisch
Much of the success of the Internet services model can be attributed to the popularity of a class of workloads that we call Online Data-Intensive (OLDI) services. These work-loads perform significant computing over massive data sets per user request but, unlike their offline counterparts (such as MapReduce computations), they require responsiveness in the sub-second time scale at high request rates. Large search products, online advertising, and machine translation are examples of workloads in this class. Although the load in OLDI services can vary widely during the day, their energy consumption sees little variance due to the lack of energy proportionality of the underlying machinery. The scale and latency sensitivity of OLDI workloads also make them a challenging target for power management techniques. We investigate what, if anything, can be done to make OLDI systems more energy-proportional. Specifically, we evaluate the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency. Using Web search as a representative example of this workload class, we first characterize a production Web search workload at cluster-wide scale. We provide a fine-grain characterization and expose the opportunity for power savings using low-power modes of each primary server component. Second, we develop and validate a performance model to evaluate the impact of processor- and memory-based low-power modes on the search latency distribution and consider the benefit of current and foreseeable low-power modes. Our results highlight the challenges of power management for this class of workloads. In contrast to other server workloads, for which idle low-power modes have shown great promise, for OLDI workloads we find that energy-proportionality with acceptable query latency can only be achieved using coordinated, full-system active low-power modes.
architectural support for programming languages and operating systems | 2011
Qingyuan Deng; David Meisner; Luiz E. Ramos; Thomas F. Wenisch; Ricardo Bianchini
Main memory is responsible for a large and increasing fraction of the energy consumed by servers. Prior work has focused on exploiting DRAM low-power states to conserve energy. However, these states require entire DRAM ranks to be idled, which is difficult to achieve even in lightly loaded servers. In this paper, we propose to conserve memory energy while improving its energy-proportionality by creating active low-power modes for it. Specifically, we propose MemScale, a scheme wherein we apply dynamic voltage and frequency scaling (DVFS) to the memory controller and dynamic frequency scaling (DFS) to the memory channels and DRAM devices. MemScale is guided by an operating system policy that determines the DVFS/DFS mode of the memory subsystem based on the current need for memory bandwidth, the potential energy savings, and the performance degradation that applications are willing to withstand. Our results demonstrate that MemScale reduces energy consumption significantly compared to modern memory energy management approaches. We conclude that the potential benefits of the MemScale mechanisms and policy more than compensate for their small hardware cost.
international symposium on computer architecture | 2013
Kevin T. Lim; David Meisner; Ali G. Saidi; Parthasarathy Ranganathan; Thomas F. Wenisch
Distributed in-memory key-value stores, such as memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully-designed benchmarks that exercise the range of memcached behavior. We discover that, regardless of CPU microarchitecture, memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel---long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks. Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture---Thin Servers with Smart Pipes (TSSP)---for cost-effective high-performance memcached deployment. TSSP couples an embedded-class low-power core to a memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data look up. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.
international symposium on microarchitecture | 2012
Qingyuan Deng; David Meisner; Abhishek Bhattacharjee; Thomas F. Wenisch; Ricardo Bianchini
Recent work has introduced memory system dynamic voltage and frequency scaling (DVFS), and has suggested that balanced scaling of both CPU and the memory system is the most promising approach for conserving energy in server systems. In this paper, we first demonstrate that CPU and memory system DVFS often conflict when performed independently by separate controllers. In response, we propose Co Scale, the first method for effectively coordinating these mechanisms under performance constraints. Co Scale relies on execution profiling of each core via (existing and new) performance counters, and models of core and memory performance and power consumption. Co Scale explores the set of possible frequency settings in such a way that it efficiently minimizes the full-system energy consumption within the performance bound. Our results demonstrate that, by effectively coordinating CPU and memory power management, Co Scale conserves a significant amount of system energy compared to existing approaches, while consistently remaining within the prescribed performance bounds. The results also show that Co Scale conserves almost as much system energy as an offline, idealized approach.
architectural support for programming languages and operating systems | 2010
Steven Pelley; David Meisner; Pooya Zandevakili; Thomas F. Wenisch; Jack Underwood
Data center power infrastructure incurs massive capital costs, which typically exceed energy costs over the life of the facility. To squeeze maximum value from the infrastructure, researchers have proposed over-subscribing power circuits, relying on the observation that peak loads are rare. To ensure availability, these proposals employ power capping, which throttles server performance during utilization spikes to enforce safe power budgets. However, because budgets must be enforced locally -- at each power distribution unit (PDU) -- local utilization spikes may force throttling even when power delivery capacity is available elsewhere. Moreover, the need to maintain reserve capacity for fault tolerance on power delivery paths magnifies the impact of utilization spikes. In this paper, we develop mechanisms to better utilize installed power infrastructure, reducing reserve capacity margins and avoiding performance throttling. Unlike conventional high-availability data centers, where collocated servers share identical primary and secondary power feeds, we reorganize power feeds to create shuffled power distribution topologies. Shuffled topologies spread secondary power feeds over numerous PDUs, reducing reserve capacity requirements to tolerate a single PDU failure. Second, we propose Power Routing, which schedules IT load dynamically across redundant power feeds to: (1) shift slack to servers with growing power demands, and (2) balance power draw across AC phases to reduce heating and improve electrical stability. We describe efficient heuristics for scheduling servers to PDUs (an NP-complete problem). Using data collected from nearly 1000 servers in three production facilities, we demonstrate that these mechanisms can reduce the required power infrastructure capacity relative to conventional high-availability data centers by 32% without performance degradation.
international symposium on performance analysis of systems and software | 2012
David Meisner; Junjie Wu; Thomas F. Wenisch
Recently, there has been an explosive growth in Internet services, greatly increasing the importance of data center systems. Applications served from “the cloud” are driving data center growth and quickly overtaking traditional workstations. Although there are a many tools for evaluating components of desktop and server architectures in detail, scalable modeling tools are noticeably missing. We describe BigHouse a simulation infrastructure for data center systems. Instead of simulating servers using detailed microarchitectural models, BigHouse raises the level of abstraction. Using a combination of queuing theory and stochastic modeling, BigHouse can simulate server systems in minutes rather than hours. BigHouse leverages statistical simulation techniques to limit simulation turnaround time to the minimum runtime needed for a desired accuracy. In this paper, we introduce BigHouse, describe its design, and present case studies for how it has already been applied to build and validate models of data center workloads and systems. Furthermore, we describe statistical techniques incorporated into BigHouse to accelerate and parallelize its simulations, and demonstrate its scalability to model large cluster systems while maintaining reasonable simulation time.
international symposium on low power electronics and design | 2012
Qingyuan Deng; David Meisner; Abhishek Bhattacharjee; Thomas F. Wenisch; Ricardo Bianchini
The fraction of server energy consumed by the memory system has been increasing rapidly and is now on par with that consumed by processors. Recent work demonstrates that substantial memory energy can be saved with only a small, tightly-controlled performance degradation using memory Dynamic Frequency and Voltage Scaling (DVFS). Prior studies consider only servers with a single memory controller (MC); however, multicore server processors have begun to incorporate multiple MCs. We propose MultiScale, the first technique to coordinate DVFS across multiple MCs, memory channels, and memory devices. Under operating system control, MultiScale monitors application bandwidth requirements across MCs. It then uses a heuristic algorithm to select and apply a frequency combination that will minimize the overall system energy within user-specified per-application performance constraints. Our results demonstrate that MultiScale reduces system energy consumption significantly, compared to prior approaches, while respecting the user-specified performance constraints.
high-performance computer architecture | 2015
Chang-Hong Hsu; Yunqi Zhang; Michael A. Laurenzano; David Meisner; Thomas F. Wenisch; Jason Mars; Lingjia Tang; Ronald G. Dreslinski
Reducing the long tail of the query latency distribution in modern warehouse scale computers is critical for improving performance and quality of service of workloads such as Web Search and Memcached. Traditional turbo boost increases a processors voltage and frequency during a coarse-grain sliding window, boosting all queries that are processed during that window. However, the inability of such a technique to pinpoint tail queries for boosting limits its tail reduction benefit. In this work, we propose Adrenaline, an approach to leverage finer granularity, 10s of nanoseconds, voltage boosting to effectively rein in the tail latency with query-level precision. Two key insights underlie this work. First, emerging finer granularity voltage/frequency boosting is an enabling mechanism for intelligent allocation of the power budget to precisely boost only the queries that contribute to the tail latency; and second, per-query characteristics can be used to design indicators for proactively pinpointing these queries, triggering boosting accordingly. Based on these insights, Adrenaline effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost. By evaluating under various workload configurations, we demonstrate the effectiveness of our methodology. We achieve up to a 2.50x tail latency improvement for Memcached and up to a 3.03x for Web Search over coarse-grained DVFS given a fixed boosting power budget. When optimizing for energy reduction, Adrenaline achieves up to a 1.81x improvement for Memcached and up to a 1.99x for Web Search over coarse-grained DVFS.
international symposium on low power electronics and design | 2010
David Meisner; Thomas F. Wenisch
Accurately modeling server power consumption is critical in designing data center power provisioning infrastructure. However, to date, most research proposals have used average CPU utilization to infer the power consumption of clusters, typically averaging over tens of minutes per observation. We demonstrate that average CPU utilization is not sufficient to predict peak power consumption accurately. By characterizing the relationship between server utilization and power supply behavior, we can more accurately model the actual peak power consumption. Finally, we introduce a new operating system metric that can capture the needed information to design for peak power with low overhead.
