
Publication


Featured research published by Phillip Stanley-Marbell.


Workshop on Mobile Computing Systems and Applications | 2000

Scylla: a smart virtual machine for mobile embedded systems

Phillip Stanley-Marbell; Liviu Iftode

With the proliferation of wireless devices with embedded processors, there is an increasing desire to deploy applications that run transparently over the varied architectures of these devices. Virtual machines are one solution for code mobility, providing a virtualized processor architecture that is implemented over the individual node architectures. Proposed virtual machines for embedded systems are generally slow and consume significant energy, making them unsuitable for devices with limited processing power and energy resources. Presented is a novel virtual machine architecture, Scylla, specially designed for mobile embedded systems, that is simple, fast and robust. In addition to a basic instruction set, Scylla supports inter-device communication, power management and error recovery. To make on-the-fly compilation extremely efficient, the instruction set closely matches popular processor architectures that can be found in embedded systems today. This paper describes Scylla, along with a preliminary evaluation of its performance, including the costs of the on-the-fly compilation and the overhead of having a virtual machine, based on simulations and measurements on a prototype system.
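The interpreter core of such a virtual machine is typically a dispatch loop over a compact instruction encoding. A minimal sketch in C (the opcodes and encoding below are invented for illustration and are not Scylla's actual instruction set):

```c
#include <stdint.h>

/* Hypothetical opcodes -- illustrative only, not Scylla's actual ISA. */
enum { OP_LOADI, OP_ADD, OP_HALT };

typedef struct {
	uint8_t op, dst, src1, src2;
	int32_t imm;
} Insn;

/* Execute a program on a small register file; returns r0 at OP_HALT. */
int32_t vm_run(const Insn *code)
{
	int32_t reg[16] = {0};
	for (const Insn *pc = code; ; pc++) {
		switch (pc->op) {
		case OP_LOADI: reg[pc->dst] = pc->imm;                       break;
		case OP_ADD:   reg[pc->dst] = reg[pc->src1] + reg[pc->src2]; break;
		case OP_HALT:  return reg[0];
		}
	}
}
```

Keeping such an encoding close to common embedded ISAs is what makes on-the-fly translation to native code cheap, which is the design point the abstract describes.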


International Symposium on Low Power Electronics and Design | 2011

Pinned to the walls: impact of packaging and application properties on the memory and power walls

Phillip Stanley-Marbell; Victoria Caparros Cabezas; Ronald P. Luijten

This article presents a study of the impact of packaging on the memory and power walls, in the context of application properties. The analysis is supported by characterizations of 130 hardware designs spanning 30 years, along with both microarchitectural simulation and actual-hardware performance counter measurements of 25 applications. It is shown that if trends in supply pin count (growing as the square root of current) and total packaging pin count (doubling every six years) continue, application memory bandwidth requirements, even in the presence of aggressive cache hierarchies, may limit the number of on-chip threads to under a thousand in 2020.


IEEE Transactions on Computers | 2003

Modeling, analysis, and self-management of electronic textiles

Phillip Stanley-Marbell; Diana Marculescu; Radu Marculescu; Pradeep K. Khosla

Scaling in CMOS device technology has made it possible to cheaply embed intelligence in a myriad of devices. In particular, it has become feasible to fabricate flexible materials (e.g., woven fabrics) with large numbers of computing and communication elements embedded into them. Such computational fabrics, electronic textiles, or e-textiles have applications ranging from smart materials for aerospace applications to wearable computing. This paper addresses the modeling of computation, communication and failure in e-textiles and investigates the performance of two techniques, code migration and remote execution, for adapting applications executing over the hardware substrate, to failures in both devices and interconnection links. The investigation is carried out using a cycle-accurate simulation environment developed to model computation, power consumption, and node/link failures for large numbers of computing elements in configurable network topologies. A detailed analysis of the two techniques for adapting applications to the error prone substrate is presented, as well as a study of the effects of parameters, such as failure rates, communication speeds, and topologies, on the efficacy of the techniques and the performance of the system as a whole. It is shown that code migration and remote execution provide feasible methods for adapting applications to take advantage of redundancy in the presence of failures and involve trade offs in communication versus memory requirements in processing elements.
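The communication-versus-memory trade-off between the two adaptation techniques can be illustrated with a toy cost model (all byte counts are hypothetical; this is not the paper's simulation model):

```c
#include <stdbool.h>

/* Toy communication-cost model: code migration ships the code image
   once; remote execution ships the input/output data on every call. */
double migration_cost_bytes(double code_bytes)
{
	return code_bytes;
}

double remote_exec_cost_bytes(double data_bytes_per_call, int n_calls)
{
	return data_bytes_per_call * n_calls;
}

/* Migration wins once cumulative per-call data traffic exceeds the
   one-time cost of shipping the code -- at the price of the extra
   memory needed on the destination node to hold the code. */
bool prefer_migration(double code_bytes, double data_bytes_per_call, int n_calls)
{
	return migration_cost_bytes(code_bytes)
	     < remote_exec_cost_bytes(data_bytes_per_call, n_calls);
}
```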


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Performance, Power, and Thermal Analysis of Low-Power Processors for Scale-Out Systems

Phillip Stanley-Marbell; Victoria Caparros Cabezas

There is increased interest, in high-performance computing as well as in commercial data centers, in so-called scale-out systems, where large numbers of low-cost and low-power-dissipation servers are used for workloads which have available coarse-grained parallelism. One target class of devices for building scale-out systems is the class of low-power processors, such as those based on the ARM architecture, the Power Architecture, and the Intel Atom processor. This article presents a detailed characterization of three contemporary low-power processors covering all the aforementioned ISAs, all implemented in state-of-the-art 45 nm semiconductor processes. Processor performance, power dissipation, thermal load, and board-level power dissipation apportionment are presented, via a combination of hardware performance counters, OS-level timing measurements, current measurements, and thermal imaging via a microbolometer array. It is demonstrated that while certain processors might provide low power dissipation, the most energy-efficient platform depends on the characteristics of the application and the design of the entire platform (including integrated versus on-board peripherals, power supply regulators, etc.). The lowest-power platform showed a power-efficiency advantage of almost four times lower idle power dissipation, and almost five times lower active power dissipation for a single-threaded workload, versus the highest-power-dissipation platform studied. The latter, however, achieved a factor of two better energy efficiency than its closest competitor when executing a throughput-oriented workload, due to significantly better compute performance and available hardware concurrency.
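The distinction the abstract draws between low power and high energy efficiency comes down to simple arithmetic: energy is power integrated over time, so a higher-power platform that finishes sooner can still win on energy. A minimal sketch (the wattages and runtimes below are hypothetical, not the paper's measurements):

```c
/* Energy (joules) consumed by a job running at a given average
   power (watts) for a given time (seconds). */
double energy_joules(double avg_power_watts, double runtime_seconds)
{
	return avg_power_watts * runtime_seconds;
}
```

For instance, a 2 W platform taking 50 s consumes 100 J, while a 9 W platform finishing the same job in 10 s consumes only 90 J and is therefore the more energy-efficient of the two.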


Design, Automation, and Test in Europe | 2007

A 0.9″ × 1.2″, Low-Power, Energy-Harvesting System with Custom Multi-Channel Communication Interface

Phillip Stanley-Marbell; Diana Marculescu

Presented is a self-powered computing system, Sunflower, that uses a novel combination of a PIN photodiode array, switching regulators, and a supercapacitor to provide a small-footprint renewable energy source. The design provides software-controlled power-adaptation facilities for both the main processor and its peripherals. The system's power consumption is characterized, and its energy-scavenging efficiency is quantified with field measurements under a variety of weather conditions.
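For context, the usable energy stored in a supercapacitor discharged between two voltage levels follows E = ½C(V_max² − V_min²). A sketch with illustrative values (not the actual Sunflower component parameters):

```c
/* Usable energy (joules) when a supercapacitor of the given
   capacitance (farads) discharges from v_max to v_min volts. */
double supercap_energy_j(double capacitance_f, double v_max, double v_min)
{
	return 0.5 * capacitance_f * (v_max * v_max - v_min * v_min);
}

/* Run time (seconds) at a constant load, ignoring regulator losses. */
double runtime_s(double energy_j, double load_watts)
{
	return energy_j / load_watts;
}
```

A hypothetical 1 F capacitor discharged from 5 V down to 2 V yields 10.5 J, which would sustain a 21 mW load for about 500 s before the regulators drop out.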


High Performance Embedded Architectures and Compilers | 2007

Sunflower: full-system, embedded, microarchitecture evaluation

Phillip Stanley-Marbell; Diana Marculescu

This paper describes Sunflower, a full-system microarchitectural evaluation environment for embedded computing systems. The environment enables detailed microarchitectural simulation of multiple instances of complete embedded systems, their peripherals, and medium access control / physical layer communication between systems. The environment models the microarchitecture, computation and communication upset events under a variety of stochastic distributions, compute and communication power consumption, electrochemical battery systems, and power regulation circuitry, as well as analog signals external to the processing elements. The simulation environment provides facilities for speeding up simulation performance, which trade off accuracy of simulated properties for simulation speed. Through the detailed simulation of benchmarks in which the effect of simulation speedup on correctness can be accurately quantified, it is demonstrated that traditional techniques proposed for simulation speedup can introduce significant error when simulating a combination of computation and analog physical phenomena external to a processor.


International Conference on Computer Aided Design | 2003

Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems

Diana Marculescu; Nicholas H. Zamora; Phillip Stanley-Marbell; Radu Marculescu

Ambient intelligent systems provide an unexplored hardware platform for executing distributed applications under strict energy constraints. These systems must respond quickly to changes in user behavior or environmental conditions and must provide high availability and fault-tolerance under given quality constraints. These systems will necessitate fault-tolerance to be built into applications. One way to provide such fault-tolerance is to employ redundancy. Hundreds of computational devices will be available in deeply networked ambient intelligent systems, providing opportunities to exploit node redundancy to increase application lifetime or improve quality of results if it drops below a threshold. Pre-copying with remote execution is proposed as a novel, alternative technique of code migration to enhance system lifetime for ambient intelligent systems. Self-management of the system is considered in two different scenarios: applications that tolerate graceful quality degradation and applications with single-point failures. The proposed technique can be part of a design methodology for prolonging the lifetime of a wide range of applications under various types of faults, despite scarce energy resources.


International Conference on Computer Aided Design | 2003

Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems

Phillip Stanley-Marbell; Diana Marculescu

Emerging VLSI technologies and platforms are giving rise to systems with inherently high potential for runtime failure. Such failures range from intermittent electrical and mechanical failures at the system level, to device failures at the chip level. Techniques to provide reliable computation in the presence of failures must do so while maintaining high performance, with an eye toward energy efficiency. When possible, they should maximize battery lifetime in the face of battery discharge non-linearities. This paper introduces the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability. In order to judge the efficacy of the proposed algorithms for dynamic fault-tolerance management, a set of metrics, for characterizing system behavior in terms of energy efficiency, reliability, computation performance and battery lifetime, is presented. For an example platform employed in a realistic evaluation scenario, it is shown that system configurations with the best performance and lifetime are not necessarily those with the best combination of performance, reliability, battery lifetime and average power consumption.


ACM Symposium on Parallel Algorithms and Architectures | 2011

Parallelism and data movement characterization of contemporary application classes

Victoria Caparros Cabezas; Phillip Stanley-Marbell

This paper presents a framework for characterizing the distribution of fine-grained parallelism, data movement, and communication-minimizing code partitions. Understanding the spectrum of parallelism available in applications, and how much data movement might result if such parallelism is exploited, is essential in the hardware design process because these properties will be the limiters to performance scaling of future computing systems. The framework is applied to characterizing 26 applications and kernels, classified according to their dominant components in the Berkeley dwarf/computational motif classification. The distributions of ILP and TLP over execution time are studied, and it is shown that, though mean ILP is high, available ILP is significantly smaller for most of the execution. The results from this framework are complemented by hardware performance counter data on two RISC platforms (IBM Power7 and Freescale P2020) and one CISC platform (Intel Atom D510), spanning a broad range of real machine characteristics. Employing a combination of these new techniques, and building upon previous proposals, it is demonstrated that the similarity in available ideal-case parallelism and data movement within and across the dwarf classes is limited.
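The gap between mean ILP and the ILP available for most of the execution is easy to reproduce with a toy per-window trace, since a few highly parallel bursts can dominate the mean (the numbers below are invented, not the paper's measurements):

```c
/* Mean of per-window ILP samples. */
double mean_ilp(const double *ilp, int n)
{
	double sum = 0.0;
	for (int i = 0; i < n; i++)
		sum += ilp[i];
	return sum / n;
}

/* Fraction of execution windows whose ILP falls below a threshold. */
double frac_below(const double *ilp, int n, double threshold)
{
	int count = 0;
	for (int i = 0; i < n; i++)
		if (ilp[i] < threshold)
			count++;
	return (double)count / n;
}
```

A trace of nine windows at ILP 2 plus one burst at ILP 82 has a mean ILP of 10, yet 90% of the windows sit below ILP 4, mirroring the paper's observation that mean ILP overstates what is available most of the time.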


Operating Systems Review | 2010

A unified execution model for cloud computing

Eric Van Hensbergen; Noah Evans; Phillip Stanley-Marbell

This article presents the design goals and architecture for a unified execution model (UEM) for cloud computing and clusters. The UEM combines interfaces for logical provisioning and distributed command execution with integrated mechanisms for establishing and maintaining communication, synchronization, and control. In this paper, the UEM architecture is described, and an existing application which could benefit from its facilities is used to illustrate its value.

Collaboration

Top co-authors of Phillip Stanley-Marbell:

- Martin C. Rinard (Massachusetts Institute of Technology)
- Diana Marculescu (Carnegie Mellon University)
- Radu Marculescu (Carnegie Mellon University)