Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Saturnino Garcia is active.

Publications


Featured research published by Saturnino Garcia.


Architectural Support for Programming Languages and Operating Systems | 2010

Conservation cores: reducing the energy of mature computations

Ganesh Venkatesh; Jack Sampson; Nathan Goulding; Saturnino Garcia; Vladyslav Bryksin; Jose Lugo-Martinez; Steven Swanson; Michael Bedford Taylor

Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
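
To make the gap between the per-function and whole-application figures concrete, the sketch below applies an Amdahl-style energy model: overall savings are limited by the fraction of execution energy that c-cores actually cover. The coverage and reduction factors are illustrative assumptions, not numbers from the paper.

```python
# Hedged sketch (not from the paper): an Amdahl-style estimate of how
# per-function energy savings from c-cores translate into whole-application
# savings. Coverage and reduction values below are illustrative assumptions.

def whole_app_energy_reduction(coverage: float, function_reduction: float) -> float:
    """Overall energy-reduction factor for the application.

    coverage           -- fraction of baseline energy spent in c-core-covered code
    function_reduction -- energy-reduction factor achieved inside that code
    """
    remaining = (1.0 - coverage) + coverage / function_reduction
    return 1.0 / remaining

# Example: if 60% of the baseline energy is spent in code that a c-core
# improves by 16x, the whole application improves by roughly 2.3x, which is
# the same order as the paper's up-to-2.1x whole-application figure.
print(f"{whole_app_energy_reduction(0.60, 16.0):.2f}x")
```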


International Symposium on Microarchitecture | 2011

The GreenDroid Mobile Application Processor: An Architecture for Silicon's Dark Future

Nathan Goulding-Hotta; Jack Sampson; Ganesh Venkatesh; Saturnino Garcia; Joe Auricchio; Po-Chao Huang; Manish Arora; Siddhartha Nath; Vikram Bhatt; Jonathan Babb; Steven Swanson; Michael Bedford Taylor

This article discusses the GreenDroid mobile application processor. Dark silicon has emerged as the fundamental limiter in modern processor design. The GreenDroid mobile application processor demonstrates an approach that uses dark silicon to execute general-purpose smartphone applications with less energy than today's most energy-efficient designs.


IEEE International Symposium on Workload Characterization | 2009

SD-VBS: The San Diego Vision Benchmark Suite

Sravanthi Kota Venkata; Ikkjin Ahn; Donghwan Jeon; Anshuman Gupta; Christopher M. Louie; Saturnino Garcia; Serge J. Belongie; Michael Bedford Taylor

In the multi-core era, computer vision has emerged as an exciting application area that promises to continue to drive the demand for both more powerful and more energy-efficient processors. Although there is still a long way to go, vision has matured significantly over the last few decades, and the list of applications that are useful to end users continues to grow. The parallelism inherent in vision applications makes them a promising workload for multi-core and many-core processors.


Programming Language Design and Implementation | 2011

Kremlin: rethinking and rebooting gprof for the multicore age

Saturnino Garcia; Donghwan Jeon; Christopher M. Louie; Michael Bedford Taylor

Many recent parallelization tools lower the barrier for parallelizing a program, but overlook one of the first questions that a programmer needs to answer: which parts of the program should I spend time parallelizing? This paper examines Kremlin, an automatic tool that, given a serial version of a program, makes recommendations to the user as to which regions (e.g., loops or functions) of the program to attack first. Kremlin introduces a novel hierarchical critical path analysis and develops a new metric for estimating the potential of parallelizing a region: self-parallelism. We further introduce the concept of a parallelism planner, which provides a ranked order of specific regions to the programmer that are likely to have the largest performance impact when parallelized. Kremlin supports multiple planner personalities, which allow the planner to more effectively target a particular programming environment or class of machine. We demonstrate the effectiveness of one such personality, an OpenMP planner, by comparing versions of programs that are parallelized according to Kremlin's plan against third-party manually parallelized versions. The results show that Kremlin's OpenMP planner is highly effective, producing plans whose performance is typically comparable to, and sometimes much better than, manual parallelization. At the same time, these plans would require that the user parallelize significantly fewer regions of the program.
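
The following sketch illustrates, under simplified assumptions, how a planner in the spirit of Kremlin's might rank regions: each region's usable parallelism is capped by the core count, and regions are ordered by the serial time they would shed if parallelized. The region names, work values, and self-parallelism figures are hypothetical, and the ranking heuristic is my simplification rather than Kremlin's actual algorithm.

```python
# Hedged sketch (my simplification, not Kremlin's implementation): rank
# regions by the serial time they would shed if parallelized, capping each
# region's self-parallelism at the core count. All region data is hypothetical.

from dataclasses import dataclass

@dataclass
class Region:
    name: str
    work: float              # total serial time attributed to the region itself
    self_parallelism: float  # parallelism of the region, excluding nested regions

def plan(regions: list[Region], cores: int) -> list[tuple[str, float]]:
    """Return regions ranked by estimated time saved when parallelized."""
    ranked = []
    for r in regions:
        usable = min(r.self_parallelism, cores)  # hardware-bounded parallelism
        saved = r.work - r.work / usable         # serial time removed
        ranked.append((r.name, saved))
    return sorted(ranked, key=lambda t: t[1], reverse=True)

regions = [
    Region("outer_loop",  work=9.0e9, self_parallelism=64.0),
    Region("init",        work=1.0e9, self_parallelism=1.2),
    Region("reduce_step", work=2.0e9, self_parallelism=8.0),
]
for name, saved in plan(regions, cores=32):
    print(f"{name:12s} est. time units saved: {saved:.2e}")
```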


Conference on Object-Oriented Programming Systems, Languages, and Applications | 2011

Kismet: parallel speedup estimates for serial programs

Donghwan Jeon; Saturnino Garcia; Christopher M. Louie; Michael Bedford Taylor

Software engineers now face the difficult task of refactoring serial programs for parallel execution on multicore processors. Currently, they are offered little guidance as to how much benefit may come from this task, or how close they are to the best possible parallelization. This paper presents Kismet, a tool that creates parallel speedup estimates for unparallelized serial programs. Kismet differs from previous approaches in that it does not require any manual analysis or modification of the program. This difference allows quick analysis of many programs, avoiding wasted engineering effort on those that are fundamentally limited. To accomplish this task, Kismet builds upon the hierarchical critical path analysis (HCPA) technique, a recently developed dynamic analysis that localizes parallelism to each of the potentially nested regions in the target program. It then uses a parallel execution time model to compute an approximate upper bound for performance, modeling constraints that stem from both hardware parameters and internal program structure. Our evaluation applies Kismet to eight high-parallelism NAS Parallel Benchmarks running on a 32-core AMD multicore system, five low-parallelism SpecInt benchmarks, and six medium-parallelism benchmarks running on the fine-grained MIT Raw processor. The results are compelling. Kismet is able to significantly improve the accuracy of parallel speedup estimates relative to prior work based on critical path analysis.
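
As a rough illustration of what an upper-bound model can look like, the sketch below walks a hypothetical tree of nested regions bottom-up, caps each region's exploitable parallelism at the core count, and compares the result against serial execution. This is a toy model under stated assumptions, not Kismet's actual HCPA-based estimator.

```python
# Hedged sketch (a toy model, not Kismet's HCPA-based estimator): compute an
# approximate speedup upper bound from a tree of nested regions, capping each
# region's exploitable parallelism at the core count. Region data is hypothetical.

from dataclasses import dataclass, field

@dataclass
class RegionNode:
    name: str
    own_time: float                    # serial time of this region's own work
    parallelism: float                 # parallelism available in that work
    children: list["RegionNode"] = field(default_factory=list)

def serial_time(node: RegionNode) -> float:
    return node.own_time + sum(serial_time(c) for c in node.children)

def parallel_time(node: RegionNode, cores: int) -> float:
    """Bottom-up best-case parallel time: each region runs at its capped parallelism."""
    own = node.own_time / min(node.parallelism, cores)
    return own + sum(parallel_time(c, cores) for c in node.children)

root = RegionNode("main", own_time=1.0e9, parallelism=1.0, children=[
    RegionNode("solver_loop", own_time=8.0e9, parallelism=128.0),
    RegionNode("io_phase",    own_time=1.0e9, parallelism=1.5),
])

bound = serial_time(root) / parallel_time(root, cores=32)
print(f"speedup upper bound on 32 cores: {bound:.1f}x")
```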


IEEE Hot Chips Symposium | 2010

GreenDroid: A mobile application processor for a future of dark silicon

Nathan Goulding; Jack Sampson; Ganesh Venkatesh; Saturnino Garcia; Joe Auricchio; Jonathan Babb; Michael Bedford Taylor; Steven Swanson

This article consists of a collection of slides from the authors' conference presentation on GreenDroid, a mobile application processor, and assesses the future of dark silicon. Specific topics discussed include the special features, system specifications, and system design of GreenDroid; system architectures; applications; supported platforms; processing and memory capabilities; and targeted markets for application processors.


High-Performance Computer Architecture | 2011

Efficient complex operators for irregular codes

Jack Sampson; Ganesh Venkatesh; Nathan Goulding-Hotta; Saturnino Garcia; Steven Swanson; Michael Bedford Taylor

Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators to contain several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce EDP by 2× and area by 35% relative to c-cores. They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications, including several SPECINT benchmarks.
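
A back-of-the-envelope sketch of the cachelet idea follows: a small L0 structure that services some fraction of loads in a single cycle shortens the average load-use latency, and energy-delay product (EDP) can be compared before and after. All hit rates, latencies, and energy numbers are illustrative assumptions rather than measurements from the paper.

```python
# Hedged back-of-the-envelope sketch (not from the paper): a small L0
# "cachelet" embedded in the datapath shortens average load-use latency,
# and energy-delay product (EDP) is compared before and after. All hit rates,
# latencies, and energy figures are illustrative assumptions.

def avg_load_use_latency(l0_hit_rate: float, l0_cycles: float, l1_cycles: float) -> float:
    """Average cycles between issuing a load and being able to use its value."""
    return l0_hit_rate * l0_cycles + (1.0 - l0_hit_rate) * l1_cycles

def edp(energy: float, delay: float) -> float:
    """Energy-delay product: lower is better."""
    return energy * delay

no_cachelet   = avg_load_use_latency(0.0, 1.0, 3.0)  # every load pays the L1 latency
with_cachelet = avg_load_use_latency(0.7, 1.0, 3.0)  # assume 70% of loads hit the L0

print(f"avg load-use latency: {no_cachelet:.2f} -> {with_cachelet:.2f} cycles")
# If energy drops to ~0.8x and delay to ~0.62x of baseline (assumed numbers),
# EDP falls to ~0.5x, i.e. roughly the 2x EDP reduction the paper reports.
print(f"normalized EDP: {edp(1.0, 1.0):.2f} -> {edp(0.8, 0.62):.2f}")
```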


IEEE International Symposium on Workload Characterization | 2014

CortexSuite: A synthetic brain benchmark suite

Shelby Thomas; Chetan Gohkale; Enrico Tanuwidjaja; Tony Chong; David Lau; Saturnino Garcia; Michael Bedford Taylor

These days, many traditional end-user applications are said to “run fast enough” on existing machines, so the search continues for novel applications that can leverage the new capabilities of our evolving hardware. Foremost among these potential applications are those clustered around information processing capabilities that humans have today but that computers lack. The fact that brains can perform these computations serves as an existence proof that these applications are realizable. At the same time, we often find that the human nervous system, with its 80 billion neurons, is on some metrics more powerful and energy-efficient than today's machines. Both of these aspects make this class of applications a desirable target for an architectural benchmark suite, because there is evidence that these applications are both useful and computationally challenging. This paper details CortexSuite, a Synthetic Brain Benchmark Suite, which seeks to capture this workload. We classify and identify benchmarks within CortexSuite by analogy to the human neural processing function, using the major lobes of the cerebral cortex as a model for the organization and classification of data processing algorithms. To be clear, our goal is not to emulate the brain at the level of the neuron, but rather to collect synthetic, man-made algorithms that have similar function and have met with success in the real world. We consulted six world-class machine learning and computer vision researchers, who collectively hold 83,091 citations across their distinct subareas, asking them to identify newly emerging, computationally intensive algorithms or applications that are going to have a large impact over the next ten years. The benchmarks are coupled with datasets that reflect practical use and are coded in “clean C” so as to make them accessible, analyzable, and usable for parallel and approximate compiler and architecture researchers alike.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2011

Kremlin: like gprof, but for parallelization

Donghwan Jeon; Saturnino Garcia; Christopher M. Louie; Sravanthi Kota Venkata; Michael Bedford Taylor

This paper overviews Kremlin, a software profiling tool designed to assist in the parallelization of serial programs. Kremlin accepts serial source code, profiles it, and provides a list of regions that should be considered for parallelization. Unlike a typical profiler, Kremlin profiles not only work but also parallelism, which is accomplished via a novel technique called hierarchical critical path analysis. Our evaluation demonstrates that Kremlin is highly effective, resulting in parallelized programs whose performance sometimes exceeds, and is mostly comparable to, that of manual parallelization. At the same time, Kremlin would require that the user parallelize significantly fewer regions of the program. Finally, a user study suggests Kremlin is effective in improving programmer productivity.


IEEE Micro | 2012

The Kremlin Oracle for Sequential Code Parallelization

Saturnino Garcia; Donghwan Jeon; Christopher M. Louie; Michael Bedford Taylor

The Kremlin open-source tool helps programmers by automatically identifying regions in sequential programs that merit parallelization. Kremlin combines a novel dynamic program analysis, hierarchical critical-path analysis, with multicore processor models to evaluate thousands of potential parallelization strategies and estimate their performance outcomes.
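
To give a flavor of what evaluating many parallelization strategies can mean in practice, the toy sketch below enumerates strategies as subsets of regions, scores each with a simple multicore time model that charges a fixed overhead per parallelized region, and keeps the best. The regions, cost model, and overhead are hypothetical and are not the Kremlin oracle's actual model.

```python
# Hedged sketch (a toy model, not the Kremlin oracle itself): enumerate
# parallelization strategies as subsets of regions, score each with a simple
# multicore time model that charges a fixed overhead per parallelized region,
# and keep the best. Regions, times, and the overhead are hypothetical.

from itertools import combinations

# (name, serial_time, parallelism) tuples; the values are made up for illustration
REGIONS = [("loop_a", 6.0, 24.0), ("loop_b", 3.0, 4.0), ("setup", 1.0, 1.1)]
PAR_OVERHEAD = 0.5  # assumed fixed cost (e.g. fork/join) per parallelized region

def strategy_time(chosen: frozenset[str], cores: int) -> float:
    """Estimated runtime if only the chosen regions are parallelized."""
    total = 0.0
    for name, t, p in REGIONS:
        if name in chosen:
            total += t / min(p, cores) + PAR_OVERHEAD
        else:
            total += t
    return total

names = [name for name, _, _ in REGIONS]
candidates = [frozenset(c) for k in range(len(names) + 1) for c in combinations(names, k)]
best = min(candidates, key=lambda s: strategy_time(s, cores=8))
print("best strategy:", sorted(best), f"estimated time: {strategy_time(best, 8):.2f}")
```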

Collaboration


Dive into Saturnino Garcia's collaborations.

Top Co-Authors

Donghwan Jeon, University of California

Jack Sampson, Pennsylvania State University

Steven Swanson, University of California