Ruben Gonzalez
Griffith University
Publication
Featured research published by Ruben Gonzalez.
international symposium on microarchitecture | 2006
Jack Sampson; Ruben Gonzalez; Jean-Francois Collard; Norman P. Jouppi; Michael S. Schlansker; Brad Calder
We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this approach. To further exploit the potential of CMPs for fine-grained data-parallel tasks, we present barrier filters, a mechanism for fast barrier synchronization on chip multi-processors that enables vector computations to be efficiently distributed across the cores of a CMP. We ensure that all threads arriving at a barrier require an unavailable cache line to proceed, and, by placing additional hardware in the shared portions of the memory subsystem, we starve their requests until they have all arrived. Specifically, our approach uses invalidation requests both to make cache lines unavailable and to identify when a thread has reached the barrier. We examine two types of barrier filters, one synchronizing through instruction cache lines and the other through data cache lines.
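The barrier filter itself is a hardware mechanism, but the contract it enforces (no thread proceeds until all have arrived) is standard barrier semantics. A minimal software sense-reversing barrier in Python, purely as an illustration of that contract; the class and variable names here are invented for the sketch and do not come from the paper:

```python
import threading

class SenseReversingBarrier:
    """Toy software barrier illustrating the synchronization the paper
    implements in hardware; this sketch uses a shared counter and a
    'sense' flag instead of cache-line starvation."""
    def __init__(self, n):
        self.n = n
        self.count = n
        self.sense = False
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            local_sense = not self.sense
            self.count -= 1
            if self.count == 0:            # last arrival releases everyone
                self.count = self.n
                self.sense = local_sense
                self.cond.notify_all()
            else:
                while self.sense != local_sense:
                    self.cond.wait()

results = []
barrier = SenseReversingBarrier(4)

def worker(i):
    results.append(('before', i))
    barrier.wait()                          # no thread passes until all arrive
    results.append(('after', i))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [r[0] for r in results]            # every 'before' precedes every 'after'
```

Because the last arrival is the only thread that can flip the sense flag, all four 'before' records are guaranteed to appear in `results` before any 'after' record.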
international conference on parallel architectures and compilation techniques | 2007
Miquel Pericàs; Adrián Cristal; Francisco J. Cazorla; Ruben Gonzalez; Daniel A. Jiménez; Mateo Valero
Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge, as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel applications as well as high performance for more general workloads. Heterogeneous architectures are a first step in this direction, but partitioning remains static and only roughly fits application requirements. This paper proposes the Flexible Heterogeneous MultiCore processor (FMC), the first dynamic heterogeneous multi-core architecture capable of reconfiguring itself to fit application requirements without programmer intervention. The basic building block of this microarchitecture is a scalable, variable-size window microarchitecture that exploits the concept of execution locality to provide large-window capabilities. This makes it possible to overcome the memory wall for applications with high memory-level parallelism (MLP). The microarchitecture contains a set of small and fast cache processors that execute high-locality code and a network of small in-order memory engines that together exploit low-locality code. Single-threaded applications can use the entire network of cores, while multi-threaded applications can efficiently share the resources. The sizing of critical structures remains small enough to handle current power envelopes. In single-threaded mode this processor is able to outperform previous state-of-the-art high-performance processor research by 12% on SpecFP. We show how, in a quad-threaded/quad-core environment, the processor outperforms a statically allocated configuration in both throughput and harmonic mean, two commonly used metrics to evaluate SMT performance, by around 2-4%. This is achieved while using a very simple sharing algorithm.
high-performance computer architecture | 2006
Miquel Pericàs; Adrian Cristal; Ruben Gonzalez; Daniel A. Jiménez; Mateo Valero
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of programs executed with large instruction windows gives rise to a natural and simple alternative to scaling. We characterize this phenomenon of execution locality and propose a microarchitecture to exploit it to achieve the benefit of a large instruction window processor with low implementation cost. Execution locality is the tendency of instructions to exhibit high or low latency based on their dependence on memory operations. In this paper we propose a decoupled microarchitecture that executes low latency instructions on a cache processor and high latency instructions on a memory processor. We demonstrate that such a design, using small structures and many in-order components, can achieve the same performance as much more aggressive proposals while minimizing design complexity.
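The execution-locality split the abstract describes (low-latency instructions to a cache processor, miss-dependent instructions to a memory processor) can be sketched as a simple dependence walk over an instruction trace. This is an illustrative toy assuming a pre-annotated trace; the instruction format and the field names are invented here, not taken from the paper:

```python
# Each instruction: (id, opcode, ids of instructions it depends on,
# and, for loads only, whether the load misses in the cache).
program = [
    (0, 'load', [],     True),   # misses in cache -> long latency
    (1, 'add',  [0],    None),   # depends on the miss -> high-latency chain
    (2, 'load', [],     False),  # hits in cache -> short latency
    (3, 'add',  [2],    None),   # depends only on the hit -> low latency
    (4, 'mul',  [1, 3], None),   # inherits the slow chain through inst 1
]

def classify(program):
    """Tag each instruction 'memory' if it is (transitively) dependent on a
    cache-missing load, else 'cache': a toy version of the execution-locality
    split, deciding which of the two decoupled processors would run it."""
    tags = {}
    for iid, op, deps, misses in program:
        slow = (op == 'load' and misses) or any(tags[d] == 'memory' for d in deps)
        tags[iid] = 'memory' if slow else 'cache'
    return tags

tags = classify(program)
```

In this trace, instructions 0, 1, and 4 land on the memory processor and instructions 2 and 3 stay on the fast cache processor, mirroring the paper's claim that latency behavior follows dependence on memory operations.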
Communications of the ACM | 2000
Ruben Gonzalez; Gregory Cranitch; Jun Hyung Jo
One significant characteristic of multimedia is that it is a vehicle for the convergence of the traditionally separate technologies of computing, entertainment, and telecommunications. To this is added a rich diversity of application areas wanting to exploit this new technological medium, including education, commerce, advertising, and medicine, to name a few. This conjunction of technologies and applications creates a very fertile ground for innovation and creation of new multimedia forms. Vocationally, there is high mobility of practitioners in the field and there are very few well-defined career paths. In this rapidly evolving environment, it is imperative that practitioners are highly adaptable and multiskilled. In this context a number of universities have recently introduced undergraduate multimedia degrees. This has presented some interesting challenges to those involved in designing these courses. Under the banner of multimedia there is certainly industry demand for graduates, but defining “multimedia” has proved to be a difficult task. This burdens academics with the problem of achieving consensus on a curriculum for multimedia. This is not so much due to the wide scope of multidisciplinary applications of multimedia (much like word processing) but because the term “multimedia” has metamorphosed into something akin to the mythological, multiheaded Hydra, where its meaning is often determined by some immediate context in isolation from its source. Implicit in this debate about the definition of multimedia is the question of which discipline can claim ownership, with all that entails. Is it possible that multimedia is not part of an existing discipline but a new one? Some have argued that multimedia is like the emergence of computer science 30 years ago. CS, however, evolved from a conjunction of the fundamentally related disciplines of electrical engineering, mathematics, and other sciences.
The similar academic and research cultures associated with these disciplines provided a strong sense of cohesion and direction to their progeny. Unfortunately, this is not the case with multimedia, as a number of the disciplines that lay claim to it have traditionally antagonistic cultures. Attempts to synthesize a definition of multimedia and, hence, an undergraduate course as simply a conglomeration of the multitudinous diverse views …
IEEE MultiMedia | 2000
Ruben Gonzalez
The word multimedia conjures up many emotions and responses; it often refers to anything that uses visual and acoustic data. As a result, some may brush multimedia off as just a marketing slogan. However, many universities have introduced degree courses in multimedia. This article attempts to define the scope and basis for a multimedia discipline.
international conference on multimedia computing and systems | 1998
Kathy Melih; Ruben Gonzalez
Despite growing interest in multimedia data management, audio retrieval has received little attention. In part, this can be attributed to existing unstructured audio representations that do not easily lend themselves to content based retrieval and especially browsing. This paper aims to address this oversight. It begins by reviewing existing techniques and the specific problems posed by unstructured representations. Some characteristics of audio perception that may be exploited in the solution to these problems are then presented. A new structured representation is then detailed that is designed to support content based retrieval and browsing. Finally, the suitability of this representation for its intended purpose is discussed.
ACM SIGARCH Computer Architecture News | 2005
Jack Sampson; Ruben Gonzalez; Jean-Francois Collard; Norman P. Jouppi; Michael S. Schlansker
This paper presents a novel mechanism for barrier synchronization on chip multi-processors (CMPs). By forcing the invalidation of selected I-cache lines, this mechanism starves threads and thus forces their execution to stop. Threads are set free when all have entered the barrier. We evaluated this mechanism using SMTSim and report much better (and, most importantly, flatter) performance than lock-based barriers supported by existing microprocessors.
australasian user interface conference | 2001
Jolon Faichney; Ruben Gonzalez
A two-dimensional, zoomable, space-filling user interface is presented for browsing conventional, hierarchical file systems. Through user studies, the Goldleaf browser was compared with the widely used Microsoft Windows Explorer user interface. The times and number of mouse clicks to locate directories and files were recorded. The user studies found that the Goldleaf browser required less than half the mouse clicks to locate a directory compared with Windows Explorer. Through the use of document thumbnails, subjects were able to locate documents in less than two-thirds the time that it took using Windows Explorer. A majority of subjects felt that the ability of the Goldleaf browser to display multiple levels of the file system simultaneously was its most beneficial feature in completing the tasks. Subjects found that the Goldleaf browser required less mental and physical effort and was more enjoyable to use than Explorer.
BMC Health Services Research | 2017
Shelley Roberts; Wendy Chaboyer; Ruben Gonzalez; Andrea P. Marshall
Background: Patient participation in health care is associated with improved outcomes for patients and hospitals. New technologies are creating vast potential for patients to participate in care at the bedside. Several studies have explored patient use, satisfaction and perceptions of health information technology (HIT) interventions in hospital. Understanding what works for whom, under what conditions, is important when considering interventions that successfully engage patients in care. This realist review aimed to determine key features of interventions using bedside technology to engage hospital patients in their care and analyse these in terms of context, mechanisms and outcomes. Methods: A realist review was chosen to explain how and why complex HIT interventions work or fail within certain contexts. The review was guided by Pawson’s realist review methodology, involving: clarifying review scope; searching for evidence; data extraction and evidence appraisal; synthesising evidence; and drawing conclusions. Author experience and an initial literature scope provided insight, and review questions and theories (propositions) about why interventions worked were developed and iteratively refined. A purposive search was conducted to find evidence to support, refute or identify further propositions, which formed an explanatory model. Each study was ‘mined’ for evidence to further develop the propositions and model. Results: Interactive learning was the overarching theme of studies using technology to engage patients in their care. Several propositions underpinned this, labelled: information sharing; self-assessment and feedback; tailored education; user-centred design; and support in use of HIT. As studies were mostly feasibility or usability studies, they reported patient-centred outcomes including patient acceptability, satisfaction and actual use of HIT interventions.
For each proposition, outcomes were proposed to come about by mechanisms including improved communication, shared decision-making, empowerment and self-efficacy, which acted as facilitators to patient participation in care. Overall, there was a stronger representation of health than IT disciplines in the studies reviewed, with a lack of IT input in terms of theoretical underpinning, methodological design and reporting of outcomes. Conclusion: HIT interventions have great potential for engaging hospitalised patients in their care. However, stronger interdisciplinary collaboration between health and IT researchers is needed for effective design and evaluation of HIT interventions.
international conference on supercomputing | 2005
Ruben Gonzalez; Adrián Cristal; Miquel Pericàs; Mateo Valero; Alexander V. Veidenbaum
This paper proposes a new organization for clustered processors. Such processors have many advantages, including improved implementability and scalability, reduced power, and, potentially, faster clock speed. Difficulties lie in assigning instructions to clusters (steering) so as to minimize the effect of inter-cluster communication latency. The asymmetric clustered architecture proposed in this paper aims to increase the IPC and reduce power consumption by using two different types of integer clusters and a new steering algorithm. One type is a standard, 64b integer cluster, while the other is a very narrow, 20b cluster. The narrow cluster runs at twice the clock rate of the standard cluster. A new instruction steering mechanism is proposed to increase the use of the fast, narrow cluster as well as to minimize inter-cluster communication. Steering is performed by a history-based predictor, which is shown to be 98% accurate. The proposed architecture is shown to have a higher average IPC than its un-clustered equivalent for a four-wide issue processor, something that has never been achieved by previously proposed clustered organizations. Overall, a 3% increase in average IPC over an un-clustered design and an 8% increase over a symmetric cluster with dependence-based steering are achieved for a 2-cycle inter-cluster communication latency. Part of the reason for the higher IPC is the ability of the new architecture to execute most of the address computations as narrow, fast operations. The new architecture exploits its early knowledge of partial address values to achieve a 0-cycle address translation for 90% of all address computations, further improving performance.