Vincent John Mooney | Researchain

Publication

Featured research published by Vincent John Mooney.

international symposium on systems synthesis | 2002

Round-robin Arbiter Design and Generation

Eung S. Shin; Vincent John Mooney; George F. Riley

In this paper, we introduce a Round-robin Arbiter Generator (RAG) tool. The RAG tool can generate a design for a Bus Arbiter (BA) that handles exactly the required number of bus masters for both on-chip and off-chip buses. RAG can also generate a distributed and parallel hierarchical Switch Arbiter (SA). The first contribution of this paper is the automated generation of a round-robin, token-passing BA to reduce the time spent on arbiter design. The generated arbiter is fair, fast, and has a low and predictable worst-case wait time. The second contribution of this paper is the design and integration of a distributed fast arbiter, e.g., for a terabit switch, based on 2×2 and 4×4 switch arbiters (SAs). Using a 0.25 μm TSMC standard cell library from LEDA Systems [10, 14], we report the arbitration time of a 256×256 SA for a terabit switch and demonstrate that the SA generated by RAG meets the timing constraint required to achieve approximately six terabits of throughput in a typical network switch design. Furthermore, our generated SA outperforms the Ping-Pong Arbiter and the Programmable Priority Encoder by factors of 1.9× and 2.4×, respectively.
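The round-robin policy behind the generated arbiters can be illustrated with a small behavioral model. The sketch below is a hypothetical Python model, not the hardware that RAG generates: the class `RoundRobinArbiter` and its `grant` method are illustrative names, and the rotating pointer stands in for the token. After each grant, priority moves to the master just past the winner, so a master that keeps requesting waits at most n - 1 grants, which is the kind of fair, predictable worst-case wait the abstract describes.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter for n bus masters.

    The pointer `last` plays the role of the token: the search for the next
    grant starts at the master just after the most recent winner.
    """

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.last = n - 1  # master 0 has top priority on the first cycle

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if nobody requests."""
        if len(requests) != self.n:
            raise ValueError("expected one request flag per master")
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate  # rotate priority past the winner
                return candidate
        return None  # idle cycle: the priority pointer is unchanged
```

For example, with four masters that all request on every cycle, the grants rotate through 0, 1, 2, 3 and then back to 0.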

digital systems design | 2001

A comparison of five different multiprocessor SoC bus architectures

Kyeong Keol Ryu; Eung S. Shin; Vincent John Mooney

The performance of a system, especially a multiprocessor system, depends heavily on the efficiency of its bus architecture. In a System-on-a-Chip (SoC), the bus architecture can exploit advantages such as shorter propagation delay (resulting in a faster bus clock), larger bus width, and multiple buses. This paper presents five different SoC bus architectures for a multiprocessor system: Global Bus I Architecture (GBIA), Global Bus II Architecture (GBIIA), Bi-FIFO Bus Architecture (BFBA), Crossbar Switch Bus Architecture (CSBA), and CoreConnect Bus Architecture (CCBA). We evaluate the performance of these architectures using two applications: an Orthogonal Frequency Division Multiplexing (OFDM) transmitter from wireless communications and an MPEG2 decoder from video processing. To increase throughput, these bus architectures employ a pipelined scheme. While all five bus architectures perform well, we find that BFBA performs best for the OFDM transmitter and CSBA performs best for the MPEG2 decoder.

real-time systems symposium | 2008

Task Scheduling for Control Oriented Requirements for Cyber-Physical Systems

Fumin Zhang; Klementyna Szwaykowska; Wayne H. Wolf; Vincent John Mooney

The wide application of cyber-physical systems (CPS) calls for effective design strategies that optimize the performance of both computing units and physical plants. We study the task scheduling problem for a class of CPS whose behaviors are regulated by feedback control laws. We co-design the control law and the task scheduling algorithm to achieve predictable performance and power consumption for both the computing and the physical systems. We use a typical example, multiple inverted pendulums controlled by one processor, to illustrate our method.

IEEE Design & Test of Computers | 2002

A hardware-software real-time operating system framework for SoCs

Vincent John Mooney; Douglas M. Blough

The δ framework for RTOS-SoC codesign helps designers simultaneously build an SoC or platform-ASIC architecture and a customized hardware-software RTOS. Examples generated by this prototype tool include RTOS designs that speed up applications by 27% or more while using a small amount of hardware area.

ACM Transactions on Embedded Computing Systems | 2007

Timing analysis for preemptive multitasking real-time systems with caches

Yudong Tan; Vincent John Mooney

In this paper, we propose an approach to estimate the worst-case response time (WCRT) of tasks in a preemptive multitasking single-processor real-time system with a set-associative cache. The approach focuses on analyzing the cache reload overhead caused by preemptions. We combine inter-task cache eviction behavior analysis with path analysis of the preempted task to reduce the estimated number of cache lines that can be evicted by the preempting task (and must therefore be reloaded by the preempted task). We test our approach on a mobile robot application containing three tasks. The experimental results show that our approach can tighten the WCRT estimate by up to 73% over the prior state of the art.
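Cache reload overhead enters a WCRT estimate as extra interference charged per preemption. A minimal sketch, assuming fixed-priority scheduling, periods equal to deadlines, and a per-preemption cost of (lines reloaded) × (miss penalty), is the standard response-time iteration below; it is a generic illustration, not the authors' analysis. The per-task line counts (`reload_lines`) are taken as given inputs, and tightening exactly those counts is what the paper's combined eviction and path analysis contributes. The names `Task` and `wcrt` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Task:
    wcet: int    # worst-case execution time in isolation (cycles)
    period: int  # period, assumed equal to the relative deadline (cycles)


def wcrt(tasks: Sequence[Task], i: int,
         reload_lines: Sequence[int], miss_penalty: int) -> Optional[int]:
    """Fixed-priority response-time iteration with a per-preemption reload cost.

    tasks[0] has the highest priority. reload_lines[j] is an assumed upper
    bound on the cache lines reloaded after one preemption by tasks[j].
    Returns None if the response time would exceed the deadline.
    """
    target = tasks[i]
    r = target.wcet
    while True:
        interference = 0
        for j in range(i):  # only higher-priority tasks preempt tasks[i]
            hp = tasks[j]
            releases = (r + hp.period - 1) // hp.period  # integer ceil(r / T_j)
            interference += releases * (hp.wcet + reload_lines[j] * miss_penalty)
        nxt = target.wcet + interference
        if nxt > target.period:
            return None  # deadline miss: iteration exceeds the period
        if nxt == r:
            return r  # fixed point reached
        r = nxt
```

With tasks (C, T) = (1, 4), (2, 6), (3, 12), the lowest-priority task converges to a WCRT of 10 cycles when reloads are ignored, but misses its 12-cycle deadline once each preemption is charged two reloaded lines at one cycle per line, which shows why a tight evicted-line estimate matters.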

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2004

Automated bus generation for multiprocessor SoC design

Kyeong Keol Ryu; Vincent John Mooney

The performance of a multiprocessor system depends heavily on the efficiency of its bus architecture. This paper presents a methodology to generate a custom bus system for a multiprocessor system-on-a-chip (SoC). Our bus-synthesis tool, which we call BusSynth, uses this methodology to generate five different bus systems as examples: 1) bidirectional first-in first-out bus architecture; 2) global bus architecture (GBA) version I; 3) GBA version III; 4) hybrid bus architecture (Hybrid); and 5) split bus architecture. We verify and evaluate the performance of each bus system in the context of three applications: an orthogonal frequency division multiplexing wireless transmitter, an MPEG2 decoder, and a database example. Our methodology greatly benefits the designer in fast design-space exploration of bus architectures across a variety of performance-impacting factors such as bus types, processor types, and software programming style. We show that BusSynth can generate buses that achieve superior performance compared with a typical general GBA (e.g., a 41% reduction in execution time for the database example). In addition, BusSynth generates a bus architecture in a matter of seconds, compared with weeks for hand design of a custom bus system.

power and timing modeling optimization and simulation | 2004

Sleepy Stack Reduction of Leakage Power

Jun-Cheol Park; Vincent John Mooney; Philipp Pfeiffenberger

Leakage power consumption in current CMOS technology is already a great challenge. The ITRS projects that leakage power consumption may come to dominate total chip power consumption as technology feature sizes shrink. We propose a novel leakage reduction technique, named “sleepy stack,” which can be applied to general logic design. Our sleepy stack approach retains exact logic state, making it better than the traditional sleep and zigzag techniques, while saving leakage power. Unlike the stack approach (which also saves state), the sleepy stack approach works well with dual-Vth technologies, reducing leakage by several orders of magnitude compared with the stack approach in single-Vth technology. The sleepy stack approach does, however, incur an area penalty (roughly 50% to 120%) compared with stack technology; nonetheless, it occupies a niche where state saving and extra-low leakage are desired at a (potentially small) cost in increased delay and area.

asia and south pacific design automation conference | 2003

A comparison of the RTU hardware RTOS with a hardware/software RTOS

Jaehwan Lee; Vincent John Mooney; Anders Daleby; Karl Ingström; Tommy Klevin; Lennart Lindh

In this paper, we compare and analyze the performance of three RTOSes: the Real-Time Unit (RTU) hardware RTOS, the pure software Atalanta RTOS, and a hardware/software RTOS composed of part of Atalanta interfaced to the System-on-a-Chip Lock Cache (SoCLC) hardware. We also present our RTOS configuration framework, which can automatically configure all three RTOSes. In average-case simulations of a database application example on a three-processor system running thirty tasks, the RTU system and the SoCLC system achieved overall speedups of 36% and 19%, respectively, over the pure software RTOS system.

compilers, architecture, and synthesis for embedded systems | 2000

A dynamic memory management unit for embedded real-time system-on-a-chip

Mohamed Shalan; Vincent John Mooney

Dealing with global on-chip memory allocation/de-allocation in a dynamic yet deterministic way is an important issue for upcoming billion-transistor multiprocessor System-on-a-Chip (SoC) designs. To address this, we propose a new memory management hierarchy called Two-Level Memory Management. To implement this memory management scheme, which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation, we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as level two memory management (level one is the operating system management of memory allocated to a particular on-chip processor). In this way, heterogeneous processors in an SoC can request and be granted portions of the global memory in at most twenty clock cycles for a four-processor SoC, which is at least an order of magnitude faster than software-based memory management. We present a sample implementation of the SoCDMMU and compare hardware and software implementations.

Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627) | 2002

Hardware support for real-time embedded multiprocessor system-on-a-chip memory management

Mohamed Shalan; Vincent John Mooney

The aggressive evolution of the semiconductor industry (smaller process geometries, higher densities, and greater chip complexity) has provided design engineers the means to create complex, high-performance System-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and a large memory, all on the same chip. Dealing with global on-chip memory allocation/de-allocation in a dynamic yet deterministic way is an important issue for upcoming billion-transistor multiprocessor SoC designs. To address this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme, which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation, we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the operating system management of memory allocated to a particular on-chip Processing Element). In this way, processing elements (heterogeneous or homogeneous, hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time (for an example four-processing-element SoC, dynamic allocation of the global on-chip memory takes at most sixteen cycles per allocation/deallocation). In this paper, we show how to modify an existing Real-Time Operating System (RTOS) to support the proposed SoCDMMU. Our example shows that a multiprocessor SoC using the SoCDMMU achieves a 440% overall speedup in application transition time over a fully shared memory that does not use the SoCDMMU.
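Level-two allocation in Two-Level Memory Management can be pictured as a global allocator that grants fixed-size blocks of shared on-chip memory to processing elements, which then manage those blocks themselves at level one. The sketch below is a software model with hypothetical names (`GlobalBlockAllocator`, `alloc`, `free`) and assumed parameters (block count, block size); it is not the SoCDMMU's actual command interface or hardware. Its single first-fit scan is bounded by the number of blocks, a simple analogue of the deterministic worst-case allocation time the abstract emphasizes.

```python
from typing import Dict, List, Optional, Tuple


class GlobalBlockAllocator:
    """Hypothetical level-two allocator: grants contiguous runs of fixed-size
    blocks of global on-chip memory to processing elements (PEs)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        # owner[b] is the PE id holding block b, or None if the block is free
        self.owner: List[Optional[int]] = [None] * num_blocks
        # base address -> (owning PE, first block index, block count)
        self.allocs: Dict[int, Tuple[int, int, int]] = {}

    def alloc(self, pe: int, nbytes: int) -> Optional[int]:
        """Return the base address of a contiguous run, or None if none fits.
        One first-fit pass keeps the work bounded by O(num_blocks)."""
        if nbytes <= 0:
            raise ValueError("nbytes must be positive")
        count = -(-nbytes // self.block_size)  # ceiling division
        run = 0
        for b, who in enumerate(self.owner):
            run = run + 1 if who is None else 0
            if run == count:
                first = b - count + 1
                for k in range(first, b + 1):
                    self.owner[k] = pe
                base = first * self.block_size
                self.allocs[base] = (pe, first, count)
                return base
        return None

    def free(self, pe: int, base: int) -> None:
        """Release a run; only the PE that allocated it may free it."""
        entry = self.allocs.get(base)
        if entry is None or entry[0] != pe:
            raise ValueError("no allocation at this address owned by this PE")
        _, first, count = self.allocs.pop(base)
        for k in range(first, first + count):
            self.owner[k] = None
```

Tracking an owner per block is one way to let the level-two manager reject a release from the wrong processing element; a hardware unit could keep the same information in a small status table.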
