Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Jean-Luc Gaudiot is active.

Publication


Featured research published by Jean-Luc Gaudiot.


IEEE Transactions on Computers | 1990

Network resilience: a measure of network fault tolerance

Walid A. Najjar; Jean-Luc Gaudiot

A probabilistic measure of network fault tolerance expressed as the probability of a disconnection is proposed. Qualitative evaluation of this measure is presented. As expected, the single-node disconnection probability is the dominant factor irrespective of the topology under consideration. The authors derive an analytical approximation to the disconnection probability and verify it with a Monte Carlo simulation. On the basis of this model, the measures of network resilience and relative network resilience are proposed as probabilistic measures of network fault tolerance. These are used to evaluate the effects of the disconnection probability on the reliability of the system.
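
As a rough illustration of the disconnection-probability idea, the sketch below estimates by Monte Carlo simulation the probability that independent node failures disconnect a small mesh network. The 4x4 mesh topology, the failure probability, and the trial count are illustrative assumptions; the paper's analytical approximation is not reproduced here.

```python
import random

def mesh_neighbors(n):
    """Adjacency list of an n x n mesh (an illustrative topology, not from the paper)."""
    adj = {}
    for r in range(n):
        for c in range(n):
            adj[(r, c)] = [(r + dr, c + dc)
                           for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= r + dr < n and 0 <= c + dc < n]
    return adj

def is_connected(nodes, adj):
    """BFS restricted to the surviving nodes."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                frontier.append(v)
    return len(seen) == len(nodes)

def disconnection_probability(n=4, p_fail=0.05, trials=20_000, seed=1):
    """Monte Carlo estimate of P(the surviving network is disconnected)."""
    rng = random.Random(seed)
    adj = mesh_neighbors(n)
    disconnected = 0
    for _ in range(trials):
        surviving = {v for v in adj if rng.random() > p_fail}
        if not is_connected(surviving, adj):
            disconnected += 1
    return disconnected / trials

print(disconnection_probability())
```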


IEEE Transactions on Computers | 2006

A Simple High-Speed Multiplier Design

Jung-Yup Kang; Jean-Luc Gaudiot

The performance of multiplication is crucial for multimedia applications such as 3D graphics and signal processing systems, which depend on the execution of large numbers of multiplications. Previously reported algorithms mainly focused on rapidly reducing the partial product rows down to final sums and carries used for the final accumulation. These techniques mostly rely on circuit optimization and minimization of the critical paths. In this paper, an algorithm to achieve fast multiplication in two's complement representation is presented. Rather than focusing on reducing the partial product rows down to final sums and carries, our approach strives to generate fewer partial product rows. In turn, this influences the speed of the multiplication, even before applying partial product reduction techniques. Fewer partial product rows are produced, thereby lowering the overall operation time. In addition to the speed improvement, our algorithm results in a true diamond shape for the partial product tree, which is more efficient in terms of implementation. The synthesis results of our multiplication algorithm using the Artisan TSMC 0.13 μm 1.2-volt standard-cell library show a 13 percent improvement in speed and a 14 percent improvement in power savings for 8-bit × 8-bit multiplications (10 percent and 3 percent, respectively, for 16-bit × 16-bit multiplications) when compared to conventional multiplication algorithms.
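
The abstract does not spell out the exact partial-product-generation scheme, so the sketch below only illustrates the general idea of producing fewer partial product rows for a two's complement multiply, using standard radix-4 Booth recoding (roughly n/2 rows instead of n). It is not the authors' algorithm.

```python
def booth_radix4_rows(a, b, n=8):
    """Radix-4 Booth recoding of an n-bit two's complement multiplier b:
    about n/2 partial product rows instead of n (illustrative only, not the
    paper's specific row-reduction scheme)."""
    digit_of = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    b_bits = b & ((1 << n) - 1)   # two's complement bit pattern of b
    rows, prev = [], 0            # 'prev' is the implicit bit to the right of bit 0
    for i in range(0, n, 2):
        window = (((b_bits >> i) & 0b11) << 1) | prev
        prev = (b_bits >> (i + 1)) & 1
        rows.append(digit_of[window] * a << i)   # digit * a * 4^(i/2)
    return rows

a, b = 93, -57                      # any values representable in 8 bits
rows = booth_radix4_rows(a, b)
assert sum(rows) == a * b           # the 4 rows recombine into the full product
print(len(rows), sum(rows))
```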


IEEE Transactions on Parallel and Distributed Systems | 2002

SMT layout overhead and scalability

James S. Burns; Jean-Luc Gaudiot

Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. However, while SMT can be added to an existing microarchitecture with relatively low overhead, this additional chip area could be used for other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead and at what point does SMT no longer pay off for maximum throughput compared to adding other architecture features? This paper evaluates the silicon overhead of SMT by performing a transistor/interconnect-level analysis of the layout. We discuss microarchitecture issues that impact SMT implementations and show how the Instruction Set Architecture (ISA) and microarchitecture can have a large effect on the SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.


Proceedings of the IEEE International Symposium on Parallel Algorithms/Architecture Synthesis | 1997

The Sisal model of functional programming and its implementation

Jean-Luc Gaudiot; Wim Bohm; Walid A. Najjar; Tom DeBoni; John Feo; Patrick Miller

Programming a massively-parallel machine is a daunting task for any human programmer, and parallelization may even be impossible for any compiler. Instead, the functional programming paradigm may prove to be an ideal solution by providing an implicitly parallel interface to the programmer. We describe the Sisal (Stream and Iteration in a Single Assignment Language) project. Its goal is to provide a general-purpose user interface for a wide range of parallel processing platforms.
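
As a loose analogy in Python rather than Sisal, the sketch below shows the property that makes implicit parallelism possible in a functional, single-assignment style: a pure loop body with no side effects, whose iterations can be evaluated sequentially or in parallel with identical results. The function and data are made up for illustration.

```python
from multiprocessing import Pool

def body(x):
    """A pure, side-effect-free loop body: each iteration depends only on its
    own input, which is what lets an implicitly parallel language (or, here,
    a runtime) distribute the iterations freely."""
    return x * x + 1

if __name__ == "__main__":
    data = list(range(16))
    sequential = [body(x) for x in data]
    with Pool() as pool:                  # same result, evaluated in parallel
        parallel = pool.map(body, data)
    assert sequential == parallel
    print(parallel)
```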


IEEE Transactions on Computers | 1986

Structure Handling in Data-Flow Systems

Jean-Luc Gaudiot

Data-flow languages have been hailed as the solution to the programmability of general-purpose multiprocessors. However, data-flow semantics introduce constructs that lead to much overhead at compilation, allocation, and execution time. Indeed, due to its functionality, the data-flow model of computation does not handle repetitive program constructs very efficiently. This is due to the fact that the cornerstone of data flow, namely the concept of single assignment, is opposed to the idea of reexecution of a portion of a program as in a loop. A corollary of this problem is the effective representation, storage, and processing of data structures, as these will most often be used in loops. In this paper, various aspects of this issue are explained in detail. Several solutions that have been put forward in the current literature are then surveyed and analyzed. In order to offset some of the disadvantages they present, we introduce new methods for handling arrays. In the first one, we raise the level of computation to that of arrays for more efficient operation. In the other two, the opposite approach is taken, and the notion of array is done away with entirely at the execution level in order to take advantage of the data-flow semantics at their best logical level of performance.
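
A minimal sketch of the tension described above, treating Python tuples as single-assignment arrays: element-by-element updates force copying, whereas raising the computation to the whole-array level avoids it. The function names and the example are illustrative, not the paper's notation.

```python
def update(arr, i, value):
    """Single-assignment style update: never mutate, return a new array.
    A naive data-flow implementation pays an O(n) copy per element written."""
    return arr[:i] + (value,) + arr[i + 1:]

def scale_elementwise(arr, k):
    """Naive 'loop' in single-assignment style: n updates cost O(n^2) copying."""
    for i, x in enumerate(arr):
        arr = update(arr, i, k * x)
    return arr

def scale_whole_array(arr, k):
    """Raising the computation to the array level (one of the directions the
    paper discusses): a single whole-array operation, no per-element copies."""
    return tuple(k * x for x in arr)

a = tuple(range(8))
assert scale_elementwise(a, 3) == scale_whole_array(a, 3)
```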


International Conference on Parallel Architectures and Compilation Techniques | 2001

Area and System Clock Effects on SMT/CMP Processors

James S. Burns; Jean-Luc Gaudiot

Two approaches to high throughput processors are chip multiprocessing (CMP) and simultaneous multi-threading (SMT). CMP increases layout efficiency, which allows more functional units and a faster clock rate. However, CMP suffers from hardware partitioning of functional resources. SMT increases functional unit utilization by issuing instructions simultaneously from multiple threads. However, a wide-issue SMT suffers from layout and technology implementation problems. We use silicon resources as our basis for comparison and find that area and system clock have a large effect on the optimal SMT/CMP design trade-off. We show the area overhead of SMT on each processor and how it scales with the width of the processor pipeline and the number of SMT threads. The wide-issue SMT delivers the highest single-thread performance with improved multi-thread throughput. However, multiple smaller cores deliver the highest throughput.
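
The trade-off can be framed with a back-of-envelope model such as the one below; every parameter (core area, IPC, clock, SMT overhead and utilization gain) is a made-up placeholder meant only to show the shape of the comparison under a fixed area budget, not a result from the paper.

```python
def cmp_throughput(area_budget, core_area, ipc_per_core, clock):
    """CMP: partition the area into identical cores; throughput scales with the
    core count, but each thread is confined to its own core's resources."""
    cores = int(area_budget // core_area)
    return cores * ipc_per_core * clock

def smt_throughput(area_budget, core_area, ipc_single, clock,
                   smt_area_overhead=0.10, smt_utilization_gain=1.35):
    """SMT: one wide core plus a modest area overhead for thread state;
    throughput comes from better utilization of the shared functional units.
    All parameters are hypothetical placeholders, not measured values."""
    if core_area * (1 + smt_area_overhead) > area_budget:
        return 0.0
    return ipc_single * smt_utilization_gain * clock

# Illustrative comparison under one (made-up) area budget.
print(cmp_throughput(area_budget=4.0, core_area=1.0, ipc_per_core=1.2, clock=1.0))
print(smt_throughput(area_budget=4.0, core_area=3.5, ipc_single=2.0, clock=0.9))
```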


IEEE Transactions on Computers | 2012

Synchronization-Aware Energy Management for VFI-Based Multicore Real-Time Systems

Jian-Jun Han; Xiaodong Wu; Dakai Zhu; Hai Jin; Laurence T. Yang; Jean-Luc Gaudiot

Voltage and frequency island (VFI) was recently adopted as an effective energy management technique for multicore processors. For a set of periodic real-time tasks that access shared resources running on a VFI-based multicore system with dynamic voltage and frequency scaling (DVFS) capability, we study both static and dynamic synchronization-aware energy management schemes. First, based on the enhanced MSRP resource access protocol with a suspension mechanism, we devise a synchronization-aware task mapping heuristic for partitioned-EDF scheduling, which assigns tasks that access a similar set of resources to the same core to reduce the synchronization overhead and thus improve schedulability. Then, static schemes that assign uniform and different scaled frequencies for tasks on different VFIs are studied. To further exploit dynamic slack, we propose an integrated synchronization-aware slack management framework to appropriately reclaim, preserve, release, and steal slack at runtime to slow down the execution of tasks subject to the common voltage/frequency limitation of VFIs and the timing/synchronization constraints of tasks. Taking the additional delay due to task synchronization into consideration, the new scheme allocates slack in a fair manner and scales down the execution of both noncritical and critical sections of tasks for more energy savings. Simulation results show that the synchronization-aware mapping can significantly improve the schedulability of tasks. The energy savings obtained by the static scheme with different frequencies for tasks on different VFIs are close to those of an optimal Integer Nonlinear Programming (INLP) solution. Moreover, compared to the simple extension of existing solutions for uniprocessor systems, our schemes can obtain much better energy savings (up to 40 percent) with comparable DVFS overhead.
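
The sketch below gives a greedy illustration of a synchronization-aware mapping: tasks that access overlapping resource sets are co-located on the same core, subject to a simple utilization bound. The task set, the capacity bound, and the tie-breaking rule are assumptions for illustration; the paper's heuristic and the MSRP-based analysis are not reproduced.

```python
def sync_aware_mapping(tasks, num_cores, capacity=1.0):
    """Greedy sketch of a synchronization-aware partitioning heuristic: place
    each task on the core whose already-assigned tasks share the most resources
    with it, subject to a simple utilization bound.

    tasks: list of (name, utilization, set_of_resources)."""
    cores = [{"tasks": [], "util": 0.0, "resources": set()} for _ in range(num_cores)]
    # Heavier tasks first, a common partitioning order.
    for name, util, res in sorted(tasks, key=lambda t: -t[1]):
        feasible = [c for c in cores if c["util"] + util <= capacity]
        if not feasible:
            raise ValueError(f"task {name} does not fit on any core")
        # Most shared resources wins; break ties by lowest utilization.
        best = max(feasible, key=lambda c: (len(c["resources"] & res), -c["util"]))
        best["tasks"].append(name)
        best["util"] += util
        best["resources"] |= res
    return cores

tasks = [("t1", 0.3, {"r1"}), ("t2", 0.4, {"r1", "r2"}),
         ("t3", 0.2, {"r3"}), ("t4", 0.3, {"r3"})]
for i, core in enumerate(sync_aware_mapping(tasks, num_cores=2)):
    print(i, core["tasks"], round(core["util"], 2))
```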


IEEE Computer Architecture Letters | 2011

Prefetching in Embedded Mobile Systems Can Be Energy-Efficient

Jie Tang; Shaoshan Liu; Zhimin Gu; Chen Liu; Jean-Luc Gaudiot

Data prefetching has been a successful technique in high-performance computing platforms. However, the conventional wisdom is that it significantly increases energy consumption and is thus not suitable for embedded mobile systems. On the other hand, as modern mobile applications pose an increasing demand for high performance, it becomes essential to implement high-performance techniques, such as prefetching, in these systems. In this paper, we study the impact of prefetching on the performance and energy consumption of embedded mobile systems. Contrary to the conventional wisdom, our findings demonstrate that as technology advances, prefetching can be energy-efficient while improving performance. Furthermore, we have developed a simple but effective analytical model to help system designers identify the conditions for energy efficiency.
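
The paper's analytical model is not reproduced here, but a break-even condition in the same spirit can be sketched: prefetching is energy-efficient when the static (leakage) energy saved by finishing earlier outweighs the extra dynamic energy spent issuing prefetches. All quantities below are hypothetical.

```python
def prefetch_is_energy_efficient(exec_time_base, speedup,
                                 static_power, prefetch_dynamic_energy):
    """Break-even sketch: prefetching saves static/leakage energy by finishing
    earlier but spends extra dynamic energy issuing prefetches. It pays off when
    the static energy saved exceeds the dynamic energy added. This is an
    illustrative model of our own construction, not the paper's model."""
    time_saved = exec_time_base * (1.0 - 1.0 / speedup)
    static_energy_saved = static_power * time_saved
    return static_energy_saved > prefetch_dynamic_energy

# As leakage (static power) grows with technology scaling, the condition flips
# in favour of prefetching; the numbers below are made up.
print(prefetch_is_energy_efficient(1.0, 1.15, static_power=0.2,
                                   prefetch_dynamic_energy=0.05))  # False
print(prefetch_is_energy_efficient(1.0, 1.15, static_power=0.6,
                                   prefetch_dynamic_energy=0.05))  # True
```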


International Parallel and Distributed Processing Symposium | 2003

Dynamic scheduling issues in SMT architectures

Chul-Ho Shin; Seong-Won Lee; Jean-Luc Gaudiot

Simultaneous multithreading (SMT) attempts to attain higher processor utilization by allowing instructions from multiple independent threads to coexist in a processor and compete for shared resources. Previous studies have shown, however, that its throughput may be limited by the number of threads. A reason is that a fixed thread scheduling policy cannot be optimal for the varying mixes of threads it may face in an SMT processor. Our adaptive dynamic thread scheduling (ADTS) was previously proposed to achieve higher utilization by allowing a detector thread to make use of wasted pipeline slots with nominal hardware and software costs. The detector thread adaptively switches between various fetch policies. Our previous study showed that a single fixed thread scheduling policy leaves considerable room (some 30%) for improvement compared to an oracle-scheduled case. In this paper, we take a closer look at ADTS. We implemented the functional model of the ADTS and its software architecture to evaluate various heuristics for determining a better fetch policy for the next scheduling quantum. We report that performance could be improved by as much as 25%.
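
A software sketch of the detector-thread idea is shown below: each scheduling quantum it samples the achieved IPC, maintains a running estimate per fetch policy, and alternates between exploring alternatives and exploiting the best-looking policy. The candidate policies, the synthetic IPC model, and the epsilon-greedy heuristic are assumptions for illustration only, not the ADTS heuristics evaluated in the paper.

```python
import random

POLICIES = ["icount", "round_robin", "stall_flush"]   # hypothetical candidate fetch policies

def simulated_ipc(policy, workload_phase, rng):
    """Stand-in for hardware performance counters: returns a noisy IPC sample.
    Purely synthetic; in ADTS this information comes from the real pipeline."""
    base = {"icount": 2.0, "round_robin": 1.6, "stall_flush": 1.8}[policy]
    phase_bias = 0.5 if (workload_phase % 2 == 0 and policy == "icount") else 0.0
    return base + phase_bias + rng.gauss(0, 0.1)

def detector_thread(quanta=40, seed=0):
    """Sketch of a detector-thread loop: each scheduling quantum it samples the
    achieved IPC, updates a running per-policy estimate, and occasionally
    explores alternatives; the heuristic here is ours, not the paper's."""
    rng = random.Random(seed)
    estimate = {p: 0.0 for p in POLICIES}
    count = {p: 0 for p in POLICIES}
    policy = "icount"
    for q in range(quanta):
        ipc = simulated_ipc(policy, workload_phase=q // 10, rng=rng)
        count[policy] += 1
        estimate[policy] += (ipc - estimate[policy]) / count[policy]
        if rng.random() < 0.2:                       # explore a random policy
            policy = rng.choice(POLICIES)
        else:                                        # exploit the best estimate so far
            policy = max(POLICIES, key=lambda p: estimate[p])
    return estimate

print(detector_thread())
```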


IEEE Transactions on Computers | 2009

Potential Impact of Value Prediction on Communication in Many-Core Architectures

Shaoshan Liu; Jean-Luc Gaudiot

The newly emerging many-core-on-a-chip designs have renewed an intense interest in parallel processing. By applying Amdahl's formulation to the programs in the PARSEC and SPLASH-2 benchmark suites, we find that most applications may not have sufficient parallelism to efficiently utilize modern parallel machines. The long sequential portions in these application programs are caused by computation as well as communication latency. However, value prediction techniques may allow the "parallelization" of the sequential portion by predicting values before they are produced. In conventional superscalar architectures, the computation latency dominates the sequential sections. Thus, value prediction techniques may be used to predict the computation result before it is produced. In many-core architectures, since the communication latency increases with the number of cores, value prediction techniques may be used to reduce both the communication and computation latency. In this paper, we extend Amdahl's formulation to model the data redundancy inherent to each benchmark, thereby identifying the potential of value prediction techniques. Our analysis shows that the performance of the PARSEC benchmarks may improve by 180 percent, and that of the SPLASH-2 suite by 230 percent, compared to when only the intrinsic parallelism is considered. This demonstrates the immense potential of fine-grained value prediction in reducing the communication latency in many-core architectures.
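
The modelling step can be sketched directly from Amdahl's law: if a fraction of the sequential portion is value-predictable (and can therefore be overlapped), the effective sequential fraction shrinks and the speedup bound rises. The exact formulation in the paper may differ; the fractions below are made up.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def speedup_with_value_prediction(parallel_fraction, predictable_fraction, cores):
    """Sketch of the extension described above: treat the fraction of the
    sequential portion whose values are redundant/predictable as if it could
    overlap with parallel execution. Illustrative only; the paper's exact
    formulation may differ."""
    sequential = 1.0 - parallel_fraction
    effective_sequential = sequential * (1.0 - predictable_fraction)
    effective_parallel = 1.0 - effective_sequential
    return 1.0 / (effective_sequential + effective_parallel / cores)

# Made-up fractions: even a modestly predictable sequential portion lifts the bound.
print(amdahl_speedup(0.8, cores=64))                                            # ~4.7x
print(speedup_with_value_prediction(0.8, predictable_fraction=0.5, cores=64))   # ~8.8x
```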

Collaboration


Dive into Jean-Luc Gaudiot's collaborations.

Top Co-Authors

Jie Tang
Beijing Institute of Technology

Walid A. Najjar
Information Sciences Institute

Andrew Sohn
University of Southern California

Jung-Yup Kang
University of Southern California