Daniel Etiemble | Researchain

Publication

Featured research published by Daniel Etiemble.

conference on high performance computing (supercomputing) | 2000

MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks

Franck Cappello; Daniel Etiemble

The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the relevance of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (OpenMP fine-grain parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on 1) the level of shared memory model parallelization, 2) the communication patterns and 3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance for selecting one model. With the hybrid model used, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.

high performance computer architecture | 2000

Investigating the performance of two programming models for clusters of SMP PCs

Franck Cappello; Olivier Richard; Daniel Etiemble

Multiprocessors and high performance networks allow building CLUsters of MultiProcessors (CLUMPs). One distinctive feature over traditional parallel computers is their hybrid memory model (message passing between the nodes and shared memory inside the nodes). We evaluate the performance of a cluster of dual-processor PCs connected by a Myrinet network for the NAS benchmarks using two programming models: a Single Memory Model based on the MPICH-PM/CLUMP library of the RWCP and a Hybrid Memory Model using MPICH-PM and OpenMP. MPI programs are used as the reference in all experiments involving programming models. For each model, we compare the speedup of dual-processor node configurations over uniprocessor node configurations. We demonstrate that the superiority of one model over the other depends on the features of the applications. In particular, we explain the speedup results using breakdowns of the benchmark execution times and measurements of hardware counters.
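As a rough illustration of the comparison described in this abstract, the speedup of a dual-processor node configuration over a uniprocessor one can be computed from an execution-time breakdown. This is only a minimal sketch: the `Breakdown` class, the two-way split into computation and communication, and all timing values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    """Execution-time breakdown in seconds (hypothetical values only)."""
    compute: float
    communication: float

    @property
    def total(self) -> float:
        # Total run time is assumed to be the sum of both components.
        return self.compute + self.communication


def speedup(uni: Breakdown, dual: Breakdown) -> float:
    """Speedup of the dual-processor node configuration over the uniprocessor one."""
    return uni.total / dual.total


def efficiency(uni: Breakdown, dual: Breakdown, procs_per_node: int = 2) -> float:
    """Speedup normalized by the number of processors per node."""
    return speedup(uni, dual) / procs_per_node


# Hypothetical timings: computation nearly halves with two processors,
# while communication gets slightly more expensive.
uni = Breakdown(compute=80.0, communication=20.0)
dual = Breakdown(compute=42.0, communication=22.0)

print(f"speedup    = {speedup(uni, dual):.4f}")
print(f"efficiency = {efficiency(uni, dual):.4f}")
```

Under these made-up numbers the communication component limits the gain, which is the kind of effect a time breakdown makes visible.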

international symposium on multiple valued logic | 1994

Performance of CMOS current mode full adders

Keivan Navi; A. Kazeminejad; Daniel Etiemble

We present the performance of three different multivalued current mode 1-bit adders. These circuits have been simulated with the electrical parameters of a standard 1.2 μm CMOS technology. The performance of a binary voltage mode 1-bit adder is also presented. The binary version uses twice as many transistors as the multivalued ones, but it is two or three times faster. Multivalued versions are more complicated to design and optimize. These results confirm the chip density advantage of multivalued circuits and the speed advantage of binary versions when using CMOS technologies.

international symposium on multiple-valued logic | 1992

On the performance of multivalued integrated circuits: past, present and future

Daniel Etiemble

The characteristics of the successful m-valued I²L and ROMs that have been designed in the past are examined, and the reasons for their success are discussed. The problems associated with scaling of m-valued CMOS current mode circuits are examined. The tolerance issue, the respective propagation delays of binary and m-valued ICs, and the interconnection issue are considered. The challenges for m-valued circuits in competition with the exponential performance increase of binary circuits are identified.

international conference on electronics, circuits, and systems | 2011

A small footprint interleaved multithreaded processor for embedded systems

Charly Bechara; Aurelien Berhault; Nicolas Ventroux; Stéphane Chevobbe; Yves Lhuillier; Raphaël David; Daniel Etiemble

With the increase in the design complexity of MPSoC architectures and the need for more transistor- and energy-efficient processor architectures, designers are exploiting thread-level parallelism (TLP) through the implementation of embedded multithreaded processors. Moreover, future manycore architectures tend to use small footprint RISC cores. In this paper, we present a small footprint, scalar, in-order, 5-stage pipeline, interleaved multithreaded (IMT) processor with 2 hardware thread contexts for embedded systems and SoC integration. Synthesis results in TSMC 40 nm technology show that the multithreaded core area is only 19800 μm² and 13.97 kilogates, which is almost equal to a 4 KB direct-mapped cache memory according to the CACTI 6.5 tool [1]. The IMT core area is 73.2% larger than that of the monothreaded core. The multithreaded core is validated by running a simple bubble-sort application and varying the L1 D[...] memory. The average performance gain is 17% compared to the monothreaded core.

ifip ieee international conference on very large scale integration | 2011

High Performance SoC Design Using Magnetic Logic and Memory

Weisheng Zhao; Lionel Torres; Luís Vitório Cargnini; Raphael Martins Brum; Yue Zhang; Yoann Guillemenet; Gilles Sassatelli; Yahya Lakys; Jacques-Olivier Klein; Daniel Etiemble; D. Ravelosona; C. Chappert

As the technology node shrinks down to 90 nm and below, high standby power becomes one of the major critical issues for CMOS high-speed computing circuits (e.g. logic and cache memory) due to the high leakage currents. A number of non-volatile storage technologies, such as FRAM, MRAM, PCRAM and RRAM, are under investigation to bring non-volatility into the logic circuits and thereby completely eliminate the standby power issue. Thanks to its infinite endurance, high switching/sensing speed and easy integration on top of the CMOS process, MRAM is considered the most promising one. Numerous logic circuits based on MRAM technology have been proposed and prototyped in recent years. In this paper, we present an overview and the current status of these logic circuits and discuss their potential future applications from both physical and architectural points of view.

Future Generation Computer Systems | 2001

Understanding performance of SMP clusters running MPI programs

Franck Cappello; Olivier Richard; Daniel Etiemble

Clusters of multiprocessors (CLUMPs) have a hybrid memory model, with message passing between nodes and shared memory inside nodes. We examine the performance of Myrinet clusters of SMP PCs when using a single memory model (SMM) based on the MPICH-PM/CLUMP library of the RWCP, which can directly use the MPI programs written for a cluster of uniprocessors. The specific features of the communication patterns with the SMM approach are detailed. PC clusters with 2-way and 4-way nodes are considered and compared.

symposium on computer arithmetic | 1993

Algorithms and multi-valued circuits for the multioperand addition in the binary stored-carry number system

Daniel Etiemble; Keivan Navi

Algorithms for the sum of two (three and four) digits in the binary stored-carry number system, using the smallest set of values for the positional sum, are presented. The corresponding adders, which use multivalued current-mode circuits, are also presented. The implementation of multioperand additions using these adders is compared with the usual binary implementation.

embedded systems for real-time multimedia | 2005

Customizing 16-bit floating point instructions on a NIOS II processor for FPGA image and media processing

Daniel Etiemble; Samir Bouaziz; Lionel Lacassagne

We have implemented customized SIMD 16-bit floating point instructions on a NIOS II processor. On several image processing and media benchmarks for which the accuracy and dynamic range of this format are sufficient, a speed-up ranging from 1.5 to more than 2 is obtained versus the integer implementation. The hardware overhead remains limited and is compatible with the capacities of today's FPGAs.

International Journal of Parallel Programming | 2013

Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++

Khaled Hamidouche; Fernando Machado Mendonca; Joel Falcou; Alba Cristina Magalhaes Alves de Melo; Daniel Etiemble
