Is this you? Create Your Porfile

Jeremy Constantin

École Polytechnique Fédérale de Lausanne

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jeremy Constantin is active.

Explore More

Publication

Featured researches published by Jeremy Constantin.

design, automation, and test in europe | 2012

Multi-core architecture design for ultra-low-power wearable health monitoring systems

Ahmed Yasir Dogan; Jeremy Constantin; Martino Ruggiero; Andreas Burg; David Atienza

Personal health monitoring systems can offer a cost-effective solution for human healthcare. To extend the lifetime of health monitoring systems, we propose a near-threshold ultra-low-power multi-core architecture featuring low-power cores, yet capable of executing biomedical applications, with multiple instruction and data memories, tightly coupled through flexible crossbar interconnects. This architecture also includes broadcasting mechanisms for the data and instruction memories to optimize system energy consumption by tailoring memory sharing to the target application. Moreover, the architecture enables power gating of the unused memory banks to lower leakage power. Our experimental results show that compared to the state-of-the-art, the proposed architecture achieves 39.5% power savings at high workload requirements (637 MOps/s), and 38.8% savings at low workload requirements (5 kOps/s), whereby leakage power consumption dominates.

ifip ieee international conference on very large scale integration | 2012

TamaRISC-CS: An ultra-low-power application-specific processor for compressed sensing

Jeremy Constantin; Ahmed Yasir Dogan; Oskar Andersson; Pascal Meinerzhagen; Joachim Neves Rodrigues; David Atienza; Andreas Burg

design, automation, and test in europe | 2015

Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment

Jeremy Constantin; Lai Wang; Georgios Karakonstantis; Anupam Chattopadhyay; Andreas Burg

Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited and therefore dynamic timing margins exist that can theoretically be exploited for the benefit of better speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements for individual instructions during the dynamically varying program execution flow without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow to extract the dynamic timing information for the design using post-layout dynamic timing analysis and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general purpose processor core in a 28nm CMOS technology. We show that employing instruction-dependent dynamic clock adjustment leads on average to an increase in operating speed by 38% or to a reduction in power consumption by 24%, compared to traditional synchronous clocking, which at all times has to respect the worst-case timing identified through static timing analysis.

Iet Circuits Devices & Systems | 2012

Low-power processor architecture exploration for online biomedical signal analysis

Ahmed Yasir Dogan; Jeremy Constantin; David Atienza; Andreas Burg; Luca Benini

In this study, the authors explore sequential and parallel processing architectures, utilising a custom ultra-low-power (ULP) processing core, to extend the lifetime of health monitoring systems, where slow biosignal events and highly parallel computations exist. To this end, a single- and a multi-core architecture are proposed and compared. The single-core architecture is composed of one ULP processing core, an instruction memory (IM) and a data memory (DM), while the multi-core architecture consists of several ULP processing cores, individual IMs for each core, a shared DM and an interconnection crossbar between the cores and the DM. These architectures are compared with respect to power/performance trade-offs for different target workloads of online biomedical signal analysis, while exploiting near threshold computing. The results show that with respect to the single-core architecture, the multi-core solution consumes 62% less power for high computation requirements (167 MOps/s), while consuming 46% more power for extremely low computation needs when the power consumption is dominated by leakage. Additionally, the authors show that the proposed ULP processing core, using a simplified instruction set architecture (ISA), achieves energy savings of 54% compared to a reference microcontroller ISA (PIC24).

2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX) | 2016

193 MOPS/mW @ 162 MOPS, 0.32V to 1.15V voltage range multi-core accelerator for energy efficient parallel and sequential digital processing

Davide Rossi; Antonio Pullini; Igor Loi; Michael Gautschi; Frank K. Gürkaynak; Adam Teman; Jeremy Constantin; Andreas Burg; Ivan Miro-Panades; Edith Beigne; Fabien Clermidy; Philippe Flatresse; Luca Benini

Low power (mW) and high performance (GOPS) are strong requirements for compute-intensive signal processing in E-health, Internet-of-Things, and wearable applications. This work presents a building block for programmable Ultra-Low Power accelerators, namely a tightly-coupled computing cluster that supports parallel and sequential execution at high energy efficiency over a wide range of workload requirements. The cluster, implemented in 28nm UTBB FD-SOI technology, achieves peak energy efficiency in the near-threshold (NVT) operating region: 193 MOPS/mW at 162 MOPS for parallel workloads, and 90 MOPS/mW at 68 MOPS for sequential workloads at 0.46V and 0.5V, respectively. The energy efficient operating range is wide (0.32V to 1.15V), also meeting the design goal of 1 GOPS within a 10 mW power envelope (at 0.66V).

embedded and ubiquitous computing | 2014

A Wireless Body Sensor Network for Activity Monitoring with Low Transmission Overhead

Rubén Braojos; Ivan Beretta; Jeremy Constantin; Andreas Burg; David Atienza

Activity recognition has been a research field of high interest over the last years, and it finds application in the medical domain, as well as personal healthcare monitoring during daily home- and sports-activities. With the aim of producing minimum discomfort while performing supervision of subjects, miniaturized networks of low-power wireless nodes are typically deployed on the body to gather and transmit physiological data, thus forming a Wireless Body Sensor Network (WBSN). In this work, we propose a WBSN for online activity monitoring, which combines the sensing capabilities of wearable nodes and the high computational resources of modern smart phones. The proposed solution provides different tradeoffs between classification accuracy and energy consumption, thanks to different workloads assigned to the nodes and to the mobile phone in different network configurations. In particular, our WBSN is able to achieve very high activity recognition accuracies (up to 97.2%) on multiple subjects, while significantly reducing the sampling frequency and the volume of transmitted data with respect to other state-of-the-art solutions.

design, automation, and test in europe | 2013

Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms

Ahmed Yasir Dogan; Rubén Braojos; Jeremy Constantin; Giovanni Ansaloni; Andreas Burg; David Atienza

Embedded biosignal analysis involves a considerable amount of parallel computations, which can be exploited by employing low-voltage and ultra-low-power (ULP) parallel computing architectures. By allowing data and instruction broadcasting, single instruction multiple data (SIMD) processing paradigm enables considerable power savings and application speedup, in turn allowing for a lower voltage supply for a given workload. The state-of-the-art multi-core architectures for biosignal analysis however lack a bare, yet smart, synchronization technique among the cores, allowing lockstep execution of algorithm parts that can be performed using the SIMD, even in the presence of data-dependent execution flows. In this paper, we propose a lightweight synchronization technique to enhance an ULP multi-core processor, resulting in improved energy efficiency through lockstep SIMD execution. Our results show that the proposed improvements accomplish tangible power savings, up to 64% for an 8-core system operating at a workload of 89 MOps/s while exploiting voltage scaling.

IFIP Advances in Information and Communication Technology; 418, pp 88-106 (2013) | 2013

An Ultra-Low-Power Application-Specific Processor for Compressed Sensing

Jeremy Constantin; Ahmed Yasir Dogan; Oskar Andersson; Pascal Meinerzhagen; Joachim Neves Rodrigues; David Atienza; Andreas Burg

Compressed sensing (CS) is a universal technique for the compression of sparse signals. CS has been widely used in sensing platforms where portable, autonomous devices have to operate for long periods of time with limited energy resources. Therefore, an ultra-low-power (ULP) CS implementation is vital for these kind of energy-limited systems. Sub-threshold (sub-VT) operation is commonly used for ULP computing, and can also be combined with CS. However, most established CS implementations can achieve either no or very limited benefit from sub-VT operation. Therefore, we propose a sub-VT application-specific instruction-set processor (ASIP), exploiting the specific operations of CS. Our results show that the proposed ASIP accomplishes 62x speed-up and 11.6x power savings with respect to an established CS implementation running on the baseline low-power processor.Compressed sensing (CS) is a universal low-complexity data compression technique for signals that have a sparse representation in some domain. While CS data compression can be done both in the analog- and digital domain, digital implementations are often used on low-power sensor nodes, where an ultra-low-power (ULP) processor carries out the algorithm on Nyquist-rate sampled data. In such systems an energy-efficient implementation of the CS compression kernel is a vital ingredient to maximize battery lifetime. In this paper, we propose an application-specific instruction-set processor (ASIP) processor that has been optimized for CS data compression and for operation in the subthreshold (sub-VT) regime. The design is equipped with specific sub-VT capable standard-cell based memories, to enable low-voltage operation with low leakage. Our results show that the proposed ASIP accomplishes 62× speed-up and 11.6× power savings with respect to a straightforward CS implementation running on the baseline low-power processor without instruction set extensions. (Less)

signal processing systems | 2017

An FPGA-Based 4 Mbps Secret Key Distillation Engine for Quantum Key Distribution Systems

Jeremy Constantin; Raphael Houlmann; Nicholas Preyss; Nino Walenta; Hugo Zbinden; Pascal Junod; Andreas Burg

Quantum key distribution (QKD) enables provably secure communication between two parties over an optical fiber that arguably withstands any form of attack. Besides the need for a suitable physical signalling scheme and the corresponding devices, QKD also requires a secret key distillation protocol. This protocol and the involved signal processing handle the reliable key agreement process over the fragile quantum channel, as well as the necessary post-processing of key bits to avoid leakage of secret key information to an eavesdropper. In this paper we present in detail an implementation of a key distillation engine for a QKD system based on the coherent one-way (COW) protocol. The processing of key bits by the key distillation engine includes agreement on quantum bit detections (sifting), information reconciliation with forward error correction coding, parameter estimation, and privacy amplification over an authenticated channel. We detail the system architecture combining all these processing steps, and discuss the design trade-offs for each individual system module. We also assess the performance and efficiency of our key distillation implementation in terms of throughput, error correction capabilities, and resource utilization. On a single-FPGA (Xilinx Virtex-6 LX240T) platform, the system supports distilled key rates of up to 4 Mbps.

application specific systems architectures and processors | 2012

Instruction Set Extensions for Cryptographic Hash Functions on a Microcontroller Architecture

Jeremy Constantin; Andreas Burg; Frank K. Gürkaynak

In this paper, we investigate the benefits of instruction set extensions (ISEs) on a 16-bit microcontroller architecture for software implementations of cryptographic hash functions,using the example of the five SHA-3 final round candidates. We identify the general algorithm bottlenecks, taking into account memory footprints and cycle counts of our optimized reference assembly implementations. We show that our target applications benefit from algorithm-specific ISEs based on finite state machines for address generation, lookup table integration, and extension of computational units through microcoded instructions.The gains in throughput, memory consumption, and the area overhead are assessed, by implementing the modified cores and applications utilizing the developed ISEs. Our results show that with less than 10% additional core area, it is possible to increase the execution speed on average by 172% (ranging from 21%to 703%), while reducing memory requirements on average by more than 40%.

Explore More