Georgios Karakonstantis

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Georgios Karakonstantis is active.

Explore More

Publication

Featured researches published by Georgios Karakonstantis.

international symposium on low power electronics and design | 2007

Low-power process-variation tolerant arithmetic units using input-based elastic clocking

Debabrata Mohapatra; Georgios Karakonstantis; Kaushik Roy

In this paper we propose a design methodology for low-power, high-performance, process-variation tolerant architecture for arithmetic units. The novelty of our approach lies in the fact that possible delay failures due to process variations and/or voltage scaling are predicted in advance and addressed by employing an elastic clocking technique. The prediction mechanism exploits the dependence of delay of arithmetic units upon input data patterns and identifies specific inputs that activate the critical path. Under iso-yield conditions, the proposed design operates at a lower scaled down Vdd without any performance degradation, while it ensures a superlative yield under a design style employing nominal supply and transistor threshold voltage. Simulation results show power savings of upto 29%, energy per computation savings of upto 25.5% and yield enhancement of upto 11.1% compared to the conventional adders and multipliers implemented in the 70nm BPTM technology. We incorporated the proposed modules in the execution unit of a five stage DLX pipeline to measure performance using SPEC2000 benchmarks [9]. Maximum area and throughput penalty obtained were 10% and 3% respectively.

IEEE Transactions on Very Large Scale Integration Systems | 2010

Process-Variation Resilient and Voltage-Scalable DCT Architecture for Robust Low-Power Computing

Georgios Karakonstantis; Nilanjan Banerjee; Kaushik Roy

In this paper, we present a novel discrete cosine transform (DCT) architecture that allows aggressive voltage scaling for low-power dissipation, even under process parameter variations with minimal overhead as opposed to existing techniques. Under a scaled supply voltage and/or variations in process parameters, any possible delay errors appear only from the long paths that are designed to be less contributive to output quality. The proposed architecture allows a graceful degradation in the peak SNR (PSNR) under aggressive voltage scaling as well as extreme process variations. Results show that even under large process variations (±3σ around mean threshold voltage) and aggressive supply voltage scaling (at 0.88 V, while the nominal voltage is 1.2 V for a 90-nm technology), there is a gradual degradation of image quality with considerable power savings (71% at PSNR of 23.4 dB) for the proposed architecture, when compared to existing implementations in a 90-nm process technology.

design automation conference | 2012

On the exploitation of the inherent error resilience of wireless systems under unreliable silicon

Georgios Karakonstantis; Christoph Roth; Christian Benkeser; Andreas Burg

In this paper, we investigate the impact of circuit misbehavior due to parametric variations and voltage scaling on the performance of wireless communication systems. Our study reveals the inherent error resilience of such systems and argues that sufficiently reliable operation can be maintained even in the presence of unreliable circuits and manufacturing defects. We further show how selective application of more robust circuit design techniques is sufficient to deal with high defect rates at low overhead and improve energy efficiency with negligible system performance degradation.

IEEE Transactions on Very Large Scale Integration Systems | 2010

Voltage Scalable High-Speed Robust Hybrid Arithmetic Units Using Adaptive Clocking

Swaroop Ghosh; Debabrata Mohapatra; Georgios Karakonstantis; Kaushik Roy

In this paper, we explore various arithmetic units for possible use in high-speed, high-yield ALUs operated at scaled supply voltage with adaptive clock stretching. We demonstrate that careful logic optimization of the existing arithmetic units (to create hybrid units) indeed make them further amenable to supply voltage scaling. Such hybrid units result from mixing right amount of fast arithmetic into the slower ones. Simulations on different hybrid adder and multipliers in BPTM 70 nm technology show 18%-50% improvements in power compared to standard adders with only 2%-8% increase in die-area at iso-yield. These optimized datapath units can be used to construct voltage scalable robust ALUs that can operate at high clock frequency with minimal performance degradation due to occasional clock stretching.

design, automation, and test in europe | 2015

Exploiting dynamic timing margins in microprocessors for frequency-over-scaling with instruction-based clock adjustment

Jeremy Constantin; Lai Wang; Georgios Karakonstantis; Anupam Chattopadhyay; Andreas Burg

Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited and therefore dynamic timing margins exist that can theoretically be exploited for the benefit of better speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements for individual instructions during the dynamically varying program execution flow without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow to extract the dynamic timing information for the design using post-layout dynamic timing analysis and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general purpose processor core in a 28nm CMOS technology. We show that employing instruction-dependent dynamic clock adjustment leads on average to an increase in operating speed by 38% or to a reduction in power consumption by 24%, compared to traditional synchronous clocking, which at all times has to respect the worst-case timing identified through static timing analysis.

allerton conference on communication, control, and computing | 2012

Data mapping for unreliable memories

Christoph Roth; Christian Benkeser; Christoph Studer; Georgios Karakonstantis; Andreas Burg

Future digital signal processing (DSP) systems must provide robustness on algorithm and application level to the presence of reliability issues that come along with corresponding implementations in modern semiconductor process technologies. In this paper, we address this issue by investigating the impact of unreliable memories on general DSP systems. In particular, we propose a novel framework to characterize the effects of unreliable memories, which enables us to devise novel methods to mitigate the associated performance loss. We propose to deploy specifically designed data representations, which have the capability of substantially improving the system reliability compared to that realized by conventional data representations used in digital integrated circuits, such as 2s-complement or sign-magnitude number formats. To demonstrate the efficacy of the proposed framework, we analyze the impact of unreliable memories on coded communication systems, and we show that the deployment of optimized data representations substantially improves the error-rate performance of such systems.

signal processing systems | 2009

System level DSP synthesis using voltage overscaling, unequal error protection & adaptive quality tuning

Georgios Karakonstantis; Debabrata Mohapatra; Kaushik Roy

In this paper, we propose a system level design approach considering voltage over-scaling (VOS) that achieves error resiliency using unequal error protection of different computation elements, while incurring minor quality degradation. Depending on user specifications and severity of process variations/channel noise, the degree of VOS in each block of the system is adaptively tuned to ensure minimum system power while providing “just-the-right” amount of quality and robustness. This is achieved, by taking into consideration system level interactions and ensuring that under any change of operating conditions only the “less-crucial” computations, that contribute less to block/system output quality, are affected. The design methodology applied to a DCT/IDCT system shows large power benefits (up to 69%) at reasonable image quality while tolerating errors induced by varying operating conditions (VOS, process variations, channel noise). Interestingly, the proposed IDCT scheme conceals channel noise at scaled voltages.

field-programmable custom computing machines | 2012

Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case

Gabriel Falcao; Muhsen Owaida; David Novo; Madhura Purnaprajna; Nikolaos Bellas; Christos D. Antonopoulos; Georgios Karakonstantis; Andreas Burg; Paolo Ienne

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options, that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture to use can vary. In this paper we propose a new design flow based on Open CL, a unified multiplatform programming model, which accelerates LDPC decoding simulations, thereby significantly reducing architectural exploration and design time. Open CL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs and FPGAs. We use SOpen CL (Silicon to Open CL), a tool that automatically converts Open CL kernels to RTL for mapping the simulations into FPGAs. To the best of our knowledge, this is the first time that a single, unmodified Open CL code is used to target those three different platforms. We show that, depending on the design parameters to be explored in the simulation, on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors. For example, although simulations can typically execute more than 3× faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.

design automation conference | 2015

Mitigating the impact of faults in unreliable memories for error-resilient applications

Shrikanth Ganapathy; Georgios Karakonstantis; Adam Teman; Andreas Burg

Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overhead incurred by conventional error correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories to meet target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations with lower significance. This skews the bit-error distribution towards the low order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading-off quality for power, area, and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area, when applied to various data mining applications in 28nm process technology.

international conference on computer design | 2011

Multi-level wordline driver for low power SRAMs in nano-scale CMOS technology

Farshad Moradi; Georgios Panagopoulos; Georgios Karakonstantis; Dag T. Wisland; Hamid Mahmoodi; Jens Kargaard Madsen; Kaushik Roy

In this paper, a multi-level wordline driver scheme is presented to improve SRAM read and write stability while lowering power consumption during hold operation. The proposed circuit applies a shaped wordline voltage pulse during read mode and a boosted wordline pulse during write mode. During read, the applied shaped pulse is tuned at nominal voltage for short period of time, whereas for the remaining access time, the wordline voltage is reduced to a lower level. This pulse results in improved read noise margin without any degradation in access time which is explained by examining the dynamic and nonlinear behavior of the SRAM cell. Furthermore, during hold mode, the wordline voltage starts from a negative value and reaches zero voltage, resulting in a lower leakage current compared to conventional SRAM. Our simulations using TSMC 65nm process show that the proposed wordline driver results in 2X improvement in static read noise margin while the write margin is improved by 3X. In addition, the total leakage of the proposed SRAM is reduced by 10% while the total power is improved by 12% in the worst case scenario of a single SRAM cell. The total area penalty is 10% for a 128Kb standard SRAM array.

Explore More