
Lucian Petrica

Politehnica University of Bucharest

Publication

Featured research published by Lucian Petrica.

ieee faible tension faible consommation | 2013

VASILE: A reconfigurable vector architecture for instruction level frequency scaling

Lucian Petrica; Valeriu Codreanu; Sorin Cotofana

Coarse-grained dynamic frequency scaling has been extensively utilised in embedded (multiprocessor) platforms to reduce energy consumption and, by implication, to extend autonomy and battery lifetime. In this paper we propose to use fine-grained frequency scaling, i.e., adjusting the frequency at instruction level, to increase the instruction throughput of an FPGA-implemented Vector Processor (VP). We introduce a VP architectural template and an associated design methodology that enable the creation of VP instances tailored to application requirements. For each instance, the data-path delays of individual instructions are optimised separately, guided by profiling data for the target application class, maximising the performance of frequently used instructions at the expense of those executed less often. Instructions are thus divided into clock frequency classes according to their data-path delay, and at run time the clock frequency is scaled to the value required by the class of the instruction about to be executed. During application execution, different VP instances are dynamically configured in the FPGA to provide the most appropriate hardware support, improving throughput without increasing power consumption and therefore reducing energy. Because operating frequency changes incur a time penalty that may diminish the actual performance gain, the application code is optimised at compile time to reduce the number of runtime clock switches, e.g., via loop tiling and instruction clustering. We evaluate the effectiveness of the proposed approach on several computational kernels used in image processing applications, i.e., sum of absolute differences, sum of squared differences, and Gaussian filtering.
Our results indicate an average instruction throughput increase of 20% and a 15% reduction in energy consumption, achieved through runtime reconfiguration and fine-grained frequency scaling.
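The trade-off between per-class clock periods and clock-switch penalties can be illustrated with a small cost model. This is a sketch under stated assumptions, not the VASILE toolflow: the class names, the periods in picoseconds, the fixed switch penalty, and the helper names `execution_time_ps` and `cluster_by_class` are all hypothetical, and reordering is assumed legal only because the instructions inside a tile are assumed independent (as after loop tiling).

```python
from typing import Dict, List, Optional


def execution_time_ps(trace: List[str], period_ps: Dict[str, int], switch_penalty_ps: int) -> int:
    """Each instruction runs at its frequency class's clock period; a fixed penalty
    is paid whenever consecutive instructions belong to different classes."""
    total = 0
    previous: Optional[str] = None
    for cls in trace:
        if previous is not None and cls != previous:
            total += switch_penalty_ps  # clock must be retuned to the new class
        total += period_ps[cls]
        previous = cls
    return total


def cluster_by_class(trace: List[str], tile: int) -> List[str]:
    """Reorder instructions inside each tile so that same-class instructions are adjacent.
    ASSUMPTION: instructions within a tile are independent, so reordering is legal."""
    clustered: List[str] = []
    for start in range(0, len(trace), tile):
        # Sorting by class name groups equal classes; the class order itself is arbitrary.
        clustered.extend(sorted(trace[start:start + tile]))
    return clustered


if __name__ == "__main__":
    periods = {"add": 4000, "mul": 6000}  # hypothetical per-class clock periods (ps)
    loop = ["mul", "add"] * 4             # hypothetical unrolled kernel body
    print(execution_time_ps(loop, periods, 10_000))                        # 110000
    print(execution_time_ps(cluster_by_class(loop, 8), periods, 10_000))   # 50000
```

With an alternating instruction stream every instruction triggers a clock switch; clustering the same work reduces seven switches to one, which is the effect the compile-time optimisations aim for.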

telecommunications forum | 2013

Cognitive radio testing framework based on USRP

Alexandru Martian; Lucian Petrica; Octavian Radu

Spectrum scarcity has become one of the most important problems to solve in order to ensure the coexistence of the large number of modern communication systems. Cognitive radio (CR) technology is considered a promising solution to this problem because it enables dynamic spectrum access. Since its inception more than 10 years ago, several standardization activities have helped illustrate the potential of CR for commercial use. Meanwhile, agencies worldwide, such as the Federal Communications Commission (FCC), have worked to remove regulatory barriers to the future development of CR networks. A flexible testing framework for CR devices, implemented on the Universal Software Radio Peripheral (USRP) Software Defined Radio (SDR) platform, is proposed and described, together with a case study on a commercially available device. The requirements imposed by the 802.11h standard are verified on this device, and several issues discovered in the process are presented and discussed.
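One part of verifying 802.11h behaviour is checking how quickly a device vacates a channel after radar is detected (dynamic frequency selection). The sketch below shows how captured power samples could be turned into transmission bursts and then into the two timing figures such a test usually reports. It is hypothetical: the function names, the energy-detection threshold, and the default limits are placeholders, not values taken from the paper, from 802.11h, or from any regulator; a real test would substitute the limits of the applicable regulation.

```python
from typing import List, Tuple

Burst = Tuple[float, float]  # (start time in s, duration in s)


def find_bursts(power: List[float], sample_rate_hz: float, threshold: float) -> List[Burst]:
    """Group consecutive power samples above `threshold` into transmission bursts
    (simple energy detection on samples captured, e.g., with a USRP)."""
    bursts: List[Burst] = []
    start = None
    for i, p in enumerate(power):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            bursts.append((start / sample_rate_hz, (i - start) / sample_rate_hz))
            start = None
    if start is not None:  # burst still open at the end of the capture
        bursts.append((start / sample_rate_hz, (len(power) - start) / sample_rate_hz))
    return bursts


def check_dfs(bursts: List[Burst], radar_onset_s: float,
              max_move_time_s: float = 10.0,      # PLACEHOLDER limit, not a quoted rule
              max_closing_time_s: float = 0.26    # PLACEHOLDER limit, not a quoted rule
              ) -> Tuple[float, float, bool]:
    """Return (channel move time, airtime after radar onset, pass/fail)."""
    closing = 0.0
    last_end = radar_onset_s
    for start, duration in bursts:
        end = start + duration
        if end > radar_onset_s:
            closing += end - max(start, radar_onset_s)  # count only airtime after onset
            last_end = max(last_end, end)
    move_time = last_end - radar_onset_s
    passed = move_time <= max_move_time_s and closing <= max_closing_time_s
    return move_time, closing, passed


if __name__ == "__main__":
    power = [0, 0, 5, 5, 0, 0, 5, 0]  # hypothetical power samples
    bursts = find_bursts(power, sample_rate_hz=10, threshold=1)
    move, closing, passed = check_dfs(bursts, radar_onset_s=0.5)
    print(passed)  # True
```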

international semiconductor conference | 2017

Baptista's chaos-based cipher implemented in a field programmable gate array

Octaviana Datcu; Radu Hobincu; Lucian Petrica

In the context of the current struggle for information security and computational efficiency, this paper studies Baptista's chaos-based encryption cipher as a resource-efficient alternative to the more popular block cipher algorithms. We evaluate the cipher by encrypting different types of data (text, images and sound) and present an analysis of the ciphertext statistical distribution and obfuscation characteristics. Simulation results illustrate the effectiveness of the algorithm on multimedia content. We present an implementation of the cipher as a digital system in a Xilinx Zynq 7000 series FPGA and evaluate hardware resource utilization and maximum achievable frequency. The cipher source code, comprising the Matlab simulation and the Verilog hardware description, is available as a Git repository. The aim of this work is to provide digital circuitry that can be easily integrated into a hybrid analog-digital cryptosystem employing a dynamically changing secret key.
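For readers unfamiliar with the cipher, the following is a minimal software sketch of the basic Baptista scheme as commonly described: a portion of the logistic-map attractor is split into one interval per byte value, and each plaintext byte is encrypted as the number of map iterations needed for the trajectory to land in that byte's interval (after a minimum count). It is not the authors' Matlab or Verilog code; the key values and iteration limits below are illustrative assumptions.

```python
from typing import List, Optional

# ILLUSTRATIVE key material and limits (assumptions, not taken from the paper).
R = 3.78            # logistic-map control parameter (part of the secret key)
X0 = 0.232323       # initial condition (part of the secret key)
X_MIN, X_MAX = 0.2, 0.8
SITES = 256         # one interval ("site") per byte value
N_MIN = 250         # minimum iterations before a site may be accepted
N_MAX = 65_532      # give up if the site is not reached by then
EPS = (X_MAX - X_MIN) / SITES


def step(x: float) -> float:
    """One iteration of the logistic map."""
    return R * x * (1.0 - x)


def site(x: float) -> Optional[int]:
    """Byte value associated with the interval containing x, or None outside [X_MIN, X_MAX)."""
    if X_MIN <= x < X_MAX:
        return min(int((x - X_MIN) / EPS), SITES - 1)
    return None


def encrypt(plaintext: bytes, x0: float = X0) -> List[int]:
    x, out = x0, []
    for symbol in plaintext:
        n = 0
        while True:
            x = step(x)
            n += 1
            if n >= N_MIN and site(x) == symbol:
                break
            if n >= N_MAX:
                raise RuntimeError("site not reached; choose different key parameters")
        out.append(n)  # ciphertext symbol = number of iterations taken
    return out


def decrypt(ciphertext: List[int], x0: float = X0) -> bytes:
    x, out = x0, bytearray()
    for n in ciphertext:
        for _ in range(n):  # replay exactly the same number of iterations
            x = step(x)
        s = site(x)
        if s is None:
            raise ValueError("trajectory left the encoding region; wrong key?")
        out.append(s)
    return bytes(out)


if __name__ == "__main__":
    ct = encrypt(b"FPGA")
    print(ct)           # iteration counts, each at least N_MIN
    print(decrypt(ct))  # b'FPGA'
```

Decryption works because the receiver, holding the same key, replays the identical sequence of floating-point iterations; in hardware the map would be computed in fixed-point arithmetic instead.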

international conference on electronics computers and artificial intelligence | 2017

FPGA systolic array GZIP compressor

Ovidiu Plugariu; Alexandru Dumitru Gegiu; Lucian Petrica

In this paper we present a complete, open-source GZIP compressor implementation for FPGA based on a systolic array architecture. GZIP is one of the most widely used compression algorithms. Besides the usual use case of compression for data storage, distributed computing systems such as Hadoop use compression to reduce the amount of data transferred between computing nodes in a cluster. However, GZIP compression requires significant CPU processing power, negating some of the advantages of the compressed-transfer approach in distributed systems. We have designed, implemented and tested a hardware architecture and software application for compressing files using a hardware GZIP compressor. The system presented in this paper offloads GZIP compression from the host CPU to one or more systolic GZIP compression cores in the FPGA, thereby reducing the latency caused by compression and freeing the CPU for other computing tasks. We implemented and evaluated a single GZIP compression core on an ML605 development board, equipped with a Xilinx Virtex 6 FPGA, using Xillybus for data transfers over PCI Express. Our results indicate a peak compression throughput of over 1.3 Gbps and an average throughput of 52 Mbps on the Calgary corpus. Our FPGA compression solution is at least twice as fast as software compression on an Intel Core i7 in all evaluated scenarios, and up to 18× faster for large files. The project source code is publicly available online.
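The core of a systolic GZIP compressor is the LZ77 match search, where many processing cells compare the incoming byte against different history positions at the same time. The behavioural model below is a hedged sketch of that idea, not the authors' architecture: each hypothetical "cell" owns one candidate distance, all cells compare the broadcast byte in the same cycle, and the longest surviving match wins. The window size and helper names are assumptions, and the Huffman coding, header and CRC that a real GZIP (DEFLATE) stream needs are omitted.

```python
from typing import List, Tuple, Union

Token = Union[int, Tuple[int, int]]  # literal byte, or (distance, length) back-reference


def longest_match(data: bytes, pos: int, window: int, max_len: int) -> Tuple[int, int]:
    """Model of a parallel match search: one cell per candidate distance."""
    distances = [d for d in range(1, window + 1) if pos - d >= 0]
    if not distances:
        return 0, 0
    active = [True] * len(distances)
    length = [0] * len(distances)
    for k in range(max_len):               # one "clock cycle" per compared byte
        if pos + k >= len(data):
            break
        incoming = data[pos + k]           # byte broadcast to every cell this cycle
        still_matching = False
        for i, d in enumerate(distances):  # in hardware these comparisons are parallel
            if active[i] and data[pos - d + k] == incoming:
                length[i] += 1
                still_matching = True
            else:
                active[i] = False
        if not still_matching:
            break
    # Longest match wins; ties go to the shortest distance.
    best = max(range(len(distances)), key=lambda i: (length[i], -distances[i]))
    return distances[best], length[best]


def compress(data: bytes, window: int = 32, min_len: int = 3, max_len: int = 258) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(data):
        dist, length = longest_match(data, pos, window, max_len)
        if length >= min_len:
            tokens.append((dist, length))
            pos += length
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(tokens: List[Token]) -> bytes:
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):  # byte-by-byte copy handles overlapping matches
                buf.append(buf[-dist])
        else:
            buf.append(t)
    return bytes(buf)


if __name__ == "__main__":
    print(compress(b"abcabcabcabc"))  # [97, 98, 99, (3, 9)]
```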

e health and bioengineering conference | 2017

Framework for an embedded emotion assessment system for space science applications

Andra Baltoiu; Lucian Petrica; Adrian Dinculescu; Cristian Vizitiu

In the context of a growing need for dedicated solutions in the field of human space mission support, this paper proposes an emotion assessment system and its dedicated implementation architecture. The proposed system is described, and a pilot implementation is developed using the open-source software OpenFace to detect key facial features, called Action Units (AUs), in image sequences. The extended Cohn-Kanade (CK+) database is used to train an artificial neural network (ANN) that detects emotions from AU values. The system is evaluated in a preliminary study with respect to AU detection accuracy. The correlation between the area under the ROC curve (AUC) scores obtained by applying OpenFace to the CK+ database and the benchmark scores is 0.61. On the AU presence detection task, the OpenFace F1 score on the CK+ database is well within the standard deviation of the scores obtained on 4 other databases. The execution performance of AU detection and of the ANN is evaluated on the Jetson TK1, a low-power embedded platform, and performance bottlenecks are identified.
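The evaluation above relies on two standard per-AU metrics: the F1 score for AU presence detection and the area under the ROC curve for continuous AU scores. The sketch below computes both from scratch for clarity; the frame labels and scores in the usage lines are made-up illustrative values, not data from the study.

```python
from typing import Sequence


def f1_score(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for binary AU presence labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(truth: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive frame scores higher than a
    random negative frame (ties count as one half)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(round(f1_score([1, 0, 1, 1], [1, 0, 0, 1]), 3))   # 0.8
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))     # 0.75
```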

design, automation, and test in europe | 2015

Hybrid adaptive clock management for FPGA processor acceleration

Alexandru Gheolbanoiu; Lucian Petrica; Sorin Cotofana

As FPGA speed, power efficiency, and logic capacity increase, so does the number of applications that use FPGA processors. However, due to placement and routing constraints, balancing instruction delays in FPGA processors is a real challenge, especially when the implementation approaches the FPGA resource capacity. Consequently, even though some instructions can operate at high frequencies, the slow instructions determine the processor clock period, resulting in underutilisation of the processor's potential. The latent performance of the fast instructions may, however, be harnessed through Adaptive Clock Management (ACM), i.e., by dynamically adapting the clock frequency so that each instruction gets sufficient time to complete correctly. To date, ACM-augmented FPGA processors have been based on Clock Multiplexing (CM), but they suffer from long clock switching delays, which can nullify most of the potential ACM performance gain. This paper proposes an effective FPGA-tailored clock manipulation approach able to exploit the ACM potential. We first evaluate Clock Stretching (CS), i.e., temporarily lengthening the clock period, as a CM alternative in FPGA processor designs and introduce an FPGA-specific CS circuit implementation. Subsequently, we evaluate the advantages and drawbacks of the two techniques and propose a Hybrid ACM, which monitors the processor instruction stream and determines the optimal adaptive clocking strategy to provide the maximum speedup for the executing program. Given that CS has very low latency at the expense of limited accuracy and dynamic range, we rely on it when the program requires frequent clock period changes. Otherwise we use CM, which is rather slow but enables the FPGA processor to operate at the edge of its hardware capabilities. We evaluate our proposal on a vector processor mapped on a Xilinx Zynq FPGA.
Our experiments indicate that on Sum of Squared Differences, neural network, and FIR filter execution traces, the hybrid ACM provides up to 14% performance increase over CM-based ACM.
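The choice the Hybrid ACM makes can be illustrated with a simple timing model: CM gives each instruction its exact period but pays a large delay per switch, while CS switches instantly but can only stretch the clock in coarse steps. The sketch below is a hypothetical model, not the paper's circuit or decision logic: the periods (in picoseconds), the stretch step, the window size, and the function names are all assumptions, the CS range is assumed sufficient for every instruction, and switching costs at window boundaries are ignored.

```python
from typing import List, Optional


def cm_time_ps(trace: List[int], switch_penalty_ps: int) -> int:
    """Clock Multiplexing model: exact required period per instruction,
    plus a fixed delay every time the period changes."""
    total, current = 0, None  # type: int, Optional[int]
    for period in trace:
        if current is not None and period != current:
            total += switch_penalty_ps
        total += period
        current = period
    return total


def cs_time_ps(trace: List[int], base_ps: int, step_ps: int) -> int:
    """Clock Stretching model: no switching delay, but the period is the base period
    plus a whole number of coarse stretch steps (limited accuracy).
    ASSUMPTION: the stretch range covers every instruction in the trace."""
    total = 0
    for period in trace:
        steps = max(0, -(-(period - base_ps) // step_ps))  # ceiling division
        total += base_ps + steps * step_ps
    return total


def hybrid_time_ps(trace: List[int], window: int, switch_penalty_ps: int,
                   base_ps: int, step_ps: int) -> int:
    """For each window of instructions, use whichever mechanism is faster.
    Simplification: switching costs at window boundaries are ignored."""
    total = 0
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]
        total += min(cm_time_ps(chunk, switch_penalty_ps),
                     cs_time_ps(chunk, base_ps, step_ps))
    return total


if __name__ == "__main__":
    alternating = [4000, 5000] * 4         # frequent period changes
    long_runs = [4000] * 50 + [5000] * 50  # rare period changes
    print(hybrid_time_ps(alternating, 8, 20_000, 4000, 1500))  # 38000 (CS chosen)
    print(hybrid_time_ps(long_runs, 100, 20_000, 4000, 1500))  # 470000 (CM chosen)
```

Under these assumed numbers, CS wins when periods alternate every instruction, while CM wins on long runs because its exact periods outweigh a single switching delay.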

international symposium on electronics and telecommunications | 2012

Dynamic power management through adaptive task scheduling for multi-threaded SIMD processors

Lucian Petrica

Power management is one of the most important issues in computer architecture today. Devices often operate at the edge of their thermal envelope, and system designers must balance the power consumption of various system components to ensure safe operation. This paper proposes an adaptive scheduler for a multi-threaded SIMD processor that can trade performance for power consumption in order to stay within a given power budget. By moving threads between processor cores, the scheduler creates more opportunities for aggressive power management techniques such as clock gating. Evaluation shows that the proposed algorithm enables greater power savings than frequency scaling for various synthetic workloads.
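The intuition that moving threads between cores enables clock gating can be shown with a toy model: packing threads onto as few cores as possible leaves the remaining cores idle so they can be gated. This is a hypothetical sketch, not the paper's scheduler: it uses first-fit-decreasing packing, thread demands expressed as integer percentages of one core, and invented power constants, and it does not model the performance trade-offs the real scheduler makes.

```python
from typing import List


def consolidate(demands: List[int], cores: int, capacity: int = 100) -> List[List[int]]:
    """First-fit decreasing: put each thread (demand in % of a core) on the first core with room."""
    assignment: List[List[int]] = [[] for _ in range(cores)]
    load = [0] * cores
    for d in sorted(demands, reverse=True):
        for c in range(cores):
            if load[c] + d <= capacity:
                assignment[c].append(d)
                load[c] += d
                break
        else:
            raise ValueError("threads do not fit; performance would have to be traded off")
    return assignment


def spread(demands: List[int], cores: int) -> List[List[int]]:
    """Baseline: distribute threads round-robin, keeping all cores active."""
    return [demands[i::cores] for i in range(cores)]


def power_mw(assignment: List[List[int]], static_mw: int = 200,
             dynamic_mw_per_pct: int = 5, gated_mw: int = 20) -> int:
    """HYPOTHETICAL power model: active cores pay static power plus power linear in load;
    cores with no threads are clock-gated and draw only a small residual power."""
    total = 0
    for threads in assignment:
        if threads:
            total += static_mw + dynamic_mw_per_pct * sum(threads)
        else:
            total += gated_mw
    return total


if __name__ == "__main__":
    demands = [30, 30, 20, 20]
    print(power_mw(spread(demands, 4)))       # 1300
    print(power_mw(consolidate(demands, 4)))  # 760
```

A budget check then reduces to comparing `power_mw(...)` against the available budget before accepting a new placement.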

international semiconductor conference | 2010

Technology driven architecture for integral parallel embedded computing

Petronela Bumbacea; Valeriu Codreanu; Radu Hobincu; Lucian Petrica; Gheorghe Stefan

Computational structures have not been able to scale with the increasing number of components made available by technological development under Moore's law. To use emerging nanotechnologies efficiently, new architectural approaches are required; new technology-driven architectures must therefore be developed. The proposed architecture is designed in this technologically evolving context to support the increasing computational diversity, complexity and intensity required in the emerging domain of parallel embedded computing. The resulting physical embodiment has at least two orders of magnitude higher effective GIPS/Watt and GIPS/mm² than currently produced structures. This new architectural approach is based on ConnexArray™ technology, already developed and tested on real chips, and on the Bubble-free Embedded Architecture for Multithreading execution model. The paper proposes a computational platform able to manage tens of threads and a number of execution/processing units ranging from tens to thousands.

international conference on communications | 2014