Isidoros Sideris
National Technical University of Athens
Publication
Featured research published by Isidoros Sideris.
International On-Line Testing Symposium | 2008
Kiamal Z. Pekmestzi; Nicholas Axelos; Isidoros Sideris; Nikos K. Moshopoulos
In this paper, a BISR architecture for embedded memories is presented. The proposed scheme utilises a multiple-bank cache-like memory for repairs. Statistical analysis is used to minimise the total resources required to achieve very high fault coverage. Simulation results show that the proposed BISR scheme is characterised by high efficiency and low area overhead, even for high defect densities. For a 4 Mbit memory with an average of 1024 memory defects per IC, repair ratios of 100% and over 90% require less than 2% and 1% memory overhead, respectively.
Java Technologies for Real-Time and Embedded Systems | 2006
Isidoros Sideris; George Economakos; Kiamal Z. Pekmestzi
Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack-based nature of the Java virtual machine instruction set limits the achievable execution performance. To exploit instruction-level parallelism, the stack must be removed completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes into RISC-like instructions. However, the decoding throughput obtained this way is limited. In this paper we propose a novel stack folding technique that uses a predecoded cache to store folded bytecodes, thus enabling reuse. The decoding throughput reaches 4 RISC instructions per cycle. With a superscalar backend core, the obtained IPC is approximately 2.08 instructions per cycle (or 3.02 Java bytecodes per cycle).
International Conference on Electronics, Circuits, and Systems | 2007
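To make the idea of stack folding concrete, here is a minimal sketch in Python. It is not OPEX and not the technique proposed in the paper; it only illustrates, under simplifying assumptions, how a symbolic operand stack can turn a run of stack bytecodes into register-style instructions. The tiny bytecode subset, the `fold` function, and the `r`/`t` register names are all hypothetical.

```python
def fold(bytecodes):
    """Fold a sequence of stack bytecodes into three-address tuples.

    Each result is (operation, destination, source_a, source_b).
    Only a hypothetical subset is handled: iload, iconst, iadd, isub,
    imul and istore.
    """
    stack = []   # symbolic operand stack: holds operand names, not values
    out = []     # folded RISC-like instructions
    temp = 0     # counter for temporary registers t0, t1, ...
    for op, *args in bytecodes:
        if op == "iload":
            stack.append(f"r{args[0]}")
        elif op == "iconst":
            stack.append(f"#{args[0]}")
        elif op in ("iadd", "isub", "imul"):
            b = stack.pop()
            a = stack.pop()
            dst = f"t{temp}"
            temp += 1
            out.append((op[1:], dst, a, b))
            stack.append(dst)
        elif op == "istore":
            src = stack.pop()
            dst = f"r{args[0]}"
            if out and src.startswith("t") and out[-1][1] == src:
                # The value was just produced: retarget the producer
                # instead of emitting a separate move.
                out[-1] = (out[-1][0], dst) + out[-1][2:]
            else:
                out.append(("mov", dst, src))
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return out


# c = a + b, with a, b, c in local variables 1, 2, 3
print(fold([("iload", 1), ("iload", 2), ("iadd",), ("istore", 3)]))
```

Four bytecodes collapse into a single `add` with explicit source and destination registers, which is the form a superscalar backend can schedule without going through the operand stack.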
Dimitris Bekiaris; Isidoros Sideris; George Economakos; Kiamal Z. Pekmestzi
This paper presents a new programmable finite-impulse response (FIR) digital filter scheme based on a low-latency, power-efficient architecture with reduced hardware complexity. In the proposed scheme, the input data is kept in bit-parallel form, while the coefficients enter the circuit in digit-serial form. The coefficient digits are encoded using the Modified Booth algorithm to reduce the number of partial products required for each multiplication. The structure of the filter is based on the technique of merging adjacent multiply-add units. The intermediate results are computed using carry-save arithmetic. Also, the coefficient digits of adjacent multiply-add units enter the filter in digit-skew form, while the input data sample remains stable until the corresponding output sample is produced. Thus, the proposed architecture results in a circuit with reduced hardware cost and lower power consumption compared to other schemes presented in the literature.
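The partial-product reduction mentioned above can be illustrated with a short Python sketch of radix-4 Modified Booth recoding. It models only the arithmetic, not the digit-serial, carry-save hardware described in the abstract; the function names and the `bits` parameter are assumptions made for illustration.

```python
def booth_radix4_digits(coeff, bits):
    """Recode a two's complement coefficient into radix-4 Booth digits.

    Returns digits in the range -2..2, least significant first, so that
    coeff == sum(d * 4**k for k, d in enumerate(digits)).
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    if not -(1 << (bits - 1)) <= coeff < (1 << (bits - 1)):
        raise ValueError("coefficient does not fit in the given width")
    pattern = coeff & ((1 << bits) - 1)  # two's complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def booth_multiply(sample, coeff, bits):
    """Multiply using one partial product per Booth digit."""
    digits = booth_radix4_digits(coeff, bits)
    return sum(d * sample * 4 ** k for k, d in enumerate(digits))


print(booth_radix4_digits(7, 4))   # two digits instead of four bits
print(booth_multiply(13, -77, 8))
```

An n-bit coefficient yields n/2 digits, so each multiplication needs half as many partial products as a plain bit-by-bit scheme, and every digit value maps to a simple shift, negation, or zero of the input sample.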
IEEE Transactions on Computers | 2012
Isidoros Sideris; Kiamal Z. Pekmestzi
This paper presents low-cost techniques for error detection and correction in Ternary Content Addressable Memories (TCAMs). The techniques exploit the inherent redundancy of TCAM cells to allow for protection at lower cost. A fault detection technique with the cost of parity but with about half the probability of silent data corruption is proposed. This technique is then applied along both the horizontal and vertical dimensions of the TCAM array, and a low-cost error correction scheme is derived. Finally, another error correction scheme is proposed, which employs a SECDED ECC of half the complexity by making use of the TCAM redundancy, without compromising single-bit error correction. The proposed schemes come with minimal area, power, and critical path overheads in comparison with standard schemes, and they are good alternatives for TCAM array protection.
ACM Symposium on Applied Computing | 2010
Nikos Anastasiadis; Isidoros Sideris; Kiamal Z. Pekmestzi
Real-time video is used in a wide variety of applications, ranging from video surveillance to medical imaging. These applications require significant amounts of processing power, especially when high-resolution frames are used. A large percentage of the processing time is spent in edge detection kernels. Thus, accelerating these kernels is of vital importance in achieving satisfactory frame rates for real-time performance, even at high resolutions. This paper proposes a hardware coprocessor for the Xilinx Microblaze processor which accelerates edge detection significantly, while keeping the hardware requirements low by using no multipliers at all. Using a Xilinx Spartan 3E FPGA, we report a frame rate of 157 frames per second in 4CIF format, which corresponds to a 4x speedup over the software-only solution. The speedup was achieved with a hardware occupation of only 1131 slices and 5 block RAMs, which makes the solution very attractive.
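The abstract does not name the edge detection operator, so the following Python sketch assumes a 3x3 Sobel operator purely for illustration. It shows why such a kernel needs no multipliers: the only non-unit weight is 2, which is a one-bit left shift. The function name and the list-of-rows image format are hypothetical.

```python
def sobel_shift_add(img):
    """Approximate gradient magnitude using only adds, subtracts and shifts.

    img is a list of rows of 8-bit grey values. Border pixels are left at 0.
    The magnitude is approximated as |gx| + |gy| and clamped to 255.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            up, mid, down = img[y - 1], img[y], img[y + 1]
            # Horizontal gradient: right column minus left column,
            # with the centre row weighted by 2 via a shift.
            gx = (up[x + 1] + (mid[x + 1] << 1) + down[x + 1]) \
               - (up[x - 1] + (mid[x - 1] << 1) + down[x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (down[x - 1] + (down[x] << 1) + down[x + 1]) \
               - (up[x - 1] + (up[x] << 1) + up[x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


# A vertical step edge: dark on the left, bright on the right.
frame = [[0, 0, 255, 255] for _ in range(4)]
for row in sobel_shift_add(frame):
    print(row)
```

Using |gx| + |gy| instead of a square root is a common hardware-friendly approximation; it is a choice made for this sketch, not a detail taken from the paper.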
Journal of Systems Architecture | 2008
Isidoros Sideris; Kiamal Z. Pekmestzi; George Economakos
Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack-based nature of the Java virtual machine instruction set limits the achievable execution performance. In order to exploit instruction-level parallelism and allow out-of-order execution, the stack must be removed completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes into RISC-like instructions. However, the decoding throughput obtained this way is limited. In this paper, we explore microarchitectural techniques to improve the decoding throughput of Java processors. Our techniques are based on the use of a predecoded cache to store the folding results so that they can be reused. The ultimate goal is to exploit all available instruction-level parallelism in Java programs by feeding a superscalar out-of-order backend core at a sustainable rate. With a predecoded cache of 2x2048 entries and a 4-way superscalar core, we obtain 4.8 to 18.3 times better performance than an architecture employing pattern-based folding.
Integration | 2013
Isidoros Sideris; Kiamal Z. Pekmestzi
This paper presents a low-cost fault detection mechanism for FIFO buffers. The scheme is based on maintaining column parity in a single register, which is updated by monitoring the values written to and read from the FIFO memory array. A non-zero column parity when the FIFO is empty indicates a fault, and this property is exploited for fault detection. The technique offers gains in area, power and critical path delay, at the expense of (1) greater detection latency, because the FIFO must become empty before a violation can be asserted, and (2) a worse Silent Data Corruption (SDC) rate.
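The mechanism described above can be modelled in a few lines of Python. This is a behavioural sketch under simplifying assumptions, not the paper's hardware design: the class name, the `inject_bit_flip` helper, and the use of a `deque` as the memory array are all hypothetical.

```python
from collections import deque


class ParityCheckedFifo:
    """FIFO with a single column-parity register.

    Every word written and every word read is XORed into the register.
    If storage is fault-free, each word is XORed in twice, so the
    register returns to zero whenever the FIFO is empty.
    """

    def __init__(self):
        self._mem = deque()
        self._parity = 0  # the single column-parity register

    def write(self, word):
        self._mem.append(word)
        self._parity ^= word

    def read(self):
        word = self._mem.popleft()
        self._parity ^= word
        return word

    def fault_detected(self):
        # The check is only meaningful once the FIFO has drained.
        return not self._mem and self._parity != 0

    def inject_bit_flip(self, index, bit):
        """Test helper: corrupt one stored bit to emulate a memory fault."""
        self._mem[index] ^= 1 << bit


fifo = ParityCheckedFifo()
for word in (0b1010, 0b0111, 0b0001):
    fifo.write(word)
fifo.inject_bit_flip(1, 2)    # fault while the word sits in the array
print(fifo.fault_detected())  # still False: the FIFO is not empty yet
while fifo._mem:
    fifo.read()
print(fifo.fault_detected())  # True once the FIFO has drained
```

The model also exposes both trade-offs named in the abstract: detection waits until the FIFO is empty, and two flips in the same bit column cancel in the register and go unnoticed.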
Microprocessors and Microsystems | 2009
Isidoros Sideris; Kiamal Z. Pekmestzi; George Economakos
Java has gained great popularity in embedded appliances such as set-top boxes, smart phones and other hand-held devices. In this paper we propose a translation-based hw/sw codesigned Java virtual machine architecture, which extends a typical embedded RISC processor. The architectural extensions we propose include special instructions that accelerate the dispatch of translated blocks and security checks for arrays and objects. The extensions are made in a way that maintains operating system support, which makes their adoption more attractive. Benchmarking with the Embedded Caffeine Mark (ECM) benchmarks showed significant speedups, especially when high-performance RISC processors are employed.
International Symposium on Circuits and Systems | 2006
Paul Bougas; Andreas Tsirikos; Kostas Anagnostopoulos; Isidoros Sideris; Kiamal Z. Pekmestzi
In this paper, a novel architecture for the implementation of serial parallel multipliers (SPM) is proposed. The proposed multiplier is based on segmenting a simple SPM into blocks of equal bit length. This multiplier achieves higher throughput, because it requires only a small number of zeros to start a new multiplication cycle, at a moderate hardware expense, and it achieves a significant hardware reduction compared to the double precision SPM. The proposed technique permits the optimization of the area time product.
ACM Symposium on Applied Computing | 2010
Isidoros Sideris; Nikos K. Moshopoulos; Kiamal Z. Pekmestzi
Java has gained great popularity in a wide range of applications. Just-in-time compilation is crucial for providing efficient execution of Java programs, but it is generally a hard task, not suited for embedded systems. This paper presents a hardware acceleration unit for efficient execution of JIT translation in embedded SoCs. The translation is limited to only first level optimizations, which include translation of Java bytecodes to native RISC instructions (stack folding). For experimentation, we developed a SoC with ARM7TDMI processor. In a f 80nm ASIC technology and 80MHz clock, we measured a speed up of up to 4 times over the software only JIT translation.