Ioannis Papaefstathiou

Publication

Featured research published by Ioannis Papaefstathiou.

international conference on communications | 2004

Variable packet size buffered crossbar (CICQ) switches

Manolis Katevenis; Giorgos Passas; Dimitrios Simos; Ioannis Papaefstathiou; Nikolaos Chrysos

One of the most widely used architectures for packet switches is the crossbar. A special version of it is the buffered crossbar, where small buffers are associated with the crosspoints; this simplifies scheduling and improves its efficiency and QoS capabilities to the point where the switch needs no internal speedup. Furthermore, by supporting variable-length packets throughout a buffered crossbar: (a) there is no need for segmentation and reassembly (SAR) circuits; (b) no speedup is necessary to support SAR; and (c) synchronization between the input and output clock domains is simplified. In turn, the lack of SAR and speedup means that no output queues are needed either. In this paper we present an architecture, a chip layout and cost analysis, and a performance evaluation of such a 300 Gbps buffered crossbar operating on variable-size packets. The proposed organization is simple yet powerful and can be implemented using modern technology; as the performance results demonstrate, it clearly outperforms unbuffered crossbars.

international conference on embedded computer systems: architectures, modeling, and simulation | 2007

A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems

Vassilis Dimopoulos; Ioannis Papaefstathiou; Dionisios N. Pnevmatikatos

The Aho-Corasick (AC) algorithm is a very flexible and efficient but memory-hungry pattern matching algorithm that can search an input for multiple strings at once while looking at each character exactly once, making it one of the main options for software-based intrusion detection systems such as SNORT. We present the Split-AC algorithm, a reconfigurable variation of the AC algorithm that exploits domain-specific characteristics of intrusion detection to considerably reduce the FSM memory requirements. Split-AC achieves an overall memory reduction of 28-75% compared to the best proposed implementation.
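
As a rough illustration of the baseline the abstract above starts from, here is a minimal sketch of the textbook Aho-Corasick automaton in Python. It is not the Split-AC variant, and the names `build_automaton` and `search` are chosen here for illustration only. The per-state dictionaries also hint at why the plain algorithm is memory-hungry.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states inherit from their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(text, automaton):
    """Return (start_index, pattern) for every match, reading each character once."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

Every trie node carries its own transition table, so the state count grows with the total length of the rule set; this is the memory cost that the abstract says Split-AC reduces.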

ieee international conference computer and communications | 2007

Memory-Efficient 5D Packet Classification At 40 Gbps

Ioannis Papaefstathiou; Vassilis Papaefstathiou

Packet classification is one of the most important enabling technologies for next generation network services. Even though many multi-dimensional classification algorithms have been proposed, most of them are precluded from commercial equipment due to their high memory requirements. In this paper, we present an efficient packet classification scheme, called Bloom Based Packet Classification (B2PC). B2PC comprises an innovative 5-field search algorithm that decomposes multifield classification rules into internal single field rules which are combined using multi-level Bloom filters. The design of B2PC is optimized for the common case based on analysis of real world classification databases. The hardware implementation of this scheme handles 4K rules using only 530 KB of memory for its data structures, while it supports network streams at a rate of 15 Gbps even in the worst case, and more than 40 Gbps in the average case. This system occupies 1.3 mm² in a 0.18 μm CMOS technology. We show that given a certain memory budget and silicon cost, B2PC is the most efficient hardware-based approach to the classification problem.
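
To make the building block in the abstract above concrete, the following Python sketch shows a plain Bloom filter and one simple way of using one filter per header field as a pre-check. This is an illustration under stated assumptions, not the B2PC design: the field names, the exact-match rules, the filter sizes, and the helpers `build_field_filters` and `may_match` are all hypothetical, and the multi-level combination described in the paper is not reproduced.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter; sizes are illustrative, not taken from the paper."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical 5-tuple field names, assumed for this sketch.
FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")

def build_field_filters(rules):
    """Split each multi-field rule into single-field entries, one filter per field."""
    filters = {name: BloomFilter() for name in FIELDS}
    for rule in rules:
        for name in FIELDS:
            filters[name].add(rule[name])
    return filters

def may_match(filters, packet):
    """A packet is a candidate only if every field hits its filter."""
    return all(filters[name].might_contain(packet[name]) for name in FIELDS)

rule = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "src_port": 1234, "dst_port": 80, "protocol": "tcp"}
filters = build_field_filters([rule])
print(may_match(filters, rule))
```

A Bloom filter never reports a stored item as absent, but it can report false positives, so a positive answer here would still need confirmation against the actual rule table. Real classification rules also use prefixes and ranges, which this exact-match sketch ignores.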

design automation conference | 2003

A fully programmable memory management system optimizing queue handling at multi gigabit rates

George Kornaros; Ioannis Papaefstathiou; Aristides Nikologiannis; Nicholaos Zervos

Two of the main bottlenecks when designing a network embedded system are very often the memory bandwidth and its capacity. This is mainly due to the extremely high speed of the state-of-the-art network links and to the fact that in order to support advanced quality of service (QoS), per-flow queuing is desirable. In this paper we describe the architecture of a memory manager that can provide up to 10 Gbps of aggregate throughput while handling 512K queues. The presented system supports a complete instruction set and thus we believe it can be used as a hardware component in any suitable embedded system, particularly network SoCs that implement per-flow queuing. When designing this scheme, several optimization techniques were evaluated and the most cost- and performance-effective ones were used. These techniques minimize both the memory bandwidth and the memory capacity needed, which is considered a main advantage of the proposed scheme. The proposed architecture uses a simple DRAM for data storage and a typical SRAM for keeping the data-structure pointers, therefore minimizing the system's cost. The device has been fabricated within a novel programmable network processor designed for efficient protocol processing in high speed networking applications. It consists of 155K gates and occupies 5.23 mm² in UMC 0.18 μm CMOS.
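
The split between a data memory and a pointer memory mentioned in the abstract above can be sketched in software as follows. This is a minimal Python model that assumes per-flow queues are kept as linked lists of fixed-size buffers with a shared free list; that organisation, the class name `QueueManager`, and its methods are assumptions made for illustration, not the instruction set or design of the fabricated device.

```python
NIL = -1  # marks "no buffer" in the pointer arrays

class QueueManager:
    """Per-flow FIFO queues sharing one buffer pool (num_buffers must be >= 1)."""

    def __init__(self, num_queues, num_buffers):
        self.data = [None] * num_buffers                     # models the data memory
        self.next_ptr = [i + 1 for i in range(num_buffers)]  # models the pointer memory
        self.next_ptr[-1] = NIL
        self.free_head = 0                                   # head of the free list
        self.head = [NIL] * num_queues
        self.tail = [NIL] * num_queues

    def enqueue(self, queue_id, segment):
        buf = self.free_head
        if buf == NIL:
            raise MemoryError("no free buffers")
        self.free_head = self.next_ptr[buf]   # pop a buffer from the free list
        self.data[buf] = segment
        self.next_ptr[buf] = NIL
        if self.head[queue_id] == NIL:
            self.head[queue_id] = buf         # queue was empty
        else:
            self.next_ptr[self.tail[queue_id]] = buf
        self.tail[queue_id] = buf

    def dequeue(self, queue_id):
        buf = self.head[queue_id]
        if buf == NIL:
            return None                       # queue is empty
        self.head[queue_id] = self.next_ptr[buf]
        if self.head[queue_id] == NIL:
            self.tail[queue_id] = NIL
        segment = self.data[buf]
        self.data[buf] = None
        self.next_ptr[buf] = self.free_head   # push the buffer back on the free list
        self.free_head = buf
        return segment

qm = QueueManager(num_queues=4, num_buffers=3)
qm.enqueue(0, "a")
qm.enqueue(1, "x")
qm.enqueue(0, "b")
print(qm.dequeue(0), qm.dequeue(0), qm.dequeue(1))
```

In this model every enqueue or dequeue touches only a few pointer entries and one data buffer, which is the kind of access pattern that makes a small fast pointer memory alongside a large data memory attractive.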

Wireless Networks | 2013

Evaluating routing metric composition approaches for QoS differentiation in low power and lossy networks

Panagiotis Karkazis; Panagiotis Trakadas; Helen-Catherine Leligou; Lambros Sarakis; Ioannis Papaefstathiou; Theodore B. Zahariadis

The use of Wireless Sensor Networks (WSN) in a wide variety of application domains has been intensively pursued lately, while Future Internet designers consider WSN as a network architecture paradigm that provides abundant real-life, real-time information which can be exploited to enhance the user experience. The wealth of applications running on WSNs imposes different Quality of Service requirements on the underlying network with respect to delay, reliability and loss. At the same time, WSNs present intricacies such as limited energy, node and network resources. To meet the application's requirements while respecting the characteristics and limitations of the WSN, appropriate routing metrics have to be adopted by the routing protocol. These metrics can be primary (e.g. expected transmission count) to capture a specific effect (e.g. link reliability) and achieve a specific goal (e.g. low number of retransmissions to economize resources) or composite (e.g. combining latency with remaining energy) to satisfy different applications' needs and WSN requirements (e.g. low latency and energy consumption at the same time). In this paper, (a) we specify primary routing metrics and ways to combine them into composite routing metrics, (b) we prove (based on the routing algebra formalism) that these metrics can be utilized in such a way that the routing protocol converges to optimal paths in a loop-free manner and (c) we apply the proposed approach to the RPL protocol specified by the ROLL group of IETF for such low power and lossy link networks to quantify the achieved performance through extensive computer simulations.
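
The idea of combining primary metrics into a composite one, as described in the abstract above, can be illustrated with a small Python sketch. It assumes two common composition styles, a weighted sum and a lexical (priority-ordered) comparison; the abstract does not spell out the composition rules, so these two functions, the metric values, and the node names are hypothetical.

```python
def additive_cost(weights, metrics):
    """Weighted sum of primary metrics (lower is better)."""
    return sum(w * m for w, m in zip(weights, metrics))

def lexical_cost(metrics):
    """Compare the first metric first; later metrics only break ties."""
    return tuple(metrics)

def best_parent(candidates, cost):
    """Pick the candidate whose metric vector has the lowest cost."""
    return min(candidates, key=lambda name: cost(candidates[name]))

# Hypothetical neighbours with (transmission-count metric, energy-cost metric).
candidates = {"A": (2.0, 5.0), "B": (3.0, 1.0)}

print(best_parent(candidates, lambda m: additive_cost((0.5, 0.5), m)))
print(best_parent(candidates, lexical_cost))
```

With equal weights the additive rule prefers B (cost 2.0 against 3.5), while the lexical rule prefers A because its first metric is lower. The two styles can therefore select different routes from the same measurements, which is why the choice of composition matters for QoS differentiation.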

design, automation, and test in europe | 2005

Queue Management in Network Processors

Ioannis Papaefstathiou; Theofanis Orphanoudakis; George Kornaros; Christopher Kachris; Ioannis Mavroidis; Aristides Nikologiannis

One of the main bottlenecks when designing a network processing system is very often its memory subsystem. This is mainly due to the state-of-the-art network links operating at very high speeds and to the fact that in order to support advanced quality of service (QoS), a large number of independent queues is desirable. In this paper we analyze the performance bottlenecks of various data memory managers integrated in typical network processing units (NPU). We expose the performance limitations of software implementations utilizing the RISC processing cores typically found in most NPU architectures and we identify the requirements for hardware assisted memory management in order to achieve wire-speed operation at gigabit per second rates. Furthermore, we describe the architecture and performance of a hardware memory manager that fulfills those requirements. This memory manager, although it is implemented in a reconfigurable technology, can provide up to 6.2 Gbit/s of aggregate throughput, while handling 32 K independent queues.

international conference on systems | 2009

High-speed FPGA-based implementations of a Genetic Algorithm

Michalis Vavouras; Kyprianos Papadimitriou; Ioannis Papaefstathiou

One very promising approach for solving complex optimization and search problems is the Genetic Algorithm (GA). In this scheme, a population of abstract representations of candidate solutions to an optimization problem gradually evolves toward better solutions. The aim is the optimization of a given function, the so-called fitness function, which is evaluated upon the initial population as well as upon the solutions after successive generations. In this paper, we present the design of a GA and its implementation on state-of-the-art FPGAs. Our approach optimizes significantly more fitness functions than any other proposed solution. Several experiments on a platform with a Virtex-II Pro FPGA have been conducted. Implementations on a number of different high-end FPGAs outperform other reconfigurable systems with a speedup ranging from 1.2x to 96.5x.
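
The evolutionary loop described in the abstract above can be sketched in software as follows. This is a generic, minimal GA in Python and not the hardware design from the paper: the bit-string encoding, tournament selection, single-point crossover, elitism, and all parameter values are assumptions chosen for illustration.

```python
import random

def run_ga(fitness, genome_len, pop_size=20, generations=30,
           mutation_rate=0.02, seed=0):
    """Maximise `fitness` over bit strings; genome_len must be at least 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best_per_gen = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_per_gen.append(fitness(pop[0]))
        next_pop = [pop[0][:]]                          # elitism: keep the best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fitness)   # tournament selection
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, genome_len)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < mutation_rate else b
                     for b in child]                    # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    best_per_gen.append(fitness(best))
    return best, best_per_gen

# Example fitness function: count of 1 bits ("OneMax").
best, history = run_ga(sum, genome_len=16)
print(sum(best), history[0], history[-1])
```

Because the best individual is always carried over unchanged, the best fitness per generation never decreases. The fitness function is passed in as a parameter, so the same loop can be pointed at different optimization problems.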

field programmable custom computing machines | 2008

MPLEM: An 80-processor FPGA Based Multiprocessor System

Georgios-Grigorios Mplemenos; Ioannis Papaefstathiou

Multiprocessor embedded systems (MESes) are a very promising approach for high performance yet relatively low-cost computing. At the same time modern FPGAs provide the silicon capacity to build multiprocessor systems containing 10-100 processors, complex memory systems, heterogeneous interconnection schemes and custom engines executing the performance-critical operations. In this work we present a MES implemented in a state-of-the-art FPGA consisting of up to eighty 32-bit processors. The efficiency of our approach is demonstrated by the fact that our system can execute the BLAST CPU-intensive application, which is the prevalent tool used by molecular biologists for DNA sequence matching and database search, many times faster than a simple PC.

IEEE Design & Test of Computers | 2009

Accelerating Emulation and Providing Full Chip Observability and Controllability

Iakovos Mavroidis; Ioannis Papaefstathiou

The authors deploy an emulation framework that automatically transforms certain hardware description language (HDL) parts of the testbench into synthesizable code to offload the software simulator and minimize the communication overhead. They also extend this architecture by adding multiple fast scan chain paths in the design to provide full circuit observability and controllability on the fly.

field programmable logic and applications | 2012

Breaking the GSM A5/1 cryptography algorithm with rainbow tables and high-end FPGAS

Maria Kalenderi; Dionisios N. Pnevmatikatos; Ioannis Papaefstathiou; Charalampos Manifavas

A5 is the basic cryptographic algorithm used in GSM cell-phones to ensure that the user communication is protected against illicit acts. The A5/1 version was developed in 1987 and has since been under attack. The most recent attack on A5/1 is the “A51 security project”, led by Karsten Nohl, which consists of the creation of rainbow tables that map the internal state of the algorithm to the keystream. Rainbow tables are efficient structures that allow a tradeoff between run-time (computations performed to crack a conversation) and space (memory to hold pre-computed information). In this paper we describe a very effective parallel architecture for the creation of the A5/1 rainbow tables in reconfigurable hardware. Rainbow table creation is the most expensive portion of cracking a particular encrypted information exchange. Our approach achieves almost 3000× speedup over a single processor, and 2.5× speedup compared to GPUs. This performance is achieved with less than 5 Watt power consumption, achieving an energy efficiency in the order of 150× better than the GPU approach.
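
To illustrate the time and space tradeoff mentioned in the abstract above, here is a toy Python sketch of rainbow chain creation. It does not implement A5/1: a truncated SHA-256 stands in for the keystream computation, the 16-bit state space is artificially small, and the function names `step`, `build_chain`, and `build_table` are hypothetical.

```python
import hashlib

SPACE_BITS = 16  # toy state space, far smaller than a real cipher state

def step(state, column):
    """One chain link: apply the stand-in one-way function, then reduce.

    The reduction depends on the column index, which is what separates
    a rainbow table from a plain hash chain.
    """
    digest = hashlib.sha256(state.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + column) % (1 << SPACE_BITS)

def build_chain(start, length):
    """Walk `length` links from `start` and return the end point."""
    state = start
    for column in range(length):
        state = step(state, column)
    return state

def build_table(starts, length):
    """Store only end point -> start point; the links in between are recomputed on lookup."""
    return {build_chain(s, length): s for s in starts}

table = build_table(range(8), length=10)
print(len(table))
```

Each chain covers up to `length` states but costs a single table entry, which is where the space saving comes from. Every chain can be computed independently of the others, so table creation parallelises naturally.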
