Lorenzo De Carli | Researchain

Publication

Featured researches published by Lorenzo De Carli.

acm special interest group on data communication | 2009

PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers

Lorenzo De Carli; Yi Pan; Amit Kumar; Cristian Estan; Karthikeyan Sankaralingam

New protocols for the data link and network layer are being proposed to address limitations of current protocols in terms of scalability, security, and manageability. High-speed routers and switches that implement these protocols traditionally perform packet processing using ASICs which offer high speed, low chip area, and low power. But with inflexible custom hardware, the deployment of new protocols could happen only through equipment upgrades. While newer routers use more flexible network processors for data plane processing, due to power and area constraints lookups in forwarding tables are done with custom lookup modules. Thus most of the proposed protocols can only be deployed with equipment upgrades. To speed up the deployment of new protocols, we propose a flexible lookup module, PLUG (Pipelined Lookup Grid). We can achieve generality without losing efficiency because various custom lookup modules have the same fundamental features we retain: area dominated by memories, simple processing, and strict access patterns defined by the data structure. We implemented IPv4, Ethernet, Ethane, and SEATTLE in our dataflow-based programming model for the PLUG and mapped them to the PLUG hardware which consists of a grid of tiles. Throughput, area, power, and latency of PLUGs are close to those of specialized lookup modules.

programming language design and implementation | 2013

A general constraint-centric scheduling framework for spatial architectures

Tony Nowatzki; Michael Sartin-Tarm; Lorenzo De Carli; Karthikeyan Sankaralingam; Cristian Estan; Behnam Robatmili

Specialized execution using spatial architectures provides energy efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which suffers from poor compiler/architect productivity, lack of insight on optimality, and inhibits migration of techniques between architectures. Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we express spatial scheduling as a constraint satisfaction problem using Integer Linear Programming (ILP). We observe that architecture primitives and scheduler responsibilities can be related through five abstractions: placement of computation, routing of data, managing event timing, managing resource utilization, and forming the optimization objectives. We encode these responsibilities as 20 general ILP constraints, which are used to create schedulers for the disparate TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using ILP is implementable, practical, and typically matches or outperforms specialized schedulers.
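
To make the idea of scheduling as constraint satisfaction concrete, here is a toy sketch covering only two of the five abstractions: placement of computation and an optimization objective based on routing distance. It is not the paper's formulation. It swaps the ILP solver for exhaustive enumeration, and the names `schedule` and `manhattan`, the grid of tiles, and the one-node-per-tile rule are all assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, column) of a tile in a hypothetical grid


def manhattan(a: Tile, b: Tile) -> int:
    """Hop count between two tiles, assuming dimension-ordered routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def schedule(
    nodes: List[str],
    edges: List[Tuple[str, str]],
    tiles: List[Tile],
) -> Tuple[int, Dict[str, Tile]]:
    """Find a placement of dataflow nodes onto tiles with minimal total route length.

    Constraint (assumed): each tile hosts at most one node.
    Objective (assumed): sum of Manhattan distances over all dataflow edges.
    """
    best_cost = None
    best_placement = None
    # permutations() only yields assignments with distinct tiles,
    # which enforces the one-node-per-tile placement constraint.
    for assignment in permutations(tiles, len(nodes)):
        placement = dict(zip(nodes, assignment))
        cost = sum(manhattan(placement[u], placement[v]) for u, v in edges)
        if best_cost is None or cost < best_cost:
            best_cost, best_placement = cost, placement
    if best_placement is None:
        raise ValueError("no feasible placement: more nodes than tiles")
    return best_cost, best_placement


# A three-node chain a -> b -> c placed on a 2x2 grid.
grid = [(r, c) for r in range(2) for c in range(2)]
cost, placement = schedule(["a", "b", "c"], [("a", "b"), ("b", "c")], grid)
```

Enumeration is only workable for a handful of nodes. A declarative solver, as the abstract describes, takes the same kind of constraints and objective but searches the space far more efficiently.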

international symposium on computers and communications | 2009

Increasing performances of TCP data transfers through multiple parallel connections

Andrea Baldini; Lorenzo De Carli; Fulvio Giovanni Ottavio Risso

Although Transmission Control Protocol (TCP) is a widely deployed and successful protocol, it shows some limitations in present-day environments. In particular, it is unable to exploit multiple (physical or logical) paths between two hosts. This paper presents PATTHEL, a session-layer solution designed for parallelizing stream data transfers. Parallelization is achieved by striping the data flow among multiple TCP channels. This solution does not require invasive changes to the networking stack and can be implemented entirely in user space. Moreover, it is flexible enough to suit several scenarios — e.g. it can be used to split a data transfer among multiple relays within a peer-to-peer overlay network.
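
The striping idea can be sketched in a few lines. This is a minimal in-memory illustration, not PATTHEL's actual protocol or wire format: the round-robin policy, the per-chunk sequence numbers, and the names `stripe` and `reassemble` are assumptions, and plain lists stand in for the TCP channels.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]  # (sequence number, payload)


def stripe(data: bytes, num_channels: int, chunk_size: int) -> List[List[Chunk]]:
    """Cut the stream into fixed-size chunks and deal them round-robin to channels."""
    channels: List[List[Chunk]] = [[] for _ in range(num_channels)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        payload = data[offset:offset + chunk_size]
        channels[seq % num_channels].append((seq, payload))
    return channels


def reassemble(channels: List[List[Chunk]]) -> bytes:
    """Collect chunks from every channel and restore stream order by sequence number."""
    chunks = [chunk for channel in channels for chunk in channel]
    return b"".join(payload for _, payload in sorted(chunks))


message = b"striping a stream over parallel channels"
channels = stripe(message, num_channels=3, chunk_size=8)
assert reassemble(channels) == message
```

Tagging each chunk with a sequence number lets the receiver rebuild the stream even when channels deliver at different rates. A real implementation would also need framing on each connection and a policy for slow or failed channels.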

internet measurement conference | 2014

HILTI: an Abstract Execution Environment for Deep, Stateful Network Traffic Analysis

Robin Sommer; Matthias Vallentin; Lorenzo De Carli; Vern Paxson

When developing networking systems such as firewalls, routers, and intrusion detection systems, one faces a striking gap between the ease with which one can often describe a desired analysis in high-level terms, and the tremendous amount of low-level implementation details that one must still grapple with to come to a robust solution. We present HILTI, a platform that bridges this divide by providing to application developers much of the low-level functionality, without tying it to a specific analysis structure. HILTI consists of two parts: (1) an abstract machine model that we tailor specifically to the networking domain, directly supporting the field's common abstractions and idioms in its instruction set; and (2) a compilation strategy for turning programs written for the abstract machine into optimized, natively executable code. We have developed a prototype of the HILTI compiler toolchain that fully implements the design's functionality, and ported exemplars of networking applications to the HILTI model to demonstrate the aptness of its abstractions. Our evaluation of HILTI's functionality and performance confirms its potential to become a powerful platform for future application development.

architectures for networking and communications systems | 2011

Experiences in Co-designing a Packet Classification Algorithm and a Flexible Hardware Platform

Nilay Vaish; Thawan Kooburat; Lorenzo De Carli; Karthikeyan Sankaralingam; Cristian Estan

Algorithmic solutions to the packet classification problem in network equipment have long been a subject of study in academia and industry and with increases in network speeds they are becoming even more important. Since general purpose processors cannot meet performance and cost requirements, researchers have been assuming that ASICs or FPGAs are necessary for hardware implementation. Industry and academia have been working on SRAM-based platforms specialized for tables used in network equipment, but existing publications only describe the mapping of simpler exact match or prefix match lookups to such platforms. In this paper we adopt a software-hardware co-design approach mapping the EffiCuts algorithm to the PLUG platform. Our work confirms that this solution achieves high throughput (142 million packets per second) and low power (3.1 Watts). It identifies and evaluates changes to the original algorithm and to the platform that can improve throughput and memory utilization.

international conference on parallel architectures and compilation techniques | 2010

Design and implementation of the PLUG architecture for programmable and efficient network lookups

Amit Kumar; Lorenzo De Carli; Sung Jin Kim; Marc de Kruijf; Karthikeyan Sankaralingam; Cristian Estan; Somesh Jha

This paper proposes a new architecture called Pipelined LookUp Grid (PLUG) that can perform data structure lookups in network processing. PLUGs are programmable and through simplicity achieve power efficiency. We draw upon the insights that data structure lookups have natural structure that can be statically determined and exploited. The PLUG execution model transforms data-structure lookups into pipelined stages of computation and associates small code-blocks with data. The PLUG architecture is a tiled architecture with each tile consisting predominantly of SRAMs, a lightweight no-buffering router, and an array of lightweight computation cores. Using a principle of fixed delays in the execution model, the architecture is contention-free and completely statically scheduled thus achieving high energy efficiency. The architecture enables rapid deployment of new network protocols and generalizes as a data-structure accelerator. This paper describes the PLUG architecture, the compiler, and evaluates our RTL prototype PLUG chip synthesized on a 55nm technology library. We evaluate six diverse high-end network processing workloads including IPv4, IPv6, and Ethernet forwarding. We show that at a 55nm technology, a 16-tile PLUG occupies 58 mm², provides 4MB on-chip storage, and sustains a clock frequency of 1 GHz. This translates to 1 billion lookups per second, a latency of 18ns to 219ns, and average power less than 1 watt.
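
The throughput and latency figures in this abstract follow from the clock frequency by simple arithmetic, sketched below. The assumption that the pipeline accepts one new lookup per clock cycle is inferred from the abstract's "1 GHz translates to 1 billion lookups per second"; the function names are illustrative only.

```python
CLOCK_HZ = 1_000_000_000  # 1 GHz, as reported in the abstract


def lookups_per_second(clock_hz: int, lookups_per_cycle: int = 1) -> int:
    """Throughput of a fully pipelined unit (assumed: one lookup issued per cycle)."""
    return clock_hz * lookups_per_cycle


def latency_cycles(latency_ns: float, clock_hz: int) -> int:
    """Convert a latency in nanoseconds into clock cycles at the given frequency."""
    return round(latency_ns * clock_hz / 1e9)


throughput = lookups_per_second(CLOCK_HZ)    # 1_000_000_000 lookups per second
shortest = latency_cycles(18, CLOCK_HZ)      # 18 cycles
longest = latency_cycles(219, CLOCK_HZ)      # 219 cycles
```

At 1 GHz each cycle lasts 1 ns, so the reported 18 ns to 219 ns latency range corresponds to lookups that take between 18 and 219 cycles from start to finish.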

architectures for networking and communications systems | 2012

LEAP: latency- energy- and area-optimized lookup pipeline

Eric Nathaniel Harris; Samuel Lawrence Wasmundt; Lorenzo De Carli; Karthikeyan Sankaralingam; Cristian Estan

Table lookups and other types of packet processing require so much memory bandwidth that the networking industry has long been a major consumer of specialized memories like TCAMs. Extensive research in algorithms for longest prefix matching and packet classification has laid the foundation for lookup engines relying on area- and power-efficient random access memories. Motivated by costs and semiconductor technology trends, designs from industry and academia implement multi-algorithm lookup pipelines by synthesizing multiple functions into hardware, or by adding programmability. In existing proposals, programmability comes with significant overhead. We build on recent innovations in computer architecture that demonstrate the efficiency and flexibility of dynamically synthesized accelerators. In this paper we propose LEAP, a latency-, energy-, and area-optimized lookup pipeline based on an analysis of various lookup algorithms. We compare to PLUG, which relies on von-Neumann-style programmable processing. We show that LEAP has equivalent flexibility by porting all lookup algorithms previously shown to work with PLUG. At the same time, LEAP reduces chip area by 1.5×, power consumption by 1.3×, and latency typically by 5×. Furthermore, programming LEAP is straightforward; we demonstrate an intuitive Python-based API.

ieee hot chips symposium | 2014

Memory processing units

Jaikrishnan Menon; Lorenzo De Carli; Vijayraghavan Thiruvengadam; Karthikeyan Sankaralingam; Cristian Estan

Presents a conference poster that addresses the technology of memory processing units. Some of the following topics are examined: current processing capabilities; MPU hardware; performance and energy output; and new trends in the industry.

ACM Transactions on Programming Languages and Systems | 2015

A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories

Tony Nowatzki; Michael Sartin-Tarm; Lorenzo De Carli; Karthikeyan Sankaralingam; Cristian Estan; Behnam Robatmili

Spatial architectures provide energy-efficient computation but require effective scheduling algorithms. Existing heuristic-based approaches offer low compiler/architect productivity, little optimality insight, and low architectural portability. We seek to develop a spatial-scheduling framework by utilizing constraint-solving theories and find that architecture primitives and scheduler responsibilities can be related through five abstractions: computation placement, data routing, event timing, resource utilization, and the optimization objective. We encode these responsibilities as 20 mathematical constraints, using SMT and ILP, and create schedulers for the TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using constraint solving is implementable, is practical, and can outperform specialized schedulers.

computer and communications security | 2014