Cristian Soviani | Researchain

Publication

Featured research published by Cristian Soviani.

Design, Automation, and Test in Europe | 2006

Optimizing sequential cycles through Shannon decomposition and retiming

Cristian Soviani; Olivier Tardieu; Stephen A. Edwards

Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually combining Shannon decomposition with retiming, effectively a form of speculation, but such manual decomposition is error-prone. We propose an efficient algorithm that simultaneously applies Shannon decomposition and retiming to optimize circuits with tight sequential cycles. While the algorithm is only able to improve certain circuits (roughly half of the benchmarks we tried), the performance increase can be dramatic (7%-61%) with only a modest increase in area (3%-12%). The algorithm is also fast, making it a practical addition to a synthesis flow.

Languages, Compilers, and Tools for Embedded Systems | 2004

Generating fast code from concurrent program dependence graphs

Jia Zeng; Cristian Soviani; Stephen A. Edwards

While concurrency in embedded systems is most often supplied by real-time operating systems, this approach can be unpredictable and difficult to debug. Synchronous concurrency, in which a system marches in lockstep to a global clock, is conceptually easier and potentially more efficient because it can be statically scheduled beforehand. We present an algorithm for generating efficient sequential code from such synchronous concurrent specifications. Starting from a concurrent program dependence graph generated from the synchronous, concurrent language Esterel, we generate efficient, statically scheduled sequential code while adding a minimal amount of runtime scheduling overhead. Experimentally, we obtain speedups as high as six times over existing techniques. While we applied our technique to Esterel, it should be applicable to other synchronous, concurrent languages.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007

Optimizing Sequential Cycles Through Shannon Decomposition and Retiming

Cristian Soviani; Olivier Tardieu; Stephen A. Edwards

Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually combining Shannon decomposition with retiming, effectively a form of speculation, but such manual decomposition is error-prone. We propose an efficient algorithm that simultaneously applies Shannon decomposition and retiming to optimize circuits with tight sequential cycles. While the algorithm is only able to improve certain circuits (roughly half of the benchmarks we tried), the performance increase can be dramatic (7%-61%) with only a modest increase in area (1%-12%). The algorithm is also fast, making it a practical addition to a synthesis flow.

Design Automation Conference | 2006

Synthesis of high-performance packet processing pipelines

Cristian Soviani; Ilija Hadzic; Stephen A. Edwards

Packet editing is a fundamental building block of data communication systems such as switches and routers. Circuits that implement this function are critical and define the features of the system. We propose a high-level synthesis technique for a new model for representing packet editing functions. Experiments show our circuits achieve a throughput of up to 40 Gb/s on a commercially available FPGA device, equal to state-of-the-art implementations.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009

Synthesis and Optimization of Pipelined Packet Processors

Cristian Soviani; Ilija Hadzic; Stephen A. Edwards

We consider pipelined architectures of packet processors consisting of a sequence of simple packet-processing modules interconnected by first-in first-out buffers. We propose a new model for describing their function, an automated synthesis technique that generates efficient hardware for them, and an algorithm for computing minimum buffer sizes that allow such pipelines to achieve their maximum throughput. Our functional model provides a level of abstraction familiar to a network protocol designer; in particular, it does not require knowledge of register-transfer-level hardware design. Our synthesis tool implements the specified function in a sequential circuit that processes packet data a word at a time. Finally, our analysis technique computes the maximum throughput possible from the modules and then determines the smallest buffers that can achieve it. Experimental results conducted on industrial-strength examples suggest that our techniques are practical. Our synthesis algorithm can generate circuits that achieve 40 Gb/s on field-programmable gate arrays, equal to state-of-the-art manual implementations, and our buffer-sizing algorithm has a practically short runtime. Together, our techniques make it easier to quickly develop and deploy high-speed network switches.

Archive | 2008

High level synthesis for packet processing pipelines

Stephen A. Edwards; Cristian Soviani

Packet processing is an essential function of state-of-the-art network routers and switches. Implementing packet processors in pipelined architectures is a well-known, established technique, although different approaches have been proposed. The design of packet processing pipelines is a delicate trade-off between the desire for abstract specifications, short development time, and design maintainability on one hand and very aggressive performance requirements on the other. This thesis proposes a coherent design flow for packet processing pipelines. Like the design process itself, I start by introducing a novel domain-specific language that provides a high-level specification of the pipeline. Next, I address synthesizing this model and calculating its worst-case throughput. Finally, I address some specific circuit optimization issues. I claim, based on experimental results, that my proposed technique can dramatically improve the design process of these pipelines, while the resulting performance matches the expectations of hand-crafted design. The considered pipelines exhibit a pseudo-linear topology, which can be too restrictive in the general case. However, especially due to its high performance, such an architecture may be suitable for applications outside packet processing, in which case some of my proposed techniques could be easily adapted. Since I ran my experiments on FPGAs, this work has an inherent bias towards that technology; however, most results are technology-independent.

Proceedings of the 16th International Workshop on Logic and Synthesis, May 30 - June 1, 2007, San Diego, CA | 2007

FIFO Sizing for High-Performance Pipelines

Cristian Soviani; Stephen A. Edwards

Performance-critical pipelines, such as a packet processing pipeline in a network device, are built from a sequence of simple processing modules connected by FIFOs. Due to their complex sequential behavior, the worst-case throughput, as well as the size of the interconnecting FIFOs, are currently determined using very rough heuristics. Such systems are usually validated by simulation, or worse, field testing. In this paper, we propose a methodology that addresses these two issues. First, we propose a fast technique for computing the maximum possible throughput assuming unbounded FIFOs. Then, we describe two algorithms, one exact, one heuristic, that compute minimum FIFO sizes that can achieve this throughput (i.e., FIFOs that do not introduce bottlenecks). Experimental results suggest our algorithm is applicable to pipelines of at least five modules with runtimes generally in minutes. Since such a computation is only needed a few times for any design, we consider our technique practical.

Archive | 2004

Improved Controller Synthesis from Esterel

Cristian Soviani; Jia Zeng; Stephen A. Edwards

We present a new procedure for automatically synthesizing controllers from high-level Esterel specifications. Unlike existing RTL synthesis approaches, this approach frees the designer from tedious bit-level state encoding and certain types of inter-machine communication. Experimental results suggest that even with a fairly primitive state assignment heuristic, our compiler consistently produces smaller, slightly faster circuits than the existing Esterel compiler. We mainly attribute this to a different style of distributing state bits throughout the circuit. Initial results are encouraging, but some hand-optimized encodings suggest room for a better state assignment algorithm. We are confident that such improvements will make our technique even more practical.

Proceedings of the 14th International Workshop on Logic and Synthesis, June 8-10, 2005, Lake Arrowhead, Calif. | 2005

High-Level Optimization by Combining Retiming and Shannon Decomposition

Cristian Soviani; Olivier Tardieu; Stephen A. Edwards

Applying Shannon decomposition can reshape sequential circuits and improve opportunities for retiming. Both Shannon decomposition and retiming only rely on limited information about combinational blocks (timing estimates), so both techniques are suitable for high-level synthesis. We describe an efficient algorithm to preprocess a circuit using Shannon decomposition to increase retiming efficacy. It assembles complex chains of Shannon decompositions while carefully avoiding parallel ones in order to limit the area overhead due to logic duplication. We compare a traditional retiming flow with the same flow augmented with our algorithm. Although our algorithm provides no improvement on half of our benchmarks, for the other half we obtain a 25% speed-up on average (7% to 61%), while only increasing area by 5% (3% to 12%).
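
As a rough illustration of the identity behind the abstract above, Shannon decomposition rewrites a function f(x, rest) as a multiplexer that selects between the cofactors f(0, rest) and f(1, rest). The Python sketch below is not the authors' algorithm; it only checks that identity on a small, hypothetical Boolean function. The names `shannon_decompose`, `recombine`, and `example` are assumptions made for this illustration.

```python
from itertools import product

def shannon_decompose(f, index):
    """Return the cofactors (f0, f1) of f with the input at `index` fixed to False/True."""
    def cofactor(value):
        def g(*rest):
            args = list(rest)
            args.insert(index, value)  # put the fixed value back in its position
            return f(*args)
        return g
    return cofactor(False), cofactor(True)

def recombine(f0, f1, index):
    """Rebuild the function as a multiplexer controlled by the input at `index`."""
    def h(*args):
        rest = args[:index] + args[index + 1:]
        return f1(*rest) if args[index] else f0(*rest)
    return h

def example(a, b, late):
    # Hypothetical logic block; `late` stands in for a late-arriving signal.
    return (a and late) or (b and not late)

# Both cofactors can be evaluated before `late` is known (the speculation),
# leaving only a multiplexer on the path from `late` to the output.
f0, f1 = shannon_decompose(example, 2)
rebuilt = recombine(f0, f1, 2)
equivalent = all(
    example(*v) == rebuilt(*v) for v in product([False, True], repeat=3)
)
```

In hardware terms, the two cofactors correspond to duplicated logic, which is one way to read the area overhead the abstract mentions.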

Proceedings: The First Annual Workshop on Interaction Between Operating System and Computer Architecture (IOSCA-1): Saturday, October 8, 2005, Austin, Texas | 2005

Adding a Flow-Oriented Paradigm to Commodity Operating Systems

Cristian Soviani; Stephen A. Edwards; Angelos D. Keromytis

The speed of CPUs and memories has historically outstripped I/O, but emerging network and storage technologies promise to invert this relationship. As a result, fundamental assumptions about the role of the operating system in computing systems will have to change. We propose an operating and application architecture that removes the CPU and memory from the path of high-speed I/O. In our model, the operating system becomes a data-flow manager and applications merely direct this flow instead of directly participating in it. Our proof-of-concept prototype, which we implemented on an FPGA board, nearly doubled the throughput of a simple cryptographic networking application, suggesting our model can provide a substantial improvement.
