Daniel Crisan
IBM
Publication
Featured research published by Daniel Crisan.
high performance interconnects | 2011
Daniel Crisan; Andreea Anghel; Robert Birke; Cyriel Minkenberg; Mitchell Gusat
One of the consequential new features of emerging datacenter networks is losslessness, achieved by means of Priority Flow Control (PFC). Despite PFC's key role in the datacenter and its increasing availability, with support in virtually all Converged Enhanced Ethernet (CEE) products, its impact remains largely unknown. This has motivated us to evaluate the sensitivity of three widespread TCP versions to PFC, as well as to the more involved Quantized Congestion Notification (QCN) congestion management mechanism. As datacenter workloads we have adopted several representative commercial and scientific applications. For evaluation we employ an accurate Layer 2 CEE network simulator coupled with a TCP implementation extracted from FreeBSD v9. A somewhat unexpected outcome of this investigation is that PFC significantly improves TCP performance across all tested configurations and workloads, hence our recommendation to enable PFC whenever possible. In contrast, QCN can help or harm depending on its parameter settings, which are currently neither adaptive nor universal for datacenters. To the best of our knowledge this is the first evaluation of TCP performance in lossless CEE networks.
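As a rough illustration of the pause/resume behavior behind PFC-style losslessness, the sketch below models one ingress queue for a single priority with two thresholds. It is not taken from the paper: the class name, the `xoff`/`xon` parameters, and the packet-count units are assumptions, and the headroom a real switch reserves for packets still in flight is left out.

```python
class PfcIngressQueue:
    """Toy per-priority ingress buffer with XOFF/XON hysteresis (hypothetical names)."""

    def __init__(self, xoff, xon):
        if not 0 <= xon < xoff:
            raise ValueError("need 0 <= xon < xoff")
        self.xoff = xoff      # occupancy at which the upstream sender is paused
        self.xon = xon        # occupancy at which the sender may resume
        self.occupancy = 0    # packets currently buffered
        self.paused = False   # True while the upstream sender is paused

    def enqueue(self):
        """Accept one packet; pause the sender once the XOFF threshold is reached."""
        self.occupancy += 1
        if not self.paused and self.occupancy >= self.xoff:
            self.paused = True   # a real port would signal a pause for this priority

    def dequeue(self):
        """Forward one packet; resume the sender once occupancy falls to XON."""
        if self.occupancy == 0:
            return
        self.occupancy -= 1
        if self.paused and self.occupancy <= self.xon:
            self.paused = False  # a real port would signal a resume
```

The gap between the two thresholds is a design choice in this sketch: it keeps the queue from toggling between pause and resume on every packet.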
acm special interest group on data communication | 2013
Daniel Crisan; Robert Birke; Gilles Cressier; Cyriel Minkenberg; Mitchell Gusat
Datacenter networking is currently dominated by two major trends. One aims toward lossless, flat layer-2 fabrics based on Converged Enhanced Ethernet or InfiniBand, with benefits in efficiency and performance. The other targets flexibility based on Software Defined Networking, which enables Overlay Virtual Networking. Although clearly complementary, these trends also exhibit some conflicts: In contrast to physical fabrics, which avoid packet drops by means of flow control, practically all current virtual networks are lossy. We quantify these losses for several common combinations of hypervisors and virtual switches, and show their detrimental effect on application performance. Moreover, we propose a zero-loss Overlay Virtual Network (zOVN) designed to reduce the query and flow completion time of latency-sensitive datacenter applications. We describe its architecture and detail the design of its key component, the zVALE lossless virtual switch. As proof of concept, we implemented a zOVN prototype and benchmarked it with Partition-Aggregate in two testbeds, achieving up to a 15-fold reduction of the mean completion time with three widespread TCP versions. For larger-scale validation and deeper introspection into zOVN, we developed an OMNeT++ model for accurate cross-layer simulations of a virtualized datacenter, which confirm the validity of our results.
IEEE Journal on Selected Areas in Communications | 2014
Daniel Crisan; Robert Birke; Katherine Barabash; Rami Cohen; Mitchell Gusat
Datacenter-based Cloud computing has induced new disruptive trends in networking, key among which is network virtualization. Software-Defined Networking overlays aim to improve the efficiency of next-generation multitenant datacenters. While early overlay prototypes are already available, they focus mainly on core functionality, and little is known yet about their impact on system-level performance. Using query completion time as our primary performance metric, we evaluate the overlay network impact on two representative datacenter workloads, Partition/Aggregate and 3-Tier. We measure how much performance is traded for the overlays' benefits in manageability, security and policing. Finally, we aim to assist datacenter architects by providing a detailed evaluation of the key overlay choices, all made possible by our accurate cross-layer hybrid/mesoscale simulation platform.
high performance interconnects | 2010
Mitchell Gusat; Daniel Crisan; Cyriel Minkenberg; Casimer M. DeCusatis
To fully realize the potential of Cloud and High Performance Computing (HPC) applications, significant improvement is required in the cost/performance of data center networks. While recent industry standards such as Quantized Congestion Notification (QCN) for Converged Enhanced Ethernet (CEE) have begun to address this issue, significant problems remain open. Therefore we propose two novel source-based adaptive routing schemes for CEE-based networks. First, we develop a basic source-driven Reactive Route Control (R2C2) adaptive routing scheme. In response to congestion notifications, the source activates additional paths to re-route traffic around potential congestion points. Using industry-standard VLANs, a source node can effectively control the path choices in the network. This approach goes beyond conventional QCN limitations by replacing its reaction point with a VLAN-based multipath route controller. We thus enable HPC/Cloud applications demanding direct and/or secure access to the network features. Second, we combine R2C2 with the QCN reaction point, resulting in the higher-performance Reactive Route & Rate Controller (R3C2). In case of persistent or multiple hotspots, when VLAN route selection alone is insufficient, the R3C2 source will throttle its packet injection rates individually along each congested route of a multipath bundle. Detailed simulations against established data center and HPC benchmarks show the practical benefits in performance and stability.
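The two-stage reaction described in this abstract can be sketched roughly as follows: a source first activates spare VLAN paths when notified of congestion, and only throttles a route once no spare path is left. This is a hypothetical toy, not the authors' R2C2/R3C2 implementation; the class name, the normalized per-path rates, and the multiplicative `backoff` factor of 0.5 are all assumptions.

```python
class SourceRouteController:
    """Toy source-side controller: add paths first, then throttle per path."""

    def __init__(self, vlans, backoff=0.5):
        if not vlans:
            raise ValueError("at least one VLAN path is required")
        self.vlans = list(vlans)                  # all candidate paths
        self.active = [self.vlans[0]]             # paths currently in use
        self.rate = {v: 1.0 for v in self.vlans}  # normalized injection rate per path
        self.backoff = backoff                    # assumed multiplicative decrease

    def on_congestion_notification(self, vlan):
        """React to a congestion notification received for the given path."""
        spare = [v for v in self.vlans if v not in self.active]
        if spare:
            # Route control: spread traffic over one more path.
            self.active.append(spare[0])
        else:
            # Rate control: no path left to add, so slow only the congested one.
            self.rate[vlan] *= self.backoff
```

With three paths, the first two notifications widen the active set, and only the third reduces the rate on the notified path while the other paths keep their full rate.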
high performance switching and routing | 2012
Robert Birke; Daniel Crisan; Katherine Barabash; Anna Levin; Casimer M. DeCusatis; Cyriel Minkenberg; Mitchell Gusat
One of the prevalent trends in emerging large scale multi-tenant datacenters is network virtualization using overlays. Here we investigate application performance degradation in such an overlay applied to commodity 10 Gigabit Ethernet networks. We have adopted partition/aggregate as a representative commercial workload that today is deployed on bare metal servers and is notoriously sensitive to latency and TCP incast congestion. Using query completion time as the primary metric, we evaluate the degree to which a software-defined network (SDN) overlay impacts this application's behavior, the performance bounds of partition/aggregate with an SDN overlay, and whether active queue management (AQM) such as random early detection (RED) can benefit this environment. We introduce a generic SDN overlay framework, which we measure in hardware and simulate using a real TCP stack extracted from FreeBSD v9, running over a detailed Layer 2 commodity 10G Ethernet fabric network simulator. To further alleviate TCP incast congestion and support legacy congestion control, we propose an AQM translation scheme called v-RED. Finally, we report results concerning SDN's benefits in addressing TCP incast. Contrary to our expectations, we found that latency-sensitive applications do not necessarily suffer from performance degradation when deployed over SDN overlays.
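For readers unfamiliar with RED, the following minimal sketch shows the classic RED marking curve as a function of the average queue length. It covers only textbook RED, not the v-RED translation scheme proposed in the paper, and the function and parameter names are illustrative assumptions.

```python
def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED curve: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if not 0 <= min_th < max_th:
        raise ValueError("need 0 <= min_th < max_th")
    if avg_queue < min_th:
        return 0.0   # queue is short: never mark or drop
    if avg_queue >= max_th:
        return 1.0   # queue is long: always mark or drop
    # In between, the probability grows linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

For example, with `min_th=20`, `max_th=60` and `max_p=0.1`, an average queue of 40 packets sits halfway between the thresholds and is marked with half of `max_p`.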
high performance interconnects | 2012
Fredy D. Neeser; Nikolaos Chrysos; Rolf Clauberg; Daniel Crisan; Mitchell Gusat; Cyriel Minkenberg; Kenneth M. Valk; Claude Basso
One consequential feature of Converged Enhanced Ethernet (CEE) is losslessness, achieved through L2 Priority Flow Control (PFC) and Quantized Congestion Notification (QCN). We focus on QCN and its effectiveness in identifying congestive flows in input-buffered CEE switches. QCN assumes an idealized, output-queued switch. However, as future switches scale to higher port counts and link speeds, purely output-queued or shared-memory architectures lead to excessive memory bandwidth requirements. Moreover, PFC typically requires dedicated buffers per input. Our objective is to complement PFC's coarse per-port/priority granularity with QCN's per-flow control. By detecting buffer overload early, QCN can drastically reduce PFC's side effects. We install QCN congestion points (CPs) at input buffers with virtual output queues and demonstrate that arrival-based marking cannot correctly discriminate between culprits and victims. Our main contribution is occupancy sampling (QCN-OS), a novel, QCN-compatible marking scheme. We focus on random occupancy sampling, a practical method not requiring any per-flow state. For CPs with arbitrarily scheduled buffers, QCN-OS is shown to correctly identify congestive flows, improving buffer utilization, switch efficiency, and fairness.
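The idea of random occupancy sampling can be illustrated with a toy model in which the buffer is a list of flow IDs, one per stored packet, and marking picks a stored packet uniformly at random. This is an assumption-laden sketch rather than the QCN-OS scheme itself; the function names and the list-based buffer are hypothetical. It only shows that a flow is selected in proportion to the buffer space it holds, without any per-flow counters.

```python
import random
from collections import Counter


def sample_flow_by_occupancy(buffer, rng):
    """Return the flow ID of a uniformly random packet stored in the buffer."""
    return rng.choice(buffer) if buffer else None


def marking_shares(buffer, samples, seed=0):
    """Estimate how often each flow would be selected over many samples."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = Counter(sample_flow_by_occupancy(buffer, rng) for _ in range(samples))
    return {flow: count / samples for flow, count in hits.items()}


# A flow holding 90 of 100 buffered packets is selected roughly 90% of the time.
shares = marking_shares(["culprit"] * 90 + ["victim"] * 10, samples=10_000)
```

An arrival-based sampler would instead pick among arriving packets, so a flow that arrives fast but drains quickly could be selected as often as one that actually fills the buffer.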
international conference on cluster computing | 2014
Daniel Crisan; Robert Birke; Nikolaos Chrysos; Cyriel Minkenberg; Mitchell Gusat
Converged Enhanced Ethernet (CEE) is a crucial step in embracing storage, cluster, and high-performance computing fabrics under a common network. However, the adoption of lossless CEE in virtualized clusters is hindered by the lack of network hypervisor software that addresses the major issues of losslessness, i.e., head-of-line blocking and saturation trees. Our objective is to design a hypervisor that prevents misconfigured or malicious virtual machines from filling the lossless network with stalled packets, thus compromising tenant isolation. Furthermore, we observe that current hypervisors perform compulsory isolation, management, and mobility functions, but introduce new bottlenecks on the data-path. By taking advantage of the lossless fabric, we deconstruct the existing virtualized networking stack into its core functions and consolidate them into zFabric, an efficient hypervisor that meets our aforementioned goals. To demonstrate zFabric's benefits, we evaluate a prototype implementation on a datacenter testbed. Besides resolving head-of-line blocking, zFabric improves throughput for long flows by up to 56%, lowers CPU utilization by up to 63%, and shortens completion times by up to 7x for partition-aggregate queries when compared with current virtualized TCP stacks.
Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip on | 2013
Daniel Crisan; Robert Birke; Nikolaos Chrysos; Mitch Gusat
Key to the economic viability of clouds and datacenters is their elastic scalability. Therefore, most active research in this area focuses on datacenter fabric scalability, efficiency, performance, virtualization, and optimal virtual machine (VM) allocation and migration. Here we ask the questions: Given a set of tenant workloads running on generic servers interconnected by a 10-100G Ethernet fabric with modern network virtualization and transport protocols, how can the datacenter operator reach the optimal operation region? How is this optimum defined, traded between operator and tenants, and measured with what metrics? In this paper we propose an evaluation methodology and a set of simple, but descriptive, metrics as a first attempt to answer the questions raised above. As proof of concept, we investigate a multitenant virtualized datacenter network running a 3-tier workload. Our proposal enables a quantitative comparison between competing datacenter fabrics and virtualization architectures.
international conference on computer communications | 2013
Daniel Crisan; Robert Birke; Cyriel Minkenberg; Mitch Gusat
Current hypervisor software drops packets in the virtual network. This behavior is suboptimal, wastes network resources and harms performance. We propose to extend the flow control mechanisms that currently exist only in physical networks to the virtual networks. Using a simple setup we show the advantages of losslessness in a virtualized environment.
Archive | 2012
Daniel Crisan; Casimer M. DeCusatis; Mitch Gusat; Cyriel Minkenberg