Mosharaf Chowdhury | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mosharaf Chowdhury is active.

Explore More

Publication

Featured researches published by Mosharaf Chowdhury.

IEEE ACM Transactions on Networking | 2012

ViNEYard: virtual network embedding algorithms with coordinated node and link mapping

Mosharaf Chowdhury; Muntasir Raihan Rahman; Raouf Boutaba

Network virtualization allows multiple heterogeneous virtual networks (VNs) to coexist on a shared infrastructure. Efficient mapping of virtual nodes and virtual links of a VN request onto substrate network resources, also known as the VN embedding problem, is the first step toward enabling such multiplicity. Since this problem is known to be NP-hard, previous research focused on designing heuristic-based algorithms that had clear separation between the node mapping and the link mapping phases. In this paper, we present ViNEYard-a collection of VN embedding algorithms that leverage better coordination between the two phases. We formulate the VN embedding problem as a mixed integer program through substrate network augmentation. We then relax the integer constraints to obtain a linear program and devise two online VN embedding algorithms D-ViNE and R-ViNE using deterministic and randomized rounding techniques, respectively. We also present a generalized window-based VN embedding algorithm (WiNE) to evaluate the effect of lookahead on VN embedding. Our simulation experiments on a large mix of VN requests show that the proposed algorithms increase the acceptance ratio and the revenue while decreasing the cost incurred by the substrate network in the long run.

acm special interest group on data communication | 2011

Managing data transfers in computer clusters with orchestra

Mosharaf Chowdhury; Matei Zaharia; Justin Ma; Michael I. Jordan; Ion Stoica

Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5X compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7X.

acm special interest group on data communication | 2012

FairCloud: sharing the network in cloud computing

Lucian Popa; Gautam Kumar; Mosharaf Chowdhury; Arvind Krishnamurthy; Sylvia Ratnasamy; Ion Stoica

The network, similar to CPU and memory, is a critical and shared resource in the cloud. However, unlike other resources, it is neither shared proportionally to payment, nor do cloud providers offer minimum guarantees on network bandwidth. The reason networks are more difficult to share is because the network allocation of a virtual machine (VM) X depends not only on the VMs running on the same machine with X, but also on the other VMs that X communicates with and the cross-traffic on each link used by X. In this paper, we start from the above requirements--payment proportionality and minimum guarantees--and show that the network-specific challenges lead to fundamental tradeoffs when sharing cloud networks. We then propose a set of properties to explicitly express these tradeoffs. Finally, we present three allocation policies that allow us to navigate the tradeoff space. We evaluate their characteristics through simulation and testbed experiments to show that they can provide minimum guarantees and achieve better proportionality than existing solutions.

acm special interest group on data communication | 2012

Surviving failures in bandwidth-constrained datacenters

Peter Bodik; Ishai Menache; Mosharaf Chowdhury; Pradeepkumar Mani; David A. Maltz; Ion Stoica

Datacenter networks have been designed to tolerate failures of network equipment and provide sufficient bandwidth. In practice, however, failures and maintenance of networking and power equipment often make tens to thousands of servers unavailable, and network congestion can increase service latency. Unfortunately, there exists an inherent tradeoff between achieving high fault tolerance and reducing bandwidth usage in network core; spreading servers across fault domains improves fault tolerance, but requires additional bandwidth, while deploying servers together reduces bandwidth usage, but also decreases fault tolerance. We present a detailed analysis of a large-scale Web application and its communication patterns. Based on that, we propose and evaluate a novel optimization framework that achieves both high fault tolerance and significantly reduces bandwidth usage in the network core by exploiting the skewness in the observed communication patterns.

hot topics in networks | 2012

Coflow: a networking abstraction for cluster applications

Mosharaf Chowdhury; Ion Stoica

Cluster computing applications -- frameworks like MapReduce and user-facing applications like search platforms -- have application-level requirements and higher-level abstractions to express them. However, there exists no networking abstraction that can take advantage of the rich semantics readily available from these data parallel applications. We propose coflow, a networking abstraction to express the communication requirements of prevalent data parallel programming paradigms. Coflows make it easier for the applications to convey their communication semantics to the network, which in turn enables the network to better optimize common communication patterns.

virtualized infrastructure systems and architectures | 2010

PolyViNE: policy-based virtual network embedding across multiple domains

Mosharaf Chowdhury; Fady Samuel; Raouf Boutaba

Intra-domain virtual network embedding is a well studied problem in the network virtualization literature. For most practical purposes, however, virtual networks (VNs) must be provisioned across heterogeneous administrative domains managed by multiple infrastructure providers (InPs). In this paper we present PolyViNE, a policy-based inter-domain VN embedding framework that embeds end-to-end VNs in a decentralized manner. PolyViNE introduces a distributed protocol that coordinates the VN embedding process across participating InPs and ensures competitive prices for service providers (SPs), i.e., VN owners. We also present a location aware VN request forwarding mechanism -- based on a hierarchical addressing scheme (COST) and a location awareness protocol (LAP) -- to allow faster embedding and outline scalability and performance characteristics of PolyViNE using quantitative and qualitative evaluations.

acm special interest group on data communication | 2015

Efficient coflow scheduling with Varys

Mosharaf Chowdhury; Yuan Zhong; Ion Stoica

Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16X faster on average and up to 2X more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5X.

acm special interest group on data communication | 2015

Efficient Coflow Scheduling Without Prior Knowledge

Mosharaf Chowdhury; Ion Stoica

Inter-coflow scheduling improves application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information and ignore cluster dynamics like pipelining, task failures, and speculative executions, which limit their applicability. Schedulers without prior knowledge compromise on performance to avoid head-of-line blocking. In this paper, we present Aalo that strikes a balance and efficiently schedules coflows without prior knowledge. Aalo employs Discretized Coflow-Aware Least-Attained Service (D-CLAS) to separate coflows into a small number of priority queues based on how much they have already sent across the cluster. By performing prioritization across queues and by scheduling coflows in the FIFO order within each queue, Aalos non-clairvoyant scheduler reduces coflow completion times while guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete 1.93X faster on average and 3.59X faster at the 95th percentile using Aalo in comparison to per-flow mechanisms. Aalos performance is comparable to that of solutions using prior knowledge, and Aalo outperforms them in presence of cluster dynamics.

acm special interest group on data communication | 2013

Leveraging endpoint flexibility in data-intensive clusters

Mosharaf Chowdhury; Srikanth Kandula; Ion Stoica

Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers as well as that of others without similar flexibility can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster. We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad -- a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3X (respectively, 1.58X) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad does so with little impact on the long-term storage balance.

acm special interest group on data communication | 2016

CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark

Hong Zhang; Li Chen; Bairen Yi; Kai Chen; Mosharaf Chowdhury; Yanhui Geng

Leveraging application-level requirements using coflows has recently been shown to improve application-level communication performance in data-parallel clusters. However, existing coflow-based solutions rely on modifying applications to extract coflows, making them inapplicable to many practical scenarios. In this paper, we present CODA, a first attempt at automatically identifying and scheduling coflows without any application-level modifications. We employ an incremental clustering algorithm to perform fast, application-transparent coflow identification and complement it by proposing an error-tolerant coflow scheduler to mitigate occasional identification errors. Testbed experiments and large-scale simulations with production workloads show that CODA can identify coflows with over 90% accuracy, and its scheduler is robust to inaccuracies, enabling communication stages to complete 2.4x (5.1x) faster on average (95-th percentile) compared to per-flow mechanisms. Overall, CODAs performance is comparable to that of solutions requiring application modifications.

Explore More