Steven Y. Ko

University of Illinois at Urbana–Champaign

Publication

Featured research published by Steven Y. Ko.

IEEE Computer | 2010

Open Cirrus: A Global Cloud Computing Testbed

Arutyun Avetisyan; Roy H. Campbell; Indranil Gupta; Michael T. Heath; Steven Y. Ko; Gregory R. Ganger; Michael Kozuch; David R. O'Hallaron; M. Kunze; Thomas T. Kwan; Kevin Lai; Martha Lyons; Dejan S. Milojicic; Hing Yan Lee; Yeng Chai Soh; Ng Kwang Ming; Jing-Yuan Luke; Han Namgoong

Open Cirrus is a cloud computing testbed that, unlike existing alternatives, federates distributed data centers. It aims to spur innovation in systems and applications research and catalyze development of an open source service stack for the cloud.

symposium on cloud computing | 2010

Making cloud intermediate data fault-tolerant

Steven Y. Ko; Imranul Hoque; Brian Cho; Indranil Gupta

Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We call this class of data intermediate data. This paper is the first to address intermediate data as a first-class citizen, specifically targeting and minimizing the effect of run-time server failures on the availability of intermediate data, and thus on performance metrics such as job completion time. We propose new design techniques for a new storage system called ISS (Intermediate Storage System), implement these techniques within Hadoop, and experimentally evaluate the resulting system. Under no failure, the performance of Hadoop augmented with ISS (i.e., job completion time) turns out to be comparable to base Hadoop. Under a failure, Hadoop with ISS outperforms base Hadoop and incurs up to 18% overhead compared to base no-failure Hadoop, depending on the testbed setup.

acm special interest group on data communication | 2010

CloudPolice: taking access control out of the network

Lucian Popa; Minlan Yu; Steven Y. Ko; Sylvia Ratnasamy; Ion Stoica

Cloud computing environments impose new challenges on access control techniques due to multi-tenancy, the growing scale and dynamicity of hosts within the cloud infrastructure, and the increasing diversity of cloud network architectures. The majority of existing access control techniques were originally designed for enterprise environments that do not share these challenges and, as such, are poorly suited for cloud environments. In this paper, we argue that it is both sufficient and advantageous to implement access control only within the hypervisors at the end-hosts. We thus propose CloudPolice, a system that implements a hypervisor-based access control mechanism. We argue that not only can CloudPolice support more sophisticated access control policies, but it can also do so in a manner that is simpler, more scalable, and more robust than existing network-based techniques.

ACM Transactions on Autonomous and Adaptive Systems | 2008

A new class of nature-inspired algorithms for self-adaptive peer-to-peer computing

Steven Y. Ko; Indranil Gupta; Yookyung Jo

We present, and evaluate the benefits of, a design methodology for translating natural phenomena represented as mathematical models into novel, self-adaptive, peer-to-peer (p2p) distributed computing algorithms (protocols). Concretely, our first contribution is a set of techniques to translate discrete sequence equations (also known as difference equations) into new p2p protocols called sequence protocols. Sequence protocols are self-adaptive, scalable, and fault-tolerant, with applicability in p2p settings like Grids. A sequence protocol is a set of probabilistic local and message-passing actions for each process. These actions are translated from terms in a set of source sequence equations. Individual processes do not simulate the source sequence equations completely. Instead, each process executes probabilistic local and message-passing actions, so that the emergent round-to-round behavior of the sequence protocol in a p2p system can be probabilistically predicted by the source sequence equations. The article's second contribution is the design and evaluation of a set of sequence protocols for detection of two global triggers in a distributed system: threshold detection and interval detection. The article's third contribution is a new self-adaptive Grid computing protocol called HoneyAdapt. HoneyAdapt is derived from sequence equations modeling adaptive bee foraging behavior in nature. HoneyAdapt is intended for Grid applications that allow Grid clients, at run-time, a choice of algorithms for executing chunks of the application's dataset. HoneyAdapt tells each Grid client how to adaptively select at run-time, for each chunk it receives, a good algorithm for computing the chunk; this selection is based on continuous feedback from other clients. Finally, we design a variant of HoneyAdapt, called HoneySort, for application to Grid parallelized sorting settings using the master-worker paradigm. Our evaluation of these contributions consists of mathematical analysis, large-scale trace-based simulation results, and experimental results from a HoneySort deployment.

symposium on reliable distributed systems | 2008

Using Tractable and Realistic Churn Models to Analyze Quiescence Behavior of Distributed Protocols

Steven Y. Ko; Imranul Hoque; Indranil Gupta

Large-scale distributed systems are subject to churn, i.e., continuous arrival, departure, and failure of processes. Analysis of protocols under churn requires one to use churn models that are tractable (easy to apply), realistic (apply to deployment settings), and general (apply to many protocols and properties). In this paper, we propose two new churn models, called train and crowd, that together achieve these goals for a broad class of stability properties called quiescent properties, and for arbitrary distributed protocols. We show (i) how analysis of protocol quiescence in the train model can be extended to the crowd model, (ii) how to apply the train and crowd models to several distributed membership protocols, and (iii) how, even under real churn traces, the train and crowd models are reasonably good at predicting system-wide stability metrics for membership protocols.

international middleware conference | 2008

Moara: flexible and scalable group-based querying system

Steven Y. Ko; Praveen Yalagandula; Indranil Gupta; Vanish Talwar; Dejan S. Milojicic; Subu Iyer

Users and administrators of large-scale infrastructures (e.g., datacenters and PlanetLab) frequently need to monitor groups of machines in the infrastructure. Though several distributed querying systems exist for this monitoring purpose, they are not group-based; they mostly focus on querying the entire system. In this paper, we present Moara, a new querying system that makes two novel contributions. First, Moara builds aggregation trees for different groups and adaptively maintains the trees to optimize the total message cost. Second, Moara supports a query language allowing groups to be specified implicitly via predicates consisting of arbitrarily nested unions and intersections. Our evaluations on Emulab, on PlanetLab, and with large-scale simulations demonstrate Moara's ability to answer complex queries within a fraction of a second, to deal with high levels of dynamism in groups, and to incur a low bandwidth overhead per host per query in comparison to existing centralized and distributed aggregation systems.

symposium on operating systems principles | 2005

MON: management overlay networks for distributed systems

Jin Liang; Steven Y. Ko; Indranil Gupta; Klara Nahrstedt

The recent deployment of large distributed computing systems such as content distribution networks and PlanetLab has made it possible for researchers and practitioners to experiment with real-world, large-scale distributed applications. However, running an application in such an environment is difficult, due to the scale and frequent node failures of such systems. Thus, a tool is needed that helps application developers and deployers manage their applications. Our goal in this work is to develop MON, an extremely lightweight and failure-resilient system for managing distributed applications. MON allows users to execute instant management commands on the distributed computing nodes, such as querying the current status of the application or starting and stopping a process on the distributed nodes. The commands are propagated to all the nodes and executed on each node, and the results are aggregated and returned. We believe the ability to execute such instant commands is especially useful for the initial deployment of a distributed application, or for the monitoring and diagnostics of (unexpected) application failures.

winter simulation conference | 2004

A BGP attack against traffic engineering

Jintae Kim; Steven Y. Ko; David M. Nicol; Xenofontas A. Dimitropoulos; George F. Riley

As the Internet grows, traffic engineering has become a widely used technique to control the flow of packets. For inter-domain routing, traffic engineering relies on configurations of the Border Gateway Protocol (BGP). While it is recognized that the misconfiguration of BGP can cause negative effects on the Internet, we consider attack methods that disable traffic engineering regardless of the correctness of configurations. We focus on the redirection of traffic as our attack objective, and present attack scenarios on some dominant sample network topologies to achieve this objective. We also evaluate and validate these attacks using two different discrete-event simulators, one that models BGP behavior on a network, and another that emulates it using direct execution of working BGP code.

acm ifip usenix international conference on middleware | 2007

New worker-centric scheduling strategies for data-intensive grid applications

Steven Y. Ko; Ramsés Morales; Indranil Gupta

Distributed computations, dealing with large amounts of data, are scheduled in Grid clusters today using either a task-centric mechanism, or a worker-centric mechanism. Because of the large data sets, the execution time is bounded by the cost of data transfer. In this paper, we introduce new worker-centric scheduling strategies that are novel in that they aim to implicitly exploit the locality of interest in order to reduce the cost of data transfer. Many Grid applications are characterized by such a locality of interest, i.e., a file is often accessed by multiple tasks and, more importantly, a set of files that are accessed by one task are also likely to be accessed together by other tasks. Our new deterministic, as well as probabilistic, scheduling algorithms implicitly exploit this feature to improve running time. Our experiments are done with traces of a real Grid application (Coadd), and show that our algorithms are able to achieve utilization of over 90%, while reducing makespan significantly compared to task-centric approaches.

International Journal of Parallel Programming | 2007

Dynamic binary instrumentation and data aggregation on large scale systems

Gregory L. Lee; Martin Schulz; Dong H. Ahn; Andrew R. Bernat; Bronis R. de Supinski; Steven Y. Ko; Barry Rountree

Dynamic binary instrumentation for performance analysis on large scale architectures such as the IBM Blue Gene/L system (BG/L) poses unique challenges. Their unprecedented scale and often limited OS support require new mechanisms to organize binary instrumentation, to interact with the target application, and to collect the resulting data. We describe the design and current status of a new implementation of the Dynamic Probe Class Library (DPCL) API for large scale systems. DPCL provides an easy-to-use layer for dynamic instrumentation on parallel MPI applications based on the DynInst dynamic instrumentation library for sequential platforms. Our work includes modifying DynInst to control instrumentation from remote I/O nodes and porting DPCL’s communication for performance data collection to use MRNet, a tree-based overlay network (TBON) that supports scalable multicast and data reduction. We describe extensions to the DPCL API that support instrumentation of task subsets and aggregation of collected performance data.
