Ilya Baldin
University of North Carolina at Chapel Hill
Publication
Featured research published by Ilya Baldin.
International Journal of High Performance Computing Applications | 2017
Ewa Deelman; Christopher D. Carothers; Anirban Mandal; Brian Tierney; Jeffrey S. Vetter; Ilya Baldin; Claris Castillo; Gideon Juve; Dariusz Król; V. E. Lynch; Benjamin Mayer; Jeremy S. Meredith; Thomas Proffen; Paul Ruth; Rafael Ferreira da Silva
Computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Thus, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
International Conference on Communications | 2015
Xinming Chen; Tilman Wolf; James Griffioen; Onur Ascigil; Rudra Dutta; George N. Rouskas; Shireesh Bhat; Ilya Baldin; Kenneth L. Calvert
Deployment of innovative new networking services requires support by network providers. Since economic motivation plays an important role for network providers, it is critical that a network architecture intrinsically considers economic relationships. We present the design of a protocol that associates access to network services with economic contracts. We show how this protocol can be realized in fundamentally different ways, using out-of-band signaling and in-band signaling, based on two different prototype implementations. We present results that show the effectiveness of the proposed protocol and thus demonstrate a first step toward realizing an economy plane for the Internet.
Scalable Information Systems | 2015
Alexander Willner; Chrysa A. Papagianni; Mary Giatili; Paola Grosso; Mohamed Morsey; Yahya Al-Hazmi; Ilya Baldin
The Internet remains an unfinished work. There are several approaches to enhancing it that have been experimentally validated within federated testbed environments. To best gain scientific knowledge from these studies, reproducibility and automation are needed in all areas of the experiment life cycle. Within the GENI and FIRE context, several architectures and protocols have been developed for this purpose. However, a major open research issue remains, namely the description and discovery of the heterogeneous resources involved. To remedy this, we propose a semantic information model that can be used to allow declarative interoperability, build dependency graphs, validate requests, infer knowledge and conduct complex queries. The requirements for such an information model have been extracted from current international Future Internet research projects and the practicality of the model is being evaluated through initial implementations. The main outcome of this work is the definition of the Open-Multinet Upper Ontology and related sub-ontologies, which can be used to describe and manage federated infrastructures and their resources.
Network-Aware Data Management | 2013
Anirban Mandal; Paul Ruth; Ilya Baldin; Yufeng Xin; Claris Castillo; Mats Rynge; Ewa Deelman
This paper presents a performance evaluation of scientific workflows on networked cloud systems with particular emphasis on evaluating the effect of provisioned network bandwidth on application I/O performance. The experiments were run on ExoGENI, a widely distributed networked infrastructure as a service (NIaaS) testbed. ExoGENI orchestrates a federation of independent cloud sites located around the world along with backbone circuit providers. The evaluation used a representative data-intensive scientific workflow application called Montage. The application was deployed on a virtualized HTCondor environment provisioned dynamically from the ExoGENI networked cloud testbed, and managed by the Pegasus workflow manager. The results of our experiments show the effect of modifying provisioned network bandwidth on disk I/O throughput and workflow execution time. The marginal benefit as perceived by the workflow reduces as the network bandwidth allocation increases to a point where disk I/O saturates. There is little or no benefit from increasing network bandwidth beyond this inflection point. The results also underline the importance of network and I/O performance isolation for predictable application performance, and are applicable for general data-intensive workloads. Insights from this work will also be useful for real-time monitoring, application steering and infrastructure planning for data-intensive workloads on networked cloud platforms.
ACM Special Interest Group on Data Communication | 2014
Ilya Baldin; Shu Huang; Rajesh Gopidi
In this paper we address the problem of multi-domain multi-provider SDN-based networks and propose an architecture for controlling them using a collection of agents responsible for ownership and use of SDN resources. Instead of posing the problem in terms of controller coordination for the purpose of establishing connections across the network, we propose to treat it as a resource-management problem with explicit delegations of consumable resources by domains to the users of those resources. The advantage of our approach is in explicitly exposing the resource delegation abstraction. It exposes the control of network elements in different domains by different controllers and permits generalizing several existing multi-domain architectures, making the selection of which one to apply a deployment choice, rather than an architectural principle. We propose a rigorous algebraic formulation for the SDN resource delegation problem and describe the prototyping work in implementing this framework and some of its applications.
International Parallel and Distributed Processing Symposium | 2016
Anirban Mandal; Paul Ruth; Ilya Baldin; Dariusz Król; Gideon Juve; Rajiv Mayani; Rafael Ferreira da Silva; Ewa Deelman; Jeremy S. Meredith; Jeffrey S. Vetter; V. E. Lynch; Benjamin Mayer; James Wynne; Mark P. Blanco; Christopher D. Carothers; Justin M. LaPre; Brian Tierney
Modern science is often conducted on large-scale, distributed, heterogeneous and high-performance computing infrastructures. Increasingly, the scale and complexity of both the applications and the underlying execution platforms have been growing. Scientific workflows have emerged as a flexible representation to declaratively express complex applications with data and control dependences. However, it is extremely challenging for scientists to execute their science workflows in a reliable and scalable way due to a lack of understanding of expected and realistic behavior of complex scientific workflows on large-scale and distributed HPC systems. This is exacerbated by failures and anomalies in large-scale systems and applications, which makes detecting, analyzing and acting on anomaly events challenging. In this work, we present a prototype of an end-to-end system for modeling and diagnosing the runtime performance of complex scientific workflows. We interfaced the Pegasus workflow management system, Aspen performance modeling, monitoring and anomaly detection into an integrated framework that not only improves the understanding of complex scientific applications on large-scale complex infrastructure, but also detects anomalies and supports adaptivity. We present a black box modeling tool, a comprehensive online monitoring system, and anomaly detection algorithms that employ the models and monitoring data to detect anomaly events. We present an evaluation of the system with a Spallation Neutron Source workflow as a driving use case.
The GENI Book | 2016
Jeffrey S. Chase; Ilya Baldin
ORCA is an extensible platform for building infrastructure servers based on a foundational leasing abstraction. These servers include Aggregate Managers for diverse resource providers and stateful controllers for dynamic slices. ORCA also defines a brokering architecture and control framework to link these servers together into a federated multi-domain deployment. This chapter reviews the architectural principles of ORCA and outlines how they enabled and influenced the design of the ExoGENI Racks deployment, which is built on the ORCA platform. It also sets ORCA in context with the GENI architecture as it has evolved.
IEEE/ACM International Conference on Utility and Cloud Computing | 2015
Anirban Mandal; Paul Ruth; Ilya Baldin; Yufeng Xin; Claris Castillo; Gideon Juve; Mats Rynge; Ewa Deelman; Jeffrey S. Chase
Recent advances in cloud technologies and on-demand network circuits have created an unprecedented opportunity to enable complex data-intensive scientific applications to run on dynamic, networked cloud infrastructure. However, there is a lack of tools for supporting high-level applications like scientific workflows on dynamically provisioned, virtualized, networked IaaS (NIaaS) systems. In this paper, we propose an end-to-end system consisting of application-aware and application-independent controllers that provision and adapt complex scientific workflows on NIaaS systems. The application-independent controller enhances the utility of NIaaS systems for higher-level applications by closing the gap between application abstractions and resource provisioning constructs. We also present our approach to predicting dynamic resource requirements for workflows using an application-aware controller that proactively evaluates alternative candidate resource allotments using workflow introspection. We show how these high-level resource requirements can be automatically transformed to low-level NIaaS operations to actuate infrastructure adaptation. The results of our evaluations show that we can make fairly accurate predictions, and the interplay of prediction and adaptation can balance performance and utilization for a representative data-intensive workflow.
International Conference on Computer Communications | 2016
Mohamed Morsey; Alexander Willner; Robyn Loughnane; Mary Giatili; Chrysa A. Papagianni; Ilya Baldin; Paola Grosso; Yahya Al-Hazmi
In cloud environments, the process of matching requests from users with the available computing resources is a challenging task. This is even more complex in federated environments, where multiple providers cooperate to offer enhanced services, suitable for distributed applications. In order to resolve these issues, a powerful modeling methodology can be adopted to facilitate expressing both the request and the available computing resources. This, in turn, leads to an effective matching between the request and the provisioned resources. For this purpose, the Open-Multinet ontologies were developed, which leverage the expressive power of Semantic Web technologies to describe infrastructure components and services. These ontologies have been adopted in a number of federated testbeds. In this article, DBcloud is presented, a system that provides access to Open-Multinet open data via endpoints. DBcloud can be used to simplify the process of discovery and provisioning of cloud resources and services.
International Teletraffic Congress | 2014
Yufeng Xin; Ilya Baldin; Chris Heermann; Anirban Mandal; Paul Ruth
In this paper, we study the problem of provisioning large-scale virtual clusters over federated clouds connected by multi-domain, layer-2 wide area networks. We first present the virtual cluster request abstraction and the abstraction models for substrate resource pools. Based on these two abstraction models, we developed a novel layer-2 exchange mechanism and an implementation of it in a multi-domain networked cloud environment. The design of the mechanism takes into consideration the realistic constraints in current network and cloud systems. We show that efficient cluster splitting, cloud data center selection and resource allocation algorithms can be developed to provision large-scale virtual clusters across cloud sites. A prototype system has been deployed and integrated into the ExoGENI testbed for about a year, and is being heavily used by scientific and data analytic applications.