
Publications


Featured research published by Parag Mhashilkar.


Computer Science and Information Engineering | 2009

The Pilot Way to Grid Resources Using glideinWMS

I. Sfiligoi; D. C. Bradley; Burt Holzman; Parag Mhashilkar; Sanjay Padhi; F. Würthwein

Grid computing has become very popular in big and widespread scientific communities with high computing demands, like high energy physics. Computing resources are being distributed over many independent sites with only a thin layer of Grid middleware shared between them. This deployment model has proven to be very convenient for computing resource providers, but it has introduced several problems for the users of the system, the three major ones being the complexity of job scheduling, the non-uniformity of compute resources, and the lack of good job monitoring. Pilot jobs address all of the above problems by creating a virtual private computing pool on top of Grid resources. This paper presents both the general pilot concept and a concrete implementation, called glideinWMS, deployed in the Open Science Grid.
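
To make the pilot concept concrete, here is a minimal sketch, assuming an HTCondor installation on the worker node and a hypothetical central manager address; it only illustrates the overlay-pool idea and is not the glideinWMS implementation itself.

```python
# Conceptual sketch of a "glidein"-style pilot, NOT the glideinWMS code:
# the pilot writes a minimal HTCondor configuration and starts a startd
# that reports to the VO's central manager, so user jobs can later be
# matched onto this slot (late binding). Assumes HTCondor is installed on
# the worker node; the collector address below is hypothetical.
import os
import subprocess
import tempfile

COLLECTOR = "vo-pool.example.org:9618"   # hypothetical VO central manager

def start_pilot(collector: str = COLLECTOR) -> subprocess.Popen:
    workdir = tempfile.mkdtemp(prefix="glidein_")
    config_path = os.path.join(workdir, "condor_config.pilot")
    with open(config_path, "w") as cfg:
        cfg.write(f"""
CONDOR_HOST = {collector}
COLLECTOR_HOST = {collector}
DAEMON_LIST = MASTER, STARTD
LOCAL_DIR = {workdir}
# Shut down if no job claims the slot, so the pilot does not outlive
# its Grid batch allocation.
STARTD_NOCLAIM_SHUTDOWN = 1200
""")
    env = dict(os.environ, CONDOR_CONFIG=config_path)
    # condor_master supervises the startd; -f keeps it in the foreground.
    return subprocess.Popen(["condor_master", "-f"], env=env)

if __name__ == "__main__":
    start_pilot().wait()
```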


Archive | 2009

ReSS: A Resource Selection Service for the Open Science Grid

G. Garzoglio; Tanya Levshina; Parag Mhashilkar; Steve Timm

The Open Science Grid offers access to hundreds of computing and storage resources via standard Grid interfaces. Before the deployment of an automated resource selection system, users had to submit jobs directly to these resources. They would manually select a resource and specify all relevant attributes in the job description prior to submitting the job. The necessity of human intervention in resource selection and attribute specification hinders automated job management components from accessing OSG resources and is inconvenient for users. The Resource Selection Service (ReSS) project addresses these shortcomings. The system integrates Condor technology, for the core matchmaking service, with the gLite CEMon component, for gathering and publishing resource information in the Glue Schema format. Each of these components communicates over secure protocols via web services interfaces. The system is currently used in production on OSG by the DZero Experiment, the Engagement Virtual Organization, and the Dark Energy Survey. It is also the resource selection service for the Fermilab Campus Grid, FermiGrid. ReSS is considered a lightweight solution to push-based workload management. This paper describes the architecture, performance, and typical usage of the system.
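
As a rough picture of the push-based matchmaking step, the sketch below filters resource descriptions against a job requirement; the GLUE-style attribute names are illustrative, and in ReSS this matching is delegated to Condor's ClassAd machinery rather than hand-written code.

```python
# Conceptual illustration of ClassAd-style resource selection in a
# push-based workload management system. Attribute names mimic GLUE Schema
# fields but are illustrative; ReSS delegates the real matchmaking to Condor.
resources = [
    {"Name": "ce01.example.edu", "GlueCEStateFreeCPUs": 120,
     "GlueCEPolicyMaxWallClockTime": 2880},
    {"Name": "ce02.example.edu", "GlueCEStateFreeCPUs": 0,
     "GlueCEPolicyMaxWallClockTime": 720},
]

# A job's Requirements expression, written here as a plain predicate.
def requirements(ad):
    return ad["GlueCEStateFreeCPUs"] > 0 and ad["GlueCEPolicyMaxWallClockTime"] >= 1440

# Rank the matching sites by free CPUs and pick the best one.
matches = sorted((ad for ad in resources if requirements(ad)),
                 key=lambda ad: ad["GlueCEStateFreeCPUs"], reverse=True)
print(matches[0]["Name"] if matches else "no matching resource")  # ce01.example.edu
```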


Journal of Physics: Conference Series | 2012

End-To-End Solution for Integrated Workload and Data Management using GlideinWMS and Globus Online

Parag Mhashilkar; Zachary Miller; Rajkumar Kettimuthu; G. Garzoglio; Burt Holzman; Cathrin Weiss; Xi Duan; Lukasz Lacinski

Grid computing has enabled scientific communities to effectively share computing resources distributed over many independent sites. Several such communities, or Virtual Organizations (VO), in the Open Science Grid and the European Grid Infrastructure use the GlideinWMS system to run complex application workflows. GlideinWMS is a pilot-based workload management system (WMS) that creates an on-demand, dynamically-sized overlay Condor batch system on Grid resources. While the WMS addresses the management of compute resources, data management in the Grid is still the responsibility of the VO. In general, large VOs have the resources to develop complex custom solutions, while small VOs would rather push this responsibility to the infrastructure. The latter requires a tight integration of the WMS and the data management layers, an approach still not common in modern Grids. In this paper we describe a solution developed to address this shortcoming in the context of the Center for Enabling Distributed Peta-scale Science (CEDPS) by integrating GlideinWMS with Globus Online (GO). Globus Online is a fast, reliable file transfer service that makes it easy for any user to move data. The solution eliminates the need for users to provide custom data transfer solutions in the application by making this functionality part of the GlideinWMS infrastructure. To achieve this, GlideinWMS uses the file transfer plug-in architecture of Condor. The paper describes the system architecture and how this solution can be extended to support data transfer services other than Globus Online when used with Condor or GlideinWMS.
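
A Condor file transfer plug-in is essentially an external executable invoked once per file. The sketch below follows the older, simple plug-in convention (print a capability ClassAd for -classad, otherwise copy source to destination); the local copy is only a placeholder for where a real Globus Online transfer request would go, and the "go" method name is an assumption, not the actual plug-in described in the paper.

```python
#!/usr/bin/env python3
# Sketch of a Condor file-transfer plug-in in the older, simple convention:
#   plugin -classad            -> print capabilities as a ClassAd
#   plugin <source> <dest>     -> transfer one file, exit 0 on success
# A production plug-in for Globus Online would submit a GO transfer task
# where the shutil.copy placeholder is; that part is intentionally stubbed.
import shutil
import sys

def print_capabilities() -> None:
    # SupportedMethods declares which URL schemes this plug-in handles.
    print('PluginVersion = "0.1"')
    print('PluginType = "FileTransfer"')
    print('SupportedMethods = "go"')

def transfer(source: str, destination: str) -> int:
    src = source[len("go://"):] if source.startswith("go://") else source
    try:
        shutil.copy(src, destination)   # placeholder for a real GO transfer
        return 0
    except OSError as err:
        print(f"transfer failed: {err}", file=sys.stderr)
        return 1

if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "-classad":
        print_capabilities()
        sys.exit(0)
    if len(sys.argv) == 3:
        sys.exit(transfer(sys.argv[1], sys.argv[2]))
    print("usage: plugin -classad | plugin <source> <dest>", file=sys.stderr)
    sys.exit(1)
```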


Journal of Physics: Conference Series | 2015

Cloud Services for the Fermilab Scientific Stakeholders

Steven Timm; G. Garzoglio; Parag Mhashilkar; J. Boyd; G. Bernabeu; N. Sharma; N. Peregonow; Hyunwoo Kim; Seo-Young Noh; S. Palur; Ioan Raicu

As part of the Fermilab/KISTI cooperative research project, Fermilab has successfully run an experimental simulation workflow at scale on a federation of Amazon Web Services (AWS), FermiCloud, and local FermiGrid resources. We used the CernVM-FS (CVMFS) file system to deliver the application software. We established Squid caching servers in AWS as well, using the Shoal system to let each individual virtual machine find the closest Squid server. We also developed an automatic virtual machine conversion system so that we could transition virtual machines made on FermiCloud to Amazon Web Services. We used this system to successfully run a cosmic ray simulation of the NOvA detector at Fermilab, making use of both AWS spot pricing and network bandwidth discounts to minimize the cost. On FermiCloud we were also able to run the workflow at the scale of 1000 virtual machines, using a private network routable inside Fermilab. We present in detail the technological improvements that were used to make this work a reality.
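
As an illustration of the Squid-discovery step, the sketch below asks a Shoal server for nearby Squid caches and points the CVMFS client at the closest one; the /nearest endpoint and the response fields are assumptions about the Shoal REST interface rather than details taken from the paper.

```python
# Sketch: ask a Shoal server for nearby Squid caches and point CVMFS at the
# closest one. The "/nearest" endpoint and the hostname/squid_port fields are
# assumptions about the Shoal REST interface; adjust to the deployed version.
import json
import urllib.request

SHOAL_SERVER = "http://shoal.example.org/nearest"   # hypothetical endpoint

def closest_squid(url: str = SHOAL_SERVER) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        squids = json.loads(resp.read().decode())
    # Assume the server returns results ordered nearest-first.
    first = next(iter(squids.values()))
    return f"http://{first['hostname']}:{first['squid_port']}"

def write_cvmfs_proxy(proxy: str, path: str = "/etc/cvmfs/default.local") -> None:
    # CVMFS_HTTP_PROXY tells the CVMFS client which HTTP cache to use.
    with open(path, "a") as f:
        f.write(f'CVMFS_HTTP_PROXY="{proxy}"\n')

if __name__ == "__main__":
    write_cvmfs_proxy(closest_squid())
```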


Journal of Physics: Conference Series | 2014

Big Data Over a 100G Network at Fermilab

G. Garzoglio; Parag Mhashilkar; Hyunwoo Kim; Dave Dykstra; Marko Slyz

As the need for Big Data in science becomes ever more relevant, networks around the world are upgrading their infrastructure to support high-speed interconnections. To support its mission, the high-energy physics community, as a pioneer in Big Data, has always relied on the Fermi National Accelerator Laboratory to be at the forefront of storage and data movement. This need was reiterated in recent years with the data-taking rate of the major LHC experiments reaching tens of petabytes per year. At Fermilab, this regularly resulted in peaks of data movement on the wide area network (WAN) in and out of the laboratory of about 30 Gbit/s, and on the local area network (LAN) between storage and computational farms of 160 Gbit/s. To address these ever-increasing needs, as of this year Fermilab is connected to the Energy Sciences Network (ESnet) through a 100 Gb/s link. To understand the optimal system- and application-level configuration to interface computational systems with the new high-speed interconnect, Fermilab has deployed a Network Research & Development facility connected to the ESnet 100G Testbed. For the past two years, the High Throughput Data Program (HTDP) has been using the Testbed to identify gaps in data movement middleware [5] when transferring data at these high speeds. The program has published evaluations of technologies typically used in High Energy Physics, such as GridFTP [4], XrootD [9], and Squid [8]. This work presents the new R&D facility and the continuation of the evaluation program.
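
For a sense of scale, a back-of-the-envelope conversion from yearly data volume to sustained bandwidth (the 30 PB/year input is an illustrative figure in the "tens of petabytes" range, not a number from the paper):

```python
# Back-of-the-envelope: what sustained rate does a yearly data volume imply?
# 30 PB/year is an illustrative figure in the "tens of petabytes" range.
PETABYTE_BITS = 8e15                 # 1 PB = 1e15 bytes = 8e15 bits
SECONDS_PER_YEAR = 365 * 24 * 3600

def sustained_gbps(petabytes_per_year: float) -> float:
    return petabytes_per_year * PETABYTE_BITS / SECONDS_PER_YEAR / 1e9

print(f"{sustained_gbps(30):.1f} Gbit/s average")   # ~7.6 Gbit/s
# Peaks sit far above the average, which is why the 30 Gbit/s WAN and
# 160 Gbit/s LAN bursts quoted above motivate a 100 Gb/s uplink.
```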


Journal of Physics: Conference Series | 2010

ReSS: Resource Selection Service for National and Campus Grid Infrastructure

Parag Mhashilkar; G. Garzoglio; Tanya Levshina; Steve Timm

The Open Science Grid (OSG) offers access to around a hundred Compute Elements (CE) and Storage Elements (SE) via standard Grid interfaces. The Resource Selection Service (ReSS) is a push-based workload management system that is integrated with the OSG information systems and resources. ReSS integrates standard Grid tools such as Condor, as a brokering service, and gLite CEMon, for gathering and publishing resource information in GLUE Schema format. ReSS is used in OSG by Virtual Organizations (VO) such as the Dark Energy Survey (DES), DZero, and the Engagement VO. ReSS is also used as a Resource Selection Service for Campus Grids, such as FermiGrid. VOs use ReSS to automate the resource selection in their workload management systems to run jobs over the Grid. In the past year, the system has been enhanced to enable publication and selection of storage resources and of any special software or software libraries (like MPI libraries) installed at computing resources. In this paper, we discuss the Resource Selection Service and its typical usage on the two scales of a National Cyber Infrastructure Grid, such as OSG, and of a Campus Grid, such as FermiGrid.
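
The software-publication enhancement can be pictured as an extra list-valued attribute in each resource ad that job requirements test for membership; the attribute name and tags below are invented for the sketch, not the exact values ReSS publishes.

```python
# Illustrative only: selecting resources that advertise a required software
# environment (e.g. an MPI library). The attribute name and tag strings are
# made up for this sketch.
resources = [
    {"Name": "ce03.example.edu", "SoftwareTags": ["MPI-OPENMPI-1.4", "CMSSW_3_8"]},
    {"Name": "ce04.example.edu", "SoftwareTags": ["CMSSW_3_8"]},
]

needed = "MPI-OPENMPI-1.4"
eligible = [ad["Name"] for ad in resources if needed in ad["SoftwareTags"]]
print(eligible)   # ['ce03.example.edu']
```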


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Poster: Big Data Networking at Fermilab

Phillip J. Demar; David Dykstra; G. Garzoglio; Parag Mhashilkar; Anupam Rajendran; Wenji Wu

Exascale science translates to big data. In the case of the Large Hadron Collider (LHC), the data is not only immense, it is also globally distributed. Fermilab is host to the LHC Compact Muon Solenoid (CMS) experiment's US Tier-1 Center, the largest of the LHC Tier-1s. The Laboratory must deal with both scaling and wide-area distribution challenges in processing its CMS data. Fortunately, evolving technologies in the form of 100 Gigabit Ethernet, multi-core architectures, and GPU processing provide tools to help meet these challenges. Current Fermilab R&D efforts in these areas include optimization of network I/O handling in multi-core systems, modification of middleware to improve application performance in 100GE network environments, and network path reconfiguration and analysis for effective use of high bandwidth networks. This poster describes the ongoing network-related R&D activities at Fermilab as a mosaic of efforts that combine to facilitate big data processing and movement.


Journal of Physics: Conference Series | 2012

Identifying Gaps in Grid Middleware on Fast Networks with the Advanced Networking Initiative

Dave Dykstra; G. Garzoglio; Hyunwoo Kim; Parag Mhashilkar

As of 2012, a number of US Department of Energy (DOE) National Laboratories have access to a 100 Gb/s wide-area network backbone. The ESnet Advanced Networking Initiative (ANI) project is intended to develop a prototype network, based on emerging 100 Gb/s Ethernet technology, that will support the DOE's science research programs. A 100 Gb/s network test bed is a key component of the ANI project. The test bed offers the opportunity for early evaluation of 100 Gb/s network infrastructure for supporting the high-impact data movement typical of science collaborations and experiments. In order to make effective use of this advanced infrastructure, the applications and middleware currently used by the distributed computing systems of large-scale science need to be adapted and tested within the new environment, with gaps in functionality identified and corrected. As a user of the ANI test bed, Fermilab aims to study the issues related to end-to-end integration and use of 100 Gb/s networks for the event simulation and analysis applications of physics experiments. In this paper we discuss our findings from evaluating existing HEP middleware and application components, including GridFTP and Globus Online, in this high-speed environment. These findings include recommendations to system administrators and to application and middleware developers on changes that would enable production use of 100 Gb/s networks, including data storage, caching, and wide area access.
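
One recurring adaptation for such long, fat pipes is sizing TCP buffers to the bandwidth-delay product; the sketch below shows the arithmetic and a per-socket buffer request (the 100 Gb/s bandwidth and 50 ms RTT are illustrative figures, and real tuning also involves kernel-wide limits).

```python
# Bandwidth-delay product sizing for a long, fat network path. Figures are
# illustrative; real tuning also raises kernel maxima (net.core.rmem_max and
# friends), not just the per-socket request shown here.
import socket

def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bytes in flight needed to keep the pipe full: bandwidth * RTT."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)

buf = bdp_bytes(bandwidth_gbps=100, rtt_ms=50)      # ~625 MB for 100 Gb/s, 50 ms
print(f"target buffer: {buf / 1e6:.0f} MB")

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# The kernel caps this at its configured maximum; the request documents the
# intent of the middleware change.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf)
```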


Journal of Physics: Conference Series | 2010

Metrics correlation and analysis service (MCAS)

Andrew Baranovski; Dave Dykstra; G. Garzoglio; Ted Hesselroth; Parag Mhashilkar; Tanya Levshina

The complexity of Grid workflow activities and their associated software stacks inevitably involves multiple organizations, ownership, and deployment domains. In this setting, important and common tasks such as the correlation and display of metrics and debugging information (fundamental ingredients of troubleshooting) are challenged by the informational entropy inherent to independently maintained and operated software components. Because such an information pool is disorganized, it is a difficult environment for business intelligence analysis, i.e., troubleshooting, incident investigation, and trend spotting. The mission of the MCAS project is to deliver a software solution to help with the adaptation, retrieval, correlation, and display of workflow-driven data and of type-agnostic events generated by loosely coupled or fully decoupled middleware.
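
As a toy version of the correlation task MCAS addresses, the sketch below aligns two independently produced metric streams on a common time bucket; the metric names and values are invented for the example.

```python
# Toy illustration of correlating metrics from independently operated
# components by bucketing them onto a common time axis. Metric names and
# values are invented; MCAS itself is a full adaptation/retrieval/display
# stack, not a ten-line join.
from collections import defaultdict

transfer_rate = [(1000, 95.0), (1060, 40.0), (1120, 12.0)]   # (unix_time, MB/s)
srm_errors    = [(1055, 3), (1125, 17)]                      # (unix_time, count)

def bucket(samples, width=60):
    out = defaultdict(list)
    for t, v in samples:
        out[t // width * width].append(v)
    return out

rates, errors = bucket(transfer_rate), bucket(srm_errors)
for t in sorted(set(rates) | set(errors)):
    # Side-by-side view of the two streams in each time bucket.
    print(t, rates.get(t, []), errors.get(t, []))
```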


Journal of Physics: Conference Series | 2014

Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

Parag Mhashilkar; Anthony Tiradani; Burt Holzman; Krista Larson; I. Sfiligoi; Mats Rynge

Collaboration


Parag Mhashilkar's top co-authors:


Anupam Rajendran, Illinois Institute of Technology
Cathrin Weiss, University of Wisconsin-Madison