Dave Dykstra
Fermilab
Publications
Featured research published by Dave Dykstra.
Prepared for International Conference on Computing in High Energy and Nuclear Physics (CHEP 07), Victoria, BC, Canada, 2-7 Sep 2007 | 2008
Barry Blumenfeld; Dave Dykstra; L. Lueking; E. Wicklund
The CMS experiment at the LHC has established an infrastructure using the FroNTier framework to deliver conditions (i.e. calibration, alignment, etc.) data to processing clients worldwide. FroNTier is a simple web service approach providing client HTTP access to a central database service. The system for CMS has been developed to work with POOL, which provides object-relational mapping between the C++ clients and various database technologies. Because of the read-only nature of the data, Squid proxy caching servers are maintained near clients, and these caches provide high-performance data access. Several features have been developed to make the system meet the needs of CMS, including careful attention to cache coherency with the central database and the low-latency loading required for the operation of the online High Level Trigger. The ease of deployment, stability of operation, and high performance make the FroNTier approach well suited to the Grid environment used for CMS offline computing, as well as to the online environment used by the CMS High Level Trigger. The use of standard software, such as Squid and various monitoring tools, makes the system reliable, highly configurable, and easily maintained. We describe the architecture, software, deployment, performance, monitoring, and overall operational experience for the system.
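The access pattern described above, a client fetching the result of a read-only database query over HTTP through a nearby Squid cache, can be illustrated with a minimal sketch. The server URL, proxy address, query encoding, and table name below are illustrative assumptions rather than the actual CMS endpoints or the real Frontier wire format.

```python
# Minimal sketch of a Frontier-style read through a caching proxy.
# The URLs, proxy address, and query encoding are illustrative assumptions;
# the real CMS deployment uses its own endpoints and request format.
import base64
import urllib.request

FRONTIER_SERVER = "http://frontier.example.org:8000/Frontier"  # hypothetical central server
SQUID_PROXY = "http://squid.example.site:3128"                  # hypothetical nearby cache

def frontier_query(sql: str) -> bytes:
    """Send a read-only query as an HTTP GET so intermediate Squids can cache the result."""
    encoded = base64.urlsafe_b64encode(sql.encode()).decode()
    url = f"{FRONTIER_SERVER}?encoding=BLOB&p1={encoded}"
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": SQUID_PROXY})
    )
    with opener.open(url, timeout=30) as response:
        return response.read()  # payload is cached by the proxy for subsequent clients

if __name__ == "__main__":
    payload = frontier_query("SELECT * FROM conditions_table")  # hypothetical table name
    print(len(payload), "bytes received")
```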
Journal of Physics: Conference Series | 2010
Dave Dykstra; L. Lueking
The CMS detector project loads copies of conditions data to over 100,000 computer cores worldwide by using a software subsystem called Frontier. This subsystem translates database queries into HTTP, looks up the results in a central database at CERN, and caches the results in an industry-standard HTTP proxy/caching server called Squid. One of the most challenging aspects of any cache system is coherency, that is, ensuring that changes made to the underlying data get propagated out to all clients in a timely manner. Recently, the Frontier system was enhanced to drastically reduce the time for changes to be propagated everywhere without heavily loading servers. The propagation time is now as low as 15 minutes for some kinds of data and no more than 60 minutes for the rest of the data. This was accomplished by taking advantage of an HTTP and Squid feature called If-Modified-Since. In order to use this feature, the Frontier server sends a Last-Modified timestamp, but since modification times are not normally tracked by Oracle databases, a PL/SQL program was developed to track the modification times of database tables. We discuss the details of this caching scheme and the obstacles overcome including database and Squid bugs.
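The revalidation mechanism described here rests on standard HTTP conditional requests. The following minimal sketch, with an illustrative URL and none of the Frontier-specific encoding, shows how a client presents If-Modified-Since and accepts either fresh data or a 304 Not Modified answer.

```python
# Sketch of the conditional-GET mechanism: the client presents the Last-Modified
# timestamp it cached earlier, and the server (or an intermediate Squid) answers
# 304 Not Modified when the underlying data has not changed. The URL is illustrative.
import urllib.request
from urllib.error import HTTPError

def fetch_if_modified(url: str, last_modified: str | None):
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            # Fresh data: remember the new Last-Modified for the next revalidation.
            return response.read(), response.headers.get("Last-Modified")
    except HTTPError as err:
        if err.code == 304:
            return None, last_modified  # cached copy is still valid
        raise
```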
Journal of Physics: Conference Series | 2011
Dave Dykstra
The World-Wide-Web has scaled to an enormous size. The largest single contributor to its scalability is the HTTP protocol, particularly when used in conformity with REST (REpresentational State Transfer) principles. High Energy Physics (HEP) computing also has to scale to an enormous size, so it makes sense to base much of it on RESTful protocols. Frontier, which reads databases with an HTTP-based RESTful protocol, has successfully scaled to deliver production detector conditions data from both the CMS and ATLAS LHC detectors to hundreds of thousands of computer cores worldwide. Frontier is also able to re-use a large amount of the standard software that runs the Web: on the clients, caches, and servers. I discuss the specific ways in which HTTP and REST enable high scalability for Frontier. I also briefly discuss another HEP computing protocol that is already HTTP-based and RESTful, and one more that could benefit from becoming so. My goal is to encourage HEP protocol designers to consider HTTP and REST whenever the same information is needed in many places.
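As a rough illustration of why RESTful reads cache so well, the sketch below stands up a toy read-only HTTP endpoint that marks its GET responses cacheable, so an intermediate proxy such as Squid can serve repeat requests without contacting the origin. The path layout, payload, and max-age value are assumptions for illustration only.

```python
# Toy sketch of a REST-style read-only endpoint that HTTP caches can scale out:
# responses to GET are marked cacheable with Cache-Control.
# The path layout and max-age value are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

class ConditionsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'{"payload": "conditions data for ' + self.path.encode() + b'"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Cache-Control", "max-age=900")  # let proxies reuse for 15 minutes
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), ConditionsHandler).serve_forever()
```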
Journal of Physics: Conference Series | 2014
G. Garzoglio; Parag Mhashilkar; Hyunwoo Kim; Dave Dykstra; Marko Slyz
As the need for Big Data in science becomes ever more relevant, networks around the world are upgrading their infrastructure to support high-speed interconnections. To support its mission, the high-energy physics community, a pioneer in Big Data, has always relied on the Fermi National Accelerator Laboratory to be at the forefront of storage and data movement. This need was reiterated in recent years as the data-taking rate of the major LHC experiments reached tens of petabytes per year. At Fermilab, this regularly resulted in peaks of data movement on the wide area network (WAN) in and out of the laboratory of about 30 Gbit/s, and on the local area network (LAN) between storage and computational farms of 160 Gbit/s. To address these ever-increasing needs, as of this year Fermilab is connected to the Energy Sciences Network (ESnet) through a 100 Gb/s link. To understand the optimal system- and application-level configuration to interface computational systems with the new high-speed interconnect, Fermilab has deployed a Network Research & Development facility connected to the ESnet 100G Testbed. For the past two years, the High Throughput Data Program (HTDP) has been using the Testbed to identify gaps in data movement middleware [5] when transferring data at these high speeds. The program has published evaluations of technologies typically used in High Energy Physics, such as GridFTP [4], XrootD [9], and Squid [8]. This work presents the new R&D facility and the continuation of the evaluation program.
Journal of Physics: Conference Series | 2017
Derek Weitzel; Brian Bockelman; Dave Dykstra; Jakob Blomer; René Meusel
Data federations have become an increasingly common tool for large collaborations such as CMS and ATLAS to efficiently distribute large data files. Unfortunately, these are typically implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working-set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we describe our contributions to CVMFS that merge the data scalability of XRootD-based data federations (such as AAA) with the metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X.509 proxy certificates. We combined these modifications with the OSG's StashCache regional XRootD caching infrastructure to create a cached data distribution network. We show performance metrics for accessing the data federation through CVMFS compared to direct data federation access. Additionally, we discuss the improved user experience of providing access to a data federation through a POSIX filesystem.
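From the user's point of view, the result is that federation data looks like ordinary files. The short sketch below uses a hypothetical repository name and file path, assumes a CVMFS client is already mounted on the node, and simply reads the data through plain POSIX calls.

```python
# Sketch of the client-side view argued for above: data in an XRootD federation
# exposed through CVMFS as ordinary POSIX files. The repository name and file path
# are hypothetical and assume a CVMFS client is already mounted on this host.
import os

DATA_FILE = "/cvmfs/data.example.org/store/user/sample/events.root"  # hypothetical path

def read_header(path: str, nbytes: int = 1024) -> bytes:
    """Ordinary POSIX open/seek/read work even though the bytes may be fetched
    on demand from an external data federation server or a regional cache."""
    with open(path, "rb") as handle:
        handle.seek(0)
        return handle.read(nbytes)

if __name__ == "__main__":
    if os.path.exists(DATA_FILE):
        print(len(read_header(DATA_FILE)), "bytes read via the POSIX interface")
    else:
        print("CVMFS repository not mounted on this host")
```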
Computing and Software for Big Science | 2017
Burt Holzman; M. Girone; Dirk Hufnagel; Dave Dykstra; Hyunwoo Kim; Steve Timm; Oliver Gutsche; S. Fuess; L. A. T. Bauerdick; Anthony Tiradani; Panagiotis Spentzouris; N. Magini; Brian Bockelman; Eric Wayne Vaandering; Robert Kennedy; D. Mason; G. Garzoglio; I. Fisk
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks, and there is growing interest among cloud providers in demonstrating the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned in performing physics workflows on a large-scale set of virtualized resources. In addition, we discuss the economics and operational efficiencies of executing workflows both in the cloud and on dedicated resources.
Journal of Physics: Conference Series | 2012
Barry Blumenfeld; Dave Dykstra; P. Kreuzer; Ran Du; Weizhen Wang
The Frontier framework is used in the CMS experiment at the LHC to deliver conditions data to processing clients worldwide, including calibration, alignment, and configuration information. Each central server at CERN, called a Frontier Launchpad, uses Tomcat as a servlet container to establish the communication between clients and the central Oracle database. HTTP proxy Squid servers, located close to clients, cache the responses to queries in order to provide high-performance data access and to reduce the load on the central Oracle database. Each Frontier Launchpad also has its own reverse-proxy Squid for caching. The three central servers have been delivering about 5 million responses every day since the LHC startup, containing about 40 GB of data in total, to more than one hundred Squid servers located worldwide, with an average response time on the order of 10 milliseconds. The Squid caches deployed worldwide process many more requests per day, over 700 million, and deliver over 40 TB of data. Several monitoring tools have been developed to watch the Tomcat log files, the accesses to the Squids on the central Launchpad servers, and the availability of remote Squids, in order to guarantee the performance of the service and keep the system easily maintainable. Following a brief introduction of the Frontier framework, we describe the performance of this highly reliable and stable system, detail monitoring concerns and their deployment, and discuss the overall operational experience from the first two years of LHC data-taking.
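One of the monitoring concerns mentioned, checking the availability of remote Squids, can be approximated with a simple probe like the sketch below. The Squid host names and test URL are hypothetical placeholders, not the actual CMS monitoring endpoints.

```python
# Sketch of a remote-Squid availability probe in the spirit of the monitoring
# described above: each Squid is asked to proxy a small test object and the
# response time is recorded. Host names and the test URL are hypothetical.
import time
import urllib.request

SQUIDS = ["squid1.site-a.example:3128", "squid2.site-b.example:3128"]  # hypothetical hosts
TEST_URL = "http://frontier.example.org:8000/Frontier/test"            # hypothetical object

def probe(squid: str) -> float | None:
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": f"http://{squid}"})
    )
    start = time.monotonic()
    try:
        with opener.open(TEST_URL, timeout=10):
            return time.monotonic() - start
    except OSError:
        return None  # probe failed: unreachable, timed out, or error response

for squid in SQUIDS:
    elapsed = probe(squid)
    status = f"{elapsed * 1000:.0f} ms" if elapsed is not None else "DOWN"
    print(f"{squid}: {status}")
```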
Proceedings of the 11th Annual Cyber and Information Security Research Conference | 2016
Jeny Teheran; Dave Dykstra; Mine Altunay
The Fermi National Accelerator Laboratory (FNAL) is facing the challenge of providing scientific data access and grid submission to scientific collaborations that span the globe but are hosted at FNAL. Researchers in these collaborations are currently required to register as FNAL users and obtain FNAL credentials to access grid resources to perform their scientific computations. These requirements burden researchers with managing additional authentication credentials and put additional load on FNAL for managing user identities. Our design integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and MyProxy with the FNAL grid submission system to provide secure access for users from diverse experiments and collaborations without requiring each user to have authentication credentials from FNAL. The design automates the handling of certificates, so users do not need to manage them manually. Although the initial implementation is for FNAL's grid submission system, the design and the core of the implementation are general and could be applied to other distributed computing systems.
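A minimal sketch of the kind of credential automation this design aims for is shown below: after the federated (InCommon/CILogon) login step, a short-lived X.509 proxy is retrieved from a MyProxy server with the standard myproxy-logon client. The server name, username, output path, and lifetime are placeholders, and the real system wires this into the grid submission workflow rather than running it as a standalone script.

```python
# Sketch of credential automation: fetch a short-lived X.509 proxy from a MyProxy
# server so the user never handles certificate files directly. The server name,
# username, and paths are placeholders, and error handling is minimal.
import subprocess

MYPROXY_SERVER = "myproxy.example.org"   # placeholder
USERNAME = "collaboration_user"          # placeholder
PROXY_PATH = "/tmp/x509up_u_demo"        # placeholder output location

def retrieve_proxy(lifetime_hours: int = 12) -> str:
    """Call the standard myproxy-logon client to obtain a proxy certificate."""
    subprocess.run(
        ["myproxy-logon",
         "-s", MYPROXY_SERVER,
         "-l", USERNAME,
         "-o", PROXY_PATH,
         "-t", str(lifetime_hours)],
        check=True,
    )
    return PROXY_PATH

if __name__ == "__main__":
    print("proxy written to", retrieve_proxy())
```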
Journal of Physics: Conference Series | 2012
Dave Dykstra; G. Garzoglio; Hyunwoo Kim; Parag Mhashilkar
As of 2012, a number of US Department of Energy (DOE) National Laboratories have access to a 100 Gb/s wide-area network backbone. The ESnet Advanced Networking Initiative (ANI) project is intended to develop a prototype network based on emerging 100 Gb/s Ethernet technology. The ANI network will support DOE's science research programs. A 100 Gb/s network test bed is a key component of the ANI project. The test bed offers the opportunity for early evaluation of 100 Gb/s network infrastructure for supporting the high-impact data movement typical of science collaborations and experiments. In order to make effective use of this advanced infrastructure, the applications and middleware currently used by the distributed computing systems of large-scale science need to be adapted and tested within the new environment, with gaps in functionality identified and corrected. As a user of the ANI test bed, Fermilab aims to study the issues related to end-to-end integration and use of 100 Gb/s networks for the event simulation and analysis applications of physics experiments. In this paper we discuss our findings from evaluating existing HEP middleware and application components, including GridFTP and Globus Online, in the high-speed environment. These findings include recommendations to system administrators and to application and middleware developers on changes that would enable production use of the 100 Gb/s networks, covering data storage, caching, and wide-area access.
Journal of Physics: Conference Series | 2010
Andrew Baranovski; Dave Dykstra; G. Garzoglio; Ted Hesselroth; Parag Mhashilkar; Tanya Levshina
The complexity of Grid workflow activities and their associated software stacks inevitably involves multiple organizations, ownership, and deployment domains. In this setting, important and common tasks such as the correlation and display of metrics and debugging information (fundamental ingredients of troubleshooting) are challenged by the informational entropy inherent in independently maintained and operated software components. Because such an information pool is disorganized, it is a difficult environment for business intelligence analysis, i.e., troubleshooting, incident investigation, and trend spotting. The mission of the MCAS project is to deliver a software solution to help with the adaptation, retrieval, correlation, and display of workflow-driven data and of type-agnostic events generated by loosely coupled or fully decoupled middleware.