Andrew Hanushevsky
Stanford University
Publications
Featured research published by Andrew Hanushevsky.
Proceedings of SPIE | 2006
Jacek Becla; Andrew Hanushevsky; Sergei Nikolaev; Ghaleb Abdulla; Alexander S. Szalay; Maria A. Nieto-santisteban; Ani Thakar; Jim Gray
The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.
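The abstract does not spell out a partitioning scheme, but its spatio-temporal emphasis suggests the flavor of the problem. The following is a minimal, hypothetical Python sketch (bin sizes, node counts and function names are invented, not taken from the paper) of how a catalog row keyed by sky position and observation time might be mapped to a chunk and assigned to a database node.

# Hypothetical sketch: map a catalog entry's sky position and observation time
# to a partition chunk, so the catalog can be spread across database nodes.
# Chunk sizes and the hashing scheme are illustrative only.

def chunk_id(ra_deg, dec_deg, mjd, ra_bins=360, dec_bins=180, days_per_slice=30):
    """Return a (spatial, temporal) chunk identifier for one catalog row."""
    ra_chunk = int(ra_deg) % ra_bins            # 1-degree strips in right ascension
    dec_chunk = int(dec_deg + 90) % dec_bins    # 1-degree strips in declination
    time_slice = int(mjd) // days_per_slice     # roughly monthly time slices
    return (ra_chunk, dec_chunk, time_slice)

def node_for_chunk(chunk, n_nodes=32):
    """Assign a chunk to one of n_nodes database servers."""
    return hash(chunk) % n_nodes

# Example: an object at RA=210.5, Dec=-12.3 observed at MJD 60000
print(node_for_chunk(chunk_id(210.5, -12.3, 60000.0)))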
Journal of Physics: Conference Series | 2014
Robert Gardner; S. Campana; G. Duckeck; J. Elmsheuser; Andrew Hanushevsky; Friedrich G Hönig; Jan Iven; F. Legger; I. Vukotic; Wei Yang
In the past year the ATLAS Collaboration accelerated its program to federate data storage resources using an architecture based on XRootD with its attendant redirection and storage integration services. The main goal of the federation is an improvement in the data access experience for the end user while allowing more efficient and intelligent use of computing resources. Along with these advances come integration with existing ATLAS production services (PanDA and its pilot services) and data management services (DQ2, and in the next generation, Rucio). Functional testing of the federation has been integrated into the standard ATLAS and WLCG monitoring frameworks, and a dedicated set of tools provides high-granularity information on its current and historical usage. We use a federation topology designed to search from a site's local storage outward to its region and then to globally distributed storage resources. We describe programmatic testing of various federation access modes, including direct access over the wide area network and staging of remote data files to local disk. To support job-brokering decisions, a time-dependent cost-of-data-access matrix is constructed, taking into account network performance and key site performance factors. The system's response to production-scale physics analysis workloads, either from individual end users or ATLAS analysis services, is discussed.
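The paper's abstract does not give the form of the cost matrix; the Python sketch below only illustrates the idea, with invented site names, weights and measurements: network performance and source-site load are folded into a per-pair cost, and a broker picks the cheapest replica source for a job.

# Hypothetical sketch of a time-dependent cost-of-data-access matrix used for
# job brokering. cost[(site, source)] estimates the relative cost of a job at
# `site` reading data held at `source`. Site names, weights and measurements
# are illustrative, not the ATLAS implementation.

sites = ["MWT2", "SLAC", "CERN"]
network = {  # (round-trip time in ms, measured throughput in Mbps) per site pair
    ("MWT2", "SLAC"): (55, 800),
    ("MWT2", "CERN"): (110, 400),
    ("SLAC", "CERN"): (160, 300),
}
load = {"MWT2": 0.2, "SLAC": 0.7, "CERN": 0.5}   # current load at the data source

def access_cost(rtt_ms, throughput_mbps, source_load):
    # Higher latency, lower throughput and a busier source all raise the cost.
    return rtt_ms / 10.0 + 1000.0 / max(throughput_mbps, 1.0) + 5.0 * source_load

cost = {}
for a in sites:
    for b in sites:
        if a == b:
            cost[(a, b)] = 0.0                   # local access is always cheapest
        else:
            rtt, tput = network.get((a, b)) or network[(b, a)]
            cost[(a, b)] = access_cost(rtt, tput, load[b])

def broker(job_site, replica_sites):
    """Pick the cheapest replica source for a job running at job_site."""
    return min(replica_sites, key=lambda src: cost[(job_site, src)])

# A job at MWT2 needing a dataset that is replicated only at SLAC and CERN:
print(broker("MWT2", ["SLAC", "CERN"]))          # -> SLAC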
acm symposium on applied computing | 2002
Heinz Stockinger; Andrew Hanushevsky
Data distribution and replication in distributed systems require special-purpose middleware tools for accessing replicated data. Data Grids, a special form of system distributed over wide-area networks, need to handle data management issues such as the distribution and replication of large amounts of data at the tera- and petabyte scale. Replica catalogues are used for cataloguing and locating replicated files in distributed sites all around the globe. We present a novel and administratively scalable approach for distributing a replica catalogue and resolving file location information by using HTTP redirection. HTTP redirection servers managing local file catalogues allow for greater flexibility and local file management autonomy, whereas a global replica catalogue provides the necessary mapping of logical files to individual sites. By distributing the catalogues, a site can autonomously move files for load balancing within the site without notifying a global replica catalogue. Our approach scales well in terms of catalogue administration to a large number of sites and file entries and thus establishes a powerful middleware service. We present the design and implementation of our catalogue redirection servers and report on promising experimental results.
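As a rough illustration of the redirection mechanism (not the paper's implementation), the Python sketch below shows a global catalogue that maps a logical file name to the responsible site-local catalogue and answers with an HTTP 302 redirect; host names and mappings are invented.

# Minimal sketch of the redirection idea: a global catalogue server maps a
# logical file name to the site holding a replica and answers with an HTTP 302
# redirect; the local catalogue at that site resolves the physical location.
# Host names and the mapping are made up for illustration.

from http.server import BaseHTTPRequestHandler, HTTPServer

# logical file name -> site-local catalogue server responsible for a replica
GLOBAL_CATALOGUE = {
    "/lfn/run2002/events-0001.root": "http://catalogue.site-a.example",
    "/lfn/run2002/events-0002.root": "http://catalogue.site-b.example",
}

class GlobalCatalogueHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        site = GLOBAL_CATALOGUE.get(self.path)
        if site is None:
            self.send_error(404, "unknown logical file")
            return
        # Redirect the client to the site-local catalogue, which is free to
        # move files within the site without updating the global catalogue.
        self.send_response(302)
        self.send_header("Location", site + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), GlobalCatalogueHandler).serve_forever()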
ieee/wic/acm international conference on intelligent agent technology | 2005
Fabrizio Furano; Andrew Hanushevsky
In this paper we describe the xrootd file access system, designed in collaboration between the Stanford Linear Accelerator Center (SLAC), USA, and the Istituto Nazionale di Fisica Nucleare (INFN), Padova, Italy. The system was designed to provide access to over 10^7 files representing several petabytes of experimental physics data. We analyze the agent-based query algorithm employed by this system to provide a scalable means of locating files that are scattered across a very large file server cluster. In the process we introduce the concept of a passive bidding scheme and describe its relationship to file serving commitments as a way to substantially reduce message traffic.
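The Python sketch below illustrates the passive-bidding idea with invented details (class and server names are not from the paper): servers stay silent unless they actually hold the requested file, so the redirector only ever handles affirmative replies rather than polling every server for a definitive answer.

# Illustrative sketch of a passive-bidding lookup; terminology from the paper,
# implementation details invented.

import random

class DataServer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def bid(self, path):
        # A server stays silent (returns None) unless it can serve the file.
        return self.name if path in self.files else None

class Redirector:
    def __init__(self, servers):
        self.servers = servers

    def locate(self, path):
        bids = [s.bid(path) for s in self.servers]
        bids = [b for b in bids if b is not None]
        return random.choice(bids) if bids else None   # no bid: file unknown

cluster = Redirector([
    DataServer("ds01", {"/store/evt-001.root"}),
    DataServer("ds02", {"/store/evt-001.root", "/store/evt-002.root"}),
    DataServer("ds03", set()),
])
print(cluster.locate("/store/evt-002.root"))   # -> ds02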
workshop on software and performance | 2005
Andrew Hanushevsky; Bill Weeks
In this talk, I discuss practical approaches to designing high performance data access systems and how design choices impact performance measurement methodology. The xrootd data access system is used as an example of how far the performance curve can be pushed, the inevitable tradeoffs that occur, and when one should stop pursuing the ultimate ideal.
grid computing | 2004
Andrew Hanushevsky; Heinz Stockinger
In data intensive sciences like High Energy Physics, large amounts of data are typically distributed and/or replicated to several sites. Although there exist various ways to store and access this data within a Grid environment, site security policies often prohibit end user access to remote sites. These limitations are typically overcome using a proxy service that requires limited network connections to and from remote sites. We present a novel proxy server for the xrootd data server that provides access to data stored in an object-oriented data store (typically in ROOT format). Our proxy service operates in a structured peer-to-peer environment and allows for fault tolerant and reliable access to remote data.
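The sketch below shows only the general proxy pattern (a byte-forwarding relay inside the site boundary, so clients never need direct network access to the remote site), not the xrootd proxy's actual protocol handling; the remote host name is a placeholder, and 1094 is xrootd's customary port.

# Minimal sketch of a forwarding proxy in Python; not the xrootd proxy
# implementation. The remote address is a placeholder.

import socket, threading

REMOTE = ("remote-xrootd.example.org", 1094)   # hypothetical remote data server

def pump(src, dst):
    # Copy bytes in one direction until the connection closes.
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)

def handle(client):
    upstream = socket.create_connection(REMOTE)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)

listener = socket.socket()
listener.bind(("", 1094))
listener.listen()
while True:
    conn, _ = listener.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()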
Journal of Physics: Conference Series 119:072016 (CHEP 07, Victoria, BC, Canada, 2-7 Sep 2007) | 2008
Fabrizio Furano; Andrew Hanushevsky
High Energy Physics data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent studies, development and test work have shown that the latencies due to data access can often be hidden by parallelizing them with the data processing, allowing applications to process remote data with a high level of efficiency. Techniques and algorithms able to reach this result have been implemented in the client side of the Scalla/xrootd system, and in this contribution we describe the results of some tests done in order to compare their performance and characteristics. These techniques, if used together with multiple-stream data access, can also be effective in dealing efficiently and transparently with data repositories accessible via a wide area network.
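A minimal Python sketch of the latency-hiding idea, assuming a generic block-oriented remote read (fetch_block is a stand-in, not the Scalla/xrootd client API): reads for upcoming blocks are kept in flight while the current block is processed, so network round trips overlap with computation instead of adding to it.

# Sketch: hide remote-read latency with a fixed read-ahead window.

from concurrent.futures import ThreadPoolExecutor

def fetch_block(i):
    # Placeholder for a remote read of block i (e.g. over a WAN).
    return bytes(1024)

def process(block):
    return len(block)

def read_ahead(n_blocks, window=8):
    results = []
    with ThreadPoolExecutor(max_workers=window) as pool:
        # Prime the window with the first few reads.
        futures = [pool.submit(fetch_block, i) for i in range(min(window, n_blocks))]
        for i in range(n_blocks):
            block = futures[i].result()          # usually already arrived
            nxt = i + window
            if nxt < n_blocks:
                futures.append(pool.submit(fetch_block, nxt))
            results.append(process(block))
    return results

print(sum(read_ahead(100)))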
high performance distributed computing | 2000
Jacek Becla; Andrew Hanushevsky
The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of B-mesons produced in electron-positron interactions. The experiment, started in May 1999, will generate approximately 300 TB/year of data for 10 years. All of the data will reside in Objectivity databases (object-oriented databases), accessible via the Advanced Multi-threaded Server (AMS). To date, over 70 TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters, and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. The paper describes the design of the database, the changes we needed to make in the AMS for scalability reasons, and how the lessons we learned are applicable to virtually any kind of database server seeking to operate in the petabyte region.
high performance distributed computing | 2006
Chuck Boeheim; Stephen J. Gowdy; Andrew Hanushevsky; David Leith; Randy Melen; Richard Mount; Teela Pulliam; Bill Weeks
Scientific advances depend increasingly on agility in the analysis of data, along with access to massive computation. The PetaCache project addresses the data-access issue by recognizing that the future for intense, non-sequential data access must be based on low-latency solid-state storage. The PetaCache architecture aims at a minimum-unit-cost, highly scalable hardware and software approach that can take advantage of existing and emerging solid-state storage technologies providing data-access latencies in the range of 10-100 microseconds. A prototype system has been constructed as a cluster of 64 nodes hosting a total of one terabyte of memory. Client processors retrieve data from the data-server nodes over a switched Ethernet infrastructure using SLAC's xrootd data-server software. The system is in use for performance testing, optimization and trial deployments for scientific data analysis. It also provides an excellent platform for testing new data access paradigms.
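A back-of-the-envelope comparison, in Python, motivated by the 10-100 microsecond latency range quoted above; the rotating-disk figure is a typical value assumed here, not taken from the paper.

# Rough per-stream random-read rates implied by access latency alone.

disk_seek_s  = 10e-3     # ~10 ms per random access on rotating disk (assumed)
flash_low_s  = 10e-6     # 10 microseconds (best case quoted above)
flash_high_s = 100e-6    # 100 microseconds (worst case quoted above)

for name, latency in [("disk", disk_seek_s),
                      ("solid-state (best)", flash_low_s),
                      ("solid-state (worst)", flash_high_s)]:
    print(f"{name:>20}: ~{1 / latency:,.0f} random reads per second per stream")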
Archive | 2017
Chin Fang; R 'Les' A. Cottrell; Andrew Hanushevsky; Wilko Kroeger; Wei Yang
We report on the development of ZX software providing high performance data transfer and encryption. The design scales in computation power, network interfaces, and IOPS while carefully balancing the available resources. Two U.S. patent-pending algorithms help tackle data sets containing many small files as well as very large files, and provide insensitivity to network latency. It has a cluster-oriented architecture, using peer-to-peer technologies to ease deployment, operation, usage, and resource discovery. Its unique optimizations enable effective use of flash memory. Using a pair of existing data transfer nodes at SLAC and NERSC, we compared its performance to that of bbcp and GridFTP and determined that they were comparable. With a proof of concept created using two four-node clusters with multiple distributed multi-core CPUs, network interfaces and flash memory, we achieved 155 Gbps memory-to-memory over a 2x100 Gbps link-aggregated channel and 70 Gbps file-to-file with encryption over a 5000-mile 100 Gbps link.
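The patent-pending algorithms are not described in the abstract; the Python sketch below shows one generic way to make many-small-file transfers insensitive to per-file latency, by packing files into fixed-size containers before sending them as single streamed units. Container size and function names are made up and are not the ZX design.

# Illustrative only, not the ZX algorithms: batch small files into containers
# so per-file round trips do not dominate the transfer.

import os, tarfile, tempfile

CONTAINER_BYTES = 256 * 1024 * 1024   # pack ~256 MB per container (arbitrary)

def make_container(paths):
    tmp = tempfile.NamedTemporaryFile(suffix=".tar", delete=False)
    tmp.close()
    with tarfile.open(tmp.name, "w") as tar:   # uncompressed; CPU budget goes to encryption
        for path in paths:
            tar.add(path)
    return tmp.name

def pack_small_files(paths):
    """Yield tar containers, each holding roughly CONTAINER_BYTES of small files."""
    batch, batch_size = [], 0
    for path in paths:
        batch.append(path)
        batch_size += os.path.getsize(path)
        if batch_size >= CONTAINER_BYTES:
            yield make_container(batch)
            batch, batch_size = [], 0
    if batch:
        yield make_container(batch)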