Andrew Hanushevsky
Stanford University
Publications
Featured research published by Andrew Hanushevsky.
Proceedings of SPIE | 2006
Jacek Becla; Andrew Hanushevsky; Sergei Nikolaev; Ghaleb Abdulla; Alexander S. Szalay; Maria A. Nieto-santisteban; Ani Thakar; Jim Gray
The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.
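The abstract does not spell out a partitioning scheme, but its spatio-temporal emphasis suggests the flavor of the problem. The following is a minimal, hypothetical Python sketch (bin sizes, node counts and function names are invented, not taken from the paper) of how a catalog row keyed by sky position and observation time might be mapped to a chunk and assigned to a database node.

# Hypothetical sketch: map a catalog entry's sky position and observation time
# to a partition chunk, so the catalog can be spread across database nodes.
# Chunk sizes and the hashing scheme are illustrative only.

def chunk_id(ra_deg, dec_deg, mjd, ra_bins=360, dec_bins=180, days_per_slice=30):
    """Return a (spatial, temporal) chunk identifier for one catalog row."""
    ra_chunk = int(ra_deg) % ra_bins            # 1-degree strips in right ascension
    dec_chunk = int(dec_deg + 90) % dec_bins    # 1-degree strips in declination
    time_slice = int(mjd) // days_per_slice     # roughly monthly time slices
    return (ra_chunk, dec_chunk, time_slice)

def node_for_chunk(chunk, n_nodes=32):
    """Assign a chunk to one of n_nodes database servers."""
    return hash(chunk) % n_nodes

# Example: an object at RA=210.5, Dec=-12.3 observed at MJD 60000
print(node_for_chunk(chunk_id(210.5, -12.3, 60000.0)))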
Journal of Physics: Conference Series | 2014
Robert Gardner; S. Campana; G. Duckeck; J. Elmsheuser; Andrew Hanushevsky; Friedrich G Hönig; Jan Iven; F. Legger; I. Vukotic; Wei Yang
In the past year the ATLAS Collaboration accelerated its program to federate data storage resources using an architecture based on XRootD with its attendant redirection and storage integration services. The main goal of the federation is an improvement in the data access experience for the end user while allowing more efficient and intelligent use of computing resources. Along with these advances come integration with existing ATLAS production services (PanDA and its pilot services) and data management services (DQ2, and in the next generation, Rucio). Functional testing of the federation has been integrated into the standard ATLAS and WLCG monitoring frameworks, and a dedicated set of tools provides high-granularity information on its current and historical usage. We use a federation topology designed to search from a site's local storage outward to its region and then to globally distributed storage resources. We describe programmatic testing of various federation access modes, including direct access over the wide area network and staging of remote data files to local disk. To support job-brokering decisions, a time-dependent cost-of-data-access matrix is constructed, taking into account network performance and key site performance factors. The system's response to production-scale physics analysis workloads, either from individual end users or ATLAS analysis services, is discussed.
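The paper's abstract does not give the form of the cost matrix; the Python sketch below only illustrates the idea, with invented site names, weights and measurements: network performance and source-site load are folded into a per-pair cost, and a broker picks the cheapest replica source for a job.

# Hypothetical sketch of a time-dependent cost-of-data-access matrix used for
# job brokering. cost[(site, source)] estimates the relative cost of a job at
# `site` reading data held at `source`. Site names, weights and measurements
# are illustrative, not the ATLAS implementation.

sites = ["MWT2", "SLAC", "CERN"]
network = {  # (round-trip time in ms, measured throughput in Mbps) per site pair
    ("MWT2", "SLAC"): (55, 800),
    ("MWT2", "CERN"): (110, 400),
    ("SLAC", "CERN"): (160, 300),
}
load = {"MWT2": 0.2, "SLAC": 0.7, "CERN": 0.5}   # current load at the data source

def access_cost(rtt_ms, throughput_mbps, source_load):
    # Higher latency, lower throughput and a busier source all raise the cost.
    return rtt_ms / 10.0 + 1000.0 / max(throughput_mbps, 1.0) + 5.0 * source_load

cost = {}
for a in sites:
    for b in sites:
        if a == b:
            cost[(a, b)] = 0.0                   # local access is always cheapest
        else:
            rtt, tput = network.get((a, b)) or network[(b, a)]
            cost[(a, b)] = access_cost(rtt, tput, load[b])

def broker(job_site, replica_sites):
    """Pick the cheapest replica source for a job running at job_site."""
    return min(replica_sites, key=lambda src: cost[(job_site, src)])

# A job at MWT2 needing a dataset that is replicated only at SLAC and CERN:
print(broker("MWT2", ["SLAC", "CERN"]))          # -> SLAC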
acm symposium on applied computing | 2002
Heinz Stockinger; Andrew Hanushevsky
Data distribution and replication in distributed systems require special-purpose middleware tools for accessing replicated data. Data Grids, a special form of system distributed over wide-area networks, need to handle data management issues such as the distribution and replication of large amounts of data at the tera- and petabyte scale. Replica catalogues are used for cataloguing and locating replicated files in distributed sites all around the globe. We present a novel and administratively scalable approach for distributing a replica catalogue and resolving file location information by using HTTP redirection. HTTP redirection servers managing local file catalogues allow for greater flexibility and local file management autonomy, whereas a global replica catalogue provides the necessary mapping of logical files to individual sites. By distributing the catalogues, a site can autonomously move files for load balancing within the site without notifying a global replica catalogue. Our approach scales well in terms of catalogue administration to a large number of sites and file entries and thus establishes a powerful middleware service. We present the design and implementation of our catalogue redirection servers and report on promising experimental results.
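As a rough illustration of the redirection mechanism (not the paper's implementation), the Python sketch below shows a global catalogue that maps a logical file name to the responsible site-local catalogue and answers with an HTTP 302 redirect; host names and mappings are invented.

# Minimal sketch of the redirection idea: a global catalogue server maps a
# logical file name to the site holding a replica and answers with an HTTP 302
# redirect; the local catalogue at that site resolves the physical location.
# Host names and the mapping are made up for illustration.

from http.server import BaseHTTPRequestHandler, HTTPServer

# logical file name -> site-local catalogue server responsible for a replica
GLOBAL_CATALOGUE = {
    "/lfn/run2002/events-0001.root": "http://catalogue.site-a.example",
    "/lfn/run2002/events-0002.root": "http://catalogue.site-b.example",
}

class GlobalCatalogueHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        site = GLOBAL_CATALOGUE.get(self.path)
        if site is None:
            self.send_error(404, "unknown logical file")
            return
        # Redirect the client to the site-local catalogue, which is free to
        # move files within the site without updating the global catalogue.
        self.send_response(302)
        self.send_header("Location", site + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), GlobalCatalogueHandler).serve_forever()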
ieee/wic/acm international conference on intelligent agent technology | 2005
Fabrizio Furano; Andrew Hanushevsky
In this paper we describe the xrootd file access system, designed in collaboration between the Stanford Linear Accelerator Center (SLAC), USA, and the Istituto Nazionale di Fisica Nucleare (INFN), Padova, Italy. The system was designed to provide access to over 10^7 files representing several petabytes of experimental physics data. We analyze the agent-based query algorithm employed by this system to provide a scalable means of locating files that are scattered across a very large file server cluster. In the process we introduce the concept of a passive bidding scheme and describe its relationship to file serving commitments as a way to substantially reduce message traffic.
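The Python sketch below illustrates the passive-bidding idea with invented details (class and server names are not from the paper): servers stay silent unless they actually hold the requested file, so the redirector only ever handles affirmative replies rather than polling every server for a definitive answer.

# Illustrative sketch of a passive-bidding lookup; terminology from the paper,
# implementation details invented.

import random

class DataServer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)

    def bid(self, path):
        # A server stays silent (returns None) unless it can serve the file.
        return self.name if path in self.files else None

class Redirector:
    def __init__(self, servers):
        self.servers = servers

    def locate(self, path):
        bids = [s.bid(path) for s in self.servers]
        bids = [b for b in bids if b is not None]
        return random.choice(bids) if bids else None   # no bid: file unknown

cluster = Redirector([
    DataServer("ds01", {"/store/evt-001.root"}),
    DataServer("ds02", {"/store/evt-001.root", "/store/evt-002.root"}),
    DataServer("ds03", set()),
])
print(cluster.locate("/store/evt-002.root"))   # -> ds02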
workshop on software and performance | 2005
Andrew Hanushevsky; Bill Weeks
In this talk, I discuss practical approaches to designing high performance data access systems and how design choices impact performance measurement methodology. The xrootd data access system is used as an example of how far the performance curve can be pushed, the inevitable tradeoffs that occur, and when one should stop pursuing the ultimate ideal.
grid computing | 2004
Andrew Hanushevsky; Heinz Stockinger
In data intensive sciences like High Energy Physics, large amounts of data are typically distributed and/or replicated to several sites. Although there exist various ways to store and access this data within a Grid environment, site security policies often prohibit end user access to remote sites. These limitations are typically overcome using a proxy service that requires limited network connections to and from remote sites. We present a novel proxy server for the xrootd data server that provides access to data stored in an object-oriented data store (typically in ROOT format). Our proxy service operates in a structured peer-to-peer environment and allows for fault tolerant and reliable access to remote data.
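The sketch below shows only the general proxy pattern (a byte-forwarding relay inside the site boundary, so clients never need direct network access to the remote site), not the xrootd proxy's actual protocol handling; the remote host name is a placeholder, and 1094 is xrootd's customary port.

# Minimal sketch of a forwarding proxy in Python; not the xrootd proxy
# implementation. The remote address is a placeholder.

import socket, threading

REMOTE = ("remote-xrootd.example.org", 1094)   # hypothetical remote data server

def pump(src, dst):
    # Copy bytes in one direction until the connection closes.
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)

def handle(client):
    upstream = socket.create_connection(REMOTE)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)

listener = socket.socket()
listener.bind(("", 1094))
listener.listen()
while True:
    conn, _ = listener.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()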
Journal of Physics: Conference Series 119:072016 (CHEP 07, Victoria, BC, Canada, 2-7 Sep 2007) | 2008
Fabrizio Furano; Andrew Hanushevsky
High Energy Physics data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent studies, development and test work have shown that the latencies due to data access can often be hidden by parallelizing them with the data processing, allowing applications to process remote data with a high level of efficiency. Techniques and algorithms able to reach this result have been implemented in the client side of the Scalla/xrootd system, and in this contribution we describe the results of some tests done in order to compare their performance and characteristics. These techniques, if used together with multiple-stream data access, can also be effective in dealing efficiently and transparently with data repositories accessible via a wide area network.
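A minimal Python sketch of the latency-hiding idea, assuming a generic block-oriented remote read (fetch_block is a stand-in, not the Scalla/xrootd client API): reads for upcoming blocks are kept in flight while the current block is processed, so network round trips overlap with computation instead of adding to it.

# Sketch: hide remote-read latency with a fixed read-ahead window.

from concurrent.futures import ThreadPoolExecutor

def fetch_block(i):
    # Placeholder for a remote read of block i (e.g. over a WAN).
    return bytes(1024)

def process(block):
    return len(block)

def read_ahead(n_blocks, window=8):
    results = []
    with ThreadPoolExecutor(max_workers=window) as pool:
        # Prime the window with the first few reads.
        futures = [pool.submit(fetch_block, i) for i in range(min(window, n_blocks))]
        for i in range(n_blocks):
            block = futures[i].result()          # usually already arrived
            nxt = i + window
            if nxt < n_blocks:
                futures.append(pool.submit(fetch_block, nxt))
            results.append(process(block))
    return results

print(sum(read_ahead(100)))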
high performance distributed computing | 2000
Jacek Becla; Andrew Hanushevsky
The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of B-mesons produced in electron-positron interactions. The experiment, started in May 1999, will generate approximately 300 TB/year of data for 10 years. All of the data will reside in Objectivity databases (object-oriented databases), accessible via the Advanced Multi-threaded Server (AMS). To date, over 70 TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters, and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. The paper describes the design of the database, the changes we needed to make in the AMS for scalability reasons, and how the lessons we learned are applicable to virtually any kind of database server seeking to operate in the petabyte region.
high performance distributed computing | 2006
Chuck Boeheim; Stephen J. Gowdy; Andrew Hanushevsky; David Leith; Randy Melen; Richard Mount; Teela Pulliam; Bill Weeks
Scientific advances depend increasingly on agility in the analysis of data, along with access to massive computation. The PetaCache project addresses the data-access issue by recognizing that the future for intense, non-sequential data access must be based on low-latency solid-state storage. The PetaCache architecture aims at a minimum-unit-cost, highly scalable hardware and software approach that can take advantage of existing and emerging solid-state storage technologies providing data-access latencies in the range of 10-100 microseconds. A prototype system has been constructed as a cluster of 64 nodes hosting a total of one terabyte of memory. Client processors retrieve data from the data-server nodes over a switched Ethernet infrastructure using SLAC's xrootd data-server software. The system is in use for performance testing, optimization and trial deployments for scientific data analysis. It also provides an excellent platform for testing new data access paradigms.
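A back-of-the-envelope comparison, in Python, motivated by the 10-100 microsecond latency range quoted above; the rotating-disk figure is a typical value assumed here, not taken from the paper.

# Rough per-stream random-read rates implied by access latency alone.

disk_seek_s  = 10e-3     # ~10 ms per random access on rotating disk (assumed)
flash_low_s  = 10e-6     # 10 microseconds (best case quoted above)
flash_high_s = 100e-6    # 100 microseconds (worst case quoted above)

for name, latency in [("disk", disk_seek_s),
                      ("solid-state (best)", flash_low_s),
                      ("solid-state (worst)", flash_high_s)]:
    print(f"{name:>20}: ~{1 / latency:,.0f} random reads per second per stream")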
Archive | 2017
Chin Fang; R 'Les' A. Cottrell; Andrew Hanushevsky; Wilko Kroeger; Wei Yang
We report on the development of ZX software providing high performance data transfer and encryption. The design scales in computation power, network interfaces, and IOPS while carefully balancing the available resources. Two U.S. patent-pending algorithms help tackle data sets containing many small files as well as very large files, and provide insensitivity to network latency. It has a cluster-oriented architecture, using peer-to-peer technologies to ease deployment, operation, usage, and resource discovery. Its unique optimizations enable effective use of flash memory. Using a pair of existing data transfer nodes at SLAC and NERSC, we compared its performance to that of bbcp and GridFTP and determined that they were comparable. With a proof of concept created using two four-node clusters with multiple distributed multi-core CPUs, network interfaces and flash memory, we achieved 155 Gbps memory-to-memory over a 2x100 Gbps link-aggregated channel and 70 Gbps file-to-file with encryption over a 5000-mile 100 Gbps link.
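The patent-pending algorithms are not described in the abstract; the Python sketch below shows one generic way to make many-small-file transfers insensitive to per-file latency, by packing files into fixed-size containers before sending them as single streamed units. Container size and function names are made up and are not the ZX design.

# Illustrative only, not the ZX algorithms: batch small files into containers
# so per-file round trips do not dominate the transfer.

import os, tarfile, tempfile

CONTAINER_BYTES = 256 * 1024 * 1024   # pack ~256 MB per container (arbitrary)

def make_container(paths):
    tmp = tempfile.NamedTemporaryFile(suffix=".tar", delete=False)
    tmp.close()
    with tarfile.open(tmp.name, "w") as tar:   # uncompressed; CPU budget goes to encryption
        for path in paths:
            tar.add(path)
    return tmp.name

def pack_small_files(paths):
    """Yield tar containers, each holding roughly CONTAINER_BYTES of small files."""
    batch, batch_size = [], 0
    for path in paths:
        batch.append(path)
        batch_size += os.path.getsize(path)
        if batch_size >= CONTAINER_BYTES:
            yield make_container(batch)
            batch, batch_size = [], 0
    if batch:
        yield make_container(batch)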