Architecture of Distributed Data Storage for Astroparticle Physics
A.A. Kryukov, A. Demichev
D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University
E-mail address: [email protected], [email protected]
Abstract. For the successful development of astrophysics and, accordingly, for obtaining more complete knowledge of the Universe, it is extremely important to combine and comprehensively analyze information of various types (e.g., about charged cosmic particles, gamma rays, neutrinos, etc.) obtained with diverse large-scale experimental setups located throughout the world. It is obvious that all kinds of activities must be performed continually across all stages of the data life cycle to support effective data management, in particular the collection and storage of data, its processing and analysis, refinement of the physical model, preparation for publication, and data reprocessing taking the refinement into account. In this paper we present a general approach to the construction and the architecture of a system able to collect, store, and provide user access to astrophysical data. We also suggest a new approach to the construction of a metadata registry based on blockchain technology.

Mathematics Subject Classification: 68T30, 68P20.
Key words and phrases. Astroparticle physics, distributed storage, data life cycle, metadata, blockchain.

The work presented in Section 2 of this paper was funded by the Russian Science Foundation in the framework of grant No. 18-41-06003, and the work presented in Section 3 was funded by the Russian Science Foundation in the framework of grant No. 18-11-00075.

1. Introduction
Astroparticle physics has become a data intensive science with many terabytes of data and often with tens of measured parameters associated with each observation. While 10–15 years ago there were 1–10 TB of data per year in astrophysics, new experimental facilities generate data sets ranging in size from hundreds to thousands of terabytes per year. Moreover, new highly complex and massively large datasets are expected to be produced in the next decades by novel and more complex scientific instruments, as well as by the data simulations needed for physical interpretation. Handling and exploring these new high-volume data and carrying out scientific research with them poses a considerable technical challenge that requires the adoption of new approaches to using computing and storage resources and to organizing scientific collaborations, scientific education and science communication, where sophisticated public data centres will play the key role. These trends give rise to a number of emerging issues of big data management. An important topic for modern science in general and astroparticle physics in particular is open science, the model of free access to data (see, e.g., [1]): data are accessible not solely to collaboration members but to all levels of an inquiring society, amateur or professional. This approach is especially important in the age of Big Data, when a complete analysis of the experimental data cannot be performed within one collaboration.

The work presented in this paper strives to develop a key component of an open science system, namely a distributed data storage (DDS) for astroparticle physics able to collect, store, and analyze astrophysical data, taking the TAIGA [2] and KASCADE [3] experiments as examples. The novelty of the proposed approach is the development of integrated solutions:

(1) combining data from several astrophysical experiments in the framework of the model of open data (open science);
(2) development and adaptation of distributed data storage algorithms and techniques with a common meta-catalog to provide a unified information space of the distributed repository;
(3) development and adaptation of data transmission algorithms, including simultaneous data transmission from several data repositories, thus significantly reducing load time;
(4) installation of a prototype system for Big Data analysis and export of the experimental data from KASCADE and TAIGA for testing the technology of data life cycle management.

The basic idea is the development of a Web service which will provide user access to a set of distributed data storages from a single entry point. The aggregated data from distributed sources has to be generated and transmitted to users on their requests "on the fly", bypassing the stage of loading them into a single data storage. Thanks to this virtualization of the storage facilities, the users see all data storages as a single one without needing to know the internal structure of each storage. Therefore, our approach avoids the construction of huge centralized data storages.
The open data model is widely used in scientific areas. This is because, on the one hand, data acquisition is often very expensive, and on the other hand, like any experimental information, the data can be reused for repeated analysis. The current growth in the volume of data received prompts us to move ever more eagerly to this model of the functioning of science. One example of such a collaboration in the field of high energy physics is the open data project of the European Organization for Nuclear Research (CERN; http://opendata.cern.ch). At present, only the ATLAS experiment provides open access to ∼
The proposed approach provides essential advantages over centralized solutions because of its high horizontal scalability and the ability to expand the system to new data sources without changing its structure. As a result, this approach will allow creating a reliable, economical system, convenient for users and administrators, with an almost unlimited possibility of increasing the volume of stored and processed astrophysical information. Though the system under development is intended for astroparticle physics, the proposed approach can be used for other scientific areas too.

The rest of the paper is organized as follows. Section 2 presents the basic principles of construction and the DDS architecture. In Section 3, we describe the advanced approach to building a metadata database for the DDS. Finally, Section 4 contains conclusions and future work.
2. Basic Principles of Construction and DDS Architecture
The technologies underlying the DDS system are the following: Extract-Transform-Load (ETL) [4], cloud technology, Web services, SaaS, and REST. ETL technology includes three main steps to process the data: (1) data extraction and data verification; (2) data transformation, including data cleaning and data integration; (3) data load and data aggregation.

The difference between the methodology proposed in this work and the usual ETL is that the data selected from distributed sources will be generated and transmitted to users on their requests "on the fly", bypassing the stage of loading them into a single data storage. Such an approach makes it possible to serve users' requests for the necessary data samples, which, as a rule, require only a small part of the entire available data set for analysis. On the other hand, such a virtualization of the access to a number of storages does not require the creation of a huge centralized repository. Fast data exchange is achieved via the caching read-only file system CVMFS [5] and microservice technology in the REST architectural style.

The main objects of the data model in the case of astroparticle physics are extensive air showers (EAS) recorded at experimental facilities of various types. The data model of the EAS events contains all the information necessary for subsequent physical analysis and should include: the exact time of registration of the event by the detector; time samples (a histogram) of the signal from the detector in some units; and service information. In addition, data on the structure of the experimental setup, for example, the coordinates of the detectors, the calibration characteristics of the detectors, and other auxiliary data, should be stored as metadata. Indeed, to ensure that the user can actually use the data, extensive documentation (metadata) on how the data have been obtained is needed. Depending on the kind of data, this is at least a description of the detector and of the reconstruction procedures employed.
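As a purely illustrative sketch, the following Python snippet shows how such an EAS event record together with setup metadata could be expressed in JSON (one of the description languages mentioned above). All field names and values here are hypothetical assumptions for this example and do not reproduce the actual TAIGA or KASCADE data formats.

```python
# Hypothetical sketch of an EAS event record and setup metadata expressed as JSON;
# the field names are illustrative, not the actual experiment data formats.
import json

event = {
    "event_id": "hypothetical-000001",
    "registration_time_utc": "2018-11-20T03:14:15.926535Z",  # exact detector time of the event
    "signal_histogram": {                                     # time samples of the detector signal
        "bin_width_ns": 5.0,
        "amplitudes": [0, 2, 17, 45, 31, 12, 4, 1],           # in setup-specific units
    },
    "service_info": {"station": 7, "run": 1234},              # auxiliary service information
}

setup_metadata = {
    "facility": "TAIGA",                                      # facility that produced the data
    "detector_coordinates": [{"id": 7, "x_m": 12.5, "y_m": -3.1, "z_m": 0.0}],
    "calibration": {"id": 7, "gain": 1.02, "pedestal": 0.4},
}

print(json.dumps({"event": event, "setup": setup_metadata}, indent=2))
```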
Another important aspect is user and access management. While there is already a basic implementation of permission-based access limitation, a useful categorization of the users into (possibly hierarchical) groups is needed to use it effectively: no administrator should manually manage the privileges of single users.

Despite the fact that the facilities registering EAS are scattered around the world and are similar to each other, they solve different physical problems. Therefore, joint processing of the data will allow us to establish such properties of cosmic rays as cannot be obtained from the data of individual experiments. Thus, the future model should cover as many different experiments as possible, or provide the ability to convert data using the model being developed. The XML and JSON languages are used as the description languages for the data model. Further, the development of algorithms for the big data analysis of astrophysical experiments includes, among others, machine learning methods. To develop the DDS prototype, programming languages such as Python, JavaScript and C/C++ are used. The system itself is implemented as a set of web (micro)services using the REST architectural style. To simplify the users' work with the aggregation service, a web interface based on the Django framework, Node.js and JavaScript is implemented. The prototype of the DDS will be filled with the data of the KASCADE and TAIGA experiments for testing and system improvements.

The architecture of the DDS is presented in Figure 1.
Figure 1. Architecture of DDS
Astrophysical data is stored in remote storages S_n (n = 1, 2, 3, ...). These data can be of two types: data coming from astrophysical experimental facilities (possibly after initial processing) and results of the analysis of experimental data obtained after their processing by specialized application programs. Each of the remote storages can store data in its own format, as well as use different methods of data management (in particular, a different directory structure, a different structure of requests for operations with files, etc.). Loading of new data into each of these real repositories is carried out using their own tools and protocols and is not within the scope of the DDS. These repositories are embedded in the DDS system by means of special adapters A_n that transform a specific storage API and thereby standardize the requests to repositories from the DDS side. Requests can also be of two types: requests for operations with files and queries for searching data by their metadata. Also, the process of embedding a new repository must be accompanied by loading the relevant metadata into the metadata registry described in the next section. The box "Metadata" in Figure 1 is detailed in the next section and in Figure 2.

Operations with files from the DDS side are supposed to be implemented in the spirit of the Copy-on-Write (COW) technology [6] and overlay file systems [7, 8], so that the main operations are retrieving and downloading files. The ability to search data by their metadata in all storages with a single user request is a prerequisite for providing the overall common DDS environment, so that for the user the system looks like a single virtual storage. A new approach to the organization of the DDS metadata registry is presented in the next section.

The central module of the system is the service of data aggregation from the various data storages. The user accesses this service with a request to receive data that meet a set of criteria defined by the metadata values. The service accesses the metadata registry, which in return outputs the physical addresses of the files with data that satisfy the specified criteria. Using these addresses, the aggregation service accesses the appropriate remote repositories and downloads the required data files. In the user's request, in addition to the data selection criteria, an indication is given of what operations should be performed with the received files. Depending on this, the resulting set of data files is sent to one of the aggregation service submodules, which are implemented as embedded plug-ins and perform certain operations on the data from the received files. The Plugin Library (PL in Figure 1) is intended for the implementation of the serialization-aggregation-deserialization process in accordance with user requirements. Plug-ins must be registered in the aggregation service and run in a separate container to ensure the security of the system. In the simplest cases, the plug-ins perform a simple merging of all files into one archive, sequential processing of data in files ordered by the time of data generation, or the imposition of an additional filter, for example, on the energy of primary particles. In addition to the preinstalled plug-ins, it is possible to embed plug-ins developed by system users to solve their specific tasks.
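The following minimal Python sketch illustrates the request flow just described: the aggregation service asks the metadata registry for file addresses, downloads the matching files through per-storage adapters, and hands them to a user-selectable plug-in. It is a hypothetical illustration only, not the actual DDS code; all class and method names are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class StorageAdapter(ABC):
    """Adapter A_n: hides the native API of one remote storage S_n."""
    @abstractmethod
    def download(self, path: str) -> bytes:
        """Retrieve a single file from the remote storage."""

class AggregationPlugin(ABC):
    """Preinstalled or user-provided plug-in applied to the downloaded files."""
    @abstractmethod
    def process(self, files: List[bytes]) -> bytes:
        """Serialization-aggregation-deserialization of the selected files."""

class MergePlugin(AggregationPlugin):
    """Simplest case: merge all selected files into one blob/archive."""
    def process(self, files: List[bytes]) -> bytes:
        return b"".join(files)

def aggregate(metadata_registry, adapters: Dict[str, StorageAdapter],
              criteria: dict, plugin: AggregationPlugin) -> bytes:
    """Serve one user request: metadata query -> download via adapters -> plug-in."""
    # The registry is assumed to return (storage_id, path) pairs for matching files.
    addresses = metadata_registry.find(criteria)
    files = [adapters[storage_id].download(path) for storage_id, path in addresses]
    return plugin.process(files)
```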
The aggregated data is then sent either to the user's local computer for further processing and analysis, or to an application server for processing on high-performance external resources.

Uploading files with processed data to remote storages is performed via the user's corresponding request to the aggregation service. This request is accompanied by the process of publishing the data in the metadata repository, which consists in describing the characteristics and history of the published data (the so-called provenance metadata; see, for example, the review [9] and references therein). Likewise, the publishing procedure should also accompany the uploading to the storages of files with data from experimental facilities. Loading of files with experimental data is carried out under the control of the managers of each storage, either directly or through the corresponding web service, and is also accompanied by updating of the metadata registry.

The performance of the system will be determined by the performance of the aggregation service. If there is a need to increase the performance, it is possible to install several copies of this service.
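As a purely illustrative sketch of the publishing procedure described above, a provenance record accompanying an upload of processed data could carry fields such as the following; every field name here is an assumption for this example and does not define the actual DDS schema.

```python
# Hypothetical provenance (PMD) record accompanying an upload of processed data;
# field names are illustrative only, not the actual DDS metadata schema.
provenance_record = {
    "output_file": "storage2:/results/spectra_2018_11.h5",    # published (secondary) data
    "derived_from": ["storage1:/raw/run1234.dat"],             # input (primary) data
    "processing": {
        "program": "eas-reconstruction",                       # program that produced the data
        "version": "1.4.2",
        "parameters": {"energy_cut_tev": 0.5},
    },
    "creator": "researcher@example-institute",                 # who performed the processing
    "created_utc": "2018-11-20T12:00:00Z",
}
```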
3. Metadata Registry
Metadata describe data, provide context, and are vital for the accurate interpretation and use of data by both humans and machines. Analysis of the problem shows a backlog in research in the field of metadata and their management systems. One of the important types of metadata is provenance (lineage, pedigree) metadata. Provenance, from the point of view of computer science, is meta-information related to the history of obtaining the data, starting from the source. Metadata of this type are designed to track the steps by which data were obtained, their origin, and their proper storage and reproduction, for the interpretation and confirmation of the scientific results obtained on their basis. Thus, provenance metadata (PMD) are important for organizing a correct research workflow that yields reliable results.

The need for PMD is especially essential when big data are supposed to be jointly processed by several research teams, as in the case of astroparticle physics. This requires a wide and intensive exchange of data and of programs for their processing and analysis, covering long periods of time, during which both the data sources and the algorithms for their processing can be modified, in particular due to changes in sensor design, refinements in calibration, or even physical displacement. Without clear notification of all data processing participants, this can give rise to catastrophic errors in the processing and analysis of the data. Similar consequences can follow from a "hidden" evolution of data processing and analysis algorithms, as well as from code modification and changes of versions and releases of the corresponding computer programs.

Our approach is directed to the development of principles and algorithms for the formation, storage and management of the provenance metadata generated by large scientific experiments in astroparticle physics. Although a number of projects have been implemented in recent years to create systems for the support and management of metadata (see, e.g., [9], [10] and references therein), including the provenance of data, all the implemented solutions have security and metadata integrity issues, especially in the case of the open access model and the possibility of the metadata being used by organizationally unrelated or loosely coupled research communities. This is especially true for the metadata of data obtained as a result of processing and analysis of the primary experimental data. For brevity, we will call all such data secondary (although some of them can be obtained as a result of several processing steps). Indeed, the providers of primary data are a very limited number of experimental installations, and the corresponding provenance metadata are generated automatically. The security and integrity of the metadata database, including a centralized one, for such providers can be achieved by standard methods (accounts with the appropriate rights, cryptographic keys, etc.). The situation with providers of secondary data (that is, researchers performing data processing and analysis) is significantly different. The number of such providers can be quite large and dynamically changing. Therefore, either the overhead of managing the access rights to the metadata base will be very large, or serious problems with accidental or malicious distortion of provenance metadata can occur.
An example of motivation for intentional distortion of the provenance metadata may be priority considerations (for example, getting a fictitious priority in obtaining valuable results from the physical analysis of astrophysical data).

Fortunately, in recent years registries based on blockchain technology have acquired great popularity because they have a number of important advantages (see, e.g., [11], [12]) which can be successfully used in the DDS. In particular, efficient and secure verification of the contents of large metadata structures is achieved by using, in the framework of the blockchain technology, the Merkle tree (hash tree) [13], in which every leaf node is labelled with the hash of a data block and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes. The use of the Merkle cryptographic tree makes it possible to verify whether any two versions of the PMD registry are consistent: that is, a later version includes everything in the earlier version in the same order, and all new entries were received after the entries in the old version. This means that no records were inserted into the registry in hindsight, no entries were changed in the registry, and the registry has never been branched or bifurcated. Such a proof of consistency is important for verifying that the PMD registry was not damaged and that the obtained results are self-consistent too. Thus, such a distributed registry allows one to monitor and restore the complete history of processing and analysis of the secondary data.
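A minimal sketch of the Merkle-tree idea is given below (an illustration only, not the registry implementation): leaves are hashes of the individual metadata records, internal nodes hash the concatenation of their children's labels, and the resulting root hash summarizes the whole registry, changing whenever any record is altered, inserted or reordered.

```python
import hashlib
from typing import List

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: List[bytes]) -> bytes:
    """Compute the Merkle root of a list of serialized metadata records."""
    if not records:
        return sha256(b"")
    # Leaf nodes: hashes of the individual records.
    level = [sha256(r) for r in records]
    # Repeatedly hash pairs of child labels until a single root remains.
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last node if the level is odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Any change to a record (or to the order of records) changes the root hash:
r1 = merkle_root([b"record-1", b"record-2", b"record-3"])
r2 = merkle_root([b"record-1", b"record-2", b"record-3x"])
assert r1 != r2
```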
Figure 2 shows the general architecture of the DDS metadata subsystem (the box "Metadata" in Figure 1). Its main feature is that data creators write the corresponding metadata into a distributed blockchain registry that provides security and data integrity, while users requesting metadata access a relational database (in read-only mode) that allows for sampling based on complex filters. Transformation of the metadata from the transactions of the blockchain into the relational database is carried out by a special module (the PMD transforming module).
Figure 2. Architecture of the metadata subsystem of the DDS
An important question is how to provide validation of the chain of blocks with transaction records in the case of the PMD registry. The most popular method, proof-of-work (PoW) [12] on the basis of mining, is very resource-intensive and is poorly suited for provenance metadata management systems for the processing of scientific data. Indeed, the calculations that are performed within the framework of PoW do not serve any useful purpose, and this is a principal feature: it is very difficult to come up with a proof of work that would serve a socially useful role. Therefore, if possible, it is better to abandon it. Trying to solve these problems, the research community in this field offers a variety of consensus algorithms that do not require "work". The choice of the algorithm heavily depends on the way access to transaction processing is granted. From this point of view, blockchains are classified as follows: permissionless (public) blockchains, in which there are no restrictions on the transaction handlers (that is, on the accounts that can create transaction blocks); and permissioned blockchains, in which transaction processing is performed by a specific list of accounts. Permissioned blockchains can form a more controlled and predictable environment than public blockchains. In contrast to the cryptocurrencies (permissionless blockchains), in permissioned blockchains the built-in coins are usually not used; built-in coins are required in permissionless blockchains to provide a reward for processing transactions. The creation of blocks in a permissioned blockchain in the simplest case does not require calculations related to proof-of-work algorithms. In particular, the following block creation protocol, similar to delegated proof of stake [12], is possible:

(1) there is a fixed number of transaction handlers, i.e., services included in the distributed computing system (DCS);
(2) each handler owns a pair of secret and public keys, the creator of each block being determined by the mandatory digital signature of the block that is part of the block header;
(3) handlers (DCS services) create blocks in turn at fixed time intervals;
(4) the order of creation of blocks can be fixed or changed randomly after each processing cycle by all services included in the DCS;
(5) if a service for any reason cannot create its block within the time interval allocated to it, it skips this cycle.

In the DDS architecture, the authorized parties that create and sign the blocks are the data storages and the metadata servers. In order to maliciously change a transaction confirmed by all the services of the DDS, the attacker must gain access to all the secret keys of the block handlers. The above protocol is theoretically even more reliable than the protocol based on proof of work (in which case it is necessary to gain control over 51% of the network nodes for a successful attack [15]). It is this approach to the construction of the metadata registry that will be implemented in the DDS.
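The following Python sketch illustrates such round-robin block creation by a fixed set of authorized handlers, each signing the header of its block with its own key. It is a hypothetical illustration under stated assumptions, not the DDS implementation; it assumes the third-party "cryptography" package for Ed25519 signatures, and all names and block fields are invented for this example.

```python
# Hypothetical sketch of round-robin block creation by a fixed set of authorized
# handlers (storages and metadata servers), each signing its block with its own key.
import hashlib
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

class Handler:
    """One authorized transaction handler (a DCS service) with its key pair."""
    def __init__(self, name: str):
        self.name = name
        self._key = Ed25519PrivateKey.generate()
        self.public_key = self._key.public_key()   # used by others to verify the signature

    def create_block(self, prev_hash: str, transactions: list) -> dict:
        header = {
            "creator": self.name,
            "prev_hash": prev_hash,
            "timestamp": time.time(),
            "tx_hash": hashlib.sha256(json.dumps(transactions).encode()).hexdigest(),
        }
        payload = json.dumps(header, sort_keys=True).encode()
        return {
            "header": header,
            "signature": self._key.sign(payload).hex(),  # mandatory signature of the block
            "transactions": transactions,
        }

# Handlers take turns creating blocks; fixed time intervals and reshuffling of the
# order after each full cycle are omitted in this sketch.
handlers = [Handler("storage-1"), Handler("storage-2"), Handler("metadata-server")]
chain, prev_hash = [], "0" * 64
for i, txs in enumerate([["pmd-tx-1"], ["pmd-tx-2"], ["pmd-tx-3"]]):
    block = handlers[i % len(handlers)].create_block(prev_hash, txs)
    chain.append(block)
    prev_hash = hashlib.sha256(json.dumps(block["header"], sort_keys=True).encode()).hexdigest()
```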
4. Conclusion
In this paper we have presented a general approach to the construction and the architecture of a distributed data storage (DDS) for astroparticle physics intended for collecting, storing and distributing astrophysical data for analysis. The basic idea is the development of a Web service which will provide user access to a set of distributed data storages from a single entry point. The aggregated data from distributed sources are to be generated and transmitted to users on their requests "on the fly", bypassing the stage of accumulating all the data in a single data storage. Thanks to this virtualization of the storage facilities, the users see all data storages as a single one. The data formats in the storages entering the system can be different, the integration being provided through the use of special adapters. In addition, we have suggested a new approach to the construction of a metadata registry based on blockchain technology. The Merkle tree will be actively used to control the data self-consistency and integrity. The new approach guarantees the protection of metadata records from accidental or intentional distortions in the metadata registry. This, in turn, will significantly improve the quality and reliability of scientific results obtained on the basis of processing and analysis of big scientific data in a distributed computer environment.

To implement the DDS, a Karlsruhe-Russian initiative was put forward that unites the efforts of a number of scientific institutions from the Russian side (SINP MSU, ISU, ISDCT SB RAS) and the German side (KIT). The DDS will open new horizons for computing in astroparticle physics. As a result of the approach and architecture developed in this work, a distributed system for collecting and processing big astrophysical data will be created.
A new methodology for verifying the reliability of scientific results, based on the comprehensive analysis of data of many types and from many sources, will be developed. This will also provide access to open astrophysical data for the wide scientific community. It is worth noting that the suggested approach can be used not only in astrophysics but can also be adapted to other scientific areas.
References

[1] P. A. David, Industrial and Corporate Change, 571 (2004).
[2] N. N. Budnev et al., Journal of Instrumentation, C09021 (2014).
[3] W. D. Apel et al., Nuclear Instruments and Methods in Physics Research A620, 202 (2010).
[4] P. Vassiliadis, Int. J. of Datawarehouse and Mining, 1 (2009).
[5] R. Meusel et al., Journal of Physics: Conference Series, 012031 (2015).
[6] D. M. Dhamdhere, Operating Systems: A Concept-based Approach (Tata McGraw-Hill Education, 2006).
[7] J. Blomer, P. Buncic, R. Meusel, The CernVM File System. Technical Report (http://jblomer.web.cern.ch/jblomer/cvmfstech-2.1-0.pdf, 2013).
[8] J. R. Okajima, Aufs - Advanced multi layered Unification FileSystem (http://aufs.sourceforge.net/).
[9] F. Zafar et al., Journal of Network and Computer Applications, 50 (2017).
[10] J. Freire et al., Computing in Science and Engineering (3), 11 (2008).
[11] M. Iansiti and K. R. Lakhani, Harvard Business Review (1), 118 (2017).
[12] BitFury Group and J. Garzik, Public versus Private Blockchains (http://bitfury.com/content/5-white-papers-research/public-vs-private-pt1-1.pdf, 2015).
[13] R. C. Merkle, Lecture Notes in Computer Science, 369 (1988).
[14] G. Greenspan, Avoiding the pointless blockchain project.