AstroCloud, a Cyber-Infrastructure for Astronomy Research: Data Access and Interoperability
Dongwei Fan, Boliang He, Jian Xiao, Shanshan Li, Changhua Li, Chenzhou Cui, Ce Yu, Zhi Hong, Shucheng Yin, Chuanjun Wang, Zihuang Cao, Yufeng Fan, Linying Mi, Wanghui Wan, Jianguo Wang
ASP Conference Series, Astronomical Society of the Pacific (arXiv preprint, astro-ph.IM)
National Astronomical Observatories, Chinese Academy of Sciences (CAS), 20A Datun Road, Beijing 100012, China
Tianjin University, 92 Weijin Road, Tianjin 300072, China
Yunnan Astronomical Observatory, CAS, P.O. Box 110, Kunming 650011, China
Central China Normal University, 152 Luoyu Road, Wuhan 430079, China
Abstract.
The data access and interoperability module connects observation proposals, data, virtual machines and software. Using the unique identifier of a PI (principal investigator), either an email address or an internal ID, data can be collected through the PI's proposals or through search interfaces such as cone search. Files associated with the search results can easily be transported to cloud storage, including the storage attached to virtual machines and several commercial platforms such as Dropbox. Thanks to the standards of the IVOA (International Virtual Observatory Alliance), search results formatted as VOTable can be sent to a variety of VO software. Future work will integrate more data and connect archives and other astronomical resources.
1. Introduction
Astronomy archives keep growing. Some tasks have become difficult to carry out on a personal computer because of limits on storage, network bandwidth or computing capability. AstroCloud of the China-VO (http://astrocloud.china-vo.org) tries to put all the resources, including catalog databases, files, software, virtualized computers and so on, on a cloud platform. Astronomers can then complete their work on the cloud, from the observation proposal to the published paper, without worrying about how to get the data, where to back up the various kinds of data, or how to configure powerful hardware. The resources that astronomers need are collected and mounted on their own virtual machine, which runs the operating system they are familiar with, and all the work can be done on the cloud over a remote connection.
The data access and interoperability module is a bridge that connects the resources on the cloud. A typical observation starts with a proposal, and the observed data are then stored by the archiving system. Astronomers can find the data through the data access module (AstroCloud Data, http://explore.china-vo.org) and transport the files to their own virtual machine, or simply download the package to a local computer. Since not all data are published immediately, the data access module also provides an access control mechanism. Other astronomers can apply for the data, and the PI of the data can review the application and decide whether to approve it.
The interoperability module provides another capability: it connects the web page interface with local software that supports the IVOA SAMP protocol (Taylor et al. 2012), e.g. Aladin and Topcat. IVOA Simple Cone Search and Simple Spectral Access Protocol interfaces are also provided. Some of these interfaces can be found in the IVOA registries.
Section 2 presents the architecture of the data access and interoperability module. Section 3 describes how data flow through AstroCloud. The data access control mechanism is described in section 4, and the conclusion is presented in section 5.
Figure 1. Architecture of AstroCloud. The general role of the data access and interoperability module is marked by the dotted rectangles.
2. Architecture
AstroCloud contains virtual machines, observation proposal submission, data archiving, HPC status and applications, and some other subsystems. The role of the data access and interoperability module in AstroCloud is to provide a data access service, a discovery interface and some processing functions such as visualization, as Fig. 1 shows. The goal is to give astronomers an easy way to find the data they want, including their own observation data.
To achieve this objective, we designed the structure shown in Fig. 2. The essential resources are the observation data and other astronomy archives. Because of data protection requirements, users should not access these resources directly. Controlled data access is therefore needed, together with query logs and statistics. Several IVOA protocols are supported as well. On the web page interface, web-SAMP can send the list of objects found by a search to SAMP-enabled clients, e.g. Aladin and Topcat. The list is also formatted as VOTable according to the Simple Cone Search or the Simple Spectral Access Protocol. This is one part of what we call "interoperability".
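The cone search and VOTable output described above can be illustrated with a minimal sketch. It is not the AstroCloud implementation: the endpoint `BASE_URL`, the row layout and all function names are assumptions made for illustration. Only the `RA`, `DEC` and `SR` query parameters (decimal degrees), the three UCDs and the VOTable element names come from the IVOA Simple Cone Search and VOTable standards.

```python
import math
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; the paper does not give the service URL.
BASE_URL = "http://example.org/conesearch"


def cone_search_url(ra_deg, dec_deg, radius_deg, base_url=BASE_URL):
    """Build a Simple Cone Search request (RA, DEC, SR in decimal degrees)."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return base_url + "?" + query


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_filter(rows, ra_deg, dec_deg, radius_deg):
    """Keep the rows that fall inside the search cone (what a service does server-side)."""
    return [r for r in rows
            if angular_separation_deg(ra_deg, dec_deg, r["ra"], r["dec"]) <= radius_deg]


def to_votable(rows):
    """Serialize rows as a minimal VOTable document, returned as a string."""
    votable = ET.Element("VOTABLE", version="1.3",
                         xmlns="http://www.ivoa.net/xml/VOTable/v1.3")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    # Simple Cone Search asks for one identifier, one RA and one DEC column.
    ET.SubElement(table, "FIELD", name="id", datatype="char",
                  arraysize="*", ucd="ID_MAIN")
    ET.SubElement(table, "FIELD", name="ra", datatype="double",
                  unit="deg", ucd="POS_EQ_RA_MAIN")
    ET.SubElement(table, "FIELD", name="dec", datatype="double",
                  unit="deg", ucd="POS_EQ_DEC_MAIN")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for key in ("id", "ra", "dec"):
            ET.SubElement(tr, "TD").text = str(row[key])
    return ET.tostring(votable, encoding="unicode")


if __name__ == "__main__":
    catalog = [
        {"id": "a", "ra": 10.0, "dec": 20.0},
        {"id": "b", "ra": 10.05, "dec": 20.0},
        {"id": "c", "ra": 12.0, "dec": 20.0},
    ]
    print(cone_search_url(10.0, 20.0, 0.1))
    print(to_votable(cone_filter(catalog, 10.0, 20.0, 0.1)))
```

A VOTable produced this way is the kind of document that a SAMP-enabled client such as Aladin or Topcat can load, which is why the same search result can serve both the web page and desktop tools.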
Figure 2. Architecture diagram of the data access and interoperability module. Several IVOA protocols are supported in this module, so users can invoke VO-driven applications to access the data in addition to the web page interface.
Figure 3. Data can be transported easily and quickly inside the cloud platform, while downloading to a local disk is also supported. When the user chooses to download data to the cloud, the data soon appear on the created virtual machine or in the web client. Astronomers can thus handle the files on the virtual machine without spending a lot of time waiting for the download or worrying about local storage and computing capacity.
3. From Proposal to Personal Cloud Storage
The whole AstroCloud system maintains a user table and a single sign-on mechanism, which makes it much easier to exchange information among physically separate modules. The unique ID of a user is the key that connects all the resources on AstroCloud, from the observation proposal to the archived data and on to the user's private cloud storage. In the data access module, the ID is used to transport the files found by a search to the cloud storage. Fig. 3 shows that the files can be seen on the cloud storage web page, or even in the virtual machine that the user created. The cloud storage directories are automatically mounted on the virtual machine. Astronomers therefore do not have to download huge numbers of files, which would take a very long time over the internet and fill up the local disk. Since the storage space and the data access module are both on the cloud, sometimes even on the same hardware, file transmission is much faster than over a normal connection. Astronomers can then analyze the data on the virtual machine. This is how we try to deal with the data flood in astronomy: "If there is too much data to move around, take the analysis to the data." (Gray & Szalay 2003)
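The role of the user ID as the key to the personal cloud storage can be sketched as follows. This is an illustration under stated assumptions, not the AstroCloud code: the directory layout (one folder per user ID under a storage root) and the function names `user_storage_dir` and `transfer_to_cloud` are hypothetical, and a local copy stands in for the transfer inside the cloud.

```python
import shutil
import tempfile
from pathlib import Path


def user_storage_dir(storage_root, user_id):
    """Return (and create if needed) the storage directory of one user.

    Assumed layout: <storage_root>/<user_id>/. The paper does not describe
    the real layout.
    """
    # Reject IDs that could point outside the storage root.
    if not user_id or "/" in user_id or "\\" in user_id or user_id in (".", ".."):
        raise ValueError("invalid user ID: %r" % (user_id,))
    target = Path(storage_root) / user_id
    target.mkdir(parents=True, exist_ok=True)
    return target


def transfer_to_cloud(files, storage_root, user_id):
    """Copy the selected files into the user's cloud storage directory."""
    target = user_storage_dir(storage_root, user_id)
    copied = []
    for src in files:
        dst = target / Path(src).name
        shutil.copy2(src, dst)  # copies the content and the file metadata
        copied.append(dst)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive, tempfile.TemporaryDirectory() as cloud:
        fits_file = Path(archive) / "obs_0001.fits"
        fits_file.write_bytes(b"SIMPLE  =                    T")
        for path in transfer_to_cloud([fits_file], cloud, "user42"):
            print(path)
```

If the per-user directory is what gets mounted on the virtual machine, a file copied this way becomes visible there without any download by the user.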
Figure 4. An applicant can submit a request through the application page (left). The data owner can then review the application and decide whether to grant access.
4. Data Access Control
Observation data normally have a protection period, e.g. one year or 18 months. During this period, the data belong only to the PI of the observation proposal. The simplest approach is to allow only the PI to access the data, or even to hide the data from the public. However, telescope managers would like to publish the data in order to increase the impact of their instruments. One solution is an application mechanism: anyone interested in the data can write an application for access. The request is then delivered to the data owner, i.e. the PI. If the owner accepts the request, the applicant can access the data. This simple process is shown in Fig. 4.
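The application process above can be written down as a small sketch. It is an illustration only: the `Dataset` class, its field names and the default protection period of 365 days are assumptions, and the real system presumably keeps this state in a database behind the web pages shown in Fig. 4.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Dataset:
    dataset_id: str
    pi_id: str                   # unique ID of the proposal PI (the data owner)
    observed_on: date
    protection_days: int = 365   # assumed default; the paper mentions one year or 18 months
    approved_users: set = field(default_factory=set)
    pending_requests: dict = field(default_factory=dict)

    def is_public(self, today):
        """True once the protection period has expired."""
        return today >= self.observed_on + timedelta(days=self.protection_days)

    def can_access(self, user_id, today):
        """Public data are open to all; protected data only to the PI and approved users."""
        return (self.is_public(today)
                or user_id == self.pi_id
                or user_id in self.approved_users)

    def apply(self, user_id, reason):
        """An applicant asks for access; the request waits for the PI's decision."""
        self.pending_requests[user_id] = reason

    def decide(self, pi_id, user_id, approve):
        """The PI accepts or rejects a pending application."""
        if pi_id != self.pi_id:
            raise PermissionError("only the PI can decide on applications")
        if user_id not in self.pending_requests:
            raise KeyError("no pending application from %r" % (user_id,))
        del self.pending_requests[user_id]
        if approve:
            self.approved_users.add(user_id)


if __name__ == "__main__":
    ds = Dataset("obs-001", "pi@example.org", date(2014, 1, 1))
    today = date(2014, 6, 1)
    print(ds.can_access("guest@example.org", today))
    ds.apply("guest@example.org", "follow-up spectroscopy")
    ds.decide("pi@example.org", "guest@example.org", approve=True)
    print(ds.can_access("guest@example.org", today))
```

Keeping the decision with the PI, rather than with an administrator, matches the rule that the data belong to the PI during the protection period.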
5. Conclusion
Big data is not only a trend in astronomy; it is changing how research is done. More and more work is becoming difficult to handle on personal computing devices. Working on the cloud is one solution, and the AstroCloud of China-VO is one such attempt. The data access and interoperability module of AstroCloud collects the required data and puts them into cloud storage. It connects the data from the observation proposals to astronomers' personal cloud storage, and then interoperates with other modules and tools.
Acknowledgments. This paper is funded by the National Natural Science Foundation of China (U1231108), the Ministry of Science and Technology of China (2012FY120500) and the Chinese Academy of Sciences (XXH12503-05-05). Data resources are supported by the Chinese Astronomical Data Center.