TOPCAT’s TAP Client
M. B. Taylor
H. H. Wills Physics Laboratory, University of Bristol, U.K. [email protected]
Abstract.
TAP, the Table Access Protocol, is a Virtual Observatory (VO) protocol for executing queries in remote relational databases using ADQL, an SQL-like query language. It is one of the most powerful components of the VO, but also one of the most complex to use, with an extensive stack of associated standards. We present here recent improvements to the client and GUI for interacting with TAP services from the TOPCAT table analysis tool. As well as managing query submission and result retrieval, the GUI attempts to provide the user with as much help as possible in locating services, understanding service metadata and capabilities, and constructing correct and useful ADQL queries. The implementation and design are, unlike previous versions, both usable and performant even for the largest TAP services.
1. Introduction
TAP, the Table Access Protocol (Dowler et al. 2011), is a Virtual Observatory standard that allows clients to execute custom queries on remote databases and retrieve the results for local use. Although TAP allows in principle for a variety of query languages, in this paper we restrict consideration to ADQL, the Astronomical Data Query Language, which is the only one mandated by TAP. ADQL is essentially a standardised dialect of the SQL SELECT statement, and as such allows complex queries against a relational database, including such functions as row selections, column combinations and multi-table joins. It also defines a number of optional geometrical functions to assist in constraining results on the celestial sphere. The flexibility afforded by this framework permits very powerful queries to be made against potentially very large and complex remote datasets. In particular it is far more capable than the “Simple” VO protocols for catalogue, image or spectral access that preceded it.

The price for this flexibility is however that TAP is a complex beast, requiring use of a stack of related IVOA standards including RegTAP, VOResource, VODataService, TAPRegExt, VOTable, DALI and UWS, as well as TAP and ADQL themselves. The hard part of the problem is not writing software to communicate with external services using these standards, but presenting an interface to the astronomy user that moderates the apparent complexity.

TOPCAT (Taylor 2005) is a desktop GUI application, widely used by astronomers, for analysis of catalogues and other tabular data. Since TAP services are a prime potential source of tables for local analysis, it has an integrated TAP client as one of its
2. Service Discovery
In order to execute a TAP query, the user must first decide which TAP service to use; at time of writing around 100 are listed in the IVOA Registry, and this number is expected to rise. Different services have different data holdings, from large general services such as TAPVizieR (Landais et al. 2013) with 30 000 data tables, to much more focussed ones like the Chandra Source Catalogue at Harvard with only 3.

Typically, the astronomer knows which data sets she wants to use (e.g. WISE or CALIFA) rather than the name or location of the service hosting them (e.g. HEASARC or GAVO DC). When selecting a TAP service to query therefore, it is important to be able to locate services by searching against table metadata such as table name and description, rather than service metadata, which may or may not contain table-level detail.

The standard way to locate VO services is to use the IVOA Registry, a white pages for data services and other VO resources (Demleitner et al. 2015). This can be used to discover TAP services, but unfortunately does not currently contain sufficient table-level information to identify them by table metadata as required. To work round this limitation, TOPCAT by default uses instead of the Registry a separate, non-standard, database called GloTS. GloTS, the Global TAP Schema, is maintained by the German Astrophysical Virtual Observatory (GAVO), being updated automatically by crawling registered TAP services to retrieve the detailed table-level metadata they declare, and exposing the results via a TAP service. This enables just the kind of queries that TOPCAT requires to support its table-oriented service discovery user interface.

Alternative approaches to providing table-level metadata within the Registry are under investigation within the IVOA, and may enable similar functionality by use of standardised service interfaces in the future.
The service discovery functionality is implemented as a pluggable layer within TOPCAT, to provide a platform for experimenting with alternative ways to perform such searches. A configuration option is present in the GUI to switch between different service discovery backends, though this is currently only useful as a platform for prototyping registry experiments.
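As an illustration of the table-oriented search described above, the following minimal sketch builds an ADQL query that looks for services by table name or description. The table name `glots.tables` and the column names `ivoid`, `table_name` and `description` are assumptions made for the example, not a statement of the actual GloTS schema or of the query TOPCAT issues.

```python
def quote_adql_literal(text):
    """Return text as an ADQL string literal, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_table_search(terms, table="glots.tables"):
    """Build an ADQL query matching every term against table name or description.

    The default table and the column names are hypothetical; adjust them to
    the schema of whichever table-level metadata service is being queried.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = []
    for term in terms:
        # '%' and '_' inside a term act as LIKE wildcards in this sketch.
        pattern = quote_adql_literal("%" + term + "%")
        clauses.append(
            f"(table_name LIKE {pattern} OR description LIKE {pattern})"
        )
    where = " AND ".join(clauses)
    return f"SELECT ivoid, table_name, description FROM {table} WHERE {where}"


if __name__ == "__main__":
    print(build_table_search(["WISE"]))
```

LIKE matching may be case-sensitive depending on the service, so a real client would probably need a case-insensitive comparison; that detail is left out here.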
3. Metadata Acquisition and Display
Having selected a TAP service to use, the user needs to understand exactly what is provided by the service in order to be able to construct useful queries, and so TOPCAT has to make this information available in the GUI. The most important items constituting this service metadata are the descriptions of available tables, and the descriptions of columns in each table. Although this information may be available from the Registry or GloTS, the most reliable way to obtain it is from the service itself. TAP defines two basic ways for services to provide this self-description: as an XML document served from the /tables endpoint and as the content of some system-level tables within the exposed database itself, in the reserved TAP_SCHEMA namespace.

For most services, it is straightforward and efficient to acquire this metadata by downloading it in one go when the service is first contacted. It can then be stored wholesale in the client and presented to the user as required; for instance when a table is selected in the GUI, names and descriptions for the columns of that table can be displayed. However, for the small number of services which serve very many tables, this approach is no longer practical. The metadata for TAPVizieR’s 30 000 tables amounts to around 100 Mbyte, only a tiny proportion of which a user will want to examine in a single session, so downloading the whole thing pre-emptively is not desirable.

To address this, TOPCAT uses a pluggable metadata acquisition layer, with different backend strategies for different services. By default, an adaptive strategy is used: if there are fewer than 5000 columns in total, the metadata is downloaded in one go from the XML document, but if there are more, then TAP_SCHEMA queries are made in the first instance to acquire just a list of tables, and the more bulky column metadata is retrieved as required using subsequent per-table TAP_SCHEMA queries only for those tables the user expresses an interest in.
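The adaptive strategy just described can be sketched as follows. The threshold of 5000 columns comes from the text; the function names, the strategy labels and the use of a `COUNT(*)` query to obtain the column total are assumptions for illustration. The TAP_SCHEMA table and column names are those defined by the TAP standard.

```python
COLUMN_THRESHOLD = 5000  # figure quoted in the text


def column_count_query():
    """ADQL to count all columns; how the total is obtained is an assumption."""
    return "SELECT COUNT(*) FROM TAP_SCHEMA.columns"


def choose_strategy(total_columns, threshold=COLUMN_THRESHOLD):
    """Pick a metadata source: the /tables document or lazy TAP_SCHEMA queries."""
    if total_columns < threshold:
        return "tables-endpoint"  # download the whole XML document at once
    return "tap-schema"           # list tables first, fetch columns on demand


def table_list_query():
    """ADQL for the lightweight first step: table names and descriptions only."""
    return "SELECT schema_name, table_name, description FROM TAP_SCHEMA.tables"


def column_query(table_name):
    """ADQL for the column metadata of one table, issued only when needed."""
    quoted = "'" + table_name.replace("'", "''") + "'"
    return (
        "SELECT column_name, description, unit, ucd, datatype "
        "FROM TAP_SCHEMA.columns WHERE table_name = " + quoted
    )


if __name__ == "__main__":
    print(choose_strategy(1200))
    print(column_query("ivoa.obscore"))
```

The point of the second branch is that the per-table query is only ever sent for tables the user actually selects, so the bulk of a very large service's metadata is never transferred.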
Again, there are expert options in the GUI for switching between metadata acquisition strategies as required, but normal users are not expected to need to be aware of these.

Having acquired the service metadata, the application must make it available to the user through the GUI. This is done using the combination of a selectable tree of tables, with an adjacent panel providing more detail on the currently selected one. Since much information is available and screen real estate is at a premium, the detail panel contains a number of tabs describing different aspects of the selected table: schema, table name/description, column list, and foreign key information.

Scalability is an issue for GUI usability as well as data transfer bandwidth. With potentially thousands of tables from which to choose, browsing a scrollable list is not a useful interface, especially when tables have unintuitive names. A text entry field is therefore provided to restrict the content of the currently displayed list of tables to those whose name and/or description matches one or more given search terms. The list is filtered instantly as search terms are typed.
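The table filtering described above might look something like the sketch below. It assumes that tables are held as (name, description) pairs, that matching is a case-insensitive substring test, and that every search term must match; the text does not specify these details, so they are illustrative choices only.

```python
def filter_tables(tables, search_text):
    """Return the (name, description) pairs matching all whitespace-separated terms.

    Matching is a case-insensitive substring test against name and
    description together. Requiring all terms (rather than any) is an
    assumption; replace all() with any() for the looser behaviour.
    """
    terms = search_text.lower().split()

    def matches(name, description):
        haystack = (name + " " + (description or "")).lower()
        return all(term in haystack for term in terms)

    return [(name, desc) for name, desc in tables if matches(name, desc)]


if __name__ == "__main__":
    # Hypothetical table list for demonstration.
    tables = [
        ("ivoa.obscore", "Observation metadata"),
        ("wise.allsky", "WISE All-Sky source catalogue"),
        ("califa.cubes", None),
    ]
    for name, _ in filter_tables(tables, "wise source"):
        print(name)
```

An empty search string yields no terms, so the full list is returned unchanged, which suits filtering on every keystroke.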
4. Query Preparation
Once in possession of the available information about the currently targeted service, the user has to assemble the text of an ADQL SELECT statement specifying the desired database operation. Most astronomers are not, at least initially, fluent in ADQL or SQL, so need some assistance with the syntax. One possible approach is to provide a graphical query builder that constructs a SELECT statement from a series of GUI interactions (e.g. selecting tables, columns and comparison operations from drop-down menus). That can be effective for simple queries, but is difficult to generalise to more sophisticated or flexible operations. TOPCAT instead takes the approach of providing a library of example queries that a user can use, edit, adapt and learn from.

These examples are available from a menu that fills in the ADQL text ready to submit, and fall into three categories. Standard examples use standard TAP features, and are constructed by TOPCAT with reference to metadata retrieved from the current service, so can be used as-is to make working (though not necessarily useful) queries on the database at hand.
Data Model-Specific examples consist of static pre-written queries specific to particular data models for which the current service declares support, if any; for instance if a service declares support for the well-known ObsTAP data model, implying that the ivoa.obscore table is present with a well-known column structure, then TOPCAT will offer a list of queries that make sense for that table. Service-Provided examples consist of ADQL text retrieved from the service itself in a standard format via the /examples endpoint, and can thus provide ready-to-run queries exploiting the particular structure and capabilities of the current database. This final category relies on recent enhancements to the TAP protocol stack, for which the details are still under discussion, but presents a powerful way for data providers to assist end users in making best use of the archived data.

Finally, a Hints tab alongside the metadata display provides a very basic “cheat sheet” with reminders about SELECT statement syntax and pointers to a few external ADQL resources.

When actually assembling the ADQL query for submission, the user types into a text entry panel. Queries are validated as they are entered, with reference to ADQL syntax and service-specific information such as the list of available tables, columns and user-defined functions, so that syntax errors can be highlighted. Other features of the text entry panel include undo/redo, multiple tabs, and limited support for pasting in column and table names selected from the GUI.
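To give a flavour of the service-specific part of this validation, the sketch below flags table names in a query that are not in the service's table list. It is a deliberately crude regular-expression check written for illustration, not the approach used by TOPCAT, which relies on a full ADQL parser library; the function name and the matching rules are assumptions.

```python
import re

# Matches the identifier that follows FROM or JOIN. This is a crude stand-in
# for a real ADQL parser: it ignores comma-separated table lists, delimited
# ("quoted") identifiers, subqueries and text inside string literals.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def unknown_tables(adql, known_tables):
    """Return table names referenced in adql that the service does not list."""
    known = {name.lower() for name in known_tables}
    referenced = _TABLE_REF.findall(adql)
    return [name for name in referenced if name.lower() not in known]


if __name__ == "__main__":
    query = (
        "SELECT TOP 10 * FROM ivoa.obscore AS o "
        "JOIN foo.bar AS b ON o.obs_id = b.obs_id"
    )
    # Hypothetical table list; foo.bar is not part of it.
    print(unknown_tables(query, ["ivoa.obscore"]))
```

A check of this kind can run on every edit of the query text because it needs only the table list already held by the client, with no round trip to the service.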
5. Conclusions
It is hoped that TOPCAT’s enhanced TAP user interface, alongside parallel developments in other available TAP clients, evolution of associated standards, and continuing improvements in service implementations, will lead to more widespread use of TAP for effective exploitation of the vast and increasing amount of astronomical data which is exposed using this protocol.

Acknowledgments. This work has benefitted from assistance from many members of the TAP/IVOA community; special thanks are due to Markus Demleitner (GloTS & expert on all things TAP) and Grégory Mantelet (ADQL parser library), both at ARI Heidelberg. Some of the ideas here were inspired by other TAP clients, including Seleste (Van Stone et al. 2013) and TAPHandle (Michel et al. 2014).
References
Demleitner, M., Harrison, P., Taylor, M., & Normand, J. 2015, Astronomy and Computing, 10, 88
Dowler, P., Rixon, G., & Tody, D. 2011, ArXiv e-prints, 1110.0497