A Practical Python API for Querying AFLOWLIB
Conrad W. Rosenbrock
Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, USA.
(Dated: November 5, 2018)
Abstract
Large databases such as aflowlib.org provide valuable data sources for discovering material trends through machine learning. Although a REST API and query language are available, there is a learning curve associated with the AFLUX language that acts as a barrier for new users. Additionally, the data is stored using non-standard serialization formats. Here we present a high-level API that allows immediate access to the aflowlib data using standard python operators and language features. It provides an easy way to integrate aflowlib data with other python materials packages such as ase and quippy, and provides automatic deserialization into numpy arrays and python objects. This package is available via pip install aflow.
I. INTRODUCTION
Recent advances in computation have enabled the creation of large materials databases using Density Functional Theory [4, 5]. aflowlib [2, 3] is one of the largest, with more than 1.7M material compounds (as of September 2017, http://aflowlib.org/). Recently, the AFLUX search API [7] was introduced to provide improved access to the data in a uniform request format via REST [8]. Because the API is based on REST, it allows access to the data from a variety of programming languages through standard libraries.
Unfortunately, the data for material properties and calculation parameters are not stored in a standard format. While the custom serialization format is documented, each property must be parsed individually to access standard formats (such as numpy [9] arrays for python). Thus, a researcher attempting to access aflowlib data for the first time must 1) read and understand the AFLUX request format; 2) look up the documentation for the properties of interest; and 3) deserialize them appropriately. aflowlib fields are stored as strings of values that may be comma-separated, colon-separated or have a more complex structure (such as for the kpoints property). Deserialization refers to the transformation of these strings into high-level objects such as dictionaries or arrays. Even though such tasks are well within the abilities of a computational scientist, they are not tasks that leverage scientific expertise.
Here, we introduce a high-level python API that abstracts the request and deserialization tasks so that there is virtually no access barrier for newcomers to the aflowlib database.
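To make the deserialization step concrete, the following is a minimal sketch of the kind of parsing a user would otherwise write by hand. The function names (to_number, deserialize) and the example strings are hypothetical and chosen only for illustration; they are not the aflow package's implementation, and the separators used by any particular aflowlib property should be checked against its documentation.

```python
def to_number(token):
    """Convert a token to int or float when possible; otherwise return the stripped string."""
    token = token.strip()
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            continue
    return token


def deserialize(raw, separators=(",",)):
    """Split `raw` on the first separator, then recurse on each part with the remaining ones."""
    if not separators:
        return to_number(raw)
    head, rest = separators[0], separators[1:]
    return [deserialize(part, rest) for part in raw.split(head)]


# Hypothetical serialized values (illustrative only, not real database entries).
species = deserialize("Si,C")
positions = deserialize("0,0,0;0.25,0.25,0.25", separators=(";", ","))
```

The nested list in positions could then be passed to numpy.array to obtain a two-dimensional array; the high-level API described here performs the equivalent conversion automatically.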
II. REQUEST EXAMPLE
We begin with the example from the AFLUX paper [7], translated into python using the new package (aflow, https://pypi.python.org/pypi/aflow):

    from aflow import *  # provides search() and the keyword namespace K

    # Request results in batches of 20, select the thermal conductivity,
    # keep entries whose band gap exceeds 6, and order by conductivity.
    result = (search(batch_size=20)
              .select(K.agl_thermal_conductivity_300K)
              .filter(K.Egap > 6)
              .orderby(K.agl_thermal_conductivity_300K, True))

    for entry in result:
        print(entry.Egap, entry.agl_thermal_conductivity_300K)

Listing 1: Example of searching the aflowlib database for materials with large band gaps and large thermal conductivity. This is the same example given in the AFLUX paper [7].
The search function returns an object that can be chained continually to apply multiple filters (filter), select additional properties (select), apply exclusions (exclude) or order the results (orderby). aflowlib provides more than 110 keywords for various material properties, calculation parameters, etc. Somebody new to aflowlib may not initially know what is available and would have to pull up the documentation online. For ease of use, the API provides all keywords supported by aflowlib as attributes of aflow.K. For IDEs that support auto-complete, this means that researchers can dynamically see what properties are available and view descriptions of them by typing "K." in the editor.
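The chaining behaviour described above follows a common pattern in which each method records its argument and returns the object itself. The sketch below illustrates that pattern with a toy class; Query and its attributes are hypothetical stand-ins and do not reproduce the internals of the aflow package, and plain strings stand in for the keyword objects.

```python
class Query:
    """Toy chainable query object (illustrative only, not the aflow implementation)."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.selected = []
        self.filters = []
        self.order = None

    def select(self, *keywords):
        self.selected.extend(keywords)
        return self  # returning self is what allows further chained calls

    def filter(self, condition):
        self.filters.append(condition)
        return self

    def orderby(self, keyword, reverse=False):
        self.order = (keyword, reverse)
        return self


# Strings stand in for keyword objects and conditions in this sketch.
query = (Query(batch_size=20)
         .select("agl_thermal_conductivity_300K")
         .filter("Egap > 6")
         .orderby("agl_thermal_conductivity_300K", True))
```

Because every method returns the same object, the calls can be applied in any order and the accumulated state can be turned into a single request when the results are first needed.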
III. API FEATURES
The search function described above produces an object that supports iteration over the database entries returned from aflowlib. Normally, AFLUX requires desired properties to be specified as part of the request URL. Our python API provides "lazy evaluation" functionality so that database entries from an existing result can be queried for additional properties using attributes on the python database entry objects. All requests happen transparently in the background and are cached to optimize performance.
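One way to picture lazy evaluation with caching is an entry object that fetches a property only on first attribute access and stores the value for later use. The sketch below assumes a hypothetical LazyEntry class and a fake_fetch function that stands in for a network request; neither is taken from the aflow package, and the property values are made up.

```python
class LazyEntry:
    """Toy database entry that fetches missing properties on first access and caches them."""

    def __init__(self, fetch, **known):
        self._fetch = fetch          # callable: property name -> value
        self._cache = dict(known)    # properties already returned by the original request

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for property names.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
        return self._cache[name]


calls = []  # records which properties triggered a (simulated) request


def fake_fetch(name):
    """Stand-in for a request to the database; the values are invented."""
    calls.append(name)
    return {"Egap": 6.5, "species": ["Si", "C"]}[name]


entry = LazyEntry(fake_fetch, Egap=6.5)
gap = entry.Egap        # already known from the original request: no fetch
first = entry.species   # not yet known: triggers one fetch
second = entry.species  # served from the cache: no second fetch
```

In this sketch the list calls ends up containing a single entry, which shows that repeated access to the same property does not repeat the request.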
A. Slicing, Indexing and Deserialization

    part = result[21:25]             # slice of the result set
    result[55]                       # a single database entry
    result[55].positions_cartesian   # fetched on demand, returned as a numpy array

Listing 2: The python API supports arbitrary slicing in the result set and lazy evaluation of properties. This means that properties can be fetched from the aflowlib database even if they weren't part of the original request URL.
Notice that the Cartesian positions are returned as a numpy array automatically because the API handles deserialization.
B. Operators
The filter method of the search object filters results in aflowlib using standard operators. We overloaded these operators in python to provide an intuitive interface. These are briefly described here:
1. > and < behave as expected for numerical comparisons. However, they are also overloaded for string comparisons in the spirit of the AFLUX endpoint. For example, author < "curtarolo" will match "*curtarolo" and author > "curtarolo" will match "curtarolo*".
2. == behaves as expected for all keywords.
3. % allows for string searches. author % "curtarolo" matches "*curtarolo*".
4. ~ inverts the filter (equivalent to a boolean "not").
5. & is the logical "and" between two conditions.
6. | is the logical "or" between two conditions.
Using these operators, it is possible to form complex queries using intuitive notation:

    filter(((K.Egap > 0) & (K.Egap < 2)) | ((K.Egap > 5) & (K.Egap < 7)))

Listing 3: Example of chaining complex filters using the overloaded operators in the API. Because we overload the bit-wise & and |, which bind more tightly than comparison operators in python, extra parentheses are required around the numeric operator expressions.
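The operator behaviour listed above can be obtained in python by defining the corresponding special methods. The sketch below uses two hypothetical classes, Keyword and Condition, that build a human-readable description of a filter; the text they produce is an illustrative representation only and is not the AFLUX query syntax or the aflow package's implementation.

```python
class Condition:
    """Toy filter condition that can be combined with &, | and ~."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Condition("({} AND {})".format(self.text, other.text))

    def __or__(self, other):
        return Condition("({} OR {})".format(self.text, other.text))

    def __invert__(self):
        return Condition("NOT {}".format(self.text))


class Keyword:
    """Toy keyword whose comparison operators return Condition objects instead of booleans."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Condition("{} > {!r}".format(self.name, value))

    def __lt__(self, value):
        return Condition("{} < {!r}".format(self.name, value))

    def __eq__(self, value):
        return Condition("{} == {!r}".format(self.name, value))

    def __mod__(self, value):
        return Condition("{} contains {!r}".format(self.name, value))


Egap = Keyword("Egap")
author = Keyword("author")

band = ((Egap > 0) & (Egap < 2)) | ((Egap > 5) & (Egap < 7))
not_author = ~(author % "curtarolo")
```

Without the inner parentheses, python would evaluate 0 & Egap before the comparison because & has higher precedence than > and <; in this toy class that raises a TypeError, which is why the parentheses in Listing 3 are needed.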
C. Templated Generation
The supported keywords and corresponding properties on the database entry objects are generated via requests to the aflowlib schema, which includes documentation. This allows any additions or modifications at aflowlib to be captured automatically by regenerating the python API from the template.
D. Integration with ASE and quippy
The Atomic Simulation Environment (ASE) [1, 6] provides a high-level API for working with materials, calculating their properties and performing other high-level transformations. quippy [10] extends ase with additional routines that make it easier to work with collections of material configurations and perform additional tasks (such as crack propagation simulations, calculating descriptors, etc.).
Each database entry object in the aflow API also provides an atoms method that constructs an atoms object for ase or quippy. This makes it seamless to integrate the aflowlib data into existing workflows.
IV. INSTALLATION AND API DOCUMENTATION
The package is available on the python package index and can be installed with:

    pip install aflow

API Documentation auto-generated by sphinx is available at: https://rosenbrockc.github.io/aflow/
The source code has full test coverage and continuous integration, and is hosted at https://github.com/rosenbrockc/aflow
aflow works in both python 2 and python 3.

[1] S. R. Bahn and K. W. Jacobsen. An object-oriented scripting interface to a legacy electronic structure code. Comput. Sci. Eng., 4(3):56–66, May-Jun 2002.
[2] S. Curtarolo, W. Setyawan, G. L. Hart, M. Jahnatek, R. V. Chepulskii, R. H. Taylor, S. Wang, J. Xue, K. Yang, O. Levy, M. J. Mehl, H. T. Stokes, D. O. Demchenko, and D. Morgan. Aflow: An automatic framework for high-throughput materials discovery. Computational Materials Science, 58:218–226, 2012.
[3] S. Curtarolo, W. Setyawan, S. Wang, J. Xue, K. Yang, R. H. Taylor, L. J. Nelson, G. L. Hart, S. Sanvito, M. Buongiorno-Nardelli, N. Mingo, and O. Levy. Aflowlib.org: A distributed materials properties repository from high-throughput ab initio calculations. Computational Materials Science, 58:227–235, 2012.
[4] P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev., 136:B864–B871, Nov 1964.
[5] W. Kohn and L. J. Sham. Self-consistent equations including exchange and correlation effects. Phys. Rev., 140:A1133–A1138, Nov 1965.
[6] A. H. Larsen, J. J. Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, E. D. Hermes, P. C. Jennings, P. B. Jensen, J. Kermode, J. R. Kitchin, E. L. Kolsbjerg, J. Kubal, K. Kaasbjerg, S. Lysgaard, J. B. Maronsson, T. Maxson, T. Olsen, L. Pastewka, A. Peterson, C. Rostgaard, J. Schiøtz, O. Schütt, M. Strange, K. S. Thygesen, T. Vegge, L. Vilhelmsen, M. Walter, Z. Zeng, and K. W. Jacobsen. The atomic simulation environment-a python library for working with atoms. Journal of Physics: Condensed Matter, 29(27):273002, 2017.
[7] F. Rose, C. Toher, E. Gossett, C. Oses, M. B. Nardelli, M. Fornari, and S. Curtarolo. AFLUX: The LUX materials search API for the AFLOW data repositories. Computational Materials Science, 137 (Supplement C):362–370, 2017.
[8] R. H. Taylor, F. Rose, C. Toher, O. Levy, K. Yang, M. B. Nardelli, and S. Curtarolo. A RESTful API for exchanging materials data in the AFLOWLIB.org consortium. Computational Materials Science, 93 (Supplement C):178–192, 2014.
[9] S. van der Walt, S. C. Colbert, and G. Varoquaux. The numpy array: A structure for efficient numerical computation. Computing in Science & Engineering, 13(2):22–30, March 2011.
[10] G. Csányi, S. Winfield, J. Kermode, A. Comisso, A. De Vita, N. Bernstein, and M. C. Payne. Expressive programming for computational physics in Fortran 95+. IoP Comput. Phys. Newsletter, Spring 2007.