Component Specification in the Cactus Framework: The Cactus Configuration Language

Abstract

Component frameworks are complex systems that rely on many layers of abstraction to function properly. One essential requirement is a consistent means of describing each individual component and how it relates to both other components and the whole framework. As component frameworks are designed to be flexible by nature, the description method should be simultaneously powerful, lead to efficient code, and be easy to use, so that new users can quickly adapt their own code to work with the framework. In this paper, we discuss the Cactus Configuration Language (CCL), which is used to describe components ("thorns") in the Cactus Framework. The CCL provides a description language for the variables, parameters, functions, scheduling and compilation of a component, and includes concepts such as interface and implementation which allow thorns providing the same capabilities to be easily interchanged. We include several application examples which illustrate how community toolkits use the CCL and Cactus, and identify needed additions to the language.


Gabrielle Allen

Center for Computation & Technology, Department of Computer Science, Louisiana State University, Baton Rouge, Louisiana 70803. Email: [email protected]

Tom Goodale

Frank Löffler

Center for Computation & Technology, Louisiana State University, Baton Rouge, Louisiana 70803. Email: [email protected]

David Rideout

Perimeter Institute for Theoretical Physics, 31 Caroline St. N., Waterloo, Ontario N2L 2Y5, Canada. Email: [email protected]

Erik Schnetter

Center for Computation & Technology, Department of Physics & Astronomy, Louisiana State University, Baton Rouge, Louisiana 70803. Email: [email protected]

Eric L. Seidel

City College of New York, New York, New York 10031; Center for Computation & Technology, Louisiana State University, Baton Rouge, Louisiana 70803. Email: [email protected]


I. INTRODUCTION

Component frameworks provide a mechanism for efficiently developing and deploying scientific applications in high-performance computing environments. Such frameworks provide for efficient code reuse, community code development and abstraction of specialized capabilities such as adaptive mesh refinement or parallel linear solvers.

Component specification is an important part of component frameworks: the specification defines the interfaces between components, including for example a description of the variables and functions both provided by and required by the different components. The choice of specification language impacts the scope of capabilities of components which can be implemented and exposed, as well as the ease of use of components by both developers and users. If the component specification is too general it can hinder easy sharing of components, and if the specification is too narrow it will reduce the potential functionality of components and thus the application.

This paper describes the current specification of components in the Cactus Framework via the Cactus Configuration Language, or CCL. Cactus is an open-source component framework designed for collaborative development of complex codes in high-performance computing environments. The largest user base for Cactus is in the field of numerical relativity where, for example, over 100 components are now shared among over fifteen different groups through the Einstein Toolkit [17] (Section IV-C). In other application areas, Cactus is used by researchers in fields including quantum gravity (Section IV-B), computational fluid dynamics, coastal modeling and computer science.

However, as simulation codes grow more complex, for example requiring multi-physics capabilities, there is now a need to extend or possibly re-architect the CCL to react to new features required by Cactus application developers. Further, as the number of Cactus components grows, an increasing problem is how to provide user tools for component assembly, application debugging, and verification and validation. This paper provides a review of the CCL, focusing on how it describes the interactions between thorns and the implications for the development of user tools.

In Section II we describe the architecture of the Cactus Framework that particularly relates to its handling and orchestration of components, including the Cactus Scheduler, memory allocation, data types provided by Cactus, and existing and planned tools for component management. In Section III we describe the Cactus thorn configuration files using the Cactus Configuration Language, the methods of thorn interaction, and built-in testing options. In Section IV we examine several different Cactus applications (the WaveToy Demo, a community toolkit for quantum gravity, and the Einstein Toolkit) with respect to the dependence among components enforced by the CCL. In Section V we describe some "missing" features of the CCL that will need to be addressed for future Cactus applications.

Fig. 1. Cactus components are called thorns and the integrating framework is called the flesh. The interface between thorns and the flesh is provided by a set of configuration files written in the Cactus Configuration Language (CCL).

II. CACTUS

The Cactus Framework [16], [3] is an open-source, modular, portable programming environment for high-performance computing (HPC). It was designed and written specifically to enable scientists and engineers to develop and perform the large-scale simulations needed for modern scientific discoveries across a broad range of disciplines. Cactus is well suited for use in large, international research collaborations.

A. Architecture

Cactus is a component framework. Its components are called thorns, whereas the framework itself is called the flesh (Figure 1). The flesh is the core of Cactus: it provides the APIs for thorns to communicate with each other, and performs a number of administrative tasks at build time and run time. Cactus depends on three configuration files and two optional files provided by each thorn to direct these tasks and provide inter-thorn APIs. These files are:

• interface.ccl Defines the thorn interface and inheritance along with variables and aliased functions.
• param.ccl Defines parameters which can be specified in a Cactus parameter file and are set at the start of a Cactus run.
• schedule.ccl Defines when and how scheduled functions provided by thorns should be invoked by the Cactus scheduler.
• configuration.ccl (optional) Defines build-time dependencies in terms of provided and required capabilities, e.g. interfaces to Cactus-external libraries.
• test.ccl (optional) Defines how to test a thorn's correctness via regression tests.

[Figure 2: structure of a Cactus thorn. Configuration Files (CCL): Interface, Parameters, Schedule, Configuration. Source Code: Fortran/C/C++, include files, Makefile. Verification & Validation: Testsuites. Documentation: Thorn guide, Examples, Metadata.]

Fig. 2. Cactus thorns are composed of source code, documentation, and test suites for regression testing, along with a set of configuration files written in the Cactus Configuration Language (CCL) which define the interface with other thorns and the Cactus flesh.

The flesh is responsible for parsing the configuration files at build time, generating source code to instantiate the different required thorn variables, parameters and functions, as well as checking required thorn dependencies.

At run time the flesh parses a user-provided parameter file which defines which thorns are required and provides key-value pairs of parameter assignments. The flesh then activates only the required thorns, sets the given parameters, using default values for parameters which are not specified in the parameter file, and creates the schedule of which functions provided by the activated thorns to run at which time.

The Cactus flesh provides the main iteration loop for simulations (although this can be overloaded by any thorn) but does not handle memory allocation for variables or parallelization; this is performed by a driver thorn. The flesh performs no computation of its own; this is all done by thorns. It simply orchestrates the computations defined by the thorns.

The thorns are the basic modules of Cactus. They are largely independent of each other and communicate via calls to the flesh API. Thorns are collected into logical groupings called arrangements. This is not strictly required, but strongly recommended to aid with their organization. An important concept is that of an interface. Thorns do not define relationships with other specific thorns, nor do they communicate directly with other thorns. Instead they define relationships with an interface, which may be provided by multiple thorns. This distinction exists so that thorns providing the same interface may be independently swapped without affecting any other thorns.
Interfaces in Cactus are fairly similar to abstract classes in Java or virtual base classes in C++, with the important distinction that in Cactus the interface is not explicitly defined anywhere outside of the thorn.

This ability to choose among multiple thorns providing the same interface is important for introducing new capabilities in Cactus with minimal changes to other thorns, so that different research groups can implement their own particular solver for some problem, yet still take advantage of the large number of community thorns. For example, the original driver thorn for Cactus, which handles domain decomposition and message passing, is a unigrid driver called PUGH. More recently, a driver thorn which implements adaptive mesh refinement (AMR) was developed, called Carpet [8], [7], [1]. Carpet makes it possible for simulations to run with multiple levels of mesh refinement, which can be used to achieve higher accuracy than unigrid simulations. Both PUGH and Carpet provide the interface driver, and application thorns can migrate relatively straightforwardly from the unigrid driver to the advanced AMR thorn.

Thorns providing the same interface may also be compiled together in the same executable, with the user choosing in the parameter file, at run time, which implementation to use. This allows users to switch among various thorns without having to recompile Cactus.

Footnote: Note that the parameter file is different from the file param.ccl: param.ccl is used to define which parameters exist, while the parameter file is used to assign values to those parameters at run time.

Thorns include a doc directory which provides the documentation for the thorn in LaTeX format. This allows users to build one single reference guide to all thorns via a simple command.
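The run-time choice among thorns that provide the same interface can be illustrated with a minimal Python sketch. It does not reflect how the flesh is implemented: the interface name wavetoy, the COMPILED list, and the rule that two active thorns may not provide the same interface are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thorn:
    name: str
    implements: str  # name of the interface this thorn provides


# Thorns compiled into one hypothetical executable. PUGH and Carpet both
# provide the "driver" interface; the "wavetoy" interface name is assumed.
COMPILED = [
    Thorn("PUGH", "driver"),
    Thorn("Carpet", "driver"),
    Thorn("WaveToyC", "wavetoy"),
]


def activate(compiled, active_names):
    """Return {interface: thorn name} for the thorns named as active.

    Raises ValueError if a requested thorn was not compiled in, or if two
    active thorns provide the same interface (an assumption of this sketch).
    """
    by_name = {thorn.name: thorn for thorn in compiled}
    chosen = {}
    for name in active_names:
        if name not in by_name:
            raise ValueError(f"thorn {name!r} is not compiled in")
        interface = by_name[name].implements
        if interface in chosen:
            raise ValueError(
                f"{chosen[interface]!r} and {name!r} both provide {interface!r}"
            )
        chosen[interface] = name
    return chosen


# Switching drivers only changes the list of active thorns, not the build.
unigrid_run = activate(COMPILED, ["PUGH", "WaveToyC"])
amr_run = activate(COMPILED, ["Carpet", "WaveToyC"])
```

In this sketch the application thorn never names PUGH or Carpet; it only relies on some thorn providing the driver interface, which is the property the text describes.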

B. Scheduling

The Cactus flesh provides a rule-based scheduler. Thorn functions can be specified to be called by the scheduler at different points in the simulation, in standard time bins. A scheduled routine can be requested to run before or after other functions in the same time bin. It is also possible for thorns to define their own schedule groups, which may be thought of as user-defined time bins. The specification of scheduled functions in thorns is described in Section III-A2. At run time, the flesh builds a schedule tree and provides an API that allows this schedule tree to be traversed such that the functions are called in their desired order. Cactus provides the argument lists for calling these scheduled functions, and provides information about which variables need storage allocated and when.
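The before/after rules within one time bin amount to a partial order, which can be resolved by a topological sort. The following Python sketch shows the idea using the standard-library graphlib module (Python 3.9 or later). It is not the Cactus scheduler; the routine names and the dictionary layout are hypothetical.

```python
from graphlib import TopologicalSorter


def order_bin(routines):
    """Order the routines of one time bin.

    `routines` maps a routine name to {"before": [...], "after": [...]},
    mirroring the before/after requests a thorn may make.
    A cyclic set of rules raises graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for name, rules in routines.items():
        sorter.add(name)
        for other in rules.get("after", []):
            sorter.add(name, other)   # `other` must run before `name`
        for other in rules.get("before", []):
            sorter.add(other, name)   # `name` must run before `other`
    return list(sorter.static_order())


# Hypothetical routines scheduled in the evolution bin.
evol_bin = {
    "WaveToy_Evolution": {},
    "WaveToy_Boundaries": {"after": ["WaveToy_Evolution"]},
    "Apply_Sources": {"before": ["WaveToy_Evolution"]},
}
schedule = order_bin(evol_bin)
```

Here the three rules form a chain, so the order is fully determined: Apply_Sources, then WaveToy_Evolution, then WaveToy_Boundaries. When the rules leave several orders open, any order consistent with them is acceptable in this sketch.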

C. Memory Allocation

Memory allocation for Cactus variables is handled by the driver thorn, using information from the schedule and interface configuration files. Memory can be allocated for variables throughout the simulation, or allocated only during the execution of a function or schedule group. This provides a mechanism for reducing and tracking the memory footprint of a simulation. Incorrect memory allocation and the use of uninitialized variables can easily lead to bugs which are hard to detect. Various Cactus thorns provide tools which help locate such errors, for example by initializing variables to have a value of NaN and then checking for these values during the simulation.

Footnote: A full explanation of NaN may be found online: http://en.wikipedia.org/wiki/NaN
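The NaN-poisoning technique mentioned above can be sketched in a few lines of Python. This is an illustration of the idea only, not the code of any Cactus thorn; the function names are hypothetical.

```python
import math


def allocate_poisoned(n):
    """Allocate storage for n grid points, filled with NaN as a poison value."""
    return [math.nan] * n


def find_uninitialized(data):
    """Return the indices of `data` that still hold NaN."""
    return [i for i, value in enumerate(data) if math.isnan(value)]


phi = allocate_poisoned(5)
for i in range(1, 5):      # deliberate bug: index 0 is never set
    phi[i] = 0.1 * i

bad_points = find_uninitialized(phi)   # the point that was never written
```

A check of this kind cannot distinguish a point that was never written from one where a computation itself produced NaN, so a hit is a prompt for investigation rather than a diagnosis.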

D. Data Types

Cactus defines its own data types for thorns. These data types include standard integer and real types, and a complex number data type. Supported Cactus data types include Byte, Int, Real, Complex, String, Keyword and Pointer, but the use of some of them is restricted (e.g. Keyword and String to parameters). An optional trailing number on the type can be used to set the size in bytes, where applicable. The motivation for providing Cactus data types is that there is no standard size for data types across all platforms. Providing Cactus-specific data types allows the framework to maintain an explicit variable size across all platforms, which maximizes code portability. In addition, it allows users to select the size of these standard types at build time across all thorns.

E. Tools

As a distributed software framework, Cactus can make use of some additional tools to assemble the code and manage the simulations. Often each arrangement of thorns resides in its own source control repository, as they are mostly independent of each other. This leads to a retrieval process that would quickly become unmanageable for end users (for example, the Einstein Toolkit is composed of 135 thorns). To facilitate this process we use a thornlist written using the Component Retrieval Language [9], which allows the maintainers of a distributed framework to distribute a single file containing the URLs of the components and the desired directory structure. This file can then be processed by a program such as our own GetComponents script, and the entire retrieval process becomes automated.

In addition to the complex retrieval process, compiling Cactus and managing simulations can be a difficult task, especially for new users. There are a large number of options that may be required for a successful compilation, and these will vary across architectures. To assist with this process a tool called the Simulation Factory [10], [15] was developed. Simulation Factory provides a central means of control for managing access to different resources, configuring and building the Cactus codebase, and also managing the simulations created using Cactus. Simulation Factory uses a database known as the Machine Database, which allows Simulation Factory to be resource agnostic, so that it runs consistently across any pre-configured HPC resource.

III. CACTUS CONFIGURATION LANGUAGE

The Cactus Configuration Language (CCL) was provided with the first Cactus 4.0 release in 1999. The language has evolved since then with the addition of function aliasing (Section III-A2) and the configuration CCL file (Section II-A), along with a small number of minor changes. The well-designed initial capabilities and ensuing stability of the CCL are one feature of Cactus which has led to its success across different scientific fields and its ability to enable the growth of application communities.

Schedule Bin       Description
CCTK_STARTUP       For routines which need to be run before the grid hierarchy is set up, for example, for function registration.
CCTK_PARAMCHECK    For routines that check parameter combinations for potential errors. Routines registered here only have access to the grid size and the parameters.
CCTK_INITIAL       For routines which generate initial data.
CCTK_PRESTEP       Tasks performed before the main evolution step.
CCTK_EVOL          The evolution step.
CCTK_POSTSTEP      Tasks performed after the evolution step.
CCTK_ANALYSIS      Routines which can analyze data at each iteration. This time bin is special in that ANALYSIS routines are only called if output from the routine is requested, e.g. in the parameter file.

Fig. 3. Scheduled functions in Cactus can be assigned to run in standard time bins, the most important of which are described in this table.

In this section we outline the structure of the Cactus Configuration Language and provide syntax definitions for many of the elements of CCL. A complete specification and discussion of the language may be found in the Cactus User's Guide (http://cactuscode.org/documentation/UsersGuide.pdf).

A. Thorn Configuration

1) Groups: Cactus variables are placed in variable groups with homogeneous attributes, where the attributes describe properties such as the data type, variable group type, rank, dimensions, and number of time levels. Many Cactus functions operate on groups of variables, for example storage allocation, synchronization between processors, and output functions. For example, a vector field containing individual variables for fluid flow in different directions would typically include all the vector components in a single variable group. By default, all variable groups are private; however, the public keyword can be used to change the access level for each subsequent variable group in the ccl file.

2) Functions: Cactus provides two types of functions, scheduled and aliased. Scheduled functions are declared in the schedule.ccl file and are defined to be called at certain stages in the Cactus simulation by prescribing a time bin, a specific time during a simulation, in which to run. Standard Cactus time bins are defined which are invoked in a well-defined order, and a list of the most important Cactus standard time bins is provided in Figure 3.

Additionally, thorn developers can define their own time bins or schedule groups. It is possible to specify the order in which two scheduled functions are called, as well as simple conditionals and loops. Memory allocation of Cactus variables can be restricted to only the time of execution of a certain function. Figure 4 shows a subset of the syntax which is used to define a scheduled function.

    SCHEDULE [GROUP]
    AT|IN [WHILE ] [IF ]
    [BEFORE|AFTER |( ...)]*
    {
      [STORAGE: ,...]
      [SYNC: ,...]
    } "Description of function or schedule group"

Fig. 4. Subset of the syntax for declaring scheduled functions or schedule groups of functions. A function can be scheduled at a certain time bin or in a schedule group. It can be called while or if a condition is fulfilled. Functions or schedule groups can be scheduled before or after other functions or schedule groups, within the same time bin or schedule group. Storage for Cactus variables might only be allocated for a certain function or schedule group, to save overall memory. Variables distributed over multiple processes can be automatically synchronized after a certain function or schedule group, if specified in the ccl file.

Aliased functions are functions that can be shared between thorns. They are declared in the interface.ccl file and may be called by a thorn at any point during the simulation. A thorn calling an aliased function does not need to know the programming language used for its implementation; the Cactus API takes care of any necessary conversions.

3) Variables: Grid variables are Cactus variables that are passed between thorns by the flesh, and are declared in the interface.ccl file. They are generally collected into variable groups of the same data type. There are three types of variable groups: grid functions, arrays, and scalars. Grid functions (GFs), the most common variable group type, are arrays with a specific size set by the parameter file, which are distributed across processors. All GFs must have the same array size, typically defining the shape and size of the computational domain. Arrays are a more general form of GFs in that each array group may have a distinct size, which can be given by Cactus parameters. Scalars are single variables of a given basic type, much like rank-zero arrays. Cactus variables can specify a number of time levels, meaning that a certain number of copies of the variable are kept for use in time-evolution schemes where data at a past time is needed to calculate the new data at a later time. Part of the syntax for declaring a group of variables is shown in Figure 5.
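The notion of time levels can be made concrete with a small Python sketch. It assumes, purely for illustration, that the copies are cycled once per time step so that the oldest buffer is reused for the new data; the class and method names are hypothetical and this is not how any particular Cactus driver manages storage.

```python
class TimeLevels:
    """Storage for one variable with a fixed number of time levels.

    Index 0 is the current level, index 1 the previous one, and so on.
    """

    def __init__(self, num_levels, size):
        self.levels = [[0.0] * size for _ in range(num_levels)]

    def rotate(self):
        """Advance one step: the oldest buffer is recycled as the new level 0."""
        self.levels.insert(0, self.levels.pop())

    def __getitem__(self, level):
        return self.levels[level]


u = TimeLevels(num_levels=3, size=2)
u[0][:] = [1.0, 2.0]    # data computed at step n
u.rotate()              # step n data is now the "previous" level
u[0][:] = [3.0, 4.0]    # data computed at step n+1
u.rotate()
```

After the two rotations, u[1] holds the step n+1 data and u[2] the step n data, while u[0] is a recycled buffer waiting to be overwritten by the next step.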

4) Parameters: Parameters are used to specify the run-time behavior of Cactus and are defined in the param.ccl file. They have a specific data type and scope, a range of allowed values, and a default value. Once parameters have been set, they cannot be modified unless specifically declared to be steerable, in which case they may be dynamically changed throughout the simulation. The allowed data types for parameters are Int, Real, Keyword, Boolean, and String. Thorns can use and extend parameters of other thorns. The syntax for declaring Cactus parameters is shown in Figure 6.
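The parameter properties listed above (an allowed range, a default within that range, and optional steerability) can be modelled with a short Python sketch. The class, the running flag and the parameter names dtfac and out_every are hypothetical and serve only to illustrate the rules; they are not part of the Cactus API.

```python
class Parameter:
    """A numeric parameter with an allowed range and a default value."""

    def __init__(self, name, low, high, default, steerable=False):
        self.name = name
        self.low = low
        self.high = high
        self.steerable = steerable
        self.running = False          # set to True once the run has started
        self.value = self._checked(default)

    def _checked(self, value):
        """Return `value` if it lies in the allowed range, else raise."""
        if not self.low <= value <= self.high:
            raise ValueError(
                f"{self.name}: {value} outside [{self.low}, {self.high}]"
            )
        return value

    def set(self, value):
        """Assign a value; after start-up only steerable parameters may change."""
        if self.running and not self.steerable:
            raise RuntimeError(f"{self.name} is not steerable")
        self.value = self._checked(value)


dtfac = Parameter("dtfac", 0.0, 1.0, default=0.5)
dtfac.set(0.25)                       # allowed: the run has not started yet
out_every = Parameter("out_every", 1, 100, default=10, steerable=True)
dtfac.running = out_every.running = True
out_every.set(20)                     # allowed: steerable during the run
```

Calling dtfac.set(...) after the run has started raises RuntimeError in this sketch, and any value outside the declared range raises ValueError regardless of steerability.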

    <data_type> [TYPE=] [SIZE=] [TIMELEVELS=]
    [{ [ [,] ] } [""] ]

Fig. 5. Part of the syntax for declaring Cactus variables. Cactus variables have to be one of the data types Cactus defines and are part of a variable group. They can have different Cactus variable types, sizes, and numbers of time levels. Each variable group needs to have a human-readable description.

    [EXTENDS|USES] ""
    {
      :: "Range description"
    }

Fig. 6. Syntax for declaring Cactus parameters. Thorns might use or extend parameters of other thorns, and define their own. A parameter needs to have a data type. A human-readable description needs to be given, as well as an allowed range with a description for the range and a default value within that range.

5) Include Files: Header files can be shared between thorns if specified in the interface.ccl file. It is not only possible to share a single include file, but also to concatenate multiple include files (also from multiple thorns), and use them like a single include file. During the build process, Cactus copies all of the source files located in each thorn's include directory to a central location from which they may be accessed by any other thorn using one of the two methods shown in Figure 7. USES INCLUDE requests an include file from another thorn, and INCLUDE adds the code in one file to a named include file.

    USES INCLUDE:
    INCLUDE[S]: IN

Fig. 7. Syntax for using include files in Cactus. Thorns might provide a specific header file to another thorn (the first example), or might provide one part of a concatenation of multiple header files, possibly from multiple thorns (the latter example).

B. Thorn Interaction

1) Scope: Cactus provides different levels of access for variables and parameters. Variables can be defined as public or private. Public variables can be inherited by a thorn when that thorn inherits an interface. Thorn inheritance will be described in greater detail below. Private variables can only be seen by the thorn which defines them.

Similarly, parameters may be defined as restricted or private. Restricted parameters are available to thorns which request access. Private parameters, like private variables, are only visible to the thorn which defines them. The access levels here only specify whether those parameters are directly accessible in the source code; it is possible to access information about any parameter through Cactus API functions regardless of the parameter scope defined in the param.ccl file.

2) Inheritance: Cactus provides an inheritance mechanism similar to Java's abstract classes. It allows thorns to gain access to variables provided elsewhere by inheriting from the interface. A key point here is that the thorns are not inheriting from other specific thorns; any number of thorns may declare themselves as implementing an interface. These thorns may all be compiled together, allowing the user to decide at run time which thorn should be used. The interface is only specified by the thorns implementing it. This means that thorns declaring the same interface name need to have an identical interface, which is checked by Cactus.

Cactus also provides capabilities, which may be declared in the configuration.ccl file. Capabilities differ slightly from interfaces in that while any number of thorns providing the same interface may be compiled together, only one thorn providing a capability may be compiled into a specific configuration. In this sense, while interfaces define run-time dependencies, capabilities define build-time dependencies. This can be useful for providing external libraries or functions which are too complex for aliasing. Capabilities also play a role in configuring thorns and external libraries, since they interact with the build system of Cactus.

Many design decisions are based on the distinction between interfaces and capabilities. For example, the concept of capabilities is important for application performance: knowing an inter-thorn relationship at build time allows optimizations to be included that are not possible at run time.

The syntax for declaring and requiring a capability is shown in Figure 8.

    PROVIDES
    {
      SCRIPT
      LANG
    }
    REQUIRES

Fig. 8. Part of the syntax for declaring and requiring capabilities in Cactus. Capabilities can be required and provided by thorns. If a thorn provides a capability, it interacts with the make system through the output of a script, which needs to be specified in the ccl file together with its programming language so that it can be called correctly.
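The difference between interfaces (many providers may be compiled together) and capabilities (one provider per configuration) can be expressed as a simple build-time check. The Python sketch below is illustrative only: the dictionary layout and the thorn names ExternalHDF5 and MyHDF5 are hypothetical, and the real check is performed by the Cactus build system from the CCL files.

```python
def check_build(thorns):
    """Validate a set of thorns selected for one build configuration.

    `thorns` maps a thorn name to a dict with an "implements" interface name
    and an optional list of "provides" capability names (hypothetical layout).
    Returns (thorns_by_interface, thorn_by_capability).
    """
    by_interface = {}
    by_capability = {}
    for name in sorted(thorns):
        info = thorns[name]
        # Any number of thorns may implement the same interface.
        by_interface.setdefault(info["implements"], []).append(name)
        for capability in info.get("provides", []):
            # Only one provider of a capability is allowed per configuration.
            if capability in by_capability:
                raise ValueError(
                    f"capability {capability!r} provided by both "
                    f"{by_capability[capability]!r} and {name!r}"
                )
            by_capability[capability] = name
    return by_interface, by_capability


build = {
    "PUGH": {"implements": "driver"},
    "Carpet": {"implements": "driver"},
    "ExternalHDF5": {"implements": "hdf5", "provides": ["HDF5"]},
}
interfaces, capabilities = check_build(build)
```

Adding a second thorn such as MyHDF5 that also provides the HDF5 capability makes check_build raise ValueError, whereas the two driver thorns coexist without complaint.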

The interface.ccl file also provides a low-level include mechanism, described in Section III-A5, similar to that found in C/C++. Thorns may request access to any include file within the Cactus source tree without specifying which thorn or interface should provide it. This is used primarily for optimization reasons, as the compiler can then inline functions, and in some cases for providing access to external libraries such as HDF5.

C. Testing

It is strongly recommended, although not required, that thorns come with one or more test suites. These consist of sample parameter files and the expected output for those parameters. These files should be located within the test directory in the thorn, so that the test suites may be run using gmake -testsuite. These test suites serve the dual purposes of regression and portability testing.

IV. EXAMPLES

In this section we show some examples of the dependencies among Cactus thorns which are generated by the CCL files for different applications: a simple example application for the scalar wave equation with a minimal set of thorns; a small community toolkit for quantum gravity; and a large community toolkit for numerical relativity. The interest in thorn dependencies arises for two core reasons:

1) Cactus is particularly targeted at enabling communities to generate shared toolkits for solving a variety of problems in a particular field. The standard computational toolkit which is distributed with Cactus is further used by many different applications. Thorn dependencies and interfaces thus need to be carefully thought out and periodically revisited to make sure that the plug-and-play aim of Cactus, where different thorns can provide the same functionality, is achieved with interfaces which are as simple, flexible and general as possible. This design usually involves a delicate balance, taking into account the speed of implementation, complexity of the interface, etc.

2) Long-time Cactus users work with standard thorn lists which are built up from experience and shared with collaborators. These thorn lists are amended as new thorns become available or are no longer used, and can contain several hundred thorns. For new users in particular, there is an increasing issue with providing a procedure for users to select the appropriate set of thorns for their application, and to understand the capabilities of different thorns. One big simplification which could be made would be to reduce the number of thorns in thorn lists by removing thorns which others depend on and which could therefore be added automatically. Ideally, a tool would be built which would allow a user to start from an abstract description of their problem and automatically select appropriate thorns, for example "Evolving Gaussian initial data using the 3-D scalar wave equation and outputting 3D data", or "Evolving two black holes using Einstein's equations and calculating gravitational waveforms". The question is then whether there is currently enough information in the CCL files to achieve this, or how additional information could be provided.

In this section, we use the dependencies among the sets of thorns described in the CCL files for these three example applications to view the complete set of thorn dependencies and to investigate how the thorn set could potentially be generated from an initial minimal set of thorns. The dependencies used for the figures are taken from a file generated during the Cactus build process which contains a complete database of the contents of the different thorn configuration files.

A Perl script is used to parse this database and generate a file in dot format, which can then be processed by a program like graphviz [12] and turned into a directed graph like that in Figure 9. This graph shows five different types of dependencies. Inheritance is denoted by a regular arrow, dependencies due to a required function are denoted by an arrow with a square head, direct thorn dependencies are denoted by a dotted arrow, shared variable dependencies are denoted by an arrow with a circular head, and dependencies due to a required capability are denoted by an arrow with a diamond head. There are also shaded and unshaded thorns, the distinction being that the shaded thorns have no other thorns depending on them.

This Perl script does not show the dependencies generated by a single thorn, so we also use a set of two Python scripts, the first of which parses the actual CCL files and generates an XML file containing all of the dependencies. This file can then be queried by the second script, which will search for a single thorn and find all thorns upon which the queried thorn depends. It will also output a graph in dot format, as seen in Figure 10. The second script will also allow users to choose between alternate implementations of the same interface (e.g. PUGH or carpet). The motivation here is that this script should allow the user to generate a complete thornlist that could then be used to build a simulation.

A. Simple Example: Scalar Waves

The set of Cactus thorns to solve the 3-D scalar wave equation (WaveToy Demo) was developed as a pedagogical example for understanding Cactus, and as a simple and well understood test case for new developments. These thorns solve the hyperbolic wave equation in 3D Cartesian coordinates with different boundary conditions for a chosen set of initial data and include different output formats and a web interface. This example is described on the Cactus web pages [16], which also provide a thorn list with information about the 22 thorns that are used. The example application includes two initial data thorns which specify the initial scalar field and sources (idscalarwavec and wavebinarysource), a scalar field evolver (wavetoyc) along with additional thorns from the standard Cactus Computational Toolkit. The example uses the unigrid driver pugh with associated thorns pughslab for hyperslabbing and pughreduce which provides a set of standard reduction operations that can calculate for example the maximum value or L2 norm over the grid for any grid variable.

A complete set of dependencies between these thorns as specified in the CCL files is shown in Figure 9. In this diagram we can see for example the central nature of the ioutil thorn which provides functionality that can be used by thorns implementing different I/O methods, for example providing a parameter which sets when data for all I/O methods should be output and the directory in which to write data.

[Figure 9. Thorns shown: WaveToyC, Boundary, CartGrid3D, IDScalarWaveC, IsoSurfacer, WaveBinarySource, CoordBase, HTTPDExtra, HTTPD, IOAscii, IOBasic, IOJpeg, IOUtil, jpeg6b, LocalInterp, LocalReduce, PUGHReduce, PUGH, PUGHSlab, Socket, SymBase, Time. Legend: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]

Fig. 9. Dependency graph for the complete set of thorns in the simple example application WaveToy Demo. The shaded items indicate that the thorns are 'leaves' and have no thorns depending on them.
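The notion of a 'leaf' thorn used in Figure 9 can be illustrated with a short sketch. The Python below is a minimal example under stated assumptions, not the Perl or Python scripts used to produce the figures: the `DEPENDENCIES` table is a small hypothetical excerpt written by hand rather than read from CCL files, and `leaf_thorns` is an illustrative name. It collects every thorn that some other thorn depends on and reports the remainder, which corresponds to the shaded thorns.

```python
# Hypothetical excerpt of a dependency table (NOT read from real CCL files):
# each thorn maps to the set of thorns it depends on.
DEPENDENCIES = {
    "IOBasic": {"IOUtil"},
    "IOAscii": {"IOUtil", "PUGHSlab"},
    "PUGHSlab": {"PUGH"},
    "PUGHReduce": {"PUGH"},
    "IOUtil": set(),
    "PUGH": set(),
}


def leaf_thorns(deps):
    """Return the thorns that no other thorn depends on."""
    all_thorns = set(deps)
    depended_on = set()
    for targets in deps.values():
        all_thorns |= targets   # include thorns that appear only as targets
        depended_on |= targets
    return all_thorns - depended_on


leaves = leaf_thorns(DEPENDENCIES)
print(sorted(leaves))  # -> ['IOAscii', 'IOBasic', 'PUGHReduce']
```

If the dependency graph has no cycles, every thorn can be reached by following dependencies from at least one leaf, which is why the leaves are the natural candidates for a starting set. A graph containing cycles would need extra handling that this sketch does not attempt.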

[Figure 10. Thorns shown: Boundary, SymBase, PUGH, WaveToyC, CartGrid3D, IDScalarWaveC, CoordBase.]

Fig. 10. Dependency graph for the WaveToy Demo thornlist. This graph is generated using dependencies of thorn IDScalarWaveC which defines initial data for the fields evolved by the scalar wave equation.
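A single-thorn query of the kind behind Figure 10 can be sketched as a graph traversal. The Python below is a minimal illustration under stated assumptions, not the scripts described in this paper: the `DEPENDENCIES` table, the edge kinds assigned in it, and the function names are all hypothetical, and the table is written by hand instead of being parsed from CCL or XML files. The edge attributes follow the drawing conventions described for the figures (regular arrow, square head, dotted line, circular head, diamond head) using standard Graphviz attribute values.

```python
from collections import deque

# Hypothetical dependency table (edges and kinds are invented for illustration):
# thorn -> list of (thorn it depends on, kind of dependency).
DEPENDENCIES = {
    "IDScalarWaveC": [("WaveToyC", "inheritance"), ("CartGrid3D", "inheritance")],
    "WaveToyC": [("Boundary", "function"), ("CartGrid3D", "inheritance")],
    "Boundary": [("SymBase", "function")],
    "CartGrid3D": [("CoordBase", "inheritance"), ("PUGH", "direct")],
    "IOBasic": [("IOUtil", "inheritance")],
}

# Graphviz edge attributes for the five dependency types.
EDGE_STYLE = {
    "inheritance": "arrowhead=normal",
    "function": "arrowhead=box",
    "direct": "style=dotted",
    "variable": "arrowhead=dot",
    "capability": "arrowhead=diamond",
}


def required_thorns(start, deps):
    """Breadth-first search for every thorn that `start` depends on."""
    seen = {start}
    queue = deque([start])
    while queue:
        thorn = queue.popleft()
        for target, _kind in deps.get(thorn, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def to_dot(thorns, deps):
    """Render the dependencies among `thorns` as a graph in dot format."""
    lines = ["digraph thorns {"]
    for thorn in sorted(thorns):
        for target, kind in deps.get(thorn, []):
            if target in thorns:
                lines.append(f'  "{thorn}" -> "{target}" [{EDGE_STYLE[kind]}];')
    lines.append("}")
    return "\n".join(lines)


needed = required_thorns("IDScalarWaveC", DEPENDENCIES)
print(sorted(needed))
print(to_dot(needed, DEPENDENCIES))
```

In this toy table the I/O thorns are never reached from `IDScalarWaveC`, because nothing the initial data thorn depends on requires them. A traversal over dependencies alone can only find what is required, not what a user would additionally want.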

The dependency diagram also shows that any method to automatically generate this set of thorns using dependency information would need 11 thorns specified as a starting point; these are the shaded thorns in the diagram. For example, if we simply started from the thorn that specifies the initial scalar field (idscalarwavec) as shown in Figure 10, which could be the obvious starting point for a user who knows they want to evolve a particular scalar field, then working only with dependencies would result in a set of thorns without any coordinate time (time), any I/O, or the possibility to include scalar source terms.

Adding additional metadata to thorns is one mechanism to supplement the current CCL information to enable the generation of thorn lists for a particular application. For example, explicitly tagging thorns as providing I/O methods would allow these thorns to be automatically added or to be selected by a user. In other cases, these diagrams show that additional interfaces or dependencies may need to be added. In Figure 10 attention needs to be given to the compile time dependencies that would include thorns time (which should in fact be inherited by the evolution thorn) and PUGHReduce and localreduce.

B. Small Community Code: The CausalSets Toolkit

The CausalSets Toolkit is an example of a small community codebase, which implements a wide variety of computations in discrete quantum gravity, in particular with regard to Causal Set Theory [13]. The toolkit is based upon two major components. One is a MonteCarlo arrangement, which provides a generic API for providing parallel random numbers, i.e. pseudo random numbers which are independent on all processes. A second is a CausetBase API, provided by the BinaryCauset thorn, which abstracts the mathematical notion of a causal set (a locally finite partially ordered set [13]), providing myriad routines for working with such objects.

One of the challenges in supporting computations in Causal Set Theory is that there is not a single sort of computation, such as finding approximate solutions to PDEs by finite difference or spectral methods, which one would like to perform. Instead a physicist will ask many different sorts of questions about the behavior of discrete partial orders. A given computation will share aspects with others, but the overall structure may differ considerably. Furthermore the community is in general not terribly experienced with large scale computation, and thus benefits from software which insulates the physicist from many complications of parallel computing. The component based approach provided by the Cactus Framework is well suited to address both of these challenges, by allowing the physicist to mix and match individual components to build up the particular computation desired, working with familiar abstract mathematical concepts, rather than having to work directly with source code. Additionally the components are designed to run readily on large scale hybrid architectures, without the user needing detailed knowledge of how the computation is implemented.

The dependency diagram for a collection of thorns which implements a sample computation is shown in Figure 11. This is a computation of spatial homology of a sprinkled causal set, as described in [4]. Here the BinaryCauset thorn implements the core CausetBase API, which provides the causal set along with a high level abstract interface to it. The MonteCarlo thorn provides parallel random numbers to CFlatSprinkle, which generates a random causal set, and RandomAntichain, which selects a random antichain within the causal set provided by CFlatSprinkle. The MonteCarlo arrangement gets the actual pseudorandom numbers from thorn RNGs, and also provides a thorn Distributions to provide samples from a variety of distributions, such as Poisson and Gaussian. AntichainEvol provides a sequence of 'thickened antichains', which are then read by the Nerve thorn, which computes a nerve simplicial complex from each thickened antichain. The homology groups of these simplicial complexes are then computed by a separate standalone homology package chomp [2]. The whole computation relies on PUGH as a standard Cactus driver, and uses Cactus' IOUtil to provide metadata for IO routines.
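To make the terms used above concrete, the following Python sketch builds a small random partial order and picks an antichain (a set of mutually unrelated elements) from it. It is only an illustration under simplifying assumptions and does not use the toolkit's CausetBase API: points are drawn uniformly in a unit square of 1+1 dimensional flat spacetime with a fixed count (a true sprinkling draws a Poisson-distributed number of points), Python's `random` module stands in for the parallel random numbers of the MonteCarlo arrangement, the greedy selection is just one simple way to obtain a random antichain, and all function names are hypothetical.

```python
import random


def sprinkle_flat(n, seed=0):
    """Draw n points (t, x) uniformly in the unit square of 1+1 flat spacetime."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def precedes(p, q):
    """True if q lies strictly inside the future light cone of p (units with c = 1)."""
    return (q[0] - p[0]) > abs(q[1] - p[1])


def related(p, q):
    """True if p and q are causally related in either direction."""
    return precedes(p, q) or precedes(q, p)


def random_antichain(points, seed=0):
    """Greedily pick mutually unrelated points, visiting them in random order."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    chosen = []
    for i in order:
        # Keep point i only if it is unrelated to everything chosen so far.
        if not any(related(points[i], points[j]) for j in chosen):
            chosen.append(i)
    return sorted(chosen)


points = sprinkle_flat(50, seed=1)
antichain = random_antichain(points, seed=2)
print(len(antichain), "of", len(points), "points form the antichain")
```

Because every rejected point is related to some point already chosen, the result cannot be extended by any further point, so the greedy pass yields an inextendible antichain.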

[Figure 11. Thorns shown: Nerve, AntichainEvol, BinaryCauset, CFlatSprinkle, RandomAntichain, Distributions, IOUtil, MonteCarlo, PUGH, RNGs. Legend: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]

Fig. 11. Dependency graph for a sample computation in Causal Set Quantum Gravity. The computation is described in detail in [4].

C. Large Community Code: The Einstein Toolkit

The Einstein Toolkit [17] is an open, community developed software infrastructure for relativistic astrophysics. The Einstein Toolkit is a collection of software components and tools for simulating and analyzing general relativistic astrophysical systems that builds on numerous software efforts in the numerical relativity community. The Cactus Framework is used as the underlying computational infrastructure providing large-scale parallelization, general computational components, and a model for collaborative, portable code development. The toolkit includes modules to build complete codes for simulating black hole spacetimes as well as systems governed by relativistic hydrodynamics. Current development in the consortium is targeted at providing additional infrastructure for general relativistic magnetohydrodynamics.

The Einstein Toolkit uses a distributed software model and its different modules are developed, distributed, and supported either by the core team of Einstein Toolkit Maintainers, or by individual groups. When modules are provided by external groups, the Einstein Toolkit Maintainers provide quality control for modules for inclusion in the toolkit and help coordinate support.

With such a large set of components and a distributed team of developers, implementing appropriate standards is crucial to maintain coherence across the code base, and to enable future development. This is achieved in some part by defining base thorns that act to define application specific standards, providing default variables, parameters, functions and schedule bins that are common across an application. For example, in the Einstein Toolkit application specific base thorns include ADMBase (for the vacuum spacetimes), HydroBase (for matter spacetimes) and EOSBase (for equations of state) [6].

Figure 12 shows the complete dependency graph for the Einstein Toolkit, which is so extensive that it isn't possible to examine in detail in print; however, we include the graph here to illustrate its complexity. Of the 135 thorns, 9 have no dependency on other thorns, and 78 thorns (including these independent thorns) are needed as the starting point to generate the whole toolkit using CCL dependency information. The clusters of dependencies for ADMBase, HydroBase and EOSBase are apparent in the diagram.

The Einstein Toolkit dependency diagram also shows a number of direct thorn dependencies, indicated by the black dotted lines. This means that thorns depend not on an interface but on a specific thorn. In some cases this is due to missing general interfaces such as appropriate aliased functions which either need to be carefully designed or perhaps have simply not been added where they should have been. A large number of these direct dependencies are associated with the Carpet adaptive mesh refinement set of thorns where the nature of the driver thorn typically enforces a direct dependency, for example for associated I/O or reduction operations. The need to support direct dependency on thorns was one reason why the configuration.ccl file was introduced as an extension to the original CCL.

Figure 13 shows an example of the direct dependencies for an initial data thorn in the Einstein Toolkit. The thorn IDAnalyticBH provides initial data for several different black hole spacetimes with analytic solutions. Starting from this thorn, only seven other thorns are picked up directly with dependency information. Given that most production runs for numerical relativity simulations include on the order of 100 thorns, it is clear that automatically generating appropriate thorn lists will require additional metadata and physics insight.

V. FUTURE WORK

The original Cactus Configuration Language was released as part of the Cactus 4.0b distribution in 1999 and has since that time been extended in different ways as new features were required. Despite serving the Cactus user community well since this time, it is clearly time to reexamine the requirements for the CCL in the light of current and future needs and to take into account new technologies and possibilities. In this section we describe new features required in the CCL and their motivation.

[Figure 12. Legend: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]

Fig. 12. Complete dependency graph for the Einstein Toolkit.

Note that if viewing this paper as a PDF document it is possible to zoom in to see features in detail.

[Figure 13. Thorns shown: CartGrid3D, Carpet, IDAnalyticBH, ADMBase, StaticConformal, IOUtil, InitBase, CoordBase.]

Fig. 13. Dependency graph for the Einstein Toolkit starting from the IDAnalyticBH thorn. For this graph the thorn Carpet was chosen to provide the driver interface; however, PUGH could have been used instead.

Cactus (and the set of thorns in the Cactus Computational Toolkit) currently best supports finite difference, finite volume, or finite element methods implemented on structured grids. Extensions to the CCL are required to support meshless methods (e.g. particle methods such as smoothed particle hydrodynamics or particle-in-cell, used for example in many astrophysics codes) and unstructured meshes where additional connectivity information is required to specify how grid points are connected (e.g. unstructured grids are important for example in coastal modeling to resolve the fine details of the coastline). Implementing both these features in Cactus requires developing appropriate parallel driver and associated infrastructure thorns in addition to changes to the CCL.

Cactus currently operates with a single computational grid so that all physical models need to run on a single domain. Comprehensive multiphysics support is needed where different physical models can be configured and run on different domains, for example for coupling together wind and current models in coastal science, or modeling different physical components of a relativistic star.

Constants (e.g. π or the solar mass) are commonly used in scientific codes. Currently in Cactus constants are handled via include files, for example the Einstein Toolkit contains a thorn which provides commonly used astrophysical constants in an include file. These constants are then only available in source code and not in CCL files.
A preferable approach would be to define such constants directly as part of the CCL specification.

Similar to constants, the CCL needs to support enumerations and user-defined structures, so that e.g. a hydrodynamical state vector consisting of density, velocity, and temperature can be handled as a combined entity instead of as a set of five separate variables. This should include the ability to handle vectors and tensors in a natural manner, a feature that is missing in many computer languages, but which is nevertheless important in physics simulations. Tensor support would need to include support for symmetries (so that e.g. only 6 out of 9 components of the stress tensor are stored). In implementing this, it is important that the abstract specification of data types is decoupled from the decision of how to lay them out in memory, which needs to be left to the driver to ensure the highest possible performance on modern architectures that may offer vectorization and deep cache hierarchies.

While Cactus, through the CCL, contains information on how thorns fit together computationally, the CCL does not contain information on the scientific content of the thorns. This issue needs some attention as the number of thorns in particular domains grows and models become more complex. Options to handle this could include extending the CCL, adding descriptive metadata separate from the CCL, or investigating whether enough information can be provided from the CCL and base thorns for a particular application. Such additional information is important, for example, to be able to automatically construct appropriate thornlists for a particular physical model.

A further issue related to the growth in both the number of thorns and the complexity of applications is constructing and editing CCL files.
CCL files for some thorns are now very long and complex and difficult to read and comprehend. This issue could be addressed by restructuring the CCL itself or by providing intuitive and flexible higher level tools for interpreting, checking and editing files.

A final consideration is the syntax for the CCL. Changing the CCL syntax could improve the ease with which the files could be constructed and edited, and importantly provide more options for standard tools which could be used to construct, investigate, debug and edit the CCL files. As an example, using a standardized syntax for CCL would allow users to take advantage of the extensive features of the Eclipse Platform [5]. Eclipse is an advanced Integrated Development Environment (IDE) that includes features such as customizable syntax highlighting, auto-completion of code, and dynamic syntax checking for languages it recognizes. One option for revising the CCL syntax would be to use an existing data markup language that incorporates metadata such as the Resource Description Framework (RDF) [14]. RDF is a widely used standard for describing data in internet tools. It uses URIs to describe the relationship between two objects as well as the two ends of the link, which is commonly known as a triple. This would be a natural method for describing the dependencies between thorns; however, RDF is generally used as an extension of XML, which is not easily readable by humans. As the CCL files must be generated by hand, it would be preferable to use an alternate format that focuses on readability. One such example is YAML (YAML Ain't Markup Language) [11], a data serialization language with a strong emphasis on human readability. YAML represents data as a series of sequences and mappings, both of which can be nested within others. While YAML does not inherently support metadata, it would be quite simple to add metadata to the thorns by adding extra mappings to the CCL files.

Footnote (Integrated Development Environment): http://en.wikipedia.org/wiki/Integrated_development_environment

VI. CONCLUSION

We have presented an overview of the Cactus Configuration Language (CCL) that describes Cactus thorns and have shown how the CCL is used in three different applications. The dependency information included in the CCL specification can be used to identify potential issues in designing complex codebases, and to build high-level tools to better assist users in constructing codes for particular applications.

New features needed in the CCL specification have been identified, including support for more numerical methods, multiple physical models, user-defined structures, and scientific metadata, as well as ways to address the growing complexity of interfaces.

ACKNOWLEDGMENT

The development of Cactus and the CCL has been a long term and ongoing effort with many contributors and funders. In particular we acknowledge the contributions of Gerd Lanfermann, Joan Massó, Thomas Radke, and John Shalf, and funding from the National Science Foundation, Max-Planck-Gesellschaft, and Louisiana State University. We also acknowledge colleagues in the Einstein Toolkit Consortium whose thorns provide the motivation and core use case for this work.

Work on thorn dependencies was funded by NSF

REFERENCES

[2] Chomp, http://chomp.rutgers.edu.

[3] T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf, The Cactus framework and toolkit: Design and applications, Vector and Parallel Processing – VECPAR'2002, 5th International Conference, Lecture Notes in Computer Science (Berlin), Springer, 2003.

[4] Seth Major, David Rideout, and Sumati Surya, Stable homology as an indicator of manifoldlikeness in causal set theory, Class. Quant. Grav.

[6] Multi-physics coupling of Einstein and hydrodynamics evolution: a case study of the Einstein Toolkit, CBHPC '08: Proceedings of the 2008 compFrame/HPC-GECO workshop on Component based high performance (New York, NY, USA), ACM, 2008, pp. 1–9.

[7] Erik Schnetter, Peter Diener, Nils Dorband, and Manuel Tiglio, A multi-block infrastructure for three-dimensional time-dependent numerical relativity, Class. Quantum Grav. (2006), S553–S578, eprint gr-qc/0602104, URL http://stacks.iop.org/CQG/23/S553.

[8] Erik Schnetter, Scott H. Hawley, and Ian Hawke, Evolutions in 3D numerical relativity using fixed mesh refinement, Class. Quantum Grav. (2004), no. 6, 1465–1488, eprint gr-qc/0310042.

[9] Eric L. Seidel, Gabrielle Allen, Steven Brandt, Frank Löffler, and Erik Schnetter, Simplifying complex software assembly: the component retrieval language and implementation

Causal sets: Discrete gravity