A Simple Way to Distribute Mathematica Evaluations

Abstract

We present a simple package for distributing evaluations of a Mathematica function for many arguments on a cluster of computers. After setting up the hosts, the only change is to replace Map[f,points] by MapCore[f,points].


MPP–2009–019, arXiv:yymm.nnnn [hep-ph]


M. Bruhnke (a), T. Hahn (b)
(a) Universität Würzburg, Am Hubland, D–97074 Würzburg, Germany
(b) Max-Planck-Institut für Physik, Föhringer Ring 6, D–80805 Munich, Germany
February 11, 2009

With the fairly recent arrival of low-cost multi-core CPUs, institutes often have significant computing power at their disposal. Mathematica 7, whose main motto is parallel computing, makes it relatively simple to send a calculation to the fellow cores on the same machine, though it is still not exactly straightforward to distribute a calculation over a larger cluster. The package presented in the following fills this gap. After a one-time setup of the cluster, it makes it easy to distribute calculations to as many hosts as there are Mathematica licenses available (both ordinary licenses and Mathematica 7's sublicenses).

We certainly do not propose to parallelize 'atomic' Mathematica operations like Simplify, which is a daunting task even at the conceptual level. Rather, we focus on lengthy evaluations of one function over many arguments, for example the evaluation of a cross-section for many points in phase and/or parameter space. Incidentally, our package is not restricted to numerical evaluations but can handle any kind of Mathematica expression.

Many physicists would argue that numerical evaluations of a certain volume, at least, should be done in a compiled language for performance reasons. This is at best partially true: Mathematica has a formidable arsenal of functions, e.g. for numerical analysis, which are not easily available elsewhere, and the choice of algorithm influences the computation time much more than the speed of a single evaluation. Furthermore, in conjunction with MathLink, e.g. through FormCalc's Mathematica interface [1], the execution speed is essentially that of a compiled language, with Mathematica merely 'governing' the calculation.

The package presented in this paper is remarkably short and contains one main function, MapCore, which substitutes Map in serial calculations. Sect. 2.1 describes usage of the package, Sect. 3 provides a function reference, and Sect. 5 describes installation and system setup.

The MultiCore package is loaded with

<< MultiCore`

The next step is to add cores∗ on which evaluations can be distributed. This can be done directly with, e.g.,

AddCore["pc123.mppmu.mpg.de"]

or, if login under a different user name is required,

AddCore["[email protected]"]

This explicit method becomes cumbersome, however, if many cores with varying loads are involved. The alternative invocation

AddCore[10]

takes up to ten of the currently 'free' cores. This information is supplied by the findcores shell script (part of the MultiCore package), which in turn reads the admissible cores from a .submitrc file and invokes ruptime to determine the load. The .submitrc file has the simple syntax

pc380 4
pc381 4
pc339b 2
pc472

i.e. a host name per line, followed by its number of cores. Ideally, each host runs an rwhod daemon, since then its load will be reported through ruptime and findcores will use only the free cores.

∗ A note on nomenclature: we refer to a 'core' as the fundamental computation unit, i.e. a processor able to run a single thread. A physical CPU may have several cores, and similarly a host may have several CPUs.

In the case of a Linux cluster, the .submitrc file can be generated (more or less) automatically with the help of the setupcores script, as in

./setupcores > $HOME/.submitrc

This script assumes that the hosts are listed via ruptime, that a password-free login via ssh is possible, and that each host is running a flavour of Linux where /proc/cpuinfo can meaningfully be read out. The file generated in this way constitutes a 'raw' version and should be reviewed by hand.

Each core launched requires a Mathematica license, i.e. a Kernel license. From Mathematica 7 on, each (main) license includes four sublicenses, and it is possible to use these sublicenses for parallelization (cf. Sect. 3.11, $SublicenseFactor).

One can further take care not to invoke more slave processes than there are licenses available. To this end, AddCore is invoked with an integer n ≤ 0, meaning that it should spawn at most so many slaves that |n| (main) licenses are left for other users. One can also provide a second integer argument m ≤ 0 to leave |m| sublicenses unused. This mode really makes sense only for network licenses. For non-network licenses, AddCore silently assumes that the other machines listed in .submitrc have similar licenses.

MultiCore generally works in a master–slave model, requiring one license (but hardly any CPU time) for the master and one main license or sublicense for each slave. We assume that all cores in the cluster run the same Mathematica version, in particular that the master's version number is the same as all slaves'. Specifically, we assume that subkernels on slave cores can be launched if and only if the master is running Mathematica 7.

Quitting the master's Mathematica Kernel automatically closes all links, so explicitly 'removing' registered cores is usually not necessary unless one wants to free Mathematica licenses. Each slave session is characterized by an identifier of the form host[id], where host is the host name and id an integer link id. The syntax for RemoveCore is

RemoveCore[host]
RemoveCore[host[id]]

where both host and id may be a pattern. Thus, RemoveCore["pc123"] closes all slaves on host pc123, and RemoveCore[_] closes all current slave sessions.

Once the cores are registered, the only necessary substitution is to replace Map (/@) by MapCore to make multiple evaluations execute in parallel.
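The core selection behind AddCore[n] can be sketched in a few lines. The Python model below is not the findcores script itself, whose internals are not described here. It assumes that a .submitrc line holds a host name, optionally followed by its number of cores (a missing count is taken as 1). It also assumes that a host's ruptime load, rounded up, counts as that many busy cores, and that hosts without load information are idle, which is how MultiCore treats hosts without an rwhod daemon. The license rule for AddCore[n] with n ≤ 0 is modelled as "free main licenses minus |n|", with the number of free licenses simply passed in as a number. All function names are hypothetical.

```python
import math


def parse_submitrc(text):
    """Parse .submitrc-style text into (host, ncores) pairs in file order.

    Assumption: each line holds a host name, optionally followed by its
    number of cores; a missing count is taken as 1 here.  Blank lines are
    skipped.  Comment lines are not handled in this sketch.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            ncores = int(fields[1]) if len(fields) > 1 else 1
            hosts.append((fields[0], ncores))
    return hosts


def find_free_cores(hosts, loads, wanted):
    """Return up to `wanted` host names, one entry per idle core.

    `loads` maps host -> load average as ruptime would report it.  Hosts
    without an entry count as idle (load 0).  Counting the load rounded up
    as busy cores is an assumption of this sketch.
    """
    picked = []
    for host, ncores in hosts:  # file order: fastest hosts first
        idle = max(0, ncores - math.ceil(loads.get(host, 0.0)))
        take = min(idle, wanted - len(picked))
        picked.extend([host] * take)
    return picked


def main_license_budget(free_main, n):
    """Slaves AddCore[n] (n <= 0) may start so that |n| main licenses stay free."""
    if n > 0:
        raise ValueError("this sketch only covers n <= 0")
    return max(0, free_main - abs(n))


submitrc = "pc380 4\npc381 4\npc339b 2\npc472\n"
hosts = parse_submitrc(submitrc)
loads = {"pc380": 3.2, "pc381": 0.1}  # no rwhod data for pc339b, pc472
budget = main_license_budget(free_main=5, n=-2)
print(budget, find_free_cores(hosts, loads, budget))
# prints: 3 ['pc381', 'pc381', 'pc381']
```

With five free main licenses and n = -2, the sketch starts three slaves, all on pc381, because pc380 counts as fully loaded. The order of the .submitrc lines is what encodes the "fastest host first" convention.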

Important:

The only slightly non-straightforward aspect is the remote definition of the function being evaluated. MapCore sends the definition of this function to the slave much as the Save function would save it in a file. This fails (for both MapCore and Save) if the function depends on a LinkObject in the master's session, i.e. if the function is or invokes a MathLink function. Even if the slave session has the same MathLink executable installed, it will in general not communicate via the syntactically same LinkObject.

To work around such cases, the AddCore function has an optional second argument, which is sent to the slave as an initialization command when the link is opened. In our opinion, the best procedure in the MathLink case mentioned above is not to install the MathLink executable in the master's session at all, so that no explicit LinkObject pointing to the master's installed MathLink executables is sent, and instead to include the Install statement in the AddCore invocation, as in

AddCore[0, Install["LoopTools"]]

Also, if the function has a very lengthy definition, one might want to place it in a file and load that via the initialization command, e.g.

AddCore[0, << myfunction.m]

Of course, one would have to copy this file to each slave first if the slaves do not have access to the master's filesystem. Note, however, that the slaves' working directory is the user's home directory, not the current working directory on the master. In other words, the file to be loaded must include a path unless it resides in the home directory anyway.

MapCore tries to have the given points calculated as quickly as possible. It therefore distributes more (fewer) than patchsize points to faster (slower) cores, based on its internal timing statistics. Once all points of the list are distributed, MapCore redistributes the unfinished points until the results for all points are available. It also automatically decreases the patchsize according to the remaining list size. Although, due to this competition, N − … MapCore returns, the time until all slaves are ready again is negligible. The identifier $CallID helps MapCore to distinguish between new and old data of multiply distributed points.
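The text describes MapCore's load balancing only qualitatively. One simple way to realize "more points to faster cores" is to scale patchsize by a core's measured throughput relative to the mean, and to cap patches as the list runs out. The Python sketch below is an illustrative heuristic under these assumptions, not MultiCore's actual formula. `rates` is a hypothetical mapping from core to measured points per second, with None for cores that have not reported any timings yet.

```python
import math


def patch_size(core, rates, patchsize, remaining, ncores):
    """Points to send to `core` next (illustrative heuristic, not MultiCore's).

    rates: core -> measured points per second (positive), or None if the
    core has not returned any timed results yet.
    """
    known = [r for r in rates.values() if r]
    if not rates.get(core):
        size = patchsize  # no statistics yet: send exactly patchsize
    else:
        mean = sum(known) / len(known)
        size = round(patchsize * rates[core] / mean)  # faster core, bigger patch
    # Near the end of the list, shrink patches so the rest spreads over all cores.
    cap = math.ceil(remaining / ncores)
    return max(1, min(size, cap))


rates = {"a": 2.0, "b": 1.0, "c": None}
for core in rates:
    print(core, patch_size(core, rates, patchsize=5, remaining=1000, ncores=3))
# prints a 7, b 3, c 5 (one per line); with remaining=10 the patches
# for a and c shrink to ceil(10/3) = 4
```

Redistribution of unfinished points and the timing measurement itself are left out. The sketch only shows how per-core statistics could turn into patch sizes around the default of 5.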

Especially during long parallelized calculations of many CPU-time-expensive points, link error handling plays an important role. If the link to one host, i.e. one or more cores, is lost, MapCore redistributes the as yet uncalculated points to the remaining hosts, executes the equivalent RemoveCore call, and prints a warning message. After MapCore has returned, one might want to add the lost host again by re-invoking AddCore.

3 MultiCore Package Function Reference

AddCore adds (registers) cores, i.e. opens links to remote machines for subsequent distributed evaluation with MapCore. It is invoked in one of the following ways:

• AddCore[hostname] adds one core on hostname using a main license.
• AddCore[hostname, "subkernel"] adds one core on hostname using a sublicense.
• AddCore[n] (n > 0, integer) adds up to n cores using the findcores script (described below), with a ratio of $SublicenseFactor : 1 of sublicenses to main licenses (cf. Sect. 3.11).
• AddCore[n] (n ≤ 0, integer) adds as many cores as there are main licenses, using findcores, but leaves at least |n| main licenses for other users.
• AddCore[n, m] (n, m integer) is the same as above, with n for main licenses and m for sublicenses.

The last two invocations really make sense only for network licenses. For non-network licenses, it is silently assumed that the information taken from $LicenseProcesses and $MaxLicenseProcesses (in the master's session) also holds for the remote cores.

Each link corresponds to one core on a remote machine. It is hence permissible to add the same host more than once, to account for its number of cores. Apart from the hostname, the links are identified by a unique integer link id. This id is also sent to each slave process as $CoreID and can be used, e.g., to construct unique filenames. Core additions are cumulative. Links are released either through explicit removal with RemoveCore or by quitting the master's Mathematica Kernel.

The findcores script is part of the MultiCore package. It needs a .submitrc file in which the admissible cores for distributed computing are listed. Each line has the syntax

hostname [ncores]

where the optional ncores gives the number of cores on that host. Comment lines starting with a … are allowed. Cores are processed in sequential order, i.e. the fastest machine should appear at the top of this list. The .submitrc file is searched for in the following order:

• ./.submitrc,
• $HOME/.submitrc,
• (MultiCore installation directory)/submitrc,
• /usr/local/share/submitrc.

findcores invokes ruptime to determine the load on a remote machine. This works only if the remote machine is running an rwhod daemon. If not, the load is assumed to be zero, i.e. all cores are taken.

RemoveCore removes (unregisters) cores from the internal list, shuts down the corresponding remote kernels and closes the links. Each core is identified by two quantities, the hostname and the link id. Calling

RemoveCore is usually not necessary, as quitting the master's Mathematica Kernel automatically closes all links.

• RemoveCore[hostname[id]] removes all cores matching hostname and id, where either may contain a pattern. For example, RemoveCore[_] removes all links, and RemoveCore["pc456"[_]] removes all links to pc456.
• RemoveCore[hostname] is equivalent to RemoveCore[hostname[_]].

ListCore lists the currently registered cores.

• ListCore[hostname[id]] lists all cores matching hostname and id, where either may contain a pattern. ListCore[_] thus lists all cores.
• ListCore[hostname] is equivalent to ListCore[hostname[_]].

MapCore is the main function of the MultiCore package. It substitutes

Map in serial calculations.

• MapCore[f, points, patchsize] distributes the computation of f for all items in points to the cores previously registered with AddCore.

The integer argument patchsize is optional (default value: 5) and tells MapCore how many points on average should be sent to each core. As every set of results returned by a slave contains timing information, the master distributes points according to the slaves' performance. Until the master has gathered enough statistics about the slaves' timings, it sends exactly patchsize points to each core.

The larger the computation time for a single point, the smaller patchsize should be chosen. A smaller value may also be profitable if the participating cores differ significantly in speed. A patchsize of 1 achieves the best load-levelling but incurs the highest communication overhead. We have generally found the communication overhead to be negligible if the computation time for one patch is several seconds or more (see also the performance tests in Section 4).

3.5 RemoteMath

RemoteMath encodes the invocation of a remote Mathematica Kernel. It receives one argument and one flag: the hostname and the type of license to be used when launching the kernel. If required, one can define different invocation strings for different hosts.

• RemoteMath[host, opt] := remotestring defines remotestring as the command for invoking a remote Mathematica Kernel on host. Options for the remote kernel are given in opt, which is presently restricted to -subkernel for launching a subkernel.

The default command is

ssh (host) 'exec /bin/sh -lc "test `uname -s` = Darwin && nice -19 MathKernel (opt) -mathlink || nice -19 math (opt) -mathlink"'

This is an ssh command which starts a remote login shell that executes, with nice 19, MathKernel on MacOS and math on other systems. Starting a login shell is important, as it sources the shell's initialization files, which may modify the PATH.

If the Mathematica Kernel executable cannot be started using this command because it is not on the PATH, we recommend adding the appropriate directories to the PATH on the remote system rather than modifying the RemoteMath definition.
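To make the quoting layers of the default command explicit, the Python sketch below assembles the same command as an argument list of the kind Python's subprocess module accepts (the function name is hypothetical). ssh receives the host and one remote command string, which the remote side runs through /bin/sh -lc. The sketch only builds the list and starts nothing.

```python
import shlex


def remote_math_argv(host, subkernel=False):
    """Argument list mirroring the default RemoteMath command (sketch only)."""
    opt = " -subkernel" if subkernel else ""
    # Inner command: MathKernel on MacOS (uname -s prints Darwin),
    # math elsewhere, both started with nice level 19.
    inner = (
        "test `uname -s` = Darwin"
        f" && nice -19 MathKernel{opt} -mathlink"
        f" || nice -19 math{opt} -mathlink"
    )
    # ssh hands this single string to the remote user's shell, which
    # replaces itself (exec) with a login shell (-l) running `inner` (-c).
    return ["ssh", host, f'exec /bin/sh -lc "{inner}"']


# shlex.join (Python 3.8+) shows how the list would look typed at a prompt.
print(shlex.join(remote_math_argv("pc123.mppmu.mpg.de", subkernel=True)))
```

Keeping the remote part as one list element avoids a further layer of local shell quoting. Only the ssh client and the remote shell interpret the string.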

With RemoteMap one can specify a mapping function which is applied on all remote hosts, i.e. in the slave sessions, to the point patches they receive from the master. Its default

RemoteMap[f_, points_] := Map[f, points]

is the usual Map function. This may be overwritten with an individual function, which must have the same argument structure as Map[f, points]. This feature could, for example, be used to leave a part of the parallelization to Mathematica 7 via the ParallelMap function. In that case one would of course set the number of cores in .submitrc to 1 for all hosts.

$FindCores contains the full path to the findcores script, including (if necessary) any options. The full syntax of findcores is

findcores [-f rcfile] [-h ruptimehost]

rcfile specifies the explicit location of the submitrc file (see Sect. 3.1), and ruptimehost specifies the host on which to invoke ruptime to find out the load of the machines listed in the submitrc file. The latter is necessary if the master process runs on a machine not connected to the cluster, e.g. a laptop.

Note: changing $FindCores modifies subsequent invocations of AddCore only, i.e. links once established are not changed by a different value of $FindCores.

$MsgLevel specifies how verbosely the master–slave communication is reported on screen.

• $MsgLevel = n sets the message level to n.

The default message level is 1, which just reports the adding and removing of cores as well as link failures.

$CoreID is a unique identifier for each slave session.

• $CoreID (in the master's session) is the id of the last slave session spawned. This number should not be tampered with.
• $CoreID (in the slave's session) is a unique identifier of the session.

$CallID is available in both the master and the slave session. In the master session it counts the total number of calls to MapCore. In the slave session it identifies the particular call to MapCore which invoked the last computation on this slave. Note that the two values need not be equal (see Sect. 2.2).
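How $CallID helps to discard outdated answers can be modelled as follows. The Python sketch rests on assumptions of ours. Every slave reply carries the call id and the list positions it covers, in the spirit of $CallID and $ListPositions but 0-based here. Replies tagged with an older call id are ignored, and for points computed twice after redistribution the first answer wins. Class and method names are hypothetical.

```python
class ResultCollector:
    """Collects slave replies for one MapCore call (illustrative model)."""

    def __init__(self, call_id, npoints):
        self.call_id = call_id
        self.results = [None] * npoints
        self.done = [False] * npoints

    def accept(self, call_id, positions, values):
        """Record a reply; return the number of newly stored results.

        positions are 0-based here, unlike Mathematica's 1-based lists.
        """
        if call_id != self.call_id:  # reply belongs to an earlier call
            return 0
        new = 0
        for pos, val in zip(positions, values):
            if not self.done[pos]:  # later duplicates are ignored
                self.results[pos] = val
                self.done[pos] = True
                new += 1
        return new

    def finished(self):
        return all(self.done)


c = ResultCollector(call_id=2, npoints=3)
c.accept(1, [0], ["stale"])            # ignored: old call id
c.accept(2, [0, 1], ["a", "b"])
c.accept(2, [1, 2], ["b again", "c"])  # point 1 was redistributed
print(c.results, c.finished())         # ['a', 'b', 'c'] True
```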

The integer $SublicenseFactor is a global parameter in the master session which is set to 4 if the Mathematica version is 7 or above, and 0 otherwise. Only AddCore[n] with n > 0 makes use of $SublicenseFactor. Setting $SublicenseFactor manually only makes sense if one uses Mathematica 7 and wants to adapt it to the mean ratio of unused sublicenses to unused main licenses, which might be greater than 4 in some cluster networks.

3.12 $ListPositions

$ListPositions is available in the slave session only. This list contains the positions, in the original list, of the points which are to be evaluated by the slave.

Both $CallID and $ListPositions can be used, e.g., to construct unique filenames. For example, if a single evaluation is very costly in CPU time, one may want to store each result immediately after computation. This could be achieved through a wrapper function

RemoteMap[f_, points_] :=
  MapThread[store[f], {points, $ListPositions}]

store[f_, dir_:"results"][x_, i_] :=
Block[{file = ToFileName[dir, ToString[i]]},
  If[FileType[file] === File,
    (* result already on disk, e.g. from an earlier run: reuse it *)
    Get[file],
  (* else *)
    If[FileType[dir] === None, CreateDirectory[dir]];
    (* compute f[x], write it to file, and return it *)
    (Put[#, file]; #)&[f[x]]
  ]
]

Results for each point would be stored in results/n, where n is the point's index in the original list. In addition to $ListPositions, one could use $CallID to generate unique filenames over multiple invocations of MapCore in the same master session.

We tested the performance and scalability of MultiCore on both a homogeneous and an inhomogeneous cluster of 25 cores for different evaluation times per point (tpp) and different patchsizes. As a test function we used a simple pause directive

f[p_][x_] := (Pause[p]; x)

and mapped it over 10000 or 1000 arbitrary points, for numbers of cores ranging from 0 (local evaluation) and 1 (slave) up to 25 (slaves), with pausing times p = 0.…, …. … For the inhomogeneous cluster, the ideal total time T depends on the number and performance of the added cores:

$$\frac{1}{T} = \frac{1}{n}\sum_{i=1}^{N}\frac{1}{\mathrm{tpp}_i}\,,$$

with tpp_i being the tpp of core i, N the number of cores, and n the total number of points. The three plots on the right-hand side of Figure 1 show the test results for different tpp's (of the fastest core) and different patchsizes. Again, the patchsize is not a crucial parameter. As before, deviations occur for the small tpp = 0.01 sec. The scaling behaviour for large numbers of cores is at best satisfactory, since MultiCore's parallelization takes about twice as long as the ideal case predicts. However, comparing the total timings of the 25 unequal cores with the corresponding timings on the left-hand side shows that only about 10 cores of the homogeneous cluster are needed to do the same job. One should therefore weigh the performance gain before adding much slower cores to one's cluster.

The MultiCore package is available from … . Installation is as simple as unpacking the tar file. MultiCore requires Mathematica version 5 or later (version 7 preferred).

To be able to load MultiCore regardless of the current directory, the MultiCore installation directory has to be added to Mathematica's $Path, for example by placing a statement like

PrependTo[$Path, "/my/path/to/MultiCore"]

in prefdir/Kernel/init.m, where prefdir is one of

• /usr/share/Mathematica (system-wide, Linux),
• $HOME/.Mathematica (user-specific, Linux),
• /Library/Mathematica (system-wide, MacOS),
• $HOME/Library/Mathematica (user-specific, MacOS),
• $ALLUSERSPROFILE/Application Data/Mathematica (system-wide, Cygwin),
• $USERPROFILE/Application Data/Mathematica (user-specific, Cygwin).

[Figure 1: six panels of s/time versus # cores, each showing the ideal curve and one curve per patchsize; panel headings give tpp and # points.]

Figure 1: Reciprocal total timings as a function of the number of cores for different evaluation times per point (tpp), different numbers of points (see the heading of the corresponding plot) and different patchsizes. The left column shows the results for the homogeneous cluster (tpp_i = tpp = const). The right column shows those for the inhomogeneous cluster, i.e. tpp_i = tpp (1 + 3 i − …) for i = 1, …, 25.

The package has been tested under Linux, MacOS, and Windows/Cygwin, both as master and as slave. The communication with remote Mathematica Kernels requires attention to a few details that may not be obvious:

• An sshd daemon must be running on the remote machine, and access must not be restricted by a firewall. On Cygwin one has to start sshd once with "net start sshd" (as Administrator), and on MacOS one has to open the ssh port in the firewall (System Preferences – Sharing – Remote Login).
• ssh access to remote machines must be possible without password authentication. This requires that a host key is generated with ssh-keygen and that its public part (typically $HOME/.ssh/id_rsa.pub) is copied to $HOME/.ssh/authorized_keys.
• If remote access other than by ssh is required, one needs to redefine the

RemoteMath function, which encodes the command string used to execute remote Mathematica Kernels (see Sect. 3.5). This can be done either in the master session before any AddCore invocations, or once and for all in MultiCore.m.

The MultiCore package provides a simple mechanism to distribute (parallelize) evaluations of a single function over many points. After setting up the cores participating in the calculation with AddCore, the single replacement of Map by MapCore suffices to distribute the calculation. MapCore is not limited to numerical evaluations but can handle any type of Mathematica expression.

From Mathematica 7 on, parallelization on several cores of a single host is a built-in functionality. Distributing calculations over more than one host is not straightforward, however, but can be done with the same ease using the MultiCore package.

The package is open source and licensed under the GPL. It can be downloaded from … and runs on Mathematica versions 5 and up (version 7 recommended).

Acknowledgements

We thank A. Hoang for playing our guinea pig in the beta stage and apologize to the MPIusers for using up too many Mathematica licenses during testing.

References

[1] T. Hahn, Comp. Phys. Commun. 178