Massive Self-Assembly in Grid Environments

Abstract

Self-assembly plays an essential role in many natural processes, including the formation and evolution of living and non-living structures, and has potential applications in many emerging domains. Existing research and practice still lack a self-assembly mechanism that achieves efficiency, scalability, and stability at the same time. Inspired by phototaxis observed in nature, we propose a computational approach for massive self-assembly of connected shapes in grid environments. The key component of this approach is an artificial light field superimposed on a grid environment, which is determined by the positions of all agents and at the same time drives all agents to change their positions, forming a dynamic mutual feedback process. This work advances the understanding and potential applications of self-assembly.



Wenjie Chu, Wei Zhang∗, Haiyan Zhao, Zhi Jin∗, Hong Mei

Department of Computer Science and Technology, Peking University, China
Key Laboratory of High Confidence Software Technology (Peking University), MoE of China
∗Correspondence to: (zhangw.sei, zhijin)@pku.edu.cn

February 24, 2021

As an intriguing and somewhat mysterious phenomenon, self-assembly has been observed in many natural processes and often appears in science fiction movies. In the 2014 animated movie "Big Hero 6", one of the most impressive scenes shows thousands of micro-robots assembling themselves into arbitrary shapes and dynamically transforming between shapes, which, at least in imagination, suggests pervasive potential applications of self-assembly. Long before being perceived by human beings, various kinds of self-assembly phenomena have existed in nature [1], playing essential roles in the formation of multi-component non-living structures [2–4], multi-cellular living organisms [2, 5, 6], and multi-organism biological systems [7, 8]. These self-assembly phenomena, whether real or fictional, all implicitly point to an important research problem: whether we can construct artificial self-assembly systems. The benefit of resolving this problem is twofold: on the one hand, it would contribute to a deep understanding of self-assembly mechanisms; on the other hand, it would facilitate the application of self-assembly in many valuable scenarios, including autonomous cooperation of UAVs [9, 10] and intelligent transportation systems [11, 12].

The problem of constructing artificial self-assembly systems has attracted increasing attention in recent years, but there is still no self-assembly mechanism that achieves the three essential features of efficiency, scalability, and stability at the same time. Edge-following based methods [13, 14] lead to a self-assembling process with low efficiency, because they heavily reduce the degree of parallelism. Path planning/scheduling methods based on prior task allocation [15, 16] suffer from poor scalability with respect to the number of agents involved in self-assembly, due to the high computational cost of global shortest path generation and task assignment. Methods based on artificial potential fields (APF) [17–21] perform well in efficiency and scalability but poorly in stability, because agents may be trapped in local minima, which become more likely as the number of agents increases [18, 22]. Although some improved APF-based methods [19, 23] have been proposed to eliminate local minima, they can only cope with limited scenarios.

In general, self-assembly can be viewed as a kind of collective intelligence (CI) phenomenon: a group of agents with limited capabilities exhibits collective intelligent behavior that goes well beyond individual capabilities.
Existing research offers two complementary understandings of CI. In the explanatory understanding [24], a key element in CI is the environment, which acts as an external memory of a collective of agents and drives each agent's behavior based on the information the agent perceives from the current environment. In the constructive understanding [25], the key to building a problem-oriented artificial CI system is to enable and maintain an ongoing loop of information exploration, integration, and feedback among the agents in the collective, until an acceptable solution to the target problem emerges.

Guided by these two understandings of CI, we propose here a computational approach for massive self-assembly of connected shapes in grid environments. This approach mimics the phototaxis observed in many species [26] (i.e., organisms' movement towards or away from light sources) by superimposing an artificial light field (ALF) on the grid environment. The ALF plays a dual role: it is the external memory of the self-assembling agents in the explanatory understanding of CI, and the carrier for information integration and feedback in the constructive understanding. The essence of this approach is a mutual feedback process between the ALF and the agent collective: the current positions of all agents determine the current state of the ALF, which in turn drives the agents to further change their positions. In experiments, this approach exhibits high efficiency, scalability, and stability on a set of diverse shape formation tasks. In an extreme case involving 5469 agents in a 135 × 135 grid environment, this approach accomplishes the self-assembly task accurately in only 119 steps (256.6 seconds) on average. Compared to the state-of-the-art centralized distance-optimal algorithm, the absolute completion time of this approach grows more slowly with respect to the task scale $n$ (on the order of $n\log(n)$), and the approach can be easily accelerated through parallelization.

The proposed approach consists of five components (Figure 1.A): a grid environment $G$, a target shape $S$, an agent collective $A$, an artificial light field $F$ superimposed on $G$, and a lightweight coordinator $C$. $C$ and all agents in $A$ form a star topology: each agent connects with $C$ through a communication channel, and no communication channel exists between any two agents. $C$ coordinates each agent's behavior by playing three roles: a generator of discrete system times, a recorder of system states, and an actuator of grid locking/unlocking. When resolving a self-assembly problem, each agent interacts with $C$ through an iterative process (Figure 1.B). Before the process begins, each agent $a_i$ reports its initial position $p_0(a_i)$ to $C$; as a result, $C$ obtains the system state at time 0, denoted as $p_0$. After that, $C$ broadcasts $p_0$ to each agent in $A$. In every iteration, each agent carries out three actions in sequence: (i) local ALF calculation, where the agent retrieves the other agents' positions from $C$, identifies the light sources, and calculates its local ALF; (ii) priority queue generation, where the agent constructs a priority queue of next positions based on its state and local ALF; (iii) next position decision, where the agent cooperates with $C$ to obtain a conflict-free next position. After that, the agent moves to its next position, informs $C$ of this movement, and enters the next iteration. The process terminates when the agents form the target shape (see the supplementary materials for more details).

Local ALF calculation.

Each agent calculates its local ALF, i.e., the light intensities at its 8 surrounding grids and at its current position, based on the system state at the current time $t$, namely $p_t$. The ALF at time $t$, denoted as $F_t$, is defined as a pair of functions $(r_t, b_t)$, where the former maps each grid to the intensity of red light at that grid, and the latter to the intensity of blue light. The intensity of red/blue light at a grid is the sum of the intensities of all red/blue light sources at that grid. At any time $t$, each agent outside the shape is a source of red light, and each unoccupied target grid is a source of blue light. The intensity of light attenuates with propagation distance. As a result, given $g \in G$, $r_t(g)$ and $b_t(g)$ can be defined conceptually as follows:

$$r_t(g) = \sum_{a \in O_t} f\big(L, \alpha, \mathrm{dis}(g, p_t(a))\big), \qquad b_t(g) = \sum_{g' \in U_t} f\big(L, \alpha, \mathrm{dis}(g, g')\big),$$

where $O_t$ is the set of agents outside the shape at $t$, $U_t$ is the set of unoccupied target grids at $t$, $L$ is the intensity of light emitted by a light source, $\alpha$ is the attenuation rate of light, $\mathrm{dis}$ is a function that returns the distance between two grids, and $f$ is a function that returns the intensity of light after the light has traveled a certain distance from its source.

[Figure 1 (image). Panel C gives the light-field formulas $r_t(g) = \sum_{a \in O_t} \frac{L}{1 + \beta \max_i |g_i - p_t(a)_i|}$ and $b_t(g) = \sum_{g' \in U_t} \frac{L}{1 + \beta \max_i |g_i - g'_i|}$, with $L = 1000$ and $\beta = 1$.]

Figure 1: An ALF-based self-assembly system. A. The system's main components; B. An iterative process for self-assembly; C. Method to calculate the local light field for each agent; D. Policy to generate the priority queue of next positions for each agent; E. The decision process between an agent and the lightweight coordinator to choose a conflict-free next action.

Priority queue generation.

Given an agent $a_n$ at time $t$, its priority queue of next positions, denoted as $Q_{n,t}$, is a permutation of $a_n$'s 9 local grids, generated from its local $F_t$. The strategy for generating $Q_{n,t}$ depends on $a_n$'s state. When $a_n$ is outside the target shape, the 9 local grids are sorted in descending order of blue light intensity; this strategy directs agents outside the target shape to move towards the shape. When $a_n$ is already inside the target shape, $Q_{n,t}$ is constructed according to the parameter $\omega$, the ratio of agents outside the target shape to all agents, with one ordering used when $\omega$ exceeds a threshold and another when it does not.

Next position decision. After obtaining $Q_{n,t}$, agent $a_n$ cooperates with $C$ to decide a conflict-free next position from $Q_{n,t}$ through an iterative decision process. In each iteration, agent $a_n$ first retrieves the head element of $Q_{n,t}$, denoted as $p'$. If $p' = p_t(a_n)$, then with probability $\gamma$ agent $a_n$ immediately moves on to the next iteration, and with probability $1 - \gamma$ it keeps $p_t(a_n)$ as its next position (Algorithm 2). If $p' \neq p_t(a_n)$, $a_n$ sends $p'$ to $C$ to check whether $p'$ is conflict-free. If $p'$ is conflict-free, $a_n$ uses $p'$ as its next position and terminates the decision process; otherwise, $a_n$ goes to the next iteration. In the extreme case where $Q_{n,t}$ becomes empty before the decision process has terminated, $a_n$ uses $p_t(a_n)$ as its next position and terminates the process. $C$ uses a try-lock mechanism to determine whether $p'$ is conflict-free for $a_n$: each grid in the environment is treated as a mutex lock [27]. When receiving $p'$ from $a_n$, $C$ tries to acquire the lock of $p'$ for $a_n$: if the lock of $p'$ is not held by any other agent, $C$ assigns the lock to $a_n$ and returns a success signal; otherwise, it returns a fail signal.

To evaluate the effectiveness of this approach, we conducted a set of experiments involving 156 shapes from 16 categories (see the supplementary materials for more details).
Four methods are selected as baselines: (1) OPT-D [15], a centralized distance-optimal method for self-assembly; (2) HUN [16], an iterative self-assembly method based on global task allocation with the Hungarian algorithm; (3) DUD [20], a self-assembly method based on artificial potential fields; (4) E-F [13], a gradient-based edge-following method for self-assembly. We focus on three measures: completion quality $\rho$, relative completion time $t$, and absolute completion time $\tau$. $\rho$ denotes the shape completion degree of the agent swarm when it reaches a stable state; $t$ denotes the number of iterations needed to complete a target shape; and $\tau$ denotes the physical time needed to complete a target shape. The three measures are analyzed from three aspects: efficiency, scalability, and stability. Some of the experiments are shown in Figure 2 and Movies S1–S3.

Efficiency. To evaluate efficiency, we compare our approach with the baseline methods under two policies for the agents' initial distribution: random and specific. Experiments with the random policy are carried out in three environments of scale 16 × 16, 40 × 40, and 80 × 80. In each environment, for each of the 156 shapes, we record $\rho$, $t$, and $\tau$ for each method when resolving the same self-assembly problem. Figure 3.A shows the experimental results for a randomly selected shape (locomotive) in the 80 × 80 environment. Tables S2 and S3 give each method's performance on 16 representative shapes (listed in Table S1) and on the 16 shape categories, respectively. It is observed that: (1) for completion quality, our approach shows nearly the same performance (…) as OPT-D and outperforms HUN (…×) and DUD (…×) on all 156 […] (…×), and better than HUN (…×) and DUD (which fails in all 50 repeated experiments) on all 156 […] (…×) and HUN (…×). The E-F method is not included in this comparison because of its specific requirement on the agents' initial distribution.

Figure 2: Trails when forming different shapes in grid environments of different scales. A. System states at four time steps ($t$ = 0, 3, 6, 8) when forming the letter "F" in a 16 × 16 grid environment with 52 agents, using ALF; B. System states at four time steps ($t$ = 0, 10, 20, 26) when forming shape "dolphin" in a 40 × 40 grid environment with 276 agents, using ALF; C. System states at four time steps ($t$ = 0, 30, 60, 78) when forming shape "cat", which has inner holes, in an 80 × 80 grid environment with 1033 agents, using ALF; D. System states at $t$ = 50 when forming shape "locomotive" in an 80 × 80 grid environment with 1785 agents and a fixed initial state (i.e., the state at $t$ = 0).

Experiments with the specific policy are carried out in the 80 × 80 environment for the 16 shapes listed in Table S1 and are evaluated by the same measures as the random-policy experiments. Figure 3.B illustrates the experimental results for one shape (locomotive); see Table S4 for the complete results. It is observed that: (1) for completion quality, our approach shows nearly the same performance (…) as OPT-D (…), and outperforms HUN (…×), DUD (…×), and E-F (…×) on all 16 shapes; (2) for relative completion time, our approach performs worse than OPT-D (…×), but better than HUN/DUD (which fail in all 20 repeated experiments) and E-F (…×); (3) for absolute completion time, our approach outperforms OPT-D (…×) and E-F (…×); (4) the E-F method, using the edge-following strategy, shows the longest relative and the second-longest absolute completion time (e.g., in the r-6-edge task with 1595 agents, this method takes 6156 iterations/703.9 seconds, while our approach takes only 75 iterations/16.7 seconds).

Scalability. To evaluate scalability, we compare our approach with OPT-D on both $t$ and $\tau$ for the 16 shapes listed in Table S1 at 12 different shape scales. Figures 3.C and 3.D show the experimental results for shape "irre-curve-1" on $t$ and $\tau$, respectively; see Figures S3–S4 for the results of all 16 shapes. It is observed that: (1) for the relative completion time $t$, both our approach and OPT-D show a $\log(n)$ increase as the shape scale $n$ grows ($R = 0.…$ and $…$, respectively); (2) for the absolute completion time $\tau$, our approach shows an $n\log(n)$ increase ($R = 0.…$), while OPT-D shows a faster, higher-order polynomial increase in $n$ ($R = 0.…$). In addition, our approach is easy to parallelize: with 16 threads, it achieves an average parallel speedup of 12.96, leading to a 92.28% decrease in $\tau$.

Stability. To evaluate stability, we analyze the standard deviations of $\rho$, $t$, and $\tau$ of our approach over 50 randomly initialized experiments for each of the 156 shapes in each of the three environments (16 × 16, 40 × 40, and 80 × 80). Tables S2 and S3 give the complete results. Our approach shows normalized $\sigma(\rho)$, $\sigma(t)$, and $\sigma(\tau)$ of 0.00034, 0.04410, and 0.00033, respectively. In addition, our approach is hole-independent (i.e., the existence of holes in a shape does not affect its performance), a property that is missing in many existing methods [13, 14].

In nature, self-assembly phenomena emerge from the collective behaviors of swarms based on chemical or physical signals, whereas in our approach we designed a kind of digital signal, namely the artificial light field, to enable a massive swarm of agents to gain such ability in grid environments. Experiments have demonstrated the superiority of our approach in constructing massive self-assembly systems: for a self-assembly task with $n$ agents, the absolute completion time of our approach grows on the order of $n\log(n)$, more slowly than that of the distance-optimal baseline OPT-D, and it can be further decreased through parallelization. We hope our approach will contribute to a deeper understanding of self-assembly mechanisms and motivate new research on advanced multi-agent algorithms, massive collaboration mechanisms, and artificial collective intelligence systems.

[Figure 3 (plots). A/B: task progress for shape "locomotive" with random and with specific initialization, plotted against relative completion time $t$ (A) and absolute completion time in seconds (B); C/D: fitting results on shape "irre-curve-1" for $t$ (C) and $\tau$ (D).]

Figure 3: Statistical results on the efficiency and scalability of different methods. A. In an 80 × 80 environment with random and specific initialization, respectively, the task progress of different methods when forming shape "locomotive" as the relative completion time increases; B. In an 80 × 80 environment with random and specific initialization, respectively, the task progress of different methods when forming shape "locomotive" as the absolute completion time increases; C. The trend of the relative completion time of ALF and OPT-D as the number of targets increases; D. The trend of the absolute completion time of ALF and OPT-D as the number of targets increases; in addition to the 1-thread ALF (ALF-1T, i.e., the ALF approach running in a 1-thread hardware environment), the 16-thread ALF (ALF-16T) is also investigated.
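The Scalability paragraph and Figure 3D relate a 16-thread speedup to a decrease in $\tau$. As a quick consistency check (a sketch, not the authors' evaluation code; the function names are illustrative), if the speedup is $S = \tau_{1T}/\tau_{16T}$, the 16-thread time is $\tau_{1T}/S$, so the relative decrease of $\tau$ is $1 - 1/S$:

```python
def tau_reduction(speedup: float) -> float:
    """Relative decrease of absolute completion time for a given speedup.

    With speedup S = tau_1 / tau_p (1-thread time over p-thread time), the
    p-thread time is tau_1 / S, so the relative decrease is 1 - 1 / S.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Speedup per thread; 1.0 would mean perfectly linear scaling."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return speedup / threads


if __name__ == "__main__":
    s16 = 12.96  # average 16-thread speedup reported in the Scalability paragraph
    print(f"decrease of tau: {tau_reduction(s16):.2%}")                # 92.28%
    print(f"parallel efficiency: {parallel_efficiency(s16, 16):.2%}")  # 81.00%
```

With $S = 12.96$ this gives 92.28%, matching the reported value, which suggests (but does not prove) that the reported decrease was computed from the average speedup; averaging per-shape decreases would in general give a slightly different number. The 81% parallel efficiency is derived here and is not reported in the paper.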

References
[1] G. M. Whitesides, B. Grzybowski, Self-assembly at all scales. Science, 2418 (2002).
[2] B. A. Grzybowski, C. E. Wilmer, J. Kim, K. P. Browne, K. J. M. Bishop, Self-assembly: from crystals to cells. Soft Matter, 1110 (2009).
[3] L. A. Estroff, A. D. Hamilton, Water gelation by small organic molecules. Chemical Reviews, 1201 (2004). PMID: 15008620.
[4] J. A. Marsh, S. A. Teichmann, Structure, dynamics, assembly, and evolution of protein complexes. Annual Review of Biochemistry, 551 (2015). PMID: 25494300.
[5] C. J. Weijer, Collective cell migration in development. Journal of Cell Science, 3215 (2009).
[6] E. Méhes, T. Vicsek, Collective motion of cells: from experiments to models. Integrative Biology, 831 (2014).
[7] S. Camazine, et al., Self-organization in biological systems, vol. 7 (Princeton University Press, 2003).
[8] N. J. Mlot, C. A. Tovey, D. L. Hu, Fire ants self-assemble into waterproof rafts to survive floods. Proceedings of the National Academy of Sciences, 7669 (2011).
[9] A. Finn, K. Kabacinski, S. P. Drake, Design challenges for an autonomous cooperative of UAVs. (2007), pp. 160–169.
[10] K. Z. Y. Ang, et al., High-precision multi-UAV teaming for the first outdoor night show in Singapore. Unmanned Syst., 39 (2018).
[11] W. Viriyasitavat, O. K. Tonguz, Priority management of emergency vehicles at intersections using self-organized traffic control. (2012), pp. 1–4.
[12] D. Strömbom, A. Dussutour, Self-organized traffic via priority rules in leaf-cutting ants. PLoS Computational Biology (2018).
[13] M. Rubenstein, A. Cornejo, R. Nagpal, Programmable self-assembly in a thousand-robot swarm. Science, 795 (2014).
[14] T. Tucci, B. Piranda, J. Bourgeois, A distributed self-assembly planning algorithm for modular robots. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 550–558.
[15] J. Yu, S. M. LaValle, Shortest path set induced vertex ordering and its application to distributed distance optimal formation path planning and control on graphs. (2013), pp. 2775–2780.
[16] J. Alonso-Mora, A. Breitenmoser, M. Rufli, R. Siegwart, P. Beardsley, Multi-robot system for artistic pattern formation. (IEEE, 2011), pp. 4512–4517.
[17] L. Sabattini, C. Secchi, C. Fantuzzi, Potential based control strategy for arbitrary shape formations of mobile robots. (IEEE, 2009), pp. 3762–3767.
[18] H.-T. Chiang, N. Malone, K. Lesser, M. Oishi, L. Tapia, Path-guided artificial potential fields with stochastic reachable sets for motion planning in highly dynamic environments. (IEEE, 2015), pp. 2347–2354.
[19] E. Falomir, S. Chaumette, G. Guerrini, A mobility model based on improved artificial potential fields for swarms of UAVs. (IEEE, 2018), pp. 8499–8504.
[20] Q. Bi, Y. Huang, A self-organized shape formation method for swarm controlling. (IEEE, 2018), pp. 7205–7209.
[21] J. Wolf, P. Robinson, J. Davies, Vector field path planning and control of an autonomous robot in a dynamic environment. Proceedings of the 2004 FIRA Robot World Congress (Paper 151) (2004).
[22] R. Gayle, W. Moss, M. C. Lin, D. Manocha, Multi-robot coordination using generalized social potential fields. (IEEE, 2009), pp. 106–113.
[23] L. Sabattini, C. Secchi, C. Fantuzzi, Arbitrarily shaped formations of mobile robots: artificial potential fields and coordinate transformation. Autonomous Robots, 385 (2011).
[24] G. Theraulaz, E. Bonabeau, A brief history of stigmergy. Artificial Life, 97 (1999).
[25] W. Zhang, H. Mei, A constructive model for collective intelligence. National Science Review, 7 (2020).
[26] G. Jékely, Philosophical Transactions of the Royal Society B: Biological Sciences, 2795 (2009).
[27] L. Dalessandro, D. Dice, M. Scott, N. Shavit, M. Spear, Transactional mutex locks. Euro-Par 2010 - Parallel Processing, P. D'Ambra, M. Guarracino, D. Talia, eds. (Springer Berlin Heidelberg, Berlin, Heidelberg, 2010), pp. 2–13.
[28] H. Cheng, Q. Zhu, Z. Liu, T. Xu, L. Lin, Decentralized navigation of multiple agents based on ORCA and model predictive control. (IEEE, 2017), pp. 3446–3451.
[29] S. Chopra, G. Notarstefano, M. Rice, M. Egerstedt, A distributed version of the Hungarian method for multirobot assignment. IEEE Transactions on Robotics, 932 (2017).
[30] H. Wang, M. Rubenstein, Shape formation on homogeneous swarms using local task swapping. IEEE Transactions on Robotics, 597 (2020).

Acknowledgement

Supported by the National Natural Science Foundation of China under grant numbers 61690200 and 61751210.

Supplementary materials

The PDF ﬁle includes:

Materials and Methods
Figs. S1 to S8
Tables S1 to S4
Captions for Movies S1 to S3
Captions and Links for Dataset S1
Captions and Links for Website S1

Other Supplementary Materials for this manuscript include the following:

Movies S1 to S3
Dataset S1
Website S1

Movie S1:

The self-assembly of shapes from 16 categories with thousands of agents in an 80 × 80 environment. One representative is selected for each shape category.

Movie S2:

The comparison of the self-assembly processes of a randomly selected shape using different methods in a 16 × 16 environment with random initialization, an 80 × 80 environment with random initialization, and an 80 × 80 environment with specific initialization, respectively.

Movie S3:

The self-assembly of shapes at different scales. Each of the 4 representative shapes is formed at 6 environment scales, with the number of agents ranging from 40 to 5469.

Dataset S1:

The self-assembly shape set, consisting of 156 shapes, each represented by a black-and-white image of size 512 × …

Website S1:

The website demonstrating the self-assembly processes of the proposed approach. Link: http://self-assembly.qunzhi.fun.

Materials and Methods

In the main text, we have demonstrated the performance of our approach for self-assembly with large-scale swarms. Here we provide more details about the problem formulation, the proposed algorithm, and the experiments. Section 1 gives a formulation of the self-assembly problem. Section 2 presents the proposed ALF-based self-assembly algorithm in detail, including a formal definition of the ALF. Section 3 gives more details of the experiments, covering the evaluation measures, the baseline methods, the shape set used in the experiments, the experiment designs, parameter settings, experiment platforms, experimental results and analysis, a discussion of the weaknesses of the baseline methods, and an analysis of how different parameter values influence the performance of our approach.

The self-assembly problem addressed in this paper involves three components: a grid environment $G$, a target shape $S$, and a group of agents $A$. The grid environment $G$ is defined as the set $\{(i, j) \mid i \in [1, H], j \in [1, W]\}$, where $H$/$W$ is the height/width of the environment and $(i, j)$ denotes the grid at row $i$ and column $j$. (For simplicity, a 2D grid environment is given here; however, the method proposed in this paper can naturally be applied to 3D grid environments.) The target shape $S$ is defined as a subset of $G$ that forms a connected graph through the neighbor relation between grids; let $|S| = N$. Grids in $S$ are called target grids, and the other grids are called un-target grids. The group of agents $A$ is defined as the set $\{a_n \mid n \in [1, N]\}$, where $a_n$ denotes the agent with identity $n$. At any time, each agent occupies a distinct grid in $G$.

Agents interact with the environment at a sequence of discrete times $0, 1, \ldots, t, \ldots, T$. At each time $t$, each agent decides either to stay at its current grid or to move to one of its eight neighbor grids. When an agent decides to move and no conflict occurs, the agent appears at the new position at the next time $t+1$; otherwise, the agent's position does not change. The state of the system at time $t$, denoted as $p_t$, is an injective function from $A$ to $G$ that maps each agent to its occupied grid. At any time $t$, the group of agents $A$ is partitioned into two subsets: $I_t = \{a_n \mid a_n \in A, p_t(a_n) \in S\}$ and $O_t = A - I_t$. That is, $I_t$ consists of the agents in shape $S$, and $O_t$ of the agents outside $S$. Accordingly, the target shape $S$ is partitioned into two subsets: $C_t = \{p_t(a_n) \mid a_n \in I_t\}$ and $U_t = S - C_t$. Grids in $C_t$ are called occupied target grids, and grids in $U_t$ unoccupied target grids.

[Figure S1 (diagram): agent/coordinator message flow through the Initialization, Iteration, Conflict Resolution, and Movement stages, with $Q_{n,t} \leftarrow Q(pos_n, Poss, S)$ and $pos'_n \leftarrow h(Q_{n,t})$.]

Figure S1: The interaction protocol between each agent and the lightweight coordinator.

The goal of resolving this problem is to find a way from an initial state to a target state as quickly as possible, following the interaction rules described above. In an initial state at time 0, all agents are randomly distributed in the environment. A target state is a state in which every target grid is occupied by an agent, i.e., $\mathrm{img}(p_t) = S$.

As mentioned in the main text, we design an artificial self-assembly system consisting of five components: $G$, $S$, $A$, an artificial light field $F$ superimposed on $G$, and a lightweight coordinator $C$. When resolving a self-assembly task, each agent interacts with $C$ through an iterative process (Figure S1), which consists of an initialization stage and a sequence of iteration stages corresponding to the sequence of system times. In the initialization stage:

1) each agent reports its initial position to $C$ (Algorithm 2, lines 1–2);
2) $C$ obtains the system state at time 0, denoted as $p_0$ (Algorithm 3, lines 1–2).

In each iteration at time $t$:

3) $C$ broadcasts $\mathrm{img}(p_t)$, i.e., all the positions occupied by agents, to each agent in $A$ (line 5 in Algorithms 2 and 3);
4) each agent calculates a priority queue of next positions (encapsulated in the $Q$ function; Algorithm 2, line 6);
5) each agent sequentially retrieves elements from its queue and requests $C$ to lock the corresponding position for it, until it finds a conflict-free next position (encapsulated in the $h$ function; Algorithm 2, lines 7–12, and Algorithm 3, lines 7–9);
6) the agent sends a leave signal before moving, updates its position, and then reports its new position to $C$, so that $C$ can update the system state accordingly (Algorithm 2, lines 13–15, and Algorithm 3, lines 10–14);
7) as the last step of each iteration, $C$ checks whether a target state has been reached; if not, it triggers a new iteration at time $t+1$, and otherwise it broadcasts an exit signal (Algorithm 3, lines 15–16, and Algorithm 2, lines 16 and 4).

Algorithm 1: System initialization

Input: G: a grid environment; S: a target shape; A: a group of agents; p_0: the system's initial state; C: a coordinator; γ: exploration rate; flag: whether agents can leave the shape after entering it; W: policy transformation parameter for agents in the shape.
    Thread(C).start(G, S, A);
    for each a_n ∈ A do
        Thread(a_n).start(G, S, C, p_0(a_n), γ, flag, W);

Algorithm 2: Behavior of an agent a_n
Input: G: a grid environment; S: a target shape; C: a coordinator; pos: a_n's initial position; γ: exploration rate; flag: whether agents can leave the shape after entering it; W: policy transformation parameter for agents in the shape.
    sendInitPos(C, pos);
    let t ← 0, pulse;
    while true do
        pulse ← recvPulse(C); if pulse = STOP then break;
        let Poss ← recvPoss(C);
        let Q ← calcPrefPosQueue(pos, S, Poss, flag, W);
        let res ← FAIL, prefPos;
        while Q.empty() = false and res = FAIL do
            prefPos ← Q.pop();
            if prefPos = pos then (rnd(0, 1) < γ) ? continue : break;
            sendPrefPosReq(C, prefPos);
            res ← recvPrefPosRes(C);
        if res = SUCC then
            sendLeaveSig(C);
            pos ← prefPos;
            sendNewPos(C, pos);
        t ← t + 1;

Algorithm 3: Behavior of the coordinator
Input: G: a grid environment; S: a target shape; A: a group of agents.
    let p_0 ← recvPoss(A);
    lockAll(p_0);
    let t ← 0, pulse ← WORK;
    while true do
        broadcastPulse(A, pulse);
        if pulse = STOP then break;
        broadcastPoss(A, img(p_t));
        while true do
            let (a_n, msg) ← recvMsg(A);
            if msg.type = PREF_POS_REQ then
                sendPrefPosRes(a_n, tryLock(a_n, msg.value));
            else if msg.type = LEAVE_SIG then
                unlock(p_t(a_n));
            else if msg.type = NEW_POS then
                p_{t+1}(a_n) ← msg.value;
            if all a_n ∈ A finished actions at t then break;
        pulse ← (S \ img(p_{t+1}) = ∅) ? STOP : WORK;
        t ← t + 1;

To support each agent in calculating its priority queue of next positions, an artificial light field (ALF) is superimposed on the grid environment and updated dynamically according to the current system state. The ALF at time $t$, denoted as $F_t$, is defined as a pair of functions $(r_t, b_t)$, where the former maps each grid to the intensity of red light at that grid, and the latter maps each grid to the intensity of blue light at that grid. The intensity of red (blue) light at a grid is the sum of the intensities of all red (blue) light sources at that grid. At any time $t$, each agent in $O_t$ is a source of red light, and each grid in $U_t$ is a source of blue light. The intensity of light from a source attenuates with propagation distance. As a result, $r_t$ and $b_t$ can be defined conceptually as follows:

$$
\begin{aligned}
r_t(g) &= \sum_{a \in O_t} f\big(L, \alpha, \mathrm{dis}(g, p_t(a))\big), && g \in G,\\
b_t(g) &= \sum_{g' \in U_t} f\big(L, \alpha, \mathrm{dis}(g, g')\big), && g \in G,
\end{aligned}
\tag{1}
$$

where $L$ is the intensity of light emitted by a light source, $\alpha$ is the attenuation rate of light, $\mathrm{dis}$ is a function that returns the distance between two grids, and $f$ is a function that returns the intensity of light after it has traveled a certain distance from its source.

At each time step, each agent calculates its local light field based on the above equations, as presented in Algorithm 4.

Q Function for Generating Priority Queues

The $Q$ function (defined in Algorithm 5) encapsulates an agent's behavior strategy by returning the agent's priority queue of next positions based on the agent's local light field. Each element in the priority queue is either one of the agent's eight neighbor positions or the agent's current position, and the priority queue contains no duplicate elements.

Two behavior strategies are designed for the two kinds of agent state, respectively. When an agent $a_n \in O_t$, its priority queue of next positions is constructed by sorting all candidate positions in descending order of their blue light intensities. This strategy drives an agent to move eagerly towards unoccupied target grids, as long as no conflicts occur.

When an agent $a_n \in I_t$, its priority queue of next positions is constructed by one of two construction principles, depending on the task progress. The first principle obtains a priority queue by sorting all candidate positions primarily in descending order of blue light intensity and secondarily in ascending order of red light intensity (lines 4-5 and 10-13 in Algorithm 5). This principle motivates an agent to keep moving towards vacant positions in the center of the shape, accelerating convergence at the beginning of the task. The second principle obtains a priority queue by sorting all candidate positions in ascending order of red light intensity (lines 6-7 and 10-13 in Algorithm 5). This principle motivates an agent to keep peripheral positions open until convergence. Each agent in $I_t$ uses the completion rate $w \in [0, 1]$ to decide which principle to take, where $w$ is defined as the proportion of occupied target grids to all target grids. When $w$ is less than a threshold $W$, the agent adopts the first construction principle; otherwise, it adopts the second.

Algorithm 4: getNeiLightField
Input: pos: the current position of a_n; S: a target shape; Poss: the set of all agents' positions.
Output: F: the blue and red light intensities at the surrounding 8 grids and the current position.
    let F = dict(), L, β;
    let U_t = S \ Poss, O_t = Poss \ S;
    for each p in surrounding and current positions do
        b_p = Σ_{g ∈ U_t} L / (1 + β · max_i |p[i] − g[i]|);
        if pos ∈ S then
            r_p = Σ_{g ∈ O_t} L / (1 + β · max_i |p[i] − g[i]|);
            F[p] = (b_p, r_p);
        else
            F[p] = b_p;
    return F;
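To make the light field and the three orderings concrete, here is a minimal Python sketch of Algorithm 4 and of the comparators of Algorithm 5. It assumes 2-D integer grid coordinates, the default discount L / (1 + β·d) with Chebyshev distance, and a boolean `first_principle` supplied by the caller in place of the completion-rate test. All function names are hypothetical, and bounds checks and the `flag`-based neighbour filter of Algorithm 5 are omitted. Listing the current position last and using a stable sort reproduces the "prefer to move on ties" rule described for Figure 1.D.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pos = Tuple[int, int]


def chebyshev(p: Pos, g: Pos) -> int:
    return max(abs(p[0] - g[0]), abs(p[1] - g[1]))


def candidates(pos: Pos) -> List[Pos]:
    """The 8 neighbours of pos followed by pos itself (no bounds check here)."""
    x, y = pos
    ring = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]
    return ring + [pos]  # current position last, so ties favour moving


def light_at(p: Pos, sources: Iterable[Pos], L: float = 1000.0, beta: float = 1.0) -> float:
    # Sum of per-source intensities L / (1 + beta * d), d = Chebyshev distance.
    return sum(L / (1 + beta * chebyshev(p, g)) for g in sources)


def local_light_field(pos: Pos, shape: Set[Pos],
                      occupied: Set[Pos]) -> Dict[Pos, Tuple[float, float]]:
    blue_sources = shape - occupied   # U_t: unoccupied target grids
    red_sources = occupied - shape    # O_t: positions of agents outside the shape
    return {p: (light_at(p, blue_sources), light_at(p, red_sources))
            for p in candidates(pos)}


def preference_order(pos: Pos, shape: Set[Pos], occupied: Set[Pos],
                     first_principle: bool = True) -> List[Pos]:
    """Candidates sorted best-first; the stable sort keeps the neighbour-before-self order on ties."""
    field = local_light_field(pos, shape, occupied)
    if pos not in shape:      # agent in O_t: descending blue
        key = lambda p: -field[p][0]
    elif first_principle:     # agent in I_t, early phase: descending blue, then ascending red
        key = lambda p: (-field[p][0], field[p][1])
    else:                     # agent in I_t, late phase: ascending red
        key = lambda p: field[p][1]
    return sorted(field, key=key)
```

For an agent at (3, 0) outside the two-grid shape {(0, 0), (1, 0)}, the three positions one column closer to the shape share the highest blue intensity and come first, and staying put ranks behind the equally lit positions directly above and below it.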
In our experiments, we set the threshold $W = 15\%$.

In Figure 1.D, for instance, $a_i$ is outside the shape and on the edge of the environment, so $a_i$ has only four candidate positions for the next step (including its current position, denoted as $p_t(a_i)$). Since $a_i \in O_t$, its priority queue is generated according to the first behavior strategy. The position below $p_t(a_i)$ has the highest blue light intensity, so it takes priority over all the other candidate positions. Although $p_t(a_i)$ has the same blue light intensity as the position to its right, we rank the position to the right ahead of $p_t(a_i)$ in the queue, because we always prefer agents to move. For another instance, $a_j$ is inside the shape and the completion rate $w$ is below $W$, so its priority queue is generated by the first construction principle of the second strategy. Regarding both blue and red light, the current position is the best choice for $a_j$.

Algorithm 5: calcPrefPosQueue
Input: pos: the current position of a_n; S: a target shape; Poss: the set of all agents' positions; flag: whether agents can leave the shape after entering it; W: policy transformation parameter for agents in the shape.
Output: Q: priority queue of next positions.
    let w = |Poss \ S| / |S|, Q;
    let F = getNeiLightField(pos, S, Poss);
    if pos ∈ S then
        if w > W then
            Q.comp = bool func(p1, p2) { return F[p1].b < F[p2].b or (F[p1].b == F[p2].b and F[p1].r > F[p2].r); };
        else
            Q.comp = bool func(p1, p2) { return F[p1].r > F[p2].r; };
    else
        Q.comp = bool func(p1, p2) { return F[p1].b < F[p2].b; };
    Q.push(pos);
    for each p in surrounding 8 positions do
        if (pos ∉ S) or (pos ∈ S and (p ∈ S or (p ∉ S and not flag))) then
            Q.push(p);
    return Q;

h and try_lock Functions for Conflict Avoidance

The $h$ and try_lock functions encapsulate a decentralized strategy that mediates between different agents' behaviors by selecting an element from each agent's priority queue as the agent's next position, so as to avoid two kinds of conflict: (1) an agent moves to a next position that is occupied by another agent; (2) two agents move to the same unoccupied next position.

Each agent uses the $h$ function to repeatedly retrieve elements from $Q_{n,t}$ and send requests to $C$ to lock the corresponding position $pos$ for it, until it receives a success response. For the lightweight coordinator $C$, a lock mechanism is designed to achieve the goal of conflict avoidance. Each grid in $G$ is treated as an exclusive resource whose accessibility is managed by a mutex lock. In the initialization stage of our approach, when receiving a position $p_0(a_n)$, $C$ locks the grid at $p_0(a_n)$ for $a_n$. In each iteration at time $t$, when receiving a leave signal from agent $a_n$, $C$ unlocks the grid at $p_t(a_n)$; when receiving a request from $a_n$ to lock a next position $pos$, $C$ tries to lock the position for $a_n$ and then returns a success/fail response $res$ to $a_n$.

In particular, when the element retrieved from $Q_{n,t}$ is $p_t(a_n)$, which is already locked by the agent, the $h$ function has a $1 - \gamma$ chance to directly return $p_t(a_n)$ as $a_n$'s next position, and a $\gamma$ chance to ignore $p_t(a_n)$ and continue retrieving the remaining elements after it (this stochastic strategy helps each agent escape local extrema). If all elements in $Q_{n,t}$ except $p_t(a_n)$ are inaccessible, the $h$ function simply returns $p_t(a_n)$.

Consequently, after obtaining $p_{t+1}(a_n)$, agent $a_n$ moves to $p_{t+1}(a_n)$ and sends a leave signal to $C$, causing $C$ to release the lock of $p_t(a_n)$.
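The lock table and the $h$ function can be sketched in Python as follows. This is an illustrative single-process sketch, not the authors' implementation: the class and function names are hypothetical, agent identifiers are assumed to be strings, and the message passing of Algorithms 2 and 3 is collapsed into direct method calls. The queue is assumed to be a best-first list of candidate positions, such as one produced by sorting candidates as in Algorithm 5.

```python
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
AgentId = str


class LockTable:
    """Coordinator-side mutex table: each grid is an exclusive resource."""

    def __init__(self) -> None:
        self.owner: Dict[Pos, AgentId] = {}

    def try_lock(self, agent: AgentId, pos: Pos) -> bool:
        # Non-blocking: fail immediately if the grid is already held.
        if pos in self.owner:
            return False
        self.owner[pos] = agent
        return True

    def unlock(self, agent: AgentId, pos: Pos) -> None:
        if self.owner.get(pos) == agent:
            del self.owner[pos]


def h(agent: AgentId, current: Pos, queue: List[Pos], locks: LockTable,
      gamma: float, rng: random.Random) -> Pos:
    """Walk the best-first queue until a grid is locked; otherwise stay put."""
    for pos in queue:
        if pos == current:
            # Already locked by this agent: stay with probability 1 - gamma,
            # otherwise keep exploring lower-priority candidates.
            if rng.random() >= gamma:
                return current
            continue
        if locks.try_lock(agent, pos):
            return pos
    return current


def move(agent: AgentId, current: Pos, queue: List[Pos], locks: LockTable,
         gamma: float, rng: random.Random) -> Pos:
    nxt = h(agent, current, queue, locks, gamma, rng)
    if nxt != current:
        locks.unlock(agent, current)  # the leave signal releases the old grid
    return nxt
```

Because `try_lock` never waits, an agent whose preferred grid is held simply falls through to its next candidate; with `gamma = 0` it never skips past its own position, and with `gamma = 1` it always keeps searching below it.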
The try_lock mechanism is adopted in the implementation to avoid the deadlock that a simple lock-until-acquire mechanism can cause: when try_lock is applied to an inaccessible grid, the locking process immediately returns a fail result, so that the remaining positions in the priority queue can be checked without delay.

In the main text, we introduce three measures (completion quality $\rho$, relative completion time $t$, and absolute completion time $\tau$) to evaluate the performance of a self-assembly algorithm. The three measures for forming a single shape are estimated through a set of repeated experiments, using the following equations. Given a target shape $S$, a self-assembly algorithm $\mathcal{A}$, and $N$ repeated experiments for $\mathcal{A}$ to form $S$,
1. the completion quality is estimated by
$$\hat\rho(S, \mathcal{A}, N) = \frac{1}{N} \sum_{e=1}^{N} \frac{|C_{S,\mathcal{A},e}|}{|S|},$$
where $C_{S,\mathcal{A},e}$ denotes the set of occupied target grids when the $e$-th experiment terminates (an experiment terminates when either $S$ is formed or the number of iterations exceeds a pre-defined threshold $K$);
2. the relative completion time is estimated by
$$\hat t(S, \mathcal{A}, N) = \frac{1}{\sum_{e=1}^{N} \mathbb{1}(e)} \sum_{e=1}^{N} \mathbb{1}(e)\, T_{S,\mathcal{A},e},$$
where $T_{S,\mathcal{A},e}$ denotes the number of iterations to form shape $S$ by algorithm $\mathcal{A}$ in the $e$-th experiment, and $\mathbb{1}(e) = 1$ if $S$ is formed when the $e$-th experiment terminates and 0 otherwise;
3. the absolute completion time is estimated by
$$\hat\tau(S, \mathcal{A}, N) = \frac{1}{\sum_{e=1}^{N} \mathbb{1}(e)} \sum_{e=1}^{N} \mathbb{1}(e)\, \Gamma_{S,\mathcal{A},e},$$
where $\Gamma_{S,\mathcal{A},e}$ denotes the physical time to form shape $S$ by algorithm $\mathcal{A}$ in the $e$-th experiment.

In addition, the three measures for forming a group of shapes with different scales are estimated through multiple sets of repeated experiments. Given a set of target shapes $\mathcal{S} = \{S_1, \ldots, S_M\}$, a self-assembly algorithm $\mathcal{A}$, and $N$ repeated experiments for $\mathcal{A}$ to form each $S_m$ ($m = 1, 2, \ldots, M$),
1. the completion quality is estimated by $\hat\rho(\mathcal{S}, \mathcal{A}, N) = \frac{1}{M} \sum_{m=1}^{M} \hat\rho(S_m, \mathcal{A}, N)$;
2. the relative completion time is estimated by $\hat t(\mathcal{S}, \mathcal{A}, N) = \frac{1}{M} \sum_{m=1}^{M} \frac{|\max(\mathcal{S})|}{|S_m|}\, \hat t(S_m, \mathcal{A}, N)$, where $\max(\mathcal{S})$ denotes the shape with the largest number of target grids in $\mathcal{S}$;
3. the absolute completion time is estimated by $\hat\tau(\mathcal{S}, \mathcal{A}, N) = \frac{1}{M} \sum_{m=1}^{M} \frac{|\max(\mathcal{S})|}{|S_m|}\, \hat\tau(S_m, \mathcal{A}, N)$.

To verify the advantage of our approach, we compare it with four state-of-the-art methods:

Centralized Distance-Optimal Method (OPT-D) [15]: this method uses a three-step process to resolve self-assembly problems: 1) it calculates the shortest path between every pair of agent and target grid; 2) it calculates a distance-optimal agent-grid assignment with the Hungarian algorithm; 3) it orders vertices along paths and resolves conflicts by swapping the assigned grids of two conflicting agents. A significant property of OPT-D is that the maximum number of iterations to form a shape is theoretically guaranteed not to exceed $|A| + d_{\max} - 1$, where $|A|$ is the number of agents and $d_{\max}$ is the maximal minimal distance between agents and grids in a distance-optimal agent-grid assignment. One drawback of OPT-D is that the computational cost of global agent-grid assignment and vertex ordering grows in $n^3$ form as the shape scale $n$ increases, causing poor scalability.

Hungarian-Based Path Replanning (HUN) [16]: this method resolves self-assembly problems through an iterative process consisting of two alternating steps: 1) it uses the Hungarian algorithm to calculate a distance-optimal agent-grid assignment; 2) it performs optimal reciprocal collision avoidance (ORCA) [28] to avoid local conflicts, until no agent can move without conflicts. These two steps are repeated until the shape is formed. When target grids are sparsely distributed, this method can obtain near-optimal travel distances, whereas when target grids are densely distributed, more iterations are spent on replanning. The cost of iteratively replanning targets and paths is high, causing both poor efficiency and poor scalability.
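To make step 2 of OPT-D and HUN concrete, the sketch below computes a distance-optimal agent-grid assignment for a tiny instance and evaluates the $|A| + d_{\max} - 1$ bound for the assignment it finds. It assumes an obstacle-free, 4-connected grid (so shortest-path length equals Manhattan distance) and as many agents as target grids. It substitutes exhaustive search for the Hungarian algorithm: both yield an assignment of the same minimum total distance, but exhaustive search is exponential and only suitable for illustration (SciPy's `scipy.optimize.linear_sum_assignment` is one polynomial-time alternative). The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Pos = Tuple[int, int]


def grid_distance(a: Pos, b: Pos) -> int:
    # Shortest-path length on an obstacle-free 4-connected grid.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def optimal_assignment(agents: Sequence[Pos], targets: Sequence[Pos]) -> Tuple[int, List[int]]:
    """Agent i is sent to targets[perm[i]]; exhaustive, so keep the instance tiny."""
    best_cost, best_perm = None, None
    for perm in permutations(range(len(targets))):
        cost = sum(grid_distance(a, targets[j]) for a, j in zip(agents, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, list(perm)
    return best_cost, best_perm


def iteration_bound(agents: Sequence[Pos], targets: Sequence[Pos], perm: Sequence[int]) -> int:
    """|A| + d_max - 1, where d_max is the longest assigned distance."""
    d_max = max(grid_distance(a, targets[j]) for a, j in zip(agents, perm))
    return len(agents) + d_max - 1
```

For agents at (0, 0) and (0, 3) and target grids (1, 3) and (1, 0), the crossing assignment costs 8 while the straight one costs 2, so the optimum sends each agent one step down and the bound evaluates to 2 + 1 - 1 = 2.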

Dynamic Uniform Distribution (DUD) [20]: this method is based on the concept of the artificial potential field (APF) [18], which generally consists of an attraction field and a repulsion field. In DUD, distance-based gradients are used to generate the attraction field, and for any agent, the repulsion field is triggered only when some other agent moves near it (i.e., when the distance between the two agents is less than a pre-defined distance). The combined forces of attraction and repulsion direct agents to move until convergence. DUD has the advantage of high scalability, but suffers from the problem of local minima, causing poor stability. In addition, in our experiments, to apply the control strategy of DUD (which was originally designed for continuous environments) to discrete grid environments, two or more agents are allowed to occupy the same grid at the same time.

Gradient-Based Edge-Following (E-F) [13]: this method models each agent's motion in self-assembly as an iterative edge-following process based on gradient information towards seed grids. In initialization, four pre-localized seed grids of the target shape are set, and all agents are connected to the seed grids (directly or indirectly through other agents). In each iteration, two steps are carried out for each agent: 1. the agent calculates its relative gradient towards the seed grids; 2. if the agent finds that it is at the outer edge (i.e., it has the highest gradient value among all its neighbors), it moves clockwise along the edge (namely, edge-following); otherwise, it stays stationary. Each agent continues its edge-following behavior until one of two stop conditions is satisfied: 1. the agent has entered the target shape but is about to move out of it; 2. the agent is next to a stopped agent with the same or greater gradient. E-F has good scalability, but its efficiency and parallelism are limited by the edge-following strategy.
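The gradient and outer-edge test that drive E-F can be sketched as a breadth-first hop count over the connected swarm. This is an illustrative sketch under stated assumptions, not the reference implementation of [13]: it assumes seed grids carry gradient 0, every other agent takes one more than its smallest neighbouring gradient (which is what a breadth-first search computes), and agents are adjacent in the 8-neighbourhood. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Pos = Tuple[int, int]


def neighbors(p: Pos) -> Iterable[Pos]:
    # 8-neighbourhood (an assumption for this sketch).
    x, y = p
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                yield (x + dx, y + dy)


def hop_gradients(occupied: Set[Pos], seeds: Set[Pos]) -> Dict[Pos, int]:
    """Breadth-first hop count from the seed grids through connected agents."""
    grad = {s: 0 for s in seeds if s in occupied}
    frontier = deque(grad)
    while frontier:
        p = frontier.popleft()
        for q in neighbors(p):
            if q in occupied and q not in grad:
                grad[q] = grad[p] + 1  # equals min(neighbour gradients) + 1
                frontier.append(q)
    return grad


def on_outer_edge(p: Pos, grad: Dict[Pos, int]) -> bool:
    """True if no neighbouring agent has a higher gradient than p."""
    return all(grad[q] <= grad[p] for q in neighbors(p) if q in grad)
```

In a straight chain of four agents grown from a single seed, the gradients are 0, 1, 2, 3, and only the far end of the chain passes the outer-edge test, so it is the only agent allowed to start edge-following.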

To evaluate the performance of different methods on the self-assembly problem, we build a shape set with sufficient diversity, which can be found at https://github.com/Catherine-Chu/Self-Assembly-Shape-Set. The shapes are organized according to the shape classification shown at the top of Figure S2:
1. The shape set (containing 156 connected shapes) is divided into two subsets according to whether a shape has holes: shapes without holes and shapes with holes;
2. The set of shapes without holes (containing 87 shapes) is divided into two subsets: convex shapes (containing 20 shapes) and concave shapes (containing 67 shapes). The set of shapes with holes (containing 69 shapes) is divided into two subsets according to the number of holes: one-hole shapes (containing 33 shapes) and multi-hole shapes (containing 36 shapes);

3. Each of the two sets of convex shapes and concave shapes is divided into three subsets according to the smoothness of the shape edge: shapes enclosed by line segments (convex/concave line-enclosed, containing 11/28 shapes), shapes enclosed by curves (convex/concave curve-enclosed, containing 4/15 shapes), and shapes enclosed by both line segments and curves (convex/concave line & curve-enclosed, containing 5/24 shapes). The set of one-hole shapes is divided into two subsets: convex hole (containing 21 shapes) and concave hole (containing 12 shapes). The set of multi-hole shapes is divided into three subsets by combining the convexity and concavity of its holes: convex holes (containing 13 shapes), concave holes (containing 10 shapes), and convex & concave holes (containing 13 shapes);
4. Each of the five sets of convex hole, concave hole, convex holes, concave holes, and convex & concave holes is divided into two subsets by the convexity or concavity of a shape's contour: convex contour (containing 7/6/6/5/5 shapes) and concave contour (containing 14/6/7/5/8 shapes).

In the shape set, each shape is represented as a 512 × 512 black-and-white figure. A black pixel in a figure corresponds to a target grid, and a white pixel to a non-target grid. In experiments, we scale the figure to different sizes on demand, from 15 × 15 to 180 × 180.

Figure S2. A shape classification and a shape set with 156 shapes.

To evaluate the efficiency of our approach, we compare it with the four baseline methods under two different policies for the agents' initial distribution: a random policy and a specific policy. Experiments with the random policy are carried out in three environments with scales of 16 × 16, 40 × 40, and 80 × 80, respectively. In each environment, for each of the 156 shapes, we conduct 50, 50, 20, and 10 experiments for ALF, DUD, HUN, and OPT-D, respectively, and calculate the three measures of completion quality ρ, relative completion time t, and absolute completion time τ for the different methods. Experiments with the specific policy are carried out in the 80 × 80 environment for the 16 representative shapes listed in Table S1. For each of the 16 shapes, we conduct 50, 50, 20, 10, and 5 experiments for ALF, DUD, HUN, OPT-D, and E-F, respectively, and calculate the same measures as in the random-policy experiments.

To evaluate the scalability of our approach, we compare it with OPT-D on both relative completion time t and absolute completion time τ for the 16 shapes listed in Table S1 in 12 different environment scales (the number of target grids varies from 40 to 5469, and the scale of environments varies from 15 × 15 to 180 × 180). ALF is executed in both a single-thread setting and a 16-thread setting. OPT-D is executed only in a single-thread setting, because it is not easy to transform OPT-D into a corresponding multi-thread version. We calculate t and τ in each experiment, and analyze the trend of t and τ as the number of target grids increases using two kinds of fitting: a linear fitting and a log fitting.

To evaluate the stability of our approach, we calculate the standard deviations of ρ, t, and τ of our approach over 50 randomly initialized experiments for each of the 156 shapes in each of the three environments with scales of 16 × 16, 40 × 40, and 80 × 80, respectively. In addition, for each shape in each environment, we also compare the relative completion time t of our approach with that of OPT-D.

Shape ID (m) | Shape name | Category ID (k) | Category name
… | … | … | …
7 | … | 7 | Hole: <Out> Concave, <In> Convex
8 | cloud lightning | 8 | Hole: <Out> Concave, <In> Concave
9 | end oval | 9 | Hole: <Out> Convex, <In> Convex
10 | gong-bank | 10 | Hole: <Out> Convex, <In> Concave
11 | scissor | 11 | Multi-holes: <Out> Concave, <In> Convex
12 | aircraft | 12 | Multi-holes: <Out> Concave, <In> Concave
13 | locomotive | 13 | Multi-holes: <Out> Concave, <In> Convex & Concave
14 | maplog | 14 | Multi-holes: <Out> Convex, <In> Concave
15 | 3-holes | 15 | Multi-holes: <Out> Convex, <In> Convex
16 | train-roadsign | 16 | Multi-holes: <Out> Convex, <In> Concave & Convex

Table S1: The representative shapes from 16 categories in the shape set.

$$
f(L, \beta, g, g') =
\begin{cases}
L - \beta \sum_{i=0}^{D-1} |g_i - g'_i| & \text{type 1}\\
L - \beta \sqrt{\sum_{i=0}^{D-1} (g_i - g'_i)^2} & \text{type 2}\\
L - \beta \max_{i \in [0, D)} |g_i - g'_i| & \text{type 3}\\
\dfrac{L}{1 + \beta \sum_{i=0}^{D-1} |g_i - g'_i|} & \text{type 4}\\
\dfrac{L}{1 + \beta \sqrt{\sum_{i=0}^{D-1} (g_i - g'_i)^2}} & \text{type 5}\\
\dfrac{L}{1 + \beta \max_{i \in [0, D)} |g_i - g'_i|} & \text{type 6}\\
\dfrac{L}{1 + \beta \left(\sum_{i=0}^{D-1} |g_i - g'_i|\right)^2} & \text{type 7}\\
\dfrac{L}{1 + \beta \sum_{i=0}^{D-1} (g_i - g'_i)^2} & \text{type 8}\\
\dfrac{L}{1 + \beta \left(\max_{i \in [0, D)} |g_i - g'_i|\right)^2} & \text{type 9}
\end{cases}
\tag{2}
$$

In all experiments in the main text, we set the parameters $L = 1000$, $\beta = 1$, $W = 0.15$, $flag = \text{True}$, and $\gamma = 0.\ldots$, where $L$ is the light intensity released by light sources, $\beta$ is the light discount coefficient, $W$ is the threshold that controls when the policy changes, $\gamma$ is the agents' exploration rate ($\gamma = 0$ means that an agent never moves to a position worse than its current one, even if there is no better position to move to), and $flag$ denotes whether agents are allowed to move out of the shape after entering it. For the light discount function $f$, which describes the decay of light intensity with propagation distance, 9 different implementations are considered (equation (2)). The types of $f$ are generated by combining two dimensions: the light-intensity distance-discount function and the two-grid distance measure. Three forms are considered in the former dimension: a linear discount (types 1-3), a reciprocal discount $L/(1 + \beta d)$ (types 4-6), and a reciprocal-square discount $L/(1 + \beta d^2)$ (types 7-9). Three measures are considered in the latter dimension: the Manhattan distance (types 1, 4, 7), the Euclidean distance (types 2, 5, 8), and the Chebyshev distance (types 3, 6, 9).

The selection of these parameters is based on observing their effects on our algorithm's performance in a set of experiments, in which each of the 16 shapes listed in Table S1 is formed 50 times in the 80 × 80 environment with random initialization under different parameter-value combinations. In particular, we test 12 × 6 × 2 × 9 = 1296 combinations of $W$, $\gamma$, $flag$, and $f$ (12 values of $W$, 6 values of $\gamma$, 2 values of $flag$, and 9 values of $f$) for each of the 16 shapes. The combination ($W = 0.15$, $flag = \text{True}$, $\gamma = 0.\ldots$, and $f$ = type 6) achieves the minimum average relative completion time in these experiments, and we choose it as the default parameter setting for our algorithm. The effects of each parameter on performance are further discussed in Section 3.8.

All experiments are carried out on an HPC platform provided by Peking University, which can be accessed at http://hpc.pku.edu.cn/stat/wmyh. In particular, each experiment is carried out on a 16-core HPC node (Intel Xeon E5-2697A V4) with 512 GB of memory.
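The nine light discount functions of equation (2) follow directly from the two dimensions above (discount form × distance measure). The Python sketch below generates them from a type number; it follows equation (2) as written here, including the $1 + \beta d$ denominators of types 4-9, which match the per-source term used in Algorithm 4. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Grid = Sequence[int]


def manhattan(g: Grid, h: Grid) -> float:
    return float(sum(abs(a - b) for a, b in zip(g, h)))


def euclidean(g: Grid, h: Grid) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g, h)))


def chebyshev(g: Grid, h: Grid) -> float:
    return float(max(abs(a - b) for a, b in zip(g, h)))


DISTANCES = (manhattan, euclidean, chebyshev)  # types 1/4/7, 2/5/8, 3/6/9


def make_discount(type_id: int) -> Callable[[float, float, Grid, Grid], float]:
    """Return f(L, beta, g, g2) for a type number in 1..9."""
    if not 1 <= type_id <= 9:
        raise ValueError("type_id must be in 1..9")
    form, column = divmod(type_id - 1, 3)  # form 0: linear, 1: reciprocal, 2: reciprocal-square
    dist = DISTANCES[column]

    def f(L: float, beta: float, g: Grid, g2: Grid) -> float:
        d = dist(g, g2)
        if form == 0:
            return L - beta * d
        if form == 1:
            return L / (1 + beta * d)
        return L / (1 + beta * d * d)

    return f
```

With the defaults $L = 1000$ and $\beta = 1$, type 6 gives $1000/(1 + 3) = 250$ for a source at Chebyshev distance 3, which is the term Algorithm 4 sums over light sources.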

Table S2: The shape-specific self-assembly performance of different methods with the random initialization policy in three environment scales. (Columns: ID (m), W/H, $|S_m|$; $\hat\rho$, $\hat t$, and $\hat\tau$ for OPT-D, HUN, DUD, and ALF; $\sigma(\rho)$, $\sigma(t)$, and $\sigma(\tau)$ for ALF.)

The complete experimental results for evaluating efficiency are presented in Tables S2-S4:
1. Table S2 shows the shape-specific performance on $\hat\rho$, $\hat t$, and $\hat\tau$ of OPT-D, HUN, DUD, and ALF (our approach) under the random initialization policy for each of the 16 shapes listed in Table S1 in 3 different environment scales.
2. Table S3 shows the category-specific performance on $\hat\zeta$, $\hat\rho$, $\hat t$, and $\hat\tau$ of OPT-D, HUN, DUD, and ALF under the random initialization policy for each of the 16 categories listed in Table S1, where $\hat\zeta$ denotes the success rate of all experiments in a category.
3. Table S4 shows the shape-specific performance on $\hat\rho$, $\hat t$, and $\hat\tau$ of OPT-D, HUN, DUD, E-F, and ALF under the specific initialization policy for each of the 16 shapes listed in Table S1 in 80 × 80 environments.

The results under the random initialization policy show that:
1. For completion quality $\hat\rho$, OPT-D, HUN, DUD, and ALF (our approach) achieve 1.000, 0.913, 0.887, and 0.999 on average in all 156 × …, respectively, and the corresponding experiment success rates $\hat\zeta$ are 100%, 26.90%, 0%, and 97.12%, respectively.
2. For relative completion time $\hat t$, OPT-D, HUN, and ALF achieve 211.28, 8653.21, and 640.64 iterations on average in all 156 × … 16 × 16 environments.
3. For absolute completion time $\hat\tau$, OPT-D, HUN, and ALF achieve 1706.46, 70.66, and 12.24 seconds on average in all 156 × …, respectively.

The results under the specific initialization policy show that:
1. For completion quality $\hat\rho$, OPT-D, HUN, DUD, E-F, and ALF achieve 1.000, 0.907, 0.935, 0.739, and 0.999 on average in experiments for the 16 shapes, respectively, and the corresponding experiment success rates $\hat\zeta$ are 100%, 0%, 0%, 43.8%, and 87.5%, respectively.
2. For relative completion time $\hat t$, OPT-D, E-F, and ALF achieve 124.57, 7873.27, and 137.74 iterations on the 16 shapes, respectively; HUN and DUD fail in all 20 and 50 experiments, respectively, for each shape.
3. For absolute completion time $\hat\tau$, OPT-D, E-F, and our approach achieve 1654.53, 732.71, and 29.56 seconds on the 16 shapes, respectively.

In summary, in terms of efficiency, our approach outperforms HUN, DUD, and E-F on all three measures; although it performs worse than OPT-D on relative completion time, our approach achieves comparable completion quality and superior absolute completion time.
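The per-shape estimators $\hat\rho$, $\hat t$, and $\hat\tau$ defined earlier, the success rate $\hat\zeta$, and the size-weighted group averages can be computed from per-run records roughly as follows. The record fields and function names are hypothetical, and returning NaN when no run succeeds is an assumption of this sketch: $\hat t$ and $\hat\tau$ are undefined in that case (as for HUN and DUD under the specific policy).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Run:
    occupied_targets: int  # |C_{S,A,e}|: occupied target grids at termination
    formed: bool           # indicator 1(e): S formed before the iteration cap K
    iterations: int        # T_{S,A,e}
    seconds: float         # Gamma_{S,A,e}


def completion_quality(runs: Sequence[Run], shape_size: int) -> float:
    return sum(r.occupied_targets / shape_size for r in runs) / len(runs)


def _mean_of_successes(values: List[float]) -> float:
    return sum(values) / len(values) if values else math.nan


def relative_time(runs: Sequence[Run]) -> float:
    return _mean_of_successes([r.iterations for r in runs if r.formed])


def absolute_time(runs: Sequence[Run]) -> float:
    return _mean_of_successes([r.seconds for r in runs if r.formed])


def success_rate(runs: Sequence[Run]) -> float:
    return sum(r.formed for r in runs) / len(runs)


def group_weighted(per_shape: Sequence[float], sizes: Sequence[int]) -> float:
    """(1/M) * sum_m (|max(S)| / |S_m|) * x_m, used for group-level t-hat and tau-hat."""
    biggest = max(sizes)
    return sum(biggest / s * x for x, s in zip(per_shape, sizes)) / len(sizes)
```

The weighting rescales each shape's time to the size of the largest shape in the group. Two shapes with 10 and 40 target grids and times 100 and 50 therefore average to (4 · 100 + 1 · 50) / 2 = 225.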
Table S3: The category-specific self-assembly performance of different methods with the random initialization policy. (Columns: ID (k), $|S_k|$; $\hat\zeta$, $\hat\rho$, $\hat t$, and $\hat\tau$ for OPT-D, HUN, DUD, and ALF; $\sigma(\rho)$, $\sigma(t)$, and $\sigma(\tau)$ for ALF; a final "All" row aggregates all categories.)

Table S4: The shape-specific self-assembly performance of different methods with the specific initialization policy in 80 × 80 environments. (Columns: ID (m), $|S_m|$; results for OPT-D, HUN, DUD, E-F, and ALF; a final "All" row aggregates all shapes.)

3.7.2 Scalability

Figure S3: The changing trends of relative completion time to form the 16 shapes in Table S1 as the shape scale grows, via ALF (our approach) and OPT-D, respectively.

The complete experimental results for evaluating scalability are presented in Figures S3 and S4:
1. Figure S3 shows the changing trends of relative completion time to form the 16 shapes in Table S1 as the shape scale grows, via ALF (our approach) and OPT-D, respectively. Two forms of fitting function are investigated: a linear function $y = a \cdot n + b$ and a log function $y = a \cdot \log(b \cdot n + c)$, where the independent variable $n$ denotes the shape scale.
2. Figure S4 shows the changing trends of absolute completion time to form the 16 shapes in Table S1 as the shape scale grows, via ALF-1T (1 thread), ALF-16T (16 threads), and OPT-D, respectively. OPT-D is fitted by a function of the form $y = a \cdot n^3 + b \cdot n^2 + c \cdot n + d$, and ALF by a function of the form $y = (a \cdot n + b \cdot n + c) \cdot (d \cdot n + e) + f$.

The results show that:
1. For relative completion time, both methods exhibit a $\log(n)$-like increase as the shape scale grows. Specifically, in the log fitting, ALF and OPT-D achieve an average $R^2$ of 0.9506 and 0.9799 over the 16 shapes, respectively, better than the $R^2$ of 0.8982 and 0.9641 in the linear fitting.
2. For absolute completion time, OPT-D achieves $R^2 > \ldots$ with an $n^3$-form function, and ALF achieves $R^2 > \ldots$ with an $n\log(n)$-form function. Furthermore, ALF can be easily accelerated through parallelization, and the speedup of ALF-16T is 12.96.
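The linear fit and a restricted log fit can be compared with a few lines of ordinary least squares. This sketch assumes the log function with $c = 0$: since $a \log(b n) = a \log n + a \log b$, that case is linear in $\log n$. Fitting the general three-parameter form $y = a \log(b n + c)$ needs a nonlinear solver (for example `scipy.optimize.curve_fit`), which is not shown. The function names are hypothetical, and the data in the check below are synthetic, not results from this paper.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = a*x + b; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


def log_fit(n: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a*log(n) + b, the c = 0 case of a*log(b'*n + c) with b = a*log(b')."""
    return linear_fit([math.log(v) for v in n], y)
```

On data that truly grow like $\log n$, the log fit reaches $R^2 \approx 1$ while the linear fit scores lower, which is the kind of comparison reported above.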

Figure S4: The changing trends of absolute completion time to form each of the 16 shapes in Table S1 as the shape scale grows, via ALF-1T (1 thread), ALF-16T (16 threads), and OPT-D, respectively.

In summary, compared to the state-of-the-art centralized distance-optimal algorithm OPT-D, ALF reduces the growth of the absolute completion time of self-assembly tasks from $n^3$ form to $n\log(n)$ form with respect to the task scale $n$, and can be easily accelerated through parallelization, manifesting good scalability.

3.7.3 Stability

The complete results for evaluating stability are presented in Table S3. The results show that ALF achieves normalized $\sigma(\rho) = 0.00034$, $\sigma(t) = 0.04410$, and $\sigma(\tau) = 0.00033$ for the entire shape set (156 × …). In addition, given a target shape $S$ and $N$ repeated experiments for an algorithm (ALF or OPT-D) to form $S$, the ratio of $\hat t(S, \text{ALF}, N)$ over $\hat t(S, \text{OPT-D}, N)$, calculated by $\hat r(S, N) = \hat t(S, \text{ALF}, N) / \hat t(S, \text{OPT-D}, N)$, is relatively stable. Specifically, for most of the shapes (127 out of 156), the $\hat r(S, N)$ values remain relatively stable as the shape scale increases, regardless of the shape's type (Figure S5.A); and most $\hat r(S, N)$ values are around 1.5 and never exceed 3.6 in any experiment (Figure S5.B). Since the relative completion time of OPT-D has a theoretical upper bound (see Section 3.2), the strong correlation between the relative completion times of ALF and OPT-D indicates that ALF may also possess a similar property in a statistical sense.

Figure S5: The ratio of ALF's relative completion time over that of OPT-D in 129 shapes with different shape/environment scales. (A) The iteration ratio of ALF over OPT-D in 127 shapes versus environment scale; (B) the ratio ω between the iterations of ALF and OPT-D versus environment scale.

In experiments, we observed that HUN, OPT-D, DUD, and E-F have weaknesses that make them ineffective for solving the self-assembly problem in grid environments. In the following, we elaborate on these weaknesses and analyze their possible causes.

Two weaknesses are observed in HUN:
1. The success rate of HUN decreases as the shape scale grows. HUN successfully formed 98.07% of the shapes in 16 × 16 environments, whereas in 40 × 40 and 80 × 80 environments, only 3.21% and 0% of the shapes were successfully formed. One possible cause of this weakness is that, when assigning target grids to agents based on shortest distances, HUN ignores potential conflicts between the paths assigned to agents, which become more likely as the number of agents increases.
2. Traffic jams often occur in HUN, resulting in low efficiency. A traffic jam is a special kind of path conflict between agents: it occurs when agents whose target grids lie on the shape boundary reach them earlier than agents whose target grids lie in the shape's inner area, and thus prevent the latter from entering the shape. One possible cause of this weakness is that HUN does not take into account the temporal relation between agent movements during path planning. Figure S6 shows an example of a traffic jam in HUN.

One weakness is observed in OPT-D: its absolute completion time is extremely high. The cause is that the two activities of agent-grid assignment and path-vertex ordering in OPT-D both have a high computational complexity of $O(n^3)$ for a self-assembly task with $n$ agents. Accordingly, the time efficiency of OPT-D could be improved in two ways: using a distributed agent-grid assignment algorithm [29] to improve parallelism, and replacing global path-vertex ordering with a lightweight local priority negotiation protocol [30].

Two weaknesses are observed in DUD: a high frequency of agent collisions and low completion quality. One possible cause of both weaknesses is that the control strategy of DUD is not suited to self-assembly tasks in grid environments. Specifically, when an agent enters the target shape, the attractive force drops to 0, and the agent keeps moving in the same direction until some other agent appears in its neighborhood and makes it change direction.
For agents inside the target shape, this strategy leads to oscillation, i.e., each agent moves back and forth around its target grid. This strategy is suitable for self-assembly in continuous environments with a sparse target distribution. In discrete grid environments with a dense target distribution, however, oscillation increases the number of overlapping agents within the target shape, and the position-correcting activity in DUD cannot separate these overlapping agents correctly, resulting in low completion quality.

[Figure S6: panels "Optimal plan at cost=2" and "Actual plan at cost=2"; legend: agent, target grid, movement, target grid for a_i.]
Figure S6: An example of a traffic jam in HUN. At time t, agent a_i moves first to its target grid g_i and blocks the way of agent a_j towards its target grid g_j, so re-planning is required before the next time step t + 1. The optimal plan is for a_i and a_j to swap their target grids and move left by one grid together, with a total cost of 2. However, since HUN allocates goals by minimal travel distance without considering path conflicts, the actual plan may still be that a_i stays at g_i and a_j moves to g_j, which has the same distance cost as the optimal plan but is impracticable.

Two weaknesses are observed in E-F:
1. In general, the efficiency of E-F is extremely low. The cause is that the edge-following strategy greatly increases an agent's travel distance from its initial position to its destination, and also greatly decreases the system parallelism, because at any time only the agents located at the swarm's boundary can move.
2. For shapes with holes, both the success rate and the completion quality of E-F are low. When E-F terminates in a self-assembly task of a target shape with holes, there are usually many unoccupied areas around the holes within the formed shape; in extreme cases, E-F never terminates.
The cause is that the agent stop condition of E-F is not suitable for shapes with holes. In E-F, once it has entered the shape, an agent stops moving when either of two conditions is satisfied: (1) the agent is about to move out of the shape; or (2) the agent is next to a stopped agent with the same or a greater gradient. The second stop condition implies the following property: an agent stops moving as soon as it connects two stopped agents with different gradients (this typically happens when the agent moves along a hole in the shape, even if there are still unoccupied grids around the hole). The reason is that if a moving agent connects two stopped neighbors with gradients x and y satisfying x ≠ y, the moving agent's gradient is updated to min(x + 1, y + 1); since x and y are distinct integers, min(x + 1, y + 1) ≤ max(x, y), so the agent meets the second stop condition. The updated gradient further propagates through connected agents, causing subsequent agents to stop moving earlier and thus leaving many unoccupied grids. When an agent cannot enter the shape, it never reaches any stop condition and thus keeps moving around the swarm's outer edge. An example is shown in Figure S7.
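The inequality behind this property can be checked mechanically. The Python sketch below is a simplification, not E-F's implementation: it assumes integer gradients, models only the gradient update min(x + 1, y + 1) described above together with the second stop condition, ignores propagation of the new gradient to other agents, and uses hypothetical function names (`updated_gradient`, `meets_stop_condition_2`, `stops_when_bridging`).

```python
from itertools import product


def updated_gradient(neighbor_gradients):
    """Hypothetical helper: a moving agent's gradient after it touches stopped
    neighbors, i.e. one more than the smallest neighboring gradient."""
    return min(g + 1 for g in neighbor_gradients)


def meets_stop_condition_2(own_gradient, neighbor_gradients):
    """Stop condition 2: some stopped neighbor has the same or a greater gradient."""
    return any(g >= own_gradient for g in neighbor_gradients)


def stops_when_bridging(x, y):
    """Does an agent that connects stopped neighbors with gradients x and y stop?"""
    own = updated_gradient([x, y])
    return meets_stop_condition_2(own, [x, y])


# Equal gradients: the agent's gradient becomes x + 1, which no neighbor reaches.
print(stops_when_bridging(41, 41))  # False
# Different gradients (the 40/16 case in Figure S7): gradient drops to 17, agent stops.
print(updated_gradient([40, 16]), stops_when_bridging(40, 16))  # 17 True

# Exhaustive check of the claim on a small range: for every x != y the agent stops.
assert all(stops_when_bridging(x, y)
           for x, y in product(range(60), repeat=2) if x != y)
```

The equal-gradient case shows why the premature stop is specific to agents that bridge two differently labeled regions, such as the two sides of a hole.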

[Figure S7: gradient maps of the swarm at time t_i (micro-steps t_{i,0}, t_{i,1}, t_{i,2}), t_{i+1}, t_j, and t_k.]
Figure S7: An example of forming a shape with holes by E-F. At time t_i, three micro-steps (t_{i,0}, t_{i,1}, and t_{i,2}) are observed. At t_{i,0}, the gradients of a_1 and a_2 are both 41, and both agents are planning to move left. At t_{i,1}, a_1 moves left; its movement triggers neither a gradient update nor any stop condition, so a_1 plans to keep moving left at the next time step t_{i+1}. At t_{i,2}, a_2 moves left and connects the upper (pink) and lower (yellow) neighbors, with gradients of 40 and 16, respectively. This changes the agents' gradients (in particular, the gradient of a_2 becomes 17, and that of its upper neighbor accordingly becomes 18) and triggers the stop condition of a_2, leaving an unoccupied area to the left of a_2. At time t_{i+1}, agent a_1 moves left and triggers its own stop condition, leaving an unoccupied grid between a_1 and a_2. Subsequent agents also stop early, like a_1, because of the recalculated gradients. At time t_j, many unoccupied areas appear in the yellow rectangle, and a scenario similar to that at t_i occurs again in the purple rectangle, resulting in more unoccupied areas. At time t_k, many agents keep moving around the outer edge of the shape since they have no chance to enter the shape.

Parameter Analysis

Our approach has four adjustable parameters:

W ∈ [0, …], which controls the time of policy changing; flag ∈ {True, False}, which determines whether agents are allowed to move out of the target shape after entering it; γ ∈ [0, 1], which indicates the probability of choosing actions that are worse than staying still; and f, which represents one of the 9 types of distance-discount function (see section 3.5). To investigate the effect of different parameter values on ALF's performance, we select a subset of the experiments on the two shapes "3-holes" and "r-6-edge" from the experiments mentioned in section 3.5. In particular, for each of the four parameters, we fix the other parameters to their values in the best parameter combination (i.e., W = 0.…, flag = True, γ = 0.…, and f = type-6), vary the parameter's value (12 values of W, 6 values of γ, 2 values of flag, and 9 values of f), and observe ALF's relative completion times under the different values. The results are shown in Figure S8.

For parameter W, it is observed that: when W = 0, the relative completion time is relatively high; as W increases to 0.05, the relative completion time decreases rapidly; beyond that, it decreases slowly as W increases and reaches its minimum at W = 0.…; as W increases further, the relative completion time increases slowly. As a result, to achieve a shorter relative completion time, it is better to set W > ….

For parameter γ, it is observed that: when γ = 0, the relative completion time is relatively low; at γ = 0.…, it reaches its minimum; as γ increases from 0.2 to 0.6, it increases slowly; and as γ increases further, it increases rapidly. As a result, to achieve a shorter relative completion time, it is better to set γ < ….

For parameter f, it is observed that:
1. For the linear and inverse light-intensity distance-discount functions, the Chebyshev distance measurement function performs better than the other two distance measurement functions. For the square-inverse light-intensity distance-discount function, the Euclidean distance measurement function performs best.
2. For all distance measurement functions, the linear light-intensity distance-discount function performs worse than the other two distance-discount functions, and the performance of the inverse and square-inverse distance-discount functions shows little difference.
3. The type-6 f function achieves the best performance on both shapes.

As a result, to achieve a shorter relative completion time, it is better to set f to type-6 (the combination of the inverse light-intensity distance-discount function and the Chebyshev distance measurement function) or type-8 (the combination of the square-inverse function and the Euclidean function).

For parameter flag, it is observed that:
1. Both shapes are formed faster when flag is set to True than when it is set to False.
2. The change of flag has a much greater effect on the relative completion time when forming 3-holes than when forming r-6-edge.

As a result, to achieve a shorter relative completion time, it is better to set flag to True.

[Figure S8: relative completion time of ALF on r-6-edge and 3-holes. Panels: A. The value of W (annotated minima 45.8 and 59.7); B. The value of γ (annotated minima 45.8 and 69.4); C. The type of f (annotated minima 45.8 and 59.7); D. The value of flag (True vs. False).]
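The one-parameter-at-a-time protocol described above (hold all parameters but one at the best combination, sweep that one, and compare relative completion times) can be sketched as a small harness. This is a minimal Python illustration, not the authors' experiment code: `run_trial` is a hypothetical callback standing in for running ALF on a shape, and `toy_trial`, the baseline, and the candidate values in the usage part are made-up placeholders, not the settings or results reported above.

```python
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]
Sweep = List[Tuple[Any, float]]


def one_at_a_time(
    baseline: Params,
    candidates: Dict[str, List[Any]],
    run_trial: Callable[[Params], float],
) -> Dict[str, Sweep]:
    """Vary each parameter over its candidate values while the others stay at
    their baseline values; record the score (e.g. relative completion time)."""
    results: Dict[str, Sweep] = {}
    for name, values in candidates.items():
        results[name] = []
        for value in values:
            params = dict(baseline, **{name: value})  # override a single parameter
            results[name].append((value, run_trial(params)))
    return results


def best_value(sweep: Sweep) -> Any:
    """Candidate with the lowest score (shorter completion time is better)."""
    return min(sweep, key=lambda pair: pair[1])[0]


# Hypothetical toy objective standing in for an ALF run (illustration only).
def toy_trial(p: Params) -> float:
    penalty = 0.0 if p["flag"] else 10.0
    return 50.0 + 100.0 * (p["W"] - 0.5) ** 2 + 80.0 * p["gamma"] ** 2 + penalty


baseline = {"W": 0.5, "gamma": 0.0, "flag": True}  # placeholder values
candidates = {
    "W": [0.0, 0.25, 0.5, 0.75, 1.0],
    "gamma": [0.0, 0.5, 1.0],
    "flag": [True, False],
}
sweeps = one_at_a_time(baseline, candidates, toy_trial)
print({name: best_value(s) for name, s in sweeps.items()})
# {'W': 0.5, 'gamma': 0.0, 'flag': True}
```

A categorical parameter such as f can be swept the same way by listing its nine types as candidates. Ties are resolved in favor of the first candidate listed, because `min` returns the first minimal element.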