A new record of graph enumeration enabled by parallel processing
Zhipeng Xu a,b, Xiaolong Huang b, Fabian Jimenez c, Yuefan Deng b,∗
a School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510275, P.R. China
b Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
c Technologies, Empresa Pública Yachay, Centro de Emprendimiento Innopolis 100115, Ecuador
Abstract
Using three supercomputers, we broke a record set in 2011 in the enumeration of non-isomorphic regular graphs by extending the sequence A006820 in the On-Line Encyclopedia of Integer Sequences (OEIS): the number of 4-regular graphs of order 23 is 429,668,180,677,439. Along the way, we discovered several optimal regular graphs with minimum average shortest path lengths (ASPL) that can be used as interconnection networks for parallel computers. The number of 4-regular graphs and the optimal graphs, both extremely time-consuming to calculate, result from a method we adapted from GENREG, a classical regular graph generator, to exploit the strength of supercomputers in using thousands of processor cores.
Keywords:
Regular graph, parallel computing, dynamical scheduling

∗ Corresponding author
arXiv [cs.DM], Oct.

1. Introduction

Analysis of regular graphs for their properties, including eigen-spectra and automorphisms, is a fertile field for discovery and applications in algebraic graph theory [1]. Yet there are many unsolved problems, e.g., Conway's 99-graph problem [2] and the 57-regular Moore graph [3]. For the analysis of interconnection networks, regularity is essential because it relates directly to the complexity of network implementation; as such, many regular graphs, including the Petersen graph, the hypercube graph, and their extensions [4, 5, 6, 7, 8, 9], are widely used to construct interconnection networks for parallel computers.

For 3-regular graphs of order n, Robinson and Wormald [10, 11] presented all counting results for n ≤ 40, while pointing out that enumeration for unlabeled k-regular graphs with k > […] n from 22 to 23, for the first time, in the sequence A006820 [23] of OEIS, which is the number of connected 4-regular graphs of order n. Kimberley [23] used GENREG to enumerate the 4-regular graphs up to order 22 in 2011 [24]. This record of n = 22 remained unchallenged until our enumeration for n = 23, enabled by our parallel computing implementation, advanced it by one step.
2. The enumeration framework and results
For enumerating regular graphs, published packages such as minibaum [13], snarkhunter [14], and GENREG have their own strengths and weaknesses; GENREG is more general in the graph degrees it covers than minibaum and snarkhunter, which support only 3-regular graphs.

In our parallel computing framework, we designate one node as the master, whose task involves adaptive scheduling and dispatching, and the rest as a team of workers. When our program starts, the workers send a message to the master to request a task, and they continue to send requests until the list of tasks is exhausted. As usual, when the master sends a task to a worker, the task is marked as selected and becomes unavailable. Finally, when the task pool empties, the master signals all workers to exit.

Our dynamical scheduling strategy keeps the cores in the cluster busy with useful tasks and, by inserting external serial programs, allows an efficient search for graphs with specified parameters, e.g., diameters or eigenvalues. In addition to load balance, our parallel program reduces the communication cost to N_task × 2, with a low bandwidth requirement because the message itself is the message count. Using a dedicated thread for task scheduling, however, limits both the scalability and the maximum size of the computer system. Depending on the communication sub-system, the largest system our current approach can scale to is approximately 3,000 cores, beyond which communication congestion eventually dominates. We may improve the scalability of our program with multi-level scheduling, particularly for many-core systems.
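The request-and-dispatch protocol described above can be sketched in a few lines. This is only an illustration under stated assumptions, not our production code: Python threads and queues stand in for the cluster nodes and their message passing, and the names `run_master_worker`, `work`, `requests`, and `replies` are hypothetical. The counter tallies one request and one reply per task, which is where the N_task × 2 communication cost comes from; the final exit handshake with each worker is not included in that count.

```python
import queue
import threading


def run_master_worker(tasks, n_workers, work):
    """Dispatch tasks on request; return (results, task_messages)."""
    requests = queue.Queue()                              # worker -> master
    replies = [queue.Queue() for _ in range(n_workers)]   # master -> each worker
    results = {}
    lock = threading.Lock()

    def worker(rank):
        while True:
            requests.put(rank)              # ask the master for a task
            task = replies[rank].get()      # a task, or None as the exit signal
            if task is None:
                return
            value = work(task)
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()

    pending = list(tasks)                   # the task pool held by the master
    task_messages = 0
    active = n_workers
    while active:
        rank = requests.get()
        if pending:
            replies[rank].put(pending.pop())   # a dispatched task leaves the pool
            task_messages += 2                 # one request plus one reply
        else:
            replies[rank].put(None)            # pool is empty: tell the worker to exit
            active -= 1

    for t in threads:
        t.join()
    return results, task_messages


results, task_messages = run_master_worker(range(20), 4, lambda t: t * t)
print(task_messages)  # 40, i.e. 2 messages per task
```

Because a worker asks for new work only when it is idle, a long-running task never blocks the remaining tasks, which is the property the scheduling strategy relies on.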
In the interconnection networks of supercomputers and data centers, regularity is a significant feature because it is related to the complexity of the network configuration. For regular-graph topologies applied to interconnection networks, it is highly desirable to obtain graphs with minimal ASPLs because they help reduce communication latencies. Let d(i, j) be the distance between vertices i and j in a graph of N vertices; the ASPL is calculated as follows,

$$\mathrm{ASPL} = \frac{1}{N(N-1)} \sum_{i \neq j} d(i, j) \tag{1}$$

Cerf [25] calculated and proved a lower bound on the ASPL; hence, the goal of the optimal-graph search is to find graphs with minimum ASPLs. Thus far, random or heuristic methods, or intuition, have usually been used to search for graphs with such desired properties for large networks; Graph Golf [26], a competition in searching for graphs with minimal diameters and ASPLs, generated many such graphs, but they are commonly asymmetric. We still hope to discover graphs with all desired properties, including minimum diameters, minimum ASPLs, symmetry, and robustness, at the cost of one disadvantage: the graphs are smaller. Nevertheless, these graphs can be adopted in smaller clusters, in multiple modules of such clusters, or in systems-on-chip.

(a) (32,3)-optimal (b) (32,4)-optimal
Figure 1: The optimal graphs used in, for example, a Beowulf cluster [22].
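As an illustration of Eq. (1), the following minimal sketch computes the ASPL of a connected, unweighted graph by running a breadth-first search from every vertex. The adjacency-list representation and the function names `aspl` and `petersen` are assumptions made for this example; the Petersen graph is used as a small test case because each of its vertices has 3 neighbours at distance 1 and 6 vertices at distance 2.

```python
from collections import deque


def aspl(adj):
    """Average shortest path length of a connected graph given as adjacency lists."""
    n = len(adj)
    total = 0
    for source in range(n):
        dist = [-1] * n                  # -1 marks vertices not yet reached
        dist[source] = 0
        frontier = deque([source])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    frontier.append(v)
        if min(dist) < 0:
            raise ValueError("graph is not connected")
        total += sum(dist)               # sum of d(source, j) over all j
    return total / (n * (n - 1))


def petersen():
    """Adjacency lists of the Petersen graph (3-regular, order 10)."""
    adj = [set() for _ in range(10)]
    for i in range(5):
        edges = ((i, (i + 1) % 5),             # outer 5-cycle
                 (i, i + 5),                   # spokes
                 (i + 5, (i + 2) % 5 + 5))     # inner pentagram
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
    return [sorted(s) for s in adj]


print(aspl(petersen()))  # (3*1 + 6*2) / 9, about 1.667
```

Each breadth-first search costs time proportional to the number of edges, so evaluating one k-regular graph of order N takes on the order of k N^2 operations, which is cheap per graph but adds up over an exhaustive enumeration.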
Using this framework, we decomposed the search into 200,000 sub-tasks and completed the exhaustive search of all 3-regular graphs of order 32 with diameter 4 and minimal ASPL. The program ran on the Sunway BlueLight supercomputer [27] with 80,000 cores for 72 hours and discovered 56 graphs with the minimal ASPL after exhausting all 18,941,522,184,590 possible graphs predicted by Robinson et al. [10], who only enumerated them without finding the graphs with desired properties, including minimum diameters and ASPLs. Deng et al. [22] applied one of these graphs (Fig. 1 (a)) to construct a Beowulf [28] cluster, and their benchmark results show that the graphs with the minimal ASPL outperform other classical topologies.

Fig. 2 shows the distribution of the number of generated graphs over all cases. It resembles a Gaussian distribution with a long tail that contains hundreds of millions of graphs, and the frequency fluctuates by 4 orders of magnitude. Clearly, the enumeration tasks vary greatly in size, and our dynamic task scheduling has eliminated substantial waiting time due to load imbalance.
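To make the load-imbalance argument concrete, the toy comparison below contrasts a static round-robin assignment with on-request dispatch when one sub-task is far heavier than the rest. It is a sketch only: the task costs are hypothetical, not measured values, and the function names `static_makespan` and `dynamic_makespan` are introduced purely for illustration.

```python
import heapq


def static_makespan(costs, n_workers):
    """Round-robin pre-assignment: task k goes to worker k mod n_workers."""
    load = [0] * n_workers
    for k, cost in enumerate(costs):
        load[k % n_workers] += cost
    return max(load)


def dynamic_makespan(costs, n_workers):
    """On-request dispatch: the next task goes to whichever worker is idle first."""
    free_at = [0] * n_workers            # min-heap of times at which workers become idle
    for cost in costs:
        t = heapq.heappop(free_at)       # the earliest idle worker requests a task
        heapq.heappush(free_at, t + cost)
    return max(free_at)


# Hypothetical task costs: one heavy sub-task among many light ones.
costs = [100, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, 2))   # 103
print(dynamic_makespan(costs, 2))  # 100
```

In the static case the worker holding the heavy task also receives a share of the light ones, while in the dynamic case the other worker absorbs all of them; the gap widens as the spread of task costs grows.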
Figure 2: Number of graphs and frequency.

Table 1: OEIS A006820

Order   Quartics
5       1
6       1
7       2
8       6
9       16
10      59
11      265
12      1,544
13      10,778
14      88,168
15      805,491
16      8,037,418
17      86,221,634
18      985,870,522
19      11,946,487,647
20      152,808,063,181
21      2,056,692,014,474
22      28,566,273,166,527
23      429,668,180,677,439
When we search for the minimum-ASPL graphs among all possible 4-regular graphs of order 32, there is no enumeration result for this scale when girth ≤ 5. We can still use our software to search for regular graphs with minimum ASPL, as shown in Fig. 1 (b), but without confirmation of exhaustiveness. We therefore verified the results in Tab. 1 after decomposing the problem into 50,000 sub-tasks and setting the split level to 12. We then extended the sequence A006820 from n = 22 to n = 23 and confirmed the number of 4-regular graphs of order 23 as 429,668,180,677,439 by using our parallel GENREG with the same parameters.

Table 2: Cost on three clusters

Cluster      Total processed (10^12 graphs)   Computing time (core years)   Core speed (10^3 graphs/s)
SeaWulf      371.13                           66.12                         178
Tianhe-1     69.51                            19.53                         113
IBM Quinde   23.60                            13.25                         56

Our work to obtain the new enumeration for n = 23 is estimated to have cost nearly 100 core-years. We combined three supercomputers: the SeaWulf at Stony Brook University, the Tianhe-1, and the IBM Quinde. As shown in Tab. 2, the SeaWulf, with Intel Xeon Gold 6148 processors, has the highest efficiency, searching 178,000 graphs per second per core, while the Tianhe-1, with Intel Xeon X5670 processors, and the IBM Quinde, with Power8 processors, process 113,000 and 56,469 graphs per second per core, respectively, and contributed the rest.

In fact, the result of graph counting for (23,4)-regular graphs was obtained by the strategic and opportunistic use of fragmented and shared computing resources. In addition to the fair and efficient sharing policies of most supercomputers, the Tianhe-1 would terminate long-running tasks on many cores for maintenance, which prevented us from running long tasks. The external scheduling system we developed helps overcome these limitations of computing resources while facilitating optimal utilization of occasionally available cores.

3. Conclusions

Our parallel method, adapted from GENREG, enables us to complete the search and enumeration on systems of 3,000 processor cores. For the first time, using this new approach, we discovered several optimal graphs of order 32 and found the enumeration count for 4-regular graphs of order 23, giving the graph theory community confidence that high-performance computing can help solve otherwise intractable problems.
Acknowledgements
Z. Xu is supported by the special project on high-performance computing of the National Key Research and Development Program, 2016YFB0200604. We thank the National Supercomputing Centers in Jinan and Changsha, China, for computing resources, and M. Meringer of the German Aerospace Center for technical support of GENREG and for beneficial suggestions on this manuscript via e-mail.
References

[1] C. Godsil, G. Royle, Algebraic Graph Theory, Springer New York, 2001. doi:10.1007/978-1-4613-0163-9.
[2] J. Conway, Five $1,000 problems (update 2017), On-Line Encyclopedia of Integer Sequences (2019).
[3] A. J. Hoffman, R. R. Singleton, On Moore graphs with diameters 2 and 3, IBM Journal of Research and Development 4 (5) (1960) 497–504. doi:10.1147/rd.45.0497.
[4] S. Das, A. Banerjee, Hyper Petersen network: yet another hypercube-like topology, in: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation, IEEE Comput. Soc. Press, 1992. doi:10.1109/fmpc.1992.234949.
[5] S. Ohring, S. Das, Folded Petersen cube networks: new competitors for the hypercubes, in: Proceedings of 1993 5th IEEE Symposium on Parallel and Distributed Processing, IEEE Comput. Soc. Press, 1993. doi:10.1109/spdp.1993.395482.
[6] S. Ohring, S. K. Das, The folded Petersen network: A new communication-efficient multiprocessor topology, in: 1993 International Conference on Parallel Processing - ICPP 93 Vol1, IEEE, 1993. doi:10.1109/icpp.1993.175.
[7] J.-H. Seo, Three-dimensional Petersen-torus network: a fixed-degree network for massively parallel computers, The Journal of Supercomputing 64 (3) (2011) 987–1007. doi:10.1007/s11227-011-0716-z.
[8] J.-H. Seo, J.-S. Kim, H. J. Chang, H.-O. Lee, The hierarchical Petersen network: a new interconnection network with fixed degree, The Journal of Supercomputing 74 (4) (2017) 1636–1654. doi:10.1007/s11227-017-2186-4.
[9] J.-H. Seo, H. Lee, M. suk Jang, Petersen-torus networks for multicomputer systems, in: 2008 Fourth International Conference on Networked Computing and Advanced Information Management, IEEE, 2008. doi:10.1109/ncm.2008.47.
[10] R. W. Robinson, N. C. Wormald, Numbers of cubic graphs, Journal of Graph Theory 7 (4) (1983) 463–467. doi:10.1002/jgt.3190070412.
[11] R. W. Robinson, Counting cubic graphs, Journal of Graph Theory 1 (3) (1977) 285–286. doi:10.1002/jgt.3190010310.
[12] M. Meringer, Fast generation of regular graphs and construction of cages, Journal of Graph Theory 30 (2) (1999) 137–146. doi:10.1002/(sici)1097-0118(199902)30:2<137::aid-jgt7>3.0.co;2-g.
[13] G. Brinkmann, Fast generation of cubic graphs, Journal of Graph Theory 23 (2) (1996) 139–149. doi:10.1002/(sici)1097-0118(199610)23:2<139::aid-jgt5>3.0.co;2-u.
[14] G. Brinkmann, J. Goedgebeur, Generation of cubic graphs and snarks with large girth, Journal of Graph Theory 86 (2) (2017) 255–272. doi:10.1002/jgt.22125.
[15] G. Brinkmann, K. Coolsaet, J. Goedgebeur, H. Mélot, House of graphs: A database of interesting graphs, Discrete Applied Mathematics 161 (1-2) (2013) 311–314. doi:10.1016/j.dam.2012.07.018.
[16] OEIS Foundation Inc, The on-line encyclopaedia of integer sequences (2019). URL http://oeis.org
[17] OEIS Foundation Inc, A068934 in the on-line encyclopaedia of integer sequences (2019). URL http://oeis.org/wiki/User:Jason_Kimberley/A068934
[18] doi:10.1137/s0097539797321602.
[19] M. Meringer, Structure enumeration and sampling, in: Handbook of Chemoinformatics Algorithms, Chapman and Hall/CRC, 2010, pp. 233–267. doi:10.1201/9781420082999-c8.
[20] M. Meringer, H. J. Cleaves, Exploring astrobiology using in silico molecular structure generation, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 375 (2109) (2017) 20160344. doi:10.1098/rsta.2016.0344.
[21] W. Gropp, E. Lusk, A. Skjellum (Eds.), Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press Ltd, 2014.
[22] Y. Deng, M. Guo, A. F. Ramos, X. Huang, Z. Xu, W. Liu, Optimal low-latency network topologies for cluster performance enhancement, arXiv (2019). URL http://arxiv.org/abs/1904.00513v1
[23] OEIS Foundation Inc, A006820 in the on-line encyclopaedia of integer sequences (2019). URL http://oeis.org/A006820
[24] F. Larrión, M. Pizaña, R. Villarroel-Flores, On self-clique shoal graphs, Discrete Appl. Math. 205 (C) (2016) 86–100. doi:10.1016/j.dam.2016.01.013. URL https://doi.org/10.1016/j.dam.2016.01.013
[25] doi:10.1007/bfb0069540.
[26] M. Koibuchi, I. Fujiwara, S. Fujita, K. Nakano, T. I. T. Uno, K. Kawarabayashi, Graph Golf: The Order/degree Problem Competition, http://research.nii.ac.jp/graphgolf/.
[27] The Sunway Blue Light in Top 500 List (June 2018). URL