GraphGallery: A Platform for Fast Benchmarking and Easy Development of Graph Neural Networks Based Intelligent Software
Jintang Li, Kun Xu, Liang Chen*, Zibin Zheng
Sun Yat-sen University, China
{lijt55, xukun6}@mail2.sysu.edu.cn, {chenliang6, zhzibin}@mail.sysu.edu.cn

Xiao Liu
Deakin University, Australia
[email protected]
Abstract—Graph Neural Networks (GNNs) have recently been shown to be powerful tools for representing and analyzing graph data. GNNs are playing an increasingly critical role in software engineering, including program analysis, type inference, and code representation. In this paper, we introduce GraphGallery, a platform for fast benchmarking and easy development of GNN-based software. GraphGallery is an easy-to-use platform that allows developers to deploy GNNs automatically, even with little domain-specific knowledge. It offers a set of implementations of common GNN models based on mainstream deep learning frameworks. In addition, existing GNN toolboxes such as PyG and DGL can be easily incorporated into the platform. Experiments demonstrate the reliability of the implementations and their superiority in fast coding. The official source code of GraphGallery is available at https://github.com/EdisonLeeeee/GraphGallery and a demo video can be found at https://youtu.be/mv7Zs1YeaYo.
Index Terms—Graph Neural Networks, Benchmarking, Intelligent Software Development, Open-source Platform
I. INTRODUCTION
Graph Neural Networks (GNNs) have received a considerable amount of attention from academia and industry, mainly due to their powerful ability to represent and model the relationships between nodes or edges in a graph. GNNs operate on graphs and manifolds by generalizing traditional deep learning to irregular domains and thus can deal directly with more general data forms. Many novel architectures have been put forward, resulting in recent breakthroughs in tasks across various domains [1]–[7]. For example, PinSage [7], a GNN-based system, has been successfully deployed in Pinterest's recommendation system with millions of users and items.

However, mainstream deep learning frameworks, such as TensorFlow [8] and PyTorch [9], have not yet integrated functional APIs to implement GNNs conveniently and efficiently. Unlike what has been developed and exploited for convolutional neural networks (CNNs), implementing the core components of GNNs remains a challenge for developers and researchers. To this end, a line of specialized GNN toolboxes has been developed for efficient training. For instance, PyG [10] and DGL [11] are two popular toolboxes for deep learning on graph-structured data.

While these toolboxes have eased the development of complex GNN models, they also come with steep learning curves for inexperienced developers, since they require domain-specific knowledge as well as certain scripting and coding experience in machine learning. Developers have sought a platform that can deploy GNN models more conveniently while requiring less expert knowledge. Specifically, there is room for improvement in the following aspects.
Deployability. Deploying a GNN model is time-consuming, tedious, and demanding for developers, who have to build their own pipeline, including training and inference procedures, from scratch.
Reproducibility. Developers might not be able to use the existing implementations of published algorithms directly, since they are possibly implemented with various deep learning frameworks. Consequently, developers have to adapt the code from one framework to another accordingly.
Benchmark. More effort is required from researchers who want to conduct benchmarking experiments and performance comparisons, since they have to implement different benchmarking models, and proper training and fair parameter tuning is necessary but usually tedious work.

We tackle the aforementioned problems with our proposed GraphGallery, an open-source platform built on TensorFlow and PyTorch. It allows technically inexperienced developers to deploy GNN-based software automatically and also eases the development of new models. In particular, GraphGallery includes the following features:

• Unified and extensible interface. As a user-friendly platform, GraphGallery offers unified interfaces for developers to deploy GNN-based software even without expert knowledge.
• Multiple frameworks support. GraphGallery interfaces with the most popular deep learning frameworks, TensorFlow and PyTorch. Additionally, it integrates seamlessly with PyG, DGL, and other GNN toolboxes.
• Fast coding. GraphGallery provides a model gallery with easy and elegant APIs for developers and researchers; a GNN model can be deployed in a few lines of code.

GraphGallery is an out-of-the-box platform and flexibly supports multi-level reuse. Developers with minimal scripting and coding experience can deploy a complete GNN model. With this platform, developers can focus on designing the architecture of GNN-based software instead of other tedious work. Moreover, GraphGallery provides comprehensive tutorials and examples, which can serve as a good starting point for beginners.

In conclusion, we make the following contributions with our GraphGallery platform:

• An extensible, user-friendly, and open-source platform for fast benchmarking and easy development.
• A series of building blocks for creating GNN-based software quickly.
• A collection of out-of-the-box GNN models for researchers and developers.

II. RELATED WORK
One of the closest analogs of GraphGallery is the PyTorch Geometric (PyG) [10] toolbox, which has recently been released as a deep learning library for irregularly structured data. PyG follows a message passing scheme and an immutable data flow paradigm, and provides the essential building blocks for creating GNNs.

Deep Graph Library (DGL) [11] is another deep learning toolbox developed for graph-structured data. DGL takes generalized sparse tensor operations as the computational schemes of GNNs and advocates the graph as the central programming abstraction.

In addition, there are other toolboxes proposed to facilitate research and development on GNNs, such as NeuGraph [12] and Spektral [13]. However, they have been built upon different deep learning frameworks, and the provided APIs are more complicated for inexperienced developers and researchers in this field.

GraphGallery is quite different from the aforementioned toolboxes in both design principles and concepts; it focuses more on deploying GNN-based software quickly and easily for developers. We show a comparison of GraphGallery and other toolboxes in Table I, where we mainly compare the key differences with PyG and DGL. Importantly, GraphGallery is designed to lower the bar of entry and accelerate research and development on GNN-based software. To this end, GraphGallery offers unified and easy-to-use APIs for building GNN models and supports various deep learning frameworks. In addition, it provides unified interfaces for more flexible customization as well as developing interfaces for implementing GNNs.

III. GRAPHGALLERY PLATFORM
The overall architecture of GraphGallery is presented in Fig. 1. GraphGallery follows an end-to-end design and provides a set of unified interfaces for developers. Such interfaces enable developers to build a custom GNN model and access the intermediate results during training and testing. With these interfaces, developers can focus on the design of the model ("Model Building" in Fig. 1), while other tedious work such as parameter tuning and breakpoint resume is done automatically by the platform. Besides, we also consider the robustness and stability of GNNs [14]; several specialized methods, such as graph purification and adversary detection, have been integrated into the inference pipeline to enhance the robustness of GNNs against adversarial attacks.

TABLE I
COMPARISON OF GRAPHGALLERY AND OTHER GNN TOOLBOXES.

Features                     PyG  DGL  GraphGallery
Built-in Dataset              ✓    ✓    ✓
Custom Dataset API            ✗    ✗    ✓
Developing Interface          ✓    ✓    ✓
Model Gallery                 ✗    ✗    ✓
PyTorch Support               ✓    ✓    ✓
TensorFlow Support            ✗    ✓    ✓
Training/Inference Pipeline   ✗    ✗    ✓

Fig. 1. Conceptual architecture of GraphGallery: unified interfaces for training/validation (early stopping, model checkpoint, logger) and inference (graph purification, adversary detection), built-in or user-defined building blocks and model gallery, and a TensorFlow/PyTorch backend indicator.

The components of GraphGallery are highly independent, which allows developers to change any component of the framework without impacting the others. For instance, developers can add custom code in a component to redefine the data flow through the model while keeping the rest of the code virtually unchanged. Such a design endows GraphGallery with maintainability, understandability, and extensibility. Moreover, developers are encouraged to extend GraphGallery to support other GNN toolboxes (see Sec. IV-D).

Similar to the popular DGL toolbox, GraphGallery adopts a framework-neutral design instead of being framework-agnostic. That is, a PyTorch model still needs some adjustments if it is to be run in TensorFlow. Since TensorFlow and PyTorch are two different deep learning frameworks and vary in many aspects, being completely framework-agnostic would require massive effort over all conceivable operators across the two frameworks, which would make the platform hard to maintain and extend. Instead, we adopt a practical approach that reduces the dependencies as much as possible and endows the platform with a higher degree of flexibility.

Figure 2 demonstrates the framework-neutral design of GraphGallery. We adopt NumPy and SciPy compressed sparse row (CSR) matrices as pure inputs, which are popular forms of data in machine learning. This is quite different from PyG and DGL, which have their own data paradigms, and it benefits the extensibility of GraphGallery. The inputs can then be transformed into various framework-specific tensors with unified interfaces. The indicator (dotted square in Fig. 2) can be switched dynamically to another backend at runtime, so developers hardly notice the differences between GNNs when working with TensorFlow or PyTorch.

Fig. 2. Framework-neutral design of GraphGallery.

IV. FEATURES OF GRAPHGALLERY
GraphGallery offers many features, including built-in and user-defined datasets, comprehensive benchmark models, support for multiple frameworks, and high extensibility.
A. Built-in and User-defined Dataset
GraphGallery has already assembled a number of benchmark datasets commonly seen in the literature. The datasets are saved as NumPy compressed NpzFile archives (zipped archives of arrays), so researchers can use them directly in their work without considering extra preprocessing. Moreover, it is much easier to define and employ custom datasets with the provided interfaces.
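As an illustration, a toy graph can be stored and reloaded in this compressed format with plain NumPy/SciPy. This is a minimal sketch: the array names below (adj_data, features, etc.) are illustrative, not GraphGallery's exact on-disk schema.

```python
import os
import tempfile
import numpy as np
import scipy.sparse as sp

# Toy 3-node path graph: adjacency as a SciPy CSR matrix, plus node
# features and labels. Array names are illustrative only.
adj = sp.csr_matrix(np.array([[0, 1, 0],
                              [1, 0, 1],
                              [0, 1, 0]], dtype=np.float32))
features = np.eye(3, dtype=np.float32)
labels = np.array([0, 1, 0])

path = os.path.join(tempfile.mkdtemp(), "toy_graph.npz")
# A CSR matrix is fully described by its data/indices/indptr/shape arrays,
# so the whole graph fits in a single compressed NpzFile.
np.savez_compressed(path,
                    adj_data=adj.data, adj_indices=adj.indices,
                    adj_indptr=adj.indptr, adj_shape=adj.shape,
                    features=features, labels=labels)

# Reload without any extra preprocessing.
loaded = np.load(path)
adj_loaded = sp.csr_matrix(
    (loaded["adj_data"], loaded["adj_indices"], loaded["adj_indptr"]),
    shape=tuple(loaded["adj_shape"]))
```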
B. Model Gallery
GraphGallery is an easy-to-use platform that provides an out-of-the-box model gallery. It suits three typical scenarios: (i) a beginner with little domain-specific knowledge who wants to understand how GNNs work; (ii) a researcher who wants to conduct benchmarking experiments; (iii) a developer who wants to deploy GNNs in existing applications/systems. In all three cases, users do not need to consider the API details of the different framework backends. Instead, they can call a complete GNN implementation from the model gallery directly with just a few lines of code.

    from graphgallery.gallery import GCN
    data = load_data()
    model = GCN(data)
    model.process(optional_paras).build(optional_paras)
    history = model.train(training_set, validation_set)
    report = model.test(testing_set)

Listing 1. An example of evaluating GCN on a graph using GraphGallery.

We showcase a typical example of GraphGallery in Listing 1, where a simple GCN [1] with default hyperparameters is deployed and tested. Developers only need to feed the dataset to initialize the corresponding model class (GCN(data)), and optionally specify the hyperparameters in the preprocessing and building methods process and build. The complete training pipeline is then applied to optimize the model, and the training details are automatically recorded (model.train). Finally, GraphGallery assesses the model on the testing set and generates the corresponding report (model.test).
C. Multiple Frameworks Support
GraphGallery supports multiple frameworks: TensorFlow and PyTorch. Developers can write code with either the TensorFlow or the PyTorch backend without considering the differences between their APIs. In addition, the model gallery introduced in Sec. IV-B has multiple versions of each implementation. Developers can dynamically switch to another backend by using graphgallery.set_backend(backend), where backend can be TensorFlow or PyTorch. For example, suppose developers want to deploy a GCN model with the PyTorch implementation; they can directly use the code in Listing 1 with one extra line, set_backend("PyTorch"), above it, and likewise for deploying the model with the TensorFlow implementation. This gives developers a high degree of flexibility and requires less effort when programming with different frameworks.
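The dispatch idea behind such backend switching can be sketched in a few lines of plain Python. This is a hypothetical, simplified registry for illustration only, not GraphGallery's actual implementation; the toy converters stand in for real tensor constructors such as tf.convert_to_tensor and torch.as_tensor.

```python
# Hypothetical, simplified sketch of backend dispatch in the spirit of
# graphgallery.set_backend -- not the library's actual implementation.
_BACKENDS = {}
_CURRENT = "TensorFlow"

def register_backend(name, astensor_fn):
    """Register a converter that turns raw inputs into backend tensors."""
    _BACKENDS[name] = astensor_fn

def set_backend(name):
    """Dynamically switch the active backend at runtime."""
    global _CURRENT
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    _CURRENT = name

def astensor(x):
    """Convert x using whichever backend is currently selected."""
    return _BACKENDS[_CURRENT](x)

# Toy converters standing in for tf.convert_to_tensor / torch.as_tensor.
register_backend("TensorFlow", lambda x: ("tf", list(x)))
register_backend("PyTorch", lambda x: ("torch", list(x)))

print(astensor([1, 2]))   # ('tf', [1, 2]) under the default backend
set_backend("PyTorch")
print(astensor([1, 2]))   # ('torch', [1, 2]) after switching
```

Because model code only ever calls astensor, the same script runs unchanged after a backend switch, which is the flexibility described above.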
D. PyG and DGL Integration
PyG and DGL can be integrated seamlessly into GraphGallery. For researchers who develop GNN models using PyG or DGL, the integration requires only a small amount of effort, since all these works have in common that training and inference can be formulated as a unified operation over reusable data flows. Alternatively, GraphGallery also provides a line of implementations using PyG and DGL, which researchers are free to use by dynamically switching the backend as in Sec. IV-C, e.g., set_backend("PyG").

V. BENCHMARK AND EVALUATION
In this section, we present numerical experiments with the proposed platform. Three mainstream GNNs, GCN [1], SGC [15], and GAT [16], are selected and evaluated on three common citation datasets (CiteSeer, Cora, and PubMed [17]) with the TensorFlow and PyTorch backends, respectively.
Setup.
In order to evaluate the correctness and reliability of the GraphGallery implementations, we compare all methods with the corresponding implementations from other GNN toolboxes: PyG and DGL. We keep the model architectures and hyperparameters the same as in the original literature or implementations. Following [1], we use fixed dataset splits and report the average accuracy and standard deviation for semi-supervised node classification over ten runs with different random seeds. The experiments are run on a single NVIDIA TITAN RTX GPU, using TensorFlow 2.1.2, PyTorch 1.6, PyG 1.6.1, and DGL 0.5.2.

TABLE II
BENCHMARK: EVALUATION RESULTS OF SELECTED MODELS IN THE MODEL GALLERY. Accuracy (mean ± standard deviation over ten runs) of GCN, SGC, and GAT on CiteSeer, Cora, and PubMed for each implementation. [Numeric table entries garbled in the source.]
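The reporting protocol above, average accuracy and standard deviation over ten runs with different random seeds, can be sketched as follows. The accuracy values here are synthetic stand-ins for the results of model.test, not measured numbers from the paper.

```python
import numpy as np

# Synthetic stand-in for ten benchmark runs: in the real protocol each
# value would come from model.train(...) / model.test(...) with a
# different random seed. These numbers are illustrative only.
accuracies = []
for seed in range(10):
    rng = np.random.default_rng(seed)
    accuracies.append(0.81 + 0.005 * rng.standard_normal())

mean = float(np.mean(accuracies)) * 100
std = float(np.std(accuracies)) * 100
print(f"accuracy: {mean:.2f} ± {std:.2f}")
```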
Reliability of implementation.
Table II shows the performance of the three GNN models with different implementations; we only show partial results of our experiments for the sake of brevity. As we can see, the accuracy values obtained by our PyTorch implementation are almost the same as those of PyG and DGL (here DGL adopts the PyTorch backend); the differences are within 0.5%. The TensorFlow backend shows a relatively larger difference than the rest due to the different deep learning frameworks. However, the maximum difference is 0.8%, and the average difference, around 0.6%, is admissible. These results show that our implementations are reliable across different frameworks and toolboxes.
Fast coding.
As described in Listing 1, a benchmarking experiment can be finished within a few lines of code. Specifically, after loading the data, the model can be set up automatically, and the training and inference pipelines are built with the provided interfaces. Developers who wish to use another framework implementation of the selected model simply need one extra line to call the set_backend function. In a word, the whole benchmarking experiment takes much less effort and can be completed within a few lines of code. In contrast, if developers run benchmarks on several models without GraphGallery, they need to implement their own training and inference procedures while handling a lot of tedious work, which involves tens of lines of code and more time. Above all, GraphGallery is a fast-coding and user-friendly platform for researchers and developers.

VI. CONCLUSION
We open-source GraphGallery, a platform for fast benchmarking and easy development of GNN-based software. GraphGallery supports multiple frameworks; it hides cumbersome details from developers and performs training and optimization automatically. It makes it easy for absolute beginners and inexperienced developers to deploy GNN-based software, and it also benefits researchers by simplifying the implementation of GNNs. In the future, we hope more researchers and developers in the deep graph learning community will use GraphGallery for software development and benchmarking experiments to accelerate their work.

VII. ACKNOWLEDGEMENT
The research is supported by the Key-Area Research and Development Program of Guangdong Province (No. 2020B010165003), the National Natural Science Foundation of China (No. U1711267), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010831), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (No. 2017ZT07X355).

REFERENCES
[1] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in ICLR, 2017.
[2] L. Chen, Y. Liu, X. He, L. Gao, and Z. Zheng, "Matching user with item set: Collaborative bundle recommendation with deep attention network," in IJCAI, 2019, pp. 2095–2101.
[3] L. Chen, Y. Liu, Z. Zheng, and P. Yu, "Heterogeneous neural attentive factorization machine for rating prediction," in CIKM. ACM, 2018, pp. 833–842.
[4] J. Zhang, X. Wang, H. Zhang, H. Sun, K. Wang, and X. Liu, "A novel neural source code representation based on abstract syntax tree," in ICSE. IEEE/ACM, 2019, pp. 783–794.
[5] N. D. Q. Bui, Y. Yu, and L. Jiang, "Autofocus: Interpreting attention-based neural networks by code perturbation," in ASE. IEEE, 2019, pp. 38–41.
[6] A. LeClair, S. Haque, L. Wu, and C. McMillan, "Improved code summarization via a graph neural network," in ICPC. ACM, 2020, pp. 184–195.
[7] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in KDD, 2018, pp. 974–983.
[8] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "Tensorflow: A system for large-scale machine learning," in OSDI, 2016, pp. 265–283.
[9] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in NeurIPS, 2019, pp. 8026–8037.
[10] M. Fey and J. E. Lenssen, "Fast graph representation learning with PyTorch Geometric," in ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[11] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, and Z. Zhang, "Deep graph library: A graph-centric, highly-performant package for graph neural networks," arXiv preprint arXiv:1909.01315, 2019.
[12] L. Ma, Z. Yang, Y. Miao, J. Xue, M. Wu, L. Zhou, and Y. Dai, "Neugraph: Parallel deep neural network computation on large graphs," in USENIX ATC. USENIX Association, 2019, pp. 443–458.
[13] D. Grattarola and C. Alippi, "Graph neural networks in tensorflow and keras with spektral," arXiv preprint arXiv:2006.12138, 2020.
[14] L. Chen, J. Li, J. Peng, T. Xie, Z. Cao, K. Xu, X. He, and Z. Zheng, "A survey of adversarial learning on graph," arXiv preprint arXiv:2003.05730, 2020.
[15] F. Wu, A. H. S. Jr., T. Zhang, C. Fifty, T. Yu, and K. Q. Weinberger, "Simplifying graph convolutional networks," in ICML, vol. 97. PMLR, 2019, pp. 6861–6871.
[16] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in ICLR, 2018.
[17] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Magazine, vol. 29, no. 3, pp. 93–106, 2008.