The VLDB Journal | 2021

Distributed temporal graph analytics with GRADOOP


Abstract


Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large because they store historical information, which calls for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs which has been continuously developed since 2005. Its graph model, TPGM, allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Because Gradoop is built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators such as snapshot, difference, pattern matching, and graph grouping, and several implementation details. We evaluate the performance and scalability of selected operators and a composed workflow for synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.

Pages 1-27
DOI 10.1007/S00778-021-00667-4
Language English
Journal The VLDB Journal

Full Text