Hassan Chafi | Researchain

Publication

Featured research published by Hassan Chafi.

International Conference on Management of Data | 2015

The LDBC Social Network Benchmark: Interactive Workload

Orri Erling; Alex Averbuch; Josep-lluis Larriba-pey; Hassan Chafi; Andrey Gubichev; Arnau Prat; Minh-Duc Pham; Peter A. Boncz

The Linked Data Benchmark Council (LDBC) is now two years underway and has gathered strong industrial participation for its mission to establish benchmarks and benchmarking practices for evaluating graph data management systems. The LDBC introduced a new choke-point-driven methodology for developing benchmark workloads, which combines user input with input from expert systems architects; we outline this methodology. This paper describes the LDBC Social Network Benchmark (SNB) and presents database benchmarking innovation in terms of graph query functionality tested, correlated graph generation techniques, and a scalable benchmark driver on a workload with complex graph dependencies. SNB has three query workloads under development: Interactive, Business Intelligence, and Graph Algorithms. We describe the SNB Interactive Workload in detail, illustrate it with some early results, and state the goals for the two other workloads.

IEEE International Conference on High Performance Computing Data and Analytics | 2015

PGX.D: a fast distributed graph processing engine

Sungpack Hong; Siegfried Depner; Thomas Manhardt; Jan Van Der Lugt; Merijn Verstraaten; Hassan Chafi

Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D significantly outperforms other distributed graph systems such as GraphLab (by 3x to 90x). Furthermore, PGX.D on 4 to 16 machines is also faster than an implementation optimized for single-machine execution. Using a fast cooperative context-switching mechanism, we implement PGX.D as a low-overhead, bandwidth-efficient communication framework that supports remote data-pulling patterns. Moreover, PGX.D achieves large traffic reduction and good workload balance by applying selective ghost nodes, edge partitioning, and edge chunking transparently to the user. Our analysis confirms that each of these features is indeed crucial for the overall performance of certain kinds of graph algorithms. Finally, we advocate the use of balanced beefy clusters where the sustained random DRAM-access bandwidth in aggregate is matched with the bandwidth of the underlying interconnection fabric.

Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems | 2016

PGQL: a property graph query language

Oskar van Rest; Sungpack Hong; Jinha Kim; Xuming Meng; Hassan Chafi

Graph-based approaches to data analysis have become more widespread, which has created a need for a query language for graphs. Such a graph query language needs not only SQL-like functionality for querying structured data, but also intrinsic support for typical graph-style applications: reachability analysis, path finding, and graph construction. We propose a new query language for the popular Property Graph (PG) data model: the Property Graph Query Language (PGQL). PGQL is based on the paradigm of graph pattern matching, closely follows syntactic structures of SQL, and provides regular path queries with conditions on labels and properties to allow for reachability and path finding queries. Besides intrinsic vertex, edge, and path types, PGQL also has the graph as an intrinsic type and allows for graph construction and query composition.

Very Large Data Bases | 2015

Taming subgraph isomorphism for RDF query processing

Jin-Ha Kim; Hyungyu Shin; Wook-Shin Han; Sungpack Hong; Hassan Chafi

RDF data are used to model knowledge in various areas such as life sciences, Semantic Web, bioinformatics, and social graphs. The size of real RDF data reaches billions of triples. This calls for a framework for efficiently processing RDF data. The core function of processing RDF data is subgraph pattern matching. There have been two completely different directions for supporting efficient subgraph pattern matching. One direction, pursued for the last decade, is to develop specialized RDF query processing engines that exploit the properties of RDF data; the other, pursued for over 30 years, is to develop efficient subgraph isomorphism algorithms for general, labeled graphs. Although both directions have a similar goal (i.e., finding subgraphs in data graphs for a given query graph), they have been researched independently without clear reason. We argue that a subgraph isomorphism algorithm can be easily modified to handle graph homomorphism, which is the RDF pattern matching semantics, by just removing the injectivity constraint. In this paper, based on the state-of-the-art subgraph isomorphism algorithm, we propose an in-memory solution, TurboHOM++, which is tamed for RDF processing, and we compare it with the representative RDF processing engines on several RDF benchmarks on a server machine where billions of triples can be loaded in memory. In order to speed up TurboHOM++, we also provide a simple yet effective transformation and a series of optimization techniques. Extensive experiments using several RDF benchmarks show that TurboHOM++ consistently and significantly outperforms the representative RDF engines. Specifically, TurboHOM++ outperforms its competitors by up to five orders of magnitude.
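
The relationship between the two matching semantics mentioned in this abstract can be illustrated with a small sketch. The code below is a naive backtracking matcher written for illustration only; it is not TurboHOM++ and uses none of its optimizations. The function name `match` and the `injective` flag are hypothetical, and "isomorphism" is taken here in the edge-preserving (non-induced) sense. With `injective=True` distinct query vertices must map to distinct data vertices; with `injective=False` that single check is dropped, giving homomorphism semantics.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def match(query_edges: List[Edge], data_edges: List[Edge],
          injective: bool) -> List[Dict[str, str]]:
    """Enumerate mappings of query vertices to data vertices that
    preserve every directed query edge (naive backtracking)."""
    data_edge_set = set(data_edges)
    data_vertices = sorted({v for e in data_edges for v in e})
    query_vertices = sorted({v for e in query_edges for v in e})
    results: List[Dict[str, str]] = []

    def consistent(mapping: Dict[str, str]) -> bool:
        # Every query edge whose endpoints are both mapped must exist in the data.
        return all((mapping[u], mapping[v]) in data_edge_set
                   for u, v in query_edges
                   if u in mapping and v in mapping)

    def extend(i: int, mapping: Dict[str, str]) -> None:
        if i == len(query_vertices):
            results.append(dict(mapping))
            return
        q = query_vertices[i]
        for d in data_vertices:
            # The injectivity constraint: the only difference between
            # the two semantics in this sketch.
            if injective and d in mapping.values():
                continue
            mapping[q] = d
            if consistent(mapping):
                extend(i + 1, mapping)
            del mapping[q]

    extend(0, {})
    return results


# A two-edge path query against a two-vertex cycle.
query = [("x", "y"), ("y", "z")]
data = [("1", "2"), ("2", "1")]
iso = match(query, data, injective=True)    # no match: three distinct vertices needed
hom = match(query, data, injective=False)   # x and z may share a data vertex
```

On this input the injective variant finds nothing because the data graph has only two vertices, while the homomorphism variant finds two mappings in which `x` and `z` coincide.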

First International Workshop on Graph Data Management Experiences and Systems | 2013

Graph analysis: do we have to reinvent the wheel?

Adam Welc; Raghavan Raman; Zhe Wu; Sungpack Hong; Hassan Chafi; Jay Banerjee

The problem of efficiently analyzing graphs of various shapes and sizes has recently been enjoying an increased level of attention in both academia and industry. This trend prompted the creation of specialized graph databases that have been rapidly gaining popularity of late. In this paper we argue that there exist alternatives to graph databases that provide competitive or superior performance and do not require the companies wishing to deploy them to replace their entire existing storage infrastructure.

Social Network Mining and Analysis | 2014

Fast In-Memory Triangle Listing for Large Real-World Graphs

Martin Sevenich; Sungpack Hong; Adam Welc; Hassan Chafi

Triangle listing, or identifying all the triangles in an undirected graph, is a very important graph problem that serves as a building block of many other graph algorithms. The compute-intensive nature of the problem, however, necessitates an efficient method to solve it, especially for large real-world graphs. In this paper we propose a fast and precise in-memory solution for the triangle listing problem. Our solution includes fast methods for finding common neighborhoods that take into account the power-law degree distribution of real-world graphs. We prove how the theoretical lower bound can be achieved by sorting the nodes in the graph by their degree and applying pruning. We explain how our techniques can be applied automatically by an optimizing DSL compiler. Our experiments show that hundreds of billions of triangles in a five-billion-edge graph can be enumerated in about a minute with a single server-class machine.
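
The idea of sorting nodes by degree before intersecting neighborhoods can be sketched as follows. This is the generic degree-ordering technique for triangle listing, written as a minimal illustration; it is not the authors' implementation and omits their pruning and common-neighbor optimizations. The function name `list_triangles` is hypothetical, and vertices are assumed to be integers so that ties in degree can be broken by vertex id.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def list_triangles(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """List each triangle of an undirected graph exactly once."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, id); low-degree vertices come first.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}

    # Orient every edge from lower rank to higher rank, so that
    # high-degree hubs keep only short outgoing neighbor sets.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in order:
        for v in out[u]:
            # Common higher-ranked neighbors of u and v close a triangle.
            for w in out[u] & out[v]:
                triangles.append((u, v, w))
    return triangles


# A 4-clique on vertices 0..3 plus one pendant edge (3, 4).
example_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
example_triangles = list_triangles(example_edges)
```

Because each edge is oriented toward the higher-ranked endpoint, a triangle is reported only from its lowest-ranked vertex, so no deduplication pass is needed.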

Proceedings of Workshop on GRAph Data management Experiences and Systems | 2014

PGX.ISO: Parallel and Efficient In-Memory Engine for Subgraph Isomorphism

Raghavan Raman; Oskar van Rest; Sungpack Hong; Zhe Wu; Hassan Chafi; Jay Banerjee

Subgraph isomorphism, or finding matching patterns in a graph, is a classic graph problem that has many practical use cases. There are even commercialized solutions for this problem, such as RDF databases with their support for SPARQL queries. In this paper, we present an efficient, parallel in-memory solution to this problem. Our solution exploits efficient data representations as well as algorithmic extensions, both tailored for parallel, in-memory processing. Moreover, when processing RDF data, we reduce the problem size by converting certain nodes and edges into properties. We also propose a new graph query language in which such a conversion can be encoded. Our evaluation shows that our solution can achieve a significant performance boost over an existing secondary-storage-based RDF database.

Very Large Data Bases | 2016

Using domain-specific languages for analytic graph databases

Martin Sevenich; Sungpack Hong; Oskar van Rest; Zhe Wu; Jayanta Banerjee; Hassan Chafi

Recently, graphs have been drawing a lot of attention both as a natural data model that captures fine-grained relationships between data entities and as a tool for powerful data analysis that considers such relationships. In this paper, we present a new graph database system that integrates robust graph storage with an efficient graph analytics engine. Primarily, our system adopts two domain-specific languages (DSLs), one for describing graph analysis algorithms and the other for graph pattern matching queries. Compared to the API-based approaches in conventional graph processing systems, the DSL-based approach provides users with more flexible and intuitive ways of expressing algorithms and queries. Moreover, the DSL-based approach has significant performance benefits as well, (1) by skipping (remote) API invocation overhead and (2) by applying high-level optimization from the compiler.

First International Workshop on Graph Data Management Experiences and Systems | 2013

Early experiences in using a domain-specific language for large-scale graph analysis

Sungpack Hong; Jan Van Der Lugt; Adam Welc; Raghavan Raman; Hassan Chafi

Large-scale graph analysis has recently been drawing a lot of attention from both industry and academia. Although there are already several frameworks designed for scalable graph analysis, e.g. Giraph [1], all these frameworks adopt non-traditional programming models and APIs. This can significantly lower the productivity of the framework user. This paper discusses the feasibility of using an intuitive Domain-Specific Language (DSL) for graph analysis. Specifically, we use a compiler to translate Green-Marl [5] programs into an equivalent Giraph application, automatically bridging between very different programming models. We observe that the DSL programs are concise and intuitive, and that the compiler-generated Giraph implementations exhibit performance on par with that of hand-written ones. However, the DSL compilation inevitably fails if the algorithm is fundamentally incompatible with the target framework. Overall, we believe that the DSL-based approach will provide great productivity benefits once it matures.

Journal of Parallel and Distributed Computing | 2017

Modeling, analysis, and experimental comparison of streaming graph-partitioning policies

Yong Guo; Sungpack Hong; Hassan Chafi; Alexandru Iosup; Dick H. J. Epema

In recent years, many distributed graph-processing systems have been designed and developed to analyze large-scale graphs. For all distributed graph-processing systems, partitioning graphs is a key part of processing and an important aspect of achieving good processing performance. To keep the overhead of partitioning graphs low, even when processing ever-larger modern graphs, many previous studies use lightweight streaming graph-partitioning policies. Although many such policies exist, currently there is no comprehensive study of their impact on load balancing and communication overheads, and on the overall performance of graph-processing systems. This relative lack of understanding hampers the development and tuning of new streaming policies, and could limit the entire research community to the existing classes of policies. We address these issues in this work. We begin by modeling the execution time of distributed graph-processing systems. By analyzing this model under the load of realistic graph-data characteristics, we propose a method to identify important performance issues and then design new streaming graph-partitioning policies to address them. By using three typical large-scale graphs and three popular graph-processing algorithms, we conduct comprehensive experiments to study the performance of our policies and of many alternative streaming policies on a real distributed graph-processing system. We also explore the impact on performance of using different real-world networks and of other real-world technical details. We further discuss how to use our results, the coverage of our model and method, and the design of future partitioning policies.
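
To make the notion of a streaming graph-partitioning policy concrete, the sketch below implements one widely known heuristic, linear deterministic greedy: each vertex arrives once with its neighbor list and is placed immediately in the partition that already holds most of its neighbors, discounted by how full that partition is. This is an illustrative example only, not one of the policies designed or evaluated in the paper; the function name `ldg_partition` and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def ldg_partition(stream: Iterable[Tuple[int, List[int]]],
                  k: int, capacity: int) -> Dict[int, int]:
    """Assign each streamed vertex to one of k partitions in a single pass."""
    assignment: Dict[int, int] = {}
    sizes = [0] * k
    for vertex, neighbors in stream:
        best, best_score = None, None
        for p in range(k):
            if sizes[p] >= capacity:
                continue  # partition is full
            # Neighbors already placed in p, weighted by remaining room in p.
            shared = sum(1 for n in neighbors if assignment.get(n) == p)
            score = shared * (1.0 - sizes[p] / capacity)
            # Prefer higher score; break ties toward the emptier partition.
            if (best is None or score > best_score
                    or (score == best_score and sizes[p] < sizes[best])):
                best, best_score = p, score
        if best is None:
            raise ValueError("total capacity is too small for the stream")
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


# Two disjoint triangles streamed vertex by vertex.
example_stream = [(0, [1, 2]), (1, [0, 2]), (2, [0, 1]),
                  (3, [4, 5]), (4, [3, 5]), (5, [3, 4])]
example_parts = ldg_partition(example_stream, k=2, capacity=3)
```

The policy sees each vertex only once and never revisits a decision, which is what keeps the partitioning overhead low; the trade-off between load balance and cut edges is governed entirely by the scoring rule.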