Shaikh Arifuzzaman
Virginia Bioinformatics Institute
Publication
Featured research published by Shaikh Arifuzzaman.
conference on information and knowledge management | 2013
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
Massive networks arising in numerous application areas pose significant challenges for network analysts as these networks grow to billions of nodes and become too large to fit in main memory. Finding the number of triangles in a network is an important problem in the analysis of complex networks. Several interesting graph mining applications depend on the number of triangles in the graph. In this paper, we present an efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks. PATRIC scales well to networks with billions of nodes and can compute the exact number of triangles in a network with one billion nodes and 10 billion edges in 16 minutes. Balancing computational loads among processors for a graph problem like counting triangles is a challenging issue. We present and analyze several schemes for balancing load among processors for the triangle counting problem. These schemes achieve very good load balancing. We also show how our parallel algorithm can adapt an existing edge sparsification technique to approximate the number of triangles with very high accuracy. This modification allows us to count triangles in even larger networks.
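To make the problem concrete, the following is a minimal sequential sketch of exact triangle counting with a degree-based node ordering, plus a sampling-based estimate in the spirit of edge sparsification. It is not PATRIC itself: there is no MPI, partitioning, or load balancing here, and the function names `count_triangles` and `estimate_triangles` are hypothetical. The estimator assumes the common scheme of keeping each edge independently with probability p and scaling the sampled count by 1/p^3; the abstract does not state which sparsification technique is adapted.

```python
import random
from collections import defaultdict


def count_triangles(edges):
    """Count triangles exactly in an undirected simple graph.

    `edges` is an iterable of (u, v) pairs with comparable node ids.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # Rank nodes by (degree, id) and keep only higher-ranked neighbors,
    # so each triangle is counted once, at its lowest-ranked node.
    rank = {v: (len(adj[v]), v) for v in adj}
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    total = 0
    for v in adj:
        for w in higher[v]:
            total += len(higher[v] & higher[w])
    return total


def estimate_triangles(edges, p, seed=0):
    """Estimate the triangle count from a random edge sample.

    Assumption: each (distinct) edge is kept independently with
    probability p, so a triangle survives with probability p**3.
    """
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < p]
    return count_triangles(sampled) / p**3
```

With p = 1.0 every edge is kept, so the estimate coincides with the exact count; smaller values of p trade accuracy for a smaller graph to process.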
international conference on e-science | 2012
Sherif Elmeligy Abdelhamid; Richard Aló; Shaikh Arifuzzaman; Peter H. Beckman; Hasanuzzaman Bhuiyan; Keith R. Bisset; Edward A. Fox; Geoffrey C. Fox; Kevin Hall; S. M. Shamimul Hasan; Anurodh Joshi; Maleq Khan; Chris J. Kuhlman; Spencer J. Lee; Jonathan P. Leidig; Hemanth Makkapati; Madhav V. Marathe; Henning S. Mortveit; Judy Qiu; S. S. Ravi; Zalia Shams; Ongard Sirisaengtaksin; Rajesh Subbiah; Samarth Swarup; Nick Trebon; Anil Vullikanti; Zhao Zhao
Networks are an effective abstraction for representing real systems. Consequently, network science is increasingly used in academia and industry to solve problems in many fields. Computations that determine structure properties and dynamical behaviors of networks are useful because they give insights into the characteristics of real systems. We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs such computations, with the following features: (i) it offers realistic networks from the literature and various random and deterministic network generators; (ii) it provides many algorithmic modules and measures to study and characterize networks; (iii) it is designed for efficient execution of complex algorithms on distributed high performance computers so that they scale to large networks; and (iv) it is hosted with web interfaces so that those without direct access to high performance computing resources and those who are not computing experts can still reap the system benefits. It is a combination of application design and cyberinfrastructure that makes these features possible. To our knowledge, these capabilities collectively make CINET novel. We describe the system and illustrative use cases, with a focus on the CINET user.
international conference on big data | 2015
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
Finding the number of triangles in a graph (network) is an important problem in graph analysis. The number of triangles also has important applications in graph mining. Big graphs emerging from numerous application areas pose a significant challenge for analysis and mining since these graphs consist of millions, or even billions, of nodes and edges. Graphs of such scale necessitate the development of efficient parallel algorithms. Existing distributed memory parallel algorithms for counting exact triangles are either Map-Reduce or message passing interface (MPI) based. Map-Reduce based algorithms generate prohibitively large intermediate data and do not demonstrate reasonably good runtime efficiency. The MPI based algorithms offer fast computation of the number of triangles. However, the partitioning and load balancing schemes these algorithms employ are static in nature: the partitions are precomputed based on some estimations. In this paper, we present an efficient MPI-based parallel algorithm for counting triangles in large graphs. We consider the case where the main memory of each compute node is large enough to contain the entire graph. We observe that for such a case, computation load can be balanced dynamically and present a dynamic load balancing scheme which improves the performance of the algorithm significantly. Our algorithm demonstrates very good speedups and scales to a large number of processors. The algorithm computes the exact number of triangles in a network with 1 billion edges in 2 minutes with only 100 processors. Our results demonstrate that the algorithm is significantly faster than the related algorithms with static partitioning. In fact, for the real-world networks we experimented on, our algorithm runs at least twice as fast as the fastest algorithm with static load balancing.
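The difference between static and dynamic load balancing can be illustrated with a small simulation. This is a generic shared-work-queue sketch, not the scheme from the paper (the abstract does not describe its details): `costs` is a hypothetical list of per-task running times, and the two functions compare the finish time of a precomputed equal-sized split against workers that pull the next task whenever they become idle.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when tasks are split into equal-sized contiguous blocks."""
    if not costs:
        return 0
    size = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + size]) for i in range(0, len(costs), size))


def dynamic_makespan(costs, workers):
    """Finish time when each idle worker pulls the next task from a shared queue."""
    free_at = [0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # earliest idle worker takes the task
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# One expensive task (e.g. a very high-degree node) followed by cheap ones.
costs = [9, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, 2))   # 12
print(dynamic_makespan(costs, 2))  # 9
```

In the static split, the worker that receives the expensive task also keeps its share of cheap ones, while in the dynamic case the other worker absorbs all remaining tasks. Skewed degree distributions in real networks produce exactly this kind of uneven task cost.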
high performance computing and communications | 2015
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
Finding the number of triangles in a network (graph) is an important problem in mining and analysis of complex networks. Massive networks emerging from numerous application areas pose a significant challenge in network analytics since these networks consist of millions, or even billions, of nodes and edges. Such massive networks necessitate the development of efficient parallel algorithms. There exist several MapReduce based algorithms and only one MPI (Message Passing Interface) based distributed-memory parallel algorithm for counting triangles. MapReduce based algorithms generate prohibitively large intermediate data. The MPI based algorithm can work on quite large networks; however, the overlapping partitions employed by the algorithm limit its capability to deal with very massive networks. In this paper, we present a space-efficient MPI based parallel algorithm for counting the exact number of triangles in massive networks. The algorithm divides the network into non-overlapping partitions. Our results demonstrate up to 25-fold space saving over the algorithm with overlapping partitions. This space efficiency allows the algorithm to deal with networks which are 25 times larger. We present a novel approach that reduces communication cost drastically (up to 90%), leading to an algorithm that is both space- and runtime-efficient. Our adaptation of a parallel partitioning scheme by computing a novel weight function adds further to the efficiency of the algorithm. Denoting the average degree of nodes and the number of partitions by d and P, respectively, our algorithm achieves up to O(P^2)-factor space efficiency over existing MapReduce based algorithms and up to d-factor (approx.) over the algorithm with overlapping partitioning.
ieee international conference on high performance computing data and analytics | 2012
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
We present MPI-based parallel algorithms for counting triangles and computing clustering coefficients in massive networks. Counting triangles is important in the analysis of various networks, e.g., social, biological, and web networks. Emerging massive networks do not fit in the main memory of a single machine and are very challenging to work with. Our distributed-memory parallel algorithm allows us to deal with such massive networks in a time- and space-efficient manner. We were able to count triangles in a graph with 2 billion nodes and 50 billion edges in 10 minutes. Our parallel algorithm for computing clustering coefficients uses efficient external memory aggregation. We also show how an edge sparsification technique can be used with our parallel algorithm to find the approximate number of triangles without sacrificing the accuracy of estimation. In addition, we propose a simple modification of a state-of-the-art sequential algorithm that improves both runtime and space requirements.
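For reference, the local clustering coefficient of a node v with degree d_v is commonly defined as C_v = 2 T_v / (d_v (d_v - 1)), where T_v is the number of triangles containing v. The sketch below computes it sequentially and in memory; it does not reproduce the parallel algorithm or the external memory aggregation mentioned in the abstract, and the function name `clustering_coefficients` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficients(edges):
    """Return {node: local clustering coefficient} for an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    coeffs = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            coeffs[v] = 0.0  # convention: no neighbor pairs, coefficient 0
            continue
        # T_v equals the number of edges among the neighbors of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[v] = 2.0 * links / (d * (d - 1))
    return coeffs
```

For a triangle on nodes 0, 1, 2 with an extra pendant edge (2, 3), node 0 has coefficient 1, node 2 has coefficient 1/3, and node 3 has coefficient 0 under the convention used here.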
arXiv: Distributed, Parallel, and Cluster Computing | 2017
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
arXiv: Distributed, Parallel, and Cluster Computing | 2014
Shaikh Arifuzzaman; Maleq Khan; Madhav V. Marathe
dependable autonomic and secure computing | 2017
Shaikh Arifuzzaman; Bikesh Pandey
Archive | 2016
Shaikh Arifuzzaman
ieee international conference on high performance computing data and analytics | 2015
Shaikh Arifuzzaman; Maleq Khan