Md. Mostofa Ali Patwary

Publication

Featured research published by Md. Mostofa Ali Patwary.

international conference on data mining | 2011

Twitter Trending Topic Classification

Kathy Lee; Diana Palsetia; Ramanathan Narayanan; Md. Mostofa Ali Patwary; Ankit Agrawal; Alok N. Choudhary

With the increasing popularity of microblogging sites, we are in the era of information explosion. As of June 2011, about 200 million tweets are being generated every day. Although Twitter provides a real-time list of the most popular topics people tweet about, known as Trending Topics, it is often hard to understand what these trending topics are about. Therefore, it is important and necessary to classify these topics into general categories with high accuracy for better information retrieval. To address this problem, we classify Twitter Trending Topics into 18 general categories such as sports, politics, technology, etc. We experiment with 2 approaches for topic classification: (i) the well-known Bag-of-Words approach for text classification and (ii) network-based classification. In the text-based classification method, we construct word vectors from the trending topic definition and tweets, and the commonly used tf-idf weights are used to classify the topics with a Naive Bayes Multinomial classifier. In the network-based classification method, we identify the top 5 similar topics for a given topic based on the number of common influential users. The categories of the similar topics and the number of common influential users between the given topic and its similar topics are used to classify the given topic using a C5.0 decision tree learner. Experiments on a database of 768 randomly selected trending topics (over 18 classes) show that classification accuracy of up to 65% and 70% can be achieved using text-based and network-based classification modeling, respectively.

ieee international conference on high performance computing data and analytics | 2012

A new scalable parallel DBSCAN algorithm using the disjoint-set data structure

Md. Mostofa Ali Patwary; Diana Palsetia; Ankit Agrawal; Wei-keng Liao; Fredrik Manne; Alok N. Choudhary

DBSCAN is a well-known density based clustering algorithm capable of discovering arbitrary shaped clusters and eliminating noise data. However, parallelization of DBSCAN is challenging as it exhibits an inherent sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy which can easily cause an unbalanced workload and hence result in low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters. This yields a better-balanced workload distribution. We implement the algorithm both for shared and for distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups up to 25.97 using 40 cores on shared memory architecture, and speedups up to 5,765 using 8,192 cores on distributed memory architecture.
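The idea of replacing DBSCAN's sequential cluster expansion with disjoint-set unions can be illustrated with a minimal sequential sketch. It is not the PDSDBSCAN implementation: the function names (`find`, `union`, `dsdbscan`), the brute-force neighbour search, and the smaller-root linking rule are assumptions made for illustration, and no parallelism is shown.

```python
from math import dist

def find(parent, x):
    # Follow parent pointers to the root, halving the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    # Merge the two trees; the smaller root index becomes the new root.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def dsdbscan(points, eps, min_pts):
    """Return one label per point: the root of its cluster, or -1 for noise."""
    n = len(points)
    parent = list(range(n))
    # Brute-force eps-neighbourhoods (a point counts as its own neighbour).
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    assigned = [False] * n
    # Points can be visited in any order: each core point only issues unions.
    for i in range(n):
        if not core[i]:
            continue
        assigned[i] = True
        for j in neighbors[i]:
            if core[j]:
                union(parent, i, j)
            elif not assigned[j]:
                # A border point joins the first core point that reaches it.
                assigned[j] = True
                union(parent, i, j)
    return [find(parent, i) if assigned[i] else -1 for i in range(n)]
```

Because every core point performs only local union operations, the visiting order no longer matters, which is the property a parallel version can exploit.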

very large data bases | 2015

GraphMat: high performance graph analytics made productive

Narayanan Sundaram; Nadathur Satish; Md. Mostofa Ali Patwary; Subramanya R. Dulloor; Michael J. Anderson; Satya Gautam Vadlamudi; Dipankar Das; Pradeep Dubey

Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operations in the backend. We thus get the productivity benefits of a vertex programming framework without sacrificing performance. GraphMat is a single-node multicore graph framework written in C++ which has enabled us to write a diverse set of graph algorithms with the same effort compared to other vertex programming frameworks. GraphMat performs 1.1-7X faster than high performance frameworks such as GraphLab, CombBLAS and Galois. GraphMat also matches the performance of MapGraph, a GPU-based graph framework, despite running on a CPU platform with significantly lower compute and bandwidth resources. It achieves better multicore scalability (13-15X on 24 cores) than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms. Since GraphMat performance depends mainly on a few scalable and well-understood sparse matrix operations, GraphMat can naturally benefit from the trend of increasing parallelism in future hardware.
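The mapping from a vertex program to a sparse matrix operation can be sketched with single-source shortest paths, where one iteration is a sparse matrix-vector product over the (min, +) semiring. This is a pure-Python illustration under stated assumptions: `generalized_spmv`, `sssp`, and the `process`/`reduce_op` callbacks are hypothetical names, not GraphMat's C++ API, and edge weights are assumed non-negative.

```python
import math

def generalized_spmv(edges, messages, process, reduce_op):
    """One sparse matrix-vector product with user-supplied multiply and add.

    edges:    list of (src, dst, weight) triples, i.e. the nonzeros.
    messages: dict mapping each active vertex to the value it sends.
    """
    result = {}
    for src, dst, w in edges:
        if src in messages:
            val = process(messages[src], w)       # plays the role of "multiply"
            result[dst] = reduce_op(result[dst], val) if dst in result else val
    return result

def sssp(num_vertices, edges, source):
    """Shortest-path distances from source, written as a vertex program."""
    distance = [math.inf] * num_vertices
    distance[source] = 0.0
    active = {source}
    while active:
        # Each active vertex sends its current distance along its out-edges.
        messages = {v: distance[v] for v in active}
        reduced = generalized_spmv(edges, messages, lambda m, w: m + w, min)
        # Apply step: keep improved distances and re-activate those vertices.
        active = set()
        for v, cand in reduced.items():
            if cand < distance[v]:
                distance[v] = cand
                active.add(v)
    return distance
```

For example, `sssp(4, [(0, 1, 1.0), (0, 2, 4.0), (1, 2, 2.0), (2, 3, 1.0)], 0)` relaxes vertex 2 through vertex 1 on the second iteration.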

ACM Transactions on Mathematical Software | 2013

ColPack: Software for graph coloring and related problems in scientific computing

Assefaw Hadish Gebremedhin; Duc C. Nguyen; Md. Mostofa Ali Patwary; Alex Pothen

We present a suite of fast and effective algorithms, encapsulated in a software package called ColPack, for a variety of graph coloring and related problems. Many of the coloring problems model partitioning needs arising in compression-based computation of Jacobian and Hessian matrices using Algorithmic Differentiation. Several of the coloring problems also find important applications in many areas outside derivative computation, including frequency assignment in wireless networks, scheduling, facility location, and concurrency discovery and data movement operations in parallel and distributed computing. The presentation in this article includes a high-level description of the various coloring algorithms within a common design framework, a detailed treatment of the theory and efficient implementation of known as well as new vertex ordering techniques upon which the coloring algorithms rely, a discussion of the package's software design, and an illustration of its usage. The article also includes an extensive experimental study of the major algorithms in the package using real-world as well as synthetically generated graphs.

workshop on algorithms and models for the web graph | 2013

Fast Algorithms for the Maximum Clique Problem on Massive Sparse Graphs

Bharath Pattabiraman; Md. Mostofa Ali Patwary; Assefaw Hadish Gebremedhin; Wei-keng Liao; Alok N. Choudhary

The maximum clique problem is a well-known NP-hard problem with applications in data mining, network analysis, information retrieval and many other areas related to the World Wide Web. There exist several algorithms for the problem with acceptable runtimes for certain classes of graphs, but many of them are infeasible for massive graphs. We present a new exact algorithm that employs novel pruning techniques and is able to quickly find maximum cliques in large sparse graphs. Extensive experiments on different kinds of synthetic and real-world graphs show that our new algorithm can be orders of magnitude faster than existing algorithms. We also present a heuristic that runs orders of magnitude faster than the exact algorithm while providing optimal or near-optimal solutions.
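How pruning shrinks an exact clique search can be seen in a generic branch-and-bound sketch. The two pruning rules used here (a size bound and a degree bound) are standard textbook choices assumed for illustration; the abstract does not spell out the paper's own pruning techniques, and `max_clique` is a hypothetical name.

```python
def max_clique(adj):
    """Return one maximum clique of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours (no self-loops).
    """
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        while candidates:
            # Size bound: even using every candidate cannot beat the best.
            if len(clique) + len(candidates) <= len(best):
                return
            v = candidates.pop()
            # Degree bound: a vertex in a clique larger than best needs
            # at least len(best) neighbours.
            if len(adj[v]) < len(best):
                continue
            nxt = {u for u in candidates & adj[v] if len(adj[u]) >= len(best)}
            expand(clique + [v], nxt)

    expand([], set(adj))
    return best
```

On sparse graphs most vertices have low degree, so the degree bound discards them as soon as a moderately large clique has been found.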

international world wide web conferences | 2014

Fast maximum clique algorithms for large graphs

Ryan A. Rossi; David F. Gleich; Assefaw Hadish Gebremedhin; Md. Mostofa Ali Patwary

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. Despite clique's status as an NP-hard problem with poor approximation guarantees, our method exhibits nearly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Key to the efficiency of our algorithm are an initial heuristic procedure that finds a large clique quickly and a parallelized branch and bound strategy with aggressive pruning and ordering techniques. We use the algorithm to compute the largest temporal strong components of temporal contact networks.

international parallel and distributed processing symposium | 2012

Multi-core Spanning Forest Algorithms using the Disjoint-set Data Structure

Md. Mostofa Ali Patwary; Peder Refsnes; Fredrik Manne

We present new multi-core algorithms for computing spanning forests and connected components of large sparse graphs. The algorithms are based on the use of the disjoint-set data structure. When compared with the previous best algorithms for these problems, our algorithms are appealing for several reasons. Extensive experiments using up to 40 threads on several different types of graphs show that they scale better. Also, the new algorithms do not make use of any hardware-specific routines, and thus are highly portable. Finally, the algorithms are quite simple and easy to implement.

symposium on experimental and efficient algorithms | 2010

Experiments on union-find algorithms for the disjoint-set data structure

Md. Mostofa Ali Patwary; Jean R. S. Blair; Fredrik Manne

The disjoint-set data structure is used to maintain a collection of non-overlapping sets of elements from a finite universe. Algorithms that operate on this data structure are often referred to as Union-Find algorithms. They are used in numerous practical applications and are also available in several software libraries. This paper presents an extensive experimental study comparing the time required to execute 55 variations of Union-Find algorithms. The study includes all the classical algorithms, several recently suggested enhancements, and also different combinations and optimizations of these. Our results clearly show that a somewhat forgotten simple algorithm developed by Rem in 1976 is the fastest, in spite of the fact that its worst-case time complexity is inferior to that of the commonly accepted “best” algorithms.
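For readers unfamiliar with Rem's algorithm, the following is a minimal sketch of the union operation with splicing as it is commonly described. It may differ in detail from the variant measured in the study, and `rem_union` and `find` are names chosen here. The structure keeps the invariant that a parent's index is never smaller than its child's, so the two paths can be climbed in an interleaved fashion without computing either root first.

```python
def rem_union(parent, x, y):
    """Merge the sets containing x and y.

    Returns True if they were in different sets, False if already joined.
    Invariant: parent[v] >= v for every element v.
    """
    rx, ry = x, y
    while parent[rx] != parent[ry]:
        if parent[rx] < parent[ry]:
            if rx == parent[rx]:
                # rx is a root with the smaller index: link it and stop.
                parent[rx] = parent[ry]
                return True
            # Splicing: redirect rx to the higher parent, then climb.
            old = parent[rx]
            parent[rx] = parent[ry]
            rx = old
        else:
            if ry == parent[ry]:
                parent[ry] = parent[rx]
                return True
            old = parent[ry]
            parent[ry] = parent[rx]
            ry = old
    return False

def find(parent, x):
    # Plain root lookup, used here only to inspect the result.
    while parent[x] != x:
        x = parent[x]
    return x
```

Usage starts from `parent = list(range(n))`, where every element is its own root.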

International Journal of Climatology | 2010

Parallel greedy graph matching using an edge partitioning approach

Md. Mostofa Ali Patwary; Rob H. Bisseling; Fredrik Manne

We present a parallel version of the Karp-Sipser graph matching heuristic for the maximum cardinality problem. It is bulk-synchronous, separating computation and communication, and uses an edge-based partitioning of the graph, translated from a two-dimensional partitioning of the corresponding adjacency matrix. It is shown that the communication volume of Karp-Sipser graph matching is proportional to that of parallel sparse matrix-vector multiplication (SpMV), so that efficient partitioners developed for SpMV can be used. The algorithm is presented using a small basic set of 7 message types, which are discussed in detail. Experimental results show that for most matrices, edge-based partitioning is superior to vertex-based partitioning, in terms of both parallel speedup and matching quality. Good speedups are obtained on up to 64 processors.
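The sequential Karp-Sipser heuristic that the parallel version builds on can be sketched as follows. The code illustrates the basic rule only (prefer degree-1 vertices, otherwise pick a random edge); it contains no partitioning or communication, the function name and `seed` parameter are assumptions, and the random edge is drawn by choosing a random vertex and then a random neighbour, which is not uniform over edges.

```python
import random

def karp_sipser(adj, seed=0):
    """Greedy matching on an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    Returns a list of matched (u, v) pairs.
    """
    rng = random.Random(seed)
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a private copy
    matching = []

    def remove(v):
        # Delete v and all edges incident to it.
        for u in adj.pop(v):
            adj[u].discard(v)

    while True:
        live = [v for v in adj if adj[v]]        # vertices that still have edges
        if not live:
            break
        deg1 = [v for v in live if len(adj[v]) == 1]
        if deg1:
            # Matching a degree-1 vertex to its only neighbour is always safe.
            u = deg1[0]
            v = next(iter(adj[u]))
        else:
            u = rng.choice(live)
            v = rng.choice(sorted(adj[u]))
        matching.append((u, v))
        remove(u)
        remove(v)
    return matching
```

The linear scan for degree-1 vertices keeps the sketch short; a practical implementation would maintain them in a queue.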

ieee international conference on high performance computing, data, and analytics | 2015

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

Md. Mostofa Ali Patwary; Nadathur Satish; Narayanan Sundaram; Jongsoo Park; Michael J. Anderson; Satya Gautam Vadlamudi; Dipankar Das; Sergey G. Pudov; Vadim O. Pirogov; Pradeep Dubey

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many applications in High Performance Computing such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing on SpGEMM using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state-of-the-art using Intel® Xeon® processors. We are up to 3.8X faster than Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementation of SpGEMM on nVidia GTX Titan and on AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads) as compared to MKL which gets 7.5X scaling on 28 threads.
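The "dense arrays instead of hash tables" idea can be illustrated with the classic row-by-row SpGEMM formulation using a dense accumulator. This is a sequential sketch under stated assumptions, not the authors' implementation: both inputs are assumed to be in CSR format, `spgemm_csr` and its parameter names are hypothetical, and partitioning and load balancing are left out.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_cols_b):
    """Compute C = A * B for CSR inputs and return C in CSR form."""
    acc = [0.0] * n_cols_b       # dense accumulator, reused for every row
    marker = [-1] * n_cols_b     # last output row that touched each column
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        touched = []
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and add it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                if marker[j] != i:
                    # First touch in this row: reset lazily, not per row.
                    marker[j] = i
                    acc[j] = 0.0
                    touched.append(j)
                acc[j] += a * b_val[q]
        # Emit only the touched columns, in sorted order.
        for j in sorted(touched):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

Column lookups are direct array indexing rather than hash probes, at the cost of one array of length `n_cols_b` per worker; the `marker` array avoids clearing that array between rows.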
