Ahmed Shamsul Arefin
University of Newcastle
Publication
Featured research published by Ahmed Shamsul Arefin.
PLOS ONE | 2012
Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
Background The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. These exploding volumes of biological data demand extreme computational power and special computing facilities (e.g. supercomputers). An inexpensive solution, such as General-Purpose computation on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limited internal memory of the device poses a new scalability problem. Efficient data and computational parallelism, combined with partitioning, is required to provide a fast and scalable solution. Results We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem; kNN is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Although kNN search is simple and straightforward, its performance degrades dramatically for large data sets because the task is computationally intensive. The proposed approach is not only fast but also scales to large instances. Based on this approach, we implemented a software tool, GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour), for CUDA-enabled GPUs. The basic approach is simple and adaptable to other GPU architectures. We observed speed-ups of 50–60 times over a CPU implementation on a well-known breast microarray study and its associated data sets. Conclusion Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest-neighbour computation in large-scale networks. The source code and software tool are available under the GNU General Public License (GPL) at https://sourceforge.net/p/gpufsknn/.
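The partitioning idea behind this kind of scalable kNN search can be illustrated with a small CPU-side Python sketch. It is not GPU-FS-kNN itself: we assume Euclidean distance, and the names `chunked_knn` and `euclidean` are hypothetical. The reference set is streamed in chunks, as a memory-limited device would receive it, while a bounded heap per query keeps only the k best candidates seen so far.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chunked_knn(points, k, chunk_size):
    """For each point, return the indices of its k nearest neighbours (itself excluded).

    The reference set is scanned one chunk of `chunk_size` rows at a time,
    mimicking a device that can only hold one partition in memory.
    """
    n = len(points)
    # One heap per query holding (-distance, -index); heap[0] is the worst kept candidate.
    heaps = [[] for _ in range(n)]
    for start in range(0, n, chunk_size):
        chunk = range(start, min(start + chunk_size, n))
        for q in range(n):
            for j in chunk:
                if j == q:
                    continue
                item = (-euclidean(points[q], points[j]), -j)
                if len(heaps[q]) < k:
                    heapq.heappush(heaps[q], item)
                elif item > heaps[q][0]:
                    # Closer (or equally close with a smaller index): evict the worst.
                    heapq.heapreplace(heaps[q], item)
    # Report each query's survivors by increasing distance, then by index.
    return [[-j for _, j in sorted(h, key=lambda t: (-t[0], -t[1]))] for h in heaps]
```

Because each query's heap only ever keeps the best candidates seen so far, the chunk size changes memory use but not the result; ties are broken in favour of the smaller index.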
Neurogenetics | 2014
Michaela D. Filiou; Ahmed Shamsul Arefin; Pablo Moscato; Manuel B. Graeber
‘Neuroinflammation’ has become a widely applied term in the basic and clinical neurosciences, but there is no generally accepted neuropathological tissue correlate. Inflammation, characterized by the presence of perivascular infiltrates of cells of the adaptive immune system, is indeed seen in the central nervous system (CNS) under certain conditions. Authors who refer to microglial activation as neuroinflammation confuse this issue, because autoimmune neuroinflammation serves as a synonym for multiple sclerosis, the prototypical inflammatory disease of the CNS. We asked whether a data-driven, unbiased in silico approach could help to clarify this nomenclatorial confusion. Specifically, we examined whether unsupervised analysis of microarray data obtained from the human cerebral cortex of Alzheimer’s, Parkinson’s and schizophrenia patients would reveal a degree of relatedness between these diseases and recognized inflammatory conditions, including multiple sclerosis. Our results, obtained with two different data analysis methods, provide strong evidence against this hypothesis, demonstrating that very different sets of genes are involved. Consequently, the designations ‘inflammation’ and ‘neuroinflammation’ are not interchangeable. They represent different categories not only at the histophenotypic but also at the transcriptomic level. Therefore, non-autoimmune neuroinflammation remains a term in need of definition.
PLOS ONE | 2012
Ahmed Shamsul Arefin; Luke Mathieson; Daniel M. Johnstone; Regina Berretta; Pablo Moscato
Background One primary goal of transcriptomic studies is identifying gene expression patterns that correlate with disease progression. This is usually achieved by considering transcripts that independently pass an arbitrary threshold (e.g. p<0.05). In diseases involving severe perturbations of multiple molecular systems, such as Alzheimer’s disease (AD), this univariate approach often results in a large list of seemingly unrelated transcripts. We utilised a powerful multivariate clustering approach to identify clusters of RNA biomarkers strongly associated with markers of AD progression. We discuss the value of considering pairs of transcripts, which, in contrast to individual transcripts, helps avoid natural human transcriptome variation that can overshadow disease-related changes. Methodology/Principal Findings We re-analysed a dataset of hippocampal transcript levels in nine controls and 22 patients with varying degrees of AD. A large-scale clustering approach determined groups of transcript probe sets that correlate strongly with measures of AD progression, including both clinical and neuropathological measures and quantifiers of the characteristic transcriptome shift from control to severe AD. This enabled identification of restricted groups of highly correlated probe sets from an initial list of 1,372 previously published by our group. We repeated this analysis on an expanded dataset that included all pair-wise combinations of the 1,372 probe sets. As clustering this massive dataset is infeasible with standard computational tools, we adapted and re-implemented a clustering algorithm that uses an external-memory algorithmic approach. This identified various pairs that strongly correlated with markers of AD progression and highlighted important biological pathways potentially involved in AD pathogenesis.
Conclusions/Significance Our analyses demonstrate that, although there exists a relatively large molecular signature of AD progression, only a small number of transcripts recurrently cluster with different markers of AD progression. Furthermore, considering the relationship between two transcripts can highlight important biological relationships that are missed when considering either transcript in isolation.
International Conference on Algorithms and Architectures for Parallel Processing | 2011
Ahmed Shamsul Arefin; Mario Inostroza-Ponta; Luke Mathieson; Regina Berretta; Pablo Moscato
Novel analytical techniques have dramatically enhanced our understanding of many application domains, including biological networks inferred from gene expression studies. However, there are clear computational challenges associated with the large datasets generated by these studies. Some NP-hard combinatorial optimization problems that naturally arise in the analysis of large networks are difficult to solve without specialized computing facilities (e.g. supercomputers). In this work, we address the data clustering problem for large-scale biological networks with a polynomial-time algorithm that uses reasonable computing resources and is limited by the available memory. We adapted and improved the MSTkNN graph partitioning algorithm and redesigned it to take advantage of external memory (EM) algorithms. We evaluate the scalability and performance of the proposed algorithm on a well-known breast cancer microarray study and its associated dataset.
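One step of an MST-kNN-style partition can be sketched in Python under our own simplifying assumptions: Euclidean distance, a fixed k, and the rule that an MST edge survives only if one endpoint is among the other's k nearest neighbours. As we understand it, the published method chooses k from the data and applies the step recursively to each component; neither that nor the external-memory machinery is modeled here, and `mst_knn_step` and `prim_mst` are hypothetical names.

```python
import math


def prim_mst(points, dist):
    """Prim's algorithm on the complete graph; returns MST edges as (parent, child)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    link = [-1] * n        # tree vertex providing that connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] != -1:
            edges.append((link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


def mst_knn_step(points, k, dist=math.dist):
    """Keep only MST edges whose endpoints are kNN-related; return the resulting clusters."""
    n = len(points)
    nearest = [
        set(sorted((j for j in range(n) if j != i),
                   key=lambda j: (dist(points[i], points[j]), j))[:k])
        for i in range(n)
    ]
    kept = [(u, v) for u, v in prim_mst(points, dist)
            if v in nearest[u] or u in nearest[v]]
    root = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v in kept:
        root[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

On two well-separated groups of points, the long MST edge bridging them fails the kNN test and is dropped, so the step returns the two groups as separate clusters.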
International Conference on Computer Science and Education | 2012
Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
Data clustering is a distinctive method for analyzing complex networks in terms of the functional relationships among their constituent elements. A number of graph-based algorithms have been proposed to tackle the complexity of the problem, and many of them represent the data as a minimum spanning tree (MST). In this work, we propose a graph-based agglomerative clustering method based on k-Nearest Neighbor (kNN) graphs and Borůvka's MST algorithm, termed kNN-MST-Agglomerative. The proposed method is inherently parallel and applicable to a wide class of practical problems involving large datasets. We demonstrate its performance on a set of real-world biological networks constructed from a renowned breast cancer study.
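How Borůvka rounds yield an agglomerative hierarchy can be sketched in a few lines of Python. This is an illustration, not kNN-MST-Agglomerative itself: the input is assumed to be a precomputed weighted kNN graph given as `(weight, u, v)` tuples, and `boruvka_levels` is a hypothetical name. Each round merges every cluster with its cheapest neighbouring cluster, and the partition after each round forms one level of the hierarchy.

```python
def boruvka_levels(n, edges):
    """Agglomerative levels from Borůvka rounds on a weighted graph over vertices 0..n-1.

    Returns the partition after every round, starting from singletons.
    """
    edges = list(edges)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def partition():
        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())

    levels = [partition()]
    while True:
        cheapest = {}
        for e in edges:
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Comparing whole tuples breaks weight ties consistently, so no cycles form.
            for r in (ru, rv):
                if r not in cheapest or e < cheapest[r]:
                    cheapest[r] = e
        if not cheapest:
            return levels  # no edge joins two clusters: the hierarchy is complete
        for _, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
        levels.append(partition())
```

If the kNN graph is disconnected, the last level contains one cluster per connected component rather than a single cluster.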
International Conference on Computational Science and Its Applications | 2012
Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
Computation of the minimum spanning tree (MST) is a common task in numerous fields of research, such as pattern recognition, computer vision, network design (telephone, electrical, hydraulic, cable TV, computer and road networks) and VLSI layout. However, for large-scale datasets whose graphs are complete, classical MST algorithms become unsuitable on general-purpose computers. In such cases, the k-nearest neighbor (kNN) structure can provide an efficient solution. Only a few attempts in the literature focus on solving the problem using kNNs. This paper briefly reviews state-of-the-art strategies for the MST problem and presents a fast and scalable solution that combines the classical Borůvka's MST algorithm with the kNN graph structure. The proposed algorithm is implemented for CUDA-enabled GPUs (kNN-Borůvka-GPU), but the basic approach is simple and adaptable to other architectures. Speed-ups of 30–40 times over a CPU implementation were observed for several large-scale synthetic and real-world data sets.
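A minimal sequential Python sketch of combining a kNN graph with Borůvka's algorithm, assuming Euclidean distance and an undirected kNN graph (edge {i, j} whenever either point is among the other's k nearest). The names `knn_graph` and `boruvka_msf` are hypothetical, and nothing here reflects the CUDA implementation. Because the kNN graph can be disconnected for small k, the result is in general a minimum spanning forest.

```python
import math


def knn_graph(points, k):
    """Undirected kNN graph as sorted (weight, u, v) tuples with u < v."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: (math.dist(p, points[j]), j))[:k]
        for j in nearest:
            u, v = min(i, j), max(i, j)
            edges.add((math.dist(points[u], points[v]), u, v))
    return sorted(edges)


def boruvka_msf(n, edges):
    """Borůvka's algorithm; returns a minimum spanning forest of the given graph."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    while True:
        # Cheapest outgoing edge of every component; full-tuple order breaks ties.
        cheapest = {}
        for e in edges:
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in cheapest or e < cheapest[r]:
                        cheapest[r] = e
        if not cheapest:
            return sorted(forest)
        for e in set(cheapest.values()):
            _, u, v = e
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append(e)
```

For example, for the points (0,), (1,), (3,), (10,), (11,), k = 2 yields the four MST edges of total weight 11, whereas k = 1 yields a forest of two trees (three edges).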
Computer and Information Technology | 2007
Ahmed Shamsul Arefin; M.A. Kashem Mia
The minimum edge-ranking spanning tree (MERST) problem on a graph G is to find a spanning tree of G whose edge-ranking needs the fewest ranks. Although a polynomial-time algorithm has been found for the minimum edge-ranking spanning tree problem on series-parallel graphs with bounded degrees, no polynomial-time algorithm is known for unbounded degrees. In this paper, we prove that the minimum edge-ranking spanning tree problem on general series-parallel graphs is NP-complete.
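To make the definition concrete: an edge-ranking of a graph labels its edges with positive integers so that every path between two edges with the same label contains an intermediate edge with a larger label. The hypothetical Python helper below only checks whether a given labeling of a tree is a valid edge-ranking; finding a spanning tree whose ranking uses the fewest ranks is the hard problem the paper addresses and is not attempted here.

```python
from collections import deque
from itertools import combinations


def tree_path(adj, s, t):
    """Vertices on the unique s-t path in a tree, found by BFS."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            break
        for y in adj[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path = [t]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def is_edge_ranking(rank):
    """rank maps frozenset({u, v}) -> positive int, for every edge of a tree."""
    adj = {}
    for e in rank:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    for e1, e2 in combinations(rank, 2):
        if rank[e1] != rank[e2]:
            continue
        # The shortest endpoint-to-endpoint path joins the two edges without using either.
        path = min((tree_path(adj, a, b) for a in e1 for b in e2), key=len)
        between = [frozenset(p) for p in zip(path, path[1:])]
        # Valid only if some intermediate edge has a strictly larger label;
        # adjacent edges (empty `between`) with equal labels always fail.
        if all(rank[x] <= rank[e1] for x in between):
            return False
    return True
```

For example, on the path 1-2-3-4 the labels 1, 2, 1 form a valid edge-ranking, while 1, 1, 2 do not, because the first two edges share a vertex.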
Australasian Conference on Artificial Life and Computational Intelligence | 2015
Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
In this work, we incorporate new edges obtained from a paraclique-identification approach into the output of the MST-kNN graph partitioning method. We present a statistical analysis of the results on a dataset that originated from a computational linguistic study of 84 Indo-European languages. We also present results from a computational stylistic study of 168 plays of the Shakespearean era. For the latter, the Kruskal-Wallis test (observed vs. all permutations) gave a p-value of 1.62E-11 and the Wilcoxon test a p-value of 8.1E-12. Overall, our results clearly show in both cases that the modified approach yields statistically more significant results than MST-kNN alone, providing a highly scalable and statistically sound alternative for data clustering.
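The paraclique idea can be sketched in Python under stated assumptions: a seed clique is given (computing a maximum clique is not shown), vertices are integers, and a vertex joins when it is adjacent to all but at most `g` current members (a glom factor), with candidates added one at a time. The function name and tie-breaking rule are hypothetical, and how the resulting edges are merged into the MST-kNN output is not described in the abstract, so it is not modeled.

```python
def paraclique(adj, seed, g):
    """Grow a seed clique by vertices missing at most g edges into the current set.

    adj: dict mapping each (integer) vertex to the set of its neighbours
    in an undirected graph.
    """
    members = set(seed)
    while True:
        candidates = [
            v for v in adj
            if v not in members and len(members - adj[v]) <= g
        ]
        if not candidates:
            return members
        # Add the candidate with the most neighbours inside the set (ties: smallest id).
        best = max(candidates, key=lambda v: (len(adj[v] & members), -v))
        members.add(best)
```

With `g = 0` the seed clique is only extended by vertices adjacent to every member, so the result stays a clique; larger `g` trades density for size.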
International Conference on Computer Science and Education | 2012
Ahmed Shamsul Arefin; Carlos Riveros; Regina Berretta; Pablo Moscato
A distance matrix is an n×n two-dimensional array containing the pairwise distances of a set of n points in a metric space. It is used in many fields of scientific research, e.g. data clustering, machine learning, pattern recognition, image analysis, information retrieval, signal processing and bioinformatics. However, as n increases, computing the distance matrix becomes very slow or infeasible on traditional general-purpose computers. In this paper, we propose an inexpensive and scalable data-parallel solution to this problem that divides the computational tasks and data for processing on GPUs. We demonstrate the performance of our method on a set of real-world biological networks constructed from a renowned breast cancer study.
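The block-partitioning idea can be illustrated with a short sequential Python sketch, assuming Euclidean distance and square tiles; `tiled_distance_matrix` is a hypothetical name. Each tile is an independent unit of work that could be assigned to a GPU thread block, and only tiles on or above the diagonal are computed because the matrix is symmetric. For simplicity the full matrix is kept in memory here, which a large-scale version would avoid.

```python
import math


def tiled_distance_matrix(points, tile):
    """Fill an n x n Euclidean distance matrix one (tile x tile) block at a time."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]  # zero diagonal by construction
    for bi in range(0, n, tile):
        for bj in range(bi, n, tile):  # blocks on or above the diagonal only
            # Each (bi, bj) block is independent and could run in parallel.
            for i in range(bi, min(bi + tile, n)):
                for j in range(max(bj, i + 1), min(bj + tile, n)):
                    d = math.dist(points[i], points[j])
                    D[i][j] = D[j][i] = d  # mirror into the lower triangle
    return D
```

The tile size only changes how the work is divided, not the resulting matrix.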
Computer and Information Technology | 2007
Ahmed Shamsul Arefin; K.M.M. Habib; R. Sultana; S.M.L. Kabir
A low-cost, microcontroller-based device for multipurpose electronic learning has been developed. This paper describes the design methodology and the development of its hardware and software, and explains the development of the general-purpose device and four of its applications. Following the described approach, the presented technique can be applied to both small and large projects.