Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing | 2019

Graph sparsification with parallelization to optimize the identification of causal genes and dysregulated pathways


Abstract


Disease-causing genes and their pathways can be identified by mapping genetic interaction sequences onto a biological network and performing graph traversals governed by approach-specific parameters. Given the large size of biological networks, one would expect their graphs to be sparse, since large real-world graphs, such as those of social networks, tend to be sparse. The gene interaction networks of complex organisms, however, are both large and dense because of the large number of interactions they contain and their inherently complex nature. Computational approaches to the task therefore face running-time challenges when processing and mining relevant information, because traversal algorithms may make multiple passes over the network, compounding the time taken. A long running time greatly restricts the number of genes that can be found within a reasonable period. Integrating the graph-parallel processing techniques provided by the MapReduce framework of Apache Hadoop on a multi-node cluster reduces the execution time significantly, but the result is still not optimal. In this paper, existing algorithms for traversing molecular interaction networks to identify causal genes and dysregulated pathways are optimized by integrating graph sparsification with the MapReduce paradigm for parallel processing, so that relevant complex interaction sequences are identified more efficiently than with the existing algorithms.
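The combination described in the abstract, sparsifying the network and then traversing it in a map/reduce style, can be illustrated with a minimal single-machine sketch. The abstract does not state which sparsification criterion the paper uses, so the edge-weight threshold below is an assumption chosen for simplicity, as are the gene names, the weights, and every function name. The map and reduce steps are plain Python functions that only mimic the paradigm; they are not Hadoop code.

```python
from collections import defaultdict


def sparsify(edges, min_weight):
    """Keep only edges whose weight is at least min_weight.

    Assumption: a simple threshold stands in for the paper's
    sparsification method, which the abstract does not specify.
    """
    return [(u, v, w) for (u, v, w) in edges if w >= min_weight]


def map_phase(edges):
    """Map step: emit (node, neighbor) pairs for both directions of each edge."""
    for u, v, _weight in edges:
        yield u, v
        yield v, u


def reduce_phase(pairs):
    """Reduce step: group the emitted neighbors by node into an adjacency map."""
    adjacency = defaultdict(set)
    for node, neighbor in pairs:
        adjacency[node].add(neighbor)
    return dict(adjacency)


def reachable(adjacency, source):
    """Breadth-first traversal returning every node reachable from source."""
    seen = {source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


# Hypothetical weighted interactions: (gene, gene, confidence score).
edges = [
    ("G1", "G2", 0.9),
    ("G2", "G3", 0.2),
    ("G3", "G4", 0.8),
    ("G1", "G4", 0.1),
]

sparse_edges = sparsify(edges, min_weight=0.5)
adjacency = reduce_phase(map_phase(sparse_edges))
```

In this toy input, sparsification drops the two low-weight edges, so a traversal from `G1` visits fewer nodes than it would on the dense graph. On a real cluster the map and reduce steps would run as distributed jobs over partitions of the edge list.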

DOI 10.1145/3297280.3297352
Language English
Journal Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
