Patrick Flick
Georgia Institute of Technology
Publications
Featured research published by Patrick Flick.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2015
Patrick Flick; Chirag Jain; Tony Pan; Srinivas Aluru
Dramatic advances in DNA sequencing technology have made it possible to study microbial environments by direct sequencing of environmental DNA samples. Yet, due to the huge volume and high data complexity, current de novo assemblers cannot handle large metagenomic datasets or fail to perform assembly with acceptable quality. This paper presents the first parallel solution for decomposing the metagenomic assembly problem without compromising the post-assembly quality. We transform this problem into that of finding weakly connected components in the de Bruijn graph. We propose a novel distributed memory algorithm to identify the connected subgraphs, and present strategies to minimize the communication volume. We demonstrate the scalability of our algorithm on a soil metagenome dataset with 1.8 billion reads. Our approach achieves a runtime of 22 minutes using 1280 Intel Xeon cores for a 421 GB uncompressed FASTQ dataset. Moreover, our solution is generalizable to finding connected components in arbitrary undirected graphs.
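The decomposition idea in this abstract can be illustrated with a small sequential sketch. It is not the authors' distributed memory algorithm: it assumes a simplified de Bruijn graph whose nodes are k-mers and whose edges join consecutive k-mers within a read, it ignores reverse complements, and it uses a union-find structure in place of the parallel method. The names `DisjointSet`, `kmers`, and `partition_reads` are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all length-k substrings of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


class DisjointSet:
    """Union-find over hashable items (here: k-mer strings)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partition_reads(reads, k):
    """Group reads by the connected component their k-mers fall into.

    Assumption: consecutive k-mers in a read are adjacent in the graph,
    so one read always lies entirely within a single component.
    """
    ds = DisjointSet()
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            ds.union(a, b)
    groups = defaultdict(list)
    for read in reads:
        ks = kmers(read, k)
        if ks:  # reads shorter than k contribute no k-mers
            groups[ds.find(ks[0])].append(read)
    return list(groups.values())
```

Reads that share any k-mer end up in the same group, so each group could in principle be assembled independently of the others.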
IEEE International Conference on High Performance Computing, Data, and Analytics | 2015
Patrick Flick; Srinivas Aluru
Suffix arrays and trees are fundamental string data structures of importance to many applications in computational biology. Consequently, their parallel construction is an actively studied problem. To date, algorithms with best practical performance lack efficient worst-case run-time guarantees, and vice versa. In addition, much of the recent work targeted low core count, shared memory parallelization. In this paper, we present parallel algorithms for distributed memory construction of suffix arrays and longest common prefix (LCP) arrays that simultaneously achieve good worst-case run-time bounds and superior practical performance. Our algorithms run in O(Tsort(n, p) · log n) worst-case time where Tsort(n, p) is the run-time of parallel sorting. We present several algorithm engineering techniques that improve performance in practice. We demonstrate the construction of suffix and LCP arrays of the human genome in less than 8 seconds on 1,024 Intel Xeon cores, reaching speedups of over 110X compared to the best sequential suffix array construction implementation divsufsort.
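A bound of the form "sorting cost times log n" is characteristic of prefix doubling, a standard suffix array technique. The abstract does not name the technique the authors use, so the following is only a sequential sketch of prefix doubling under that assumption, not their distributed implementation. Each round sorts suffixes by their first 2h characters using the ranks from the previous round, and at most about log2(n) rounds are needed. The function name `suffix_array` is hypothetical.

```python
def suffix_array(s):
    """Build the suffix array of s by prefix doubling.

    Each loop iteration is one sort; the sorted prefix length doubles
    every round, so the loop runs O(log n) times.
    """
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]  # initial ranks: order by first character
    sa = list(range(n))
    h = 1
    while True:
        # Sort key: rank of the first h chars, then rank of the next h chars.
        # -1 marks "past the end", which sorts before every real rank.
        def key(i):
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            different = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (1 if different else 0)
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all suffixes have distinct ranks
            break
        h *= 2
    return sa
```

In a distributed setting the per-round sort would be a parallel sort over all processors, which is where a term like Tsort(n, p) would enter the bound.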
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017
Tony Pan; Patrick Flick; Chirag Jain; Yongchao Liu; Srinivas Aluru
Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases per 3-day experiment from a single sequencer. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer ...
Scientific Reports | 2018
Dv Klopfenstein; Liangsheng Zhang; Brent S. Pedersen; Fidel Ramírez; Alex Warwick Vesztrocy; Aurélien Naldi; Christopher J. Mungall; Jeffrey M. Yunes; Olga Botvinnik; Mark Weigel; Will Dampier; Christophe Dessimoz; Patrick Flick; Haibao Tang
International Parallel and Distributed Processing Symposium | 2017
Patrick Flick; Srinivas Aluru
Archive | 2015
Haibao Tang; Patrick Flick; Kenta Sato (佐藤 建太); Fidel Ramírez; Debra Klopfenstein; Christopher J. Mungall; Jeff Yunes; Brent S. Pedersen
IEEE Transactions on Parallel and Distributed Systems | 2017
Chirag Jain; Patrick Flick; Tony Pan; Oded Green; Srinivas Aluru
Archive | 2016
Chirag Jain; Patrick Flick; Tony Pan; Oded Green; Srinivas Aluru
Parallel Computing | 2017
Patrick Flick; Chirag Jain; Tony Pan; Srinivas Aluru
Archive | 2016
Dv Klopfenstein; Patrick Flick; Douglas Myers-Turnbull; Fidel Ramírez; Brent Pedersen Bioinformatics; Christopher J. Mungall; Jeff Yunes; Kenta Sato (佐藤 建太); Mark Fiers; chri; Haibao Tang