Patrick Flick

Georgia Institute of Technology

Publication

Featured research published by Patrick Flick.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications

Patrick Flick; Chirag Jain; Tony Pan; Srinivas Aluru

Dramatic advances in DNA sequencing technology have made it possible to study microbial environments by direct sequencing of environmental DNA samples. Yet, due to the huge volume and high data complexity, current de novo assemblers cannot handle large metagenomic datasets or fail to perform assembly with acceptable quality. This paper presents the first parallel solution for decomposing the metagenomic assembly problem without compromising the post-assembly quality. We transform this problem into that of finding weakly connected components in the de Bruijn graph. We propose a novel distributed memory algorithm to identify the connected subgraphs, and present strategies to minimize the communication volume. We demonstrate the scalability of our algorithm on a soil metagenome dataset with 1.8 billion reads. Our approach achieves a runtime of 22 minutes using 1280 Intel Xeon cores for a 421 GB uncompressed FASTQ dataset. Moreover, our solution is generalizable to finding connected components in arbitrary undirected graphs.
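The reduction described in this abstract, from partitioning reads to finding connected components, can be illustrated with a small sequential sketch. This is not the paper's distributed memory algorithm: it uses an ordinary union-find structure on one machine, treats k-mers as vertices, links k-mers that are adjacent within a read, and ignores reverse complements. The names `kmers`, `UnionFind`, and `partition_reads` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def partition_reads(reads, k):
    """Group read indices by the connected component of their k-mers.

    Assumption: consecutive k-mers of a read are joined by an edge, so
    all k-mers of one read fall into the same component.
    """
    uf = UnionFind()
    for read in reads:
        prev = None
        for km in kmers(read, k):
            uf.find(km)
            if prev is not None:
                uf.union(prev, km)
            prev = km
    groups = defaultdict(list)
    for idx, read in enumerate(reads):
        if len(read) >= k:  # reads shorter than k contain no k-mer
            groups[uf.find(read[:k])].append(idx)
    return sorted(groups.values())
```

Under these assumptions, two reads land in the same group exactly when their k-mers are connected through shared k-mers, so each group could be assembled independently of the others.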

IEEE International Conference on High Performance Computing, Data, and Analytics | 2015

Parallel distributed memory construction of suffix and longest common prefix arrays

Patrick Flick; Srinivas Aluru

Suffix arrays and trees are fundamental string data structures of importance to many applications in computational biology. Consequently, their parallel construction is an actively studied problem. To date, algorithms with best practical performance lack efficient worst-case run-time guarantees, and vice versa. In addition, much of the recent work targeted low core count, shared memory parallelization. In this paper, we present parallel algorithms for distributed memory construction of suffix arrays and longest common prefix (LCP) arrays that simultaneously achieve good worst-case run-time bounds and superior practical performance. Our algorithms run in O(Tsort(n, p) · log n) worst-case time where Tsort(n, p) is the run-time of parallel sorting. We present several algorithm engineering techniques that improve performance in practice. We demonstrate the construction of suffix and LCP arrays of the human genome in less than 8 seconds on 1,024 Intel Xeon cores, reaching speedups of over 110X compared to the best sequential suffix array construction implementation divsufsort.
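A bound of the form O(Tsort(n, p) · log n) is what one would expect from a prefix-doubling scheme, in which each of at most about log n rounds is dominated by one sort. The following is a minimal sequential sketch of prefix doubling, followed by Kasai's well-known linear-time LCP computation. It is offered only as an illustration of the data structures involved; it is not the paper's distributed implementation, and the function names `suffix_array` and `lcp_array` are hypothetical.

```python
def suffix_array(text):
    """Build the suffix array by prefix doubling (one sort per round)."""
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]  # rank by first character
    sa = list(range(n))
    h = 1
    while True:
        # Sort suffixes by their first 2*h characters, using the ranks
        # of the two halves; -1 stands for "past the end of the text".
        def key(i):
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if key(sa[j]) != key(sa[j - 1]) else 0
            new_rank[sa[j]] = new_rank[sa[j - 1]] + bump
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        h *= 2
    return sa


def lcp_array(text, sa):
    """Kasai's algorithm: lcp[i] is the LCP of suffixes sa[i-1] and sa[i]."""
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp
```

For the text "banana", this sketch yields the suffix array [5, 3, 1, 0, 4, 2] and the LCP array [0, 1, 3, 0, 0, 2].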

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems

Tony Pan; Patrick Flick; Chirag Jain; Yongchao Liu; Srinivas Aluru

Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases per 3-day experiment from a single sequencer. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing

Scientific Reports | 2018

GOATOOLS: A Python library for Gene Ontology analyses

Dv Klopfenstein; Liangsheng Zhang; Brent S. Pedersen; Fidel Ramírez; Alex Warwick Vesztrocy; Aurélien Naldi; Christopher J. Mungall; Jeffrey M. Yunes; Olga Botvinnik; Mark Weigel; Will Dampier; Christophe Dessimoz; Patrick Flick; Haibao Tang

International Parallel and Distributed Processing Symposium | 2017

Parallel Construction of Suffix Trees and the All-Nearest-Smaller-Values Problem

Patrick Flick; Srinivas Aluru

Archive | 2015

GOATOOLS: Tools for Gene Ontology

Haibao Tang; Patrick Flick; Kenta Sato (佐藤建太); Fidel Ramírez; Debra Klopfenstein; Christopher J. Mungall; Jeff Yunes; Brent S. Pedersen

IEEE Transactions on Parallel and Distributed Systems | 2017

An Adaptive Parallel Algorithm for Computing Connected Components

Chirag Jain; Patrick Flick; Tony Pan; Oded Green; Srinivas Aluru

Archive | 2016

An Adaptive Parallel Algorithm for Computing Connectivity

Chirag Jain; Patrick Flick; Tony Pan; Oded Green; Srinivas Aluru

Parallel Computing | 2017

Reprint of “A parallel connectivity algorithm for de Bruijn graphs in metagenomic applications”

Patrick Flick; Chirag Jain; Tony Pan; Srinivas Aluru

Archive | 2016

goatools: GOATOOLS v0.6.4

Dv Klopfenstein; Patrick Flick; Douglas Myers-Turnbull; Fidel Ramírez; Brent Pedersen Bioinformatics; Christopher J. Mungall; Jeff Yunes; Kenta Sato (佐藤建太); Mark Fiers; chri; Haibao Tang
