Paul Breimyer

North Carolina State University

Publication

Featured research published by Paul Breimyer.

Journal of Physics: Conference Series | 2008

Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

Nagiza F. Samatova; Matthew C. Schmidt; William Hendrix; Paul Breimyer; Kevin Thomas; Byung-Hoon Park

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and the NP-hard nature of such problems, however, challenges existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. We designed and implemented the first (to the best of our knowledge) parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. In addition, we proposed and developed the Graph Perturbation Theory (GPT) that establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs and reduces the enumeration time by more than 80% compared to complete recomputation.
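For readers unfamiliar with MCE, the sequential baseline that such work builds on can be sketched as the classic Bron–Kerbosch recursion with pivoting. This is a minimal Python illustration, not the authors' parallel algorithm. The adjacency-set graph representation and the function name bron_kerbosch are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours (assumed representation)


def bron_kerbosch(graph: Graph) -> List[FrozenSet[int]]:
    """Enumerate all maximal cliques with Bron-Kerbosch plus pivoting (illustrative sketch)."""
    cliques: List[FrozenSet[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        # r: current clique, p: candidates that can extend r, x: vertices already explored.
        if not p and not x:
            cliques.append(frozenset(r))  # nothing can extend r, so it is maximal
            return
        # Pivot on the vertex covering the most candidates; its neighbours need no branch.
        pivot = max(p | x, key=lambda u: len(p & graph[u]))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)  # v has been fully explored in this branch
            x.add(v)

    expand(set(), set(graph), set())
    return cliques
```

On a triangle {0, 1, 2} with an extra edge 2–3, the sketch returns the two maximal cliques {0, 1, 2} and {2, 3}. Pivoting is the usual way to cut redundant branches; the exponential worst case remains, which is why large biological graphs motivate parallel and incremental approaches.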

Social Network Mining and Analysis | 2009

Incremental all pairs similarity search for varying similarity thresholds

Amit Awekar; Nagiza F. Samatova; Paul Breimyer

All Pairs Similarity Search (APSS) is a ubiquitous problem in many data mining applications and involves finding all pairs of records with similarity scores above a specified threshold. In this paper, we introduce the problem of Incremental All Pairs Similarity Search (IAPSS), where APSS is performed multiple times over the same dataset with varying similarity thresholds. To the best of our knowledge, this is the first work that addresses the IAPSS problem. All existing solutions for APSS perform redundant computations by invoking APSS independently for each threshold value. In contrast, our solution to the IAPSS problem avoids redundant computations by storing the history of previous APSS invocations and using index splitting. While offering obvious benefits, the computation- and I/O-intensive nature of the IAPSS solution raises two key research challenges: (1) developing efficient I/O techniques to manage the computation history and (2) efficiently identifying and pruning redundant computations. We address these challenges through (a) a history binning technique that clusters record pairs based on their similarity values and performs I/O during the similarity computation, and (b) splitting of the inverted index that maps each dimension to the list of records with a non-zero projection along that dimension. We evaluate the effectiveness of our techniques on four real-world large-scale datasets, demonstrating speed-ups ranging from 2X to over 105X over the state-of-the-art APSS algorithm.
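Two ingredients named in this abstract, an inverted index over non-zero dimensions and a history that groups record pairs by similarity value, can be illustrated with a small Python sketch. It assumes cosine similarity on sparse vectors with non-negative weights. The names build_history and BinnedHistory and the equal-width binning are hypothetical. The sketch omits the paper's pruning, I/O management, and index splitting, so it is not the authors' IAPSS algorithm.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # dimension -> non-negative weight (assumption)


def normalize(vec: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(w * w for w in vec.values()))
    return {d: w / norm for d, w in vec.items()} if norm else dict(vec)


class BinnedHistory:
    """Hypothetical history store: scored pairs grouped into equal-width bins over [0, 1]."""

    def __init__(self, num_bins: int = 10) -> None:
        self.num_bins = num_bins
        self.bins: List[List[Tuple[int, int, float]]] = [[] for _ in range(num_bins)]

    def _bin_of(self, score: float) -> int:
        # Clamp so scores at (or rounding just past) 1.0 land in the top bin.
        return min(max(int(score * self.num_bins), 0), self.num_bins - 1)

    def add(self, i: int, j: int, score: float) -> None:
        self.bins[self._bin_of(score)].append((i, j, score))

    def query(self, threshold: float) -> List[Tuple[int, int]]:
        # Bins below the threshold's bin cannot hold a qualifying pair, so skip them.
        start = self._bin_of(threshold)
        return sorted((i, j) for b in self.bins[start:] for i, j, s in b if s >= threshold)


def build_history(records: List[SparseVec], num_bins: int = 10) -> BinnedHistory:
    """Score every pair of records sharing a dimension (no pruning, unlike real APSS)."""
    index: Dict[int, List[Tuple[int, float]]] = defaultdict(list)  # dimension -> [(record id, weight)]
    history = BinnedHistory(num_bins)
    for rid, vec in enumerate(map(normalize, records)):
        scores: Dict[int, float] = defaultdict(float)
        for dim, w in vec.items():
            for other, other_w in index[dim]:
                scores[other] += w * other_w  # accumulate the dot product dimension by dimension
        for other, s in scores.items():
            history.add(other, rid, s)
        for dim, w in vec.items():  # index the record only after probing, so each pair is scored once
            index[dim].append((rid, w))
    return history
```

Once the history is built, each new threshold is answered by scanning only the bins that can contain qualifying pairs rather than recomputing similarities. With four records where two are identical, for instance, a query at 0.9 returns only the identical pair, while a query at 0.5 also returns the pairs whose cosine similarity is about 0.71.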

Knowledge Discovery and Data Mining | 2009

On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs

William Hendrix; Matthew C. Schmidt; Paul Breimyer; Nagiza F. Samatova

The maximal clique enumeration (MCE) problem can be used to find very tightly-coupled collections of objects inside a network or graph of relationships. However, when such networks are based on noisy or uncertain data, the solutions to the MCE problem for several closely related graphs may be necessary to accurately define the collections. Thus, we propose an algorithm that efficiently solves the MCE problem on altered, or perturbed, graphs. The algorithm utilizes the enumeration of a baseline graph and identifies only those maximal cliques that the perturbation adds and/or removes. We detail the algorithm and the underlying theory required to guarantee correctness. Further, we report average runtime speedups of 7 and 9 for our algorithm over traditional enumeration techniques in the cases of adding and removing edges, respectively, from graphs constructed from protein interaction data.
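The edge-removal case can be made concrete. Deleting an edge (u, v) can only affect maximal cliques that contain both u and v. Every other maximal clique stays maximal, and any clique that becomes maximal must be C \ {u} or C \ {v} for some destroyed clique C. The following Python sketch applies that observation to a single deleted edge. It is an illustration under stated assumptions (adjacency sets, one edge at a time, hypothetical function names), not the authors' algorithm, which also handles edge additions.

```python
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours (assumed representation)
Clique = FrozenSet[int]


def is_maximal(graph: Graph, clique: Clique) -> bool:
    """A clique is maximal if no outside vertex is adjacent to all of its members."""
    return not any(clique <= graph[w] for w in graph if w not in clique)


def remove_edge_update(
    graph: Graph, cliques: Set[Clique], u: int, v: int
) -> Tuple[Graph, Set[Clique], Set[Clique]]:
    """Delete edge (u, v) and return (perturbed graph, cliques removed, cliques added)."""
    # Copy the adjacency sets so the baseline graph is left unchanged.
    perturbed = {w: set(nbrs) for w, nbrs in graph.items()}
    perturbed[u].discard(v)
    perturbed[v].discard(u)

    # Only cliques containing both endpoints stop being cliques.
    removed = {c for c in cliques if u in c and v in c}
    # Each destroyed clique yields two candidates; keep those maximal in the new graph.
    candidates = {c - {u} for c in removed} | {c - {v} for c in removed}
    added = {d for d in candidates if is_maximal(perturbed, d)}
    return perturbed, removed, added
```

Take a graph in which triangles {0, 1, 2} and {1, 2, 3} share the edge 1–2. Deleting edge 0–1 removes {0, 1, 2} and adds {0, 2}. The candidate {1, 2} is rejected because it is contained in {1, 2, 3}. Only the cliques touching the deleted edge are examined, which is the source of savings over re-enumerating the whole graph.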

BMC Medical Informatics and Decision Making | 2009

BioDEAL: community generation of biological annotations

Paul Breimyer; Nathan David Green; Vinay Kumar; Nagiza F. Samatova

Background: Publication databases in biomedicine (e.g., PubMed, MEDLINE) are growing rapidly in size every year, as are public databases of experimental biological data and annotations derived from the data. Publications often contain evidence that confirms or disproves annotations, such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence, due to the volume of papers and the lack of a systematic approach to associating published evidence with experimental data and annotations. Natural Language Processing (NLP) tools can help address this growing divide by providing automatic high-throughput detection of simple terms in publication text. However, NLP tools are not mature enough to identify complex terms, relationships, or events.

Results: In this paper, we present and extend BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications.

Conclusion: BioDEAL may change the way biologists relate published evidence to experimental data. Instead of biologists or research groups searching for and managing evidence independently, the community can collectively build and share this knowledge.

2008 Workshop on Ultrascale Visualization | 2008

An outlook into ultra-scale visualization of large-scale biological data

Nagiza F. Samatova; Paul Breimyer; William Hendrix; Matthew C. Schmidt; Theresa-Marie Rhyne

As bioinformatics has evolved from a reductionistic approach to a complementary multi-scale integrative approach, new challenges in ultra-scale visualization have arisen. Even though visualization is a critical component of large-scale biological data analysis, the ultra-scale nature of systems biology has given rise to novel problems in visualization that are not addressed by existing methods. Visualization is a rich and actively researched domain, and there are many open research questions pertaining to the increasing demands of visualization in bioinformatics. In this paper, we present several broadly important ultra-scale visualization challenges and discuss specific examples of ultra-scale applications in systems biology.

International Conference on Bioinformatics | 2008

BioDEAL: Biological data-evidence-annotation linkage system

Paul Breimyer; Nathan David Green; Vinay Kumar; Nagiza F. Samatova

Publication databases in biomedicine (e.g., PubMed, MEDLINE) are growing rapidly in size every year, as are public databases of experimental biological data and annotations derived from the data. Publications often contain evidence that confirms or disproves annotations such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence, due to the volume of papers and the lack of a systematic approach to associating published evidence with experimental data and annotations. Natural Language Processing (NLP) tools can help address this growing divide by providing automatic high-throughput detection of simple terms in publication text. However, NLP tools are not mature enough to identify complex terms, relationships, or events. In this paper, we present BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications.

Linguistic Annotation Workshop | 2010

PackPlay: Mining Semantic Data in Collaborative Games

Nathan David Green; Paul Breimyer; Vinay Kumar; Nagiza F. Samatova

Theoretical Computer Science | 2010

Theoretical underpinnings for maximal clique enumeration on perturbed graphs

William Hendrix; Matthew C. Schmidt; Paul Breimyer; Nagiza F. Samatova

IKE | 2009

Incremental All Pairs Similarity Search for Varying Similarity Thresholds with Reduced I/O Overhead

Amit Awekar; Nagiza F. Samatova; Paul Breimyer

Archive | 2009

Scientific Data Analysis

Chandrika Kamath; Nikil Wale; George Karypis; Gaurav Pandey; Vipin Kumar; Krishna Rajan; Nagiza F. Samatova; Paul Breimyer; Guruprasad Kora; Chongle Pan; Srikanth B. Yoginath