Matthew C. Schmidt | Researchain

Publication

Featured research published by Matthew C. Schmidt.

Journal of Physics: Conference Series | 2008

Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

Nagiza F. Samatova; Matthew C. Schmidt; William Hendrix; Paul Breimyer; Kevin Thomas; Byung-Hoon Park

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often cast as a combinatorial graph problem in which all feasible models must be enumerated. The data-intensive and NP-hard nature of such problems, however, makes it difficult for existing methods to handle the required data sizes and levels of uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often the rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. First, we designed and implemented the first (to the best of our knowledge) parallel MCE algorithm, which scales linearly on thousands of processors when running MCE on real-world biological networks with thousands to hundreds of thousands of vertices. Second, we proposed and developed Graph Perturbation Theory (GPT), which establishes a foundation for efficiently solving the MCE problem in perturbed graphs that model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs, and it reduces enumeration time by more than 80% compared to complete recomputation.
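The abstract does not spell out how the parallel MCE algorithm divides its work. As a minimal sketch, assuming a standard technique rather than the authors' method, the Python below enumerates maximal cliques with Bron–Kerbosch with pivoting and splits the search into one independent subproblem per vertex: the vertex itself, its later neighbours (candidates), and its earlier neighbours (excluded) in a fixed ordering. Because each maximal clique is reported only by its earliest vertex, the subproblems produce disjoint output and could in principle be handed to separate workers; here they simply run in a loop. The names `subproblems` and `enumerate_maximal_cliques` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets, no self-loops


def _bron_kerbosch(adj: Graph, r: Set[int], p: Set[int], x: Set[int],
                   out: List[FrozenSet[int]]) -> None:
    """Bron-Kerbosch with pivoting: report every maximal clique extending r."""
    if not p and not x:
        out.append(frozenset(r))
        return
    # Pick the pivot that covers the most candidates to prune branches.
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        _bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}  # v has been fully explored ...
        x = x | {v}  # ... so later branches must exclude it


def subproblems(adj: Graph) -> Iterator[Tuple[int, Set[int], Set[int]]]:
    """Yield one independent (vertex, P, X) task per vertex.

    Hypothetical ordering: plain sorted order. A degeneracy ordering is a
    common choice for bounding the size of each task.
    """
    rank = {v: i for i, v in enumerate(sorted(adj))}
    for v in sorted(adj):
        later = {u for u in adj[v] if rank[u] > rank[v]}
        earlier = adj[v] - later
        yield v, later, earlier


def enumerate_maximal_cliques(adj: Graph) -> List[FrozenSet[int]]:
    cliques: List[FrozenSet[int]] = []
    # Each task could run on a different worker; here they run sequentially.
    for v, p, x in subproblems(adj):
        _bron_kerbosch(adj, {v}, p, x, cliques)
    return cliques


if __name__ == "__main__":
    # Triangle 1-2-3 with a pendant edge 3-4.
    g: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(sorted(sorted(c) for c in enumerate_maximal_cliques(g)))
```

For the triangle-plus-pendant example the script prints `[[1, 2, 3], [3, 4]]`; vertex 2's task finds nothing new because the triangle was already reported from vertex 1.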

Knowledge Discovery and Data Mining | 2009

On perturbation theory and an algorithm for maximal clique enumeration in uncertain and noisy graphs

William Hendrix; Matthew C. Schmidt; Paul Breimyer; Nagiza F. Samatova

The maximal clique enumeration (MCE) problem can be used to find tightly coupled groups of objects within a network or graph of relationships. However, when such networks are built from noisy or uncertain data, accurately defining these groups may require solving the MCE problem for several closely related graphs. We therefore propose an algorithm that efficiently solves the MCE problem on altered, or perturbed, graphs. The algorithm starts from the enumeration of a baseline graph and identifies only those maximal cliques that the perturbation adds and/or removes. We detail the algorithm and the underlying theory required to guarantee its correctness. Further, on graphs constructed from protein interaction data, we report average runtime speedups of 7 and 9 for our algorithm over traditional enumeration techniques in the cases of adding and removing edges, respectively.
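To make the edge-removal case concrete, here is a minimal Python sketch of the underlying idea, not the authors' algorithm. Deleting edge (u, v) leaves every maximal clique that does not contain both endpoints maximal; each clique C that does contain both can only be replaced by C minus u and C minus v, and a candidate is kept when no vertex outside it is adjacent to all of its members in the perturbed graph. The function name `remove_edge_update` and the adjacency-set representation are assumptions for illustration, and the sketch does not cover edge additions.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets, no self-loops


def remove_edge_update(adj: Graph, cliques: Iterable[FrozenSet[int]],
                       u: int, v: int) -> Tuple[Graph, Set[FrozenSet[int]]]:
    """Update the maximal cliques of G after deleting edge (u, v).

    `cliques` must be the complete set of maximal cliques of the original G.
    Returns the perturbed graph G' and its maximal cliques.
    """
    new_adj = {w: set(nbrs) for w, nbrs in adj.items()}
    new_adj[u].discard(v)
    new_adj[v].discard(u)

    def is_maximal(s: FrozenSet[int]) -> bool:
        # s is maximal in G' iff no vertex outside s is adjacent to all of s.
        common = set.intersection(*(new_adj[w] for w in s))
        return not (common - s)

    result: Set[FrozenSet[int]] = set()
    for c in cliques:
        if u in c and v in c:
            # The clique is destroyed; only c - {u} and c - {v} can replace it.
            for dropped in (u, v):
                candidate = c - {dropped}
                if is_maximal(candidate):
                    result.add(candidate)
        else:
            # Cliques that do not span the deleted edge stay maximal.
            result.add(c)
    return new_adj, result


if __name__ == "__main__":
    # K4 on {1, 2, 3, 4}; removing edge (1, 2) splits its single maximal clique.
    k4 = {w: {1, 2, 3, 4} - {w} for w in range(1, 5)}
    _, updated = remove_edge_update(k4, [frozenset({1, 2, 3, 4})], 1, 2)
    print(sorted(sorted(c) for c in updated))
```

The K4 example prints `[[1, 3, 4], [2, 3, 4]]`. Only cliques that contain the removed edge are re-examined, which is why this kind of update can be much cheaper than re-enumerating the whole graph.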

International Conference on Data Mining | 2010

The Multiple Alignment Algorithm for Metabolic Pathways without Abstraction

Wenbin Chen; Andrea M. Rocha; William Hendrix; Matthew C. Schmidt; Nagiza F. Samatova

Computational problems associated with metabolic pathways have been extensively studied in computational biology, and aligning multiple metabolic pathways is particularly challenging. Tohsato et al.’s algorithm for aligning multiple metabolic pathways is based on similarities between enzymes; however, a metabolic pathway consists of three types of entities: reactions, compounds, and enzymes. In this paper, we propose the first algorithm for aligning multiple metabolic pathways based on the similarities among reactions, compounds, enzymes, and pathway topology. First, we compute a weight between each pair of like entities in different input pathways from the entities’ similarity scores and topological structure, using the methods of Ferhat Ay et al. We then construct a weighted k-partite graph for the reactions, compounds, and enzymes, and extract a mapping between these entities by solving the maximum-weighted k-partite matching problem with a novel heuristic algorithm. By analyzing the alignment results of multiple pathways in different organisms, we show that the alignments found by our algorithm correctly identify common subnetworks among multiple pathways.
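The abstract does not describe the authors' heuristic for maximum-weighted k-partite matching. As a stand-in, the Python sketch below uses a plain greedy strategy, which is an assumption and not the paper's algorithm: it visits cross-pathway entity pairs in order of decreasing weight and merges their groups whenever the merged group still holds at most one entity per pathway. Entities are hypothetical `(pathway, entity)` tuples, the pairwise weights are assumed to be precomputed, and one would run the function separately for reactions, compounds, and enzymes.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Vertex = Tuple[str, str]  # (pathway id, entity id); hypothetical encoding


def greedy_kpartite_matching(
        weights: Dict[Tuple[Vertex, Vertex], float]) -> List[FrozenSet[Vertex]]:
    """Group entities so each group has at most one entity per pathway.

    Greedy heuristic: heavier pairs are considered first, and a pair is
    accepted only if merging its two groups keeps the one-per-pathway rule.
    """
    group_of: Dict[Vertex, Vertex] = {}      # entity -> group representative
    members: Dict[Vertex, Set[Vertex]] = {}  # representative -> group
    for (a, b), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # remaining pairs cannot increase the total weight
        for x in (a, b):
            if x not in group_of:
                group_of[x] = x
                members[x] = {x}
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue
        pathways_a = {p for p, _ in members[ga]}
        pathways_b = {p for p, _ in members[gb]}
        if pathways_a & pathways_b:
            continue  # merging would align two entities of the same pathway
        for x in members[gb]:
            group_of[x] = ga
        members[ga] |= members.pop(gb)
    return [frozenset(g) for g in members.values() if len(g) > 1]


if __name__ == "__main__":
    w = {
        (("P1", "r1"), ("P2", "r1")): 0.9,
        (("P2", "r1"), ("P3", "r1")): 0.8,
        (("P1", "r1"), ("P2", "r2")): 0.7,  # rejected: P2 already in the group
        (("P2", "r2"), ("P3", "r2")): 0.6,
    }
    for group in greedy_kpartite_matching(w):
        print(sorted(group))
```

In the example, the 0.7 pair is skipped because its group already contains an entity from pathway P2, so the result is one three-way alignment of `r1` and a separate P2/P3 alignment of `r2`.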

Bioinformatics and Biomedicine | 2009

An Algorithm for the Discovery of Phenotype Related Metabolic Pathways

Matthew C. Schmidt; Nagiza F. Samatova

Microorganisms are increasingly used in industrial processes because of certain beneficial phenotypes they exhibit. The goal of improving the ability of microorganisms to exhibit these phenotypes has driven interest in identifying the genes responsible for a given phenotype. Some of these phenotypes result from chemical compounds being modified by a series of metabolic reactions, or metabolic pathways, catalyzed by specific enzymes. Recently, comprehensive, generic metabolic networks have been defined that describe possible ways in which certain chemical compounds may be modified by known metabolic reactions. In this paper, we aim to discover phenotype-related metabolic pathways by identifying subnetworks of a generic metabolic network that are highly conserved in phenotype-expressing organisms and rarely conserved in non-phenotype-expressing organisms. To do this, we introduce a graph search algorithm that finds and expands highly conserved seed networks based on their evolutionary bias toward phenotype-expressing organisms. We hypothesize that these subnetworks are conserved in phenotype-expressing organisms because they represent metabolic pathways responsible for expressing the phenotype. We test our approach on aerobic and anaerobic organisms to identify pathways related to aerobic respiration. We find that the pathways identified by our algorithm occur primarily in aerobic organisms and that they cover the metabolic pathways known to be related to aerobic respiration. We conclude by discussing ongoing and future work related to this methodology.
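The abstract describes growing highly conserved seed networks according to their bias toward phenotype-expressing organisms, but it does not give the scoring function. The Python sketch below assumes one for illustration: a subnetwork (a set of directed compound-to-compound edges) counts as conserved in an organism when all of its edges are present there, and its bias is the fraction of phenotype-expressing organisms that conserve it minus the fraction of non-expressing organisms that do. Every single edge whose bias reaches a hypothetical `threshold` becomes a seed and is greedily extended by adjacent edges while the bias stays at or above the threshold. The names and the scoring rule are assumptions, not the published algorithm, and both organism lists are assumed to be non-empty.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]   # directed compound -> compound edge
Network = Set[Edge]      # one organism's metabolic network
Subnet = FrozenSet[Edge]


def conservation(sub: Subnet, organisms: List[Network]) -> float:
    """Fraction of organisms whose network contains every edge of `sub`."""
    return sum(sub <= net for net in organisms) / len(organisms)


def bias(sub: Subnet, pos: List[Network], neg: List[Network]) -> float:
    # Assumed score: conservation in expressing minus non-expressing organisms.
    return conservation(sub, pos) - conservation(sub, neg)


def touches(edge: Edge, sub: Subnet) -> bool:
    nodes = {n for e in sub for n in e}
    return edge[0] in nodes or edge[1] in nodes


def seed_and_expand(pos: List[Network], neg: List[Network],
                    threshold: float) -> Set[Subnet]:
    universe: Set[Edge] = set().union(*pos)
    found: Set[Subnet] = set()
    for seed in universe:
        sub: Subnet = frozenset([seed])
        if bias(sub, pos, neg) < threshold:
            continue  # not a biased seed
        while True:
            # Score every edge adjacent to the current subnetwork.
            scored = [(bias(sub | {e}, pos, neg), e)
                      for e in universe - sub if touches(e, sub)]
            scored = [(s, e) for s, e in scored if s >= threshold]
            if not scored:
                break  # no extension keeps the subnetwork biased
            _, best = max(scored)
            sub = sub | {best}
        found.add(sub)
    return found


if __name__ == "__main__":
    pos = [{("A", "B"), ("B", "C")}, {("A", "B"), ("B", "C"), ("C", "D")}]
    neg = [{("C", "D")}, {("A", "B")}]
    for sub in seed_and_expand(pos, neg, threshold=0.75):
        print(sorted(sub))
```

With a threshold of 0.75, the example keeps only the subnetwork A→B→C: it is present in both expressing organisms and in neither non-expressing one, while adding C→D would drop its bias to 0.5.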

2008 Workshop on Ultrascale Visualization | 2008

An outlook into ultra-scale visualization of large-scale biological data

Nagiza F. Samatova; Paul Breimyer; William Hendrix; Matthew C. Schmidt; Theresa-Marie Rhyne

As bioinformatics has evolved from a reductionist approach to a complementary multi-scale integrative approach, new challenges in ultra-scale visualization have arisen. Although visualization is a critical component of large-scale biological data analysis, the ultra-scale nature of systems biology has given rise to novel visualization problems that existing methods do not address. Visualization is a rich and actively researched domain, and many open research questions concern the increasing demands placed on visualization in bioinformatics. In this paper, we present several broadly important ultra-scale visualization challenges and discuss specific examples of ultra-scale applications in systems biology.

PLOS Computational Biology | 2012

NIBBS-Search for Fast and Accurate Prediction of Phenotype-Biased Metabolic Systems

Matthew C. Schmidt; Andrea M. Rocha; Kanchana Padmanabhan; Yekaterina Shpanskaya; Jillian F. Banfield; Kathleen M. Scott; James R. Mihelcic; Nagiza F. Samatova

Understanding genotype-phenotype associations is important not only for furthering our knowledge of internal cellular processes, but also for providing the foundation necessary for genetic engineering of microorganisms for industrial use (e.g., production of bioenergy or biofuels). However, genotype-phenotype associations alone do not provide enough information to alter an organism's genome to either suppress or exhibit a phenotype. It is important to look at the phenotype-related genes in the context of the genome-scale network to understand how they interact with other genes in the organism. Identifying the metabolic subsystems involved in expressing the phenotype is one way of placing the phenotype-related genes in the context of the entire network. A metabolic system refers to a metabolic network subgraph in which nodes are compounds and edge labels are the enzymes that catalyze the reactions. A metabolic subsystem could be part of a single metabolic pathway or span parts of multiple pathways. Comparative genome-scale metabolic network analysis is arguably a promising strategy for identifying these phenotype-related metabolic subsystems. Network Instance-Based Biased Subgraph Search (NIBBS) is a graph-theoretic method for comparative analysis of genome-scale metabolic networks that can identify metabolic systems that are statistically biased toward phenotype-expressing organismal networks. We set up experiments with target phenotypes such as hydrogen production, TCA expression, and acid tolerance. Through an extensive literature search, we show that some of the resulting metabolic subsystems are indeed phenotype-related, and we formulate hypotheses about the role of other systems in phenotype expression. NIBBS is also orders of magnitude faster than MULE, one of the most efficient maximal frequent subgraph mining algorithms that could be adapted to this problem.
In addition, the set of phenotype-biased metabolic systems output by NIBBS comes very close to the set of phenotype-biased subgraphs output by an exact maximally-biased subgraph enumeration algorithm (MBS-Enum). The code (NIBBS and the module to visualize the identified subsystems) is available at http://freescience.org/cs/NIBBS.

BMC Bioinformatics | 2011

Efficient α, β-motif finder for identification of phenotype-related functional modules

Matthew C. Schmidt; Andrea M. Rocha; Kanchana Padmanabhan; Zhengzhang Chen; Kathleen M. Scott; James R. Mihelcic; Nagiza F. Samatova

Background: Microbial communities in their natural environments exhibit phenotypes that can directly cause particular diseases, convert biomass or wastewater to energy, or degrade various environmental contaminants. Understanding how these communities realize specific phenotypic traits (e.g., carbon fixation, hydrogen production) is critical for addressing health, bioremediation, or bioenergy problems.

Results: In this paper, we describe a graph-theoretical method for in silico prediction of the cellular subsystems that are related to the expression of a target phenotype. The proposed (α, β)-motif finder approach identifies phenotype-related subsystems that, in addition to metabolic subsystems, may include their regulators, sensors, transporters, and even uncharacterized proteins. By comparing dozens of genome-scale networks of functionally associated proteins, our method efficiently identifies statistically significant functional modules that appear in at least α networks of phenotype-expressing organisms but in no more than β networks of organisms that do not exhibit the target phenotype. Experiments with different target phenotypes, such as hydrogen production, motility, aerobic respiration, and acid tolerance, show that the enumerated modules are indeed related to phenotype expression.

Conclusion: We have proposed a methodology that can identify potential phenotype-related functional modules that are statistically significant. A functional module is modeled as an (α, β)-clique, where α and β are two criteria introduced in this work. We also propose a novel network model, called the two-typed, divided network. The new network model and the criteria make the problem tractable even when very large networks are compared. The code can be downloaded from http://www.freescience.org/cs/ABClique/
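The (α, β) criterion itself can be stated as a small membership test. The Python sketch below assumes each organism's network is given as a set of undirected protein-protein edges and checks whether a candidate protein set forms a clique in at least α phenotype-expressing networks and in at most β non-expressing ones. It only verifies a given module; it does not reproduce the paper's enumeration or its two-typed, divided network model, and the function names (including the `edges` helper) are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Network = Set[FrozenSet[str]]  # undirected edges as frozenset({p, q})


def is_clique(proteins: Iterable[str], net: Network) -> bool:
    """True if every pair of proteins is linked in `net`."""
    return all(frozenset(pair) in net for pair in combinations(proteins, 2))


def is_alpha_beta_clique(proteins: Set[str], pos_nets: List[Network],
                         neg_nets: List[Network], alpha: int, beta: int) -> bool:
    """Clique in >= alpha expressing networks and <= beta non-expressing ones."""
    in_pos = sum(is_clique(proteins, net) for net in pos_nets)
    in_neg = sum(is_clique(proteins, net) for net in neg_nets)
    return in_pos >= alpha and in_neg <= beta


def edges(*pairs: str) -> Network:
    """Build a network from 'p-q' strings (helper for the example)."""
    return {frozenset(p.split("-")) for p in pairs}


if __name__ == "__main__":
    pos = [edges("p1-p2", "p2-p3", "p1-p3"),
           edges("p1-p2", "p2-p3", "p1-p3", "p3-p4")]
    neg = [edges("p1-p2")]
    print(is_alpha_beta_clique({"p1", "p2", "p3"}, pos, neg, alpha=2, beta=0))  # True
    print(is_alpha_beta_clique({"p1", "p2"}, pos, neg, alpha=2, beta=0))        # False
```

The pair {p1, p2} fails with β = 0 because it is also a clique in the non-expressing network; relaxing β to 1 would accept it, which illustrates how β controls tolerance for modules shared with non-expressing organisms.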

Journal of Combinatorial Optimization | 2011

On the parameterized complexity of the Multi-MCT and Multi-MCST problems

Wenbin Chen; Matthew C. Schmidt; Nagiza F. Samatova

The comparison of tree-structured data is widespread, since trees can represent a wide variety of data, such as XML data, evolutionary histories, or carbohydrate structures. Two graph-theoretical problems used in comparing such data are finding the maximum common subtree (MCT) and the minimum common supertree (MCST) of two trees. These problems generalize to finding the MCT and MCST of multiple trees (Multi-MCT and Multi-MCST, respectively). In this paper, we prove parameterized complexity hardness results for different parameterized versions of the Multi-MCT and Multi-MCST problems under isomorphic embeddings.

Theoretical Computer Science | 2009

On parameterized complexity of the Multi-MCS problem

Wenbin Chen; Matthew C. Schmidt; Nagiza F. Samatova

We introduce the maximum common subgraph problem for multiple graphs (Multi-MCS), inspired by various biological applications such as multiple alignments of gene sequences, protein structures, metabolic pathways, or protein-protein interaction networks. Multi-MCS is a generalization of the two-graph Maximum Common Subgraph problem (MCS). On the basis of the framework of parameterized complexity theory, we derive the parameterized complexity of Multi-MCS for various parameters for different classes of graphs. For example, for directed graphs with labeled vertices, we prove that the parameterized m-Multi-MCS problem is W[2]-hard, while the parameterized k-Multi-MCS problem is W[t]-hard for all t ≥ 1, where m and k are the size of the maximum common subgraph and the number of multiple graphs, respectively. We show similar results for other parameterized versions of the Multi-MCS problem for directed graphs with vertex labels and undirected graphs with vertex and edge labels by giving linear FPT reductions of the problems from parameterized versions of the longest common subsequence problem. Likewise, for unlabeled undirected graphs, we show that a parameterized version of the Multi-MCS problem with a fixed number of input graphs is W[1]-complete by showing a linear FPT reduction to and from a parameterized version of the maximum clique problem.

IEEE International Conference on Technologies for Homeland Security | 2015

Global Pattern Search at Scale

R. Jordan Crouser; Matthew C. Schmidt; Stephen Kelley; Benjamin A. Miller; Daniel J. Van Hook; Lauren Edwards; Maja Milosavljevic; Elizabeth Michel; Elizabeth Ferme; Robert Carrington; Albert Reuther

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to the complexities of data processing, presenting the data in a way that is meaningful to the end user poses another challenge. In our work, we focused on analyzing a corpus of 35,000 news articles and creating an interactive geovisualization tool to reveal patterns to human analysts. Our comprehensive tool, Global Pattern Search at Scale (GPSS), addresses three major problems in data analysis: free text analysis, high volumes of data, and interactive visualization. GPSS uses an Accumulo database for high-volume data storage, and a matrix of word counts and event detection algorithms to process the free text. For visualization, the tool displays an interactive web application to the user, featuring a map overlaid with document clusters and events, search and filtering options, a timeline, and a word cloud. In addition, the GPSS tool can be easily adapted to process and understand other large free-text datasets.
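The abstract mentions a matrix of word counts and event detection algorithms but does not specify either. As a rough illustration under stated assumptions, the Python below builds a day-by-word count matrix from raw text and flags a (day, word) pair as an event when that day's count exceeds the word's mean plus `k` population standard deviations across all days. The z-score rule, the naive tokenizer, and the function names are hypothetical; GPSS stores its data in Accumulo, which this in-memory sketch does not touch.

```python
import re
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def word_count_matrix(
        docs_by_day: Dict[str, List[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (days, vocab, matrix) where matrix[i][j] counts vocab[j] on days[i]."""
    days = sorted(docs_by_day)
    counts = []
    for day in days:
        c: Counter = Counter()
        for text in docs_by_day[day]:
            c.update(re.findall(r"[a-z']+", text.lower()))  # naive tokenizer
        counts.append(c)
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return days, vocab, matrix


def detect_bursts(days: List[str], vocab: List[str], matrix: List[List[int]],
                  k: float = 2.0) -> List[Tuple[str, str]]:
    """Flag (day, word) where the count exceeds mean + k * std for that word."""
    events = []
    for j, word in enumerate(vocab):
        column = [row[j] for row in matrix]
        mean = statistics.mean(column)
        std = statistics.pstdev(column)
        for i, count in enumerate(column):
            if std > 0 and count > mean + k * std:
                events.append((days[i], word))
    return events


if __name__ == "__main__":
    docs = {f"d{i}": ["market report"] for i in range(1, 5)}
    docs["d5"] = ["market report", "flood flood flood flood flood"]
    days, vocab, matrix = word_count_matrix(docs)
    print(detect_bursts(days, vocab, matrix, k=1.5))  # [('d5', 'flood')]
```

Words with a constant daily count (here "market" and "report") have zero spread and are never flagged, so only the sudden spike in "flood" on day `d5` is reported.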
