Matthew B. Carson | Researchain

Publication

Featured research published by Matthew B. Carson.

Proceedings of the National Academy of Sciences of the United States of America | 2002

Genome sequence of Streptococcus mutans UA159, a cariogenic dental pathogen

Dragana Ajdic; William M. McShan; Robert McLaughlin; Gorana Savić; Jin Chang; Matthew B. Carson; Charles Primeaux; Runying Tian; Steve Kenton; Honggui Jia; Shaoping Lin; Yudong Qian; Shuling Li; Hua Zhu; Fares Z. Najar; Hongshing Lai; James R. White; Bruce A. Roe; Joseph J. Ferretti

Streptococcus mutans is the leading cause of dental caries (tooth decay) worldwide and is considered to be the most cariogenic of all of the oral streptococci. The genome of S. mutans UA159, a serotype c strain, has been completely sequenced and is composed of 2,030,936 base pairs. It contains 1,963 ORFs, 63% of which have been assigned putative functions. The genome analysis provides further insight into how S. mutans has adapted to surviving the oral environment through resource acquisition, defense against host factors, and use of gene products that maintain its niche against microbial competitors. S. mutans metabolizes a wide variety of carbohydrates via nonoxidative pathways, and all of these pathways have been identified, along with the associated transport systems whose genes account for almost 15% of the genome. Virulence genes associated with extracellular adherent glucan production, adhesins, acid tolerance, proteases, and putative hemolysins have been identified. Strain UA159 is naturally competent and contains all of the genes essential for competence and quorum sensing. Mobile genetic elements in the form of IS elements and transposons are prominent in the genome and include a previously uncharacterized conjugative transposon and a composite transposon containing genes for the synthesis of antibiotics of the gramicidin/bacitracin family; however, no bacteriophage genomes are present.

Journal of Bacteriology | 2005

Genomic Sequence of an Otitis Media Isolate of Nontypeable Haemophilus influenzae: Comparative Study with H. influenzae Serotype d, Strain KW20

Alistair Harrison; David W. Dyer; Allison F. Gillaspy; William C. Ray; Rachna Mungur; Matthew B. Carson; Huachun Zhong; Jenny Gipson; M. Gipson; Linda S. Johnson; Lisa A. Lewis; Lauren O. Bakaletz; Robert S. Munson

In 1995, The Institute for Genomic Research completed the genome sequence of a rough derivative of Haemophilus influenzae serotype d, strain Rd KW20. Although extremely useful in understanding the basic biology of H. influenzae, these data have not provided significant insight into disease caused by nontypeable H. influenzae, as serotype d strains are not pathogens. In contrast, strains of nontypeable H. influenzae are the primary pathogens of chronic and recurrent otitis media in children. In addition, these organisms have an important role in acute otitis media in children as well as in other respiratory diseases. Such strains must therefore contain a gene repertoire that differs from that of strain Rd. Elucidating the differences between these genomes will thus provide insight into the pathogenic mechanisms of nontypeable H. influenzae. The genome of a representative nontypeable H. influenzae strain, 86-028NP, isolated from a patient with chronic otitis media, was therefore sequenced and annotated. Despite large regions of synteny with the strain Rd genome, there are large rearrangements in strain 86-028NP's genome architecture relative to the strain Rd genome. A genomic island similar to an island originally identified in H. influenzae type b is present in the strain 86-028NP genome, while the mu-like phage present in the strain Rd genome is absent from the strain 86-028NP genome. Two hundred eighty open reading frames were identified in the strain 86-028NP genome that were absent from the strain Rd genome. These data provide new insight that complements and extends the ongoing analysis of nontypeable H. influenzae virulence determinants.

Journal of Bacteriology | 2005

Identification of the Iron-Responsive Genes of Neisseria gonorrhoeae by Microarray Analysis in Defined Medium

Thomas F. Ducey; Matthew B. Carson; Joshua Orvis; Alain Stintzi; David W. Dyer

To ensure survival, most bacteria must acquire iron, a resource that is sequestered by mammalian hosts. Pathogenic bacteria have therefore evolved intricate systems to sense iron limitation and regulate gene expression appropriately. We used a pan-Neisseria microarray to examine genes regulated in Neisseria gonorrhoeae in response to iron availability in defined medium. Overall, 203 genes varied in expression, 109 up-regulated and 94 down-regulated by iron deprivation. In iron-replete medium, genes essential to rapid bacterial growth were preferentially expressed, while iron transport functions, and predominantly genes of unknown function, were expressed in low-iron medium. Of those TonB-dependent proteins encoded in the FA1090 genome with unknown ligand specificity, expression of three was not controlled by iron availability, suggesting that these receptors may not be high-affinity transporters for iron-containing ligands. Approximately 30% of the operons regulated by iron appeared to be directly under control of Fur. Our data suggest a regulatory cascade where Fur indirectly controls gene expression by affecting the transcription of three secondary regulators. Our data also suggest that a second MerR-like regulator may be directly responding to iron availability and controlling transcription independent of the Fur protein. Comparison of our data with those recently published for Neisseria meningitidis revealed that only a small portion of genes were found to be similarly regulated in these closely related pathogens, while a large number of genes derepressed during iron starvation were unique to each organism.

Nucleic Acids Research | 2010

NAPS: a residue-level nucleic acid-binding prediction server

Matthew B. Carson; Robert E. Langlois; Hui Lu

Nucleic acid (NA)-binding proteins are involved in a great number of cellular processes. Understanding the mechanisms underlying these proteins first requires the identification of the specific residues involved in nucleic acid binding. Prediction of NA-binding residues can provide practical assistance in the functional annotation of NA-binding proteins. Predictions can also be used to expedite mutagenesis experiments, guiding researchers to the correct binding residues in these proteins. Here, we present a method for identifying amino acid residues involved in DNA and RNA binding using sequence-based attributes. The method used in this work combines the C4.5 algorithm with bootstrap aggregation and cost-sensitive learning. Our DNA-binding model achieved 79.1% accuracy, while the RNA-binding model reached an accuracy of 73.2%. The NAPS web server is freely available at http://proteomics.bioengr.uic.edu/NAPS.
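To make the learning scheme described for NAPS concrete, here is a minimal Python sketch of cost-sensitive bagging. It is not the NAPS implementation: it swaps full C4.5 trees for one-level information-gain stumps (C4.5 also uses gain ratio and pruning), implements cost sensitivity by up-weighting positive (binding) examples, and uses hypothetical names (`fit_bagged`, `pos_cost`) and toy feature vectors in place of the server's sequence-based attributes.

```python
import math
import random
from collections import Counter


def weighted_entropy(labels, weights):
    """Shannon entropy (bits) of a label multiset where each label carries a weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = Counter()
    for label, w in zip(labels, weights):
        mass[label] += w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def weighted_majority(labels, weights):
    """Label with the largest total weight."""
    mass = Counter()
    for label, w in zip(labels, weights):
        mass[label] += w
    return max(mass, key=mass.get)


def fit_stump(X, y, w):
    """One-level tree: pick the (feature, threshold) split with the highest weighted information gain."""
    base = weighted_entropy(y, w)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            wl = sum(w[i] for i in left)
            wr = sum(w[i] for i in right)
            cond = (wl * weighted_entropy([y[i] for i in left], [w[i] for i in left])
                    + wr * weighted_entropy([y[i] for i in right], [w[i] for i in right])) / (wl + wr)
            if best is None or base - cond > best[0]:
                best = (base - cond, f, t, left, right)
    if best is None:  # no valid split (e.g. one distinct value): predict the weighted majority everywhere
        label = weighted_majority(y, w)
        return (0, math.inf, label, label)
    _, f, t, left, right = best
    return (f, t,
            weighted_majority([y[i] for i in left], [w[i] for i in left]),
            weighted_majority([y[i] for i in right], [w[i] for i in right]))


def stump_predict(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def fit_bagged(X, y, n_models=25, pos_cost=3.0, seed=0):
    """Bagging: fit each stump on a bootstrap resample, weighting positives (label 1) by pos_cost."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        wb = [pos_cost if label == 1 else 1.0 for label in yb]
        models.append(fit_stump(Xb, yb, wb))
    return models


def predict(models, x):
    """Majority vote across the bagged stumps (1 = binding, 0 = non-binding)."""
    votes = sum(1 for m in models if stump_predict(m, x) == 1)
    return 1 if 2 * votes > len(models) else 0
```

Raising `pos_cost` pushes each stump toward labeling ambiguous examples as binding, trading false negatives for false positives; that trade-off is the usual reason to add cost sensitivity when the class of interest (here, binding residues) is typically the minority.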

Infection and Immunity | 2004

Partial Analysis of the Genomes of Two Nontypeable Haemophilus influenzae Otitis Media Isolates

Robert S. Munson; Alistair Harrison; Allison F. Gillaspy; William C. Ray; Matthew B. Carson; David Armbruster; Jenny Gipson; M. Gipson; Linda S. Johnson; Lisa A. Lewis; David W. Dyer; Lauren O. Bakaletz

In 1995, The Institute for Genomic Research completed the genomic sequence of a rough derivative of Haemophilus influenzae serotype d, strain Rd KW20. This sequence, though extremely useful in understanding the basic biology of H. influenzae, has yet to provide significant insight into our understanding of disease caused by nontypeable H. influenzae (NTHI), because serotype d strains are not generally pathogens. In contrast, NTHI strains are frequently mucosal pathogens and are the primary pathogens of chronic otitis media as well as a significant cause of acute otitis media in children. Thus, it is of great importance to further understand their biology. We used a DNA-based microarray approach to identify genes present in a clinical isolate of NTHI that were absent from strain Rd. We also sequenced the genome of a second NTHI isolate from a child with chronic otitis media to threefold coverage and then used an array of bioinformatics tools to identify genes present in this NTHI strain but absent from strain Rd. These methods were complementary in approach and results. We identified, in both strains, homologues of H. influenzae lav, an autotransported protein of unknown function, and tnaA, which encodes tryptophanase, as well as a homologue of Pasteurella multocida tsaA, which encodes an alkyl peroxidase that may play a role in protection against reactive oxygen species. We also identified a number of putative restriction-modification systems, bacteriophage genes, and transposon-related genes. These data provide new insight that complements and extends our ongoing analysis of NTHI virulence determinants.

PLOS Computational Biology | 2010

Analysis of combinatorial regulation: scaling of partnerships between regulators with the number of governed targets.

Nitin Bhardwaj; Matthew B. Carson; Alexej Abyzov; Koon Kiu Yan; Hui Lu; Mark Gerstein

Through combinatorial regulation, regulators partner with each other to control common targets, which allows a small number of regulators to govern many targets. One interesting question is: given this combinatorial regulation, how does the number of regulators scale with the number of targets? Here, we address this question by building and analyzing co-regulation (co-transcription and co-phosphorylation) networks that describe partnerships between regulators controlling common genes. We carry out analyses across five diverse species, ranging from Escherichia coli to human. These reveal many properties of partnership networks, such as the absence of a classical power-law degree distribution despite the existence of nodes with many partners. We also find that the number of co-regulatory partnerships follows an exponential saturation curve in relation to the number of targets. (For E. coli and Bacillus subtilis, only the initial linear part of this curve is evident, owing to the arrangement of genes into operons.) To gain intuition into the saturation process, we relate biological regulation to more commonplace social contexts, where a small number of individuals can form an intricate web of connections on the internet. Indeed, we find that the size of partnership networks saturates even as the complexity of their output increases. We also present a variety of models to account for the saturation phenomenon. In particular, we develop a simple analytical model to show how new partnerships are acquired with an increasing number of target genes; under certain assumptions, it reproduces the observed saturation. We then build a more general simulation of network growth and find agreement with a wide range of real networks. Finally, we perform various down-sampling calculations on the observed data to illustrate the robustness of our conclusions.
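The abstract does not give the functional form of the saturation curve. Assuming the common form P(T) = P_max * (1 - exp(-T / tau)), where T is the number of targets, P the number of partnerships, and P_max and tau are hypothetical parameter names, a minimal sketch of the curve and a grid-search fit of tau (with P_max held fixed) looks like this:

```python
import math


def saturation(targets, p_max, tau):
    """Assumed exponential saturation: partnerships rise with targets and level off at p_max."""
    return p_max * (1.0 - math.exp(-targets / tau))


def fit_tau(targets, partners, p_max, candidate_taus):
    """Choose the tau from a candidate grid that minimizes squared error (p_max held fixed)."""
    def sse(tau):
        return sum((saturation(t, p_max, tau) - p) ** 2 for t, p in zip(targets, partners))
    return min(candidate_taus, key=sse)


# Synthetic example (not data from the paper): a curve with p_max = 50, tau = 20.
targets = list(range(0, 200, 10))
partners = [saturation(t, 50, 20) for t in targets]
print(fit_tau(targets, partners, 50, range(1, 101)))  # 20
```

For T much smaller than tau, exp(-T / tau) is approximately 1 - T / tau, so P(T) is approximately (P_max / tau) * T. Under this assumed form, that linear regime is what the abstract reports seeing for E. coli and B. subtilis.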

Annals of Biomedical Engineering | 2007

Learning to translate sequence and structure to function: identifying DNA binding and membrane binding proteins.

Robert E. Langlois; Matthew B. Carson; Nitin Bhardwaj; Hui Lu

A protein’s function depends in large part on its interactions with other molecules. With an increasing number of protein structures becoming available every year, a corresponding structural annotation approach for identifying such interactions becomes increasingly useful. At the same time, machine learning has gained popularity in bioinformatics by providing robust annotation of genes and proteins without relying on sequence homology. Here we have developed a general machine learning protocol to identify proteins that bind DNA and membranes. In general, there is no theory or even rule of thumb for picking the best machine learning algorithm. Thus, we systematically compare several classification algorithms known to perform well. The boosted tree classifier gives the best performance, achieving 93% and 88% accuracy in discriminating non-homologous proteins that bind membranes and DNA, respectively, and significantly outperforming all previously published works. We also attempted to address the importance of the attributes in function prediction and the relationships between relevant attributes. A graphical model based on boosted trees is applied to study the features important for discriminating DNA-binding proteins. In summary, the current protocol identifies physical features important in DNA and membrane binding, rather than annotating function through sequence similarity.
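The abstract names a boosted tree classifier without giving its configuration. As a generic illustration rather than the authors' protocol, the sketch below implements discrete AdaBoost over decision stumps with labels +1/-1; the toy feature vectors and the `rounds` parameter are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Exhaustive search over (feature, threshold, polarity) minimizing weighted error."""
    best = (math.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] <= t else -polarity) != yi)
                if err < best[0]:
                    best = (err, f, t, polarity)
    return best


def stump_predict(f, t, polarity, x):
    return polarity if x[f] <= t else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight examples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # Misclassified examples (y * h = -1) gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(f, t, pol, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(f, t, p, x) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

Stumps are the simplest base learner; deeper trees can be substituted in `fit_stump` when single-feature splits are too weak.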

Journal of the American Medical Informatics Association | 2015

Visualizing collaborative electronic health record usage for hospitalized patients with heart failure

Nicholas D. Soulakis; Matthew B. Carson; Young Ji Lee; Daniel Schneider; Connor T Skeehan; Denise M. Scholtens

Objective: To visualize and describe collaborative electronic health record (EHR) usage for hospitalized patients with heart failure. Materials and methods: We identified records of patients with heart failure and all associated healthcare provider record usage through queries of the Northwestern Medicine Enterprise Data Warehouse. We constructed a network by equating access and updates of a patient’s EHR to a provider-patient interaction. We then considered shared patient record access as the basis for a second network that we termed the provider collaboration network. We calculated network statistics, the modularity of provider interactions, and provider cliques. Results: We identified 548 patient records accessed by 5113 healthcare providers in 2012. The provider collaboration network had 1504 nodes and 83,998 edges. We identified 7 major provider collaboration modules. Average clique size was 87.9 providers. We used a graph database to demonstrate an ad hoc query of our provider-patient network. Discussion: Our analysis suggests that a large number of healthcare providers across a wide variety of professions access records of patients with heart failure during their hospital stay. This shared record access tends to take place not only in a pairwise manner but also among large groups of providers. Conclusion: EHRs encode valuable interactions, implicitly or explicitly, between patients and providers. Network analysis provided strong evidence of multidisciplinary record access of patients with heart failure across teams of 100+ providers. Further investigation may lead to a clearer understanding of how record access information can be used to strategically guide care coordination for patients hospitalized for heart failure.
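Building the provider collaboration network from shared record access can be sketched as a projection of the provider-patient network: two providers are linked when both accessed the same patient's record. This sketch assumes the edge weight is the number of shared patients, which the abstract does not specify; the access-log format and function name are hypothetical, and the modularity and clique analyses are omitted.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_network(access_log):
    """Project (provider, patient) access events onto a provider-provider network.

    Two providers share an edge if both accessed the same patient's record; the edge
    weight counts the distinct patients they share. Repeat accesses are collapsed.
    """
    providers_by_patient = defaultdict(set)
    for provider, patient in access_log:
        providers_by_patient[patient].add(provider)

    edges = defaultdict(int)
    for providers in providers_by_patient.values():
        for a, b in combinations(sorted(providers), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical access events: (provider ID, patient ID).
log = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p1", "A"),
       ("p2", "B"), ("p3", "B"), ("p4", "C")]
net = collaboration_network(log)
print(net)  # {('p1', 'p2'): 1, ('p1', 'p3'): 1, ('p2', 'p3'): 2}
```

Provider p4 never shares a patient, so it has no edges; counting nodes only from the edge list would exclude such isolated providers.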

Drug Safety | 2017

Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review

Yuan Luo; William K. Thompson; Timothy M. Herr; Zexian Zeng; Mark A. Berendsen; Siddhartha R. Jonnalagadda; Matthew B. Carson; Justin Starren

The goal of pharmacovigilance is to detect, monitor, characterize and prevent adverse drug events (ADEs) with pharmaceutical products. This article is a comprehensive structured review of recent advances in applying natural language processing (NLP) to electronic health record (EHR) narratives for pharmacovigilance. We review methods of varying complexity and problem focus, summarize the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions. The ability to accurately capture both semantic and syntactic structures in clinical narratives becomes increasingly critical to enable efficient and accurate ADE detection. Significant progress has been made in algorithm development and resource construction since 2000. Since 2012, statistical analysis and machine learning methods have gained traction in automation of ADE mining from EHR narratives. Current state-of-the-art methods for NLP-based ADE detection from EHRs show promise regarding their integration into production pharmacovigilance systems. In addition, integrating multifaceted, heterogeneous data sources has shown promise in improving ADE detection and has become increasingly adopted. On the other hand, challenges and opportunities remain across the frontier of NLP application to EHR-based pharmacovigilance, including proper characterization of ADE context, differentiation between off- and on-label drug-use ADEs, recognition of the importance of polypharmacy-induced ADEs, better integration of heterogeneous data sources, creation of shared corpora, and organization of shared-task challenges to advance the state-of-the-art.

BMC Bioinformatics | 2013

A fast weak motif-finding algorithm based on community detection in graphs

Caiyan Jia; Matthew B. Carson; Jian Yu

Background: Identification of transcription factor binding sites (also called ‘motif discovery’) in DNA sequences is a basic step in understanding genetic regulation. Although many successful programs have been developed, the problem is far from solved because of the diversity of gene expression and regulation and the low specificity of binding sites. State-of-the-art algorithms have their own constraints (e.g., high time or space complexity for finding long motifs, low precision in identifying weak motifs, or the OOPS constraint: one occurrence of the motif instance per sequence) that limit their scope of application. Results: In this paper, we present a novel and fast algorithm we call TFBSGroup. It is based on community detection in a graph and is used to discover long and weak (l, d) motifs under the ZOMOPS constraint (zero, one or multiple occurrence(s) of the motif instance(s) per sequence), where l is the length of a motif and d is the maximum number of mutations between a motif instance and the motif itself. First, TFBSGroup recasts the (l, d) motif search in sequences as the discovery of dense subgraphs within a graph. It identifies these subgraphs using a fast community detection method to obtain coarse-grained candidate motifs. Next, it greedily refines these candidate motifs toward the true motif within their own communities. Empirical studies on synthetic (l, d) samples have shown that TFBSGroup is very efficient (e.g., it can find true (18, 6) and (24, 8) motifs within 30 seconds). More importantly, the algorithm has succeeded in rapidly identifying motifs in a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB. The algorithm has also accurately identified motifs in ChIP-seq data sets for 12 mouse transcription factors involved in embryonic stem (ES) cell pluripotency and self-renewal. Conclusions: Our novel heuristic algorithm, TFBSGroup, is able to quickly identify nearly exact matches for long and weak (l, d) motifs in DNA sequences under the ZOMOPS constraint. It is also capable of finding motifs in real applications. The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/.
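The (l, d) motif definition and the ZOMOPS constraint can be illustrated with a brute-force instance scan. This is a sketch of the problem definition, not of TFBSGroup's graph-based search: given a candidate motif, it reports every window within Hamming distance d in each sequence, so a sequence may contribute zero, one, or several instances. Only the forward strand is scanned, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_instances(motif, sequences, d):
    """Scan each sequence for windows within Hamming distance d of the motif.

    ZOMOPS: each sequence may yield zero, one, or several instances, so the result
    is a list of start-position lists, one per input sequence.
    """
    motif_len = len(motif)  # l in the (l, d) notation
    return [[i for i in range(len(seq) - motif_len + 1)
             if hamming(seq[i:i + motif_len], motif) <= d]
            for seq in sequences]


print(find_instances("ACGT", ["TTACGTTT", "AAAAA", "ACCTGACGA"], d=1))
# [[2], [], [0, 5]]
```

Under the OOPS constraint the middle sequence, which has no instance, would violate the model's assumptions; ZOMOPS simply returns an empty list for it.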
