Manuel Middendorf
Columbia University
Publications
Featured research published by Manuel Middendorf.
Proceedings of the National Academy of Sciences of the United States of America | 2005
Manuel Middendorf; Etay Ziv; Chris H. Wiggins
Naturally occurring networks exhibit quantitative features revealing underlying growth mechanisms. Numerous network mechanisms have recently been proposed to reproduce specific properties such as degree distributions or clustering coefficients. We present a method for inferring the mechanism most accurately capturing a given network topology, exploiting discriminative tools from machine learning. The Drosophila melanogaster protein network is confidently and robustly (to noise and training data subsampling) classified as a duplication-mutation-complementation network over preferential attachment, small-world, and a duplication-mutation mechanism without complementation. Systematic classification, rather than statistical study of specific properties, provides a discriminative approach to understand the design of complex networks.
Research in Computational Molecular Biology | 2005
Manuel Middendorf; Anshul Kundaje; Mihir Shah; Yoav Freund; Chris H. Wiggins; Christina S. Leslie
We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites, such as position-specific scoring matrices (PSSMs), by incorporating promoter sequence and transcriptome gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature.
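To make the notion of a PSSM "present in the promoter region" concrete, the following is a minimal sketch of generic log-odds PSSM scoring. It is not MEDUSA's clustering procedure: the aligned k-mers, the uniform background, the pseudocount, the threshold, and all function names are assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"

def build_pssm(kmers, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length, pre-aligned k-mers.

    Assumes a uniform background and an additive pseudocount (both illustrative).
    """
    k = len(kmers[0])
    pssm = []
    for pos in range(k):
        counts = {base: pseudocount for base in ALPHABET}
        for kmer in kmers:
            counts[kmer[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2(counts[base] / total / background)
                     for base in ALPHABET})
    return pssm

def score_window(pssm, window):
    """Sum the per-position log-odds scores of one k-length window."""
    return sum(column[base] for column, base in zip(pssm, window))

def best_hit(pssm, promoter):
    """Return (score, start index) of the highest-scoring window in the promoter."""
    k = len(pssm)
    return max((score_window(pssm, promoter[i:i + k]), i)
               for i in range(len(promoter) - k + 1))

def motif_present(pssm, promoter, threshold=0.0):
    """Binary feature: does any window score at or above the threshold?"""
    return best_hit(pssm, promoter)[0] >= threshold

# Hypothetical aligned binding-site sequences and promoters (not from the paper).
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
pssm = build_pssm(sites)
promoter_with_site = "AAGCTTGACTCGGA"
promoter_without_site = "AAAAAAAAAA"
hit_score, hit_start = best_hit(pssm, promoter_with_site)
```

In a boosting setting such as the one the abstract describes, a binary feature like `motif_present` would be paired with a regulator's expression state to form one candidate weak rule.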
Research in Computational Molecular Biology | 2004
Manuel Middendorf; Anshul Kundaje; Chris H. Wiggins; Yoav Freund; Christina S. Leslie
We present a novel classification-based algorithm called GeneClass for learning to predict gene regulatory response. Our approach is motivated by the hypothesis that in simple organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular experiment based on (1) the presence of binding site subsequences (“motifs”) in the gene's regulatory region and (2) the expression levels of regulators such as transcription factors in the experiment (“parents”). Thus our learning task integrates two qualitatively different data sources: genome-wide cDNA microarray data across multiple perturbation and mutant experiments along with motif profile data from regulatory sequences. Rather than focusing on the regression task of predicting real-valued gene expression measurements, GeneClass performs the classification task of predicting +1 and -1 labels, corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. GeneClass uses the AdaBoost learning algorithm with a margin-based generalization of decision trees called alternating decision trees. In computational experiments based on the Gasch S. cerevisiae dataset, we show that the GeneClass method predicts up- and down-regulation on held-out experiments with high accuracy. We explore a range of experimental setups related to environmental stress response, and we retrieve important regulators, binding site motifs, and relationships between regulators and binding sites that are known to be associated with specific stress response pathways. Our method thus provides predictive hypotheses, suggests biological experiments, and provides interpretable insight into the structure of genetic regulatory networks. Supplementary website: http://www.cs.columbia.edu/compbio/geneclass
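The classification setup described above can be sketched as discrete AdaBoost over motif-regulator pair features. This is a simplified illustration, not the GeneClass implementation: it uses plain decision stumps instead of alternating decision trees, treats regulator states as binary (up or not up), and runs on a hypothetical toy dataset; all names are assumptions.

```python
import math
from itertools import product

def pair_features(motif_hits, regulator_up):
    """One binary feature per (motif, regulator) pair: motif present AND regulator up."""
    return [m * r for m in motif_hits for r in regulator_up]

def stump_predict(stump, x):
    """A stump votes `polarity` when its feature is on, and the opposite otherwise."""
    feature, polarity = stump
    return polarity if x[feature] == 1 else -polarity

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Choose the stump with the lowest weighted training error.
        best, best_err = None, float("inf")
        for feature in range(len(X[0])):
            for polarity in (+1, -1):
                err = sum(w for w, x, t in zip(weights, X, y)
                          if stump_predict((feature, polarity), x) != t)
                if err < best_err:
                    best, best_err = (feature, polarity), err
        # Clip the error so alpha stays finite when a stump is perfect.
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Up-weight misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * t * stump_predict(best, x))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote: +1 (up-regulated) or -1 (down-regulated)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: 2 motifs x 2 regulators; a (gene, experiment) pair is
# labeled +1 only when motif 0 is present and regulator 1 is up.
X, y = [], []
for bits in product((0, 1), repeat=4):
    motifs, regulators = bits[:2], bits[2:]
    X.append(pair_features(motifs, regulators))
    y.append(1 if motifs[0] and regulators[1] else -1)

model = adaboost(X, y)
```

On this toy data the first selected stump corresponds to the (motif 0, regulator 1) pair, which mirrors how a learned rule can point to a specific regulator and binding-site relationship.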
Physical Review E | 2005
Etay Ziv; Manuel Middendorf; Chris H. Wiggins
Intelligent Systems in Molecular Biology | 2004
Manuel Middendorf; Anshul Kundaje; Chris H. Wiggins; Yoav Freund; Christina S. Leslie
BMC Bioinformatics | 2004
Manuel Middendorf; Etay Ziv; Carter Adams; Jennifer Hom; Robin Koytcheff; Chaya Levovitz; Gregory Woods; Linda Chen; Chris H. Wiggins
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005
Anshul Kundaje; Manuel Middendorf; Feng Gao; Chris H. Wiggins; Christina S. Leslie
Physical Review E | 2005
Etay Ziv; Robin Koytcheff; Manuel Middendorf; Chris H. Wiggins
BMC Bioinformatics | 2006
Anshul Kundaje; Manuel Middendorf; Mihir Shah; Chris H. Wiggins; Yoav Freund; Christina S. Leslie
Archive | 2005
Chris H. Wiggins; Manuel Middendorf