David Emms
University of York
Publication
Featured research published by David Emms.
Genome Biology | 2015
David Emms; Steven Kelly
Identifying homology relationships between sequences is fundamental to biological research. Here we provide a novel orthogroup inference algorithm called OrthoFinder that solves a previously undetected gene length bias in orthogroup inference, resulting in significant improvements in accuracy. Using real benchmark datasets we demonstrate that OrthoFinder is more accurate than other orthogroup inference methods by between 8% and 33%. Furthermore, we demonstrate the utility of OrthoFinder by providing a complete classification of transcription factor gene families in plants, revealing 6.9 million previously unobserved relationships.
Pattern Recognition | 2009
David Emms; Simone Severini; Richard C. Wilson; Edwin R. Hancock
In this paper we explore how a spectral technique suggested by coined quantum walks can be used to distinguish between graphs that are cospectral with respect to standard matrix representations. The algorithm runs in polynomial time and, moreover, can distinguish many graphs for which there is no subexponential time algorithm that is proven to be able to distinguish between them. In the paper, we give a description of the coined quantum walk from the field of quantum computing. The evolution of the walk is governed by a unitary matrix. We show how the spectrum of this matrix is related to the spectrum of the transition matrix of the classical random walk. However, despite this relationship the behaviour of the quantum walk is vastly different from the classical walk. This leads us to define a new matrix based on the amplitudes of paths of the walk whose spectrum we use to characterise graphs. We carry out three sets of experiments using this matrix representation. Firstly, we test the ability of the spectrum to distinguish between sets of graphs that are cospectral with respect to standard matrix representation. These include strongly regular graphs, and incidence graphs of balanced incomplete block designs (BIBDs). Secondly, we test our method on ALL regular graphs on up to 14 vertices and ALL trees on up to 24 vertices. This demonstrates that the problem of cospectrality is often encountered with conventional algorithms and tests the ability of our method to resolve this problem. Thirdly, we use distances obtained from the spectra of S^+(U^3) to cluster graphs derived from real-world image data and these are qualitatively better than those obtained with the spectra of the adjacency matrix. Thus, we provide a spectral representation of graphs that can be used in place of standard spectral representations, far less prone to the problems of cospectrality.
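As a rough illustration of the representation described in this abstract, the following Python sketch builds the unitary matrix of a coined quantum walk on the arcs of a graph and computes the spectrum of S^+(U^3). It assumes a Grover coin at every vertex and reads S^+ as the positive support (1 where an entry is positive, 0 elsewhere); neither detail is stated in the abstract, and the function names are hypothetical.

```python
import numpy as np

def grover_walk_unitary(adj):
    """Unitary of a coined walk whose basis states are the arcs (u, v) of the graph.

    Assumption: a Grover coin is used at every vertex.
    """
    adj = np.asarray(adj)
    n = adj.shape[0]
    arcs = [(u, v) for u in range(n) for v in range(n) if adj[u, v]]
    deg = adj.sum(axis=1)
    U = np.zeros((len(arcs), len(arcs)))
    for a, (u, v) in enumerate(arcs):        # outgoing arc u -> v
        for b, (w, x) in enumerate(arcs):    # incoming arc w -> x
            if x == u:                       # the walker must be sitting at u
                U[a, b] = 2.0 / deg[u] - (1.0 if w == v else 0.0)
    return U

def positive_support_spectrum(adj, power=3, tol=1e-9):
    """Eigenvalues of the positive support of U**power (assumed meaning of S^+)."""
    U = grover_walk_unitary(adj)
    support = (np.linalg.matrix_power(U, power) > tol).astype(float)
    return np.linalg.eigvals(support)

# Example: the complete graph on four vertices has 12 arcs.
k4 = np.ones((4, 4)) - np.eye(4)
U_k4 = grover_walk_unitary(k4)
spectrum_k4 = positive_support_spectrum(k4)
```

Comparing two graphs would then amount to comparing their sorted spectra; the tolerance `tol` only guards against floating-point noise around zero.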
Quantum Information Processing | 2011
Peng Ren; Tatjana Aleksić; David Emms; Richard C. Wilson; Edwin R. Hancock
In this paper we explore an interesting relationship between discrete-time quantum walks and the Ihara zeta function of a graph. The paper commences by reviewing the related literature on the discrete-time quantum walks and the Ihara zeta function. Mathematical definitions of the two concepts are then provided, followed by an analysis of the relationship between them. Based on this analysis we are able to account for why the Ihara zeta function cannot distinguish cospectral regular graphs. This analysis suggests a means by which to develop zeta functions that have potential in distinguishing such structures.
Pattern Recognition | 2009
David Emms; Richard C. Wilson; Edwin R. Hancock
We consider how continuous-time quantum walks can be used for graph matching. We focus in detail on both exact and inexact graph matching, and consider in depth the problem of measuring graph similarity. We commence by constructing an auxiliary graph, in which the two graphs to be matched are co-joined by a layer of indicator vertices (one for each potential correspondence between a pair of vertices). We simulate a continuous-time quantum walk in parallel on the two graphs. The layer of connecting indicator vertices in the auxiliary graph allows quantum interference to take place between the two walks. The interference amplitudes on the indicator vertices are determined by differences in the two walks, and can be used to calculate probabilities for matches between pairs of vertices from the graphs. By applying the Hungarian (Kuhn-Munkres) algorithm to these probabilities, we recover a correspondence mapping between the graphs. To calculate graph similarity, we combine these probabilities with edge-consistency information to give a consistency measure. Based on the consistency measure, we define two graph similarity measures, one of which requires correspondence matches while the second does not. We analyse our approach experimentally using synthetic and real-world graphs. This reveals that our method gives results that are intermediate between the most sophisticated iterative techniques available and simpler ones.
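The final step named in this abstract, recovering a correspondence from match probabilities with the Hungarian (Kuhn-Munkres) algorithm, can be sketched with SciPy's assignment solver. The probability matrix below is made up purely for illustration and is not derived from any quantum walk; `best_correspondence` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_correspondence(match_prob):
    """Map each vertex of graph A to one vertex of graph B, maximising total probability."""
    rows, cols = linear_sum_assignment(np.asarray(match_prob), maximize=True)
    return dict(zip(rows.tolist(), cols.tolist()))

# Illustrative values only: entry [i, j] is the assumed probability
# that vertex i of the first graph matches vertex j of the second.
match_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
])
mapping = best_correspondence(match_prob)
```

Because the solver returns a one-to-one assignment, a vertex never claims a partner that has already been given to another vertex, which a simple row-wise argmax would not guarantee.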
GbRPR'07 Proceedings of the 6th IAPR-TC-15 international conference on Graph-based representations in pattern recognition | 2007
David Emms; Richard C. Wilson; Edwin R. Hancock
In this paper, we explore analytically and experimentally the commute time of the continuous-time quantum walk. For the classical random walk, the commute time has been shown to be robust to errors in edge weight structure and to lead to spectral clustering algorithms with improved performance. Our analysis shows that the commute time of the continuous-time quantum walk can be determined via integrals of the Laplacian spectrum, calculated using Gauss-Laguerre quadrature. We analyse the quantum commute times with reference to their classical counterpart. Experimentally, we show that the quantum commute times can be used to emphasise cluster-structure.
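For orientation, the classical commute time that this abstract uses as its reference point can be computed from the pseudoinverse of the graph Laplacian. The sketch below covers only that classical quantity for a connected, undirected graph; it does not implement the quantum commute time or the Gauss-Laguerre quadrature from the paper, and the function name is hypothetical.

```python
import numpy as np

def classical_commute_times(adj):
    """Commute times of the classical random walk on a connected undirected graph.

    Uses CT(u, v) = vol(G) * (L+[u, u] + L+[v, v] - 2 * L+[u, v]),
    where L+ is the Moore-Penrose pseudoinverse of the Laplacian.
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    lap = np.diag(degrees) - adj
    lap_pinv = np.linalg.pinv(lap)
    diag = np.diag(lap_pinv)
    volume = degrees.sum()
    return volume * (diag[:, None] + diag[None, :] - 2.0 * lap_pinv)

# Example: a path on three vertices, 0 - 1 - 2.
path3 = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
ct = classical_commute_times(path3)
```

On this path the walk needs an expected 4 steps to go from one end to the other, so the commute time between the two end vertices is 8.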
Molecular Biology and Evolution | 2016
David Emms; Sarah Covshoff; Julian M. Hibberd; Steven Kelly
C4 photosynthesis is considered one of the most remarkable examples of evolutionary convergence in eukaryotes. However, it is unknown whether the evolution of C4 photosynthesis required the evolution of new genes. Genome-wide gene-tree species-tree reconciliation of seven monocot species that span two origins of C4 photosynthesis revealed that there was significant parallelism in the duplication and retention of genes coincident with the evolution of C4 photosynthesis in these lineages. Specifically, 21 orthologous genes were duplicated and retained independently in parallel at both C4 origins. Analysis of this gene cohort revealed that the set of parallel duplicated and retained genes is enriched for genes that are preferentially expressed in bundle sheath cells, the cell type in which photosynthesis was activated during C4 evolution. Furthermore, functional analysis of the cohort of parallel duplicated genes identified SWEET-13 as a potential key transporter in the evolution of C4 photosynthesis in grasses, and provides new insight into the mechanism of phloem loading in these C4 species. Key words: C4 photosynthesis, gene duplication, gene families, parallel evolution.
Image and Vision Computing | 2009
David Emms; Richard C. Wilson; Edwin R. Hancock
In this paper, we consider how discrete-time quantum walks can be applied to graph-matching problems. The matching problem is abstracted using an auxiliary graph that connects pairs of vertices from the graphs to be matched by way of auxiliary vertices. A discrete-time quantum walk is simulated on this auxiliary graph and the quantum interference on the auxiliary vertices indicates possible matches. When dealing with graphs for which there is no exact match, the interference amplitudes together with edge consistencies are used to define a consistency measure. We also explore the use of the method for inexact graph-matching problems. We have tested the algorithm on graphs derived from the NCI molecule database and found it to significantly reduce the space of possible permutation matchings, typically by a factor of 10^-20 to 10^-30, thereby allowing the graphs to be matched directly. An analysis of the quantum walk in the presence of structural errors between graphs is used as the basis of the consistency measure. We test the performance of this measure on graphs derived from images in the COIL-100 database.
SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition | 2008
David Emms; Richard C. Wilson; Edwin R. Hancock
We consider how continuous-time quantum walks can be used for graph matching. We focus in detail on both exact and inexact graph matching, and consider in depth the problem of measuring graph similarity. We commence by constructing an auxiliary graph, in which the two graphs to be matched are co-joined by a layer of indicator nodes (one for each potential correspondence between a pair of nodes). We simulate a continuous-time quantum walk in parallel on the two graphs. The layer of connecting indicator nodes in the auxiliary graph allows quantum interference to take place between the two walks. The interference amplitudes on the indicator nodes are determined by differences in the two walks. We show how these interference amplitudes can be used to compute graph edit distances without explicitly determining node correspondences.
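A continuous-time quantum walk of the kind this abstract relies on can be simulated on a small graph by diagonalising the Hamiltonian. The sketch below assumes the graph Laplacian as the Hamiltonian (the abstract does not say which operator is used) and does not build the auxiliary graph or the indicator nodes; `ctqw_state` is a hypothetical name.

```python
import numpy as np

def ctqw_state(adj, psi0, t):
    """State of a continuous-time quantum walk at time t: psi(t) = exp(-i L t) psi(0).

    Assumption: the Hamiltonian is the graph Laplacian L = D - A.
    """
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    evals, evecs = np.linalg.eigh(lap)          # L is symmetric, so eigh applies
    phases = np.exp(-1j * evals * t)
    return evecs @ (phases * (evecs.T @ np.asarray(psi0, dtype=complex)))

# Example: two vertices joined by one edge, walker starting on vertex 0.
edge = np.array([[0, 1], [1, 0]])
psi = ctqw_state(edge, [1.0, 0.0], np.pi / 2)
probabilities = np.abs(psi) ** 2
```

On a single edge the probability of finding the walker on the other vertex is sin^2(t), so at t = pi/2 the walker has moved across completely, a simple case of the interference effects that distinguish the quantum walk from its classical counterpart.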
Molecular Biology and Evolution | 2017
David Emms; Steven Kelly
The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes.
Genome Biology and Evolution | 2017
Steve Kelly; Alasdair Ivens; Glenn Mott; Ellis C. O'Neill; David Emms; Olivia Macleod; Paul Voorheis; Kevin M. Tyler; Matthew D. Clark; Jacqueline B. Matthews; Keith R. Matthews; Mark Carrington
There are hundreds of Trypanosoma species that live in the blood and tissue spaces of their vertebrate hosts. The vast majority of these do not have the ornate system of antigenic variation that has evolved in the small number of African trypanosome species, but can still maintain long-term infections in the face of the vertebrate adaptive immune system. Trypanosoma theileri is a typical example: it has a restricted host range of cattle and other Bovinae, and is only occasionally reported to cause patent disease, although no systematic survey of the effect of infection on agricultural productivity has been performed. Here, a detailed genome sequence and a transcriptome analysis of gene expression in bloodstream form T. theileri have been performed. Analysis of the genome sequence and expression showed that T. theileri has a typical kinetoplastid genome structure and allowed a prediction that it is capable of meiotic exchange, gene silencing via RNA interference and, potentially, density-dependent growth control. In particular, the transcriptome analysis has allowed a comparison of two distinct trypanosome cell surfaces, T. brucei and T. theileri, that have each evolved to enable the maintenance of a long-term extracellular infection in cattle. The T. theileri cell surface can be modeled to contain a mixture of proteins encoded by four novel large and divergent gene families and by members of a major surface protease gene family. This surface composition is distinct from the uniform variant surface glycoprotein coat on African trypanosomes, providing an insight into a second mechanism used by trypanosome species that proliferate in an extracellular milieu in vertebrate hosts to avoid the adaptive immune response.