Publication


Featured research published by Irina I. Abnizova.


PLOS Biology | 2004

Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development

Adam Woolfe; Martin Goodson; Debbie K. Goode; Phil Snell; Gayle K. McEwen; Tanya Vavouri; Sarah Smith; Phil North; Heather Callaway; Krys Kelly; Klaudia Walter; Irina I. Abnizova; Walter R. Gilks; Yvonne J. K. Edwards; Julie Cooke; Greg Elgar

In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
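
The conservation figure quoted in the abstract above (over 90% identity across more than 500 bases) can be made concrete with a short sketch. The Python function below computes percent identity between two sequences that are assumed to be already aligned and of equal length. The function name and the toy sequences are illustrative assumptions; this is not the pipeline used in the paper.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base.

    Assumes the two strings are already aligned and of equal length.
    Columns with a gap ('-') in either sequence count as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Toy example with made-up sequences (not real genomic data):
# nine of the ten aligned columns agree.
human_like = "ACGTACGTAC"
fugu_like = "ACGTACGTTC"
print(percent_identity(human_like, fugu_like))
```

In practice such a measure would be applied to alignment blocks hundreds of bases long, and the alignment step itself is where most of the difficulty lies.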


Bioinformatics | 2009

Swift: primary data analysis for the Illumina Solexa sequencing platform

Nava Whiteford; Tom Skelly; Christina Curtis; Matthew E. Ritchie; Andrea Löhr; Alexander Wait Zaranek; Irina I. Abnizova; Clive Gavin Brown

Motivation: Primary data analysis methods are of critical importance in second-generation DNA sequencing. Improved methods have the potential to increase yield and reduce error rates. Openly documented analysis tools enable users to understand the primary data, which is important for the optimization and validity of their scientific work. Results: In this article, we describe Swift, a new tool for performing primary data analysis on the Illumina Solexa Sequencing Platform. Swift is the first tool, outside of the vendor's own software, which completes the full analysis process, from raw images through to base calls. As such it provides an alternative to, and independent validation of, the vendor-supplied tool. Our results show that Swift is able to increase yield by 13.8%, at a comparable error rate. Availability and Implementation: Swift is implemented in C++ and supported under Linux. It is supplied under an open source license (LGPL3), allowing researchers to build upon the platform. Swift is available from http://swiftng.sourceforge.net. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Journal of Bioinformatics and Computational Biology | 2006

Statistical measures of the structure of genomic sequences: entropy, complexity, and position information.

Yuriy L. Orlov; Rene te Boekhorst; Irina I. Abnizova

Identifying regions of DNA with extreme statistical characteristics is an important aspect of the structural analysis of complete genomes. Linguistic methods, mainly based on estimating word frequency, can be used for this as they allow for the delineation of regions of low complexity. Low complexity may be due to biased nucleotide composition, tandem or dispersed repeats, palindrome-hairpin structures, or a combination of these features. We developed software tools in which various numerical measures of text complexity are implemented, including combinatorial and linguistic ones. We also added a Hurst exponent estimate to the software to measure dependencies in DNA sequences. By applying these tools to various functional genomic regions, we demonstrate that the complexity of introns and regulatory regions is lower than that of coding regions, whilst the Hurst exponent is larger. Further analysis of promoter sequences revealed that the lower complexity of these regions is associated with long-range correlations caused by transcription factor binding sites.
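
Two of the word-frequency measures mentioned in the abstract above can be sketched in a few lines of Python. The code computes Shannon entropy over k-mer frequencies and one common form of linguistic complexity (the product, over word lengths, of the fraction of possible words actually observed). The function names and the choice of this particular complexity definition are assumptions for illustration; the paper's software may define its measures differently.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def linguistic_complexity(seq: str, max_k: int = 3) -> float:
    """Product over k = 1..max_k of (distinct k-mers seen) / (maximum possible).

    The maximum possible is limited both by the alphabet (4**k for DNA)
    and by the number of windows of length k in the sequence.
    """
    if len(seq) < max_k:
        raise ValueError("sequence is shorter than max_k")
    result = 1.0
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        max_possible = min(4 ** k, n_windows)
        result *= len(kmer_counts(seq, k)) / max_possible
    return result


# A homopolymer run scores far lower than a sequence with varied words.
print(linguistic_complexity("AAAAAAAA"), linguistic_complexity("ACGTTGCA"))
```

Applied in a sliding window, either measure gives a profile along the sequence in which low-complexity regions appear as dips.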


Journal of Bioinformatics and Computational Biology | 2006

Transcription Binding Site Prediction Using Markov Models

Irina I. Abnizova; Alistair G. Rust; Mark Robinson; I. René J. A. te Boekhorst; Walter R. Gilks

One of the main goals of analysing DNA sequences is to understand the temporal and positional information that specifies gene expression. An important step in this process is the recognition of gene expression regulatory elements. Experimental procedures for this are slow and costly. In this paper we present a computational non-supervised algorithm that facilitates the process by statistically identifying the most likely regions within a putative regulatory sequence. A probabilistic technique is presented, based on the approximation of regulatory DNA with a Markov chain, for the location of putative transcription factor binding sites in a single stretch of DNA. To this end, we developed a procedure to approximate the order of the Markov model for a given DNA sequence that circumvents some of the prohibitive assumptions underlying Markov modeling. Application of the algorithm to data from 55 genes in five species shows the high sensitivity of this Markov search algorithm. Our algorithm does not require any prior knowledge in the form of description or cross-genomic comparison; it is context sensitive and takes DNA heterogeneity into account.
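
The general idea of approximating a DNA sequence with a Markov chain can be illustrated with a minimal sketch. The Python code below fits a fixed-order Markov model to a single sequence (with add-one smoothing) and scores each window by its log-probability under that model. It is not the algorithm from the paper: the order-selection procedure and the statistical test are not reproduced, and all names, the fixed order, and the pseudocount are assumptions. The sequence is assumed to contain only A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def fit_markov(seq, order=1, pseudocount=1.0):
    """Return a function prob(context, base) estimated from `seq`.

    Transition counts are smoothed with a pseudocount so that unseen
    transitions keep a non-zero probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1.0

    def prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row.get(base, 0.0) + pseudocount) / total

    return prob


def window_log_probs(seq, width, prob, order=1):
    """Log-probability of every window of `width` bases under the model.

    Windows start at index `order` so that each base has a full context.
    Returns a list of (start, log_probability) pairs.
    """
    scores = []
    for start in range(order, len(seq) - width + 1):
        lp = sum(
            math.log(prob(seq[i - order:i], seq[i]))
            for i in range(start, start + width)
        )
        scores.append((start, lp))
    return scores


# Toy sequence, made up for illustration only.
toy = "ACGTACGTAC"
model = fit_markov(toy, order=1)
for start, lp in window_log_probs(toy, width=3, prob=model):
    print(start, round(lp, 3))
```

Windows can then be ranked by score. Which end of the ranking is informative, and how significance is assessed, depends on the method; this sketch leaves both open.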


Comparative Biochemistry and Physiology Part D: Genomics and Proteomics | 2006

Characterisation of conserved non-coding sequences in vertebrate genomes using bioinformatics, statistics and functional studies

Yvonne J. K. Edwards; Klaudia Walter; Gayle K. McEwen; Tanya Vavouri; Krystyna A. Kelly; Irina I. Abnizova; Adam Woolfe; Debbie K. Goode; Martin Goodson; Phil North; Phil Snell; Heather Callaway; Sarah Smith; Walter R. Gilks; Julie Cooke; Greg Elgar

We recently identified approximately 1400 conserved non-coding elements (CNEs) shared by the genomes of fugu (Takifugu rubripes) and human that appear to be associated with developmental regulation in vertebrates [Woolfe, A., Goodson, M., Goode, D.K., Snell, P., McEwen, G.K., Vavouri, T., Smith, S.F., North, P., Callaway, H., Kelly, K., Walter, K., Abnizova, I., Gilks, W., Edwards, Y.J.K., Cooke, J.E., Elgar, G., 2005. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3 (1), e7]. This study encompassed a multi-disciplinary approach using bioinformatics, statistical methods and functional assays to identify and characterise the CNEs. Using an in vivo enhancer assay, over 90% of tested CNEs up-regulate tissue-specific GFP expression. Here we review our group's research in the field of characterising non-coding sequences conserved in vertebrates. We take this opportunity to discuss our research in progress and present some results of new and additional analyses. These include a phylogenomics analysis of CNEs, sequence conservation patterns in vertebrate CNEs and the distribution of human SNPs in the CNEs. We highlight the usefulness of the CNE dataset to help correlate genetic variation in health and disease. We also discuss the functional analysis using the enhancer assay and the enrichment of predicted transcription factor binding sites for two CNEs. Public access to the CNEs plus annotation is now possible and is described. The content of this review was presented by Dr. Y.J.K. Edwards at the TODAI International Symposium on Functional Genomics of the Pufferfish, Tokyo, Japan, 3-6 November 2004.


Journal of Bioinformatics and Computational Biology | 2007

Statistical information characterization of conserved non-coding elements in vertebrates.

Irina I. Abnizova; Klaudia Walter; R. Te Boekhorst; Greg Elgar; Walter R. Gilks

Recently, a set of highly conserved non-coding elements (CNEs) has been derived from a comparison between the genomes of the puffer fish, Takifugu or Fugu rubripes, and man. In order to facilitate the identification of these conserved elements in silico, we characterize them by a number of statistical features. We found a pronounced information pattern around CNE borders; although the CNEs themselves are AT rich and have high entropy (complexity), they are flanked by GC-rich regions of low entropy (complexity). We also identified the most abundant motifs within and around CNEs, and identified those that group around their borders. As in human promoter regions, the TBP, NF-Y and some other binding motifs are clustered around CNE boundaries, which may suggest a possible transcription regulatory function of CNEs.
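
The compositional contrast described in the abstract above (AT-rich elements flanked by GC-rich regions) is the kind of pattern a sliding-window GC profile would expose. The Python sketch below computes such a profile; the function names, window size and toy sequence are assumptions for illustration and do not come from the paper.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int, step: int = 1):
    """GC content in consecutive windows along the sequence.

    Returns a list of (window_start, gc_fraction) pairs.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (made up): a GC-rich flank followed by an AT-rich stretch.
toy = "GGCCGGCC" + "ATATTAAT"
for start, gc in gc_profile(toy, window=4, step=4):
    print(start, gc)
```

A sharp change in the profile marks a candidate boundary; real data would need larger windows and a test against background variation.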


Current Genomics | 2007

Recent Computational Approaches to Understand Gene Regulation: Mining Gene Regulation In Silico

Irina I. Abnizova; T Subhankulova; Walter R. Gilks

This paper reviews recent computational approaches to the understanding of gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. The process of gene regulation is extremely complex and intriguing; therefore, all possible points of view and related links should be carefully considered. Here we attempt to collect an inventory, without claiming it to be comprehensive or complete, of related computational biology topics covering gene regulation that may shed light on the process, and briefly review what is currently occurring in these areas. We will consider the following computational areas: gene regulatory network construction; evolution of regulatory DNA; studies of its structural and statistical informational properties; and, finally, regulatory RNA.


BMC Genomics | 2018

Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome

Fedor Naumenko; Irina I. Abnizova; Nathan Beka; Mikhail A. Genaev; Yuriy L. Orlov

Background: The use of artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce the computational time. One of the natural ways to achieve such a time reduction is by mapping a single chromosome. Results: We investigated whether single-chromosome mapping causes any artefacts in the aligners' performance. In this paper, we compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and also from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which allowed us to reveal artefacts in the aligners' performance. We also calculated some mismatch statistics, and constructed mismatch frequency distributions along the read. Conclusions: The generation of artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data.
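
A mismatch frequency distribution along the read, as mentioned in the abstract above, can be computed with a short Python sketch. It assumes each read is paired with the reference segment it was aligned to, with no gaps, so that position i in the read corresponds to position i in the reference. The function name and input format are assumptions; the read density similarity score from the paper is not reproduced here.

```python
def mismatch_frequency_by_position(pairs):
    """Fraction of reads that mismatch the reference at each read position.

    `pairs` is an iterable of (read, reference_segment) strings of equal
    length (ungapped alignments). Reads may differ in length from each
    other; each position is averaged over the reads that cover it.
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segment must have equal length")
        for pos, (r, g) in enumerate(zip(read, ref)):
            if pos == len(totals):
                # First read that reaches this position: extend the tallies.
                totals.append(0)
                mismatches.append(0)
            totals[pos] += 1
            if r != g:
                mismatches[pos] += 1
    return [m / t for m, t in zip(mismatches, totals)]


# Toy input (made up): one of two reads mismatches at the last position.
pairs = [("ACGT", "ACGA"), ("ACGT", "ACGT")]
print(mismatch_frequency_by_position(pairs))
```

With real sequencing data the resulting profile typically would be plotted against read position to look for position-dependent error patterns.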


Archive | 2006

New Methods to Infer DNA Function from Sequence Information

Irina I. Abnizova; R. Te Boekhorst; Klaudia Walter; Walter R. Gilks

We present a new computational approach to infer DNA function from eukaryotic DNA sequence information. It is based on the fact that exons, regulatory regions, and non-coding non-regulatory DNA exhibit different statistical patterns. We suggest capturing and measuring these patterns by the following suite of statistical tools: (1) the ‘fluffy-tail’ test, a bootstrap procedure to recognize statistically significant abundant similar words in regulatory DNA; (2) an algorithm to assess the density of patches of low entropy as a new measure of homogeneity, which can be used to distinguish coding from non-coding and regulatory regions; (3) an adaptive window technique applied to rescaled range analysis and entropy measurements. This is an optimization technique to segment DNA into homogeneous parts (that are therefore likely to be coding); its outcomes are independent of the size of the sliding window, and it hence avoids averaging. The application of our methods to several annotated data sets from six eukaryotic species enables a clear separation of coding, regulatory, and non-coding non-regulatory DNA. We propose that established computational methods complemented by our new statistical tests and augmented with the novel optimization technique for sliding windows create a powerful tool for the characterization and annotation of DNA sequences. The software is available from the authors on request.
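
Rescaled range analysis, named in item (3) of the abstract above, rests on the R/S statistic, which can be sketched briefly in Python. The code maps a DNA string to a numeric series and computes R/S for it. The purine/pyrimidine mapping (+1 for A and G, -1 for C and T) is an assumed encoding chosen for illustration, and the adaptive window technique from the paper is not reproduced.

```python
import math


def dna_walk(seq: str):
    """Encode DNA as +1 for purines (A, G) and -1 for pyrimidines (C, T).

    This encoding is an assumption of the sketch; other mappings are possible.
    """
    mapping = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}
    return [mapping[base] for base in seq.upper()]


def rescaled_range(values):
    """Classical R/S statistic of a numeric series.

    R is the range of the cumulative sums of deviations from the mean;
    S is the (population) standard deviation of the series.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    cumulative = []
    running = 0.0
    for v in values:
        running += v - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if s == 0.0:
        raise ValueError("series is constant; R/S is undefined")
    return r / s


# Toy sequences (made up): an alternating pattern versus a blocked one.
print(rescaled_range(dna_walk("ACACACAC")), rescaled_range(dna_walk("AAGGCCTT")))
```

A Hurst exponent estimate would follow from how R/S grows with window length, typically as the slope of log(R/S) against log(n) over several window sizes.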


bioRxiv | 2016

Exploratory analysis and error modeling of a sequencing technology

Michael Inouye; Kerrin S. Small; Yik Y. Teo; Heng Li; Nava Whiteford; Tom Skelly; Irina I. Abnizova; Daniel J. Turner; Panos Deloukas; Dominic P. Kwiatkowski; Clive Gavin Brown; Taane G. Clark

Next-generation DNA sequencing methods have created an unprecedented leap in sequence data generation; novel computational tools and statistical models are therefore required to optimize and assess the resulting data. In this report, we explore underlying causes of error for the Illumina Genome Analyzer (IGA) sequencing technology and attempt to quantify their effects using a human bacterial artificial chromosome sequenced to 60,000-fold coverage. Seven potential error predictors are considered: Phred score, read entropy, tile coordinates, local tile density, base position within read, nucleotide call, and lane. With these parameters, logistic regression and log-linear models are constructed and used to show that each of the potential predictors contributes to error (P < 1×10^-4). With this additional information, we apply the logistic model and achieve a 3% improvement in both the sensitivity and specificity to detect IGA errors. Further, we demonstrate that these modeling approaches can be used as a feedback loop to inform laboratory methods and identify specific machine or run bias.
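
Two building blocks of the modeling described in the abstract above can be shown in a minimal Python sketch: the standard conversion from a Phred score Q to an error probability, P = 10^(-Q/10), and the form of a logistic model that combines several predictors into a predicted error probability. The predictor names, weights and intercept below are hypothetical placeholders invented for illustration; they are not fitted values from the report.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Standard Phred relation: error probability P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def logistic(x: float) -> float:
    """Logistic (sigmoid) function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_error_prob(features, weights, intercept):
    """Logistic-model prediction: sigmoid(intercept + sum of weight * value).

    `features` and `weights` are dicts keyed by predictor name.
    """
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return logistic(z)


# HYPOTHETICAL coefficients, made up for this example only.
weights = {"phred": -0.2, "position_in_read": 0.05}
intercept = -1.0

early_base = {"phred": 30, "position_in_read": 2}
late_base = {"phred": 10, "position_in_read": 34}
print(predicted_error_prob(early_base, weights, intercept))
print(predicted_error_prob(late_base, weights, intercept))
```

In a real analysis the weights would be estimated from bases with known truth, for example by fitting a logistic regression to calls from a reference sample.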

Collaboration

Top Co-Authors

Klaudia Walter
Wellcome Trust Sanger Institute

Rene te Boekhorst
University of Hertfordshire

Yuriy L. Orlov
Novosibirsk State University

Clive Gavin Brown
Wellcome Trust Sanger Institute

Tom Skelly
Wellcome Trust Sanger Institute

Yvonne J. K. Edwards
University of Massachusetts Medical School

Fedor Naumenko
Novosibirsk State University

Adam Woolfe
Queen Mary University of London