Publication


Featured research published by Niina Haiminen.


Genome Biology | 2013

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

Juan Carlos Motamayor; Keithanne Mockaitis; Jeremy Schmutz; Niina Haiminen; Donald Livingstone; Omar E. Cornejo; Seth D. Findley; Ping Zheng; Filippo Utro; Stefan Royaert; Christopher A. Saski; Jerry Jenkins; Ram Podicheti; Meixia Zhao; Brian E. Scheffler; Joseph C Stack; Frank Alex Feltus; Guiliana Mustiga; Freddy Amores; Wilbert Phillips; Jean Philippe Marelli; Gregory D. May; Howard Shapiro; Jianxin Ma; Carlos Bustamante; Raymond J. Schnell; Dorrie Main; Don Gilbert; Laxmi Parida; David N. Kuhn

Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than that of a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
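The contig and scaffold N50 values quoted above follow the usual definition: the largest length L such that sequences of length at least L together cover at least half of the total assembly length. A minimal sketch of that calculation is shown here; the function name and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the first length at which the running total
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not from the Matina 1-6 assembly).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

Because the statistic depends only on the length distribution, a higher N50 indicates a more contiguous assembly but says nothing about its correctness.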


Bioinformatics | 2012

GenomicTools: a computational platform for developing high-throughput analytics in genomics

Aristotelis Tsirigos; Niina Haiminen; Erhan Bilal; Filippo Utro

Motivation: Recent advances in sequencing technology have resulted in a dramatic increase in sequencing data, which, in turn, requires efficient management of computational resources, such as computing time and memory, as well as prototyping of computational pipelines. Results: We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions, thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. Availability: The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
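As an illustration of what an operation between sets of genomic regions can look like, the sketch below intersects two region sets with a two-pointer sweep. It is not the GenomicTools API or its implementation; the function name, the half-open `(start, end)` tuples, and the requirement that each input be sorted and free of internal overlaps are all assumptions made for this example.

```python
def intersect_regions(a, b):
    """Yield the overlaps between two lists of half-open (start, end) regions.

    Assumes both lists lie on the same chromosome, are sorted by start,
    and contain no overlapping regions within themselves.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            yield (start, end)
        # Advance whichever region ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1


# Hypothetical coordinates, for illustration only.
peaks = [(0, 10), (20, 30)]
genes = [(5, 25)]
print(list(intersect_regions(peaks, genes)))
```

The sweep visits each region once and keeps only the current pair in view, which is one way sorted inputs allow region operations to run with a small memory footprint.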


PLOS ONE | 2011

Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results.

Niina Haiminen; David N. Kuhn; Laxmi Parida; Isidore Rigoutsos

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly large numbers of usable sequences per instrument run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the choice of assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.


BMC Genomics | 2011

Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

Frank Alex Feltus; Christopher A. Saski; Keithanne Mockaitis; Niina Haiminen; Laxmi Parida; Zachary D. Smith; James Ford; Margaret Staton; Stephen P. Ficklin; Barbara Blackmon; Chun-Huai Cheng; Raymond J. Schnell; David N. Kuhn; Juan-Carlos Motamayor

Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.


BMC Genetics | 2013

iXora: exact haplotype inferencing and trait association.

Filippo Utro; Niina Haiminen; Donald Livingstone; Omar E. Cornejo; Stefan Royaert; Raymond J. Schnell; Juan Carlos Motamayor; David N. Kuhn; Laxmi Parida

Background: We address the task of extracting accurate haplotypes from genotype data of individuals of large F1 populations for mapping studies. While methods for inferring parental haplotype assignments on large F1 populations exist in theory, these approaches do not work in practice at high levels of accuracy. Results: We have designed iXora (Identifying crossovers and recombining alleles), a robust method for extracting reliable haplotypes of a mapping population, as well as parental haplotypes, that runs in linear time. Each allele in the progeny is assigned not just to a parent, but more precisely to a haplotype inherited from the parent. iXora shows an improvement of at least 15% in accuracy over similar systems in the literature. Furthermore, iXora provides an easy-to-use, comprehensive environment for association studies and hypothesis checking in populations of related individuals. Conclusions: iXora provides detailed resolution in parental inheritance, along with the capability of handling very large populations, which allows for accurate haplotype extraction and trait association. iXora is available for non-commercial use from http://researcher.ibm.com/project/3430.


Algorithms | 2013

Efficient in silico Chromosomal Representation of Populations via Indexing Ancestral Genomes

Niina Haiminen; Filippo Utro; Claude Lebreton; Pascal Flament; Zivan Karaman; Laxmi Parida

One of the major challenges in handling realistic forward simulations for plant and animal breeding is the sheer number of markers. Due to advancing technologies, the requirement has quickly grown from hundreds of markers to millions. Most simulators are lagging behind in handling these sizes, since they do not scale well. We present a scheme for representing and manipulating such realistic size genomes, without any loss of information. Usually, the simulation is forward and over tens to hundreds of generations with hundreds of thousands of individuals at each generation. We demonstrate through simulations that our representation can be two orders of magnitude faster and handle at least two orders of magnitude more markers than existing software on realistic breeding scenarios.


Frontiers in Plant Science | 2017

Application of genome wide association and genomic prediction for improvement of cacao productivity and resistance to black and frosty pod diseases

J. Alberto Romero Navarro; Wilbert Phillips-Mora; Adriana Arciniegas-Leal; Allan Mata-Quirós; Niina Haiminen; Guiliana Mustiga; Donald Livingstone; Harm van Bakel; David N. Kuhn; Laxmi Parida; Andrew Kasarskis; Juan Carlos Motamayor

Chocolate is a highly valued and palatable confectionery product. Chocolate is primarily made from the processed seeds of the tree species Theobroma cacao. Cacao cultivation is highly relevant for small-holder farmers throughout the tropics, yet its productivity remains limited by low yields and widespread pathogens. A panel of 148 improved cacao clones was assembled based on productivity and disease resistance, and phenotypic single-tree replicated clonal evaluation was performed for 8 years. Using high-density markers, the diversity of clones was expressed relative to 10 known ancestral cacao populations, and significant effects of ancestry were observed in productivity and disease resistance. Genome-wide association (GWA) was performed, and six markers were significantly associated with frosty pod disease resistance. In addition, genomic selection was performed, and consistent with the observed extensive linkage disequilibrium, high predictive ability was observed at low marker densities for all traits. Finally, quantitative trait locus mapping and differential expression analysis of two cultivars with contrasting disease phenotypes were performed to identify genes underlying frosty pod disease resistance, identifying a significant quantitative trait locus and 35 differentially expressed genes using two independent differential expression analyses. These results indicate that in breeding populations of heterozygous and recently admixed individuals, mapping approaches can be used for low-complexity traits such as pod color in cacao, or single-gene disease resistance in other species; however, genomic selection for quantitative traits remains highly effective relative to mapping. Our results can help guide the breeding process for sustainable improved cacao productivity.


Methods in Molecular Biology | 2015

BAC sequencing using pooled methods.

Christopher A. Saski; F. Alex Feltus; Laxmi Parida; Niina Haiminen

Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically yield highly fragmented assemblies from which biological meaning is difficult to extract. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.


BMC Genomics | 2014

Comparative exomics of Phalaris cultivars under salt stress.

Niina Haiminen; Manfred Klaas; Zeyu Zhou; Filippo Utro; Paul Cormican; Thomas Didion; Christian Sig Jensen; Christopher E. Mason; Susanne Barth; Laxmi Parida

Background: Reed canary grass (Phalaris arundinacea) is an economically important forage and bioenergy grass of the temperate regions of the world. Despite its economic importance, it is lacking in public genomic data. We explore comparative exomics of the grass cultivars in the context of response to salt exposure. The limited data set poses challenges to the computational pipeline. Methods: As a prerequisite for the comparative study, we generate the Phalaris reference transcriptome sequence, one of the first steps in addressing the issue of paucity of processed genomic data in this species. In addition, the differential expression (DE) and active-but-stable genes for salt stress conditions were analyzed by a novel method that was experimentally verified on human RNA-seq data. For the comparative exomics, we focus on the DE and stable genic regions, with respect to salt stress, of the genome. Results and conclusions: In our comparative study, we find that the phylogenies of the DE and stable genic regions of the Phalaris cultivars are distinct. At the same time we find the phylogeny of the entire expressed reference transcriptome matches the phylogeny of only the stable genes. Thus the behavior of the different cultivars is distinguished by the salt stress response. This is also reflected in the genomic distinctions in the DE genic regions. These observations have important implications in the choice of cultivars, and their breeding, for bio-energy fuels. Further, we identified genes that are representative of DE under salt stress and could provide vital clues in our understanding of the stress handling mechanisms in general.


Workshop on Algorithms in Bioinformatics | 2014

Best-Fit in Linear Time for Non-generative Population Simulation

Niina Haiminen; Claude Lebreton; Laxmi Parida

Constructing populations with pre-specified characteristics is a fundamental problem in population genetics and other applied areas. We present a novel non-generative approach that deconstructs the desired population into essential local constraints and then builds the output bottom-up. This is achieved using primarily best-fit techniques from discrete methods, which ensures accuracy of the output. Also, the algorithms are fast, i.e., linear, or even sublinear, in the size of the output. The non-generative approach also results in high sensitivity in the algorithms. Since the accuracy and sensitivity of the population simulation are critical to the quality of the output of the applications that use them, we believe that these algorithms will provide a strong foundation to the methods in these studies.

Collaboration


Niina Haiminen's top co-authors.

Top Co-Authors

David N. Kuhn, Agricultural Research Service

Donald Livingstone, Agricultural Research Service

Juan Carlos Motamayor, Agricultural Research Service

Raymond J. Schnell, Agricultural Research Service