Steven Maenhout
Hogeschool Gent
Publication
Featured research published by Steven Maenhout.
Theoretical and Applied Genetics | 2007
Steven Maenhout; B. De Baets; Geert Haesaert; E. Van Bockstaele
Accurate prediction of the phenotypical performance of untested single-cross hybrids allows for faster genetic progress of the breeding pool at a reduced cost. We propose a prediction method based on ɛ-insensitive support vector machine regression (ɛ-SVR). A brief overview of the theoretical background of this fairly new technique and the use of specific kernel functions based on commonly applied genetic similarity measures for dominant and co-dominant markers are presented. These different marker types can be integrated into a single regression model by means of simple kernel operations. Field trial data from the grain maize breeding programme of the private company RAGT R2n are used to assess the predictive capabilities of the proposed methodology. Prediction accuracies are compared to those of one of today’s best performing prediction methods based on best linear unbiased prediction. Results on our data indicate that both methods match each other’s prediction accuracies for several combinations of marker types and traits. The ɛ-SVR framework, however, allows for greater flexibility in combining different kinds of predictor variables.
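As a rough illustration of the "simple kernel operations" mentioned in the abstract above, the sketch below combines two similarity-based kernels, one per marker type, as a weighted sum, and also shows the ɛ-insensitive loss that gives ɛ-SVR its name. The choice of the Jaccard coefficient for dominant markers and simple matching for co-dominant markers is an assumption made for illustration only; the abstract does not name the measures used, and all function names and marker data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity for dominant (band present = 1, absent = 0) scores."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0


def simple_matching(a, b):
    """Fraction of loci with identical allele calls (co-dominant markers)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def combined_kernel(line1, line2, weight=0.5):
    """Weighted sum of the two per-marker-type similarities (assumed form)."""
    k_dom = jaccard(line1["dominant"], line2["dominant"])
    k_cod = simple_matching(line1["codominant"], line2["codominant"])
    return weight * k_dom + (1.0 - weight) * k_cod


def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors smaller than eps cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# Hypothetical marker scores for two inbred lines.
line_a = {"dominant": [1, 0, 1, 1], "codominant": ["A", "B", "A", "C"]}
line_b = {"dominant": [1, 1, 0, 1], "codominant": ["A", "B", "B", "C"]}

similarity = combined_kernel(line_a, line_b)      # 0.5 * 0.5 + 0.5 * 0.75
small_error = eps_insensitive_loss(10.0, 10.3, 0.5)  # inside the eps tube
large_error = eps_insensitive_loss(10.0, 11.0, 0.5)  # outside the eps tube
```

A weighted sum is used because the sum of two positive semi-definite kernels is again positive semi-definite, so the combined similarity can be passed to a kernel method without further correction.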
Theoretical and Applied Genetics | 2010
Steven Maenhout; Bernard De Baets; Geert Haesaert
Accurate prediction of the phenotypic performance of a hybrid plant based on the molecular fingerprints of its parents should lead to a more cost-effective breeding programme, as it reduces the number of expensive field evaluations required. The construction of a reliable prediction model requires a representative sample of hybrids for which both molecular and phenotypic information is accessible. This phenotypic information is usually readily available, as typical breeding programmes test numerous new hybrids in multi-location field trials on a yearly basis. Earlier studies indicated that a linear mixed model analysis of these typically unbalanced phenotypic data makes it possible to construct ɛ-insensitive support vector machine regression and best linear prediction models for predicting the performance of single-cross maize hybrids. We compare these prediction methods using different subsets of the phenotypic and marker data of a commercial maize breeding programme and evaluate the resulting prediction accuracies by means of a specifically designed field experiment. This balanced field trial makes it possible to assess the reliability of the cross-validation prediction accuracies reported here and in earlier studies. The limits of the predictive capabilities of both prediction methods are further examined by reducing the number of training hybrids and the size of the molecular fingerprints. The results indicate a considerable discrepancy between prediction accuracies obtained by cross-validation procedures and those obtained by correlating the predictions with the results of a validation field trial. The prediction accuracy of best linear prediction was less sensitive to a reduction of the number of training examples than that of support vector machine regression. The latter was, however, better at predicting hybrid performance when the size of the molecular fingerprints was reduced, especially if the initial set of markers had a low information content.
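The abstract above measures prediction accuracy by correlating predictions with field trial results. A minimal sketch of that calculation follows, assuming the Pearson correlation coefficient is meant (the abstract does not say which coefficient was used); the function name and the numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical predicted and observed performance of four hybrids.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]

accuracy = pearson(predicted, observed)
```

The same function can be applied once to cross-validation predictions and once to predictions for an independent validation trial, which is the comparison the abstract reports a discrepancy for.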
Theoretical and Applied Genetics | 2009
Steven Maenhout; B. De Baets; Geert Haesaert
Molecular markers make it possible to estimate the pairwise relatedness between the members of a breeding pool when their selection history is no longer available or has become too complex for a classical pedigree analysis. The field of population genetics has several estimation procedures at its disposal, but when the genotyped individuals are highly selected inbred lines, their application is not warranted, as the theoretical assumptions on which these estimators were built, usually linkage equilibrium between marker loci or even Hardy–Weinberg equilibrium, are not met. An alternative approach requires the availability of a genotyped reference set of inbred lines, which makes it possible to correct the observed marker similarities for their inherent upward bias when used as a coancestry measure. However, this approach does not guarantee that the resulting coancestry matrix is at least positive semi-definite (psd), a necessary condition for its use as a covariance matrix. In this paper we present the weighted alikeness in state (WAIS) estimator. This marker-based coancestry estimator is compared to several other commonly applied relatedness estimators under realistic hybrid breeding conditions in a number of simulations. We also fit a linear mixed model to phenotypical data from a commercial maize breeding programme and compare the likelihood of the different variance structures. WAIS is shown to be psd, which makes it suitable for modelling the covariance between genetic components in linear mixed models involved in breeding value estimation or association studies. Results indicate that it generally produces a low root mean squared error under different breeding circumstances and provides a fit to the data that is comparable to that of several other marker-based alternatives. Recommendations for each of the examined coancestry measures are provided.
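To see why positive semi-definiteness matters for a coancestry matrix used as a covariance matrix, the sketch below evaluates the quadratic form x'Ax for a small symmetric matrix. If A were a valid covariance matrix, x'Ax would be the variance of a linear combination and could not be negative. The matrix values are invented for illustration and are not taken from the paper; the function name is hypothetical.

```python
def quadratic_form(matrix, x):
    """Return x' A x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical pairwise estimates for three inbred lines: lines 1 and 2 and
# lines 2 and 3 look closely related, yet lines 1 and 3 look almost unrelated.
coancestry = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.9],
    [0.1, 0.9, 1.0],
]

# One vector with a negative quadratic form proves the matrix is not psd.
weights = [1.0, -1.0, 1.0]
value = quadratic_form(coancestry, weights)  # about -0.4
```

Every entry of this matrix looks plausible on its own, which is why pairwise estimation followed by a bias correction can yield a matrix that is unusable as a covariance structure.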
Genetics | 2010
Steven Maenhout; Bernard De Baets; Geert Haesaert
Efficient genomic selection in animals or crops requires the accurate prediction of the agronomic performance of individuals from their high-density molecular marker profiles. Using a training data set that contains the genotypic and phenotypic information of a large number of individuals, each marker or marker allele is associated with an estimated effect on the trait under study. These estimated marker effects are subsequently used for making predictions on individuals for which no phenotypic records are available. As most plant and animal breeding programs are currently still phenotype driven, the continuously expanding collection of phenotypic records can only be used to construct a genomic prediction model if a dense molecular marker fingerprint is available for each phenotyped individual. However, as the genotyping budget is generally limited, the genomic prediction model can only be constructed using a subset of the tested individuals and possibly a genome-covering subset of the molecular markers. In this article, we demonstrate how an optimal selection of individuals can be made with respect to the quality of their available phenotypic data. We also demonstrate how the total number of molecular markers can be reduced while a maximum genome coverage is ensured. The third selection problem we tackle is specific to the construction of a genomic prediction model for a hybrid breeding program where only molecular marker fingerprints of the homozygous parents are available. We show how to identify the set of parental inbred lines of a predefined size that has produced the highest number of progeny. These three selection approaches are put into practice in a simulation study where we demonstrate how the trade-off between sample size and sample quality affects the prediction accuracy of genomic prediction models for hybrid maize.
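The third selection problem in the abstract above, finding a fixed-size set of parental inbred lines that has produced the most progeny, can be illustrated with a brute-force search. This is only a sketch on toy data: it assumes a hybrid counts when both of its parents are in the selected set, it enumerates every subset (infeasible for a real breeding pool), and it is not the method of the paper. The function name and hybrid list are hypothetical.

```python
from itertools import combinations


def best_parent_subset(hybrids, size):
    """Return (subset, count) for the parent subset covering the most hybrids.

    A hybrid is covered only if both of its parents are in the subset.
    Ties are resolved in favour of the first subset in sorted order.
    """
    parents = sorted({p for pair in hybrids for p in pair})
    best_subset, best_count = None, -1
    for subset in combinations(parents, size):
        chosen = set(subset)
        count = sum(1 for a, b in hybrids if a in chosen and b in chosen)
        if count > best_count:
            best_subset, best_count = subset, count
    return best_subset, best_count


# Hypothetical tested hybrids, each given as a pair of parental lines.
hybrids = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Z"), ("A", "Z")]

subset, count = best_parent_subset(hybrids, 4)
```

With a budget of four genotyped lines, lines A, B, X and Y are chosen because all four of their mutual hybrids have phenotypic records, whereas any set containing C or Z covers at most three.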
Euphytica | 2008
Steven Maenhout; Bernard De Baets; Geert Haesaert; Erik Van Bockstaele
The phenomenon of heterosis is widely used in hybrid breeding programmes, despite the fact that no satisfactory molecular explanation is available. Estimators of quantitative genetic components like GCA and SCA values are tools used by the plant breeder to identify superior parental individuals and to search for high heterosis combinations. Obtaining these estimators usually requires the creation of new parental combinations and testing their offspring in multi-environment field trials. In this study we explore the use of ɛ-insensitive Support Vector Machine Regression (ɛ-SVR) for the prediction of GCA and SCA values from the molecular marker scores of parental inbred lines as an alternative to these field trials. Prediction accuracies are obtained by means of cross-validation on a grain maize data set from the private breeding company RAGT R2n. Results indicate that the proposed method allows the routine screening of new inbred lines despite the fact that predicting the SCA value of an untested hybrid remains problematic with the available molecular marker information and standard kernel functions. The genotypical performance of a testcross hybrid, originating from a cross between an untested inbred line and a well-known complementary tester, can be predicted with moderate to high accuracy while this cannot be said for a cross between two untested inbred lines.
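For readers unfamiliar with GCA and SCA, the sketch below computes them from a complete table of hybrid means using the usual additive decomposition: hybrid mean = grand mean + GCA of one parent + GCA of the other parent + SCA. It assumes a balanced factorial design with one value per cross, which real multi-environment trial data rarely provide; the function name and yield figures are hypothetical.

```python
def gca_sca(table):
    """Decompose table[i][j], the mean of hybrid (female i x male j)."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    # GCA: deviation of a parent's average hybrid from the grand mean.
    gca_female = [sum(row) / n_cols - grand for row in table]
    gca_male = [
        sum(table[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]
    # SCA: what remains after removing the grand mean and both GCA values.
    sca = [
        [table[i][j] - grand - gca_female[i] - gca_male[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, gca_female, gca_male, sca


# Hypothetical yields for two female lines crossed with two male lines.
yields = [[10.0, 12.0], [8.0, 14.0]]

grand, gca_female, gca_male, sca = gca_sca(yields)
```

In this toy table both female lines have a GCA of zero while the SCA values are non-zero, showing that a cross can deviate from what the general combining abilities of its parents predict.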
Bioinformatics | 2009
Steven Maenhout; Bernard De Baets; Geert Haesaert
MOTIVATION: Phenotypic data collected in breeding programs and marker-trait association studies are often analyzed by means of linear mixed models. In these models, the covariance between the genetic background effects of all genotypes under study is modeled by means of pairwise coefficients of coancestry. Several marker-based coancestry estimation procedures can be used to estimate this covariance matrix, but they generally introduce a certain amount of bias when the examined genotypes are part of a breeding program. CoCoa implements the most commonly used marker-based coancestry estimation procedures and, as such, makes it possible to select the best-fitting covariance structure for the phenotypic data at hand. This better model fit translates into increased power and improved type I error control in association studies and improved accuracy in phenotypic prediction studies. The presented software package also provides an implementation of the new Weighted Alikeness in State (WAIS) estimator for use in hybrid breeding programs. Besides several matrix manipulation tools, CoCoa implements two different bending heuristics for cases where the inverse of an ill-conditioned coancestry matrix estimate is needed.
AVAILABILITY AND IMPLEMENTATION: The software package CoCoa is freely available at http://webs.hogent.be/cocoa. Source code, a manual, binaries for 32- and 64-bit Linux systems and an installer for Microsoft Windows are provided. The core components of CoCoa are written in C++, while the graphical user interface is written in Java.
Genetics | 2016
Arne De Coninck; Bernard De Baets; Drosos Kourounis; Fabio Verbosio; Olaf Schenk; Steven Maenhout; Jan Fostier
Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.
Genetics | 2014
Arne De Coninck; Jan Fostier; Steven Maenhout; Bernard De Baets
In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression–best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, are uncorrelated, and have equal variances. We propose DAIRRy-BLUP, a parallel, Distributed-memory RR-BLUP implementation, based on single-trait observations (y), that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required since the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a more significant effect on the prediction accuracy than increasing the density of SNP arrays.
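Under the RR-BLUP assumptions listed in the abstract above, and with no fixed effects other than a mean that has already been subtracted, the marker effects u solve the ridge system (Z'Z + λI)u = Z'y, where λ is the ratio of residual variance to marker-effect variance. The sketch below solves that system for a toy data set. It takes λ as given rather than estimating the variance components by restricted maximum likelihood as DAIRRy-BLUP does, it runs on a single node, and all names, marker codes and phenotypes are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rr_blup_marker_effects(z, y, lam):
    """Solve (Z'Z + lam * I) u = Z'y for centred phenotypes y."""
    n, p = len(z), len(z[0])
    lhs = [
        [sum(z[k][i] * z[k][j] for k in range(n)) + (lam if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(z[k][i] * y[k] for k in range(n)) for i in range(p)]
    return solve(lhs, rhs)


# Hypothetical data: four individuals, two markers coded -1/0/1, centred y.
markers = [[1, 0], [-1, 0], [0, 1], [0, -1]]
phenotypes = [2.0, -2.0, 1.0, -1.0]

effects = rr_blup_marker_effects(markers, phenotypes, lam=2.0)
```

Here Z'Z is twice the identity, so ordinary least squares would give effects of 2 and 1, while the penalty λ = 2 shrinks both estimates by half, to 1 and 0.5. The left-hand side has one row and column per marker, which is why the abstract notes that the number of SNPs determines whether the problem still fits on a single computing node.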
Parallel, Distributed and Network-Based Processing | 2015
Arne De Coninck; Drosos Kourounis; Fabio Verbosio; Olaf Schenk; Bernard De Baets; Steven Maenhout; Jan Fostier
Genomic prediction for plant breeding requires taking into account environmental effects and variations of genetic effects across environments. The latter can be modelled by estimating the effect of each genetic marker in every possible environmental condition, which leads to a huge number of effects to be estimated. Nonetheless, the information about these effects is only sparsely present, because plants are only tested in a limited number of environmental conditions. In contrast, the genotypes of the plants are a dense source of information, and thus the estimation of both types of effects in a single step would require both dense and sparse matrix formalisms. This paper presents a way to efficiently apply a high-performance computing infrastructure to large-scale genomic prediction settings, relying on the coupling of dense and sparse matrix algebra.
Communications in Agricultural and Applied Biological Sciences | 2015
Arne De Coninck; Drosos Kourounis; Fabio Verbosio; Olaf Schenk; Bernard De Baets; Steven Maenhout; Jan Fostier