Khalid Belkhir | Researchain

Publication

Featured research published by Khalid Belkhir.

Nature | 2014

Comparative population genomics in animals uncovers the determinants of genetic diversity

Jonathan Romiguier; Philippe Gayral; Marion Ballenghien; Aurélien Bernard; Vincent Cahais; Anne Chenuil; Ylenia Chiari; R. Dernat; Laurent Duret; Nicolas Faivre; Etienne Loire; João M. Lourenço; Benoit Nabholz; Camille Roux; Georgia Tsagkogeorga; A.A.T. Weber; Lucy A. Weinert; Khalid Belkhir; Nicolas Bierne; Sylvain Glémin; Nicolas Galtier

Genetic diversity is the amount of variation observed between DNA sequences from distinct individuals of a given species. This pivotal concept of population genetics has implications for species health, domestication, management and conservation. Levels of genetic diversity seem to vary greatly in natural populations and species, but the determinants of this variation, and particularly the relative influences of species biology and ecology versus population history, are still largely mysterious. Here we show that the diversity of a species is predictable, and is determined in the first place by its ecological strategy. We investigated the genome-wide diversity of 76 non-model animal species by sequencing the transcriptome of two to ten individuals in each species. The distribution of genetic diversity between species revealed no detectable influence of geographic range or invasive status but was accurately predicted by key species traits related to parental investment: long-lived or low-fecundity species with brooding ability were genetically less diverse than short-lived or highly fecund ones. Our analysis demonstrates the influence of long-term life-history strategies on species response to short-term environmental perturbations, a result with immediate implications for conservation policies.
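As a rough illustration of the definition above, genetic diversity is often summarized as nucleotide diversity: the average proportion of sites that differ between pairs of aligned sequences. The sketch below is a minimal example under that assumption; the function name and the toy sequences are hypothetical, and it is not the estimation procedure used in the study.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average per-site pairwise difference among aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites over every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment (made-up data): three sequences of four sites.
toy = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(toy)
print(round(pi, 4))
```

This simple version ignores gaps, missing data and genotype uncertainty, all of which a real analysis would need to handle.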

Nature Communications | 2014

European sea bass genome and its variation provide insights into adaptation to euryhalinity and speciation

Mbaye Tine; Heiner Kuhl; Pierre-Alexandre Gagnaire; Bruno Louro; Erick Desmarais; Rute S.T. Martins; Jochen Hecht; Florian Knaust; Khalid Belkhir; Sven Klages; Roland Dieterich; Kurt Stueber; Francesc Piferrer; Bruno Guinand; Nicolas Bierne; Filip Volckaert; Luca Bargelloni; Deborah M. Power; François Bonhomme; Adelino V. M. Canario; Richard Reinhardt

The European sea bass (Dicentrarchus labrax) is a temperate-zone euryhaline teleost of prime importance for aquaculture and fisheries. This species is subdivided into two naturally hybridizing lineages, one inhabiting the north-eastern Atlantic Ocean and the other the Mediterranean and Black seas. Here we provide a high-quality chromosome-scale assembly of its genome that shows a high degree of synteny with the more highly derived teleosts. We find expansions of gene families specifically associated with ion and water regulation, highlighting adaptation to variation in salinity. We further generate a genome-wide variation map through RAD-sequencing of Atlantic and Mediterranean populations. We show that variation in local recombination rates strongly influences the genomic landscape of diversity within and differentiation between lineages. Comparing predictions of alternative demographic models to the joint allele-frequency spectrum indicates that genomic islands of differentiation between sea bass lineages were generated by varying rates of introgression across the genome following a period of geographical isolation.

Molecular Ecology Resources | 2012

Reference-free transcriptome assembly in non-model animals from next-generation sequencing data.

Vincent Cahais; Philippe Gayral; Georgia Tsagkogeorga; José Melo-Ferreira; Marion Ballenghien; Lucy A. Weinert; Ylenia Chiari; Khalid Belkhir; Vincent Ranwez; Nicolas Galtier

Next‐generation sequencing (NGS) technologies offer the opportunity for population genomic study of non‐model organisms sampled in the wild. The transcriptome is a convenient and popular target for such purposes. However, designing genetic markers from NGS transcriptome data requires assembling gene‐coding sequences out of short reads. This is a complex task owing to gene duplications, genetic polymorphism, alternative splicing and transcription noise. Typical assembling programmes return thousands of predicted contigs, whose connection to the species' true gene content is unclear, and from which SNP definition is difficult. Here, the transcriptomes of five diverse non‐model animal species (hare, turtle, ant, oyster and tunicate) were assembled from newly generated 454 and Illumina sequence reads. In two species for which a reference genome is available, a new procedure was introduced to annotate each predicted contig as either a full‐length cDNA, fragment, chimera, allele, paralogue, genomic sequence or other, based on the number of, and overlap between, blast hits to the appropriate reference. Analyses showed that (i) the highest quality assemblies are obtained when 454 and Illumina data are combined, (ii) typical de novo assemblies include a majority of irrelevant cDNA predictions and (iii) assemblies can be appropriately cleaned by filtering contigs based on length and coverage. We conclude that robust, reference‐free assembly of thousands of genes from transcriptomic NGS data is possible, opening promising perspectives for transcriptome‐based population genomics in animals. A Galaxy pipeline implementing our best‐performing assembling strategy is provided.
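The cleaning step mentioned in point (iii), filtering contigs on length and coverage, can be sketched as follows. This is a minimal illustration only: the `Contig` record, the function name and both threshold values are hypothetical and are not taken from the paper or its Galaxy pipeline.

```python
from typing import NamedTuple

class Contig(NamedTuple):
    name: str
    length: int       # contig length in base pairs
    coverage: float   # mean read depth across the contig

def filter_contigs(contigs, min_length=200, min_coverage=2.5):
    """Keep contigs meeting both thresholds (threshold values are assumptions)."""
    return [c for c in contigs
            if c.length >= min_length and c.coverage >= min_coverage]

# Made-up example contigs.
contigs = [
    Contig("c1", 1500, 12.0),  # long and well covered: kept
    Contig("c2", 150, 30.0),   # too short: dropped
    Contig("c3", 900, 1.0),    # poorly covered: dropped
]
kept = filter_contigs(contigs)
print([c.name for c in kept])
```

In practice the thresholds would be tuned per data set, for example by checking retained contigs against a reference where one exists.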

Heredity | 2002

Heterozygote deficiencies in small lacustrine populations of brook charr Salvelinus fontinalis Mitchill (Pisces, Salmonidae): a test of alternative hypotheses

V Castric; L Bernatchez; Khalid Belkhir; François Bonhomme

Empirical studies of natural populations have commonly reported departures from Hardy-Weinberg expected proportions of heterozygote individuals. Recent advances in statistical population genetics now offer the potential to exploit individual multilocus genotypic information to test more rigorously for possible sources of heterozygote deficiencies. In a previous study in lacustrine brook charr (Salvelinus fontinalis), we reported stronger deficits in small than in large lakes. In the present paper, we propose a methodology for empirically testing alternative hypotheses to identify the cause of the deficits observed in three of the smallest lakes (85, 109 and 182 ha) analysed. First, as in several salmonid species, brook charr may exhibit a trophic polymorphism in north temperate lakes. If morphs are genetically divergent, indiscriminate sampling of both forms would result in fewer heterozygote individuals than expected in a randomly mating population (Wahlund effect). Using an individual-based method aiming at detecting cryptic population structure, we can reject this explanation as the sole source of deficits for all three lakes. Secondly, mating among relatives could also be frequent in small lakes and lead to heterozygote deficiencies. Significantly more fish than expected at random had low individual multilocus heterozygosity in two of the lakes, suggesting that inbred fish may have been present. Thirdly, sampling of genetically related fish would also lead to departures from Hardy-Weinberg proportions. In the same two lakes, the distribution of pairwise relatedness coefficients departed from its random expectation, suggesting that non-random sampling of kin may have occurred.
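A heterozygote deficiency of the kind discussed above is commonly quantified by comparing observed heterozygosity (Ho) with the Hardy-Weinberg expectation (He), for instance as Fis = 1 - Ho/He. The sketch below computes these quantities for a single locus; the function name and the genotype data are hypothetical, He is the simple uncorrected estimate, and this is not the individual-based methodology of the paper.

```python
from collections import Counter

def heterozygote_deficit(genotypes):
    """Return (Ho, He, Fis) for one locus given a list of (allele, allele) tuples."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared allele frequencies.
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis

# Made-up sample: 4 AA, 2 AB and 4 BB individuals.
sample = [("A", "A")] * 4 + [("A", "B")] * 2 + [("B", "B")] * 4
ho, he, fis = heterozygote_deficit(sample)
print(round(ho, 3), round(he, 3), round(fis, 3))
```

A positive Fis indicates fewer heterozygotes than expected; on its own it cannot distinguish a Wahlund effect from inbreeding or sampling of relatives, which is why the abstract tests those hypotheses separately.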

Molecular Ecology | 2011

Reference-free transcriptome assembly in non-model animals from next generation sequencing data

Vincent Cahais; Philippe Gayral; Georgia Tsagkogeorga; Marion Ballenghien; Lucy A. Weinert; Ylenia Chiari; Khalid Belkhir; Vincent Ranwez; Nicolas Galtier

BMC Evolutionary Biology | 2007

OrthoMaM: a database of orthologous genomic markers for placental mammal phylogenetics.

Vincent Ranwez; Frédéric Delsuc; Sylvie Ranwez; Khalid Belkhir; Marie-Ka Tilak; Emmanuel J. P. Douzery

Background: Molecular sequence data have become the standard in modern-day phylogenetics. In particular, several long-standing questions of mammalian evolutionary history have been recently resolved thanks to the use of molecular characters. Yet, most studies have focused on only a handful of standard markers. The availability of an ever-increasing number of whole genome sequences is a gold mine for modern systematics. Genomic data now provide the opportunity to select new markers that are potentially relevant for further resolving branches of the mammalian phylogenetic tree at various taxonomic levels.

Description: The EnsEMBL database was used to determine a set of orthologous genes from 12 available complete mammalian genomes. As targets for possible amplification and sequencing in additional taxa, more than 3,000 exons of length > 400 bp have been selected, among which 118, 368, 608, and 674 are respectively retrieved for 12, 11, 10, and 9 species. A bioinformatic pipeline has been developed to provide evolutionary descriptors for these candidate markers in order to assess their potential phylogenetic utility. The resulting OrthoMaM (Orthologous Mammalian Markers) database can be queried and alignments can be downloaded through a dedicated web interface http://kimura.univ-montp2.fr/orthomam.

Conclusion: The importance of marker choice in phylogenetic studies has long been stressed. Our database centered on complete genome information now makes it possible to select promising markers for a given phylogenetic question or systematic framework by querying a number of evolutionary descriptors. The usefulness of the database is illustrated with two biological examples. First, two potentially useful markers were identified for rodent systematics based on relevant evolutionary parameters and sequenced in additional species. Second, a complete, gapless 94 kb supermatrix of 118 orthologous exons was assembled for 12 mammals. Phylogenetic analyses using probabilistic methods unambiguously supported the new placental phylogeny by retrieving the monophyly of Glires, Euarchontoglires, Laurasiatheria, and Boreoeutheria. Muroid rodents thus do not represent a basal placental lineage, as was mistakenly reasserted in some recent phylogenomic analyses based on fewer taxa. We expect the OrthoMaM database to be useful for further resolving the phylogenetic tree of placental mammals and for better understanding the evolutionary dynamics of their genomes, i.e., the forces that shaped coding sequences in terms of selective constraints.

BMC Bioinformatics | 2006

Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics.

Julien Y. Dutheil; Sylvain Gaillard; Eric Bazin; Sylvain Glémin; Vincent Ranwez; Nicolas Galtier; Khalid Belkhir

Background: A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications.

Results: We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus.

Conclusion: Implementation of methods aims at being both efficient and user-friendly. Special attention was given to the library design to enable easy extension and the development of new methods. We defined a general hierarchy of classes that allows developers to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

PLOS Genetics | 2013

Reference-Free Population Genomics from Next-Generation Transcriptome Data and the Vertebrate–Invertebrate Gap

Philippe Gayral; José Melo-Ferreira; Sylvain Glémin; Nicolas Bierne; Miguel Carneiro; Benoit Nabholz; João M. Lourenço; Paulo C. Alves; Marion Ballenghien; Nicolas Faivre; Khalid Belkhir; Vincent Cahais; Etienne Loire; Aurélien Bernard; Nicolas Galtier

In animals, the population genomic literature is dominated by two taxa, namely mammals and drosophilids, in which fully sequenced, well-annotated genomes have been available for years. Data from other metazoan phyla are scarce, probably because the vast majority of living species still lack a closely related reference genome. Here we achieve de novo, reference-free population genomic analysis from wild samples in five non-model animal species, based on next-generation sequencing transcriptome data. We introduce a pipeline for cDNA assembly, read mapping, SNP/genotype calling, and data cleaning, with specific focus on the issue of hidden paralogy detection. In two species for which a reference genome is available, similar results were obtained whether the reference was used or not, demonstrating the robustness of our de novo inferences. The population genomic profiles of a hare, a turtle, an oyster, a tunicate, and a termite were found to be intermediate between those of human and Drosophila, indicating that the discordant genomic diversity patterns that have been reported between these two species do not reflect a generalized vertebrate versus invertebrate gap. The genomic average diversity was generally higher in invertebrates than in vertebrates (with the notable exception of termite), in agreement with the notion that population size tends to be larger in the former than in the latter. The non-synonymous to synonymous ratio, however, did not differ significantly between vertebrates and invertebrates, even though it was negatively correlated with genetic diversity within each of the two groups. This study opens promising perspectives regarding genome-wide population analyses of non-model organisms and the influence of population size on non-synonymous versus synonymous diversity.

Molecular Biology and Evolution | 2013

Bio++: efficient extensible libraries and tools for computational molecular evolution

Laurent Guéguen; Sylvain Gaillard; Bastien Boussau; Manolo Gouy; Mathieu Groussin; Nicolas C. Rochette; Thomas Bigot; David Fournier; Fanny Pouyet; Vincent Cahais; Aurélien Bernard; Celine Scornavacca; Benoit Nabholz; Annabelle Haudry; Loïc Dachary; Nicolas Galtier; Khalid Belkhir; Julien Y. Dutheil

Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized yet extensible code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computer-efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing reads libraries. More complex models of sequence evolution, such as mixture models and generic n-tuples alphabets, are also included.

Molecular Ecology | 2011

Isolation and gene flow: inferring the speciation history of European house mice

Ludovic Duvaux; Khalid Belkhir; Matthieu Boulesteix; Pierre Boursot

Inferring the history of isolation and gene flow during species differentiation can inform us about the processes underlying their formation. Following their recent expansion in Europe, two subspecies of the house mouse (Mus musculus domesticus and Mus musculus musculus) have formed a hybrid zone maintained by hybrid incompatibilities and possibly behavioural reinforcement, offering a good model of incipient speciation. We reconstruct the history of their divergence using an approximate Bayesian computation framework and sequence variation at 57 autosomal loci. We find support for a long isolation period preceding the advent of gene flow around 200 000 generations ago, long before the formation of the European hybrid zone a few thousand years ago. The duration of the allopatric episode appears long enough (74% of divergence time) to explain the accumulation of many post‐zygotic incompatibilities expressed in the present hybrid zone. The ancient contact inferred could have played a role in mating behaviour divergence and laid the ground for further reinforcement. We suggest that both subspecies originally colonized the Middle East from the northern Indian subcontinent, domesticus settling on the shores of the Persian Gulf and musculus on those of the Caspian Sea. Range expansions during interglacials would have induced secondary contacts, presumably in Iran, where they must have also interacted with Mus musculus castaneus. Future studies should incorporate this possibility, and we point to Iran and its surroundings as a hot spot for house mouse diversity and speciation studies.
