Publications


Featured research published by François Laviolette.


Journal of Machine Learning Research | 2016

Domain-adversarial training of neural networks

Yaroslav Ganin; Evgeniya Ustinova; Hana Ajakan; Pascal Germain; Hugo Larochelle; François Laviolette; Mario Marchand; Victor S. Lempitsky

We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort in any deep learning package. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach on a descriptor learning task in the context of person re-identification.
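
The gradient reversal layer mentioned above is the paper's key architectural ingredient: it acts as the identity in the forward pass and negates (and scales) the gradient in the backward pass, so the feature extractor learns to confuse the domain classifier. A minimal sketch, assuming PyTorch rather than the authors' original code:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient
    by -lambd in the backward pass, so the feature extractor is
    updated to *confuse* the domain classifier."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; lambd itself gets none.
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Features feed the label predictor directly, and the domain
# classifier through this reversal, so plain SGD trains both.
```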


GigaScience | 2013

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Keith Bradnam; Joseph Fass; Anton Alexandrov; Paul Baranay; Michael Bechner; Inanc Birol; Sébastien Boisvert; Jarrod Chapman; Guillaume Chapuis; Rayan Chikhi; Hamidreza Chitsaz; Wen Chi Chou; Jacques Corbeil; Cristian Del Fabbro; Roderick R. Docking; Richard Durbin; Dent Earl; Scott J. Emrich; Pavel Fedotov; Nuno A. Fonseca; Ganeshkumar Ganapathy; Richard A. Gibbs; Sante Gnerre; Élénie Godzaridis; Steve Goldstein; Matthias Haimel; Giles Hall; David Haussler; Joseph Hiatt; Isaac Ho

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.

Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.

Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly, and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
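
The ten key measures are not listed in the abstract; as a concrete illustration of the kind of contiguity metric used to compare assemblies, here is a minimal sketch of N50 (named here as a standard example, not as one taken from the paper's list):

```python
def n50(contig_lengths):
    """N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0

print(n50([100, 80, 60, 40, 20]))  # -> 80 (100 + 80 = 180 >= 150)
```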


Journal of Computational Biology | 2010

Ray: Simultaneous Assembly of Reads from a Mix of High-Throughput Sequencing Technologies

Sébastien Boisvert; François Laviolette; Jacques Corbeil

An accurate genome sequence of a desired species is now a prerequisite for genome research. An important step in obtaining a high-quality genome sequence is to correctly assemble short reads into longer sequences accurately representing contiguous genomic regions. Current sequencing technologies continue to offer increases in throughput, and corresponding reductions in cost and time. Unfortunately, the benefit of obtaining a large number of reads is complicated by sequencing errors, with different biases being observed with each platform. Although software tools are available to assemble reads for each individual system, no procedure has been proposed for high-quality simultaneous assembly based on reads from a mix of different technologies. In this paper, we describe a parallel short-read assembler, called Ray, which has been developed to assemble reads obtained from a combination of sequencing platforms. We compared its performance to other assemblers on simulated and real datasets. We used a combination of Roche/454 and Illumina reads to assemble three different genomes. We showed that mixing sequencing technologies systematically reduces the number of contigs and the number of errors. Because of its open nature, this new tool will hopefully serve as a basis to develop an assembler of universal utility (availability: http://deNovoAssembler.sf.Net/). For online Supplementary Material, see www.liebertonline.com.
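
The abstract does not spell out Ray's algorithm, but assemblers of this family are built on k-mer graphs, and reads from different platforms can simply be pooled into one graph. A toy sketch of that shared representation (an illustration of the general idea, not Ray's distributed implementation):

```python
from collections import defaultdict

def kmer_graph(reads, k):
    """Toy de Bruijn-style graph: nodes are (k-1)-mers, edges are the
    k-mers observed in the reads. A real assembler adds error
    correction, coverage filtering, and (in Ray's case) distribution
    of the graph across many machines."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Roche/454 and Illumina reads would simply be mixed in this list:
reads = ["ACGTAC", "CGTACG", "GTACGT"]
print(dict(kmer_graph(reads, 4)))
```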


Genome Biology | 2012

Ray Meta: scalable de novo metagenome assembly and profiling

Sébastien Boisvert; Frédéric Raymond; Élénie Godzaridis; François Laviolette; Jacques Corbeil

Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three-billion-read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.
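
The abstract says profiling rests on uniquely-colored k-mers. A toy rendering of that idea (a sketch of the concept only, not Ray Communities' implementation): color each k-mer by the reference genomes it occurs in, and keep the k-mers with exactly one color, since reads hitting those k-mers can be attributed to a single genome.

```python
from collections import defaultdict

def uniquely_colored_kmers(references, k):
    """Map each k-mer to the set of reference genomes ('colors') it
    occurs in, then keep only the k-mers with exactly one color."""
    colors = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return {kmer: c.pop() for kmer, c in colors.items() if len(c) == 1}

refs = {"genomeA": "ACGTACGT", "genomeB": "TTGTACAA"}
print(uniquely_colored_kmers(refs, 4))  # GTAC is shared, so dropped
```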


International Conference on Machine Learning | 2009

PAC-Bayesian learning of linear classifiers

Pascal Germain; Alexandre Lacasse; François Laviolette; Mario Marchand

We present a general PAC-Bayes theorem from which all known PAC-Bayes risk bounds are obtained as particular cases. We also propose different learning algorithms for finding linear classifiers that minimize these bounds. These learning algorithms are generally competitive with both AdaBoost and the SVM.
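
The general theorem itself is not reproduced in this abstract; for context, a widely used member of the family that such general statements recover as a special case is the Langford/Seeger-style bound (stated here from the general PAC-Bayes literature, not quoted from the paper): with probability at least 1 - \delta over an i.i.d. sample S of size n, simultaneously for all posteriors Q over classifiers,

    kl(\hat{R}_S(Q) \,\|\, R(Q)) \le \frac{KL(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{n},

where \hat{R}_S(Q) and R(Q) are the empirical and true Gibbs risks, P is a prior fixed before seeing S, and kl is the KL divergence between Bernoulli distributions with the given means.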


Quantitative Evaluation of Systems | 2008

Approximate Analysis of Probabilistic Processes: Logic, Simulation and Games

Josée Desharnais; François Laviolette; Mathieu Tracol

We tackle the problem of non-robustness of simulation and bisimulation when dealing with probabilistic processes. It is important to ignore tiny deviations in probabilities because these often come from experiments or estimations. A few approaches have been proposed to treat this issue, for example metrics that quantify the non-bisimilarity (or closeness) of processes. Relaxing the definitions of simulation and bisimulation is another avenue, which we follow. We give a new semantics to a known simple logic for probabilistic processes and show that it characterises a notion of ε-simulation. We also define two-player games that correspond to these notions: the existence of a winning strategy for one of the players determines ε-(bi)simulation. Of course, for all the notions defined, letting ε = 0 gives back the usual notions of logical equivalence, simulation and bisimulation. However, in contrast to what happens in fully probabilistic systems when ε = 0, two-way ε-simulation for ε > 0 is not equal to ε-bisimulation. Next we give a polynomial-time algorithm to compute a naturally derived metric: the distance between states s and t is defined as the smallest ε such that s and t are ε-equivalent. This is the first polynomial algorithm for a non-discounted metric. Finally, we show that most of these notions can be extended to probabilistic systems that allow non-determinism as well.
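
The definitions themselves are not given in the abstract; as a rough sketch of the flavour of such relaxations (an illustration from the approximate-bisimulation literature, not the paper's exact definition), a relation R is an ε-simulation when s R t implies, for every action a and every measurable set of states E,

    \tau_a(s)(E) \le \tau_a(t)(R(E)) + \varepsilon,

where \tau_a(s) is the transition probability from s under action a and R(E) is the image of E under R; taking ε = 0 recovers the classical simulation condition.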


Theoretical Computer Science | 2013

Tighter PAC-Bayes bounds through distribution-dependent priors

Guy Lever; François Laviolette; John Shawe-Taylor

We further develop the idea that the PAC-Bayes prior can be informed by the data-generating distribution. We use this framework to prove sharp risk bounds for stochastic exponential weights algorithms, and develop insights into controlling function class complexity in this method. In particular we consider controlling capacity with respect to the unknown geometry defined by the data-generating distribution. We also use the method to obtain new bounds for RKHS regularization schemes such as SVMs.
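
The stochastic exponential weights algorithms referred to above draw the classifier from a Gibbs posterior; in its standard form (stated here for context, with base measure \pi, inverse temperature \gamma, and empirical risk \hat{R}_S),

    Q_\gamma(h) \propto \pi(h)\, e^{-\gamma \hat{R}_S(h)},

and the distribution-dependent prior is the analogous object built from the true risk, P_\gamma(h) \propto \pi(h)\, e^{-\gamma R(h)}: it cannot be computed from data, but KL(Q_\gamma \| P_\gamma) can still be bounded inside the PAC-Bayes theorem, which is what sharpens the resulting risk bounds.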


IEEE Transactions on Information Theory | 2012

PAC-Bayesian Inequalities for Martingales

Yevgeny Seldin; François Laviolette; Nicolò Cesa-Bianchi; John Shawe-Taylor; Peter Auer

We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian (probably approximately correct) analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance-weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics where martingales are encountered. We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0, 1] interval by the expectation of the same function of independent Bernoulli random variables. This inequality is applied to derive a tighter analog of the Hoeffding-Azuma inequality.
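
For reference, the classical Hoeffding-Azuma inequality whose analog the paper tightens: if X_1, \dots, X_n is a martingale difference sequence with |X_i| \le c_i almost surely, then for all t > 0,

    \Pr\left( \sum_{i=1}^{n} X_i \ge t \right) \le \exp\left( -\frac{t^2}{2 \sum_{i=1}^{n} c_i^2} \right).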


Retrovirology | 2008

HIV-1 coreceptor usage prediction without multiple alignments: an application of string kernels

Sébastien Boisvert; Mario Marchand; François Laviolette; Jacques Corbeil

Background: Human immunodeficiency virus type 1 (HIV-1) infects cells by means of ligand-receptor interactions. This lentivirus uses the CD4 receptor in conjunction with a chemokine coreceptor, either CXCR4 or CCR5, to enter a target cell. HIV-1 is characterized by high sequence variability. Nonetheless, within this extensive variability, certain features must be conserved to define functions and phenotypes. The determination of coreceptor usage of HIV-1, from its protein envelope sequence, falls into a well-studied machine learning problem known as classification. The support vector machine (SVM), with string kernels, has proven to be very efficient for dealing with a wide class of classification problems ranging from text categorization to protein homology detection. In this paper, we investigate how the SVM can predict HIV-1 coreceptor usage when it is equipped with an appropriate string kernel.

Results: Three string kernels were compared. Accuracies of 96.35% (CCR5), 94.80% (CXCR4), and 95.15% (CCR5 and CXCR4) were achieved with the SVM equipped with the distant segments kernel on a test set of 1425 examples, with a classifier built on a training set of 1425 examples. Our datasets are built with Los Alamos National Laboratory HIV Databases sequences. A web server is available at http://genome.ulaval.ca/hiv-dskernel.

Conclusion: We examined string kernels that have been used successfully for protein homology detection and propose a new one that we call the distant segments kernel. We also show how to extract the most relevant features for HIV-1 coreceptor usage. The SVM with the distant segments kernel is currently the best method described.
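
The distant segments kernel itself is the paper's contribution and is not specified in the abstract; as a minimal illustration of the string-kernel idea it builds on, here is the classic p-spectrum kernel, which compares sequences through their shared k-mer counts:

```python
from collections import Counter

def spectrum_kernel(x, y, k=3):
    """p-spectrum kernel: the inner product of the k-mer count
    vectors of two strings. The paper's 'distant segments' kernel
    is a different, richer construction."""
    def counts(s):
        return Counter(s[i:i + k] for i in range(len(s) - k + 1))
    cx, cy = counts(x), counts(y)
    return sum(v * cy[kmer] for kmer, v in cx.items())

# Toy V3-loop-like fragments; an SVM would use such kernel values
# in place of explicit feature vectors.
print(spectrum_kernel("CTRPNNNTRK", "CTRPNNYTRK"))  # -> 5
```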

Collaboration


Dive into François Laviolette's collaboration.

Top Co-Authors

Emilie Morvant

Aix-Marseille University

Amaury Habrard

Centre national de la recherche scientifique