Publication


Featured research published by Lior Pachter.


Cell Systems | 2017

PROBer Provides a General Toolkit for Analyzing Sequencing-Based Toeprinting Assays

Bo Li; Akshay Tambe; Sharon Aviran; Lior Pachter

A number of sequencing-based transcriptase drop-off assays have recently been developed to probe post-transcriptional dynamics of RNA-protein interaction, RNA structure, and RNA modification. Although these assays survey a diverse set of epitranscriptomic marks, we use the term toeprinting assays since they share methodological similarities. Their interpretation is predicated on addressing a similar computational challenge: how to learn isoform-specific chemical modification profiles in the face of complex read multi-mapping. We introduce PROBer, a statistical model and associated software, that addresses this challenge for the analysis of toeprinting assays. PROBer takes sequencing data as input and outputs estimated transcript abundances and isoform-specific modification profiles. Results on both simulated and biological data demonstrate that PROBer significantly outperforms individual methods tailored for specific toeprinting assays. Since the space of toeprinting assays is ever expanding and these assays are likely to be performed and analyzed together, we believe PROBer's unified data analysis solution will be valuable to the RNA community.


Genome Biology | 2018

Gene-level differential analysis at transcript-level resolution

Lynn Yi; Harold Pimentel; Nicolas Bray; Lior Pachter

Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that ‘analysis first, aggregation second,’ where the p values derived from transcript analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology and we showcase an application to gene ontologies.
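
As a rough illustration of the ‘analysis first, aggregation second’ idea in the abstract above, the Python sketch below combines transcript-level p values into one p value per gene. The abstract does not state which combination rule is used, so Fisher's method is assumed here purely as a stand-in, and it treats the transcript p values as independent. The function names and the transcript-to-gene mapping are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p values with Fisher's method (an assumed stand-in)."""
    if not p_values:
        raise ValueError("need at least one p value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # Under the null, X follows a chi-square distribution with 2k degrees of
    # freedom, whose survival function has the closed form
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_by_gene(transcript_pvalues, transcript_to_gene):
    """Group transcript p values by gene, then combine each group."""
    grouped = {}
    for transcript, p in transcript_pvalues.items():
        grouped.setdefault(transcript_to_gene[transcript], []).append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}
```

A gene with a single transcript keeps that transcript's p value unchanged, while several moderately small transcript p values for the same gene combine into a smaller gene-level p value.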


bioRxiv | 2017

Column subset selection for single-cell RNA-Seq clustering

Shannon McCurdy; Vasilis Ntranos; Lior Pachter

The first step in the analysis of single-cell RNA sequencing (scRNA-Seq) is dimensionality reduction, which reduces noise and simplifies data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods avoid those pitfalls, but ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of PCA and common thresholding methods, while avoiding pitfalls from both. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to clusters produced from the full set. The resulting clusters are informative for cell type.


Research in Computational Molecular Biology | 2018

Identification of transcriptional signatures for cell types from single-cell RNA-Seq

Lynn Yi; Vasilis Ntranos; Páll Melsted; Lior Pachter

Single-cell RNA-Seq makes it possible to characterize the transcriptomes of cell types and identify their transcriptional signatures via differential analysis. We present a fast and accurate method for discriminating cell types that takes advantage of the large numbers of cells that are assayed. When applied to transcript compatibility counts obtained via pseudoalignment, our approach provides a quantification-free analysis of 3’ single-cell RNA-Seq that can identify previously undetectable marker genes.


bioRxiv | 2018

Highly Multiplexed Single-Cell RNA-seq for Defining Cell Population and Transcriptional Spaces

Jase Gehring; Jong Hwee Park; Sisi Chen; Matthew Thomson; Lior Pachter

We describe a universal sample multiplexing method for single-cell RNA-seq in which cells are chemically labeled with identifying DNA oligonucleotides. Analysis of a 96-plex perturbation experiment revealed changes in cell population structure and transcriptional states that cannot be discerned from bulk measurements, establishing a cost-effective means to survey cell populations from large experiments and clinical samples with the depth and resolution of single-cell RNA-seq.


bioRxiv | 2018

Expression reflects population structure

Brielin C. Brown; Nicolas Bray; Lior Pachter

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Furthermore, we analyze the variance of each gene within the projection matrix to determine which genes significantly influence the projection. We identify thousands of significant genes, and show that a number of the top genes have been implicated in diseases that disproportionately impact African Americans.

Author Summary

High dimensional, multi-modal genomics datasets are becoming increasingly common, which warrants investigation into analysis techniques that can reveal structure in the data without over-fitting. Here, we show that the coupling of principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data. We apply this method to the GEUVADIS dataset of genotype and gene expression values of European and Yoruban individuals, finding as-of-yet unstudied population structure in the gene expression values. Moreover, many of the top genes identified by our method have been previously implicated in diseases that disproportionately impact African Americans.


bioRxiv | 2018

A direct comparison of genome alignment and transcriptome pseudoalignment

Lynn Yi; Lauren Liu; Páll Melsted; Lior Pachter

Motivation: Genome alignment of reads is the first step of most genome analysis workflows. In the case of RNA-Seq, transcriptome pseudoalignment of reads is a fast alternative to genome alignment, but the different “coordinate systems” of the genome and transcriptome have made it difficult to perform direct comparisons between the approaches.

Results: We have developed tools for converting genome alignments to transcriptome pseudoalignments, and conversely, for projecting transcriptome pseudoalignments to genome alignments. Using these tools, we performed a direct comparison of genome alignment with transcriptome pseudoalignment. We find that both approaches produce similar quantifications. This means that for many applications genome alignment and transcriptome pseudoalignment are interchangeable.

Availability and Implementation: bam2tcc is a C++14 software for converting alignments in SAM/BAM format to transcript compatibility counts (TCCs) and is available at https://github.com/pachterlab/bam2tcc. kallisto genomebam is a user option of kallisto that outputs a sorted BAM file in genome coordinates as part of transcriptome pseudoalignment. The feature has been released with kallisto v0.44.0, and is available at https://pachterlab.github.io/kallisto/.

Supplementary Material: N/A

Contact: Lior Pachter ([email protected])
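
To make the notion of transcript compatibility counts (TCCs) mentioned above more concrete, here is a minimal Python sketch that tallies reads by the set of transcripts each read is compatible with. It is not the bam2tcc or kallisto implementation; the input dictionary, read names, and function name are hypothetical, and the step that derives compatible transcripts from alignments is assumed to have happened already.

```python
from collections import Counter


def transcript_compatibility_counts(read_to_transcripts):
    """Count reads per equivalence class of compatible transcripts.

    read_to_transcripts maps a read name to an iterable of transcript IDs
    the read is compatible with (a hypothetical, precomputed input).
    """
    counts = Counter()
    for transcripts in read_to_transcripts.values():
        equivalence_class = frozenset(transcripts)
        if equivalence_class:  # skip reads compatible with no transcript
            counts[equivalence_class] += 1
    return counts


# Hypothetical example: two reads share the class {t1, t2}, one maps to {t3}.
example_reads = {
    "read1": ["t1", "t2"],
    "read2": ["t2", "t1"],
    "read3": ["t3"],
    "read4": [],
}
example_counts = transcript_compatibility_counts(example_reads)
```

Using a `frozenset` as the key makes the count independent of the order in which compatible transcripts are listed for a read.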


Molecular Cell | 2018

RNA Velocity: Molecular Kinetics from Single-Cell RNA-Seq.

Valentine Svensson; Lior Pachter

Applying a kinetic model of RNA transcription and splicing, La Manno et al. (2018) predict changes in mRNA levels of individual cells from single-cell RNA-seq data.


bioRxiv | 2017

Barcode identification for single cell genomics

Akshay Tambe; Lior Pachter

Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. In order to recover single-cell information from such experiments, reads must be grouped based on their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult due to high rates of mismatch and deletion errors that can afflict barcodes. Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. This allows for assignment of reads to consensus fingerprints constructed from k-mers, and we show that for single-cell RNA-Seq this improves the recovery of accurate single-cell transcriptome estimates.
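
The following Python sketch illustrates only the first ingredient named in the abstract above, the k-mers of a circularized barcode. It assumes that "circularized" means the barcode is treated as a ring so that k-mers wrap around its end. It does not build or traverse a de Bruijn graph and does not perform error correction. The function names and barcode strings are hypothetical.

```python
def circular_kmers(barcode, k):
    """Return the k-mers of a barcode treated as a circular sequence."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Append the first k - 1 bases so that k-mers can wrap around the end.
    wrapped = barcode + barcode[:k - 1]
    return [wrapped[i:i + k] for i in range(len(barcode))]


def shared_kmer_count(barcode_a, barcode_b, k):
    """Count distinct circular k-mers present in both barcodes."""
    return len(set(circular_kmers(barcode_a, k)) & set(circular_kmers(barcode_b, k)))


# Hypothetical barcodes: the second differs from the first by one substitution.
reference = "AACCGGTT"
observed = "AACTGGTT"
overlap = shared_kmer_count(reference, observed, 3)
```

With wrap-around, a barcode of length L yields L k-mers, and a single substitution alters at most k of them, so an erroneous barcode still shares most of its k-mers with the true one.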


bioRxiv | 2017

Fusion detection and quantification by pseudoalignment

Páll Melsted; Shannon Hateley; Isaac Joseph; Harold Pimentel; Nicolas Bray; Lior Pachter

RNA sequencing in cancer cells is a powerful technique to detect chromosomal rearrangements, allowing for de novo discovery of actively expressed fusion genes. Here we focus on the problem of detecting gene fusions from raw sequencing data, assembling the reads to define fusion transcripts and their associated breakpoints, and quantifying their abundances. Building on the pseudoalignment idea that simplifies and accelerates transcript quantification, we introduce a novel approach to fusion detection based on inspecting paired reads that cannot be pseudoaligned due to conflicting matches. The method and software, called pizzly, filters false positives, assembles new transcripts from the fusion reads, and reports candidate fusions. With pizzly, fusion detection from raw RNA-Seq reads can be performed in a matter of minutes, making the program suitable for the analysis of large cancer gene expression databases and for clinical use. pizzly is available at https://github.com/pmelsted/pizzly

Collaboration


Lior Pachter's collaborations.

Top Co-Authors

Nicolas Bray, University of California
Inna Dubchak, Lawrence Berkeley National Laboratory
Akshay Tambe, University of California
Edward M. Rubin, United States Department of Energy
Lynn Yi, California Institute of Technology
Alexander Poliakov, Lawrence Berkeley National Laboratory
Isaac Joseph, University of California