
Publication


Featured research published by Jean-Pierre A. Kocher.


Journal of Computational Biology | 2013

Calculating Sample Size Estimates for RNA Sequencing Data

Steven N. Hart; Terry M. Therneau; Yuji Zhang; Gregory A. Poland; Jean-Pierre A. Kocher

Background: Given the high technical reproducibility and orders of magnitude greater resolution than microarrays, next-generation sequencing of mRNA (RNA-Seq) is quickly becoming the de facto standard for measuring levels of gene expression in biological experiments. Two important questions must be taken into consideration when designing a particular experiment, namely, 1) how deep does one need to sequence? and 2) how many biological replicates are necessary to observe a significant change in expression? Results: Based on the gene expression distributions from 127 RNA-Seq experiments, we find evidence that 91% ± 4% of all annotated genes are sequenced at a frequency of 0.1 times per million bases mapped, regardless of sample source. Based on this observation, and combining this information with other parameters such as biological variation and technical variation that we empirically estimate from our large datasets, we developed a model to estimate the statistical power needed to identify differentially expressed genes from RNA-Seq experiments. Conclusions: Our results provide a needed reference for ensuring RNA-Seq gene expression studies are conducted with the optimal sample size, power, and sequencing depth. We also make available both R code and an Excel worksheet for investigators to calculate these estimates for their own experiments.
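
The abstract above does not spell out the power model, so the following is only a rough sketch of how such a calculation can look. It assumes a normal approximation on the log scale in which the variance of a gene's count combines sampling noise (1/depth) with the squared biological coefficient of variation. The function name and all parameters are hypothetical and are not taken from the authors' R code or Excel worksheet.

```python
import math
from statistics import NormalDist


def samples_per_group(depth, cv, fold_change, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-group comparison.

    depth       -- assumed average read count for the gene of interest
    cv          -- assumed biological coefficient of variation
    fold_change -- smallest fold change worth detecting (e.g. 2.0)
    alpha       -- two-sided type I error rate
    power       -- desired probability of detecting the fold change
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Variance term: sampling noise (1/depth) plus biological variation (cv^2).
    variance = 1 / depth + cv ** 2
    n = 2 * (z_alpha + z_beta) ** 2 * variance / math.log(fold_change) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Toy inputs chosen for illustration only.
    print(samples_per_group(depth=20, cv=0.4, fold_change=2.0))
```

Under this approximation, raising the depth shrinks only the 1/depth term, so beyond a certain depth the biological variation dominates and additional replicates matter more than additional reads.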


BMC Bioinformatics | 2014

MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing

Krishna R. Kalari; Asha Nair; Jaysheel D. Bhavsar; Daniel O’Brien; Jaime Davila; Matthew A Bockol; Jinfu Nie; Xiaojia Tang; Saurabh Baheti; Jay B Doughty; Sumit Middha; Hugues Sicotte; Aubrey E. Thompson; Yan W. Asmann; Jean-Pierre A. Kocher

Background: Although the costs of next generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for a comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used for obtaining genomic features from transcriptomic sequencing data, for any genome. Results: For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, and summarization of transcriptomics data and final report. This workflow is available for Human transcriptome analysis and can be easily adapted and used for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients. Conclusions: Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from http://bioinformaticstools.mayo.edu/research/maprseq/.


Bioinformatics | 2014

CrossMap: A versatile tool for coordinate conversion between genome assemblies

Hao Zhao; Zhifu Sun; Jing Wang; Haojie Huang; Jean-Pierre A. Kocher; Liguo Wang

Motivation: Reference genome assemblies are subject to change and refinement from time to time. Generally, researchers need to convert the results that have been analyzed according to old assemblies to newer versions, or vice versa, to facilitate meta-analysis, direct comparison, data integration and visualization. Several useful conversion tools can convert genome interval files in browser extensible data or general feature format, but none have the functionality to convert files in sequence alignment map or BigWig format. This is a significant gap in computational genomics tools, as these formats are the ones most widely used for representing high-throughput sequencing data, such as RNA-seq, chromatin immunoprecipitation sequencing, DNA-seq, etc. Results: Here we developed CrossMap, a versatile and efficient tool for converting genome coordinates between assemblies. CrossMap supports most of the commonly used file formats, including BAM, sequence alignment map, Wiggle, BigWig, browser extensible data, general feature format, gene transfer format and variant call format. Availability and implementation: CrossMap is written in Python and C. Source code and a comprehensive users manual are freely available at: http://crossmap.sourceforge.net/.
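
To make the idea of coordinate conversion concrete, here is a minimal sketch of mapping an interval from an old assembly to a new one through a table of aligned blocks. It is not CrossMap's implementation: the block table, the function names and the rule of rejecting intervals that span a gap are all assumptions made for illustration, and real chain files also carry strand and score information.

```python
from bisect import bisect_right

# Hypothetical aligned blocks: (old_chrom, old_start, old_end, new_chrom, new_start).
# Coordinates are 0-based, half-open, and on the same strand.
BLOCKS = [
    ("chr1", 0, 1000, "chr1", 500),
    ("chr1", 1200, 3000, "chr1", 1500),
]


def build_index(blocks):
    """Group blocks by source chromosome and sort them by start position."""
    index = {}
    for chrom, start, end, new_chrom, new_start in blocks:
        index.setdefault(chrom, []).append((start, end, new_chrom, new_start))
    for entries in index.values():
        entries.sort()
    return index


def convert_interval(index, chrom, start, end):
    """Map [start, end) to the new assembly.

    Returns None when the interval is not fully contained in a single
    block, for example when it falls into an unaligned gap.
    """
    entries = index.get(chrom, [])
    starts = [entry[0] for entry in entries]
    i = bisect_right(starts, start) - 1  # last block starting at or before start
    if i < 0:
        return None
    block_start, block_end, new_chrom, new_start = entries[i]
    if end > block_end:
        return None
    offset = new_start - block_start
    return new_chrom, start + offset, end + offset


if __name__ == "__main__":
    index = build_index(BLOCKS)
    print(convert_interval(index, "chr1", 100, 200))
    print(convert_interval(index, "chr1", 1050, 1100))
```

A production tool would typically split an interval that spans several blocks instead of discarding it; the stricter rule here only keeps the sketch short.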


Bioinformatics | 2012

TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data

Yan W. Asmann; Sumit Middha; Asif Hossain; Saurabh Baheti; Ying Li; High-seng Chai; Zhifu Sun; Patrick H. Duffy; Ahmed A. Hadad; Asha Nair; Xiaoyu Liu; Yuji Zhang; Eric W. Klee; Krishna R. Kalari; Jean-Pierre A. Kocher

Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm. Supplementary information: Supplementary data are provided at Bioinformatics online.


Journal of Biological Chemistry | 2015

RNA Toxicity and Missplicing in the Common Eye Disease Fuchs Endothelial Corneal Dystrophy

Jintang Du; Ross A. Aleff; Elisabetta Soragni; Krishna R. Kalari; Jinfu Nie; Xiaojia Tang; Jaime Davila; Jean-Pierre A. Kocher; Sanjay V. Patel; Joel M. Gottesfeld; Keith H. Baratz; Eric D. Wieben

Background: Expansion of intronic (CTG·CAG)n repeats in TCF4 is found in most Fuchs endothelial corneal dystrophy (FECD) patients. Results: RNA foci co-localizing with the splicing factor MBNL1 are found in FECD cells, and changes in mRNA splicing occur. Conclusion: Trinucleotide repeat expansion in FECD is associated with RNA focus formation and missplicing. Significance: RNA toxicity occurs in a disease affecting millions of patients. Fuchs endothelial corneal dystrophy (FECD) is an inherited degenerative disease that affects the internal endothelial cell monolayer of the cornea and can result in corneal edema and vision loss in severe cases. FECD affects ∼5% of middle-aged Caucasians in the United States and accounts for >14,000 corneal transplantations annually. Among the several genes and loci associated with FECD, the strongest association is with an intronic (CTG·CAG)n trinucleotide repeat expansion in the TCF4 gene, which is found in the majority of affected patients. Corneal endothelial cells from FECD patients harbor a poly(CUG)n RNA that can be visualized as RNA foci containing this condensed RNA and associated proteins. Similar to myotonic dystrophy type 1, the poly(CUG)n RNA co-localizes with and sequesters the mRNA-splicing factor MBNL1, leading to missplicing of essential MBNL1-regulated mRNAs. Such foci and missplicing are not observed in similar cells from FECD patients who lack the repeat expansion. RNA-Seq splicing data from the corneal endothelia of FECD patients and controls reveal hundreds of differential alternative splicing events. These include events previously characterized in the context of myotonic dystrophy type 1 and epithelial-to-mesenchymal transition, as well as splicing changes in genes related to proposed mechanisms of FECD pathogenesis. We report the first instance of RNA toxicity and missplicing in a common non-neurological/neuromuscular disease associated with a repeat expansion. The FECD patient population with this (CTG·CAG)n trinucleotide repeat expansion exceeds that of the combined number of patients in all other microsatellite expansion disorders.


Bioinformatics | 2008

Robust and efficient identification of biomarkers by classifying features on graphs

Tae Hyun Hwang; Hugues Sicotte; Ze Tian; Baolin Wu; Jean-Pierre A. Kocher; Dennis A. Wigle; Vipin Kumar; Rui Kuang

Motivation: A central problem in biomarker discovery from large-scale gene expression or single nucleotide polymorphism (SNP) data is the computational challenge of taking into account the dependence among all the features. Methods that ignore the dependence usually identify non-reproducible biomarkers across independent datasets. We introduce a new graph-based semi-supervised feature classification algorithm to identify discriminative disease markers by learning on bipartite graphs. Our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral with network propagation to capture the dependence among both samples and features (clinical and genetic variables) by exploring bi-cluster structures in a graph. Two features of our algorithm are: (1) our algorithm can find a global optimal labeling to capture the dependence among all the features and thus generates highly reproducible results across independent microarray or other high-throughput datasets, (2) our algorithm is capable of handling hundreds of thousands of features and thus is particularly useful for biomarker identification from high-throughput gene expression and SNP data. In addition, although designed for classifying features, our algorithm can also simultaneously classify test samples for disease prognosis/diagnosis. Results: We applied the network propagation algorithm to study three large-scale breast cancer datasets. Our algorithm achieved competitive classification performance compared with SVMs and other baseline methods, and identified several markers with clinical or biological relevance to the disease. More importantly, our algorithm also identified highly reproducible marker genes and enriched functions from the independent datasets. Availability: Supplementary results and source code are available at http://compbio.cs.umn.edu/Feature_Class. Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2012

SAAP-RRBS

Zhifu Sun; Saurabh Baheti; Sumit Middha; Rahul Kanwar; Yuji Zhang; Xing Li; Andreas S. Beutler; Eric W. Klee; Yan W. Asmann; E. Aubrey Thompson; Jean-Pierre A. Kocher

Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation. Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm. Supplementary information: Supplementary data are available at Bioinformatics online.


International Conference on Data Mining | 2008

Learning on Weighted Hypergraphs to Integrate Protein Interactions and Gene Expressions for Cancer Outcome Prediction

Tae Hyun Hwang; Ze Tian; Rui Kuang; Jean-Pierre A. Kocher

Building reliable predictive models from multiple complementary genomic data for cancer study is a crucial step towards successful cancer treatment and a full understanding of the underlying biological principles. To tackle this challenging data integration problem, we propose a hypergraph-based learning algorithm called HyperGene to integrate microarray gene expressions and protein-protein interactions for cancer outcome prediction and biomarker identification. HyperGene is a robust two-step iterative method that alternately finds the optimal outcome prediction and the optimal weighting of the marker genes guided by a protein-protein interaction network. Under the hypothesis that cancer-related genes tend to interact with each other, the HyperGene algorithm uses a protein-protein interaction network as prior knowledge by imposing a consistent weighting of interacting genes. Our experimental results on two large-scale breast cancer gene expression datasets show that HyperGene utilizing a curated protein-protein interaction network achieves significantly improved cancer outcome prediction. Moreover, HyperGene can also retrieve many known cancer genes as highly weighted marker genes.


BMC Genomics | 2010

Copy number variation and cytidine analogue cytotoxicity: A genome-wide association approach

Krishna R. Kalari; Scott J. Hebbring; High Seng Chai; Liang Li; Jean-Pierre A. Kocher; Liewei Wang; Richard M. Weinshilboum

Background: The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-β-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes. Results: We used a cell-based model system consisting of 197 ethnically-defined lymphoblastoid cell lines for which genome-wide SNP data were obtained using Illumina 550 and 650 K SNP arrays to study cytidine analogue cytotoxicity. 775 CNVs with allele frequencies > 1% were identified in 102 regions across the genome. 87/102 of these loci overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify the 11 CNV regions that were associated with this phenotype, with false positive and false negative rates for the in-silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. We found that 7 of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven genes (SMYD3) that was significant in the CNV association study was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance during cytotoxicity assay, consistent with the results of the association analysis. Conclusions: These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNV that contributes to variation in drug response.
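
The abstract above reports permutation p-values but does not describe the test statistic. As an illustration only, the sketch below assumes the simplest possible setup: a two-sided test on the difference in mean phenotype (for example, log IC50) between cell lines that carry a CNV and those that do not. The function name, parameters and data layout are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(values, carrier, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values  -- phenotype per cell line (toy data in this sketch)
    carrier -- parallel list of booleans, True if the line carries the CNV
    """
    def abs_diff(labels):
        with_cnv = [v for v, c in zip(values, labels) if c]
        without_cnv = [v for v, c in zip(values, labels) if not c]
        return abs(mean(with_cnv) - mean(without_cnv))

    observed = abs_diff(carrier)
    rng = random.Random(seed)
    labels = list(carrier)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any real link between CNV and phenotype.
        rng.shuffle(labels)
        if abs_diff(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy_values = [1.0, 1.1, 0.9, 1.2, 1.0, 3.0, 3.1, 2.9, 3.2, 3.0]
    toy_carrier = [False] * 5 + [True] * 5
    print(permutation_p_value(toy_values, toy_carrier, n_perm=2000))
```

Because the labels are shuffled rather than resampled, both groups keep their original sizes in every permutation.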


Briefings in Bioinformatics | 2016

VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files

Steven N. Hart; Patrick H. Duffy; Daniel J. Quest; Asif Hossain; Michael A. Meiners; Jean-Pierre A. Kocher

Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotation stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. It is not limited to human data, but can also be extended to include non-diploid organisms. It also supports copy number or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.
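
The point that annotations already stored in a VCF can drive filtering can be shown with a small sketch. It parses the INFO column of VCF data lines into a dictionary and applies filtering steps one after another. It does not reproduce VCF-Miner, which according to the abstract relies on MongoDB; the function names and the `GENE` annotation key in the example are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=14;AF=0.5;DB' into a dict.

    Flags without a value (e.g. 'DB') are stored as True.
    """
    info = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def iter_variants(lines):
    """Yield one dict per data line, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),
            "filter": flt,
            "info": parse_info(info),
        }


def keep(variants, predicate):
    """Apply one filtering step; steps can be chained to trim stepwise."""
    return [v for v in variants if predicate(v)]


if __name__ == "__main__":
    toy_lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\trs1\tA\tG\t50\tPASS\tDP=14;AF=0.5;DB;GENE=TCF4",
        "1\t200\t.\tC\tT,G\t10\tq10\tDP=3",
    ]
    variants = list(iter_variants(toy_lines))
    passed = keep(variants, lambda v: v["filter"] == "PASS")
    in_gene = keep(passed, lambda v: v["info"].get("GENE") == "TCF4")
    print(in_gene)
```

Keeping INFO values as raw strings avoids guessing types; a fuller parser would read the `##INFO` header lines to learn each key's declared type.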

Collaboration


Dive into Jean-Pierre A. Kocher's collaborations.

Top Co-Authors


Sumit Middha

Memorial Sloan Kettering Cancer Center
