Publication


Featured research published by John H. Phan.


Journal of the American Medical Informatics Association | 2013

Pathology imaging informatics for quantitative analysis of whole-slide images

Sonal Kothari; John H. Phan; Todd H. Stokes; May D. Wang

Objectives: With the objective of bringing clinical decision support systems to reality, this article reviews histopathological whole-slide imaging informatics methods, associated challenges, and future research opportunities.
Target audience: This review targets pathologists and informaticians who have a limited understanding of the key aspects of whole-slide image (WSI) analysis and/or a limited knowledge of state-of-the-art technologies and analysis methods.
Scope: First, we discuss the importance of imaging informatics in pathology and highlight the challenges posed by histopathological WSI. Next, we provide a thorough review of current methods for: quality control of histopathological images; feature extraction that captures image properties at the pixel, object, and semantic levels; predictive modeling that utilizes image features for diagnostic or prognostic applications; and data and information visualization that explores WSI for de novo discovery. In addition, we highlight future research directions and discuss the impact of large public repositories of histopathological data, such as The Cancer Genome Atlas, on the field of pathology informatics. Following the review, we present a case study to illustrate a clinical decision support system that begins with quality control and ends with predictive modeling for several cancer endpoints. Currently, state-of-the-art software tools only provide limited image processing capabilities instead of complete data analysis for clinical decision-making. We aim to inspire researchers to conduct more research in pathology imaging informatics so that clinical decision support can become a reality.


Nature Biotechnology | 2014

Detecting and correcting systematic variation in large-scale RNA sequencing data

Sheng Li; Paweł P. Łabaj; Paul Zumbo; Peter Sykacek; Wei Shi; Leming Shi; John H. Phan; Po-Yen Wu; May Wang; Charles Wang; Danielle Thierry-Mieg; Jean Thierry-Mieg; David P. Kreil; Christopher E. Mason

High-throughput RNA sequencing (RNA-seq) enables comprehensive scans of entire transcriptomes, but best practices for analyzing RNA-seq data have not been fully defined, particularly for data collected with multiple sequencing platforms or at multiple sites. Here we used standardized RNA samples with built-in controls to examine sources of error in large-scale RNA-seq studies and their impact on the detection of differentially expressed genes (DEGs). Analysis of variations in guanine-cytosine content, gene coverage, sequencing error rate and insert size allowed identification of decreased reproducibility across sites. Moreover, commonly used methods for normalization (cqn, EDASeq, RUV2, sva, PEER) varied in their ability to remove these systematic biases, depending on sample complexity and initial data quality. Normalization methods that combine data from genes across sites are strongly recommended to identify and remove site-specific effects and can substantially improve RNA-seq studies.


Genome Biology | 2015

Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

Wenqian Zhang; Falk Hertwig; Jean Thierry-Mieg; Wenwei Zhang; Danielle Thierry-Mieg; Jian Wang; Cesare Furlanello; Viswanath Devanarayan; Jie Cheng; Youping Deng; Barbara Hero; Huixiao Hong; Meiwen Jia; Li Li; Simon Lin; Yuri Nikolsky; André Oberthuer; Tao Qing; Zhenqiang Su; Ruth Volland; Charles Wang; May D. Wang; Junmei Ai; Davide Albanese; Shahab Asgharzadeh; Smadar Avigad; Wenjun Bao; Marina Bessarabova; Murray H. Brilliant; Benedikt Brors

Background: Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model.
Results: We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models.
Conclusions: We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.


Pharmacogenomics Journal | 2010

k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction.

R.M. Parry; Wendell D. Jones; Todd H. Stokes; John H. Phan; Richard A. Moffitt; Hong Fang; Leming Shi; André Oberthuer; Matthias Fischer; Weida Tong; Wang

In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463,320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors.
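
To make the model parameters named in this abstract concrete, the following is a minimal sketch of a binary KNN classifier that exposes four of them: distance metric, number of neighbors, vote weighting and decision threshold. It is an illustration only, not the MAQC-II protocol; the function names, the inverse-distance weighting scheme and the toy data are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, int]  # (feature vector, class label 0 or 1)


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_score(train: List[Sample], query: Vector, k: int,
              distance: Callable[[Vector, Vector], float] = euclidean,
              weighted: bool = False) -> float:
    """Return the share of votes for class 1 among the k nearest neighbors."""
    neighbors = sorted(train, key=lambda s: distance(s[0], query))[:k]
    positive, total = 0.0, 0.0
    for features, label in neighbors:
        # Hypothetical weighting choice: inverse distance, or equal votes.
        w = 1.0 / (distance(features, query) + 1e-9) if weighted else 1.0
        total += w
        if label == 1:
            positive += w
    return positive / total


def knn_predict(train: List[Sample], query: Vector, k: int,
                threshold: float = 0.5,
                distance: Callable[[Vector, Vector], float] = euclidean,
                weighted: bool = False) -> int:
    """Predict class 1 when the vote share reaches the decision threshold."""
    return 1 if knn_score(train, query, k, distance, weighted) >= threshold else 0


# Toy data (made up for illustration): two features per sample.
train = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([0.2, 0.1], 0),
         ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
query = [0.95, 1.0]
```

With `k=3`, two of the three nearest neighbors are positive, so the unweighted vote share is 2/3. The same query is therefore classified differently at thresholds of 0.5 and 0.7, and inverse-distance weighting changes the outcome again, which shows how a grid over these settings can yield models with different behavior.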


Trends in Biotechnology | 2009

Convergence of biomarkers, bioinformatics and nanotechnology for individualized cancer treatment

John H. Phan; Richard A. Moffitt; Todd H. Stokes; Jian Liu; Andrew N. Young; Shuming Nie; May D. Wang

Recent advances in biomarker discovery, biocomputing and nanotechnology have raised new opportunities in the emerging fields of personalized medicine (in which disease detection, diagnosis and therapy are tailored to each individual's molecular profile) and predictive medicine (in which genetic and molecular information is used to predict disease development, progression and clinical outcome). Here, we discuss advanced biocomputing tools for cancer biomarker discovery and multiplexed nanoparticle probes for cancer biomarker profiling, in addition to the prospects for and challenges involved in correlating biomolecular signatures with clinical outcome. This bio-nano-info convergence holds great promise for molecular diagnosis and individualized therapy of cancer and other human diseases.


International Symposium on Biomedical Imaging | 2011

Automatic batch-invariant color segmentation of histological cancer images

Sonal Kothari; John H. Phan; Richard A. Moffitt; Todd H. Stokes; Shelby E. Hassberger; Qaiser Chaudry; Andrew N. Young; May D. Wang

We propose an automatic color segmentation system that (1) incorporates domain knowledge to guide histological image segmentation and (2) normalizes images to reduce sensitivity to batch effects. Color segmentation is an important, yet difficult, component of image-based diagnostic systems. User-interactive guidance by domain experts (i.e., pathologists) often leads to the best color segmentation or “ground truth” regardless of stain color variations in different batches. However, such guidance limits the objectivity, reproducibility and speed of diagnostic systems. Our system uses knowledge from pre-segmented reference images to normalize and classify pixels in patient images. The system then refines the segmentation by re-classifying pixels in the original color space. We test our system on four batches of H&E stained images and, in comparison to a system with no normalization (39% average accuracy), we obtain an average segmentation accuracy of 85%.


Annals of Biomedical Engineering | 2007

chip artifact CORRECTion (caCORRECT): A Bioinformatics System for Quality Assurance of Genomics and Proteomics Array Data

Todd H. Stokes; Richard A. Moffitt; John H. Phan; May D. Wang

Quality assurance of high-throughput “-omics” data is a major concern for biomedical discovery and translational medicine, and is considered a top priority in bioinformatics and systems biology. Here, we report a web-based bioinformatics tool called caCORRECT for chip artifact detection, analysis, and CORRECTion, which removes systematic artifactual noise that is commonly observed in microarray gene expression data. Despite the development of major databases such as GEO, ArrayExpress, caArray, and the SMD to manage and distribute microarray data to the public, reproducibility has been questioned in many cases, including high-profile papers and datasets. Based on both archived and synthetic data, we have designed caCORRECT to have several advanced features: (1) to uncover significant, correctable artifacts that affect reproducibility of experiments; (2) to improve the integrity and quality of public archives by removing artifacts; (3) to provide a universal quality score to aid users in their selection of suitable microarray data; and (4) to improve the true-positive rate of biomarker selection verified by test data. These features are expected to improve the reproducibility of microarray studies. caCORRECT is freely available at: http://caCORRECT.bme.gatech.edu.


Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine | 2012

Biological interpretation of morphological patterns in histopathological whole-slide images

Sonal Kothari; John H. Phan; Adeboye O. Osunkoya; May D. Wang

We propose a framework for studying visual morphological patterns across histopathological whole-slide images (WSIs). Image representation is an important component of computer-aided decision support systems for histopathological cancer diagnosis. Such systems extract hundreds of quantitative image features from digitized tissue biopsy slides and produce models for prediction. The performance of these models depends on the identification of informative features for selection of appropriate regions-of-interest (ROIs) from heterogeneous WSIs and for development of models. However, identification of informative features is hindered by the semantic gap between human interpretation of visual morphological patterns and quantitative image features. We address this challenge by using data mining and information visualization tools to study spatial patterns formed by features extracted from sub-sections of WSIs. Using ovarian serous cystadenocarcinoma (OvCa) WSIs provided by The Cancer Genome Atlas (TCGA), we show that (1) individual and (2) multivariate image features correspond to biologically relevant ROIs, and (3) supervised image feature selection can map histopathology domain knowledge to quantitative image features.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2013

Benchmarking RNA-Seq quantification tools

Raghu Chandramohan; Po-Yen Wu; John H. Phan; May D. Wang

RNA-Seq, a deep sequencing technique, promises to be a potential successor to microarrays for studying the transcriptome. One of many aspects of transcriptomics that are of interest to researchers is gene expression estimation. With rapid development in RNA-Seq, there are numerous tools available to estimate gene expression, each producing different results. However, we do not know which of these tools produces the most accurate gene expression estimates. In this study we have addressed this issue using Cufflinks, IsoEM, HTSeq, and RSEM to quantify RNA-Seq expression profiles. Comparing results of these quantification tools, we observe that RNA-Seq relative expression estimates correlate with RT-qPCR measurements in the range of 0.85 to 0.89, with HTSeq exhibiting the highest correlation. However, in terms of root-mean-square deviation of RNA-Seq relative expression estimates from RT-qPCR measurements, we find that HTSeq produces the greatest deviation. Therefore, we conclude that, though Cufflinks, RSEM, and IsoEM might not correlate as well as HTSeq with RT-qPCR measurements, they may produce expression values with higher accuracy.
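
The distinction this abstract draws between correlation and root-mean-square deviation (RMSD) can be illustrated with a small sketch. The numbers below are invented for illustration and are not data from the study; `tool_a` and `tool_b` are hypothetical stand-ins, not any of the named quantification tools.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rmsd(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up reference measurements and two hypothetical tool outputs.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
tool_a = [3.0, 4.0, 5.0, 6.0, 7.0]  # same trend, constant offset of 2
tool_b = [1.2, 1.8, 3.3, 3.9, 5.1]  # noisier trend, values close to reference

for name, estimates in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(name, round(pearson(reference, estimates), 3),
          round(rmsd(reference, estimates), 3))
```

Here `tool_a` tracks the reference trend almost perfectly yet is systematically offset, so it has the higher correlation and also the larger RMSD. Correlation is insensitive to a constant shift or rescaling, whereas RMSD penalizes it, which is why the two metrics can rank tools differently.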


BMC Medical Imaging | 2013

Histological image classification using biologically interpretable shape-based features

Sonal Kothari; John H. Phan; Andrew N. Young; May D. Wang

Background: Automatic cancer diagnostic systems based on histological image classification are important for improving therapeutic decisions. Previous studies propose textural and morphological features for such systems. These features capture patterns in histological images that are useful for both cancer grading and subtyping. However, because many of these features lack a clear biological interpretation, pathologists may be reluctant to adopt these features for clinical diagnosis.
Methods: We examine the utility of biologically interpretable shape-based features for classification of histological renal tumor images. Using Fourier shape descriptors, we extract shape-based features that capture the distribution of stain-enhanced cellular and tissue structures in each image and evaluate these features using a multi-class prediction model. We compare the predictive performance of the shape-based diagnostic model to that of traditional models, i.e., using textural, morphological and topological features.
Results: The shape-based model, with an average accuracy of 77%, outperforms or complements traditional models. We identify the most informative shapes for each renal tumor subtype from the top-selected features. Results suggest that these shapes are not only accurate diagnostic features, but also correlate with known biological characteristics of renal tumors.
Conclusions: Shape-based analysis of histological renal tumor images accurately classifies disease subtypes and reveals biologically insightful discriminatory features. This method for shape-based analysis can be extended to other histological datasets to aid pathologists in diagnostic and therapeutic decisions.
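
As a rough illustration of what a Fourier shape descriptor is, the sketch below treats a closed contour as a sequence of complex numbers, takes its discrete Fourier transform, and normalizes the coefficient magnitudes. The abstract does not specify which descriptor variant the authors used, so this particular normalization, the function names and the synthetic three-lobed contour are all assumptions. The sketch also assumes the contour points are ordered counter-clockwise and evenly sampled.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def fourier_descriptors(contour: Sequence[Tuple[float, float]],
                        n_coeffs: int) -> List[float]:
    """Return n_coeffs normalized DFT magnitudes for a closed contour.

    Dropping coefficient 0 removes translation, dividing by |coefficient 1|
    removes scale, and taking magnitudes removes rotation and start point.
    """
    points = [complex(x, y) for x, y in contour]
    n = len(points)
    if n < n_coeffs + 2:
        raise ValueError("contour has too few points for n_coeffs")

    def dft(k: int) -> complex:
        return sum(p * cmath.exp(-2j * math.pi * k * t / n)
                   for t, p in enumerate(points)) / n

    scale = abs(dft(1))
    if scale < 1e-12:
        raise ValueError("degenerate contour (is it ordered clockwise?)")
    return [abs(dft(k)) / scale for k in range(2, n_coeffs + 2)]


def lobed_contour(n: int, cx: float, cy: float,
                  size: float) -> List[Tuple[float, float]]:
    """Synthetic three-lobed shape, used only to exercise the sketch."""
    pts = []
    for t in range(n):
        theta = 2 * math.pi * t / n
        r = size * (1 + 0.3 * math.cos(3 * theta))
        pts.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return pts


small = fourier_descriptors(lobed_contour(64, 0.0, 0.0, 1.0), 4)
large = fourier_descriptors(lobed_contour(64, 10.0, -5.0, 3.0), 4)
```

The two contours differ in position and size but have the same shape, so `small` and `large` agree up to floating-point error. Invariance of this kind is what lets a descriptor characterize a shape independently of where it appears in an image and how large it is.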

Collaboration

Top Co-Authors

May D. Wang, Georgia Institute of Technology

Po-Yen Wu, Georgia Institute of Technology

Richard A. Moffitt, University of North Carolina at Chapel Hill

Sonal Kothari, Georgia Institute of Technology

Todd H. Stokes, Georgia Institute of Technology

Chang F. Quo, Georgia Institute of Technology