Publication


Featured research published by Kelly H. Zou.


IEEE Transactions on Medical Imaging | 2004

Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation

Simon K. Warfield; Kelly H. Zou; William M. Wells

Characterizing the performance of image segmentation approaches has been a persistent challenge. Performance analysis is important since segmentation algorithms often have limited accuracy and precision. Interactive drawing of the desired segmentation by human raters has often been the only acceptable approach, and yet suffers from intra-rater and inter-rater variability. Automated algorithms have been sought in order to remove the variability introduced by raters, but such algorithms must be assessed to ensure they are suitable for the task. The performance of raters (human or algorithmic) generating segmentations of medical images has been difficult to quantify because of the difficulty of obtaining or estimating a known true segmentation for clinical data. Although physical and digital phantoms can be constructed for which ground truth is known or readily estimated, such phantoms do not fully reflect clinical images due to the difficulty of constructing phantoms which reproduce the full range of imaging characteristics and normal and pathological anatomical variability observed in clinical data. Comparison to a collection of segmentations by raters is an attractive alternative since it can be carried out directly on the relevant clinical imaging data. However, the most appropriate measure or set of measures with which to compare such segmentations has not been clarified and several measures are used in practice. We present here an expectation-maximization algorithm for simultaneous truth and performance level estimation (STAPLE). The algorithm considers a collection of segmentations and computes a probabilistic estimate of the true segmentation and a measure of the performance level represented by each segmentation. The source of each segmentation in the collection may be an appropriately trained human rater or raters, or may be an automated segmentation algorithm. 
The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, weighting each segmentation depending upon the estimated performance level, and incorporating a prior model for the spatial distribution of structures being segmented as well as spatial homogeneity constraints. STAPLE is straightforward to apply to clinical imaging data, it readily enables assessment of the performance of an automated image segmentation algorithm, and enables direct comparison of human rater and algorithm performance.
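
The expectation-maximization idea described in this abstract can be illustrated with a minimal sketch for binary segmentations. It is a simplification under stated assumptions, not the published implementation: it uses a single global prior instead of a spatial prior, omits the spatial homogeneity constraints, and the function name `staple_binary` and the toy data are hypothetical.

```python
def staple_binary(decisions, n_iter=50, init=0.9):
    """EM sketch for binary STAPLE with one global prior (an assumption).

    decisions[i][j] is 1 if rater j labelled voxel i as foreground, else 0.
    Returns (w, p, q): per-voxel foreground probabilities, per-rater
    sensitivities, and per-rater specificities.
    """
    n_voxels, n_raters = len(decisions), len(decisions[0])
    # Global prior: overall fraction of foreground votes (assumed, not spatial).
    prior = sum(map(sum, decisions)) / (n_voxels * n_raters)
    p = [init] * n_raters  # sensitivity estimates
    q = [init] * n_raters  # specificity estimates
    w = [prior] * n_voxels
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        for i, row in enumerate(decisions):
            a, b = prior, 1.0 - prior
            for j, d in enumerate(row):
                a *= p[j] if d else (1.0 - p[j])
                b *= (1.0 - q[j]) if d else q[j]
            w[i] = a / (a + b) if a + b > 0 else prior
        # M-step: re-estimate each rater's performance, weighted by w.
        sum_fg = max(sum(w), eps)
        sum_bg = max(n_voxels - sum(w), eps)
        for j in range(n_raters):
            p[j] = sum(w[i] * decisions[i][j] for i in range(n_voxels)) / sum_fg
            q[j] = sum((1 - w[i]) * (1 - decisions[i][j])
                       for i in range(n_voxels)) / sum_bg
    return w, p, q


# Toy data: 8 voxels, 3 raters; raters 0 and 1 agree, rater 2 is noisier.
toy = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0],
    [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
truth_prob, sensitivity, specificity = staple_binary(toy)
print([round(x, 3) for x in truth_prob])
print([round(x, 3) for x in sensitivity], [round(x, 3) for x in specificity])
```

In this toy case the two consistent raters dominate the weighted combination, so the noisier rater ends up with lower estimated sensitivity and specificity.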


Circulation | 2007

Receiver-Operating Characteristic Analysis for Evaluating Diagnostic Tests and Predictive Models

Kelly H. Zou; A. James O’Malley; Laura Mauri

Receiver-operating characteristic (ROC) analysis was originally developed during World War II to analyze classification accuracy in differentiating signal from noise in radar detection [1]. Recently, the methodology has been adapted to several clinical areas heavily dependent on screening and diagnostic tests [2–4], in particular, laboratory testing [5], epidemiology [6], radiology [7–9], and bioinformatics [10]. ROC analysis is a useful tool for evaluating the performance of diagnostic tests and more generally for evaluating the accuracy of a statistical model (eg, logistic regression, linear discriminant analysis) that classifies subjects into 1 of 2 categories, diseased or nondiseased. Its function as a simple graphical tool for displaying the accuracy of a medical diagnostic test is one of the most well-known applications of ROC curve analysis. In Circulation from January 1, 1995, through December 5, 2005, 309 articles were published with the key phrase “receiver operating characteristic.” In cardiology, diagnostic testing plays a fundamental role in clinical practice (eg, serum markers of myocardial necrosis, cardiac imaging tests). Predictive modeling to estimate expected outcomes such as mortality or adverse cardiac events based on patient risk characteristics also is common in cardiovascular research. ROC analysis is a useful tool in both of these situations. In this article, we begin by reviewing the measures of accuracy that use the ROC curve: sensitivity, specificity, and area under the curve (AUC). We also illustrate how these measures can be applied using the evaluation of a hypothetical new diagnostic test as an example. A diagnostic classification test typically yields binary, ordinal, or continuous outcomes. The simplest type, binary outcomes, arises from a screening test indicating whether the patient is nondiseased (Dx=0) or diseased (Dx=1). The screening test indicates whether the patient is likely to be diseased or not.
When >2 categories are used, the test data can be on an ordinal rating …
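
For the binary case described above (Dx=0 nondiseased, Dx=1 diseased), sensitivity and specificity follow directly from a 2×2 table. The sketch below is illustrative only; the function name and the ten-patient data set are hypothetical and do not come from the article.

```python
def sensitivity_specificity(dx, test):
    """Return (sensitivity, specificity) for a binary test.

    dx[i] is the true status (1 = diseased, 0 = nondiseased) and
    test[i] is the test result (1 = positive, 0 = negative).
    """
    tp = sum(1 for d, t in zip(dx, test) if d == 1 and t == 1)
    fn = sum(1 for d, t in zip(dx, test) if d == 1 and t == 0)
    tn = sum(1 for d, t in zip(dx, test) if d == 0 and t == 0)
    fp = sum(1 for d, t in zip(dx, test) if d == 0 and t == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 4 diseased and 6 nondiseased patients.
dx_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test_result = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(dx_true, test_result)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```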


Academic Radiology | 2004

Statistical validation of image segmentation quality based on a spatial overlap index.

Kelly H. Zou; Simon K. Warfield; Aditya Bharatha; Clare M. Tempany; Michael Kaus; Steven Haker; William M. Wells; Ferenc A. Jolesz; Ron Kikinis

RATIONALE AND OBJECTIVES To examine a statistical validation method based on the spatial overlap between two sets of segmentations of the same anatomy. MATERIALS AND METHODS The Dice similarity coefficient (DSC) was used as a statistical validation metric to evaluate the performance of both the reproducibility of manual segmentations and the spatial overlap accuracy of automated probabilistic fractional segmentation of MR images, illustrated on two clinical examples. Example 1: 10 consecutive cases of prostate brachytherapy patients underwent both preoperative 1.5T and intraoperative 0.5T MR imaging. For each case, 5 repeated manual segmentations of the prostate peripheral zone were performed separately on preoperative and on intraoperative images. Example 2: A semi-automated probabilistic fractional segmentation algorithm was applied to MR imaging of 9 cases with 3 types of brain tumors. DSC values were computed and logit-transformed values were compared in the mean with the analysis of variance (ANOVA). RESULTS Example 1: The mean DSCs of 0.883 (range, 0.876-0.893) with 1.5T preoperative MRI and 0.838 (range, 0.819-0.852) with 0.5T intraoperative MRI (P < .001) were within and at the margin of the range of good reproducibility, respectively. Example 2: Wide ranges of DSC were observed in brain tumor segmentations: Meningiomas (0.519-0.893), astrocytomas (0.487-0.972), and other mixed gliomas (0.490-0.899). CONCLUSION The DSC value is a simple and useful summary measure of spatial overlap, which can be applied to studies of reproducibility and accuracy in image segmentation. We observed generally satisfactory but variable validation results in two clinical applications. This metric may be adapted for similar validation tasks.
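
As a minimal illustration of the metric used in this abstract, the Dice similarity coefficient of two segmentations A and B is 2|A∩B| / (|A| + |B|), and its logit is ln(DSC / (1 − DSC)). The sketch below represents each segmentation as a set of voxel indices; the function names and example sets are hypothetical.

```python
import math


def dice(a, b):
    """Dice similarity coefficient of two sets of voxel indices."""
    if not a and not b:
        raise ValueError("DSC is undefined for two empty segmentations")
    return 2 * len(a & b) / (len(a) + len(b))


def logit(x):
    """Logit transform; only defined for 0 < x < 1."""
    return math.log(x / (1 - x))


# Hypothetical segmentations sharing 3 of 4 voxels each.
seg_a = {1, 2, 3, 4}
seg_b = {2, 3, 4, 5}
dsc = dice(seg_a, seg_b)
print(dsc, logit(dsc))
```

The logit maps DSC values from (0, 1) onto the whole real line, which is why it is a natural transform to apply before comparing means with ANOVA.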


Journal of Biomedical Informatics | 2005

The use of receiver operating characteristic curves in biomedical informatics

Thomas A. Lasko; Jui G. Bhagwat; Kelly H. Zou; Lucila Ohno-Machado

Receiver operating characteristic (ROC) curves are frequently used in biomedical informatics research to evaluate classification and prediction models for decision support, diagnosis, and prognosis. ROC analysis investigates the accuracy of a model's ability to separate positive from negative cases (such as predicting the presence or absence of disease), and the results are independent of the prevalence of positive cases in the study population. It is especially useful in evaluating predictive models or other tests that produce output values over a continuous range, since it captures the trade-off between sensitivity and specificity over that range. There are many ways to conduct an ROC analysis. The best approach depends on the experiment; an inappropriate approach can easily lead to incorrect conclusions. In this article, we review the basic concepts of ROC analysis, illustrate their use with sample calculations, make recommendations drawn from the literature, and list readily available software.
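
One simple way to trace the sensitivity/specificity trade-off over a continuous output is an empirical ROC curve with a trapezoidal area estimate. This is only one of the many approaches the abstract alludes to, and the function names and scores below are hypothetical.

```python
def roc_points(labels, scores):
    """Empirical ROC points (false positive rate, true positive rate).

    A case is called positive when its score is >= the threshold; every
    distinct score is used once as a threshold, from highest to lowest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical model outputs for 3 positive and 3 negative cases.
y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(y, s)))

# Duplicating every negative case changes prevalence but not the curve.
y2 = y + [0, 0, 0]
s2 = s + [0.7, 0.4, 0.2]
print(auc_trapezoid(roc_points(y2, s2)))
```

Both calls give the same area, which illustrates the prevalence independence mentioned above: the curve depends only on the score distributions within the positive and negative groups.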


Medical Physics | 2001

Evaluation of three‐dimensional finite element‐based deformable registration of pre‐ and intraoperative prostate imaging

Aditya Bharatha; Masanori Hirose; Nobuhiko Hata; Simon K. Warfield; Matthieu Ferrant; Kelly H. Zou; Eduardo Suarez-Santana; Juan Ruiz-Alzola; Anthony V. D'Amico; Robert A. Cormack; Ron Kikinis; Ferenc A. Jolesz; Clare M. Tempany

In this report we evaluate an image registration technique that can improve the information content of intraoperative image data by deformable matching of preoperative images. In this study, pretreatment 1.5 tesla (T) magnetic resonance (MR) images of the prostate are registered with 0.5 T intraoperative images. The method involves rigid and nonrigid registration using biomechanical finite element modeling. Preoperative 1.5 T MR imaging is conducted with the patient supine, using an endorectal coil, while intraoperatively, the patient is in the lithotomy position with a rectal obturator in place. We have previously observed that these changes in patient position and rectal filling produce a shape change in the prostate. The registration of 1.5 T preoperative images depicting the prostate substructure [namely central gland (CG) and peripheral zone (PZ)] to 0.5 T intraoperative MR images using this method can facilitate the segmentation of the substructure of the gland for radiation treatment planning. After creating and validating a dataset of manually segmented glands from images obtained in ten sequential MR-guided brachytherapy cases, we conducted a set of experiments to assess our hypothesis that the proposed registration system can significantly improve the quality of matching of the total gland (TG), CG, and PZ. The results showed that the method statistically-significantly improves the quality of match (compared to rigid registration), raising the Dice similarity coefficient (DSC) from prematched coefficients of 0.81, 0.78, and 0.59 for TG, CG, and PZ, respectively, to 0.94, 0.86, and 0.76. A point-based measure of registration agreement was also improved by the deformable registration. CG and PZ volumes are not changed by the registration, indicating that the method maintains the biomechanical topology of the prostate. 
Although this strategy was tested for MRI-guided brachytherapy, the preliminary results from these experiments suggest that it may be applied to other settings such as transrectal ultrasound-guided therapy, where the integration of preoperative MRI may have a significant impact upon treatment planning and guidance.


The Journal of Urology | 2002

Etiology Of Spontaneous Perirenal Hemorrhage: A Meta-Analysis

Jian Qing Zhang; Julia R. Fielding; Kelly H. Zou

PURPOSE We determine the most common etiology of spontaneous perirenal hemorrhage. MATERIALS AND METHODS A MEDLINE search of the English language literature from 1985 to 1999 revealed 47 publications and 165 cases of spontaneous renal hemorrhage meeting our study entry criteria. These criteria were presentation of raw data including imaging modality, pathological confirmation (123 cases) or long-term (greater than 2 years) (42) imaging and/or clinical followup and no history of recent trauma, anticoagulant use, dialysis or renal transplant. Meta-analysis was performed using analysis of counts derived from contingency tables and pooled and stratified analysis. RESULTS Hemorrhage was identified by ultrasound in 56 of 100 cases (56%) and by computerized tomography (CT) in all 135 cases assessed (100%). Etiology was correctly identified with an overall sensitivity and specificity of 0.11 and 0.33 for ultrasound and 0.57 and 0.82 for CT. Angiography in 81 cases revealed active bleeding in 11. The most common etiology of spontaneous renal hemorrhage was benign or malignant neoplasm (101 cases, 61%) with angiomyolipoma being predominant (48) followed closely by renal cell carcinoma (43). Vascular disease was the next most common offender (28 cases, 17%) with polyarteritis nodosa occurring most frequently (20). CONCLUSIONS The most common cause of spontaneous perirenal hemorrhage is renal neoplasm and approximately 50% of such neoplasms are malignant. CT is the method of choice for evaluation of perirenal hemorrhage, although its sensitivity for detection of underlying etiology is only moderate.


Cancer Cytopathology | 2009

EGFR mutations are detected comparably in cytologic and surgical pathology specimens of nonsmall cell lung cancer.

Jason H. Smouse; Edmund S. Cibas; Pasi A. Jänne; Victoria A. Joshi; Kelly H. Zou; Neal I. Lindeman

Somatic mutations in the epidermal growth factor receptor (EGFR) are present in ∼10% of nonsmall cell lung cancers, and higher in never‐smokers, women, and Asians. Small in‐frame deletions in exon 19 (∼45%) and L858R mutation in exon 21 (∼40%) predict response to treatment with tyrosine kinase inhibitors, whereas some others herald resistance. Direct sequencing of tumor DNA detects all EGFR mutations, but is limited by interference from nonmalignant cells within the samples. Concern over such interference has discouraged testing cytologic samples, but the adequacy of cytologic specimens for EGFR sequencing has not been studied.


Human Brain Mapping | 2006

Quantitative Evaluation of Automated Skull-Stripping Methods Applied to Contemporary and Legacy Images: Effects of Diagnosis, Bias Correction, and Slice Location

Christine Fennema-Notestine; Ibrahim Burak Ozyurt; Camellia Clark; Shaunna Morris; Amanda Bischoff-Grethe; Mark W. Bondi; Terry L. Jernigan; Bruce Fischl; Florent Ségonne; David W. Shattuck; Richard M. Leahy; David E. Rex; Arthur W. Toga; Kelly H. Zou; Gregory G. Brown

Performance of automated methods to isolate brain from nonbrain tissues in magnetic resonance (MR) structural images may be influenced by MR signal inhomogeneities, type of MR image set, regional anatomy, and age and diagnosis of subjects studied. The present study compared the performance of four methods: Brain Extraction Tool (BET; Smith [2002]: Hum Brain Mapp 17:143–155); 3dIntracranial (Ward [1999] Milwaukee: Biophysics Research Institute, Medical College of Wisconsin; in AFNI); a Hybrid Watershed algorithm (HWA, Segonne et al. [2004] Neuroimage 22:1060–1075; in FreeSurfer); and Brain Surface Extractor (BSE, Sandor and Leahy [1997] IEEE Trans Med Imag 16:41–54; Shattuck et al. [2001] Neuroimage 13:856–876) to manually stripped images. The methods were applied to uncorrected and bias‐corrected datasets; Legacy and Contemporary T1‐weighted image sets; and four diagnostic groups (depressed, Alzheimer's, young and elderly control). To provide a criterion for outcome assessment, two experts manually stripped six sagittal sections for each dataset in locations where brain and nonbrain tissue are difficult to distinguish. Methods were compared on Jaccard similarity coefficients, Hausdorff distances, and an Expectation‐Maximization algorithm. Methods tended to perform better on contemporary datasets; bias correction did not significantly improve method performance. Mesial sections were most difficult for all methods. Although AD image sets were most difficult to strip, HWA and BSE were more robust across diagnostic groups compared with 3dIntracranial and BET. With respect to specificity, BSE tended to perform best across all groups, whereas HWA was more sensitive than other methods. The results of this study may direct users towards a method appropriate to their T1‐weighted datasets and improve the efficiency of processing for large, multisite neuroimaging studies. Hum. Brain Mapping, 2005.
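
Two of the comparison measures named in this abstract are easy to state concretely. The sketch below computes the Jaccard similarity coefficient |A∩B| / |A∪B| and the symmetric Hausdorff distance for two masks given as sets of pixel coordinates. It assumes the distance is taken over all mask pixels, which may differ from how the study defined it (for example, on boundary points only); the function names and masks are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity coefficient of two sets of pixel coordinates."""
    union = a | b
    if not union:
        raise ValueError("Jaccard is undefined for two empty masks")
    return len(a & b) / len(union)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, r) for r in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Hypothetical 2x2 masks, one shifted by a single pixel.
mask_auto = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_manual = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(jaccard(mask_auto, mask_manual), hausdorff(mask_auto, mask_manual))
```

The two measures are complementary: Jaccard summarizes overall overlap, while the Hausdorff distance reports the single worst local disagreement.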


Journal of Magnetic Resonance Imaging | 2002

Quantitative Analysis of MRI Signal Abnormalities of Brain White Matter With High Reproducibility and Accuracy

Xingchang Wei; Simon K. Warfield; Kelly H. Zou; Ying Wu; Xiaoming Li; Alexandre Guimond; John P. Mugler; Randall R. Benson; Leslie Wolfson; Howard L. Weiner; Charles R. G. Guttmann

To assess the reproducibility and accuracy compared to radiologists of three automated segmentation pipelines for quantitative magnetic resonance imaging (MRI) measurement of brain white matter signal abnormalities (WMSA).


Magnetic Resonance in Medicine | 2000

Multi-component apparent diffusion coefficients in human brain: Relationship to spin-lattice relaxation

Robert V. Mulkern; Hale Pinar Zengingonul; Richard L. Robertson; Péter Bogner; Kelly H. Zou; Hakon Gudbjartsson; Charles R. G. Guttmann; David Holtzman; Walid E. Kyriakos; Ferenc A. Jolesz; Stephan E. Maier

In vivo measurements of the human brain tissue water signal decay with b‐factor over an extended b‐factor range up to 6,000 s/mm² reveal a nonmonoexponential decay behavior for both gray and white matter. Biexponential parametrization of the decay curves from cortical gray (CG) and white matter voxels from the internal capsule (IC) of healthy adult volunteers describes the decay process and serves to differentiate between these two tissues. Inversion recovery experiments performed in conjunction with the extended b‐factor signal decay measurements are used to make separate measurements of the spin‐lattice relaxation times of the fast and slow apparent diffusion coefficient (ADC) components. Differences between the spin‐lattice relaxation times of the fast and slow ADC components were not statistically significant in either the CG or IC voxels. It is possible that the two ADC components observed from the extended b‐factor measurements arise from two distinct water compartments with different intrinsic diffusion coefficients. If so, then the relaxation results are consistent with two possibilities. Either the spin‐lattice relaxation times within the compartments are similar or the rate of water exchange between compartments is “fast” enough to ensure volume averaged T1 relaxation yet “slow” enough to allow for the observation of biexponential ADC decay curves over an extended b‐factor range. Magn Reson Med 44:292–300, 2000.
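
A biexponential decay of the kind described here is commonly written as S(b)/S(0) = f·exp(−b·D_fast) + (1 − f)·exp(−b·D_slow). The sketch below evaluates that model and shows why it is nonmonoexponential: the apparent diffusion coefficient computed between two b‐factors falls as b increases. The parameter values are illustrative assumptions, not results from the paper, and the function names are hypothetical.

```python
import math


def biexp_signal(b, f_fast, d_fast, d_slow):
    """Normalized signal S(b)/S(0) for a two-component decay.

    b is in s/mm^2; d_fast and d_slow are in mm^2/s; f_fast is the
    signal fraction of the fast component (0 <= f_fast <= 1).
    """
    return f_fast * math.exp(-b * d_fast) + (1 - f_fast) * math.exp(-b * d_slow)


def apparent_adc(b_lo, b_hi, **params):
    """Apparent ADC from the log-signal slope between two b-factors."""
    s_lo = biexp_signal(b_lo, **params)
    s_hi = biexp_signal(b_hi, **params)
    return math.log(s_lo / s_hi) / (b_hi - b_lo)


# Illustrative parameters only (assumed, not taken from the study).
params = dict(f_fast=0.7, d_fast=1.2e-3, d_slow=2.0e-4)
for b in range(0, 6001, 1000):
    print(b, round(biexp_signal(b, **params), 4))
print(apparent_adc(0, 1000, **params), apparent_adc(5000, 6000, **params))
```

A monoexponential decay would give the same apparent ADC in both b‐factor windows; here the low-b window is dominated by the fast component and the high-b window by the slow one.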

Collaboration


Dive into Kelly H. Zou's collaborations.

Top Co-Authors

Simon K. Warfield

Boston Children's Hospital

Stuart G. Silverman

Brigham and Women's Hospital

Julia R. Fielding

University of North Carolina at Chapel Hill

Clare M. Tempany

Brigham and Women's Hospital

Ron Kikinis

Brigham and Women's Hospital

William M. Wells

Brigham and Women's Hospital

Ferenc A. Jolesz

Brigham and Women's Hospital

Jui G. Bhagwat

Brigham and Women's Hospital

Kemal Tuncali

Brigham and Women's Hospital
