Publication


Featured research published by Jin Chu Wu.


Journal of Research of the National Institute of Standards and Technology | 2011

Measures, Uncertainties, and Significance Test in Operational ROC Analysis

Jin Chu Wu; Alvin F. Martin; Raghu N. Kacker

In receiver operating characteristic (ROC) analysis, sampling variability results in uncertainties of performance measures. Thus, when evaluating and comparing algorithms, these measurement uncertainties must be taken into account. The key issue is how to calculate the uncertainties of performance measures in ROC analysis; the ultimate goal is to perform significance tests for evaluation and comparison using the computed standard errors. From the operational perspective, based on fingerprint-image matching algorithms run on large datasets, the measures and their uncertainties are investigated in three scenarios: 1) the true accept rate (TAR) of genuine scores at a specified false accept rate (FAR) of impostor scores, 2) the TAR and FAR at a given threshold, and 3) the equal error rate. The uncertainties of the measures are calculated using the nonparametric two-sample bootstrap, based on our extensive studies of bootstrap variability on large datasets. The significance test determines whether the difference between the performance of one algorithm and a hypothesized value, or the difference between the performances of two algorithms with their correlation taken into account, is statistically significant. Examples are provided.
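
As an illustration only (not the authors' implementation), the sketch below estimates the standard error of the TAR at a fixed FAR with a nonparametric two-sample bootstrap. It assumes genuine and impostor are 1-D NumPy arrays of similarity scores; the helper names are hypothetical.

import numpy as np

def tar_at_far(genuine, impostor, far=0.001):
    # Threshold chosen so the fraction of impostor scores at or above it equals the target FAR.
    threshold = np.quantile(impostor, 1.0 - far)
    return float(np.mean(genuine >= threshold))

def bootstrap_se_tar(genuine, impostor, far=0.001, n_boot=2000, seed=0):
    # Nonparametric two-sample bootstrap: genuine and impostor scores resampled independently.
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.choice(genuine, size=genuine.size, replace=True)
        i = rng.choice(impostor, size=impostor.size, replace=True)
        estimates[b] = tar_at_far(g, i, far)
    # Bootstrap mean of the measure and its standard error.
    return estimates.mean(), estimates.std(ddof=1)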


Communications in Statistics - Simulation and Computation | 2014

Bootstrap Variability Studies in ROC Analysis on Large Datasets

Jin Chu Wu; Alvin F. Martin; Raghu N. Kacker

The nonparametric two-sample bootstrap is employed to compute uncertainties of measures in receiver operating characteristic (ROC) analysis on large datasets in areas such as biometrics. In this framework, the bootstrap variability was studied empirically and exhaustively, without a normality assumption, in five scenarios involving both high- and low-accuracy matching algorithms. With a tolerance of 0.02 on the coefficient of variation, it was found that 2,000 bootstrap replications are appropriate for ROC analysis on large datasets in order to reduce the bootstrap variance and ensure the accuracy of the computation.
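
As a rough illustration of how a replication count might be checked (not the study's code; function names are hypothetical), the sketch below repeats the bootstrap SE estimate several times and reports the coefficient of variation of those estimates, which can then be compared against a tolerance such as 0.02.

import numpy as np

def bootstrap_se(metric, genuine, impostor, n_boot, rng):
    # Standard error of metric(genuine, impostor) from one bootstrap run of n_boot replications.
    values = np.empty(n_boot)
    for b in range(n_boot):
        g = rng.choice(genuine, size=genuine.size, replace=True)
        i = rng.choice(impostor, size=impostor.size, replace=True)
        values[b] = metric(g, i)
    return values.std(ddof=1)

def cv_of_bootstrap_se(metric, genuine, impostor, n_boot=2000, n_repeats=50, seed=0):
    # Coefficient of variation of the SE estimate across independent bootstrap runs;
    # a value at or below the chosen tolerance (e.g., 0.02) suggests n_boot is adequate.
    rng = np.random.default_rng(seed)
    ses = np.array([bootstrap_se(metric, genuine, impostor, n_boot, rng)
                    for _ in range(n_repeats)])
    return ses.std(ddof=1) / ses.mean()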


Proceedings of SPIE | 2013

Significance test with data dependency in speaker recognition evaluation

Jin Chu Wu; Alvin F. Martin; Craig S. Greenberg; Raghu N. Kacker; Vincent M. Stanford

To evaluate the performance of speaker recognition systems, a detection cost function, defined as a weighted sum of the probabilities of type I and type II errors, is employed. The speaker datasets may have data dependency due to multiple uses of the same subjects. Using the standard errors of the detection cost function computed by means of the two-layer nonparametric two-sample bootstrap method, a significance test is performed to determine whether the difference between the measured performance levels of two speaker recognition algorithms is statistically significant. While conducting the significance test, the correlation coefficient between the two systems' detection cost functions is taken into account. Examples are provided.
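
As a reminder of the weighted-sum structure referred to above, a detection cost function of this kind is commonly written as follows (the cost weights and the target prior vary by evaluation, so the symbols here are generic rather than this paper's exact parameters):

C_{\mathrm{Det}} = C_{\mathrm{Miss}}\, P_{\mathrm{Miss}\mid\mathrm{Target}}\, P_{\mathrm{Target}} + C_{\mathrm{FA}}\, P_{\mathrm{FA}\mid\mathrm{NonTarget}}\, (1 - P_{\mathrm{Target}})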


Proceedings of SPIE | 2010

Significance test in operational ROC analysis

Jin Chu Wu; Alvin F. Martin; Raghu N. Kacker; Charles Hagwood

To evaluate the performance of fingerprint-image matching algorithms on large datasets, a receiver operating characteristic (ROC) curve is applied. From the operational perspective, the true accept rate (TAR) of the genuine scores at a specified false accept rate (FAR) of the impostor scores and/or the equal error rate (EER) are often employed. Using the standard errors of these metrics, computed with the nonparametric two-sample bootstrap based on our studies of bootstrap variability on large fingerprint datasets, the significance test determines whether the difference between the performance of one algorithm and a hypothesized value, or the difference between the performances of two algorithms with their correlation taken into account, is statistically significant. If the alternative hypothesis is accepted, the sign of the difference indicates which algorithm performs better. Examples are provided.
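
A minimal sketch of the two-algorithm significance test described above, assuming the measures, their bootstrap standard errors, and the correlation coefficient have already been estimated. The normal-approximation form and the function name are illustrative, not necessarily the paper's exact procedure.

import math
from scipy.stats import norm

def compare_algorithms(theta1, se1, theta2, se2, rho, alpha=0.05):
    # Two-sided z-test on the difference of two correlated performance measures.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * rho * se1 * se2)
    z = (theta1 - theta2) / se_diff
    p_value = 2.0 * (1.0 - norm.cdf(abs(z)))
    significant = p_value < alpha
    # If significant, the sign of (theta1 - theta2) indicates which algorithm performs better.
    return z, p_value, significant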


Communications in Statistics - Simulation and Computation | 2016

Validation of Nonparametric Two-sample Bootstrap in ROC Analysis on Large Datasets

Jin Chu Wu; Alvin F. Martin; Raghu N. Kacker

The nonparametric two-sample bootstrap is applied to compute uncertainties of measures in receiver operating characteristic (ROC) analysis on large datasets in areas such as biometrics and speaker recognition when the analytical method cannot be used. Its validation was studied by computing the standard errors of the area under the ROC curve both with the well-established analytical Mann–Whitney statistic method and with the bootstrap. The analytical result is unique, whereas the bootstrap results form a probability distribution owing to the method's stochastic nature. The comparisons were carried out using relative errors and hypothesis testing, and the two approaches match very well. This validation provides a sound foundation for such computations.
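
As an illustration of this kind of comparison, the sketch below contrasts a bootstrap SE of the AUC with an analytical approximation. Note that it uses the Hanley-McNeil formula rather than the paper's exact Mann-Whitney variance, so it is only a stand-in under that substitution; function names are hypothetical.

import numpy as np

def auc_mann_whitney(genuine, impostor):
    # AUC as the Mann-Whitney statistic: P(genuine score > impostor score), ties counted as 1/2.
    diffs = genuine[:, None] - impostor[None, :]  # O(n*m) memory; acceptable for a sketch
    return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / (genuine.size * impostor.size)

def auc_se_hanley_mcneil(auc, n_genuine, n_impostor):
    # Hanley-McNeil (1982) approximation to the standard error of the AUC.
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_genuine - 1.0) * (q1 - auc ** 2)
           + (n_impostor - 1.0) * (q2 - auc ** 2)) / (n_genuine * n_impostor)
    return np.sqrt(var)

def auc_se_bootstrap(genuine, impostor, n_boot=2000, seed=0):
    # Nonparametric two-sample bootstrap SE of the AUC.
    rng = np.random.default_rng(seed)
    aucs = [auc_mann_whitney(rng.choice(genuine, size=genuine.size, replace=True),
                             rng.choice(impostor, size=impostor.size, replace=True))
            for _ in range(n_boot)]
    return float(np.std(aucs, ddof=1))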


Proceedings of SPIE | 2012

Data Dependency on Measurement Uncertainties in Speaker Recognition Evaluation

Jin Chu Wu; Alvin F. Martin; Craig S. Greenberg; Raghu N. Kacker

The National Institute of Standards and Technology conducts an ongoing series of Speaker Recognition Evaluations (SRE). Speaker detection performance is measured using a detection cost function defined as a weighted sum of the probabilities of type I and type II errors, and sampling variability can result in measurement uncertainties. In our prior study, data independence was assumed when using the nonparametric two-sample bootstrap method to compute the standard errors (SE) of the detection cost function, based on our extensive bootstrap variability studies in ROC analysis on large datasets. In this article, the data dependency caused by multiple uses of the same subjects is taken into account. The data are grouped into target sets and non-target sets, each containing multiple scores. One-layer and two-layer bootstrap methods are proposed, depending on whether the two-sample bootstrap resampling takes place only on target and non-target sets, or subsequently also on target and non-target scores within the sets. The SEs of the detection cost function obtained with these two methods are compared with those obtained under the assumption of data independence. It is found that data dependency increases both the estimated SEs and the variation of the SEs. Some suggestions regarding the test design are provided.
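
The one-layer and two-layer resampling schemes described above can be sketched roughly as follows, assuming each target or non-target set is a 1-D NumPy array of scores from one subject. The data layout and function names are hypothetical, not the paper's code.

import numpy as np

def one_layer_resample(sets, rng):
    # Resample whole sets with replacement; each chosen set keeps its scores intact.
    idx = rng.integers(0, len(sets), size=len(sets))
    return np.concatenate([sets[i] for i in idx])

def two_layer_resample(sets, rng):
    # Resample sets with replacement, then resample scores within each chosen set.
    idx = rng.integers(0, len(sets), size=len(sets))
    return np.concatenate([rng.choice(sets[i], size=sets[i].size, replace=True) for i in idx])

def bootstrap_se_dcf(target_sets, nontarget_sets, dcf, resample, n_boot=2000, seed=0):
    # SE of a detection cost function dcf(target_scores, nontarget_scores)
    # under the chosen resampling scheme (one_layer_resample or two_layer_resample).
    rng = np.random.default_rng(seed)
    values = [dcf(resample(target_sets, rng), resample(nontarget_sets, rng))
              for _ in range(n_boot)]
    return float(np.std(values, ddof=1))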


IEEE Transactions on Audio, Speech, and Language Processing | 2017

The Impact of Data Dependence on Speaker Recognition Evaluation

Jin Chu Wu; Alvin F. Martin; Craig S. Greenberg; Raghu N. Kacker

The data dependence due to multiple uses of the same subjects has an impact on the standard error (SE) of the detection cost function (DCF) in speaker recognition evaluation. The DCF is defined as a weighted sum of the probabilities of type I and type II errors at a given threshold. A two-layer data structure is constructed: target scores are grouped into target sets based on the dependence, and likewise for non-target scores. To ensure that all scores have equal probability of being selected when resampling, the target sets must contain the same number of target scores, and so must the non-target sets. In addition to the bootstrap method with the i.i.d. assumption, the nonparametric two-sample one-layer and two-layer bootstrap methods are carried out, depending on whether the resampling takes place only on sets or subsequently also on scores within the sets. Due to the stochastic nature of the bootstrap, the distributions of the SEs of the DCF estimated using the three different bootstrap methods are constructed and compared. Hypothesis testing shows that data dependence increases not only the SE but also the variation of the SE, and that the two-layer bootstrap is more conservative than the one-layer bootstrap. The rationale for the different impacts of the three bootstrap methods on the estimated SEs is investigated.


Proceedings of SPIE | 2011

Uncertainties of Measures in Speaker Recognition Evaluation

Jin Chu Wu; Alvin F. Martin; Craig S. Greenberg; Raghu N. Kacker

The Speaker Recognition Evaluations (SRE) are an ongoing series of projects conducted by the National Institute of Standards and Technology (NIST). In the NIST SRE, speaker detection performance is measured using a detection cost function, defined as a weighted sum of the probabilities of type I and type II errors. Sampling variability results in measurement uncertainties of the detection cost function; hence, when evaluating and comparing speaker recognition systems, these uncertainties must be taken into account. In this article, the uncertainties of detection cost functions, in terms of standard errors (SE) and confidence intervals, are computed using nonparametric two-sample bootstrap methods based on our extensive prior studies of bootstrap variability on large datasets. Data independence is assumed because, when the area under a receiver operating characteristic curve is used as the metric, the bootstrap SEs match very well with the analytical SEs obtained from the Mann-Whitney statistic for independent and identically distributed samples. Examples are provided.
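
A minimal sketch of a bootstrap SE and 95% confidence interval for a detection cost function at a fixed threshold, under the independence assumption mentioned above. The cost weights, target prior, and function names are illustrative placeholders, not this evaluation's exact parameters.

import numpy as np

def dcf_at_threshold(target, nontarget, threshold, c_miss=10.0, c_fa=1.0, p_target=0.01):
    # Weighted sum of the miss and false-alarm probabilities at the given threshold.
    p_miss = np.mean(target < threshold)      # target scores falsely rejected
    p_fa = np.mean(nontarget >= threshold)    # non-target scores falsely accepted
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def bootstrap_dcf_se_ci(target, nontarget, threshold, n_boot=2000, seed=0):
    # Nonparametric two-sample bootstrap SE and 95% percentile confidence interval of the DCF.
    rng = np.random.default_rng(seed)
    values = np.array([dcf_at_threshold(rng.choice(target, size=target.size, replace=True),
                                        rng.choice(nontarget, size=nontarget.size, replace=True),
                                        threshold)
                       for _ in range(n_boot)])
    return values.std(ddof=1), np.percentile(values, [2.5, 97.5])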


BMC Bioinformatics | 2017

A novel measure and significance testing in data analysis of cell image segmentation

Jin Chu Wu; Michael Halter; Raghu N. Kacker; John T. Elliott; Anne L. Plant

Background: Cell image segmentation (CIS) is an essential part of quantitative imaging of biological cells. Designing a performance measure and conducting significance testing are critical for evaluating and comparing CIS algorithms for image-based cell assays in cytometry. Many measures and methods have been proposed and implemented to evaluate segmentation methods. However, computing the standard errors (SE) of the measures and their correlation coefficient has not been described, and thus the statistical significance of performance differences between CIS algorithms cannot be assessed.

Results: We propose the total error rate (TER), a novel performance measure for segmenting all cells in supervised evaluation. The TER statistically aggregates all misclassification error rates (MER), one for segmenting each single cell in the population, with cell sizes as weights. The TER is fully supported by pairwise comparisons of MERs using 106 manually segmented ground-truth cells of different sizes and seven CIS algorithms taken from ImageJ. Further, the SE and 95% confidence interval (CI) of the TER are computed based on the SE of the MER, which is calculated using the bootstrap method. An algorithm for computing the correlation coefficient of TERs between two CIS algorithms is also provided. Hence, the 95% CI error bars can be used to classify CIS algorithms, and the SEs of TERs and their correlation coefficient can be employed to conduct hypothesis testing, when the CIs overlap, to determine the statistical significance of performance differences between CIS algorithms.

Conclusions: A novel measure of CIS, the TER, is proposed, and its SEs and correlation coefficient are computed. CIS algorithms can then be evaluated and compared statistically by conducting significance testing.
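
A minimal sketch of the cell-size-weighted aggregation described for the TER, assuming the per-cell misclassification error rates and the corresponding cell sizes are already available. Variable and function names are hypothetical.

import numpy as np

def total_error_rate(mer_per_cell, cell_sizes):
    # TER: per-cell misclassification error rates aggregated with cell sizes as weights.
    mer = np.asarray(mer_per_cell, dtype=float)
    weights = np.asarray(cell_sizes, dtype=float)
    return float(np.sum(weights * mer) / np.sum(weights))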


NIST Interagency/Internal Report (NISTIR) - 7730 | 2010

Further Studies of Bootstrap Variability for ROC Analysis on Large Datasets

Jin Chu Wu; Alvin F. Martin; Raghu N. Kacker

Collaboration


Dive into Jin Chu Wu's collaboration.

Top Co-Authors

Raghu N. Kacker, National Institute of Standards and Technology
Alvin F. Martin, National Institute of Standards and Technology
Craig S. Greenberg, National Institute of Standards and Technology
John T. Elliott, National Institute of Standards and Technology
Michael Halter, National Institute of Standards and Technology
Anne L. Plant, National Institute of Standards and Technology
Vincent M. Stanford, National Institute of Standards and Technology
Charles Hagwood, National Institute of Standards and Technology
Gregory A. Sanders, National Institute of Standards and Technology