Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Clayton Scott is active.

Publication


Featured research published by Clayton Scott.


IEEE Transactions on Information Theory | 2006

Minimax-optimal classification with dyadic decision trees

Clayton Scott; Robert D. Nowak

Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper, it is shown that a new family of decision trees, dyadic decision trees (DDTs), attain nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: they automatically 1) adapt to favorable conditions near the Bayes decision boundary; 2) focus on data distributed on lower dimensional manifolds; and 3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs comprise the first classifiers known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
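
The construction lends itself to a compact sketch: grow a tree by splitting cells of [0,1]^d at dyadic midpoints, cycling through coordinates, then prune bottom-up against a penalized empirical risk. The Python sketch below uses a constant per-leaf penalty `lam` for illustration; the paper's data-dependent penalty and its exact near-linear-time optimization are not reproduced here.

```python
import numpy as np

def fit_ddt(X, y, max_depth=8, lam=0.01):
    """Dyadic decision tree via penalized empirical risk minimization.
    Minimal sketch: constant per-leaf penalty `lam` and features scaled
    to [0,1]^d; the paper uses a data-dependent penalty instead."""
    n, d = X.shape

    def grow(idx, lo, hi, depth):
        if len(idx) == 0:
            return {'label': 0}, lam          # empty cell: arbitrary leaf
        labels, counts = np.unique(y[idx], return_counts=True)
        leaf = {'label': labels[np.argmax(counts)]}
        # Leaf cost: misclassified fraction of ALL n points, plus penalty.
        leaf_cost = (len(idx) - counts.max()) / n + lam
        if depth == max_depth:
            return leaf, leaf_cost
        dim = depth % d                       # cycle through coordinates
        mid = 0.5 * (lo[dim] + hi[dim])       # dyadic midpoint split
        go_left = X[idx, dim] < mid
        hi_l, lo_r = hi.copy(), lo.copy()
        hi_l[dim], lo_r[dim] = mid, mid
        left, lcost = grow(idx[go_left], lo, hi_l, depth + 1)
        right, rcost = grow(idx[~go_left], lo_r, hi, depth + 1)
        # Bottom-up pruning: keep the split only if it lowers penalized risk.
        if lcost + rcost < leaf_cost:
            return {'dim': dim, 'mid': mid, 'left': left, 'right': right}, lcost + rcost
        return leaf, leaf_cost

    tree, _ = grow(np.arange(n), np.zeros(d), np.ones(d), 0)
    return tree

def predict_one(tree, x):
    while 'label' not in tree:
        tree = tree['left'] if x[tree['dim']] < tree['mid'] else tree['right']
    return tree['label']
```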


International Conference on Computer Communications | 2008

Distributed Spatial Anomaly Detection

Parminder Chhabra; Clayton Scott; Eric D. Kolaczyk; Mark Crovella

Detection of traffic anomalies is an important problem that has been the focus of considerable research. Recent work has shown the utility of spatial detection of anomalies via crosslink traffic comparisons. In this paper we identify three advances that are needed to make such methods more useful and practical for network operators. First, anomaly detection methods should avoid global communication and centralized decision making. Second, nonparametric anomaly detection methods are needed to augment current parametric approaches. Finally, such methods should not just identify possible anomalies, but should also annotate each detection with a probabilistic qualifier of its importance. We propose a framework that simultaneously advances the current state of the art on all three fronts. First, we show that routers can effectively identify volume anomalies through crosslink comparison of traffic observed only on the router's own links. Second, we show that generalized quantile estimators are an effective way to identify high-dimensional sets of local traffic patterns that are potentially anomalous; such methods can be either parametric or nonparametric, and we evaluate both. Third, through the use of the false discovery rate as a detection metric, we show that candidate anomalous patterns can be equipped with an estimate of the probability that they truly are anomalous. Overall, our framework provides network operators with an anomaly detection methodology that is distributed, effective, and easily interpretable. Part of the underlying statistical framework, which merges aspects of nonparametric set estimation and multiple hypothesis testing, is novel in itself, although its derivation is necessarily given elsewhere.
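
The false-discovery-rate component can be illustrated with the standard Benjamini-Hochberg step-up procedure, which controls the expected fraction of false detections among the flagged patterns. This is a generic sketch, not the paper's full distributed framework; the p-values are assumed to come from whatever local traffic test a router runs.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Flag detections with false discovery rate controlled at level q.
    Standard BH step-up procedure; the paper couples this style of
    multiple-testing control with nonparametric set estimation."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m   # step-up thresholds
    detect = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()                # largest passing rank
        detect[order[:k + 1]] = True
    return detect

# Hypothetical p-values from per-router local traffic tests.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.7], q=0.05))
```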


Computational Statistics & Data Analysis | 2012

EM algorithms for multivariate Gaussian mixture models with truncated and censored data

Gyemin Lee; Clayton Scott

We present expectation-maximization (EM) algorithms for fitting multivariate Gaussian mixture models to data that are truncated, censored or truncated and censored. These two types of incomplete measurements are naturally handled together through their relation to the multivariate truncated Gaussian distribution. We illustrate our algorithms on synthetic and flow cytometry data.
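
A univariate sketch conveys how censoring enters the E-step: a right-censored observation (known only to satisfy x ≥ c) contributes a tail probability to the responsibilities and truncated-normal moments to the M-step sufficient statistics. Everything here (right-censoring only, one dimension, a single known threshold c) is a simplification of the paper's multivariate truncated-and-censored algorithm.

```python
import numpy as np
from scipy.stats import norm

def em_censored_gmm(x, censored, c, K=2, iters=100):
    """EM for a 1-D K-component Gaussian mixture where censored[i] marks
    observations right-censored at threshold c. Univariate sketch of the
    paper's multivariate algorithm."""
    n = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, np.linspace(0.25, 0.75, K))
    sd = np.full(K, x.std() + 1e-6)
    for _ in range(iters):
        lik = np.empty((n, K))
        m1 = np.empty((n, K))   # E[X  | data, component]
        m2 = np.empty((n, K))   # E[X^2| data, component]
        for k in range(K):
            a = (c - mu[k]) / sd[k]
            tail = norm.sf(a)                              # P(X >= c)
            lam = norm.pdf(a) / np.maximum(tail, 1e-300)   # standard normal hazard
            # Censored points contribute the tail probability to the likelihood
            # and truncated-normal moments to the sufficient statistics.
            lik[:, k] = np.where(censored, tail, norm.pdf(x, mu[k], sd[k]))
            tm = mu[k] + sd[k] * lam                       # E[X | X >= c]
            tv = sd[k] ** 2 * (1 + a * lam - lam ** 2)     # Var[X | X >= c]
            m1[:, k] = np.where(censored, tm, x)
            m2[:, k] = np.where(censored, tv + tm ** 2, x ** 2)
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)                  # responsibilities
        nk = r.sum(axis=0)                                 # M-step: moment matching
        pi = nk / n
        mu = (r * m1).sum(axis=0) / nk
        sd = np.sqrt(np.maximum((r * m2).sum(axis=0) / nk - mu ** 2, 1e-12))
    return pi, mu, sd
```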


Journal of the American College of Cardiology | 2010

The Value of Defibrillator Electrograms for Recognition of Clinical Ventricular Tachycardias and for Pace Mapping of Post-Infarction Ventricular Tachycardia

Kentaro Yoshida; Tzu-Yu Liu; Clayton Scott; Alfred O. Hero; Miki Yokokawa; Sanjaya Gupta; Eric Good; Fred Morady; Frank Bogun

OBJECTIVES The purpose of this study was to assess the value of implantable cardioverter-defibrillator (ICD) electrograms (EGMs) in identifying clinically documented ventricular tachycardias (VTs). BACKGROUND Twelve-lead electrocardiograms (ECGs) of spontaneous VT often are not available in patients referred for catheter ablation of post-infarction VT. Many of these patients have ICDs, and the ability of ICD EGMs to identify a specific configuration of VT has not been described. METHODS In 21 consecutive patients referred for catheter ablation of post-infarction VT, 124 VTs (mean cycle length: 393 ± 103 ms) were induced, and ICD EGMs were recorded during VT. Clinical VT had been documented with 12-lead ECGs in 15 of 21 patients. The 12-lead ECGs of the clinical VTs were compared with 64 different inducible VTs (mean cycle length: 390 ± 91 ms) to assess how well the ICD EGMs differentiated the clinical VTs from the other induced VTs. The exit site of 62 VTs (mean cycle length: 408 ± 112 ms) was identified by pace mapping (10 to 12 of 12 matching leads). The spatial resolution of pace mapping to identify a VT exit site was determined for both the 12-lead ECGs and the ICD EGMs using a customized MATLAB program (version 7.5, The MathWorks, Inc., Natick, Massachusetts). RESULTS Analysis of stored EGMs by comparison of receiver-operating characteristic curve cutoff values accurately distinguished the clinical VTs from 98% of the other inducible VTs. The mean spatial resolution of a 12-lead ECG pace map for the VT exit site was 2.9 ± 4.0 cm² (range 0 to 17.5 cm²) compared with 8.9 ± 9.0 cm² (range 0 to 35 cm²) for ICD EGM pace maps. The spatial resolution of pace mapping varied greatly between patients and between VTs. The spatial resolution of ICD EGMs was < 1.0 cm² for ≥ 1 of the target VTs in 12 of 21 patients and 19 of 62 VTs. By visual inspection of the ICD EGMs, 96% of the clinical VTs were accurately differentiated from previously undocumented VTs. CONCLUSIONS Stored ICD EGMs usually are an accurate surrogate for 12-lead ECGs for differentiating clinical VTs from other VTs. Pace mapping based on ICD EGMs has variable resolution but may be useful for identifying a VT exit site.


IEEE Transactions on Information Theory | 2007

Performance Measures for Neyman–Pearson Classification

Clayton Scott

In the Neyman-Pearson (NP) classification paradigm, the goal is to learn a classifier from labeled training data such that the probability of a false negative is minimized while the probability of a false positive is below a user-specified level α ∈ (0,1). This work addresses the question of how to evaluate and compare classifiers in the NP setting. Simply reporting false positives and false negatives leaves some ambiguity about which classifier is best. Unlike conventional classification, however, there is no natural performance measure for NP classification. We cannot reject classifiers whose false positive rate exceeds α since, among other reasons, the false positive rate must be estimated from data and hence is not known with certainty. We propose two families of performance measures for evaluating and comparing classifiers and suggest one criterion in particular for practical use. We then present general learning rules that satisfy performance guarantees with respect to these criteria. As in conventional classification, the notion of uniform convergence plays a central role, and leads to finite sample bounds, oracle inequalities, consistency, and rates of convergence. The proposed performance measures are also applicable to the problem of anomaly prediction.
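
The core difficulty is easy to reproduce: even after calibrating a threshold so the empirical false positive rate is at most α, the true rate may exceed α because it is only estimated. A minimal sketch of quantile calibration and honest reporting on held-out data (not the paper's proposed measures or learning rules):

```python
import numpy as np

def np_threshold(scores_neg, alpha=0.05):
    """Pick a decision threshold so the *empirical* false positive rate on
    held-out negatives is at most alpha. Because FPR is only estimated,
    the true FPR can still exceed alpha -- the ambiguity the paper's
    performance measures are designed to handle."""
    return np.quantile(scores_neg, 1 - alpha)

def np_report(scores_neg, scores_pos, t):
    fpr = np.mean(scores_neg >= t)   # estimated false positive rate
    fnr = np.mean(scores_pos < t)    # estimated false negative (miss) rate
    return fpr, fnr

rng = np.random.default_rng(0)
s_neg = rng.normal(0, 1, 2000)       # synthetic scores of held-out negatives
s_pos = rng.normal(2, 1, 2000)       # synthetic scores of held-out positives
t = np_threshold(s_neg, alpha=0.05)
print(np_report(s_neg, s_pos, t))
```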


International Conference on Acoustics, Speech, and Signal Processing | 2006

Controlling False Alarms With Support Vector Machines

Mark A. Davenport; Richard G. Baraniuk; Clayton Scott

We study the problem of designing support vector classifiers with respect to a Neyman-Pearson criterion. Specifically, given a user-specified level α ∈ (0,1), how can we ensure a false alarm rate no greater than α while minimizing the miss rate? We examine two approaches, one based on shifting the offset of a conventionally trained SVM and the other based on the introduction of class-specific weights. Our contributions include a novel heuristic for improved error estimation and a strategy for efficiently searching the parameter space of the second method. We also provide a characterization of the feasible parameter set of the 2ν-SVM on which the second approach is based. The proposed methods are compared on four benchmark datasets.
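
The offset-shifting approach can be sketched directly: train a conventional SVM, then slide the decision threshold along its score until a held-out estimate of the false alarm rate meets the constraint. The scikit-learn code below is an illustration under that reading; the paper's experiments use its own 2ν-SVM machinery.

```python
import numpy as np
from sklearn.svm import SVC

def svm_with_false_alarm_control(X_tr, y_tr, X_neg_val, alpha=0.05):
    """Train an SVM, then shift its offset so that at most an alpha
    fraction of held-out negatives exceed the new threshold.
    Illustrates the offset-shifting approach only, not the 2nu-SVM."""
    clf = SVC(kernel='rbf', gamma='scale').fit(X_tr, y_tr)
    scores = clf.decision_function(X_neg_val)     # margins of held-out negatives
    t = np.quantile(scores, 1 - alpha)            # shifted offset
    return lambda X: (clf.decision_function(X) >= t).astype(int), clf, t
```

The second approach, class-specific weights, corresponds roughly to sweeping a class-weight parameter (for example, SVC's class_weight) and selecting the setting that meets the false alarm constraint; that sweep is omitted here.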


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Tuning Support Vector Machines for Minimax and Neyman-Pearson Classification

Mark A. Davenport; Richard G. Baraniuk; Clayton Scott

This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2ν-SVM. We then exploit a characterization of the 2ν-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
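
The smoothing idea itself is easy to emulate: evaluate cross-validation error on a parameter grid, smooth the resulting error surface to damp estimation noise, and select the minimizer of the smoothed surface. In the sketch below, a Gaussian filter on a raw (C, γ) grid stands in for the paper's smoothing over the 2ν-SVM parameterization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def smoothed_cv_select(X, y, Cs, gammas, sigma=1.0):
    """Grid cross-validation error, smoothed before selecting parameters.
    Sketch of the idea only; the paper smooths over the 2nu-SVM's
    reparameterized space rather than a raw (C, gamma) grid."""
    err = np.empty((len(Cs), len(gammas)))
    for i, C in enumerate(Cs):
        for j, g in enumerate(gammas):
            acc = cross_val_score(SVC(C=C, gamma=g), X, y, cv=5).mean()
            err[i, j] = 1.0 - acc
    smooth = gaussian_filter(err, sigma=sigma)    # damp CV estimation noise
    i, j = np.unravel_index(np.argmin(smooth), smooth.shape)
    return Cs[i], gammas[j], err, smooth
```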


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

L₂ Kernel Classification

JooSeuk Kim; Clayton Scott

Nonparametric kernel methods are widely used and proven to be successful in many statistical learning problems. Well-known examples include the kernel density estimate (KDE) for density estimation and the support vector machine (SVM) for classification. We propose a kernel classifier that optimizes the L2 or integrated squared error (ISE) of a “difference of densities.” We focus on the Gaussian kernel, although the method applies to other kernels suitable for density estimation. Like an SVM, the classifier is sparse and results from solving a quadratic program. We provide statistical performance guarantees for the proposed L2 kernel classifier in the form of a finite sample oracle inequality and strong consistency in the sense of both ISE and probability of error. A special case of our analysis applies to a previously introduced ISE-based method for kernel density estimation. For dimensionality greater than 15, the basic L2 kernel classifier performs poorly in practice. Thus, we extend the method through the introduction of a natural regularization parameter, which allows it to remain competitive with the SVM in high dimensions. Simulation results for both synthetic and real-world data are presented.
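
The "difference of densities" viewpoint classifies x by the sign of f̂₁(x) − f̂₀(x). The plug-in KDE baseline below conveys that viewpoint only; the paper's classifier instead selects sparse kernel weights by minimizing the integrated squared error through a quadratic program, which is not reproduced here.

```python
import numpy as np
from scipy.stats import gaussian_kde

def dod_classifier(X_pos, X_neg):
    """Plug-in difference-of-densities classifier (KDE baseline).
    The paper's L2 method optimizes kernel weights for ISE instead of
    weighting all training points equally as a KDE does."""
    f_pos = gaussian_kde(X_pos.T)                 # class-conditional KDEs
    f_neg = gaussian_kde(X_neg.T)                 # (rows of X are points)
    return lambda X: (f_pos(X.T) - f_neg(X.T) > 0).astype(int)
```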


Annals of Statistics | 2009

Adaptive Hausdorff Estimation of Density Level Sets

Aarti Singh; Clayton Scott; Robert D. Nowak

Consider the problem of estimating the γ-level set G*_γ = {x : f(x) ≥ γ} of an unknown d-dimensional density function f based on n independent observations X₁, …, Xₙ from the density. This problem has been addressed under global error criteria related to the symmetric set difference. However, in certain applications such as anomaly detection and clustering, a spatially uniform confidence interval is desired to ensure that the estimated set is close to the target set everywhere. The Hausdorff error criterion provides this degree of uniformity and hence is more appropriate in such situations. The minimax optimal rate of Hausdorff error convergence is known to be (n/log n)^{−1/(d+2α)} for level sets with boundaries that have a Lipschitz functional form, where the parameter α characterizes the regularity of the density around the level of interest. However, previously developed estimators are non-adaptive to the density regularity and assume knowledge of α. Moreover, the estimators proposed in previous work achieve the minimax optimal rate for rather restricted classes of sets (for example, boundary fragments and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes level sets with multiple connected components, which are fundamental to many applications. This paper presents a fully data-driven procedure that is adaptive to unknown local density regularity, and achieves minimax optimal Hausdorff error control for a class of level sets with very general shapes and multiple connected components.
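
The Hausdorff criterion itself is simple to compute between finite approximations of two sets: the larger of the two directed distances. The sketch below pairs it with a plug-in KDE level-set estimate on a grid of candidate points; the paper's estimator is histogram-based and adaptive to unknown local regularity, which this baseline is not.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import gaussian_kde

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (rows are points):
    the larger of the two directed (sup-min) distances."""
    D = cdist(A, B)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

def kde_level_set(sample, grid, gamma):
    """Plug-in estimate of {x : f(x) >= gamma} on a grid of candidate
    points. Illustration only; the paper's adaptive histogram-based
    estimator is what carries the minimax guarantees."""
    f = gaussian_kde(sample.T)
    return grid[f(grid.T) >= gamma]
```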


International Conference on Acoustics, Speech, and Signal Processing | 2007

The One Class Support Vector Machine Solution Path

Gyemin Lee; Clayton Scott

This paper applies the algorithm of Hastie et al. (2004) to the problem of learning the entire solution path of the one-class support vector machine (OC-SVM) as its free parameter ν varies from 0 to 1. The OC-SVM with Gaussian kernel is a nonparametric estimator of a level set of the density governing the observed sample, with the parameter ν implicitly defining the corresponding level. Thus, the path algorithm produces estimates of all level sets and can therefore be applied to a variety of problems requiring estimation of multiple level sets, including clustering, outlier ranking, minimum volume set estimation, and density estimation. The algorithm's cost is comparable to the cost of computing the OC-SVM for a single point on the path. We introduce a heuristic for enforcing nestedness of the sets in the path, and present a method for kernel bandwidth selection based on minimum integrated volume, a kind of AUC criterion. These methods are illustrated on three datasets.
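
Without the path algorithm, the same family of sets can be approximated by refitting the OC-SVM on a grid of ν values, as sketched below with scikit-learn. The point of the path algorithm is that it traces the whole continuum for roughly the cost of one such fit, and the nestedness heuristic addresses the fact that independently fitted sets need not be nested.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def ocsvm_level_sets(X, nus=np.linspace(0.05, 0.95, 19), gamma=1.0):
    """Approximate the OC-SVM solution path by refitting on a nu grid.
    Each fit estimates one density level set; the paper's path algorithm
    traces all of them at roughly the cost of a single fit."""
    models = [OneClassSVM(kernel='rbf', gamma=gamma, nu=float(nu)).fit(X)
              for nu in nus]
    # Membership of each training point in each estimated level set;
    # rows need not be nested, which motivates the paper's heuristic.
    inside = np.stack([m.predict(X) == 1 for m in models])  # (len(nus), n)
    return models, inside
```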

Collaboration


Dive into Clayton Scott's collaborations.

Top Co-Authors

Robert D. Nowak
University of Wisconsin-Madison

Gyemin Lee
University of Michigan

Suresh K. Bhavnani
University of Texas Medical Branch

Mark A. Davenport
Georgia Institute of Technology