Corinna Cortes
Publications
Featured research published by Corinna Cortes.
Machine Learning | 1995
Corinna Cortes; Vladimir Vapnik
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. Here we extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
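The two ingredients described here, an implicit non-linear map to a high-dimensional feature space and a soft margin that tolerates non-separable data, are easy to reproduce in miniature. A minimal sketch with scikit-learn's SVC and a polynomial kernel on a small OCR-style dataset (the dataset and hyperparameters are our illustrative choices, not the paper's setup):

```python
# Minimal soft-margin SVM with a polynomial kernel; illustrative only.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # small OCR-style benchmark
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# kernel="poly" implicitly maps inputs to a high-dimensional feature space;
# C controls the soft margin that accommodates non-separable training data.
clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```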
International Conference on Pattern Recognition | 1994
Léon Bottou; Corinna Cortes; John S. Denker; Harris Drucker; Isabelle Guyon; Larry D. Jackel; Yann LeCun; Urs Muller; Eduard Sackinger; Patrice Y. Simard; Vladimir Vapnik
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassification rates less than a given threshold.
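The rejection measurement described above can be computed directly from classifier confidences. A hedged sketch (the function name and interface are ours, not from the paper):

```python
import numpy as np

def rejection_fraction(confidence, correct, max_error=0.01):
    """Fraction of patterns to reject (least confident first) so that the
    misclassification rate on the remaining patterns is at most max_error."""
    order = np.argsort(confidence)            # least confident first
    correct = np.asarray(correct, dtype=bool)[order]
    n = len(correct)
    for k in range(n + 1):                    # try rejecting k patterns
        kept = correct[k:]
        if kept.size == 0 or (1.0 - kept.mean()) <= max_error:
            return k / n
    return 1.0
```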
Neural Computation | 1994
Harris Drucker; Corinna Cortes; Lawrence D. Jackel; Yann LeCun; Vladimir Vapnik
We compare the performance of three types of neural network-based ensemble techniques to that of a single neural network. The ensemble algorithms are two versions of boosting and committees of neural networks trained independently. For each of the four algorithms, we experimentally determine the test and training error curves in an optical character recognition (OCR) problem as a function of both training set size and computational cost, using three architectures. We show that a single machine is best for small training set sizes, while for large training set sizes some version of boosting is best. However, for a given computational cost, boosting is always best. Furthermore, we show a surprising result for the original boosting algorithm: as the training set size increases, the training error decreases until it asymptotes to the test error rate. This has potential implications in the search for better training algorithms.
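The two ensemble styles compared here can be mocked up quickly. In this sketch, a bagged committee of small networks stands in for independently trained committees, and AdaBoost over shallow trees stands in for boosting (boosting full neural networks, as in the paper, is far more costly); all settings are illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Committee: several networks trained independently, predictions combined.
committee = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300), n_estimators=3)
# Boosting: each learner focuses on examples the previous ones got wrong.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                             n_estimators=50)

for name, model in [("committee", committee), ("boosting", boosted)]:
    print(name, cross_val_score(model, X, y, cv=3).mean())
```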
Intelligent Data Analysis | 2001
Corinna Cortes; Daryl Pregibon; Chris Volinsky
We consider problems that can be characterized by large dynamic graphs. Communication networks provide the prototypical example of such problems where nodes in the graph are network IDs and the edges represent communication between pairs of network IDs. In such graphs, nodes and edges appear and disappear through time so that methods that apply to static graphs are not sufficient. We introduce a data structure that captures, in an approximate sense, the graph and its evolution through time. The data structure arises from a bottom-up representation of the large graph as the union of small subgraphs, called Communities of Interest (COI), centered on every node. These subgraphs are interesting in their own right and we discuss two applications in the area of telecommunications fraud detection to help motivate the ideas.
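A toy version of such a data structure, one dict of top-k weighted edges per node with exponential down-weighting of old activity, looks like this (the decay factor, k, and class name are assumptions for illustration, not the paper's parameters):

```python
import heapq
from collections import defaultdict

DECAY = 0.85  # assumed fading factor; stale edges shrink on every update
TOP_K = 9     # assumed size of each node's stored subgraph

class COIStore:
    """Approximate evolving graph kept as the union of small per-node
    subgraphs: each node retains only its top-k heaviest edges."""

    def __init__(self):
        self.edges = defaultdict(dict)  # node -> {neighbor: weight}

    def update(self, a, b, weight=1.0):
        for src, dst in ((a, b), (b, a)):
            nbrs = self.edges[src]
            for n in nbrs:              # old activity fades...
                nbrs[n] *= DECAY
            nbrs[dst] = nbrs.get(dst, 0.0) + (1.0 - DECAY) * weight
            if len(nbrs) > TOP_K:       # ...and only the top-k edges survive
                keep = heapq.nlargest(TOP_K, nbrs.items(),
                                      key=lambda kv: kv[1])
                self.edges[src] = dict(keep)
```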
Knowledge Discovery and Data Mining | 1999
William DuMouchel; Chris Volinsky; Theodore Johnson; Corinna Cortes; Daryl Pregibon
A feature of data mining that distinguishes it from “classical” machine learning (ML) and statistical modeling (SM) is scale. The community seems to agree on this, yet progress to this point has been limited. We present a methodology that addresses scale in a novel fashion that has the potential for revolutionizing the field. While the methodology applies most directly to flat (row by column) data sets, we believe that it can be adapted to other representations. Our approach to the problem is not to scale up individual ML and SM methods. Rather, we prefer to leverage the entire collection of existing methods by scaling down the data set. We call the method squashing. Our method demonstrably outperforms random sampling, and a theoretical argument suggests how and why it works well. Squashing consists of three modular steps: grouping, momentizing, and generating (GMG). These three steps describe the squashing pipeline whereby the original (very large) data set is sectioned off into mutually exclusive groups (or bins); within each group a series of low-order moments are computed; and finally these moments are passed off to a routine that generates pseudo data that accurately reproduce the moments. The result of the GMG squashing pipeline is a squashed data set that has the same structure as the original data, with the addition of a weight for each pseudo data point that reflects the distribution of the original data into the initial groups. Any ML or SM method that accepts weights can be used to analyze the weighted pseudo data. By construction, the resulting analyses will mimic the corresponding analyses on the original data set. Squashing should appeal to many of the sub-disciplines of the data mining community.
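The GMG pipeline is simple to demonstrate on one numeric column. A minimal sketch that matches only the first moment per group (the actual method matches a series of low-order moments and generates several pseudo points per group):

```python
import numpy as np

def squash(x, n_bins=10):
    """Grouping, momentizing, generating, reduced to its simplest form:
    equal-count bins, one weighted pseudo point per bin at the bin mean."""
    x = np.sort(np.asarray(x, dtype=float))
    groups = np.array_split(x, n_bins)                        # grouping
    pseudo = np.array([g.mean() for g in groups if g.size])   # momentizing
    weights = np.array([g.size for g in groups if g.size])    # generating
    return pseudo, weights

# Any weight-accepting method run on the squashed data then mimics the
# analysis of the original, e.g. the weighted mean reproduces the mean:
x = np.random.default_rng(0).lognormal(size=100_000)
p, w = squash(x)
print(x.mean(), np.average(p, weights=w))  # essentially identical
```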
Algorithmic Learning Theory | 2008
Corinna Cortes; Mehryar Mohri; Michael Riley; Afshin Rostamizadeh
This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effect of an error in that estimation on the accuracy of the hypothesis returned by the learning algorithm for two estimation techniques: a cluster-based estimation technique and kernel mean matching. We also report the results of sample bias correction experiments with several data sets using these techniques. Our analysis is based on the novel concept of distributional stability, which generalizes the existing concept of point-based stability. Much of our work and proof techniques can be used to analyze other importance weighting techniques and their effect on accuracy when using a distributionally stable algorithm.
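The reweighting idea itself is compact. The sketch below estimates importance weights with a discriminative density-ratio trick (training a classifier to tell biased from unbiased samples), which is a common stand-in and not one of the two estimation techniques the paper analyzes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_biased, X_unbiased):
    """Estimate w(x) proportional to P_unbiased(x) / P_biased(x) by
    classifying which sample a point came from; the odds ratio then
    approximates the density ratio up to a constant factor."""
    X = np.vstack([X_biased, X_unbiased])
    s = np.r_[np.zeros(len(X_biased)), np.ones(len(X_unbiased))]
    clf = LogisticRegression(max_iter=1000).fit(X, s)
    p = clf.predict_proba(X_biased)[:, 1]
    return p / (1.0 - p)

# The weights rescale each biased training point's loss, e.g. through the
# sample_weight argument that many learning libraries accept.
```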
Data Mining and Knowledge Discovery | 2001
Corinna Cortes; Daryl Pregibon
We have been developing signature-based methods in the telecommunications industry for the past 5 years. In this paper, we describe our work as it evolved due to improvements in technology and our aggressive attitude toward scale. We discuss the types of features that our signatures contain, nuances of how these are updated through time, our treatment of outliers, and the trade-off between time-driven and event-driven processing. We provide a number of examples, all drawn from the application of signatures to toll fraud detection.
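The heart of a signature is the update rule for each feature. A hedged one-liner for the event-driven case (theta is illustrative, not a value from the paper):

```python
def update_signature(old_value, observation, theta=0.9):
    """Event-driven update: fold each new event (e.g. a call's duration)
    into the running summary, exponentially fading older behavior."""
    return theta * old_value + (1.0 - theta) * observation

# Time-driven processing would instead apply the fade once per fixed period
# (say, nightly), after aggregating that period's events; the trade-off is
# timeliness versus processing cost.
```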
Journal of Computational and Graphical Statistics | 2003
Corinna Cortes; Daryl Pregibon; Chris Volinsky
This article considers problems that can be characterized by large dynamic graphs. Communication networks provide the prototypical example of such problems where nodes in the graph are network IDs and the edges represent communication between pairs of network IDs. In such graphs, nodes and edges appear and disappear through time so that methods that apply to static graphs are not sufficient. Our definition of a dynamic graph is procedural. We introduce a data structure and an updating scheme that captures, in an approximate sense, the graph and its evolution through time. The data structure arises from a bottom-up representation of the large graph as the union of small subgraphs centered on every node. These subgraphs are interesting in their own right and can be enhanced to form what we call communities of interest (COI). We discuss an application in the area of telecommunications fraud detection to help motivate the ideas.
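Given the per-node top-k structure sketched earlier, retrieving a community of interest amounts to unioning small subgraphs around a seed node. A sketch (function name, k, and depth are our illustrative choices):

```python
import heapq

def community_of_interest(edges, center, k=9, depth=1):
    """Union the stored top-k subgraphs of `center` and of each node
    reached within `depth` hops; `edges` maps node -> {neighbor: weight}."""
    coi, frontier = {center}, {center}
    for _ in range(depth + 1):
        nxt = set()
        for node in frontier:
            top = heapq.nlargest(k, edges.get(node, {}).items(),
                                 key=lambda kv: kv[1])
            nxt.update(n for n, _ in top)
        frontier = nxt - coi
        coi |= nxt
    return coi
```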
International Conference on Machine Learning | 2007
Corinna Cortes; Mehryar Mohri; Ashish Rastogi
This paper studies the learning problem of ranking when one wishes not just to accurately predict pairwise ordering but also to preserve the magnitude of the preferences or the difference between ratings, a problem motivated by its key importance in the design of search engines, movie recommendation systems, and other ranking systems. We describe and analyze several algorithms for this problem and give stability bounds for their generalization error, extending previously known stability results to non-bipartite ranking and to magnitude-preserving algorithms. We also report the results of experiments comparing these algorithms on several datasets and compare these results with those obtained using an algorithm minimizing the pairwise misranking error and with standard regression.
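The distinction between pairwise misranking and magnitude preservation is easiest to see in the losses themselves. A sketch with one representative squared magnitude-preserving loss (the paper studies a family of such losses; this exact form is illustrative):

```python
import numpy as np

def mp_pairwise_loss(scores, ratings):
    """Penalize the gap between predicted and true rating differences over
    all pairs, so the magnitude of each preference matters."""
    s = np.subtract.outer(scores, scores)    # predicted pairwise differences
    r = np.subtract.outer(ratings, ratings)  # true pairwise differences
    return np.mean((s - r) ** 2)

def misranking_rate(scores, ratings):
    """By contrast, plain pairwise misranking only checks sign agreement."""
    s = np.sign(np.subtract.outer(scores, scores))
    r = np.sign(np.subtract.outer(ratings, ratings))
    return np.mean((s != r) & (r != 0))
```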
International Conference on Machine Learning | 2005
Corinna Cortes; Mehryar Mohri; Jason Weston
Learning a transduction, that is, a string-to-string mapping, is a common problem arising in natural language processing and computational biology. Previous methods proposed for learning such mappings are based on classification techniques. This paper presents a new and general regression technique for learning transductions. Our transduction learning consists of two phases: the estimation of a set of regression coefficients and the computation of the pre-image corresponding to this set of coefficients. A novel and conceptually cleaner formulation of kernel dependency estimation provides a simple framework for estimating the regression coefficients, and an efficient algorithm for computing the pre-image from the regression coefficients extends the applicability of kernel dependency estimation to output sequences. We report the results of a series of experiments illustrating the effectiveness of our regression technique for learning transductions.
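The two phases can be mocked up end to end on toy strings: regress from an n-gram embedding of the input to an n-gram embedding of the output, then recover a string from the predicted embedding. Everything below (data, kernel choice, and the nearest-neighbor pre-image over a candidate list) is an illustrative stand-in for the paper's formulation:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.kernel_ridge import KernelRidge

inputs  = ["abc", "abd", "bcd", "acd"]   # toy string-to-string pairs
outputs = ["xy",  "xz",  "yz",  "xy"]

# Phase 1: estimate regression coefficients from input features to an
# embedding of the output strings (character n-gram counts).
vin  = CountVectorizer(analyzer="char", ngram_range=(1, 2))
vout = CountVectorizer(analyzer="char", ngram_range=(1, 2))
Xi = vin.fit_transform(inputs).toarray()
Yo = vout.fit_transform(outputs).toarray()
model = KernelRidge(kernel="rbf").fit(Xi, Yo)

# Phase 2: compute a pre-image, here by nearest neighbor over candidate
# outputs rather than the paper's efficient pre-image algorithm.
def predict(s, candidates=outputs):
    y_hat = model.predict(vin.transform([s]).toarray())[0]
    dists = [np.linalg.norm(y_hat - vout.transform([c]).toarray()[0])
             for c in candidates]
    return candidates[int(np.argmin(dists))]

print(predict("abd"))
```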