W. Nick Street | Researchain

Publication

Featured research published by W. Nick Street.

knowledge discovery and data mining | 2001

A streaming ensemble algorithm (SEA) for large-scale classification

W. Nick Street; Yong Seog Kim

Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
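The chunk-based scheme described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the base learner is a hypothetical decision stump (`train_stump`), a candidate classifier is scored on the chunk that follows the one it was trained on, and the replacement heuristic is assumed to be plain accuracy on the newest chunk (the paper defines its own quality measure). All names here are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Chunk = List[Tuple[Point, int]]          # (features, 0/1 label) pairs
Classifier = Callable[[Point], int]


def train_stump(chunk: Chunk) -> Classifier:
    """Hypothetical base learner: the single-feature threshold rule with fewest errors."""
    best = None  # (errors, feature, threshold, label predicted above threshold)
    for f in range(len(chunk[0][0])):
        for x, _ in chunk:
            t = x[f]
            for above in (0, 1):
                errors = sum((above if p[f] > t else 1 - above) != y for p, y in chunk)
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda p: above if p[f] > t else 1 - above


def accuracy(clf: Classifier, chunk: Chunk) -> float:
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


class StreamingEnsemble:
    """Fixed-size ensemble fed one chunk at a time (simplified SEA-style update)."""

    def __init__(self, max_size: int = 5):
        self.max_size = max_size
        self.members: List[Classifier] = []
        self.pending: Optional[Classifier] = None  # built on the previous chunk

    def update(self, chunk: Chunk) -> None:
        # The classifier built on the previous chunk is judged on this unseen chunk.
        candidate, self.pending = self.pending, train_stump(chunk)
        if candidate is None:
            return
        if len(self.members) < self.max_size:
            self.members.append(candidate)
            return
        # Assumed heuristic: replace the weakest member if the candidate beats it.
        scores = [accuracy(m, chunk) for m in self.members]
        worst = min(range(len(scores)), key=scores.__getitem__)
        if accuracy(candidate, chunk) > scores[worst]:
            self.members[worst] = candidate

    def predict(self, x: Point) -> int:
        votes = sum(m(x) for m in self.members)  # unweighted majority vote
        return int(2 * votes > len(self.members))
```

Memory stays bounded by `max_size + 1` stored classifiers however long the stream runs, and after a concept shift the stale members score poorly on new chunks and are displaced one by one.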

Operations Research | 1995

Breast Cancer Diagnosis and Prognosis Via Linear Programming

Olvi L. Mangasarian; W. Nick Street; William H. Wolberg

Two medical applications of linear programming are described in this paper. Specifically, linear programming-based machine learning techniques are used to increase the accuracy and objectivity of breast cancer diagnosis and prognosis. The first application to breast cancer diagnosis utilizes characteristics of individual cells, obtained from a minimally invasive fine needle aspirate, to discriminate benign from malignant breast lumps. This allows an accurate diagnosis without the need for a surgical biopsy. The diagnostic system in current operation at University of Wisconsin Hospitals was trained on samples from 569 patients and has had 100% chronological correctness in diagnosing 131 subsequent patients. The second application, recently put into clinical practice, is a method that constructs a surface that predicts when breast cancer is likely to recur in patients who have had their cancers excised. This gives the physician and the patient better information with which to plan treatment, and may eliminate the need for a prognostic surgical procedure. The novel feature of the predictive approach is the ability to handle cases for which cancer has not recurred (censored data) as well as cases for which cancer has recurred at a specific time. The prognostic system has an expected error of 13.9 to 18.3 months, which is better than the prognostic accuracy of other available techniques.
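As background on what "linear programming-based" discrimination means, one well-known formulation is the robust linear program of Bennett and Mangasarian, shown here as an illustration only; the abstract does not state which LP variant the deployed system uses. Rows of $A \in \mathbb{R}^{m \times n}$ are points of one class (say benign), rows of $B \in \mathbb{R}^{k \times n}$ are points of the other, $e$ is a vector of ones, and $y, z$ measure violations of the separating plane $w^{\top}x = \gamma$:

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,z}\quad & \frac{1}{m}\,e^{\top}y + \frac{1}{k}\,e^{\top}z \\
\text{s.t.}\quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0,\quad z \ge 0.
\end{aligned}
```

A new point $x$ is then assigned to the class of $A$ when $w^{\top}x > \gamma$ and to the class of $B$ otherwise. Averaging the violations over each class keeps the larger class from dominating the objective.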

knowledge discovery and data mining | 2000

Feature selection in unsupervised learning via evolutionary search

YeongSeog Kim; W. Nick Street; Filippo Menczer

Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and, possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.
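Two building blocks from this description can be sketched: evaluating one candidate (a feature mask plus a cluster count `k`) by running K-means on the selected features, and keeping only Pareto-nondominated objective vectors. This is a minimal sketch under assumptions: ELSA's evolutionary search is not shown, the deterministic K-means seeding is chosen only for repeatability, and using within-cluster SSE as an objective is a hypothetical example rather than the paper's criteria.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20):
    """Plain Lloyd's K-means; the first k points seed the centers so runs are repeatable."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_subset(data: Sequence[Vector], mask: Sequence[bool], k: int):
    """Evaluate one candidate (feature mask, k): project the data, then cluster it."""
    projected = [tuple(v for v, keep in zip(row, mask) if keep) for row in data]
    labels, centers = kmeans(projected, k)
    sse = sum(sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
              for p, lab in zip(projected, labels))
    return labels, sse


def dominates(a: Vector, b: Vector) -> bool:
    """a dominates b if it is no worse on every objective and better on one (maximizing)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(objectives: List[Vector]) -> List[Vector]:
    """Keep the objective vectors that no other vector dominates."""
    return [c for c in objectives if not any(dominates(o, c) for o in objectives)]
```

Because no single vector is "best" when criteria conflict, the search keeps a whole front of trade-offs, from which a user can pick the feature subset and cluster count.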

decision support systems | 2004

An intelligent system for customer targeting: a data mining approach

Yong Seog Kim; W. Nick Street

We propose a data mining approach for market managers that uses artificial neural networks (ANNs) guided by genetic algorithms (GAs). Our predictive model allows the selection of an optimal target point where expected profit from direct mailing is maximized. Our approach also produces models that are easier to interpret by using a smaller number of predictive features. Through sensitivity analysis, we also show that our chosen model significantly outperforms the baseline algorithms in terms of hit rate and expected net profit on key target points.
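The "optimal target point" idea can be illustrated with a short calculation: rank customers by predicted response probability, accumulate expected profit as more of them are mailed, and stop where the running total peaks. This sketch assumes the probabilities are already given (the paper obtains them from GA-guided neural networks, not shown) and that `revenue` per response and `cost` per mailing are hypothetical constants.

```python
from typing import Sequence, Tuple


def optimal_target_point(probs: Sequence[float], revenue: float, cost: float) -> Tuple[int, float]:
    """Return (n, profit): mailing the n highest-scoring customers maximizes expected profit."""
    ranked = sorted(probs, reverse=True)
    best_n, best_profit, profit = 0, 0.0, 0.0
    for n, p in enumerate(ranked, start=1):
        profit += p * revenue - cost  # expected return of mailing one more customer
        if profit > best_profit:
            best_n, best_profit = n, profit
    return best_n, best_profit


# Probabilities 0.9, 0.5, 0.1, 0.05 with revenue 10 and cost 2 give
# cumulative profits 7, 10, 9, 7.5, so mailing the top 2 is best.
print(optimal_target_point([0.9, 0.1, 0.5, 0.05], revenue=10.0, cost=2.0))
```

With a constant cost per mailing, the running total keeps rising while a customer's probability exceeds `cost / revenue`, so the target point falls where predicted probabilities drop below that ratio.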

Cancer Letters | 1994

Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates

William H. Wolberg; W. Nick Street; Olvi L. Mangasarian

An interactive computer system evaluates and diagnoses based on cytologic features derived directly from a digital scan of fine-needle aspirate (FNA) slides. A consecutive series of 569 patients provided the data to develop the system and an additional 54 consecutive, new patients provided samples to test the system. The projected prospective accuracy of the system estimated by tenfold cross validation was 97%. The actual accuracy on 54 new samples (36 benign, 1 atypia, and 17 malignant) was 100%. Digital image analysis coupled with machine learning techniques will improve diagnostic accuracy of breast fine needle aspirates.
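Tenfold cross validation, used above to project prospective accuracy, can be sketched as follows. The `train` callback and the contiguous fold layout are assumptions for illustration; because the 569 cases were a consecutive series, a real evaluation would normally shuffle (or stratify) the cases before splitting.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[tuple, int]  # (features, label)


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(data: Sequence[Example],
                             train: Callable[[List[Example]], Callable[[tuple], int]],
                             k: int = 10) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool accuracy over all folds."""
    correct = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        model = train([data[i] for i in range(len(data)) if i not in held_out])
        correct += sum(model(data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)
```

For 569 cases, ten folds hold 57 cases each except one with 56, and every case is tested exactly once by a model that never saw it during training.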

international world wide web conferences | 2010

Detecting Wikipedia vandalism with active learning and statistical language models

Si-Chi Chin; W. Nick Street; Padmini Srinivasan; David Eichmann

This paper proposes an active learning approach using language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring, as well as the high visibility of its content, have exposed Wikipedia articles to vandalism. Vandalism is defined as malicious editing intended to compromise the integrity of the content of articles. Extensive manual efforts are being made to combat vandalism and an automated approach to alleviate the laborious process is needed. This paper builds statistical language models, constructing distributions of words from the revision history of Wikipedia articles. As vandalism often involves the use of unexpected words to draw attention, the fitness (or lack thereof) of a new edit when compared with language models built from previous versions may well indicate that an edit is a vandalism instance. In addition, the paper adopts an active learning model to solve the problem of noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain with its revision histories offers a novel context in which to explore the potential of language models in characterizing author intention. As the experimental results presented in the paper demonstrate, these models hold promise for vandalism detection.
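The language-model idea can be sketched with a unigram model built from earlier revisions and a per-word log-probability score for a new edit: words the article's history makes unlikely pull the score down. This is a minimal sketch assuming a unigram model with add-one smoothing and whitespace tokenization; the paper's actual models and the active learning component are not shown.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Model = Tuple[Counter, int]


def build_unigram(revisions: Iterable[str]) -> Model:
    """Word counts pooled over an article's previous revisions."""
    counts = Counter(w for text in revisions for w in text.lower().split())
    return counts, sum(counts.values())


def avg_log_prob(edit: str, model: Model) -> float:
    """Mean log-probability per word under an add-one (Laplace) smoothed unigram model."""
    counts, total = model
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    words = edit.lower().split()
    if not words:
        raise ValueError("edit contains no words")
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words) / len(words)


history = ["the cat sat on the mat", "the cat sat on the red mat"]
model = build_unigram(history)
# A lower score means the edit fits the article's history worse.
print(avg_log_prob("the cat sat", model) > avg_log_prob("lol lol lol", model))  # True
```

Averaging per word keeps long and short edits comparable; the resulting score would be one feature among others for a classifier.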

electronic commerce | 2000

Efficient and Scalable Pareto Optimization by Evolutionary Local Selection Algorithms

Filippo Menczer; Melania Degeratu; W. Nick Street

Local selection is a simple selection scheme in evolutionary computation. Individual fitnesses are accumulated over time and compared to a fixed threshold, rather than to each other, to decide who gets to reproduce. Local selection, coupled with fitness functions stemming from the consumption of finite shared environmental resources, maintains diversity in a way similar to fitness sharing. However, it is more efficient than fitness sharing and lends itself to parallel implementations for distributed tasks. While local selection is not prone to premature convergence, it applies minimal selection pressure to the population. Local selection is, therefore, particularly suited to Pareto optimization or problem classes where diverse solutions must be covered. This paper introduces ELSA, an evolutionary algorithm employing local selection, and outlines three experiments in which ELSA is applied to multiobjective problems: a multimodal graph search problem and two Pareto optimization problems. In all these experiments, ELSA significantly outperforms other well-known evolutionary algorithms. The paper also discusses scalability, parameter dependence, and the potential distributed applications of the algorithm.
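The local selection scheme can be sketched as an energy model: each agent collects energy from a finite shared resource according to its fitness, pays a fixed cost per cycle, reproduces when its energy exceeds a fixed threshold `theta`, and dies when its energy runs out. This is a simplified single-objective sketch under assumptions: the scalar genome, Gaussian mutation, parameter values, and the function name `elsa_local_selection` are hypothetical, and fitness values are assumed non-negative.

```python
import random
from typing import Callable, List, Sequence, Tuple


def elsa_local_selection(
    fitness: Callable[[float], float],
    init_genomes: Sequence[float],
    steps: int = 50,
    theta: float = 1.0,     # fixed reproduction threshold
    cost: float = 0.3,      # energy every agent spends per cycle
    resource: float = 5.0,  # finite shared resource, replenished each cycle
    sigma: float = 0.1,     # mutation step size
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return the final population as (genome, energy) pairs."""
    rng = random.Random(seed)
    population = [(g, theta / 2) for g in init_genomes]
    for _ in range(steps):
        env = resource
        rng.shuffle(population)  # random order for access to the shared resource
        survivors = []
        for genome, energy in population:
            gain = min(fitness(genome), env)  # cannot take more than is left
            env -= gain
            energy += gain - cost
            if energy > theta:
                # Compared with a fixed threshold, not with other agents' fitness.
                child = genome + rng.gauss(0.0, sigma)
                survivors += [(genome, energy / 2), (child, energy / 2)]
            elif energy > 0:
                survivors.append((genome, energy))
            # agents whose energy drops to 0 or below die
        population = survivors
    return population
```

Because the population as a whole can absorb at most `resource` per cycle while paying `cost` per agent, it cannot stay well above roughly `resource / cost` agents; the finite resource, rather than direct competition, regulates crowding.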

conference on recommender systems | 2010

Collaborative filtering via euclidean embedding

Mohammad Khoshneshin; W. Nick Street

Recommendation systems suggest items based on user preferences. Collaborative filtering is a popular approach in which recommendations are based on the rating history of the system. One of the most accurate and scalable collaborative filtering algorithms is matrix factorization, which is based on a latent factor model. We propose a novel Euclidean embedding method as an alternative latent factor model to implement collaborative filtering. In this method, users and items are embedded in a unified Euclidean space where the distance between a user and an item is inversely proportional to the rating. This model is comparable to matrix factorization in terms of both scalability and accuracy while providing several advantages. First, the result of Euclidean embedding is more intuitively understandable for humans, allowing useful visualizations. Second, the neighborhood structure of the unified Euclidean space allows very efficient recommendation queries. Finally, the method facilitates online implementation requirements such as mapping new users or items in an existing model. Our experimental results confirm these advantages and show that collaborative filtering via Euclidean embedding is a promising approach for online recommender systems.
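A Euclidean embedding model can be sketched by letting a rating fall as the user-item distance grows. Here we assume the prediction $\hat r_{ui} = \mu + b_u + b_i - \lVert x_u - y_i \rVert^2$ (global mean, user and item biases, squared distance) trained by stochastic gradient descent; the exact model and training details in the paper may differ, and the class and method names are hypothetical.

```python
import random
from typing import Iterable, List, Sequence, Tuple


class EuclideanEmbeddingCF:
    """Sketch: r_hat = mu + b_u + b_i - ||x_u - y_i||^2, fitted by SGD."""

    def __init__(self, n_users: int, n_items: int, dim: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.x = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.y = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
        self.bu = [0.0] * n_users
        self.bi = [0.0] * n_items
        self.mu = 0.0

    def _dist2(self, u: int, i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(self.x[u], self.y[i]))

    def predict(self, u: int, i: int) -> float:
        return self.mu + self.bu[u] + self.bi[i] - self._dist2(u, i)

    def fit(self, ratings: Sequence[Tuple[int, int, float]], epochs: int = 100,
            lr: float = 0.01, reg: float = 0.01) -> "EuclideanEmbeddingCF":
        self.mu = sum(r for _, _, r in ratings) / len(ratings)
        for _ in range(epochs):
            for u, i, r in ratings:
                e = r - self.predict(u, i)
                self.bu[u] += lr * (e - reg * self.bu[u])
                self.bi[i] += lr * (e - reg * self.bi[i])
                for d in range(len(self.x[u])):
                    diff = self.x[u][d] - self.y[i][d]
                    # e > 0 (rating under-predicted) pulls user and item together;
                    # the factor 2 from the squared distance is folded into lr.
                    self.x[u][d] -= lr * (e * diff + reg * self.x[u][d])
                    self.y[i][d] += lr * (e * diff - reg * self.y[i][d])
        return self

    def recommend(self, u: int, n: int, exclude: Iterable[int] = ()) -> List[int]:
        """Nearest-neighbour query: the n closest items (distance only, biases ignored)."""
        skip = set(exclude)
        candidates = [i for i in range(len(self.y)) if i not in skip]
        return sorted(candidates, key=lambda i: self._dist2(u, i))[:n]
```

Ranking by distance alone turns recommendation into a nearest-neighbour search in a low-dimensional space, which is what makes spatial indexes and 2-D visualizations applicable.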

conference on recommender systems | 2010

Incremental collaborative filtering via evolutionary co-clustering

Mohammad Khoshneshin; W. Nick Street

Collaborative filtering is a popular approach for building recommender systems. Current collaborative filtering algorithms are accurate but also computationally expensive, and so are best in static off-line settings. It is desirable to include the new data in a collaborative filtering model in an online manner, requiring a model that can be incrementally updated efficiently. Incremental collaborative filtering via co-clustering has been shown to be a very scalable approach for this purpose. However, locally optimized co-clustering solutions via current fast iterative algorithms give poor accuracy. We propose an evolutionary co-clustering method that improves predictive performance while maintaining the scalability of co-clustering in the online phase.
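Why co-clustering suits online updates can be seen from the predictor: it uses only a handful of averages (co-cluster, user, user-cluster, item, item-cluster), each of which a new rating updates in constant time. This sketch assumes the widely used co-clustering predictor of George and Merugu as a stand-in; cluster assignments are held fixed, and the evolutionary search that the paper uses to find better assignments is not shown. The class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable


class CoClusterCF:
    """Predict r(u, i) = co-cluster avg + (user avg - user-cluster avg)
    + (item avg - item-cluster avg), with all averages kept as running sums."""

    def __init__(self, user_cluster: Dict[Hashable, int], item_cluster: Dict[Hashable, int]):
        self.uc = user_cluster  # user -> user-cluster id (held fixed in this sketch)
        self.ic = item_cluster  # item -> item-cluster id
        self.stats = defaultdict(lambda: [0.0, 0])  # key -> [rating sum, count]

    def add(self, u, i, r: float) -> None:
        """Fold in one new rating: six running sums updated, O(1) work."""
        g, h = self.uc[u], self.ic[i]
        for key in (("all",), ("user", u), ("item", i), ("ucl", g), ("icl", h), ("co", g, h)):
            s = self.stats[key]
            s[0] += r
            s[1] += 1

    def _avg(self, key, default: float) -> float:
        s = self.stats.get(key)  # .get does not create missing entries
        return s[0] / s[1] if s else default

    def predict(self, u, i) -> float:
        g, h = self.uc[u], self.ic[i]
        mu = self._avg(("all",), 0.0)  # fallback for averages with no data yet
        return (self._avg(("co", g, h), mu)
                + self._avg(("user", u), mu) - self._avg(("ucl", g), mu)
                + self._avg(("item", i), mu) - self._avg(("icl", h), mu))
```

Only the cluster assignments are expensive to compute, which is why their quality (here, the target of the evolutionary search) decides accuracy while the online phase stays cheap.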

Expert Systems With Applications | 2006

Optimal ensemble construction via meta-evolutionary ensembles

Yong Seog Kim; W. Nick Street; Filippo Menczer

In this paper, we propose a meta-evolutionary approach to improve on the performance of individual classifiers. In the proposed system, individual classifiers evolve, competing to correctly classify test points, and are given extra rewards for getting difficult points right. Ensembles consisting of multiple classifiers also compete for member classifiers, and are rewarded based on their predictive performance. In this way we aim to build small-sized optimal ensembles rather than form large-sized ensembles of individually-optimized classifiers. Experimental results on 15 data sets suggest that our algorithms can generate ensembles that are more effective than single classifiers and traditional ensemble methods.
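One simple way to give "extra rewards for getting difficult points right" is resource sharing: each data point carries one unit of reward, split equally among the classifiers that classify it correctly, so a point few classifiers get right pays more. This is an assumed illustration in the spirit of ELSA's shared resources, not the paper's exact fitness function, and `shared_rewards` is a hypothetical name.

```python
from typing import List, Sequence


def shared_rewards(predictions: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """predictions[c][p] is classifier c's label for point p.
    Each point's unit reward is split equally among the classifiers that get it right."""
    rewards = [0.0] * len(predictions)
    for p, y in enumerate(labels):
        correct = [c for c in range(len(predictions)) if predictions[c][p] == y]
        for c in correct:
            rewards[c] += 1.0 / len(correct)
    return rewards


# Point 2 is solved only by classifier 1, so it earns a full unit there.
print(shared_rewards([[1, 0, 0], [1, 0, 1], [0, 0, 0]], [1, 0, 1]))
```

Compared with plain accuracy, sharing favours classifiers that cover points the rest of the population misses, which is the kind of diversity an ensemble benefits from.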
