Stan Matwin | Researchain


Publication

Featured research published by Stan Matwin.

Machine Learning | 1998

Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Miroslav Kubat; Robert C. Holte; Stan Matwin

During a project examining the use of machine learning techniques for oil spill detection, we encountered several essential questions that we believe deserve the attention of the research community. We use our particular case study to illustrate such issues as problem formulation, selection of evaluation measures, and data preparation. We relate these issues to properties of the oil spill application, such as its imbalanced class distribution, that are shown to be common to many applications. Our solutions to these issues are implemented in the Canadian Environmental Hazards Detection System (CEHDS), which is about to undergo field testing.

European Conference on Machine Learning | 1997

Learning When Negative Examples Abound

Miroslav Kubat; Robert C. Holte; Stan Matwin

Existing concept learning systems can fail when the negative examples heavily outnumber the positive examples. The paper discusses one essential trouble brought about by imbalanced training sets and presents a learning algorithm addressing this issue. The experiments (with synthetic and real-world data) focus on 2-class problems with examples described with binary and continuous attributes.

Archive | 2007

Knowledge Discovery in Databases: PKDD 2007

Joost N. Kok; Jacek Koronacki; Ramon López de Mántaras; Stan Matwin; Dunja Mladenic; Andrzej Skowron

Invited Talks.- Learning, Information Extraction and the Web.- Putting Things in Order: On the Fundamental Role of Ranking in Classification and Probability Estimation.- Mining Queries.- Adventures in Personalized Information Access.

Long Papers.- Experiment Databases: Towards an Improved Experimental Methodology in Machine Learning.- Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction.- A Graphical Model for Content Based Image Suggestion and Feature Selection.- Efficient AUC Optimization for Classification.- Finding Transport Proteins in a General Protein Database.- Classification of Web Documents Using a Graph-Based Model and Structural Patterns.- Context-Specific Independence Mixture Modelling for Protein Families.- An Algorithm to Find Overlapping Community Structure in Networks.- Privacy Preserving Market Basket Data Analysis.- Feature Extraction from Sensor Data Streams for Real-Time Human Behaviour Recognition.- Generating Social Network Features for Link-Based Classification.- An Empirical Comparison of Exact Nearest Neighbour Algorithms.- Site-Independent Template-Block Detection.- Statistical Model for Rough Set Approach to Multicriteria Classification.- Classification of Anti-learnable Biological and Synthetic Data.- Improved Algorithms for Univariate Discretization of Continuous Features.- Efficient Weight Learning for Markov Logic Networks.- Classification in Very High Dimensional Problems with Handfuls of Examples.- Domain Adaptation of Conditional Probability Models Via Feature Subsetting.- Learning to Detect Adverse Traffic Events from Noisily Labeled Data.- IKNN: Informative K-Nearest Neighbor Pattern Classification.- Finding Outlying Items in Sets of Partial Rankings.- Speeding Up Feature Subset Selection Through Mutual Information Relevance Filtering.- A Comparison of Two Approaches to Classify with Guaranteed Performance.- Towards Data Mining Without Information on Knowledge Structure.- Relaxation Labeling for Selecting and Exploiting Efficiently Non-local Dependencies in Sequence Labeling.- Bridged Refinement for Transfer Learning.- A Prediction-Based Visual Approach for Cluster Exploration and Cluster Validation by HOV3.

Short Papers.- Flexible Grid-Based Clustering.- Polyp Detection in Endoscopic Video Using SVMs.- A Density-Biased Sampling Technique to Improve Cluster Representativeness.- Expectation Propagation for Rating Players in Sports Competitions.- Efficient Closed Pattern Mining in Strongly Accessible Set Systems (Extended Abstract).- Discovering Emerging Patterns in Spatial Databases: A Multi-relational Approach.- Realistic Synthetic Data for Testing Association Rule Mining Algorithms for Market Basket Databases.- Learning Multi-dimensional Functions: Gas Turbine Engine Modeling.- Constructing High Dimensional Feature Space for Time Series Classification.- A Dynamic Clustering Algorithm for Mobile Objects.- A Method for Multi-relational Classification Using Single and Multi-feature Aggregation Functions.- MINI: Mining Informative Non-redundant Itemsets.- Stream-Based Electricity Load Forecast.- Automatic Hidden Web Database Classification.- Pruning Relations for Substructure Discovery of Multi-relational Databases.- The Most Reliable Subgraph Problem.- Matching Partitions over Time to Reliably Capture Local Clusters in Noisy Domains.- Searching for Better Randomized Response Schemes for Privacy-Preserving Data Mining.- Pre-processing Large Spatial Data Sets with Bayesian Methods.- Tag Recommendations in Folksonomies.- Providing Naive Bayesian Classifier-Based Private Recommendations on Partitioned Data.- Multi-party, Privacy-Preserving Distributed Data Mining Using a Game Theoretic Framework.- Multilevel Conditional Fuzzy C-Means Clustering of XML Documents.- Uncovering Fraud in Direct Marketing Data with a Fraud Auditing Case Builder.- Real Time GPU-Based Fuzzy ART Skin Recognition.- A Cooperative Game Theoretic Approach to Prototype Selection.- Dynamic Bayesian Networks for Real-Time Classification of Seismic Signals.- Robust Visual Mining of Data with Error Information.- An Effective Approach to Enhance Centroid Classifier for Text Categorization.- Automatic Categorization of Human-Coded and Evolved CoreWar Warriors.- Utility-Based Regression.- Multi-label Lazy Associative Classification.- Visual Exploration of Genomic Data.- Association Mining in Large Databases: A Re-examination of Its Measures.- Semantic Text Classification of Emergent Disease Reports.

Canadian Conference on Artificial Intelligence | 2006

Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity

David Nadeau; Peter D. Turney; Stan Matwin

In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system's architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

IEEE Intelligent Systems | 1989

Negoplan: an expert system shell for negotiation support

Stan Matwin; Stan Szpakowicz; Zbig Koperczak; Gregory E. Kersten; Wojtek Michalowski

The authors address a complex, two-party negotiation problem containing the following elements: (1) many negotiation issues that are elements of a negotiating party's position; (2) negotiation goals that can be reduced to unequivocal statements about the problem domain and that represent negotiation issues; (3) a fluid negotiating environment characterized by changing issues and relations between them; and (4) parties negotiating to achieve goals that may change. They describe in some detail the way they logically specify different aspects of negotiation. An application of Negoplan to a labor contract negotiation between the Canadian Paperworkers Union and CIP, Ltd. of Montreal is described.

Systems, Man, and Cybernetics | 1991

Genetic algorithms approach to a negotiation support system

Stan Matwin; Tomasz Szapiro; Karen Zita Haigh

It is argued that negotiation rules can be learned and invented by means of genetic algorithms. The work presented introduces a method, a system design, and a prototype implementation that uses genetic-based machine learning to acquire negotiation rules. The learned rules support a party involved in a two-party bargaining problem with multiple issues. It is assumed that both parties work towards a compromise deal. The method provides a framework in which genetic-based learning is applied repetitively on a changing problem representation. System design proposes a problem representation that is adequate to express bargaining processes and that is at the same time conducive to genetic-based learning. The authors report results of experiments with the prototype implementation. These results indicate that genetically learned rules, when used in real negotiations, yield results that are better than results obtained by humans in the same negotiation. The experiments indicate considerable robustness of genetically learned rules with respect to varying parameters defining the genetic operations on which the system relies in modeling negotiations. In terms of user support, experimental results show that in the bargaining process, a good rule is one that advises conceding in small steps and bringing new issues into the negotiation process.

Artificial Intelligence Review | 2015

A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data

Ahmed Ali Abdalla Esmin; Rodrigo A. Coelho; Stan Matwin

Data clustering is one of the most popular techniques in data mining. It is a process of partitioning an unlabeled dataset into groups, where each group contains objects which are similar to each other with respect to a certain similarity measure and different from those of other groups. Clustering high-dimensional data is the cluster analysis of data which have anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine, bioinformatics, biology, recommendation systems and the clustering of text documents. Many algorithms for large data sets have been proposed in the literature using different techniques. However, conventional algorithms have some shortcomings such as the slowness of their convergence and their sensitivity to initialization values. Particle Swarm Optimization (PSO) is a population-based globalized search algorithm that uses the principles of the social behavior of swarms. PSO produces better results in complicated and multi-peak problems. This paper presents a literature survey on the PSO algorithm and its variants as applied to clustering high-dimensional data. An attempt is made to provide a guide for researchers who are working in the area of PSO and high-dimensional data clustering.

International Conference on Machine Learning | 2008

Discriminative parameter learning for Bayesian networks

Jiang Su; Harry Zhang; Charles X. Ling; Stan Matwin

Bayesian network classifiers have been widely used for classification problems. Given a fixed Bayesian network structure, parameter learning can take two different approaches: generative and discriminative learning. While generative parameter learning is more efficient, discriminative parameter learning is more effective. In this paper, we propose a simple, efficient, and effective discriminative parameter learning method, called Discriminative Frequency Estimate (DFE), which learns parameters by discriminatively computing frequencies from data. Empirical studies show that the DFE algorithm integrates the advantages of both generative and discriminative learning: it performs as well as the state-of-the-art discriminative parameter learning method ELR in accuracy, but is significantly more efficient.

Journal of Network and Computer Applications | 2007

Privacy-Preserving collaborative association rule mining

Justin Zhan; Stan Matwin; LiWu Chang

This paper introduces a new approach to the problem of sharing data among multiple parties without disclosing the data between the parties. Our focus is data sharing among parties involved in a data mining task. We study how to share private or confidential data in the following scenario: multiple parties, each having a private data set, want to collaboratively conduct association rule mining without disclosing their private data to each other or any other parties. To tackle this demanding problem, we develop a secure protocol for multiple parties to conduct the desired computation. The solution is distributed, i.e., there is no central, trusted party having access to all the data. Instead, we define a protocol using homomorphic encryption techniques to exchange the data while keeping it private.

IEEE Intelligent Systems & Their Applications | 1999

Data mining to predict aircraft component replacement

Sylvain Létourneau; Fazel Famili; Stan Matwin

Aircraft sensors generate vast amounts of data, much of which languishes in storage after its initial analysis. The authors have developed an approach for using this data to build models for predicting aircraft component failure. Their approach addresses several key data-mining issues.
