
Publication


Featured research published by Wray L. Buntine.


Journal of Artificial Intelligence Research | 1994

Operations for learning with graphical models

Wray L. Buntine

This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided, including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.
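
One of the two algorithm schemas reviewed here can be made concrete with a small example. The sketch below (an illustration, not code from the paper) runs expectation maximization on a two-component univariate Gaussian mixture; the data, initial values, and variable names are assumptions chosen for brevity.

```python
import numpy as np

# Minimal sketch of the EM schema on a two-component univariate Gaussian
# mixture. Data and names are illustrative assumptions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

pi = 0.5                          # mixing weight of component 1
mu = np.array([-1.0, 1.0])        # component means
var = np.array([1.0, 1.0])        # component variances

for _ in range(50):
    # E-step: responsibility of component 1 for each point (the shared
    # 1/sqrt(2*pi) normalizing constant cancels in the ratio).
    p0 = (1 - pi) * np.exp(-(x - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(var[0])
    p1 = pi * np.exp(-(x - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(var[1])
    r = p1 / (p0 + p1)

    # M-step: re-estimate parameters from the soft assignments.
    pi = r.mean()
    mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                   np.sum(r * x) / np.sum(r)])
    var = np.array([np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r),
                    np.sum(r * (x - mu[1]) ** 2) / np.sum(r)])

print(pi, mu, var)   # should approach 0.4, (-2, 3), (1, 1)
```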


IEEE Transactions on Knowledge and Data Engineering | 1996

A guide to the literature on learning probabilistic networks from data

Wray L. Buntine

This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced, and methods are then reviewed. Methods are discussed for learning the parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The article avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples.
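
As a taste of the parameter-learning methods surveyed, the following sketch estimates the conditional probability table of a single discrete network node from complete data using Dirichlet-smoothed counts. The toy weather data, names, and prior strength are assumptions, not examples from the article.

```python
from collections import Counter

# Sketch of Bayesian parameter learning for one discrete node of a
# network given complete data: estimate P(node | parent) from counts
# with a symmetric Dirichlet prior.
data = [  # (parent_value, node_value) observations
    ("rain", "wet"), ("rain", "wet"), ("rain", "dry"),
    ("sun", "dry"), ("sun", "dry"), ("sun", "wet"),
]
node_states = ["wet", "dry"]
alpha = 1.0  # Dirichlet pseudo-count per state (Laplace smoothing)

counts = Counter(data)
for parent in ("rain", "sun"):
    total = sum(counts[(parent, s)] for s in node_states)
    for s in node_states:
        p = (counts[(parent, s)] + alpha) / (total + alpha * len(node_states))
        print(f"P(node={s} | parent={parent}) = {p:.3f}")
```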


International Conference on Machine Learning | 1988

Machine invention of first order predicates by inverting resolution

Stephen Muggleton; Wray L. Buntine

It has often been noted that the performance of existing learning systems is strongly biased by the vocabulary provided in the problem description language. An ideal system should be capable of overcoming this restriction by defining its own vocabulary. Such a system would be less reliant on the teacher's ingenuity in supplying an appropriate problem representation. For this purpose we present a mechanism for automatically inventing and generalising first-order Horn clause predicates. The method is based on inverting the mechanism of resolution. The approach has its roots in the Duce system for induction of propositional Horn clauses. We have implemented the new mechanism in a system called CIGOL. CIGOL uses incremental induction to augment incomplete clausal theories. A single, uniform knowledge representation allows existing clauses to be used as background knowledge in the construction of new predicates. Given examples of a high-level predicate, CIGOL generates related sub-concepts which it then asks its human teacher to name. Generalisations of predicates are tested by asking questions of the human teacher. CIGOL generates new concepts and generalisations with a preference for simplicity. We illustrate the operation of CIGOL by way of various sessions in which auxiliary predicates are automatically introduced and generalised.


Statistics and Computing | 1992

Learning Classification Trees

Wray L. Buntine

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987), and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.
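
For readers unfamiliar with gain-based splitting, the sketch below scores a candidate binary split with information gain computed from Dirichlet-smoothed class counts. It is a simplified stand-in for the paper's full Bayesian treatment; the function names and smoothing constant are assumptions.

```python
import numpy as np

# Sketch of a gain-style splitting rule with Dirichlet (Laplace) smoothed
# class counts. A simplified stand-in, not the paper's Bayesian algorithm.

def entropy(counts, alpha=1.0):
    # Class distribution smoothed by a symmetric Dirichlet prior.
    p = (counts + alpha) / (counts.sum() + alpha * len(counts))
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    n = parent.sum()
    children = (left.sum() / n) * entropy(left) \
             + (right.sum() / n) * entropy(right)
    return entropy(parent) - children

# Example: 10 positives and 10 negatives split into (8, 2) and (2, 8).
print(information_gain(np.array([10, 10]),
                       np.array([8, 2]), np.array([2, 8])))
```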


Machine Learning | 1992

A Further Comparison of Splitting Rules for Decision-Tree Induction

Wray L. Buntine; Tim Niblett

One approach to learning classification rules from examples is to build decision trees. A review and comparison paper by Mingers (Mingers, 1989) looked at the first stage of tree building, which uses a “splitting rule” to grow trees with a greedy recursive partitioning algorithm. That paper considered a number of different measures and experimentally examined their behavior on four domains. The main conclusion was that a random splitting rule does not significantly decrease classification accuracy. This note suggests an alternative experimental method and presents additional results on further domains. Our results indicate that random splitting leads to increased error. These results are at variance with those presented by Mingers.
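
A toy version of the question at issue, whether random split selection hurts accuracy, can be run in a few lines. This sketch compares a single "stump" split chosen for class separation against one chosen at random; training error stands in for the gain criterion here, and the synthetic data is an assumption, not the paper's experimental method.

```python
import numpy as np

# Toy illustration: does selecting splits at random, rather than by a
# quality measure, hurt accuracy? Compare single-split "stumps" on
# synthetic data where only feature 3 matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = X[:, 3] > 0   # class is determined by feature 3 alone

def stump_error(feature):
    pred = X[:, feature] > 0
    err = np.mean(pred != y)
    return min(err, 1 - err)   # allow either labeling of the two sides

chosen = min(range(10), key=stump_error)   # informed split selection
random_pick = int(rng.integers(10))        # random split selection
print("informed:", chosen, stump_error(chosen))   # finds feature 3, error 0
print("random:  ", random_pick, stump_error(random_pick))
```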


Artificial Intelligence | 1988

Generalized subsumption and its applications to induction and redundancy

Wray L. Buntine

A theoretical framework and algorithms are presented that provide a basis for the study of induction of definite (Horn) clauses. These hinge on a natural extension of θ-subsumption that forms a strong model of generalization. The model allows properties of inductive search spaces to be considered in detail. A useful by-product of the model is a simple but powerful model of redundancy. Both induction and redundancy control are central tasks in a learning system and, more broadly, in a knowledge acquisition system. The results also demonstrate the interaction between induction, redundancy, and change in a system's current knowledge, with subsumption playing a key role.
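
Since θ-subsumption is the notion being extended, a brute-force test of it for function-free clauses may help fix ideas. In this sketch (an illustration, not the paper's generalized framework), clauses are sets of literals and uppercase-initial strings are variables; for function-free clauses it suffices to try substitutions into the terms of the subsumed clause.

```python
from itertools import product

# Brute-force θ-subsumption test for function-free clauses. A clause is a
# set of literals; a literal is a (predicate, arguments) pair.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def subsumes(c, d):
    """True if some substitution θ maps every literal of c into d."""
    vars_c = sorted({a for _, args in c for a in args if is_var(a)})
    terms_d = sorted({a for _, args in d for a in args})
    # Try every mapping of c's variables onto terms occurring in d.
    for binding in product(terms_d, repeat=len(vars_c)):
        theta = dict(zip(vars_c, binding))
        image = {(p, tuple(theta.get(a, a) for a in args)) for p, args in c}
        if image <= d:
            return True
    return False

c = {("parent", ("X", "Y"))}
d = {("parent", ("ann", "bob")), ("parent", ("bob", "cid"))}
print(subsumes(c, d))  # True, via θ = {X/ann, Y/bob}
```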


International Journal of Computer Vision | 2010

Unsupervised Object Discovery: A Comparison

Tinne Tuytelaars; Christoph H. Lampert; Matthew B. Blaschko; Wray L. Buntine

The goal of this paper is to evaluate and compare models and methods for learning to recognize basic entities in images in an unsupervised setting. In other words, we want to discover the objects present in the images by analyzing unlabeled data and searching for re-occurring patterns. We experiment with various baseline methods, methods based on latent variable models, as well as spectral clustering methods. Results are presented and compared on subsets of Caltech256 and MSRC2, data sets that are larger and more challenging and that include more object classes than what has previously been reported in the literature. A rigorous framework for evaluating unsupervised object discovery methods is proposed.
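
One family of methods compared in the paper is spectral clustering. The sketch below clusters toy bag-of-visual-words histograms with scikit-learn's SpectralClustering; the synthetic features and parameter settings are assumptions standing in for real image descriptors.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Illustrative sketch: spectral clustering of image descriptors. Each
# "image" here is a toy bag-of-visual-words histogram; real pipelines
# would cluster features extracted from actual images.
rng = np.random.default_rng(0)
alpha_a = np.full(50, 0.2); alpha_a[0] = 20.0   # group peaked on word 0
alpha_b = np.full(50, 0.2); alpha_b[1] = 20.0   # group peaked on word 1
features = np.vstack([rng.dirichlet(alpha_a, size=30),
                      rng.dirichlet(alpha_b, size=30)])

labels = SpectralClustering(n_clusters=2, affinity="rbf", gamma=10.0,
                            random_state=0).fit_predict(features)
print(labels)   # the two 30-image groups should fall into two clusters
```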


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2013

Improving LDA topic models for microblogs via tweet pooling and automatic labeling

Rishabh Mehrotra; Scott Sanner; Wray L. Buntine; Lexing Xie

Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.
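
The hashtag-pooling scheme is easy to prototype: tweets sharing a hashtag are concatenated into one pseudo-document before standard LDA is run. The sketch below shows the idea with scikit-learn; the toy tweets, the simple tokenization, and the handling of multi-hashtag tweets are simplifying assumptions, not the paper's pipeline.

```python
from collections import defaultdict
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Sketch of hashtag pooling: concatenate tweets sharing a hashtag into one
# pseudo-document, then run standard, unmodified LDA on the pools.
tweets = [
    "great goal tonight #worldcup",
    "what a save #worldcup",
    "new phone looks amazing #tech",
    "battery life is poor #tech",
]

pools = defaultdict(list)
for tweet in tweets:
    for token in tweet.split():
        if token.startswith("#"):
            pools[token].append(tweet)

docs = [" ".join(pool) for pool in pools.values()]  # one doc per hashtag
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.components_.shape)   # (topics, vocabulary terms)
```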


European Conference on Machine Learning | 2002

Variational extensions to EM and multinomial PCA

Wray L. Buntine

Several authors in recent years have proposed discrete analogues to principal component analysis intended to handle discrete or positive-only data, for instance suited to analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation-maximization algorithm, and then presents discrete component finding algorithms in that light. Experiments are conducted on both bigram word data and document bag-of-words data to expose some of the subtleties of this new class of algorithms.
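
One member of this family of discrete component methods, non-negative matrix factorization, can be sketched in a few lines using Lee and Seung's multiplicative updates for the Euclidean loss. This is an assumed illustration of the class of algorithms, not the paper's variational-EM derivation.

```python
import numpy as np

# Minimal sketch: non-negative matrix factorization with Lee-Seung
# multiplicative updates (Euclidean loss) on toy document-word counts.
rng = np.random.default_rng(0)
X = rng.poisson(3.0, size=(100, 40)).astype(float)  # toy doc-word counts

k = 5                       # number of components
W = rng.random((100, k))    # document loadings
H = rng.random((k, 40))     # component-word profiles

for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)

print(np.linalg.norm(X - W @ H))  # reconstruction error after fitting
```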


IEEE Transactions on Neural Networks | 1994

Computing second derivatives in feed-forward networks: a review

Wray L. Buntine; Andreas S. Weigend

The calculation of second derivatives is required by recent training and analysis techniques of connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires on the order of 2h+2 backward/forward-propagation passes, where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). The algorithms apply to arbitrary activation functions, networks, and error functions.
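
Of the three approximations reviewed, numerical differentiation is the simplest to show. The sketch below estimates a Hessian by central finite differences of the gradient of a toy error function; the function and all names are illustrative assumptions, and the cost (one pair of gradient evaluations per parameter) is consistent with the O(|w|^2) figure above.

```python
import numpy as np

# Sketch of the numerical-differentiation approximation: estimate the
# Hessian by central finite differences of the gradient. `grad` is a toy
# stand-in for a network's back-propagated gradient.

def grad(w):
    # Gradient of the toy loss sum(w**2) + w[0]*w[1].
    g = 2 * w
    g[0] += w[1]
    g[1] += w[0]
    return g

def numerical_hessian(w, eps=1e-5):
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        # Column i: how the whole gradient changes as w[i] is perturbed.
        H[:, i] = (grad(w + e) - grad(w - e)) / (2 * eps)
    return H

print(numerical_hessian(np.array([1.0, 2.0, 3.0])))
# Expected: [[2, 1, 0], [1, 2, 0], [0, 0, 2]]
```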

Collaboration


Dive into Wray L. Buntine's collaborations.

Top Co-Authors

Lan Du (Macquarie University)
Dunja Mladenic (Carnegie Mellon University)
Petri Myllymäki (Helsinki Institute for Information Technology)
Sami Perttu (Helsinki Institute for Information Technology)
Kar Wai Lim (Australian National University)