Publication


Featured research published by David Maxwell Chickering.


Machine Learning | 1995

Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

David Heckerman; Dan Geiger; David Maxwell Chickering

We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence when combined with previously made assumptions implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen—a prior network—and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply it to compare several learning approaches.
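A minimal sketch of the kind of Bayesian score the abstract describes. This is the standard BDeu form of the log marginal likelihood (a special case of the BDe metric with uniform prior network), not code from the paper; the dict-based data layout and function names are mine. Because the score is likelihood equivalent, the two-node structures X→Y and Y→X receive identical totals.

```python
from itertools import product
from math import lgamma

def bdeu_local_score(data, child, parents, alpha=1.0):
    """BDeu local score: log marginal likelihood of `child` given `parents`.
    data: list of dicts mapping variable name -> value; alpha: equivalent
    sample size of the Dirichlet prior."""
    states = sorted({row[child] for row in data})
    parent_states = [sorted({row[p] for row in data}) for p in parents]
    q = 1                       # number of parent configurations
    for s in parent_states:
        q *= len(s)
    r = len(states)             # number of child states
    score = 0.0
    for config in product(*parent_states):   # one empty config if no parents
        rows = [row for row in data
                if all(row[p] == v for p, v in zip(parents, config))]
        n_j = len(rows)
        a_j, a_jk = alpha / q, alpha / (q * r)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for k in states:
            n_jk = sum(1 for row in rows if row[child] == k)
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

data = [{'X': 0, 'Y': 0}, {'X': 0, 'Y': 0}, {'X': 1, 'Y': 1}, {'X': 1, 'Y': 0}]
s_xy = bdeu_local_score(data, 'X', []) + bdeu_local_score(data, 'Y', ['X'])
s_yx = bdeu_local_score(data, 'Y', []) + bdeu_local_score(data, 'X', ['Y'])
print(s_xy, s_yx)   # equal: the score does not distinguish equivalent structures
```

The total score of a network decomposes into one local term per node, which is what makes the search procedures discussed below (local search over edge additions, deletions, and reversals) efficient: each move changes only one or two local terms.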


International Conference on Artificial Intelligence and Statistics | 1996

Learning Bayesian Networks is NP-Complete

David Maxwell Chickering

Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodness-of-fit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et al. (1995) introduce a Bayesian metric, called the BDe metric, that computes the relative posterior probability of a network structure given data. In this paper, we show that the search problem of identifying a Bayesian network—among those where each node has at most K parents—that has a relative posterior probability greater than a given constant is NP-complete, when the BDe metric is used.
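The NP-completeness result explains why heuristic search is needed; even without the proof, the size of the search space makes exhaustive search hopeless. The count of labeled DAGs on n nodes is given by Robinson's recurrence, which is not from this paper but is a standard illustration of the problem's scale:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence):
    a(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n,k) * 2^(k(n-k)) * a(n-k)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print([num_dags(n) for n in range(6)])  # [1, 1, 3, 25, 543, 29281]
```

The count grows super-exponentially (over 4 × 10^18 DAGs at n = 10), so scoring every structure is infeasible even before restricting to at most K parents.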


Journal of Machine Learning Research | 2002

Learning equivalence classes of Bayesian-network structures

David Maxwell Chickering

Two Bayesian-network structures are said to be equivalent if the set of distributions that can be represented with one of those structures is identical to the set of distributions that can be represented with the other. Many scoring criteria that are used to learn Bayesian-network structures from data are score equivalent; that is, these criteria do not distinguish among networks that are equivalent. In this paper, we consider using a score equivalent criterion in conjunction with a heuristic search algorithm to perform model selection or model averaging. We argue that it is often appropriate to search among equivalence classes of network structures as opposed to the more common approach of searching among individual Bayesian-network structures. We describe a convenient graphical representation for an equivalence class of structures, and introduce a set of operators that can be applied to that representation by a search algorithm to move among equivalence classes. We show that our equivalence-class operators can be scored locally, and thus share the computational efficiency of traditional operators defined for individual structures. We show experimentally that a greedy model-selection algorithm using our representation yields slightly higher-scoring structures than the traditional approach without any additional time overhead, and we argue that more sophisticated search algorithms are likely to benefit much more.
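The equivalence relation used here has a simple graphical characterization (due to Verma and Pearl): two DAGs are equivalent exactly when they have the same skeleton and the same v-structures. A minimal sketch, assuming DAGs are given as {node: set of parents} dicts (my representation, not the paper's):

```python
def skeleton(dag):
    """Undirected edge set of a DAG given as {child: set(parents)}."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def v_structures(dag):
    """Colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(dag)
    vs = set()
    for c, parents in dag.items():
        ps = sorted(parents)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                if frozenset((ps[i], ps[j])) not in skel:
                    vs.add((ps[i], c, ps[j]))
    return vs

def equivalent(d1, d2):
    """Verma-Pearl test: same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

chain   = {'X': set(), 'Y': {'X'}, 'Z': {'Y'}}        # X -> Y -> Z
reverse = {'Z': set(), 'Y': {'Z'}, 'X': {'Y'}}        # X <- Y <- Z
collide = {'X': set(), 'Y': set(), 'Z': {'X', 'Y'}}   # X -> Z <- Y
print(equivalent(chain, reverse), equivalent(chain, collide))  # True False
```

The paper's contribution is to search directly over these equivalence classes (represented as completed PDAGs) rather than over individual DAGs, with operators that can still be scored locally.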


Journal of Machine Learning Research | 2001

Dependency networks for inference, collaborative filtering, and data visualization

David Heckerman; David Maxwell Chickering; Christopher Meek; Robert L. Rounthwaite; Carl M. Kadie

We describe a graphical model for probabilistic relationships—an alternative to the Bayesian network—called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
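Because the graph may be cyclic, inference in a dependency network proceeds by repeatedly resampling each node from its local conditional in a fixed order (ordered Gibbs sampling). A toy sketch for binary variables; the API and the two-variable example are mine, not the paper's:

```python
import random

def ordered_gibbs(conditionals, order, n_iters=5000, burn_in=500, seed=0):
    """Approximate marginals in a dependency network by ordered Gibbs sampling.
    conditionals: {var: fn(state) -> P(var = 1 | rest of state)}, binary vars."""
    rng = random.Random(seed)
    state = {v: rng.randint(0, 1) for v in order}
    counts = {v: 0 for v in order}
    kept = 0
    for t in range(n_iters):
        for v in order:                       # resample each node in turn
            state[v] = 1 if rng.random() < conditionals[v](state) else 0
        if t >= burn_in:
            kept += 1
            for v in order:
                counts[v] += state[v]
    return {v: counts[v] / kept for v in order}

# A deliberately cyclic network: A's conditional depends on B and vice versa.
conds = {'A': lambda s: 0.9 if s['B'] else 0.1,
         'B': lambda s: 0.9 if s['A'] else 0.1}
print(ordered_gibbs(conds, ['A', 'B']))   # both marginals near 0.5 by symmetry
```

If the learned conditionals are mutually inconsistent (which the learning procedure permits), the sampler still converges to a well-defined stationary distribution determined by the conditionals and the update order.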


Machine Learning | 1997

Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables

David Maxwell Chickering; David Heckerman

We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
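The BIC approximation discussed above is just the maximized log likelihood minus a penalty of (d/2) log N for d free parameters and N samples. A worked example on a problem much simpler than the paper's hidden-variable models (the coin data here are invented for illustration):

```python
from math import log

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion (Schwarz); higher is better here."""
    return log_likelihood - 0.5 * n_params * log(n_samples)

# Choose between a fair-coin model (0 free parameters) and a biased-coin
# model (1 free parameter) for 100 flips with 70 heads.
n, heads = 100, 70
ll_fair = n * log(0.5)
p_hat = heads / n
ll_biased = heads * log(p_hat) + (n - heads) * log(1 - p_hat)
print(bic(ll_fair, 0, n))    # ~ -69.31
print(bic(ll_biased, 1, n))  # ~ -63.39: the extra parameter pays for itself
```

With hidden variables the maximized log likelihood must itself be found by EM, and counting effective parameters is subtle, which is why the paper compares BIC against the Laplace, Diagonal, and Cheeseman–Stutz alternatives.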


European Conference on Information Retrieval | 2008

Here or There

Ben Carterette; Paul N. Bennett; David Maxwell Chickering; Susan T. Dumais

Information retrieval systems have traditionally been evaluated over absolute judgments of relevance: each document is judged for relevance on its own, independent of other documents that may be on topic. We hypothesize that preference judgments of the form “document A is more relevant than document B” are easier for assessors to make than absolute judgments, and provide evidence for our hypothesis through a study with assessors. We then investigate methods to evaluate search engines using preference judgments. Furthermore, we show that by using inferences and clever selection of pairs to judge, we need not compare all pairs of documents in order to apply evaluation methods.
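One reason preference judgments are affordable is the inference the abstract mentions: judged pairs imply unjudged ones by transitivity. A small sketch of that idea (the closure computation and document names are mine, not the paper's exact selection method):

```python
def transitive_closure(prefs):
    """Given judged preferences as (better, worse) pairs, add every pair
    implied by transitivity until a fixed point is reached."""
    closure = set(prefs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Three judgments over four documents imply all six ordered pairs.
judged = {('A', 'B'), ('B', 'C'), ('C', 'D')}
print(sorted(transitive_closure(judged)))
```

With n documents there are n(n-1)/2 pairs, but a consistent assessor judging along a chain needs only n-1 of them; clever pair selection exploits this to keep assessment cost near linear.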


Interfaces | 2003

Targeted advertising on the Web with inventory management

David Maxwell Chickering; David Heckerman

Companies that maintain Web sites can make considerable revenue by running advertisements, and they therefore compete to attract advertisers. The ability to deliver high click-through rates on a site can attract advertisers and, under an appropriate pricing model, can also increase revenue directly. Consequently, companies can benefit from delivery systems that display advertisements selectively to those visitors most likely to click though. To satisfy contractual obligations, however, these systems must simultaneously manage inventory. We developed a delivery system that maximizes click-through rate given inventory-management constraints in the form of advertisement quotas. The system uses predictive segments in conjunction with a linear program to perform the constrained optimization. Using a real Web site (msn.com), we demonstrated the efficacy of the system. We can generalize our system to find revenue-optimal advertisement schedules under a wide variety of pricing models.
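The constrained optimization above pairs a linear objective (expected clicks) with inventory constraints (advertisement quotas). For a toy instance the same problem can be solved by brute force rather than the paper's linear program; the segment names, click probabilities, and quotas below are invented for illustration:

```python
from itertools import product

def best_schedule(visits, probs, quotas):
    """Exhaustive search for the click-maximizing delivery schedule.
    visits: {segment: visitor count}; probs: {segment: {ad: click prob}};
    quotas: {ad: exact impressions to serve}. Returns (schedule, clicks),
    where schedule maps (segment, ad) -> impression count.
    Brute force for tiny instances only; the real system uses an LP."""
    segments, ads = sorted(visits), sorted(quotas)
    cells = [(s, a) for s in segments for a in ads]
    best, best_clicks = None, -1.0
    for counts in product(*(range(visits[s] + 1) for s, a in cells)):
        plan = dict(zip(cells, counts))
        # every visitor sees exactly one ad; every quota is met exactly
        if any(sum(plan[s, a] for a in ads) != visits[s] for s in segments):
            continue
        if any(sum(plan[s, a] for s in segments) != quotas[a] for a in ads):
            continue
        clicks = sum(plan[s, a] * probs[s][a] for s, a in cells)
        if clicks > best_clicks:
            best, best_clicks = plan, clicks
    return best, best_clicks

visits = {'s1': 10, 's2': 10}
probs = {'s1': {'A': 0.10, 'B': 0.02}, 's2': {'A': 0.03, 'B': 0.04}}
quotas = {'A': 10, 'B': 10}
print(best_schedule(visits, probs, quotas))  # targets ad A at segment s1
```

Without the quota on B, every visitor would see A; the constraint forces B's impressions onto the segment where A's advantage is smallest, which is exactly the tradeoff the LP resolves at scale.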


User Modeling and User-adapted Interaction | 2007

Improving command and control speech recognition on mobile devices: using predictive user models for language modeling

Tim Paek; David Maxwell Chickering

Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased relative reduction in error rate by an additional 5%.
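A toy illustration of the kind of predictive user model described, conditioning command probabilities on time of day so a grammar's weights can be adapted (the class, its smoothing, and the example commands are mine, not the paper's models):

```python
from collections import defaultdict

class CommandPrior:
    """P(command | hour of day) with add-one smoothing, suitable for
    re-weighting entries of a fixed C&C grammar."""
    def __init__(self, commands):
        self.commands = list(commands)
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, hour, command):
        """Online update from one observed user command (personalization)."""
        self.counts[hour][command] += 1

    def weights(self, hour):
        c = self.counts[hour]
        total = sum(c.values()) + len(self.commands)  # add-one smoothing
        return {cmd: (c[cmd] + 1) / total for cmd in self.commands}

model = CommandPrior(['call spouse', 'call boss'])
for _ in range(3):
    model.observe(17, 'call spouse')        # end-of-workday habit
print(model.weights(17))  # 'call spouse' weighted 0.8 at 5pm
print(model.weights(9))   # uniform where nothing has been observed
```

The `observe` method is the online-updating step the paper evaluates for personalization: the population-trained prior is gradually overridden by the individual user's own history.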


Electronic Commerce | 2000

Targeted advertising with inventory management

David Maxwell Chickering; David Heckerman

Companies that maintain web sites can make considerable revenue through advertising, and consequently attracting advertisers has become an important and competitive endeavor. A property that can attract advertisers is high clickthrough rates, and therefore companies can benefit from delivery systems that serve advertisements selectively to those visitors most likely to click. In order to satisfy contractual obligations, however, these systems must simultaneously perform inventory management. For example, if a company has agreed to serve a certain number of a particular advertisement, it must do so regardless of how likely it is to be clicked. In this paper, we describe how to use a linear program to identify a schedule, based on known attributes of each visitor, that maximizes the expected number of clicks given all of the inventory-management constraints. We present experimental results using real data that demonstrate that a delivery schedule from our system realizes more clicks than a schedule that was hand constructed.


Artificial Intelligence | 1996

Best-first minimax search

Richard E. Korf; David Maxwell Chickering

We describe a very simple selective search algorithm for two-player games, called best-first minimax. It always expands next the node at the end of the expected line of play, which determines the minimax value of the root. It uses the same information as alpha-beta minimax, and takes roughly the same time per node generation. We present an implementation of the algorithm that reduces its space complexity from exponential to linear in the search depth, but at significant time cost. Our actual implementation saves the subtree generated for one move that is still relevant after the player and opponent move, pruning subtrees below moves not chosen by either player. We also show how to efficiently generate a class of incremental random game trees. On uniform random game trees, best-first minimax outperforms alpha-beta, when both algorithms are given the same amount of computation. On random trees with random branching factors, best-first outperforms alpha-beta for shallow depths, but eventually loses at greater depths. We obtain similar results in the game of Othello. Finally, we present a hybrid best-first extension algorithm that combines alpha-beta with best-first minimax, and performs significantly better than alpha-beta in both domains, even at greater depths. In Othello, it beats alpha-beta in two out of three games.
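The core loop is easy to sketch: find the leaf at the end of the current principal variation, expand it, and let the new static values back up the tree. This is a minimal memory-hungry version (the exponential-space variant), not the paper's linear-space implementation; the `Node` interface is my own:

```python
class Node:
    def __init__(self, static_value, max_to_move, children_fn):
        self.static = static_value      # heuristic value if treated as a leaf
        self.max_to_move = max_to_move
        self.children_fn = children_fn  # lazily generates child Nodes
        self.children = None            # None until expanded

    def value(self):
        """Minimax value over expanded frontier; static value at leaves."""
        if not self.children:
            return self.static
        vals = [c.value() for c in self.children]
        return max(vals) if self.max_to_move else min(vals)

    def principal_leaf(self):
        """Follow the best move at every level to the frontier."""
        if not self.children:
            return self
        best = (max if self.max_to_move else min)(self.children, key=Node.value)
        return best.principal_leaf()

def best_first_minimax(root, expansions):
    """Repeatedly expand the leaf on the current principal variation."""
    for _ in range(expansions):
        leaf = root.principal_leaf()
        leaf.children = leaf.children_fn()
        if not leaf.children:       # terminal position: nothing to expand
            break
    return root.value()

# Tiny example: a max node with two terminal replies valued 3 and 5.
root = Node(0, True, lambda: [Node(3, False, lambda: []),
                              Node(5, False, lambda: [])])
print(best_first_minimax(root, 10))  # 5
```

Note the selective behavior: only nodes on (former) principal variations are ever expanded, which is why the algorithm explores far deeper along promising lines than alpha-beta at the same node budget.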
