Christopher Meek | Researchain

Publication

Featured research published by Christopher Meek.

Journal of Machine Learning Research | 2001

Dependency networks for inference, collaborative filtering, and data visualization

David Heckerman; David Maxwell Chickering; Christopher Meek; Robert L. Rounthwaite; Carl M. Kadie

We describe a graphical model for probabilistic relationships, an alternative to the Bayesian network, called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships.
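
Because the graph may be cyclic, inference cannot simply follow a topological order. One common approach, sketched here under stated assumptions, is an ordered Gibbs sampler: visit the nodes in a fixed order and resample each from its conditional distribution given the current values of its parents. The two-node network, its probabilities, and the function name `ordered_gibbs` are hypothetical.

```python
import random

# Hypothetical cyclic dependency network over two binary variables, A and B.
# Each entry maps the current state to P(node = 1 | parents); A's parent is B and vice versa.
CONDITIONALS = {
    "A": lambda state: 0.8 if state["B"] == 1 else 0.3,
    "B": lambda state: 0.7 if state["A"] == 1 else 0.2,
}


def ordered_gibbs(conditionals, order, n_sweeps, burn_in=500, seed=0):
    """Estimate P(node = 1) for every node with an ordered Gibbs sampler.

    Each sweep visits the nodes in a fixed order and resamples each one from its
    conditional distribution given the current values of its parents.
    """
    rng = random.Random(seed)
    state = {name: rng.randint(0, 1) for name in order}
    ones = {name: 0 for name in order}
    for sweep in range(burn_in + n_sweeps):
        for name in order:
            state[name] = 1 if rng.random() < conditionals[name](state) else 0
        if sweep >= burn_in:  # only count samples taken after burn-in
            for name in order:
                ones[name] += state[name]
    return {name: count / n_sweeps for name, count in ones.items()}


marginals = ordered_gibbs(CONDITIONALS, order=["A", "B"], n_sweeps=20000)
print(marginals)
```

These particular conditionals happen to be consistent with one joint distribution (P(A=1, B=1) = P(A=0, B=0) = 28/75, P(A=1, B=0) = 12/75, P(A=0, B=1) = 7/75), so the estimates should land near P(A=1) = 40/75 ≈ 0.53 and P(B=1) = 35/75 ≈ 0.47. When conditionals are learned separately and are not mutually consistent, the sampler still runs, but what it converges to can depend on the visiting order.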

Knowledge Discovery and Data Mining | 2005

Adversarial learning

Daniel Lowd; Christopher Meek

Many classification tasks, such as spam filtering, intrusion detection, and terrorism detection, are complicated by an adversary who wishes to avoid detection. Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier [2]. In this paper, we introduce the adversarial classifier reverse engineering (ACRE) learning problem, the task of learning sufficient information about a classifier to construct adversarial attacks. We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.
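
A basic primitive behind query-based reverse engineering is locating a classifier's decision boundary using only membership queries. The hypothetical sketch below binary-searches the segment between an instance the hidden linear classifier accepts and one it flags. It is a generic line search, not the paper's full ACRE algorithm, and the classifier, weights, and function names are made up for illustration.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], bool]


def make_linear_classifier(weights: Sequence[float], bias: float) -> Classifier:
    """Hypothetical hidden classifier: True ('flagged') when w . x + bias > 0."""
    def classify(x: Sequence[float]) -> bool:
        return sum(w * xi for w, xi in zip(weights, x)) + bias > 0
    return classify


def find_boundary(classify: Classifier, negative: Sequence[float],
                  positive: Sequence[float], tol: float = 1e-6) -> List[float]:
    """Binary-search the segment from a negative to a positive instance using only
    membership queries; returns a negative-side point within `tol` (as a fraction
    of the segment) of the decision boundary."""
    if classify(negative) or not classify(positive):
        raise ValueError("need one negative and one positive instance")
    lo, hi = 0.0, 1.0  # lo stays on the negative side, hi on the positive side
    while hi - lo > tol:
        mid = (lo + hi) / 2
        point = [n + mid * (p - n) for n, p in zip(negative, positive)]
        if classify(point):
            hi = mid
        else:
            lo = mid
    return [n + lo * (p - n) for n, p in zip(negative, positive)]


hidden = make_linear_classifier([2.0, -1.0], bias=-1.0)
boundary_point = find_boundary(hidden, negative=[0.0, 0.0], positive=[2.0, 0.0])
print(boundary_point)  # close to [0.5, 0.0], where 2*x1 - x2 - 1 = 0
```

Each loop iteration costs one query and halves the interval, so reaching tolerance `tol` takes about log2(1/tol) queries. Returning the negative-side endpoint means the result is a point the classifier still accepts, which is the kind of instance an adversary wants.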

European Conference on Information Retrieval | 2007

Similarity measures for short segments of text

Donald Metzler; Susan T. Dumais; Christopher Meek

Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
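
Purely lexical measures are the simplest family the abstract lists. A minimal sketch of two common ones, Jaccard overlap and cosine similarity on term-frequency vectors, assuming lowercase whitespace tokenization (not necessarily the paper's setup):

```python
import math
from collections import Counter


def tokens(text: str) -> list:
    # Assumed tokenization: lowercase, split on whitespace.
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """Set overlap of the two token sets."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a: str, b: str) -> float:
    """Cosine similarity between term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[term] for term, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(jaccard("cheap flights to paris", "paris flights"))  # 0.5
print(cosine("cheap flights to paris", "paris flights"))
print(jaccard("cheap flights", "low cost airfare"))  # 0.0: related queries, no shared words
```

The last pair shows the data-sparseness problem the abstract describes: two queries with related intent share no terms, so any purely lexical measure scores them as unrelated.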

Knowledge Discovery and Data Mining | 2000

Visualization of navigation patterns on a Web site using model-based clustering

Igor V. Cadez; David Heckerman; Christopher Meek; Padhraic Smyth; Steven D. White

We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request Web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. Our algorithm scales linearly with both number of users and number of clusters, and our implementation easily handles millions of users and thousands of clusters in memory. In the paper, we describe the details of our technology and a tool based on it called WebCANVAS. We illustrate the use of our technology on user-traffic data from msnbc.com.

Archive | 2006

A Bayesian Approach to Causal Discovery

David Heckerman; Christopher Meek; Gregory F. Cooper

We examine the Bayesian approach to the discovery of causal DAG models and compare it to the constraint-based approach. Both approaches rely on the Causal Markov condition, but the two differ significantly in theory and practice. An important difference between the approaches is that the constraint-based approach uses categorical information about conditional-independence constraints in the domain, whereas the Bayesian approach weighs the degree to which such constraints hold. As a result, the Bayesian approach has three distinct advantages over its constraint-based counterpart. One, conclusions derived from the Bayesian approach are not susceptible to incorrect categorical decisions about independence facts that can occur with data sets of finite size. Two, using the Bayesian approach, finer distinctions among model structures—both quantitative and qualitative—can be made. Three, information from several models can be combined to make better inferences and to better account for modeling uncertainty. In addition to describing the general Bayesian approach to causal discovery, we review approximation methods for missing data and hidden variables, and illustrate differences between the Bayesian and constraint-based methods using artificial and real examples.
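
To make "weighs the degree to which such constraints hold" concrete, here is a minimal sketch for two binary variables. It assumes multinomial likelihoods, uniform Dirichlet(1) parameter priors (the K2 choice), and equal prior probability for the two structures. Instead of accepting or rejecting independence, it returns a posterior probability that Y depends on X. The priors, counts, and function names are illustrative assumptions.

```python
import math


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a symmetric Dirichlet(alpha) prior."""
    n, a = sum(counts), alpha * len(counts)
    return (math.lgamma(a) - math.lgamma(a + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))


def prob_dependent(table):
    """Posterior probability of the model X -> Y versus "X and Y independent".

    table[x][y] counts cases with X = x and Y = y (binary X and Y).
    Both structures get prior probability 1/2.
    """
    x_counts = [sum(row) for row in table]
    y_counts = [table[0][y] + table[1][y] for y in range(2)]
    log_indep = log_marginal(x_counts) + log_marginal(y_counts)
    # Under X -> Y, Y has a separate distribution for each value of X.
    log_dep = log_marginal(x_counts) + sum(log_marginal(row) for row in table)
    diff = log_indep - log_dep
    if diff > 0:  # numerically stable logistic of -diff
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


print(prob_dependent([[40, 10], [10, 40]]))  # strong association in the sample
print(prob_dependent([[25, 25], [25, 25]]))  # no association in the sample
```

With these priors the first table gives a probability very close to 1 and the second roughly 0.2: the data lean toward independence without a hard yes/no decision. Uniform Dirichlet(1) priors are not score-equivalent, so X -> Y and Y -> X can score differently in general; BDe-style priors are a common way to avoid that.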

Data Mining and Knowledge Discovery | 2003

Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

Igor V. Cadez; David Heckerman; Christopher Meek; Padhraic Smyth; Steven D. White

We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data; and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
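
A compact sketch of EM for a mixture of first-order Markov models over sequences of category indices. The parameterization (initial-state plus transition probabilities per cluster), random soft initialization, additive smoothing constant, and toy sessions are assumptions; the WebCANVAS implementation itself is not described in the source.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def log_seq_prob(seq, init, trans):
    """Log probability of a state sequence under one first-order Markov chain."""
    lp = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(trans[a][b])
    return lp


def em_markov_mixture(seqs, n_states, k, n_iter=50, smooth=1e-2, seed=0):
    """Fit a k-component mixture of first-order Markov chains with EM.

    Returns mixture weights, per-cluster initial-state and transition
    probabilities, and each sequence's posterior cluster membership.
    """
    rng = random.Random(seed)
    # Random soft assignments break the symmetry between clusters.
    resp = [normalize([rng.random() for _ in range(k)]) for _ in seqs]
    for _ in range(n_iter):
        # M-step: responsibility-weighted counts plus a small pseudo-count.
        weights = normalize([sum(r[c] for r in resp) for c in range(k)])
        inits, transs = [], []
        for c in range(k):
            init = [smooth] * n_states
            trans = [[smooth] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init[seq[0]] += r[c]
                for a, b in zip(seq, seq[1:]):
                    trans[a][b] += r[c]
            inits.append(normalize(init))
            transs.append([normalize(row) for row in trans])
        # E-step: posterior cluster probabilities, computed in log space.
        resp = []
        for seq in seqs:
            logs = [math.log(max(weights[c], 1e-300)) + log_seq_prob(seq, inits[c], transs[c])
                    for c in range(k)]
            top = max(logs)
            resp.append(normalize([math.exp(v - top) for v in logs]))
    return weights, inits, transs, resp


# Toy sessions over two hypothetical page categories (0 and 1).
seqs = [[0, 1, 0, 1, 0, 1]] * 4 + [[0, 0, 0, 0, 0, 0]] * 4
weights, inits, transs, resp = em_markov_mixture(seqs, n_states=2, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(labels)  # the alternating and the repeating sessions land in different clusters
```

In this sketch each EM iteration touches every sequence once per cluster, so its cost grows linearly with the amount of data and with the number of clusters, in line with the scaling the abstract reports.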

Meeting of the Association for Computational Linguistics | 2014

Semantic Parsing for Single-Relation Question Answering

Wen-tau Yih; Xiaodong He; Christopher Meek

We develop a semantic parsing framework based on semantic similarity for open domain question answering (QA). We focus on single-relation questions and decompose each question into an entity mention and a relation pattern. Using convolutional neural network models, we measure the similarity of entity mentions with entities in the knowledge base (KB) and the similarity of relation patterns and relations in the KB. We score relational triples in the KB using these measures and select the top scoring relational triple to answer the question. When evaluated on an open-domain QA task, our method achieves higher precision across different recall points compared to the previous approach, and can improve F1 by 7 points.
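
The triple-scoring step can be sketched independently of the neural models. Given a question already split into an entity mention and a relation pattern, score each KB triple by combining mention-entity and pattern-relation similarity, then return the top one. Here a character-trigram cosine stands in for the paper's convolutional networks, combining the two scores by multiplication is an assumption, and the mini KB is hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Character-trigram counts of a lowercased, '#'-padded string."""
    padded = "#" + text.lower() + "#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def sim(a, b):
    """Stand-in similarity (not the paper's CNN): cosine of trigram count vectors."""
    ta, tb = trigrams(a), trigrams(b)
    dot = sum(count * tb[g] for g, count in ta.items())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def answer(mention, pattern, kb):
    """Return the (entity, relation, object) triple with the highest combined score."""
    return max(kb, key=lambda t: sim(mention, t[0]) * sim(pattern, t[1]))


# Hypothetical mini knowledge base and an already-decomposed question.
KB = [
    ("barack obama", "place of birth", "honolulu"),
    ("barack obama", "spouse", "michelle obama"),
    ("paris", "country", "france"),
]
best = answer(mention="obama", pattern="who is the spouse of x", kb=KB)
print(best[2])  # michelle obama
```

Both Obama triples match the mention equally, so the relation similarity decides between them; the Paris triple shares no trigrams with the mention and scores zero.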

Annals of Statistics | 2006

On the toric algebra of graphical models

Dan Geiger; Christopher Meek; Bernd Sturmfels

We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley-Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.
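
For the simplest decomposable case, the path X - Y - Z on binary variables, the condition reduces to the conditional independence of X and Z given Y. For binary X and Z this is one binomial equation per value of y: p(0,y,0) p(1,y,1) = p(0,y,1) p(1,y,0). A sketch that checks those equations numerically; the factor tables and the second distribution are made up, and the nondecomposable case, where the abstract says such statements no longer suffice, is not illustrated.

```python
from itertools import product


def factor_path(f_xy, f_yz):
    """Joint over binary X, Y, Z proportional to f_xy[x][y] * f_yz[y][z],
    i.e. a distribution that factors according to the graph X - Y - Z."""
    raw = {(x, y, z): f_xy[x][y] * f_yz[y][z] for x, y, z in product(range(2), repeat=3)}
    total = sum(raw.values())
    return {key: value / total for key, value in raw.items()}


def satisfies_binomials(p, tol=1e-12):
    """Check p(0,y,0) p(1,y,1) = p(0,y,1) p(1,y,0) for each y, i.e. X independent of Z given Y."""
    return all(abs(p[0, y, 0] * p[1, y, 1] - p[0, y, 1] * p[1, y, 0]) <= tol
               for y in range(2))


p = factor_path([[1, 2], [3, 4]], [[5, 1], [2, 7]])
print(satisfies_binomials(p))  # True

# Made-up distribution that does not factor: X and Z are coupled within Y = 0.
q = {key: 1 / 8 for key in product(range(2), repeat=3)}
q[0, 0, 0], q[1, 0, 1], q[0, 0, 1], q[1, 0, 0] = 0.25, 0.25, 0.0, 0.0
print(satisfies_binomials(q))  # False
```

Any product of the form f_xy[x][y] * f_yz[y][z] satisfies the equations identically, since both sides expand to the same four factors.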

Empirical Methods in Natural Language Processing | 2015

WikiQA: A Challenge Dataset for Open-Domain Question Answering

Yi Yang; Wen-tau Yih; Christopher Meek

We describe the WikiQA dataset, a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering. Most previous work on answer sentence selection focuses on a dataset created using the TREC-QA data, which includes editor-generated questions and candidate answer sentences selected by matching content words in the question. WikiQA is constructed using a more natural process and is more than an order of magnitude larger than the previous dataset. In addition, the WikiQA dataset includes questions for which there are no correct sentences, enabling researchers to work on answer triggering, a critical component in any QA system. We compare several systems on the task of answer sentence selection on both datasets and also describe the performance of a system on the problem of answer triggering using the WikiQA dataset.
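
Answer sentence selection ranks the candidates; answer triggering additionally decides whether any candidate answers the question at all. A minimal sketch with a word-overlap score and a threshold; the stopword list, score, threshold value, and example sentences are assumptions for illustration, not the systems compared in the paper.

```python
import re

STOPWORDS = frozenset({"the", "a", "an", "of", "is", "was", "what", "who", "when", "in"})


def content_words(text):
    """Lowercased alphanumeric tokens minus a small, assumed stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap_score(question, sentence):
    """Fraction of the question's content words that appear in the sentence."""
    q = content_words(question)
    return len(q & content_words(sentence)) / len(q) if q else 0.0


def select_and_trigger(question, candidates, threshold=0.5):
    """Answer sentence selection plus triggering: return the best-scoring
    candidate, or None if even the best one falls below the threshold."""
    best = max(candidates, key=lambda s: overlap_score(question, s))
    return best if overlap_score(question, best) >= threshold else None


candidates = [
    "The Eiffel Tower was built between 1887 and 1889.",
    "Paris is the capital of France.",
]
print(select_and_trigger("When was the Eiffel Tower built?", candidates))
print(select_and_trigger("Who designed the Sydney Opera House?", candidates))  # None
```

The second question has no correct sentence among the candidates, which is exactly the case answer triggering must detect; a pure ranker would still return its top candidate.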

Multivariate Behavioral Research | 1998

The TETRAD Project: Constraint Based Aids to Causal Model Specification

Richard Scheines; Peter Spirtes; Clark Glymour; Christopher Meek; Thomas S. Richardson

The statistical community has brought logical rigor and mathematical precision to the problem of using data to make inferences about a model's parameter values. The TETRAD project, and related work in computer science and statistics, aims to apply those standards to the problem of using data and background knowledge to make inferences about a model's specification. We begin by drawing the analogy between parameter estimation and model specification search. We then describe how the specification of a structural equation model entails familiar constraints on the covariance matrix for all admissible values of its parameters; we survey results on the equivalence of structural equation models, and we discuss search strategies for model specification. We end by presenting several algorithms that are implemented in the TETRAD II program.
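
One family of covariance constraints used in this line of work is the vanishing tetrad difference. As a hedged illustration (the loadings and variances below are arbitrary assumptions): in a linear model where four measured variables share a single latent common cause L with variance φ and have independent errors, each off-diagonal covariance is σ_ij = λ_i λ_j φ, so every tetrad difference such as σ_12 σ_34 - σ_13 σ_24 vanishes for all parameter values.

```python
def implied_covariance(loadings, latent_var, error_vars):
    """Covariance matrix implied by X_i = loadings[i] * L + e_i, where Var(L) = latent_var
    and the errors e_i are independent with variances error_vars[i]."""
    n = len(loadings)
    return [[loadings[i] * loadings[j] * latent_var + (error_vars[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def tetrad_differences(s, i, j, k, l):
    """The three tetrad differences among variables i, j, k, l of covariance matrix s."""
    return (s[i][j] * s[k][l] - s[i][k] * s[j][l],
            s[i][j] * s[k][l] - s[i][l] * s[j][k],
            s[i][k] * s[j][l] - s[i][l] * s[j][k])


cov = implied_covariance([0.9, 0.7, 1.2, 0.5], latent_var=2.0, error_vars=[0.3, 0.4, 0.2, 0.6])
print(tetrad_differences(cov, 0, 1, 2, 3))  # all zero up to floating-point rounding
```

A model in which the four variables split across two correlated latent factors would generally violate some of these equalities, which is what lets such constraints discriminate between specifications without estimating every parameter first.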
