
Publication


Featured research published by David Andrzejewski.


International Conference on Machine Learning | 2009

Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

David Andrzejewski; Xiaojin Zhu; Mark Craven

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior within a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction and show how to perform inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.
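The Dirichlet tree construction can be illustrated with a toy must-link constraint. This is a minimal sketch under my own assumptions (the vocabulary, `eta`, and the single-subtree layout are illustrative, not the paper's parameterization): the must-link words hang under a shared internal node, so a sampled topic tends to give the whole set high or low mass together.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["protein", "gene", "cell", "car", "road"]
must_link = [0, 1]          # "protein" and "gene" should rise or fall together
others = [2, 3, 4]
eta = 50.0                  # strength of the must-link edge (illustrative)
beta = 0.1                  # base smoothing (illustrative)

def sample_topic_word_dist():
    # Root Dirichlet: one branch for the must-link subtree, one per other word.
    root = rng.dirichlet([eta * beta] + [beta] * len(others))
    # Subtree Dirichlet splits the must-link branch's mass among its words.
    sub = rng.dirichlet([eta] * len(must_link))
    phi = np.empty(len(vocab))
    phi[must_link] = root[0] * sub   # leaf probability = product along the path
    phi[others] = root[1:]
    return phi

phi = sample_topic_word_dist()
```

With a large `eta`, the subtree split is near uniform, so the two must-link words receive similar probability whenever their shared branch gets mass.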


North American Chapter of the Association for Computational Linguistics | 2009

Latent Dirichlet Allocation with Topic-in-Set Knowledge

David Andrzejewski; Xiaojin Zhu

Latent Dirichlet Allocation is an unsupervised graphical model that can discover latent topics in unlabeled data. We propose a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling. This type of supervision can be used to encourage the recovery of topics that are more relevant to the user's modeling goals than the topics that would otherwise be recovered. Preliminary experiments on text datasets demonstrate the potential effectiveness of this method.
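One plausible reading of this mechanism can be sketched as restricted Gibbs sampling; the `allowed` set and the stand-in full conditional below are hypothetical, but the core move (zeroing out disallowed topics before normalizing) matches the idea of constraining a token's topic assignment to a user-specified set.

```python
import numpy as np

rng = np.random.default_rng(1)
num_topics = 5

# Stand-in for the usual counts-based collapsed-Gibbs full conditional
# over topics for one token (unnormalized).
full_conditional = rng.random(num_topics)

# Topic-in-set knowledge: this token may only be assigned topics {1, 3}.
allowed = np.array([1, 3])

p = np.zeros(num_topics)
p[allowed] = full_conditional[allowed]   # mask out disallowed topics
p /= p.sum()                             # renormalize over the allowed set
z = rng.choice(num_topics, p=p)          # sampled assignment respects the set
```

Tokens without supervision simply keep the full topic set, so the sampler reduces to ordinary collapsed Gibbs for them.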


International Joint Conference on Artificial Intelligence | 2011

A framework for incorporating general domain knowledge into latent Dirichlet allocation using first-order logic

David Andrzejewski; Xiaojin Zhu; Mark Craven; Benjamin Recht

Topic models have been used successfully for a variety of problems, often in the form of application-specific extensions of the basic Latent Dirichlet Allocation (LDA) model. Because deriving these new models in order to encode domain knowledge can be difficult and time-consuming, we propose the Fold·all model, which allows the user to specify general domain knowledge in First-Order Logic (FOL). However, combining topic modeling with FOL can result in inference problems beyond the capabilities of existing techniques. We have therefore developed a scalable inference technique using stochastic gradient descent which may also be useful to the Markov Logic Network (MLN) research community. Experiments demonstrate the expressive power of Fold·all, as well as the scalability of our proposed inference method.
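One way to picture the FOL-to-objective step, in the MLN style the abstract references: a universally quantified rule is grounded over tokens, and each violated grounding contributes a weighted penalty to the objective. The rule, weight, and data below are made up for illustration and are not from the paper.

```python
BUG_TOPIC = 2
weight = 10.0   # rule weight, as in a Markov Logic Network (illustrative)

# (word, current topic assignment) pairs for a toy document
tokens = [("bug", 2), ("bug", 0), ("code", 1)]

# Hypothetical rule: forall i. word(i) = "bug" -> z(i) = BUG_TOPIC
def rule_penalty(tokens):
    """Weighted count of ground instances that violate the rule."""
    violations = sum(1 for w, z in tokens if w == "bug" and z != BUG_TOPIC)
    return weight * violations

total = rule_penalty(tokens)   # one violating token -> penalty 10.0
```

An inference procedure would then trade this penalty off against the usual LDA likelihood terms when updating topic assignments.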


European Conference on Machine Learning | 2007

Statistical Debugging Using Latent Topic Models

David Andrzejewski; Anne Mulhern; Ben Liblit; Xiaojin Zhu

Statistical debugging uses machine learning to model program failures and help identify root causes of bugs. We approach this task using a novel Delta-Latent-Dirichlet-Allocation model. We model execution traces attributed to failed runs of a program as being generated by two types of latent topics: normal usage topics and bug topics. Execution traces attributed to successful runs of the same program, however, are modeled by usage topics only. Joint modeling of both kinds of traces allows us to identify weak bug topics that would otherwise remain undetected. We perform model inference with collapsed Gibbs sampling. In quantitative evaluations on four real programs, our model produces bug topics highly correlated to the true bugs, as measured by the Rand index. Qualitative evaluation by domain experts suggests that our model outperforms existing statistical methods for bug cause identification, and may help support other software tasks not addressed by earlier models.
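The two-topic-type generative assumption can be sketched as follows; the topic indices and `alpha` are illustrative, not the paper's values. Failed runs draw their topic mixture over both usage and bug topics, while successful runs are restricted to usage topics only.

```python
import numpy as np

rng = np.random.default_rng(2)
usage_topics = [0, 1]   # shared by all runs
bug_topics = [2]        # available only to failed runs
alpha = 0.5             # symmetric Dirichlet concentration (illustrative)

def sample_topic_mixture(failed: bool):
    """Draw a per-run topic mixture over the topics available to that run."""
    available = usage_topics + (bug_topics if failed else [])
    theta = np.zeros(len(usage_topics) + len(bug_topics))
    theta[available] = rng.dirichlet([alpha] * len(available))
    return theta

theta_ok = sample_topic_mixture(failed=False)    # no mass on bug topics
theta_bad = sample_topic_mixture(failed=True)    # may use bug topics
```

Because successful runs place zero mass on bug topics by construction, any execution behavior that only explains failed runs is pushed into the bug topics, which is what makes weak bug signals recoverable.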


Knowledge Discovery and Data Mining | 2011

Latent topic feedback for information retrieval

David Andrzejewski; David Buttler

We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User feedback is then used to reformulate the original query, resulting in improved information retrieval performance in our experiments.
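The feedback loop can be sketched as a simple query reformulation step. The function name, data, and concatenation strategy below are my own simplification; the paper's actual reformulation scheme may differ.

```python
def expand_query(query, topic_top_words, selected_topics, k=3):
    """Append the top-k words of each user-selected topic to the query,
    skipping words already present in the query."""
    extra = []
    for t in selected_topics:
        extra.extend(w for w in topic_top_words[t][:k] if w not in query.split())
    return query + " " + " ".join(extra)

# Hypothetical learned topics shown alongside search results
topics = {
    7: ["reactor", "coolant", "turbine", "valve"],
    2: ["budget", "fiscal", "audit"],
}
expanded = expand_query("coolant leak", topics, [7])
# → "coolant leak reactor turbine"
```

The expanded query is then re-run against the standard keyword index, so the topic model only touches the query side and needs no changes to the retrieval engine.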


North American Chapter of the Association for Computational Linguistics | 2007

Improving Diversity in Ranking using Absorbing Random Walks

Xiaojin Zhu; Andrew B. Goldberg; Jurgen Van Gael; David Andrzejewski


Empirical Methods in Natural Language Processing | 2012

Exploring Topic Coherence over Many Models and Many Topics

Keith Stevens; W. Philip Kegelmeyer; David Andrzejewski; David Buttler


North American Chapter of the Association for Computational Linguistics | 2009

May All Your Wishes Come True: A Study of Wishes and How to Recognize Them

Andrew B. Goldberg; Nathanael Fillmore; David Andrzejewski; Zhiting Xu; Bryan R. Gibson; Xiaojin Zhu


Archive | 2009

Visualization tool for system tracing infrastructure events

Alice X. Zheng; Trishul A. Chilimbi; Shuo-Hsien Hsiao; Danyel Fisher; David Andrzejewski


Text Retrieval Conference | 2006

Ranking Biomedical Passages for Relevance and Diversity: University of Wisconsin, Madison at TREC Genomics 2006.

Andrew B. Goldberg; David Andrzejewski; Jurgen Van Gael; Burr Settles; Xiaojin Zhu; Mark Craven

Collaboration


Dive into David Andrzejewski's collaborations.

Top Co-Authors

Xiaojin Zhu, University of Wisconsin-Madison
Mark Craven, University of Wisconsin-Madison
Andrew B. Goldberg, University of Wisconsin-Madison
David Buttler, Lawrence Livermore National Laboratory
Jurgen Van Gael, University of Wisconsin-Madison
Alexander Kiselev, University of Wisconsin-Madison
Anne Mulhern, University of Wisconsin-Madison
Ben Liblit, University of Wisconsin-Madison
Benjamin Recht, University of California