
Publication


Featured research published by Alex Kulesza.


Machine Learning | 2010

A theory of learning from different domains

Shai Ben-David; John Blitzer; Koby Crammer; Alex Kulesza; Fernando Pereira; Jennifer Wortman Vaughan

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterizes the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
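The first bound described above has become a standard result in domain adaptation theory; in the notation commonly used for this line of work (the symbols here are assumptions about the paper's notation, not quoted from it), it has the form:

```latex
% Target error bounded by source error, half the H-Delta-H divergence
% between the domains, and the error lambda of the best joint hypothesis:
\epsilon_T(h) \le \epsilon_S(h)
  + \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  + \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[ \epsilon_S(h') + \epsilon_T(h') \big].
```

The three terms mirror the abstract: the empirical source error, the classifier-induced divergence estimable from unlabeled samples, and the assumption that some hypothesis performs well in both domains.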


Neural Information Processing Systems | 2009

Adaptive Regularization of Weight Vectors

Koby Crammer; Alex Kulesza; Mark Dredze

We present AROW, an online learning algorithm for binary and multiclass problems that combines large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive mistake bounds for the binary and multiclass settings that are similar in form to the second order perceptron bound. Our bounds do not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques. Empirical evaluations show that AROW achieves state-of-the-art performance on a wide range of binary and multiclass tasks, as well as robustness in the face of non-separable data.
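The AROW update described above admits a compact sketch. The following is a minimal illustrative implementation of the binary case as the abstract describes it, maintaining a weight mean mu and covariance Sigma; the function name, toy data, and default regularizer r are assumptions for illustration, not the authors' code.

```python
import numpy as np

def arow_train(X, y, r=1.0, epochs=5):
    """Minimal binary AROW sketch: weight mean mu, covariance Sigma.

    Updates only on margin violations, shrinking the covariance along
    directions that have been seen so later updates there are more
    conservative -- the adaptive regularization the abstract refers to.
    """
    n, d = X.shape
    mu = np.zeros(d)
    Sigma = np.eye(d)
    for _ in range(epochs):
        for x, yt in zip(X, y):
            margin = yt * (mu @ x)
            if margin < 1.0:                       # margin violation: update
                v = Sigma @ x
                beta = 1.0 / (x @ v + r)           # confidence term
                alpha = (1.0 - margin) * beta      # adaptive step size
                mu = mu + alpha * yt * v           # move mean toward example
                Sigma = Sigma - beta * np.outer(v, v)  # shrink variance along x
    return mu

# Toy data: the label is the sign of the first feature; the second
# feature is deliberately uninformative.
X = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
mu = arow_train(X, y)
```

Because updates scale with the current covariance, a feature that has already driven many updates gets smaller steps, which is what makes the algorithm robust to label noise.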


arXiv: Machine Learning | 2012

Determinantal Point Processes for Machine Learning

Alex Kulesza; Ben Taskar

Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that arise in quantum physics and random matrix theory. In contrast to traditional structured models like Markov random fields, which become intractable and hard to approximate in the presence of negative correlations, DPPs offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks. While they have been studied extensively by mathematicians, giving rise to a deep and beautiful theory, DPPs are relatively new in machine learning. Determinantal Point Processes for Machine Learning provides a comprehensible introduction to DPPs, focusing on the intuitions, algorithms, and extensions that are most relevant to the machine learning community, and shows how DPPs can be applied to real-world applications like finding diverse sets of high-quality search results, building informative summaries by selecting diverse sentences from documents, modeling non-overlapping human poses in images or video, and automatically building timelines of important news stories. It presents the general mathematical background to DPPs along with a range of modeling extensions, efficient algorithms, and theoretical results that aim to enable practical modeling and learning.
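The repulsion the abstract describes can be seen directly from the L-ensemble definition, where a subset Y has probability proportional to det(L_Y). A tiny numerical sketch (the kernel below is invented for illustration) shows that near-duplicate items make the determinant, and hence the probability, collapse:

```python
import numpy as np

def dpp_prob(L, Y):
    """P(Y) = det(L_Y) / det(L + I) for an L-ensemble DPP."""
    num = np.linalg.det(L[np.ix_(Y, Y)]) if len(Y) else 1.0
    return num / np.linalg.det(L + np.eye(L.shape[0]))

# Items 0 and 1 are nearly identical; item 2 is different from both.
L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
p_similar = dpp_prob(L, [0, 1])  # det(L_Y) = 1 - 0.81 = 0.19 before normalizing
p_diverse = dpp_prob(L, [0, 2])  # det(L_Y) = 1 - 0.01 = 0.99 before normalizing
```

Geometrically, det(L_Y) is the squared volume spanned by the items' feature vectors, so diverse sets span more volume and are preferred, which is why DPPs suit tasks like diverse search results and non-redundant summaries.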


Empirical Methods in Natural Language Processing | 2009

Multi-Class Confidence Weighted Algorithms

Koby Crammer; Mark Dredze; Alex Kulesza

The recently introduced online confidence-weighted (CW) learning algorithm for binary classification performs well on many binary NLP tasks. However, for multi-class problems CW learning updates and inference cannot be computed analytically or solved as convex optimization problems as they are in the binary case. We derive learning algorithms for the multi-class CW setting and provide extensive evaluation using nine NLP datasets, including three derived from the recently released New York Times corpus. Our best algorithm out-performs state-of-the-art online and batch methods on eight of the nine tasks. We also show that the confidence information maintained during learning yields useful probabilistic information at test time.


Machine Learning | 2010

Multi-domain learning by confidence-weighted parameter combination

Mark Dredze; Alex Kulesza; Koby Crammer

State-of-the-art statistical NLP systems for a variety of tasks learn from labeled training data that is often domain specific. However, there may be multiple domains or sources of interest on which the system must perform. For example, a spam filtering system must give high quality predictions for many users, each of whom receives emails from different sources and may make slightly different decisions about what is or is not spam. Rather than learning separate models for each domain, we explore systems that learn across multiple domains. We develop a new multi-domain online learning framework based on parameter combination from multiple classifiers. Our algorithms draw from multi-task learning and domain adaptation to adapt multiple source domain classifiers to a new target domain, learn across multiple similar domains, and learn across a large number of disparate domains. We evaluate our algorithms on two popular NLP domain adaptation tasks: sentiment classification and spam filtering.


Web Search and Data Mining | 2014

Social collaborative retrieval

Ko Jen Hsiao; Alex Kulesza; Alfred O. Hero

Socially-based recommendation systems have recently attracted significant interest, and a number of studies have shown that social information can dramatically improve a system's predictions of user interests. Meanwhile, there are now many potential applications that involve aspects of both recommendation and information retrieval, and the task of collaborative retrieval, a combination of these two traditional problems, has recently been introduced. Successful collaborative retrieval requires overcoming severe data sparsity, making additional sources of information, such as social graphs, particularly valuable. In this paper we propose a new model for collaborative retrieval, and show that our algorithm outperforms current state-of-the-art approaches by incorporating information from social networks. We also provide empirical analyses of the ways in which cultural interests propagate along a social graph using a real-world music dataset.


IEEE Journal of Selected Topics in Signal Processing | 2014

Multi-Layer Graph Analysis for Dynamic Social Networks

Brandon Oselio; Alex Kulesza; Alfred O. Hero

Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application, multiple layers might be used to reduce noise through averaging, to perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.
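As a minimal sketch of the multi-layer representation described above, two layers over the same vertex set can be stacked into one tensor and averaged, the simplest noise-reduction step the abstract mentions. The adjacency matrices here are invented for illustration; this is not the paper's latent variable model.

```python
import numpy as np

# Two layers over the same 4 users: declared friendships and
# behavioral similarity. Same vertices, different edge sets.
friend_layer = np.array([[0, 1, 1, 0],
                         [1, 0, 0, 0],
                         [1, 0, 0, 1],
                         [0, 0, 1, 0]], dtype=float)
behavior_layer = np.array([[0, 1, 0, 0],
                           [1, 0, 0, 1],
                           [0, 0, 0, 1],
                           [0, 1, 1, 0]], dtype=float)

# Stack layers into a (layers, users, users) tensor; averaging across
# the layer axis treats each layer as a noisy view of one latent graph.
multilayer = np.stack([friend_layer, behavior_layer])
averaged = multilayer.mean(axis=0)
```

An entry of 1.0 in the averaged matrix means the pair is linked in every layer, while fractional values flag edges supported by only some layers.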


International Conference on Acoustics, Speech, and Signal Processing | 2015

Information extraction from large multi-layer social networks

Brandon Oselio; Alex Kulesza; Alfred O. Hero

Social networks often encode community structure using multiple distinct types of links between nodes. In this paper we introduce a novel method to extract information from such multi-layer networks, where each type of link forms its own layer. Using the concept of Pareto optimality, community detection in this multi-layer setting is formulated as a multiple criterion optimization problem. We propose an algorithm for finding an approximate Pareto frontier containing a family of solutions. The power of this approach is demonstrated on a Twitter dataset, where the nodes are hashtags and the layers correspond to (1) behavioral edges connecting pairs of hashtags whose temporal profiles are similar and (2) relational edges connecting pairs of hashtags that appear in the same tweets.
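The Pareto-optimality machinery used above can be sketched in a few lines: a candidate solution is on the Pareto frontier if no other candidate is at least as good on every objective and strictly better on one. The cost vectors below (one entry per layer) are invented for illustration.

```python
def pareto_frontier(costs):
    """Return indices of non-dominated points (minimizing every axis)."""
    frontier = []
    for i, c in enumerate(costs):
        dominated = any(
            all(o[k] <= c[k] for k in range(len(c)))       # never worse...
            and any(o[k] < c[k] for k in range(len(c)))    # ...strictly better once
            for j, o in enumerate(costs) if j != i)
        if not dominated:
            frontier.append(i)
    return frontier

# Hypothetical per-layer costs of four candidate community assignments.
costs = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
frontier = pareto_frontier(costs)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

Returning a frontier rather than a single optimum is the point of the approach: the family of solutions exposes the tradeoff between the layers instead of committing to one weighting.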


International Conference on Social Computing | 2014

Multi-objective Optimization for Multi-level Networks

Brandon Oselio; Alex Kulesza; Alfred O. Hero

Social network analysis is a rich field with many practical applications like community formation and hub detection. Traditionally, we assume that edges in the network have homogeneous semantics, for instance, indicating friend relationships. However, we increasingly deal with networks for which we can define multiple heterogeneous types of connections between users; we refer to these distinct groups of edges as layers. Naively, we could perform standard network analyses on each layer independently, but this approach may fail to identify interesting signals that are apparent only when viewing all of the layers at once. Instead, we propose to analyze a multi-layered network as a single entity, potentially yielding a richer set of results that better reflect the underlying data. We apply the framework of multi-objective optimization and specifically the concept of Pareto optimality, which has been used in many contexts in engineering and science to deliver solutions that offer tradeoffs between various objective functions. We show that this approach can be well-suited to multi-layer network analysis, as we will encounter situations in which we wish to optimize contrasting quantities. As a case study, we utilize the Pareto framework to show how to bisect the network into equal parts in a way that attempts to minimize the cut-size on each layer. This type of procedure might be useful in determining differences in structure between layers, and in cases where there is an underlying true bisection over multiple layers, this procedure could give a more accurate cut.
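The bisection case study can be illustrated on a toy two-layer graph: enumerating the balanced bisections of four nodes shows that no single cut minimizes the cut size on both layers at once, which is exactly the tradeoff the Pareto framework is meant to expose. The graphs and helper below are illustrative assumptions, not the paper's data.

```python
import numpy as np
from itertools import combinations

def cut_size(adj, side):
    """Count edges crossing between `side` and the remaining nodes."""
    other = [v for v in range(adj.shape[0]) if v not in side]
    return int(sum(adj[u, v] for u in side for v in other))

# Layer 1 pairs up nodes (0,1) and (2,3); layer 2 pairs up (0,2) and (1,3).
layer1 = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]])
layer2 = np.array([[0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [1, 0, 0, 0],
                   [0, 1, 0, 0]])

# Enumerate balanced bisections (node 0 fixed on one side to skip mirrors)
# and record the cut size on each layer.
cuts = {}
for side in combinations(range(1, 4), 1):
    part = (0,) + side
    cuts[part] = (cut_size(layer1, part), cut_size(layer2, part))
# Cut {0,1} is ideal for layer 1 but costs 2 on layer 2; cut {0,2} is the
# reverse; cut {0,3} is Pareto-dominated by both.
```

With conflicting layers like these, a single-layer bisection silently sacrifices the other layer, whereas the Pareto frontier keeps both extremes and lets the analyst choose.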


IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing | 2013

Multi-layer graph analytics for social networks

Brandon Oselio; Alex Kulesza; Alfred O. Hero

Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application, multiple layers might be used to reduce noise through averaging, perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.

Collaboration


Dive into Alex Kulesza's collaborations.

Top Co-Authors

Ben Taskar (University of Washington)
Nan Jiang (University of Michigan)
Koby Crammer (Technion – Israel Institute of Technology)
Mark Dredze (Johns Hopkins University)