
Guy Lebanon

Georgia Institute of Technology

Publication

Featured research published by Guy Lebanon.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Metric learning for text documents

Guy Lebanon

Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tf-idf cosine similarity measure.
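As a point of reference for the abstract above, the following is a minimal sketch of the geodesic distance on the multinomial simplex under the plain Fisher information metric, which has the well-known closed form d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). It does not implement the learned pull-back metric described in the paper; the function names and the toy word counts are illustrative assumptions.

```python
import math


def normalize(counts):
    """Turn non-negative term counts into a point on the multinomial simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def fisher_geodesic_distance(p, q):
    """Geodesic distance on the simplex under the (unlearned) Fisher metric.

    Uses the closed form 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point drift before acos.
    affinity = min(1.0, max(0.0, affinity))
    return 2.0 * math.acos(affinity)


if __name__ == "__main__":
    # Hypothetical term-count vectors for two short documents.
    doc_a = normalize([3, 1, 0])
    doc_b = normalize([1, 1, 2])
    print(fisher_geodesic_distance(doc_a, doc_b))
```

The distance is zero for identical distributions and reaches pi for distributions with disjoint support.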

web search and data mining | 2013

Learning multiple-question decision trees for cold-start recommendation

Mingxuan Sun; Fuxin Li; Joonseok Lee; Ke Zhou; Guy Lebanon; Hongyuan Zha

For cold-start recommendation, it is important to rapidly profile new users and generate a good initial set of recommendations through an interview process: users should be queried adaptively in a sequential fashion, and multiple items should be offered for opinion solicitation at each trial. In this work, we propose a novel algorithm that learns to conduct the interview process guided by a decision tree with multiple questions at each split. The splits, represented as sparse weight vectors, are learned through an L_1-constrained optimization framework. The users are directed to child nodes according to the inner product of their responses and the corresponding weight vector. More importantly, to account for the variety of responses coming to a node, a linear regressor is learned within each node using all the previously obtained answers as input to predict item ratings. A user study, preliminary but the first of its kind in cold-start recommendation, is conducted to explore the efficient number and format of questions being asked in a recommendation survey to minimize user cognitive effort. Quantitative experimental validations also show that the proposed algorithm outperforms state-of-the-art approaches in terms of both prediction accuracy and user cognitive effort.

IEEE Transactions on Visualization and Computer Graphics | 2008

Visualizing Incomplete and Partially Ranked Data

Paul Kidwell; Guy Lebanon; William S. Cleveland

Ranking data, which result from m raters ranking n items, are difficult to visualize due to their discrete algebraic structure, and the computational difficulties associated with them when n is large. This problem becomes worse when raters provide tied rankings or not all items are ranked. We develop an approach for the visualization of ranking data for large n which is intuitive, easy to use, and computationally efficient. The approach overcomes the structural and computational difficulties by utilizing a natural measure of dissimilarity for raters, and projecting the raters into a low dimensional vector space where they are viewed. The visualization techniques are demonstrated using voting data, jokes, and movie preferences.

IEEE Transactions on Visualization and Computer Graphics | 2007

Sequential Document Visualization

Yi Mao; Joshua V. Dillon; Guy Lebanon

Documents and other categorical valued time series are often characterized by the frequencies of short range sequential patterns such as n-grams. This representation converts sequential data of varying lengths to high dimensional histogram vectors which are easily modeled by standard statistical models. Unfortunately, the histogram representation ignores most of the medium and long range sequential dependencies making it unsuitable for visualizing sequential data. We present a novel framework for sequential visualization of discrete categorical time series based on the idea of local statistical modeling. The framework embeds categorical time series as smooth curves in the multinomial simplex summarizing the progression of sequential trends. We discuss several visualization techniques based on the above framework and demonstrate their usefulness for document visualization.

international conference on management of data | 2008

Mechanisms for database intrusion detection and response

Ashish Kamra; Elisa Bertino; Guy Lebanon

Data today represent a valuable asset for companies and organizations and must be protected. Most of an organization's sensitive and proprietary data resides in a Database Management System (DBMS). The focus of this thesis is to develop advanced security solutions for protecting the data residing in a DBMS. Our strategy is to develop an Intrusion Detection (ID) mechanism, implemented within the database server, that is capable of detecting anomalous user requests to a DBMS. The key idea is to learn profiles of users and applications interacting with a database. A database request that deviates from these profiles is then termed anomalous. A major component of this work involves a prototype implementation of this ID mechanism in the PostgreSQL database server. We also propose to augment the ID mechanism with an Intrusion Response engine that is capable of issuing an appropriate response to an anomalous database request.

recent advances in intrusion detection | 2008

Determining Placement of Intrusion Detectors for a Distributed Application through Bayesian Network Modeling

Gaspar Modelo-Howard; Saurabh Bagchi; Guy Lebanon

To secure today's computer systems, it is critical to have different intrusion detection sensors embedded in them. The complexity of distributed computer systems makes it difficult to determine the appropriate configuration of these detectors, i.e., their choice and placement. In this paper, we describe a method to evaluate the effect of the detector configuration on the accuracy and precision of determining security goals in the system. For this, we develop a Bayesian network model for the distributed system, from an attack graph representation of multi-stage attacks in the system. We use Bayesian inference to solve the problem of determining the likelihood that an attack goal has been achieved, given a certain set of detector alerts. We quantify the overall detection performance in the system for different detector settings, namely, choice and placement of the detectors, their quality, and levels of uncertainty of adversarial behavior. These observations lead us to a greedy algorithm for determining the optimal detector settings in a large-scale distributed system. We present the results of experiments on Bayesian networks representing two real distributed systems and real attacks on them.
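The inference step mentioned in the abstract above can be illustrated with a much simpler model than the paper's attack-graph Bayesian network. The sketch below assumes a single binary attack-goal variable and detectors whose alerts are conditionally independent given that variable; the function name, the detection rates, and the prior are hypothetical values chosen only for illustration.

```python
def posterior_attack(prior, detectors, alerts):
    """Posterior probability that the attack goal was reached, given alerts.

    prior     -- assumed prior probability of the attack goal being achieved
    detectors -- list of (true_positive_rate, false_positive_rate) pairs
    alerts    -- list of booleans, True if the matching detector fired

    Assumes detectors are conditionally independent given the attack state,
    which is a simplification of a full Bayesian network.
    """
    weight_attack = prior
    weight_benign = 1.0 - prior
    for (tpr, fpr), fired in zip(detectors, alerts):
        weight_attack *= tpr if fired else (1.0 - tpr)
        weight_benign *= fpr if fired else (1.0 - fpr)
    # Bayes' rule: normalize the joint weights over both hypotheses.
    return weight_attack / (weight_attack + weight_benign)


if __name__ == "__main__":
    # Hypothetical detector qualities: (true positive rate, false positive rate).
    detectors = [(0.9, 0.05), (0.8, 0.1)]
    print(posterior_attack(0.01, detectors[:1], [True]))
    print(posterior_attack(0.01, detectors, [True, True]))
```

Under these assumptions, adding a second corroborating detector raises the posterior, which is the kind of effect a placement algorithm would trade off against detector cost.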

knowledge discovery and data mining | 2012

Fast Bregman divergence NMF using Taylor expansion and coordinate descent

Liangda Li; Guy Lebanon; Haesun Park

Non-negative matrix factorization (NMF) provides a lower rank approximation of a matrix. Due to nonnegativity imposed on the factors, it gives a latent structure that is often more physically meaningful than other lower rank approximations such as singular value decomposition (SVD). Most of the algorithms proposed in the literature for NMF have been based on minimizing the Frobenius norm. This is partly due to the fact that the minimization problem based on the Frobenius norm provides much more flexibility in algebraic manipulation than other divergences. In this paper we propose a fast NMF algorithm that is applicable to general Bregman divergences. Through Taylor series expansion of the Bregman divergences, we reveal a relationship between Bregman divergences and Euclidean distance. This key relationship provides a new direction for NMF algorithms with general Bregman divergences when combined with the scalar block coordinate descent method. The proposed algorithm generalizes several recently proposed methods for computation of NMF with Bregman divergences and is computationally faster than existing alternatives. We demonstrate the effectiveness of our approach with experiments conducted on artificial as well as real world data.
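The link between Bregman divergences and Euclidean distance mentioned in the abstract above can be illustrated in the scalar case. For a convex generator phi, the Bregman divergence is D(x, y) = phi(x) - phi(y) - phi'(y)(x - y), and a standard second-order Taylor expansion in x around y gives D(x, y) ≈ 0.5 phi''(y)(x - y)^2, a weighted squared Euclidean distance. The sketch below checks this numerically; it is not the paper's algorithm, and all function names are illustrative assumptions.

```python
import math


def bregman(phi, dphi, x, y):
    """Scalar Bregman divergence D(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)


def taylor_quadratic(d2phi, x, y):
    """Second-order Taylor approximation of D(x, y) around x = y."""
    return 0.5 * d2phi(y) * (x - y) ** 2


# Generator phi(t) = t^2 yields the squared Euclidean distance exactly.
def sq_phi(t):
    return t * t


def sq_dphi(t):
    return 2.0 * t


# Generator phi(t) = t log t - t yields the generalized KL divergence
# x log(x / y) - x + y (defined for t > 0).
def kl_phi(t):
    return t * math.log(t) - t


def kl_dphi(t):
    return math.log(t)


def kl_d2phi(t):
    return 1.0 / t


if __name__ == "__main__":
    x, y = 1.01, 1.0
    print(bregman(sq_phi, sq_dphi, 3.0, 1.0))  # equals (3 - 1)^2
    print(bregman(kl_phi, kl_dphi, x, y), taylor_quadratic(kl_d2phi, x, y))
```

For the squared-error generator the quadratic form is exact; for the KL generator it is accurate only when x is close to y.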

IEEE Transactions on Information Theory | 2005

Axiomatic geometry of conditional models

Guy Lebanon

We formulate and prove an axiomatic characterization of the Riemannian geometry underlying manifolds of conditional models. The characterization holds for both normalized and nonnormalized conditional models. In the normalized case, the characterization extends the derivation of the Fisher information by Cencov, while in the nonnormalized case it extends Campbell's theorem. Due to the close connection between the conditional I-divergence and the product Fisher information metric, we provide a new axiomatic interpretation of the geometries underlying logistic regression and AdaBoost.

Journal of the American Statistical Association | 2011

Statistical Estimation of Word Acquisition With Application to Readability Prediction

Paul Kidwell; Guy Lebanon; Kevyn Collins-Thompson

Models of language learning play a central role in a wide range of applications: from psycholinguistic theories of how people acquire new word knowledge, to information systems that can automatically match content to users’ reading ability. Traditional methods for estimating word acquisition ages or content readability are typically based on linear regression over a small number of summary features derived from time-consuming user studies or costly expert judgments. With the increasing amounts of content available from the web and other sources, however, new statistical approaches are possible that can exploit this easily acquired data to learn more flexible, fine-grained models of language usage. We present a novel statistical model for document readability that is based on the logistic Rasch model and the quantiles of word acquisition age distributions. We use this model to estimate the distributions of word acquisition ages from empirical readability data collected from the web. We then demonstrate that the estimated acquisition distributions are very effective in predicting both global and local document readability. We also compare the estimated distributions with word acquisition data from existing oral studies, revealing interesting historical trends as well as differences between oral and written word acquisition grade levels.

Journal of The Optical Society of America A-optics Image Science and Vision | 2001

Variational approach to moiré pattern synthesis

Guy Lebanon; Alfred M. Bruckstein

Moiré phenomena occur when two or more images are nonlinearly combined to create a new superposition image. Moiré patterns are patterns that do not exist in any of the original images but appear in the superposition image, for example as the result of a multiplicative superposition rule. The topic of moiré pattern synthesis deals with creating images that when superimposed will reveal certain desired moiré patterns. Conditions that ensure that a desired moiré pattern will be present in the superposition of two images are known; however, they do not specify these images uniquely. The freedom in choosing the superimposed images can be exploited to produce various degrees of visibility and ensure desired properties. Performance criteria for the images that measure when one superposition is better than another are introduced. These criteria are based on the visibility of the moiré patterns to the human visual system and on the digitization that takes place when the images are presented on discrete displays. We propose to resolve the freedom in moiré synthesis by choosing the images that optimize the chosen criteria.
