Michael J. Pazzani | Researchain

Publication

Featured researches published by Michael J. Pazzani.

Machine Learning | 1997

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss

Pedro M. Domingos; Michael J. Pazzani

The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be optimal for learning conjunctions and disjunctions, even though they violate the independence assumption. Further, studies in artificial domains show that it will often outperform more powerful classifiers for common training set sizes and numbers of attributes, even if its bias is a priori much less appropriate to the domain. This article's results also imply that detecting attribute dependence is not necessarily the best way to extend the Bayesian classifier, and this is also verified empirically.
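
The distinction between zero-one loss and probability estimates can be checked on a toy case. The sketch below is not taken from the article: it assumes a two-attribute conjunction (class = A AND B) with all four instances equally likely, and the function names are illustrative. Within class 0 the attributes are dependent (A = 1 forces B = 0), yet the simple Bayesian classifier labels every instance correctly while its posterior estimate for (1, 1) is 3/4 rather than the true value of 1.

```python
from fractions import Fraction
from itertools import product

def train_naive_bayes(examples):
    """Estimate class priors and per-attribute conditionals from (attributes, label) pairs."""
    n = len(examples)
    prior, cond = {}, {}
    for c in sorted({label for _, label in examples}):
        rows = [x for x, label in examples if label == c]
        prior[c] = Fraction(len(rows), n)
        for j in range(len(rows[0])):
            for v in (0, 1):
                matches = sum(1 for x in rows if x[j] == v)
                cond[(c, j, v)] = Fraction(matches, len(rows))
    return prior, cond

def posterior(model, x):
    """Class posteriors under the attribute-independence assumption."""
    prior, cond = model
    scores = {}
    for c, p in prior.items():
        for j, v in enumerate(x):
            p *= cond[(c, j, v)]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Toy domain (an assumption for illustration): class = A AND B, uniform instances.
examples = [((a, b), a & b) for a, b in product((0, 1), repeat=2)]
model = train_naive_bayes(examples)

for x, label in examples:
    post = posterior(model, x)
    predicted = max(post, key=post.get)
    print(x, "true:", label, "predicted:", predicted, "P(class=1):", post[1])
```

Exact fractions are used so that the gap between the estimated and true posterior is visible without rounding noise.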

The Adaptive Web | 2007

Content-based recommendation systems

Michael J. Pazzani; Daniel Billsus

This chapter discusses content-based recommendation systems, i.e., systems that recommend an item to a user based upon a description of the item and a profile of the user's interests. Content-based recommendation systems may be used in a variety of domains ranging from recommending web pages, news articles, restaurants, television programs, and items for sale. Although the details of various systems differ, content-based recommendation systems share in common a means for describing the items that may be recommended, a means for creating a profile of the user that describes the types of items the user likes, and a means of comparing items to the user profile to determine what to recommend. The profile is often created and updated automatically in response to feedback on the desirability of items that have been presented to the user.
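
The three components named in the abstract (item descriptions, a user profile, and a comparison between the two) can be sketched in a few lines. This is a minimal illustration, not the chapter's method: it assumes bag-of-words item descriptions, a profile that adds or subtracts term weights on feedback, and cosine similarity for matching. The catalog entries and function names are hypothetical.

```python
import math
from collections import Counter

def describe(text):
    """Item description: a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def update_profile(profile, item_vector, liked):
    """Feedback step: add the item's terms if liked, subtract them if disliked."""
    sign = 1 if liked else -1
    for term, weight in item_vector.items():
        profile[term] += sign * weight
    return profile

def recommend(profile, catalog, k=1):
    """Rank catalog items by similarity to the profile and return the top k names."""
    ranked = sorted(catalog, key=lambda name: cosine(profile, catalog[name]), reverse=True)
    return ranked[:k]

# Hypothetical catalog and feedback, for illustration only.
catalog = {
    "thai": describe("spicy thai noodle restaurant"),
    "steak": describe("steak house grill restaurant"),
    "vegan": describe("vegan noodle bowl restaurant"),
}
profile = Counter()
update_profile(profile, describe("spicy noodle soup"), liked=True)
update_profile(profile, describe("steak grill"), liked=False)
print(recommend(profile, catalog, k=2))
```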

Artificial Intelligence Review | 1999

A Framework for Collaborative, Content-Based and Demographic Filtering

Michael J. Pazzani

We discuss learning a profile of user interests for recommending information sources such as Web pages or news articles. We describe the types of information available to determine whether to recommend a particular page to a particular user. This information includes the content of the page, the ratings of the user on other pages and the contents of these pages, the ratings given to that page by other users, the ratings of these other users on other pages, and demographic information about users. We describe how each type of information may be used individually and then discuss an approach to combining recommendations from multiple sources. We illustrate each approach and the combined approach in the context of recommending restaurants.

Machine Learning | 1997

Learning and Revising User Profiles: The Identification of Interesting Web Sites

Michael J. Pazzani; Daniel Billsus

We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier may easily be extended to revise user-provided profiles. In an experimental evaluation we compare the Bayesian classifier to computationally more intensive alternatives, and show that it performs at least as well as these approaches throughout a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in the form of user-defined profiles and examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase the prediction accuracy.
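
Incremental learning from feedback is natural for a naive Bayesian classifier because the model is just a set of counts. The sketch below is an assumption-laden illustration rather than the paper's implementation: it treats each page as a set of words (Boolean features), applies Laplace smoothing, and uses a hypothetical class name and "hot"/"cold" labels.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Bernoulli naive Bayes over word presence, updated one rated page at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def update(self, words, label):
        """Fold one piece of user feedback into the counts."""
        words = set(words)
        self.class_counts[label] += 1
        self.vocab |= words
        for w in words:
            self.word_counts[label][w] += 1

    def log_score(self, words, label):
        """Log of prior times likelihood, with Laplace-smoothed estimates."""
        words = set(words)
        n = sum(self.class_counts.values())
        n_c = self.class_counts[label]
        score = math.log((n_c + 1) / (n + len(self.class_counts)))
        for w in self.vocab:
            p = (self.word_counts[label][w] + 1) / (n_c + 2)
            score += math.log(p if w in words else 1 - p)
        return score

    def predict(self, words):
        """Most probable label; requires at least one prior call to update()."""
        return max(self.class_counts, key=lambda c: self.log_score(words, c))

# Hypothetical feedback stream, for illustration only.
nb = IncrementalNaiveBayes()
nb.update(["jazz", "concert", "tickets"], "hot")
nb.update(["jazz", "album", "review"], "hot")
nb.update(["stock", "market", "report"], "cold")
nb.update(["tax", "market", "news"], "cold")
print(nb.predict(["jazz", "festival"]))
```

Each `update` call touches only the counts for the words on the rated page, so the profile can be revised after every rating without retraining from scratch.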

Knowledge and Information Systems | 2001

Dimensionality reduction for fast similarity search in large time series databases

Eamonn J. Keogh; Kaushik Chakrabarti; Michael J. Pazzani; Sharad Mehrotra

The problem of similarity search in large time series databases has attracted much attention recently. It is a non-trivial problem because of the inherent high dimensionality of the data. The most promising solutions involve first performing dimensionality reduction on the data, and then indexing the reduced data with a spatial access method. Three major dimensionality reduction techniques have been proposed: Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and more recently the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Piecewise Aggregate Approximation (PAA). We theoretically and empirically compare it to the other techniques and demonstrate its superiority. In addition to being competitive with or faster than the other methods, our approach has numerous other advantages. It is simple to understand and to implement, it allows more flexible distance measures, including weighted Euclidean queries, and the index can be built in linear time.
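
PAA is simple enough to sketch directly: the series is cut into equal-length frames and each frame is replaced by its mean. The code below is a minimal illustration that assumes the series length is divisible by the number of segments; the function names are illustrative. The distance on the reduced representation is scaled by the square root of the frame length so that it never exceeds the Euclidean distance between the original series, which is the property that makes indexing on the reduced data safe.

```python
import math

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length frame."""
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def paa_distance(x_reduced, y_reduced, original_length):
    """Distance on the reduced series, scaled to lower-bound the original distance."""
    frame_length = original_length / len(x_reduced)
    return math.sqrt(frame_length) * euclidean(x_reduced, y_reduced)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 2, 4, 4, 6, 6, 8, 8]
x4, y4 = paa(x, 4), paa(y, 4)
print(x4, y4)
print(paa_distance(x4, y4, len(x)), "<=", euclidean(x, y))
```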

International Conference on Data Mining | 2001

An online algorithm for segmenting time series

Eamonn J. Keogh; Selina Chu; David M. Hart; Michael J. Pazzani

In recent years, there has been an explosion of interest in mining time-series databases. As with most computer science problems, representation of the data is the key to efficient and effective solutions. One of the most commonly used representations is piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time-series data. A variety of algorithms have been proposed to obtain this representation, with several algorithms having been independently rediscovered several times. In this paper, we undertake the first extensive review and empirical comparison of all proposed techniques. We show that all these algorithms have fatal flaws from a data-mining perspective. We introduce a novel algorithm that we empirically show to be superior to all others in the literature.
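
For readers unfamiliar with piecewise linear approximation, the sketch below shows one classical way to obtain it: bottom-up merging, which starts from the finest segmentation and repeatedly removes the breakpoint whose removal costs least. It is not the novel algorithm the paper introduces. It also makes simplifying assumptions: segments share endpoints, each segment is the straight line joining its endpoints (linear interpolation rather than regression), and `max_error` is a hypothetical user-chosen threshold on the squared error of a merged segment.

```python
def interpolation_error(series, start, end):
    """Sum of squared residuals between series[start..end] and the line joining its endpoints."""
    if end - start < 2:
        return 0.0
    y0, y1 = series[start], series[end]
    error = 0.0
    for i in range(start, end + 1):
        fitted = y0 + (y1 - y0) * (i - start) / (end - start)
        error += (series[i] - fitted) ** 2
    return error

def bottom_up_segments(series, max_error):
    """Merge adjacent segments greedily while the cheapest merge stays within max_error."""
    breakpoints = list(range(len(series)))  # finest segmentation: every point is a breakpoint
    while len(breakpoints) > 2:
        # Cost of removing each interior breakpoint, i.e. merging its two neighbouring segments.
        costs = [
            interpolation_error(series, breakpoints[i - 1], breakpoints[i + 1])
            for i in range(1, len(breakpoints) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        del breakpoints[best + 1]
    return list(zip(breakpoints[:-1], breakpoints[1:]))

series = [0, 1, 2, 3, 4, 3, 2, 1, 0]
print(bottom_up_segments(series, max_error=0.01))
```

On this triangular series the rise and the fall are each exactly linear, so the merging stops with two segments that meet at the peak.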

Knowledge Discovery and Data Mining | 2000

Scaling up dynamic time warping for datamining applications

Eamonn J. Keogh; Michael J. Pazzani

There has been much recent interest in adapting data mining algorithms to time series databases. Most of these algorithms need to compare time series. Typically some variation of Euclidean distance is used. However, as we demonstrate in this paper, Euclidean distance can be an extremely brittle distance measure. Dynamic time warping (DTW) has been suggested as a technique to allow more robust distance calculations; however, it is computationally expensive. In this paper we introduce a modification of DTW which operates on a higher level abstraction of the data, in particular, a Piecewise Aggregate Approximation (PAA). Our approach allows us to outperform DTW by one to two orders of magnitude, with no loss of accuracy.
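
The brittleness of Euclidean distance and the cost of DTW are both easy to see in code. The sketch below implements the textbook DTW recurrence with an absolute-difference local cost, then runs the same routine on segment means. It is an illustration under stated assumptions, not the paper's formulation: the paper's exact handling of distances on the reduced series is not reproduced, and `segment_means` assumes the series length is divisible by the frame width.

```python
import math

def euclidean(x, y):
    """Point-by-point Euclidean distance; sensitive to small shifts in time."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(x, y):
    """Classic dynamic time warping cost using a full (len(x)+1) x (len(y)+1) table."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def segment_means(series, width):
    """Reduce a series to the mean of each frame of the given width (PAA-style)."""
    return [sum(series[i:i + width]) / width for i in range(0, len(series), width)]

x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]  # the same bump, shifted one step earlier

print("Euclidean:", euclidean(x, y))
print("DTW:", dtw(x, y))
# Warping the reduced series fills a table with width**2 times fewer cells.
print("DTW on segment means:", dtw(segment_means(x, 2), segment_means(y, 2)))
```

The full table has on the order of n times m cells, so shrinking both series by a factor of c cuts the work by roughly c squared, which is where a speed-up from a coarser representation would come from.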

User Modeling and User-adapted Interaction | 2000

User Modeling for Adaptive News Access

Daniel Billsus; Michael J. Pazzani

We present a framework for adaptive news access, based on machine learning techniques specifically designed for this task. First, we focus on the system's general functionality and system architecture. We then describe the interface and design of two deployed news agents that are part of the described architecture. While the first agent provides personalized news through a web-based interface, the second system is geared towards wireless information devices such as PDAs (personal digital assistants) and cell phones. Based on implicit and explicit user feedback, our agents use a machine learning algorithm to induce individual user models. Motivated by general shortcomings of other user modeling systems for Information Retrieval applications, as well as the specific requirements of news classification, we propose the induction of hybrid user models that consist of separate models for short-term and long-term interests. Furthermore, we illustrate how the described algorithm can be used to address an important issue that has thus far received little attention in the Information Retrieval community: a user's information need changes as a direct result of interaction with information. We empirically evaluate the system's performance based on data collected from regular system users. The goal of the evaluation is not only to understand the performance contributions of the algorithm's individual components, but also to assess the overall utility of the proposed user modeling techniques from a user perspective. Our results provide empirical evidence for the utility of the hybrid user model, and suggest that effective personalization can be achieved without requiring any extra effort from the user.

International Conference on User Modeling, Adaptation, and Personalization | 1999

A hybrid user model for news story classification

Daniel Billsus; Michael J. Pazzani

We present an intelligent agent designed to compile a daily news program for individual users. Based on feedback from the user, the system automatically adapts to the user’s preferences and interests. In this paper we focus on the system’s user modeling component. First, we motivate the use of a multi-strategy machine learning approach that allows for the induction of user models that consist of separate models for long-term and short-term interests. Second, we investigate the utility of explicitly modeling information that the system has already presented to the user. This allows us to address an important issue that has thus far received virtually no attention in the Information Retrieval community: the fact that a user’s information need changes as a direct result of interaction with information. We evaluate the proposed algorithms on user data collected with a prototype of our system, and assess the individual performance contributions of both model components.
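
One way to picture a hybrid of short-term and long-term models is as a cascade. The sketch below is a hypothetical arrangement, not the paper's algorithm: it assumes the short-term model is a similarity-weighted nearest-neighbour vote over recently rated stories, that the long-term model is any callable supplied by the caller, and that a story nearly identical to one already presented scores zero because the user has seen that information. The thresholds `t_min` and `t_max` and the use of Jaccard similarity are illustrative choices.

```python
def jaccard(a, b):
    """Word-set overlap between two stories, in the range 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hybrid_score(story, recent_stories, long_term_model, t_min=0.3, t_max=0.9):
    """Score a story with the short-term model, deferring to the long-term model if needed.

    recent_stories: list of (word_set, rating) pairs for stories already presented.
    long_term_model: callable mapping a word set to a score (assumed, supplied by caller).
    """
    sims = [(jaccard(story, words), rating) for words, rating in recent_stories]
    if any(sim >= t_max for sim, _ in sims):
        return 0.0  # near-duplicate of a presented story: already known to the user
    neighbours = [(sim, rating) for sim, rating in sims if sim >= t_min]
    if neighbours:
        # Short-term interests: similarity-weighted average of neighbour ratings.
        return sum(sim * rating for sim, rating in neighbours) / sum(sim for sim, _ in neighbours)
    return long_term_model(story)  # nothing recent is close enough

# Hypothetical data, for illustration only.
recent = [
    ({"election", "vote", "senate"}, 1.0),
    ({"football", "score", "league"}, 0.0),
]
long_term = lambda words: 0.5  # stand-in for a learned long-term model

print(hybrid_score({"election", "vote", "poll"}, recent, long_term))
print(hybrid_score({"election", "vote", "senate"}, recent, long_term))
print(hybrid_score({"weather", "storm"}, recent, long_term))
```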

User Modeling and User-adapted Interaction | 2001

Machine Learning for User Modeling

Geoffrey I. Webb; Michael J. Pazzani; Daniel Billsus

At first blush, user modeling appears to be a prime candidate for straightforward application of standard machine learning techniques. Observations of the user's behavior can provide training examples that a machine learning system can use to form a model designed to predict future actions. However, user modeling poses a number of challenges for machine learning that have hindered its application in user modeling, including: the need for large data sets; the need for labeled data; concept drift; and computational complexity. This paper examines each of these issues and reviews approaches to resolving them.
