Igor V. Cadez
University of California, Irvine
Publication
Featured research published by Igor V. Cadez.
knowledge discovery and data mining | 2000
Igor V. Cadez; David Heckerman; Christopher Meek; Padhraic Smyth; Steven D. White
We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model based (as opposed to distance based) and partitions users according to the order in which they request Web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. Our algorithm scales linearly with both number of users and number of clusters, and our implementation easily handles millions of users and thousands of clusters in memory. In the paper, we describe the details of our technology and a tool based on it called WebCANVAS. We illustrate the use of our technology on user-traffic data from msnbc.com.
Data Mining and Knowledge Discovery | 2003
Igor V. Cadez; David Heckerman; Christopher Meek; Padhraic Smyth; Steven D. White
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
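The clustering step described in this abstract, a mixture of first-order Markov models fitted with EM, can be illustrated with a minimal sketch. This is not the WebCANVAS implementation: the function name `fit_markov_mixture`, the smoothing constant `alpha`, the random initialization, and the integer encoding of page categories are all assumptions made for illustration.

```python
import numpy as np

def fit_markov_mixture(sequences, n_states, n_clusters, n_iter=50, seed=0, alpha=1e-2):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    sequences: list of lists of integer state ids in [0, n_states).
    alpha: small smoothing constant (an assumption, keeps probabilities nonzero).
    """
    rng = np.random.default_rng(seed)
    n = len(sequences)

    # Sufficient statistics per sequence: initial state and transition counts.
    init_counts = np.zeros((n, n_states))
    trans_counts = np.zeros((n, n_states, n_states))
    for i, seq in enumerate(sequences):
        init_counts[i, seq[0]] = 1.0
        for a, b in zip(seq[:-1], seq[1:]):
            trans_counts[i, a, b] += 1.0

    # Start from random soft cluster memberships, shape (n, n_clusters).
    resp = rng.dirichlet(np.ones(n_clusters), size=n)

    for _ in range(n_iter):
        # M-step: re-estimate mixture weights, initial-state and transition
        # probabilities from responsibility-weighted counts.
        weights = (resp.sum(axis=0) + alpha) / (n + n_clusters * alpha)
        init = resp.T @ init_counts + alpha
        init /= init.sum(axis=1, keepdims=True)
        trans = np.einsum("nk,nab->kab", resp, trans_counts) + alpha
        trans /= trans.sum(axis=2, keepdims=True)

        # E-step: posterior probability of each cluster for each sequence,
        # computed in log space for numerical stability.
        log_post = (np.log(weights)[None, :]
                    + init_counts @ np.log(init).T
                    + np.einsum("nab,kab->nk", trans_counts, np.log(trans)))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return weights, init, trans, resp
```

Because each sequence is reduced to its count statistics once, every EM iteration costs time proportional to the number of sequences times the number of clusters, which is consistent with the linear scaling the abstract reports.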
knowledge discovery and data mining | 2000
Igor V. Cadez; Scott Gaffney; Padhraic Smyth
This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical patients, where for each patient one has a set of observed time-series, each time-series of potentially different length and different sampling rate. We propose a general model-based probabilistic framework for clustering data types of this form which are non-vector in nature and may vary in size from individual to individual. The Expectation-Maximization (EM) procedure for clustering within this framework is discussed, along with how it can be applied in a general manner to clustering of sequences, time-series, trajectories, and other non-vector data. We show that a number of earlier algorithms can be viewed as special cases within this unifying framework. The paper concludes with several illustrations of the method, including clustering of red blood cell data in a medical diagnosis context, clustering of proteins from curves of gene expression data, and clustering of individuals based on their sequences of Web navigation. General Terms: Clustering, Mixture Models, EM Algorithm
Machine Learning | 2002
Igor V. Cadez; Padhraic Smyth; Geoffrey J. McLachlan; Christine E. McLaren
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571–578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost, a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
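To make the role of the per-bin integrals concrete, here is a minimal sketch of the univariate case only, where each integral reduces to a difference of normal CDFs. It is not the paper's multivariate method: the function names, the restriction to Gaussian components, and the treatment of everything outside the outermost bin edges as truncated are assumptions made for illustration.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a univariate normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def bin_probabilities(edges, weights, means, stds):
    """Mixture mass in each bin [edges[j], edges[j+1]].

    Masses are renormalized by the total mass over all bins, which accounts
    for truncation: data outside the outermost edges are never observed.
    """
    masses = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
                   for w, m, s in zip(weights, means, stds))
        masses.append(mass)
    total = sum(masses)
    return [p / total for p in masses]

def binned_log_likelihood(counts, edges, weights, means, stds):
    """Multinomial log-likelihood of bin counts, up to an additive constant."""
    probs = bin_probabilities(edges, weights, means, stds)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
```

In one dimension each bin integral is a cheap closed-form expression. In several dimensions the corresponding integrals over rectangular bins generally have no such closed form, which is why the abstract singles out their evaluation as the computational bottleneck.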
international conference on pattern recognition | 2002
Sergey Kirshner; Igor V. Cadez; Padhraic Smyth; Chandrika Kamath; Erick Cantú-Paz
We describe an application of probabilistic modeling to the problem of recognizing radio galaxies with a bent-double morphology. The galaxies in question contain distinctive signatures of geometric shape and flux density that can be used to build a probabilistic model, which is then used to score potential galaxy configurations. The experimental results suggest that even relatively simple probabilistic models can be useful in identifying galaxies of interest in an automatic manner.
knowledge discovery and data mining | 2001
Igor V. Cadez; Padhraic Smyth; Heikki Mannila
Archive | 2004
Igor V. Cadez; David Heckerman; Christopher A. Meek; Steven James White
knowledge discovery and data mining | 2000
Igor V. Cadez; Scott Gaffney; Padhraic Smyth
Archive | 1999
Igor V. Cadez; Padhraic Smyth
neural information processing systems | 2001
Igor V. Cadez; Paul S. Bradley