George Karypis | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where George Karypis is active.

Explore More

Publication

Featured researches published by George Karypis.

international world wide web conferences | 2001

Item-based collaborative filtering recommendation algorithms

Badrul Munir Sarwar; George Karypis; Joseph A. Konstan; John Riedl

Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative ltering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative ltering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative ltering techniques. Item-based techniques rst analyze the user-item matrix to identify relationships between di erent items, and then use these relationships to indirectly compute recommendations for users. In this paper we analyze di erent item-based recommendation generation algorithms. We look into di erent techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and di erent techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available userbased algorithms.

SIAM Journal on Scientific Computing | 1998

A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs

George Karypis; Vipin Kumar

Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445--452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993]. From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan--Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.

electronic commerce | 2000

Analysis of recommendation algorithms for e-commerce

Badrul Munir Sarwar; George Karypis; Joseph A. Konstan; John Riedl

ABSTRACT Re ommender systems apply statisti al and knowledge disovery te hniques to the problem of making produ t re ommendations during a live ustomer intera tion and they are a hieving widespread su ess in E-Commer e nowadays. In this paper, we investigate several te hniques for analyzing large-s ale pur hase and preferen e data for the purpose of produ ing useful re ommendations to ustomers. In parti ular, we apply a olle tion of algorithms su h as traditional data mining, nearest-neighbor ollaborative ltering, and dimensionality redu tion on two di erent data sets. The rst data set was derived from the web-pur hasing transa tion of a large Eommer e ompany whereas the se ond data set was olle ted from MovieLens movie re ommendation site. For the experimental purpose, we divide the re ommendation generation pro ess into three sub pro esses{ representation of input data, neighborhood formation, and re ommendation generation. We devise di erent te hniques for di erent sub pro esses and apply their ombinations on our data sets to ompare for re ommendation quality and performan e.

ACM Transactions on Information Systems | 2004

Item-based top- N recommendation algorithms

Mukund Deshpande; George Karypis

The explosive growth of the world-wide-web and the emergence of e-commerce has led to the development of recommender systems---a personalized information filtering technology used to identify a set of items that will be of interest to a certain user. User-based collaborative filtering is the most successful technology for building recommender systems to date and is extensively used in many commercial recommender systems. Unfortunately, the computational complexity of these methods grows linearly with the number of customers, which in typical commercial applications can be several millions. To address these scalability concerns model-based recommendation techniques have been developed. These techniques analyze the user--item matrix to discover relations between the different items and use these relations to compute the list of recommendations.In this article, we present one such class of model-based recommendation algorithms that first determines the similarities between the various items and then uses them to identify the set of items to be recommended. The key steps in this class of algorithms are (i) the method used to compute the similarity between the items, and (ii) the method used to combine these similarities in order to compute the similarity between a basket of items and a candidate recommender item. Our experimental evaluation on eight real datasets shows that these item-based algorithms are up to two orders of magnitude faster than the traditional user-neighborhood based recommender systems and provide recommendations with comparable or better quality.

Journal of Parallel and Distributed Computing | 1998

Multilevelk-way Partitioning Scheme for Irregular Graphs

George Karypis; Vipin Kumar

In this paper, we present and study a class of graph partitioning algorithms that reduces the size of the graph by collapsing vertices and edges, we find ak-way partitioning of the smaller graph, and then we uncoarsen and refine it to construct ak-way partitioning for the original graph. These algorithms compute ak-way partitioning of a graphG= (V,E) inO(|E|) time, which is faster by a factor ofO(logk) than previously proposed multilevel recursive bisection algorithms. A key contribution of our work is in finding a high-quality and computationally inexpensive refinement algorithm that can improve upon an initialk-way partitioning. We also study the effectiveness of the overall scheme for a variety of coarsening schemes. We present experimental results on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that this new scheme produces partitions that are of comparable or better quality than those produced by the multilevel bisection algorithm and requires substantially smaller time. Graphs containing up to 450,000 vertices and 3,300,000 edges can be partitioned in 256 domains in less than 40 s on a workstation such as SGIs Challenge. Compared with the widely used multilevel spectral bisection algorithm, our new algorithm is usually two orders of magnitude faster and produces partitions with substantially smaller edge-cut.

IEEE Computer | 1999

Chameleon: hierarchical clustering using dynamic modeling

George Karypis; Eui-Hong Han; Vipin Kumar

Clustering is a discovery process in data mining. It groups a set of data in a way that maximizes the similarity within clusters and minimizes the similarity between two different clusters. Many advanced algorithms have difficulty dealing with highly variable clusters that do not follow a preconceived model. By basing its selections on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for these highly variable clusters. Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters. Another set of schemes (the Rock algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters. By considering either interconnectivity or closeness only, these algorithms can select and merge the wrong pair of clusters. Chameleons key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters. Chameleon finds the clusters in the data set by using a two-phase algorithm. During the first phase, Chameleon uses a graph partitioning algorithm to cluster the data items into several relatively small subclusters. During the second phase, it uses an algorithm to find the genuine clusters by repeatedly combining these subclusters.

international conference on data mining | 2001

Frequent subgraph discovery

Michihiro Kuramochi; George Karypis

As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets is to use graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs.The authors present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.

design automation conference | 1997

Multilevel hypergraph partitioning: application in VLSI domain

George Karypis; Rajat Aggarwal; Vipin Kumar; Shashi Shekhar

In this paper, we present a new hypergraph partitioning algorithmthat is based on the multilevel paradigm. In the multilevel paradigm,a sequence of successively coarser hypergraphs is constructed. Abisection of the smallest hypergraph is computed and it is used toobtain a bisection of the original hypergraph by successively projectingand refining the bisection to the next level finer hypergraph.We evaluate the performance both in terms of the size of the hyper-edgecut on the bisection as well as run time on a number of VLSIcircuits. Our experiments show that our multilevel hypergraph partitioningalgorithm produces high quality partitioning in relativelysmall amount of time. The quality of the partitionings produced byour scheme are on the average 4% to 23% better than those producedby other state-of-the-art schemes. Furthermore, our partitioning algorithmissignificantly faster, often requiring 4 to 15 times less timethan that required by the other schemes. Our multilevel hypergraphpartitioning algorithm scales very well for large hypergraphs. Hypergraphswith over 100,000 vertices can be bisected in a few minuteson todays workstations. Also, on the large hypergraphs, ourscheme outperforms other schemes (in hyperedge cut) quite consistentlywith larger margins (9% to 30%).

conference on information and knowledge management | 2001

Evaluation of Item-Based Top- N Recommendation Algorithms

George Karypis

The explosive growth of the world-wide-web and the emergence of e-commerce has led to the development of recommender systems---a personalized information filtering technology used to identify a set of N items that will be of interest to a certain user. User-based Collaborative filtering is the most successful technology for building recommender systems to date, and is extensively used in many commercial recommender systems. Unfortunately, the computational complexity of these methods grows linearly with the number of customers that in typical commercial applications can grow to be several millions. To address these scalability concerns item-based recommendation techniques have been developed that analyze the user-item matrix to identify relations between the different items, and use these relations to compute the list of recommendations.In this paper we present one such class of item-based recommendation algorithms that first determine the similarities between the various items and then used them to identify the set of items to be recommended. The key steps in this class of algorithms are (i) the method used to compute the similarity between the items, and (ii) the method used to combine these similarities in order to compute the similarity between a basket of items and a candidate recommender item. Our experimental evaluation on five different datasets show that the proposed item-based algorithms are up to 28 times faster than the traditional user-neighborhood based recommender systems and provide recommendations whose quality is up to 27% better.

Machine Learning | 2004

Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering

Ying Zhao; George Karypis

This paper evaluates the performance of different criterion functions in the context of partitional clustering algorithms for document datasets. Our study involves a total of seven different criterion functions, three of which are introduced in this paper and four that have been proposed in the past. We present a comprehensive experimental evaluation involving 15 different datasets, as well as an analysis of the characteristics of the various criterion functions and their effect on the clusters they produce. Our experimental results show that there are a set of criterion functions that consistently outperform the rest, and that some of the newly proposed criterion functions lead to the best overall results. Our theoretical analysis shows that the relative performance of the criterion functions depends on (i) the degree to which they can correctly operate when the clusters are of different tightness, and (ii) the degree to which they can lead to reasonably balanced clusters.

Explore More