Martin B. Scholz | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Martin B. Scholz is active.

Explore More

Publication

Featured researches published by Martin B. Scholz.

international conference on data mining | 2008

One-Class Collaborative Filtering

Rong Pan; Yunhong Zhou; Bin Cao; Nathan Nan Liu; Rajan Lukose; Martin B. Scholz; Qiang Yang

Many applications of collaborative filtering (CF), such as news item recommendation and bookmark recommendation, are most naturally thought of as one-class collaborative filtering (OCCF) problems. In these problems, the training data usually consist simply of binary data reflecting a users action or inaction, such as page visitation in the case of news item recommendation or webpage bookmarking in the bookmarking scenario. Usually this kind of data are extremely sparse (a small fraction are positive examples), therefore ambiguity arises in the interpretation of the non-positive examples. Negative examples and unlabeled positive examples are mixed together and we are typically unable to distinguish them. For example, we cannot really attribute a user not bookmarking a page to a lack of interest or lack of awareness of the page. Previous research addressing this one-class problem only considered it as a classification task. In this paper, we consider the one-class problem under the CF setting. We propose two frameworks to tackle OCCF. One is based on weighted low rank approximation; the other is based on negative example sampling. The experimental results show that our approaches significantly outperform the baselines.

Sigkdd Explorations | 2010

Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement

George Forman; Martin B. Scholz

Cross-validation is a mainstay for measuring performance and progress in machine learning. There are subtle differences in how exactly to compute accuracy, F-measure and Area Under the ROC Curve (AUC) in cross-validation studies. However, these details are not discussed in the literature, and incompatible methods are used by various papers and software packages. This leads to inconsistency across the research literature. Anomalies in performance calculations for particular folds and situations go undiscovered when they are buried in aggregated results over many folds and datasets, without ever a person looking at the intermediate performance measurements. This research note clarifies and illustrates the differences, and it provides guidance for how best to measure classification performance under cross-validation. In particular, there are several divergent methods used for computing F-measure, which is often recommended as a performance measure under class imbalance, e.g., for text classification domains and in one-vs.-all reductions of datasets having many classes. We show by experiment that all but one of these computation methods leads to biased measurements, especially under high class imbalance. This paper is of particular interest to those designing machine learning software libraries and researchers focused on high class imbalance.

knowledge discovery and data mining | 2009

Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering

Rong Pan; Martin B. Scholz

One-Class Collaborative Filtering (OCCF) is a task that naturally emerges in recommender system settings. Typical characteristics include: Only positive examples can be observed, classes are highly imbalanced, and the vast majority of data points are missing. The idea of introducing weights for missing parts of a matrix has recently been shown to help in OCCF. While existing weighting approaches mitigate the first two problems above, a sparsity preserving solution that would allow to efficiently utilize data sets with e.g., hundred thousands of users and items has not yet been reported. In this paper, we study three different collaborative filtering frameworks: Low-rank matrix approximation, probabilistic latent semantic analysis, and maximum-margin matrix factorization. We propose two novel algorithms for large-scale OCCF that allow to weight the unknowns. Our experimental results demonstrate their effectiveness and efficiency on different problems, including the Netflix Prize data.

knowledge discovery and data mining | 2009

Feature shaping for linear SVM classifiers

George Forman; Martin B. Scholz; Shyam Sundar Rajaram

Linear classifiers have been shown to be effective for many discrimination tasks. Irrespective of the learning algorithm itself, the final classifier has a weight to multiply by each feature. This suggests that ideally each input feature should be linearly correlated with the target variable (or anti-correlated), whereas raw features may be highly non-linear. In this paper, we attempt to re-shape each input feature so that it is appropriate to use with a linear weight and to scale the different features in proportion to their predictive value. We demonstrate that this pre-processing is beneficial for linear SVM classifiers on a large benchmark of text classification tasks as well as UCI datasets.

conference on information and knowledge management | 2008

Data weaving: scaling up the state-of-the-art in data clustering

Ron Bekkerman; Martin B. Scholz

The enormous amount and dimensionality of data processed by modern data mining tools require effective, scalable unsupervised learning techniques. Unfortunately, the majority of previously proposed clustering algorithms are either effective or scalable. This paper is concerned with information-theoretic clustering (ITC) that has historically been considered the state-of-the-art in clustering multi-dimensional data. Most existing ITC methods are computationally expensive and not easily scalable. Those few ITC methods that scale well (using, e.g., parallelization) are often outperformed by the others, of an inherently sequential nature. First, we justify this observation theoretically. We then propose data weaving - a novel method for parallelizing sequential clustering algorithms. Data weaving is intrinsically multi-modal - it allows simultaneous clustering of a few types of data (modalities). Finally, we use data weaving to parallelize multi-modal ITC, which results in proposing a powerful DataLoom algorithm. In our experimentation with small datasets, DataLoom shows practically identical performance compared to expensive sequential alternatives. On large datasets, however, DataLoom demonstrates significant gains over other parallel clustering methods. To illustrate the scalability, we simultaneously clustered rows and columns of a contingency table with over 120 billion entries.

knowledge discovery and data mining | 2009

Improving clustering stability with combinatorial MRFs

Ron Bekkerman; Martin B. Scholz; Krishnamurthy Viswanathan

As clustering methods are often sensitive to parameter tuning, obtaining stability in clustering results is an important task. In this work, we aim at improving clustering stability by attempting to diminish the influence of algorithmic inconsistencies and enhance the signal that comes from the data. We propose a mechanism that takes m clusterings as input and outputs m clusterings of comparable quality, which are in higher agreement with each other. We call our method the Clustering Agreement Process (CAP). To preserve the clustering quality, CAP uses the same optimization procedure as used in clustering. In particular, we study the stability problem of randomized clustering methods (which usually produce different results at each run). We focus on methods that are based on inference in a combinatorial Markov Random Field (or Comraf, for short) of a simple topology. We instantiate CAP as inference within a more complex, bipartite Comraf. We test the resulting system on four datasets, three of which are medium-sized text collections, while the fourth is a large-scale user/movie dataset. First, in all the four cases, our system significantly improves the clustering stability measured in terms of the macro-averaged Jaccard index. Second, in all the four cases our system managed to significantly improve clustering quality as well, achieving the state-of-the-art results. Third, our system significantly improves stability of consensus clustering built on top of the randomized clustering solutions.

european conference on machine learning | 2008

Client-Friendly Classification over Random Hyperplane Hashes

Shyam Sundar Rajaram; Martin B. Scholz

In this work, we introduce a powerful and general feature representation based on a locality sensitive hash scheme called random hyperplane hashing. We are addressing the problem of centrally learning (linear) classification models from data that is distributed on a number of clients, and subsequently deploying these models on the same clients. Our main goal is to balance the accuracy of individual classifiers and different kinds of costs related to their deployment, including communication costs and computational complexity. We hence systematically study how well schemes for sparse high-dimensional data adapt to the much denser representations gained by random hyperplane hashing, how much data has to be transmitted to preserve enough of the semantics of each document, and how the representations affect the overall computational complexity. This paper provides theoretical results in the form of error bounds and margin based bounds to analyze the performance of classifiers learnt over the hash-based representation. We also present empirical evidence to illustrate the attractive properties of random hyperplane hashing over the conventional baseline representation of bag of words with and without feature selection.

web intelligence | 2008

Leveraging Web 2.0 Sources for Web Content Classification

Somnath Banerjee; Martin B. Scholz

This paper addresses practical aspects of Web page classification not captured by the classical text mining framework. Classifiers are supposed to perform well on a broad variety of pages. We argue that constructing training corpora is a bottleneck for building such classifiers, and that care has to be taken if the goal is to generalize to previously unseen kinds of pages on the Web. We study techniques for building training corpora automatically from publicly available Web resources, quantify the discrepancy between them, and demonstrate that encouraging agreement between classifiers given such diverse sources drastically outperforms methods that ignore the different natures of data sources on the Web.

Archive | 2008