Moises Goldszmidt | Researchain

Publication

Featured research published by Moises Goldszmidt.

dependable systems and networks | 2005

Ensembles of models for automated diagnosis of system performance problems

Steve Zhang; Ira Cohen; Moises Goldszmidt; Julie Symons; Armando Fox

Violations of service level objectives (SLOs) in Internet services are urgent conditions requiring immediate attention. Previously we explored (I. Cohen et al., 2004) an approach for identifying which low-level system properties were correlated with high-level SLO violations (the metric attribution problem). The approach is based on automatically inducing models from data using pattern recognition and probability modeling techniques. In this paper we extend our approach to adapt to changing workloads and external disturbances by maintaining an ensemble of probabilistic models, adding new models when existing ones do not accurately capture current system behavior. Using realistic workloads on an implemented prototype system, we show that the ensemble of models captures the performance behavior of the system accurately under changing workloads and conditions. We fuse information from the models in the ensemble to identify likely causes of the performance problem, with results comparable to those produced by an oracle that continuously changes the model based on advance knowledge of the workload. The cost of inducing new models and managing the ensembles is negligible, making our approach both immediately practical and theoretically appealing.
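
The ensemble-maintenance idea in this abstract (keep the existing models, and induce a new one only when none of them fits the current data) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `ThresholdModel` is a single-metric threshold rule standing in for the paper's probabilistic models, and `induce`, `update_ensemble`, and the `min_accuracy` cutoff are hypothetical names and values, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Sample = Tuple[float, bool]  # (metric value, SLO violated?)


@dataclass
class ThresholdModel:
    """Toy stand-in for an induced model: predicts a violation when metric > cut."""
    cut: float

    def predict(self, x: float) -> bool:
        return x > self.cut

    def accuracy(self, window: Sequence[Sample]) -> float:
        hits = sum(self.predict(x) == y for x, y in window)
        return hits / len(window)


def induce(window: Sequence[Sample]) -> ThresholdModel:
    """Pick the cut, among observed values, that maximizes accuracy on the window."""
    values = sorted({x for x, _ in window})
    # Also allow a cut below every value, i.e. "always predict a violation".
    candidates = [values[0] - 1.0] + values
    return max((ThresholdModel(c) for c in candidates),
               key=lambda m: m.accuracy(window))


def update_ensemble(ensemble: List[ThresholdModel],
                    window: Sequence[Sample],
                    min_accuracy: float = 0.9) -> ThresholdModel:
    """Return the best model for the window, inducing a new one if none fits."""
    best = max(ensemble, key=lambda m: m.accuracy(window), default=None)
    if best is None or best.accuracy(window) < min_accuracy:
        best = induce(window)      # existing models do not capture this behavior
        ensemble.append(best)      # keep old models in case the workload returns
    return best


# Two made-up workloads in which violations start at different metric levels.
light = [(10, False), (20, False), (80, True), (90, True)]
heavy = [(10, False), (20, True), (30, True), (90, True)]

ensemble: List[ThresholdModel] = []
first = update_ensemble(ensemble, light)    # ensemble is empty: induce a model
second = update_ensemble(ensemble, heavy)   # first model fits poorly: induce another
third = update_ensemble(ensemble, light)    # workload returns: reuse the first model
```

In this sketch old models are never discarded, so a workload that recurs is served by a model that already exists. The sketch only selects the best-fitting model per window; it does not attempt the fusion of information across models that the abstract describes.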

european conference on principles of data mining and knowledge discovery | 2004

Properties and benefits of calibrated classifiers

Ira Cohen; Moises Goldszmidt

A calibrated classifier provides reliable estimates of the true probability that each test sample is a member of the class of interest. This is crucial in decision-making tasks. Procedures for calibration have already been studied in weather forecasting, game theory, and more recently in machine learning, with the latter showing empirically that calibration of classifiers not only helps in decision making but also improves classification accuracy. In this paper we extend the theoretical foundation of these empirical observations. We prove that (1) a well-calibrated classifier provides bounds on the Bayes error, (2) calibrating a classifier is guaranteed not to decrease classification accuracy, and (3) the procedure of calibration provides the threshold or thresholds on the decision rule that minimize the classification error. We also draw the parallels and differences between methods that use receiver operating characteristic (ROC) curves and calibration-based procedures that are aimed at finding a threshold of minimum error. In particular, calibration leads to improved performance when multiple thresholds exist.
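
As a rough illustration of what calibration does to a decision threshold, the sketch below uses histogram binning, one common calibration procedure and not necessarily the one analyzed in the paper. The function names, the bin count, and the scores and labels are all made up for the example, and the error is measured on the same data used to fit the bins, so it shows the mechanics rather than any generalization result.

```python
from typing import List, Sequence


def histogram_calibrate(scores: Sequence[float], labels: Sequence[int],
                        n_bins: int = 5) -> List[float]:
    """Map each equal-width score bin to the empirical positive rate in that bin."""
    pos = [0] * n_bins
    tot = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # scores are assumed to lie in [0, 1]
        pos[b] += y
        tot[b] += 1
    # Empty bins fall back to the bin midpoint (an arbitrary choice).
    return [pos[b] / tot[b] if tot[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrated(score: float, table: Sequence[float]) -> float:
    """Look up the calibrated probability for a raw score."""
    n = len(table)
    return table[min(int(score * n), n - 1)]


def error_rate(probs: Sequence[float], labels: Sequence[int],
               threshold: float = 0.5) -> float:
    """Fraction of samples misclassified by the rule 'positive if prob > threshold'."""
    wrong = sum((p > threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


# Made-up scores from a classifier that systematically underestimates the positive class.
scores = [0.05, 0.10, 0.15, 0.25, 0.30, 0.35, 0.45, 0.50, 0.70, 0.90]
labels = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

table = histogram_calibrate(scores, labels)
raw_error = error_rate(scores, labels)
cal_error = error_rate([calibrated(s, table) for s in scores], labels)
```

On this toy data, thresholding the raw scores at 0.5 misclassifies four of the ten samples, while thresholding the binned estimates at 0.5 misclassifies one, because each bin is assigned its majority class.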

knowledge discovery and data mining | 2005

Short term performance forecasting in enterprise systems

Rob Powers; Moises Goldszmidt; Ira Cohen

We use data mining and machine learning techniques to predict upcoming periods of high utilization or poor performance in enterprise systems. The abundant data available and the complexity of these systems defy human characterization or static models and make the task suitable for data mining techniques. We formulate the problem as one of classification: given current and past information about the system's behavior, can we forecast whether the system will meet its performance targets over the next hour? Using real data gathered from several enterprise systems in Hewlett-Packard, we compare several approaches ranging from time series to Bayesian networks. Besides establishing the predictive power of these approaches, our study analyzes three dimensions that are important for their application as a standalone tool. First, it quantifies the gain in accuracy of multivariate prediction methods over simple statistical univariate methods. Second, it quantifies the variations in accuracy when using different classes of system and workload features. Third, it establishes that models induced using combined data from various systems generalize well and are applicable to new systems, enabling accurate predictions on systems with insufficient historical data. Together, these analyses offer a promising outlook on the development of tools to automate the assignment of resources to stabilize performance (e.g., adding servers to a cluster) and to allow opportunistic job scheduling (e.g., backups or virus scans).
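
The classification framing described in this abstract can be sketched as a small dataset-construction step: past samples become features, and the label records whether the target is missed in the upcoming window. The function name `make_examples`, the use of a single utilization series, the lag count, and the horizon measured in samples are all assumptions for illustration; the abstract does not specify the features or the sampling interval.

```python
from typing import List, Sequence, Tuple


def make_examples(utilization: Sequence[float], target: float,
                  n_lags: int = 3,
                  horizon: int = 2) -> List[Tuple[List[float], bool]]:
    """Build (features, violates) pairs from one metric series.

    features: the n_lags most recent samples, up to and including time t
    violates: True if any of the next `horizon` samples exceeds `target`
    """
    examples = []
    for t in range(n_lags - 1, len(utilization) - horizon):
        features = list(utilization[t - n_lags + 1 : t + 1])
        future = utilization[t + 1 : t + 1 + horizon]
        examples.append((features, any(u > target for u in future)))
    return examples


# Made-up utilization series; 0.8 is a hypothetical performance target.
series = [0.2, 0.3, 0.4, 0.6, 0.9, 0.95, 0.5, 0.3]
examples = make_examples(series, target=0.8)
```

Any classifier can then be trained on the resulting pairs. A multivariate version would concatenate lagged values of several system and workload metrics into each feature vector.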

operating systems design and implementation | 2004