Mark Brodie | Researchain

Publication

Featured researches published by Mark Brodie.

network operations and management symposium | 2004

Real-time problem determination in distributed systems using active probing

Irina Rish; Mark Brodie; Natalia Odintsova; Sheng Ma; Genady Grabarnik

We describe algorithms and an architecture for a real-time problem determination system that uses online selection of the most informative measurements, an approach called herein active probing. Probes are end-to-end test transactions which gather information about system components. Active probing allows probes to be selected and sent on demand, in response to one's belief about the state of the system. At each step the most informative next probe is computed and sent. As probe results are received, belief about the system state is updated using probabilistic inference. This process continues until the problem is diagnosed. We demonstrate through both analysis and simulation that the active probing scheme greatly reduces both the number of probes and the time needed for localizing the problem when compared with non-active probing schemes.
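The select-probe, observe, update-belief loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes exactly one faulty component, a uniform belief over the remaining candidates, and noise-free probes that fail exactly when the faulty component lies on their path. Under those assumptions the information gained from a probe equals the entropy of its pass/fail outcome. The names `outcome_entropy`, `diagnose`, and `run_probe` are hypothetical.

```python
import math

def outcome_entropy(path, candidates):
    """Entropy (bits) of a probe's pass/fail outcome under a uniform belief."""
    p_fail = len(path & candidates) / len(candidates)
    return -sum(p * math.log2(p) for p in (p_fail, 1 - p_fail) if p > 0)

def diagnose(probes, components, run_probe):
    """Greedily send the most informative probe until one candidate remains.

    probes: dict mapping probe name -> set of components on its path.
    run_probe: callable(name) -> True if the probe succeeded.
    Assumes a single fault and deterministic probe outcomes.
    """
    candidates = set(components)
    sent = []
    while len(candidates) > 1:
        # Pick the probe whose outcome is most uncertain (ties: name order).
        name = max(sorted(probes),
                   key=lambda n: outcome_entropy(probes[n], candidates))
        if outcome_entropy(probes[name], candidates) == 0:
            break  # no probe can split the remaining candidates
        sent.append(name)
        if run_probe(name):
            candidates -= probes[name]   # fault is not on this path
        else:
            candidates &= probes[name]   # fault is on this path
    return candidates, sent

# Hypothetical example: four components, three probes, fault at "c".
probes = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"c", "d"}}
fault = "c"
result, sent = diagnose(probes, "abcd", lambda n: fault not in probes[n])
print(result, sent)
```

With noisy probes or multiple simultaneous faults, the set-based belief update would have to be replaced by genuine probabilistic inference, as the abstract indicates.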

distributed systems: operations and management | 2001

Optimizing Probe Selection for Fault Localization

Mark Brodie; Irina Rish; Sheng Ma

We investigate the use of probing technology for the purpose of problem determination and fault localization in networks. We present a framework for addressing this issue and implement algorithms that exploit interactions between probe paths to find a small collection of probes that can be used to locate faults. Small probe sets are desirable in order to minimize the costs imposed by probing, such as additional network load and data management requirements. Our results show that although finding the optimal collection of probes is expensive for large networks, efficient approximation algorithms can be used to find a nearly-optimal set.
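The abstract does not say which approximation algorithms are used, so the following is only a generic greedy heuristic for the same goal. It assumes a single faulty node and noise-free probes, and it repeatedly adds the probe that splits the nodes into the most groups with distinct pass/fail signatures. The names `signature`, `num_groups`, and `greedy_probe_set` are hypothetical.

```python
def signature(node, chosen, probes):
    """Which of the chosen probes would fail if `node` were the faulty one."""
    return tuple(node in probes[p] for p in chosen)

def num_groups(chosen, probes, nodes):
    """Number of node groups that the chosen probes can tell apart."""
    return len({signature(n, chosen, probes) for n in nodes})

def greedy_probe_set(probes, nodes):
    """Greedy sketch: add the probe that most increases distinguishability.

    probes: dict mapping probe name -> set of nodes on its path.
    Stops when every node has a unique signature or no probe helps.
    """
    chosen = []
    while num_groups(chosen, probes, nodes) < len(nodes):
        remaining = [p for p in sorted(probes) if p not in chosen]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda p: num_groups(chosen + [p], probes, nodes))
        if num_groups(chosen + [best], probes, nodes) == num_groups(chosen, probes, nodes):
            break  # remaining probes add no information
        chosen.append(best)
    return chosen

# Hypothetical example: three candidate probes over four nodes.
probes = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"a", "b", "c"}}
nodes = ["a", "b", "c", "d"]
selected = greedy_probe_set(probes, nodes)
print(selected)
```

A greedy choice like this carries no optimality guarantee in general; it only illustrates how overlapping probe paths let a few probes distinguish many fault locations.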

Journal of Network and Systems Management | 2005

Automated Problem Determination Using Call-Stack Matching

Mark Brodie; Sheng Ma; Leonid Rachevsky; Jon Champlin

We present an architecture and algorithms for performing automated software problem determination using call-stack matching. In an environment where software is used by a large user community, the same problem may re-occur many times. We show that this can be detected by matching the program call-stack against a historical database of call-stacks, so that as soon as the problem has been resolved once, future cases of the same or similar problems can be automatically resolved. This would greatly reduce the number of cases that need to be dealt with by human support analysts. We also show how a call-stack matching algorithm can be automatically learned from a small sample of call-stacks labeled by human analysts, and examine the performance of this learning algorithm on two different data sets.
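As a rough illustration of matching a new call-stack against a historical database, the sketch below scores similarity with Python's standard `difflib.SequenceMatcher` and accepts the best match above a threshold. This is a stand-in, not the learned matching algorithm from the paper; the problem identifiers, the function names in the stacks, and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher

def stack_similarity(a, b):
    """Similarity in [0, 1] between two call-stacks given as lists of frames."""
    return SequenceMatcher(None, a, b).ratio()

def best_match(stack, history, threshold=0.7):
    """Return (problem_id, score) for the closest known stack.

    history: dict mapping problem id -> previously diagnosed call-stack.
    problem_id is None when no known stack reaches the threshold.
    """
    best_id, best_score = None, 0.0
    for problem_id, known in history.items():
        score = stack_similarity(stack, known)
        if score > best_score:
            best_id, best_score = problem_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score

# Hypothetical database of two resolved problems.
history = {
    "P-1": ["main", "parse", "read_file", "fopen"],
    "P-2": ["main", "render", "draw", "malloc"],
}
new_stack = ["main", "parse", "read_file", "fread"]
print(best_match(new_stack, history))
```

A fixed threshold is the simplest possible decision rule; the paper instead learns the matching behavior from stacks labeled by analysts.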

international conference on data mining | 2006

TOP-COP: Mining TOP-K Strongly Correlated Pairs in Large Databases

Hui Xiong; Mark Brodie; Sheng Ma

Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end, we propose an alternative task: mining the top-k strongly correlated pairs. In this paper, we identify a 2-D monotone property of an upper bound of Pearson's correlation coefficient and develop an efficient algorithm, called TOP-COP, to exploit this property to effectively prune many pairs even without computing their correlation coefficients. Our experimental results show that the TOP-COP algorithm can be orders of magnitude faster than brute-force alternatives for mining the top-k strongly correlated pairs.

INFORMS Journal on Computing | 2008

Top-k ϕ Correlation Computation

Hui Xiong; Wenjun Zhou; Mark Brodie; Sheng Ma

Recently, there has been considerable interest in efficiently computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice because different data sets typically have different characteristics. To this end, in this paper, we propose an alternative task: finding the top-k strongly correlated pairs. Consequently, we identify a two-dimensional monotone property of an upper bound of the ϕ correlation coefficient and develop an efficient algorithm, called TOP-COP, to exploit this property to effectively prune many pairs even without computing their correlation coefficients. Our experimental results show that TOP-COP can be an order of magnitude faster than alternative approaches for mining the top-k strongly correlated pairs. Finally, we show that the performance of the TOP-COP algorithm is tightly related to the degree of da...
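The pruning idea can be sketched for binary transaction data. For two items with supports s_hi >= s_lo, the joint support cannot exceed s_lo, so ϕ is at most sqrt(s_lo / s_hi) * sqrt((1 - s_hi) / (1 - s_lo)), a bound that needs only the single-item supports. The sketch below uses that bound to skip pairs that cannot enter the current top k. It is a plain pairwise scan, not the two-dimensional traversal of TOP-COP, and the names `phi`, `phi_upper`, and `top_k_pairs` are hypothetical.

```python
import heapq
from itertools import combinations
from math import sqrt

def phi(s_a, s_b, s_ab):
    """Phi correlation of two binary items from their supports."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))

def phi_upper(s_a, s_b):
    """Upper bound on phi using only single-item supports."""
    hi, lo = max(s_a, s_b), min(s_a, s_b)
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))

def top_k_pairs(transactions, k):
    """Return the k most correlated pairs and how many phi values were computed."""
    n = len(transactions)
    all_items = {i for t in transactions for i in t}
    supp = {i: sum(i in t for t in transactions) / n for i in all_items}
    # Phi is undefined for items present in every transaction, so drop them.
    items = sorted(i for i in supp if 0 < supp[i] < 1)
    heap, computed = [], 0  # min-heap of (phi, pair)
    for a, b in combinations(items, 2):
        if len(heap) == k and phi_upper(supp[a], supp[b]) <= heap[0][0]:
            continue  # cannot beat the current k-th best: skip the joint count
        s_ab = sum(a in t and b in t for t in transactions) / n
        computed += 1
        entry = (phi(supp[a], supp[b], s_ab), (a, b))
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)
    return sorted(heap, reverse=True), computed

# Hypothetical toy data set of six transactions.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"d"}, {"c", "d"}]
top, computed = top_k_pairs(transactions, k=1)
print(top, computed)
```

In this toy data set, items a and b always occur together, so once that pair is found every other pair is pruned by the bound without counting its joint support.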

integrated network management | 2005

Test-based diagnosis: tree and matrix representations

Alina Beygelzimer; Mark Brodie; Sheng Ma; Irina Rish

A common problem encountered in many application scenarios is how to represent some prior knowledge about a system in order to determine its true state as efficiently as possible. The information is typically in the form of tests, or questions about the system. Each test can potentially reduce our uncertainty about the system's state. The problem is to represent the information capturing the dependence between tests, their outcomes, and possible states in an efficiently navigable way to aid diagnosis. The most common such representation is a flowchart with leaf nodes corresponding to possible states, and non-leaf nodes corresponding to tests about the state. The problem with flowcharts is that they are notoriously difficult to maintain. Additional knowledge often has to be manually integrated as the system changes, making it impossible to keep track of all possible decision paths, let alone optimize the flow to maximize performance. We propose an efficient method for optimizing an existing flowchart based on a conversion to an auxiliary matrix representation. The main goal of the paper is to show a synergy between the two representations in the hope that this will help practitioners choose a better strategy for their applications. We show that such a conversion suggests ways to improve both representations - ways that were not envisioned when using each representation alone. Finally, we show that the two representations are informationally equivalent in the sense that one can be transformed into the other so that if both are used as black-boxes, one would not be able to tell them apart, regardless of which state the system is in.

Machine Learning | 2001

Iterated Phantom Induction: A Knowledge-Based Approach to Learning Control

Mark Brodie; Gerald DeJong

We advance a knowledge-based learning method that allows prior domain knowledge to be effectively utilized by machine learning systems. The domain knowledge is incorporated not into the learning algorithm itself but instead affects only the training data. The domain knowledge is used to explain and then transform the actual training examples into a more informative set of imaginary, or “phantom” examples. These phantom examples are added to the training set; the experienced examples are discarded. A new control policy is induced from the phantom training set. This policy is then exercised, yielding additional training points, and the process repeats. We investigate the performance of this method in a stylized air-hockey domain which demands a difficult nonlinear control policy. Our experiments show that, surprisingly, an accurate policy can be learned even if the domain theory is only imprecise and approximate. We advance an interpretation which indicates that the information available from a plausible qualitative domain theory is sufficient for robust successful learning. This interpretation is used to make a number of predictions which are tested in subsequent experiments. The outcomes confirm the interpretation and the robustness of the approach.

european conference on machine learning | 2001

A Unified Framework for Evaluation Metrics in Classification Using Decision Trees

Ricardo Vilalta; Mark Brodie; Daniel A. Oblinger; Irina Rish

Most evaluation metrics in classification are designed to reward class uniformity in the example subsets induced by a feature (e.g., Information Gain). Other metrics are designed to reward discrimination power in the context of feature selection as a means to combat the feature-interaction problem (e.g., Relief, Contextual Merit). We define a new framework that combines the strengths of both kinds of metrics. Our framework enriches the available information when considering which feature to use to partition the training set. Since most metrics rely on only a small fraction of this information, this framework enlarges the space of possible metrics. Experiments on real-world domains in the context of decision-tree learning show how a simple setting for our framework compares well with standard metrics.
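For reference, Information Gain, which the abstract cites as a metric that rewards class uniformity, can be computed as below. This shows only the standard metric, not the unified framework the paper proposes; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction from partitioning the examples by a feature."""
    n = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: one feature separates the classes, one does not.
labels = ["+", "+", "-", "-"]
separating = ["x", "x", "y", "y"]
uninformative = ["x", "y", "x", "y"]
print(information_gain(separating, labels),
      information_gain(uninformative, labels))
```

The feature whose subsets are class-pure earns the full one bit of gain, while the feature whose subsets mirror the overall class mix earns none.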

IBM Systems Journal | 2002

Cross training and its application to skill mining

D. A. Oblinger; M. Reid; Mark Brodie; R. de Salvo Braz

We present an approach for cataloging an organization's skill assets based on electronic communications. Our approach trains classifiers using messages from skill-related discussion groups and then applies those classifiers to a different distribution of person-related e-mail messages. We present a general framework, called cross training, for addressing such discrepancies between the training and test distributions. We outline two instances of the general cross-training problem, develop algorithms for each, and empirically demonstrate the efficacy of our solution in the skill-mining context.

IEEE Transactions on Neural Networks | 2005