Carla E. Brodley | Researchain

Publication

Featured research published by Carla E. Brodley.

Remote Sensing of Environment | 1997

Decision tree classification of land cover from remotely sensed data

Mark A. Friedl; Carla E. Brodley

Decision tree classification algorithms have significant potential for land cover mapping problems and have not been tested in detail by the remote sensing community relative to more conventional pattern recognition techniques such as maximum likelihood classification. In this paper, we present several types of decision tree classification algorithms and evaluate them on three different remote sensing data sets. The decision tree classification algorithms tested include a univariate decision tree, a multivariate decision tree, and a hybrid decision tree capable of including several different types of classification algorithms within a single decision tree structure. Classification accuracies produced by each of these decision tree algorithms are compared with both maximum likelihood and linear discriminant function classifiers. Results from this analysis show that the decision tree algorithms consistently outperform the maximum likelihood and linear discriminant function classifiers in regard to classification accuracy. In particular, the hybrid tree consistently produced the highest classification accuracies for the data sets tested. More generally, the results from this work show that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure. Further, decision tree algorithms are strictly nonparametric and, therefore, make no assumptions regarding the distribution of input data, and are flexible and robust with respect to nonlinear and noisy relations among input features and class labels.

Journal of Artificial Intelligence Research | 1999

Identifying mislabeled training data

Carla E. Brodley; Mark A. Friedl

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30%. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.
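The difference between the majority vote and consensus filters described above can be illustrated with a minimal sketch. It assumes each filter classifier has already produced a held-out prediction for every training instance (for example via cross-validation); the function name `flag_mislabeled` and the data layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def flag_mislabeled(
    predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    mode: str = "majority",
) -> List[int]:
    """Return indices of training instances flagged as likely mislabeled.

    predictions[k][i] is the label filter classifier k assigns to instance i
    (assumed to be a held-out prediction); labels[i] is the given label.
    """
    if mode not in ("majority", "consensus"):
        raise ValueError("mode must be 'majority' or 'consensus'")
    n_filters = len(predictions)
    flagged = []
    for i, given in enumerate(labels):
        # Count the filter classifiers that disagree with the given label.
        errors = sum(1 for preds in predictions if preds[i] != given)
        if mode == "consensus":
            is_noise = errors == n_filters      # every filter must disagree
        else:
            is_noise = errors > n_filters / 2   # more than half must disagree
        if is_noise:
            flagged.append(i)
    return flagged


labels = ["a", "b", "a", "b"]
predictions = [
    ["a", "a", "b", "b"],
    ["a", "a", "b", "b"],
    ["a", "a", "a", "b"],
]
majority_flags = flag_mislabeled(predictions, labels, "majority")
consensus_flags = flag_mislabeled(predictions, labels, "consensus")
```

In this toy input, instance 2 is rejected by two of the three filters, so only the majority filter discards it. This mirrors the trade-off stated in the abstract: the consensus filter keeps more data, the majority filter removes more suspected noise.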

International Conference on Machine Learning | 2004

Solving cluster ensemble problems by bipartite graph partitioning

Xiaoli Z. Fern; Carla E. Brodley

A critical problem in cluster ensemble research is how to combine multiple clusterings to yield a final superior clustering result. Leveraging advanced graph partitioning techniques, we solve this problem by reducing it to a graph partitioning problem. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both instances and clusters of the ensemble simultaneously as vertices in the graph. Our approach retains all of the information provided by a given ensemble, allowing the similarity among instances and the similarity among clusters to be considered collectively in forming the final clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance in comparison to its competitors.
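As a rough sketch of the reduction described above, the code below builds a bipartite graph whose vertices are the instances on one side and the clusters of every base clustering on the other, with an edge wherever an instance belongs to a cluster. The function name and the edge-list representation are assumptions made for illustration, and the graph partitioning step itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple

ClusterKey = Tuple[int, int]  # (index of base clustering, cluster id within it)


def build_bipartite_graph(
    ensemble: Sequence[Sequence[int]],
) -> Tuple[List[ClusterKey], List[Tuple[int, int]]]:
    """Build instance-to-cluster edges from a cluster ensemble.

    ensemble[r][i] is the cluster id of instance i in base clustering r.
    Returns the list of cluster vertices and a list of edges
    (instance index, cluster vertex index).
    """
    n_instances = len(ensemble[0])
    cluster_vertices: List[ClusterKey] = []
    vertex_index: Dict[ClusterKey, int] = {}
    edges: List[Tuple[int, int]] = []
    for r, assignment in enumerate(ensemble):
        if len(assignment) != n_instances:
            raise ValueError("every clustering must label every instance")
        for i, cluster_id in enumerate(assignment):
            key = (r, cluster_id)
            if key not in vertex_index:
                # First time this cluster is seen: give it its own vertex.
                vertex_index[key] = len(cluster_vertices)
                cluster_vertices.append(key)
            edges.append((i, vertex_index[key]))
    return cluster_vertices, edges


# Two base clusterings of three instances.
clusters, edges = build_bipartite_graph([[0, 0, 1], [0, 1, 1]])
```

Every instance contributes exactly one edge per base clustering, so no membership information from the ensemble is dropped when the graph is built.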

ACM Transactions on Information and System Security | 1999

Temporal sequence learning and data reduction for anomaly detection

Terran Lane; Carla E. Brodley

The anomaly-detection problem can be formulated as one of learning to characterize the behaviors of an individual, system, or network in terms of temporal sequences of discrete data. We present an approach on the basis of instance-based learning (IBL) techniques. To cast the anomaly-detection task in an IBL framework, we employ an approach that transforms temporal sequences of discrete, unordered observations into a metric space via a similarity measure that encodes intra-attribute dependencies. Classification boundaries are selected from an a posteriori characterization of valid user behaviors, coupled with a domain heuristic. An empirical evaluation of the approach on user command data demonstrates that we can accurately differentiate the profiled user from alternative users when the available features encode sufficient information. Furthermore, we demonstrate that the system detects anomalous conditions quickly, an important quality for reducing potential damage by a malicious user. We present several techniques for reducing data storage requirements of the user profile, including instance-selection methods and clustering. An empirical evaluation shows that a new greedy clustering algorithm reduces the size of the user model by 70%, with only a small loss in accuracy.

Computer Vision and Image Understanding | 1999

ASSERT: a physician-in-the-loop content-based retrieval system for HRCT image databases

Chi-Ren Shyu; Carla E. Brodley; Avinash C. Kak; Akio Kosaka; Alex M. Aisen; Lynn S. Broderick

It is now recognized in many domains that content-based image retrieval from a database of images cannot be carried out by using completely automated approaches. One such domain is medical radiology for which the clinically useful information in an image typically consists of gray level variations in highly localized regions of the image. Currently, it is not possible to extract these regions by automatic image segmentation techniques. To address this problem, we have implemented a human-in-the-loop (a physician-in-the-loop, more specifically) approach in which the human delineates the pathology bearing regions (PBR) and a set of anatomical landmarks in the image when the image is entered into the database. To the regions thus marked, our approach applies low-level computer vision and image processing algorithms to extract attributes related to the variations in gray scale, texture, shape, etc. In addition, the system records attributes that capture relational information such as the position of a PBR with respect to certain anatomical landmarks. An overall multidimensional index is assigned to each image based on these attribute values.

Computer and Communications Security | 2004

IP covert timing channels: design and detection

Serdar Cabuk; Carla E. Brodley; Clay Shields

A network covert channel is a mechanism that can be used to leak information across a network in violation of a security policy and in a manner that can be difficult to detect. In this paper, we describe our implementation of a covert network timing channel, discuss the subtle issues that arose in its design, and present performance data for the channel. We then use our implementation as the basis for our experiments in its detection. We show that the regularity of a timing channel can be used to differentiate it from other traffic and present two methods of doing so and measures of their efficiency. We also investigate mechanisms that attackers might use to disrupt the regularity of the timing channel, and demonstrate methods of detection that are effective against them.
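To make the idea of "regularity" concrete, here is one plausible statistic, offered only as an illustrative sketch and not necessarily either of the two measures used by the authors. It splits the packet inter-arrival times into windows, computes the standard deviation within each window, and then measures how much those standard deviations vary relative to one another. The function name, the window size, and the handling of zero-variance windows are all assumptions.

```python
from statistics import pstdev
from typing import List, Sequence


def regularity(inter_arrival_times: Sequence[float], window: int) -> float:
    """Spread of pairwise relative differences between per-window std devs.

    Values near zero mean the timing variance barely changes over time,
    which is what a simple, fixed-delay timing channel would produce.
    """
    if window < 2:
        raise ValueError("window must contain at least two samples")
    n_windows = len(inter_arrival_times) // window  # trailing partial window is dropped
    if n_windows < 3:
        raise ValueError("need at least three full windows")
    sigmas: List[float] = [
        pstdev(inter_arrival_times[w * window:(w + 1) * window])
        for w in range(n_windows)
    ]
    ratios: List[float] = []
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            if sigmas[i] > 0:  # convention chosen here: skip zero-variance windows
                ratios.append(abs(sigmas[i] - sigmas[j]) / sigmas[i])
    if not ratios:
        return 0.0  # all usable windows had zero variance: treated as perfectly regular
    return pstdev(ratios)


# Synthetic traces (hypothetical data, in seconds).
channel_like = [0.1, 0.3] * 50
bursty = [0.1, 0.3] * 10 + [0.1, 1.0] * 10 + [0.1, 5.0] * 10
channel_score = regularity(channel_like, 20)
bursty_score = regularity(bursty, 20)
```

A detector built on such a statistic would flag flows whose score falls below a threshold. An attacker who deliberately perturbs the delays would raise the score, which is the evasion scenario the abstract goes on to discuss.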

European Conference on Machine Learning | 1998

Pruning decision trees with misclassification costs

Jeffrey P. Bradford; Clayton Kunz; Ron Kohavi; Clifford Brunk; Carla E. Brodley

We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical comparison of these methods and evaluate them with respect to loss. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of ten. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices.
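The role of the Laplace correction at a leaf can be sketched as follows. The code uses the standard form of the correction, (n_c + 1) / (N + k) for k classes, and then picks the label with the lowest expected loss under a loss matrix. The function names and the `loss[true][predicted]` layout are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Sequence


def laplace_probabilities(counts: Sequence[int]) -> List[float]:
    """Laplace-corrected class probabilities from the class counts at a leaf."""
    total = sum(counts)
    k = len(counts)
    return [(c + 1) / (total + k) for c in counts]


def min_loss_label(counts: Sequence[int], loss: Sequence[Sequence[float]]) -> int:
    """Label with the smallest expected loss; loss[t][p] is the cost of predicting p when the truth is t."""
    probs = laplace_probabilities(counts)
    k = len(counts)
    expected = [
        sum(probs[t] * loss[t][p] for t in range(k))
        for p in range(k)
    ]
    return min(range(k), key=lambda p: expected[p])


leaf_counts = [3, 0]  # three training instances of class 0, none of class 1
probs = laplace_probabilities(leaf_counts)
symmetric_choice = min_loss_label(leaf_counts, [[0, 1], [1, 0]])
costly_miss_choice = min_loss_label(leaf_counts, [[0, 1], [10, 0]])
```

Without the correction, the leaf would assign probability zero to class 1 and could never predict it. With the correction, class 1 receives probability 0.2, which is enough to change the prediction once missing class 1 is made ten times as costly.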

High Performance Distributed Computing | 1999

Predictive application-performance modeling in a computational grid environment

Nirav H. Kapadia; José A. B. Fortes; Carla E. Brodley

This paper describes and evaluates the application of three local learning algorithms (nearest-neighbor, weighted-average, and locally-weighted polynomial regression) for the prediction of run-specific resource-usage on the basis of run-time input parameters supplied to tools. A two-level knowledge base allows the learning algorithms to track short-term fluctuations in the performances of computing systems, and the use of instance editing techniques improves the scalability of the performance-modeling system. The learning algorithms assist PUNCH, a network-computing system at Purdue University, in emulating an ideal user in terms of its resource management and usage policies.
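A generic local-learning predictor in the spirit of the nearest-neighbor and weighted-average algorithms named above might look like the sketch below. It predicts resource usage for a new run as the inverse-distance-weighted average of the k most similar past runs. The function name, the Euclidean distance, and the weighting scheme are assumptions; the abstract does not specify these details.

```python
import math
from typing import List, Sequence, Tuple

Run = Tuple[Sequence[float], float]  # (input parameters, observed resource usage)


def predict_usage(history: Sequence[Run], query: Sequence[float], k: int = 3) -> float:
    """Inverse-distance-weighted average of the k nearest past runs."""
    if not history:
        raise ValueError("history must contain at least one past run")
    scored: List[Tuple[float, float]] = sorted(
        ((math.dist(params, query), usage) for params, usage in history),
        key=lambda pair: pair[0],
    )
    nearest = scored[:k]
    exact = [usage for distance, usage in nearest if distance == 0.0]
    if exact:
        # Identical parameters were seen before: average those observations.
        return sum(exact) / len(exact)
    weights = [1.0 / distance for distance, _ in nearest]
    weighted_sum = sum(w * usage for w, (_, usage) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical history: one input parameter, usage in CPU seconds.
history = [((1.0,), 10.0), ((3.0,), 30.0), ((100.0,), 500.0)]
interpolated = predict_usage(history, (2.0,), k=2)
seen_before = predict_usage(history, (1.0,), k=2)
```

With k = 1 this reduces to plain nearest-neighbor prediction. Larger k smooths over noisy observations at the cost of blending in less similar runs.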

Machine Learning | 1995

Recursive Automatic Bias Selection for Classifier Construction

Carla E. Brodley

The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; each is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In this article we present an approach that uses characteristics of the given data set, in the form of feedback from the learning process, to guide a search for a tree-structured hybrid classifier. Heuristic knowledge about the characteristics that indicate one bias is better than another is encoded in the rule base of the Model Class Selection (MCS) system. The approach does not assume that the entire instance space is best learned using a single representation language; for some data sets, choosing to form a hybrid classifier is a better bias, and MCS has the ability to determine these cases. The results of an empirical evaluation illustrate that MCS achieves classification accuracies equal to or higher than the best of its primitive learning components for each data set, demonstrating that the heuristic rules effectively select an appropriate learning bias.

ACM Transactions on Information and System Security | 2009

IP Covert Channel Detection

Serdar Cabuk; Carla E. Brodley; Clay Shields

A covert channel can occur when an attacker finds and exploits a shared resource that is not designed to be a communication mechanism. A network covert channel operates by altering the timing of otherwise legitimate network traffic so that the arrival times of packets encode confidential data that an attacker wants to exfiltrate from a secure area from which she has no other means of communication. In this article, we present the first public implementation of an IP covert channel, discuss the subtle issues that arose in its design, and present a discussion on its efficacy. We then show that an IP covert channel can be differentiated from legitimate channels and present new detection measures that provide detection rates over 95%. We next take the simple step an attacker would take, adding noise to the channel to attempt to conceal the covert communication. For these noisy IP covert timing channels, we show that our online detection measures can fail to identify the covert channel for noise levels higher than 10%. We then provide effective offline search mechanisms that identify the noisy channels.
