
Publications


Featured research published by Shubhomoy Das.


Knowledge Discovery and Data Mining | 2013

Systematic construction of anomaly detection benchmarks from real data

Andrew Emmott; Shubhomoy Das; Thomas G. Dietterich; Alan Fern; Weng-Keen Wong

Research in anomaly detection suffers from a lack of realistic and publicly available problem sets. This paper discusses what properties such problem sets should possess. It then introduces a methodology for transforming existing classification data sets into ground-truthed benchmark data sets for anomaly detection. The methodology produces data sets that vary along three important dimensions: (a) point difficulty, (b) relative frequency of anomalies, and (c) clusteredness. We use the generated data sets to benchmark several popular anomaly detection algorithms under a range of conditions.
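As a rough illustration of the core transformation (not the paper's full methodology, which also controls point difficulty and clusteredness), one can relabel a two-class data set so that one class forms the nominal background and a small sample of the other class supplies ground-truth anomalies; the `anomaly_rate` knob below is an illustrative stand-in for the relative-frequency dimension:

```python
import numpy as np

def make_anomaly_benchmark(X, y, nominal_class, anomaly_class,
                           anomaly_rate=0.05, seed=0):
    """Relabel a classification set as an anomaly-detection benchmark:
    one class becomes the nominal background, and a small random sample
    of a second class becomes the ground-truth anomalies."""
    rng = np.random.default_rng(seed)
    nominal = X[y == nominal_class]
    candidates = X[y == anomaly_class]
    n_anom = max(1, int(anomaly_rate * len(nominal)))
    idx = rng.choice(len(candidates), size=n_anom, replace=False)
    X_bench = np.vstack([nominal, candidates[idx]])
    labels = np.r_[np.zeros(len(nominal), int), np.ones(n_anom, int)]
    return X_bench, labels

# Two-class toy data: class 0 near the origin, class 1 shifted away.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(4, 1, (200, 2))])
y = np.r_[np.zeros(500, int), np.ones(200, int)]
Xb, yb = make_anomaly_benchmark(X, y, nominal_class=0, anomaly_class=1)
print(Xb.shape, int(yb.sum()))  # 500 nominals plus 25 anomalies
```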


IEEE Transactions on Software Engineering | 2014

You Are the Only Possible Oracle: Effective Test Selection for End Users of Interactive Machine Learning Systems

Alex Groce; Todd Kulesza; Chaoqiang Zhang; Shalini Shamasunder; Margaret M. Burnett; Weng-Keen Wong; Simone Stumpf; Shubhomoy Das; Amber Shinsel; Forrest Bice; Kevin McIntosh

How do you test a program when only a single user, with no expertise in software testing, can determine whether the program is performing correctly? Such programs are common today in the form of machine-learned classifiers. We consider the problem of testing this kind of machine-generated program when the only oracle is an end user: for example, only you can determine whether your email is properly filed. We present test selection methods that achieve very good failure-detection rates even for small test suites, and show that these methods work both in large-scale random experiments using a "gold standard" and in studies with real users. Our methods are inexpensive and largely algorithm-independent. Key to our methods is the exploitation of properties of classifiers that is not possible in traditional software testing. Our results suggest that it is plausible for time-pressured end users to interactively detect failures, even very hard-to-find failures, without wading through a large number of successful (and thus less useful) tests. We additionally show that some methods can find the arguably most difficult-to-detect faults of classifiers: cases where machine learning algorithms have high confidence in an incorrect result.
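A minimal sketch of one confidence-based flavor of test selection (the paper evaluates several selection methods; this toy version is only illustrative): ask the user to judge the instances on which the classifier is least confident, since failures tend to cluster where confidence is low:

```python
import numpy as np

def select_tests_by_confidence(proba, budget):
    """Pick the `budget` instances where the classifier is least
    confident, i.e. where the top-class probability is smallest."""
    confidence = proba.max(axis=1)
    return np.argsort(confidence)[:budget]

# Toy predicted probabilities for 6 instances over 2 classes.
proba = np.array([[0.99, 0.01],
                  [0.55, 0.45],
                  [0.80, 0.20],
                  [0.51, 0.49],
                  [0.95, 0.05],
                  [0.60, 0.40]])
print(select_tests_by_confidence(proba, budget=2))  # -> [3 1]
```

With a budget of two tests, the user is shown instances 3 and 1, the two on which the classifier is most uncertain.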


Artificial Intelligence | 2013

End-user feature labeling: Supervised and semi-supervised approaches based on locally-weighted logistic regression

Shubhomoy Das; Travis Moore; Weng-Keen Wong; Simone Stumpf; Ian Oberst; Kevin McIntosh; Margaret M. Burnett

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally-weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions, using feature labels generated by an oracle. As a further contribution, we evaluate feature labeling algorithms under real-world conditions, using feature labels harvested from actual end users in our user study. Ours is the first statistical user study of feature labeling involving a large number of end users (43 participants), none of whom had a background in machine learning. Our supervised and semi-supervised algorithms were among the best performers compared to other feature labeling algorithms in the idealized setting, and they are also robust to poor-quality feature labels provided by ordinary end users in our study. We also analyze the relative gains of incorporating the different sources of knowledge available: the labeled training set, the feature labels, and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithms behind their customized applications.
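A much-simplified sketch of how end-user feature labels can steer a classifier. Here the user's "important" features set the prior mean toward which an L2 penalty shrinks the weights of a plain (global, not locally-weighted) logistic regression; the names and parameter values are illustrative, not the paper's algorithm:

```python
import numpy as np

def fit_feature_labeled_logreg(X, y, important, prior_scale=2.0,
                               lam=0.1, lr=0.1, iters=500):
    """Binary logistic regression whose L2 penalty shrinks weights
    toward a prior mean that is positive for features the end user
    flagged as important for the positive class (zero otherwise).
    Gradient descent on the penalized negative log-likelihood."""
    n, d = X.shape
    mu = np.zeros(d)
    mu[important] = prior_scale        # prior mean from feature labels
    w = mu.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n + lam * (w - mu)
        w -= lr * grad
    return w

# Toy data: only feature 0 actually determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
w = fit_feature_labeled_logreg(X, y, important=[0])
print(w.round(2))  # weight on feature 0 dominates
```

The labeled feature ends up with by far the largest weight, so even before much training data arrives, the user's hint shapes predictions in the intended direction.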


International Conference on Data Mining | 2016

Incorporating Expert Feedback into Active Anomaly Discovery

Shubhomoy Das; Weng-Keen Wong; Thomas G. Dietterich; Alan Fern; Andrew Emmott

Unsupervised anomaly detection algorithms search for outliers and then predict that these outliers are the anomalies. When deployed, however, these algorithms are often criticized for high false positive and false negative rates. One cause of poor performance is that not all outliers are anomalies and not all anomalies are outliers. In this paper, we describe an Active Anomaly Discovery (AAD) method for incorporating expert feedback to adjust the anomaly detector so that the outliers it discovers are more in tune with the expert user's semantic understanding of the anomalies. The AAD approach is designed to operate in an interactive data exploration loop. In each iteration of this loop, our algorithm first selects a data instance to present to the expert as a potential anomaly, and the expert then labels the instance as an anomaly or as a nominal data point. Our algorithm updates its internal model with the instance label, and the loop continues until a budget of B queries is spent. The goal of our approach is to maximize the total number of true anomalies among the B instances presented to the expert. We show that, compared to other state-of-the-art algorithms, AAD is consistently one of the best performers.
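The query loop described above can be sketched as follows. This toy version combines per-detector anomaly scores with a weighted sum and makes a simple perceptron-style weight update after each expert label; the actual AAD algorithm instead solves a constrained optimization at each iteration, so everything below is an illustrative assumption:

```python
import numpy as np

def active_anomaly_loop(scores, oracle, budget, lr=0.5):
    """Toy Active-Anomaly-Discovery-style loop.  `scores` is an
    (n_instances, n_detectors) matrix of anomaly scores; the combined
    score is a weighted sum.  Each round, show the top-ranked unlabeled
    instance to the oracle, then nudge the weights toward detectors
    that agreed with the expert's label."""
    n, m = scores.shape
    w = np.full(m, 1.0 / m)                 # start with a uniform ensemble
    labeled, found = set(), 0
    for _ in range(budget):
        combined = scores @ w
        combined[list(labeled)] = -np.inf   # never re-query an instance
        i = int(np.argmax(combined))
        labeled.add(i)
        is_anomaly = oracle(i)              # expert feedback
        found += int(is_anomaly)
        # Reward detectors that scored a confirmed anomaly high,
        # penalize them when the instance turns out to be nominal.
        w += lr * (1 if is_anomaly else -1) * scores[i]
        w = np.clip(w, 0, None)
        w /= w.sum() if w.sum() > 0 else 1.0
    return w, found

# Synthetic scores: detector 0 is informative, detector 1 is pure noise.
rng = np.random.default_rng(1)
truth = np.r_[np.ones(5, int), np.zeros(95, int)]     # 5 true anomalies
scores = np.c_[truth + 0.1 * rng.normal(size=100),    # good detector
               rng.normal(size=100)]                  # noisy detector
w, found = active_anomaly_loop(scores, oracle=lambda i: bool(truth[i]),
                               budget=10)
print(found, "true anomalies surfaced in 10 queries")
```

After a few labels the weight on the noisy detector collapses, so later queries are dominated by the informative detector, which is the intended effect of the feedback loop.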


Knowledge Discovery and Data Mining | 2013

Detecting insider threats in a real corporate database of computer usage activity

Ted E. Senator; Henry G. Goldberg; Alex Memory; William T. Young; Brad Rees; Robert Pierce; Daniel Huang; Matthew Reardon; David A. Bader; Edmond Chow; Irfan A. Essa; Joshua Jones; Vinay Bettadapura; Duen Horng Chau; Oded Green; Oguz Kaya; Anita Zakrzewska; Erica Briscoe; Rudolph L. Mappus; Robert McColl; Lora Weiss; Thomas G. Dietterich; Alan Fern; Weng-Keen Wong; Shubhomoy Das; Andrew Emmott; Jed Irvine; Jay Yoon Lee; Danai Koutra; Christos Faloutsos


Intelligent User Interfaces | 2011

End-user feature labeling: a locally-weighted regression approach

Weng-Keen Wong; Ian Oberst; Shubhomoy Das; Travis Moore; Simone Stumpf; Kevin McIntosh; Margaret M. Burnett


International Symposium on End-User Development | 2011

Where are my intelligent assistant's mistakes? a systematic testing approach

Todd Kulesza; Margaret M. Burnett; Simone Stumpf; Weng-Keen Wong; Shubhomoy Das; Alex Groce; Amber Shinsel; Forrest Bice; Kevin McIntosh


Uncertainty in Artificial Intelligence | 2016

Finite sample complexity of rare pattern anomaly detection

Amran Siddiqui; Alan Fern; Thomas G. Dietterich; Shubhomoy Das


arXiv: Artificial Intelligence | 2015

A Meta-Analysis of the Anomaly Detection Problem

Andrew Emmott; Shubhomoy Das; Thomas G. Dietterich; Alan Fern; Weng-Keen Wong


arXiv: Learning | 2018

Active Anomaly Detection via Ensembles

Shubhomoy Das; Rakibul Islam; Nitthilan Kannappan Jayakodi; Janardhan Rao Doppa

Collaboration


Dive into Shubhomoy Das's collaborations.

Top Co-Authors

Alan Fern
Oregon State University

Alex Groce
Oregon State University