Daniel Joseph Dean | Researchain

Publication

Featured research published by Daniel Joseph Dean.

international conference on autonomic computing | 2012

UBL: unsupervised behavior learning for predicting performance anomalies in virtualized cloud systems

Daniel Joseph Dean; Hiep Nguyen; Xiaohui Gu

Infrastructure-as-a-Service (IaaS) clouds are prone to performance anomalies due to their complex nature. Although previous work has shown the effectiveness of using statistical learning to detect performance anomalies, existing schemes often assume labelled training data, which requires significant human effort and can only handle previously known anomalies. We present an Unsupervised Behavior Learning (UBL) system for IaaS cloud computing infrastructures. UBL leverages Self-Organizing Maps to capture emergent system behaviors and predict unknown anomalies. For scalability, UBL uses residual resources in the cloud infrastructure for behavior learning and anomaly prediction with little add-on cost. We have implemented a prototype of the UBL system on top of the Xen platform and conducted extensive experiments using a range of distributed systems. Our results show that UBL can predict performance anomalies with high accuracy and achieve sufficient lead time for automatic anomaly prevention. UBL supports large-scale infrastructure-wide behavior learning with negligible overhead.
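As a rough illustration of how a Self-Organizing Map can flag unfamiliar behavior, the sketch below trains a tiny SOM on samples of normal behavior and scores new samples by their distance to the closest neuron. It is not the UBL implementation: the two-metric feature vectors, the grid size, the decay schedule, and the use of quantization error as the anomaly score are all assumptions made for this example, and the function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: euclidean(weights[pos], x))


def train_som(samples, rows=4, cols=4, epochs=30, lr0=0.5, radius0=2.0, seed=0):
    """Train a small SOM on normal-behavior samples (all parameters are assumed)."""
    rng = random.Random(seed)
    # Start every neuron at a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(samples))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                    # learning rate shrinks each epoch
        radius = max(radius0 * decay, 0.5)  # so does the neighborhood radius
        for x in rng.sample(samples, len(samples)):
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                grid_dist = euclidean(pos, bmu)
                if grid_dist <= radius:
                    # Pull the neuron toward x, more strongly near the BMU.
                    influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                    for i in range(len(w)):
                        w[i] += lr * influence * (x[i] - w[i])
    return weights


def anomaly_score(weights, x):
    """Quantization error: distance from x to its best matching neuron."""
    return euclidean(weights[best_matching_unit(weights, x)], x)


# Hypothetical normalized metrics (for example CPU and memory utilization).
data_rng = random.Random(1)
normal = [(data_rng.uniform(0.1, 0.3), data_rng.uniform(0.4, 0.6))
          for _ in range(200)]
som = train_som(normal)
print(anomaly_score(som, (0.2, 0.5)))    # resembles the training data
print(anomaly_score(som, (0.95, 0.05)))  # far from anything the map learned
```

A sample whose score exceeds a threshold chosen from the training data could then be treated as a candidate anomaly. No labelled anomalies are needed for training, which is the property the abstract emphasizes.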

symposium on cloud computing | 2014

PerfScope: Practical Online Server Performance Bug Inference in Production Cloud Computing Infrastructures

Daniel Joseph Dean; Hiep Nguyen; Xiaohui Gu; Hui Zhang; Junghwan Rhee; Nipun Arora; Geoff Jiang

Performance bugs which manifest in a production cloud computing infrastructure are notoriously difficult to diagnose because of both the difficulty of reproducing those bugs and the lack of debugging information. In this paper, we present PerfScope, a practical online performance bug inference tool to help the developer understand how a performance bug happened during the production run. PerfScope achieves online bug inference to obviate the need for offline bug reproduction. PerfScope does not require application source code or any runtime instrumentation to the production system. PerfScope is application-agnostic, which can support both interpreted and compiled programs running inside a cloud infrastructure. We have implemented PerfScope and tested it using real performance bugs on seven popular open source server systems (Hadoop, HDFS, Cassandra, Tomcat, Apache, Lighttpd, MySQL). The results show that PerfScope can narrow down the search scope of the bug-related functions to a small percentage (0.03-2.3%) and rank the real bug-related functions within top five candidates in the majority of cases. PerfScope only imposes on average 1.8% runtime overhead to the tested server applications.

conference on data and application security and privacy | 2014

PREC: practical root exploit containment for android devices

Tsung-Hsuan Ho; Daniel Joseph Dean; Xiaohui Gu; William Enck

Application markets such as the Google Play Store and the Apple App Store have become the de facto method of distributing software to mobile devices. While official markets dedicate significant resources to detecting malware, state-of-the-art malware detection can be easily circumvented using logic bombs or checks for an emulated environment. We present a Practical Root Exploit Containment (PREC) framework that protects users from such conditional malicious behavior. PREC can dynamically identify system calls from high-risk components (e.g., third-party native libraries) and execute those system calls within isolated threads. Hence, PREC can detect and stop root exploits with high accuracy while imposing low interference to benign applications. We have implemented PREC and evaluated our methodology on 140 most popular benign applications and 10 root exploit malicious applications. Our results show that PREC can successfully detect and stop all the tested malware while reducing the false alarm rates by more than one order of magnitude over traditional malware detection algorithms. PREC is light-weight, which makes it practical for runtime on-device root exploit detection and containment.

IEEE Transactions on Parallel and Distributed Systems | 2014

Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds

Juan Du; Daniel Joseph Dean; Yongmin Tan; Xiaohui Gu; Ting Yu

Software-as-a-service (SaaS) cloud systems enable application service providers to deliver their applications via massive cloud computing infrastructures. However, due to their sharing nature, SaaS clouds are vulnerable to malicious attacks. In this paper, we present IntTest, a scalable and effective service integrity attestation framework for SaaS clouds. IntTest provides a novel integrated attestation graph analysis scheme that can provide stronger attacker pinpointing power than previous schemes. Moreover, IntTest can automatically enhance result quality by replacing bad results produced by malicious attackers with good results produced by benign service providers. We have implemented a prototype of the IntTest system and tested it on a production cloud computing infrastructure using IBM System S stream processing applications. Our experimental results show that IntTest can achieve higher attacker pinpointing accuracy than existing approaches. IntTest does not require any special hardware or secure kernel support and imposes little performance impact to the application, which makes it practical for large-scale cloud systems.

IEEE Transactions on Parallel and Distributed Systems | 2016

PerfCompass: Online Performance Anomaly Fault Localization and Inference in Infrastructure-as-a-Service Clouds

Daniel Joseph Dean; Hiep Nguyen; Peipei Wang; Xiaohui Gu; Anca Sailer; Andrzej Kochut

Infrastructure-as-a-service clouds are becoming widely adopted. However, resource sharing and multi-tenancy have made performance anomalies a top concern for users. Debugging those anomalies in a timely manner is paramount for minimizing the performance penalty for users. Unfortunately, this debugging often takes a long time due to the inherent complexity and sharing nature of cloud infrastructures. When an application experiences a performance anomaly, it is important to distinguish between faults with a global impact and faults with a local impact, as the diagnosis and recovery steps for faults with a global impact or local impact are quite different. In this paper, we present PerfCompass, an online performance anomaly fault debugging tool that can quantify whether a production-run performance anomaly has a global impact or local impact. PerfCompass can use this information to suggest the root cause as either an external fault (e.g., environment-based) or an internal fault (e.g., software bugs). Furthermore, PerfCompass can identify top affected system calls to provide useful diagnostic hints for detailed performance debugging. PerfCompass does not require source code or runtime application instrumentation, which makes it practical for production systems. We have tested PerfCompass by running five common open source systems (e.g., Apache, MySQL, Tomcat, Hadoop, Cassandra) inside a virtualized cloud testbed. Our experiments use a range of common infrastructure sharing issues and real software bugs. The results show that PerfCompass accurately classifies 23 out of the 24 tested cases without calibration and achieves 100 percent accuracy with calibration. PerfCompass provides useful diagnosis hints within several minutes and imposes negligible runtime overhead to the production system during normal execution time.
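The global-versus-local distinction can be illustrated with a small sketch that measures what fraction of an application's threads show slowed system calls. This is not PerfCompass's actual metric: the per-thread timing dictionaries, the 2x slowdown ratio, the 0.9 cutoff, and all function names are hypothetical choices made for this example.

```python
def impact_fraction(baseline, observed, slowdown=2.0):
    """Fraction of threads with at least one syscall slowed by `slowdown`x.

    baseline, observed: {thread_id: {syscall_name: mean_time_ms}}
    """
    if not observed:
        return 0.0
    affected = 0
    for tid, calls in observed.items():
        base = baseline.get(tid, {})
        if any(base.get(name, 0) > 0 and t / base[name] >= slowdown
               for name, t in calls.items()):
            affected += 1
    return affected / len(observed)


def classify_impact(fraction, global_cutoff=0.9):
    """Label the anomaly 'global' when nearly all threads are affected."""
    return "global" if fraction >= global_cutoff else "local"


def top_affected_syscalls(baseline, observed, k=3):
    """Rank syscalls by their worst slowdown ratio across threads."""
    worst = {}
    for tid, calls in observed.items():
        base = baseline.get(tid, {})
        for name, t in calls.items():
            if base.get(name, 0) > 0:
                worst[name] = max(worst.get(name, 0.0), t / base[name])
    return sorted(worst, key=worst.get, reverse=True)[:k]


# Hypothetical timings: four threads, three syscalls each.
baseline = {t: {"read": 1.0, "write": 2.0, "futex": 0.5} for t in range(4)}
observed = {t: dict(calls) for t, calls in baseline.items()}
observed[0]["futex"] = 4.0  # only thread 0 slows down

fraction = impact_fraction(baseline, observed)
print(fraction, classify_impact(fraction))
print(top_affected_syscalls(baseline, observed, k=1))
```

Under these assumptions, a slowdown confined to one thread is labelled local, while a slowdown seen across all threads would be labelled global. The ranked syscall list stands in for the diagnostic hints the abstract mentions.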

international conference on autonomic computing | 2015

Automatic Server Hang Bug Diagnosis: Feasible Reality or Pipe Dream?

Daniel Joseph Dean; Peipei Wang; Xiaohui Gu; William Enck; Guoliang Jin

It is notoriously difficult to diagnose server hang bugs as they often generate little diagnostic information and are difficult to reproduce offline. In this paper, we present a characteristic study of 177 real software hang bugs from 8 common open source server systems (i.e., Apache, Lighttpd, MySQL, Squid, HDFS, Hadoop MapReduce, Tomcat, Cassandra). We identify three major root cause categories (i.e., programmer errors, mishandled values, concurrency issues). We then describe two major problems (i.e., false positives and false negatives) while applying existing rule-based bug detection techniques to those bugs.

ieee international conference on cloud engineering | 2015

Understanding Real World Data Corruptions in Cloud Systems

Peipei Wang; Daniel Joseph Dean; Xiaohui Gu

Big data processing is one of the killer applications for cloud systems. MapReduce systems such as Hadoop are the most popular big data processing platforms used in the cloud system. Data corruption is one of the most critical problems in cloud data processing, which not only has serious impact on the integrity of individual application results but also affects the performance and availability of the whole data processing system. In this paper, we present a comprehensive study on 138 real world data corruption incidents reported in Hadoop bug repositories. We characterize those data corruption problems in four aspects: 1) what impact can data corruption have on the application and system? 2) how is data corruption detected? 3) what are the causes of the data corruption? and 4) what problems can occur while attempting to handle data corruption? Our study has made the following findings: 1) the impact of data corruption is not limited to data integrity, 2) existing data corruption detection schemes are quite insufficient: only 25% of data corruption problems are correctly reported, 42% are silent data corruption without any error message, and 21% receive imprecise error reports. We also found the detection system raised 12% false alarms, 3) there are various causes of data corruption such as improper runtime checking, race conditions, inconsistent block states, improper network failure handling, and improper node crash handling, and 4) existing data corruption handling mechanisms (i.e., data replication, replica deletion, simple re-execution) make frequent mistakes including replicating corrupted data blocks, deleting uncorrupted data blocks, or causing undesirable resource hogging.

world congress on services | 2017

Engineering Scalable, Secure, Multi-Tenant Cloud for Healthcare Data

Daniel Joseph Dean; Rohit Ranchal; Yu Gu; Anca Sailer; Shakil Khan; Kirk A. Beaty; Senthil Bakthavachalam; Yichong Yu; Yaoping Ruan; Paul Bastide

Cloud-based analytics allow for inexpensive processing of large amount of data. However, processing protected health information (PHI) in cloud is a challenging task due to strict regulations (e.g., HIPAA) requiring features (e.g., data isolation) which most cloud-based platforms do not currently support in their offerings. This makes it difficult to leverage many technologies well suited to the cloud (e.g., Apache Spark) to process PHI. To address this issue, we have developed the Watson Health Cloud (WHC), a cloud-based platform for the storage and analysis of large amount of PHI. The WHC enables all the features necessary to store and process PHI, with little customization needed by the end-user. This paper describes the lessons learned from developing a cloud platform for PHI. Specifically, we discuss the architecture and implementation challenges we faced throughout development. We hope the insights gained from our experiences help others when designing frameworks and applications which process PHI.

symposium on cloud computing | 2017

Hytrace: a hybrid approach to performance bug diagnosis in production cloud infrastructures

Ting Dai; Daniel Joseph Dean; Peipei Wang; Xiaohui Gu; Shan Lu

Server applications running inside production cloud infrastructures are prone to various performance problems (e.g., software hang, performance slowdown). When those problems occur, developers often have little clue to diagnose those problems. In this paper, we present Hytrace, a novel hybrid approach to diagnosing performance problems in production cloud infrastructures. Hytrace combines rule-based static analysis and runtime inference techniques to achieve higher bug localization accuracy than pure-static and pure-dynamic approaches for performance bugs. Hytrace does not require source code and can be applied to both compiled and interpreted programs such as C/C++ and Java. We conduct experiments using real performance bugs from seven commonly used server applications in production cloud infrastructures. The results show that our approach can significantly improve the performance bug diagnosis accuracy compared to existing diagnosis techniques.

ieee international conference on cloud engineering | 2017

Agile Composition of Compliant Data Analytics Platforms

Michael Le; K. R. Jayaram; Yaron Weinsberg; Daniel Joseph Dean; Shu Tao

Sensitive data such as health records and financial transactions are increasingly being stored and processed in the cloud. Correspondingly, laws and regulations have been established to protect such data. For a cloud-based analytics service provider, it is of paramount importance to protect the sensitive information contained in customer data, while running analytics on it. While there exist a plethora of technologies to safeguard data, regulatory rules are not always defined in clear technical terms, and different regulations may impose different (or sometimes conflicting) rules on the analytics platform. Therefore, it remains a challenge in developing a platform that can support various security and compliance-enabling mechanisms, in an agile fashion, to reduce maintenance effort as well as improving scalability and performance. To address this challenge, we introduce the design and implementation of a cloud-based middleware platform that supports on-demand composition and configuration of security mechanisms to ease regulatory compliance enablement. We discuss at length our experiences and lessons learned from using our platform to deploy secure analytics systems at IBM and highlight the benefits of our approach by discussing the performance impact and trade-offs of different security mechanisms with respect to regulatory compliance.
