Jason Sawin | Researchain


Publication

Featured research published by Jason Sawin.

fundamental approaches to software engineering | 2005

Coverage criteria for testing of object interactions in sequence diagrams

Atanas Rountev; Scott Kagan; Jason Sawin

This work defines several control-flow coverage criteria for testing the interactions among a set of collaborating objects. The criteria are based on UML sequence diagrams that are reverse-engineered from the code under test. The sequences of messages in the diagrams are used to define the coverage goals for the family of criteria, in a manner that generalizes traditional testing techniques such as branch coverage and path coverage. We also describe a run-time analysis that gathers coverage measurements for each criterion. To compare the criteria, we propose an approach that estimates the testing effort required to satisfy each criterion, using analysis of the complexity of the underlying sequence diagrams. The criteria were investigated experimentally on a set of realistic Java components. The results of this study compare different approaches for testing of object interactions and provide insights for testers and for builders of test coverage tools.

international conference on data engineering | 2014

A tunable compression framework for bitmap indices

Gheorghi Guzun; Guadalupe Canahuate; David Chiu; Jason Sawin

Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows for the use of bitwise operations and specialized run-length compression techniques. Due to a trade-off between compression and query efficiency, bitmap compression schemes are aligned using a fixed encoding length (typically the word length) to avoid explicit decompression during query time. In general, smaller encoding lengths provide better compression, but require more decoding during query execution. However, when the difference in size is considerable, it is possible for smaller encodings to also provide better execution time. We posit that a tailored encoding length for each bit vector will provide better performance than a one-size-fits-all approach. We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment to avoid explicit decompression. Efficient algorithms are introduced to process queries over bitmaps compressed using different encoding lengths. An input parameter controls the aggressiveness of the compression, giving the user the ability to tune the trade-off between space and query time. Our empirical study shows this approach achieves significant improvements in terms of both query time and compression ratio for synthetic and real data sets. Compared to 32-bit WAH, VAL-WAH produces up to 1.8× smaller bitmaps and achieves query times that are 30% faster.
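
The abstract above relies on the fact that a bitmap index answers queries with bitwise operations. The following is a minimal, uncompressed sketch of that idea only; it is not the VAL-WAH framework. The column data and the helper names `build_bitmap_index` and `rows_of` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical two-column table with six rows.
colors = ["red", "blue", "red", "green", "blue", "red"]
sizes = ["S", "M", "M", "S", "M", "S"]
color_index = build_bitmap_index(colors)
size_index = build_bitmap_index(sizes)

# color = 'red' AND size = 'S' is a single bitwise AND of two bit vectors.
red_and_small = color_index["red"] & size_index["S"]
# color = 'red' OR color = 'green' is a single bitwise OR.
red_or_green = color_index["red"] | color_index["green"]

print(rows_of(red_and_small))
print(rows_of(red_or_green))
```

Python integers stand in for bit vectors here; a real index would store one compressed bit vector per distinct value and evaluate the same AND/OR operations directly on the compressed form.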

automated software engineering | 2009

Improving static resolution of dynamic class loading in Java using dynamically gathered environment information

Jason Sawin; Atanas Rountev

In Java software, one important flexibility mechanism is dynamic class loading. Unfortunately, the vast majority of static analyses for Java treat dynamic class loading either unsoundly or too conservatively. We present a novel semi-static approach for resolving dynamic class loading by combining static string analysis with dynamically gathered information about the execution environment. The insight behind the approach is that dynamic class loading often depends on characteristics of the environment that are encoded in various environment variables. Such variables are not static elements; however, their run-time values typically remain the same across multiple executions of the application. Thus, the string values reported by our technique are tailored to the current installation of the system under analysis. Additionally, we propose extensions of string analysis to increase the number of sites that can be resolved purely statically, and to track the names of environment variables. An experimental evaluation on the Java 1.4 standard libraries shows that a state-of-the-art purely static approach resolves only 28% of non-trivial sites, while our approach resolves 74% of such sites. We also demonstrate how the information gained from resolved dynamic class loading can be used to determine the classes that can potentially be instantiated through the use of reflection. Our extensions of string analysis greatly increase the number of resolvable reflective instantiation sites. This work is a step towards making static analysis tools better equipped to handle the dynamic features of Java.
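
The semi-static idea in the abstract above can be illustrated with a small sketch. It assumes, purely for illustration, that string analysis has already reduced a class-name expression to a list of constant parts and environment-variable lookups; the representation, the function `resolve`, the variable name `CODEC_VENDOR`, and the class names are all hypothetical and are not taken from the paper.

```python
def resolve(expr, env):
    """Resolve a class-name expression against a snapshot of the environment.

    expr is a list of ("const", text) or ("env", variable_name) parts.
    Returns the resolved class name, or None when any part is unknown.
    """
    parts = []
    for kind, value in expr:
        if kind == "const":
            parts.append(value)
        elif kind == "env" and value in env:
            parts.append(env[value])
        else:
            return None  # unresolved: the analysis must stay conservative here
    return "".join(parts)


# Hypothetical dynamic-class-loading site: "com." + <CODEC_VENDOR> + ".CodecImpl"
site = [("const", "com."), ("env", "CODEC_VENDOR"), ("const", ".CodecImpl")]

# A purely static view has no environment values, so the site stays unresolved.
print(resolve(site, {}))
# With values gathered from the current installation, the site resolves.
print(resolve(site, {"CODEC_VENDOR": "acme"}))
```

The result is only valid for the installation whose environment was sampled, which matches the abstract's point that the reported string values are tailored to the current installation.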

database and expert systems applications | 2011

Variable length compression for bitmap indices

Fabian J. Corrales; David Chiu; Jason Sawin

Modern large-scale applications are generating staggering amounts of data. In an effort to summarize and index these data sets, databases often use bitmap indices. These indices have become widely adopted due to their dual properties of (1) being able to leverage fast bit-wise operations for query processing and (2) compressibility. Today, two pervasive bitmap compression schemes employ a variation of run-length encoding, aligned over bytes (BBC) and words (WAH), respectively. While BBC typically offers high compression ratios, WAH can achieve faster query processing, but often at the cost of space. Recent work has further shown that reordering the rows of a bitmap can dramatically increase compression. However, these sorted bitmaps often display patterns of changing run-lengths that are optimal for neither a byte nor a word alignment. We present a general framework to facilitate a variable length compression scheme. Given a bitmap, our algorithm is able to use different encoding lengths for compression on a per-column basis. We further present an algorithm that efficiently processes queries when encoding lengths share a common integer factor. Our empirical study shows that in the best case our approach can out-compress BBC by 30% and WAH by 70%, for real data sets. Furthermore, we report a query processing speedup of 1.6× over BBC and 1.25× over WAH. We also show that these numbers drastically improve in our synthetic, uncorrelated data sets.
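
To make the effect of the encoding length concrete, here is a simplified aligned run-length scheme with a configurable segment length `s`. It is a sketch under stated assumptions, not the BBC, WAH, or variable length format from the paper: every encoded word is assumed to occupy `s + 1` bits (one flag bit plus either an `s`-bit literal, or a fill bit and an `s - 1`-bit run counter), and `s` should be at least 2. All function names are hypothetical.

```python
def encode(bits, s):
    """Encode a list of 0/1 values using s-bit segments.

    Returns a list of words: ("fill", bit, count) for a run of identical
    all-0 or all-1 segments, or ("lit", segment) for a mixed segment.
    """
    max_run = 2 ** (s - 1) - 1  # counter capacity of an (s + 1)-bit fill word
    padded = bits + [0] * (-len(bits) % s)  # pad the tail to a whole segment
    words = []
    for start in range(0, len(padded), s):
        seg = tuple(padded[start:start + s])
        if len(set(seg)) == 1:  # homogeneous segment: extend or start a fill
            bit = seg[0]
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == bit and last[2] < max_run:
                words[-1] = ("fill", bit, last[2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("lit", seg))
    return words


def decode(words, s, length):
    """Expand the encoded words back into the first `length` bits."""
    bits = []
    for word in words:
        if word[0] == "fill":
            bits.extend([word[1]] * (s * word[2]))
        else:
            bits.extend(word[1])
    return bits[:length]


def encoded_size_bits(words, s):
    """Size of the encoding, assuming every word occupies s + 1 bits."""
    return len(words) * (s + 1)


# Hypothetical bit vector: a long run of 0s, a short mixed region, a run of 1s.
column = [0] * 60 + [1, 0, 1] + [1] * 30
for s in (7, 31):
    words = encode(column, s)
    print(s, len(words), encoded_size_bits(words, s))
```

On this particular vector the 7-bit segments give a smaller encoding than the 31-bit segments, because the shorter segments align better with the run boundaries. Other vectors can favor the longer length, which is the motivation for choosing the encoding length per column.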

international conference on software maintenance | 2007

Automated Refactoring of Legacy Java Software to Enumerated Types

Raffi Khatchadourian; Jason Sawin; Atanas Rountev

Java 1.5 introduces several new features that offer significant improvements over older Java technology. In this paper we consider the new enum construct, which provides language support for enumerated types. Prior to Java 1.5, programmers needed to employ various patterns (e.g., the weak enum pattern) to compensate for the absence of enumerated types in Java. Unfortunately, these compensation patterns lack several highly-desirable properties of the enum construct, most notably, type safety. We present a novel fully-automated approach for transforming legacy Java code to use the new enumeration construct. This semantics-preserving approach increases type safety, produces code that is easier to comprehend, removes unnecessary complexity, and eliminates brittleness problems due to separate compilation. At the core of the proposed approach is an interprocedural type inferencing algorithm which tracks the flow of enumerated values. The algorithm was implemented as an Eclipse plug-in and evaluated experimentally on 17 large Java benchmarks. Our results indicate that analysis cost is practical and the algorithm can successfully refactor a substantial number of fields to enumerated types. This work is a significant step towards providing automated tool support for migrating legacy Java software to modern Java technologies.

eclipse technology exchange | 2005

Building a whole-program type analysis in Eclipse

Mariana Sharp; Jason Sawin; Atanas Rountev

Eclipse has the potential to become a widely-used platform for implementation and dissemination of various static analyses for Java. In order to realize this potential, it is important to understand the challenges for building high-quality static analyses in Eclipse. This paper discusses some of these challenges in the context of the TACLE plug-in for whole-program type analysis and call graph construction. In particular, we argue that the treatment of the standard Java libraries should be an important concern for static analysis builders. Our experiments indicate that it may be necessary to use pre-computed summary information for the libraries, in order to improve the scalability of whole-program analyses for Eclipse. The experience described in this paper could be beneficial for static analysis researchers who use Eclipse as the infrastructure for their analysis implementations.

source code analysis and manipulation | 2007

Improved Static Resolution of Dynamic Class Loading in Java

Jason Sawin; Atanas Rountev

Modern applications are becoming increasingly dynamic and flexible. In Java software, one important flexibility mechanism is dynamic class loading. Unfortunately, the vast majority of static analyses for Java handle this feature either unsoundly or overly conservatively. We present a set of techniques for static resolution of dynamic-class-loading sites in Java software. Previous work has used static string analysis to achieve this goal. However, a large number of such sites are impossible to resolve with purely static techniques. We present a novel semi-static approach, which combines static string analysis with dynamically gathered information about the execution environment. The key insight behind this approach is the observation that dynamic class loading often depends on characteristics of the execution environment that are encoded in various environment variables. In addition, we propose generalizations of string analysis to increase the number of sites that can be resolved purely statically, and to track the names of environment variables. We present an experimental evaluation on 10,238 classes from the standard Java libraries. Our results show that a state-of-the-art purely static approach resolves only 28% of non-trivial sites, while our approach resolves more than twice as many sites. This work is a step towards making static analysis tools better equipped to handle the dynamic features of Java.

source code analysis and manipulation | 2011

Assumption Hierarchy for a CHA Call Graph Construction Algorithm

Jason Sawin; Atanas Rountev

Method call graphs are integral components of many interprocedural static analyses which are widely used to aid in the development and maintenance of software. Unfortunately, the existence of certain dynamic features in modern programming languages, such as Java or C++, can lead to either unsoundness or imprecision in statically constructed call graphs. We investigate a hierarchy of assumptions that a Class Hierarchy Analysis (CHA) call graph construction algorithm can make about dynamic features in Java. Each successive level of the assumption hierarchy introduces new relaxations of suppositions. These relaxations allow the call graph algorithm to treat some uses of dynamic features more precisely and still remain sound. The hierarchy includes a novel assumption that dynamic features will respect encapsulation. We present an empirical study in which a unique call graph algorithm is implemented for each level of the assumption hierarchy. This study shows that assuming that dynamic features will respect encapsulation can lead to a call graph with 44% fewer edges than the fully conservative graph. By incorporating assumptions about casting operations and string values, it is possible to remain conservative and reduce the number of graph edges by 54% and graph nodes by 10% through the use of various resolution techniques. This work demonstrates that even a slight relaxation of assumptions can greatly improve the precision of a call graph. It further articulates the exact assumptions that a CHA call graph construction algorithm must make in order to use advanced resolution techniques.
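
For readers unfamiliar with CHA, the sketch below shows the basic resolution step for a virtual call: every subtype of the receiver's declared type contributes the method it would dispatch to. This is the textbook form of CHA only. It ignores reflection, dynamic class loading, abstract classes, and interfaces, so it does not model any level of the paper's assumption hierarchy. The class hierarchy and all names are hypothetical.

```python
def subtypes(hierarchy, cls):
    """Return cls and all of its transitive subclasses.

    hierarchy maps each class name to its superclass name (None for a root).
    """
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in hierarchy.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def lookup(hierarchy, methods, cls, name):
    """Walk up from cls to the class whose declaration of `name` is dispatched to."""
    while cls is not None:
        if name in methods.get(cls, ()):
            return cls
        cls = hierarchy.get(cls)
    return None


def cha_targets(hierarchy, methods, static_type, name):
    """Possible targets of a call to `name` on a receiver declared as static_type."""
    targets = set()
    for cls in subtypes(hierarchy, static_type):
        owner = lookup(hierarchy, methods, cls, name)
        if owner is not None:
            targets.add(f"{owner}.{name}")
    return targets


# Hypothetical hierarchy: Circle and Square extend Shape; RoundedSquare extends Square.
hierarchy = {"Shape": None, "Circle": "Shape", "Square": "Shape", "RoundedSquare": "Square"}
methods = {"Shape": {"area", "label"}, "Circle": {"area"}, "Square": {"area"}}

print(sorted(cha_targets(hierarchy, methods, "Shape", "area")))
print(sorted(cha_targets(hierarchy, methods, "Square", "label")))
```

A call made through reflection, or on an object of a dynamically loaded class, would not appear in `hierarchy` or at any visible call site, which is why the treatment of such features determines whether the resulting graph is sound.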

source code analysis and manipulation | 2006

Estimating the Run-Time Progress of a Call Graph Construction Algorithm

Jason Sawin; Atanas Rountev

This work considers static analysis algorithms that are integrated with a development environment. In this context, IDE users can benefit from continuously updated information about the run-time progress of the analysis algorithm (i.e., what portion of the analysis work is completed). IDEs can provide the means to convey this information back to the user; for example, the Java IDE in Eclipse achieves this by employing GUI elements such as progress bars. Precise tracking of the run-time progress of an analysis algorithm requires a priori knowledge of the total running time of the analysis. Such knowledge is typically not available, and analysis builders need to employ various heuristics to estimate run-time progress. In this paper we describe our initial work on defining and evaluating such heuristics for a whole-program analysis in Eclipse. The analysis, based on the well-known Rapid Type Analysis (RTA) approach, builds a call graph for a Java program, for subsequent use in various software tools. We propose multiple heuristics to estimate run-time analysis progress; these heuristics have been implemented in a publicly available Eclipse plug-in. We report the results of evaluating each heuristic on a large set of Java programs.

international database engineering and applications symposium | 2014

Optimizing query execution for variable-aligned length compression of bitmap indices

Ryan Slechta; Jason Sawin; Ben McCamish; David Chiu; Guadalupe Canahuate

Indexing is a fundamental mechanism for efficient data access. Recently, we proposed the Variable-Aligned Length (VAL) bitmap index encoding framework, which generalizes the commonly used word-aligned compression techniques. VAL is a variable-aligned compression framework that allows columns of a bitmap to be compressed using different encoding lengths. This flexibility creates a tunable compression that balances the trade-off between space and query processing time. The variable format of VAL presents several unique opportunities for query optimization. In this paper we explore multiple algorithms to optimize both point queries and range queries in VAL. In particular, we propose a dynamic encoding-length translation heuristic to process point queries. For range queries, we propose several column orderings based on the bitmap's metadata: largest segment length first (lsf), column size (size), and weighted size (ws). In our empirical study over both real and synthetic data sets, we show that our dynamic translation selection scheme produces query execution times only 3.5% below the optimal. We also found that the weighted size column ordering significantly and consistently outperforms other ordering techniques. Finally, we show that our algorithms scale to data sets that are row-ordered.
