Manuvir Das
Microsoft
Publications
Featured research published by Manuvir Das.
international conference on software engineering | 2006
Jinlin Yang; David Evans; Deepali Bhardwaj; Thirumalesh Bhat; Manuvir Das
Dynamic inference techniques have been demonstrated to provide useful support for various software engineering tasks including bug finding, test suite evaluation and improvement, and specification generation. To date, however, dynamic inference has only been used effectively on small programs under controlled conditions. In this paper, we identify reasons why scaling dynamic inference techniques has proven difficult, and introduce solutions that enable a dynamic inference technique to scale to large programs and work effectively with the imperfect traces typically available in industrial scenarios. We describe our approximate inference algorithm, present and evaluate heuristics for winnowing the large number of inferred properties to a manageable set of interesting properties, and report on experiments using inferred properties. We evaluate our techniques on JBoss and the Windows kernel. Our tool is able to infer many of the properties checked by the Static Driver Verifier and leads us to discover a previously unknown bug in Windows.
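For readers unfamiliar with dynamic inference, the sketch below is a minimal, hand-written illustration of the idea rather than the paper's approximate inference algorithm: it checks one common temporal template, "every A is eventually followed by B", over a small hard-coded trace with hypothetical event names, keeping only candidate properties the trace does not falsify.

```c
/* Minimal sketch (not the paper's algorithm): check the template
 * "every A is eventually followed by B" over a recorded event trace.
 * The event names are hypothetical. */
#include <stdio.h>
#include <string.h>

#define NEVENTS 8

static const char *trace[NEVENTS] = {
    "acquire", "write", "release",
    "acquire", "read",  "release",
    "open",    "close"
};

/* Returns 1 if every occurrence of a is followed (not necessarily
 * immediately) by a later occurrence of b. */
static int always_followed_by(const char *a, const char *b)
{
    for (int i = 0; i < NEVENTS; i++) {
        if (strcmp(trace[i], a) != 0)
            continue;
        int seen_b = 0;
        for (int j = i + 1; j < NEVENTS; j++) {
            if (strcmp(trace[j], b) == 0) { seen_b = 1; break; }
        }
        if (!seen_b)
            return 0;
    }
    return 1;
}

int main(void)
{
    /* Candidate properties an inference tool might propose and check. */
    printf("acquire -> release holds: %d\n",
           always_followed_by("acquire", "release"));
    printf("open -> close holds: %d\n",
           always_followed_by("open", "close"));
    printf("release -> acquire holds: %d\n",
           always_followed_by("release", "acquire"));
    return 0;
}
```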
programming language design and implementation | 2000
Manuvir Das
This paper describes a new algorithm for flow- and context-insensitive pointer analysis of C programs. Our studies show that the most common use of pointers in C programs is in passing the addresses of composite objects or updateable values as arguments to procedures. Therefore, we have designed a low-cost algorithm that handles this common case accurately. In terms of both precision and running time, this algorithm lies between Steensgaard's algorithm, which treats assignments bi-directionally using unification, and Andersen's algorithm, which treats assignments directionally using subtyping. Our “one level flow” algorithm uses a restricted form of subtyping to avoid unification of symbols at the top levels of pointer chains in the points-to graph, while using unification elsewhere in the graph. The method scales easily to large programs. For instance, we are able to analyze a 1.4 MLOC (million lines of code) program in two minutes, using less than 200MB of memory. At the same time, the precision of our algorithm is very close to that of Andersen's algorithm. On all of the integer benchmark programs from SPEC95, the one level flow algorithm and Andersen's algorithm produce either identical or essentially identical points-to information. Therefore, we claim that our algorithm provides a method for obtaining precise flow-insensitive points-to information for large C programs.
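The toy program below shows the common case the abstract describes; the points-to results in the comments are an informal illustration of the trade-off between unification and subtyping, not output from the paper's implementation.

```c
/* Illustrative only: a pattern where a directional (subtyping-style)
 * treatment of top-level assignments is more precise than unification. */
static void reset(int *p)       /* the common case: callers pass the   */
{                               /* addresses of their own variables    */
    *p = 0;
}

int main(void)
{
    int x = 1, y = 2;
    int *px = &x;               /* px points only to x                 */

    reset(&x);                  /* the parameter p may point to {x, y} */
    reset(&y);                  /* under any context-insensitive view  */

    /* Unification (Steensgaard-style) merges x and y into one class,
     * so it also reports px -> {x, y}.  Keeping the top level of the
     * pointer chain directional, as one level flow does, leaves
     * px -> {x} while p -> {x, y}, matching Andersen's answer. */
    return *px + y;
}
```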
foundations of software engineering | 1997
Thomas W. Reps; Thomas Ball; Manuvir Das; James R. Larus
This paper describes new techniques to help with testing and debugging, using information obtained from path profiling. A path profiler instruments a program so that the number of times each different loop-free path executes is accumulated during an execution run. With such an instrumented program, each run of the program generates a path spectrum for the execution—a distribution of the paths that were executed during that run. A path spectrum is a finite, easily obtainable characterization of a program's execution on a dataset, and provides a behavior signature for a run of the program.
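As a concrete illustration (hand-written, not output of the authors' profiler), the sketch below instruments a function with two branches so that each of its four loop-free paths has a counter; the counter values after a run are that run's path spectrum.

```c
/* Hand-instrumented sketch of path profiling: the function below has
 * four loop-free paths, identified by which of its two branches are
 * taken.  Counting how often each path executes yields a path spectrum. */
#include <stdio.h>

static unsigned long spectrum[4];       /* one counter per acyclic path */

static int classify(int x, int y)
{
    int path = 0;
    int r = 0;
    if (x > 0) { r += 1; path += 2; }   /* branch 1 contributes bit 1 */
    if (y > 0) { r += 2; path += 1; }   /* branch 2 contributes bit 0 */
    spectrum[path]++;                   /* record which path ran      */
    return r;
}

int main(void)
{
    int inputs[][2] = { {1, 1}, {1, -1}, {-1, 1}, {1, 1}, {-1, -1} };
    for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
        classify(inputs[i][0], inputs[i][1]);

    /* The spectrum for this run is a distribution over the four paths.
     * Comparing the spectra of passing and failing runs helps localize
     * the paths along which behavior diverges. */
    for (int p = 0; p < 4; p++)
        printf("path %d executed %lu times\n", p, spectrum[p]);
    return 0;
}
```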
programming language design and implementation | 2000
Manuel Fähndrich; Jakob Rehof; Manuvir Das
This paper shows that a type graph (obtained via polymorphic type inference) harbors explicit directional flow paths between functions. These flow paths arise from the instantiations of polymorphic types and correspond to call-return sequences in first-order programs. We show that flow information can be computed efficiently while considering only paths with well matched call-return sequences, even in the higher-order case. Furthermore, we present a practical algorithm for inferring type instantiation graphs and provide empirical evidence of the scalability of the presented techniques by applying them in the context of points-to analysis for C programs.
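A small first-order example (informal, not drawn from the paper) of why only well matched call-return paths should be followed: each call site of id below gives rise to its own instantiation, and matching calls with their returns keeps the two instantiations apart.

```c
/* Illustrative only: matched call-return sequences versus merged flow. */
static int *id(int *p)
{
    return p;           /* flow: argument -> return value */
}

int main(void)
{
    int a = 0, b = 0;
    int *pa = id(&a);   /* call site 1: &a flows in and out to pa */
    int *pb = id(&b);   /* call site 2: &b flows in and out to pb */

    /* A context-insensitive analysis merges the flows through id and
     * reports pa -> {a, b} and pb -> {a, b}.  Following only well
     * matched call-return paths, as the instantiation edges permit,
     * yields pa -> {a} and pb -> {b}. */
    return *pa + *pb + b;
}
```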
international conference on software engineering | 2006
Brian Hackett; Manuvir Das; Daniel Wang; Zhe Yang
We describe an ongoing project, the deployment of a modular checker to statically find and prevent every buffer overflow in future versions of a Microsoft product. Lightweight annotations specify requirements for safely using each buffer, and functions are checked individually to ensure they obey these requirements and do not overflow. Our focus is on the incremental deployment of this technology: by layering the annotation language, using aggressive inference techniques, and slicing warnings by checker confidence, teams must pay only part of the cost of annotating a program to achieve part of the benefit, which provides incentive for further annotation. To date over 400,000 annotations have been added to specify buffer usage in the source code for this product, of which over 150,000 were automatically inferred, and over 3,000 potential buffer overflows have been found and fixed.
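As a hedged sketch of what such lightweight annotations look like, the fragment below ties each buffer parameter to its length parameter so the function can be checked modularly; the macro names are stand-ins defined away for portability, not necessarily the product's exact annotation syntax.

```c
/* Sketch of lightweight buffer annotations in the spirit of the paper.
 * The macros are stand-ins (they expand to nothing so this compiles
 * anywhere); a real checker would read them as part of the contract. */
#include <string.h>

#define OUT_WRITES(n)   /* callee may write at most n elements of the buffer */
#define IN_READS(n)     /* callee may read at most n elements of the buffer  */

/* The annotations tie each buffer to its length parameter, so the
 * function can be verified in isolation: every write to dst must stay
 * below dst_count. */
static void copy_name(OUT_WRITES(dst_count) char *dst, size_t dst_count,
                      IN_READS(src_count) const char *src, size_t src_count)
{
    if (dst_count == 0)
        return;                                   /* nothing safe to write */
    size_t n = src_count < dst_count - 1 ? src_count : dst_count - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';                                /* index n <= dst_count - 1 */
}

int main(void)
{
    char buf[8];
    copy_name(buf, sizeof buf, "Manuvir", 7);     /* respects the contract */
    return buf[0] == 'M' ? 0 : 1;
}
```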
IEEE Software | 2004
James R. Larus; Thomas Ball; Manuvir Das; Robert DeLine; Manuel Fähndrich; Jon Pincus; Sriram K. Rajamani; Ramanathan Venkatapathy
What tools do we use to develop and debug software? Most of us rely on a full-screen editor to write code, a compiler to translate it, a source-level debugger to correct it, and a source-code control system to archive and share it. These tools originated in the 1970s, when the change from batch to interactive programming stimulated the development of innovative languages, tools, environments, and other utilities we take for granted. Microsoft Research has developed two generations of tools, some of which Microsoft developers already use to find and correct bugs. These correctness tools can improve software development by systematically detecting programming errors.
static analysis symposium | 2001
Manuvir Das; Ben Liblit; Manuel Fähndrich; Jakob Rehof
This paper addresses the following question: Do scalable control-flow-insensitive pointer analyses provide the level of precision required to make them useful in compiler optimizations? We first describe alias frequency, a metric that measures the ability of a pointer analysis to determine that pairs of memory accesses in C programs cannot be aliases. We believe that this kind of information is useful for a variety of optimizations, while remaining independent of a particular optimization. We show that control-flow- and context-insensitive analyses provide the same answer as the best possible pointer analysis on at least 95% of all statically generated alias queries. In order to understand the potential run-time impact of the remaining 5% of queries, we weight the alias queries by dynamic execution counts obtained from profile data. Flow-insensitive pointer analyses are accurate on at least 95% of the weighted alias queries as well. We then examine whether scalable pointer analyses are inaccurate on the remaining 5% of alias queries because they are context-insensitive. To this end, we have developed a new context-sensitive pointer analysis that also serves as a general engine for tracing the flow of values in C programs. To our knowledge, it is the first technique for performing context-sensitive analysis with subtyping that scales to millions of lines of code. We find that the new algorithm does not identify fewer aliases than the context-insensitive analysis.
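The toy program below shows the kind of query the alias frequency metric counts; the analysis answers in the comments are informal, not measurements from the paper.

```c
/* Illustrative only: one alias query of the kind counted by the metric. */
static int sum(int *p, int *q, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        p[i] = i;       /* access A */
        s += q[i];      /* access B: query "can A and B be aliases?"   */
    }
    return s;
}

int main(void)
{
    int xs[4], ys[4] = { 1, 2, 3, 4 };

    int a = sum(xs, ys, 4);   /* here A and B never touch the same cell */
    int b = sum(xs, xs, 4);   /* here they do                           */

    /* A context-insensitive analysis sees p -> {xs} and q -> {xs, ys}
     * and must answer "may alias" for the query (A, B), even though the
     * best possible answer differs per call site.  Weighting such
     * queries by execution counts shows how much the conservatism
     * actually matters at run time. */
    return a + b;
}
```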
workshop on program analysis for software tools and engineering | 2001
Markus Mock; Manuvir Das; Craig Chambers; Susan J. Eggers
In this paper, we compare the behavior of pointers in C programs, as approximated by static pointer analysis algorithms, with the actual behavior of pointers when these programs are run. In order to perform this comparison, we have implemented several well known pointer analysis algorithms, and we have built an instrumentation infrastructure for tracking pointer values during program execution. Our experiments show that for a number of programs from the SPEC95 and SPEC2000 benchmark suites, the pointer information produced by existing scalable static pointer analyses is far worse than the actual behavior observed at run-time. These results have two implications. First, a tool like ours can be used to supplement static program understanding tools in situations where the static pointer information is too coarse to be usable. Second, a feedback-directed compiler can use profile data on pointer values to improve program performance by ignoring aliases that do not arise at run time (and inserting appropriate run-time checks to ensure safety). As an example, we were able to obtain a factor of 6 speedup on a frequently executed routine from m88ksim.
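A toy version of the instrumentation idea, not the paper's infrastructure: a call is inserted at a dereference site to record which object the pointer actually touches, and the resulting profile is compared against the static may-point-to set described in the comments.

```c
/* Toy pointer-value profiler for a single dereference site. */
#include <stdio.h>

static int x, y;
static int hit_x, hit_y;            /* run-time profile for the site below */

static void record(const int *p)    /* instrumentation callback */
{
    if (p == &x) hit_x++;
    else if (p == &y) hit_y++;
}

static void bump(int *p)
{
    record(p);                      /* inserted by the instrumentation */
    (*p)++;                         /* static analysis: p -> {x, y}    */
}

int main(int argc, char **argv)
{
    (void)argv;
    for (int i = 0; i < 10; i++)
        bump(&x);
    if (argc > 16)                  /* cold path, never taken here, but it  */
        bump(&y);                   /* puts y in the static points-to set   */

    /* The profile (hit_x = 10, hit_y = 0) shows the static set {x, y} is a
     * strict over-approximation at this site; a feedback-directed compiler
     * could speculate on p == &x and guard the speculation with a check. */
    printf("x hits: %d, y hits: %d\n", hit_x, hit_y);
    return 0;
}
```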
static analysis symposium | 2006
Dinakar Dhurjati; Manuvir Das; Yue Yang
In this paper, we present a new method for supporting abstraction refinement in path-sensitive dataflow analysis. We show how an adjustable merge criterion can be used as an interface to control the degree of abstraction. In particular, we partition the merge criterion with two sets of predicates — one related to the dataflow facts being propagated and the other related to path feasibility. These tracked predicates are then used to guide merge operations and path feasibility analysis, so that expensive computations are performed only at the right places. Refinement amounts to lazily growing the path predicate set to recover lost precision. We have implemented our refinement technique in ESP, a software validation tool for C/C++ programs. We apply ESP to validate a future version of Windows against critical security properties. Our experience suggests that applying iterative refinement to path-sensitive dataflow analysis is both effective in cutting down spurious errors and scalable enough for solving real world problems.
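A minimal example (adapted for illustration, not taken from the paper) of why the merge criterion matters: the property "release is called only after acquire" holds on every feasible path below, but a merge that discards the branch condition mixes the held and not-held states and yields a spurious report at release; tracking the predicate logging, as the refinement scheme does for property-relevant predicates, rules that path out.

```c
/* Correlated branches: the pattern that requires path feasibility. */
#include <stdio.h>

static int lock_held;                      /* property state: 1 while held */

static void acquire(void) { lock_held = 1; }
static void release(void) { lock_held = 0; }

static void log_event(const char *msg, int logging)
{
    if (logging)
        acquire();                         /* state becomes: held           */

    /* ... work that never touches the lock ... */

    if (logging) {                         /* correlated with the branch    */
        printf("%s\n", msg);               /* above on the same condition   */
        release();                         /* reached only on paths where   */
    }                                      /* acquire() actually ran        */
}

int main(void)
{
    log_event("hello", 1);
    log_event("quiet", 0);
    return 0;
}
```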
computer aided verification | 2006
Manuvir Das
The research community has long understood the value of formal specifications in building robust software. However, the adoption of any specifications beyond run-time assertions in industrial software has been limited. All of this has changed at Microsoft in the last few years. Today, formal specifications are a mandated part of the software development process in the largest Microsoft product groups. Millions of specifications have been added, and tens of thousands of bugs have been exposed and fixed in future versions of products under development. In addition, Windows' public interfaces are formally specified, and the Visual Studio compiler understands and enforces these specifications, meaning that programmers anywhere can now use formal specifications to make their software more robust. How did this happen? The key ingredients of success were picking a critical programming error that costs software companies real money (buffer overruns), and building an incremental solution in which programmers obtain value proportional to their specification effort. The key technical aspects of this incremental approach include SAL, a lightweight specification language for describing memory access behavior of C/C++ programs; espX, a heavyweight modular checker that enforces consistency between the code and the specification and validates memory accesses; and SALinfer, a lightweight global analysis that infers and inserts a large fraction of the memory specifications automatically. The goal of this talk is to share the technical story of the insights that enabled SAL, espX and SALinfer, as well as the social and practical story of how we were able to move organizations with thousands of programmers to an environment where the use of specifications is routine.
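The sketch below illustrates the division of labour among the specification, the inference pass and the checker under stated assumptions: the annotation macro is a stand-in rather than the exact SAL syntax, and the "inferred" annotation and the flagged call are hand-written illustrations, not tool output.

```c
/* The annotation on zero_fill is the kind a global inference pass can
 * insert automatically, since the relationship between buf and len is
 * evident from the memset call; a modular checker then verifies each
 * caller against the contract. */
#include <string.h>

#define OUT_WRITES_BYTES(n)   /* stand-in for a SAL-style annotation */

static void zero_fill(OUT_WRITES_BYTES(len) char *buf, size_t len)
{
    memset(buf, 0, len);          /* writes exactly len bytes of buf */
}

void caller_ok(void)
{
    char small[16];
    zero_fill(small, sizeof small);      /* consistent with the contract */
}

void caller_bad(void)
{
    char small[16];
    zero_fill(small, 32);                /* a checker would flag this:   */
}                                        /* 32 > sizeof small            */

int main(void)
{
    caller_ok();
    /* caller_bad() is left uncalled; executing it would overflow small. */
    return 0;
}
```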