Analysis of MiniJava Programs via Translation to ML
Martin Mariusz Lester
Department of Computer Science, University of Reading, United Kingdom
Abstract
MiniJava is a subset of the object-oriented programming language Java. Standard ML is the canonical representative of the ML family of functional programming languages, which includes F#
CCS Concepts: • Software and its engineering → Formal methods; Object oriented languages; Functional languages.
Keywords: Java, ML, automated verification, static analysis, program transformation
ACM Reference Format:
Martin Mariusz Lester. 2019. Analysis of MiniJava Programs via Translation to ML. In Formal Techniques for Java-like Programs (FTfJP’19), July 15, 2019, London, United Kingdom. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3340672.3341119
FTfJP’19, July 15, 2019, London, United Kingdom
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author’s version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Formal Techniques for Java-like Programs (FTfJP’19), July 15, 2019, London, United Kingdom, https://doi.org/10.1145/3340672.3341119.
Tools for program analysis and verification have developed rapidly since the success of Microsoft’s SLAM driver verification project [3]. A range of complementary and overlapping techniques and technologies have gained prominence, such as abstract interpretation, model-checking, CEGAR and SMT solvers. All provide some way of bounding potentially infinite behaviours in a program or avoiding state space explosion.
Many of the biggest successes have been in the world of traditional imperative programs. Idiomatic C programs make comparatively little use of dynamic memory allocation, but may control their behaviour through intricate use of bit-level manipulation and values of complex combinations of flags and other variables. Bounded model-checking using SMT solvers has been particularly successful here [6]. There has also been some success in dealing with object-oriented programs, such as those written in Java [8], and functional programs [10], written in ML or Haskell.
The challenges for handling idiomatic programs written in these paradigms are different. In Java, allocation of objects on the heap is very common. Use of dynamic method dispatch is central to writing idiomatic Java code of any complexity. This means that, even for simple programs, accurate modelling of program control flow requires good modelling of the heap, combined with context sensitivity to match method calls and returns. (In C programs, the equivalent problem of tracking function pointers stored at heap-allocated memory locations still arises, but less frequently.)
However, this may not always be important for program verification: in a well-designed object-oriented program (or at least one that obeys the Liskov Substitution Principle), methods of a subclass that override methods in the superclass will usually satisfy a stronger specification than the methods they override. Thus, for many verification problems, it is not necessary to know exactly which subclass method is being called.
In functional languages, the use of higher order functions is similarly prevalent. Conceptually, the difficulty they present is similar to dynamic method dispatch in object-oriented programming, but the complexity of analysis required is often greater. Firstly, accurately tracking control flow for higher order functions requires tracking of more levels of calling context. Secondly, the same functionals are often used in a wide variety of unrelated situations, so type information cannot reliably be used to delineate and partition their uses.
Furthermore, for the same reason, determining the actual results of functionals is more important for accurate program analysis. Consequently, many analyses for functional programming languages emphasise accurate modelling of control flow for higher order functions. In contrast, they often neglect or ignore mutable state, as its use is prohibited in Haskell (other than through monads) and discouraged in ML.
Because Java and ML have different feature sets, it is difficult to apply an analysis designed for one language to a program written in the other. But by doing so, we may gain some insight into our tools and techniques. We may discover that techniques developed in one community would be useful to the other. Or the inability of one community’s tool to handle programs from the other may motivate improvements to the tool. In particular, the introduction of lambda expressions in Java 8 may make it more important for Java tools to be able to reason about higher order functions in programs written in a functional style [7].
We propose to begin this exploration by translating Java programs into ML, so that they may be analysed by tools written for functional programs. To keep the translation manageable, we focus on the MiniJava subset of Java. So that our translated programs may be used with as many tools as possible, we use only the core features of the language, namely recursive functions, algebraic datatypes (including lists) and pattern-matching. In particular, we avoid the use of references (mutable variables). Subject to these constraints, we aim to make the translation as idiomatic as reasonably possible.
MiniJava is a subset of Java introduced in Appel and Palsberg’s book Modern Compiler Implementation in Java [2]. Types in MiniJava are limited to int, boolean, arrays of int and object types corresponding to any classes defined in the program. Java features omitted from MiniJava include interfaces, explicit casts, exceptions, visibility modifiers, generics and reflection. The combination of features is expressive enough for writing idiomatic object-oriented programs, but constrained enough to support easy compilation, analysis or transformation.
Statements and expressions.
Each Java statement becomes a let-binding, with the “current” program state being used in the bound expression and the “next” program state being the newly bound variables. The style of the resulting code is similar to Administrative Normal Form [4].
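As an illustration of this let-binding style, the following is a hand-written sketch of what a translation might produce for a small MiniJava method, with OCaml syntax standing in for Standard ML. The method and the names sum_below and loop are hypothetical. The treatment of the while loop as a local tail-recursive function over the tuple of locals is also an assumption made for this sketch; the text above does not specify how loops are translated.

```ocaml
(* Hypothetical MiniJava method, translated by hand:

     int sumBelow(int n) {
       int i; int s;
       i = 0; s = 0;
       while (i < n) { s = s + i; i = i + 1; }
       return s;
     }

   The locals (n, i, s) form a fixed-size tuple. Each statement becomes a
   let-binding that reads the current tuple and binds the next one. *)
let sum_below (n : int) : int =
  (* int i; int s; i = 0; s = 0; *)
  let (n, i, s) = (n, 0, 0) in
  (* while (i < n) { ... }: assumed here to become a local tail-recursive
     function from the loop-entry state to the loop-exit state. *)
  let rec loop (n, i, s) =
    if i < n then
      let (n, i, s) = (n, i, s + i) in   (* s = s + i; *)
      let (n, i, s) = (n, i + 1, s) in   (* i = i + 1; *)
      loop (n, i, s)
    else (n, i, s)
  in
  let (_, _, s) = loop (n, i, s) in
  (* return s; *)
  s
```

Because every intermediate state is a fresh immutable binding, a tool for pure ML sees the method as an ordinary recursive function over three integers, with no references involved.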
Mutable state.
The mutable state of a Java program is split into two parts: heap-allocated objects and method-local variables. As the number of local variables in any method is fixed, the local variables can be encoded as a fixed-size tuple of variable values. The heap is a map from pointers to objects. Pointers can be encoded using any datatype that supports the operations required for a name, namely comparison for equality and creation of fresh names. The simplest choice is to use unbounded integers starting at 0, allocating integers sequentially as fresh pointers. Any encoding of maps can be used, but the choice will affect the analysis of the translated program.
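A minimal OCaml sketch of the heap encoding just described, again with OCaml standing in for Standard ML: pointers are integers handed out sequentially from 0, and the map is the standard library's balanced-tree Map, which is only one of the possible encodings. The names ptr, heap, alloc, deref and store are hypothetical and are not taken from the translator.

```ocaml
(* Pointers are unbounded integers, allocated sequentially from 0. *)
type ptr = int

(* Any finite-map implementation would do; a balanced-tree map keyed on
   pointers is one choice among many. *)
module PtrMap = Map.Make (struct
  type t = ptr
  let compare = compare
end)

(* The heap: the next fresh pointer plus the map from pointers to objects.
   It is polymorphic in the object representation 'o. *)
type 'o heap = { next : ptr; cells : 'o PtrMap.t }

let empty_heap = { next = 0; cells = PtrMap.empty }

(* new C(...): return a fresh pointer together with the extended heap. *)
let alloc (h : 'o heap) (o : 'o) : ptr * 'o heap =
  let p = h.next in
  (p, { next = p + 1; cells = PtrMap.add p o h.cells })

(* Read the object at p; raises Not_found for a pointer never allocated. *)
let deref (h : 'o heap) (p : ptr) : 'o = PtrMap.find p h.cells

(* Write an object back: the whole object stored at p is replaced. *)
let store (h : 'o heap) (p : ptr) (o : 'o) : 'o heap =
  { h with cells = PtrMap.add p o h.cells }
```

Since store returns a new heap rather than mutating the old one, the heap can be threaded through the translated code alongside the tuple of locals, in the same let-binding style as the rest of the program state.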
Objects and subclasses.
Java objects are encodable as a tuple combining their methods (which become ML functions) and their properties (which become ints, bools, or ints encoding object pointers). Member lookup simply becomes selection of an element from the tuple. Property update requires replacing the whole object in the map encoding the heap. Subclassing could be handled using row polymorphism for records [15], as in OCaml’s objects. As this is not part of Standard ML, we instead encode an object as a tuple combining its members and an Option for any subclass members. The type of the Option is then an algebraic sum over all possible subclasses.
Program transformation is often used for removal of more complex features of a language [5], or translation to a simpler language, so that the verification tools need only handle a smaller number of language features. Notably, the Jimple [14] intermediate language for Java used by Soot is deliberately simpler than Java bytecode. On the other hand, such transformations are often avoided, as they hide the structure of a program, confounding analysis. Indeed, attempting to recover this structure is a key step in analysis of compiled programs [9].
Previous work considers analysis of functional programs written in Haskell via translation to C using the compiler JHC and application of the symbolic execution tool Klee [1]. We are not aware of any work in the reverse direction, presumably because of the relative immaturity of tools for functional languages. Tools for analysing ML programs are based around a variety of different techniques, such as model-checking of Higher Order Recursion Schemes (MoCHi [13]), refinement type inference (DSolve [12]) and algorithmic game semantics (SyTeCi [11]), but there is no clear leader.
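Returning to the encoding of objects and subclasses, the following OCaml sketch shows it for a hypothetical three-class hierarchy. OCaml syntax stands in for Standard ML, and a record is used in place of a bare tuple because OCaml rejects a cyclic type abbreviation. The class names, field names and helpers such as call_area and set_w are illustrative assumptions. For brevity the methods ignore the heap, whereas a full translation would also pass it in and return it.

```ocaml
(* Hypothetical MiniJava classes:

     class Shape  { int w; public int area() { return w; } }
     class Square extends Shape { public int area() { return w * w; } }
     class Rect   extends Shape { int h; public int area() { return w * h; } }

   The record fields play the role of the tuple components. *)
type shape = {
  w : int;                  (* property declared in Shape *)
  area : shape -> int;      (* method slot; the receiver is passed explicitly *)
  sub : shape_sub option;   (* None for an object whose class is exactly Shape *)
}
(* Algebraic sum over the subclasses of Shape, carrying their extra members. *)
and shape_sub =
  | Square_members          (* Square adds no new members *)
  | Rect_members of rect    (* Rect adds the property h *)
and rect = { h : int }

(* Method bodies, one per class; each takes the receiver as its argument. *)
let shape_area (this : shape) : int = this.w
let square_area (this : shape) : int = this.w * this.w
let rect_area (this : shape) : int =
  match this.sub with
  | Some (Rect_members r) -> this.w * r.h
  | _ -> failwith "rect_area: receiver is not a Rect"

(* Constructors: an overriding method is installed in the inherited slot. *)
let new_shape w = { w; area = shape_area; sub = None }
let new_square w = { w; area = square_area; sub = Some Square_members }
let new_rect w h = { w; area = rect_area; sub = Some (Rect_members { h }) }

(* Dynamic dispatch for o.area(): select the method slot and apply it to o. *)
let call_area (o : shape) : int = o.area o

(* Property update this.w = v builds a new object; in the full encoding it
   would then be stored back into the heap map under the object pointer. *)
let set_w (o : shape) (v : int) : shape = { o with w = v }
```

A deeper hierarchy would nest a further option inside each subclass's members. Because the overriding function is chosen when the object is constructed, a call site only selects a field, so resolving the call becomes a matter of tracking which function value is stored there: the higher-order control-flow problem discussed earlier.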
We are currently implementing the translation. The starting point for our work is a toy MiniJava compiler used to teach a module on compilers at the University of Reading. The next step will be to compare Java analysis tools on MiniJava programs with ML analysis tools on the translated programs. We expect the ML tools to be reasonably accurate until they have to reason about values retrieved from the heap, regardless of how we choose to encode it.
References
[1] Mario Alvarez-Picallo. 2015. MPRI Internship Report: Verification by compilation of higher-order functional programs. (2015).
[2] Andrew W. Appel and Jens Palsberg. 2002. Modern Compiler Implementation in Java, 2nd edition. Cambridge University Press.
[3] Thomas Ball, Byron Cook, Vladimir Levin, and Sriram K. Rajamani. 2004. SLAM and Static Driver Verifier: Technology Transfer of Formal Methods inside Microsoft. In Integrated Formal Methods, 4th International Conference, IFM 2004, Canterbury, UK, April 4-7, 2004, Proceedings (Lecture Notes in Computer Science), Eerke A. Boiten, John Derrick, and Graeme Smith (Eds.), Vol. 2999. Springer, 1–20. https://doi.org/10.1007/978-3-540-24756-2_1
[4] Robert Cartwright (Ed.). 1993. Proceedings of the ACM SIGPLAN’93 Conference on Programming Language Design and Implementation (PLDI), Albuquerque, New Mexico, USA, June 23-25, 1993. ACM. http://dl.acm.org/citation.cfm?id=155090
[5] Wontae Choi, Baris Aktemur, Kwangkeun Yi, and Makoto Tatsuta. 2011. Static analysis of multi-staged programs via unstaging translation. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, Thomas Ball and Mooly Sagiv (Eds.). ACM, 81–92. https://doi.org/10.1145/1926385.1926397
[6] Edmund M. Clarke, Daniel Kroening, and Flavio Lerda. 2004. A Tool for Checking ANSI-C Programs. In Tools and Algorithms for the Construction and Analysis of Systems, 10th International Conference, TACAS 2004, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2004, Barcelona, Spain, March 29 - April 2, 2004, Proceedings (Lecture Notes in Computer Science), Kurt Jensen and Andreas Podelski (Eds.), Vol. 2988. Springer, 168–176. https://doi.org/10.1007/978-3-540-24730-2_15
[7] David R. Cok. 2018. Reasoning about functional programming in Java and C++. In Companion Proceedings for the ISSTA/ECOOP 2018 Workshops, ISSTA 2018, Amsterdam, Netherlands, July 16-21, 2018, Julian Dolby, William G. J. Halfond, and Ashish Mishra (Eds.). ACM, 37–39. https://doi.org/10.1145/3236454.3236483
[8] Lucas C. Cordeiro, Daniel Kroening, and Peter Schrammel. 2018. Benchmarking of Java Verification Tools at the Software Verification Competition (SV-COMP). ACM SIGSOFT Software Engineering Notes 43, 4 (2018), 56. https://doi.org/10.1145/3282517.3282529
[9] Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. 2017. rev.ng: a unified binary analysis framework to recover CFGs and function boundaries. In Proceedings of the 26th International Conference on Compiler Construction, Austin, TX, USA, February 5-6, 2017, Peng Wu and Sebastian Hack (Eds.). ACM, 131–141. https://doi.org/10.1145/3033019
[10] Marco Gaboardi, Suresh Jagannathan, Ranjit Jhala, and Stephanie Weirich. 2016. Language Based Verification Tools for Functional Programs (Dagstuhl Seminar 16131). Dagstuhl Reports 6, 3 (2016), 59–77. https://doi.org/10.4230/DagRep.6.3.59
[11] Guilhem Jaber. 2018. SyTeCi: Towards automation of contextual equivalence for higher-order programs with references. (2018).
[12] Ming Kawaguchi, Patrick Maxim Rondon, and Ranjit Jhala. 2010. Dsolve: Safety Verification via Liquid Types. In Computer Aided Verification, 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings (Lecture Notes in Computer Science), Tayssir Touili, Byron Cook, and Paul B. Jackson (Eds.), Vol. 6174. Springer, 123–126. https://doi.org/10.1007/978-3-642-14295-6_12
[13] Ryosuke Sato, Hiroshi Unno, and Naoki Kobayashi. 2013. Towards a scalable software model checker for higher-order programs. In Proceedings of the ACM SIGPLAN 2013 Workshop on Partial Evaluation and Program Manipulation, PEPM 2013, Rome, Italy, January 21-22, 2013, Elvira Albert and Shin-Cheng Mu (Eds.). ACM, 53–62. https://doi.org/10.1145/2426890.2426900
[14] Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie J. Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative Research, November 8-11, 1999, Mississauga, Ontario, Canada, Stephen A. MacKay and J. Howard Johnson (Eds.). IBM, 13. https://dl.acm.org/citation.cfm?id=782008
[15] Mitchell Wand. 1989. Type Inference for Record Concatenation and Multiple Inheritance. In Proceedings of the Fourth Annual Symposium on Logic in Computer Science (LICS ’89), Pacific Grove, California, USA, June 5-8, 1989. IEEE Computer Society, 92–97.