A Modern Compiler for the French Tax Code

Abstract

In France, income tax is computed from taxpayers' individual returns, using an algorithm that is authored, designed and maintained by the French Public Finances Directorate (DGFiP). This algorithm relies on a legacy custom language and compiler originally designed in 1990, which unlike French wine, did not age well with time. Owing to the shortcomings of the input language and the technical limitations of the compiler, the algorithm is proving harder and harder to maintain, relying on ad-hoc behaviors and workarounds to implement the most recent changes in tax law. Competence loss and aging code also mean that the system does not benefit from any modern compiler techniques that would increase confidence in the implementation. We overhaul this infrastructure and present Mlang, an open-source compiler toolchain whose goal is to replace the existing infrastructure. Mlang is based on a reverse-engineered formalization of the DGFiP's system, and has been thoroughly validated against the private DGFiP test suite. As such, Mlang has a formal semantics; eliminates previous handwritten workarounds in C; compiles to modern languages (Python); and enables a variety of instrumentations, providing deep insights about the essence of French income tax computation. The DGFiP is now officially transitioning to Mlang for their production system.


Denis Merigoux∗, Inria, Paris, France

Raphaël Monat∗, Sorbonne Université, CNRS, LIP6, Paris, France

Jonathan Protzenko, Microsoft Research, USA


Keywords: legal expert system, compiler, tax code

The French Tax Code is a body of legislation amounting to roughly 3,500 pages of text, defining the modalities of tax collection by the state. In particular, each new fiscal year, a new edition of the Tax Code describes in natural language how to compute the final amount of income tax (IR, for impôt sur le revenu) owed by each household.

As in many other tax systems around the world, this computation is quite complex. France uses a bracket system (as in, say, the US federal income tax), along with a myriad of tax credits, deductions, optional rules and state-sponsored direct aid, all of which are parameterized over the composition of the household, that is, the number of children, their respective ages, potential disabilities, and so on.

∗ Equal contribution

Preprint, Jan. 2021, Online

[Figure 1. Legacy architecture. The DGFiP's internal compiler translates the “rules” M files into “rules” C files; these and the hand-written “inter” C files, which communicate through shared state, are compiled by GCC into the “calculette” shared library.]

Unlike, say, the United States, the French system relies heavily on automation. During tax season, French taxpayers log in to the online tax portal, which is managed by the state. There, taxpayers are presented with online forms, generally pre-filled. If applicable, taxpayers can adjust the forms, e.g. by entering extra deductions or credits. Once the taxpayer is satisfied with the contents of the online form, they send in their return. Behind the scenes, the IR algorithm is run and, taking as input the contents of the forms, returns the final amount of tax owed. The taxpayer is then presented with the result at tax-collection time.

Naturally, the ability to independently reproduce and thus trust the IR computation performed by the DGFiP is crucial. First, taxpayers need to understand the result, as their own estimate may differ (explainability). Second, citizens may want to audit the algorithm, to ensure it faithfully implements the law (correctness). Third, a standalone, reusable implementation allows for a complete and precise simulation of the impacts of a tax reform, greatly improving existing efforts [8, 17] (forecasting).

Unfortunately, we are currently far from a transparent, open-source, reproducible computation. Following numerous requests (using a disposition similar to the United States' Freedom of Information Act), parts of the existing source code were published. In doing so, the public learned that i) the existing infrastructure is made up of various parts pieced together and that ii) key data required to accurately reproduce IR computations was not shared with the public.

The current, legacy architecture of the IR tax system is presented in Fig. 1. The bulk of the tax code is described as a set of “rules” authored in M, a custom, non-Turing-complete language. A total of 90,000 lines of M rules compile to 535,000 lines of C (including whitespace and comments) via a custom compiler. Rules are now mostly public [10]. Over time, the expressive power of rules turned out to be too limited to express a particular feature, known as liquidations multiples, which involves tax returns across different years. Lacking the expertise to extend the M language, the DGFiP added in 1995 some high-level glue code in C, known as “inter”. The glue code is closer to a full-fledged language, and has a non-recursive call-graph which may call the “rules” computation multiple times with various parameters. The “inter” driver amounts to 35,000 lines of C code and has not been released.

Both “inter” and “rules” are updated every year to follow updates in the law, and as such, have been extensively modified over their 30-year operational lifespan.

Our goal is to address these shortcomings by bringing the French tax code infrastructure into the 21st century. Specifically, we wish to: i) reverse-engineer the unpublished parts of the DGFiP computation, so as to ii) provide an explainable, open-source, correct implementation that can be independently audited; furthermore, we wish to iii) modernize the compiler infrastructure, eliminating in the process any hand-written C that could not be released because of security concerns, thus enabling a host of modern applications, simulations and use-cases.

● We start with a reverse-engineered formal semantics for the M DSL, along with a proof of type safety performed using the Coq [31] proof assistant (Section 2).
● To eliminate C code from the ecosystem, we extend the M language with enough capabilities to encode the logic of the high-level “inter” driver (Fig. 1); we dub the new design M++ (Section 3).
● To execute M/M++ programs, we introduce Mlang, a complete re-implementation which combines a reference interpreter along with an optimizing compiler that generates C and Python code (Section 4).
● We evaluate our implementation: we show how we attained 100% conformance on the legacy system's testsuite, then proceed to enable a variety of analyses and instrumentations to fuzz, measure and stress-test our new system (Section 5).
● We conclude with a tour d'horizon of related attempts at increasing trust in algorithmic parts of the law (Section 6).

Our code is open-source and available on GitHub [20] and as an archived artifact on Zenodo [21]. We have engaged with the DGFiP, and following numerous discussions, iterations, and visits to their offices, we have been formally approved to start replacing the legacy infrastructure with our new implementation, meaning that within a few years' time, all French tax returns will be processed using the compiler described in the present paper.

The 2018 version of the income tax computation [10] is split across 48 files, for a total of 92,000 lines of code. The code is written in M, the input language originally designed by the DGFiP. In order to understand this body of tax code, we set out to give a semantics to M.

M programs are made up of two parts: declarations and rules.

Declarations introduce: input variables, intermediary variables, output variables and exceptions. Variables are either scalars or fixed-length arrays. Both variables and exceptions are annotated with a human-readable description. Variables that belong to the same section of the tax form are annotated with the same kind. Examples of kinds include "triggers tax credit", or "is advance payment". This is used later in M++ (Section 3.3) for partitioning variables, and quickly checking whether any variable of a given kind has a non-undef value.

Rules capture the computational part of an M program; they are either variable assignments or raise-if statements. As a first simplified example, the French tax code declares an input variable V_0AC for whether an individual is single (value 1) or not (value 0). Lacking any notion of data type or enumeration, there is no way to enforce statically that an individual cannot be married (V_0AM) and single (V_0AC) at the same time. Instead, an exception A031 is declared, along with a human-readable description. Then, a rule raises an exception if the sum of the two variables is greater than 1. (The seemingly superfluous + 0 is explained in Section 2.5.) For the sake of example, we drop irrelevant extra syntactic features, and for the sake of readability, we translate keywords and descriptions into English.

    V_0AC : input family ... : "Checkbox : Single" type BOOLEAN ;
    V_0AM : input family ... : "Checkbox : Married" type BOOLEAN ;
    A031:exception :"A":"031":"00":"both married and single":"N";
    if V_0AC + V_0AM + 0 > 1 then error A031 ;

As a second simplified example, the following M rule computes the value of a hypothetical variable TAXBREAK. Its value is computed from variables CHILDRENCOUNT (for the number of children in the household) and TAXOWED (for the tax owed before the break); the assigned expression relies on a conditional and the built-in max function. This expression gives a better tax break to households having three or more children.

    TAXBREAK = if (CHILDRENCOUNT+0 > 2)
               then max(MINTAXBREAK, TAXOWED * 20 / 100)
               else MINTAXBREAK endif;
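To make the role of the `+ 0` concrete, here is a minimal Python sketch of how the TAXBREAK rule might evaluate. It is not Mlang's generated code: `None` stands in for undef, the helper names (`add`, `mul`, `div`, `gt`, `max_m`, `taxbreak`) are hypothetical, and the handling of undef follows our reading of the function semantics given later in Fig. 6 (in particular, `max_m` assumes undef is read as 0 inside `max`).

```python
from typing import Optional

Value = Optional[float]  # None is this sketch's stand-in for M's undef


def add(a: Value, b: Value) -> Value:
    """undef + undef is undef; otherwise an undef operand behaves like 0."""
    if a is None and b is None:
        return None
    return (0.0 if a is None else a) + (0.0 if b is None else b)


def mul(a: Value, b: Value) -> Value:
    """Multiplication is undef as soon as one operand is undef."""
    if a is None or b is None:
        return None
    return a * b


def div(a: Value, b: Value) -> Value:
    """Division propagates undef; this sketch assumes a non-zero divisor."""
    if a is None or b is None:
        return None
    return a / b


def gt(a: Value, b: Value) -> Value:
    """Comparisons propagate undef and otherwise yield 1.0 or 0.0."""
    if a is None or b is None:
        return None
    return 1.0 if a > b else 0.0


def max_m(a: Value, b: Value) -> Value:
    """Assumption: undef operands are read as 0 inside max."""
    return max(0.0 if a is None else a, 0.0 if b is None else b)


def taxbreak(children_count: Value, tax_owed: Value, min_tax_break: Value) -> Value:
    guard = gt(add(children_count, 0.0), 2.0)  # "+ 0" turns undef into 0
    if guard is None:  # an undef guard makes the whole conditional undef
        return None
    if guard != 0.0:
        return max_m(min_tax_break, div(mul(tax_owed, 20.0), 100.0))
    return min_tax_break
```

With a blank CHILDRENCOUNT, the `+ 0` makes the guard evaluate to false rather than undef, so the household still receives MINTAXBREAK; without it, `gt(None, 2.0)` would be undef and so would the whole rule.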

For the rest of this paper, we abandon concrete syntax and all-caps variable names, in favor of a core calculus that faithfully models M: μM.

μM: a core model of M

The μM core language omits variable declarations, whose main purpose is to provide a human-readable description string that relates them to the original tax form. The μM core language also eliminates syntactic sugar, such as statically bounded loops, or type aliases (e.g. BOOLEAN). Finally, a particular feature of M is that rules may be provided in any order: the M language has a built-in dependency resolution feature that automatically re-orders computations (rules) and asserts that there are no loops in variable assignments. In our own implementation (Mlang, Section 4), we perform a topological sort; in our μM formalization, we assume that computations are already in a suitable order.

We describe the syntax of μM in Fig. 2. A program is a series of statements (“rules”). Statements are either raise-error-if, or assignments. We define two forms of assignment: one for scalars and the other for fixed-size arrays. The latter is of the form a[X, n] := e, where X is bound in e (the index is always named X). Using Haskell's list comprehension syntax, this is to be understood as a := [ e | X ← [0..n−1] ].

Expressions are a combination of variables (including the special index expression X), values, comparisons, logic and arithmetic expressions, conditionals, calls to builtin functions, or index accesses. Most functions exhibit standard behavior on floating-point values, but M assumes the default IEEE-754 rounding mode, that is, rounding to nearest and ties to even. The detailed behavior of each function is described in Fig. 6.

    ⟨program⟩ ::= ⟨command⟩ | ⟨command⟩ ; ⟨program⟩
    ⟨command⟩ ::= if ⟨expr⟩ then ⟨error⟩
                | ⟨var⟩ := ⟨expr⟩
                | ⟨var⟩ [ X ; ⟨float⟩ ] := ⟨expr⟩
    ⟨expr⟩    ::= ⟨var⟩ | X | ⟨value⟩
                | ⟨expr⟩ ⟨binop⟩ ⟨expr⟩ | ⟨unop⟩ ⟨expr⟩
                | if ⟨expr⟩ then ⟨expr⟩ else ⟨expr⟩
                | ⟨func⟩ ( ⟨expr⟩ , ..., ⟨expr⟩ )
                | ⟨var⟩ [ ⟨expr⟩ ]
    ⟨value⟩   ::= undef | ⟨float⟩
    ⟨binop⟩   ::= ⟨arithop⟩ | ⟨boolop⟩
    ⟨arithop⟩ ::= + | - | * | /
    ⟨boolop⟩  ::= <= | < | > | >= | == | != | && | ||
    ⟨unop⟩    ::= - | ~
    ⟨func⟩    ::= round | truncate | max | min | abs | pos | pos_or_null | null | present

Figure 2. Syntax of the μM language

Values can be undef, which arises in two situations: references to variables that have not been defined (i.e. for which the entry in the tax form has been left blank) and out-of-bounds array accesses. All other values are IEEE-754 double-precision numbers, i.e. 64-bit floats. The earlier BOOLEAN type (Section 2.1) is simply an alias for a float whose value is implicitly 0 or 1. There is no other kind of value, as a reference to an array variable is invalid. Function present discriminates the undef value from floats.

Types in μM are either scalar or array types. M does not offer nested arrays. Therefore, typing is mostly about making sure scalars and arrays are not mixed up.

In Fig. 3, a first judgment Γ ⊢ e defines expression well-formedness. It rules out references to arrays, hence enforcing that expressions have type scalar and that no values of type array can be produced. Furthermore, variables may have no assignment at all (if the underlying entry in the tax form has been left blank) but may still be referred to in other rules. Rather than introduce spurious variable assignments with undef, we remain faithful to the very loose nature of the M language and account for references to undefined variables.

Then, Γ ⊢ ⟨program⟩ ⇛ Γ′ enforces well-formedness for a whole program while returning an extended environment Γ′. We take advantage of the fact that scalar and array assignments have different syntactic forms. M disallows assigning different types to the same variable; we rule this out in T-Assign-*. A complete μM program P is well-formed if ∅ ⊢ P ⇛ _.

At this stage, seeing that there are neither unbounded loops nor user-defined (recursive) functions in the language, M is obviously not Turing-complete. The language semantics are nonetheless quite devious, owing to the undef value, which can be explicitly converted to a float via a + 0, as seen in earlier examples. We proceed to formalize them in Coq [31], using the Flocq library [4]. This ensures we correctly account for all cases related to the undef value, and guides the implementation of Mlang (Section 4).
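The dependency resolution described above (rules given in any order, no loops allowed) can be sketched with a topological sort. The snippet below is only an illustration using Python's standard `graphlib` module, not Mlang's OCaml implementation; the function name `order_rules` and the variable names NETTAX and INCOME are hypothetical, and rules are abstracted to "assigned variable → variables read".

```python
from graphlib import CycleError, TopologicalSorter


def order_rules(rules: dict[str, set[str]]) -> list[str]:
    """Order assignments so that each variable is computed after what it reads.

    `rules` maps each assigned variable to the variables its expression reads.
    Variables that are read but never assigned are treated as inputs (which
    may be undef) and impose no ordering constraint.
    """
    graph = {
        var: {dep for dep in deps if dep in rules}
        for var, deps in rules.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the cycle
        raise ValueError(f"circular definition: {exc.args[1]}") from exc


example = {
    "NETTAX": {"TAXOWED", "TAXBREAK"},
    "TAXBREAK": {"CHILDRENCOUNT", "TAXOWED", "MINTAXBREAK"},
    "TAXOWED": {"INCOME"},
}
ordered = order_rules(example)
```

Here `ordered` lists TAXOWED before TAXBREAK and TAXBREAK before NETTAX, whatever the order in which the rules were written; a pair of mutually dependent rules is rejected with an error.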

Expressions.

The semantics of expressions is defined in Fig. 4. The memory environment, written Ω, is a function from variables to either scalar values (usually denoted v), or arrays (written (v₀, …, vₙ₋₁)). A value absent from the environment evaluates to undef.

The special array index variable X is evaluated as a normal variable. Conditionals reduce normally, except when the guard is undef: in that case, the whole conditional evaluates into undef. If an index evaluates to undef, the whole array access is undef. In the case of a negative out-of-bounds index access the result is 0; in the case of a positive out-of-bounds index access the result is undef. Otherwise, the index is truncated into an integer, used to access Ω. The behavior of functions, unary and binary operators is described in Fig. 6.

Figuring out these (unusual) semantics took over a year. We initially worked in a black-box setting, using as an oracle for our semantics the simplified online tax simulator offered by the DGFiP. After the initial set of M rules was open-sourced, we simply manually crafted test cases and fed those by hand to the online simulator to adjust our semantics. This allowed us to gain credibility and to have the DGFiP take us seriously. After that, we were allowed to enter the DGFiP offices and browse the source of their M compiler, as long as we did not exfiltrate any information. This final “code browsing” allowed us to understand the “inter” part of their compiler, as well as nail down the custom operators from Fig. 15.

Global function environment Δ:

$$\Delta(\texttt{round}) = \Delta(\texttt{truncate}) = \Delta(\texttt{abs}) = \Delta(\texttt{pos}) = \Delta(\texttt{pos\_or\_null}) = \Delta(\texttt{null}) = \Delta(\texttt{present}) = 1$$

$$\Delta(\texttt{min}) = \Delta(\texttt{max}) = \Delta(\langle\text{arithop}\rangle) = \Delta(\langle\text{boolop}\rangle) = 2$$

Judgment: Γ ⊢ e (“Under Γ, e is well-formed”)

$$\text{T-float}\ \frac{}{\Gamma \vdash \langle\text{float}\rangle} \qquad \text{T-undef}\ \frac{}{\Gamma \vdash \texttt{undef}} \qquad \text{T-var-undef}\ \frac{x \notin \operatorname{dom}\Gamma}{\Gamma \vdash x} \qquad \text{T-var}\ \frac{\Gamma(x) = \text{scalar}}{\Gamma \vdash x}$$

$$\text{T-index-undef}\ \frac{x \notin \operatorname{dom}\Gamma \quad \Gamma \vdash e}{\Gamma \vdash x[e]} \qquad \text{T-index}\ \frac{\Gamma(x) = \text{array} \quad \Gamma \vdash e}{\Gamma \vdash x[e]}$$

$$\text{T-conditional}\ \frac{\Gamma \vdash e_1 \quad \Gamma \vdash e_2 \quad \Gamma \vdash e_3}{\Gamma \vdash \texttt{if}\ e_1\ \texttt{then}\ e_2\ \texttt{else}\ e_3} \qquad \text{T-func}\ \frac{\Delta(f) = n \quad \Gamma \vdash e_1 \ \cdots\ \Gamma \vdash e_n}{\Gamma \vdash f(e_1, \ldots, e_n)}$$

Judgment: Γ ⊢ ⟨command⟩ ⇛ Γ′ and Γ ⊢ ⟨program⟩ ⇛ Γ′ (“P transforms Γ to Γ′”)

$$\text{T-cond}\ \frac{\Gamma \vdash e}{\Gamma \vdash \texttt{if}\ e\ \texttt{then}\ \langle\text{error}\rangle \Rrightarrow \Gamma} \qquad \text{T-seq}\ \frac{\Gamma_0 \vdash c \Rrightarrow \Gamma_1 \quad \Gamma_1 \vdash P \Rrightarrow \Gamma_2}{\Gamma_0 \vdash c\,;\,P \Rrightarrow \Gamma_2}$$

$$\text{T-assign-scalar}\ \frac{x \in \Gamma \Rightarrow \Gamma(x) = \text{scalar} \quad \Gamma \vdash e}{\Gamma \vdash x := e \Rrightarrow \Gamma[x \mapsto \text{scalar}]} \qquad \text{T-assign-array}\ \frac{x \in \Gamma \Rightarrow \Gamma(x) = \text{array} \quad \Gamma[\texttt{X} \mapsto \text{scalar}] \vdash e}{\Gamma \vdash x[\texttt{X}, n] := e \Rrightarrow \Gamma[x \mapsto \text{array}]}$$

Figure 3. Typing of expressions and programs
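The well-formedness judgments of Fig. 3 translate almost directly into a small checker. The sketch below is our own illustration under stated assumptions, not the Coq development or Mlang's code: expressions and commands are encoded as tuples of our choosing, binary operators are treated as two-argument calls, and only a few operators are listed in `ARITY`.

```python
SCALAR, ARRAY = "scalar", "array"

# Arity environment (the Δ of Fig. 3), with a few operators for illustration
ARITY = {
    "round": 1, "truncate": 1, "abs": 1, "pos": 1, "pos_or_null": 1,
    "null": 1, "present": 1, "min": 2, "max": 2, "+": 2, "*": 2, ">": 2,
}


def well_formed(env: dict, e: tuple) -> bool:
    """Γ ⊢ e: expressions may only mention scalars or undeclared variables."""
    kind = e[0]
    if kind in ("float", "undef"):            # T-float, T-undef
        return True
    if kind == "var":                          # T-var, T-var-undef
        return env.get(e[1], SCALAR) == SCALAR
    if kind == "index":                        # T-index, T-index-undef
        return env.get(e[1], ARRAY) == ARRAY and well_formed(env, e[2])
    if kind == "if":                           # T-conditional
        return all(well_formed(env, sub) for sub in e[1:4])
    if kind == "call":                         # T-func
        name, args = e[1], e[2]
        return ARITY.get(name) == len(args) and all(
            well_formed(env, arg) for arg in args
        )
    return False


def check_command(env: dict, cmd: tuple) -> dict:
    """Γ ⊢ c ⇛ Γ′: return the extended environment or raise TypeError."""
    kind = cmd[0]
    if kind == "raise_if":                     # T-cond
        ok, new_env = well_formed(env, cmd[1]), env
    elif kind == "assign":                     # T-assign-scalar
        _, x, e = cmd
        ok = env.get(x, SCALAR) == SCALAR and well_formed(env, e)
        new_env = {**env, x: SCALAR}
    elif kind == "assign_array":               # T-assign-array
        _, x, _n, e = cmd
        ok = env.get(x, ARRAY) == ARRAY and well_formed({**env, "X": SCALAR}, e)
        new_env = {**env, x: ARRAY}
    else:
        ok, new_env = False, env
    if not ok:
        raise TypeError(f"ill-formed command: {cmd!r}")
    return new_env


def check_program(env: dict, program: list) -> dict:
    """T-seq: thread the environment through each command in turn."""
    for cmd in program:
        env = check_command(env, cmd)
    return env
```

For instance, a program that assigns a scalar `a`, builds an array `t` from `X + a`, and reads `t[a]` is accepted, while a later `c := t` is rejected because it refers to an array as if it were a scalar.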

Statements.

The memory environment Ω is extended into Ω_c, to propagate the error case that may be raised by exceptions. An assignment updates a valid memory environment with the computed value. If an assertion's guard evaluates to a non-zero float, an error is raised; otherwise, program execution continues. Rule D-error propagates a raised error across a program. The whole-array assignment works by evaluating the expression in different memory environments, one for each index.
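The statement semantics just described can be sketched as a small interpreter loop. This is an illustrative Python model under our own assumptions rather than the reference interpreter: expressions are plain Python callables from an environment to a value, `None` stands in for undef, and `ERROR` is a sentinel playing the role of the error state.

```python
from typing import Callable, Optional

Value = Optional[float]               # None stands in for undef
ERROR = object()                      # sentinel for the error state


def step(env, stmt: tuple):
    """Execute one statement; an error state is propagated unchanged."""
    if env is ERROR:                  # D-error
        return ERROR
    kind = stmt[0]
    if kind == "assign":              # D-assign
        _, x, expr = stmt
        return {**env, x: expr(env)}
    if kind == "raise_if":            # D-assert-true / D-assert-other
        value = stmt[1](env)
        return env if value is None or value == 0 else ERROR
    if kind == "assign_array":        # D-assign-table: one evaluation per index
        _, x, n, expr = stmt
        cells = tuple(expr({**env, "X": float(i)}) for i in range(n))
        return {**env, x: cells}
    raise ValueError(f"unknown statement {kind!r}")


def run(program: list, env: dict):
    for stmt in program:
        env = step(env, stmt)
    return env


def both_single_and_married(env: dict) -> Value:
    """V_0AC + V_0AM + 0 > 1, with the "+ 0" reading undef as 0."""
    total = (env.get("V_0AC") or 0.0) + (env.get("V_0AM") or 0.0) + 0.0
    return 1.0 if total > 1 else 0.0


program = [
    ("raise_if", both_single_and_married),
    ("assign_array", "SQUARES", 3, lambda env: env["X"] * env["X"]),
]
```

Running `program` on a return where only V_0AC is ticked continues past the assertion and fills the (hypothetical) array SQUARES; ticking both boxes yields the error state, and the array assignment is never executed.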

Judgment: Ω ⊢ e ⇓ v (“Under Ω, e evaluates to v”)

$$\text{D-value}\ \frac{v \in \langle\text{value}\rangle}{\Omega \vdash v \Downarrow v} \qquad \text{D-var-undef}\ \frac{x \notin \operatorname{dom}\Omega}{\Omega \vdash x \Downarrow \texttt{undef}} \qquad \text{D-var}\ \frac{\Omega(x) = v}{\Omega \vdash x \Downarrow v} \qquad \text{D-X}\ \frac{\Omega(\texttt{X}) = v}{\Omega \vdash \texttt{X} \Downarrow v}$$

$$\text{D-cond-true}\ \frac{\Omega \vdash e_1 \Downarrow f \quad f \notin \{0, \texttt{undef}\} \quad \Omega \vdash e_2 \Downarrow v_2}{\Omega \vdash \texttt{if}\ e_1\ \texttt{then}\ e_2\ \texttt{else}\ e_3 \Downarrow v_2} \qquad \text{D-cond-false}\ \frac{\Omega \vdash e_1 \Downarrow 0 \quad \Omega \vdash e_3 \Downarrow v_3}{\Omega \vdash \texttt{if}\ e_1\ \texttt{then}\ e_2\ \texttt{else}\ e_3 \Downarrow v_3}$$

$$\text{D-cond-undef}\ \frac{\Omega \vdash e_1 \Downarrow \texttt{undef}}{\Omega \vdash \texttt{if}\ e_1\ \texttt{then}\ e_2\ \texttt{else}\ e_3 \Downarrow \texttt{undef}} \qquad \text{D-index-neg}\ \frac{\Omega \vdash e \Downarrow r \quad r < 0}{\Omega \vdash x[e] \Downarrow 0} \qquad \text{D-index-undef}\ \frac{\Omega \vdash e \Downarrow \texttt{undef}}{\Omega \vdash x[e] \Downarrow \texttt{undef}}$$

$$\text{D-index-outside}\ \frac{\Omega \vdash e \Downarrow r \quad r \geqslant n \quad |\Omega(x)| = n}{\Omega \vdash x[e] \Downarrow \texttt{undef}} \qquad \text{D-tab-undef}\ \frac{x \notin \operatorname{dom}\Omega}{\Omega \vdash x[e] \Downarrow \texttt{undef}}$$

$$\text{D-index}\ \frac{\Omega(x) = (v_0, \ldots, v_{n-1}) \quad \Omega \vdash e \Downarrow r \quad r \in [0, n) \quad r' = \mathrm{truncate}_{\mathbb{F}}(r)}{\Omega \vdash x[e] \Downarrow v_{r'}} \qquad \text{D-func}\ \frac{\Omega \vdash e_1 \Downarrow v_1 \ \cdots\ \Omega \vdash e_n \Downarrow v_n}{\Omega \vdash f(e_1, \ldots, e_n) \Downarrow f(v_1, \ldots, v_n)}$$

Figure 4. Operational semantics: expressions

Judgment: Ω_c ⊢ c ⇛ Ω′_c and Ω_c ⊢ P ⇛ Ω′_c (“Under Ω_c, P produces Ω′_c”)

$$\text{D-assign}\ \frac{\Omega_c \neq \texttt{error} \quad \Omega_c \vdash e \Downarrow v}{\Omega_c \vdash x := e \Rrightarrow \Omega_c[x \mapsto v]} \qquad \text{D-error}\ \frac{}{\texttt{error} \vdash c \Rrightarrow \texttt{error}}$$

$$\text{D-assert-other}\ \frac{\Omega_c \neq \texttt{error} \quad \Omega_c \vdash e \Downarrow v \quad v \in \{0, \texttt{undef}\}}{\Omega_c \vdash \texttt{if}\ e\ \texttt{then}\ \langle\text{error}\rangle \Rrightarrow \Omega_c} \qquad \text{D-assert-true}\ \frac{\Omega_c \neq \texttt{error} \quad \Omega_c \vdash e \Downarrow f \quad f \notin \{0, \texttt{undef}\}}{\Omega_c \vdash \texttt{if}\ e\ \texttt{then}\ \langle\text{error}\rangle \Rrightarrow \texttt{error}}$$

$$\text{D-seq}\ \frac{\Omega_{c,0} \vdash c \Rrightarrow \Omega_{c,1} \quad \Omega_{c,1} \vdash P \Rrightarrow \Omega_{c,2}}{\Omega_{c,0} \vdash c\,;\,P \Rrightarrow \Omega_{c,2}}$$

$$\text{D-assign-table}\ \frac{\Omega_c \neq \texttt{error} \quad \Omega_c[\texttt{X} \mapsto 0] \vdash e \Downarrow v_0 \ \cdots\ \Omega_c[\texttt{X} \mapsto n-1] \vdash e \Downarrow v_{n-1}}{\Omega_c \vdash x[\texttt{X}, n] := e \Rrightarrow \Omega_c[x \mapsto (v_0, \ldots, v_{n-1})]}$$

Figure 5. Operational semantics: statements

In the tables below, rows give the value of the first operand and columns the value of the second.

$$\begin{array}{c|cc} e_1 \odot e_2,\ \odot \in \{+, -\} & \texttt{undef} & f_2 \in \mathbb{F} \\ \hline \texttt{undef} & \texttt{undef} & 0 \odot_{\mathbb{F}} f_2 \\ f_1 \in \mathbb{F} & f_1 \odot_{\mathbb{F}} 0 & f_1 \odot_{\mathbb{F}} f_2 \end{array}$$

$$\begin{array}{c|ccc} e_1 \odot e_2,\ \odot \in \{\times, \div\} & \texttt{undef} & 0 & f_2 \in \mathbb{F},\ f_2 \neq 0 \\ \hline \texttt{undef} & \texttt{undef} & \texttt{undef} & \texttt{undef} \\ f_1 \in \mathbb{F} & \texttt{undef} & 0 & f_1 \odot_{\mathbb{F}} f_2 \end{array}$$

$$\begin{array}{c|cc} b_1\ \langle\text{boolop}\rangle\ b_2 & \texttt{undef} & f_2 \in \mathbb{F} \\ \hline \texttt{undef} & \texttt{undef} & \texttt{undef} \\ f_1 \in \mathbb{F} & \texttt{undef} & f_1\ \langle\text{boolop}\rangle_{\mathbb{F}}\ f_2 \end{array}$$

$$\begin{array}{c|cc} m(e_1, e_2),\ m \in \{\texttt{min}, \texttt{max}\} & \texttt{undef} & f_2 \in \mathbb{F} \\ \hline \texttt{undef} & 0 & m_{\mathbb{F}}(0, f_2) \\ f_1 \in \mathbb{F} & m_{\mathbb{F}}(f_1, 0) & m_{\mathbb{F}}(f_1, f_2) \end{array}$$

$$\texttt{round}(\texttt{undef}) = \texttt{undef} \qquad \texttt{round}(f \in \mathbb{F}) = \mathrm{floor}_{\mathbb{F}}(f + \mathrm{sign}(f) \times \ldots)$$

$$\texttt{truncate}(\texttt{undef}) = \texttt{undef} \qquad \texttt{truncate}(f \in \mathbb{F}) = \mathrm{floor}_{\mathbb{F}}(f + \ldots)$$

$$\texttt{abs}(x) \equiv \texttt{if}\ x \geq 0\ \texttt{then}\ x\ \texttt{else}\ {-x} \qquad \texttt{pos\_or\_null}(x) \equiv x \geq 0 \qquad \texttt{pos}(x) \equiv x > 0 \qquad \texttt{null}(x) \equiv x = 0$$

$$\texttt{present}(\texttt{undef}) = 0 \qquad \texttt{present}(f \in \mathbb{F}) = 1$$

Figure 6. Function semantics. For context on round and truncate definitions, see Section 4.3

We now prove type safety in Coq. Owing to the unusual semantics of the undef value, and to the lax treatment of undefined variables, this provides an additional level of guarantee, by ensuring that reduction always produces a value or an error (i.e. we haven't forgotten any corner cases in our semantics). Furthermore, we show in the process that the store is consistent with the typing environment, written Γ ⊳ Ω. This entails store typing (i.e. values of the right type are to be found in the store) and proper handling of undefined variables (i.e. dom Ω ⊆ dom Γ).

Theorem (Expressions). If Γ ⊳ Ω and Γ ⊢ e, then there exists v such that Ω ⊢ e ⇓ v.

We extend ⊳ to statements, so as to account for exceptions: Γ ⊳_c Ω_c ⟺ Ω_c = error ∨ Γ ⊳ Ω_c.

Theorem (Statements). If Γ ⊢ c ⇛ Γ′ and Γ ⊳_c Ω_c, then there exists Ω′_c such that Ω_c ⊢ c ⇛ Ω′_c and Γ′ ⊳_c Ω′_c.

We provide full proofs and definitions in Coq, along with a guided tour of our development, in the supplement [21].

As described in Fig. 1, the internal compiler of the DGFiP compiles M files (Section 2) to C code. Insofar as we understand, the M codebase originally expressed the whole income tax computation. However, in the 1990s (Section 1), the DGFiP started executing the M code twice, with slightly different parameters, in order for the taxpayer to witness the impact of a tax reform. Rather than extending M with support for user-defined functions, the DGFiP wrote the new logic in C, in a folder called “inter”, for multi-year computations. This piece of code can read and write variables used in the M codebase using shared global state. To assemble the final executable, M-produced C files and hand-written “inter” C files are compiled by GCC and distributed as a shared library. Over time, the “inter” folder grew to handle a variety of special cases, multiplying calls into the M codebase. At the time of writing, the “inter” folder amounts to 35,000 lines of C code.

This poses numerous problems. First, the mere fact that “inter” is written in C prevents it from being released to the public, the DGFiP fearing security issues that might somehow be triggered by malicious inputs provided by the taxpayer. Therefore, the taxpayer cannot reproduce the tax computation since key parts of the logic are missing. Second, by virtue of being written in C, “inter” does not compose with M, hindering maintainability, readability and auditability. Third, C limits the ability to modernize the codebase; right now, the online tax simulator is entirely written in C using Apache's CGI feature (including HTML code generation), a very dated infrastructure for Web-based development. Fourth, C is notoriously hard to analyze, preventing both the DGFiP and the taxpayer from doing fine-grained analyses.

To address all of these limitations, we design M++, a companion domain-specific language (DSL) that is powerful enough to completely eliminate the hand-written C code.

    ⟨program⟩  ::= ⟨fundecl⟩*
    ⟨fundecl⟩  ::= ⟨funname⟩ ( ⟨var⟩* ) : ⟨command⟩*
    ⟨command⟩  ::= if ⟨expr⟩ then ⟨command⟩* else ⟨command⟩*
                 | partition with ⟨var_kind⟩ : ⟨command⟩*
                 | ⟨var⟩ = ⟨expr⟩
                 | ⟨var⟩* <- ⟨fun⟩ ()
                 | del ⟨var⟩
    ⟨expr⟩     ::= ⟨var⟩ | ⟨float⟩ | undef
                 | ⟨expr⟩ ⟨binop⟩ ⟨expr⟩ | ⟨unop⟩ ⟨expr⟩
                 | ⟨builtin⟩ ( ⟨expr⟩ , ..., ⟨expr⟩ )
                 | exists( ⟨var_kind⟩ )
    ⟨binop⟩    ::= ⟨arithop⟩ | ⟨boolop⟩
    ⟨arithop⟩  ::= + | - | * | /
    ⟨boolop⟩   ::= <= | < | > | >= | == | != | && | ||
    ⟨unop⟩     ::= - | ~
    ⟨var_kind⟩ ::= taxbenefit | deposit | ...
    ⟨fun⟩      ::= ⟨funname⟩ | call_m
    ⟨builtin⟩  ::= present | cast

Figure 7. Syntax of the M++ language

The chief purpose of the M++ DSL is to repeatedly call the M rules, with different M variable assignments for each call. To assist with this task, M++ provides basic computational facilities, such as functions and local variables. In essence, M++ allows implementing a “driver” for the M code.

Fig. 8 shows concrete syntax for M++. We chose syntax resembling Python, where block scope is defined by indentation. As the French administration moves towards a modern digital infrastructure, Python seems to be reasonably understood across various administrative services.

Fig. 7 formally lists all of the language constructs that M++ provides. A program is a sequence of function declarations. M++ features two flavors of variables. Local variables follow scoping rules similar to Python: there is one local variable scope per function body; however, unlike Python, we disallow shadowing and have no block scope or nonlocal keyword. Local variables exist only in M++. Variables in all-caps live in the M variable scope, which is shared between M and M++, and obey particular semantics.

Two constructs support the interaction between M and M++: the <- and partition operators. They have slightly unusual semantics, in the way that they deal with the M variable scope. These semantics are heavily influenced by the needs of the DGFiP, as we strived to provide something that would feel intuitive to technicians in the French administration.

To precisely define the expected behavior, Fig. 9 presents reduction semantics of the form Δ, Ω₁ ⊢ c ↝ Ω₂, meaning command c updates the store from Ω₁ to Ω₂, given the functions declared in Δ.

We distinguish built-ins, which may only appear in expressions and do not modify the global store, from functions, which are declared at the top-level and may modify the store. The call_m operation is a special function. The <- operator takes a function call, and executes it in a copy of the memory. Then, only those variables that appear on the left-hand side see their value propagated to the parent execution environment. Thus, call_m only affects variables X⃗.

To execute the function call, the <- operator either looks up definitions in Δ, the environment of user-defined functions, or executes the M rules in the call_m case, relying on the earlier definition of ⇛ (Fig. 5).

Worded differently, our semantics introduce a notion of call stack and treat the M computation as a function call returning multiple values. It is to be noted that the original C code had no such notion, and that the X⃗ were nothing more than mere comments. As such, there was no way to statically rule out potential hidden state persisting from one call_m to another since the global scope was modified in place. With this formalization and its companion implementation (Section 4), we were able to confirm that there is currently no reliance on hidden state (something which we suspect took considerable effort to enforce in the hand-written C code), and were able to design a much more principled semantics that we believe will lower the risk of future errors.

    compute_benefits():
      if exists(taxbenefit) or exists(deposit):
        V_INDTEO = 1
        V_CALCUL_NAPS = 1
        partition with taxbenefit:
          NAPSANSPENA, IAD11, INE, IRE, PREM8_11 <- call_m()
        iad11 = cast(IAD11)
        ire = cast(IRE)
        ine = cast(INE)
        prem = cast(PREM8_11)
        V_CALCUL_NAPS = 0
        V_IAD11TEO = iad11
        V_IRETEO = ire
        V_INETEO = ine
        PREM8_11 = prem

Figure 8. Example function defined in M++

The partition operation operates over a variable kind k (Section 2.1). The sub-block c of partition executes in a restricted scope, where variables having kind k are temporarily set to undef. Upon completion of c, the variables at kind k are restored to their original value, while other variables are propagated from the sub-computation into the parent scope. This allows running computations while “disabling” groups of variables, e.g. ignoring an entire category of tax credits.

Fig. 8 provides a complete M++ example, namely the function compute_benefits. The conditional at line 2 uses a variable kind-check (Section 2.1) to see if any variables of kind “tax benefit” have a non-undef value. Then, lines 3-4 set some flags before calling M. Line 5 tells us that the call to M at line 6 is to be executed in a restricted context where variables of kind “tax benefit” are set to undef. Line 6 runs the M computation, over the current state of the M variables; five M output variables are retained from this M execution, while the rest are discarded. Lines 7-10 represent local variable assignment, where cast has the same effect as + 0 in M, namely, forcing the conversion of undef to 0. Then, lines 11-15 set some M variables as input for later function calls.
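The store discipline of the `<-` and partition constructs can be illustrated with a few lines of Python. This is a sketch under our own assumptions, not Mlang's implementation: the store is a dictionary with `None` for undef, `call` and `partition` are hypothetical helper names, and `toy_m_rules` is a made-up stand-in for the M computation (the variables INCOME, CREDIT, TAX and TMP are invented for the example).

```python
from typing import Callable, Optional

Store = dict[str, Optional[float]]     # None stands in for undef


def call(store: Store, outputs: list[str], fn: Callable[[Store], Store]) -> Store:
    """`outputs <- fn()`: run fn on a copy and keep only the listed outputs."""
    scratch = fn(dict(store))
    result = dict(store)
    for name in outputs:
        result[name] = scratch.get(name)
    return result


def partition(store: Store, kind_of: dict[str, str], kind: str,
              body: Callable[[Store], Store]) -> Store:
    """`partition with kind: body`: hide variables of `kind`, then restore them."""
    masked = {
        name: (None if kind_of.get(name) == kind else value)
        for name, value in store.items()
    }
    after = body(masked)
    result = {}
    for name in set(after) | set(store):
        if kind_of.get(name) == kind:
            result[name] = store.get(name)   # restored to the original value
        else:
            result[name] = after.get(name)   # propagated from the sub-block
    return result


def toy_m_rules(store: Store) -> Store:
    """Made-up stand-in for the M rules: a flat tax minus an optional credit."""
    out = dict(store)
    out["TAX"] = (store.get("INCOME") or 0.0) * 0.25 - (store.get("CREDIT") or 0.0)
    out["TMP"] = 1.0                         # intermediate, never propagated
    return out


store = {"INCOME": 1000.0, "CREDIT": 30.0}
kinds = {"CREDIT": "taxbenefit"}
direct = call(store, ["TAX"], toy_m_rules)
without_benefits = partition(
    store, kinds, "taxbenefit", lambda s: call(s, ["TAX"], toy_m_rules)
)
```

In `direct`, TAX reflects the credit; in `without_benefits`, TAX is computed as if CREDIT were blank, yet CREDIT itself is back to its original value afterwards and TMP never leaks into either result.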

After clarifying the semantics of M (Section 2), and designing a new DSL to address its shortcomings (M++, Section 3), we now present Mlang, a modern compiler for both M and M++.

Mlang takes as input an M codebase, an M++ file, and a file specifying assumptions (described in the next paragraph). Mlang currently generates Python or C; it also offers a built-in interpreter for computations. Mlang is implemented in OCaml, with around 9,000 lines of code. The general architecture is shown in Fig. 10. The M files and the M++ program are first parsed and transformed into intermediate representations. These intermediate representations are inlined into a single backend intermediate representation (BIR), consisting of assignments and conditionals. Inlining is aware of the semantic subtleties described in Fig. 9 and uses temporary variable assignments to save/restore the shared M/M++ scope. BIR code is then translated to the optimization intermediate representation (OIR) in order to perform optimizations. OIR is the control-flow-graph (CFG) equivalent of BIR.

OIR is the representation on which we perform our optimizations (Section 4.2). For instance, in order to perform constant propagation, we must check that a given assignment to a variable dominates all its subsequent uses. A CFG is the best data structure for this kind of analysis. We later on switch back to the AST-based BIR in order to generate textual C output.

Judgments: Δ, Ω ⊢ e ⇓ v (“Under Δ, Ω, e evaluates into v”) and Δ, Ω₁ ⊢ c ↝ Ω₂ (“Under Δ, c transforms Ω₁ into Ω₂”)

$$\text{Cast-float}\ \frac{\Delta, \Omega \vdash e \Downarrow f \quad f \neq \texttt{undef}}{\Delta, \Omega \vdash \texttt{cast}(e) \Downarrow f} \qquad \text{Cast-undef}\ \frac{\Delta, \Omega \vdash e \Downarrow \texttt{undef}}{\Delta, \Omega \vdash \texttt{cast}(e) \Downarrow 0}$$

$$\text{Exists-true}\ \frac{\exists X \in \Omega,\ \mathit{kind}(X) = k \wedge \Omega(X) \neq \texttt{undef}}{\Delta, \Omega \vdash \texttt{exists}(k) \Downarrow 1} \qquad \text{Exists-false}\ \frac{\forall X \in \Omega,\ \mathit{kind}(X) \neq k \vee \Omega(X) = \texttt{undef}}{\Delta, \Omega \vdash \texttt{exists}(k) \Downarrow 0}$$

$$\text{Call}\ \frac{\begin{array}{l} \Omega_1 \vdash \text{M rules} \Rrightarrow \Omega_2 \ \text{if } f = \texttt{call\_m} \qquad \Delta, \Omega_1 \vdash \Delta(f) \rightsquigarrow \Omega_2 \ \text{otherwise} \\ \Omega_3(Y) = \Omega_1(Y) \ \text{if } Y \notin \vec{X} \qquad \Omega_3(Y) = \Omega_2(Y) \ \text{if } Y \in \vec{X} \end{array}}{\Delta, \Omega_1 \vdash \vec{X} \leftarrow f() \rightsquigarrow \Omega_3}$$

$$\text{Partition}\ \frac{\begin{array}{l} \Omega_2(Y) = \texttt{undef} \ \text{if } \mathit{kind}(Y) = k \qquad \Omega_2(Y) = \Omega_1(Y) \ \text{otherwise} \qquad \Delta, \Omega_2 \vdash c \rightsquigarrow \Omega_3 \\ \Omega_4(Y) = \Omega_1(Y) \ \text{if } \mathit{kind}(Y) = k \qquad \Omega_4(Y) = \Omega_3(Y) \ \text{otherwise} \end{array}}{\Delta, \Omega_1 \vdash \texttt{partition with}\ k : c \rightsquigarrow \Omega_4}$$

$$\text{Delete}\ \frac{\Delta, \Omega_1 \vdash v = \texttt{undef} \rightsquigarrow \Omega_2}{\Delta, \Omega_1 \vdash \texttt{del}\ v \rightsquigarrow \Omega_2}$$

Figure 9. Reduction rules of M++

Additional assumptions.

In M, a variable not defined inthe current memory environment evaluates to undef (rule

D-Var-Undef , Fig. 4). This permissive behavior is fine for aninterpreter which has a dynamic execution environment;however, our goal is to generate efficient C and Python codethat can be integrated into existing software. As such, declar-ing every single one of the 27,113 possible variables (as foundin the original M rules) in C would be quite unsavory.We therefore devise a mechanism that allows stating aheadof time which variables can be truly treated as inputs, andwhich are the outputs that we are interested in. Since thesevary depending on the use-case, we choose to list these as-sumptions in a separate file that can be provided alongsidewith the M/M++ source code, rather than making this an in-trinsic, immutable property set at variable-declaration time.Doing so increases the quality of the generated C or Python.We call these assumption files ; we have hand-written 5 ofthose.

All is the empty file, i.e. no additional assumptions. This leaves 2,459 input variables and 10,411 output variables for the 2018 codebase.

Selected outs enables all input variables, but retains only 11 output variables.

Tests corresponds to the inputs and outputs used in the test files used by the DGFiP.

Simplified corresponds to the simplified simulator released each year by the DGFiP a few months before the full income tax computation is released. There are 214 inputs, and we chose 11 output variables.

Basic accepts as inputs only the marital status and the salaries of each individual of the couple. The output is the income tax.

In the 2018 tax code, the initial number of BIR instructions after inlining M and M++ files together is 656,020. This essentially corresponds to what the legacy compiler normally generates, since it performs no optimizations.

Thanks to its modern compiler architecture, Mlang can easily perform numerous textbook optimizations, namely dead code elimination, inlining and partial evaluation. This greatly improves the quality of the generated code. We now present a series of optimizations, performed on the OIR intermediate representation. The number of instructions after these optimizations is shown in Fig. 11. Without any assumption (All), the optimizations shrink the generated C code to 15% of its original size. With the most restrictive assumption file (Simplified), only 0.47% of the instructions remain after optimization.
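To see why restricting the outputs in an assumption file shrinks the code so much, consider dead code elimination on straight-line assignments. The sketch below is illustrative only: it ignores conditionals, the program representation (pairs of a target and the set of variables it reads) and the variable names are hypothetical, and it is not Mlang's OIR pass.

```python
def eliminate_dead_code(program, outputs):
    """Keep only the assignments that contribute to the requested outputs.

    program: list of (target, variables_read) pairs, in execution order.
    outputs: names of the variables the caller actually wants.
    """
    live = set(outputs)
    kept = []
    for target, reads in reversed(program):
        if target in live:
            kept.append((target, reads))
            live.discard(target)  # this definition satisfies the demand
            live.update(reads)    # and creates demand for what it reads
    kept.reverse()
    return kept


program = [
    ("A", {"SALARY"}),
    ("B", {"A"}),
    ("C", {"SALARY"}),  # never used to compute TAX
    ("TAX", {"B"}),
]
only_tax = eliminate_dead_code(program, {"TAX"})
tax_and_c = eliminate_dead_code(program, {"TAX", "C"})
```

With `TAX` as the only output, the assignment to `C` disappears; asking for more outputs keeps more of the program, which mirrors the difference between the All and Simplified assumption files.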

Definedness analysis.

Due to the presence of undef, some usual optimizations are not available. For example, optimizing e * 0 into 0 is incorrect when e is undef, as undef * 0 = undef. Similarly, e + 0 cannot be rewritten as e, since undef + 0 evaluates to 0 rather than undef. Our partial evaluation is thus combined with a simple definedness analysis. The lattice of the analysis is shown in Fig. 12; we use the standard sharp symbol of abstract interpretation [7] to denote abstract elements. The transfer function absorb defined in Fig. 13 is used to compute the definedness in the case of the multiplication, the division and all operators in ⟨boolop⟩. The cast transfer function is used for the addition and the subtraction.

This definedness analysis enables finer-grained partial evaluation rules, such as those presented in Fig. 14.

Preprint, Jan. 2021, Online

Figure 10. Mlang compilation passes. (Diagram: the inputs sources.m and source.mpp; the representations M AST, M++ AST, M IR, M++ IR, BIR and OIR; the assumption file assumptions.m_spec; the targets C, Python and Interpreter; and the passes Parsing, Desugaring, Inlining, Optimization and Transpiling.)

Figure 11. Number of instructions generated after optimization. Instructions with optimizations disabled: 656,020.

Figure 12. Definedness lattice. (Diagram: ⊤ at the top, the two incomparable elements undef and ⟨float⟩ in the middle, ⊥ at the bottom.)

d1        d2        absorb(d1, d2)   cast(d1, d2)
undef     undef     undef            undef
undef     ⟨float⟩   undef            ⟨float⟩
⟨float⟩   undef     undef            ⟨float⟩
⟨float⟩   ⟨float⟩   ⟨float⟩          ⟨float⟩

Figure 13. Transfer functions over the definedness lattice, implicitly lifted to the full lattice.

e + undef ↝ e
e : ⟨float⟩ + 0 ↝ e
e * 1 ↝ e
e : ⟨float⟩ * 0 ↝ 0
max(0, min(0, x)) ↝ 0
max(0, −max(0, x)) ↝ 0
present(undef) ↝ 0
present(e : ⟨float⟩) ↝ 1

Figure 14. Examples of optimizations.

Figure 15. Custom rounding and truncation rules. (Code listing; the only surviving fragment is the comment “// my_var1 is a local variable always defined”.)

The optimizations for + 0 and * 0 are invalid in the presence of IEEE-754 special values (NaN, minus zero, infinities) [3, 23]. We have instrumented the M code to confirm that these are valid on the values used. But for safety, these unsafe optimizations are only enabled if the --fast_math flag is set.
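The definedness analysis and its use in partial evaluation can be sketched in a few lines. This is an illustration under stated assumptions, not Mlang's implementation: abstract elements are encoded as sets of possible definedness states, the lifting to the full lattice is assumed to be pointwise, and `simplify_times_zero` is a hypothetical helper working on strings.

```python
UNDEF, FLOAT = "undef", "float"

# Abstract elements, encoded as the set of states a value may be in.
BOTTOM = frozenset()
UNDEF_ONLY = frozenset({UNDEF})
FLOAT_ONLY = frozenset({FLOAT})
TOP = frozenset({UNDEF, FLOAT})


def _absorb(d1, d2):
    # Multiplication, division, boolean operators: undef is absorbing.
    return UNDEF if UNDEF in (d1, d2) else FLOAT


def _cast(d1, d2):
    # Addition, subtraction: defined as soon as one operand is defined.
    return FLOAT if FLOAT in (d1, d2) else UNDEF


def lift(transfer):
    """Lift a transfer function on {undef, float} to sets, pointwise."""
    def lifted(a, b):
        return frozenset(transfer(x, y) for x in a for y in b)
    return lifted


absorb = lift(_absorb)
cast = lift(_cast)


def simplify_times_zero(expr, definedness):
    """Rewrite `expr * 0` to `0` only when expr is known to be defined."""
    return "0" if definedness == FLOAT_ONLY else f"{expr} * 0"
```

For instance, `simplify_times_zero("X", TOP)` leaves the product untouched, because X might be undef at run time and undef * 0 must stay undef.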

DGFiP (legacy).

The DGFiP’s legacy system has a single backend that produces pre-ANSI (K&R) C. For each M rule, two C computations are emitted. The first one aims to determine whether the resulting value is defined. It operates on C’s char type, where 0 is undefined and 1 is defined. The second computation is syntactically identical, except it operates on double and thus computes the actual arithmetic expression. This two-step process explains some of the operational semantics: with 0 being undefined, the special value undef is absorbing for e.g. the multiplication.

Careful study of the generated code also allowed us to nail down some non-standard rounding and truncation rules which had until then eluded us. We list them in Fig. 15; these are used to implement the built-in operators from Fig. 2 in both our interpreter and backends.

Mlang.

Our backend generates C and Python from BIR. Since BIR only features assignments, arithmetic and conditionals, we plan to extend it with backends for JavaScript, R/MatLab and even SQL for in-database native tax computation. Depending on the DGFiP’s appetite for formal verification, we may verify the whole compiler since the semantics are relatively small.

Implementing a new backend is not very onerous: it took us 500 lines for the C backend and 375 lines for the Python backend. Both backends are validated by running them over the entire test suite and comparing the result with our reference interpreter.

Scheme     M compiler   C compiler   Bin. size   Time
Original   DGFiP        GCC -O0                  ∼ . ms
Original   DGFiP        GCC -O1                  ∼ . ms
Array      Mlang        Clang -O0    19 MB       ∼ ms
Array      Mlang        Clang -O1    10 MB       ∼ ms

Figure 16. Performance of the C code generated by various compilation schemes for the M code. The time measured is the time spent inside the main tax computation function for one fiscal household picked in the set of test cases. Size of the compiled binary is indicated. “Original” corresponds to the DGFiP’s legacy system. “Local vars” corresponds to Mlang’s C backend mapping each M variable to a C local variable.

Our generated code only relies on a small library of helpers which implement operations over M values. These helpers are aware of all the semantic subtleties of M and are manually audited against the paper semantics.
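To give an idea of what such helpers look like, here is a minimal sketch in Python. It assumes an M value is a pair of a definedness flag and a float, in the spirit of the two computations emitted by the legacy compiler; the names `MValue`, `m_float`, `m_add` and `m_mul` are hypothetical and are not the names used in Mlang's runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MValue:
    defined: bool
    value: float  # kept at 0.0 whenever defined is False


UNDEF = MValue(False, 0.0)


def m_float(x):
    """Wrap a number as a defined M value."""
    return MValue(True, float(x))


def m_mul(a, b):
    # undef is absorbing for the multiplication.
    if not (a.defined and b.defined):
        return UNDEF
    return MValue(True, a.value * b.value)


def m_add(a, b):
    # undef behaves like 0 in an addition, unless both operands are undef.
    if not (a.defined or b.defined):
        return UNDEF
    return MValue(True, a.value + b.value)
```

With these definitions, `m_mul(UNDEF, m_float(0))` is `UNDEF` while `m_add(UNDEF, m_float(0))` is a defined zero, which is exactly why the `* 0` and `+ 0` rewrites need the definedness information.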

Due to the sheer size of the code and number of variables, generating efficient code is somewhat delicate: we had the pleasure of breaking both the Clang and Python parsers because of an exceedingly naïve translation. Thankfully, owing to our flexible architecture for Mlang, we were able to quickly iterate and evaluate several design choices.

We now show the benefits of a modern compiler infrastructure, and proceed to describe a variety of instrumentations, techniques and tweaking knobs that allowed us to gain insights on the tax computation. By bringing the M language into the 21st century, we not only greatly enhance the quality of the generated code, but also unlock a host of techniques that significantly increase our confidence in the French tax computation.

We initially generated C code that would emit one local variable per M variable. But with tens of thousands of local variables, running the code required raising the stack size limit with ulimit -s. We analyzed the legacy code and found out that the DGFiP stored all of the M variables in a global array. We implemented the same technique and found out that with -O1, we were almost as fast as the legacy code. We attribute this improvement to the fact that the array, which is a few dozen kB, fits in the L2 cache of most modern processors. This is a surprisingly fortuitous choice by the DGFiP. See Fig. 16 for full results. In the grand scheme of things, the cost of computing the final tax is dwarfed by the time spent generating a PDF summary for the taxpayer.

Relying on IEEE-754 and its limited precision for something as crucial as the income tax of an entire nation naturally raises questions. Thanks to our new infrastructure, we were able to instrument the generated code and gain numerous insights.

Does precision matter?

We tweaked our backend to use the MPFR multiprecision library [13]. With 1024-bit floats, all tests still pass, meaning that there is no loss of precision with the double-precision 64-bit format.

Does rounding matter?

We then instrumented the code to measure the effect of the IEEE-754 rounding mode on the final result. Anything other than the default (rounding to nearest, ties to even) generates incorrect results. The control-flow remains roughly the same, but some comparisons against 0 do give out different results as the computation skews negatively or positively. We plan in the future to devise a static analysis that could formally detect errors, such as comparisons that are always false, or numbers that may be suspiciously close to zero (denormals).

Fixed precision.

Nevertheless, floating-point computations are notoriously hard to analyze and reason about, so we set out to investigate replacing floats with integer values. In our first experiment, we adopted big decimals, i.e. a bignum for the integer part and a fixed number of digits for the fractional part. Our test suite indicates that the integer part never exceeds 9999999999 (encodable in 37 bits); it also indicates that with 40 bits of precision for the fractional part, we get correct results. This means that a 128-bit integer would be a viable alternative to a double, with the added advantage that formal analysis tools would be able to deal with it much better.
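The bit budget above can be checked with a small fixed-point sketch. The widths (37 integer bits, 40 fractional bits) are the ones reported in the text; the function names are hypothetical, and Python's unbounded integers stand in for the 128-bit integer, so this illustrates the encoding rather than the experiment itself.

```python
INT_BITS = 37    # integer-part width reported in the text
FRAC_BITS = 40   # fractional precision reported in the text
SCALE = 1 << FRAC_BITS
MAX_INT_PART = 9_999_999_999


def to_fixed(x):
    """Encode a float as an integer count of 2**-40 units."""
    return round(x * SCALE)


def to_float(a):
    return a / SCALE


def fixed_add(a, b):
    return a + b


def fixed_mul(a, b):
    # The raw product carries 80 fractional bits; drop the extra 40.
    # Note that >> rounds toward negative infinity.
    return (a * b) >> FRAC_BITS


largest = to_fixed(float(MAX_INT_PART))
fits_in_128_bits = largest.bit_length() <= 127  # one bit left for the sign
product = to_float(fixed_mul(to_fixed(1000.5), to_fixed(0.25)))
```

One design point worth noting: stored values need at most 77 bits, but the intermediate product of two such values can need about twice that, so a real 128-bit implementation would have to widen or rescale before multiplying.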

Using rationals.

Finally, we wondered if it was possible to work completely without floating-point and eliminate imprecision altogether, taking low-level details such as rounding mode and signed zeroes completely out of the picture. To that end, we encoded values as fractions where both numerator and denominator are big integers. We observed that neither exceeds the range of a 128-bit integer, meaning we could conceivably implement values as a struct with two 128-bit integers and a sign bit. We have yet to investigate the performance impact of this change.

The DGFiP test suite is painstakingly constructed by hand by lawyers year after year. From this test suite, we extracted 476 usable test cases that don’t raise any exceptions (see Section 2.1). The DGFiP has no infrastructure to automatically generate cases that would exercise new situations. As such, the test suite remains relatively limited in the variety of households it covers. Furthermore, many of the hand-written tests are for previous editions of the tax code, and describe situations that would be rejected by the current tax code.

Generating test cases is actually non-trivial: the search space is incredibly large, owing to the number of variables, but also deeply constrained, owing to the fact that most variables only admit a few possible values (Section 1), and are further constrained in relationship to other variables.

We now set out to automatically generate fresh (valid) test cases for the tax computation, with two objectives: assert on a very large number of test cases that our code and the legacy implementation compute the same result; and exhibit corner cases that were previously not exercised, so as to generate novel tax situations for lawmakers to consider.

Randomized testing.

We start by randomly mutating the legacy test suite, in order to generate new distinct, valid test cases. If a test case raises an exception, we discard it. We obtain 1267 tests, but these are, unsurprisingly, very close to the legacy test suite and do not exercise very many new situations. They did, however, help us when reverse-engineering the semantics of M. We now have 100% conformance on those tests.
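The mutate-and-discard loop described above can be sketched as follows. Everything here is hypothetical: the variable names, the `InvalidReturn` exception, the per-variable value domains and the toy tax function merely stand in for the real M inputs and errors.

```python
import random


class InvalidReturn(Exception):
    """Hypothetical error raised when a tax return is rejected."""


def mutate(case, rng, domains):
    """Copy a test case and re-draw one input variable from its domain."""
    mutated = dict(case)
    name = rng.choice(sorted(domains))
    mutated[name] = rng.choice(domains[name])
    return mutated


def generate(seed_cases, compute_tax, domains, n_attempts, seed=0):
    """Return new, distinct test cases that compute_tax accepts."""
    rng = random.Random(seed)
    seen = {tuple(sorted(c.items())) for c in seed_cases}
    fresh = []
    for _ in range(n_attempts):
        candidate = mutate(rng.choice(seed_cases), rng, domains)
        key = tuple(sorted(candidate.items()))
        if key in seen:
            continue  # not a new test case
        try:
            compute_tax(candidate)
        except InvalidReturn:
            continue  # the mutated return is invalid: discard it
        seen.add(key)
        fresh.append(candidate)
    return fresh


def toy_compute_tax(case):
    # Hypothetical validity rule: no spouse salary without a spouse.
    if case["married"] == 0 and case["salary_2"] > 0:
        raise InvalidReturn("spouse salary without spouse")
    return (case["salary_1"] + case["salary_2"]) // 10


seeds = [
    {"married": 1, "salary_1": 30000, "salary_2": 20000},
    {"married": 0, "salary_1": 25000, "salary_2": 0},
]
domains = {"married": [0, 1], "salary_1": [0, 15000, 60000], "salary_2": [0, 15000, 60000]}
fresh_cases = generate(seeds, toy_compute_tax, domains, n_attempts=50)
```

Because each mutation changes a single variable of an existing case, the generated cases stay close to the seeds, which is consistent with the limited novelty observed in the text.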

Coverage-guided fuzzing.

In order to better explore the search space, we turn to AFL [33]. The tool admits several usage modes: finding genuine crashes (e.g. segfaults), or generating test cases for further seeding into the rest of the testing pipeline. We focus on the latter mode, meaning that we generate an artificial “crash” when a synthesized test case raises no M errors, that is, when we have found a valid test case. We first devise an injection from opaque binary inputs, which AFL controls, to the DGFiP input variables. Once “crashes” have been collected, we simply emit a set of test inputs that has the same format as the DGFiP.

Thanks to this very flexible architecture, we were able to perform fully general fuzzing exercising all input variables, as well as targeted fuzzing that focuses on a subset of the variables. The former takes a few hours on a high-end machine; the latter mere minutes. We synthesized around 30,000 test cases, which we reduced down to 275 using afl-cmin.

So far, the fuzzer-generated test cases have pointed out a few bugs in Mlang’s optimizations and backends. We plan to further use AFL to find test cases that satisfy extra properties not originally present in the tax code, e.g. an excessively high marginal tax rate that might raise some legality questions.
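The “crash on success” trick can be illustrated with a small harness. This is a sketch under assumptions: the two-bytes-per-variable decoding, the `MError` exception and the function names are hypothetical, and the actual mapping used with AFL is not described in the text.

```python
import os


class MError(Exception):
    """Hypothetical error raised when the M code rejects a household."""


def decode_inputs(blob, variables, width=2):
    """Map an opaque byte string to input variables, `width` bytes each.

    Variables for which the blob is too short are left out (undefined).
    """
    inputs = {}
    for i, name in enumerate(variables):
        chunk = blob[i * width:(i + 1) * width]
        if len(chunk) < width:
            break
        inputs[name] = int.from_bytes(chunk, "little")
    return inputs


def harness(blob, variables, compute_tax, signal_crash=os.abort):
    """Fake a crash when the decoded household is valid, so that the
    fuzzer records the input as interesting."""
    inputs = decode_inputs(blob, variables)
    try:
        compute_tax(inputs)
    except MError:
        return False  # invalid household: exit normally
    signal_crash()
    return True
```

In a real setup the harness would read the blob from the file or standard input that the fuzzer provides; `signal_crash` is a parameter here only so that the logic can be exercised without aborting the process.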

Symbolic execution fuzzing.

We attempted to use the dynamic symbolic execution tool KLEE [5], but found out that it only had extremely limited support for floating-point computations. As detailed earlier (Section 5.2), we have found that integer-based computations are a valid replacement for floats, and plan to use this alternate compilation scheme to investigate whether KLEE would provide interesting test cases.

Finally, we wish to evaluate how “good” our new test cases are. Code coverage seems like a natural notion, especially seeing that there is currently none in the DGFiP infrastructure. However, traditional code coverage makes little sense: conditionals are very rare in the generated code.

Figure 17. Value coverage of assignments for each test suite. (Plot: percentage of assignments against number of distinct values assigned, for three test suites: DGFiP Private (476 tests), Randomized (1267 tests) and Fuzzer-generated (275 tests).)

Rather, we focus on value coverage: for each assignment in the code, we count the number of distinct values assigned during the execution of an entire test case. This is a good proxy for test quality: the more different values flow through an assignment, the more interesting the tax situation is.

Fig. 17 shows our measurements. The first take-away is that our randomized tests did not result in meaningful tests: the number of assignments that are uncovered actually increased. The tests we obtained with AFL, however, significantly increase the quality of test coverage. We managed to synthesize many tests that exercise statements previously unvisited by the DGFiP’s test suite, and exhibit much more complex assignments (2 or more different values assigned).

Our knowledge of the existing DGFiP test suite is incomplete, as we only have access to a partial set of tests. In particular, a special set of rules apply when the tax needs to be adjusted later on following an audit, and the tests for these have not been communicated to us. We hope to obtain visibility onto those in the future.
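The value-coverage metric described above is simple to compute from execution traces. The sketch below assumes a hypothetical trace format (a list of pairs of an assignment identifier and the value assigned) and aggregates the distinct values over all the traces it is given; it is not the instrumentation used to produce Fig. 17.

```python
from collections import Counter


def value_coverage(assignment_ids, traces):
    """Count the distinct values seen at each assignment.

    assignment_ids: every assignment in the program, so that assignments
                    that are never executed are reported with a count of 0.
    traces: one list of (assignment_id, value) pairs per execution.
    """
    seen = {a: set() for a in assignment_ids}
    for trace in traces:
        for assignment, value in trace:
            seen[assignment].add(value)
    return {a: len(values) for a, values in seen.items()}


def histogram(counts):
    """Percentage of assignments for each number of distinct values."""
    total = len(counts)
    buckets = Counter(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(buckets.items())}


ids = ["a1", "a2", "a3", "a4"]
traces = [
    [("a1", 0.0), ("a2", 1.0)],
    [("a1", 0.0), ("a2", 2.0)],
]
counts = value_coverage(ids, traces)
shares = histogram(counts)
```

Here half of the assignments are never reached (count 0), one always receives the same value, and one receives two distinct values; a better test suite moves mass away from the 0 and 1 buckets.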

Formalizing part of the law using logic programming or a custom domain-specific language has been extensively tried in the past, as early as 1914 [1, 2, 9, 12, 15, 25, 28]. Most of these works follow the same structure: they take a subset of the law, analyze its logical structure, and encode it using a novel or existing formalism. All of them stress the complexity of this formal endeavor, coming from i) the underlying reality that the law models and ii) the logical structure of the legislative text itself. After more than a century of research, no silver bullet has emerged that would allow systematically translating the text of a law into a formal model.

However, domain-specific attempts have been more successful. Recently, the blockchain community has shown increased interest in domain-specific languages encoding smart contracts [14, 16, 27, 32]. Regular private commercial contracts have also been targeted for formalization [6, 30], as well as financial contracts [11, 24]. Concerning the public sector, the “rules as code” movement has been the object of an exhaustive OECD report [22].

Closer to the topic of this paper, the logical structure of the US tax law has been extensively studied by Lawsky [18, 19], pointing out the legal ambiguities in the text of the law that need to be resolved using legal reasoning. She also claims that the tax law drafting style follows default logic [26], a non-monotonic logic that is hard to encode in languages with first-order logic (FOL). This could explain, as M is also based on FOL, the complexity of the DGFiP codebase.

As this complexity generates opacity around the way taxes are computed, another government agency set out to re-implement the entire French socio-fiscal system in Python [29]. Even if this initiative was helpful and used as a computation backend for various online simulators, the results it returns are not legally binding, unlike the results returned by the DGFiP. Furthermore, this Python implementation does not deal with all the corner cases of the law. To the best of our knowledge, our work is unprecedented in terms of size and exhaustiveness of the portion of the law turned into a reusable and formalized software artifact.

Thanks to modern compiler construction techniques, we have been able to lift up a legacy, secret codebase into a reusable, public artifact that can be distributed into virtually any programming environment. The natural next step for the DGFiP is to consider taking more insight from programming languages research, and design a successor to M/M++ that provides good tooling for translating the tax law into a correct and distributable implementation.

Acknowledgments

This work is partially supported by the European Research Council under Consolidator Grant Agreement 681393 (MOPSA) and 683032 (CIRCUS).

References

[1] Layman E Allen. 1956. Symbolic logic: A razor-edged tool for drafting and interpreting legal documents. Yale LJ 66 (1956), 833.
[2] Layman E. Allen and C. Rudy Engholm. 1978. Normalized legal drafting and the query method. Journal of Legal Education 29, 4 (1978), 380–412.
[3] Sylvie Boldo, Jacques-Henri Jourdan, Xavier Leroy, and Guillaume Melquiond. 2015. Verified Compilation of Floating-Point Computations. J. Autom. Reason. 54, 2 (2015), 135–163. https://doi.org/10.1007/s10817-014-9317-x
[4] Sylvie Boldo and Guillaume Melquiond. 2011. Flocq: A Unified Library for Proving Floating-Point Algorithms in Coq. Elisardo Antelo, David Hough, and Paolo Ienne (Eds.). IEEE Computer Society, 243–252. https://doi.org/10.1109/ARITH.2011.40
[5] Cristian Cadar, Daniel Dunbar, Dawson R Engler, et al. 2008. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, Vol. 8. 209–224.
[6] John J Camilleri. 2017. Contracts and Computation: Formal modelling and analysis for normative natural language.
[7] Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In POPL. ACM, 238–252.
[8] Equipe Leximpact de L’Assemblée nationale. 2019. LexImpact. https://leximpact.an.fr/
[9] John Dewey. 1914. Logical method and law. Cornell LQ 10 (1914), 17.
[10] Direction Générale des Finances Publiques (DGFiP). 2019. Les règles du moteur de calcul de l’impôt sur le revenu et de l’impôt sur la fortune immobilière. https://gitlab.adullact.net/dgfip/ir-calcul
[11] Jean-Marc Eber. 2009. The Financial Crisis, a Lack of Contract Specification Tools: What Can Finance Learn from Programming Language Design? In ESOP. 205–206.
[12] D Fernández Duque, M González Bedmar, D Sousa, Joosten, J.J, and G. Errezil Alberdi. 2019. To drive or not to drive: A formal analysis of requirements (51) and (52) from Regulation (EU) 2016/799. In Personalidades jurídicas difusas y artificiales. TransJus Working Papers Publication - Edición Especial, 159–171. http://diposit.ub.edu/dspace/handle/2445/137759
[13] Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and Paul Zimmermann. 2007. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Transactions on Mathematical Software (TOMS) 33, 2 (2007), 13–es.
[14] Xiao He, Bohan Qin, Yan Zhu, Xing Chen, and Yi Liu. 2018. Spesc: A specification language for smart contracts. Vol. 1. IEEE, 132–137.
[15] Nils Holzenberger, Andrew Blair-Stanek, and Benjamin Van Durme. 2020. A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering. arXiv preprint arXiv:2005.05257 (2020).
[16] Tom Hvitved. 2011. Contract formalisation and modular implementation of domain-specific languages. Ph.D. Dissertation. Citeseer.
[17] Camille Landais, Thomas Piketty, and Emmanuel Saez. 2011. Pour une révolution fiscale: Un impôt sur le revenu pour le 21ème siècle.
[18] Sarah B. Lawsky. 2017. Formalizing the Code. Tax Law Review 70, 377 (2017).
[19] Sarah B. Lawsky. 2018. A Logic for Statutes. Florida Tax Review (2018).
[20] Denis Merigoux, Raphaël Monat, and Jonathan Protzenko. 2020. Mlang, A Modern Compiler for the French Tax Code. https://github.com/MLanguage/mlang
[21] Denis Merigoux, Raphaël Monat, and Jonathan Protzenko. 2021. A Modern Compiler for the French Tax Code - Artifact. https://doi.org/10.5281/zenodo.4456774
[22] James Mohun and Alex Roberts. 2020. Cracking the code: Rulemaking for humans and machines. (2020).
[23] Jean-Michel Muller, Nicolas Brunie, Florent de Dinechin, Claude-Pierre Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, and Serge Torres. 2018. Handbook of Floating-Point Arithmetic (2nd Ed.). Springer. https://doi.org/10.1007/978-3-319-76526-6
[24] Grant Olney Passmore and Denis Ignatovich. 2017. Formal Verification of Financial Algorithms. In CADE. https://doi.org/10.1007/978-3-319-63046-5_3
[25] Marcos A Pertierra, Sarah Lawsky, Erik Hemberg, and Una-May O’Reilly. 2017. Towards Formalizing Statute Law as Default Logic through Automatic Semantic Parsing. In ASAIL@ICAIL.
[26] R. Reiter. 1987. Readings in Nonmonotonic Reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, Chapter A Logic for Default Reasoning, 68–93. http://dl.acm.org/citation.cfm?id=42641.42646
[27] Vincenzo Scoca, Rafael Brundo Uriarte, and Rocco De Nicola. 2017. Smart contract negotiation in cloud computing. In International Conference on Cloud Computing (CLOUD). IEEE, 592–599.
[28] M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, and H. T. Cory. 1986. The British Nationality Act As a Logic Program. Commun. ACM 29, 5 (May 1986), 370–386.
[29] Sébastien Shulz. 2019. Free software to tackle the lack of transparency in the social tax system. Sociology of a heterogeneous movement at the margins of the state. Revue francaise de science politique 69, 5 (2019), 845–868.
[30] SMU Centre for Computational Law. 2020.