This is a preprint of the following publication: Tristan Miller and Denis Auroux. GPP, the generic preprocessor.
Journal of Open Source Software, 5(51), July 2020. ISSN 2475-9066. DOI: 10.21105/joss.02400
GPP, the Generic Preprocessor
Tristan Miller
Austrian Research Institute for Artificial Intelligence
Freyung 6/3, 1010 Vienna, Austria
ORCID: 0000-0002-0749-1100
Denis Auroux
Department of Mathematics, Harvard University
1 Oxford Street, Cambridge, MA 02138, USA
Summary
In computer science, a preprocessor (or macro processor) is a tool that programmatically alters its input, typically on the basis of inline annotations, to produce data that serves as input for another program. Preprocessors are used in software development and document processing workflows to translate or extend programming or markup languages, as well as for conditional or pattern-based generation of source code and text. Early preprocessors were relatively simple string replacement tools that were tied to specific programming languages and application domains, and while these have since given rise to more powerful, general-purpose tools, these often require the user to learn and use complex macro languages with their own syntactic conventions. In this paper, we present GPP, an extensible, general-purpose preprocessor whose principal advantage is that its syntax and behaviour can be customized to suit any given preprocessing task. This makes GPP of particular benefit to research applications, where it can be easily adapted for use with novel markup, programming, and control languages.
Background
Preprocessors date back to the mid-1950s, when they were used to extend individual assembly languages with constructs that would later be found in high-level programming languages (Layzell, 1985). These languages, in turn, fostered the development of yet more special-purpose preprocessors aimed at providing even higher-level constructs, such as conditional loops and other control structures in FORTRAN (Meissner, 1975) and COBOL (Triance, 1980). The need for generalized, language-independent tools was eventually recognized (McIlroy, 1960), leading to the development of general-purpose preprocessors such as GPM (Strachey, 1965) and ML/I (Brown, 1967).
By the end of the 1960s, preprocessors had attracted a considerable amount of attention, by computing theorists and practitioners alike, and their use in software engineering had expanded beyond the augmentation and adaptation of programming languages. A survey paper by Brown (1969) identified four broad application areas: language extension, systematic searching and editing of source code, translation between programming languages, and code generation (i.e., simplifying the writing of highly repetitive code, parameterizing a program by substituting compile-time constants, or producing variants of a program by conditionally including certain statements or modules). While the first three of these application areas have largely been rendered obsolete by today’s integrated development environments and expressive, feature-rich programming languages, implementing software variability with language-specific and general-purpose preprocessors remains commonplace (Apel et al., 2013; Kästner et al., 2012).
Text processing became another main application area for preprocessors, in particular to generate documents on the basis of user-specified conditions or patterns, and to convert between document markup languages (Walden, 2014). The earliest such uses were ad-hoc repurposings of programming language–specific preprocessors to operate on human-readable texts (Keese, 1964; Stallman and Weinberg, 2020); these were soon supplanted by text-specific macro languages such as TRAC (Mooers and Deutsch, 1965), which were positioned as tools for stenographers and other writing professionals. More recently it has been common to use general-purpose preprocessors (Mailund, 2019; Pesch, 1992).
Statement of Need
Criticism of preprocessors commonly focuses on the idiosyncratic languages they employ for their own built-in directives and for users to define and invoke macros. The languages of early preprocessors were derided as “clumsy and restrictive” (Layzell, 1985) and “hard to read” (Brown, 1969), and even modern preprocessors are sometimes attacked for relying on “the clumsiness of a separate language of limited expressiveness” (Ernst et al., 2002) or, at the other extreme, for being overly complicated, quirky, opaque, or hard to learn, even for experienced programmers and markup users (Ernst et al., 2002; Paddon, 1993; Pesch, 1992).
Our general-purpose preprocessor, GPP, avoids these issues by providing a lightweight but flexible macro language whose syntax can be customized by the user. The tool’s built-in presets allow its directives to be made to resemble those of many popular languages, including HTML and TeX. This greatly reduces the learning curve for GPP when it is used with these languages, eliminates the cognitive burden of repeatedly “mode switching” between source and preprocessor syntax when reading or composing, and allows existing syntax highlighters and other tools to process GPP directives with little or no further configuration. Furthermore, users are not limited to using these presets, but can fully define their own syntax for GPP directives and macros. This makes GPP particularly attractive for use in research and development, where its syntax can be readily adapted to match novel programming and markup languages.
GPP’s independence from any one programming or markup language makes it more versatile than the C Preprocessor, which was formerly “abused” as a general text processor and is still sometimes (inappropriately) used for non-C applications (Stallman and Weinberg, 2020). While GPP is less powerful than m4 (Seindal et al., 2016), it is arguably more flexible, and supports all the basic operations expected of a modern, high-level preprocessing system, including conditional tests, arithmetic evaluation, and POSIX-style wildcard matching (“globbing”). In addition to macros, GPP understands comments and strings, whose syntax and behaviour can also be widely customized to fit any particular purpose.
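As a rough illustration of what user-customizable macro syntax can mean in practice, the following Python sketch expands the same one-argument macro written in two different surface syntaxes. It is not GPP's implementation and makes no claim to reproduce GPP's actual directives or presets: the `Syntax` class, the `expand` function, the two example syntaxes, and the `#1` argument placeholder are all hypothetical names chosen for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntax:
    """Hypothetical description of how a macro call looks in the host language."""
    call_start: str  # text that introduces a macro call
    arg_start: str   # text that separates the macro name from its argument
    arg_end: str     # text that closes the macro call


def expand(text: str, macros: dict, syntax: Syntax) -> str:
    """Replace each one-argument macro call in `text` with its expansion."""
    # Build the call pattern from the user-supplied delimiters, escaping them
    # so that characters such as "\" or "{" are matched literally.
    pattern = re.compile(
        re.escape(syntax.call_start)
        + r"(\w+)"
        + re.escape(syntax.arg_start)
        + r"(.*?)"
        + re.escape(syntax.arg_end),
        re.DOTALL,
    )

    def substitute(match):
        name, arg = match.group(1), match.group(2)
        if name not in macros:
            return match.group(0)  # leave unknown calls untouched
        # "#1" is this sketch's placeholder for the macro's single argument.
        return macros[name].replace("#1", arg)

    return pattern.sub(substitute, text)


# Two illustrative syntaxes: one resembling TeX, one resembling HTML.
TEX_LIKE = Syntax(call_start="\\", arg_start="{", arg_end="}")
HTML_LIKE = Syntax(call_start="<#", arg_start=" ", arg_end=">")

macros = {"greet": "Hello, #1!"}
tex_result = expand(r"\greet{world}", macros, TEX_LIKE)
html_result = expand("<#greet world>", macros, HTML_LIKE)
```

Both calls yield the same expansion. The macro definitions stay fixed while only the delimiters vary, which is the property that lets a preprocessor blend into the language it is embedded in.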
GPP in research
GPP has already been integrated into a number of third-party projects in basic and applied research. These include the following:
• The Waveform Definition Language (WDL) is Caltech Optical Observatories’ C-like language for programming astronomical research cameras. WDL uses GPP to preprocess configuration files containing signals and parameters specific to the camera controllers, flags setting the devices’ operating modes and image properties, and timing rules. According to the developers, GPP was chosen over the C Preprocessor “for added flexibility and to avoid some C-like limitations” (Kaye et al., 2017).
• XSB is a research-oriented, commercial-grade logic programming system and Prolog compiler. The developers chose to make GPP XSB’s default preprocessor because it “maintains a high degree of compatibility with the C preprocessor, but is more suitable for processing Prolog programs” (Swift et al., 2017).
• C-Control Pro is a family of electronic microcontrollers produced by Conrad Electronic; they are specifically designed for industrial and automotive applications. The official software development kit includes a modified version of GPP for use with the products’ BASIC and Compact-C programming languages (Schirm and Sprenger, 2007).
• SUS is a tool that allows system administrators to exercise fine-grained control over how users can run commands with elevated privileges. It has a sophisticated control file syntax that is preprocessed with GPP (Gray, 2001).
Apart from these uses, GPP is occasionally cited as previous or related work in scholarly publications on metaprogramming or compile-time variability of software (Apel et al., 2013; Baxter and Mehlich, 2001; Behringer, 2017; Blendinger, 2010; Dreiling, 2010; Kästner et al., 2012; Lotoreychik and Shopyrin, 2006; Zmiry, 2016).
Acknowledgments
Tristan Miller is supported by the Austrian Science Fund (FWF) under project M 2625-N31. Denis Auroux is partially supported by NSF grant DMS-1937869 and by Simons Foundation grant
References
Apel, Sven et al. (Oct. 2013). Classic, Tool-Driven Variability Mechanisms. Feature-Oriented Software Product Lines. Berlin/Heidelberg: Springer-Verlag. ISBN: 978-3-642-37520-0.
Baxter, Ira D. and Michael Mehlich (2001). Preprocessor Conditional Removal by Simple Partial Evaluation. Proceedings of the 8th Working Conference on Reverse Engineering. IEEE, pp. 281–290. ISBN: 0-7695-1303-4. DOI: 10.1109/WCRE.2001.957833.
Behringer, Benjamin (July 2017). “Projectional Editing of Software Product Lines: The PEoPL Approach”. PhD thesis. Faculty of Sciences, Technology and Communication, Université de Luxembourg.
Blendinger, Frank (Aug. 2010). “A Filesystem-Based Approach to Support Product Line Development with Editable Views”. Diploma thesis. Department of Computer Sciences 4, Friedrich-Alexander University Erlangen-Nuremberg.
Brown, P. J. (Oct. 1967). The ML/I Macro Processor. Communications of the ACM. ISSN: 0001-0782.
Brown, P. J. (1969). A Survey of Macro Processors. Annual Review in Automatic Programming. ISSN: 0066-4138.
Dreiling, Alexander (July 2010). “Feature Mining: Semiautomatische Transition von (Alt-)Systemen zu Software-Produktlinien”. Diploma thesis. Fakultät für Informatik, Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg.
Ernst, Michael D., Greg J. Badros, and David Notkin (Dec. 2002). An Empirical Analysis of C Preprocessor Use. IEEE Transactions on Software Engineering. ISSN: 0098-5589.
Gray, Peter D. (Dec. 2001). SUS: An Object Reference Model for Distributing UNIX Super User Privileges. Proceedings of the LISA 2001 15th Systems Administration Conference. The USENIX Association, pp. 15–18.
Kästner, Christian et al. (June 2012). Type Checking Annotation-Based Product Lines. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/2211616.2211617.
Kaye, Stephen et al. (2017). Waveform Definition Language. Tech. rep. Pasadena, CA: Caltech Optical Observatories.
Keese Jr., W. M. (Sept. 1964). A Note on Automatic Generation of Documentation by Macro Assemblers. Technical memorandum TM-64-1031-1. Washington, DC: Bellcom, Inc.
Layzell, P. J. (Jan. 1985). The History of Macro Processors in Programming Language Extensibility. The Computer Journal. ISSN: 0010-4620.
Lotoreychik, V. Yu. and D. G. Shopyrin (2006). Metaprogrammirovaniye na osnove tekstovogo preprotsessora [Text Preprocessor-Based Metaprogramming]. Nauchno-Tehnicheskii Vestnik Informatsionnykh Tekhnologii, Mekhaniki i Optiki [Scientific and Technical Journal of Information Technologies, Mechanics and Optics]. ISSN: 2226-1494.
Mailund, Thomas (2019). Preprocessing. Introducing Markdown and Pandoc: Using Markup Language and Document Converter. Berkeley, CA: Apress. ISBN: 978-1-4842-5148-5.
McIlroy, M. Douglas (Apr. 1960). Macro Instruction Extensions of Compiler Languages. Communications of the ACM. ISSN: 0001-0782.
Meissner, Loren P. (Sept. 1975). On Extending Fortran Control Structures to Facilitate Structured Programming. SIGPLAN Notices. ISSN: 0362-1340.
Mooers, Calvin N. and L. Peter Deutsch (Aug. 1965). TRAC, a Text-Handling Language. ACM ’65: Proceedings of the 20th National Conference. Ed. by Lewis Winner. New York: Association for Computing Machinery, pp. 229–246. ISBN: 978-1-4503-7495-8.
Paddon, Michael (1993). Shake: A Portable Tool for Generating Makefiles. AUUG ’93 Conference Proceedings. Kensington, NSW, Australia: AUUG Inc., pp. 145–156.
Pesch, R. H. (1992). Configurable Manuals. Conference Record on Crossing Frontiers, pp. 776–780. ISBN: 0-7803-0788-7.
Schirm, Reiner and Peter Sprenger (2007). Der Preprozessor. Messen, Steuern und Regeln mit C-Control Pro: Praxisanwendungen, Schaltungstechnik und Programmierung. Poing, Germany: Franzis. ISBN: 978-3-7723-4097-0.
Seindal, René et al. (Dec. 2016). GNU M4, Version 1.4.18. Free Software Foundation.
Stallman, Richard M. and Zachary Weinberg (2020). Overview. The C Preprocessor. GCC 10.1.0. Free Software Foundation.
Strachey, C. (Jan. 1965). A General Purpose Macrogenerator. The Computer Journal. ISSN: 0010-4620.
Swift, Theresa et al. (Oct. 2017). The XSB System, Version 3.8.x, Volume 1: Programmer’s Manual.
Triance, J. M. (1980). Structured Programming in COBOL: The Current Options.