EUROMICRO Conference on Digital System Design (DSD 2016), IEEE, Limassol, Cyprus, August 31–September, 2016, pp. 192–199. https://doi.org/10.1109/DSD.2016.24
HTCC: Haskell to Handel-C Hardware Compiler
Ahmed B. Ablak and Issam Damaj
Electrical and Computer Engineering Department
American University of Kuwait
Salmiya, Kuwait
Email: {s00015070, idamaj}@auk.edu.kw

Abstract: Functional programming languages, such as Haskell, enable simple, concise, and correct-by-construction hardware development. HTCC compiles a subset of Haskell to the Handel-C language with hardware output. Moreover, HTCC generates VHDL, Verilog, EDIF, and SystemC programs. The design of the HTCC compiler includes lexical, syntax, and semantic analyzers. HTCC automates a transformational derivation methodology to rapidly produce hardware that maps onto Field Programmable Gate Arrays (FPGAs). HTCC is generated using the ANTLR compiler-compiler tool and supports an effective integrated development environment. This paper presents the design rationale and the implementation of HTCC. Several sample generations of first-class and higher-order functions are presented. In addition, a compilation case-study is presented for the XTEA cipher. The investigation comprises a thorough evaluation and performance analysis. The targeted FPGAs include Cyclone II, Stratix IV, and Virtex-6 from Altera and Xilinx.
I. INTRODUCTION
FPGAs are widely used reconfigurable computing (RC) systems. They have become very popular in research and industrial applications in fields such as security and signal processing. FPGAs evolved from being limited in functionality and speed to become high-performance processors. Example FPGAs include Stratix from Altera and Virtex from Xilinx [1], [2]. The flexibility of FPGAs, which are sometimes described as seas of gates, enables software-style development paradigms in which hardware can be reconfigured almost instantly.

Recently, there has been considerable focus on the development of high-level synthesis (HLS) and rapid prototyping hardware/software co-design tools. The targets of co-design tools include high design productivity, simplicity, reduced time-to-prototype, and correctness. Co-design tools convert algorithmic behaviors into digital circuits that can map onto FPGAs. High-level co-design tools now go beyond behavioral VHDL and the other standard tools. The area has witnessed the emergence of programming languages and tools such as Handel-C [3], SystemC [4], Matlab HDL Coder, and LabVIEW. Modern co-design tools enable the integration and partitioning of computations into communicating hardware and software subsystems.

Handel-C is a high-level language with hardware output. Handel-C is based on ANSI C, extended with concepts from the theory of communicating sequential processes (CSP) and the concurrent programming language OCCAM [5]. Moreover, Handel-C can provide both parallel and sequential implementations and can target different FPGA types. Recent research effort has focused on automating hardware generation that targets Handel-C, and hardware in general, starting from functional specifications such as Haskell [6]–[9].

Haskell is a purely functional programming language that utilizes functions to construct programs.
Haskell functions have no side effects, so their results do not depend on the order in which they are evaluated [10]. Modern functional languages are characterized by being strongly typed, concise, clear, lazy, and easy to reason about for correctness. Developing hardware circuits based on the functional programming paradigm is a promising and active research topic [11]–[13]. Much research effort has been devoted to exploiting the advantages of functional programming languages in hardware design, including Lava [14], Hawk [15], [16], Hydra [17], HML [18], MHDL [19], the DDD system [20], SAFL [21], MuFP [22], Ruby [23], and Form [24].

HTCC compiles a subset of Haskell to Handel-C, in addition to automatically generating VHDL, Verilog, EDIF, and SystemC. The design of the HTCC compiler includes lexical, syntax, and semantic analyzers. The compiler is generated using ANTLR based on a subset of the Haskell grammar. The HTCC Integrated Development Environment (IDE) produces a variety of analysis and schematic files. HTCC successfully connects to external tools such as DK Design Suite, Altera Quartus, and ModelSim. The developed compiler targets several FPGA types as well as the Altera DE2-70 and DE4 FPGA boards. The targeted area of application is cryptography, namely, the XTEA cipher.

The paper is organized so that Section II presents the rapid prototyping methodology adopted by HTCC. Section III details the HTCC construction, including the compiler and IDE designs. The compiler implementation is presented in Section IV. Sections V and VI present the compilation approach of first-class and higher-order functions and a case-study from cryptography. A thorough analysis and evaluation is presented in Section VII. Section VIII concludes the paper and sets the ground for future work.

II. BACKGROUND
HTCC adopts the transformational derivation and refinement methodology of Abdallah et al. [8], [25]. The adopted methodology refines functional specifications into parallel hardware implementations in Handel-C. Several case-studies for the methodology were carried out by Damaj et al. [9], [26]–[28]; however, the implementations did not include a compiler that automates the refinement procedure.

Figure 1 depicts the step-wise refinement procedure, where functional specifications are refined to hardware. The adopted methodology is systematic in the sense that it is carried out using the following step-by-step procedure:
• Specify the algorithm in a functional setting, relying on higher-order functions as the main building constructs wherever necessary.
• Apply the predefined set of rules to create the corresponding CSP networks according to a chosen degree of parallelism.
• Write the equivalent Handel-C code and complete the hardware compilation.

The refinement steps are aided by different compilers and integrated development environments. HTCC automates the development process, including the background run of existing FPGA vendor interfaces and Haskell, Handel-C, VHDL, Verilog, EDIF, and SystemC compilers.

The adopted methodology refines both datatypes and functions. Datatypes are refined to Items, Streams, and Vectors to create communicating entities based on the message passing technique. The Item corresponds to a basic type, such as an Integer data type, and it is communicated on a single communicating channel. The Stream is a purely sequential method of communicating a list of values. The Vector is a refinement of a simple list of items that communicates the entire structure in parallel [9].

In addition, the methodology refines functions to communicating processes. The refinement comprises a library of standard processes, such as Produce and Store, that aid the communication of refined datatypes. The Produce process is used to produce values on the channels of a certain communication construct (Item, Stream, Vector, etc.). These values are to be received and manipulated by other processes. The process Store stores a communication construct in a simple or composite variable [9].

Fig. 1. The transformational derivation and refinement methodology.

The methodology also supports a rich set of refined higher-order functions, such as map, zip, zipWith, etc. The refinement of higher-order functions to processes can be done in stream or vector settings, or a combination of them. In Handel-C, datatypes are refined to structures (struct), while processes are refined to macro procedures [9]. The Handel-C compiler generates the required hardware circuits that can be mapped onto FPGAs.

III. COMPILER CONSTRUCTION
HTCC is a compiler that automates the presented refinement methodology. The presented version of the HTCC Integrated Development Environment (IDE) supports the following:
• Compiles a subset of Haskell to Handel-C.
• Automatically connects to the DK Design Suite from Mentor Graphics to run the Handel-C compiler; it verifies, generates, and analyzes the corresponding VHDL, Verilog, EDIF, or SystemC code.
• Automatically connects to the Glasgow Haskell Compiler (GHC) to run and test the Haskell code.
• Automatically connects to Altera Quartus II to run, test, and analyze hardware designs; place and route; produce bit files; and target specific FPGAs and FPGA boards.
• Provides an easy-to-use, rich, and modern development environment.
A. Compiler Design using ANTLR
HTCC is developed using the compiler-compiler tool ANTLR. ANTLR provides an easy-to-use compiler construction structure; ANTLR is efficient, reliable, and effective [26]. ANTLR uses an adaptive parsing technique that provides runtime grammar analysis [29]. Moreover, ANTLR uses the Extended Backus–Naur Form (EBNF). The efficiency and effectiveness of utilizing ANTLR is primarily due to its ability to support direct left-recursion, side-effecting actions (mutators), and predictions from the corresponding grammar [30].

Figure 2 shows the state machine diagram of the HTCC compilation procedure. The Lexical Analyzer analyzes the input Haskell code by producing a numbered list of lexemes. In addition, the Lexical Analyzer divides the code based on the provided grammar to prepare it for the syntax analysis. The Lexical Analyzer removes all white space between tokens and ignores any input that follows the comment symbol "--".
Fig. 2. HTCC compiler state machine.
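The lexical analysis stage described above can be illustrated with a small hand-written tokenizer. This is only a sketch under stated assumptions: HTCC's lexer is generated by ANTLR in Java, whereas the Token type, its constructor names, and the functions tokenize and numbered below are hypothetical names introduced for illustration. The sketch drops white space, discards everything from "--" to the end of the line, and numbers the resulting lexemes.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

-- Hypothetical token kinds; the names are assumptions, not HTCC's own.
data Token
  = TId String     -- identifier: letters followed by optional digits
  | TDigit String  -- integer literal
  | TArrow         -- the function arrow "->"
  | TNewline       -- line terminator, kept because declarations end on it
  | TSym String    -- any other symbol, e.g. "::", "=", "+"
  deriving (Eq, Show)

-- Split source text into tokens, skipping white space and comments.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('-' : '-' : rest) = tokenize (dropWhile (/= '\n') rest) -- comment
tokenize ('-' : '>' : rest) = TArrow : tokenize rest
tokenize ('\r' : '\n' : rest) = TNewline : tokenize rest
tokenize ('\n' : rest) = TNewline : tokenize rest
tokenize s@(c : rest)
  | isSpace c = tokenize rest
  | isAlpha c =
      let (name, more) = span isAlpha s
          (digits, more') = span isDigit more
       in TId (name ++ digits) : tokenize more'
  | isDigit c =
      let (ds, more) = span isDigit s
       in TDigit ds : tokenize more
  | c == ':', take 1 rest == ":" = TSym "::" : tokenize (drop 1 rest)
  | otherwise = TSym [c] : tokenize rest

-- Pair every lexeme with its position, giving a numbered list of lexemes.
numbered :: String -> [(Int, Token)]
numbered = zip [1 ..] . tokenize
```

For instance, numbered "add3 :: Int -> Int\n" pairs the positions 1 to 6 with the identifier add3, the symbol "::", the identifier Int, the arrow, the identifier Int, and the line terminator. Unlike a generated lexer, this sketch reports no errors and treats any unrecognized character as a one-character symbol.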
The syntax analyzer is also generated using ANTLR, where a new parse tree is constructed for every compilation. ANTLR provides the required Java library to construct parse trees and to walk through them starting on the leftmost side. During the walk-through, the program being compiled is checked for errors based on the grammar provided to ANTLR.

The third stage of the HTCC compiler is the semantic analysis, where the types of all functions are checked and stored in a table for further processing. Semantic analysis checks the types of the inputs and outputs of each function. The semantic analyzer walks through the parse tree nodes using ANTLR's tree walker. If any datatype is found to be unsupported or mismatched, HTCC terminates the compilation process and reports the error.

After a successful semantic analysis check, HTCC continues to the intermediate code generation and then to the final code generation. In the intermediate stage, all input and output interface buses and macros are generated. Then, the number of connections among macros is determined and passed to the final generation stage. During the final compilation stage, both the Handel-C bus interfaces and the Handel-C main method are generated. Moreover, the connections among all macros are generated. The current version of HTCC does not include an optimization stage.

Figure 3 depicts the correspondence used to generate Handel-C macros from Haskell functions. An example Haskell function is as follows:

    add3 :: Int → Int
    add3 x = x + 3

The add3 function has one input and one output, both of type integer. The corresponding Handel-C macro for add3 is as follows:

    macro proc add3 (itemIn, itemOut)
    {
        typeof (itemIn.message) x;
        itemIn.channel ? x;
        itemOut.channel ! x+3;
    }
Fig. 3. Code generation of items
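The correspondence in Figure 3 can be made concrete with a toy generator that prints a macro of the same shape from a description of a unary function. This is a minimal sketch and not HTCC's code generator, which is written in Java on top of ANTLR parse trees: the UnaryFun record and the genItemMacro function are hypothetical, and the sketch covers only definitions of the form f x = x op k.

```haskell
import Data.List (intercalate)

-- Hypothetical description of a definition such as  add3 x = x + 3.
data UnaryFun = UnaryFun
  { funName  :: String  -- name of the function and of the generated macro
  , operator :: String  -- binary operator applied to the input
  , constant :: Int     -- right-hand operand
  }

-- Emit a Handel-C macro procedure in the shape shown in the listing above:
-- read one value from the input item, then write the result to the output.
genItemMacro :: UnaryFun -> String
genItemMacro f =
  intercalate
    "\n"
    [ "macro proc " ++ funName f ++ " (itemIn, itemOut)"
    , "{"
    , "    typeof (itemIn.message) x;"
    , "    itemIn.channel ? x;"
    , "    itemOut.channel ! (x " ++ operator f ++ " " ++ show (constant f) ++ ");"
    , "}"
    ]

-- The Haskell specification that the generated macro corresponds to.
add3 :: Int -> Int
add3 x = x + 3
```

Calling putStrLn (genItemMacro (UnaryFun "add3" "+" 3)) prints a macro equivalent to the one above. Building the output from strings keeps the sketch short; a real generator would walk the parse tree and consult the type table produced by the semantic analysis.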
It is important to note that the add3 function can be utilized for list processing. The generation correspondence is shown in Figure 4.

    vector_add3 :: [Int] → [Int]
    vector_add3 x = map (add3) x

The corresponding Handel-C code includes a version of add3 based on items; the generic implementation of the parallel version of the higher-order function map (VMAP); the implementation of the function vector_add3 that invokes the VMAP macro; and a main function that calls vector_add3 with its inputs, outputs, and the number of elements in each vector. The parallel instances of add3 are replicated using the par operator in Handel-C. The generated code is as follows:

    macro proc add3 (itemIn, itemOut)
    {
        typeof (itemIn.message) x;
        itemIn.channel ? x;
        itemOut.channel ! (x+3);
    }
    macro proc VMAP (vectorIn, vectorOut, n, F)
    {
        typeof (n) c;
        par (c=0; c ... [listing truncated]

Fig. 4. Code generation of parallel list processing

B. IDE Design

The IDE is developed with a separation of programming concerns, structuring the code into different JAR files. The HTCC IDE adopts the iterative and incremental design model (IIDM) [31]. In the IIDM, each component of the IDE is developed separately as a standalone project, which allows it to be integrated into multiple projects. The IDE is implemented using Java under NetBeans [32]. The code editor is implemented using the RSyntaxTextArea Java framework. The IDE theme is implemented using the JTattoo Java framework.

Figure 5 shows the use-case diagram of the HTCC IDE. The proposed IDE supports the following:
• Editing and storing project files.
• Highlighting and automatic code completion.
• File navigation and opening multiple files simultaneously.
• Running Haskell code under GHC.
• Compiling Haskell code to Handel-C code, then simulating the Handel-C code and generating VHDL, EDIF, Verilog, and SystemC implementations.
• Compiling the generated HDL files using Altera Quartus, then producing analysis and FPGA mapping files.

The IDE connects the HTCC compiler to external tools, such as DK Design Suite, to simulate and generate VHDL, Verilog, EDIF, and SystemC files. In addition, the IDE connects the compiler to Altera Quartus using TCL commands to synthesize designs and to generate timing analyses, pin assignments for FPGA boards, and bit files to program the targeted FPGAs. GHC is also connected to the IDE to execute and verify Haskell functions. Figure 6 shows a snapshot of the HTCC IDE.

Fig. 5.
Use-Case diagram

Fig. 6. HTCC IDE

IV. COMPILER IMPLEMENTATION

The following subset of the Haskell grammar is part of the HTCC compiler code. Here, functions are divided into declarations (dcFun) and definitions (dFun):

    PROG  : STAT+;
    STAT  : dcFun;
    dcFun : ID '::' formalType (->)* NL+ dFun;
    expr  : expr op=('*'|'/') (DIGIT|expr)
          | expr op=('.&.'|'.||.') (DIGIT|expr)
          | expr op=('+'|'-') (DIGIT|expr)
          | ('xor' expr DIGIT)
          | ('shiftL' expr DIGIT)
          | ('shiftR' expr DIGIT)
          | mPassing (mPassing)*
          | expr mPassing
          | ID*

According to the proposed grammar, an expression (expr) has multiple alternatives that capture the definition of the function. An expr can be any arithmetic or logic operation between two or more variables. In addition, an expr can call other functions, which appear at the mPassing node. Figure 7 shows the parse tree of the following function:

    f :: Int → Int
    f x = x + 3

Fig. 7. The parse tree of function f.

A subset of the lexer grammar is as follows:

    ID      : [a-zA-Z]+ [0-9]*;
    NL      : '\r'? '\n';
    ARROW   : '->' | '→';
    WS      : [ \t]+ → SKIP;
    DIGIT   : [0-9]+;
    COMMENT : '--' .*? '\r'? '\n' → SKIP;

V. FIRST-CLASS AND HIGHER-ORDER HASKELL FUNCTIONS

HTCC can generate both first-class and higher-order functions. First-class functions represent simple binary operations, while higher-order functions can take other functions as parameters and usually operate on lists.

A. First-Class Functions

A sample generation of the binary operation OR is shown in the following:

    or :: Int → Int → Int
    or a b = a .|. b

When the function or is compiled under HTCC, the generated Handel-C code comprises three items, each of which has a message of width 32 bits.
The first two items are a and b, and the third item is where the result is stored. In addition, HTCC generates the macro OR. HTCC generates three interfaces, input0, input1, and output0, for the inputs and the output. In the main method, HTCC creates three items to produce the two inputs and store the output. Other first-class functions, such as AND, XOR, ADD, SUB, and DIV, can be generated in a similar way. To run the compiled code on the Altera DE2-70, the following is automatically generated by HTCC:

    set clock = external "AD15";
    set reset = external "L8";

B. Higher-Order Functions

HTCC provides parallel and sequential versions of a set of higher-order functions including map, zipWith, foldr, etc. The following is a sample generation of a parallel zipping of two lists with multiplication. Each list contains ten elements. The generation employs the VectorOfItems structure and the parallel versions of the produce and store macros.

    mul :: Int → Int → Int
    mul x y = x ∗ y

    two_vectors_mul :: [Int] → [Int] → [Int]
    two_vectors_mul a b = zipWith (mul) a b

    macro proc mul (xItem, yItem, output)
    {
        typeof (xItem.message) x, y;
        xItem.channel ? x;
        yItem.channel ? y;
        output.channel ! (x*y);
    }
    macro proc VZIPWITH (vectorIn1, vectorIn2, vectorOut, n, F)
    {
        typeof (n) c;
        par (c = 0; c < n; c++)
        {
            F(vectorIn1.elements[c], vectorIn2.elements[c], vectorOut.elements[c]);
        }
    }
    macro proc two_vectors_mul (vectorIn1, vectorIn2, vectorOut, n)
    {
        VZIPWITH(vectorIn1, vectorIn2, vectorOut, n, mul);
    }
    void main ()
    {
        VectorOfItems(vector0, 10, unsigned 32);
        VectorOfItems(vector1, 10, unsigned 32);
        VectorOfItems(vector2, 10, unsigned 32);
        par
        {
            VPRODUCE(INPUT0, vector0, 10);
            VPRODUCE(INPUT1, vector1, 10);
            two_vectors_mul(vector0, vector1, vector2, 10);
            VSTORE(vector2, OUTPUT0);
        }
    }

VI. CASE-STUDY: THE RAPID PROTOTYPING OF XTEA UNDER HTCC

To test the applicability of the developed compiler, we use the extended tiny encryption algorithm (XTEA) as a case-study.
XTEA uses a 128-bit key to encrypt a 64-bit block and follows the Feistel cipher structure with a variable number of rounds. The 64-bit plaintext block is divided into two integers V0 and V1. The key produces a set of integer sub-keys to be distributed to the appropriate round. XTEA is a compact, lightweight, low-power, and secure block cipher [33]. The following is the functional specification of the XTEA single round under Haskell:

    xteasround :: Int → uInt32 → (uInt32, uInt32) → uInt32 → (uInt32, uInt32)
    xteasround 32 sum x@(v0, v1) key = x
    xteasround rounds sum (v0, v1) key =
        xteasround (rounds + 1) new_sum (new_v0, new_v1) key
      where
        new_v0  = xteav0 v0 v1 sum key
        new_sum = xteasum sum
        new_v1  = xteav1 new_v0 v1 new_sum key

    xteav0 :: uInt32 → uInt32 → uInt32 → uInt32 → uInt32
    xteav0 v0 v1 sum key = v0 + xor (key + sum) (v1 + xor (shiftL v1 4) (shiftR v1 5))

    xteasum :: uInt32 → uInt32
    xteasum sum = sum + 0x9e3779b9

    xteav1 :: uInt32 → uInt32 → uInt32 → uInt32 → uInt32
    xteav1 v0 v1 sum key = v1 + xor (key + sum) (v0 + xor (shiftL v0 4) (shiftR v0 5))

The data type uInt32 is a user-defined unsigned integer with a width of 32 bits. A single round of XTEA generates the following sample main function. However, the function xteasround produces a macro XTEASROUND when the 32 rounds are replicated to implement the top-level function xtea.

    void main ()
    {
        par
        {
            PRODUCE(INPUT0.value, item0);
            PRODUCE(INPUT1.value, item1);
            PRODUCE(INPUT2.value, item2);
            PRODUCE(INPUT3.value, item3);
            xteav0(item0, item1, item2, item3, item4);
            xteasum(item3, item5);
            xteav1(item4, item1, item2, item5, item6);
            STORE(item4, OUTPUT0);
            STORE(item5, OUTPUT1);
            STORE(item6, OUTPUT2);
        }
    }

Fig. 8. A single XTEA round with its internal computational constructs. The crossed square denotes the sum, the crossed circle an XOR, >> a right shift, and << a left shift.

VII. ANALYSIS AND EVALUATION

The proposed compiler allows for the rapid prototyping of hardware circuits at a high level of abstraction based on functional specifications.
Functional programming enables designing hardware using clear, concise, and correct-by-construction specifications. Overall, the proposed compiler translates a subset of Haskell to Handel-C and thus enables the usage of Haskell as a hardware description language for programming FPGAs.

HTCC adopts an effective transformational derivation approach that enables the systematic development of CSP concurrency descriptions. Accordingly, the automatic generation of Handel-C code is possible and effective in generating VHDL, EDIF, Verilog, and SystemC descriptions. The refinement methodology provides a variety of parallelism techniques to specify the required degree of parallelism. The methodology gives HTCC the ability to generate a variety of implementations with different parallel characteristics. HTCC benefited from the off-the-shelf first-order, higher-order, and application-specific libraries provided by Damaj et al. [9], [27], [28] and automated the refinement procedure.

The HTCC IDE enables the testing and evaluation of both Haskell and Handel-C code through the background connection to their native compilers. The HTCC IDE offers options to display the analysis reports supported by Quartus, such as power consumption, area utilization, timing, RTL views, and pin assignments. Furthermore, the adopted IIDM technique allows for the rapid and simple development and integration of the various parts of the IDE.

Although the use of ANTLR made the compiler implementation simple, additions were necessary. The main addition in HTCC is the semantic analyzer, which was embedded into the adopted ANTLR structure. The embedding enables effective type checking and error reporting using the supported exception handling mechanism.

Table I presents the performance analysis results of the XTEA cipher as generated by HTCC and tested under Cyclone II, Stratix IV, and Virtex-6 FPGAs. The Cyclone II FPGA is part of the targeted DE2-70 board.
The Stratix IV FPGA is part of the targeted Altera DE4 board. The Virtex-6 FPGA is a high-speed FPGA from Xilinx. The total number of NAND gates as measured under DK Design Suite is 467969, with a total of 192 clock cycles. The highest frequency achieved is 648.54 MHz under the Virtex-6, and the lowest power consumption achieved is 219.62 mW under the Cyclone II. In addition, the highest throughput is 219.3 Mbps under the Xilinx Virtex-6 FPGA.

TABLE I
XTEA IMPLEMENTATION RESULTS

                             Cyclone II   Stratix IV   Virtex-6
    Total logic elements
    Fmax (MHz)               183.18       513.8        648.54
    Total Execution Time (ns)  5.46       1.95         1.52
    Throughput (Mbps)        61.06        171.26       219.3
    Power consumed (mW)      219.62       888.47       912.4

Compared with the performance reported in [33]–[36], the results produced by HTCC achieved the highest throughput, 219.3 Mbps under the Virtex-6 (see Table II). A behavioral implementation of the XTEA cipher under VHDL achieved 134 Mbps; however, the main purpose of that implementation was to achieve a compact and low-power design [33]. The manual Handel-C (HC) implementation achieved a speed of 44.25 Mbps with an Fmax of 177 MHz and an area of 720 Logic Elements.

VIII. CONCLUSION

HTCC is a Haskell to Handel-C hardware compiler that targets FPGAs. HTCC automates a transformational derivation methodology to rapidly produce hardware circuits from functional specifications. The adopted methodology refines functional programs to a formal concurrency framework, namely, CSP. The methodology enables the systematic refinement of the CSP descriptions to Handel-C; HTCC makes this process automatic. However, HTCC does not currently produce CSP descriptions; this is identified as a future development. The developed compiler effectively produces hardware circuits in various descriptions and languages, such as VHDL, Verilog, EDIF, and SystemC. HTCC connects to a range of hardware design tools to produce a rich set of analysis reports and bit-stream files that can map to different FPGAs.
The paper includes a case-study from cryptography that produces comparable, and in some instances better, results than what is reported in the literature. Indeed, HTCC adopted a functional programming style to benefit from its simplicity, conciseness, and correctness. Future work includes expanding the area of application and widening the pool of implemented Haskell syntax and parallelization options.

REFERENCES

[4] … Proceedings of ISSS01, October 2001.
[5] I. Page, “Closing the gap between hardware and software: hardware-software cosynthesis at Oxford,” in IEE Colloquium on Hardware-Software Cosynthesis for Reconfigurable Systems, February 1996, pp. 200–211.
[6] I. W. Damaj, “Higher-Level Hardware Synthesis of the KASUMI Algorithm,” Journal of Computer Science and Technology, vol. 22, no. 1, pp. 60–70, 2007. [Online]. Available: http://dx.doi.org/10.1007/s11390-007-9007-9
[7] J. Hawkins and A. E. Abdallah, “Hardware synthesis of a parallel JPEG decoder from its functional specification,” in Design Methods and Applications for Distributed Embedded Systems. Springer, 2004, pp. 197–206.
[8] A. E. Abdallah and J. Hawkins, “Formal behavioural synthesis of Handel-C parallel hardware implementation for functional specifications,” in Proceedings of the 36th Annual Hawaii International Conference on System Sciences. IEEE Computer Society Press, 2003, pp. 278–288.
[9] I. Damaj, “Parallel Algorithms Development for Programmable Devices with application from cryptography,” International Journal of Parallel Programming, vol. 35, no. 6, pp. 529–572, Dec. 2007. DOI: 10.1007/s10766-007-0046-1.
[10] S. Thompson, Haskell: The Craft of Functional Programming. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1997.
[11] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh, “Lava: Hardware Design in Haskell,” in Proceedings of the Third ACM SIGPLAN International Conference on Functional Programming, ser. ICFP ’98. New York, NY, USA: ACM, 1998, pp. 174–184. [Online]. Available: http://doi.acm.org/10.1145/289423.289440
[12] C. Baaij, “CλaSH: from Haskell to hardware,” December 2009. [Online]. Available: http://essay.utwente.nl/59482/
[13] A. Acosta, “Hardware synthesis in ForSyDe,” June 2007. [Online]. Available: http://people.kth.se/~ingo/Papers/ThesisAlfonsoAcosta2007.pdf
[14] M. Sheeran, “Hardware design and functional programming: a perfect match,” J. UCS, vol. 11, no. 7, pp. 1135–1158, 2005.
[15] J. Launchbury, J. Lewis, and B. Cook, “On embedding a microarchitectural design language within Haskell,” in Proceedings of the Fourth ACM SIGPLAN International Conference on Functional Programming. ACM Press, 1999, pp. 60–69.
[16] J. Matthews, J. Launchbury, and B. Cook, “Specifying microprocessors in Hawk,” in Proceedings of the International Conference on Computer Languages. IEEE, May 1998, pp. 90–101.
[17] J. O’Donnell, “Hydra: hardware description in a functional language using recursion equations and high order combining forms,” in The Fusion of Hardware Design and Verification, G. J. Milne, Ed. Amsterdam: North-Holland, 1988, pp. 309–328.
[18] Y. Li and M. Leeser, “HML: An innovative hardware design language and its translation to VHDL,” in Conference on Hardware Design Languages, June 1995.
[19] D. Barton, “Advanced modeling features of MHDL,” in International Conference on Electronic Hardware Description Languages, January 1995.
[20] S. Johnson and B. Bose, “DDD: A system for mechanized digital design derivation,” Indiana University, Indiana, Tech. Rep. 323, 1990.
[21] R. Sharp, “Higher-level hardware synthesis,” Ph.D. dissertation, Robinson College, University of Cambridge, Cambridge, November 2002.
[22] M. Sheeran, “muFP: a language for VLSI design,” in Proc. ACM Symposium on LISP and Functional Programming. ACM Press, 1984, pp. 104–112.
[23] G. Jones and M. Sheeran, “Circuit design in Ruby,” in Formal Methods for VLSI Design, pp. 13–70, 1990.
[24] T. Cheung and G. Hellestrand, “Multi-level equivalence in design transformation,” in Proceedings of International Conference on Computer Hardware Description Languages, Chiba, Japan, September 1996, pp. 559–566.
[25] A. E. Abdallah, “Functional process modelling,” in Research Directions in Parallel Functional Programming. Springer Verlag, October 1999, pp. 339–360.
[26] T. Parr, The Definitive ANTLR 4 Reference, 2nd ed. Pragmatic Bookshelf, 2013.
[27] I. Damaj, “Co-designs of Parallel Rijndael,” in The International Symposium on System-on-Chip. Tampere, Finland: IEEE, 1-2 November 2011, pp. 72–77.
[28] I. Damaj, “Parallel AES Development for Programmable Devices,” in The Fourth IASTED International Conference on Parallel and Distributed Computing and Networks, IASTED. Innsbruck, Austria: Acta Press, February 2009.
[29] T. Parr, Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages, 1st ed. Pragmatic Bookshelf, 2009.
[30] T. Parr, S. Harwell, and K. Fisher, “Adaptive LL(*) Parsing: The Power of Dynamic Analysis,” SIGPLAN Not., vol. 49, no. 10, pp. 579–598, Oct. 2014. [Online]. Available: http://doi.acm.org/10.1145/2714064.2660202
[31] I. Jacobson, G. Booch, and J. Rumbaugh, The Unified Software Development Process. Addison-Wesley Reading, 1999, vol. 1.
[32] D. R. Heffelfinger, Java EE 7 Development with NetBeans 8. Packt Publishing Ltd, 2015.
[33] I. Damaj, S. Hamade, and H. Diab, “Efficient Tiny Hardware Cipher under Verilog,” in Proceedings of the 2008 High Performance Computing and Simulation Conference, 2008.
[34] M. Botta, M. Simek, and N. Mitton, “Comparison of hardware and software based encryption for secure communication in wireless sensor networks,” in Telecommunications and Signal Processing (TSP), 2013 36th International Conference on, July 2013, pp. 6–10.
[35] P. Yalla and J. Kaps, “Lightweight Cryptography for FPGAs,” in International Conference on Reconfigurable Computing and FPGAs, 2009, Dec 2009, pp. 225–230.
[36] I. A. Shweta Gaba and D. Sujata, “Design of Efficient XTEA using Verilog,” International Journal of Scientific and Research Publications, vol. 2, June 2012.

TABLE II
COMPARISON AMONG SIMILAR XTEA HARDWARE IMPLEMENTATIONS

    Reference         [34]   [35]       [36]        [33]
    Logic elements    NA     424 LUTs   1182 LUTs   539 Slices
    Fmax (MHz)        NA     NA         71.11       142.4
    Total Exe. Time
    Throughput

    Reference         Manual HC   HTCC
    Logic elements    720 LE      26660 Slices
    Fmax (MHz)        177         648.54
    Total Exe. Time