Approximate Logic Synthesis: A Reinforcement Learning-Based Technology Mapping Approach
Ghasem Pasandi, Shahin Nazarian, and Massoud Pedram
Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA 90089
{pasandi, shahin.nazarian, pedram}@usc.edu

Abstract—Approximate Logic Synthesis (ALS) is the process of synthesizing and mapping a given Boolean network to a library of logic cells so that the magnitude/rate of error between outputs of the approximate and initial (exact) Boolean netlists is bounded from above by a predetermined total error threshold. In this paper, we present Q-ALS, a novel framework for ALS with a focus on the technology mapping phase. Q-ALS incorporates reinforcement learning and utilizes Boolean difference calculus to estimate the maximum error rate that each node of the given network can tolerate such that the total error rate at none of the outputs of the mapped netlist exceeds a predetermined maximum error rate, and the worst case delay and the total area are minimized. The Maximum Hamming Distance (MHD) between exact and approximate truth tables of cuts of each node is used as the error metric. In Q-ALS, a Q-learning agent is trained with a sufficient number of iterations aiming to select the fittest values of MHD for each node, and in a cut-based technology mapping approach, the best supergates (in terms of delay and area, bounded further by the fittest MHD) are selected towards implementing each node. Experimental results show that, having set the required accuracy of 95% at the primary outputs, Q-ALS reduces the total cost in terms of area and delay by up to 70% and 36%, respectively, and also reduces the run-time by 2.21× on average, when compared to the best state-of-the-art academic ALS tools.

I. INTRODUCTION
Approximate computing is a computing technique that generates results with possible inaccuracy by relaxing the exact equivalence requirement between the provided specification and the generated results. By trading accuracy for improvements in speed and savings in resources, approximate computing has attracted tremendous attention in many error-tolerant fields such as video and image processing [1], search engines [2], [3], machine learning [4]–[8], and approximate hardware design [9]–[11]. For example, Esmaeilzadeh et al. [12] accelerated neural networks using approximate computing techniques by introducing a Parrot transformation and a neural processing unit.

Approximate computing can be applied at different abstraction levels and transformations, one of which is logic synthesis. Logic synthesis is the process that optimizes and maps a given Boolean network into a netlist consisting of logic gates. Exact mapping is supported by a variety of synthesis tools. However, as the demand for energy and area efficiency keeps increasing, an alternative approach is needed to meet this demand: identifying digital systems that can tolerate errors and devising a mapping process that sacrifices accuracy to achieve improvements in area, energy, or delay. This process of synthesis is called Approximate Logic Synthesis (ALS).

There are several papers in the literature on ALS, including SALSA [13], SASIMI [14], the selection-based (Single-Selection and Multi-Selection) algorithms [15], and many more (see Section II), which present new algorithms and/or synthesis tools for generating approximate netlists bounded by a predetermined error rate or error magnitude. These approaches typically suffer from long run-times. Furthermore, they lack powerful machine learning engines to learn from previous optimization steps and thereby better optimize for power, area, or delay of the circuit. In this paper, we present Q-ALS, a novel approach for ALS based on Reinforcement Learning (RL) algorithms.

Q-ALS embeds a Q-learning agent, which is trained and utilized to determine the maximum error that can be tolerated by a node in a given network such that the total error rate at none of the primary outputs of the network exceeds a predetermined maximum error rate. The Hamming distance between truth tables of exact and approximate implementations of a node is used as the error metric. To estimate the total error rate at primary outputs resulting from approximation in a node, a probabilistic approach based on Boolean difference calculus is utilized. Experimental results verify that Q-ALS provides considerable Quality of Results (QoR) improvements in terms of run-time, total area, and worst case delay on many tested benchmark circuits compared with other ALS frameworks. Due to the high impact of technology mapping on the final delay and area results, the focus of approximation in this paper is on the technology mapping phase. To the best of our knowledge, this paper is the first to address the problem of approximate logic synthesis using RL algorithms.

The rest of the paper is organized as follows: Section II summarizes some of the previous works on ALS. Section III provides a quick background on Q-learning and cut-based technology mapping, which will be used in the rest of the paper. Section IV presents our Q-ALS framework. Experimental results on many benchmark circuits and conclusions appear in Sections V and VI, respectively.

II. RELATED WORK
Miao et al. [16] formulated the ALS problem unconstrained by error rate as a Boolean relation (BR) minimization problem, further refined by a heuristic approach to satisfy the error frequency constraints. Shin and Gupta [17] developed a method to reduce the literal count by exploiting the error tolerance rate during circuit design. Miao et al. [18] presented a method to reduce the gate count of a given circuit by using external don't care (EXDC) sets that maximally approach the Boolean relation while complying with the constrained error magnitude. A novel ALS framework was presented in [19] that performs statistical testing to certify the area-optimized circuits with high confidence under the given error constraint; this is done by continuously monitoring the quality of the generated design. In [13], existing synthesis tools such as SIS [20] and Synopsys Design Compiler [21] are utilized to perform area reduction by using approximate don't cares (ADCs) adhering to the given quality bounds. An ALS tool called SASIMI is presented in [14], which works with multiple error metrics such as error rate and error magnitude. SASIMI aims for area and power reduction by substituting signal pairs that assume the same values with high probability. Wu and Qian [15] addressed the ALS problem for multi-level circuits by reducing the size of nodes in the given Boolean network through approximating their factored forms. They presented two algorithms, namely Single-Selection and Multi-Selection, with similar area savings but better run-time for the latter.

There have been a few recent publications on the utilization of machine learning algorithms in Electronic Design Automation (EDA). In [22], deep reinforcement learning advances are employed in logic optimization. More specifically, the logic optimization problem is formulated as a deterministic Markov decision process, and a generic algorithm is presented to solve it. In [23], several classical Computer-Aided Design (CAD) algorithms are discussed that can benefit from advances in machine learning. The process of finding the error constraint of each node in large networks leads to diminished performance. The framework we present in this paper tackles this performance degradation problem by utilizing RL algorithms to find the maximum tolerable error rate at each node. RL, which has been shown to resolve such issues [24], leads to superior performance in terms of speed, area, and accuracy.

III. BACKGROUND
A. Q-Learning
Q-learning is an RL algorithm that involves an agent, a set of states, and a set of actions. In Q-learning, at each time t and state s_t, an action a_t is taken, a reward r_t is observed, and the agent enters a new state s_{t+1}. The Q-learning algorithm can be modeled as a function that assigns a real number to each state-action pair, Q: S × A → R, and it can be expressed by a Q-matrix. The Q-matrix is updated using the following equation:

Q(s_t, a_t) ← (1 − α) · Q(s_t, a_t) + α · ( r_t + γ · max_a Q(s_{t+1}, a) )    (1)

where the hyper-parameters α and γ are the learning rate and the discount factor, respectively. max_a Q(s_{t+1}, a) is an estimate of the optimal future reward, and r_t + γ · max_a Q(s_{t+1}, a) is the learned value. Before the learning process starts, the Q-matrix is initialized with random values.
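To make Eq. (1) concrete, the sketch below applies one tabular Q-learning update to a randomly initialized Q-matrix. The state/action dimensions and the reward value are illustrative placeholders, not the exact settings used in Q-ALS.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step per Eq. (1): blend the old estimate
    with the observed reward plus the discounted best future value."""
    learned = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * learned
    return Q

# Toy usage: 6 states (nodes), 4 actions (candidate matches);
# the Q-matrix is initialized with random values, as described above.
Q = np.random.rand(6, 4)
Q = q_update(Q, s=0, a=2, r=1.0, s_next=1)
```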
B. Cut-Based Technology Mapping

In state-of-the-art technology mappers [25]–[28], k-feasible cuts [29] are computed for each node in the given Boolean network, and subsequently the truth table of each cut is calculated. A truth table expresses the function of a cut based on its inputs. After computing truth tables, in a topological-order traversal starting from level-1 nodes, the best cut for each node and the best supergate [28] implementing this cut are computed. A supergate is a small combinational single-output network built from original gates in the given library. After visiting all nodes, the best mapping solution (best cover) for the whole network is constructed by traversing the network backward and using the computed best cuts and the best implementations of those cuts.
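The cut computation can be sketched with the standard recursive formulation: a node's cuts are its trivial cut plus all k-bounded unions of its fanins' cuts. The AIG encoding below is a simplified stand-in for a mapper's internal data structures, not ABC's actual API.

```python
def enumerate_cuts(node, fanins, k=4, memo=None):
    """Return the set of k-feasible cuts (frozensets of leaves) of `node`.
    `fanins` maps each AND node to its two fanins; primary inputs have
    no entry and contribute only their trivial cut."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    cuts = {frozenset([node])}          # trivial cut: the node itself
    ins = fanins.get(node, [])
    if len(ins) == 2:
        a, b = ins
        for ca in enumerate_cuts(a, fanins, k, memo):
            for cb in enumerate_cuts(b, fanins, k, memo):
                merged = ca | cb
                if len(merged) <= k:    # keep only k-feasible merges
                    cuts.add(merged)
    memo[node] = cuts
    return cuts

# Toy AIG: n3 = AND(n1, n2), n4 = AND(n3, n2); n1 and n2 are primary inputs.
fanins = {"n3": ["n1", "n2"], "n4": ["n3", "n2"]}
print(enumerate_cuts("n4", fanins, k=4))
```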
IV. OUR PROPOSED FRAMEWORK FOR ALS: Q-ALS

Fig. 1 illustrates Q-ALS, our proposed framework for ALS. In Q-ALS, the cut-based technology mapping approach and the Q-learning algorithm are used to find the best approximate mapping solution for a given Boolean network. Given a maximum error rate at the outputs of the network, a Q-agent is trained on many training networks to learn the maximum error rate that each node can tolerate in order to maximize the delay and area savings. The Hamming distance between truth tables of exact and approximate implementations is used as the error-rate metric, and to estimate the error rate propagated to the outputs of the network by approximation in a node, Boolean difference calculus is used, as explained in more detail later.

For a given training network N, all nodes are visited in a topological-order traversal, and the exact and approximate mapping solutions (matches) to implement the best cut of each node are computed. Approximate matches are computed in two ways: (i) by simply dropping some gates from the exact match, and (ii) by examining other supergates in the supergate library that can implement the function of the cut subject to a given maximum error rate. These approximate supergates are more cost-efficient than the exact ones in terms of a desired metric such as delay, area, or power consumption. Similarly to the work in [30], Q-ALS utilizes a prioritized cost function with delay having the highest priority, while area is used as a tie-breaker.

Fig. 2 shows an example network to demonstrate our method for finding approximate matches in a cut-based technology mapping approach. Fig. 3a shows an exact implementation for the function of cut C shown in Fig. 2. C is one of the 4-feasible cuts of node n. The function of this node based on the inputs of the shown cut is n = e · n1 · (n2 + n3). An approximate implementation with a Hamming distance of six between the exact and approximate truth tables is shown in Fig. 3b. By dropping either of the shown inverters from the exact implementation, another approximate implementation with the same delay, one unit less area, and a Hamming distance of two compared to the exact expression can be found.

Fig. 1: Our Q-learning based ALS (Q-ALS) framework. (a) The Q-agent of Q-ALS is trained using the provided training networks, given the maximum error rate at primary outputs. (b) At test time, this Q-agent helps the synthesis tool select the best approximate matches for implementing different nodes.

Fig. 2: An example network to demonstrate our method for finding approximate matches in a cut-based technology mapping approach. The function of node n based on the inputs of the shown cut is n = e · n1 · (n2 + n3).

Fig. 3: Two implementations for the function of cut C shown in Fig. 2. (a) An exact implementation with an area of 7.00 units and a delay of 2.60 units. (b) An approximate implementation with an area of 6.00 units, a delay of 2.00 units, and an error rate of 37.5%.
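Note that a Hamming distance of six over the 16 truth-table rows of a 4-input cut corresponds to the 6/16 = 37.5% error rate quoted for Fig. 3b. The sketch below computes this metric by direct enumeration; the approximate function used here is a hypothetical simplification for illustration, not the circuit of Fig. 3b.

```python
from itertools import product

def truth_table(f, n_inputs):
    """Truth table of f as a tuple of output bits over all input vectors."""
    return tuple(f(*bits) for bits in product([0, 1], repeat=n_inputs))

def hamming_distance(f, g, n_inputs):
    """Number of input vectors on which f and g disagree."""
    return sum(a != b for a, b in
               zip(truth_table(f, n_inputs), truth_table(g, n_inputs)))

exact  = lambda e, n1, n2, n3: e & n1 & (n2 | n3)  # function of cut C
approx = lambda e, n1, n2, n3: e & n1              # hypothetical simplification

hd = hamming_distance(exact, approx, 4)
print(hd, f"-> error rate {hd / 16:.1%}")
```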
A. Action Space

In Q-ALS, states and actions are defined as follows. Nodes are considered as states; therefore, there are as many states as the total node count in the And-Inverter Graph (AIG) [31] representation of the given network. Given the current state as node n, the set of actions that can be taken is the selection of a match for n among the different exact and approximate solutions for implementing the k-feasible cuts of this node. Sets of actions that do not generate a valid mapping solution for network N are not desirable. For example, a set of actions is considered invalid if those actions result in a mapping solution for the network N such that the error rate at a primary output of this network exceeds the given maximum error rate.

To estimate the maximum error rate at primary outputs injected by an approximate implementation of a node, we use the probabilistic error propagation approach presented in [32]. This approach is based on Boolean difference calculus; it takes as input the Boolean function of a gate, the error probabilities at its inputs, and the error probability of the gate itself, and produces the error probability at the output of this gate. We store the maximum propagated error rate resulting from approximation in the fanin cone of a node in the data structure of this node. Next, for each choice of approximate implementation of a node, and by using the said error propagation approach, we estimate the error rate at the primary outputs of the network.

The Q-agent in Q-ALS learns the maximum error rate a single node can tolerate such that the saving in delay and then in area is maximized. If the error rate of a single node exceeds this value, it is estimated that the requirement of the maximum error rate at primary outputs will be violated. Since the error metric employed in Q-ALS is the Hamming distance, the Q-agent in Q-ALS effectively learns the Maximum Hamming Distance (MHD) between exact and approximate truth tables of each node in the network.
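The approach of [32] derives closed-form, gate-specific expressions from the Boolean difference. The sketch below computes the same quantity (the gate's output error probability from its input error probabilities and its own error probability) by brute-force enumeration, under our simplifying assumptions of independent errors and equiprobable inputs; this is tractable for the small fanins encountered during mapping.

```python
from itertools import product

def output_error_prob(gate, n_inputs, p_in, p_gate):
    """Probability that the gate's output is wrong, given independent
    per-input error probabilities p_in and an intrinsic gate error
    probability p_gate (the gate flips its own output with probability
    p_gate). Input values are assumed equiprobable."""
    p_err = 0.0
    for v in product([0, 1], repeat=n_inputs):          # correct input values
        p_v = 0.5 ** n_inputs
        for flip in product([0, 1], repeat=n_inputs):   # which inputs arrive wrong
            p_f = 1.0
            for f, p in zip(flip, p_in):
                p_f *= p if f else (1 - p)
            observed = tuple(b ^ f for b, f in zip(v, flip))
            mismatch = gate(*observed) != gate(*v)      # functional error at output
            # Output is wrong if exactly one of {functional error, gate flip} occurs.
            p_wrong = mismatch * (1 - p_gate) + (not mismatch) * p_gate
            p_err += p_v * p_f * p_wrong
    return p_err

# e.g., a 2-input AND with 5% error on each input and a 1% gate error:
print(output_error_prob(lambda a, b: a & b, 2, [0.05, 0.05], 0.01))
```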
B. Reward Function
A reward is assigned to each action based on the saving it offers in delay and area compared to the exact mapping solution. If an approximate match does not improve on the best delay or area of the exact implementation, a reward of 0 is assigned to it. On the other hand, if it increases delay and/or area, a negative reward is assigned to it. The goal is to generate a valid approximate mapping solution for N that maximizes the total reward.

Fig. 4 illustrates the Q-matrix for a network with six nodes and up to four valid exact and approximate mapping solutions per node. This network implements the function F = a ⊕ b ⊕ c. One dimension of the corresponding box in this figure represents the nodes of the network, and the other dimension shows the MHD per node. Using the trained Q-agent, the following set of best MHDs is obtained for this network: {2, 2, 2, 3, 0, 3}. This shows, for example, that the MHD between truth tables of exact and approximate solutions for node 1 is two. Therefore, a mapping solution with a Hamming distance of zero, one, or two can be selected for implementing this node. An MHD of zero corresponds to the exact mapping solution.

Fig. 4: A Q-matrix describing the optimal set of Maximum Hamming Distances (MHDs), {2, 2, 2, 3, 0, 3}, for a network with six nodes. This network corresponds to implementing the function F = a ⊕ b ⊕ c, where ⊕ is the XOR operation.
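A minimal sketch of such a prioritized reward follows, assuming delay savings dominate and area acts as a tie-breaker; the weighting constant and reward magnitudes are our placeholders, not values from the paper.

```python
def reward(exact_delay, exact_area, approx_delay, approx_area, valid=True):
    """Reward for choosing an approximate match: positive for delay/area
    savings over the exact mapping, 0 for no improvement, and negative
    for a regression or an invalid (error-violating) mapping solution."""
    if not valid:
        return -1.0                      # violates the output error bound
    d_save = exact_delay - approx_delay  # delay has the highest priority
    a_save = exact_area - approx_area    # area breaks ties
    if d_save < 0 or a_save < 0:         # worse delay and/or area
        return -(max(0, -d_save) + 0.1 * max(0, -a_save))
    if d_save == 0 and a_save == 0:      # no improvement at all
        return 0.0
    return d_save + 0.1 * a_save         # assumed prioritized weighting
```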
C. Training

The training process starts by assigning a random integer between 0 and 32 to the MHD of each node. The maximum value of 32 comes from the fact that up to 5-cuts are computed for each node (32 = 2^5). These values are used to start the technology mapping process. If the generated mapping solution based on these MHD values is not valid, then the corresponding entries in the Q-matrix are reduced. On the other hand, if the assigned MHDs generate a network whose error rate is below the given maximum error rate, then the corresponding positions in the Q-matrix are increased, as long as they do not exceed the maximum value of 32. In the following iterations, only a valid set of values that improves on the previous best delay and area values will update the Q-matrix. This process continues for N_E episodes for a particular network and is repeated for the other training networks.

The training process is shown in Algorithm 1. Inputs of this algorithm are the training networks and the maximum tolerable error rate at the outputs, ER_max. Values for the hyper-parameters, including the number of episodes N_E, the learning rate α, and the discount factor γ, are initialized inside this algorithm. The number of episodes is the number of times the training process is executed for a particular network. The Q-matrix generated during the training process is a 2D matrix whose X-axis denotes the nodes in the network and whose Y-axis denotes the number of Hamming distance options for the corresponding node. For example, for a network with six nodes, the Q-matrix will have six entries on the X-axis and a default of 32 entries on the Y-axis.

Algorithm 1: Q-ALS model training using Q-learning
Input: Set of training networks: N_train = {N_1, N_2, ..., N_n}; maximum error rate: ER_max
Output: Coefficients to get the Maximum Hamming Distance (MHD) for a test network
Initialize: number of episodes N_E, learning rate α, discount factor γ, Q-matrix
for each N_i in N_train do
    Get the number of nodes in N_i
    Assign random values for the MHD of each node in N_i
    for each epoch in N_E do
        N_new = Map N_i using the assigned values for MHD
        if mapping is successful then
            ER_tmp = the maximum error rate at the outputs of N_new
            if ER_tmp < ER_max and Cost(N_new) < Cost(N_i) then
                Update Q-matrix with positive reward
                Update Cost(N_i)
            else
                Update Q-matrix with negative reward
        else
            Update Q-matrix with negative reward
Coefficients = Perform non-linear regression on Q-matrix
return Coefficients

At the end of training, we have a Q-matrix with a certain number of entries on the X-axis, equal to the maximum node count among the training networks. Now, suppose that the node count of a test network is larger than this value; then we cannot use this Q-matrix to test this network. To solve this issue, a non-linear regression is performed to fit a polynomial curve to the data in the Q-matrix. Using this fitted curve, MHD values for the nodes of any valid test network can be calculated.
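One possible realization of this regression step with NumPy is sketched below; the polynomial degree and the normalization of node indices onto [0, 1] are our assumptions, not choices stated in the paper.

```python
import numpy as np

def fit_mhd_curve(Q, degree=3):
    """Q has one row per node and one column per MHD option (0..31).
    Take the fittest MHD per node, then fit a polynomial over normalized
    node positions so the curve extends to networks of any node count."""
    best_mhd = np.argmax(Q, axis=1)             # best-scoring MHD per node
    x = np.linspace(0.0, 1.0, num=Q.shape[0])   # normalized node index
    return np.polyfit(x, best_mhd, degree)      # regression coefficients
```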
Algorithm 2: Generating a mapped netlist for a given test network in Q-ALS
Input: Test network: N_Test; coefficients generated from training: L_coeff
Output: An approximate mapped network: N_opt
L_HD ⇐ Compute MHDs for the nodes in N_Test using L_coeff
N_opt ⇐ Generate the approximate mapped netlist for N_Test using L_HD
return N_opt
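A sketch of Algorithm 2's flow, under the same assumptions as the regression sketch above; map_with_mhds stands in for the Q-ALS mapping engine and is not a real ABC call.

```python
import numpy as np

def q_als_test(node_count, coeffs, map_with_mhds):
    """Algorithm 2 in two steps: predict a per-node MHD budget from the
    trained coefficients, then hand the budgets to the approximate
    mapping engine."""
    x = np.linspace(0.0, 1.0, num=node_count)    # normalized node positions
    mhds = np.clip(np.round(np.polyval(coeffs, x)), 0, 31).astype(int)
    return map_with_mhds(mhds)                   # approximate mapped netlist
```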
D. Testing

The training process concludes by returning the coefficients of a polynomial function, which are used to predict the MHD value of each node in a given test network. These predicted MHDs are then used in the logic synthesis engine of Q-ALS to find the best approximate mapping solutions for the individual nodes of the test network. The testing pseudo-code is shown in Algorithm 2.

V. EXPERIMENTAL RESULTS
We implemented Q-ALS as an extension to ABC [26]. We first trained our model on the ISCAS 89 [33] and EPFL [34] benchmark circuits and computed the entries of the Q-matrix. The maximum error rate at primary outputs used in the experimental results presented in this section is 5%, which is the same as in the two other state-of-the-art ALS frameworks (SASIMI and Single/Multi-Selection) compared in this section. The generic standard cell library mcnc.genlib, consisting of 25 gates, is used in technology mapping. Test circuits are chosen from different benchmark suites including ISCAS 85 [35], MCNC [36], and ITC 99 [37]. The functionality of these circuits varies widely, from simple arithmetic circuits of the EPFL benchmark suite to complex industry-level circuits of the ITC 99 benchmark suite. All experiments were conducted on a virtual machine running Linux with 1 GB RAM, hosted on a 2.4 GHz laptop. Tables I–III list experimental results for different circuits. The complexity of the circuits in terms of number of gates, exact area, and exact delay is also shown.

In Table I, the third and fourth columns list the exact values of area and delay, respectively. These values are obtained by ABC for the benchmark circuits in the first column. The second column gives the number of gates in the corresponding benchmark circuit. Columns five and six give the area and delay of the approximated circuits generated by Q-ALS. Columns seven and eight list the area and delay ratios. The area ratio is calculated by dividing the area of an approximate circuit by the area of the corresponding exact circuit; the delay ratio is calculated similarly.

Table I shows the area and delay comparison of approximated circuits with their exact counterparts for the MCNC benchmark suite. The MCNC benchmark suite contains different types of circuits such as Finite State Machine (FSM) circuits, multi-level combinational circuits, multi-level sequential circuits, and two-level circuits. Experimenting on these circuits, we obtain up to 70% area reduction and up to 36% delay reduction using Q-ALS. The average area and delay reductions compared with the baseline (exact solutions) are 30% and 9%, respectively.

Apart from academic benchmark circuits, we also experimented on industry-level benchmarks from the ITC 99 benchmark suite [37]. Table II shows experimental results for this benchmark suite together with a short description of the functionality of each circuit. On average, for 17 ITC 99 benchmark circuits, Q-ALS provides 23% reduction in area and 10% reduction in delay compared with the exact solution (baseline). This shows that our proposed methodology is quite effective in area and delay reduction for industrial-level circuits as well.

We experimented on circuits of the ISCAS 85 benchmark suite to compare the results of our proposed framework (Q-ALS) with the state-of-the-art approximate logic synthesis tools SASIMI [14] and the Single/Multi-Selection approaches [15]. The reason for choosing ISCAS 85 for this comparison is the availability of experimental results for these benchmarks in both of those papers. Table III shows the area comparison between SASIMI, Single/Multi-Selection, and our Q-ALS for ISCAS 85 benchmark circuits. We observed that the average area reduction by SASIMI compared with the exact values is 13.9%, and for Single- and Multi-Selection it is 15.6% and 15.5%, respectively. The average area reduction by Q-ALS is 31.6%, which clearly outperforms the other counterparts. Fig. 5 shows the comparison of run-times of Q-ALS with SASIMI and Single/Multi-Selection.
We observed that, on average, the run-time of Q-ALS compared with SASIMI, Single-Selection, and Multi-Selection is 15.94×, 8.46×, and 2.21× lower, respectively. This is because the inference time of the Q-learning algorithm is very low. The time required to train our model using the Q-learning algorithm is around five hours. However, once the model is trained, it can generate the approximate mapping solution for any given circuit in a small amount of time. In other words, it is trained once but is used for generating approximate mapping solutions many times. For this reason, the training time is not included in Fig. 5. Please note that the SASIMI and Single/Multi-Selection algorithms are implemented in SIS [20], while Q-ALS is implemented in ABC, which is much faster than SIS. To remove this bias, we normalized the run-times of the SASIMI and Single/Multi-Selection approaches by a factor obtained from data published in [28], [38].

Fig. 5: Run-time (in seconds) comparison among SASIMI, Single/Multi-Selection, and our Q-ALS for approximate logic synthesis. For better exhibition purposes, data for Single-Selection and SASIMI are scaled down by 5× and 2×, respectively.

TABLE I: Experimental results for the MCNC benchmark suite.
Circuit | Node Count | Exact Area | Exact Delay | Approx. Area | Approx. Delay | Area Ratio | Delay Ratio
rd53 | 45 | 107 | 6.5 | 54 | 5.9 | 0.50 | 0.91
rd73 | 115 | 291 | 10.5 | 169 | 9.7 | 0.58 | 0.92
rd84 | 198 | 419 | 10.4 | 299 | 9.5 | 0.71 | 0.91
9sym | 165 | 418 | 11.5 | 360 | 11.7 | 0.86 | 1.02
parity | 15 | 75 | 7.6 | 24 | 4.9 | 0.32 | 0.64
my_adder | 149 | 361 | 36.4 | 161 | 26 | 0.45 | 0.71
z4ml | 38 | 89 | 5.2 | 49 | 5.2 | 0.55 | 1.00
pm1 | 39 | 87 | 4.9 | 84 | 5.1 | 0.97 | 1.04
c8 | 122 | 281 | 6.9 | 176 | 6.4 | 0.63 | 0.93
x4 | 339 | 760 | 7.5 | 760 | 7.9 | 1.00 | 1.05
count | 123 | 273 | 13.8 | 171 | 12 | 0.63 | 0.87
pcler8 | 59 | 150 | 7.7 | 126 | 7.2 | 0.84 | 0.94
sct | 68 | 151 | 5.5 | 137 | 5.7 | 0.91 | 1.04
apex7 | 187 | 409 | 12.3 | 368 | 10 | 0.90 | 0.81
TABLE II: Experimental results for ITC 99 benchmark suite.
Circuit | Function | Node Count | Exact Area | Exact Delay | Approx. Area | Approx. Delay | Area Ratio | Delay Ratio
b01 | FSM comparing serial flows | 34 | 86 | 4.7 | 56 | 5.7 | 0.65 | 1.21
b02 | FSM that recognizes BCS numbers | 19 | 40 | 5 | 19 | 2.9 | 0.48 | 0.58
b04 | Compute min and max | 392 | 1018 | 18.6 | 808 | 14.8 | 0.79 | 0.80
b06 | Interrupt handler | 34 | 86 | 4.1 | 45 | 4.3 | 0.52 | 1.05
b07 | Count points on a straight line | 274 | 671 | 18.4 | 559 | 16.5 | 0.83 | 0.90
b08 | Find inclusions in sequences of numbers | 137 | 308 | 12.2 | 259 | 10.1 | 0.84 | 0.83
b09 | Serial to serial converter | 118 | 323 | 8.3 | 233 | 8.6 | 0.72 | 1.04
b10 | Voting system | 142 | 342 | 9 | 276 | 8.9 | 0.81 | 0.99
b11 | Scramble string with variable cipher | 441 | 1119 | 20.1 | 828 | 13.7 | 0.74 | 0.68
b12 | 1-player game (guess a sequence) | 843 | 1993 | 11.8 | 1642 | 12.1 | 0.82 | 1.03
b13 | Interface to meteo sensors | 223 | 517 | 8.7 | 372 | 7.9 | 0.72 | 0.91
b14 | Viper processor (subset) | 4233 | 10688 | 44.9 | 9361 | 40.4 | 0.88 | 0.90
b17 | Three copies of 80386 processor (subset) | 18446 | 47667 | 69.5 | 41419 | 60.5 | 0.87 | 0.87
b18 | Two copies of b14 and two of b17 | 57978 | 145896 | 105.5 | 136755 | 105.5 | 0.94 | 1.00
b20 | A copy and a modified version of b14 | 8805 | 21598 | 53.1 | 17303 | 45.6 | 0.80 | 0.86
b21 | Two copies of b14 | 9141 | 22705 | 52.9 | 19060 | 44.8 | 0.84 | 0.85
b22 | A copy and two modified versions of b14 | 13561 | 33260 | 53.3 | 26573 | 46.5 | 0.80 | 0.87

TABLE III: Experimental results for the ISCAS 85 benchmark suite.
Circuit | Exact Area | SASIMI [14] Approx. Area | SASIMI [14] Area Ratio | Single-Selection [15] Approx. Area | Single-Selection [15] Area Ratio | Multi-Selection [15] Approx. Area | Multi-Selection [15] Area Ratio | Q-ALS Approx. Area | Q-ALS Area Ratio
c880 | 646 | 579 | 0.896 | 577 | 0.893 | 577 | 0.893 | 558 | 0.864
c1908 | 846 | 516 | 0.610 | 503 | 0.595 | 506 | 0.598 | 520 | 0.615
c2670 | 1298 | 940 | 0.724 | 859 | 0.662 | 874 | 0.673 | 866 | 0.667
c3540 | 1916 | 1868 | 0.975 | 1851 | 0.966 | 1849 | 0.965 | 1812 | 0.945
c5315 | 3060 | 3002 | 0.981 | 2993 | 0.978 | 3002 | 0.981 | 1901 | 0.621
c7552 | 3952 | 3746 | 0.948 | 3715 | 0.940 | 3719 | 0.941 | 2451 | 0.620
alu4 | 2740 | 2444 | 0.892 | 2406 | 0.878 | 2381 | 0.869 | 1260 | 0.460
VI. CONCLUSION
In this paper, we presented Q-ALS, a reinforcement learning-based approximate logic synthesis framework. Q-ALS benefits from the strong capabilities of the Q-learning algorithm to learn the maximum error rate that each node in the AIG form of a given network can tolerate in order to achieve maximum savings in delay and area, while bounding the error rate at the primary outputs by a predetermined value. Thanks to this capability, Q-ALS is able to provide up to 70% area reduction and 36% delay reduction for academic benchmarks, and up to 52% area reduction and 30% delay reduction for industrial-level benchmarks. Furthermore, Q-ALS reduces run-time by an average of 15.94× and 2.21× over SASIMI and Multi-Selection, respectively, two academic state-of-the-art ALS tools.

REFERENCES

[1] J. Huang, B. Wang, W. Wang, and P. Sen, "A surface approximation method for image and video correspondences," IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5100–5113, 2015.
[2] G. Ranjan, A. Tongaonkar, and R. Torres, "Approximate matching of persistent lexicon using search-engines for classifying mobile app traffic," in Computer Communications, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on. IEEE, 2016, pp. 1–9.
[3] K.-H. Yang, C.-C. Pan, and T.-L. Lee, "Approximate search engine optimization for directory service," in Parallel and Distributed Processing Symposium, 2003. Proceedings. International. IEEE, 2003, 8 pp.
[4] A. J. Smola and B. Schölkopf, "Sparse greedy matrix approximation for machine learning," 2000.
[5] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning," in International Conference on Machine Learning, 2016, pp. 1050–1059.
[6] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," arXiv preprint arXiv:1401.4082, 2014.
[7] J. Li, Z. Yuan, Z. Li, A. Ren, C. Ding, J. Draper, S. Nazarian, Q. Qiu, B. Yuan, and Y. Wang, "Normalization and dropout for stochastic computing-based deep convolutional neural networks," Integration, the VLSI Journal, 2017.
[8] M. Nazemi, G. Pasandi, and M. Pedram, "Energy-efficient, low-latency realization of neural networks through Boolean logic minimization." IEEE, 2019, pp. 1–6.
[9] S. Dutt, S. Nandi, and G. Trivedi, "A comparative survey of approximate adders," in Radioelektronika (RADIOELEKTRONIKA), 2016 26th International Conference. IEEE, 2016, pp. 61–65.
[10] M. Masadeh, O. Hasan, and S. Tahar, "Comparative study of approximate multipliers," arXiv preprint arXiv:1803.06587, 2018.
[11] V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, "Design of power-efficient approximate multipliers for approximate artificial neural networks," in Proceedings of the 35th International Conference on Computer-Aided Design. ACM, 2016, p. 81.
[12] H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, "Neural acceleration for general-purpose approximate programs," in Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2012, pp. 449–460.
[13] S. Venkataramani, A. Sabne, V. Kozhikkottu, K. Roy, and A. Raghunathan, "SALSA: Systematic logic synthesis of approximate circuits," in Proceedings of the 49th Annual Design Automation Conference. ACM, 2012, pp. 796–801.
[14] S. Venkataramani, K. Roy, and A. Raghunathan, "Substitute-and-simplify: A unified design paradigm for approximate and quality configurable circuits," in Proceedings of the Conference on Design, Automation and Test in Europe. EDA Consortium, 2013, pp. 1367–1372.
[15] Y. Wu and W. Qian, "An efficient method for multi-level approximate logic synthesis under error rate constraint," in Proceedings of the 53rd Annual Design Automation Conference. ACM, 2016, p. 128.
[16] J. Miao, A. Gerstlauer, and M. Orshansky, "Approximate logic synthesis under general error magnitude and frequency constraints," in Proceedings of the International Conference on Computer-Aided Design. IEEE Press, 2013, pp. 779–786.
[17] D. Shin and S. K. Gupta, "Approximate logic synthesis for error tolerant applications," in Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 2010, pp. 957–960.
[18] J. Miao, A. Gerstlauer, and M. Orshansky, "Multi-level approximate logic synthesis under general error constraints," in Computer-Aided Design (ICCAD), 2014 IEEE/ACM International Conference on. IEEE, 2014, pp. 504–510.
[19] G. Liu and Z. Zhang, "Statistically certified approximate logic synthesis," in Computer-Aided Design (ICCAD), 2017 IEEE/ACM International Conference on. IEEE, 2017, pp. 344–351.
[20] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A system for sequential circuit synthesis," 1992.
[21] Synopsys, "Design Compiler user guide," 2001.
[22] W. Haaswijk, E. Collins, B. Seguin, M. Soeken, F. Kaplan, S. Süsstrunk, and G. De Micheli, "Deep learning for logic optimization algorithms," in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018, pp. 1–4.
[23] P. A. Beerel and M. Pedram, "Opportunities for machine learning in electronic design automation," in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018, pp. 1–5.
[24] E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana, "Self-optimizing memory controllers: A reinforcement learning approach," in ACM SIGARCH Computer Architecture News, vol. 36, no. 3. IEEE Computer Society, 2008, pp. 39–50.
[25] G. Pasandi and M. Pedram, "PBMap: A path balancing technology mapping algorithm for single flux quantum logic circuits," IEEE Transactions on Applied Superconductivity, vol. 29, no. 4, pp. 1–14, 2019.
[26] U. Berkeley, "ABC: A system for sequential synthesis and verification," Berkeley Logic Synthesis and Verification Group, 2011.
[27] G. Pasandi, A. Shafaei, and M. Pedram, "SFQmap: A technology mapping tool for single flux quantum logic circuits," in Circuits and Systems (ISCAS), 2018 IEEE International Symposium on. IEEE, 2018, pp. 1–5.
[28] A. Mishchenko, S. Chatterjee, R. Brayton, X. Wang, and T. Kam, "Technology mapping with Boolean matching, supergates and choices," 2005.
[29] J. Cong and Y. Ding, "FlowMap: An optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 1, pp. 1–12, 1994.
[30] A. Mishchenko, S. Cho, S. Chatterjee, and R. Brayton, "Combinational and sequential mapping with priority cuts," in Proceedings of the 2007 IEEE/ACM International Conference on Computer-Aided Design. IEEE Press, 2007, pp. 354–361.
[31] A. Mishchenko, R. Jiang, S. Chatterjee, and R. Brayton, "FRAIGs: Functionally reduced AND-INV graphs," in International Conference on Computer Aided Design, 2004.
[32] N. Mohyuddin, E. Pakbaznia, and M. Pedram, "Probabilistic error propagation in a logic circuit using the Boolean difference calculus," in Advanced Techniques in Logic Synthesis, Optimizations and Applications. Springer, 2011, pp. 359–381.
[33] F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in Circuits and Systems, 1989, IEEE International Symposium on, May 1989, pp. 1929–1934 vol. 3.
[34] L. Amarù, P.-E. Gaillardon, and G. De Micheli, "The EPFL combinational benchmark suite," in Proceedings of the 24th International Workshop on Logic & Synthesis (IWLS), no. EPFL-CONF-207551, 2015.
[35] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinational benchmark circuits and a target translator in Fortran," in Proceedings of IEEE Int'l Symposium on Circuits and Systems (ISCAS 85). IEEE Press, Piscataway, N.J., 1985, pp. 677–692.
[36] S. Yang, Logic Synthesis and Optimization Benchmarks User Guide: Version 3.0. Microelectronics Center of North Carolina (MCNC), 1991.
[37] F. Corno, M. S. Reorda, and G. Squillero, "RT-level ITC'99 benchmarks and first ATPG results," IEEE Design & Test of Computers, vol. 17, no. 3, pp. 44–53, 2000.
[38] A. Mishchenko, S. Chatterjee, and R. Brayton, "DAG-aware AIG rewriting: A fresh look at combinational logic synthesis," in Proceedings of the 43rd Annual Design Automation Conference. ACM, 2006, pp. 532–535.