Thermal and IR Drop Analysis Using Convolutional Encoder-Decoder Networks
Vidya A. Chhabria, Vipul Ahuja, Ashwath Prabhu, Nikhil Patil, Palkesh Jain, Sachin S. Sapatnekar
University of Minnesota, USA; Qualcomm Technologies Inc, India
Abstract: Computationally expensive temperature and power grid analyses are required during the design cycle to guide IC design. This paper employs encoder-decoder based generative (EDGe) networks to map these analyses to fast and accurate image-to-image and sequence-to-sequence translation tasks. The network takes a power map as input and outputs the corresponding temperature or IR drop map. We propose two networks: (i) ThermEDGe: a static and dynamic full-chip temperature estimator and (ii) IREDGe: a full-chip static IR drop predictor based on input power, power grid distribution, and power pad distribution patterns. The models are design-independent and must be trained just once for a particular technology and packaging solution. ThermEDGe and IREDGe are demonstrated to rapidly predict on-chip temperature and IR drop contours in milliseconds (in contrast with commercial tools that require several hours or more) and provide an average error of 0.6% and 0.008% respectively.
I. INTRODUCTION
One of the major challenges faced by an advanced-technology node IC designer is the overhead of large run-times of analysis tools. Fast and accurate analysis tools that aid quick design turn-around are particularly important for two critical, time-consuming simulations that are performed several times during the design cycle:
• Thermal analysis, which checks the feasibility of a placement/floorplan solution by computing on-chip temperature distributions in order to check for temperature hot spots.
• IR drop analysis in power distribution networks (PDNs), which diagnoses the goodness of the PDN by determining voltage (IR) drops from the power pads to the gates.
The underlying computational engines that form the crux of both analyses are similar: both simulate networks of conductances and current/voltage sources by solving a large system of equations of the form $GV = J$ [1], [2] with millions to billions of variables. In modern industry designs, a single full-chip temperature or IR drop simulation can take several hours. Accelerating these analyses opens the door to optimizations in the design cycle that iteratively invoke these engines under the hood.
The advent of machine learning (ML) has presented fast and fairly accurate solutions to these problems [3]–[8] which can successfully be used in early design cycle optimizations, operating within larger allowable error margins at these stages. To the best of our knowledge, no published work addresses full-chip ML-based thermal analysis: the existing literature focuses on coarser-level thermal modeling at the system level [3]–[5]. For PDN analysis, the works in [6], [7] address incremental analysis, and are not intended for full-chip estimation. The work in [8] proposes a convolutional neural network (CNN)-based implementation for full-chip IR drop prediction, using cell-level power maps as features.
However, it assumes similar resistance from each cell to the power pads, which may not be valid for practical power grids with irregular grid density. The analysis divides the chip into regions (tiles), and the CNN operates on each tile and its near neighbors. Selecting an appropriate tile and window size is nontrivial: small windows could violate the principle of locality [9], causing inaccuracies, while large windows could result in large models with significant runtimes for training and inference. Our approach bypasses window size selection by providing the entire power map as a feature, allowing ML to learn the window size for accurate estimation.
We translate static analysis problems to an image-to-image translation task and dynamic analysis problems to video-to-video translation, where the inputs are the power/current distributions and the required outputs are the temperature or IR drop contours. For static analysis, we employ fully convolutional (FC) EDGe networks for rapid and accurate thermal and IR drop analysis. FC EDGe networks have proven to be very successful with image-related problems with 2-D spatially distributed data [10]–[13] when compared to other networks that operate without spatial correlation awareness. For transient analysis, we use long-short-term-memory (LSTM) based EDGe networks that maintain memory of analyses at prior time steps.
Fig. 1. Image-to-image translation using EDGe network. (Figure labels: current map, encoder, state, decoder, IR drop map.)
Based on these concepts, this work proposes two novel ML-based analyzers: (i) ThermEDGe for both full-chip static and transient thermal analysis, and (ii) IREDGe for full-chip static IR drop estimation. The fast inference times of ThermEDGe and IREDGe enable full-chip thermal and IR drop analysis in milliseconds, as opposed to runtimes of several hours using commercial tools. We obtain average error of 0.6% and 0.008% for ThermEDGe and IREDGe, respectively, over a range of testcases.
We will open-source our software.
Fig. 1 shows a general top-level structure of an EDGe network. It consists of two parts: (i) the encoder/downsampling path, which captures global features of the 2-D distributions of power dissipation, and produces a low-dimensional state space and (ii) the decoder/upsampling path, which transforms the state space into the required detailed outputs (temperature or IR drop contours). The EDGe network is well-suited for PDN/thermal analyses because:
(a) The convolutional nature of the encoder captures the dependence of both problems on the spatial distributions of power. Unlike CNNs, EDGe networks contain a decoder which acts as a generator to convert the extracted power and PDN density features into accurate high-dimensional temperature and IR drop contours across the chip.
(b) The trained EDGe network model for static analysis is chip-area-independent: it only stores the weights of the convolutional kernel, and the same filter can be applied to a chip of any size. The selection of the network topology (convolution filter size, number of convolution layers) is related to the expected sizes of the hotspots rather than the size of the chip: these sizes are generally similar for a given application domain, technology, and packaging choice.
(c) Unlike prior methods [8] that operate tile-by-tile, where finding the right tile and window size for accurate analysis is challenging, the choice of window size is treated as an ML hyperparameter tuning problem to decide the necessary amount of input spatial information.
II. EDGe NETWORK FOR PDN AND THERMAL ANALYSIS
A. Problem formulations and data representation
This section presents the ML-based framework for ThermEDGe and IREDGe. The first step is to extract an appropriate set of features from a standard design-flow environment. The layout database provides the locations of each instance and block in the layout, as outlined in Fig. 2(a). This may be combined with information from a power analysis tool such as [14] (Fig. 2(b)) that is used to build a 2-D spatial power map over the die area.
Fig. 2. Data representation: Mapping PDN and thermal analysis problems into image-to-image translation tasks. (Figure panels: layout information (block locations, cell locations, block sizes, PDN stripes); power database (internal power, leakage power, switching power, activity factors); PDN density map (high, medium, low); package information (power pad locations); effective distance to power pad (μm); power map (W); temperature map (°C); IR drop map (mV).)
For thermal analysis using ThermEDGe, both the inputs and outputs are images for the static case, and a sequence of images for the transient case. Each input image shows a 2-D die power distribution (static) image, and each output image is a temperature map across the die (Fig. 2). For static PDN analysis, the output is an IR drop map across the full chip. However, in addition to the 2-D power distributions, IREDGe has two other inputs:
(i) A PDN density map: This feature is generated by extracting the average PDN pitch in each region of the chip. For example, when used in conjunction with the PDN styles in [15], [16], where the chip uses regionwise uniform PDNs, the average PDN density in each region, across all metal layers, is provided as an input (Fig. 2(e)).
(ii) An effective distance to power pad: This feature represents the equivalent distance from an instance to all power pads in the package. We compute the effective distance of each instance, $d_e$, to $N$ power pads on the chip as the harmonic sum of the distances to the pads:
$$d_e^{-1} = d_1^{-1} + d_2^{-1} + \dots + d_N^{-1} \qquad (1)$$
where $d_i$ is the distance of the $i$th power pad from the instance. Intuitively, the effective distance metric and the PDN density map together represent the equivalent resistance between the instance and the pad. The equivalent resistance is a parallel combination of each path from the instance to the pad. We use distance to each pad as a proxy for the resistance in Eq. (1). Fig. 2(f) shows a typical "checkerboard" power pad layout for flip-chip packages [17], [18].
Temperature depends on the ability of the package and system to conduct heat to the ambient, and IR drop depends on off-chip (e.g., package) parasitics. In this work, our focus is strictly on-chip, and both ThermEDGe and IREDGe are trained for fixed models of a given technology, package, and system.
Next, we map these problems to standard ML networks:
• For static analysis, the problem formulations require a translation from an input power image to an output image, both corresponding to contour maps over the same die area, and we employ a U-Net-based EDGe network [11].
• The dynamic analysis problem requires the conversion of a sequence of input power images to a sequence of output images of temperature contours, and this problem is addressed using an LSTM-based EDGe network [19].
We describe these networks in the rest of this section.
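As an illustration of Eq. (1), the following minimal Python sketch computes the effective distance from one instance to a set of power pads. It is not taken from the paper's implementation: the function name `effective_distance`, the use of Euclidean distance for $d_i$, and the handling of an instance that sits exactly on a pad are all assumptions made for the example.

```python
import math


def effective_distance(instance_xy, pad_xys):
    """Return d_e such that 1/d_e = sum_i 1/d_i, as in Eq. (1).

    instance_xy: (x, y) location of the instance.
    pad_xys: iterable of (x, y) power pad locations.
    Assumption: d_i is the Euclidean distance to pad i.
    """
    inverse_sum = 0.0
    for pad_x, pad_y in pad_xys:
        d_i = math.hypot(instance_xy[0] - pad_x, instance_xy[1] - pad_y)
        if d_i == 0.0:
            # Assumption: an instance located on a pad has zero effective distance.
            return 0.0
        inverse_sum += 1.0 / d_i
    if inverse_sum == 0.0:
        raise ValueError("at least one power pad is required")
    return 1.0 / inverse_sum


# One pad at distance 5 gives d_e = 5; two pads at distance 5 halve it,
# mirroring two equal resistances in parallel.
single = effective_distance((0.0, 0.0), [(3.0, 4.0)])
double = effective_distance((0.0, 0.0), [(3.0, 4.0), (-3.0, -4.0)])
```

Evaluating this function at every pixel center of the die would yield an effective-distance map of the kind shown in Fig. 2; the parallel-resistance behavior is why adding pads always reduces $d_e$.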
B. U-Nets for static thermal and PDN analysis
1) Overview of U-Nets: CNNs are successful in extracting 2-D spatial information for image classification and image labeling tasks, which have low-dimensional outputs (class or label). For PDN and thermal analysis tasks, the required outputs are high-dimensional distributions of IR drop and temperature contours, where the dimensionality corresponds to the number of pixels of the image and the number of pixels is proportional to the size of the chip. This calls for a generator network that can translate the extracted low-dimensional power and PDN features from a CNN-like encoder back into a high-dimensional representation of the required output data.
Fig. 3. U-Net-based EDGe network for static thermal and PDN analysis. (Figure labels: power map, temperature map, skip connections, encoder/downsampling path, decoder/upsampling path. Legend: 2D convolution; max pool; concatenate; 2D convolution transpose + upsample.)
Fig. 3 shows the structure of the EDGe network used for static PDN and thermal analysis. At the top level, it consists of two networks:
(a) Encoder/downsampling network: Like a CNN, the network utilizes a sequence of 2-D convolution and max pooling layer pairs that extract key features from the high-dimensional input feature set. The convolution operation performs a weighted sum on a sliding window across the image [20], and the max pooling layer reduces the dimension of the input data by extracting the maximum value from a sliding window across the input image. In Fig. 3, the feature dimension is halved at each stage by each layer pair, and after several such operations, an encoded, low-dimensional, compressed representation of the input data is obtained. For this reason, the encoder is also called the downsampling path: intuitively, downsampling helps understand the "what" (e.g., "Does the image contain power or IR hotspots?") in the input image but tends to be imprecise with the "where" information (e.g., the precise locations of the hotspots). The latter is recovered by the decoder stages.
(b) Decoder/upsampling network: Intuitively, the generative decoder is responsible for retrieving the "where" information that was lost during downsampling. This distinguishes an EDGe network from its CNN counterpart. The decoder is implemented using the transpose convolution [20] and upsampling layers. Upsampling layers are functionally the opposite of a pooling layer, and increase the dimension of the input data matrix by replicating the rows and columns.
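The pooling and upsampling operations described above can be illustrated with a small pure-Python sketch. It is only a toy under stated assumptions: a 2 × 2 pooling window with stride 2 and a replication factor of 2 are chosen here because they halve and double the image dimensions, and the helper names `max_pool_2x2` and `upsample_2x` are hypothetical, not names from the paper's code.

```python
def max_pool_2x2(image):
    """Keep the maximum of each non-overlapping 2x2 window (halves each dimension)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def upsample_2x(image):
    """Replicate every row and column once (doubles each dimension)."""
    upsampled = []
    for row in image:
        widened = [value for value in row for _ in range(2)]
        upsampled.append(widened)
        upsampled.append(list(widened))
    return upsampled


# A toy 4x4 "power map" with one strong hotspot at row 2, column 2.
power_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 5],
    [0, 0, 9, 1],
    [0, 0, 1, 1],
]
pooled = max_pool_2x2(power_map)   # 2x2 summary: the hotspot value survives
restored = upsample_2x(pooled)     # back to 4x4, but the hotspot is smeared
```

After pooling, the hotspot value 9 is still present (the "what"), but after upsampling it occupies a whole 2 × 2 block, so its exact pixel location (the "where") is no longer recoverable from this path alone.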
2) Use of skip connections: Static IR drop and temperature are strongly correlated to the input power: a region with high power on the chip could potentially have an IR or temperature hotspot in its vicinity. U-Nets [11] utilize skip connections between the downsampling and upsampling paths, as shown in Fig. 3. These connections take information from one layer and incorporate it at a deeper stage, skipping the intermediate layers, by means of a concatenation layer that appends it to the embedding along the z-dimension.
For IR analysis, skip connections combine the local power, PDN information, and power pad locations from the downsampling path with the global power information from the upsampling path, allowing the underlying input features to shuttle directly to the layers closer to the output; they are similarly helpful for thermal analysis. This helps the decoder recover, during upsampling, the fine-grained ("where") details that are lost in the encoding part of the network (as stated before), yielding detailed temperature and IR drop contours.
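The concatenation performed by a skip connection can be sketched in a few lines of pure Python. This is an illustrative toy, not the paper's network code: feature maps are assumed to be stored as height × width × channels nested lists, and the function name `concat_channels` is hypothetical.

```python
def concat_channels(encoder_features, decoder_features):
    """Append encoder channels to decoder channels at every pixel (z-dimension)."""
    same_height = len(encoder_features) == len(decoder_features)
    same_width = len(encoder_features[0]) == len(decoder_features[0])
    if not (same_height and same_width):
        raise ValueError("skip connection requires matching spatial dimensions")
    return [
        [dec_pixel + enc_pixel for enc_pixel, dec_pixel in zip(enc_row, dec_row)]
        for enc_row, dec_row in zip(encoder_features, decoder_features)
    ]


# 2x2 feature maps: the encoder side has 1 channel, the decoder side has 2.
encoder_features = [[[9.0], [1.0]], [[0.0], [2.0]]]
decoder_features = [[[0.5, 0.1], [0.5, 0.1]], [[0.5, 0.1], [0.5, 0.1]]]
merged = concat_channels(encoder_features, decoder_features)
```

The spatial size is unchanged while the channel count becomes the sum of both inputs, so the following convolution sees the coarse decoder features and the location-accurate encoder features side by side.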
3) Receptive fields in the encoder and decoder networks: The characteristic of PDN and thermal analysis problems is that the IR drop and temperature at each location depend on both the local and global power information. During convolution, by sliding averaging windows of an appropriate size across the input power image, the network captures local spatially correlated distributions. For capturing the larger global impact of power on temperature and IR drop, max pooling layers are used after each convolution to appropriately increase the size of the receptive field at each stage of the network. The receptive field is defined as the region in the input 2-D space that affects a particular pixel, and it determines the impact of the local, neighboring, and global features on PDN and thermal analysis.
In a deep network, the value of each pixel feature is affected by all of the other pixels in the receptive field at the previous convolution stage, with the largest contributions coming from pixels near the center of the receptive field. Thus, each feature not only captures its receptive field in the input image, but also gives an exponentially higher weight to the middle of that region [21]. This matches with our applications, where both thermal and IR maps for a pixel are most affected by the features in the same pixel, and partially by features in nearby pixels, with decreasing importance for those that are farther away. The size of the receptive field at each stage in the network is determined by the convolutional filter size, number of convolutional layers, max pooling filter sizes, and number of max pooling layers.
On both the encoder and decoder sides in Fig. 3, we use three stacked convolution layers, each followed by 2 ×
C. LSTM-based EDGe network for transient thermal analysis
Long short term memory (LSTM) based EDGe networks are a special kind of recurrent neural network (RNN) that are known to be capable of learning long term dependencies in data sequences, i.e., they have a memory component and are capable of learning from past information in the sequence.
Fig. 4. LSTM-based EDGe network for transient analysis in ThermEDGe. (Figure labels: encoder with 2D convolution, max pool, and ConvLSTM layers; decoder with ConvLSTM, 2D convolution transpose, and upsample layers; input frames x and output frames x̂ shown at t = 60s (n = 4), t = 600s (n = 40), and t = 1200s (n = 80).)
Fig. 5. A fully connected LSTM cell (left) and a ConvLSTM cell (right). (Legend: Hadamard product; convolution; sigmoid; tanh; fully connected; gate. Labels: 1-D input and output data for the LSTM cell, 2-D input and output data for the ConvLSTM cell, previous output data, previous cell state, current cell state.)
For transient thermal analysis, the structure of ThermEDGe is shown in Fig. 4. The core architecture is an EDGe network, similar to the static analysis problem described in Section II-B, except that the network uses additional LSTM cells to account for the time-varying component. The figure demonstrates the time-unrolled LSTM where input power frames are passed to the network one frame at a time. The LSTM cell accounts for the history of the power maps to generate the output temperature frames for all time steps. The network is used for sequence-to-sequence translation in transient thermal analysis, where the input is a set of time-varying power maps and the output is a set of time-varying temperature maps (Section II-A).
Similar to the static ThermEDGe network (Fig. 3), the encoder consists of convolution and max pooling layers to downsample and extract critical local and global spatial information and the decoder consists of upsampling and transpose convolution layers to upsample the encoded output. However, in addition, transient ThermEDGe has LSTM layers in both the encoding and decoding paths.
A standard LSTM cell is shown in Fig. 5 (left). While the basic LSTM cell uses fully connected layers within each gate, our application uses a variation of an LSTM cell called a convolutional LSTM (ConvLSTM) [22], shown in Fig. 5 (right). In this cell, the fully connected layers in each gate are replaced by convolution layers that capture spatial information. Thus, the LSTM-based EDGe network obtains a spatiotemporal view that enables accurate inference.
III. ThermEDGe AND IREDGe MODEL TRAINING
We train the models that go into ThermEDGe and IREDGe to learn the temperature and IR contours from the "golden" commercial tool-generated or ground truth data. We train ThermEDGe using the full physics-based thermal simulations from the Ansys-Icepak [23] simulator, incorporating off-chip thermal dynamics from package and system thermal characteristics. IREDGe is trained using static IR drop distribution from a PDN analyzer [14], [24] for various power, PDN density, and power pad distributions.
A. Generating training data
Static ThermEDGe and IREDGe: A challenge we faced in evaluating our experiments is the dearth of public domain benchmarks that fit these applications. The IBM benchmarks [25] are potential candidates for our applications, but they assume constant currents per region and represent an older technology node. Therefore, we generate our own dataset, which comprises 50 industry-relevant testcases, where each testcase represents industry-standard workloads for commercial designs implemented in a FinFET technology. The power images are of size 34 × 32 pixels, with each pixel representing the power/temperature of a 250 µm × µm tile on an 8.5mm × Our training is specific to the resolution: for another image resolution, the model must be retrained. We reiterate that although the training is performed on chips of fixed size, as we show (Section IV), inference can be performed on a chip of any size as long as the resolution remains the same.
For static ThermEDGe, our training data is based on static Ansys-Icepak [23] simulations of these 50 testcases. For IREDGe, we synthesize irregular PDNs of varying densities for each dataset element using PDN templates, as defined by OpeNPDN [16]. These templates are a set of PDN building blocks, spanning multiple metal layers in a 14nm commercial FinFET technology, which vary in their metal utilization. For our testcases, we use three templates (high, medium, and low density) and divide the chip into nine regions. As outlined in Section II-A, we use a checkerboard pattern of power pads that vary in the bump pitch and offsets across the dataset. The synthesized full-chip PDN, power pad locations, and power distributions are taken as inputs into the IR analyzer [24] to obtain training data for IREDGe. For each of the 50 testcases, we synthesize 10 patterns of PDN densities, and for each combination of power and PDN distribution we synthesize 10 patterns of power pad distributions, creating a dataset with 5000 points.
Footnote: Note that although the temperature and power map work at this resolution, the actual simulation consists of millions of nodes; using fewer nodes (e.g., one node per pixel) is grossly insufficient for accuracy.
Transient ThermEDGe: For the transient analysis problem, our training data is based on transient Ansys-Icepak [23] simulations. The size of the chip is the same as that of the static ThermEDGe testcases. For each testcase, we generate 45 time-step simulations that range from 0 to 3000s, with irregular time intervals from the thermal simulator. Each simulation is expensive in terms of the time and memory resources: one simulation of a 3000s time interval with 45 time-steps can take 4 hours with 2 million nodes. Transient ThermEDGe is trained using constant time steps of 15s, which enables easy integration with existing LSTM architectures that have an implicit assumption of uniformly distributed time steps, without requiring additional features to account for the time. The model is trained on 150 testcases with time-varying workloads as features, and their time-varying temperature from Ansys-Icepak as labels.
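One way to move from the simulator's irregular time steps to a constant 15s step is to interpolate each pixel's temperature trace onto a uniform grid. The text does not state how this conversion is done, so the linear interpolation below, the function name `resample_uniform`, and the inclusion of both endpoints are assumptions made purely for illustration.

```python
import math


def resample_uniform(times, values, step):
    """Linearly interpolate (times, values) onto a grid with constant spacing `step`.

    times must be strictly increasing and contain at least two samples.
    Returns the interpolated values at times[0], times[0] + step, ...
    """
    if len(times) < 2 or len(times) != len(values):
        raise ValueError("need at least two samples with matching lengths")
    count = int(math.floor((times[-1] - times[0]) / step + 1e-9)) + 1
    resampled = []
    segment = 0  # index of the interval [times[segment], times[segment + 1]]
    for k in range(count):
        t = times[0] + k * step
        # Advance to the interval that contains t.
        while segment + 1 < len(times) - 1 and times[segment + 1] < t:
            segment += 1
        t0, t1 = times[segment], times[segment + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(values[segment] + weight * (values[segment + 1] - values[segment]))
    return resampled


# Irregularly sampled temperature trace of a single pixel (seconds, degrees C).
sample_times = [0.0, 10.0, 40.0, 60.0]
sample_temps = [50.0, 60.0, 90.0, 110.0]
uniform_temps = resample_uniform(sample_times, sample_temps, step=15.0)
```

Applying the same routine independently to every pixel of the temperature map (and to the power map, if it is also sampled irregularly) produces frame sequences with the uniform spacing that standard LSTM layers implicitly assume.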
B. Model training
For the static analysis problem, ThermEDGe and IREDGe use a static power map as input and PDN density map (for IR analysis only) to predict the corresponding temperature and IR drop contours. For the transient thermal analysis problem, the input is a sequence of 200 power maps and the output is a sequence of 200 temperature contour maps at a 15s time interval. The ML model and training hyperparameters used for these models are listed in Table I.
TABLE I: ThermEDGe AND IREDGe ML HYPERPARAMETERS
ML hyperparameters | Static ThermEDGe | IREDGe | Transient ThermEDGe
Model layer parameters: 2D conv1 / 2D conv trans1 filter size | 5x5 | 3x3 | 5x5
We split the data in each set, using 80% of the data points for training, 10% for test, and 10% for validation. The training dataset is normalized by subtracting the mean and dividing by the standard deviation. The normalized golden dataset is used to train the network using an ADAM optimizer [26] where the loss function is a pixel-wise mean square error (MSE). The convolutional operation in the encoder and the transpose convolution in the decoder are each followed by ReLU activation to add non-linearity and L2 regularization to prevent overfitting. The model is trained in TensorFlow 2.1 on an NVIDIA GeForce RTX 2080Ti GPU. Training run-times are: 30m each for static ThermEDGe and IREDGe, and 6.5h for transient ThermEDGe. We reiterate that this is a one-time cost for a given technology node and package, and this cost is amortized over repeated use over many design iterations for multiple chips.
IV. RESULTS AND ANALYSIS USING ThermEDGe/IREDGe
A. Experimental setup and metric definitions
ThermEDGe and IREDGe are implemented using Python 3.7 within a TensorFlow 2.1 framework. We test the performance of our models on the 10% of datapoints reserved for the testset (Section III-B), which are labeled T1–T21. As mentioned earlier in Section III-A, due to the unavailability of new, public domain benchmarks to evaluate our experiments, we use benchmarks that represent commercial industry-standard design workloads.
Error metrics: As a measure of goodness of ThermEDGe and IREDGe predictions, we define a discretized regionwise error, $T_{err} = |T_{true} - T_{pred}|$, where $T_{true}$ is the ground truth image, generated by commercial tools, and $T_{pred}$ is the predicted image, generated by ThermEDGe. $IR_{err}$ is computed in a similar way. We report the average and maximum values of $T_{err}$ and $IR_{err}$ for each testcase. In addition, the percentage mean and maximum error are listed as a fraction of a temperature corner, i.e., 105 °C for thermal analysis and as a fraction of VDD = 0 . V for IR drop analysis.
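The error metrics defined above can be computed as in the following minimal sketch. The function name `error_metrics` and the dictionary keys are hypothetical; the normalizing reference (the 105 °C temperature corner for thermal analysis, or VDD for IR drop) is passed in as a parameter rather than hard-coded.

```python
def error_metrics(true_map, pred_map, reference):
    """Average and maximum regionwise absolute error, also as a percentage of `reference`."""
    errors = [
        abs(true_value - pred_value)
        for true_row, pred_row in zip(true_map, pred_map)
        for true_value, pred_value in zip(true_row, pred_row)
    ]
    average_error = sum(errors) / len(errors)
    maximum_error = max(errors)
    return {
        "avg": average_error,
        "max": maximum_error,
        "avg_pct": 100.0 * average_error / reference,
        "max_pct": 100.0 * maximum_error / reference,
    }


# Toy 2x2 temperature maps in degrees C, normalized to the 105 C corner.
ground_truth = [[80.0, 82.0], [85.0, 90.0]]
prediction = [[80.5, 81.0], [85.0, 92.0]]
metrics = error_metrics(ground_truth, prediction, reference=105.0)
```

With the 105 °C corner as the reference, an average error of 0.64 °C corresponds to about 0.61%, which is how the percentages next to the absolute errors in the results tables are related to each other.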
B. Performance of ThermEDGe and IREDGe: Accuracy and speed
Static ThermEDGe results: A comparison between the commercial tool-generated temperature map and the ThermEDGe-generated temperature map for T1–T5 is listed in Table II. The runtime of static ThermEDGe for each of the five testcases, which are of size 34 × 32, is approximately 1.1ms in our environment. On average across the five testcases (five rows of the table), ThermEDGe has an average $T_{err}$ of 0.63 °C and a maximum $T_{err}$ of 2.93 °C. These numbers are a small fraction when compared to the maximum ground truth temperature of these testcases (85 – 150 °C). The fast runtimes imply that our method can be used in the inner loop of a thermal optimizer, e.g., to evaluate various chip configurations under the same packaging solution (typically chosen early in the design process). For such applications, this level of error is very acceptable.
TABLE II: SUMMARY OF ThermEDGe RESULTS FOR STATIC AND TRANSIENT ANALYSIS ACROSS TESTCASES.
Static ThermEDGe | | | Transient ThermEDGe | |
Testcase | $T_{err}$ | Max $T_{err}$ | Testcase | $T_{err}$ | Max $T_{err}$
T1 | 0.64 °C (0.61%) | 2.76 °C (2.63%) | T6 | 0.51 °C (0.49%) | 5.59 °C (5.32%)
T2 | 0.63 °C (0.60%) | 2.67 °C (2.54%) | T7 | 0.58 °C (0.55%) | 6.17 °C (5.88%)
T3 | 0.65 °C (0.62%) | 2.93 °C (2.79%) | T8 | 0.57 °C (0.54%) | 5.83 °C (5.55%)
T4 | 0.48 °C (0.46%) | 2.22 °C (2.11%) | T9 | 0.52 °C (0.50%) | 6.32 °C (6.02%)
T5 | 0.75 °C (0.71%) | 2.86 °C (2.72%) | T10 | 0.56 °C (0.53%) | 7.14 °C (6.80%)
Fig. 6. ThermEDGe static temperature estimation on T1: (a) input normalized power distribution, (b) histogram of $T_{err}$ where maximum error is 2.76 °C which is very small compared to the maximum temperature of 85 °C, (c) ground truth temperature map, and (d) predicted temperature map. (Figure labels: histogram of %$T_{err}$ with frequency on the vertical axis and the worstcase %$T_{err}$ marked; power color scale.)
A graphical view of the predicted map for T1 is depicted in Fig. 6. For a given input power distribution in Fig. 6(a), ThermEDGe generates the temperature contour plots, as shown in Fig. 6(d). We compare the predicted value against the true value (Fig. 6(c)). The discrepancy is visually seen to be small. Numerically, the histogram in Fig. 6(b) shows the distribution of %$T_{err}$ across regions. The average $T_{err}$ °C and the maximum $T_{err}$ is 2.93 °C. This corresponds to an average error of 0.52% and worst-case error of 2.79% as shown in the figure.
Footnote: Achieving this accuracy requires much finer discretization in Icepak.
Transient ThermEDGe results: The transient thermal analysis problem is a sequence-to-sequence prediction task where each datapoint in the testset has 200 frames of power maps at a 15s interval. Trained transient ThermEDGe predicts the output temperature sequence for the input power sequence. We summarize the results in Table II. The inference run-time for each of T6–T10 to generate a sequence of 200 frames of temperature contours is approximately 10ms in our setup. Across the five testcases, the prediction has an average $T_{err}$ of 0.52% and a maximum $T_{err}$ of 6.80% as shown. The maximum $T_{err}$ in our testcases occurs during transients which do not have long-lasting effects (e.g., on IC reliability). These errors are reduced to the average $T_{err}$ values at sustained peak temperatures.
Fig. 7 (left) shows an animated video of the time-varying power map for T6, where each frame (time-step) is after a 15s time interval. As before, the corresponding ground truth and predicted temperature contours are depicted in the center and right, respectively, of the figure.
Fig. 7. Video comparing the prediction of transient ThermEDGe against commercial tool-generated temperature contours for T6: (i) left video shows the time-varying power map, (ii) center video shows the commercial tool-generated temperature maps, and (iii) right video shows ThermEDGe-generated temperature maps. [For an animated version, visit the GitHub repository: https://github.com/asp-dac/asp-dac-1323.git to view the video.]
IREDGe results: We compare IREDGe-generated contours against the contours generated by [24] across 500 different testcases (10% of the data, orthogonal to the training set) with varying PDN densities and power distributions. Across the five testcases in Table III, IREDGe has an average $IR_{err}$ of 0.053mV and a worstcase max $IR_{err}$ of 0.34mV, which corresponds to 0.008% and 0.048% of VDD respectively. Given that static IR drop constraints are 1–2.5% of VDD, a worstcase error of 0.34mV is acceptable in light of the rapid runtimes. We list the results of five representative testcases in Table III where the percentage errors in $IR_{err}$ are listed as a fraction of VDD = 0 . V.
TABLE III: SUMMARY OF RESULTS FROM IREDGe FOR DIFFERENT TESTCASES. T16-T20 ARE TESTCASES WHICH HAVE A CHIP SIZE THAT WAS NOT IN THE TRAINING SET.
Chip size: 34x32 | | | Chip size: 68x32 | |
Testcase | $IR_{err}$ | Max $IR_{err}$ | Testcase | $IR_{err}$ | Max $IR_{err}$
T11 | 0.052mV (0.007%) | 0.26mV (0.03%) | T16 | 0.035mV (0.005%) | 0.16mV (0.02%)
T12 | 0.074mV (0.011%) | 0.34mV (0.05%) | T17 | 0.054mV (0.008%) | 0.42mV (0.06%)
T13 | 0.036mV (0.005%) | 0.21mV (0.03%) | T18 | 0.035mV (0.005%) | 0.35mV (0.05%)
T14 | 0.053mV (0.008%) | 0.24mV (0.03%) | T19 | 0.068mV (0.010%) | 0.22mV (0.03%)
T15 | 0.051mV (0.007%) | 0.23mV (0.03%) | T20 | 0.061mV (0.009%) | 0.38mV (0.05%)
A detailed view of T11 is shown in Fig. 8. It compares the IREDGe-generated IR drop contour plots against the contour plot generated by [24]. The input power maps, PDN density maps, and effective distance to power pad maps are shown in Fig. 8(a), (b), and (c) respectively. Fig. 8(d) and (e) show the comparison between ground truth and predicted value for the corresponding inputs. It is evident that the plots are similar; numerically, the histogram in Fig. 8(f) shows the %$IR_{err}$ where the worst %$IR_{err}$ is less than 0.02% of VDD.
Fig. 8. IREDGe static IR drop estimation on T11: (a) input power map, (b) input PDN density map, (c) effective distance to power pad map, (d) ground truth IR drop map, (e) predicted IR drop map, and (f) histogram of $IR_{err}$ showing a worstcase error of 0.16mV. (Figure labels: power (W); PDN density; normalized effective distance to power pad (μm); histogram of %$IR_{err}$ of VDD with frequency on the vertical axis; worstcase $IR_{err}$ = 0.02% VDD; allowable static IR = 2% VDD.)
Fig. 9. Size independent nature of IREDGe: Comparison between (a) actual IR drop contours and (b) IREDGe-predicted contours for a power map (T16) with size 68 × 32 using a model that was trained on images of size 34 × 32. (Color scale: voltage (V).)
Size-independence: One of the primary advantages of using IREDGe for static IR estimation is that its fully-convolutional nature enables the use of input images of any size, and the size of the hotspot determines the model rather than the size of the chip. Since the trained model comprises only the trained weights of the kernel, the same kernel can be used to predict the temperature contours of a chip of any size as long as the resolution of the represented image remains the same. We test static IREDGe on chips of a different size (T16 – T20), using a power distribution of size 68 × 32 as input. Fig. 9 compares the actual IR drop of T16 (Fig. 9(a)) and the IREDGe-predicted (Fig. 9(b)) solution of T16 using a model which was trained on 34 × 32 power maps. We summarize the results for the rest of the testcases in Table III.
Runtime analysis: A summary of the runtime comparison of our ML-based EDGe network approach against the temperature and IR drop golden solvers is listed in Table IV. The runtimes are reported on an NVIDIA GeForce RTX 2080Ti GPU. With the millisecond inference times, and the transferable nature of our trained models, the one-time cost of training the EDGe networks is easily amortized over multiple uses within a design cycle, and over multiple designs.
TABLE IV
RUNTIME COMPARISON BETWEEN EDGe NETWORKS AND GOLDEN THERMAL ANALYSIS AND IR DROP ANALYSIS TOOLS
Analysis type                               Icepak/PDNSim (minutes)   ThermEDGe/IREDGe (milliseconds)
Static thermal      2.0 million   64        30                        1.1
Transient thermal   2.0 million   64        210                       10
Static IR drop      5.2 million   0.16      310                       1.1
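The speedups implied by Table IV can be worked out directly. The sketch below copies the runtimes from the table and converts the golden-solver times to milliseconds before dividing; the dictionary and function names are illustrative only. It compares inference time alone, so the one-time training cost is not included.

```python
# Runtimes copied from Table IV:
# analysis type -> (golden solver runtime in minutes, EDGe inference in ms)
TABLE_IV = {
    "Static thermal": (30, 1.1),
    "Transient thermal": (210, 10),
    "Static IR drop": (310, 1.1),
}


def speedup(golden_minutes, edge_ms):
    """Ratio of golden-solver runtime to EDGe inference time."""
    golden_ms = golden_minutes * 60 * 1000
    return golden_ms / edge_ms


speedups = {name: speedup(*runtimes) for name, runtimes in TABLE_IV.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.2e}x faster per analysis")
```

Each of the three ratios is above one million, which is the basis for the claim that the training cost is quickly amortized when the analysis is invoked repeatedly.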
C. IREDGe compared with PowerNet
We compare the performance of IREDGe against our implementation of PowerNet, based on its description in [8]. The layout is divided into tiles, and the CNN features are the 2-D power distributions (toggle rate-scaled switching and internal power, total power, and leakage power) within each tile and in a fixed window of surrounding tiles. The trained CNN is used to predict the IR drop on a tile-by-tile basis by sliding a window across all tiles on the chip. The work uses a tile size of 5 µm × 5 µm and takes the power information of a 31 × 31 tile neighborhood (window) as features. For a fair comparison, we train IREDGe under the same fixed PDN density and fixed power pad locations that are used to train PowerNet.
[Fig. 10 panels: (a) ground truth; (b) IREDGe predicted, RMSE = 2.40e-2 mV; (c) PowerNet predicted, RMSE = 5.63e-2 mV; color scale: voltage (mV).]
Fig. 10. IR drop comparisons on T21: (a) ground truth, (b) from IREDGe, and (c) from our implementation of PowerNet.
Qualitatively, IREDGe is superior in three respects:
(1)
Tile and window size selection:
It is stated in [8] that when the size of the tile is increased from 1 µm × 1 µm to 5 µm × 5 µm and the size of the resulting window is increased to represent a 31 × 31 window of 5 µm tiles instead of 1 µm tiles, the accuracy of the PowerNet model improves. In general, this is the expected behavior for an IR analysis problem, where the accuracy increases as more global information becomes available, up to a certain radius beyond which the principle of locality holds [9]. IREDGe bypasses this tile-size selection problem entirely by taking the entire power map as input and allowing the network to learn the window size that is needed for accurate IR estimation.
(2) Run times:
Unlike PowerNet, which trains and infers IR drop on a sliding tile-by-tile basis, IREDGe has faster training and inference. IREDGe requires a single inference, irrespective of the size of the chip, while PowerNet performs an inference for every tile in the chip. For this setup and data, it takes 75 minutes to train our implementation of PowerNet, as against 30 minutes for IREDGe. For inference, PowerNet takes 3.2ms while IREDGe takes 1.1ms for a 34 × 32 chip size. For a chip of size 68 × 32, IREDGe takes 1.3ms to generate IR drop contours while PowerNet takes 6.2ms.
(3)
Model accuracy:
Since PowerNet uses a CNN to predict IR drop on a region-by-region basis, where each region is 5 µm by 5 µm, the resulting IR drop image is pixelated, and the predicted value for a region does not correlate well with those of the neighboring regions.
We compare IREDGe against our implementation of PowerNet on five different testcases, T21–T25. These testcases have the same power distributions as T11–T15, except that all five testcases have identical uniform PDNs and identical power pad distributions, as required by PowerNet; IREDGe does not require this. Fig. 10 shows a comparison between the IR drop solutions from a golden solver (Fig. 10(a)), IREDGe (Fig. 10(b)), and our implementation of PowerNet (Fig. 10(c)) for T21 (a representative testcase). Across T21–T25, IREDGe has an average IR err of 0.028mV and a maximum IR err of 0.14mV, as against 0.042mV and 0.17mV, respectively, for PowerNet.
V. CONCLUSION
This paper addresses the compute-intensive tasks of thermal and IR analysis by proposing the use of EDGe networks as apt ML-based solutions. Our EDGe-based solution not only improves runtimes but also overcomes the window size-selection challenge (the amount of neighborhood information required for accurate thermal and IR analysis) that is faced by other ML-based techniques, by allowing ML to learn the window size. We successfully evaluate EDGe networks for these applications by developing two ML software solutions, (i) ThermEDGe and (ii) IREDGe, for rapid on-chip (static and dynamic) thermal and (static) IR analysis, respectively. In principle, our methodology is applicable to dynamic IR as well, but this is not shown due to the unavailability of public-domain benchmarks.
REFERENCES
[1] Y. Zhan, S. V. Kumar, and S. S. Sapatnekar, “Thermally-aware design,” Found. Trends Electron. Des. Autom., vol. 2, no. 3, pp. 255–370, 2008.
[2] Y. Zhong and M. D. F. Wong, “Fast algorithms for IR drop analysis in large power grid,” in Proc. ICCAD, 2005, pp. 351–357.
[3] K. Zhang, A. Guliani et al., “Machine learning-based temperature prediction for runtime thermal management across system components,” IEEE Trans. Parallel Distrib. Syst., vol. 29, no. 2, pp. 405–419, 2018.
[4] D. Juan, H. Zhou et al., “A learning-based autoregressive model for fast transient thermal analysis of chip-multiprocessors,” in Proc. ASP-DAC, 2012, pp. 597–602.
[5] S. Sadiqbatcha, H. Zhao et al., “Hot spot identification and system parameterized thermal modeling for multi-core processors through infrared thermal imaging,” in Proc. DATE, 2019, pp. 48–53.
[6] S.-Y. Lin, Y.-C. Fang et al., “IR drop prediction of ECO-revised circuits using machine learning,” in Proc. VLSI Test Symposium (VTS), 2018, pp. 1–6.
[7] C. Ho and A. B. Kahng, “IncPIRD: Fast learning-based prediction of incremental IR drop,” in Proc. ICCAD, 2019, pp. 1–8.
[8] Z. Xie, H. Ren et al., “PowerNet: Transferable dynamic IR drop estimation via maximum convolutional neural network,” in Proc. ASP-DAC, 2020, pp. 13–18.
[9] E. Chiprout, “Fast flip-chip power grid analysis via locality and grid shells,” in Proc. ICCAD, 2004, pp. 485–488.
[10] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE T. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017.
[11] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234–241.
[12] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Proc. NeurIPS, 2016, pp. 2802–2810.
[13] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE T. Pattern Anal. Mach. Intell.
IEEE T. Comput. Aid D., vol. 25, no. 4, pp. 664–677, 2006.
[16] V. A. Chhabria, A. B. Kahng et al., “Template-based PDN synthesis in floorplan and placement using classifier and CNN techniques,” in Proc. ASP-DAC, 2020, pp. 44–49.
[17] B. W. Amick, C. R. Gauthier, and D. Liu, “Macro-modeling concepts for the chip electrical interface,” in Proc. DAC, 2002, pp. 391–394.
[18] F. Yazdani, “Foundations of heterogeneous integration: An industry-based, 2.5D/3D pathfinding and co-design approach.” Boston, MA: Springer, 2018.
[19] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Proc. NeurIPS, 2014, pp. 3104–3112.
[20] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv:1603.07285v2, Mar. 2016.
[21] W. Luo, Y. Li et al., “Understanding the effective receptive field in deep convolutional neural networks,” in Proc. NeurIPS, 2016, pp. 4905–4913.
[22] X. Shi, Z. Chen et al., “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” in Proc. NeurIPS
Proc. ASP-DAC. IEEE Computer Society Press, 2008, pp. 376–381.
[26] D. Kingma and J. Ba, “ADAM: A method for stochastic optimization,” in