Stochastic Automata Network for Performance Evaluation of Heterogeneous SoC Communication
Ulhas Deshmukh
Lecturer in ECE, Govt. Polytechnic, Dhule, India & Research Scholar, MNIT, Jaipur, India. Email: [email protected]
Vineet Sahula,
Senior Member, IEEE
Professor, Dept. of Electronics & Comm. Engg., Malaviya National Institute of Technology, Jaipur, India. Email: [email protected]
Abstract: To meet the ever-increasing performance demands of emerging System-on-Chip (SoC) applications, designers employ techniques for concurrent communication between components. Hence, the communication architecture becomes complex and a major performance bottleneck. An early performance evaluation of the communication architecture is the key to reducing design time, time-to-market and, consequently, the cost of the system. Moreover, it helps to optimize system performance by selecting an appropriate communication architecture. However, a performance model of concurrent communication is complex to describe and hard to solve. In this paper, we propose a methodology for performance evaluation of bus based communication architectures, whose modeling is based on a modular Stochastic Automata Network (SAN). We employ a Generalized Semi Markov Process (GSMP) model for each module of the SAN, which emulates the dynamic behavior of a Processing Element (PE) of an SoC architecture. The proposed modeling approach provides an early estimation of performance parameters, viz. memory bandwidth, average queue length at memory and average waiting time seen by a processing element. The inputs to the model are the number of processing elements, the mean computation time of the processing elements, and the first and second moments of the connection time between processing elements and memories.
I. INTRODUCTION
Modern-day System-on-Chip (SoC) platforms use a large number of embedded processors and application specific hardware components [1]. Integrating these heterogeneous components into a single chip makes communication among them critical. Besides, these components are pre-verified and optimized. Hence, the communication architecture emerges as a key performance-determining component of these multiprocessor SoC (MP-SoC) platforms. Furthermore, the availability of several commercial communication architectures, such as AMBA and CoreConnect, and their customization offer the designer a variety of design alternatives. Therefore, system level performance estimation is essential for selecting the optimum communication architecture from a wide design space at an early stage of the design cycle.
System-on-Chip applications use different types of communication architectures, viz. bus-based, Network-on-Chip (NoC) based, hybrid bus-NoC and crossbar architectures. Bus based architectures can be further classified as dedicated buses, a single shared bus and a network of shared buses. In SoCs and embedded applications, bus based architectures are popular because they are simple and consume less power and area. Moreover, the performance of bus based architectures not only suffices for low end and high volume applications but also results in a cheaper design. This has been the motivation for our efforts to estimate the performance of bus based communication architectures at the system level.
In this paper, we propose system level performance estimation of bus based communication architectures based on the Stochastic Automata Network (SAN). Mainly, we focus on the formulation of a SAN model for a Single Shared Bus (SSB) architecture and its extension to the Hierarchical Bus Bridge (HBB) architecture. The approach is proposed as an extension of the GSMP based performance model of these architectures [2]. In Section II, we present the basic concepts and terminology of SAN, related work and our contribution.
In Section III, we propose the SAN framework of an SSB architecture for performance estimation. Section IV contains the enhancement of the SAN formulation for the HBB architecture. We present the results in Section V and conclude in Section VI.
II. BACKGROUND
A. Stochastic Automata Network: an overview
A stochastic automata network consists of a number of modules, or stochastic automata. A module is modeled by a set of states and a set of transitions, which determine the dynamic behavior of one component of the parallel system. The state of one module is called its local state, while the global or system state is the collection of the local states of all modules. In short, the SAN model is a modular representation of a parallel system. The modules of a SAN model interact with each other through local and synchronizing events. A local event changes the state of a single module by triggering a local transition. A synchronizing event modifies the states of more than one module through simultaneous transitions in those modules. The probabilities of local and synchronizing transitions can be functional or non-functional. In a functional transition, the transition probability is a function of the states of other modules, whereas it is constant in a non-functional transition.
For a formal description, let us consider a SAN model with $N$ component modules and a set of events $E$. The $i$-th automaton, $A^{(i)}$ (where $i = 1, 2, \ldots, N$), has a set of states $S^{(i)} = \{a^{(i)}, \ldots, z^{(i)}\}$ of cardinality $n_i$. The local state variable of $A^{(i)}$ is denoted by $x^{(i)}$. Hence, the global state of the SAN is the collection of all local states, i.e. the vector $\tilde{x} = (x^{(1)}, x^{(2)}, \ldots, x^{(N)})$, whereas $S = S^{(1)} \times S^{(2)} \times \cdots \times S^{(N)}$ is called the global state space. The details of SAN can be found in [3] and the references therein.
(Re-print of manuscript in NORCHIP-2008.)
B. Related Work
The work reported in [4] uses a static performance estimation technique for the allocation of communication channels. Our previous work [2] proposes an analytical performance evaluation of SSB and HBB architectures based on the GSMP model. The analytical approach in [5] estimates the communication overhead in a pipelined communication path and considers the impact of various protocol parameters on data transfer. The work in [6] proposes a simulation approach based on the Operation State Machine for performance estimation of the system. The authors of [7] have proposed a two-phase hybrid performance estimation approach, which first performs an initial co-simulation with abstract communication and then analyzes the time-inaccurate communication graph by specifying the communication architecture. A large body of work dealing with SAN formalization is available in [3], [8]. The authors of [9] use the SAN model for performance analysis in platform based design.
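The SAN terminology of Section II-A can be made concrete with a short Python sketch. It is illustrative only and is not the authors' MATLAB/Stateflow model: the number of automata, the state labels and the helper names `global_state_space` and `functional_probability` are assumptions chosen for the example. The sketch builds the global state space as the Cartesian product of the local state sets and shows one possible functional transition probability, i.e. a probability that depends on the local states of other automata.

```python
from itertools import product

# Hypothetical local state set shared by every automaton (labels are illustrative).
LOCAL_STATES = ("CP", "AC", "FW", "RW")
N = 3  # assumed number of automata for this example


def global_state_space(local_state_sets):
    """Return S = S(1) x S(2) x ... x S(N) as a list of tuples.

    Each tuple is one global state (x(1), ..., x(N)). The product may
    contain tuples that a concrete model can never reach.
    """
    return list(product(*local_state_sets))


def functional_probability(global_state, i):
    """Example of a functional transition probability for automaton i (0-based).

    Assumed rule for illustration: the probability is 1.0 when every
    automaton with a larger index is in state "CP", and 0.0 otherwise.
    A non-functional transition would return a constant instead.
    """
    return 1.0 if all(s == "CP" for s in global_state[i + 1:]) else 0.0


space = global_state_space([LOCAL_STATES] * N)
```

With four local states per automaton and three automata, the product space has 4 * 4 * 4 = 64 global states. A real model would restrict attention to the reachable subset.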
C. Contribution of the paper
The main contribution of the paper lies in the proposal for system level performance estimation of an SSB architecture and an HBB architecture. The formulation is based on the SAN model of the communication architectures. We present a high level simulation model of these architectures in the Stateflow component of MATLAB.
The proposed modeling approach provides an early estimation of memory bandwidth ($BW$), average queue length ($L$) and average waiting time ($W$) for an SSB architecture. In the case of the HBB architecture, we estimate local bandwidth ($BW_{\ell}$), local average queue length ($L_{\ell}$), local average waiting time ($W_{\ell}$), global memory bandwidth ($BW_{g}$), global average queue length ($L_{g}$) and global average waiting time ($W_{g}$). The input parameters to the model are the number of Processing Elements (PEs) ($N$), the mean computation time ($T$) and the first and second moments of the connection time of the PEs ($\bar{C}$, $\overline{C^{2}}$). Additional input parameters for the HBB architecture are the probabilities of local and global requests ($X_{\ell}$ and $X_{g}$) and the first and second moments of the local and global connection times ($\bar{C}_{\ell}$, $\overline{C^{2}_{\ell}}$, $\bar{C}_{g}$, $\overline{C^{2}_{g}}$).
III. SAN BASED MODEL FOR SSB ARCHITECTURE
In this section, we propose the SAN model of a heterogeneous SSB architecture for evaluating performance metrics. The model is proposed as an extension of the GSMP based performance model of a homogeneous SSB architecture [2]. Two types of abstract communication models are used in SoC platforms: (i) the message passing communication model and (ii) the shared memory communication model. Our formulation is based on the latter, in which the SoC function involves communication of the PEs with the memories. Figure 1 shows a synchronous SSB architecture, which consists of $N$ heterogeneous processing elements, $PE_1, PE_2, \ldots, PE_N$, competing for the use of a bus. We assume that bus arbitration is based on fixed priorities of the PEs. The lowest priority is assigned to $PE_1$ and the highest to $PE_N$. The bus access is assumed to be non-preemptive. An arbiter of the N-user, one-server type resolves bus access conflicts.
[Figure: the arbiter, MEM, PE 1, PE 2, ..., PE N and their interfaces (I/F) on the single shared bus.]
Fig. 1. A single shared bus communication architecture.
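The arbitration policy described above (fixed priorities, $PE_N$ highest, non-preemptive access, one server) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Stateflow implementation: the function name `arbitrate`, the representation of pending requests as a set of 1-based PE indices, and the `current_owner` argument are all hypothetical.

```python
def arbitrate(requests, current_owner=None):
    """Fixed-priority, non-preemptive arbitration for a single shared bus.

    requests:      set of 1-based PE indices that currently request the bus.
    current_owner: index of the PE that holds the bus, or None if the bus is idle.

    Returns the index of the PE that holds the bus after arbitration,
    or None if the bus stays idle.
    """
    if current_owner is not None:
        # Non-preemptive: the PE that holds the bus keeps it until it finishes.
        return current_owner
    if not requests:
        return None
    # Fixed priorities: PE_1 is lowest and PE_N is highest,
    # so the largest requesting index wins.
    return max(requests)
```

For example, with PE 1 and PE 3 both requesting an idle bus, the sketch grants the bus to PE 3; if PE 1 already holds the bus, it keeps it even though PE 3 has higher priority.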
A. Model formulation
The stochastic automata network of a heterogeneous SSB architecture is modeled as a collection of interacting modules of PEs. We employ the GSMP model [2] for each module, which represents the dynamic behavior of a PE. We use functional and synchronizing transitions to describe the interactions among these modules. Figure 2 depicts the SAN model of an SSB architecture, whereas Fig. 3 shows the details of one automaton $A^{(i)}$, which represents the GSMP model of $PE_i$. The computing state, labelled $CP_i$, corresponds to the situation in which $PE_i$ is computing. In the accessing state $AC_i$, $PE_i$ accesses MEM. In the full waiting state, labelled $FW_i$, $PE_i$ waits for MEM for the full connection time of another PE that is accessing MEM, while in the residual waiting state, labelled $RW_i$, $PE_i$ waits for MEM for the residual connection time of the accessing PE. In each state, the model spends a random amount of time with mean value $\eta_k$, called the mean sojourn time of the $k$-th state ($k = CP_i, AC_i, FW_i, RW_i$).
We express the state transition probabilities of the SAN model in terms of the transition probabilities of the GSMP model of a homogeneous SSB architecture [2]. These are explained as follows.
(i) $\alpha^{*}_{i}$: a local transition, which involves only $A^{(i)}$ and has constant probability $\alpha_{i}$:
$$\alpha^{*}_{i} = \alpha_{i} = 1$$
(ii) $\alpha^{*}_{i}$: a functional transition, which depends on the global state of the system. This transition takes place if all higher priority PEs are in their computing states:
$$\alpha^{*}_{i} = f(x_j) = \begin{cases} 1, & \text{if } x_j = CP_j,\ j = i+1, \ldots, N \\ 0, & \text{otherwise} \end{cases}$$
(iii) $\alpha^{*}_{i}$: a synchronizing transition, which synchronizes with event $e_j$ (any $\alpha$ transition of a higher priority PE) with probability $p_e$ and alternate probability 1:
$$\alpha^{*}_{i} = (e_j, p_e, 1), \quad j = i+1, \ldots, N$$
(iv) $\alpha^{*}_{i}$: a functional transition, which takes place if any one of the PEs is in its accessing state:
$$\alpha^{*}_{i} = f(x_j) = \begin{cases} 1, & \text{if } x_j = AC_j,\ j = 1, 2, \ldots, N \\ 0, & \text{otherwise} \end{cases}$$
The complementary transition takes the remaining probability, where the two sides refer to different transitions of $A^{(i)}$:
$$\alpha^{*}_{i} = 1 - \alpha^{*}_{i}$$
The performance parameters of the $i$-th PE are computed from the steady state probabilities [2], viz. $BW_i = P^{i}_{AC}$, $PU_i = P^{i}_{AC} + P^{i}_{CP}$, $L_i = P^{i}_{FW} + P^{i}_{RW}$ and $W_i = (\eta^{i}_{FW}\,\alpha^{*}_{i} + \eta^{i}_{RW}\,\alpha^{*}_{i}) / \alpha^{*}_{i}$, where $P_k$ is the steady state probability of the $k$-th state.
[Figure: automata $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ of PE 1, PE 2, ..., PE N, each with four states: State 0 - CP, State 1 - AC, State 2 - FW, State 3 - RW.]
Fig. 2. The SAN model for a heterogeneous SSB communication architecture.
[Figure: states $CP_i$, $AC_i$, $FW_i$ and $RW_i$ connected by transitions labelled $\alpha^{*}_{i}$.]
Fig. 3. An automaton $A^{(i)}$ representing GSMP model of $PE_i$.
IV. SAN BASED MODEL FOR HBB ARCHITECTURE
In this section, we extend the SAN modeling approach to the HBB architecture. The HBB architecture is composed of two shared buses, $BUS_1$ and $BUS_2$, connected by a bus bridge as shown in Fig. 4. Here, $N$ PEs on each bus compete to access the shared memories $MEM_1$ or $MEM_2$. At the bridge level, communications on the two buses are concurrent, whereas at the bus level, the behaviors of the PEs are concurrent. For simplicity, let us consider a scenario in which a PE mapped to $BUS_1$ generates either a local request to access $MEM_1$ or a global request to access $MEM_2$. With reference to this PE, the parameters of $MEM_1$ and $MEM_2$ are referred to as local and global parameters, respectively. Let $X_{\ell}$ be the probability of a local request, in which only $BUS_1$ is used to access $MEM_1$ and arbitration of $BUS_1$ is sufficient. Let $X_{g}$ be the probability of a global request, in which both $BUS_1$ and $BUS_2$ are used to access $MEM_2$ and two-stage arbitration of $BUS_1$ and $BUS_2$ is essential.
A. Model formulation
We propose a two-level SAN model for the HBB architecture. At the bridge level, the SAN consists of two automata that correspond to $BUS_1$ and $BUS_2$ and are similar to Fig. 2. At the bus level, each module is composed of the automata of the PEs. At the bridge level, the two automata of the buses interact with each other, while at the bus level, the interaction among the automata of the PEs is modeled.
The automaton of $PE_i$ in the aforementioned scenario (mapped to $BUS_1$) is depicted in Fig. 5. State $lAC_i$, state $lFW_i$ and state $lRW_i$ correspond to the local memory $MEM_1$ and are similar to the states of the automaton of a $PE_i$ of the SSB architecture (Fig. 3). The global accessing state $gAC_i$, the global full waiting state $gFW_i$ and the global residual waiting state $gRW_i$ are the analogous states when a PE attempts to access $MEM_2$. A detailed discussion of the model equations and performance parameters is omitted.
[Figure: two shared buses joined by a bridge through bus interfaces; each bus carries PE 1, PE 2, ..., PE N, an arbiter and a memory (MEM1 or MEM2) with their interfaces (I/F).]
Fig. 4. Hierarchical bus bridge communication architecture.
[Figure: states $CP_i$, $lAC_i$, $lFW_i$, $lRW_i$, $gAC_i$, $gFW_i$ and $gRW_i$ connected by transitions labelled $\alpha^{*}_{i}$ and $\beta^{*}_{i}$.]
Fig. 5. An automaton $A^{(i)}$ of $PE_i$ in HBB architecture.
V. RESULTS
In this section, we present performance evaluation results of SSB and HBB architectures obtained using the proposed modeling approach. We have captured the SAN models of both architectures, with a fixed arbitration scheme, in the Stateflow component of MATLAB. Simulation was performed on a P-IV, 1 GB Linux workstation. In both examples, random computation and communication times of the PEs were generated using MATLAB m-functions with a generalized distribution.
As the first example, we have considered an SSB architecture with three PEs:
$PE_1$, $PE_2$ and $PE_3$. We assigned the lowest priority to $PE_1$ and the highest to $PE_3$. We assigned the mean computation times of the PEs as $T_1 = T_2 = T_3 = 2$ cycles. We varied the mean communication time ($C_1$) of $PE_1$, with $C_2$ and $C_3$ as parameters. Various performance parameters of the PEs, viz. $BW$, $L$ and $W$, have been estimated. For brevity, we present the results for $BW_1$ and $L_1$ of $PE_1$, as shown in Fig. 6(a) and 6(b).
[Figure: two panels plotted against communication time $C_1$ (2 to 20 cycles) for four settings: $C_2 = C_3 = 2$ cycles; $C_2 = 2$, $C_3 = 4$ cycles; $C_2 = 4$, $C_3 = 2$ cycles; $C_2 = C_3 = 4$ cycles. (a) Bandwidth (BW). (b) Queue length (L).]
Fig. 6. Variation of (a) $BW_1$ and (b) $L_1$ with $C_1$.
As observed from Fig. 6(a), bandwidth increases with communication time, which is due to the increase in the mean sojourn time of the AC state. The figure also shows the influence of $C_2$ and/or $C_3$ on $BW_1$. A reduction in bandwidth is observed when we changed $C_2$ and/or $C_3$ from two to four cycles, since $PE_1$ has to spend more time in the waiting states. $PE_1$ received the maximum bandwidth (25 %) when $C_2 = C_3 = 2$ cycles and $C_1 = 20$ cycles, and the minimum bandwidth (3 %) when $C_2 = C_3 = 4$ cycles and $C_1 = 2$ cycles. Figure 6(b) reveals the converse observations for queue length, $L_1$. For higher values of $C_2$ and/or $C_3$, $PE_2$ and/or $PE_3$ access MEM for more time than $PE_1$. As a consequence, $PE_1$ spends more time in the waiting states. Hence, a higher value of $L_1$ is noted for $C_2 = C_3 = 4$ cycles.
In the second example, we have considered an HBB architecture with two PEs mapped to each bus. Processing elements $PE_{11}$ and $PE_{12}$ are mapped to $BUS_1$, while $PE_{21}$ and $PE_{22}$ are mapped to $BUS_2$. We assigned descending priorities to the global requests of $PE_{11}$, $PE_{12}$, $PE_{21}$ and $PE_{22}$, and then to the local requests in the same order. The model input parameters are assigned the following values: $X_{\ell 11} = 0.7$, $X_{\ell 12} = 0.8$, $X_{\ell 21} = 0.7$, $T_{11} = T_{12} = T_{21} = T_{22} = 2$ cycles, $C_{\ell 11} = C_{\ell 12} = C_{\ell 21} = 2$ cycles, and $C_{g11} = C_{g12} = C_{g21} = 2$ cycles (here, $\ell$ and $g$ denote local and global parameters, followed by the PE number).
From the various evaluated performance parameters of the PEs, we present the local and global bandwidth ($BW_{\ell 22}$, $BW_{g22}$) of $PE_{22}$. We varied $C_{\ell 22}$ for local bandwidth and $C_{g22}$ for global bandwidth. Figures 7(a) and 7(b) plot these parameters against the probability of local request, $X_{\ell 22}$, with $C_{\ell 22}$ and $C_{g22}$ as parameters.
[Figure: two panels plotted against the probability of local request ($X_{\ell}$, 0.1 to 0.9). (a) Local bandwidth ($BW_{\ell}$) for $C_{\ell 22} = 2$ cycles; $C_{\ell 22} = 4$ cycles; $C_{\ell 22} = 2$, $C_{g11} = 4$ cycles. (b) Global bandwidth ($BW_{g22}$) for $C_{g22} = 2$, 4 and 6 cycles.]
Fig. 7. Effect of $X_{\ell 22}$ on (a) $BW_{\ell 22}$ and (b) $BW_{g22}$.
We observe that the local bandwidth, $BW_{\ell 22}$, increases with $X_{\ell 22}$ as well as with $C_{\ell 22}$. At higher values of $X_{\ell 22}$, $BW_{\ell 22}$ is more sensitive to $C_{\ell 22}$. The influence of $C_{g11}$ on $BW_{\ell 22}$ is clearly visible in Fig. 7(a): the share of local bandwidth declined as we increased $C_{g11}$ from two cycles to four cycles. In the case of global bandwidth, a gradual decrease in $BW_{g22}$ is observed with increasing $X_{\ell 22}$. At the same value of $X_{\ell 22}$, $PE_{22}$ received more bandwidth with higher $C_{g22}$. Variations in $BW_{g22}$ with $C_{g22}$ at higher values of $X_{\ell 22}$ are not significant.
VI. CONCLUSIONS
This paper presents a SAN based modeling approach for system level performance evaluation of SSB and HBB architectures. We have evaluated the performance metrics, viz. bandwidth, queue length and waiting time, against the communication times of the processing elements for the SSB architecture. For the HBB architecture, performance parameters for local and global memories are evaluated against the local request probabilities. The proposed approach provides an early estimation of performance metrics that can help the designer select an appropriate communication architecture for SoC and embedded applications.
Acknowledgments: We gratefully acknowledge the financial support provided by the Department of IT, Ministry of Communication & IT, Govt. of India under SMDP-VLSI-II project.
REFERENCES
[1] International Technology Roadmap for Semiconductors (ITRS), 2007 Edition, [online] Available: http://public.itrs.net.
[2] U. Deshmukh and V. Sahula, “Interactive generalized semi Markov process model for evaluating arbitration schemes of SoC bus architectures,” in Second UKSIM European Sym. on Computer Modeling and Simulation, Sep. 2008, pp. 578–583.
[3] B. Plateau and K. Atif, “Stochastic automata network for modeling parallel systems,” IEEE Trans. on Software Eng., vol. 17, no. 10, pp. 1093–1108, 1991.
[4] J. M. Daveau, T. B. Ismail, and A. A. Jerraya, “Synthesis of system-level communication by an allocation-based approach,” in Proc. of the 8th Int. Sym. on System Synthesis, Sep. 1995, pp. 150–155.
[5] P. V. Knudsen and J. Madsen, “Integrating communication protocol selection with partitioning in hardware/software codesign,” in Proc. 11th Int. Sym. on System Synthesis, Dec. 1998, pp. 111–116.
[6] X. Zhu, W. Qin, and S. Malik, “Modeling operation and microarchitecture concurrency for communication architectures with application to retargetable simulation,” IEEE Trans. VLSI Systems, vol. 14, no. 7, pp. 707–716, Jul. 2006.
[7] K. Lahiri, A. Raghunathan, and S. Dey, “System-level performance analysis for designing on-chip communication architectures,” IEEE Trans. on CAD of ICs, vol. 20, no. 6, pp. 768–783, Jun. 2001.
[8] W. J. Stewart, K. Atif, and B. Plateau, “The numerical solution of stochastic automata networks,” European Journal of Operational Research, vol. 86, no. 3, pp. 503–525, 1995.
[9] A. Nandi and R. Marculescu, “System-level power/performance analysis for embedded systems design,” in