Graph based Data Dependence Identifier for Parallelization of Programs
Kavya Alluru
School of Computer Science and Engineering, Vellore Institute of Technology, Chennai
Jeganathan.L
Center for Advanced Data Science, Vellore Institute of Technology, Chennai
Abstract
Automatic parallelization improves the performance of a serial program by automatically converting it to a parallel program. Automatic parallelization typically works in three phases: check for data dependencies in the input program, perform transformations, and generate parallel code for the target machine. Though automatic parallelization is beneficial, it is not done as a part of the compiling process because of the time complexity of the data dependence tests and transformation techniques. Data dependencies arise because of the data accesses from memory required for the execution of the instructions of the program. In a program, memory is allocated for variables such as scalars, arrays, and pointers. At present, different techniques are used to identify data dependencies in scalars, arrays, and pointers in a program. In this paper, we propose a graph based Data Dependence Identifier (DDI), which is capable of identifying all types of data dependencies that arise in all types of variables, in polynomial time. In our proposed DDI model, for identifying data dependence in a program, we represent the program as a graph. Though many graphical representations of programs exist, our way of representing a program as a graph takes a different approach. Using our DDI model, one can also perform basic transformations like dead code elimination, constant propagation, and induction variable detection.
Keywords:
Automatic Parallelization, Parallelizing Compilers, Data Dependence
Preprint submitted to Journal of LaTeX Templates, February 19, 2021

1. Introduction

The multicore processor, a chip with two or more processors, has completely replaced the single-core processor in personal computers. This development has put forth a challenge in the effective utilization of the processing power of multicore systems. Advancement in hardware technology always throws a challenge to the software community in the better utilization of the former.

With multiple processes running on a personal computer, each process is made to run on a separate core, which enhances the throughput of the system. Still, a single program cannot utilize the processing power of multicore systems unless it is converted to a parallel program. Converting a serial program to a parallel one involves breaking down the instructions of the serial program into multiple groups such that these groups can be executed in parallel.

Serial-to-parallel program conversion can be accomplished in two ways: manual or automatic. Manual conversion demands that the programmer have insight into parallel architectures and parallel programming models, quite a difficult task for the programmer. Automatic conversion, known as automatic parallelization, is the much desired choice.

A compiler which performs automatic parallelization is typically called a Parallelizing Compiler. A Parallelizing Compiler generally works in three phases: (1) perform data dependence analysis on the source program; (2) apply transformations to remove dependencies in order to identify potential parallelism; (3) generate parallel code appropriate to the target machine. Data dependence analysis plays a crucial role in deciding whether a program can be parallelized or not. Tests to identify data dependence in a program are discussed in Section 2.

Though multicore computers have become a household entity, automatic parallelization has not achieved such a pinnacle. Automatic parallelization is not a part of traditional compilers.
One reason is the high time complexity of data dependence tests. Two statements in a program are said to be data dependent if both statements access the same memory location, i.e., a value computed in one statement is used by the other statement, and vice versa. Typically, in a program, memory is allocated only using scalar variables, arrays, and pointers. In practice, there are many tests to identify the different types of data dependencies (in scalars, arrays, and pointers), based on different approaches. As such, there is no unique test to identify all kinds of data dependencies in a program. In this paper, we propose a unified model called the Graph based Data Dependence Identifier (DDI) that performs data dependence testing of scalars, arrays, and pointers.
2. Related Works
Two instructions in a program are said to be data dependent if both instructions access a common memory location, at least one of them for a write operation. In general, programs use scalar variables, arrays, and pointers to allocate memory. Data dependence analysis has been studied for all three ways of memory allocation independently and extensively.
A compiler, besides translating a high-level-language program to a machine-understandable language, was further extended to optimize the program. In order to optimize a program, the program has been represented as a directed graph, to understand the data relationships between the statements of the program. This concept was introduced by Allen [17]. To represent a program as a graph, program statements are considered as nodes and the control flow between these statements as edges. Many such graphs to analyze the data flow in a program were introduced by Kennedy [18] and Ullman [19].

This work was further extended to identify and reduce data dependencies in a program. By eliminating data dependencies, program statements can be executed in parallel. The initial work on data dependence elimination was restricted to the scalar variables in the program. Kuck [2] and Ferrante [3] introduced the concept of data dependence reduction by representing a program as a dependence graph.

To represent a program as a graph, Kuck [2] considered program components (assignment statements, for-loop headers, while-loop headers) as nodes. If data computed in a program component c1 is used in a program component c2, an edge is drawn from c1 to c2, which shows the existence of data dependence between c1 and c2. Five such dependencies are possible: loop, flow, output, input, and anti dependencies. The main objective of Kuck's work is to reduce the dependencies between the scalar variables in order to identify potential parallelism, thereby reducing the program's execution time.

Ferrante [3] extended Kuck's model by representing a program as a pair of graphs (a data flow graph and a control flow graph), together called the Program Dependence Graph (PDG). Here program statements are considered as nodes; data flow between the program statements forms the edges of the data flow graph, and control flow between the program statements forms the edges of the control flow graph. Many program transformation techniques, like code motion and loop fusion, are performed using the PDG.

Though the Kuck and Ferrante models help to reduce data dependencies in scalars, the dependence that exists between the iterations of a loop (loop-carried dependence or array dependence) and the dependence due to pointers are not dealt with there. Data flow analysis techniques used to identify and reduce data dependencies in scalars are discussed in [4].
The problem of identifying data dependence in arrays is formulated using linear equations and inequalities derived from loop subscripts and loop bounds. Many such tests have been proposed, the first one being the GCD Test [5]. The GCD Test treats the linear equations derived from loop subscripts as linear Diophantine equations and solves them using the extended Euclid's algorithm. If an integer solution exists, the test reports the existence of data dependence. Banerjee's test is an extension of the GCD Test that adds the loop bounds as linear inequalities and solves the linear equations using the intermediate value theorem to check whether any real solution exists. Both the GCD test and Banerjee's test are approximate methods, as they report 'dependence exists' when the test is inconclusive. The I-Test is an extension of Banerjee's test; it is based on the observation that the real solutions predicted by Banerjee's test are integer solutions [6]. All these tests, in the case of multidimensional arrays, solve each subscript expression independently to check whether an integer solution exists. The Omega test [8] uses the Fourier-Motzkin Variable Elimination (FMVE) method to solve the system of linear equations and inequalities for an integer solution; this test is more accurate but has worst-case exponential time complexity. The Range test [14] checks whether a data dependence exists when the loop subscripts are non-linear. The NLVI Test [15] solves the non-linear and symbolic expressions derived from loop subscripts; this test experimentally shows that it can handle complex loop bounds. The QP Test [13] shows that non-linear loop subscripts in quadratic form can be solved using quadratic programming. Many other dependence tests exist in the literature [16], [21], [20].
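As a concrete illustration (not taken from the paper), the core of the GCD Test reduces to a divisibility check: for a write to A[a1*i + b1] and a read of A[a2*j + b2], the dependence equation a1*i - a2*j = b2 - b1 has an integer solution exactly when gcd(a1, a2) divides b2 - b1. A minimal sketch in Python, with hypothetical function and parameter names:

```python
from math import gcd

def gcd_test(a1, b1, a2, b2):
    """GCD test for the dependence equation a1*i - a2*j = b2 - b1,
    derived from a write to A[a1*i + b1] and a read of A[a2*j + b2].
    Returns False only when dependence is provably impossible."""
    g = gcd(a1, a2)
    c = b2 - b1
    if g == 0:              # both subscripts are loop-invariant constants
        return c == 0
    return c % g == 0       # an integer solution exists iff g divides c

# A[2*i] written, A[2*i + 1] read: gcd(2, 2) = 2 does not divide 1
assert not gcd_test(2, 0, 2, 1)   # no dependence
# A[2*i] written, A[2*i + 4] read: 2 divides 4
assert gcd_test(2, 0, 2, 4)       # dependence may exist
```

As the section notes, a True result is only "dependence may exist": the loop bounds are ignored, which is exactly the imprecision Banerjee's test and its successors address.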
Solving data dependence in pointers is considered a separate research area. Alias analysis [24] and shape analysis [23] are two popular techniques used to parallelize loops with pointers. [22] transforms code with pointers into a simpler form.
SUIF [25] and Polaris [26], the earliest tools, were designed for Symmetric Multiprocessor Systems and convert serial FORTRAN programs to parallel form. Cetus [27], a successor of Polaris, converts C programs to parallel form and is designed for multicore systems. The Intel compiler [28] automatically identifies the loops that can be parallelized and partitions the data accordingly. Any automatic parallelization tool has to perform data dependence testing to convert a serial program to a parallel program. All the tools mentioned here make use of data flow analysis for identifying dependencies in scalars, linear equation formulation for array dependencies, and alias analysis techniques for pointer dependencies. As of now, all the existing tools make use of different techniques to identify data dependence in scalars, arrays, and pointers; there is no single model with which one can identify all types of data dependencies.

The main objective of this paper is to design a graph based model called the Data Dependence Identifier (DDI) with which one can identify any type of data dependence in polynomial time. In contrast to the Kuck and Ferrante models of representing a program as a graph, where program statements are considered as nodes, our model takes a different approach: we consider the variables in the program as nodes, and the edges between these variables are drawn based on the mode of accessing the variables from memory. With this novel approach of representing a program as a graph, we can identify all types of data dependencies in polynomial time. In addition, our model can also perform compiler optimizations such as dead code elimination, constant propagation, and induction variable detection.
3. Parameterization of a Program
As mentioned earlier, our Data Dependence Identifier (DDI) model can identify all types of data dependencies that exist in a program. The core concept behind the DDI model is a graphical representation of a program with a different approach: we view a program as a structure with a finite number of components, called the parameters of the program. In this section, we discuss the parameterization of the program in detail.
A program is a sequence of step-by-step instructions which, when executed, gives the desired output. It is a combination of instructions and data. Data is assigned with the help of variables initialized in the program. An instruction in a program, irrespective of the programming language, will fall into one of the following types:
Assignment Instructions: In an assignment instruction, a value is assigned in two different ways: a constant can be assigned to a variable, for example a = 5, or a value stored in a variable can be assigned to another variable, for example a = b.

Arithmetic Instructions: In an arithmetic instruction, input is read from one or more variables or from the programmer, and the output is assigned to a variable after performing the required arithmetic operations. Examples: a = b + c, a = b + 3, a = b + c + a.

Conditional Instructions: A conditional instruction reads data from one or more variables or constant values, performs logical operations, and decides either True or False. In conditional instructions, the output is not necessarily assigned to any variable.
Iterative Instructions: Iterative instructions execute repeatedly until the specified condition fails.
Control Transfer Instructions: These instructions switch the control from one instruction to another. For example, instructions like break, continue, goto, and jump are control transfer instructions.
Input Instructions: Input instructions read data from input devices and write it to memory. In the example 'Read a, b', data is read from the programmer through the input device and stored in memory locations a and b respectively.

Output Instructions: Output instructions read data from memory and write it to an output device. Example: print(a).

Function Calls: Function calls are requests made to another routine that executes a predetermined task.
Knowing all the types of instructions with which a computer program is made, one can observe that some instructions access the memory to read data from a variable, some instructions access the memory to write data to a variable, some instructions access the memory both for reading data and for writing data, and some instructions do not access the memory at all. Based on the way an instruction accesses the memory, we categorize instructions as Memory Access Instructions (MAI) and Non Memory Access Instructions (NMAI).
Memory Access Instruction (MAI): an instruction that accesses the memory to perform the required operation. For example, arithmetic, conditional, input, and output instructions are MAI.
Non Memory Access Instruction (NMAI): an instruction that does not access the memory at all. For example, control transfer instructions like break and jump fall under this category.

Among MAI, some instructions access the memory for the read operation alone, some for the write operation alone, and some for both read and write. Based on the operations performed by an MAI on memory, these instructions are further classified into three categories: MA-READ, MA-WRITE, and MA-READWRITE.
MA-READWRITE (MARW): Instructions that access the memory for both read and write operations come under this category. For example, in the arithmetic instruction 'c = a + b', data is read from memory locations a and b and written to memory location c. In the assignment instruction 'a = b', data is read from memory location b and written to a.

MA-READ (MAR): Instructions that perform only a read operation and no write operation are classified as MA-READ. For example, in the conditional instruction 'if (a > b)', data is only read from memory locations a and b, but the output is not written to any variable; the data is sent to the processor for further computation. In the output instruction print(a), data is read from memory location a and sent to an output device. In general, data is read from memory and sent to other Hardware Units (HU) in the computer system, like the processor or output devices.

MA-WRITE (MAW): Instructions that perform only a write operation and no read operation are classified as MA-WRITE. For example, in the assignment instruction 'a = 5', a constant value is written to memory location a. The instruction a = 5 conveys that the programmer is writing 5 into memory location a; it differs from the instruction a = b, where data is read from location b and written to location a. In that sense, we deem that the instruction a = 5 means that the constant 5 is read from the programmer (PR) and written to location a. In the input instruction 'read a, b', data is read from an input device and written to memory locations a and b. The following diagram summarizes the classification of instructions based on memory access behavior.
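This classification can be sketched in code. The sketch below is an illustrative encoding, not the paper's implementation: each instruction is assumed to be already reduced to its sets of read and written names, with the fixed names 'PR' and 'HU' standing for the programmer and hardware units.

```python
def classify(reads, writes):
    """Classify an instruction by its read/write sets.
    `reads`/`writes` are sets of variable names; 'PR' (programmer) and
    'HU' (hardware unit) are non-memory sources/sinks and are ignored
    when deciding whether memory is accessed."""
    mem_reads = reads - {'PR', 'HU'}
    mem_writes = writes - {'PR', 'HU'}
    if mem_reads and mem_writes:
        return 'MA-READWRITE'
    if mem_reads:
        return 'MA-READ'
    if mem_writes:
        return 'MA-WRITE'
    return 'NMAI'            # e.g. break, jump: no memory access

assert classify({'a', 'b'}, {'c'}) == 'MA-READWRITE'   # c = a + b
assert classify({'a', 'b'}, {'HU'}) == 'MA-READ'       # if (a > b)
assert classify({'PR'}, {'a'}) == 'MA-WRITE'           # a = 5
assert classify(set(), set()) == 'NMAI'                # break
```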
As mentioned, typically every MARW instruction reads data from one or more memory locations and writes data into a memory location. Hence one can represent an MARW instruction i as an ordered pair of sets of variables [R, W], where R is the set of all variables from which instruction i reads data and W is a set with the single variable to which i writes data. This representation of an MAI as an ordered pair of sets of variables can be applied to MAR and MAW instructions as well. An MAR instruction i can be thought of as [R, HU], where data is read from the variables in R and written to a hardware unit HU. An MAW instruction i can be thought of as [HU, W] or [PR, W], where data is read from either the programmer (PR) or a hardware unit (HU) and written to the variable in W. For example:

• The MA-READWRITE instruction 'c = a + b' is written as the pair [{a, b}, {c}]: data is read from variables a and b and the output is written to c.

• The MA-READ instruction 'if (a > b)' is represented as [{a, b}, {HU}]: data is read from a and b and sent to a Hardware Unit (HU).

• The MA-WRITE instruction 'a = 5' is represented as [{PR}, {a}]: a value initialized in the instruction by the programmer (PR) is read and written to variable a.

Thus we conclude that every MA instruction can be represented as a pair of sets of variables [R, W]. Every program P is a collection of a finite set of instructions I = {i1, i2, ..., in}, which operate on the data stored in a finite set of memory allocations V = {v1, v2, ..., vm}. MAI is the finite set that contains all MA instructions, where MAI ⊆ I. Based on the discussion so far, any program P can be parameterized with I, V, HU, PR, and MAI. Accordingly, we write P as P(I, V ∪ {HU, PR}, MAI). We consider HU and PR as fixed variables, as data is read from and written to HU, and data is read from PR alone. Thus, we conclude that a program is made up of the following components:

• Set I, a finite set of instructions {i1, i2, ..., in}.

• Set V, a finite set of memory allocations or variables {v1, v2, ..., vm}.

• Set MAI, a finite set of MA instructions, where MAI ⊆ I. An instruction i ∈ MAI can be written as an ordered pair [R, W], where R, W ⊆ V.

• HU, representing the set of hardware units, i.e., input devices, output devices, the processor, and any other hardware unit in the computer system.

• PR, the set of constant values initialized in the program P by the programmer.

4. Directed Graph Representation of a Program
As mentioned earlier, the main aim of the paper is to automate the identification of data dependencies, with which one can decide whether a sequential program can be parallelized or not. For that purpose, we first transform a program P into an equivalent directed graph called the graph of P, written G_P.

Labeled Directed Graph: A labeled directed graph G = (N, E, L), where N is the finite set of nodes {n1, n2, ..., nk}; E = {e_ij = (n_i, n_j) | n_i, n_j ∈ N}, where the pair (n_i, n_j) indicates a directed edge that starts from node n_i and terminates at n_j; and L : E → S is a mapping that assigns every edge a label from the elements of S, called the set of labels.

All instructions in a given program P are indexed sequentially with the positive integers 1, 2, ..., n: the first instruction in the program is indexed 1, the second instruction 2, and so on. For i_n ∈ I, we call index of i_n = n. We know that every instruction i_n can be written as the pair [R, W]; in other words, index[i_n] = index([R, W]) = n.

A program P = (I, V ∪ {PR, HU}, MAI) is transformed into a directed labeled graph G_P = (V ∪ {PR, HU}, E, L) as follows:

• The set of nodes of G_P is the set of variables V ∪ {PR, HU}.

• For every ordered pair of sets [R, W] ∈ MAI, we include the edges {(r, w) | r ∈ R, w ∈ W}.

• Every edge in G_P is labeled with elements from the label set S, which contains the indices of the instructions in I: L : E → {1, 2, ..., n} such that L((r, w)) = k if index([R, W]) = k, where (r, w) ∈ E, r ∈ R, w ∈ W.

We use the notation (., .) to represent the edges of the graph, and [., .] to indicate the pair of sets R, W representing a memory access instruction.

In Example 1, I = {i1, i2, i3, i4}, V = {a, b, c, d}, and MAI = I, as all instructions in program P are memory access instructions. To construct G_P, V together with {PR, HU} acts as the node set. For instruction 1: [{a, b}, {c}], the edges (a, c) and (b, c) with labels L((a, c)) = 1 and L((b, c)) = 1 are added to G_P. For instruction 2: [{a, PR}, {d}], the edges (a, d) and (PR, d) with labels L((a, d)) = 2 and L((PR, d)) = 2 are added. For instruction 3: [{c, d}, {HU}], edges with labels L((c, HU)) = 3 and L((d, HU)) = 3 are added. For instruction 4: [{a, PR}, {b}], edges with labels L((a, b)) = 4 and L((PR, b)) = 4 are added to G_P.

Example 1: (a) a function void func1(int a, int b) whose four instructions correspond to the pairs [{a, b}, {c}], [{a, PR}, {d}], [{c, d}, {HU}], and [{a, PR}, {b}] (instruction 4 is b = a + 10); (b) the graph G_P on the nodes {a, b, c, d, PR, HU} with the labeled edges listed above; (c) the adjacency matrix of G_P:

        a    b    c    d    PR   HU
  a          4    1    2
  b               1
  c                               3
  d                               3
  PR         4         2
  HU

The adjacency matrix of the graph G_P is shown in Example 1(c): a variable's row gives its read information and its column gives its write information. Scanning column c of the matrix tells us that variable c is accessed for 'write' in instruction 1, and row c shows that variable c is accessed for 'read' in instruction 3. The procedure by which we convert P(I, V ∪ {
PR, HU}, MAI) into a simple edge-labeled graph G_P(V ∪ {PR, HU}, E, L) is given in Algorithm 1. The graph G_P is represented using an adjacency matrix. The procedure works as follows: the loop in line 2 takes each instruction in chronological order, and line 3 calls the MAI-verification procedure, which checks whether the instruction is a Memory Access Instruction (MAI) or not. If instruction i_k : [R, W] is an MAI, then for each r ∈ R and w ∈ W,

Algorithm 1 Convert program P to the directed edge-labeled graph G_P
1: procedure Program-to-Graph
   Input: program P(I, V ∪ {PR, HU}, MAI)
   Output: graph G_P(V ∪ {PR, HU}, E, L)
2: for each instruction [R, W] ∈ I, with index[R, W] = k do
3:   if MAI-verification() then
4:     for every r ∈ R and w ∈ W do
5:       E = E ∪ {(r, w)}
6:       L((r, w)) = k

an edge (r, w) with label k is added to the graph G_P, as shown in lines 4-6. The running time of this algorithm depends on two components: the procedure that checks whether an instruction is an MAI, and the number of edges added to G_P. The MAI-verification procedure is called for every instruction of I. Every programming language has a few keywords, say jump, break, exit, etc., whose presence in an instruction means that it does not require memory access. Let m be the number of such keywords in a programming language; m is a constant for a specific programming language. If each MAI-verification call goes through at most m comparisons to decide the category of an instruction, and there are n instructions in the program, the procedure takes O(n). For every instruction, a set of edges is added to G_P; in total |E| edges are appended to G_P, and since adding each edge takes constant time, the |E| edges take O(|E|).

Theorem 1.
Given a program P, there exists a unique simple edge-labeled graph G_P that corresponds to P.

Proof. Suppose two graphs G'_P and G''_P correspond to the program P. Let

G'_P = (V' ∪ {PR, HU}, E', L'),
G''_P = (V'' ∪ {PR, HU}, E'', L'').

First we show that a ∈ V' if and only if a ∈ V''. Let a ∈ V'.
⟺ a is a vertex in G'_P, which corresponds to program P.
⟺ a is a variable in program P.
⟺ a is a vertex in G''_P, since G''_P is a graph that corresponds to program P.
⟺ a ∈ V''.
Therefore, all the elements of V' are also in V'', and conversely; hence all the elements of V' ∪ {PR, HU} are also in V'' ∪ {PR, HU}, and the number of elements in V' ∪ {PR, HU} is the same as the number of elements in V'' ∪ {PR, HU}. (1)

Now we show that G'_P is isomorphic to G''_P.
1. Consider the identity function f : V' ∪ {PR, HU} → V'' ∪ {PR, HU} such that f(a) = a for a ∈ V', f(PR) = PR, and f(HU) = HU. By (1), f is bijective, since f is the identity function.
2. If (a', b') ∈ E', then (f(a'), f(b')) ∈ E''. Let (a', b') ∈ E'; then there is an instruction in P such that data is read from a' and written to the location b'. Hence there is an edge (a', b') ∈ E'', since G''_P is the graph that corresponds to P, i.e., the edge (f(a'), f(b')) ∈ E''.
Therefore, G'_P ≅ G''_P.
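Algorithm 1 can be sketched in Python as follows. This is an illustrative rendering, not the authors' implementation: instructions are assumed to be already given as [R, W] pairs (the pairs below are those of Example 1), and the labeling stores a list of indices per edge, a slight generalization of the single-valued L.

```python
def program_to_graph(instructions):
    """Convert a list of MA instructions, each an (R, W) pair of sets,
    into an edge-labeled graph: edges[(r, w)] is the list of indices of
    the instructions that read r and write w (1-based, program order)."""
    edges = {}
    for k, (R, W) in enumerate(instructions, start=1):
        for r in R:
            for w in W:
                edges.setdefault((r, w), []).append(k)
    return edges

# Example 1's pairs as given in the text (concrete statement forms for
# instructions 1-3 are not shown in the source):
P = [({'a', 'b'}, {'c'}),    # 1: reads a, b; writes c
     ({'a', 'PR'}, {'d'}),   # 2: reads a and a constant; writes d
     ({'c', 'd'}, {'HU'}),   # 3: reads c, d; output to a hardware unit
     ({'a', 'PR'}, {'b'})]   # 4: b = a + 10
G = program_to_graph(P)
assert G[('a', 'c')] == [1] and G[('PR', 'd')] == [2]
assert G[('c', 'HU')] == [3] and G[('a', 'b')] == [4]
```

A dictionary keyed by edge plays the role of the adjacency matrix of Example 1(c): iterating over keys with a fixed second component recovers a column (writes), and keys with a fixed first component recover a row (reads).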
5. Data Dependence Identifier
Two instructions i_j and i_k in a program are said to be data dependent if both access the same memory location, one for 'read' and the other for 'write'. There are four types of data dependencies [9], [10]:

• Flow dependence exists between two instructions i_j and i_k if i_j writes to a memory location that i_k reads later. For example, in the code given below:
  I1: c = a + b;
  I2: d = c - b;
Instruction I2 uses the value computed by I1. This is called flow dependence, as the data flows from I1 to I2.

• Anti dependence exists between two instructions i_j and i_k if i_j reads from a memory location that i_k writes later. In the code given below:
  I1: c = a + b;
  I2: a = b + 3;
Instruction I2 computes a; I1 has read the old value of a before I2. This is called anti dependence.

• Output dependence exists between two instructions i_j and i_k if i_j writes to a memory location that i_k also writes later.

• Input dependence exists between two instructions i_j and i_k if i_j reads from a memory location that i_k also reads later.

Data dependence testing is the process of identifying all such dependencies in a given program. In this section, we discuss the process by which the graph G_P identifies the data dependencies in a given program P; for this reason, we call our G_P the DDI. Algorithm 2 illustrates how the DDI identifies whether data dependence exists among the instructions of a program. The input to the algorithm is the graph G_P of program P, as discussed in the previous section. The algorithm outputs the type of data dependence that exists.

For every node v ∈ N.G_P (the node set of G_P), the following condition is checked in lines 3-7: whether v has an incoming edge (u, v) and an outgoing edge (v, u'). The existence of these edges ((u, v), (v, u')) confirms that data dependence exists between the respective instructions, since variable v is accessed for both 'read' and 'write'. An incoming edge to v with label k means that memory location v is accessed for 'write' in instruction number k, and an outgoing edge from v with label j means that memory location v is accessed for 'read' in instruction number j.

Algorithm 2 Identification of data dependencies in a program
1: procedure Data-Dependencies
   Input: graph G_P(N, E, L)
2: for every v ∈ N.G_P do
3:   if (L((u, v)) ≠ NULL) && (L((v, u')) ≠ NULL) then
4:     if L((u, v)) < L((v, u')) then Flow dependence exists
5:     else if L((u, v)) > L((v, u')) then Anti dependence exists
6:   if (L((u, v)) = k) && (L((u', v)) = j) then if k ≠ j then Output dependence exists
7:   if (L((v, u)) = k) && (L((v, u')) = j) then if k ≠ j then Input dependence exists

Based on the labels of the edges L((u, v)) and L((v, u')), we identify the nature of the data dependence:
1. Flow dependence: L((u, v)) < L((v, u')) means that v is accessed for 'write' first and then for 'read'.
2. Anti dependence: L((u, v)) > L((v, u')) means that v is accessed for 'read' first and then for 'write'.
3. Output dependence: if there exist edges with L((u, v)) = j and L((u', v)) = k, j ≠ k, then v is accessed for 'write' in both instructions j and k.
4. Input dependence: if there exist edges with L((v, u)) = j and L((v, u')) = k, j ≠ k, then v is accessed for 'read' in both instructions j and k.

The running time of this algorithm is O(|N|²), since for each variable v the entire row and column of the adjacency matrix have to be scanned, where |N| is the number of variables used in the program. For the correctness of the above algorithm, we prove the following theorem.

Theorem 2. Let P be a program and G_P the corresponding graph of P. If G_P has a path joining the vertices v_i --l--> v_j --m--> v_k, with v_j ∉ {PR, HU}, then the pair of instructions (i_l, i_m) has the data dependence problem.

Proof. A pair of instructions in a program has a data dependence if both instructions access the same memory location, covering both operations, read and write. Let G_P have a path of length 2 joining v_i and v_k, i.e., v_i --l--> v_j --m--> v_k: there is an edge (v_i, v_j) ∈ E with label l and an edge (v_j, v_k) ∈ E with label m.
1. (v_i, v_j) ∈ E with label l implies that the l-th instruction of P accesses memory location v_i for 'read' and v_j for 'write'.
2. (v_j, v_k) ∈ E with label m implies that the m-th instruction of P accesses memory location v_j for 'read' and v_k for 'write'.
From 1 and 2, the l-th instruction accesses v_j for 'write' and the m-th instruction accesses v_j for 'read'; hence the l-th and m-th instructions of P access the same memory location for both 'read' and 'write', i.e., the instructions (i_l, i_m) have the data dependence.

Corollary 2.1. If l < m, then the instructions (i_l, i_m) have flow dependence.

Corollary 2.2. If l > m, then the instructions (i_l, i_m) have anti dependence.

Corollary 2.3. If L((v_i, v_j)) = l and L((v_k, v_j)) = m, then the instructions (i_l, i_m) have output dependence.

Corollary 2.4. If L((v_j, v_i)) = l and L((v_j, v_k)) = m, then the instructions (i_l, i_m) have input dependence.
Since the proofs of the corollaries are immediate, we have not described them.
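Under the same (R, W) encoding of instructions, the label comparisons of Algorithm 2 and its corollaries can be sketched as follows. This is an illustrative Python rendering, not the authors' implementation; the PR and HU vertices are skipped, since read/write through them does not constitute a memory dependence.

```python
def find_dependencies(instructions):
    """Report (type, i, j) triples with i < j, per Algorithm 2's label
    comparisons: a node written by instruction i and read by j gives
    flow (i < j) or anti (i > j) dependence; two writes give output
    dependence; two reads give input dependence."""
    writers, readers = {}, {}
    for k, (R, W) in enumerate(instructions, start=1):
        for r in R - {'PR', 'HU'}:
            readers.setdefault(r, []).append(k)
        for w in W - {'PR', 'HU'}:
            writers.setdefault(w, []).append(k)
    deps = set()
    for v in set(writers) | set(readers):
        ws, rs = writers.get(v, []), readers.get(v, [])
        for i in ws:                      # write/read pairs on v
            for j in rs:
                if i < j:
                    deps.add(('flow', i, j))
                elif i > j:
                    deps.add(('anti', j, i))
        deps |= {('output', i, j) for i in ws for j in ws if i < j}
        deps |= {('input', i, j) for i in rs for j in rs if i < j}
    return deps

# I1: c = a + b;  I2: a = b + 3  ->  anti dependence on a, input on b
P = [({'a', 'b'}, {'c'}), ({'b', 'PR'}, {'a'})]
assert ('anti', 1, 2) in find_dependencies(P)
assert ('input', 1, 2) in find_dependencies(P)
```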
Note 1:
In a path of length 2, v_i --l--> v_j --m--> v_k, with v_j ∈ HU, the pair of instructions (i_l, i_m) need not have a data dependence, since HU is the vertex used for read/write from the hardware devices. Two different instructions, one reading from a device and another writing to the device, may not have any link at all. For example, the instructions print a and Read c have no connection between them.

Note 2: If G_P has a path of length k with the sequence of edge labels l1, l2, l3, ..., lk, then all possible pairs of the instructions i_{l1}, i_{l2}, ..., i_{lk} have the data dependence.

One of the fundamental and crucial jobs of a data dependence identifier is to decide whether an array can be parallelized or not, i.e., whether the elements of an array can be split into groups. Generally, loops are used to traverse the elements of an array. If each index position of the array is accessed only within a single instance of the loop, then there exists no data dependence in that array. If a variable (an index position of an array) accessed for 'write' in one instance of execution of the loop is accessed for 'read' in another instance by another instruction, or vice versa, data flows between the iterations of the loop. If flow dependence among different instances of execution of the loop exists, such a dependence is called loop-carried dependence or array dependence. A loop cannot be parallelized if such a dependence exists. In this section, we discuss, with the help of an example, how our DDI model solves the complicated problem of identifying data dependencies in arrays.

Data Dependence in One Dimensional Arrays
With our DDI, we can easily identify the data dependence in loops. As usual, the index positions of an array are considered as individual variables.

Consider the program in Example 2. Normally, a sequential loop executes in increasing order of the index values, following the order of instructions. Here there are three instructions 4, 5 and 6, and the loop index runs from 2 to 4. Consider the instances of execution of these instructions, where each executed instruction is denoted i.k, 1 ≤ i ≤ n, with i representing the instruction number and k the instance of execution.

We convert the loop instructions of P to G_P. In Example 2, I = {4, 5, 6}, V = {a[0], c[0], a[1], b[1], c[1], a[2], b[2], c[2], a[3], b[3], c[3], a[4], b[4], c[4], a[5]}, and MAI = I, as all instructions are memory access instructions. To construct G_P, all variables in V act as the nodes N. For the instance of instruction 4 with access pattern [{b[1], c[1]}, {a[1]}], edges (b[1], a[1]) and (c[1], a[1]), labelled with that instance of instruction 4, are added to G_P. For the instance of instruction 5 with access pattern [{a[0], c[0]}, {a[2]}], edges (a[0], a[2]) and (c[0], a[2]), labelled with that instance of instruction 5, are added to G_P. Similarly, for all instances of execution, edges are added to G_P as shown in Example 2(b).

In the given instances of execution, the variable a[2] computed by an instance of instruction 4 is read by a later instance of instruction 5. This is shown in G_P as edges (b[2], a[2]) and (a[2], a[4]), which is a flow dependence. Similarly, the variable a[3] computed by an instance of instruction 5 is read by a later instance of instruction 5. This is shown in G_P as edges (c[1], a[3]) and (a[3], a[5]).

[Example 2: (a) a function add() with a loop for (i = 2; i < …; i++) containing the instructions 4: a[i] = b[i] + c[i] and 5: a[i+1] = a[i−…] …; (b) the corresponding graph G_P over the nodes of V, with edges labelled by the instruction instances 4.k, 5.k and 6.k.]

In a loop, data dependences are identified in G as follows:
1. There exists a flow dependence in G if, for a node v ∈ V in G, there is an edge L((u, v)) = s.i and an edge L((v, u')) = t.j, where j > i and m ≤ i, j ≤ n; here m, n represent the loop bounds and s, t are the indices of the loop instructions.
2. There exists an anti dependence in G if, for a node v ∈ V in G, there is an edge L((u, v)) = s.i and an edge L((v, u')) = t.j, where j < i and m ≤ i, j ≤ n, with m, n the loop bounds and s, t the indices of the loop instructions.
3. There exists an output dependence in G if, for a node v ∈ V in G, there are two edges L((u, v)) = s.i and L((u', v)) = t.j, where t ≠ s and j ≠ i.
4. There exists an input dependence in G if, for a node v ∈ V in G, there are two edges L((v, u)) = s.i and L((v, u')) = t.j, where t ≠ s and j ≠ i.

Data Dependence in Two Dimensional Arrays
The concept discussed for single loops extends to nested loops. Example 3 illustrates the identification of data dependencies in a program with two-dimensional arrays.

[Example 3: (a) a function add() with the nested loops for (i = 1; i < 3; i++) for (j = 1; j < 3; j++) containing the instructions L1: a[i][j] = c[i][j−…] … and L2: c[i][j] = a[i−…] …; (b) the corresponding graph G_P over nodes such as a[1][1], a[2][2], c[0][0] and c[2][2], with edges labelled by the instances of L1 and L2.]

Scalars are variables used to store a single datum. Data dependence information for scalar variables helps to group the program instructions that can be executed in parallel, and identifying data dependencies in scalar variables present in loops helps to determine whether a loop can be parallelized or not. Here, we discuss the process by which our DDI model identifies the scalar dependencies present in loops.

[Example 4: (a) a function add() with a loop for (i = 1; i < …; i++) containing the instructions 5: s = c + i and 6: c = s; (b) the corresponding graph over the nodes c, s, i and PR, with edges labelled by the instances of instructions 5 and 6.]

For a node v, if there exists an incoming as well as an outgoing edge with labels of the same loop instance, then a flow dependence exists for that scalar variable in the loop. In Example 4, the variables c and s are accessed for 'Read' and 'Write' in the same loop instance.

Many data dependence tests [6], [5], [8], [7] have been proposed for arrays. Almost all the tests are based on subscript analysis. If the array subscripts are linear, they are converted to the form of linear equations and inequalities. The problem is then to solve these linear equations and inequalities to check whether an integer solution exists; the existence of an integer solution proves data dependence. The problem is more complicated in the case of multidimensional arrays, as it ends up in solving a system of linear equations and inequalities. There is always a trade-off between accuracy and complexity in the existing data dependence tests. Table 1 shows the experimental results from [11] on the Perfect benchmarks [12].
The table displays the total number of data dependence problems, the number of problems proven by each test to be independent (i.e., no data dependence exists), the number of problems for which the test could not conclude whether a data dependence exists, and the average time taken by each test per dependence problem.

Table 1: Comparison of data dependence tests w.r.t. accuracy and time (Perfect benchmarks)

Test       | Data Dependence Problems | Independent | May or may not be dependent | Time (msec)
Banerjee   | 59936                    | 17946 (30%) | 41990 (70%)                 | 8
I-Test     | 59936                    | 17970 (30%) | 37827 (63%)                 | 8
Omega Test | 59936                    | 21232 (35%) | 32239 (54%)                 | 177

The Omega test was able to prove that 35% of the problems are not data dependent, whereas the Banerjee and I-Test were able to do so for only 30%. On the other hand, the Omega test took 177 ms on average per dependence problem, whereas the Banerjee and I-Test took only 8 ms to perform dependence testing. We have theoretically proved (Theorem 2) that our DDI model can identify all kinds of data dependencies in a program in polynomial time without any error.
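The instance-labelled loop criteria from the array discussion above can be sketched in the same spirit. Encoding a label s.i as the tuple (s, i), and the sample labels used below, are illustrative assumptions.

```python
# Sketch: loop-carried dependence detection on instance-labelled edges.
# A label s.i is encoded as the tuple (s, i): instruction s, instance i.
# writes[v] lists labels of incoming edges of v (v is written);
# reads[v] lists labels of outgoing edges of v (v is read).

def loop_carried(writes, reads):
    found = []
    for v, ws in writes.items():
        for (s, i) in ws:
            for (t, j) in reads.get(v, []):
                if j > i:    # written in instance i, read in a later instance
                    found.append((v, (s, i), (t, j), "flow"))
                elif j < i:  # read in an earlier instance than the write
                    found.append((v, (s, i), (t, j), "anti"))
    return found

# Illustrative labels: an array cell written by instruction 4 in
# instance 2 and read by instruction 5 in instance 3 is a flow dependence.
print(loop_carried({"a[2]": [(4, 2)]}, {"a[2]": [(5, 3)]}))
```

A loop with any such flow dependence between distinct instances carries a dependence and cannot be parallelized, matching the criterion stated for G above.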
6. Basic Transformation Techniques
Basic transformations like constant propagation, dead code elimination andinduction variable detection are performed on a program to optimize it. In thissection, we will discuss how these transformations can be performed using ourDDI model.
Dead code refers to variables whose data is never used in the program. Removing such code reduces the program's size, and all compilers perform dead code elimination as a part of compiler optimization. Here we discuss how our model identifies and eliminates dead code in a given program.

The following scenarios depict the presence of dead code in a given program:
• A variable has been initialized but is never used in the program's execution.
• Data is computed in the program but is never used to obtain the final output.

The following characteristics in the graph show the existence of dead code:
1. For a node u, there are no edges (v, u) and (u, u'), where v, u' ∈ N.G_P.
2. For a node u, there are edges (v, u) but no edges (u, u'), where v, u' ∈ N.G_P.
3. For a node u, there are edges L((v, u)) = i, L((v, u)) = j and L((u, u')) = k, but no edge (u, u') whose label lies between i and j, where v, u' ∈ N.G_P.
4. For a node u, there are edges L((v, u)) = i, L((v, u)) = j and L((u, u')) = k, but no edge (u, u') whose label is greater than j, where v, u' ∈ N.G_P.
Algorithm 3 illustrates the process of dead code elimination.

Algorithm 3 Dead Code Elimination
procedure DeadCodeElimination
    Input: Graph G_P(N, E, L)
    for every v ∈ N.G do
        if (L((u, v)) == k) && (L((v, u')) == NULL) then remove k
        if (L((u, v)) == k) && (L((v, u')) < k) then remove k
        if (L((u, v)) == k) && (L((u, v)) == m) && (no L((v, u')) lies between k and m) then remove k
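A minimal executable sketch of this pass, under the assumption that the incoming-edge labels (writes) and outgoing-edge labels (reads) of each node are available as per-variable lists:

```python
# Sketch: dead-write detection on G_P. writes[v] holds the labels of
# incoming edges of v (instructions writing v); reads[v] holds the
# labels of outgoing edges of v (instructions reading v).

def dead_writes(writes, reads):
    dead = set()
    for v, ws in writes.items():
        rs = reads.get(v, [])
        for k in ws:
            later = [w for w in ws if w > k]
            nxt = min(later) if later else float("inf")
            # the write at k is dead if v is not read before its next write
            if not any(k < r < nxt for r in rs):
                dead.add(k)
    return dead

# a is written at instructions 2 and 4 but read only at 5, so the
# value written at 2 is never observed and instruction 2 is dead.
print(dead_writes({"a": [2, 4]}, {"a": [5]}))  # {2}
```

The single criterion here covers both graph characteristics: a write with no read at all, and a write overwritten before any intervening read.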
A similar structure can be observed in the graph: the variable a has incoming edges with labels 2 and 4 but no outgoing edge in between 2 and 4, so instruction 2 can be removed. The running time of this algorithm is O(|N|), as the entire row and column of each variable v has to be scanned.

In a program, if a variable is assigned a constant value, references to this variable in the program can be replaced directly with the constant value. This technique is called constant propagation. Here, we discuss how this technique is designed using our model. Constant propagation requires (1) identification of a variable assigned a constant value, and (2) replacement of the variable with the constant value wherever it is referenced in the subsequent instructions of the program.

Algorithm 4 Constant Propagation
procedure ConstantPropagation
    Input: Graph G_P(N, E, L)
    for every v ∈ N.G do
        if (L((PR, v)) == k) && (L((u', v)) == NULL) then
            for every L((v, u)) = j do
                add edge L((PR, u)) = j
                delete edges L((PR, v)) = k and L((v, u)) = j
        if (L((PR, v)) == k) && (L((u', v)) == m) then
            for every L((v, u)) = j where j > k and j < m do
                add edge L((PR, u)) = j
                delete edges L((PR, v)) = k and L((v, u)) = j

In a given instruction I: [R, W], a constant value is assigned to a variable if R = {PR}, i.e., R contains only the variable PR, which represents a constant value. In the graph, for a node v, if there is an incoming edge from the node PR with label k and no other incoming edge to v carries the same label k, then instruction k assigns a constant value to the variable v.

Lines 3–6 of Algorithm 4 identify a node v for which L((PR, v)) = k and no other incoming edge to v exists. For every outgoing edge of v, i.e., L((v, u)) = j with j > k, the edge L((PR, u)) = j is added and the edges L((PR, v)) = k and L((v, u)) = j are removed from G_P. In Example 7, instruction 1 assigns the constant value 3 to b; variable b is read in instruction 2, where it can be replaced directly with 3. Lines 7–10 of the algorithm identify a node v for which L((PR, v)) = k and another incoming edge L((u', v)) = m exists. For every outgoing edge of v, i.e., L((v, u)) = j with j > k and j < m, the edge L((PR, u)) = j is added and the edges L((PR, v)) = k and L((v, u)) = j are removed from G_P.

[Example 7: (a) a function func1(int a, int b) in which instruction 1 assigns the constant 3 to b and instruction 2 reads b; (b) the corresponding graph over the nodes a, b, c, HU and PR.]

An induction variable is a variable in a loop whose value either increments or decrements by a constant in every iteration. There are two types of induction variables: basic and derived. Basic induction variables are of the form x = x + c, where c is a constant value and x is the basic induction variable.
A derived induction variable is a variable defined in the loop as a linear function of a basic induction variable. Here, we discuss how our model detects basic and derived induction variables.

[Example 8: (a) a function add() with a loop for (i = 1; i < …; i++) containing the instructions 5: s = s + i and 6: c = s * …; (b) the corresponding graph over the nodes c, s, i, PR and HU, with a self loop on s.]

If a node v in the graph G_P has a self loop, then v is a basic induction variable. For a node u, if there is an incoming edge from v, i.e., from the basic induction variable, then u is a derived induction variable.
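This self-loop rule can be sketched directly over the edge set; the set-of-pairs representation of the graph is an assumption:

```python
# Sketch: induction-variable detection on G_P, with the graph given
# as a set of directed edges (src, dst).

def induction_vars(edges):
    basic = {u for (u, v) in edges if u == v}  # a self loop marks a basic IV
    derived = {v for (u, v) in edges if u in basic and u != v}
    return basic, derived

# Example 8-style graph: s = s + i yields a self loop on s, and an
# assignment reading s into c yields an edge from s to c, so c is derived.
print(induction_vars({("s", "s"), ("i", "s"), ("s", "c")}))  # ({'s'}, {'c'})
```

A fuller pass would additionally check that the derived variable's defining instruction is a linear function of the basic one, which this sketch leaves out.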
7. Data Dependence in Pointers
A pointer is a variable that stores the address of another variable. Identifying data dependencies in a program that uses pointers is commonly solved using alias analysis [24] and shape analysis [23]. Here, we discuss how to represent pointers and identify data dependencies in them using the DDI model.

[Example 9: (a) the function
void poin1(int a, int b) {
1: int *p, c;
2: p = &a;
3: c = *p + a;
4: print p, *p, c;
}
(b) the corresponding graph over the nodes a, c, p, HU and PR, with solid edges labelled 1, 3 and 4 and a dashed edge labelled 2 for the pointer assignment.]

Since pointers are variables that store the address of another variable, they are represented just as other variables are represented in our model. The edge for an assignment statement that assigns the address of a variable to a pointer variable is drawn as a dashed line in the graph. In Example 9, the pointer assignment statement p = &a is represented as a dashed directed edge with label 2.

Data dependence exists if there is a directed path of length ≥ 2 in G_P. The same theorem (Theorem 2) applies for pointers, with the constraint that the dashed edges must not be included in the path.
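This pointer-aware variant of the Theorem 2 test can be sketched by tagging each edge with a flag for dashed (address-of) edges; the 4-tuple encoding is an assumption:

```python
# Sketch: Theorem 2 on a graph with pointer assignments. Each edge is
# (src, dst, label, is_dashed); dashed address-of edges are excluded
# from dependence paths.

def dependence_pairs(edges):
    solid = [(s, d, l) for (s, d, l, is_dashed) in edges if not is_dashed]
    pairs = []
    for (vi, vj, l) in solid:
        if vj in ("PR", "HU"):
            continue  # constants and hardware I/O cannot be the middle node
        for (vj2, vk, m) in solid:
            if vj2 == vj:
                pairs.append((l, m))  # length-2 path through v_j
    return pairs

# Example 9-style graph: instruction 3 writes c from a, instruction 4
# prints c; the dashed edge for p = &a (label 2) creates no dependence,
# so no pair (2, 4) is reported through p.
g = [("a", "c", 3, False), ("c", "HU", 4, False),
     ("a", "p", 2, True), ("p", "HU", 4, False)]
print(dependence_pairs(g))  # [(3, 4)]
```

Flipping the dashed flag on the (a, p) edge would admit the path through p and report an extra pair, which is exactly what the dashed-edge constraint is meant to rule out.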
8. Conclusion
Thus, in summary, we have proposed a graph based Data Dependence Identifier (DDI) which identifies all types of data dependencies in scalars, arrays and pointers. Further, compiler optimization techniques like dead code elimination, constant propagation, and induction variable detection can be performed using DDI.

Some of the salient features of our work are:
1. Before representing the program as a graph, we parameterized the program with components (
I, V ∪ {