Data Flow Analysis part 2

(Chapter 8)

Edited by ori dvir

Outline:

  • iterative data flow analysis
  • Example reaching definitions
  • Structural Analysis
  • Examples
  • Difficulties of structural analysis
  • Def-use, SSA, handling pointers.

Iterative Data flow analysis

This approach tries to solve an equation system in an iterative way:

In(ENTRY) = 

In(B) = B’  Pred(B)Out(B’)

Out(B) = FB(In(B))

Using:

Iterative data-flow algorithm

Input: a flow graph G=(N,E,r)

An init value init

A monotonic function FB for every B in N

Output : For every N its in(N)

Initialization: in(Enrty):=init

For each node Bin N-{Entry} do

In(B) := 

WL:= N-{Entry}

Iteration:

While WL != {}do

Select and remove a B from the WL

Out:=FB(in(B))

For each B’ in succ(B) such that in (B’) != in B’  out do

In (B’) := in (B’)  out

WL:=WL  {B’}

Note that init and bottom () are not necessarily the same. Although in the reaching definition example we gave them both the value of an empty group. We might want to give init a different value if we do a global optimization for example the definitions of the global variables.

In order to optimize the running time of the above algorithm it is possible to order the work set in such way that will minimize the number of iterations.

We give an example of ordering the WL according to reverse post order search of the control flow graph. The idea behind this is to visit first the predecessor before the node, this promises us that we will not visit a node (block) more times than its nesting level.

Running Example :

B0

B1

B2

B3

B4

B5

B6

B7

Fig 1 . Control Flow Graph

Example (contd.):

First we will order the work list according to reverse post order.

WL:= { 7,6,4,5,3,2,1,0 }

Below are the reaching definitions after each iteration.

B0 / B1 / B2 / B3 / B4 / B5 / B6 / B7
I0 /  /  / 2,3 / 2,3,5 / 2,3,5 / 2,3,5 / 2,3
I1 /  /  / 2,3 / 2,3,5,8,9,10,11 / 2,3,5 / 2,3,5 / 2,3
I2 /  /  / 2,3 / 2,3,5,8,9,10,11 / 2,3,5,8,9,10,11 / 2,3,5,8,9,10,11 / 2,3
I3 /  /  / 2,3 / 2,3,5,8,9,10,11 / 2,3,5,8,9,10,11 / 2,3,5,8,9,10,11 / 2,3 / 2,3,5,8,9,10,11

Some optimizations require a backward algorithm, for example live variables analysis.

Iterative Backward Data flow analysis:

Out(Exit) = init

Out(B) = B’  Succ(B)In(B’)

In(B) = FB(Out(B))

Structural analysis

The iterative analysis has some disadvantages, it may take more time, and when you change the code you need to rerun the algorithm. In this section we present a structural approach, one that tries to realize the programming structures in the code and operate directly on them. Once we capture the behavior of a few programming constructs we can move their computation to the compiler building time instead of at compile time. For this we need to describe the properties of functions that operate on a few blocks rather then a single one.

Lattices of Flow functions.

For a lattice L

1) LF are the monotonic functions from L to L

f LF x  y  f(x)  f(y)

2) LF is a lattice with the order f  g

For all z: f(z)  g(z)

F (z) 

(x F y) (z)  x(z)  y(z)

3) LF is closed under composition

(f  g) (z)  (f(g(z))

f0 id, fn  f0 fn-1

f*(z)  limn  (id  f)n (z)

Examples based on reaching definition functions FG,P :

FG,P (x) = G  (x  P)

The functions are monotonic:

x  y G  (x  P) G  (y  P)

Minimal member:

F, (x) = 

Unification of two blocks : (x F y) (z)  x(z)  y(z)

G1  (x  P1)  G2  (x  P2) = (G1  G2)  (x  (P1  P2))

Compostion:

(f  g) (z)  (f(g(z))

FG1,P1FG2,P2 = FG1,P1(FG2,P2(x) )

= G1  ( (G2  (x  P2))  P1)

Structural Data-Flow analysis

  • Phase 1: Compute “the effect” of every program construct in a bottom-up fashion on the tree of control flow constructs (control tree).
  • Phase 2: Propagate the data flow value in a top down fashion into basic blocks.

Examples:

1.1 If-then Bottom-up phase.

Fif-then =(FthenFif/Y) Fif/N

1.2 Simplified If-then Bottom-up phase.

Fif-then =(FthenFif) Fif

1.3 Reaching definitions: simplified If-then Bottom-up phase.

FG,P = (FG2,P2FG1,P1)  FG1,P1

= F(G1  G2)  G2, P1  P2 FG1,P1

= F(G1  G2),P1

1.4 Simplified if then : top-down phase.

In(if) = in(if-then)

In (then) = Fif (in(if))

1.5 Reaching definitions : Simplified if-then top-down phase.

In(if) = in(if-then)

In (then) = FG1,P1 (in(if))

2.1 If-then-else Bottom-up phase.

Fif-then-else =(FthenFif/Y) (FelseFif/N)

2.2 Simplified If-then-else Bottom-up phase.

Fif-then-else =(FthenFif) (FelseFif)

=(Fthen  Felse)Fif

2.3 Simplified If-then-else Bottom-up phase.

(FG1,P1  FG2,P2)FG0,P0

= F(G1  G2, P1  P2 )FG0,P0

= F(G0(P1P2))G1G2,P0(P1P2)

2.4 If-then-else Top down phase.

In(if) =in(if-then-else)

In(then) = Fif/Y (in(if))

In(else) = Fif/N (in(if))

2.5 Simplified if-then-else Top down phase.

In(if) =in(if-then-else)

In(then) = Fif (in(if))

In(else) = Fif (in(if))

2.5 Simplified if-then-else Top down phase.

In(if) =in(if-then-else)

In(then) = FG0,P0 (in(if)) In(else) = FG0,P0 (in(if))

3.1 while Bottom-up phase.

Fwhile-loop = Fwhile/N (FbodyFwhile/Y)*

3.2 Simplified while Bottom-up phase.

Fwhile-loop = Fwhile (FbodyFwhile)*

3.3 Reaching definitions with Simplified while Bottom-up phase.

FG,P = FG0,P0 (FG1,P1FG0,P0)*

= FG0,P0 (F(G0P1)G1,P0P1)*

= FG0,P0 (F(G0P1)G1,U)*

= F((G0P1)G1)P0G0,P0

3.4 while Top-Down phase.

In(while) = (FbodyFwhile/Y)*(in(while-loop))

In(body) = Fwhile/Y (FbodyFwhile/Y)*(in(while-loop))

3.5 Simplified while Top-Down phase.

In(while) = (FbodyFwhile)*(in(while-loop))

In(body) = Fwhile (FbodyFwhile)*(in(while-loop))

3.5 Reaching definitions with Simplified while Top-Down phase.

In(while) = (FG1,P1FG0,P0)*(in(while-loop))

In(body) = FG0,P0 (FG1,P1FG0,P0)*(in(while-loop))

Example: structural analysis of the fibonachi function (fig. 1)

Bottom up phase:

After loop transformation: Fseq = F{5,8,9,10,11},{U-5}

After sequence transformation: Fif-then-else = F{2,3,5,8,9,10,11},{U-{2,3,5}}

After if-then-else transformation:

After sequence transformation:

Top –Down Phase:

1)

In(B0) =

In (B123456)=

In (B7)={2,3,5,8,9,10,11}

2)

In (B1) = 

In (B2354) = {2,3}

In (B6) = {2,3}

3)

In (B2) ={2,3}In (B35) = {2,3,5,}

In (B4) = { 2,3,5,8,9,10,11}

4)

In (B3) = {2,3,5,8,9,10,11}

In (B5) = {2,3,5,8,9,10,11}

Handling Arbitrary CFGs:

In order to handle arbitrary acyclic regions we need to join all paths.

FB = PPred(B) Fp

When the graph is irreducible we have a few practical options:

1)igonre it

2)use a transformation to turn it into a reducible graph (node splitting: increases code-size).

3)Do iteration only on the irreducible sub-graph

4)Iterate over the functions LF

Structural backward analysis

The reverse problem is more difficult in this case because we might end up with constructs that have multiple exists. See solutions in the Textbook.

Implementation issues.

Represent the computation of canonic cases with functions (if-then-else, while) moves the computation from compile-time to ‘compiler-build-time’.

Use graphs to represent arbitrary functional computations.

Def-Use, Use-Def Chains

Def-Use chains can be found using any of the above algorithms.

It can be used to improve the efficiency of iterative data-flow analysis

A du-chain for a variable v connects a definition of v to all the uses of this definition

A ud-chain for a variable v connects a use of v to all the definitions that may flow to it

A web for a variable v is the maximal union of interesting du-chains for v

Static Single Assignment (SSA)

•A normal form of the program such that def-use is immediate

•A separate variable for every assignment

•A  function combines the values of relevant variables

•Simplifies some optimizations

•Increases program’s size

Handling Pointers and Arrays.

This is a complicated problem because we can access a variable through a pointer and redefine it without referring to it by name. Most compilers will treat pointer assignments conservatively, they will assume everything dies (nothing preserved) after such assignment. This are is currently at the frontier of research.

Data-Flow analysis can yield “interesting” information on program behavior:

•Signs of variables

•Non-trivial constant values

•Termination properties

•Complicated bugs

•Partial correctness

Example: the program that output 91:

int f(int x)

{

if (x > 100) return x – 10 ;

else return f(f(x+11));

}

void main()

{

scanf(“%d”,&x);

if (x > 100) printf(“%d\n”,91);

else printf(“%d\n”,f(x));

}

The compiler could replace this code by:

void main()

{

scanf(“%d”,&x);

printf(“%d\n”,91);

}

1