Solutions to Compilers Second Edition Principles, Techniques, & Tools

 Chapter 5

 5.1.3 Exercises for Section 5.1

Exercise 5.1.1 : For the SDD of Fig. 5.1, give annotated parse trees for the

following expressions:

a) (3 + 4) * (5 + 6) n.





b) 1 * 2 * 3 * (4 + 5) n.



c) (9 + 8 * (7 + 6) + 5) * 4 n.



Exercise 5.1.2 : Extend the SDD of Fig. 5.4 to handle expressions as in Fig. 5.1.

1)L -> EnL.val = E.val
2)E -> TE'E'.inh = T.val
E.val = E'.syn
3)E' -> +TE_1'E_1'.inh = E'.inh + T.val
E'.syn = E_1'.syn
4)E' -> εE'.syn = E'.inh
5)T -> FT'T'.inh = F.val
T.val = T'.syn
6)T' -> *FT_1'T_1'.inh = T'.inh * F.val
T'.syn = T_1'.syn
7)T' -> εT'.syn = T'.inh
8)F -> (E)F.val = E.val
9)F -> digitF.val = digit.lexval

Exercise 5.1.3 : Repeat Exercise 5.1.1, using your SDD from Exercise 5.1.2.



Chapter 6

 Exercises for Section 6.1

Exercise 6.1.1 : Construct the DAG for the expression

((x + y) - ((x + y) * (x -y))) + ((x + y) * (x - y))


Exercise 6.1.2 : Construct the DAG and identify the value numbers for the subexpressions of the following expressions, assuming + associates from the left.

a) a + b + (a + b).

b) a + b + a + b.

c) a + a + 􀀀a + a + a + (a + a + a + a))

a)

1ida
2idb
3+12
4+33





b)


1ida
2idb
3+12
4+31
5+42



c)

1ida
2+11
3+21
4+31
5+34
6+25


6.2.5 Exercises for Section 6.2
Exercise 6.2.1 : Translate the arithmetic expression a + -(b + c) into:
a) A syntax tree.
b) Quadruples.
c) Triples.
d) Indirect triples.

a)

b)

oparg1arg2result
0+bct1
1minust1t2
2+at2t3

c)

oparg1arg2
0+bc
1minus(0)
2+a(1)
d)


instruction
0(0)
1(1)
2(2)

Exercise 6.2.2 : Repeat Exercise 6.2.1 for the following assignment statements:
i. a = b[i] + c[j].
ii. a[i] = b*c - b*d.
iii. x = f(y+1) + 2.
iv. x = *p + &y.

1
  0) =[]   b    i    t1
  1) =[]   c    j    t2
  2) +     t1   t2   t3
  3) =     t3        a  


  0) =[]   b    i
  1) =[]   c    j
  2) +     (0)  (1)
  3) =     a    (2)  


  Instructions
( 0) 
(1)
(2)
(3)

2
  0) *    b    c    t1
  1) *    b    d    t2
  2) -    t1   t2   t3
  3) []=  a    i    t4
  4) =    t3        t4

  0) *    b    c
  1) *    b    d
  2) -    (0)  (1)
  3) []=  a    i
  4) =    (3)  (2)

  
  0)
  1)
  2)
  3)
  4)

3
  0) +        y    1    t1
  1) param    t1
  2) call     f    1    t2
  3) +        t2    2   t3
  4) =        t3        x

  0) +        y     1
  1) param    (0)
  2) call     f     1
  3) +        (2)   2
  4) =        x     (3)


  0)
  1)
  2)
  3)
  4)


Exercise 6.2.3 : Show how to transform a three-address code sequence into one in which each de ned variable gets a unique variable name.

3adddress code sequence can be transformed into SSA in which each defines variable gets a unique variable name.
Static single-assignment form (SSA) is an intermediate representation that facilitates certain code optimizations. The all assignments in SSA are to variables with distinct names; hence the term static single-assignment.

6.3.7 Exercises for Section 6.3
Exercise 6.3.1 : Determine the types and relative addresses for the identifers in the following sequence of declarations:

float x;
record { float x; float y; } p;
record { int tag; float x; float y; } q;


S ->                  {top = new Evn(); offset = 0;}
     D 
D -> T id;            {top.put(id.lexeme, T.type, offset);
                       offset += T.width}
     D1
D -> ε
T -> int              {T.type = interget; T.width = 4;}
T -> float            {T.type = float; T.width = 8;}
T -> record '{'
                      {Evn.push(top), top = new Evn();
                       Stack.push(offset), offset = 0;}
     D '}'            {T.type = record(top); T.width = offset;
                       top = Evn.top(); offset = Stack.pop();}


line id      type        offset   Evn

  1) x       float       0        1

  2) x       float       0        2
  2) y       float       8        2
  2) p       record()    8        1

  3) tag     int         0        3
  3) x       float       4        3
  3) y       float       12       3
  3) q       record()    24       1     

Exercise 6.3.2 : Extend the handling of eld names in Fig. 6.18 to classes and single-inheritance class hierarchies.
a) Give an implementation of class Env that allows linked symbol tables, so that a subclass can either rede ne a eld name or refer directly to a eld name in a superclass.
b) Give a translation scheme that allocates a contiguous data area for the elds in a class, including inherited elds. Inherited elds must maintain the relative addresses they were assigned in the layout for the superclass.

6.4.5 Exercises for Section 6.4
Exercise 6.4.1 : Add to the translation of Fig. 6.19 rules for the following productions:
a) E ! E1  E2.
b) E ! + E1 (unary plus).

a)

E -> E1 * E2     { E.addr = new Temp();
                         E.code = E1.code || E2.code ||
                            gen(E.addr '=' E1.addr '*' E2.addr); }
                         
   | +E1             { E.addr = E1.addr;
                        E.code = E1.code; }


Exercise 6.4.2 : Repeat Exercise 6.4.1 for the incremental translation of Fig. 6.20.


E -> E1 * E2    { E.addr =  new Temp();
                         gen(E.addr '=' E1.addr '*' E2.addr; }
                         
   | +E1            { E.addr = E1.addr; }



Exercise 6.4.3 : Use the translation of Fig. 6.22 to translate the following
assignments:
a) x = a[i] + b[j].
b) x = a[i][j] + b[i][j].
! c) x = a[b[i][j]][c[k]].

a)


 t_1 = i * awidth
 t_2 = a[t_1]
 t_3 = j * bwidth
 t_4 = b[t_3]
 t_5 = t_2 + t_4
 x = t_5










b)

 t_1 = i * ai_width
 t_2 = j * aj_width
 t_3 = t_1 + t_2
 t_4 = a[t_3]
 t_5 = i * bi_width
 t_6 = j * bj_width
 t_7 = t_5 + t_6
 t_8 = b[t_7]
 t_9 = t_4 + t_8
 x = t_9


Exercise 6.4.4 : Revise the translation of Fig. 6.22 for array references of the
Fortran style, that is, id[E1; E2; : : : ; En] for an n-dimensional array.

a.type = array(i, array(j, w))
a.type.length = i
a.type.elem = array(j, w)


Exercise 6.4.5 : Generalize formula (6.7) to multidimensional arrays, and indicate what values can be stored in the symbol table and used to compute o sets. Consider the following cases:
a) An array A of two dimensions, in row-major form. The rst dimension has indexes running from l1 to h1, and the second dimension has indexes from l2 to h2. The width of a single array element is w.

3. A[i_1]]…[i_k] = base + 
                   (
                       (i_1 - l_1) * n_2 * … * n_k +
                       … + 
                       (i_k-1 - l_k-1) * n_k +
                       (i_k - l_k)
                   ) * w
                 
4. A[i_1]]…[i_k] = base + 
                   (
                       (i_1 - l_1) +
                       (i_2 - l_2) * n_1 + 
                       … +
                       (i_k - l_k) * n_k-1 * n_k-2 * … * n_1
                   ) * w

Code Generation || Issues in the Design of a Code Generator

  • The final phase in our compiler model is the code generator. 
  • It takes as input the intermediate representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces as output a semantically equivalent target program
  • Compilers that need to produce efficient target programs, include an optimization phase prior to code generation. The optimizer maps the IR into IR from which more efficient code can be generated. In general, the code optimization and code-generation phases of a compiler, often referred to as the back end , may make multiple passes over the IR before generating the target program.

Code Generation 

A code generator has three primary tasks: 

  • Instruction selection involves choosing appropriate target-machine instructions to implement the IR statements. 
  • Register allocation and assignment involves deciding what values to keep in which registers. 
  • Instruction ordering involves deciding in what order to schedule the execution of instructions.

Issues in the Design of a Code Generator 

The most important criterion for a code generator is that it produce correct code. Correctness takes on special significance because of the number of special cases that a code generator might face. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.

1. Input to the Code Generator

  • The input to the code generator is the intermediate representation of the source program produced by the front end, along with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR. 
  • The many choices for the IR include three-address representations such as quadruples, triples, indirect triples; virtual machine representations such as byte-codes and stack-machine code; linear representations such as post x notation; and graphical representations such as syntax trees and DAG's. 
  • We assume that the front end has scanned, parsed, and translated the source program into a relatively low-level IR, so that the values of the names appearing in the IR can be represented by quantities that the target machine can directly manipulate, such as integers and floating-point numbers. 
  • We also assume that all syntactic and static semantic errors have been detected, that the necessary type checking has taken place, and that type- conversion operators have been inserted wherever necessary. The code generator can therefore proceed on the assumption that its input is free of these kinds of errors.

 2. The Target Program

  • The instruction-set architecture of the target machine has a significant impact on the diffculty of constructing a good code generator that produces high-quality machine code. The most common target-machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack based. 
  • A RISC machine typically has many registers, three-address instructions, simple addressing modes, and a relatively simple instruction-set architecture.  
  • In contrast, a CISC machine typically has few registers, two-address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects.  
  • In a stack-based machine, operations are done by pushing operands onto a stack and then performing the operations on the operands at the top of the stack. To achieve high performance the top of the stack is typically kept in registers.


The target program is the output of the code generator.

  • Producing an absolute machine-language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. Programs can be compiled and executed quickly. 
  • Producing a relocatable machine-language program (often called an object module ) as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader.
  • Producing an assembly-language program as output makes the process of code generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the assembler to help generate code. The price paid is the assembly step after code generation.

3. Instruction Selection

  • The code generator must map the IR program into a code sequence that can be executed by the target machine. 
  • The complexity of performing this mapping is determined by factors such as the level of the IR the nature of the instruction-set architecture  the desired quality of the generated code.  
  • If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement- by-statement code generation, however, often produces poor code that needs further optimization. 
  • If the IR reflects some of the low-level details of the underlying machine, then the code generator can use this information to generate more ecient code sequences.  
  • The nature of the instruction set of the target machine has a strong effect on the diculty of instruction selection. For example, the uniformity and completeness of the instruction set are important factors. 
  • Instruction speeds and machine idioms are other important factors. 
  • If we do not care about the efficiency of the target program, instruction selection is straightforward. For each type of three-address statement, we can design a code skeleton that defnes the target code to be generated for that construct. 

For example, every three-address statement of the form x = y + z , where x , y , and z are statically allocated, can be translated into the code sequence 

Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used. The quality of the generated code is usually determined by its speed and
size.

4. Register Allocation

  • A key problem in code generation is deciding what values to hold in what registers. 
  • Registers are the fastest computational unit on the target machine, but we usually do not have enough of them to hold all values. Values not held in registers need to reside in memory.

The use of registers is often subdivided into two sub problems:
1. Register allocation , during which we select the set of variables that will reside in registers at each point in the program.
2. Register assignment , during which we pick the specific register that a variable will reside in.
 

Finding an optimal assignment of registers to variables is diffcult, even with single-register machines. Mathematically, the problem is NP-complete. The problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register-usage conventions be observed.

Example: Certain machines require register-pairs (an even and next odd- numbered register) for some operands and results. For example, on some machines, integer multiplication and integer division involve register pairs. 

  • The multiplication instruction is of the form M x, y

where x , the multiplicand, is the odd register of an even/odd register pair and y , the multiplier, can be anywhere. The product occupies the entire even/odd register pair. 

  • The division instruction is of the form D x, y

where the dividend occupies an even/odd register pair whose even register is x ; the divisor is y . After division, the even register holds the remainder and the odd register the quotient.

5. Evaluation Order

  • The order in which computations are performed can a effect the effciency of the target code. 
  • Some computation orders require fewer registers to hold intermediate results than others. 
  • However, picking a best order in the general case is a dicult NP-complete problem.

 

Monk and Inversions

using System; public class Solution { public static void Main () { int T = Convert . ToInt32 ( Console . ReadLine...