PART I SYSTEM UTILITIES Lecture 6 Compilers Ştefan Stăncescu


COMPILERS

A "high-level language" (HLL) has complex grammar rules and is closer to human language

The HLL is the link between human and computer: it stands between human language and binary language

COMPILER: automatic translation machine (HLL to binary language)

Source code => written in the HLL

Object code => written in binary language (machine code)

COMPILATION follows the HLL grammar rules:

• lexical rules: type and structure of the language elements

• syntactic rules: composition rules for the language elements

• "semantic" rules (translation programs): the object-code counterpart of each syntactic rule, i.e. the "semantic programs" for the machine

Compiling = reviewing + translating the HLL source text

• lexical rules: scanner

• syntactic rules: parser

• "semantic" rules: object code generator

(for a VM: intermediate code, "bytecode")

The SCANNER identifies tokens

• language elements: one or more adjacent characters, delimited by separator characters (SP, LF, FF, etc.)

• words: START, STOP, LABEL01

• operators: + * / -

• special signs: ( ) { } // . ,
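A minimal sketch of how a scanner might cut text into tokens at their boundaries. The token classes, the names `TOKEN_RE` and `tokenize`, and the choice to treat "/" as an operator (so "//" would come out as two operator tokens) are assumptions made for illustration, not part of the lecture.

```python
import re

# Assumed token classes, tried in order at each position.
TOKEN_RE = re.compile(r"""
    (?P<word>[A-Za-z][A-Za-z0-9]*)   # START, STOP, LABEL01
  | (?P<number>[0-9]+)               # unsigned integers
  | (?P<operator>[+*/-])             # + * / -
  | (?P<special>[(){}.,])            # special signs
  | (?P<skip>\s+)                    # separators: SP, LF, FF, ...
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (class, lexeme) pairs, dropping separators."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("START LABEL01\n(A+B)")` yields two words, then "(", "A", "+", "B", ")" with their classes.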

SCANNER step 1

scan the HLL source text

determine the token list by token boundaries

identify the HLL tokens

identify the programmer-invented tokens

create a look-up table with numerical symbols for the tokens

SCANNER step 2

create an intermediate source file in which the tokens are replaced by the numerical symbols from the look-up table created in step 1
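The two scanner steps can be sketched as follows. The reserved-token list, the sample line, and the function names `build_lookup` and `encode` are hypothetical; a real scanner would also record where each programmer-invented token is stored.

```python
def build_lookup(tokens, reserved):
    """Step 1: give every distinct token a numerical symbol."""
    # HLL tokens get the first codes, in the order they are listed.
    table = {word: code for code, word in enumerate(reserved, start=1)}
    for tok in tokens:
        if tok not in table:              # programmer-invented token
            table[tok] = len(table) + 1
    return table

def encode(tokens, table):
    """Step 2: replace each token by its numerical symbol."""
    return [table[tok] for tok in tokens]

reserved = ["START", "STOP", "+", "="]    # hypothetical HLL tokens
source = "START S = A + B STOP".split()
table = build_lookup(source, reserved)
coded = encode(source, table)             # [1, 5, 4, 6, 3, 7, 2]
```

Here S, A and B are programmer-invented, so they receive the next free codes 5, 6 and 7.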

BNF: Backus-Naur Form

REPRESENTATION of syntactic rules

A rule in BNF format describes a valid construction in the HLL

It is a formatted template of a rule applied to one line of the source file

(or of a rule applied to a list of lines)

A syntactic rule describes a valid construction in the HLL

A template carries the name of the newly built and checked element,

which can in turn be part of another construction

(including one with the same pattern)

The name of the new construction is a "nonterminal" symbol

BNF rule form:

<nonterminal symbol> ::= building template

Parsing: discovering in the HLL source file successive valid BNF rules (templates) until no undiscovered rules remain

(no more "nonterminal" symbols)

Parsing ends only on tokens ("terminal" symbols)

Chaining BNF rules (templates) => syntax tree

The purpose of parsing => discovering the syntax tree of the source file
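One possible way to hold BNF rules in a program is a table from each nonterminal to its alternative templates. The sketch below uses a small expression grammar chosen for illustration; the names `GRAMMAR` and `symbols` are assumptions. Every symbol that never appears on the left of "::=" is a terminal, i.e. a token.

```python
# Each nonterminal maps to its alternatives; an alternative is a list of symbols.
GRAMMAR = {
    "<exp>":    [["<term>"], ["<exp>", "+", "<term>"], ["<exp>", "-", "<term>"]],
    "<term>":   [["<factor>"], ["<term>", "*", "<factor>"]],
    "<factor>": [["id"], ["int"], ["(", "<exp>", ")"]],
}

def symbols(grammar):
    """Split the symbols of a grammar into nonterminals and terminals."""
    nonterminals = set(grammar)
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return nonterminals, used - nonterminals

nonterminals, terminals = symbols(GRAMMAR)
```

Note that `<exp>` and `<term>` appear inside their own templates, which is the "same pattern" case mentioned for nonterminal symbols.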

Line in the source file: S = A + B

(A, B, S: integer variables, i.e. tokens)

The code generator must explain to the machine the templates found

The scanner identifies the tokens

"S" "=" "A" "+" "B"

tokens "A", "B", "S": variables

token "+": operator; token "=": assignment

The parser also verifies type coherence, i.e. whether the variables have the same type

(if A, B, S are all integers: OK)

if one differs, the templates for "+" and "=" need a conversion to a coherent type

Ex: if S is real and A, B are integers

"+" rule OK, result is integer

"=" (assignment rule) adds

a format conversion integer => real (float)

1st parser operation: type consistency

(conversion, if needed)

2nd parser operation: A+B

(result in temporary memory)

3rd parser operation: assigning the result to S

(S=A+B)

Applicable BNF rules:

conversion, addition, assignment, in that order
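A rough sketch of the coherence check for a line such as S = A + B. The operation names, the `temp` result name and the rule "widen to real when either operand is real" are assumptions for illustration; the function only plans the operations, it does not execute them.

```python
def plan_assignment(target, left, right, types):
    """List the operations needed for target = left + right."""
    ops = []
    # Assumed rule: the sum is real if either operand is real.
    result_type = "real" if "real" in (types[left], types[right]) else "integer"
    for operand in (left, right):
        if types[operand] != result_type:
            ops.append(("convert", operand, result_type))
    ops.append(("add", left, right, "temp"))      # result in temporary memory
    if types[target] != result_type:
        ops.append(("convert", "temp", types[target]))
    ops.append(("assign", "temp", target))
    return ops

types = {"S": "real", "A": "integer", "B": "integer"}
plan = plan_assignment("S", "A", "B", types)
```

With S real and A, B integer, the plan is an integer addition, a conversion of the temporary to real, then the assignment.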

EXAMPLE II (bottom-up parsing)

S=A+B*C-D

scan the line and discover the operations to be performed first

each result becomes a "nonterminal" symbol <N>

=> operator precedence: ( + <. * ) | ( * .> - )

Assuming the rules of algebraic expressions

Syntactic rule of multiplication:

<product> ::= <agent> * <agent>

Syntactic rule of addition:

<sum> ::= ( <agent> + <agent> ) | ( <agent> - <agent> )

EXAMPLE II (bottom-up parsing)

<N1> ::= B * C

<N2> ::= A + <N1>

<N3> ::= <N2> - D

Syntax tree of the expression A+B*C-D

EXAMPLE II (bottom-up parsing)

S=A+(B*C-D)

S=ASSIGN(N3)

N3=SUM(A,N2)

N2=SUB(N1,D)

N1=PROD(B,C)

Syntax tree of the expression A+(B*C-D)
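The bottom-up reduction of A+B*C-D into N1, N2, N3 can be sketched with two stacks. This is a simplified operator-precedence scheme written for illustration, not necessarily the algorithm used in the lecture; the numeric `PRECEDENCE` table and the function name are assumptions, and parentheses are not handled.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2}   # assumed: * binds tighter than + and -

def reduce_expression(tokens):
    """Reduce an infix token list to steps (name, left, operator, right)."""
    operands, operators, steps = [], [], []

    def reduce_top():
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        name = f"N{len(steps) + 1}"      # new nonterminal for the result
        steps.append((name, left, op, right))
        operands.append(name)

    for tok in tokens:
        if tok in PRECEDENCE:
            # Reduce while the stacked operator takes precedence (.>).
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce_top()
    return steps

steps = reduce_expression(["A", "+", "B", "*", "C", "-", "D"])
```

The multiplication is reduced first because "+" yields to "*", and the left-to-right handling of "+" and "-" gives N2 before N3.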

SAMPLE PROGRAM IN A SIMPLIFIED PASCAL LANGUAGE

1 PROGRAM MEDIA_ANALYSIS
2 VAR
3 NRCRT, I: INTEGER;
  SARITM, SARMON, DIF: REAL
4 BEGIN
5 SARITM := 0;
6 SARMON := 0;
7 FOR I := 1 TO 100 DO
8 BEGIN
9 READ (NRCRT);
10 SARITM := SARITM + NRCRT;
11 SARMON := SARMON + 1 DIV NRCRT;
12 END;
13 DIF := SARITM DIV 100 - 100 DIV SARMON;
14 WRITE (DIF);
15 END.

 

GRAMMAR (BNF) OF THE SIMPLIFIED PASCAL LANGUAGE

1. <prog> ::= PROGRAM <prog_name> VAR <dec_list> BEGIN <stmt_list> END.
2. <prog_name> ::= id
3. <dec_list> ::= <dec> | <dec_list> ; <dec>
4. <dec> ::= <id_list> : <type>
5. <type> ::= INTEGER | REAL
6. <id_list> ::= id | <id_list> , id
7. <stmt_list> ::= <stmt> | <stmt_list> ; <stmt>
8. <stmt> ::= <assign> | <read> | <write> | <for>
9. <assign> ::= id := <exp>
10. <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11. <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12. <factor> ::= id | int | ( <exp> )
13. <read> ::= READ ( <id_list> )
14. <write> ::= WRITE ( <id_list> )
15. <for> ::= FOR <index_exp> DO <body>
16. <index_exp> ::= id := <exp> TO <exp>
17. <body> ::= <stmt> | BEGIN <stmt_list> END
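As an illustration of how rules 10 to 12 turn a token list into a syntax tree, here is a small recursive-descent sketch. It is a top-down technique, swapped in for brevity, and it replaces the left recursion of `<exp>` and `<term>` with loops. The class and function names are hypothetical, and identifiers and integers are returned as plain strings.

```python
import re

def tokenize(text):
    # Identifiers and keywords, integers, single-character symbols.
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[()+*-]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp(self):                        # rule 10
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):                       # rule 11
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):                     # rule 12
        tok = self.take()
        if tok == "(":
            node = self.exp()
            if self.take() != ")":
                raise SyntaxError("expected )")
            return node
        if tok is None or tok in ("+", "-", "*", "DIV", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                        # id or int

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Calling `parse("SARITM DIV 100 - 100 DIV SARMON")` returns a tree whose root is "-" and whose two children are the DIV sub-expressions.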

 

Token    Code
PROGRAM  1
VAR      2
BEGIN    3
END.     4
END      5
INTEGER  6
REAL     7
READ     8
WRITE    9
FOR      10
TO       11
DO       12
;        13
:        14
,        15
:=       16
+        17
-        18
DIV      19
(        20
)        21
ID       22
INT      23

File produced by the scanner

Line  Token  Specifier
1     1
      22     ^STATUS
...
7     10
      22     ^I
      16
      23     #1
      11
      23     #100
      12
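A sketch of how the coded rows for one source line could be produced from the token table. The regular expression and the function name `scan_line` are assumptions; following the notation of the scanner file, "^" marks a pointer to an identifier and "#" an integer value. Characters outside the table are silently skipped in this simplified version.

```python
import re

TOKEN_CODES = {
    "PROGRAM": 1, "VAR": 2, "BEGIN": 3, "END.": 4, "END": 5, "INTEGER": 6,
    "REAL": 7, "READ": 8, "WRITE": 9, "FOR": 10, "TO": 11, "DO": 12,
    ";": 13, ":": 14, ",": 15, ":=": 16, "+": 17, "-": 18, "DIV": 19,
    "(": 20, ")": 21,
}
ID, INT = 22, 23

# "END." and ":=" are listed before their shorter prefixes.
LEXEME_RE = re.compile(r"END\.|[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|[;:,+\-()]")

def scan_line(text):
    """Return (token code, specifier) rows for one source line."""
    rows = []
    for lexeme in LEXEME_RE.findall(text):
        if lexeme in TOKEN_CODES:
            rows.append((TOKEN_CODES[lexeme], None))
        elif lexeme.isdigit():
            rows.append((INT, "#" + lexeme))
        else:
            rows.append((ID, "^" + lexeme))
    return rows

rows = scan_line("FOR I := 1 TO 100 DO")
```

The result reproduces the rows shown for line 7: 10, 22 ^I, 16, 23 #1, 11, 23 #100, 12.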

SAMPLE PROGRAM

9 READ (NRCRT);

BNF:

13. <read> ::= READ ( <id_list> )

6. <id_list> ::= id | <id_list> , id

SAMPLE PROGRAM

13 DIF := SARITM DIV 100 - 100 DIV SARMON;

BNF:

9. <assign> ::= id := <exp>

10. <exp> ::= <term> | <exp> - <term>

11. <term> ::= <factor> | <term> DIV <factor>

12. <factor> ::= id | int | ( <exp> )


PROGRAM .=. VAR

BEGIN <. FOR

; .> END.

Empty pairs (no relation defined): grammatical errors

Precedence relations: at most one per pair of tokens

(consistent grammar)
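A precedence table can be sketched as a mapping from token pairs to a single relation, so that a missing pair signals a grammatical error. Only the three pairs listed above are included; the names `RELATIONS` and `relation` are hypothetical, and a full table would cover every pair of tokens in the grammar.

```python
# One relation per (left token, right token) pair: "=", "<" or ">".
RELATIONS = {
    ("PROGRAM", "VAR"): "=",
    ("BEGIN", "FOR"): "<",
    (";", "END."): ">",
}

def relation(left, right):
    """Look up the precedence relation; an empty pair is a syntax error."""
    try:
        return RELATIONS[(left, right)]
    except KeyError:
        raise SyntaxError(
            f"no precedence relation between {left!r} and {right!r}"
        ) from None
```

Because a dictionary holds at most one value per key, the "only one relation per pair" condition is enforced by construction.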

Generating semantic programs

DIF := SARITM DIV 100 - 100 DIV SARMON

id1 := id2 DIV int - int DIV id3

id1 := exp1 - exp2

id1 := exp3

DIV SARITM #100 i1
DIV #100 SARMON i2
- i1 i2 i3
:= i3 , DIF
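A minimal sketch of how quadruples like these could be emitted from a syntax tree. It assumes the tree is given as nested tuples `(operator, left, right)` with identifiers and integers as strings; the function names are hypothetical.

```python
def generate(tree, quads, counter):
    """Emit quadruples for an expression tree; return the operand holding its value."""
    if isinstance(tree, str):
        # Integer constants are written with a leading "#".
        return "#" + tree if tree.isdigit() else tree
    op, left, right = tree
    a = generate(left, quads, counter)
    b = generate(right, quads, counter)
    counter[0] += 1
    result = f"i{counter[0]}"             # new intermediate result
    quads.append((op, a, b, result))
    return result

def generate_assignment(target, tree):
    quads = []
    value = generate(tree, quads, [0])
    quads.append((":=", value, "", target))
    return quads

tree = ("-", ("DIV", "SARITM", "100"), ("DIV", "100", "SARMON"))
quads = generate_assignment("DIF", tree)
```

The sub-expressions are generated first, so the two DIV quadruples precede the subtraction and the final assignment.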

 

(1) := #0 , SARITM {SARITM := 0}
(2) := #0 , SARMON {SARMON := 0}
(3) := #1 , I {FOR I := 1 TO 100}
(4) JGT I #100 (15)
(5) CALL XREAD {READ (NRCRT)}
(6) PARAM NRCRT
(7) + SARITM NRCRT i1 {SARITM := SARITM + NRCRT}
(8) := i1 , SARITM
(9) DIV #1 NRCRT i2 {SARMON := SARMON + 1 DIV NRCRT}
(10) + SARMON i2 i3
(11) := i3 , SARMON
(12) + I #1 i4 {end of FOR}
(13) := i4 , I
(14) J (4)
(15) DIV SARITM #100 i6 {DIF := SARITM DIV 100 - 100 DIV SARMON}
(16) DIV #100 SARMON i7
(17) - i6 i7 i8
(18) := i8 , DIF
(19) CALL XWRITE
(20) PARAM DIF
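To see what this intermediate code does, it can be run by a small interpreter. The sketch below is illustrative only: the tuple layout `(operator, operand1, operand2, result)`, the function `run`, and the routine names `XREAD` and `XWRITE` are assumptions, and DIV is treated as real division because SARITM, SARMON and DIF are declared REAL (integer division would make 1 DIV NRCRT zero for any NRCRT above 1).

```python
QUADS = [
    (":=", "#0", "", "SARITM"), (":=", "#0", "", "SARMON"), (":=", "#1", "", "I"),
    ("JGT", "I", "#100", 15),
    ("CALL", "XREAD", "", ""), ("PARAM", "NRCRT", "", ""),
    ("+", "SARITM", "NRCRT", "i1"), (":=", "i1", "", "SARITM"),
    ("DIV", "#1", "NRCRT", "i2"), ("+", "SARMON", "i2", "i3"), (":=", "i3", "", "SARMON"),
    ("+", "I", "#1", "i4"), (":=", "i4", "", "I"),
    ("J", "", "", 4),
    ("DIV", "SARITM", "#100", "i6"), ("DIV", "#100", "SARMON", "i7"),
    ("-", "i6", "i7", "i8"), (":=", "i8", "", "DIF"),
    ("CALL", "XWRITE", "", ""), ("PARAM", "DIF", "", ""),
]

def run(quads, inputs):
    """Execute quadruples numbered from 1; return (memory, output list)."""
    memory, output = {}, []
    inputs = iter(inputs)

    def value(operand):
        return int(operand[1:]) if operand.startswith("#") else memory[operand]

    pc = 1
    while pc <= len(quads):
        op, a, b, result = quads[pc - 1]
        if op == ":=":
            memory[result] = value(a)
        elif op in ("+", "-", "DIV"):
            x, y = value(a), value(b)
            memory[result] = x + y if op == "+" else x - y if op == "-" else x / y
        elif op == "JGT":
            if value(a) > value(b):
                pc = result               # jump to the quadruple number
                continue
        elif op == "J":
            pc = result
            continue
        elif op == "CALL":
            param = quads[pc][1]          # the PARAM quadruple follows the CALL
            if a == "XREAD":
                memory[param] = next(inputs)
            else:
                output.append(memory[param])
            pc += 1                       # skip the PARAM quadruple
        pc += 1
    return memory, output

memory, output = run(QUADS, [2] * 100)
```

With one hundred inputs all equal to 2, the arithmetic mean and the harmonic mean coincide, so the program writes a difference of zero.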

 
