Introduction to Language Processing Technology

Natawut Nupairoj, Ph.D.

Department of Computer Engineering, Chulalongkorn University

Outline

Level of Programming Languages. Language Processors. Specification of Programming Languages.

Level of Programming Languages

• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language

High-level language (C):

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

The C compiler translates this into assembly language:

    swap:
        muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        ...

The assembler translates the assembly into machine language:

    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    ...
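The same drop from high level to low level can be observed in any language that compiles to bytecode. The following is a minimal sketch that uses Python's standard `dis` module in place of the C compiler and assembler on the slide; the function mirrors the slide's swap, and the exact instruction names depend on the interpreter version.

```python
import dis


def swap(v, k):
    """High-level version of the slide's swap: exchange v[k] and v[k+1]."""
    temp = v[k]
    v[k] = v[k + 1]
    v[k + 1] = temp


# Low level: names of the bytecode instructions generated for swap.
instructions = [ins.opname for ins in dis.get_instructions(swap)]
print(instructions)

# Lowest level: the raw bytes that encode those instructions, shown in binary.
raw = swap.__code__.co_code
print(" ".join(f"{byte:08b}" for byte in raw[:6]), "...")
```

The three printed forms (source, instruction names, bit patterns) correspond to the three levels listed on the slide.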

High-Level Language Characteristics

Expressions: a = b + (c - d)/2;

Data types: integer, character, boolean; record, array.

Control structures: selective, iterative.

High-Level Language Characteristics (continued)

Declarations: an identifier can denote a constant, variable, procedure, function, or type.

Abstraction: object-oriented concept. Concerned only with what, not how.

Encapsulation: object-oriented concept. Information hiding.

Language Processors

A program that manipulates programs expressed in some programming language.

Examples: editor, translator / compiler, interpreter.

Translator

Translates a “source” program into an “equivalent” “object” program.

Diagram: source program → Translator → object program, with error messages as a second output.

Source languages: C, C++, FORTRAN, Java, VB.

Object languages: Assembly, C, Bytecode, p-code.
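The translator's two outputs, an object program and error messages, can be sketched with a toy assembler. The instruction set and its numeric opcodes below are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical instruction set: mnemonic -> numeric opcode.
OPCODES = {"PUSH": 1, "ADD": 2, "MUL": 3, "PRINT": 4}


def translate(source):
    """Translate toy assembly text into (object_program, error_messages)."""
    object_program, errors = [], []
    for lineno, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        op, args = parts[0].upper(), parts[1:]
        if op not in OPCODES:
            errors.append(f"line {lineno}: unknown instruction {parts[0]!r}")
        elif op == "PUSH":
            if len(args) == 1 and args[0].isdigit():
                object_program.append((OPCODES[op], int(args[0])))
            else:
                errors.append(f"line {lineno}: PUSH needs one integer operand")
        elif args:
            errors.append(f"line {lineno}: {op} takes no operand")
        else:
            object_program.append((OPCODES[op], None))
    return object_program, errors


obj, errs = translate("PUSH 2\nPUSH 3\nADD\nPRINT")
print(obj, errs)
```

A source text with mistakes still produces a (partial) object program, but the caller can inspect the error list before using it.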

Tombstone Diagrams: Ordinary program

Program P written in language L (drawn with P on top and L underneath).

Examples: Sort written in Java; Sort written in x86 code; Web Browser written in x86 code.

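Tombstone Diagrams: Machine

Machine M (a machine that executes M code).

Examples: an x86 machine; a SPARC machine; Web Browser written in x86 code running on an x86 machine.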

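Tombstone Diagrams: Translator

S is translated to T; the translator is written in language L (drawn with S and T on top and L underneath).

Examples: a Java-to-x86 translator written in C; a Java-to-x86 translator written in x86 code; a Java-to-C translator written in C++.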

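Tombstone Diagrams: Compilation

Figure: Sort written in C is fed to a C-to-x86 compiler (itself in x86 code, running on an x86 machine), which produces Sort in x86 code. Sort in x86 code then runs on the x86 machine.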
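The matching rules behind these tombstones can be sketched as a small data model: a translator can run only on a machine that executes the translator's own language, and it accepts only programs expressed in its source language. The class and function names below are illustrative assumptions, not notation from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    name: str   # what the program does, e.g. "Sort"
    lang: str   # language the program is expressed in


@dataclass(frozen=True)
class Translator:
    source: str  # S: language accepted
    target: str  # T: language produced
    lang: str    # L: language the translator itself is expressed in


@dataclass(frozen=True)
class Machine:
    lang: str    # machine code the hardware executes


def run_translator(translator, program, machine):
    """Apply the tombstone rules and return the translated program."""
    if translator.lang != machine.lang:
        raise ValueError("translator cannot run on this machine")
    if program.lang != translator.source:
        raise ValueError("program is not in the translator's source language")
    return Program(program.name, translator.target)


def can_run(program, machine):
    """A program runs directly only on a machine for its own language."""
    return program.lang == machine.lang


x86 = Machine("x86")
c_compiler = Translator("C", "x86", "x86")
sort_x86 = run_translator(c_compiler, Program("Sort", "C"), x86)
print(sort_x86, can_run(sort_x86, x86))
```

The same model also covers cross compilation: a `Translator("C", "SPARC", "x86")` runs on the x86 machine but yields a program that only a SPARC machine can run.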

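Tombstone Diagrams: Cross Compilation

Figure: Sort written in C is fed to a C-to-SPARC compiler (itself in x86 code, running on an x86 machine), which produces Sort in SPARC code. Sort in SPARC code then runs on a SPARC machine.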

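Tombstone Diagrams: Two-stage compilation

Figure: Sort written in Java is translated to Sort in C by a Java-to-C translator (in x86 code, running on x86). Sort in C is then compiled to Sort in x86 code by a C-to-x86 compiler (in x86 code, running on x86).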
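Two-stage compilation is a chain of translators in which each stage's target language is the next stage's source language. The following minimal sketch searches for such a chain; the function name and the (source, target) pair representation are assumptions made for illustration, and it ignores the language each translator is written in.

```python
from collections import deque


def compilation_plan(translators, source, target):
    """Breadth-first search for a shortest chain of translators from source to target.

    translators is a list of (source_language, target_language) pairs.
    Returns the chain as a list of pairs, or None if no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        lang, chain = queue.popleft()
        if lang == target:
            return chain
        for src, tgt in translators:
            if src == lang and tgt not in seen:
                seen.add(tgt)
                queue.append((tgt, chain + [(src, tgt)]))
    return None


available = [("Java", "C"), ("C", "x86")]
plan = compilation_plan(available, "Java", "x86")
print(plan)
```

With only these two translators available there is no way to reach SPARC code, so a request for that target returns `None`.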

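Tombstone Diagrams: Compiling a compiler

Figure: a Pascal-to-x86 compiler written in C is fed to a C-to-x86 compiler (in x86 code, running on x86). The result is the same Pascal-to-x86 compiler, now in x86 code.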
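A compiler is itself a program, so it can be the input of another compiler. The sketch below shows that compiling a compiler changes only the language it is expressed in, not what it translates. The `Translator` tuple and `compile_translator` function are hypothetical names introduced for this illustration.

```python
from typing import NamedTuple


class Translator(NamedTuple):
    source: str  # language accepted
    target: str  # language produced
    lang: str    # language the translator itself is expressed in


def compile_translator(subject, compiler, machine_lang):
    """Compile `subject` (itself a translator) with `compiler` on the given machine."""
    if compiler.lang != machine_lang:
        raise ValueError("compiler cannot run on this machine")
    if subject.lang != compiler.source:
        raise ValueError("compiler does not accept the subject's implementation language")
    # Only the implementation language changes; source and target stay the same.
    return subject._replace(lang=compiler.target)


pascal_in_c = Translator("Pascal", "x86", "C")
c_compiler = Translator("C", "x86", "x86")
pascal_in_x86 = compile_translator(pascal_in_c, c_compiler, "x86")
print(pascal_in_x86)
```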

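Tombstone Diagrams: Interpreter

An interpreter for source language S, written in language L (drawn with S on top and L underneath).

Examples: a Basic interpreter written in x86 code; an SQL interpreter written in SPARC code; the program Sort, written in Basic, placed on top of a Basic interpreter in x86 code that runs on an x86 machine.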
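Unlike a translator, an interpreter executes the source program directly and produces no object program. The following minimal sketch assumes a hypothetical line-oriented toy language with `LET`, `MUL` and `PRINT` statements; it is not real Basic.

```python
def interpret(source):
    """Execute a toy program statement by statement and return the printed values."""
    variables, output = {}, []

    def value(token):
        # An operand is an integer literal or a previously assigned variable.
        return int(token) if token.isdigit() else variables[token]

    for lineno, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        op, args = words[0], words[1:]
        if op == "LET" and len(args) == 2:      # LET name operand
            variables[args[0]] = value(args[1])
        elif op == "MUL" and len(args) == 2:    # MUL name operand
            variables[args[0]] = variables[args[0]] * value(args[1])
        elif op == "PRINT" and len(args) == 1:  # PRINT operand
            output.append(value(args[0]))
        else:
            raise SyntaxError(f"line {lineno}: cannot interpret {line!r}")
    return output


program = "LET m 7\nLET n 2\nMUL n m\nMUL n m\nPRINT n"
print(interpret(program))
```

Each statement is analysed and carried out immediately, so an error on a later line is found only when execution reaches it.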

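Tombstone Diagrams: Abstract machine

Abstract machine = hardware emulator: an interpreter for a low-level language.

Figure: an interpreter for 370 code, written in C, is compiled by a C-to-x86 compiler (in x86 code, running on x86), giving a 370 interpreter in x86 code. That interpreter running on an x86 machine is equivalent to a 370 machine, so the program labelled HW1, in 370 code, can run on either.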

Tombstone Diagrams: Java

Portable environment: write once, run anywhere. Interpretive compiler.

Figure: a Java-to-JVM compiler written in M code and a JVM interpreter written in M code, where JVM code = bytecode.

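Tombstone Diagrams: Java (continued)

Figure: Sort written in Java is compiled to Sort in JVM code by a Java-to-JVM compiler (in x86 code, running on x86). The same Sort in JVM code then runs on a JVM interpreter in x86 code on an x86 machine, or on a JVM interpreter in SPARC code on a SPARC machine.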
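The portability argument reduces to one rule: a machine can execute a program either natively or through an interpreter that the machine itself can run. A minimal sketch of that rule follows; the function name and the pair representation are assumptions made for illustration.

```python
def can_execute(program_lang, machine, interpreters):
    """True if `machine` runs `program_lang` natively or through an interpreter.

    interpreters is a set of (interpreted_language, implementation_language) pairs.
    """
    if program_lang == machine:
        return True
    return (program_lang, machine) in interpreters


# JVM interpreters exist in x86 code and in SPARC code.
interpreters = {("JVM", "x86"), ("JVM", "SPARC")}

for machine in ("x86", "SPARC"):
    print(machine, can_execute("JVM", machine, interpreters))
```

Sort in JVM code is executable on both machines, whereas Sort in x86 code is executable only on the x86 machine.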

Tombstone Diagrams: Bootstrapping

Bootstrapping: a compiler for language L that is itself written in L.

Full bootstrap: start from nothing.

Half bootstrap: start from another machine.

Figure: a C-to-NNP compiler in NNP code.

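Tombstone Diagrams: Full Bootstrap

Figure (spread over three slides), where Csub denotes a subset of C and NNP is the new machine:

First, a Csub-to-NNP compiler is written directly in NNP code.

Next, a Csub-to-NNP compiler written in Csub is compiled with it on NNP, giving a Csub-to-NNP compiler in NNP code.

Then a C-to-NNP compiler written in Csub is compiled with the Csub-to-NNP compiler on NNP, giving a C-to-NNP compiler in NNP code.

Finally, a C-to-NNP compiler written in C is compiled with that C-to-NNP compiler on NNP, giving a C-to-NNP compiler in NNP code whose source is in C itself.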

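Tombstone Diagrams: Half Bootstrap

Figure: a C-to-NNP compiler is written in C. It is first compiled with the existing C-to-x86 compiler (in x86 code, running on x86), giving a C-to-NNP compiler in x86 code. That compiler, running on x86, then compiles the same C source again, giving a C-to-NNP compiler in NNP code.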
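The two steps of a half bootstrap can be traced with a small tombstone model. The `Compiler` tuple and `compile_with` function below are hypothetical names used only for this sketch; NNP is the new machine from the slides.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str  # language accepted
    target: str  # language produced
    lang: str    # language the compiler itself is expressed in


def compile_with(subject, compiler, machine):
    """Run `compiler` on `machine` with `subject` as input; return the compiled subject."""
    if compiler.lang != machine:
        raise ValueError("compiler cannot run on this machine")
    if subject.lang != compiler.source:
        raise ValueError("subject is not written in the compiler's source language")
    return subject._replace(lang=compiler.target)


host_compiler = Compiler("C", "x86", "x86")  # already exists on the old machine
new_compiler_src = Compiler("C", "NNP", "C")  # new compiler, written in C

# Step 1: compile the new compiler on x86; the result is a cross compiler.
cross = compile_with(new_compiler_src, host_compiler, "x86")

# Step 2: let the cross compiler (still on x86) compile the same source again.
native = compile_with(new_compiler_src, cross, "x86")

print(cross)
print(native)
```

After step 2 the old machine is no longer needed: `native` runs on NNP and can recompile its own C source there.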

Specification of Programming Languages

Syntax: defines the symbols and structure of the language (its grammar).

Contextual constraints: constraints beyond the grammar; the rules of the language, such as scope rules and type rules.

Semantics: the meaning of a program, i.e. its behavior when run; how to translate a sentence S of the language L into machine code for M.
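Contextual constraints reject programs that are grammatical but still not well formed. The following minimal sketch checks one scope rule and one type rule for an assignment; the environment format and the rules themselves are assumptions made for illustration, since the slides do not spell out any particular language's rules.

```python
def check_assignment(env, target, expr_type):
    """Check `target := <expression of type expr_type>` against the declarations in env.

    env maps an identifier to (kind, type), e.g. {"n": ("var", "Integer")}.
    Returns None if the assignment is legal, otherwise an error message.
    """
    if target not in env:
        return f"scope error: {target} is not declared"
    kind, declared_type = env[target]
    if kind != "var":
        return f"type error: {target} is a {kind}, not a variable"
    if declared_type != expr_type:
        return f"type error: cannot assign {expr_type} to {declared_type} variable {target}"
    return None


env = {"m": ("const", "Integer"), "n": ("var", "Integer")}
print(check_assignment(env, "n", "Integer"))
print(check_assignment(env, "m", "Integer"))
print(check_assignment(env, "k", "Integer"))
```

All three assignments have the same syntactic shape; only the declarations in scope decide which of them is legal.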

Syntax

Context-free grammar: terminals, non-terminals / variables, a start symbol, and production rules.

Usually expressed in BNF notation.

BNF Notation

Backus-Naur Form. Given the production rules:

N → α
N → β

they can be written as:

N ::= α | β
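BNF collects all alternatives for the same non-terminal into one rule, separated by `|`. The following minimal sketch does that grouping for a list of individual productions; the function name and the (left-hand side, right-hand side) representation are assumptions made for illustration.

```python
def to_bnf(productions):
    """Group (lhs, rhs_symbols) productions by left-hand side and render BNF lines."""
    grouped = {}
    for lhs, rhs in productions:
        grouped.setdefault(lhs, []).append(" ".join(rhs))
    return [f"{lhs} ::= {' | '.join(alternatives)}" for lhs, alternatives in grouped.items()]


productions = [
    ("Digit", ["0"]),
    ("Digit", ["1"]),
    ("Integer-Literal", ["Digit"]),
    ("Integer-Literal", ["Integer-Literal", "Digit"]),
]

for line in to_bnf(productions):
    print(line)
```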

Example: Mini-Triangle Program

    ! This is a comment. It continues to the end-of-line.
    let
        const m ~ 7;
        var n: Integer
    in
        begin
            n := 2 * m * m;
            putint(n)
        end

Terminals:

    begin const do else end if in let then var while
    ; : := ~ ( ) + - * / < > = \
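A scanner turns program text into the terminals listed above. The following is a minimal sketch built on Python's `re` module; the token kind names are illustrative, and upper-case letters are accepted in identifiers (an assumption, needed for `Integer` in the example program).

```python
import re

KEYWORDS = {"begin", "const", "do", "else", "end", "if", "in", "let", "then", "var", "while"}

TOKEN_RE = re.compile(r"""
    (?P<comment>![^\n]*)                 # '!' up to the end of the line
  | (?P<space>\s+)                       # blanks and line breaks
  | (?P<int>[0-9]+)                      # integer literal
  | (?P<word>[A-Za-z][A-Za-z0-9]*)       # keyword or identifier
  | (?P<symbol>:=|[;:~()+\-*/<>=\\])     # punctuation and operators
""", re.VERBOSE)


def tokenize(text):
    """Return the (kind, spelling) pairs of a Mini-Triangle text, skipping comments and blanks."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind, spelling = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue
        if kind == "word":
            kind = "keyword" if spelling in KEYWORDS else "identifier"
        tokens.append((kind, spelling))
    return tokens


print(tokenize("let const m ~ 7 in putint(m) ! done"))
```

Listing `:=` before the single-character alternatives ensures that it is recognised as one token rather than as `:` followed by `=`.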

Mini-Triangle Syntax

    Program            ::= Command

    Command            ::= single-Command
                         | Command ; single-Command

    single-Command     ::= V-name := Expression
                         | Identifier ( Expression )
                         | if Expression then single-Command else single-Command
                         | while Expression do single-Command
                         | let Declaration in single-Command
                         | begin Command end

    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression

    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )

    V-name             ::= Identifier

    Declaration        ::= single-Declaration
                         | Declaration ; single-Declaration

    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter

    Type-denoter       ::= Identifier

    Operator           ::= + | - | * | / | < | > | = | \

    Identifier         ::= Letter | Identifier Letter | Identifier Digit

    Integer-Literal    ::= Digit | Integer-Literal Digit

    Comment            ::= ! Graphic* eol

    Letter             ::= a | b | … | z

    Digit              ::= 0 | 1 | 2 | … | 9
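The Expression rules can be turned into a small recursive-descent parser. The following is a minimal sketch, assuming the input is already a list of token spellings; the left-recursive rule is replaced by a loop, which keeps the left-to-right grouping the grammar describes. The tuple shapes used for the tree are illustrative, and keywords are not distinguished from identifiers here.

```python
OPERATORS = {"+", "-", "*", "/", "<", ">", "=", "\\"}


def parse_expression(tokens, pos=0):
    """Parse an Expression starting at tokens[pos]; return (tree, next_position)."""
    tree, pos = parse_primary(tokens, pos)
    # Expression ::= Expression Operator primary-Expression, handled iteratively:
    # every further operator wraps the tree built so far as its left operand.
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        operator = tokens[pos]
        right, pos = parse_primary(tokens, pos + 1)
        tree = (operator, tree, right)
    return tree, pos


def parse_primary(tokens, pos):
    """Parse a primary-Expression starting at tokens[pos]; return (tree, next_position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    token = tokens[pos]
    if token == "(":
        tree, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if token in OPERATORS:  # Operator primary-Expression
        operand, pos = parse_primary(tokens, pos + 1)
        return (token, operand), pos
    if token.isdigit():     # Integer-Literal
        return ("int", token), pos + 1
    if token.isalnum() and token[0].isalpha():  # V-name
        return ("var", token), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")


tree, end = parse_expression("d + 10 * n".split())
print(tree)
```

Because the grammar has a single Operator class with no precedence levels, `d + 10 * n` groups as `(d + 10) * n`; parentheses are the only way to group differently.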

Syntax Tree

An ordered tree in which internal nodes are non-terminals and leaf nodes are terminals.

An N-tree of G is a syntax tree with N as the root.

Mini-Triangle Syntax Tree

    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal | V-name
                         | Operator primary-Expression | ( Expression )
    V-name             ::= Identifier
    …

Syntax tree of the expression d + 10 * n (node names abbreviated as on the slide; each level of indentation is one level of the tree):

    Expression
        Expression
            Expression
                primary-Expr.
                    V-name
                        Ident.
                            d
            Op.
                +
            primary-Expr.
                Int. Lit.
                    10
        Op.
            *
        primary-Expr.
            V-name
                Ident.
                    n
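Reading the leaves of a syntax tree from left to right gives back the phrase it derives. The following minimal sketch builds the Expression-tree for `d + 10 * n` by hand and recovers its terminals; the `Node` class and helper names are assumptions made for illustration, and the Letter and Digit levels are omitted, as in the slide's abbreviated tree.

```python
class Node:
    """A syntax-tree node: non-terminals have children, terminals have none."""

    def __init__(self, label, *children):
        self.label = label
        self.children = children


def frontier(node):
    """Leaves read left to right: the terminal string that the tree derives."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in frontier(child)]


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def variable(name):
    # primary-Expression -> V-name -> Identifier -> spelling
    return Node("primary-Expression", Node("V-name", Node("Identifier", Node(name))))


tree = Node(
    "Expression",
    Node(
        "Expression",
        Node("Expression", variable("d")),
        Node("Operator", Node("+")),
        Node("primary-Expression", Node("Integer-Literal", Node("10"))),
    ),
    Node("Operator", Node("*")),
    variable("n"),
)

print(" ".join(frontier(tree)))
print("\n".join(render(tree)))
```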
