
MC0073 Fall Drive Assignment 2012_new



August 2012

Master of Computer Application (MCA) – Semester 3

MC0073 – System Programming– 4 Credits

(Book ID: B0811)

Assignment Set – 1 (40 Marks)

Answer all Questions Each question carries TEN marks

1. What is CISC & RISC? Explain their addressing modes.

Reduced instruction set computing

(From Wikipedia, the free encyclopedia)

[Figure: A Sun UltraSPARC, a RISC microprocessor]

Reduced instruction set computing, or RISC (pronounced /ˈrɪsk/), is a CPU design strategy based on the insight that simplified (as opposed to complex) instructions can provide higher performance if this simplicity enables much faster execution of each instruction. A computer based on this strategy is a reduced instruction set computer, also called RISC. The opposing architecture is known as complex instruction set computing, i.e. CISC.

Various suggestions have been made regarding a precise definition of RISC, but the general concept is that of a system that uses a small, highly optimized set of instructions, rather than the more specialized set of instructions often found in other types of architecture. Another common trait is that RISC systems use the load/store architecture,[1] where memory is normally accessed only through specific instructions, rather than accessed as part of other instructions like add.

Although a number of systems from the 1960s and 70s have been identified as forerunners of RISC, the modern version of the design dates to the 1980s. In particular, two projects at Stanford University and the University of California, Berkeley are most associated with the popularization of the concept. Stanford's design went on to be commercialized as the successful MIPS architecture, while Berkeley's RISC gave its name to the entire concept and was commercialized as the SPARC. Another success from this era was IBM's effort that eventually led to the Power Architecture. As these projects matured, a wide variety of similar designs flourished in the late 1980s and especially the early 1990s, representing a major force in the Unix workstation market as well as in embedded processors in laser printers, routers and similar products.

Well-known RISC families include DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, Blackfin, Intel i860 and i960, MIPS, Motorola 88000, PA-RISC, Power (including PowerPC), SuperH, and SPARC. In the 21st century, the use of ARM architecture processors in smartphones and tablet computers such as the iPad and Android tablets provided a wide user base for RISC-based systems. RISC processors are also used in supercomputers such as the K computer, the fastest on the TOP500 list in 2011 and second on the 2012 list.[2][3]

Complex instruction set computing

(From Wikipedia, the free encyclopedia)


A complex instruction set computer (CISC, pronounced /ˈsɪsk/) is a computer in which single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions. The term was retroactively coined in contrast to reduced instruction set computer (RISC).[1]

Examples of CISC instruction set architectures are System/360 through z/Architecture, PDP-11, VAX, Motorola 68k, and x86.

The RISC idea

The circuitry that performs the actions defined by the microcode in many (but not all) CISC processors is, in itself, a processor which in many ways is reminiscent in structure of very early CPU designs. In the early 1970s this gave rise to ideas of returning to simpler processor designs, in order to make it feasible to cope without (then relatively large and expensive) ROM tables and/or PLA structures for sequencing and/or decoding. The first (retroactively) RISC-labeled processor (the IBM 801, from IBM's Watson Research Center, mid-1970s) was a tightly pipelined simple machine originally intended to be used as an internal microcode kernel, or engine, in CISC designs, but it also became the processor that introduced the RISC idea to a somewhat larger public. Simplicity and regularity in the visible instruction set would also make it easier to implement overlapping processor stages (pipelining) at the machine code level (i.e. the level seen by compilers). However, pipelining at that level was already used in some high-performance CISC "supercomputers" in order to reduce the instruction cycle time (despite the complications of implementing it within the limited component count and wiring complexity feasible at the time). Internal microcode execution in CISC processors, on the other hand, could be more or less pipelined depending on the particular design, and therefore more or less akin to the basic structure of RISC processors.

Addressing mode

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

In computer programming, addressing modes are primarily of interest to compiler writers and to those who write code directly in assembly language.

Note that there is no generally accepted way of naming the various addressing modes. In particular, different authors and computer manufacturers may give different names to the same addressing mode, or the same name to different addressing modes. Furthermore, an addressing mode which, in one given architecture, is treated as a single addressing mode may represent functionality that, in another architecture, is covered by two or more addressing modes. For example, some complex instruction set computer (CISC) architectures, such as the Digital Equipment Corporation (DEC) VAX, treat registers and literal/immediate constants as just another addressing mode. Others, such as the IBM System/390 and most reduced instruction set computer (RISC) designs, encode this information within the instruction. Thus, the latter machines have three distinct instruction codes for copying one register to another, copying a literal constant into a register, and copying the contents of a memory location into a register, while the VAX has only a single "MOV" instruction.

The term "addressing mode" is itself subject to different interpretations: either "memory address calculation mode" or "operand accessing mode". Under the first interpretation, instructions that do not read from memory or write to memory (such as "add literal to register") are considered not to have an "addressing mode". The second interpretation allows for machines such as the VAX which use operand mode bits to allow for a literal operand. Only the first interpretation applies to instructions such as "load effective address".

The addressing modes listed below are divided into code addressing and data addressing. Most computer architectures maintain this distinction, but there are, or have been, some architectures which allow (almost) all addressing modes to be used in any context.

The instructions shown below are purely representative, in order to illustrate the addressing modes, and do not necessarily reflect the mnemonics used by any particular computer.


2. Discuss the following:

a. Design Specification of Assembler b. Design of Single Pass Assembler

Ans: Describe Design Specification of Assembler?

The purpose of a Software Design Specification (SDS) is to define the software that is to meet the functional requirements for the project. It is the stage at which the supplier specifies the detailed design of the software system, produces the program code to realize that design, tests the individual programs and integrates them into the complete software system.

Now, a PCS automation system generally has a collection of standard reusable modules that need to be configured and/or programmed. But unlike a typical IT system, the design of these modules is often part of the standard software of the system and need not be detailed in the SDS. A good example of this is a PID controller, where the PID algorithm is not something specifically designed for the project. It may have some documentation in the system's standard manuals, but it is not normally considered to be part of the SDS. There is another class of module that a PCS often contains: application library objects, rather than standard software. All software modules, application-specific and system-standard alike, should be under version control.

Assembly language is a programming language that is one step away from machine language. Each assembly language statement is translated into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and undocumented assembly language programs are difficult to maintain. Assembly language is hardware dependent; there is a different assembly language for each CPU series.

It Used to All Be Assembly Language

In the past, control programs (operating systems, database managers, etc.) and many applications were written in assembly language to maximize the machine's performance. Today, C/C++ is widely used instead. Like assembly language, C/C++ can manipulate the bits at the machine level, but it is also portable to different computer platforms. There are C/C++ compilers for almost all computers.

Assembly Language Vs. Machine Language

Although often used synonymously, assembly language and machine language are not the same. Assembly language is turned into machine language. For example, the assembly instruction COMPARE A,B is translated into COMPARE contents of memory bytes 2340-2350 with 4567-4577 (where A and B happen to be located). The physical binary format of the machine instruction is specific to the computer it is running on.

They Can Be Quite Different

Assembly languages are quite different between computers, as is evident in the example below, which takes 16 lines of code for the mini and 82 lines for the micro. The example changes Fahrenheit to Celsius.

Q. Describe Design of Single Pass Assembler?

An assembler is a program that turns symbols into machine instructions. It is ISA-specific: there is a close correspondence between symbols and instruction set mnemonics for opcodes, labels for memory locations, and additional operations for allocating storage and initializing data.


Each line of a program is one of the following:

•An instruction

•An assembler directive (or pseudo-op)

•A comment

Whitespace (between symbols) and case are ignored. Comments (beginning with “;”) are also ignored

Assembler Design can be done in:

– Single pass
– Two pass

•Single Pass Assembler:

–Does everything in single pass

–Cannot resolve the forward referencing

The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).

In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.

The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

3. Discuss the following: a. Macro Parameters b. Nested and Recursive Macro Calls and their expansion c. Flow chart of the design of a macro preprocessor implementation

Ans: Describe Macro Parameters?

A macro is a unit of specification for program generation through expansion. It is a shortcut method for invoking a sequence of user interface functions: macros let users turn widely used sequences of menu selections and keystrokes into one command or key combination. For example, pressing the F2 key might cause several menu options to be selected and several dialog box OK buttons to be clicked in a prescribed sequence. A macro facility can also be a special-purpose command language within an application. In assembly language, a macro is a prewritten sequence of instructions that is called for throughout the program. At assembly time, the macro calls are substituted with the actual instructions, or with instructions that branch to a subroutine. The high-level language equivalent is a function.

Keyboard and mouse macros that are created using an application's built-in macro features are sometimes called application macros. They are created by carrying out the sequence once and letting the application record the actions. An underlying macro programming language, most commonly a scripting language, with direct access to the features of the application may also exist.

The programmers' text editor Emacs follows this idea to its conclusion. In effect, most of the editor is made of macros. Emacs was originally devised as a set of macros in the editing language TECO; it was later ported to dialects of Lisp.

Another programmer's text editor, Vim (a descendant of vi), also has a full implementation of macros. It can record into a register (macro) what a person types on the keyboard, and the recording can be replayed or edited just like VBA macros for Microsoft Office. Vim also has a scripting language called Vimscript[4] for creating macros.[5]

Visual Basic for Applications (VBA) is a programming language included in Microsoft Office and some other applications. Its function has evolved from, and replaced, the macro languages that were originally included in some of these applications.

With positional parameters, the programmer must be careful to specify the arguments in the proper order. If a macro has a large number of parameters, and only a few of these are given values in a typical invocation, a different form of parameter specification is more useful. This is called keyword parameters.

Q. Describe Nested and Recursive Macro Calls and their expansion?

Most macro processors allow parameters to be concatenated with other character strings. For example, XA1, XA2, …, XB1, etc., where A and B are parameters. Any symbol that begins with the character '&' and is not a macro instruction parameter is assumed to be a macro-time variable. All such variables are initialized to a value of 0. Invocation of one macro by another macro is known as a macro within a macro, also referred to as a recursive macro call. When a macro invocation statement is recognized, the arguments are stored in ARGTAB according to their position in the argument list. In the positional notation, a parameter is marked as '?n' to represent the position of the parameter.

The macro processor replaces each macro instruction with the corresponding group of source statements. Macro instructions allow the programmer to write a shorthand version of a program and leave the mechanical details to be handled by the macro processor.

Most macro processors can also modify the sequence of statements generated for a macro expansion, depending on the arguments supplied in the macro invocation. Conditional assembly is the term commonly used to describe this feature; it is also referred to as conditional macro expansion.

Formally, a frame is a procedural macro consisting of frame-text (zero or more lines of ordinary program text) and frame commands (that are carried out by the frame processor as it manufactures custom programs). Each frame is both a generic component in a hierarchy of nested subassemblies, and a procedure for integrating itself with its subassembly frames (a recursive process that resolves integration conflicts in favor of higher-level subassemblies). Macros also make it possible to define data languages that are immediately compiled into code, which means that constructs such as state machines can be implemented in a way that is both natural and efficient.

Q. Describe Flow chart of Design of Macro Preprocessors Implementation?

Assert macros are implemented in the preprocessor to prevent a performance hit at run time. This is an excellent practice, except that it means that any otherwise useful code performed by the assert statement becomes an undesirable side effect. In general, prefer in-line functions over function-like macros: they work just as well, they don't break the program if they are not wrapped in enough parentheses, they can be stepped into and debugged (most debuggers cannot step into macros), and, most importantly, function calls always evaluate their parameters exactly once. (Try using 'x++' in a 'min(,)' macro. Surprise!) The preprocessor is also used to comment out blocks of code from the compiler.

Preprocessor Code

#if TARGET_WINDOWS   /* assume TARGET_WINDOWS is 1 for this example */
typedef enum {
    WINDOWS_FLAG_A = 64,
    WINDOWS_FLAG_B = 128
} WinFlags;
#endif

void routine(WinFlags flag)
{
    switch (flag) {
    case WINDOWS_FLAG_A:   /* this compiles fine - WINDOWS_FLAG_A is defined */
        doStuffA();
        break;
    case WINDOWS_FLAG_B:
        doStuffB();
        break;
    }
#ifdef WINDOWS_FLAG_A
    doStuffWin();   /* this isn't included - WINDOWS_FLAG_A is an enumerator,
                       not a preprocessor macro, so #ifdef does NOT see it */
#endif
    return;
}

4. Discuss the following:

a. Phases of Compilation b. Java Compiler and Environment

Ans: Describe Phases of Compilation?

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine code). A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

Program faults caused by incorrect compiler behavior can be very difficult to track down and work around, so compiler implementors invest a lot of time ensuring the correctness of their software.

Q. Describe Java Compiler and Environment?


Anything which converts 'JavaLanguage' to _any_ other form is a Java compiler, as the term is commonly understood. Only something which converts 'JavaLanguage' to 'JVM language' is a Java compiler, as defined by Sun. It is sad that the legal system forces such a distinction of technical terms to be important for political reasons

Java "compilers" that convert Java source into something executed in the host machine's native environment (Windows, Linux, VMS, OS/9, VxWorks, etc.) are not, by definition, "Java compilers." They are Java converters. This distinction is not simply a matter of semantics; it goes to the very heart of the Java Machine and its usefulness.

There are also virtual CPUs, such as the JavaVirtualMachine. These could have been implemented in hardware, but happen to be implemented in software. In the future we may see JavaVirtualMachines implemented in hardware; then we will have to distinguish between those that are JavaVirtualMachines and those that are true JavaMachines.


August 2012

Master of Computer Application (MCA) – Semester 3

MC0073 – System Programming– 4 Credits

(Book ID: B0811)

Assignment Set – 2 (40 Marks)

Answer all Questions Each question carries TEN marks

1. Explain the design of a multi-pass assembler

Ans: Describe the Design of Multi-pass (two-pass) Assemblers Implementation?

Assembly language is a programming language that is one step away from machine language. Each assembly language statement is translated into one machine instruction by the assembler. Programmers must be well versed in the computer's architecture, and undocumented assembly language programs are difficult to maintain. Assembly language is hardware dependent; there is a different assembly language for each CPU series.

Pass 1:
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives

Pass 2:
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing

2. Explain the following:

a. Basic Assembler Functions

b. Design of Multi-pass (two-pass) Assemblers Implementation c. Examples: MASM Assembler and SPARC Assembler.

Ans: Describe Basic Assembler Functions?

Often the assembler cannot generate debug information automatically. This means that you cannot get a source report unless you manually define the necessary debug information; read your assembler documentation for how you might do that. The only debugging information currently needed by OProfile is the line-number/filename-to-VMA association. When profiling assembly without debugging info you can always get a report for symbols, and optionally for VMAs, through opreport -l or opreport -d, but this works only for symbols with the right attributes.


Basic assembler directives

START, END, BYTE, WORD, RESB, RESW

Purpose: reads records from the input device (code F1) and copies them to the output device (code 05); at the end of the file, writes EOF on the output device, then RSUBs to the operating system.

Data transfer (RD, WD):
• A buffer is used to store each record; buffering is necessary because of differing I/O rates.
• The end of each record is marked with a null character (hex 00).
• The end of the file is indicated by a zero-length record.

Subroutines (JSUB, RSUB): RDREC, WRREC
• Save the link register first before a nested jump.

Assembler’s functions

• Convert mnemonic operation codes to their machine language equivalents
• Convert symbolic operands to their equivalent machine addresses
• Build the machine instructions in the proper format
• Convert the data constants to internal machine representations
• Write the object program and the assembly listing

Describe the Design of Multi-pass (two-pass) Assemblers Implementation?

The design of a multi-pass (two-pass) assembler is the same as described in the answer to Question 1 of this set.

Examples: MASM Assembler and SPARC Assembler?

You can assemble this by typing "tasm first" and then "tlink first", or with MASM: "masm first" and then "link first". You must have an assembler and the link/tlink program.

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main

.model small: Lines that start with a "." provide the assembler with information; the word(s) after the dot say what kind. In this case it tells the assembler the program is small and doesn't need a lot of memory.
.stack: Tells the assembler that the "stack" segment starts here. The stack is used to store temporary data. It isn't used in this program, but it must be there, because we make an .EXE file and these files MUST have a stack.
.data: Indicates that the data segment starts here and that the stack segment ends there.
.code: Indicates that the code segment starts there and the data segment ends there.

There are very few addressing modes on the SPARC, and they may be used only in certain very restricted combinations. The three main types of SPARC instructions are given below, along with the valid combinations of addressing modes. There are only a few unusual instructions which do not fall into these categories.

1. Arithmetic/Logical/Shift instructions

opcode reg1,reg2,reg3 !reg1 op reg2 -> reg3

2. Load/Store Instructions

opcode [reg1+reg2],reg3

The SPARC code for this subroutine can be written several ways; two possible approaches are given below (the 'X's in the original two-column listing marked the lines that differ between the two approaches).

Approach 1:

        .global prt_sum
prt_sum:
        save %sp,-96,%sp
        clr %l0
        clr %l1
        mov %i0,%l2
loop:
        cmp %l0,%i1
        bge done
        nop
        ld [%l2],%o0
        add %l1,%o0,%l1
        add %l2,4,%l2
        inc %l0
        ba loop
        nop
done:

Approach 2:

        .global prt_sum
prt_sum:
        save %sp,-96,%sp
        clr %l0
        clr %l1
loop:
        cmp %l0,%i1
        bge done
        nop
        sll %l0,2,%l2
        ld [%i0+%l2],%o0
        add %l1,%o0,%l1
        inc %l0
        ba loop
        nop
done:

4. What is Relocation? Write the relocation algorithm in detail.

Ans: Relocation is the process of updating the addresses used in the address-sensitive instructions of a program, so that the program can execute correctly from the designated area of memory.


The assembler generates the object code. This object code gets executed after being loaded at its storage locations. The addresses in the object code are finalized only after the assembly process is over. Therefore, after loading:

address of object code = assembled address of object code + relocation constant

There are two types of addresses being generated: absolute addresses and relative addresses. An absolute address can be used directly to map the object code into main memory, whereas a relative address can be used only after the addition of a relocation constant to the object code address. This adjustment needs to be done for relative addresses before actual execution of the code. Typical examples of relative references are: addresses of symbols defined in the label field, addresses of data defined by assembler directives, literals, and redefinable symbols. Similarly, a typical example of an absolute address is a constant generated by the assembler. The assembler determines which addresses are absolute and which are relative during the assembly process.

During the assembly process the assembler calculates addresses with the help of simple expressions. For example:

LOAD A(X)+5

The expression A(X) means the address of variable X. The meaning of the above instruction is to load the contents of the memory location whose address is 5 more than the address of variable X. Suppose the address of X is 50; then this instruction accesses memory location 50 + 5 = 55. Since the address of variable X is relative, A(X) + 5 is also relative. To calculate relative addresses, simple expressions are allowed; an expression is expected to contain at most addition and multiplication operations.

A simple exercise can be carried out to determine whether a given expression is absolute or relative: in the expression, put 0 wherever an address is absolute and 1 wherever it is relative. The expression is thereby transformed into a sum of 0s and 1s. If the resultant value of the expression is 0, the expression is absolute; if the resultant value is 1, the expression is relative; if the resultant is anything other than 0 or 1, the expression is illegal. The assembler must consider the relocation attribute and adjust the object code by the relocation constant, and it is then responsible for conveying the relocation information for the object code to the loader. Let us now see how the code is relocated using this information.

• program_linked_origin = <link origin> from the linker command.
• For each object module:
  1. t_origin = translated origin of the object module; OM_size = size of the object module.
  2. relocation_factor = program_linked_origin - t_origin.
  3. Read the machine language program into work_area.
  4. Read the RELOCTAB of the object module.
  5. For each entry in RELOCTAB:
     a. translated_address = address in the RELOCTAB entry.
     b. address_in_work_area = address of work_area + translated_address - t_origin.
     c. Add relocation_factor to the operand address in the word with the address address_in_work_area.
  6. program_linked_origin = program_linked_origin + OM_size.


5. Explain the following: a. YACC Compiler-Compiler b. Interpreters c. Compiler writing tools

Ans: Describe YACC Compiler-Compiler?

If you have been programming for any length of time in a Unix environment, you will have encountered the mystical programs Lex & YACC, or as they are known to GNU/Linux users worldwide, Flex & Bison, where Flex is a Lex implementation by Vern Paxson and Bison the GNU version of YACC. We will call these programs Lex and YACC throughout - the newer versions are upwardly compatible, so you can use Flex and Bison when trying our examples.

These programs are massively useful, but as with your C compiler, their man pages do not explain the language they understand, nor how to use them. YACC is really amazing when used in combination with Lex; however, the Bison man page does not describe how to integrate Lex-generated code with your Bison program. YACC can parse input streams consisting of tokens with certain values. This clearly describes the relation YACC has with Lex: YACC has no idea what 'input streams' are; it needs preprocessed tokens. While you can write your own tokenizer, we will leave that entirely up to Lex.

A note on grammars and parsers. When YACC saw the light of day, the tool was used to parse input files for compilers: programs. Programs written in a programming language for computers are typically *not* ambiguous - they have just one meaning. As such, YACC does not cope with ambiguity and will complain about shift/reduce or reduce/reduce conflicts.

Example:

%{
#include <stdio.h>
#include <string.h>

void yyerror(const char *str)
{
    fprintf(stderr, "error: %s\n", str);
}

int yywrap()
{
    return 1;
}

int main()
{
    yyparse();
}
%}

%token NUMBER TOKHEAT STATE TOKTARGET TOKTEMPERATURE

b. Interpreters

A program that executes instructions written in a high-level language. There are two ways to run programs written in a high-level language. The most common is to compile the program; the other method is to pass the program through an interpreter.

An interpreter translates high-level instructions into an intermediate form, which it then executes. In contrast, a compiler translates high-level instructions directly into machine language.

Compiled programs generally run faster than interpreted programs. The advantage of an interpreter, however, is that it does not need to go through the compilation stage during which machine instructions are generated. This process can be time-consuming if the program is long. The interpreter, on the other hand, can immediately execute high-level programs. For this reason, interpreters are sometimes used during the development of a program, when a programmer wants to add small sections at a time and test them quickly. In addition, interpreters are often used in education because they allow students to program interactively. Both interpreters and compilers are available for most high-level languages. However, BASIC and LISP are especially designed to be executed by an interpreter. In addition, page description languages, such as PostScript, use an interpreter. Every PostScript printer, for example, has a built-in interpreter that executes PostScript instructions.
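To make the contrast concrete, here is a toy interpreter that executes a few high-level instructions directly, one at a time, with no separate translation step. The three-instruction mini-language is invented purely for illustration:

```python
# A toy interpreter: each instruction is fetched, decoded and executed
# immediately, with no compilation stage that generates machine
# instructions. The mini-language (SET/ADD/PRINT) is invented here.

def interpret(program):
    env = {}                          # variable name -> value
    for line in program:
        op, *args = line.split()
        if op == "SET":               # SET x 5   -> x = 5
            env[args[0]] = int(args[1])
        elif op == "ADD":             # ADD x y   -> x = x + y
            env[args[0]] += env[args[1]]
        elif op == "PRINT":           # PRINT x   -> display x
            print(env[args[0]])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return env

# prints 5: the program runs immediately, statement by statement
interpret(["SET a 2", "SET b 3", "ADD a b", "PRINT a"])
```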

c. Compiler writing tools

A compiler is a computer program (or set of programs) that transforms source code written in a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program.

The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

Purdue Compiler-Construction Tool Set:

(PCCTS) A highly integrated lexical analyzer generator and parser generator by Terence J. Parr, Will E. Cohen and Henry G. Dietz, all of Purdue University. ANTLR (ANother Tool for Language Recognition) corresponds to YACC, and DLG (DFA-based Lexical analyzer Generator) functions like Lex. PCCTS has many additional features which make it easier to use for a wide range of translation problems. PCCTS grammars contain specifications for lexical and syntactic analysis with selective backtracking ("infinite lookahead"), semantic predicates, intermediate-form construction and error reporting. Rules may employ Extended BNF (EBNF) grammar constructs and may define parameters, return values, and local variables.

Languages described in PCCTS are recognized via LL(k) parsers constructed in pure, human-readable C code. Selective backtracking is available to handle non-LL(k) constructs. PCCTS parsers may also be compiled with a C++ compiler. PCCTS includes the SORCERER tree parser generator as well. Latest version: 1.10; it runs under Unix, MS-DOS, OS/2, and Macintosh, and is very portable.
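An LL(k) parser of the kind PCCTS generates is, in essence, one function per grammar rule plus k tokens of lookahead. Here is a hand-written LL(1) sketch for the toy grammar expr -> NUMBER ('+' NUMBER)*; it illustrates the shape of such parsers and is not PCCTS output:

```python
# Hand-written LL(1) recursive-descent parser for the toy grammar
#   expr -> NUMBER ('+' NUMBER)*
# One token of lookahead (peek) decides every parsing step -- the same
# structure as the readable C parsers PCCTS/ANTLR generate.

def parse_expr(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect_number():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected NUMBER, got {tok!r}")
        pos += 1
        return int(tok)

    value = expect_number()
    while peek() == "+":       # 1 token of lookahead decides the loop
        pos += 1               # consume '+'
        value += expect_number()
    if peek() is not None:
        raise SyntaxError(f"trailing input {peek()!r}")
    return value

print(parse_expr(["1", "+", "2", "+", "3"]))   # 6
```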

If you are thinking of creating your own programming language, writing a compiler or interpreter, adding a scripting facility to your application, or even creating a documentation parsing facility, the tools on this page are designed to (hopefully) ease your task. These compiler construction kits, parser generators, lexical analyzer (lexer) generators, and code optimizer generators provide a facility where you define your language and let the compiler creation tools generate the source code for your software.
