GNU GCC - what just a compiler...?

A quick look-up overview of GCC, the GNU Compiler Collection.

2012

Saket Kr. Pathak, Software Developer, 3D Graphics


Most of us have studied compilers, and the languages they support, at some point in the last few days (*however many multiples of 365 those days might be ... :) ), and most of us had a specific paper titled "Compiler Design", or something with a similar name or syllabus. That's really great, mate, because I never had the good fortune to study compiler design formally. Whatever ... it is only through my own interest and some lucky timing that I found something valuable and sensible to learn. How many times has someone asked you which compiler you work with? This question goes out to every student and working professional. I have been asked it a few times myself, and most of the time I gave an answer.

My answer was GNU GCC or VC++, depending on the environment. Today I realized this is quite a foolish answer. It is as if someone asked you about the flavor of the coffee (on a date with a friend, say) and you replied, "The coffee was good" ... what a sense of humor ... haaa haa. :)

So I went back and studied my answer with respect to the GNU GCC compiler. It is simply GCC, which stands for GNU Compiler Collection. It was originally named the GNU C Compiler because it only handled the C programming language. GCC 1.0 was released in 1987, and the compiler was extended to compile C++ in December of that year. Front ends were later added for languages such as Objective-C, Objective-C++, Fortran, Java, Ada, and Go.

Since GCC is a collection of compilers, "GCC" alone cannot be the exact answer when the question is about the type and version of your compiler. For C++, the compiler within the GNU Compiler Collection has a specific name, G++; similarly, for C the specific name is the GNU C Compiler (i.e. GCC). Many other languages and their compilers are supported by GCC, as listed below:

Seq.  Language     Compiler
1.    C            gcc
2.    C++          g++
3.    Objective-C  gobjc
4.    Fortran      gfortran
5.    Java         gcj
6.    Ada          GNAT
7.    Go           gccgo
8.    Pascal       gpc
9.    Mercury      Mercury
10.   PL/I         PL/1
11.   VHDL         GHDL

The GNU Project is made up of several component modules, and I want to discuss some of them here as an add-on, because I wondered how this long list of compilers could be handled within a single GNU toolchain covering all the compiler logic, programming libraries and syntax. I then found quite an insightful overview of the GCC architecture. It is organized into three modules, and each compiler includes the following three components: a front end, a middle end, and a back end. GCC compiles one file at a time, and a source file goes through all three components one after another. The three components are discussed in a bit of detail below:

GCC basic components

Front-End: The purpose of the front end is to read the source file, parse it, and convert it into the standard Abstract Syntax Tree (AST) representation. There is one front end for each programming language. Because of the differences between languages, the format of the generated ASTs is slightly different for each language. The next step after AST generation is the unification step, in which the AST is converted into a unified form called GENERIC.

Middle-End: The middle end of the compiler then takes control. First, the tree is converted into another representation called GIMPLE. In this form, each expression contains no more than three operands, all control flow constructs are represented as combinations of conditional statements and goto operators, and the arguments of a function call can only be variables. GIMPLE is a convenient representation for optimizing the source code. After GIMPLE, the code is converted into the Static Single Assignment (SSA) representation, in which each variable is assigned only once but can be used on the right-hand side of an expression any number of times. GCC performs more than 20 different optimizations on SSA trees. The tree is then converted back to the GIMPLE form, which is used to generate the Register-Transfer Language (RTL) form of the program. RTL is a hardware-oriented representation that corresponds to an abstract target architecture with an infinite number of registers. An RTL optimization pass then optimizes the code in its RTL form.
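
The "assigned only once" property of SSA can be illustrated with a small Python sketch. This is not GCC code: the statement format `(target, left, op, right)`, the function name `to_ssa` and the `x_1`, `x_2` naming scheme are all assumptions made for illustration. It also only handles straight-line code; real SSA construction additionally needs merge (phi) nodes where control flow joins, which this sketch leaves out.

```python
def to_ssa(statements):
    """Rename variables in straight-line code so each name is assigned once.

    Each statement is (target, left, op, right); operands are variable
    names or integer literals. (Hypothetical format, not GCC's.)
    """
    version = {}   # latest version number handed out for each variable
    result = []

    def use(operand):
        # Literals and inputs that were never assigned stay unchanged.
        if isinstance(operand, int) or operand not in version:
            return operand
        return f"{operand}_{version[operand]}"

    for target, left, op, right in statements:
        # Uses are renamed first, so they refer to the previous version.
        left, right = use(left), use(right)
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}_{version[target]}", left, op, right))
    return result


program = [
    ("x", "a", "+", "b"),
    ("x", "x", "*", 2),
    ("y", "x", "-", "a"),
]
for target, left, op, right in to_ssa(program):
    print(f"{target} = {left} {op} {right}")
```

Here the two assignments to `x` become two distinct names, so every later use points at exactly one definition, which is what makes many optimizations easier to express.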

Back-End: Finally, a GCC back end generates the assembly code for the target architecture from the RTL representation. Examples of back ends are the x86 back end, the MIPS back end, etc.


The short descriptions above give an overview of all three components. Each one is described in a bit more detail below.

Front-End:

Front ends vary internally, but each must produce trees that can be handled by the back end. Currently, the parsers are all hand-coded recursive descent parsers, though there is no reason why a parser generator could not be used for new front ends in the future; indeed, version 2 of the C compiler used a Bison-based grammar. A recursive descent parser is a top-down parser built from a set of mutually recursive procedures (or a non-recursive equivalent), where each procedure usually implements one of the production rules of the grammar. GNU Bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (in C, C++, or Java) that reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar.
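
To make "one procedure per production rule" concrete, here is a minimal recursive descent parser in Python for a toy arithmetic grammar. It is only a sketch: the grammar, the `Parser` class and the nested-tuple tree shape are assumptions chosen for illustration and have nothing to do with GCC's real C or C++ parsers.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):      # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):      # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):    # factor := NUMBER | '(' expr ')'
        token = self.take()
        if token is not None and token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * (3 - 4)"))
```

Each of `expr`, `term` and `factor` mirrors one grammar rule, and the way they call each other gives the mutual recursion mentioned above. A parser generator such as Bison would produce the equivalent logic from the grammar text instead of having it written by hand.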

The front end converts the source file to an abstract syntax tree. Historically this tree had a somewhat different meaning for each language front end, and front ends could provide their own tree codes. This was simplified by GENERIC and GIMPLE, two new forms of language-independent trees introduced with GCC 4.0. GENERIC is the more complex of the two and is based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC, in which various constructs are lowered to multiple GIMPLE instructions. The C, C++ and Java front ends produce GENERIC directly in the front end. Other front ends instead have different intermediate representations after parsing and convert these to GENERIC.

Middle-end:

Once the middle end takes control, it works on GENERIC, the intermediate representation language used while compiling source code into executable binaries. A subset of GENERIC, called GIMPLE, is targeted by all the front ends of GCC. The middle end is responsible for all the code analysis and optimization, working independently of both the compiled language and the target architecture, starting from the GENERIC representation and expanding it to Register Transfer Language (RTL). The GENERIC representation contains only the subset of imperative programming constructs that the middle end optimizes. In transforming the source code to GIMPLE, complex expressions are split into three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and optimization of imperative programs.
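
The splitting of a complex expression into three-address code can be sketched in a few lines of Python. This is a toy under stated assumptions, not GCC's gimplifier: the nested-tuple tree format `(op, left, right)`, the function name `lower` and the temporary names `t1`, `t2`, ... are hypothetical.

```python
from itertools import count


def lower(tree):
    """Flatten a nested expression tree into three-address statements.

    Returns (statements, result): each statement is
    (target, left, op, right), and result names the final value.
    """
    statements = []
    temp_ids = count(1)

    def visit(node):
        if not isinstance(node, tuple):   # leaf: variable name or literal
            return node
        op, left, right = node
        left_name = visit(left)           # operands are lowered first
        right_name = visit(right)
        target = f"t{next(temp_ids)}"     # fresh temporary for this result
        statements.append((target, left_name, op, right_name))
        return target

    result = visit(tree)
    return statements, result


# b * c + d * (e - f)
tree = ("+", ("*", "b", "c"), ("*", "d", ("-", "e", "f")))
statements, result = lower(tree)
for target, left, op, right in statements:
    print(f"{target} = {left} {op} {right}")
print("result is in", result)
```

Every emitted statement has at most one operator and two operands, which is the "no more than three operands" shape described for GIMPLE.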

Optimization can occur during any phase of compilation; however, the bulk of the optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end. The exact set of GCC optimizations varies from release to release as it develops, but it includes the standard algorithms, such as loop optimization, jump threading, common sub-expression elimination, instruction scheduling, and so forth. The RTL optimizations have become less important since the addition of global SSA-based optimizations on GIMPLE trees. The optimizations performed at this level include dead code elimination, partial redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array-dependence-based optimizations such as automatic vectorization and automatic parallelization are also performed.
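
As one example from that list, common sub-expression elimination can be sketched on three-address statements in Python. This is a deliberately simplified, local version and not GCC's implementation: it assumes straight-line code in which every name is assigned only once (as in SSA form), and the statement format `(target, left, op, right)` and the function name are hypothetical.

```python
def eliminate_common_subexpressions(statements):
    """Reuse earlier results for repeated (left, op, right) expressions.

    Assumes straight-line code where every target is assigned once,
    so a value computed earlier is never overwritten later.
    """
    available = {}   # expression -> name that already holds its value
    alias = {}       # removed target -> name to use in its place
    result = []
    for target, left, op, right in statements:
        # Operands that named a removed statement now use its replacement.
        left = alias.get(left, left)
        right = alias.get(right, right)
        key = (left, op, right)
        if key in available:
            alias[target] = available[key]   # drop the duplicate computation
        else:
            available[key] = target
            result.append((target, left, op, right))
    return result


code = [
    ("t1", "a", "+", "b"),
    ("t2", "a", "+", "b"),    # same expression as t1
    ("t3", "t1", "*", "t2"),
    ("t4", "t2", "-", "c"),
]
for target, left, op, right in eliminate_common_subexpressions(code):
    print(f"{target} = {left} {op} {right}")
```

The second `a + b` disappears and later statements read `t1` instead of `t2`. The single-assignment assumption is what makes the lookup table safe, which hints at why SSA form is convenient for this kind of pass.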

Back-end:

The behavior of GCC's back end is partly specified by preprocessor macros and functions specific to a target architecture, for instance to define the endianness, word size, and calling conventions. The front part of the back end uses these to help decide RTL generation, so although GCC's RTL is nominally processor-independent, the initial sequence of abstract instructions is already adapted to the target. At any moment, the actual RTL instructions forming the program representation have to comply with the machine description of the target architecture. At the end of compilation, valid RTL is further reduced to a strict form in which each instruction refers to real machine registers and real instructions from the target's instruction set. Forming strict RTL is a very complicated task, done mostly by the register allocator but completed only by a separate "reloading" phase, which must account for the vagaries of all of GCC's targets. The final phase is somewhat anticlimactic, because the patterns to match were generally chosen during reloading, and so the assembly code is simply built by substituting registers and addresses into the strings that specify the instructions.
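
The "substitute registers into instruction strings" idea of the final phase can be illustrated with a toy Python emitter. Everything here is an assumption made for illustration: the templates are only loosely modelled on x86 AT&T syntax, the statement format `(dst, op, src)` is hypothetical, and the register assignment is simply handed in rather than computed. GCC's real machine descriptions and output templates are far richer.

```python
# Hypothetical instruction templates, loosely modelled on x86 AT&T syntax.
TEMPLATES = {
    "+": "addl {src}, {dst}",
    "-": "subl {src}, {dst}",
    "*": "imull {src}, {dst}",
}


def emit(statements, registers):
    """Turn register-allocated two-operand statements into assembly text.

    Each statement is (dst, op, src), meaning dst = dst op src.
    `registers` maps every virtual name to a hard register that is
    assumed to have been chosen by an earlier allocation step.
    """
    lines = []
    for dst, op, src in statements:
        # Integer literals become immediates; names become registers.
        operand = f"${src}" if isinstance(src, int) else registers[src]
        lines.append(TEMPLATES[op].format(src=operand, dst=registers[dst]))
    return lines


allocation = {"x": "%eax", "y": "%ebx"}
body = [("x", "+", "y"), ("x", "*", 2)]
for line in emit(body, allocation):
    print(line)
```

Once every operand has a real register, producing the text is plain string substitution, which is why the article calls this last step anticlimactic.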

Compatible IDEs

Integrated development environments written for GNU/Linux, and some written for other operating systems, support GCC. These include:

Anjuta
Code::Blocks
CodeLite
Dev-C++
Eclipse
Geany
KDevelop
NetBeans
Qt Creator
Xcode

Hmmm ... So please never tell anyone, like a fool, "I work on GNU GCC" ... be specific: the GNU C Compiler or G++, for C or C++ respectively. Here are a few links I would like to mention. If any of you would like to read about the GNU Project in a bit more detail, you can definitely enjoy your time with all of these ... But I know ... you are quite busy ... :) ... whatever ...

 


References:

http://en.wikipedia.org/wiki/GNU_Project
http://en.wikipedia.org/wiki/GNU_Compiler_Collection#cite_note-8
http://gcc.gnu.org/frontends.html
http://en.wikipedia.org/wiki/GNU_Compiler_Collection
http://www.enotes.com/topic/GNU_Compiler_Collection
http://www.enotes.com/topic/GNU_Compiler_Collection#Back-end
