A(n abridged) tour of the Rust compiler
Tom Lee @tglee
Saturday, March 29, 14
Brace yourselves.
• I’ve contributed some code to Rust
• Mostly fixing crate metadata quirks
• Haven’t touched most of the stuff I’m covering today
• Sorry in advance for any lies I tell you
What is this?
• We’re digging into the innards of Rust’s compiler.
• Along the way, I’ll cover some “compilers 101” stuff that may not be common knowledge.
• Not really covering any of the runtime stuff -- data representation, garbage collection, etc.
Intro to compilers
• Most compilers follow a familiar pattern: scan, parse, generate code
• A scanner converts raw source code into a stream of tokens.
• A parser converts the stream of tokens into an intermediate representation.
• A code generator emits the target code (e.g. bytecode, x86_64 assembly, etc.)
Intro to compilers (cont.)
• Real-world compilers do other stuff too.
• Semantic analysis often follows the parse phase. For example, if the language is statically typed, a type checking step might happen here.
• Often one or more optimization steps.
• The compiler may also be kind enough to invoke external tools on your behalf.
A 10,000 foot view of Rust’s compiler
• Scan
• Parse
• Semantic Analysis
• (Optimizations occur somewhere here)
• Generate target code
• Link object files into an ELF/PE/Mach-O binary.
A 10,000 ft view (cont.)
• Where does it all begin?
• src/librustc/lib.rs: main(...) and run_compiler(...)
• src/librustc/driver/driver.rs: see compile_input and all the phase_X functions like phase_1_parse_input, phase_2_configure_and_expand, etc.
Scanners
• Raw source code goes in, e.g. if (should_buy(goat_simulator)) { ... }
• Tokens come out, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]
• This simple translation makes the parser’s job easier.
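To make the scanner idea concrete, here is a minimal sketch of a toy scanner for the example above. It is written in present-day Rust and is not taken from libsyntax: the Token enum and the scan function are hypothetical names invented for illustration, and the only keyword it knows is "if".

```rust
// Hypothetical token type for the slide's example; not rustc's own token definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

/// Convert raw source text into a flat list of tokens.
/// Returns an error message for any character this toy scanner does not know.
pub fn scan(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();

    while let Some(&c) = chars.peek() {
        match c {
            // Whitespace separates tokens but produces none.
            c if c.is_whitespace() => {
                chars.next();
            }
            '(' => {
                chars.next();
                tokens.push(Token::LParen);
            }
            ')' => {
                chars.next();
                tokens.push(Token::RParen);
            }
            '{' => {
                chars.next();
                tokens.push(Token::LBrace);
            }
            '}' => {
                chars.next();
                tokens.push(Token::RBrace);
            }
            // Identifiers and keywords: a letter or underscore, then letters, digits, underscores.
            c if c.is_alphabetic() || c == '_' => {
                let mut word = String::new();
                while let Some(&d) = chars.peek() {
                    if d.is_alphanumeric() || d == '_' {
                        word.push(d);
                        chars.next();
                    } else {
                        break;
                    }
                }
                tokens.push(if word == "if" { Token::If } else { Token::Id(word) });
            }
            other => return Err(format!("unexpected character: {:?}", other)),
        }
    }
    Ok(tokens)
}
```

Calling scan on "if (should_buy(goat_simulator)) { }" yields the same token sequence as the slide, minus the elided block contents. Note that the scanner knows nothing about nesting or grammar; that is the parser's problem.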
Rust’s Scanner
• Fully contained within libsyntax
• src/libsyntax/parse/lexer.rs (another name for scanning is “lexical analysis”, ergo “lexer”). Refer to the Reader trait.
• src/libsyntax/parse/token.rs: tokens and keywords are defined here.
Parsers
• Nom on a token stream from the scanner/lexer, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]
• Apply grammar rules to convert the token stream into an Abstract Syntax Tree (or some other representative data structure)
Abstract Syntax Trees
• Or “AST”
• Data structure representing the syntactic structure of your source program.
• Abstract in that it omits unnecessary crap (parentheses, quotes, etc.)
Abstract Syntax Trees (cont.)
If( Call( Id(“should_buy”), [Id(“goat_simulator”)]), [...])
Example AST for the input “if (should_buy(goat_simulator)) { ... }”
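A rough sketch of how a recursive descent parser might build that tree from the token stream. Everything here (Token, Expr, Parser, parse_expr) is a hypothetical toy in present-day Rust, not libsyntax's actual types, and the grammar is cut down to identifiers, calls, parentheses and "if". Since the toy token set has no comma, call arguments are just read one after another.

```rust
// Hypothetical toy types; rustc's real token and AST definitions are far richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Consume the next token if it matches `want`, otherwise report an error.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        if self.peek() == Some(&want) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", want, self.peek()))
        }
    }

    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        match self.peek().cloned() {
            Some(Token::If) => {
                self.pos += 1;
                let cond = self.parse_expr()?;
                self.expect(Token::LBrace)?;
                let mut body = Vec::new();
                while self.peek().is_some() && self.peek() != Some(&Token::RBrace) {
                    body.push(self.parse_expr()?);
                }
                self.expect(Token::RBrace)?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Token::LParen) => {
                // Parentheses only group: they leave no trace in the AST.
                self.pos += 1;
                let inner = self.parse_expr()?;
                self.expect(Token::RParen)?;
                Ok(inner)
            }
            Some(Token::Id(name)) => {
                self.pos += 1;
                let callee = Expr::Id(name);
                if self.peek() != Some(&Token::LParen) {
                    return Ok(callee);
                }
                // An identifier followed by "(" is treated as a call.
                self.pos += 1;
                let mut args = Vec::new();
                while self.peek().is_some() && self.peek() != Some(&Token::RParen) {
                    args.push(self.parse_expr()?);
                }
                self.expect(Token::RParen)?;
                Ok(Expr::Call(Box::new(callee), args))
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}
```

Feeding it the token list from the scanner slide (with an empty block) gives back If(Call(Id("should_buy"), [Id("goat_simulator")]), []). The parentheses around the condition are consumed but never stored, which is the "abstract" part of the name.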
Rust’s Parser and AST
• Also fully contained within libsyntax
• src/libsyntax/ast.rs: the Expr_ enum is an interesting starting point, containing the AST representations of most Rust expressions.
• src/libsyntax/parse/mod.rs: see parse_crate_from_file
• src/libsyntax/parse/parser.rs: most of the interesting stuff is in impl<'a> Parser<'a>. Maybe check out parse_while_expr, for example.
Semantic Analysis
• Language- & implementation-specific, but there are common themes.
• Typically performed by analyzing and/or annotating the AST (directly or indirectly).
• Statically typed languages often do type checking etc. here.
Semantic Analysis in Rust
• Here we apply all the weird & wonderful rules that make Rust unique.
• Mostly handled by src/librustc/middle/*.rs
• Name resolution (resolve.rs)
• Type checking (typeck/*.rs)
• Much, much more... see phase_3_run_analysis_passes in compile_input for the full details
Semantic Analysis in Rust: Name Resolution
• src/librustc/middle/resolve.rs
• Resolve names: “what does this name mean in this context?”
• Type? Function? Local variable?
• Rust has two namespaces: types and values. This is why you can, e.g., refer to the str type and the str module at the same time.
• resolve_item seems to be the real workhorse here.
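A small illustration of the two namespaces, written in present-day Rust. The Goat struct and Goat function are made-up names; the point is only that the same identifier can mean a type in one position and a value in another, and the resolver has to decide which from context.

```rust
// `Goat` the struct lives in the type namespace.
// (A braced struct is used on purpose: tuple and unit structs also claim
// a name in the value namespace, which would clash with the function below.)
pub struct Goat {
    pub name: String,
}

// `Goat` the function lives in the value namespace, so it does not collide
// with the struct of the same name.
#[allow(non_snake_case)]
pub fn Goat(name: &str) -> Goat {
    // `Goat { .. }` is a struct expression, so here the name resolves to the type.
    Goat {
        name: name.to_string(),
    }
}

pub fn make_goat() -> Goat {
    // In call position the name resolves to the function (a value);
    // in the return type above it resolves to the struct (a type).
    Goat("billy")
}
```

This is not something you would want in real code, but it shows the kind of question name resolution answers for every identifier in the crate.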
Semantic Analysis in Rust: Type Checking
• src/librustc/middle/typeck/mod.rs: see check_crate
• Infer and unify types.
• Using inferred & explicit type info, ensure that the input program satisfies all of Rust’s type rules.
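For a feel of what "infer and unify" means, here is a textbook-style unification sketch over a tiny made-up type language. It is an assumption-laden toy in present-day Rust, not how typeck is actually implemented: Type, Subst, resolve, occurs and unify are all hypothetical names, and there are no traits, lifetimes or generics here.

```rust
use std::collections::HashMap;

// A made-up type language: two base types, inference variables, and functions.
#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Int,
    Bool,
    Var(u32),                 // a not-yet-known type
    Fn(Box<Type>, Box<Type>), // argument type, return type
}

/// What we have learned so far: inference variable -> type.
pub type Subst = HashMap<u32, Type>;

/// Replace every bound inference variable in `ty` with what it is bound to.
pub fn resolve(ty: &Type, subst: &Subst) -> Type {
    match ty {
        Type::Var(v) => match subst.get(v) {
            Some(bound) => resolve(bound, subst),
            None => ty.clone(),
        },
        Type::Fn(a, r) => Type::Fn(Box::new(resolve(a, subst)), Box::new(resolve(r, subst))),
        _ => ty.clone(),
    }
}

/// Does variable `v` appear inside `ty`? Binding it then would create an infinite type.
fn occurs(v: u32, ty: &Type) -> bool {
    match ty {
        Type::Var(w) => *w == v,
        Type::Fn(a, r) => occurs(v, a) || occurs(v, r),
        _ => false,
    }
}

/// Make `a` and `b` equal by binding inference variables, or report a mismatch.
pub fn unify(a: &Type, b: &Type, subst: &mut Subst) -> Result<(), String> {
    let (a, b) = (resolve(a, subst), resolve(b, subst));
    match (&a, &b) {
        _ if a == b => Ok(()),
        (Type::Var(v), other) | (other, Type::Var(v)) => {
            if occurs(*v, other) {
                return Err(format!("infinite type: ?{} occurs in {:?}", v, other));
            }
            subst.insert(*v, other.clone());
            Ok(())
        }
        (Type::Fn(a1, r1), Type::Fn(a2, r2)) => {
            unify(a1, a2, subst)?;
            unify(r1, r2, subst)
        }
        _ => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
    }
}
```

Unifying Fn(Var(0), Bool) with Fn(Int, Var(1)) binds variable 0 to Int and variable 1 to Bool; unifying Int with Bool is a type error. The checking step is then the requirement that every such constraint in the program unifies.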
Semantic Analysis in Rust: Rust-y Stuff
• A borrow checking pass enforces memory safety rules. See src/librustc/middle/borrowck/doc.rs for details.
• An effect checking pass ensures that unsafe operations occur in unsafe contexts. See src/librustc/middle/effect.rs
• A kind checking pass enforces special rules for built-in traits like Send and Drop. See src/librustc/middle/kind.rs
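As a small example of what the effect checking rule looks like from the user's side (present-day Rust; read_through_raw_pointer is a made-up function name): dereferencing a raw pointer is an unsafe operation, so it is only accepted inside an unsafe block.

```rust
pub fn read_through_raw_pointer(value: &i32) -> i32 {
    // Creating a raw pointer is allowed anywhere.
    let ptr: *const i32 = value;

    // Dereferencing it is an unsafe operation. Without the `unsafe` block
    // the compiler rejects this line.
    // It is sound here because `ptr` was just derived from a valid reference.
    unsafe { *ptr }
}
```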
Semantic Analysis in Rust: More Rust-y Stuff
• A compute moves pass determines whether the use of a value will result in a move in a given expression. Important for enforcing rules on non-copyable (“linear”) types. See src/librustc/middle/moves.rs
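To see why the compiler needs to know which uses are moves, compare a copyable type with a non-copyable one. This is a sketch in present-day Rust with made-up types (Point, Goat) and functions; the commented-out line is the kind of program the rules are there to reject.

```rust
// Copyable: passing a Point by value copies it, so the original stays usable.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Not copyable (it owns a String): passing a Goat by value moves it.
#[derive(Debug, PartialEq)]
pub struct Goat {
    pub name: String,
}

pub fn consume_point(p: Point) -> i32 {
    p.x + p.y
}

pub fn consume_goat(g: Goat) -> String {
    g.name
}

pub fn demo() -> (i32, i32, String) {
    let p = Point { x: 1, y: 2 };
    let first = consume_point(p);
    let second = consume_point(p); // fine: the first call only copied `p`

    let g = Goat {
        name: "billy".to_string(),
    };
    let name = consume_goat(g); // `g` is moved into the function here
    // consume_goat(g); // would not compile: use of moved value `g`

    (first, second, name)
}
```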
Code Generators
• Takes an AST as input, e.g. If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
• Emits some sort of target code, e.g. (some made up bytecode)
    LOAD goat_simulator
    CALL should_buy
    JMPIF if_stmt_body_addr
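A minimal sketch of a code generator that walks the toy AST and emits the made-up bytecode above. The Expr type and the emit function are hypothetical, the instruction names are the slide's invented ones, and the if body is skipped because the slide elides it too. A real generator would also need labels and jump targets.

```rust
// Hypothetical toy AST, mirroring the example on the slide.
#[derive(Debug)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
}

/// Append made-up bytecode for `expr` to `out`, one instruction per entry.
pub fn emit(expr: &Expr, out: &mut Vec<String>) -> Result<(), String> {
    match expr {
        Expr::Id(name) => {
            out.push(format!("LOAD {}", name));
            Ok(())
        }
        Expr::Call(callee, args) => {
            // Arguments are evaluated first, then the call itself.
            for arg in args {
                emit(arg, out)?;
            }
            match &**callee {
                Expr::Id(name) => {
                    out.push(format!("CALL {}", name));
                    Ok(())
                }
                other => Err(format!("only calls to plain names are supported, got {:?}", other)),
            }
        }
        Expr::If(cond, _body) => {
            emit(cond, out)?;
            // Jump to the body if the condition held. Laying out the body and
            // computing the real address is left out of this sketch.
            out.push("JMPIF if_stmt_body_addr".to_string());
            Ok(())
        }
    }
}
```

Running emit over If(Call(Id("should_buy"), [Id("goat_simulator")]), []) produces the three instructions from the slide, in the same order.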
Rust’s Code Generator
• First, Rust translates the analyzed, type-checked AST into an LLVM module. This is phase_4_translate_to_llvm.
• src/librustc/middle/trans/base.rs: trans_crate is a good place to start
Rust’s Code Generator (cont.)
• src/librustc/back/link.rs
• Passes are run over the LLVM module to write the target code to disk. This is phase_5_run_llvm_passes in driver.rs, which calls the appropriate stuff on rustc::back::link.
• We can tweak the output format using command line options: assembly code, LLVM bitcode files, object files, etc. See build_session_options and the OutputType* variants as used in driver.rs.
Rust’s Code Generator (cont.)
• If you’re trying to build a native executable, the previous step will produce object files...
• ... but LLVM won’t link our object files into a(n ELF/PE) binary. Linking is phase_6_link_output.
• Rust calls out to the system’s cc program to do the link step. See link_binary, link_natively and get_cc_prog in src/librustc/back/link.rs
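Roughly what "calling out to cc" amounts to, sketched with std::process::Command in present-day Rust. This is not the code in link.rs: link_command is a hypothetical helper, and the argument layout (object files, then -o and the output name) is an assumption based on the usual cc command line. Real link lines carry many more flags.

```rust
use std::process::Command;

/// Build (but do not run) a link command such as `cc main.o -o main`.
/// `cc_prog` stands in for whatever program the driver picks as the system C compiler.
pub fn link_command(cc_prog: &str, objects: &[&str], output: &str) -> Command {
    let mut cmd = Command::new(cc_prog);
    cmd.args(objects); // the object files produced by the previous step
    cmd.arg("-o").arg(output); // where the linked binary should go
    cmd
}
```

To actually link, call .status() on the returned Command and check that the exit status reports success. Building the command separately from running it also makes it easy to print the exact invocation when something goes wrong.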