BFTC: The BrainFuck TransCompiler

John Lekberg

March 19, 2015

Abstract

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Brainfuck takes this too far. It is a minimalist programming language that has only 8 commands. As such, it is fairly easy to create a TransCompiler for. BFTC and its sister program BFTCi are a TransCompiler and interpreter duo that allow for easy language extension. BrainFuck interpretation is supported, as well as BrainFuck compilation to BrainFuck (optimized), C and Python 3.

March 19, 2015 BFTC.nw 2

I would like to thank the following people for their contributions to this project...

Izaak Meckler
Stuart Kurtz
Mark Stoehr

Contents

1 The Commands
    1.1 Storing The Commands

2 Interpreting Symbols
    2.1 Commands That Do Nothing
    2.2 Commands That Don’t Need User Input
    2.3 Commands That Need User Input
    2.4 Wrapping it all up

3 Implementation of a Machine
    3.1 Using Two Lists

4 Optimizing the Code
    4.1 Optimizing ModPtr
    4.2 Optimizing ModCell
    4.3 Optimizing Output
    4.4 Optimizing Input
    4.5 Optimizing LoopNonZero
    4.6 Putting it all together

5 Parsing the Code
    5.1 Parsing ModPtr
    5.2 Parsing ModCell
    5.3 Parsing Output
    5.4 Parsing Input
    5.5 Parsing LoopNonZero
    5.6 Parsing a Symbol
    5.7 Parsing a Program
    5.8 Parsing from a character stream

6 BFTCi
    6.1 Getting the file name
    6.2 Reading the file
    6.3 Parsing the file
    6.4 Optimizing the program
    6.5 Running the optimized program
    6.6 Wrapping it up

7 Language Extensions
    7.1 Abstract Language
    7.2 Code
    7.3 BrainFuck
        7.3.1 Header
        7.3.2 Footer
        7.3.3 TranslateProgram
        7.3.4 BrainFuckLanguage.hs
        7.3.5 Example Output
    7.4 C-30000
        7.4.1 Limitations of C
        7.4.2 Header
        7.4.3 Footer
        7.4.4 TranslateProgram
        7.4.5 C30000Language.hs
        7.4.6 Example Output
    7.5 Python3
        7.5.1 Header
        7.5.2 Footer
        7.5.3 TranslateProgram
        7.5.4 Python3Language.hs
        7.5.5 Example Output
    7.6 Closing Thoughts

8 BFTC
    8.1 Getting the arguments
    8.2 Getting the language
    8.3 Compiling and outputing the code
    8.4 Wrapping it all up

9 Epilogue

1 The Commands

1.1 Storing The Commands

BrainFuck has 8 commands:

1. Incrementing the data pointer.

2. Decrementing the data pointer.

3. Incrementing the byte at the data pointer.

4. Decrementing the byte at the data pointer.

5. Outputting the byte at the data pointer.

6. Getting a byte of input and storing it at the data pointer.

7. Jump forward to right after the matching “Jump backward” if the current byte at the data pointer is zero.

8. Jump backward to right after the matching “Jump forward” if the current byte at the data pointer is nonzero.

Our initial instinct may be to create a set of 8 symbols:

5a 〈symbolInitial 5a〉≡
    data Symbol = Inc
                | Dec
                | IncByte
                | DecByte
                | Output
                | Input
                | JumpForwardIfZero
                | JumpBackwardIfNonZero

But, if we are clever, we can reduce this to 5 symbols:

5b 〈symbolClever 5b〉≡
    data Symbol = ModPtr Int
                | ModCell Int8
                | Output Word
                | Input Word
                | LoopNonZero Program

If the integer value associated with ModPtr is positive, then it moves the pointer to the right, and if it is negative, then it moves it to the left. If it is zero, then nothing happens.

The 8-bit value associated with ModCell tells us how much to add to the cell’s contents. Because Int8 is bounded on [−128, 127], addition and subtraction will wrap around to the appropriate bounds. For example:

    (250 : Int8) + (23 : Int8) = (17 : Int8)
    (128 : Int8) = (−128 : Int8)
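The wraparound arithmetic can be checked directly in GHC. The following is a minimal sketch; the helper name toInt8 is introduced here for illustration only and is not part of BFTC.

```haskell
import Data.Int (Int8)

-- Hypothetical helper: narrow an Int to 8 signed bits.
-- fromIntegral wraps modulo 256 instead of raising an error.
toInt8 :: Int -> Int8
toInt8 = fromIntegral

main :: IO ()
main = do
  print (toInt8 250)              -- -6
  print (toInt8 250 + toInt8 23)  -- 17
  print (toInt8 128)              -- -128
  print (maxBound + 1 :: Int8)    -- -128
```

Note that 250 itself is not a value of Int8: it is first wrapped to -6, and -6 + 23 gives the 17 shown above.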

The integer value associated with Output tells us how many times to output the current cell. Because it is of type Word, it is unsigned, and therefore always zero or positive.

Likewise, the integer value associated with Input tells us how many times to take the user’s input and store it in the current cell.

The last, and most complicated, symbol is LoopNonZero, which has a Program associated with it. Our first question may be, “What is a Program?” But let’s first examine our choice of a loop instead of jumps. In a well-formed BrainFuck program, every JumpForwardIfZero is balanced by a corresponding JumpBackwardIfNonZero, and such a pair has the same effect as a loop that runs while the current cell is nonzero.

As to what Program means, we will say that a program is a list of Symbols, so that:

5c 〈programType 5c〉≡
    type Program = [Symbol]
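To make the five-symbol representation concrete, here is a sketch of how a short BrainFuck fragment could be written down as a Program. The fragment "+++[>++<-]>." and the name example are chosen here for illustration; the type definitions are repeated so the block compiles on its own.

```haskell
import Data.Int (Int8)
import Data.Word (Word)

data Symbol
  = ModPtr Int
  | ModCell Int8
  | Output Word
  | Input Word
  | LoopNonZero Program
  deriving (Show, Eq)

type Program = [Symbol]

-- "+++[>++<-]>." with runs of identical commands merged by hand.
-- The bracket pair becomes a single LoopNonZero holding its body.
example :: Program
example =
  [ ModCell 3
  , LoopNonZero [ModPtr 1, ModCell 2, ModPtr (-1), ModCell (-1)]
  , ModPtr 1
  , Output 1
  ]

main :: IO ()
main = mapM_ print example
```

Twelve source characters become four top-level symbols, and the nesting of the loop is carried by the data structure rather than by matching jump targets.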

Now that we have developed a way to store the program commands, we must interpret them.

6a 〈Symbol.hs 6a〉≡
    module Symbol where

    import Data.Int
    import Data.Word

    〈symbolClever 5b〉
    〈programType 5c〉

2 Interpreting Symbols

What does interpreting a symbol entail? When we interpret a symbol, we transform the state of the machine. Thus we can represent an interpreted symbol as a transform that takes a machine state and produces a new machine state, with a possible side effect due to the input and output symbols.

6b 〈InterpretedSymbol 6b〉≡
    type InterpretedSymbol a = a -> IO a

We can write a function called interpret that takes a Symbol and returns an InterpretedSymbol. For us to program interpret, we must understand what a Machine can do.

Based on the BrainFuck specifications, a machine has a head that can move left or right, a byte stored in each cell, the ability to output the contents of the current cell, and the ability to get user input and store it in the current cell.

6c 〈Machine-definition 6c〉≡
    class Machine a where
      setCell :: Word8 -> a -> a
      getCell :: a -> Word8
      moveLeft :: a -> a
      moveRight :: a -> a

And we can package this up in a small file.

6d 〈Machine.hs 6d〉≡
    module Machine where

    import Data.Word

    〈Machine-definition 6c〉

We won’t worry about how the machine is implemented for now, but now we can describe how to interpret symbols. The interpreter function takes Symbols and maps them to InterpretedSymbols.

6e 〈interpretType 6e〉≡
    interpret :: (Machine a)
              => Symbol
              -> InterpretedSymbol a

Let’s start with the simplest class, symbols that have no effect on the interpreter.

2.1 Commands That Do Nothing

Some commands have no effect on the machine.

1. ModPtr 0

2. ModCell 0

3. Output 0

4. Input 0

Thus we can simply write:

7a 〈interpret-doNothing 7a〉≡
    interpret (ModPtr 0) = return
    interpret (ModCell 0) = return
    interpret (Output 0) = return
    interpret (Input 0) = return

2.2 Commands That Don’t Need User Input

For interpreting modding the cell, we can get the current value of the cell, add the new value to it, and store the result back in the cell.

7b 〈interpret-ModCell 7b〉≡
    interpret (ModCell n) = \m
      -> return
      $ setCell (fromIntegral n + getCell m)
                m

For modifying the position of the head, we can examine the number passed. If it is negative, then we move to the left and increment the number and call the function again. If the number is positive, we move to the right and decrement the number and call the function again.

7c 〈interpret-ModPtr 7c〉≡
    interpret (ModPtr n)
      | n < 0 = \m
        -> liftM moveLeft
        . interpret (ModPtr (n + 1))
        $ m
      | n > 0 = \m
        -> liftM moveRight
        . interpret (ModPtr (n - 1))
        $ m

To interpret the loop, we should first consider how we will interpret a program. If we just map interpret over the program, then the resultant type is [InterpretedSymbol a]. We can fold over this list with the >=> operator, which composes the monadic functions from left to right.

8a 〈interpretProgram 8a〉≡
    interpretProgram :: (Machine a)
                     => Program
                     -> InterpretedSymbol a
    interpretProgram = foldr (>=>) return
                     . map interpret
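The fold with >=> can be tried in isolation. The sketch below uses a plain Int as a stand-in for the machine state; Step, addStep and runAll are names introduced here for illustration and do not appear in BFTC.

```haskell
import Control.Monad ((>=>))

-- Stand-in for InterpretedSymbol, specialised to an Int "machine".
type Step = Int -> IO Int

-- Each step logs what it does and returns the new state.
addStep :: Int -> Step
addStep k n = do
  putStrLn ("add " ++ show k ++ " to " ++ show n)
  return (n + k)

-- Same shape as interpretProgram: compose the steps left to right,
-- with return as the identity for the empty list.
runAll :: [Step] -> Step
runAll = foldr (>=>) return

main :: IO ()
main = runAll [addStep 1, addStep 10, addStep 100] 0 >>= print
```

The steps run in list order, each receiving the state produced by the previous one, so the final state printed is 111.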

Now that we can interpret a Program, it becomes easy to interpret the loop symbol. If the value at the current cell is 0, then we return the machine unchanged; otherwise we run through the inner program and then interpret the loop again.

8b 〈interpret-LoopNonZero 8b〉≡
    interpret (LoopNonZero prgm) = \m
      -> (if (getCell m == 0)
            then return
            else interpretProgram prgm >=>
                 interpret (LoopNonZero prgm)) m

2.3 Commands That Need User Input

The commands that need user input are fairly simple to do. We will use some of Haskell’s built-in IO functions to display to the user and read input from the user.

When we output to the screen, we can simply replicate the current cell value n times, for the n stored in the symbol, and then map that list to characters and display the string. Thus

8c 〈interpret-DisplayCommand 8c〉≡
    interpret (Output n) = \m
      -> (putStr
          . map chr
          . replicate (fromIntegral n)
          . fromIntegral
          . getCell
          $ m)
      >> return m
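The pure part of this pipeline, everything except putStr and getCell, can be examined on its own. In this sketch the function name render is an assumption made for illustration; it is not defined in BFTC.

```haskell
import Data.Char (chr)
import Data.Word (Word, Word8)

-- Hypothetical helper: the string that (Output n) would print for a
-- cell holding the given byte.
render :: Word -> Word8 -> String
render n = map chr . replicate (fromIntegral n) . fromIntegral

main :: IO ()
main = do
  putStrLn (render 3 65)     -- AAA
  print (length (render 0 65)) -- 0
```

Because the cell is a Word8, the value passed to chr is always in the range 0 to 255, so chr cannot fail here.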

After we display the characters, we return the machine unchanged. However, for input, we will change the state of the machine. We could set the value of the cell every time, but let’s be smart about this. We only need to set the value of the cell on the last iteration, because that is the value that remains after the symbols have been executed.

9a 〈interpret-Input1 9a〉≡
    interpret (Input 1) = \m
      -> getChar >>= \c
      -> return
      . (`setCell` m)
      . fromIntegral
      . ord
      $ c

Since we have already covered n = 0, we can say that for any n ∉ {0, 1} we can take the character from input, but not use it for anything, and recursively call interpret for (Input (n - 1)).

9b 〈interpret-InputN 9b〉≡
    interpret (Input n) = \m
      -> getChar
      >> interpret (Input $ n - 1) m

2.4 Wrapping it all up

Now we can interpret all the BrainFuck commands.

9c 〈interpret 9c〉≡
    〈interpretType 6e〉
    〈interpret-doNothing 7a〉
    〈interpret-ModCell 7b〉
    〈interpret-ModPtr 7c〉
    〈interpret-LoopNonZero 8b〉
    〈interpret-DisplayCommand 8c〉
    〈interpret-Input1 9a〉
    〈interpret-InputN 9b〉

Let’s package our functions up in a little file.

10a 〈Interpret.hs 10a〉≡
    module Interpret where

    import Control.Monad
    import Data.Char
    import Symbol
    import Machine

    〈InterpretedSymbol 6b〉

    〈interpret 9c〉

    〈interpretProgram 8a〉

All that’s left to do is implement a machine.

3 Implementation of a Machine

3.1 Using Two Lists

Recalling our definition of the class of Machines, how shall we implement one? It seems an easy way to represent an infinite tape would be to have two infinite lists. One list could represent the tape extending to the left and the other could represent the tape extending to the right.

10b 〈tapeMachine-tapeType 10b〉≡
    data Tape = Tape [Word8] [Word8]

Since the BrainFuck specification says that each cell is to be initialized to 0, we can just generate two infinite lists of 0’s.

10c 〈tapeMachine-initialTape 10c〉≡
    initialTape :: Tape
    initialTape = Tape (repeat 0) (repeat 0)

For this to be an actual machine, it must implement the methods of class Machine. We can say that the first element of the right list is the current location of the head, and by this convention, setting the cell to a value just replaces the head of the right list.

10d 〈tapeMachine-setCell 10d〉≡
    setCell n (Tape ls (r:rs)) = (Tape ls (n:rs))

Similarly, getting the value of the tape head involves returning the head of the right list.

10e 〈tapeMachine-getCell 10e〉≡
    getCell (Tape _ (r:_)) = r

To move left, we take the head of the left list and “cons” it onto the right list.

11a 〈tapeMachine-moveLeft 11a〉≡
    moveLeft (Tape (l:ls) rs) = (Tape ls (l:rs))

And similarly, for moving right, we take the head of the right list and “cons” it onto the left list.

11b 〈tapeMachine-moveRight 11b〉≡
    moveRight (Tape ls (r:rs)) = (Tape (r:ls) rs)

This is a really simple way to represent the machine, and it’s a good representation for programs where the cells are densely used.

11c 〈tapeMachine 11c〉≡
    instance Machine Tape where
      〈tapeMachine-setCell 10d〉
      〈tapeMachine-getCell 10e〉
      〈tapeMachine-moveLeft 11a〉
      〈tapeMachine-moveRight 11b〉

And we can package this up in a nice little file.

11d 〈TapeMachine.hs 11d〉≡
    module TapeMachine where

    import Data.Word
    import Machine

    〈tapeMachine-tapeType 10b〉
    〈tapeMachine-initialTape 10c〉
    〈tapeMachine 11c〉
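The two-list tape can be exercised without the Machine class. The following standalone sketch restates the four operations as ordinary functions; the fallback clauses for finite lists are an addition made here so the functions are total, and they are never reached when starting from initialTape.

```haskell
import Data.Word (Word8)

-- Left cells (nearest first) and right cells; the head of the right
-- list is the cell under the tape head.
data Tape = Tape [Word8] [Word8]

initialTape :: Tape
initialTape = Tape (repeat 0) (repeat 0)

setCell :: Word8 -> Tape -> Tape
setCell n (Tape ls (_:rs)) = Tape ls (n:rs)
setCell _ t = t                -- not reached for infinite tapes

getCell :: Tape -> Word8
getCell (Tape _ (r:_)) = r
getCell _ = 0                  -- not reached for infinite tapes

moveLeft, moveRight :: Tape -> Tape
moveLeft (Tape (l:ls) rs) = Tape ls (l:rs)
moveLeft t = t
moveRight (Tape ls (r:rs)) = Tape (r:ls) rs
moveRight t = t

main :: IO ()
main = do
  let t1 = setCell 7 initialTape     -- write 7 at the start cell
      t2 = setCell 9 (moveRight t1)  -- write 9 one cell to the right
  print (getCell t2)                         -- 9
  print (getCell (moveLeft t2))              -- 7
  print (getCell (moveLeft (moveLeft t2)))   -- 0, an untouched cell
```

Every operation touches only the heads of the two lists, so each step costs constant time, and laziness means only the cells actually visited are ever materialised.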

4 Optimizing the Code

Recall our definition of Symbol from earlier. We changed it from our naive interpretation to a more concise, albeit more complex, definition. Why did we do that? If we want to optimize our code, now we can. Optimization transforms a program into a functionally equivalent program that is more efficient in some way, which lends itself to a simple definition of optimize.

11e 〈optimizeType 11e〉≡
    optimize :: Program -> Program

Optimizing an empty program results in just the empty program, because it is already maximally efficient.

11f 〈optimizeEmpty 11f〉≡
    optimize [] = []

Our default rule of optimization for any symbol is to return that symbol attached to an optimized version of the rest of the program.

12a 〈optimizeDefault 12a〉≡
    optimize (x:xs) = x : (optimize xs)

Now let’s consider optimizing each different type of Symbol.

4.1 Optimizing ModPtr

If we have two ModPtr symbols next to each other, we can combine them into one ModPtr symbol and store the sum of both the modification values in this new symbol. We will then place this new symbol onto the front of the program. To allow for more than two ModPtr symbols to be adjacent and be optimized, we will then recursively call optimize on the new program.

12b 〈optimize-ModPtr-sum 12b〉≡
    optimize ((ModPtr n):(ModPtr n'):program) =
      optimize $ (ModPtr (n + n')):program

If we encounter (ModPtr 0), then we can optimize it out of the program and return the optimization of the rest of the program.

12c 〈optimize-ModPtr-zero 12c〉≡
    optimize ((ModPtr 0):program) = optimize program

12d 〈optimize-ModPtr 12d〉≡
    〈optimize-ModPtr-sum 12b〉
    〈optimize-ModPtr-zero 12c〉

4.2 Optimizing ModCell

Like ModPtr, if we encounter two adjacent ModCell symbols, we replace them with a new ModCell symbol that contains the sum of the modification values from the original symbols. Then we can add this to the rest of the program. To allow for more than two ModCell symbols to be adjacent and be optimized, we will recursively call optimize on the new program.

12e 〈optimize-ModCell-sum 12e〉≡
    optimize ((ModCell n):(ModCell n'):program) =
      optimize $ (ModCell (n + n')):program

If we encounter (ModCell 0), then we can simply optimize it out of the program and return the optimization of the rest of the program.

12f 〈optimize-ModCell-zero 12f〉≡
    optimize ((ModCell 0):program) = optimize program

And if we encounter ModCell followed by Input, we can optimize ModCell out of the program as well, because the modified value will just be replaced with whatever Input provides. However, this optimization only works when the value associated with Input is nonzero, so we must check for that.

13a 〈optimize-ModCell-Input 13a〉≡
    optimize ((ModCell _):program@((Input n):_))
      | n /= 0 = optimize program

13b 〈optimize-ModCell 13b〉≡
    〈optimize-ModCell-sum 12e〉
    〈optimize-ModCell-zero 12f〉
    〈optimize-ModCell-Input 13a〉

4.3 Optimizing Output

Like ModPtr and ModCell above, we can replace two adjacent Output symbols with a new symbol that contains the sum of the values associated with each of the original symbols. We will optimize the resulting program as well, to allow for more than two adjacent Output symbols to be optimized.

13c 〈optimize-Output-sum 13c〉≡
    optimize ((Output n):(Output n'):program) =
      optimize $ (Output (n + n')):program

If we encounter (Output 0), then we can simply optimize it out of the program and return an optimization of the rest of the program.

13d 〈optimize-Output-zero 13d〉≡
    optimize ((Output 0):program) = optimize program

13e 〈optimize-Output 13e〉≡
    〈optimize-Output-sum 13c〉
    〈optimize-Output-zero 13d〉

4.4 Optimizing Input

Like the three preceding symbols, we can replace two adjacent Input symbols with a new symbol that contains the sum of the values associated with each of the original symbols. We will optimize the resulting program too, to allow for multiple adjacent Input symbols to be optimized.

13f 〈optimize-Input-sum 13f〉≡
    optimize ((Input n):(Input n'):program) =
      optimize $ (Input (n + n')):program

If we encounter (Input 0), then we can simply optimize it out of the program and return an optimization of the rest of the program.

13g 〈optimize-Input-zero 13g〉≡
    optimize ((Input 0):program) = optimize program

14a 〈optimize-Input 14a〉≡
    〈optimize-Input-sum 13f〉
    〈optimize-Input-zero 13g〉

4.5 Optimizing LoopNonZero

If we see two adjacent LoopNonZero symbols, then we can remove the second LoopNonZero. By its nature, when a LoopNonZero terminates, the value in the current cell is zero. We will then immediately reach the next LoopNonZero, but since we have guaranteed that the current cell is zero, we know that this loop will never be run. We will then optimize the resulting program, in case there are more than two adjacent loops.

14b 〈optimize-LoopNonZero-adjacent 14b〉≡
    optimize ((LoopNonZero prgm):(LoopNonZero _):program) =
      optimize $ (LoopNonZero prgm):program

If we see a LoopNonZero, we can optimize the program inside of it and then optimize the code that follows it.

14c 〈optimize-LoopNonZero-optimizeInner 14c〉≡
    optimize ((LoopNonZero prgm):program) =
      (LoopNonZero (optimize prgm)) : (optimize program)

If we see a LoopNonZero whose body consists solely of another LoopNonZero, we can remove the outer loop and optimize the resulting program. As above, when we leave a LoopNonZero, an invariant is that the current cell value is zero. Immediately after leaving the inner loop, we reach the test of the outer loop. But since we know that the value of the current cell is zero, the outer loop is guaranteed not to run again.

14d 〈optimize-LoopNonZero-nested 14d〉≡
    optimize ((LoopNonZero ((LoopNonZero prgm):[])):program) =
      optimize $ (LoopNonZero prgm):program

14e 〈optimize-LoopNonZero 14e〉≡
    〈optimize-LoopNonZero-adjacent 14b〉
    〈optimize-LoopNonZero-nested 14d〉
    〈optimize-LoopNonZero-optimizeInner 14c〉

4.6 Putting it all together

Now that we have considered all the symbols, we can put this function together.

15a 〈optimize 15a〉≡
    〈optimizeType 11e〉
    〈optimize-ModPtr 12d〉
    〈optimize-ModCell 13b〉
    〈optimize-Output 13e〉
    〈optimize-Input 14a〉
    〈optimize-LoopNonZero 14e〉
    〈optimizeEmpty 11f〉
    〈optimizeDefault 12a〉

And then we can package this up in a nice little file.

15b 〈Optimize.hs 15b〉≡
    module Optimize where

    import Symbol

    〈optimize 15a〉
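To see the rules working together, the sketch below restates them in one self-contained module, in the same clause order as the assembled chunk, and runs them on a small hand-written program. The names input and leftover are introduced here for illustration.

```haskell
import Data.Int (Int8)
import Data.Word (Word)

data Symbol
  = ModPtr Int
  | ModCell Int8
  | Output Word
  | Input Word
  | LoopNonZero [Symbol]
  deriving (Show, Eq)

-- Condensed restatement of the optimization rules; specific patterns
-- come first so they are tried before the default clause.
optimize :: [Symbol] -> [Symbol]
optimize (ModPtr n : ModPtr n' : p) = optimize (ModPtr (n + n') : p)
optimize (ModPtr 0 : p) = optimize p
optimize (ModCell n : ModCell n' : p) = optimize (ModCell (n + n') : p)
optimize (ModCell 0 : p) = optimize p
optimize (ModCell _ : p@(Input n : _)) | n /= 0 = optimize p
optimize (Output n : Output n' : p) = optimize (Output (n + n') : p)
optimize (Output 0 : p) = optimize p
optimize (Input n : Input n' : p) = optimize (Input (n + n') : p)
optimize (Input 0 : p) = optimize p
optimize (LoopNonZero q : LoopNonZero _ : p) = optimize (LoopNonZero q : p)
optimize (LoopNonZero [LoopNonZero q] : p) = optimize (LoopNonZero q : p)
optimize (LoopNonZero q : p) = LoopNonZero (optimize q) : optimize p
optimize [] = []
optimize (x : xs) = x : optimize xs

-- "+++-><..[[-]][.]" written one command per symbol.
input :: [Symbol]
input =
  [ ModCell 1, ModCell 1, ModCell 1, ModCell (-1)
  , ModPtr 1, ModPtr (-1)
  , Output 1, Output 1
  , LoopNonZero [LoopNonZero [ModCell (-1)]]
  , LoopNonZero [Output 1]
  ]

-- A case the single left-to-right pass does not fully reduce.
leftover :: [Symbol]
leftover = optimize [ModCell 1, ModPtr 0, ModCell 1]

main :: IO ()
main = do
  print (optimize input)  -- [ModCell 2,Output 2,LoopNonZero [ModCell (-1)]]
  print leftover          -- [ModCell 1,ModCell 1]
```

The second result shows a limitation of a single pass: removing ModPtr 0 makes two ModCell symbols adjacent only after the first one has already been emitted by the default rule. Applying optimize a second time merges them.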

5 Parsing the Code

Now that we have a working infrastructure, we need to figure out how to get code from a file to our screen! We will need to parse each symbol from a text file, so we must associate some string of text to each symbol. To do the parsing, we will use a library called Parsec.

5.1 Parsing ModPtr

Traditionally, BrainFuck has two symbols that modify the pointer. > moves the pointer to the right by 1 and < moves the pointer to the left by 1. We can parse ModPtr by checking if the current character is > or < and returning ModPtr 1 or ModPtr (-1) respectively.

15c 〈parse-ModPtr 15c〉≡
    parseModPtr = parseIncPtr <|> parseDecPtr where
      parseIncPtr = do
        char '>'
        return . ModPtr $ 1
      parseDecPtr = do
        char '<'
        return . ModPtr $ (-1)

5.2 Parsing ModCell

Similarly to ModPtr, ModCell is represented by 2 symbols in the BrainFuck language. + adds 1 to the current cell and - subtracts 1 from the current cell. We can parse ModCell by checking if the current character is + or - and return ModCell 1 or ModCell (-1) respectively.

16a 〈parse-ModCell 16a〉≡
    parseModCell = parseIncCell <|> parseDecCell where
      parseIncCell = do
        char '+'
        return . ModCell $ 1
      parseDecCell = do
        char '-'
        return . ModCell $ (-1)

5.3 Parsing Output

The Output symbol is represented by the character . (a period). To parse Output, we check if the current symbol is . and return Output 1.

16b 〈parse-Output 16b〉≡
    parseOutput = do
      char '.'
      return . Output $ 1

5.4 Parsing Input

Like Output, Input is represented by only one symbol: , (a comma). To parse Input, we check if the current symbol is , and return Input 1.

16c 〈parse-Input 16c〉≡
    parseInput = do
      char ','
      return . Input $ 1

5.5 Parsing LoopNonZero

Parsing LoopNonZero is going to be trickier than the other symbols, because LoopNonZero is composed of two symbols: [ and ]. We know that a Program exists between matching brackets, so to parse LoopNonZero, we can parse the program between the brackets and return that.

16d 〈parse-LoopNonZero 16d〉≡
    parseLoopNonZero = do
      prog <- between (char '[') (char ']') parseProgram
      return . LoopNonZero $ prog

5.6 Parsing a Symbol

Now we haven’t defined parsing a program, so we must do that. But first, let’s group all these parsing operations together into one big parseSymbol function that delegates to the individual symbol parsers. This is merely an aesthetic construction, to make understanding the code easier.

17a 〈parseSymbol 17a〉≡
    parseSymbol :: GenParser Char st Symbol
    parseSymbol = parseModPtr
              <|> parseModCell
              <|> parseOutput
              <|> parseInput
              <|> parseLoopNonZero where
      〈parse-ModPtr 15c〉
      〈parse-ModCell 16a〉
      〈parse-Output 16b〉
      〈parse-Input 16c〉
      〈parse-LoopNonZero 16d〉

5.7 Parsing a Program

Now that we’ve defined parsing symbols, how do we parse a BrainFuck program? Well, we can define a BrainFuck program as a stream of symbols.

17b 〈parseProgram 17b〉≡
    parseProgram :: GenParser Char st Program
    parseProgram = many parseSymbol

5.8 Parsing from a character stream

Now we can create a wrapper function that, given a string, returns either a Program or a ParseError. A ParseError occurs when “[” and “]” aren’t matched. To make sure we have a valid stream of characters, we will filter out all of the characters that aren’t BrainFuck.

17c 〈parseString 17c〉≡
    parseString :: String -> Either ParseError Program
    parseString = parse parseProgram "" . filter (`elem` "+-<>[].,")

Now let’s wrap all this up into a nice file.

18a 〈Parser.hs 18a〉≡
    module Parser where

    import Text.ParserCombinators.Parsec
    import Text.ParserCombinators.Parsec.Combinator
    import Text.ParserCombinators.Parsec.Char
    import Symbol

    〈parseString 17c〉
    〈parseSymbol 17a〉
    〈parseProgram 17b〉
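For readers who want to experiment without installing Parsec, the same recursive structure can be sketched with plain pattern matching. This is not the parser used by BFTC: it swaps Parsec for a hand-written recursive descent, reports errors as plain strings instead of ParseError, and the helper names simple and parseMany are assumptions made for this example.

```haskell
import Data.Int (Int8)
import Data.Word (Word)

data Symbol
  = ModPtr Int
  | ModCell Int8
  | Output Word
  | Input Word
  | LoopNonZero [Symbol]
  deriving (Show, Eq)

-- Map a single non-bracket command to its symbol; anything else is
-- treated as a comment character.
simple :: Char -> Maybe Symbol
simple '>' = Just (ModPtr 1)
simple '<' = Just (ModPtr (-1))
simple '+' = Just (ModCell 1)
simple '-' = Just (ModCell (-1))
simple '.' = Just (Output 1)
simple ',' = Just (Input 1)
simple _   = Nothing

-- Parse symbols until the input ends or a ']' is reached; return the
-- symbols together with the unconsumed remainder.
parseMany :: String -> Either String ([Symbol], String)
parseMany [] = Right ([], [])
parseMany s@(']' : _) = Right ([], s)
parseMany ('[' : rest) = do
  (body, rest') <- parseMany rest
  case rest' of
    ']' : rest'' -> do
      (syms, final) <- parseMany rest''
      Right (LoopNonZero body : syms, final)
    _ -> Left "unmatched '['"
parseMany (c : rest) = do
  (syms, final) <- parseMany rest
  Right (maybe syms (: syms) (simple c), final)

-- A leftover ']' at the top level means it had no opening bracket.
parseString :: String -> Either String [Symbol]
parseString src = do
  (syms, rest) <- parseMany src
  if null rest then Right syms else Left "unmatched ']'"

main :: IO ()
main = do
  print (parseString "+[>.<-]")
  print (parseString "[")
  print (parseString "]")
```

The loop case mirrors between (char '[') (char ']') parseProgram: parse a program, then insist on the closing bracket. Skipping unknown characters inside parseMany plays the role of the filter in parseString.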

6 BFTCi

Now that we have created the framework for building an interpreter, let’s create the interpreter itself. The business of the interpreter is to load a file, parse it, optimize it, and then run it. I’ve chosen to use the Tape implementation of Machine because that is the only version I have working right now.

6.1 Getting the file name

The filename will be passed as a command line argument, so we can get it with the getArgs command, which returns a list of the command line arguments (excluding the program name). We should also complain if the user doesn’t supply exactly 1 argument, because 0 arguments indicates that no file name was supplied and more than 1 argument indicates that there is extraneous information in addition to the file name. Both of these are situations that we don’t want.

18b 〈BFTCi-getFileName 18b〉≡
    args <- getArgs
    case args of
      [fileName] -> do
        〈BFTCi-readFile 18c〉
        〈BFTCi-parseFile 19a〉
      _ -> do
        putStrLn "Error: Supply only input file name."
        return ()

6.2 Reading the file

Given a filename, we can read its contents using the readFile function.

18c 〈BFTCi-readFile 18c〉≡
    fileContents <- readFile fileName

6.3 Parsing the file

To parse the contents, which is simply a String, we can run our parseString function on it. Because the return value of parseString is either a failure or the program, we must check which one we got. If it is a failure, then we print it and stop. Otherwise, we continue on and optimize the program and then run it.

19a 〈BFTCi-parseFile 19a〉≡
    case parseString fileContents of
      (Left errorMsg) -> print errorMsg
      (Right program ) -> do
        〈BFTCi-optimizeProgram 19b〉
        〈BFTCi-runProgram 19c〉
        return ()

6.4 Optimizing the program

Optimizing the program is as simple as passing it through our optimize function.

19b 〈BFTCi-optimizeProgram 19b〉≡
    let program' = optimize program

6.5 Running the optimized program

To run the optimized program, we must first instantiate a machine to run it on and then we just pass it through our interpretProgram function.

19c 〈BFTCi-runProgram 19c〉≡
    interpretProgram program' initialTape

6.6 Wrapping it up

We can wrap this up into a function called main, which is the main function of our interpreter.

19d 〈BFTCi-main 19d〉≡
    main :: IO ()
    main = do
      〈BFTCi-getFileName 18b〉

And we can wrap this up in a nice little file.

20a 〈BFTCi.hs 20a〉≡
    module Main where

    import System.Environment
    import Symbol
    import Interpret
    import Parser
    import Optimize
    import TapeMachine

    〈BFTCi-main 19d〉

7 Language Extensions

Now that we’ve completed the first half of BFTC, the interpreter called BFTCi, we can focus on the TransCompiler. To do this, we need languages to compile to. Before we implement TransCompilation for a specific language, we’ll look at the abstract properties that a language should embody.

7.1 Abstract Language

An abstract language has a header section, which holds the code that sets up the appropriate environment.

20b 〈AbstractLanguage-header 20b〉≡
    header :: language -> Code

Likewise, an abstract language also has a footer section, which holds the code that tears down the environment.

20c 〈AbstractLanguage-footer 20c〉≡
    footer :: language -> Code

And of course, an abstract language needs to transform a program into code. We can make a function called translateProgram that is given a Program and returns some code. For every language, translating the empty program produces code with no lines (emptyCode, or emptyCode with an adjusted indentation level). This is the base case for the recursive calls to translateProgram.

20d 〈AbstractLanguage-translateProgram 20d〉≡
    translateProgram :: language -> Program -> Code


Compiling a program into an abstract language simply means writing the header, translating the program and then writing the footer code.

21a 〈AbstractLanguage-compile 21a〉≡
    compile :: language -> Program -> String
    compile language program = codeToString . sequenceCode $
        [
            header language,
            translateProgram language program,
            footer language
        ]

So, these are all the properties of an abstract language. We can encapsulate them in a typeclass.

21b 〈AbstractLanguage 21b〉≡
    class AbstractLanguage language where
        〈AbstractLanguage-header 20b〉
        〈AbstractLanguage-footer 20c〉
        〈AbstractLanguage-translateProgram 20d〉
        〈AbstractLanguage-compile 21a〉

And we can package this up into a nice little file.

21c 〈AbstractLanguage.hs 21c〉≡
    module AbstractLanguage where
    import Code
    import Symbol
    〈AbstractLanguage 21b〉
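To see how the pieces of the typeclass fit together, here is a minimal, self-contained sketch with a toy instance. The Names language, the stand-in Symbol type (with Int fields) and the 2-space indent are assumptions made only for this illustration; Code, sequenceCode and codeToString are copied from the next section so the sketch compiles on its own.

```haskell
-- Stand-ins so the sketch compiles alone; the field types are assumptions.
data Symbol = ModPtr Int | ModCell Int | Output Int | Input Int
            | LoopNonZero [Symbol]
type Program = [Symbol]
type Code = ([String], Int)

sequenceCode :: [Code] -> Code
sequenceCode []     = ([], 0)
sequenceCode [code] = code
sequenceCode ((code, indent) : (code', indent') : codes) =
  sequenceCode $ (code ++ code'', indent + indent') : codes
  where code'' = map (replicate indent ' ' ++) code'

codeToString :: Code -> String
codeToString (code, _) = unlines code

-- The typeclass, with compile supplied as a default method.
class AbstractLanguage language where
  header           :: language -> Code
  footer           :: language -> Code
  translateProgram :: language -> Program -> Code
  compile          :: language -> Program -> String
  compile language program = codeToString . sequenceCode $
    [ header language
    , translateProgram language program
    , footer language
    ]

-- Hypothetical toy language: prints one symbol name per line
-- between "begin" and "end".
data Names = Names

instance AbstractLanguage Names where
  header Names = (["begin"], 2)
  footer Names = (["end"], 0)
  -- The base case undoes the indent that the header introduced.
  translateProgram Names [] = ([], -2)
  translateProgram Names (symbol : program) =
    sequenceCode [([name symbol], 0), translateProgram Names program]
    where
      name (ModPtr _)      = "ModPtr"
      name (ModCell _)     = "ModCell"
      name (Output _)      = "Output"
      name (Input _)       = "Input"
      name (LoopNonZero _) = "LoopNonZero"
```

With these definitions, compile Names [ModPtr 1, Output 2] evaluates to "begin\n  ModPtr\n  Output\nend\n". The instance only supplies header, footer and translateProgram; compile comes from the class default.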

7.2 Code

We used the Code type in the previous section with an intuitive understanding of what it is, but not a rigorous definition. Now we will give a rigorous definition. Code allows us to output lines and change the indentation level.

21d 〈Code 21d〉≡
    type Code = ([String], Int)

Our code starts with no lines and an indentation level of 0, so we can create a function to initialize empty code.

21e 〈emptyCode 21e〉≡
    emptyCode :: Code
    emptyCode = ([], 0)

The list of strings represents the list of lines to output and the Int represents the indentation level in spaces. Now we need some way to act on Code to add lines or change the indentation level, so let’s create a type called Command.

21f 〈Command 21f〉≡
    data Command = Line String
                 | ChangeIndent Int


We can make a function called runCommand that, given a command and some code, produces a new piece of code.

22a 〈runCommand-type 22a〉≡
    runCommand :: Command -> Code -> Code

When the ChangeIndent command is passed, the value associated with it is added to the indentation level of the code and the sum replaces the old value.

22b 〈runCommand-ChangeIndent 22b〉≡
    runCommand (ChangeIndent n') (codeLines, n) =
        (codeLines, n + n')

When the Line command is passed, we prefix the string with as many spaces as the current indentation level and append the result to the list of lines.

22c 〈runCommand-Line 22c〉≡
    runCommand (Line ln) (codeLines, n) =
        (codeLines ++ [replicate n ' ' ++ ln], n)

22d 〈runCommand 22d〉≡
    〈runCommand-type 22a〉
    〈runCommand-ChangeIndent 22b〉
    〈runCommand-Line 22c〉

We can also make a helper function to run a list of commands, called runCommands. It takes a list of commands and composes all their effects into one big effect, applying the commands in list order.

22e 〈runCommands 22e〉≡
    runCommands :: [Command] -> Code -> Code
    runCommands = foldr (.) id
                . map runCommand
                . reverse
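The order in which runCommands applies its commands is easy to get backwards, so here is a small self-contained sketch that traces it. The definitions are copied from the chunks above; the example value and the C-like lines in it are made up for illustration.

```haskell
type Code = ([String], Int)

data Command = Line String
             | ChangeIndent Int

runCommand :: Command -> Code -> Code
runCommand (ChangeIndent n') (codeLines, n) = (codeLines, n + n')
runCommand (Line ln) (codeLines, n) =
  (codeLines ++ [replicate n ' ' ++ ln], n)

-- After the reversal, the first command in the list ends up innermost
-- in the composition, so it is the first one applied to the Code.
runCommands :: [Command] -> Code -> Code
runCommands = foldr (.) id . map runCommand . reverse

-- Hypothetical example: open a block, emit one indented line, close it.
example :: Code
example = runCommands
  [ Line "while (*head) {"
  , ChangeIndent 4
  , Line "*head += 1;"
  , ChangeIndent (-4)
  , Line "}"
  ] ([], 0)
```

Here example evaluates to (["while (*head) {", "    *head += 1;", "}"], 0): each Line picks up whatever indentation level the preceding ChangeIndent commands left behind.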

And last but not least, we need some way of sequencing code chunks.

22f 〈sequenceCode-type 22f〉≡
    sequenceCode :: [Code] -> Code

If we get an empty list, then we just return emptyCode.

22g 〈sequenceCode-empty 22g〉≡
    sequenceCode [] = emptyCode

If we have one element in the list, then we return that element.

22h 〈sequenceCode-singleton 22h〉≡
    sequenceCode [code] = code


We’ll say that for two code chunks in sequence, we take the indentation level of the first, apply it to the lines of the second chunk and then return a new chunk with both chunks’ lines appended and a new indentation level, which is the sum of the indentation levels of the individual chunks.

23a 〈sequenceCode-list 23a〉≡
    sequenceCode ((code, indent):(code', indent'):codes) =
        sequenceCode $ (code ++ code'', indent + indent'):codes
        where code'' = map (replicate indent ' ' ++) code'

This completes our examination of sequenceCode.

23b 〈sequenceCode 23b〉≡
    〈sequenceCode-type 22f〉
    〈sequenceCode-empty 22g〉
    〈sequenceCode-singleton 22h〉
    〈sequenceCode-list 23a〉

Also, at some point, we will want to extract a string containing the program. We can do this by calling unlines on the list of lines in the Code tuple.

23c 〈codeToString 23c〉≡
    codeToString (code, _) = unlines code
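A short worked example may help show how the indentation level travels from one chunk to the next. The sketch below copies sequenceCode and codeToString from above; the chunks opener, body and closer are hypothetical values chosen for the illustration.

```haskell
type Code = ([String], Int)

emptyCode :: Code
emptyCode = ([], 0)

sequenceCode :: [Code] -> Code
sequenceCode []     = emptyCode
sequenceCode [code] = code
sequenceCode ((code, indent) : (code', indent') : codes) =
  sequenceCode $ (code ++ code'', indent + indent') : codes
  where code'' = map (replicate indent ' ' ++) code'

codeToString :: Code -> String
codeToString (code, _) = unlines code

-- Hypothetical chunks: an opener that raises the indent by 4,
-- a body line, and a closer.
opener, body, closer :: Code
opener = (["while (*head) {"], 4)
body   = (["*head += 5;"], 0)
closer = (["}"], 0)

-- The empty chunk with indent -4 cancels the opener's indent
-- before the closer is appended.
nested :: String
nested = codeToString (sequenceCode [opener, body, ([], -4), closer])

-- replicate with a negative count yields "", so a chunk that follows
-- a negative accumulated indent is simply left unindented.
afterNegative :: Code
afterNegative = sequenceCode [([], -4), (["x"], 0)]
```

Here nested is "while (*head) {\n    *head += 5;\n}\n" and afterNegative is (["x"], -4). The second result matters later: it is why an accumulated indentation level below zero does not break the output.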

We can wrap all our functions up into a neat little file.

23d 〈Code.hs 23d〉≡
    module Code where
    〈Code 21d〉
    〈emptyCode 21e〉
    〈Command 21f〉
    〈runCommand 22d〉
    〈runCommands 22e〉
    〈sequenceCode 23b〉
    〈codeToString 23c〉

Now we have the proper structure to generate code in different languages.

7.3 BrainFuck

Our first extension language will be BrainFuck itself. Because our code is passed through the optimizer, this allows us to output optimized BrainFuck code.

7.3.1 Header

There is no need for a header, so we can simply have it return emptyCode.

23e 〈BrainFuck-header 23e〉≡
    header BrainFuck = emptyCode


7.3.2 Footer

Likewise, there is no need for a footer, so we can simply have it return emptyCode.

24a 〈BrainFuck-footer 24a〉≡
    footer BrainFuck = emptyCode

7.3.3 TranslateProgram

Now we have to deal with translating various Symbols into BrainFuck code. Let’s look at each symbol individually and output the corresponding characters.

ModPtr When we encounter ModPtr, we can simply output the character > n times, where n is the value associated with ModPtr and n is positive. If n is negative, then we output the character < −n times.

24b 〈BrainFuck-translateProgram-ModPtr 24b〉≡
    translateProgram BrainFuck ((ModPtr n):program) =
        sequenceCode [
            ([replicate n' modChar], 0),
            translateProgram BrainFuck program
        ]
        where n' = abs n
              modChar = if n < 0 then '<' else '>'

ModCell Similarly, if we encounter ModCell, we can output the character + n times, where n is the value associated with ModCell and n is positive. If n is negative, then we output the character - −n times.

24c 〈BrainFuck-translateProgram-ModCell 24c〉≡
    translateProgram BrainFuck ((ModCell n):program) =
        sequenceCode [
            ([replicate n' modChar], 0),
            translateProgram BrainFuck program
        ]
        where n' = abs . fromIntegral $ n
              modChar = if n < 0 then '-' else '+'
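The ModPtr and ModCell cases share one idea: turn a signed count into a run of one of two characters. As a minimal sketch of that idea in isolation (renderRun is a hypothetical helper, not part of BFTC, and it assumes the count has already been converted to an Int):

```haskell
-- Render a signed count as a run of characters: `positive` repeated n
-- times when n >= 0, otherwise `negative` repeated (abs n) times.
renderRun :: Char -> Char -> Int -> String
renderRun positive negative n =
  replicate (abs n) (if n < 0 then negative else positive)

-- The two BrainFuck cases expressed through the helper.
renderModPtr, renderModCell :: Int -> String
renderModPtr  = renderRun '>' '<'
renderModCell = renderRun '+' '-'
```

For instance, renderModPtr 3 is ">>>", renderModPtr (-2) is "<<", and renderModCell 0 is the empty string.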

Output When we encounter Output, we output the character . n times, where n is the value associated with Output.

24d 〈BrainFuck-translateProgram-Output 24d〉≡
    translateProgram BrainFuck ((Output n):program) =
        sequenceCode [
            ([replicate n' '.'], 0),
            translateProgram BrainFuck program
        ]
        where n' = fromIntegral n


Input When we encounter Input, we will output the character , n times, where n is the value associated with Input.

25a 〈BrainFuck-translateProgram-Input 25a〉≡
    translateProgram BrainFuck ((Input n):program) =
        sequenceCode [
            ([replicate n' ','], 0),
            translateProgram BrainFuck program
        ]
        where n' = fromIntegral n

LoopNonZero When we encounter LoopNonZero we can output [, indent our code, recursively translate the nested program, then unindent our code and output ]. I am going to use an indent spacing of 1 space, but this choice is arbitrary.

25b 〈BrainFuck-translateProgram-LoopNonZero 25b〉≡
    translateProgram BrainFuck ((LoopNonZero prgm):program) =
        sequenceCode [
            runCommands [
                ChangeIndent (-1),
                Line "]"
            ] $ sequenceCode [
                (["["], 1),
                translateProgram BrainFuck prgm
            ],
            translateProgram BrainFuck program
        ]

Wrapping it up We also want to tackle the case where the empty program is passed. If this happens then we just return emptyCode.

25c 〈BrainFuck-translateProgram-emptyProgram 25c〉≡
    translateProgram BrainFuck [] = emptyCode

Now we can wrap this up into the translateProgram function.

25d 〈BrainFuck-translateProgram 25d〉≡
    〈BrainFuck-translateProgram-ModPtr 24b〉
    〈BrainFuck-translateProgram-ModCell 24c〉
    〈BrainFuck-translateProgram-Output 24d〉
    〈BrainFuck-translateProgram-Input 25a〉
    〈BrainFuck-translateProgram-LoopNonZero 25b〉
    〈BrainFuck-translateProgram-emptyProgram 25c〉


7.3.4 BrainFuckLanguage.hs

Now that we have implemented all the functions required of a language extension, we can implement a dummy type for BrainFuck, because we don’t need to store any extra data.

26a 〈BrainFuck-type 26a〉≡
    data BrainFuck = BrainFuck

And now we can create an instance of AbstractLanguage using BrainFuck, by simply supplying our functions.

26b 〈BrainFuck-AbstractLanguage 26b〉≡
    instance AbstractLanguage BrainFuck where
        〈BrainFuck-header 23e〉
        〈BrainFuck-footer 24a〉
        〈BrainFuck-translateProgram 25d〉

Now we can package this up into a nice little file.

26c 〈BrainFuckLanguage.hs 26c〉≡
    module BrainFuckLanguage where
    import AbstractLanguage
    import Symbol
    import Code
    〈BrainFuck-type 26a〉
    〈BrainFuck-AbstractLanguage 26b〉


7.3.5 Example Output

Using this example program:

    prgm = [
        LoopNonZero [
            LoopNonZero [
                ModPtr 3, ModPtr (-3), ModCell 5
            ],
            Output 3
        ],
        ModCell 20,
        Input 1
        ]

If we compile it we get the following output.

    [
     [
      >>>
      <<<
      +++++
     ]
     ...
    ]
    ++++++++++++++++++++
    ,

We can also test our optimizer. If we compile prgm after it has been optimized, we get:

    [
     [
      +++++
     ]
     ...
    ]
    ,

7.4 C-30000

Now we will move on to something slightly more complicated: compiling our BrainFuck code to C code. There are some limitations to be aware of.

7.4.1 Limitations of C

Unlike Haskell, C doesn’t easily let us deal with infinitely large tapes. For this implementation, we will use the de facto standard of an array of 30,000 bytes. We will also use the convention that the pointer starts at the leftmost element of the array, and thus negative indices are not allowed.

7.4.2 Header

To set up the proper environment, we will need to include a few library files. Here is the code that does that.

28a 〈C-30000-header-includes 28a〉≡
    includes = ([
        "#include <stdio.h>",
        "#include <stdlib.h>",
        "#include <stdint.h>"
        ], 0)

We include stdio.h so that we may use IO functions to output characters to the screen and read characters from input. We include stdlib.h so that we may allocate the array in memory, and stdint.h for the uint8_t type. Now we need to begin our main function, so let’s write that out.

28b 〈C-30000-header-main 28b〉≡
    main = (["int main(int argc, char **argv) {"], 4)

Note that I am using 4 spaces of indentation for C-30000. This is an arbitrary choice. Now that we’ve begun the function definition, we have to set up the environment. This involves creating the array, initializing the pointer, and setting up the input and output streams.

28c 〈C-30000-header-setup 28c〉≡
    setup = ([
        "FILE *instream = stdin;",
        "FILE *outstream = stdout;",
        "uint8_t *memory = calloc(30000, sizeof(uint8_t));",
        "uint8_t *head = memory;"
        ], 0)

We can combine these together to make our whole “header” portion of the program.

28d 〈C-30000-header 28d〉≡
    header C30000 =
        sequenceCode [
            includes,
            main,
            setup
        ] where
            〈C-30000-header-includes 28a〉
            〈C-30000-header-main 28b〉
            〈C-30000-header-setup 28c〉


7.4.3 Footer

Tearing down the environment will be similar. All we need to do is free the allocated array and end the function.

29a 〈C-30000-footer 29a〉≡
    footer C30000 =
        ([
            "    free((void*)memory);",
            "}"
        ], (-4))

7.4.4 TranslateProgram

As with BrainFuck, it will be pretty simple to translate programs, because there is a nice correspondence between BrainFuck symbols and C code.

ModPtr To modify the position of the head, we just add the value associated with ModPtr to the current address of head. This will change the address of the pointer appropriately.

29b 〈C-30000-translateProgram-ModPtr 29b〉≡
    translateProgram C30000 ((ModPtr n):program) =
        sequenceCode [
            (updateHead, 0),
            translateProgram C30000 program
        ] where
            updateHead = ["head += " ++ show n ++ ";"]

ModCell To modify the value of the current cell, we can just add the value associated with ModCell to the value of the current cell. Because the cell is of type uint8_t, overflow is accounted for: the value wraps around modulo 256.

29c 〈C-30000-translateProgram-ModCell 29c〉≡
    translateProgram C30000 ((ModCell n):program) =
        sequenceCode [
            (updateCell, 0),
            translateProgram C30000 program
        ] where
            updateCell = ["*head += " ++ show n ++ ";"]


Output For output, we simply write the current character to the output stream. We will do this by generating an fprintf statement whose format string consists of n %c’s, where n is the number associated with Output.

30a 〈C-30000-translateProgram-Output 30a〉≡
    translateProgram C30000 ((Output n):program) =
        sequenceCode [
            (printCell, 0),
            translateProgram C30000 program
        ] where
            printCell = ["fprintf(outstream, \""
                         ++ formatString
                         ++ "\","
                         ++ formatData
                         ++ ");"]
            formatString = concat
                         . replicate n'
                         $ "%c"
            formatData = intercalate ", "
                       . replicate n'
                       $ "(char)(*head)"
            n' = fromIntegral n
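To make the shape of the generated statement concrete, here is the string-building part of the chunk above pulled out into a stand-alone function. The name printCellLine and the plain Int argument are assumptions made for this sketch; the string pieces are the same ones used above.

```haskell
import Data.List (intercalate)

-- Build the fprintf statement that prints the current cell n times.
printCellLine :: Int -> String
printCellLine n =
  "fprintf(outstream, \"" ++ formatString ++ "\"," ++ formatData ++ ");"
  where
    -- One %c per repetition ...
    formatString = concat (replicate n "%c")
    -- ... and one matching argument per %c.
    formatData   = intercalate ", " (replicate n "(char)(*head)")
```

For n = 3 this yields fprintf(outstream, "%c%c%c",(char)(*head), (char)(*head), (char)(*head)); which is the line that shows up in the example output. For n = 0 the argument list would be empty and the statement would end in ",);" which is not valid C, so the translation relies on the count being at least 1.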

Input For Input, we can use a for loop to read and discard a character of input n − 1 times, and then we can read another character of input and store it in the cell at head. It is safe to assume that n ≠ 0 if the data comes from user input, because there is no way for the parser to generate Input 0.

30b 〈C-30000-translateProgram-Input 30b〉≡
    translateProgram C30000 ((Input n):program) =
        sequenceCode [
            (forLoop, 0),
            (getInput, 0),
            translateProgram C30000 program
        ] where
            forLoop = if n == 1
                then []
                else [
                    "for (int i = 0; i != "
                        ++ show (n - 1)
                        ++ "; ++i) {",
                    "    fgetc(instream);",
                    "}"
                ]
            getInput = ["*head = (uint8_t)fgetc(instream);"]
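The example program later in this section only uses Input 1, so it may help to see what the translation emits for a larger count. The sketch below isolates the line-building logic; inputLines is a hypothetical name, the count is assumed to be an Int, and the 4-space indent inside the loop is an assumption that matches the C-30000 indent width.

```haskell
-- Lines emitted for Input n: skip n - 1 characters, then store one.
inputLines :: Int -> [String]
inputLines n = skipLoop ++ ["*head = (uint8_t)fgetc(instream);"]
  where
    skipLoop
      | n == 1    = []   -- nothing to discard
      | otherwise =
          [ "for (int i = 0; i != " ++ show (n - 1) ++ "; ++i) {"
          , "    fgetc(instream);"
          , "}"
          ]
```

So inputLines 1 is just the single assignment, while inputLines 3 produces a loop that discards two characters followed by the assignment that keeps the third.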


LoopNonZero Implementing LoopNonZero is nice in C, because it corresponds directly to a while loop. We can generate a while loop that runs while the value of the cell at the head is not zero. Because 0 is equivalent to false in C, we don’t need to include the condition != 0.

31a 〈C-30000-translateProgram-LoopNonZero 31a〉≡
    translateProgram C30000 ((LoopNonZero prgm):program) =
        sequenceCode [
            runCommands [
                ChangeIndent (-4),
                Line "}"
            ] . sequenceCode $ [
                (["while (*head) {"], 4),
                translateProgram C30000 prgm
            ],
            translateProgram C30000 program
        ]

Wrapping it up The last case we want to tackle is if the program passed is the empty program. If this happens then we return emptyCode with the indent shifted 4 to the left. The indent is shifted to return to the original indentation before the code block.

31b 〈C-30000-translateProgram-emptyProgram 31b〉≡
    translateProgram C30000 [] = ([], (-4))

Now we can wrap this up into the translateProgram function.

31c 〈C-30000-translateProgram 31c〉≡
    〈C-30000-translateProgram-ModPtr 29b〉
    〈C-30000-translateProgram-ModCell 29c〉
    〈C-30000-translateProgram-Output 30a〉
    〈C-30000-translateProgram-Input 30b〉
    〈C-30000-translateProgram-LoopNonZero 31a〉
    〈C-30000-translateProgram-emptyProgram 31b〉

7.4.5 C30000Language.hs

Now that we have implemented all the language translation functions, we can implement a dummy type for C30000, because we don’t need to store any extra data.

31d 〈C-30000-type 31d〉≡
    data C30000 = C30000


And now we create an instance of AbstractLanguage using C30000, by supplying our functions.

32a 〈C-30000-AbstractLanguage 32a〉≡
    instance AbstractLanguage C30000 where
        〈C-30000-header 28d〉
        〈C-30000-footer 29a〉
        〈C-30000-translateProgram 31c〉

And now we can package this up into a nice little file.

32b 〈C30000Language.hs 32b〉≡
    module C30000Language where
    import AbstractLanguage
    import Symbol
    import Code
    import Data.List
    〈C-30000-type 31d〉
    〈C-30000-AbstractLanguage 32a〉


7.4.6 Example Output

Using this example program:

    prgm = [
        LoopNonZero [
            LoopNonZero [
                ModPtr 3, ModPtr (-3), ModCell 5
            ],
            Output 3
        ],
        ModCell 20,
        Input 1
        ]

We get the following output when we compile it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    int main(int argc, char **argv) {
        FILE *instream = stdin;
        FILE *outstream = stdout;
        uint8_t *memory = calloc(30000, sizeof(uint8_t));
        uint8_t *head = memory;
        while (*head) {
            while (*head) {
                head += 3;
                head += -3;
                *head += 5;
            }
            fprintf(outstream, "%c%c%c",(char)(*head), (char)(*head), (char)(*head));
        }
        *head += 20;
        *head = (uint8_t)fgetc(instream);
        free((void*)memory);
    }

If we compile the optimized code, we get this.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    int main(int argc, char **argv) {
        FILE *instream = stdin;
        FILE *outstream = stdout;
        uint8_t *memory = calloc(30000, sizeof(uint8_t));
        uint8_t *head = memory;
        while (*head) {
            while (*head) {
                *head += 5;
            }
            fprintf(outstream, "%c%c%c",(char)(*head), (char)(*head), (char)(*head));
        }
        *head = (uint8_t)fgetc(instream);
        free((void*)memory);
    }

7.5 Python3

The next language that we will extend BrainFuck compilation to is Python3. This language has a special spot in BFTC’s history, because it spurred me to allow pretty-printed code to be output. A common tactic in BrainFuck compilers is to use regex substitution to replace BrainFuck symbols with the source code that they correspond to. The problem with this is that it only works for languages that allow for arbitrary whitespace. This presents a problem when compiling to Python3, because Python3 depends on whitespace. Arbitrary whitespace can change the meaning of the program, so we must take it into account.

7.5.1 Header

To simulate the BrainFuck machine, we can make a dictionary that maps locations to cell values. This may be less efficient than using lists, but it also gives us a different way of looking at this problem.

34a 〈Python3-header-dictionary 34a〉≡
    setupMemory = (["memory = {}"], 0)

We will also need a number that corresponds to the current location. By default, we’ll start at the 0th location.

34b 〈Python3-header-index 34b〉≡
    setupIndex = (["index = 0"], 0)

Notice that this implementation allows for negative indices and an unbounded tape (like our Haskell interpreter). Now we also need to set up the input and output streams. This is a nice abstraction that allows them to be changed easily, without having to change the whole program.

34c 〈Python3-header-streams 34c〉≡
    setupStreams = ([
        "instream = sys.stdin",
        "outstream = sys.stdout"
        ], 0)


To access sys we also must import it.

35a 〈Python3-header-imports 35a〉≡
    setupImports = (["import sys"], 0)

35b 〈Python3-header 35b〉≡
    header Python3 = sequenceCode [
        setupImports,
        setupStreams,
        setupMemory,
        setupIndex
        ] where
            〈Python3-header-dictionary 34a〉
            〈Python3-header-index 34b〉
            〈Python3-header-streams 34c〉
            〈Python3-header-imports 35a〉

7.5.2 Footer

Because Python3 manages memory for us, we don’t have to worry about any cleanup at the end of the program. And so, we can just return emptyCode.

35c 〈Python3-footer 35c〉≡
    footer Python3 = emptyCode

7.5.3 TranslateProgram

Now we need to tackle the problem of translating a Program into proper Python3 code. This should be relatively simple, as there is a simple correspondence between Python3 code and our symbols.

ModPtr To change the position of the head pointer, we can simply add the value associated with ModPtr to index. This will move the location appropriately.

35d 〈Python3-translateProgram-ModPtr 35d〉≡
    translateProgram Python3 ((ModPtr n):program) =
        sequenceCode [
            (modPtrStr, 0),
            translateProgram Python3 program
        ] where
            modPtrStr = ["index += " ++ show n]


ModCell In a similar fashion to ModPtr, we can translate ModCell into Python3 code. Because the memory is stored in a dictionary, a cell that has never been written has no entry, and we treat its value as 0. We can’t simply use += because this presupposes that a value is already stored. However, we can use the dictionary’s get method, which allows us to supply a default value. The generated code converts the index to a string and uses that string as the dictionary key. We also need to account for integer overflow and underflow, because 255 + 1 = 0 in BrainFuck but Python3 integers do not wrap around on their own, so we reduce the result modulo 256.

36a 〈Python3-translateProgram-ModCell 36a〉≡
    translateProgram Python3 ((ModCell n):program) =
        sequenceCode [
            (modCellStr, 0),
            translateProgram Python3 program
        ] where
            modCellStr = [
                "memory[str(index)] = ("
                    ++ show n
                    ++ " + memory.get(str(index), 0)) % 256"
                ]
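The wrap-around arithmetic is easy to check by mirroring it in Haskell. In the sketch below, modCellLine builds the same Python3 line as the chunk above, and wrapCell computes what that line computes. Both names are hypothetical, and the amount is assumed to be an Int. Haskell’s mod and Python3’s % agree for a positive modulus: both return a non-negative result.

```haskell
-- The Python3 statement emitted for ModCell n.
modCellLine :: Int -> String
modCellLine n =
  "memory[str(index)] = (" ++ show n
    ++ " + memory.get(str(index), 0)) % 256"

-- What the emitted statement computes: the new cell value, given the
-- amount n and the old value (0 for a cell that was never written).
wrapCell :: Int -> Int -> Int
wrapCell n old = (n + old) `mod` 256
```

For example, wrapCell 5 255 is 4 (overflow) and wrapCell (-3) 0 is 253 (underflow), which is the behaviour an 8-bit BrainFuck cell is expected to have.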

Output When we encounter Output, we can simply write the character to the outstream. We multiply the one-character string by n, where n is the value associated with Output, to repeat that character the correct number of times.

36b 〈Python3-translateProgram-Output 36b〉≡
    translateProgram Python3 ((Output n):program) =
        sequenceCode [
            (outputStr, 0),
            translateProgram Python3 program
        ] where
            outputStr = [
                "outstream.write(chr(memory[str(index)]) * "
                    ++ show n
                    ++ ")"
                ]


Input Similarly, for Input, we read from the instream and convert a character to its codepoint. Then we store that value at the current cell. We read n characters, where n is the value associated with Input. If we parse a file, there is no way to generate the symbol Input 0, so we can assume n ≠ 0. Provided the input has not run out, at least 1 character is read, so we can take the first character and its codepoint without an index-out-of-range error. Note that this keeps the first of the n characters read, whereas the C-30000 translation keeps the last.

37a 〈Python3-translateProgram-Input 37a〉≡
    translateProgram Python3 ((Input n):program) =
        sequenceCode [
            (inputStr, 0),
            translateProgram Python3 program
        ] where
            inputStr = [
                "memory[str(index)] = ord((instream.read("
                    ++ show n
                    ++ "))[0])"
                ]

LoopNonZero As in C, we can represent LoopNonZero with a while loop in Python3. We loop while the value of the current cell is not zero.

37b 〈Python3-translateProgram-LoopNonZero 37b〉≡
    translateProgram Python3 ((LoopNonZero prgm):program) =
        sequenceCode [
            runCommand (ChangeIndent (-4))
                . sequenceCode $ [
                    (["while memory.get(str(index),0) != 0:"], 4),
                    translateProgram Python3 prgm
                ],
            translateProgram Python3 program
        ]

Wrapping it up The last case we deal with is the empty program. The empty program indicates the end of a block, so we can decrease the indent by 4.

37c 〈Python3-translateProgram-emptyProgram 37c〉≡
    translateProgram Python3 [] = ([], (-4))

Now we can wrap this up into the translateProgram function.

37d 〈Python3-translateProgram 37d〉≡
    〈Python3-translateProgram-ModPtr 35d〉
    〈Python3-translateProgram-ModCell 36a〉
    〈Python3-translateProgram-Output 36b〉
    〈Python3-translateProgram-Input 37a〉
    〈Python3-translateProgram-LoopNonZero 37b〉
    〈Python3-translateProgram-emptyProgram 37c〉


7.5.4 Python3Language.hs

Now that we’ve implemented all our functions, we can implement a dummy type for Python3.

38a 〈Python3-type 38a〉≡
    data Python3 = Python3

We can create an instance of AbstractLanguage using Python3, by supplying our functions.

38b 〈Python3-AbstractLanguage 38b〉≡
    instance AbstractLanguage Python3 where
        〈Python3-header 35b〉
        〈Python3-footer 35c〉
        〈Python3-translateProgram 37d〉

And we can package this up into a nice little file.

38c 〈Python3Language.hs 38c〉≡
    module Python3Language where
    import AbstractLanguage
    import Symbol
    import Code
    〈Python3-type 38a〉
    〈Python3-AbstractLanguage 38b〉


7.5.5 Example Output

Using this example program:

    prgm = [
        LoopNonZero [
            LoopNonZero [
                ModPtr 3, ModPtr (-3), ModCell 5
            ],
            Output 3
        ],
        ModCell 20,
        Input 1
        ]

We get the following Python3 code when we compile it.

    import sys
    instream = sys.stdin
    outstream = sys.stdout
    memory = {}
    index = 0
    while memory.get(str(index),0) != 0:
        while memory.get(str(index),0) != 0:
            index += 3
            index += -3
            memory[str(index)] = (5 + memory.get(str(index), 0)) % 256
        outstream.write(chr(memory[str(index)]) * 3)
    memory[str(index)] = (20 + memory.get(str(index), 0)) % 256
    memory[str(index)] = ord((instream.read(1))[0])

If we compile the optimized code, we get this.

    import sys
    instream = sys.stdin
    outstream = sys.stdout
    memory = {}
    index = 0
    while memory.get(str(index),0) != 0:
        while memory.get(str(index),0) != 0:
            memory[str(index)] = (5 + memory.get(str(index), 0)) % 256
        outstream.write(chr(memory[str(index)]) * 3)
    memory[str(index)] = ord((instream.read(1))[0])

7.6 Closing Thoughts

We have created a few language extensions for BFTC. There is some boilerplate code that I want to remove, but this will do for now. If you are interested, try creating your own AbstractLanguage.


8 BFTC

It is finally time to create the BrainFuck TransCompiler. This will be broken down into a few major parts. BFTC takes a language name as an argument and reads the BrainFuck source from standard input. It then parses the code, optimizes it, compiles it to the chosen language and writes the resulting code to standard output.

8.1 Getting the arguments

We can get the arguments from the command line with the getArgs function. We should be supplied exactly 1 argument, the language name. If we aren’t, then we will report an error.

40a 〈BFTC-getArgs 40a〉≡
    args <- getArgs
    case args of
        [language] -> do
            〈BFTC-getLanguage 40b〉
            〈BFTC-compile 41a〉
        _ ->
            putStrLn "Error: only pass 1 command line argument."

8.2 Getting the language

We only have to deal with our 3 languages, so we can construct an if ... else chain to decide which language to use. An unrecognized language name falls through to undefined, which raises a runtime error once the compiler function is used. I’m planning on implementing a more elegant solution in the future.

40b 〈BFTC-getLanguage 40b〉≡
    let compile' = if language == "BrainFuck"
                   then compile BrainFuck
                   else if language == "Python3"
                   then compile Python3
                   else if language == "C30000"
                   then compile C30000
                   else undefined
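One possible shape for that more elegant solution is a lookup table from language names to compile functions, with an Either result in place of undefined. This is only a sketch: the Program placeholder and the three compileX stand-ins exist so that the example compiles on its own, and selectCompiler is a hypothetical name. In BFTC itself the table entries would be compile BrainFuck, compile Python3 and compile C30000.

```haskell
-- Placeholder for the real Program type (an assumption for this sketch).
type Program = String

-- Stand-ins for `compile BrainFuck`, `compile Python3`, `compile C30000`.
compileBrainFuck, compilePython3, compileC30000 :: Program -> String
compileBrainFuck program = "BrainFuck:" ++ program
compilePython3   program = "Python3:" ++ program
compileC30000    program = "C30000:" ++ program

-- One row per supported language; adding a language means adding a row.
compilers :: [(String, Program -> String)]
compilers =
  [ ("BrainFuck", compileBrainFuck)
  , ("Python3",   compilePython3)
  , ("C30000",    compileC30000)
  ]

-- Look the name up, reporting unknown names instead of crashing later.
selectCompiler :: String -> Either String (Program -> String)
selectCompiler name =
  case lookup name compilers of
    Just compile' -> Right compile'
    Nothing       -> Left $ "Error: unknown language " ++ show name
                         ++ "; expected one of: "
                         ++ unwords (map fst compilers)
```

The main function could then pattern match on the result and print the Left message, in the same way it already handles a parse error.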


8.3 Compiling and outputting the code

To compile and output the code, we need to parse it, optimize it, and then compile it with the given language. If the parsing returns an error message then we output the error message and end the program. Otherwise, we optimize, compile and print the program.

41a 〈BFTC-compile 41a〉≡
    contents <- getContents
    case parseString contents of
        Right program -> putStr
                       . compile'
                       . optimize
                       $ program
        Left errorMsg -> print errorMsg

8.4 Wrapping it all up

Now we can wrap up our functionality into a main function.

41b 〈BFTC-main 41b〉≡
    main :: IO ()
    main = do
        〈BFTC-getArgs 40a〉

And we can wrap this up into a small little file.

41c 〈BFTC.hs 41c〉≡
    module Main where
    import Python3Language
    import BrainFuckLanguage
    import C30000Language
    import Parser
    import Optimize
    import AbstractLanguage
    import System.Environment
    〈BFTC-main 41b〉


9 Epilogue

We set out to create a general BrainFuck interpreter and compiler. We have achieved our goal, but there is still more work to be done. One area for further work is, in particular, the mechanism for code output. There is a lot of boilerplate code that goes into writing a new instance of AbstractLanguage. But these are projects for another time. If you have any questions, feel free to email me at [email protected].

John Lekberg