Othello Processor Bret Taylor, Olatunji Ruwase, and Jim Norris CS343, Spring 2003

Project Overview

AI algorithms for games like Othello are highly parallelizable

Generally, algorithm performance is increased by adding more generic processors

What price-performance can we achieve with a custom processor?

What granularity of custom instructions achieves the greatest price-performance?

How generic can we make the instructions?

Overview of Othello

Goal: Have more pieces when the game ends

You flip your opponent’s pieces when you surround them with your pieces on each end

Othello in Academia

Software

Rosenbloom, Paul S.: A World-Championship-Level Othello Program

Lee, K.; Mahajan, S.: The Development of a World Class Othello Program

No specific hardware implementations, but related work for MiniMax-style algorithms

Powley, Curtis Nelson: Parallel Tree Search on a Single-Instruction, Multiple-Data (SIMD) Machine

Common Algorithm Structure

MiniMax search to a depth determined by global or per-move time limit

Heuristics evaluate “value” of move at leaves
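
The search structure described above can be sketched as a depth-limited MiniMax. This is a minimal illustration, not the project's code: the names `minimax`, `successors`, and `evaluate` are assumptions, the game is a toy tree of nested tuples rather than Othello, and the depth is passed in directly instead of being derived from a time limit.

```python
def minimax(state, depth, maximizing, successors, evaluate):
    """Depth-limited MiniMax: apply the heuristic at the depth cut-off or at terminal states."""
    children = list(successors(state))
    if depth == 0 or not children:
        return evaluate(state)
    values = [
        minimax(child, depth - 1, not maximizing, successors, evaluate)
        for child in children
    ]
    return max(values) if maximizing else min(values)


# Toy game tree (hypothetical): inner nodes are tuples, leaves are heuristic values.
def toy_successors(state):
    return state if isinstance(state, tuple) else ()


def toy_evaluate(state):
    # A real heuristic would score an unfinished position; the toy returns 0 for one.
    return state if isinstance(state, int) else 0


TREE = ((3, 5), (2, 9))
best = minimax(TREE, 2, True, toy_successors, toy_evaluate)  # max(min(3, 5), min(2, 9)) = 3
```

A time-limited player would typically call such a routine repeatedly with increasing depth until the time budget runs out; that loop is omitted here.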

Common Algorithm Properties

The deeper the search, the better the player

Assumes heuristic is “reasonable,” i.e., it does not get worse with more information

Effectively infinitely parallelizable

Many operations are expensive in software:

Determining successors (“is valid move”)

Calculating successors (“make move”)

Heuristic calculation

Our Othello Implementation

Based on Iago, concentrating on high-quality heuristic variables:

Stability – Number of pieces that can never be flipped

Mobility – Number of available moves

Piece differential

Vulnerability – Entrance points to stable squares on the corners and sides

Heuristic value is weighted sum of variables

Weights learned through reinforcement learning

Weights vary over the course of the game
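
A weighted-sum heuristic with stage-dependent weights might look like the sketch below. Everything numeric here is invented for illustration: the stage boundaries, the weight values, and the names `WEIGHTS_BY_STAGE`, `stage_for`, and `heuristic_value` are assumptions, since the slides do not give the learned weights.

```python
# Hypothetical weights per game stage; the real weights were learned, not hand-set.
WEIGHTS_BY_STAGE = {
    "opening": {"stability": 4, "mobility": 3, "piece_differential": 0, "vulnerability": -2},
    "midgame": {"stability": 5, "mobility": 2, "piece_differential": 1, "vulnerability": -3},
    "endgame": {"stability": 2, "mobility": 1, "piece_differential": 6, "vulnerability": -1},
}


def stage_for(move_number):
    """Map a move number to a game stage (boundaries are assumed, not from the slides)."""
    if move_number < 20:
        return "opening"
    if move_number < 45:
        return "midgame"
    return "endgame"


def heuristic_value(features, move_number):
    """Weighted sum of the heuristic variables, using the weights for the current stage."""
    weights = WEIGHTS_BY_STAGE[stage_for(move_number)]
    return sum(weights[name] * features[name] for name in weights)


features = {"stability": 2, "mobility": 5, "piece_differential": -1, "vulnerability": 1}
early = heuristic_value(features, 10)  # 4*2 + 3*5 + 0*(-1) + (-2)*1 = 21
late = heuristic_value(features, 55)   # 2*2 + 1*5 + 6*(-1) + (-1)*1 = 2
```

The same position scores differently early and late in the game, which is the effect of letting the weights vary over time.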

Software Overview

Boards are 128-bit entries (2 bits per piece)

Lookup tables for things like stability

Lookup tables are indexed by the ternary number represented by the row or column:

$1 + 2 \cdot 3^2 + 2 \cdot 3^3 + 3^4 = 154$
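
The ternary indexing can be sketched as follows. The bit layout (cell at row r, column c stored in bits 2(8r + c) upward), the encoding 0 = empty, 1 and 2 = the two colors, and the function names are assumptions made for this example; the slides only state that boards use 2 bits per piece.

```python
def row_cells(board, row):
    """Extract the eight 2-bit cells of one row from a 128-bit board integer."""
    return [(board >> (2 * (8 * row + col))) & 0b11 for col in range(8)]


def ternary_index(cells):
    """Read a line of cells as a base-3 number, least significant digit first."""
    return sum(value * 3 ** position for position, value in enumerate(cells))


# The digits behind the slide's example: 1 + 2*3^2 + 2*3^3 + 3^4.
example_cells = [1, 0, 2, 2, 1, 0, 0, 0]

# Pack those cells into row 2 of an otherwise empty board (assumed layout).
example_board = sum(value << (2 * (8 * 2 + col)) for col, value in enumerate(example_cells))

index = ternary_index(row_cells(example_board, 2))  # 154
```

Because each cell has three states, a row or column maps to one of 3^8 = 6561 table entries, which is far smaller than indexing by the raw 16 bits.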

Software Trace Profile

Vast majority of CPU time consumed in IsLegalMove and MakeMove

53.21% DoOneDirection

30.64% DoAllDirections

Loops in all rows/diagonals to find/flip valid directions

Called to find successors, to calculate mobility, and to make moves

Operations are common to all Othello players (extensions are at least slightly generic)!

Flip Instruction Granularity

Split a single MakeMove or IsLegalMove operation into a sequence of four operations corresponding to the four flip directions (row, column, rdiag, ldiag)

Made a lookup table of the 3^8 (6561) row/column/diagonal configurations to look up which pieces get flipped on an axis given a piece placement
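
One way such a table could be built in software is sketched below. It is an assumption-laden illustration rather than the project's hardware table: the cell encoding, the function names, and the choice to return flips as an 8-bit mask are all invented here, and only full 8-cell lines are handled (shorter diagonals are not treated).

```python
EMPTY, BLACK, WHITE = 0, 1, 2


def decode(index):
    """Turn a ternary index back into eight cells, least significant digit first."""
    cells = []
    for _ in range(8):
        index, digit = divmod(index, 3)
        cells.append(digit)
    return cells


def flips_on_line(cells, pos, player):
    """Bitmask of cells flipped on this line if `player` places a piece at `pos`."""
    if cells[pos] != EMPTY:
        return 0
    opponent = BLACK + WHITE - player
    mask = 0
    for step in (-1, 1):
        run = 0
        i = pos + step
        while 0 <= i < 8 and cells[i] == opponent:
            run |= 1 << i
            i += step
        # The run only flips if it is capped by one of the player's own pieces.
        if run and 0 <= i < 8 and cells[i] == player:
            mask |= run
    return mask


def build_flip_table(player):
    """table[line_index][pos] -> flip mask, for all 3**8 line configurations."""
    return [
        [flips_on_line(decode(index), pos, player) for pos in range(8)]
        for index in range(3 ** 8)
    ]


BLACK_TABLE = build_flip_table(BLACK)
# Line: BLACK, WHITE, WHITE, then empty cells -> index 1 + 2*3 + 2*9 = 25.
flipped = BLACK_TABLE[25][3]  # 0b110: the two WHITE pieces at positions 1 and 2
```

With the table in place, one axis of a move costs a single lookup instead of a loop over the line.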

Reducing Die Area

We only implemented FLIPROW and FLIPDIAGONAL instructions

We do a 90° rotation of the board and back again to flip the other two directions

Saved on die area and cycle time; transposing and rotating are very cheap instructions

New DoAllDirections

Output dependencies galore!

B = FLIPROW(B, row, col);
B = FLIPDIAG(B, row, col);
B = ROTATEBOARD(B, CLW);
B = FLIPROW(B, col, 7 - row);
B = FLIPDIAG(B, col, 7 - row);
B = ROTATEBOARD(B, ACLW);
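
The coordinate change in the sequence above, (row, col) to (col, 7 - row), follows from a clockwise rotation of the 8x8 board. The sketch below checks this on a list-of-lists board; the representation and function names are assumptions for illustration and say nothing about how ROTATEBOARD is implemented in hardware.

```python
def rotate_clockwise(board):
    """Rotate an 8x8 board 90 degrees clockwise: new[r][c] = old[7 - c][r]."""
    return [list(row) for row in zip(*board[::-1])]


def rotate_anticlockwise(board):
    """Rotate an 8x8 board 90 degrees anticlockwise, undoing rotate_clockwise."""
    return [list(row) for row in zip(*board)][::-1]


# Label every square with its own coordinates so movement is easy to track.
board = [[(r, c) for c in range(8)] for r in range(8)]
rotated = rotate_clockwise(board)

row, col = 2, 3
moved_to = (col, 7 - row)                    # where the square ends up
same_square = rotated[col][7 - row]          # (2, 3)

# Column `col` of the original board is now row `col` of the rotated board,
# so a row-only flip instruction can process it.
old_column = [board[7 - j][col] for j in range(8)]
```

Rotating back with `rotate_anticlockwise` restores the original orientation, which matches the final ROTATEBOARD step in the sequence.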

State Registers

Store 64-bit FLIPTABLE state register to keep track of which pieces should be flipped: no output dependencies between instructions

Added benefit: seeing if a move is valid simply amounts to (FLIPTABLE != 0) after flip operations
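
A software model of the state-register idea is sketched below. Each per-axis operation reads the unchanged board and ORs its flips into a shared 64-bit mask, so no operation depends on another's output, and legality is just a non-zero test. The class `FlipState`, the bit numbering (bit 8r + c), the direction vectors assigned to "rdiag" and "ldiag", and the use of direct scanning instead of rotation plus table lookup are all assumptions for this example.

```python
EMPTY, BLACK, WHITE = 0, 1, 2

# Two opposite direction vectors per axis (the rdiag/ldiag assignment is assumed).
AXES = {
    "row": ((0, 1), (0, -1)),
    "column": ((1, 0), (-1, 0)),
    "rdiag": ((1, 1), (-1, -1)),
    "ldiag": ((1, -1), (-1, 1)),
}


class FlipState:
    """Models a 64-bit FLIPTABLE register: bit 8*r + c marks a piece to flip."""

    def __init__(self):
        self.fliptable = 0

    def flip_axis(self, board, row, col, player, axis):
        opponent = BLACK + WHITE - player
        for dr, dc in AXES[axis]:
            run = 0
            r, c = row + dr, col + dc
            while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
                run |= 1 << (8 * r + c)
                r, c = r + dr, c + dc
            # Accumulate only runs capped by the player's own piece.
            if run and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
                self.fliptable |= run


def is_legal_move(board, row, col, player):
    """A move is legal if the square is empty and at least one piece would flip."""
    if board[row][col] != EMPTY:
        return False
    state = FlipState()
    for axis in AXES:
        state.flip_axis(board, row, col, player, axis)
    return state.fliptable != 0


# Standard Othello starting position.
start = [[EMPTY] * 8 for _ in range(8)]
start[3][3], start[4][4] = WHITE, WHITE
start[3][4], start[4][3] = BLACK, BLACK
```

In this model the board is never rewritten between the per-axis steps; the accumulated mask would be applied once at the end to make the move.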

Results

New instructions are extremely effective with relatively little complexity compared to optimizing for a multi-processor environment

~4.1 times better performance than base processor

            CPU Cycles    Cycle Time    Area        PricePerf
Base        492143906     ~10 ns        ~4.2 mm²    ?
Extended    126832159     10.63 ns      20389?      ?

Conclusions

Positives

Optimizations can be used for all Othello players

With very little work, we could reduce the cycle time to that of the base processor

Negatives

Cost of custom processors is prohibitive

It may be more effective to exploit parallelism of search algorithm

Combining custom processors with MP parallelism for best results?
