Course written by Richard E. Korf, UCLA. The slides were made by students of this course
from Bar-Ilan University, Tel-Aviv, Israel.
Problems
There are 3 general categories of problems in AI:
Single-agent pathfinding problems. Two-player games. Constraint satisfaction problems.
Single-Agent Pathfinding Problems
In these problems, we have a single problem-solver making the decisions, and the task is to find a sequence of primitive steps that takes us from the initial location to the goal location.
Famous Example Domains
15-puzzle: about 10^13 states. First solved by [Korf 85] with IDA* and Manhattan distance.
24-puzzle: about 10^24 states. First solved by [Korf 96].
Rubik's cube: about 10^19 states. First solved by [Korf 97].
Traveling Salesman Problem.
[tile diagrams of the 15-puzzle and 24-puzzle omitted]
Two-Player Games
In a two-player game, one must consider the moves of an opponent, and the ultimate goal is a strategy that will guarantee a win whenever possible.
Two-player perfect-information games have received the most attention from researchers until now.
Nowadays, however, researchers are starting to consider more complex games, many of which involve an element of chance.
The best Chess, Checkers, and Othello players in the world are computer programs!
Constraint-Satisfaction Problems
In these problems, we also have a single agent making all the decisions, but here we are not concerned with the sequence of steps required to reach the solution, only with the solution itself.
The task is to identify a state of the problem such that all the constraints of the problem are satisfied.
Famous examples: Eight-Queens problem. Number Partitioning.
Problem Spaces
A problem space consists of a set of states of a problem and a set of operators that change the state.
State: a symbolic structure that represents a single configuration of the problem in sufficient detail to allow problem solving to proceed.
Operator: a function that takes a state and maps it to another state.
Not all operators are applicable to all states. The conditions that must be true in order for an operator to be legally applied to a state are known as the preconditions of the operator.
Examples:
8-Puzzle:
states: the different permutations of the tiles.
operators: moving the blank tile up, down, right or left.
Chess:
states: the different locations of the pieces on the board.
operators: legal moves according to chess rules.
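As a concrete sketch, the 8-puzzle states and operators above can be encoded directly; the tuple encoding and the `successors` name here are our own illustrative choices, not from the slides:

```python
# A sketch of the 8-puzzle problem space: states are tuples of 9 entries,
# with 0 denoting the blank (this encoding is our illustrative choice).

def successors(state):
    """Apply every applicable operator: slide the blank up, down, left, right."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    children = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:      # the operator's precondition
            other = r * 3 + c
            s = list(state)
            s[blank], s[other] = s[other], s[blank]
            children.append(tuple(s))
    return children

start = (1, 2, 3, 8, 0, 4, 7, 6, 5)        # blank in the center
print(len(successors(start)))              # 4 applicable operators
```

The precondition check (the blank's neighbor must be on the board) is exactly what makes an operator applicable or not in a given state.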
Problem Spaces
A problem instance: consists of a problem space, an initial state, and a set of goal states.
There may be a single goal state, or a set of goal states, any one of which would satisfy the goal criteria. In addition, the goal could be stated explicitly or implicitly, by giving a rule for determining when the goal has been reached.
All 4 combinations are possible: [single\set of goal state(s)] x [explicit\implicit].
For Constraint Satisfaction Problems, the goal will always be represented implicitly, since an explicit description is the solution itself.
Example: 4-Queens has 2 different goal states. Here the goal is stated explicitly.
[board diagram of a 4-Queens goal state]
Problem Representation
For some problems, the choice of a problem space is not so obvious.
The choice of representation for a problem can have an enormous impact on the efficiency of solving it.
There are no algorithms for problem representation. One general rule is that a smaller representation, in the sense of fewer states to search, is often better than a larger one.
For example, in the 8-Queens problem, when every state is an assignment of the 8 queens on the board, the number of possibilities is 64 choose 8, which is over 4 billion.
A solution can never have more than one queen per row, so we may assign each queen to a separate row; now we have 8^8, more than 16 million, possibilities.
Prohibiting two queens in the same column as well reduces the space to 8!, which is only 40,320 possibilities.
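The 8! representation lends itself to a short exhaustive check; a sketch (the names are ours) that also confirms the 40,320-state count:

```python
from itertools import permutations

# 8-Queens in the 8! representation: entry i of the permutation is the
# column of the queen in row i, so rows and columns are distinct by
# construction and only diagonal attacks remain to be checked.

def is_solution(perm):
    return all(abs(perm[i] - perm[j]) != j - i
               for i in range(len(perm))
               for j in range(i + 1, len(perm)))

states = list(permutations(range(8)))
solutions = [p for p in states if is_solution(p)]
print(len(states), len(solutions))   # 40320 states, 92 of them solutions
```

Searching 40,320 states is trivial; searching the 4-billion-state board representation for the same 92 solutions is not.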
Problem-Space Graphs
A problem-space graph is a mathematical abstraction often used to represent a problem space:
The states are represented by nodes of the graph.
The operators are represented by edges between nodes.
Edges may be undirected or directed.
Example: a small part of the 8-puzzle problem-space graph:
[diagram: 8-puzzle configurations connected by single blank-tile moves, omitted]
In most problem spaces there is more than one path between a pair of nodes.
Detecting when the same state has been regenerated via a different path requires saving all the previously generated states, and comparing newly generated states against the saved states.
Many search algorithms don’t detect when a state has previously been generated. The cost of this is that any state that can be reached by 2 different paths will be represented by duplicate nodes. The benefits are memory savings and simplicity.
Branching Factor and Solution Depth
The branching factor of a node: the number of children it has, not counting its parent if the operator is reversible. It is a function of the problem space.
The branching factor of a problem space: the average number of children of the nodes in the space.
The solution depth in a single-agent problem: the length of the shortest path from the initial node to a goal node. It is a function of the particular problem instance.
Eliminating Duplicate Nodes
In many cases we can reduce the size of the search tree by eliminating some simple duplicate paths.
In general, we never apply an operator and its inverse in succession, since no optimal path can contain such a sequence.
Therefore we never list the parent of a node as one of its children.
This reduces the branching factor of the problem by approximately 1.
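This pruning rule is easy to state in code. A minimal sketch on a grid space (the move names are illustrative, not from the slides):

```python
# Sketch of inverse-operator pruning: never undo the move just made.

INVERSE = {'up': 'down', 'down': 'up', 'left': 'right', 'right': 'left'}

def grid_moves(pos):
    """All four unit moves on an unbounded grid, as (name, new_pos) pairs."""
    x, y = pos
    return [('up', (x, y + 1)), ('down', (x, y - 1)),
            ('left', (x - 1, y)), ('right', (x + 1, y))]

def pruned_moves(pos, last_move):
    """Skip the inverse of the previous move: branching factor 4 -> 3."""
    return [(m, p) for m, p in grid_moves(pos)
            if last_move is None or m != INVERSE[last_move]]

print(len(pruned_moves((0, 0), None)))    # 4 at the root
print(len(pruned_moves((0, 0), 'up')))    # 3 thereafter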
Types of Problem Spaces
There are several types of problem spaces:
State space (OR graphs)
Problem reduction space (AND graphs)
Games (AND/OR graphs)
State Space
The states represent situations of the problem. The operators represent actions in the world.
Forward search: the root of the problem space represents the start state, and the search proceeds forward to a goal state.
Backward search: the root of the problem space represents the goal state, and the search proceeds backward to the initial state.
For example, in Rubik's Cube and the Sliding-Tile Puzzle, either a forward or a backward search is possible.
This can be considered an OR graph, because the solution picks one branch at each node.
Problem Reduction Space
In a problem reduction space, the nodes represent problems to be solved or goals to be achieved, and the edges represent the decomposition of a problem into subproblems.
This is best illustrated by the example of the Towers of Hanoi problem.
This is an AND graph.
[diagram: the AND tree for the 3-disk Towers of Hanoi]
3AC
  2AB: 1AC, 1AB, 1CB
  1AC
  2BC: 1BA, 1BC, 1AC
Problem Reduction Space
The root node, labeled "3AC", represents the original problem of transferring all 3 disks from peg A to peg C.
The goal can be decomposed into three subgoals: 2AB, 1AC, 2BC. In order to achieve the goal, all 3 subgoals must be achieved.
Problem Reduction Space
[sequence of slides: the AND tree below 3AC is built up step by step, first 2AB with its subgoals 1AC, 1AB, 1CB, then 1AC, then 2BC with its subgoals 1BA, 1BC, 1AC, each step accompanied by a diagram of pegs A, B, C]
An AND graph consists entirely of AND nodes; in order to solve a problem represented by one, you need to solve the problems represented by all of its children (Towers of Hanoi example).
An OR graph consists entirely of OR nodes; in order to solve the problem represented by one, you only need to solve the problem represented by one of its children (Eight-Puzzle example).
AND/OR Graphs
An AND/OR graph consists of both AND nodes and OR nodes.
One source of AND/OR graphs is a problem where the effect of an action cannot be predicted in advance, as in an interaction with the physical world.
Example: the counterfeit-coin problem.
Two-Player Game Trees
The most common source of AND/OR graphs is 2-player perfect-information games.
Example: Game Tree for 5-Stone Nim:
[diagram: game tree with stone counts, 5 at the root decreasing by 1 or 2 at each level down to 0; levels alternate between OR nodes (our moves) and AND nodes (the opponent's moves)]
Solution Subgraph for AND/OR Trees
In general, a solution to an AND/OR graph is a subgraph with the following properties:
It contains the root node.
For every OR node included in the solution subgraph, one child is included.
For every AND node included in the solution subgraph, all the children are included.
Every terminal node in the solution subgraph is a solved node.
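These conditions suggest a simple recursive evaluator. A sketch for the 5-stone Nim tree above, under rules the slide leaves implicit (we assume each move removes 1 or 2 stones and the player who takes the last stone wins):

```python
from functools import lru_cache

# AND/OR evaluation of 5-stone Nim. OR nodes are our moves, AND nodes are
# the opponent's; the winning rule assumed here is not spelled out on the slide.

@lru_cache(maxsize=None)
def forced_win(stones, our_move):
    """True if the searching player has a forced win from this node."""
    if stones == 0:
        return not our_move          # whoever just moved took the last stone
    children = [stones - k for k in (1, 2) if stones >= k]
    if our_move:                     # OR node: one winning child suffices
        return any(forced_win(c, False) for c in children)
    return all(forced_win(c, True) for c in children)   # AND node

print(forced_win(5, True))           # under these assumed rules: True
```

A `True` result means a solution subgraph exists: at each OR node we keep one winning move, and at each AND node we keep all of the opponent's replies.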
Solutions
The notion of a solution is different for the different problem types:
For a pathfinding problem, an optimal solution is a solution of lowest cost.
For a CSP, if there is a cost function associated with a state of the problem, an optimal solution would again be one of lowest cost.
For a 2-player game:
If the solution is simply a move to be made, an optimal solution would be the best possible move that can be made in a given situation.
If the solution is considered a complete strategy subgraph, then an optimal solution might be one that forces a win in the fewest number of moves in the worst case.
Combinatorial Explosion
The number of different states of the problems above is enormous, and grows extremely fast as the problem size increases.
Examples for the number of different possibilities:

Game                  # of nodes
8-Puzzle              9!
15-Puzzle             16!
24-Puzzle             25!
Rubik's Cube 2x2x2    3,265,920
Rubik's Cube 3x3x3    4.32x10^19
N-city TSP            n!
Checkers              10^20
Chess                 10^40
N-Queens              n!
The combinatorial explosion of the number of possible states as a function of problem size is a key characteristic that separates artificial intelligence search algorithms from those in other areas of computer science.
Techniques that rely on storing all possibilities in memory, or even generating all possibilities, are out of the question except for the smallest of these problems. As a result, the problem-space graphs of AI problems are usually represented implicitly by specifying an initial state and a set of operators to generate new states.
Search Algorithms
This course will focus on systematic search algorithms that are applicable to the different problem types, so a central concern is their efficiency.
There are 4 primary measures of the efficiency of a search algorithm:
The completeness of the algorithm (whether it is guaranteed to return a solution if one exists).
The quality of the solution returned: optimal, near-optimal (e-optimal), or suboptimal.
The running time of the algorithm.
The amount of memory required by the algorithm.
The Next Chapters
Chapter 2: brute-force searches.
Chapter 3: heuristic search algorithms.
Chapter 4: search algorithms that run in linear space.
Chapter 5: search algorithms for the case where individual moves of a solution must be executed in the real world before a complete optimal solution can be computed.
Chapter 6: methods for deriving the heuristic function.
Chapter 7: 2-player perfect-information games.
Chapter 8: analysis of alpha-beta minimax.
Chapter 9: games with more than 2 players.
Chapter 10: the decision quality of minimax.
Chapter 11: automatic learning of heuristic functions for 2-player games.
Chapter 12: constraint-satisfaction problems.
Chapter 13: parallel search algorithms.
Brute-Force Search
The most general search algorithms are brute-force searches, which do not use any domain-specific knowledge.
A brute-force search requires: a state description, a set of legal operators, an initial state, and a description of the goal state.
We will assume that all edges have unit cost.
To generate a node means to create the data structure corresponding to that node.
To expand a node means to generate all the children of that node.
Breadth-First Search (BFS)
BFS expands nodes in order of their depth from the root, generating one level of the tree at a time.
It is implemented with a first-in first-out (FIFO) queue.
At each cycle the node at the head of the queue is removed and expanded, and its children are placed at the end of the queue.
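The cycle described above is only a few lines of code. A sketch (the helper names and the tiny line-graph demo are ours) that also stores a parent pointer with each node so the solution path can be reported:

```python
from collections import deque

# BFS with a FIFO queue: remove the head, expand it, append its children.

def bfs(start, goal, successors):
    """Return a shortest path from start to goal, or None."""
    parent = {start: None}                 # pointer back to each node's parent
    queue = deque([start])
    while queue:
        node = queue.popleft()             # remove the head of the queue
        if node == goal:
            path = []
            while node is not None:        # follow parent pointers back
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in successors(node):
            if child not in parent:        # also serves as duplicate detection
                parent[child] = node
                queue.append(child)
    return None

# Tiny demo: shortest path on the line graph 0-1-2-...-9.
line = lambda n: [m for m in (n - 1, n + 1) if 0 <= m <= 9]
print(bfs(0, 4, line))                     # [0, 1, 2, 3, 4]
```

The parent table doubles as the duplicate check discussed earlier; a pure brute-force tree search would omit it and accept duplicate nodes.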
Breadth-First Search (BFS)
[diagram: a tree whose nodes are numbered 0-14 in the order generated by BFS, level by level]
Solution Quality
BFS continues until a goal node is generated.
There are two ways to report the actual solution path:
Store with each node the sequence of moves made to reach that node.
Store with each node a pointer back to its parent (more memory-efficient).
If a goal exists in the tree, BFS will find a shortest path to it.
Time Complexity
We assume each node can be generated in constant time.
The running time is a function of the branching factor b and the solution depth d.
The number of nodes generated depends on where at level d the goal node is found; in the worst case we have to generate all the nodes at level d.
Let N(b,d) be the total number of nodes generated.
Time Complexity

N(b,d) = 1 + b + b^2 + b^3 + ... + b^d
b * N(b,d) = b + b^2 + b^3 + ... + b^(d+1)
b * N(b,d) - N(b,d) = b^(d+1) - 1
N(b,d) * (b - 1) = b^(d+1) - 1
N(b,d) = (b^(d+1) - 1) / (b - 1) ≈ b^d * b / (b - 1)

The time complexity of BFS is O(b^d).
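The closed form can be sanity-checked numerically against the direct sum (a quick check of the algebra, not part of the slides):

```python
# Numeric check of N(b,d) = (b^(d+1) - 1) / (b - 1) against the direct sum.

def n_direct(b, d):
    return sum(b ** i for i in range(d + 1))

def n_closed(b, d):
    return (b ** (d + 1) - 1) // (b - 1)   # exact: b-1 divides b^(d+1)-1

for b in (2, 3, 10):
    for d in (0, 1, 5):
        assert n_direct(b, d) == n_closed(b, d)

print(n_direct(2, 5))   # 1 + 2 + 4 + 8 + 16 + 32 = 63
```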
Space Complexity
To report the solution we need to store all the nodes generated.
Example:
Machine speed = 1 GHz; generating a new state takes 100 instructions, so 10^7 nodes/sec.
Node size = 4 bytes; total memory = 2 GB = 2*10^9 bytes, for a capacity of 2*10^9 / 4 = 500*10^6 nodes.
After 50 seconds the memory is exhausted!
Space complexity = time complexity = O(b^d).
Space Complexity
The previous example is based on current technology.
The problem won't go away: as memories increase in size, processors get faster and our appetite for solving larger problems grows.
BFS, and any algorithm that must store all the nodes it generates, is severely space-bound and will exhaust memory in minutes.
Depth-First Search (DFS)
DFS next generates a child of the deepest node that has not yet been completely expanded.
First implementation: with a last-in first-out (LIFO) stack, also known as depth-first expansion.
At each cycle the node at the top of the stack is removed and expanded, and its children are placed on top of the stack.
DFS - stack implementation
[diagram: the same tree, with nodes numbered in the order generated by stack-based DFS]
Depth-First Search (DFS)
Second implementation: recursive, also known as depth-first generation.
The recursive function takes a node as an argument and performs a DFS below that node: it loops through the node's children and makes a recursive call to perform a DFS below each child in turn.
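A sketch of the recursive implementation, with an artificial cutoff depth added since plain DFS may not terminate on an infinite tree (the helper names and line-graph demo are ours):

```python
# Recursive DFS (depth-first generation) with an artificial cutoff depth.

def dfs(node, goal, successors, cutoff):
    """Return a path to goal within `cutoff` moves, or None."""
    if node == goal:
        return [node]
    if cutoff == 0:
        return None
    for child in successors(node):       # loop through the node's children
        path = dfs(child, goal, successors, cutoff - 1)
        if path is not None:
            return [node] + path
    return None

line = lambda n: [m for m in (n - 1, n + 1) if 0 <= m <= 9]
print(dfs(0, 3, line, 5))   # [0, 1, 0, 1, 2, 3]: not a shortest path!
```

Note that without duplicate detection DFS happily revisits nodes, and the first solution found is not along a shortest path, which previews the solution-quality discussion below.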
DFS - recursive implementation
[diagram: the same tree, with nodes numbered in the order generated by recursive DFS]
Space Complexity
The space complexity is linear in the maximum search depth, where d is the maximum depth of search and b is the branching factor:
Depth-first generation stores O(d) nodes.
Depth-first expansion stores O(bd) nodes (b times d, one level of children per depth on the stack).
DFS is time-limited rather than space-limited.
Time Complexity and Solution Quality
DFS generates the same set of nodes as BFS, so the time complexity of DFS is O(b^d).
However, on an infinite tree DFS may not terminate.
For example, the Eight Puzzle problem space contains only 181,440 states, but its search tree has infinitely long paths, and thus DFS may never end.
Time Complexity and Solution Quality
The solution for an infinite tree is to impose an artificial cutoff depth on the search.
If the chosen cutoff depth is less than d, the algorithm won't find a solution.
If the cutoff depth is greater than d, the time complexity is larger than that of BFS.
The first solution DFS finds may not be the optimal one.
Depth-First Iterative-Deepening (DFID)
DFID combines the best features of BFS and DFS.
DFID first performs a DFS to depth one, then starts over executing a DFS to depth two, and continues running DFS to successively greater depths until a solution is found.
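The combination is a short wrapper around a depth-limited DFS; a sketch (the depth-limited helper and the demo graph are ours):

```python
# DFID: run a depth-limited DFS to depth 0, 1, 2, ... until a goal is found.

def dls(node, goal, successors, limit):
    """Depth-limited DFS returning a path to goal, or None."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in successors(node):
        path = dls(child, goal, successors, limit - 1)
        if path is not None:
            return [node] + path
    return None

def dfid(start, goal, successors, max_depth=50):
    for depth in range(max_depth + 1):   # start over with a deeper cutoff
        path = dls(start, goal, successors, depth)
        if path is not None:
            return path
    return None

line = lambda n: [m for m in (n - 1, n + 1) if 0 <= m <= 9]
print(dfid(0, 4, line))                  # [0, 1, 2, 3, 4], a shortest path
```

Unlike the plain DFS sketch earlier, the first solution returned here is along a shortest path, because no iteration can reach depth k before every shallower depth has been tried.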
The numbers represent the order generated by DFID
[diagram: the same tree, with each node labeled by the times it is generated across the successive DFID iterations]
Solution Quality
DFID never generates a node until all shallower nodes have already been generated.
Therefore the first solution found by DFID is guaranteed to be along a shortest path.
Space Complexity
Like DFS, at any given point DFID saves only a stack of nodes.
The space complexity is only O(d).
Time Complexity
DFID does not waste a great deal of time in the iterations prior to the one that finds a solution; this extra work is usually insignificant.
The ratio of the number of nodes generated by DFID to those generated by BFS on a tree is b/(b-1).
The total number of nodes generated by DFID is b^d * (b/(b-1))^2, which is still O(b^d).
Optimality of DFID
Theorem 2.1: DFID is asymptotically optimal in terms of time and space among all brute-force shortest-path algorithms on a tree with unit edge costs.
Steps of proof: verify that DFID is optimal in terms of solution quality, time complexity, and space complexity.
Optimality of DFID - Solution Quality
Since DFID generates all nodes at a given level before any nodes at the next deeper level, the first solution it finds is arrived at via an optimal path.
Optimality of DFID - Time Complexity
Assume the contrary: there is an algorithm A that runs on problem P, finds a shortest path to a goal, and runs in time less than b^d.
Since its running time is less than b^d and there are b^d nodes at depth d, there must be at least one node n at depth d that A does not generate when solving P.
New Problem P’. P’ identical to P except that n is the goal.
A examines the same nodes in both P and P’.
A doesn’t examine the node n.
A fail to solve P’ since n is the only goal node.
There is no Algorithm runs better than O(b^d ).
Since DFID takes O(b^d ) time, its time complexity is asymptotically optimal.
Optimality of DFID - Space Complexity
There is a well-known result from computer science that any algorithm that takes f(n) time must use at least log f(n) space.
We have already seen that any brute-force search must take at least b^d time, so any such algorithm must use at least log(b^d) space, which is O(d) space.
Since DFID uses O(d) space, it is asymptotically optimal in space.
Graph with Cycles
On a graph with cycles, BFS can be more efficient because it can detect all duplicate nodes, whereas DFS cannot.
The complexity of BFS grows only with the number of nodes at a given depth.
The complexity of DFS depends on the number of paths of a given length.
In a graph with a large number of very short cycles, BFS is preferable to DFS, if sufficient memory is available.
For example, in a square grid with radius r, there are O(r^2) nodes but O(4^r) paths.
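The node/path gap can be made concrete by brute-force enumeration for a small radius (the move encoding is ours, and the pruned count previews the FSM idea below):

```python
# Nodes vs. paths in a square grid of radius r: the gap that makes
# duplicate detection matter. Brute-force enumeration for small r.

def count_paths(pos, steps, last=None, prune_inverse=False):
    """Walks of a given length from pos, optionally skipping the
    immediate inverse of the previous move."""
    if steps == 0:
        return 1
    x, y = pos
    total = 0
    for move, nxt in (('U', (x, y + 1)), ('D', (x, y - 1)),
                      ('L', (x - 1, y)), ('R', (x + 1, y))):
        if prune_inverse and last and {move, last} in ({'U', 'D'}, {'L', 'R'}):
            continue
        total += count_paths(nxt, steps - 1, move, prune_inverse)
    return total

r = 6
nodes = sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
            if abs(x) + abs(y) <= r)            # 2r^2 + 2r + 1 nodes
print(nodes)                                    # 85 nodes within radius 6
print(count_paths((0, 0), r))                   # 4^6 = 4096 paths
print(count_paths((0, 0), r, prune_inverse=True))   # 4 * 3^5 = 972 paths
```

Even at radius 6, a DFS explores thousands of paths through fewer than a hundred distinct nodes; the gap widens exponentially with r.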
Pruning Duplicate Nodes in DFS
Eliminate the parent of each node as one of its children.
This is easily done with a finite-state machine (FSM), and reduces the branching factor of the grid from 4 to 3.
[diagram: FSM over the moves up, down, left, right that forbids the immediate inverse of the previous move]
Pruning Duplicate Nodes in DFS
A more efficient FSM allows sequences of moves going up only or down only, and sequences of moves going left only or right only.
The time complexity of a DFS controlled by this FSM, like that of BFS, is O(r^2).
[diagram: the more efficient FSM over the moves up, down, left, right]
Node Generation Times
BFS, DFS, and DFID generate asymptotically the same number of nodes on a tree, yet DFS and DFID are more efficient than BFS.
The amount of time to generate a node is proportional to the size of the state representation.
If DFS is implemented as a recursive program, a move requires only constant time, instead of time linear in the number of tiles.
This advantage of DFS becomes increasingly significant the larger the state description.
Backward Chaining/Search
Here the root node represents the goal state, and we search backward until we reach the initial state.
Requirements:
The goal state must be represented explicitly.
We must be able to reason backwards about the operators.
Bidirectional Search
Main idea: simultaneously search forward from the initial state and backward from the goal state, until the two search frontiers meet at a common state.
[diagram: two search frontiers growing from S and G toward each other]
Solution Quality
Bidirectional search guarantees finding a shortest path from the initial state to the goal state, if one exists.
Assume that there is a solution of length d and that both searches are breadth-first.
When the forward search has proceeded to depth k, its frontier will contain all nodes at depth k from the initial state.
Solution Quality
When the backward search has proceeded to depth d-k, its frontier will contain all states at depth d-k from the goal state.
Some state s along an optimal solution path lies at depth k from the initial state and at depth d-k from the goal state.
This state s is in the frontier of both searches, so the algorithm will find the match and return the optimal solution.
Time Complexity
If the two search frontiers meet in the middle, each search proceeds to depth d/2 before they meet, so the total number of nodes generated is O(2*b^(d/2)) = O(b^(d/2)).
But this isn't the asymptotic time complexity, because we also have to compare every new node with the opposite search frontier.
Naively, comparing each node with the entire opposite search frontier costs O(b^(d/2)) per node.
Time Complexity
With the naive comparison, the time complexity of the whole algorithm becomes O(b^d).
A more efficient approach uses hash tables:
in the average case, the time to hash and compare a node is constant,
and the asymptotic time complexity is O(b^(d/2)).
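A sketch of bidirectional BFS with hash tables (Python dicts) holding both frontiers, so each membership test against the opposite frontier is constant time on average; the line-graph demo is ours:

```python
from collections import deque

# Bidirectional BFS: expand both frontiers level by level, and check each
# newly generated node against the opposite frontier's hash table.

def bibfs_distance(start, goal, successors):
    """Length of a shortest path from start to goal, or None."""
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}
    q_f, q_b = deque([start]), deque([goal])
    while q_f and q_b:
        # Expand one full level of the currently smaller frontier.
        if len(q_f) <= len(q_b):
            q, dist, other = q_f, dist_f, dist_b
        else:
            q, dist, other = q_b, dist_b, dist_f
        for _ in range(len(q)):
            node = q.popleft()
            for child in successors(node):
                if child in other:               # the frontiers meet
                    return dist[node] + 1 + other[child]
                if child not in dist:
                    dist[child] = dist[node] + 1
                    q.append(child)
    return None

line = lambda n: [m for m in (n - 1, n + 1) if 0 <= m <= 9]
print(bibfs_distance(0, 8, line))                # 8
```

Each hash-table lookup replaces a scan of the opposite frontier, which is exactly the savings described above.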
Space Complexity
The simplest implementation of bidirectional search uses BFS for one direction; the search in the other direction can be a DFS such as DFID.
At least one of the frontiers must be stored in memory.
The space complexity of bidirectional search is dominated by the BFS and is O(b^(d/2)).
Bidirectional search is space-bound.
Bidirectional search is much more time-efficient than unidirectional search.
Perimeter Search
A special kind of bidirectional search:
A breadth-first search to depth d is performed backwards from the goal state; the resulting frontier is the perimeter P.
Then any search is performed from the initial state towards the perimeter nodes.
Once the perimeter is reached, the search can stop.
[diagram: a perimeter around G, approached by a search from S]