DCP 1172 Introduction to Artificial Intelligence

Lecture notes for Chap. 4 [AIMA], Chang-Sheng Chen

Last time: Problem-Solving

• Problem solving:
  - Goal formulation
  - Problem formulation (states, operators)
  - Search for solution

• Problem formulation:
  - Initial state
  - Operators
  - Goal test
  - Path cost

• Problem types:
  - single state: fully observable and deterministic environment
  - multiple state: partially observable and deterministic environment
  - contingency: partially observable and nondeterministic environment
  - exploration: unknown state-space

Last time: Finding a solution

Function General-Search(problem, strategy) returns a solution, or failure
  initialize the search tree using the initial state of problem
  loop do
    if there are no candidates for expansion then return failure
    choose a leaf node for expansion according to strategy
    if the node contains a goal state then return the corresponding solution
    else expand the node and add the resulting nodes to the search tree
  end

Solution: a sequence of operators that takes you from the current state to the goal state.

Basic idea: offline, systematic exploration of simulated state-space by generating successors of explored states (expanding)

Strategy: The search strategy is determined by the order in which the nodes are expanded.

A Clean Robust Algorithm

Function UniformCost-Search(problem, Queuing-Fn) returns a solution, or failure
  open ← make-queue(make-node(initial-state[problem]))
  closed ← [empty]
  loop do
    if open is empty then return failure
    currnode ← Remove-Front(open)
    if Goal-Test[problem] applied to State(currnode) then return currnode
    children ← Expand(currnode, Operators[problem])
    while children not empty
      [… see next slide …]
    end
    closed ← Insert(closed, currnode)
    open ← Sort-By-PathCost(open)
  end

A Clean Robust Algorithm

[… see previous slide …]
    children ← Expand(currnode, Operators[problem])
    while children not empty
      child ← Remove-Front(children)
      if no node in open or closed has child’s state
        open ← Queuing-Fn(open, child)
      else if there exists node in open that has child’s state
        if PathCost(child) < PathCost(node)
          open ← Delete-Node(open, node)
          open ← Queuing-Fn(open, child)
      else if there exists node in closed that has child’s state
        if PathCost(child) < PathCost(node)
          closed ← Delete-Node(closed, node)
          open ← Queuing-Fn(open, child)
    end
[… see previous slide …]
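
The open/closed bookkeeping above can be sketched in Python. This is a minimal sketch, not the lecture's implementation: the graph format (a dict mapping each state to a dict of successors and step costs) and the function name are assumptions, step costs are assumed non-negative, and instead of Delete-Node it leaves superseded entries in the heap and skips them when they are popped.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a least-cost path, or None if the goal is unreachable.

    graph maps a state to {successor: step_cost}; this format is an assumption.
    """
    open_heap = [(0, start, [start])]  # "open": priority queue ordered by path cost
    best_cost = {start: 0}             # cheapest known path cost for each state seen
    closed = set()                     # states that have already been expanded
    while open_heap:
        cost, state, path = heapq.heappop(open_heap)
        if state in closed:
            continue                   # stale entry: a cheaper copy was expanded earlier
        if state == goal:
            return cost, path
        closed.add(state)
        for child, step in graph.get(state, {}).items():
            new_cost = cost + step
            # Queue the child only if it is new or reached more cheaply than before.
            if child not in best_cost or new_cost < best_cost[child]:
                best_cost[child] = new_cost
                heapq.heappush(open_heap, (new_cost, child, path + [child]))
    return None
```

With a toy graph such as `{"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}`, the cheaper route to C through B replaces the direct edge, which is the case the "else if there exists node in open" branch handles.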

Last time: search strategies

Uninformed: Use only information available in the problem formulation

• Breadth-first

• Uniform-cost

• Depth-first

• Depth-limited

• Iterative deepening

Informed: Use heuristics to guide the search

• Best first

• A*

Evaluation of search strategies

• Search algorithms are commonly evaluated according to the following four criteria:
  - Completeness: does it always find a solution if one exists?
  - Time complexity: how long does it take, measured in the number of nodes generated?
  - Space complexity: how much memory does it require?
  - Optimality: does it guarantee the least-cost solution?

• Time and space complexity are measured in terms of:
  - b – max branching factor of the search tree
  - d – depth of the least-cost solution
  - m – max depth of the search tree (may be infinity)

Last time: uninformed search strategies

Uninformed search:

Use only information available in the problem formulation

• Breadth-first

• Uniform-cost

• Depth-first

• Depth-limited

• Iterative deepening

This time: informed search

Informed search:

Use heuristics to guide the search

• Best first

• A*

• Heuristics

• Hill-climbing

• Simulated annealing

Best-first search

• Idea: use an evaluation function for each node as an estimate of its “desirability”; expand the most desirable unexpanded node.

• Implementation: QueueingFn = insert successors in decreasing order of desirability

• Special cases:
  - greedy search
  - A* search

Romania with step costs in km

[Figure: road map of Romania with step costs in km; surviving values: 374, 329, 253]

Greedy search

• Estimation function:

h(n) = estimate of cost from node n to goal (heuristic)

• For example:

hSLD(n) = straight-line distance from n to Bucharest

• Greedy search expands first the node that appears to be closest to the goal, according to h(n).
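
Greedy search as described above can be sketched in Python. The road distances and straight-line distances below are recalled from the AIMA Romania example and cover only part of the map, so treat them as illustrative assumptions; the names `EDGES`, `H_SLD`, `neighbors`, and `greedy_search` are hypothetical, and the visited set is the repeated-state check, which the basic algorithm does not include.

```python
import heapq

# Partial Romania map (km) and straight-line distances to Bucharest.
# Values recalled from the AIMA example; illustrative assumptions only.
EDGES = [("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
         ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Sibiu", "Fagaras", 99),
         ("Sibiu", "Rimnicu Vilcea", 80), ("Rimnicu Vilcea", "Pitesti", 97),
         ("Fagaras", "Bucharest", 211), ("Pitesti", "Bucharest", 101)]
H_SLD = {"Arad": 366, "Zerind": 374, "Oradea": 380, "Timisoara": 329, "Sibiu": 253,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0}

def neighbors(city):
    """Yield (neighbor, km) pairs; roads are two-way."""
    for a, b, km in EDGES:
        if a == city:
            yield b, km
        elif b == city:
            yield a, km

def greedy_search(start, goal):
    """Always expand the frontier node with the smallest h(n); return (path, cost)."""
    frontier = [(H_SLD[start], start, [start], 0)]
    visited = set()  # repeated-state checking (an addition to the basic algorithm)
    while frontier:
        _, city, path, cost = heapq.heappop(frontier)
        if city == goal:
            return path, cost
        if city in visited:
            continue
        visited.add(city)
        for nxt, km in neighbors(city):
            if nxt not in visited:
                heapq.heappush(frontier, (H_SLD[nxt], nxt, path + [nxt], cost + km))
    return None
```

On this partial map, `greedy_search("Arad", "Bucharest")` goes through Sibiu and Fagaras for 450 km, which is not the cheapest route; path cost plays no role in the ordering.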

Properties of Greedy Search

• Complete? No – can get stuck in loops, e.g., Iasi > Neamt > Iasi > Neamt > …
  Complete in finite space with repeated-state checking.

• Time? O(b^m), but a good heuristic can give dramatic improvement

• Space? O(b^m) – keeps all nodes in memory

• Optimal? No.

A* search

• Idea: avoid expanding paths that are already expensive.
  Evaluation function: f(n) = g(n) + h(n), with:
  - g(n) – cost so far to reach n
  - h(n) – estimated cost to goal from n
  - f(n) – estimated total cost of path through n to goal

• A* search uses an admissible heuristic, that is, h(n) ≤ h*(n), where h*(n) is the true cost from n to the goal. For example: hSLD(n) never overestimates the actual road distance.

• Theorem: A* search is optimal
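
A minimal Python sketch of A* with f(n) = g(n) + h(n) follows. The map data are recalled from the AIMA Romania example and cover only part of the map (illustrative assumptions); `EDGES`, `H_SLD`, `neighbors`, and `a_star` are hypothetical names, and stale queue entries are skipped rather than deleted.

```python
import heapq

# Partial Romania map (km) and straight-line distances to Bucharest.
# Values recalled from the AIMA example; illustrative assumptions only.
EDGES = [("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
         ("Zerind", "Oradea", 71), ("Oradea", "Sibiu", 151), ("Sibiu", "Fagaras", 99),
         ("Sibiu", "Rimnicu Vilcea", 80), ("Rimnicu Vilcea", "Pitesti", 97),
         ("Fagaras", "Bucharest", 211), ("Pitesti", "Bucharest", 101)]
H_SLD = {"Arad": 366, "Zerind": 374, "Oradea": 380, "Timisoara": 329, "Sibiu": 253,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0}

def neighbors(city):
    """Yield (neighbor, km) pairs; roads are two-way."""
    for a, b, km in EDGES:
        if a == city:
            yield b, km
        elif b == city:
            yield a, km

def a_star(start, goal):
    """Expand nodes in order of f = g + h; return (path, cost) or None."""
    frontier = [(H_SLD[start], 0, start, [start])]  # entries are (f, g, city, path)
    best_g = {start: 0}                             # cheapest known g per city
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        if city == goal:
            return path, g
        if g > best_g.get(city, float("inf")):
            continue                                # superseded by a cheaper path
        for nxt, km in neighbors(city):
            g2 = g + km
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + H_SLD[nxt], g2, nxt, path + [nxt]))
    return None
```

On this partial map, `a_star("Arad", "Bucharest")` returns the route through Rimnicu Vilcea and Pitesti at 418 km, cheaper than the 450 km route through Fagaras that a purely h-ordered search would pick.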

Optimality of A* (standard proof)

Suppose some suboptimal goal G2 has been generated and is in the queue. Let n be an unexpanded node on a shortest path to an optimal goal G1.
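
The inequalities that finish this argument did not survive in the notes. The standard chain, reconstructed here under the assumptions that h is admissible and that h is zero at goal states, is:

```latex
\begin{align*}
f(G_2) &= g(G_2) + h(G_2) = g(G_2) && \text{since } h(G_2) = 0 \\
       &> g(G_1)                   && \text{since } G_2 \text{ is suboptimal} \\
       &= g(n) + h^*(n)            && \text{since } n \text{ lies on a shortest path to } G_1 \\
       &\ge g(n) + h(n) = f(n)     && \text{since } h \text{ is admissible}
\end{align*}
```

Since f(G2) > f(n), A* expands n before it would ever select G2, so a suboptimal goal is never returned.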

Optimality of A* (more useful proof)

f-contours

What do the contours look like when h(n) = 0?

Properties of A*

• Complete? Yes, unless there are infinitely many nodes with f ≤ f(G)

• Time? Exponential in [(relative error in h) x (length of solution)]

• Space? Keeps all nodes in memory

• Optimal? Yes – cannot expand f_{i+1} until f_i is finished

Proof of lemma: pathmax

Admissible heuristics

Relaxed Problem

• Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.

• If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.

• If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
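
The definitions of h1 and h2 did not survive in these notes. The sketch below assumes the usual AIMA definitions (h1 = number of misplaced tiles, h2 = total Manhattan distance) and an assumed goal layout with the blank, written 0, in the top-left corner; `GOAL`, `h1`, and `h2` are hypothetical names.

```python
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)  # assumed goal layout, read row by row; 0 is the blank

def h1(state):
    """Misplaced tiles: exact cost if a tile could jump anywhere in one move."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile != 0 and tile != goal)

def h2(state):
    """Total Manhattan distance: exact cost if a tile could move to any adjacent square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue                      # the blank is not counted
        goal_idx = GOAL.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total
```

For the state `(7, 2, 4, 5, 0, 6, 8, 3, 1)`, `h1` gives 8 and `h2` gives 18. Both come from relaxed problems, so neither overestimates the true solution cost.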

Recall: breadth-first search, step by step

Implementation of search algorithms

Function General-Search(problem, Queuing-Fn) returns a solution, or failure
  nodes ← make-queue(make-node(initial-state[problem]))
  loop do
    if nodes is empty then return failure
    node ← Remove-Front(nodes)
    if Goal-Test[problem] applied to State(node) succeeds then return node
    nodes ← Queuing-Fn(nodes, Expand(node, Operators[problem]))
  end

Queuing-Fn(queue, elements) is a queuing function that inserts a set of elements into the queue and determines the order of node expansion. Varieties of the queuing function produce varieties of the search algorithm.
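
To illustrate how the queuing function alone selects the strategy, here is a minimal Python sketch. The function names and the representation of a node as the path from the start state are assumptions, and the check that skips states already on the current path is an addition that the pseudocode above does not include.

```python
from collections import deque

def general_search(successors, start, is_goal, queuing_fn):
    """Generic search; each node is stored as the path from the start state."""
    nodes = deque([[start]])
    while nodes:
        path = nodes.popleft()                      # Remove-Front
        if is_goal(path[-1]):
            return path
        # Expand, skipping states already on this path (simple loop avoidance).
        children = [path + [s] for s in successors(path[-1]) if s not in path]
        queuing_fn(nodes, children)
    return None

def enqueue_at_end(nodes, children):
    """Breadth-first: successors go to the back of the queue (FIFO)."""
    nodes.extend(children)

def enqueue_at_front(nodes, children):
    """Depth-first: successors go to the front of the queue (LIFO)."""
    nodes.extendleft(reversed(children))            # keeps the children's own order
```

Calling `general_search` with `enqueue_at_end` gives breadth-first behaviour, and with `enqueue_at_front` depth-first behaviour; nothing else in the loop changes.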

Recall: breadth-first search, step by step

Breadth-first search

Node queue: initialization

#  state      depth  path cost  parent #
1  Arad       0      0          --

Breadth-first search

Node queue: add successors to queue end; empty queue from top (i.e., FIFO)

#  state      depth  path cost  parent #
1  Arad       0      0          --
2  Zerind     1      1          1
3  Sibiu      1      1          1
4  Timisoara  1      1          1

Breadth-first search

Node queue: add successors to queue end; empty queue from top

#  state      depth  path cost  parent #
1  Arad       0      0          --
2  Zerind     1      1          1
3  Sibiu      1      1          1
4  Timisoara  1      1          1
5  Arad       2      2          2
6  Oradea     2      2          2

(get smart: e.g., avoid repeated states like node #5)

Depth-first search

Depth-first search

Node queue: initialization

#  state      depth  path cost  parent #
1  Arad       0      0          --

Depth-first search

Node queue: add successors to queue front; empty queue from top (i.e., LIFO, or stack)

#  state      depth  path cost  parent #
2  Zerind     1      1          1
3  Sibiu      1      1          1
4  Timisoara  1      1          1
1  Arad       0      0          --

Depth-first search

Node queue: add successors to queue front; empty queue from top

#  state      depth  path cost  parent #
5  Arad       2      2          2
6  Oradea     2      2          2
2  Zerind     1      1          1
3  Sibiu      1      1          1
4  Timisoara  1      1          1
1  Arad       0      0          --

Last time: search strategies

Uninformed: Use only information available in the problem formulation

• Breadth-first

• Uniform-cost

• Depth-first

• Depth-limited

• Iterative deepening

Informed: Use heuristics to guide the search

• Best first:
  - Greedy search – queue first the nodes that maximize heuristic “desirability”, based on the estimated path cost from the current node to the goal;
  - A* search – queue first the nodes that minimize the sum of the path cost so far and the estimated path cost to the goal.

This time – Local Search

• Iterative improvement

• Hill climbing

• Simulated annealing

Iterative improvement

• In many optimization problems, path is irrelevant; the goal state itself is the solution.

• Then, state space = space of “complete” configurations. Algorithm goal:
  - find optimal configuration (e.g., TSP), or,
  - find configuration satisfying constraints (e.g., n-queens)

• In such cases, can use iterative improvement algorithms: keep a single “current” state, and try to improve it.

Iterative improvement example: vacuum world

Simplified world: 2 locations, each may or may not contain dirt, each may or may not contain the vacuuming agent.

Goal of agent: clean up the dirt.

If path does not matter, do not need to keep track of it.

Iterative improvement example: n-queens

• Goal: Put n chess-game queens on an n x n board, with no two queens on the same row, column, or diagonal.

• Here, goal state is initially unknown but is specified by constraints that it must satisfy.

Hill climbing (or gradient ascent/descent)

• Iteratively maximize the “value” of the current state, by replacing it with the successor state that has the highest value, for as long as possible.

(Amnesia: only the current state is remembered.)
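
Hill climbing on the n-queens problem can be sketched as follows. This is a minimal sketch under stated assumptions: the state places one queen per row (`cols[r]` is that queen's column), the value being minimized is the number of attacking pairs, and a move relocates one queen within its row; `conflicts` and `hill_climb` are hypothetical names.

```python
def conflicts(cols):
    """Number of attacking queen pairs; cols[r] is the column of the queen in row r."""
    n = len(cols)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if cols[i] == cols[j] or abs(cols[i] - cols[j]) == j - i)

def hill_climb(cols):
    """Repeatedly take the single-queen move that lowers conflicts the most."""
    cols = list(cols)
    while True:
        current = conflicts(cols)
        best_move, best_val = None, current
        for r in range(len(cols)):
            for c in range(len(cols)):
                if c != cols[r]:
                    trial = cols[:r] + [c] + cols[r + 1:]
                    v = conflicts(trial)
                    if v < best_val:
                        best_move, best_val = (r, c), v
        if best_move is None:
            return cols, current      # no improving move: a local (maybe global) minimum
        cols[best_move[0]] = best_move[1]
```

The returned conflict count is zero only when a solution was found; otherwise the search has stopped in a local minimum, and restarting from another initial state is the simplest remedy.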

Question: What is the difference between this problem and our problem (finding global minima)?

Hill climbing

• Note: minimizing a “value” function v(n) is equivalent to maximizing –v(n), thus both notions are used interchangeably.

• Notion of “extremization”: find extrema (minima or maxima) of a value function.

Hill climbing

• Problem: depending on initial state, may get stuck in local extremum.

Minimizing energy

• Let’s now change the formulation of the problem a bit, so that we can employ new formalism:
  - let’s compare our state space to that of a physical system that is subject to natural interactions,
  - and let’s compare our value function to the overall potential energy E of the system.

• On every update, we have ΔE ≤ 0

[Figure: energy landscape with states A, B, C, D, E, showing the basin of attraction for C]

Minimizing energy

• Hence the dynamics of the system tend to move E toward a minimum.

• We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.

[Figure: energy landscape with states A, B, C, D, E, showing the basin of attraction for C]

Local Minima Problem

• Question: How do you avoid this local minimum?

[Figure: cost curve labeled with starting point, descent direction, local minimum, global minimum, and the barrier to local search]

Consequences of the Occasional Ascents

• Desired effect: helps escape local optima.

• Adverse effect: might pass the global optimum after reaching it (easy to avoid by keeping track of the best-ever state).

Boltzmann machines

[Figure: energy curve with a height labeled h]

The Boltzmann Machine of Hinton, Sejnowski, and Ackley (1984) uses simulated annealing to escape local minima.

To motivate their solution, consider how one might get a ball-bearing traveling along the curve to "probably end up" in the deepest minimum. The idea is to shake the box "about h hard" — then the ball is more likely to go from D to C than from C to D. So, on average, the ball should end up in C's valley.

Simulated annealing: basic idea

• From current state, pick a random successor state;

• If it has better value than current state, then “accept the transition,” that is, use successor state as current state;

• Otherwise, do not give up, but instead flip a coin and accept the transition with a given probability (that is lower as the successor is worse).

• So we sometimes accept “un-optimizing” the value function a little, with a non-zero probability.

Boltzmann’s statistical theory of gases

• In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states.

• The 19th century physicist Ludwig Boltzmann developed a theory that included a probability distribution of temperature (i.e., every small region of the gas had the same kinetic energy).

• Hinton, Sejnowski and Ackley’s idea was that this distribution might also be used to describe neural interactions, where low temperature T is replaced by a small noise term T (the neural analog of random thermal motion of molecules). While their results primarily concern optimization using neural networks, the idea is more general.

Boltzmann distribution

• At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:

$$\frac{P(A)}{P(B)} = \exp\left(-\frac{E(A) - E(B)}{T}\right) = \frac{\exp(E(B)/T)}{\exp(E(A)/T)}$$

• where E(A) and E(B) are the energies associated with states A and B.

Simulated annealing

Kirkpatrick et al. 1983:

• Simulated annealing is a general method for making escape from local minima likely, by allowing jumps to higher energy states.

• The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.

• He heats the metal, then slowly cools it as he hammers the blade into shape.
  - If he cools the blade too quickly the metal will form patches of different composition;
  - If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.

Real annealing: Sword

• He heats the metal, then slowly cools it as he hammers the blade into shape.

• If he cools the blade too quickly the metal will form patches of different composition;

• If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.

Simulated annealing in practice

- set T
- optimize for given T
- lower T
- repeat

MDSA: Molecular Dynamics Simulated Annealing

Simulated annealing in practice

- set T
- optimize for given T
- lower T (see Geman & Geman, 1984)
- repeat

• Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.

• Caveat: this algorithm has no end (Geman & Geman’s schedule decreases T as 1/log of the number of iterations, so T never reaches zero), so it may take an infinite amount of time to find the global minimum.

Simulated annealing algorithm

• Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.

Note: the goal here is to maximize E.

Simulated annealing algorithm

• Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.

Algorithm when the goal is to minimize E.

Note on simulated annealing: limit cases

• Boltzmann distribution: accept a “bad move” with ΔE < 0 (goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)

• If T is large: ΔE < 0 ⇒ ΔE/T < 0 and very small in magnitude ⇒ exp(ΔE/T) close to 1 ⇒ accept bad move with high probability

• If T is near 0: ΔE < 0 ⇒ ΔE/T < 0 and very large in magnitude ⇒ exp(ΔE/T) close to 0 ⇒ accept bad move with low probability

Note on simulated annealing: limit cases

• Boltzmann distribution: accept a “bad move” with ΔE < 0 (goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)

• If T is large: ΔE < 0 ⇒ ΔE/T < 0 and very small in magnitude ⇒ exp(ΔE/T) close to 1 ⇒ accept bad move with high probability (random walk)

• If T is near 0: ΔE < 0 ⇒ ΔE/T < 0 and very large in magnitude ⇒ exp(ΔE/T) close to 0 ⇒ accept bad move with low probability (deterministic down-hill)
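
The acceptance rule and its two limit cases can be sketched in Python. This is a minimal sketch for the maximizing convention used on this slide. The geometric cooling schedule, the parameter defaults, and the names `accept_probability` and `simulated_annealing` are assumptions; in particular the schedule is not the 1/log schedule of Geman & Geman. The best-ever state is tracked so that passing the optimum after reaching it does no harm.

```python
import math
import random

def accept_probability(delta_e, temperature):
    """Probability of accepting a move that changes the value by delta_e (maximizing E)."""
    if delta_e >= 0:
        return 1.0                       # improving moves are always accepted
    return math.exp(delta_e / temperature)

def simulated_annealing(value, neighbor, state, t_start=10.0, t_end=1e-3,
                        cooling=0.95, steps_per_t=50, rng=random):
    """Maximize value(state); neighbor(state, rng) proposes a random successor."""
    best = state
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):     # "optimize for given T"
            candidate = neighbor(state, rng)
            delta_e = value(candidate) - value(state)
            if rng.random() < accept_probability(delta_e, t):
                state = candidate
                if value(state) > value(best):
                    best = state         # remember the best-ever state
        t *= cooling                     # "lower T" (geometric schedule, an assumption)
    return best
```

For a bad move with ΔE = -1, `accept_probability(-1, 1000)` is close to 1 (random walk) while `accept_probability(-1, 0.01)` is close to 0 (pure hill climbing), matching the two limit cases.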

Evolutionary Computation

• Several different methods of evolutionary computation are now known.

• They all simulate natural evolution, generally by:

• creating a population of individuals,

• evaluating their fitness,

• generating a new population through genetic operations,

• and repeating a number of times.
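
The four steps above can be sketched as a small loop over bit-string individuals. Every specific choice here is an assumption made for illustration: binary tournament selection, single-point crossover, per-bit mutation, the parameter defaults, and the name `evolve`.

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=40, p_mut=0.02, rng=None):
    """Evolve bit lists toward higher fitness; return the best individual seen."""
    rng = rng or random.Random()
    # 1. Create a population of random individuals.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # 2. Evaluate fitness: binary tournament, the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):         # 4. Repeat a number of times.
        new_pop = []
        while len(new_pop) < pop_size:
            # 3. Generate a new population through genetic operations.
            mom, dad = pick(), pick()
            cut = rng.randrange(1, length)                  # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop + [best], key=fitness)
    return best
```

As a usage example, `evolve(sum)` searches for the bit string with the most ones, a standard toy objective.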

Overview of Genetic Algorithm

• Difference with traditional search techniques:

• Coding of the design variables as opposed to the design variables themselves, allowing both discrete and continuous variables

• Works with population of designs as opposed to single design, thus reducing the risk of getting stuck at local minima

• Only requires the objective function value, not the derivatives. This aspect makes GAs domain-independent

• The fitness function defines how well each solution solves the problem objective

• GA is a probabilistic search method, not a deterministic one, which makes the search highly exploratory.

Nature Genetics vs. Genetic Algorithm

Model of Genetic Algorithm

Evolutionary mechanism of the Genetic Algorithm

• Definition of GA:

• Genetic algorithms are a class of stochastic search algorithms based on biological evolution.

Overview of GA

• stochastic, directed and highly parallel search technique based on principles of population genetics

• Darwin's principle of survival of the fittest: evolution is performed by genetically breeding the population of individuals over a number of generations

• crossover combines good information from the parents

• mutation prevents premature convergence

Genetic Algorithm Flow-chart

GA - Gene Encoding

• Legal Representation after Variation/Recombination
  - Whatever method is used for gene encoding, the gene representation produced after reproduction, crossover, mutation, and similar operations must still be a legal representation.
  - In other words, a good gene encoding can represent every legal combination of genes and, at the same time, never evolves illegal individuals through the basic operations of the genetic algorithm.

• Gene Encoding (Representation scheme)
  - Discrete: 1 1 0 1 0 1 0 1
  - Floating: 0.3 1.2 2.3 1.9 0 3.2 0.4 4.0

Genetic Algorithm - Operators

• Crossover: 11001011 + 11011111 = 11001111

• Mutation: 11001001 => 10001001

• Target function = prediction accuracy + feature subset size
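
The two operators can be sketched on bit strings as follows. The function names are hypothetical, and the cut point used in the check (after the fourth bit) is only one choice that is consistent with the slide's crossover example; the slide itself does not state it.

```python
def crossover(parent_a, parent_b, point):
    """Single-point crossover: head of parent_a joined to the tail of parent_b."""
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, position):
    """Flip the bit at the given 0-based position."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]

# Reproduce the two examples from the slide.
assert crossover("11001011", "11011111", 4) == "11001111"
assert mutate("11001001", 1) == "10001001"
```

Both operators return a string of the same length over the same alphabet, so with this encoding every offspring remains a legal representation.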

Evaluation Function of GA

Solving the 8-queen problem using GA

Solving the 8-queen problem using GA (cont)

GA Parameters - Population Size

• Population Size

• Choosing an appropriate population size lets the genetic algorithm balance efficiency and effectiveness.

• In a genetic algorithm the population size is usually fixed. With a larger population, each generation takes longer to train but the training quality is better; with a smaller population, each generation trains faster but the training quality may be worse.

GA Parameters – Mutation Rates

• Mutation Rate

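• Mutation is meant to bring in more information, but the mutation rate is usually very small, because too high a mutation rate causes useful information in the chromosomes to be lost through mutation.

• The mutation rate can also be dynamic: for example, when the chromosomes have shown little improvement for some time, the mutation rate can be increased in the hope of producing more variation.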

GA Parameters – Crossover Rate

• Crossover Rate

• Crossover is mainly meant to let chromosomes exchange useful information so that they attain higher fitness, in the hope of better chromosomes, and hence better performance, in the next generation. Sometimes, however, the genes of certain chromosomes should be passed on to the next generation completely intact, which is why there is a crossover rate.

• Its value depends on the problem; an appropriate crossover rate is very important for training quality.

Summary

• Best-first search = general search, where the minimum-cost nodes (according to some measure) are expanded first.

• Greedy search = best-first with the estimated cost to reach the goal as a heuristic measure.
  - Generally faster than uninformed search
  - not optimal
  - not complete.

• A* search = best-first with measure = path cost so far + estimated path cost to goal.
  - combines advantages of uniform-cost and greedy searches
  - complete, optimal and optimally efficient
  - space complexity still exponential

Summary

• Time complexity of heuristic algorithms depends on the quality of the heuristic function.
  - Good heuristics can sometimes be constructed by examining the problem definition or by generalizing from experience with the problem class.

• Iterative improvement algorithms keep only a single state in memory.
  - Can get stuck in local extrema; simulated annealing provides a way to escape local extrema, and is complete and optimal given a slow enough cooling schedule.

Case Study - E-mail Filtering by Search

• E-mail basics

• Overview of Anti-SPAM filtering
  - Pattern Matching
  - Filtering by automatic learning

• E-mail filtering using Heuristic Search
  - Simple Static Pattern matching: White List, Black List
  - Dynamic Pattern Matching: Grey List
  - Automatic E-mail Filtering: GA, Bayesian Network

Model of the E-mail System

[Figure: system diagram with Internet, Transparent Firewall, Incoming SMTP Gateway Farm, Bouncing server, Mail Spool server, and Outgoing SMTP Gateway Farm]

Sample SPAM Message-20041018

Sample SPAM Mail-20041018a (1)

Generic Mail Filtering

• Generic Mail Filtering Functions: f(n) = g(n) + h(n)
  - g(n): exact value known
  - h(n): heuristic / estimated value

[Figure: flow diagram with Client, Generic Mail Filtering, Anti-SPAM Search Engine, Mail Spool; outcomes Pass, Fail, Accept, Reject]

Generic Mailing Operation (1)

[Figure: flow diagram with Client, MTA, Milter-like API, Anti-SPAM Search Engine-1, Anti-SPAM Search Engine-K, Anti-SPAM Learning Engine-N, Account verification, Account database, Mail Delivery, Mail Spool; outcomes Accept, Reject, Discard, Bounce]

Generic Mail Filtering (cont)

[Figure: flow diagram with Client, Generic Mail Filtering, White List, Black List, Grey List, Automatic SPAM Learning, Mail Spool; stages numbered (1) to (4); outcomes Pass, Fail, Fail temporarily, Accept, Reject, Update]

Sample SPAM Mail-20041018a (2)

Sample SPAM Mail-20041018a (3)

SPAM Mail -20041018c(0)

SPAM Message-20041018c(1)

SPAM Mail-20041018c(2)

SPAM Mail Filtering Tool- Netscape Communicator