2D1431 Machine Learning
Fuzzy Logic &
Learning in Robotics
Outline
• Fuzzy Logic
• Learning Control
• Evolutionary Robotics
Types of Uncertainty
• Stochastic uncertainty
  • example: rolling a die
• Linguistic uncertainty
  • examples: low price, tall people, young age
• Informational uncertainty
  • example: credit worthiness, honesty
Classical Set

young = { x ∈ P | age(x) ≤ 20 }

characteristic function:
µyoung(x) = 1 if age(x) ≤ 20
µyoung(x) = 0 if age(x) > 20

[Figure: membership function of A = "young" over x [years]: µyoung(x) steps from 1 down to 0 at age 20]
Classical Logic vs. Fuzzy Logic

Classical logic: element x belongs to set A or it does not:
µA(x) ∈ {0,1}

Fuzzy logic: element x belongs to set A with a certain degree of membership:
µA(x) ∈ [0,1]

[Figure: A = "young" over x [years]; a crisp step function in classical logic vs. a gradual membership function in fuzzy logic]
Fuzzy Set

Definition: Fuzzy Set A = {(x, µA(x)) : x ∈ X, µA(x) ∈ [0,1]}
• a universe of discourse X: 0 ≤ x ≤ 100
• a membership function µA : X → [0,1]

[Figure: A = "young" over x [years]; e.g. x = 23 has membership µ = 0.8]
Types of Membership Functions

• Trapezoid <a,b,c,d>: µ(x) rises linearly from 0 at a to 1 at b, stays at 1 until c, and falls back to 0 at d
• Gaussian N(m,s): a bell curve centered at m with width s
• Singleton: isolated points with individual memberships, e.g. (a,1) and (b,0.5)
• Triangular <a,b,b,d>: a trapezoid whose plateau collapses to the single peak b
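As a minimal sketch, the three parametric shapes above can be written directly from their definitions (the parameter names a, b, c, d, m, s follow the slide's notation):

```python
import math

def triangular(x, a, b, d):
    """Triangular <a,b,b,d>: rises from a to the peak b, falls to d."""
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x <= b else (d - x) / (d - b)

def trapezoid(x, a, b, c, d):
    """Trapezoid <a,b,c,d>: flat top (membership 1) between b and c."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, m, s):
    """Gaussian N(m,s): smooth bell curve centered at m with width s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)
```

A singleton is simply a lookup table {a: 1.0, b: 0.5} with membership 0 everywhere else.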
The Extension Principle

Assume a fuzzy set A and a function f. What does the fuzzy set f(A) look like?

For arbitrary functions f:

µf(A)(y) = max { µA(x) | y = f(x) }

[Figure: the membership function µA(x) mapped through f; where several x map to the same y, the maximum membership is taken]
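On a discretized domain the extension principle is a max over the preimage of each y; a small sketch with a hypothetical fuzzy set A and f(x) = x²:

```python
def extend(mu_A, xs, f):
    """Return {y: mu_f(A)(y)} by maximizing mu_A over the preimage of y."""
    out = {}
    for x in xs:
        y = f(x)
        out[y] = max(out.get(y, 0.0), mu_A(x))
    return out

# hypothetical discrete fuzzy set A over {-2, ..., 2}
mu_A = {-2: 0.2, -1: 0.5, 0: 1.0, 1: 0.8, 2: 0.1}
result = extend(lambda x: mu_A[x], mu_A.keys(), lambda x: x * x)
# y = 1 has preimage {-1, 1}, so its membership is max(0.5, 0.8) = 0.8
```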
Operators on Fuzzy Sets

Intersection (fuzzy AND):
• minimum: µA∧B(x) = min{µA(x), µB(x)}
• algebraic product: µA∧B(x) = µA(x) · µB(x)

Union (fuzzy OR):
• maximum: µA∨B(x) = max{µA(x), µB(x)}
• bounded sum: µA∨B(x) = min{1, µA(x) + µB(x)}
Complement

Negation: µ¬A(x) = 1 − µA(x)

The classical laws µ¬A∨A(x) ≡ 1 and µ¬A∧A(x) ≡ 0 do not always hold:

Example: µA(x) = 0.6, µ¬A(x) = 1 − µA(x) = 0.4
µ¬A∨A(x) = max(0.6, 0.4) = 0.6 ≠ 1
µ¬A∧A(x) = min(0.6, 0.4) = 0.4 ≠ 0
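The operator pairs and the failure of the excluded middle from the example above can be checked in a few lines:

```python
def t_min(a, b):  return min(a, b)          # intersection (fuzzy AND)
def s_max(a, b):  return max(a, b)          # union (fuzzy OR)
def t_prod(a, b): return a * b              # algebraic product intersection
def s_bsum(a, b): return min(1.0, a + b)    # bounded-sum union

mu = 0.6
mu_not = 1.0 - mu                 # complement: 0.4
a_or_not_a  = s_max(mu, mu_not)   # 0.6, not 1
a_and_not_a = t_min(mu, mu_not)   # 0.4, not 0
```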
Fuzzy Relations

Classical relation R : X × Y defined by
µR(x,y) = 1 if (x,y) ∈ R, 0 if (x,y) ∉ R

Fuzzy relation R : X × Y defined by µR(x,y) ∈ [0,1]

µR(x,y) describes the degree to which x and y are related.
It can also be interpreted as the truth value of the proposition x R y.
Fuzzy Relations

Example:
X = { rainy, cloudy, sunny }
Y = { swimming, bicycling, camping, reading }

X/Y      swimming  bicycling  camping  reading
rainy      0.0       0.2        0.0      1.0
cloudy     0.0       0.8        0.3      0.3
sunny      1.0       0.2        0.7      0.0
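The relation table above is just a membership lookup; a nested dict keeps µR(x,y) addressable by name:

```python
# mu_R(x, y): the degree to which weather x suits activity y
R = {
    "rainy":  {"swimming": 0.0, "bicycling": 0.2, "camping": 0.0, "reading": 1.0},
    "cloudy": {"swimming": 0.0, "bicycling": 0.8, "camping": 0.3, "reading": 0.3},
    "sunny":  {"swimming": 1.0, "bicycling": 0.2, "camping": 0.7, "reading": 0.0},
}

# truth value of the proposition "sunny R swimming"
truth = R["sunny"]["swimming"]
```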
Fuzzy Sets & Linguistic Variables

A linguistic variable combines several fuzzy sets.

linguistic variable: temperature
linguistic terms (fuzzy sets): { cold, warm, hot }

[Figure: membership functions µcold, µwarm, µhot over x [°C], spanning roughly 20–60 °C]
Fuzzy Rules

• Causal dependencies can be expressed in the form of if-then rules.
• General form:
  if <antecedent> then <consequence>
• Example:
  if temperature is cold and oil is cheap then heating is high

The rules connect linguistic variables (temperature, oil price, heating) with linguistic values/terms, i.e. fuzzy sets (cold, low, high):

if temperature is cold and oil price is low then heating is high
if temperature is hot and oil price is normal then heating is low
Fuzzy Rule Base

Heating as a function of temperature and oil price:

Oil price \ Temperature   cold     warm     hot
cheap                     high     high     medium
normal                    high     medium   low
expensive                 medium   low      low
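The 3×3 rule base above is equivalent to a lookup table indexed by the pair of linguistic terms:

```python
# (temperature term, oil-price term) -> heating term
rules = {
    ("cold", "cheap"): "high",      ("warm", "cheap"): "high",      ("hot", "cheap"): "medium",
    ("cold", "normal"): "high",     ("warm", "normal"): "medium",   ("hot", "normal"): "low",
    ("cold", "expensive"): "medium",("warm", "expensive"): "low",   ("hot", "expensive"): "low",
}

heating = rules[("cold", "cheap")]   # "high"
```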
Fuzzy Knowledge Base

The fuzzy knowledge base consists of two parts.

Fuzzy data base:
• definition of linguistic input and output variables
• definition of fuzzy membership functions

Fuzzy rule base:
if temperature is cold and oil price is cheap then heating is high
…

[Figure: membership functions µcold, µwarm, µhot over x [°C]]
Fuzzification

1. Fuzzification: determine the degree of membership for each term of an input variable.

Inputs: temperature t = 15 °C, oil price p = $13/barrel

If temperature is cold …  → µcold(t) = 0.5
and oil is cheap …        → µcheap(p) = 0.3
Fuzzy Combination

2. Combination: combine the terms into one degree of fulfillment for the entire antecedent by fuzzy AND (min-operator):

µante = min{µcold(t), µcheap(p)} = min{0.5, 0.3} = 0.3
Fuzzy Inference

3. Inference: apply the degree of fulfillment of the antecedent to the consequent of the rule ("… then heating is high"):

min-inference:  µcons(h) = min{µante, µhigh(h)}   (clips the output set at µante = 0.3)
prod-inference: µcons(h) = µante · µhigh(h)       (scales the output set by µante = 0.3)
Fuzzy Aggregation

4. Aggregation: aggregate the consequents of all rules ("… then heating is high / medium / low") using the max-operator for union.
Defuzzification

5. Defuzzification: determine a crisp output value from the aggregated output membership function, for example using the "Center of Gravity" (COG) method.

[Figure: COG of µconsequent(h) yields h ≈ 73]

Center of singletons defuzzification:

h = Σi µi · Ai · ci / Σi µi · Ai

µi = degree of membership of fuzzy set i
Ai = area of fuzzy set i
ci = center of gravity of fuzzy set i
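The five steps can be sketched end-to-end in plain Python. The membership-function shapes below are assumptions chosen only so that µcold(15) = 0.5 and µcheap(13) = 0.3 match the example above; the slides do not fix the exact shapes, and only the single rule "if cold and cheap then high" is evaluated:

```python
def ramp_down(x, lo, hi):
    """Membership 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def ramp_up(x, lo, hi):
    """Membership 0 below lo, 1 above hi, linear in between."""
    return 1.0 - ramp_down(x, lo, hi)

mu_cold  = lambda t: ramp_down(t, 0.0, 30.0)   # mu_cold(15) = 0.5
mu_cheap = lambda p: ramp_down(p, 6.0, 16.0)   # mu_cheap(13) = 0.3
mu_high  = lambda h: ramp_up(h, 40.0, 90.0)    # "heating is high" on 0..100

# 1. Fuzzification
m_cold, m_cheap = mu_cold(15.0), mu_cheap(13.0)
# 2. Combination: fuzzy AND of the antecedent terms (min)
m_ante = min(m_cold, m_cheap)                   # 0.3
# 3. min-inference: clip the consequent set "heating is high" at m_ante
# 4. Aggregation would take the max over all rules; only one rule fires here
hs = [i * 0.5 for i in range(201)]              # heating domain 0..100
mu_out = [min(m_ante, mu_high(h)) for h in hs]
# 5. Defuzzification: discrete center of gravity
cog = sum(h * m for h, m in zip(hs, mu_out)) / sum(mu_out)
```

The crisp output `cog` lands in the upper part of the heating range, as the clipped "high" set dictates.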
Schema of a Fuzzy Decision

Fuzzification → Inference → Defuzzification

[Figure: the measured temperature t is fuzzified (µcold = 0.7, µwarm = 0.2, µhot = 0.0); the rule base
  if temp is cold then valve is open
  if temp is warm then valve is half
  if temp is hot then valve is close
is evaluated, and defuzzification over µopen, µhalf, µclose yields a crisp output v for the valve setting]
Machine vs. Robot Learning

Machine learning:
• learning in a vacuum
• statistically well-behaved data
• mostly off-line
• informative feed-back
• computational time not an issue
• hardware does not matter
• convergence proof

Robot learning:
• embedded learning
• data distribution not homogeneous
• mostly on-line
• qualitative and sparse feed-back
• time is crucial
• hardware is a priority
• empirical proof
Methods of Robot Learning

• Dynamic programming / reinforcement learning: the desired behavior is expressed as an optimization criterion r to be optimized over a temporal horizon, resulting in a cost function (long-term accumulated reward)

  J(xt) = Σt r(xt, ut)

• Problem: curse of dimensionality, large state spaces, large amount of exploration
• Idea: modularize the control policy
Learning Task

• Learn a task-specific control policy π that maps the continuous-valued state vector s to a continuous-valued control action u:

  u = π(x, α, t)

[Figure: the learning system adjusts the parameters α of the control policy π(x,α,t) to match the desired behavior; the policy sends actions u to the robot & environment and receives states s]
Learning Control with Sub-Policies

• Learn or design sub-policies and subsequently build the complete policy out of the sub-policies.

[Figure: sub-policies π1 … π4 sit between the learning system and the robot & environment, driven by the desired behavior and exchanging actions u and states s]
Indirect Learning of Control Policies

• Decompose the task into a planning and an execution stage.
• Planning generates a desired kinematic trajectory.
• Execution transforms the plan into appropriate motor commands.
• Learn an inverse kinematic model for the execution module.

[Figure: the control policy consists of trajectory planning followed by feedback and feedforward controllers whose outputs are summed into the motor command u for the robot & environment; the learning system adapts the feedforward controller]
Learning Inverse Models

• Learn an inverse kinematic model for feed-forward control.
• Kinematic function: x = f(u)
• Inverse model: u = f⁻¹(x)
• Dynamic model: dx/dt = f(x,u)
• Inverse dynamic model: u = g(xdesired, x)
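A minimal sketch of learning an inverse model from data, assuming a toy one-dimensional forward kinematics x = f(u) = 2u (not from the slides) and a linear inverse model u ≈ a·x + b fitted by closed-form least squares:

```python
# sample the forward model x = f(u) = 2u
us = [0.0, 0.5, 1.0, 1.5, 2.0]
xs = [2.0 * u for u in us]

# closed-form least squares for the inverse model u = a*x + b
n = len(xs)
mx = sum(xs) / n
mu = sum(us) / n
a = sum((x - mx) * (u - mu) for x, u in zip(xs, us)) / sum((x - mx) ** 2 for x in xs)
b = mu - a * mx
# the learned inverse recovers a = 0.5, b = 0, i.e. u = f^{-1}(x) = x/2
```

For a robot arm the same idea applies with a nonlinear function approximator in place of the linear fit.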
Evolutionary Robotics in a Nutshell

[Figure: a population of bit-string genotypes (e.g. 1001, 0011, 0100, 0110, 1101); each genotype is decoded into behavior parameters α of a controller u = f(s, α) acting in the environment; evaluation assigns each individual a fitness, selection keeps the better individuals, and recombination and mutation generate new candidate genotypes]
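The evaluate → select → recombine → mutate loop can be sketched as a minimal genetic algorithm over bit strings. Truncation selection, one-point crossover and bit-flip mutation are illustrative choices; the slides do not specify the exact operators:

```python
import random

def evolve(fitness, pop_size=10, genes=8, generations=20, p_mut=0.05):
    """Evolve bit-string genotypes to maximize the given fitness function."""
    pop = [[random.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # truncation selection
        pop = parents[:]
        while len(pop) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genes)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            pop.append([g ^ (random.random() < p_mut) for g in child])
    return max(pop, key=fitness)

best = evolve(sum)   # toy fitness: number of 1-bits in the genotype
```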
Evolutionary Behavior Design

[Figure: the evolutionary algorithm passes behavior parameters (the genotype) to the robotic behavior, which exchanges control actions a and observed states s with the environment; the evaluation scheme turns observed rewards r into a fitness value fed back to the evolutionary algorithm]
Evolving in Simulation vs. Reality

Simulation:
• requires a model of the sensors and environment
• brittleness of adapted behaviors
• identical test cases for all candidate controllers
• automated, fast fitness evaluation

Reality:
• the real world is the model
• robust behaviors
• difficult to initialize for a new controller under evaluation
• time-consuming, manual fitness evaluation
Environment

Real-time online evolution in a 200×100 cm maze, with about 10–15 minutes per generation.
Robot & Sensors

• 6 binary sensors (4 antennae + 2 bumpers)
• 1 rotation sensor
External vs. Internal Fitness

External fitness:
• cannot be measured by the robot itself (e.g. location in world coordinates)
• external observer perspective
• useful in simulations

Internal fitness:
• directly accessible to the robot by means of sensors (e.g. sensor readings, battery level)
• useful when learning on the real robot
• fitness function might be more difficult to design
Functional vs. Behavioral Fitness

Functional:
• directly measures the way in which the system functions; observes the causes of a behavior
• example: learn to generate a desired oscillatory pattern of leg motion

Behavioral:
• measures the resulting behavior; observes the effects of the behavior
• example: measure the absolute distance traveled by the robot using the rotation sensor
Explicit vs. Implicit Fitness

Explicit:
• large number of constraints
• actively steers the evolutionary system towards desired behaviors
• problem: weighting and aggregating multiple constraints

Implicit:
• small number of constraints
• allows evolution of emergent, novel behaviors
• problem: for complex behaviors (e.g. find cylinders, pick up cylinders and drop them outside the arena) finding an initial behavior is like searching for a needle in a haystack
Behavior Representation

• The robot is controlled by the duration and direction of the left and right motor commands.
• Sensory states: s1,…,s6 (2⁶ possible states, reduced to 9 different states)
• Control action:
  • direction of the left and right motor
  • duration of the left and right motor action
• Mapping: for each of the nine different sensory states, the direction and duration of the left and right motor commands are encoded by one byte.
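One possible decoding of such a genotype, assuming one byte per motor and state with the top bit giving the direction and the lower 7 bits the duration in ms (the slides do not fix the exact bit layout):

```python
def decode(genotype):
    """genotype: 18 bytes = 9 states x (left motor byte, right motor byte)."""
    policy = {}
    for state in range(9):
        actions = []
        for byte in genotype[2 * state: 2 * state + 2]:
            direction = 1 if byte & 0x80 else -1   # forward / backward
            duration = byte & 0x7F                 # 0..127 ms
            actions.append((direction, duration))
        policy[state] = tuple(actions)             # (left, right) actions
    return policy

policy = decode(bytes(range(18)))
```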
Sensor States to Motor Actions

Sensor state       Left motor action  Right motor action
S3: left bumper    0 [ms]             0 [ms]
S2: front bumper   50 [ms]            50 [ms]
S1: no contact     70 [ms]            40 [ms]
Sensor States to Motor Actions

Sensor state              Left motor action  Right motor action
S6: left antenna inward   30 [ms]            30 [ms]
S5: left antenna outward  60 [ms]            60 [ms]
S4: right bumper          Float 20 [ms]      30 [ms]

(If the black vertical axle is pressed, this state is equivalent to S3.)
Sensor States to Motor Actions

Sensor state                      Left motor action  Right motor action
S9: left & right antenna outward  70 [ms]            60 [ms]
S8: right antenna outward         40 [ms]            70 [ms]
S7: right antenna inward          10 [ms]            20 [ms]

(If the black vertical axle is pressed, this state is equivalent to S4.)
Communication between RCX and PC

[Figure: the host computer is connected via a serial link to the IR communication tower, which talks to the RCX IR port on the robot in the environment]
Behavior Evaluation

• The parameters of the robotic behavior are downloaded onto the LEGO robot.
• The robot performs the behavior for one minute.
• The number of rotations of the tracking wheel, equivalent to the distance traveled, is returned as the fitness.
• Based on the fitness, the evolutionary algorithm selects good behaviors and generates new candidate behaviors by means of recombination and mutation.
• Population size 10 individuals, 20 generations; one run of the evolutionary algorithm takes about 3–4 hours.
Evolved Behavior
• ..\..\..\Movies\p90913g2.mov
Evolution of a Wall-Following Behavior

• 2 light sensors
• 2 bumpers
• 1 rotation sensor

Sensor Characteristic

• Light sensor readings S1, S2 as a function of the distance to the obstacle
Behavior Representation and Fitness

• Neural network: ω = f(S1, S2, wij, θi)
• Turn rate ω → motor commands: within a time slice ∆T, one motor runs forward for ω·∆T and the other for (1−ω)·∆T (backward for the collision states)
• Genotype encodes:
  • 7 ANN parameters {wij, θi}: 8 bits/parameter
  • motor commands for the collision states left and right bumper
• Fitness: absolute distance traveled (number of rotations of the rotation sensor)
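A sketch of such a controller with one hidden unit H, assuming a sigmoid activation and a hypothetical 7-parameter layout (two input-to-hidden weights, two direct input-to-output weights, one hidden-to-output weight, and two biases); the slides only state that 7 parameters are evolved, not their exact wiring:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def turn_rate(s1, s2, p):
    """omega = f(S1, S2, w_ij, theta_i) with one hidden unit."""
    w_s1h, w_s2h, w_s1o, w_s2o, w_ho, th_h, th_o = p
    h = sigmoid(w_s1h * s1 + w_s2h * s2 - th_h)
    return sigmoid(w_s1o * s1 + w_s2o * s2 + w_ho * h - th_o)  # omega in [0,1]

def motor_commands(omega, dT=50.0):
    """Split the time slice dT: one motor runs omega*dT, the other (1-omega)*dT."""
    return omega * dT, (1.0 - omega) * dT

omega = turn_rate(0.5, 0.5, [1.0, -1.0, 0.5, -0.5, 2.0, 0.0, 0.0])
left, right = motor_commands(omega)
```

A recurrent variant would additionally feed ω and H from the previous time step back into the input layer.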
Network Architectures

Feed-forward network (purely reactive behaviors):
inputs S1, S2 → hidden unit H → output ω, with weights wij

Recurrent network (dynamic behaviors):
the state x(t) = (ω, H, S1, S2) is fed back as additional input to compute x(t+1)
Evolved Behavior
..\..\..\Movies\PB251814.MOV
Distance Maximization

• The fitness function contains an additional penalty term for low proximity to obstacles (Si < Smin).

[Figure: trajectories without vs. with the proximity penalty]