2D1431 Machine Learning
Fuzzy Logic &
Learning in Robotics
Outline
• Fuzzy Logic
• Learning Control
• Evolutionary Robotics
Types of Uncertainty
• Stochastic uncertainty
  • example: rolling a die
• Linguistic uncertainty
  • examples: low price, tall people, young age
• Informational uncertainty
  • example: credit worthiness, honesty
Classical Set

young = { x ∈ P | age(x) ≤ 20 }

characteristic function:
µyoung(x) = 1 if age(x) ≤ 20
µyoung(x) = 0 if age(x) > 20

[Figure: membership function of A = "young" over x [years]: µyoung(x) steps from 1 down to 0 at age 20]
Classical Logic vs. Fuzzy Logic

Classical logic: element x belongs to set A or it does not:
µA(x) ∈ {0,1}

Fuzzy logic: element x belongs to set A with a certain degree of membership:
µA(x) ∈ [0,1]

[Figure: A = "young" over x [years]; a crisp step function in classical logic vs. a gradual membership function in fuzzy logic]
Fuzzy Set

Definition: Fuzzy Set A = {(x, µA(x)) : x ∈ X, µA(x) ∈ [0,1]}
• a universe of discourse X: 0 ≤ x ≤ 100
• a membership function µA : X → [0,1]

[Figure: A = "young" over x [years]; e.g. x = 23 has membership µ = 0.8]
Types of Membership Functions

• Trapezoid <a,b,c,d>: µ(x) rises linearly from 0 at a to 1 at b, stays at 1 until c, and falls back to 0 at d
• Gaussian N(m,s): a bell curve centered at m with width s
• Singleton: isolated points with individual memberships, e.g. (a,1) and (b,0.5)
• Triangular <a,b,b,d>: a trapezoid whose plateau collapses to the single peak b
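As a minimal sketch, the three parametric shapes above can be written directly from their definitions (the parameter names a, b, c, d, m, s follow the slide's notation):

```python
import math

def triangular(x, a, b, d):
    """Triangular <a,b,b,d>: rises from a to the peak b, falls to d."""
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x <= b else (d - x) / (d - b)

def trapezoid(x, a, b, c, d):
    """Trapezoid <a,b,c,d>: flat top (membership 1) between b and c."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, m, s):
    """Gaussian N(m,s): smooth bell curve centered at m with width s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)
```

A singleton is simply a lookup table {a: 1.0, b: 0.5} with membership 0 everywhere else.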
The Extension Principle

Assume a fuzzy set A and a function f. What does the fuzzy set f(A) look like?

For arbitrary functions f:

µf(A)(y) = max { µA(x) | y = f(x) }

[Figure: the membership function µA(x) mapped through f; where several x map to the same y, the maximum membership is taken]
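On a discretized domain the extension principle is a max over the preimage of each y; a small sketch with a hypothetical fuzzy set A and f(x) = x²:

```python
def extend(mu_A, xs, f):
    """Return {y: mu_f(A)(y)} by maximizing mu_A over the preimage of y."""
    out = {}
    for x in xs:
        y = f(x)
        out[y] = max(out.get(y, 0.0), mu_A(x))
    return out

# hypothetical discrete fuzzy set A over {-2, ..., 2}
mu_A = {-2: 0.2, -1: 0.5, 0: 1.0, 1: 0.8, 2: 0.1}
result = extend(lambda x: mu_A[x], mu_A.keys(), lambda x: x * x)
# y = 1 has preimage {-1, 1}, so its membership is max(0.5, 0.8) = 0.8
```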
Operators on Fuzzy Sets

Intersection (fuzzy AND):
• minimum: µA∧B(x) = min{µA(x), µB(x)}
• algebraic product: µA∧B(x) = µA(x) · µB(x)

Union (fuzzy OR):
• maximum: µA∨B(x) = max{µA(x), µB(x)}
• bounded sum: µA∨B(x) = min{1, µA(x) + µB(x)}
Complement

Negation: µ¬A(x) = 1 − µA(x)

The classical laws µ¬A∨A(x) ≡ 1 and µ¬A∧A(x) ≡ 0 do not always hold:

Example: µA(x) = 0.6, µ¬A(x) = 1 − µA(x) = 0.4
µ¬A∨A(x) = max(0.6, 0.4) = 0.6 ≠ 1
µ¬A∧A(x) = min(0.6, 0.4) = 0.4 ≠ 0
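The operator pairs and the failure of the excluded middle from the example above can be checked in a few lines:

```python
def t_min(a, b):  return min(a, b)          # intersection (fuzzy AND)
def s_max(a, b):  return max(a, b)          # union (fuzzy OR)
def t_prod(a, b): return a * b              # algebraic product intersection
def s_bsum(a, b): return min(1.0, a + b)    # bounded-sum union

mu = 0.6
mu_not = 1.0 - mu                 # complement: 0.4
a_or_not_a  = s_max(mu, mu_not)   # 0.6, not 1
a_and_not_a = t_min(mu, mu_not)   # 0.4, not 0
```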
Fuzzy Relations

Classical relation R : X × Y defined by
µR(x,y) = 1 if (x,y) ∈ R, 0 if (x,y) ∉ R

Fuzzy relation R : X × Y defined by µR(x,y) ∈ [0,1]

µR(x,y) describes the degree to which x and y are related.
It can also be interpreted as the truth value of the proposition x R y.
Fuzzy Relations

Example:
X = { rainy, cloudy, sunny }
Y = { swimming, bicycling, camping, reading }

X/Y      swimming  bicycling  camping  reading
rainy      0.0       0.2        0.0      1.0
cloudy     0.0       0.8        0.3      0.3
sunny      1.0       0.2        0.7      0.0
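The relation table above is just a membership lookup; a nested dict keeps µR(x,y) addressable by name:

```python
# mu_R(x, y): the degree to which weather x suits activity y
R = {
    "rainy":  {"swimming": 0.0, "bicycling": 0.2, "camping": 0.0, "reading": 1.0},
    "cloudy": {"swimming": 0.0, "bicycling": 0.8, "camping": 0.3, "reading": 0.3},
    "sunny":  {"swimming": 1.0, "bicycling": 0.2, "camping": 0.7, "reading": 0.0},
}

# truth value of the proposition "sunny R swimming"
truth = R["sunny"]["swimming"]
```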
Fuzzy Sets & Linguistic Variables

A linguistic variable combines several fuzzy sets.

linguistic variable: temperature
linguistic terms (fuzzy sets): { cold, warm, hot }

[Figure: membership functions µcold, µwarm, µhot over x [°C], spanning roughly 20–60 °C]
Fuzzy Rules

• Causal dependencies can be expressed in the form of if-then rules.
• General form:
  if <antecedent> then <consequence>
• Example:
  if temperature is cold and oil is cheap then heating is high

The rules connect linguistic variables (temperature, oil price, heating) with linguistic values/terms, i.e. fuzzy sets (cold, low, high):

if temperature is cold and oil price is low then heating is high
if temperature is hot and oil price is normal then heating is low
Fuzzy Rule Base

Heating as a function of temperature and oil price:

Oil price \ Temperature   cold     warm     hot
cheap                     high     high     medium
normal                    high     medium   low
expensive                 medium   low      low
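The 3×3 rule base above is equivalent to a lookup table indexed by the pair of linguistic terms:

```python
# (temperature term, oil-price term) -> heating term
rules = {
    ("cold", "cheap"): "high",      ("warm", "cheap"): "high",      ("hot", "cheap"): "medium",
    ("cold", "normal"): "high",     ("warm", "normal"): "medium",   ("hot", "normal"): "low",
    ("cold", "expensive"): "medium",("warm", "expensive"): "low",   ("hot", "expensive"): "low",
}

heating = rules[("cold", "cheap")]   # "high"
```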
Fuzzy Knowledge Base

The fuzzy knowledge base consists of two parts.

Fuzzy data base:
• definition of linguistic input and output variables
• definition of fuzzy membership functions

Fuzzy rule base:
if temperature is cold and oil price is cheap then heating is high
…

[Figure: membership functions µcold, µwarm, µhot over x [°C]]
Fuzzification

1. Fuzzification: determine the degree of membership for each term of an input variable.

Inputs: temperature t = 15 °C, oil price p = $13/barrel

If temperature is cold …  → µcold(t) = 0.5
and oil is cheap …        → µcheap(p) = 0.3
Fuzzy Combination

2. Combination: combine the terms into one degree of fulfillment for the entire antecedent by fuzzy AND (min-operator):

µante = min{µcold(t), µcheap(p)} = min{0.5, 0.3} = 0.3
Fuzzy Inference

3. Inference: apply the degree of fulfillment of the antecedent to the consequent of the rule ("… then heating is high"):

min-inference:  µcons(h) = min{µante, µhigh(h)}   (clips the output set at µante = 0.3)
prod-inference: µcons(h) = µante · µhigh(h)       (scales the output set by µante = 0.3)
Fuzzy Aggregation

4. Aggregation: aggregate the consequents of all rules ("… then heating is high / medium / low") using the max-operator for union.
Defuzzification

5. Defuzzification: determine a crisp output value from the aggregated output membership function, for example using the "Center of Gravity" (COG) method.

[Figure: COG of µconsequent(h) yields h ≈ 73]

Center of singletons defuzzification:

h = Σi µi · Ai · ci / Σi µi · Ai

µi = degree of membership of fuzzy set i
Ai = area of fuzzy set i
ci = center of gravity of fuzzy set i
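The five steps can be sketched end-to-end in plain Python. The membership-function shapes below are assumptions chosen only so that µcold(15) = 0.5 and µcheap(13) = 0.3 match the example above; the slides do not fix the exact shapes, and only the single rule "if cold and cheap then high" is evaluated:

```python
def ramp_down(x, lo, hi):
    """Membership 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def ramp_up(x, lo, hi):
    """Membership 0 below lo, 1 above hi, linear in between."""
    return 1.0 - ramp_down(x, lo, hi)

mu_cold  = lambda t: ramp_down(t, 0.0, 30.0)   # mu_cold(15) = 0.5
mu_cheap = lambda p: ramp_down(p, 6.0, 16.0)   # mu_cheap(13) = 0.3
mu_high  = lambda h: ramp_up(h, 40.0, 90.0)    # "heating is high" on 0..100

# 1. Fuzzification
m_cold, m_cheap = mu_cold(15.0), mu_cheap(13.0)
# 2. Combination: fuzzy AND of the antecedent terms (min)
m_ante = min(m_cold, m_cheap)                   # 0.3
# 3. min-inference: clip the consequent set "heating is high" at m_ante
# 4. Aggregation would take the max over all rules; only one rule fires here
hs = [i * 0.5 for i in range(201)]              # heating domain 0..100
mu_out = [min(m_ante, mu_high(h)) for h in hs]
# 5. Defuzzification: discrete center of gravity
cog = sum(h * m for h, m in zip(hs, mu_out)) / sum(mu_out)
```

The crisp output `cog` lands in the upper part of the heating range, as the clipped "high" set dictates.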
Schema of a Fuzzy Decision

Fuzzification → Inference → Defuzzification

[Figure: the measured temperature t is fuzzified (µcold = 0.7, µwarm = 0.2, µhot = 0.0); the rule base
  if temp is cold then valve is open
  if temp is warm then valve is half
  if temp is hot then valve is close
is evaluated, and defuzzification over µopen, µhalf, µclose yields a crisp output v for the valve setting]
Machine vs. Robot Learning

Machine learning:
• learning in a vacuum
• statistically well-behaved data
• mostly off-line
• informative feed-back
• computational time not an issue
• hardware does not matter
• convergence proof

Robot learning:
• embedded learning
• data distribution not homogeneous
• mostly on-line
• qualitative and sparse feed-back
• time is crucial
• hardware is a priority
• empirical proof
Methods of Robot Learning

• Dynamic programming / reinforcement learning: the desired behavior is expressed as an optimization criterion r to be optimized over a temporal horizon, resulting in a cost function (long-term accumulated reward)

  J(xt) = Σt r(xt, ut)

• Problem: curse of dimensionality, large state spaces, large amount of exploration
• Idea: modularize the control policy
Learning Task

• Learn a task-specific control policy π that maps the continuous-valued state vector s to a continuous-valued control action u:

  u = π(x, α, t)

[Figure: the learning system adjusts the parameters α of the control policy π(x,α,t) to match the desired behavior; the policy sends actions u to the robot & environment and receives states s]
Learning Control with Sub-Policies

• Learn or design sub-policies and subsequently build the complete policy out of the sub-policies.

[Figure: sub-policies π1 … π4 sit between the learning system and the robot & environment, driven by the desired behavior and exchanging actions u and states s]
Indirect Learning of Control Policies

• Decompose the task into a planning and an execution stage.
• Planning generates a desired kinematic trajectory.
• Execution transforms the plan into appropriate motor commands.
• Learn an inverse kinematic model for the execution module.

[Figure: the control policy consists of trajectory planning followed by feedback and feedforward controllers whose outputs are summed into the motor command u for the robot & environment; the learning system adapts the feedforward controller]
Learning Inverse Models

• Learn an inverse kinematic model for feed-forward control.
• Kinematic function: x = f(u)
• Inverse model: u = f⁻¹(x)
• Dynamic model: dx/dt = f(x,u)
• Inverse dynamic model: u = g(xdesired, x)
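A minimal sketch of learning an inverse model from data, assuming a toy one-dimensional forward kinematics x = f(u) = 2u (not from the slides) and a linear inverse model u ≈ a·x + b fitted by closed-form least squares:

```python
# sample the forward model x = f(u) = 2u
us = [0.0, 0.5, 1.0, 1.5, 2.0]
xs = [2.0 * u for u in us]

# closed-form least squares for the inverse model u = a*x + b
n = len(xs)
mx = sum(xs) / n
mu = sum(us) / n
a = sum((x - mx) * (u - mu) for x, u in zip(xs, us)) / sum((x - mx) ** 2 for x in xs)
b = mu - a * mx
# the learned inverse recovers a = 0.5, b = 0, i.e. u = f^{-1}(x) = x/2
```

For a robot arm the same idea applies with a nonlinear function approximator in place of the linear fit.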
Evolutionary Robotics in a Nutshell

[Figure: a population of bit-string genotypes (e.g. 1001, 0011, 0100, 0110, 1101); each genotype is decoded into behavior parameters α of a controller u = f(s, α) acting in the environment; evaluation assigns each individual a fitness, selection keeps the better individuals, and recombination and mutation generate new candidate genotypes]
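The evaluate → select → recombine → mutate loop can be sketched as a minimal genetic algorithm over bit strings. Truncation selection, one-point crossover and bit-flip mutation are illustrative choices; the slides do not specify the exact operators:

```python
import random

def evolve(fitness, pop_size=10, genes=8, generations=20, p_mut=0.05):
    """Evolve bit-string genotypes to maximize the given fitness function."""
    pop = [[random.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # truncation selection
        pop = parents[:]
        while len(pop) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, genes)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per gene
            pop.append([g ^ (random.random() < p_mut) for g in child])
    return max(pop, key=fitness)

best = evolve(sum)   # toy fitness: number of 1-bits in the genotype
```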
Evolutionary Behavior Design

[Figure: the evolutionary algorithm passes behavior parameters (the genotype) to the robotic behavior, which exchanges control actions a and observed states s with the environment; the evaluation scheme turns observed rewards r into a fitness value fed back to the evolutionary algorithm]
Evolving in Simulation vs. Reality

Simulation:
• requires a model of the sensors and environment
• brittleness of adapted behaviors
• identical test cases for all candidate controllers
• automated, fast fitness evaluation

Reality:
• the real world is the model
• robust behaviors
• difficult to initialize for a new controller under evaluation
• time-consuming, manual fitness evaluation
Environment

Real-time online evolution in a 200×100 cm maze, with about 10–15 minutes per generation.
Robot & Sensors

• 6 binary sensors (4 antennae + 2 bumpers)
• 1 rotation sensor
External vs. Internal Fitness

External fitness:
• cannot be measured by the robot itself (e.g. location in world coordinates)
• external observer perspective
• useful in simulations

Internal fitness:
• directly accessible to the robot by means of sensors (e.g. sensor readings, battery level)
• useful when learning on the real robot
• fitness function might be more difficult to design
Functional vs. Behavioral Fitness

Functional:
• directly measures the way in which the system functions; observes the causes of a behavior
• example: learn to generate a desired oscillatory pattern of leg motion

Behavioral:
• measures the resulting behavior; observes the effects of the behavior
• example: measure the absolute distance traveled by the robot using the rotation sensor
Explicit vs. Implicit Fitness

Explicit:
• large number of constraints
• actively steers the evolutionary system towards desired behaviors
• problem: weighting and aggregating multiple constraints

Implicit:
• small number of constraints
• allows evolution of emergent, novel behaviors
• problem: for complex behaviors (e.g. find cylinders, pick up cylinders and drop them outside the arena) finding an initial behavior is like searching for a needle in a haystack
Behavior Representation

• The robot is controlled by the duration and direction of the left and right motor commands.
• Sensory states: s1,…,s6 (2⁶ possible states, reduced to 9 different states)
• Control action:
  • direction of the left and right motor
  • duration of the left and right motor action
• Mapping: for each of the nine different sensory states, the direction and duration of the left and right motor commands are encoded by one byte.
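One possible decoding of such a genotype, assuming one byte per motor and state with the top bit giving the direction and the lower 7 bits the duration in ms (the slides do not fix the exact bit layout):

```python
def decode(genotype):
    """genotype: 18 bytes = 9 states x (left motor byte, right motor byte)."""
    policy = {}
    for state in range(9):
        actions = []
        for byte in genotype[2 * state: 2 * state + 2]:
            direction = 1 if byte & 0x80 else -1   # forward / backward
            duration = byte & 0x7F                 # 0..127 ms
            actions.append((direction, duration))
        policy[state] = tuple(actions)             # (left, right) actions
    return policy

policy = decode(bytes(range(18)))
```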
Sensor States to Motor Actions

Sensor state       Left motor action  Right motor action
S3: left bumper    0 [ms]             0 [ms]
S2: front bumper   50 [ms]            50 [ms]
S1: no contact     70 [ms]            40 [ms]
Sensor States to Motor Actions

Sensor state              Left motor action  Right motor action
S6: left antenna inward   30 [ms]            30 [ms]
S5: left antenna outward  60 [ms]            60 [ms]
S4: right bumper          Float 20 [ms]      30 [ms]

(If the black vertical axle is pressed, this state is equivalent to S3.)
Sensor States to Motor Actions

Sensor state                      Left motor action  Right motor action
S9: left & right antenna outward  70 [ms]            60 [ms]
S8: right antenna outward         40 [ms]            70 [ms]
S7: right antenna inward          10 [ms]            20 [ms]

(If the black vertical axle is pressed, this state is equivalent to S4.)
Communication between RCX and PC

[Figure: the host computer is connected via a serial link to the IR communication tower, which talks to the RCX IR port on the robot in the environment]
Behavior Evaluation

• The parameters of the robotic behavior are downloaded onto the LEGO robot.
• The robot performs the behavior for one minute.
• The number of rotations of the tracking wheel, equivalent to the distance traveled, is returned as the fitness.
• Based on the fitness, the evolutionary algorithm selects good behaviors and generates new candidate behaviors by means of recombination and mutation.
• Population size 10 individuals, 20 generations; one run of the evolutionary algorithm takes about 3–4 hours.
Evolved Behavior
• ..\..\..\Movies\p90913g2.mov
Evolution of a Wall-Following Behavior

• 2 light sensors
• 2 bumpers
• 1 rotation sensor

Sensor Characteristic

• Light sensor readings S1, S2 as a function of the distance to the obstacle
Behavior Representation and Fitness

• Neural network: ω = f(S1, S2, wij, θi)
• Turn rate ω → motor commands: within a time slice ∆T, one motor runs forward for ω·∆T and the other for (1−ω)·∆T (backward for the collision states)
• Genotype encodes:
  • 7 ANN parameters {wij, θi}: 8 bits/parameter
  • motor commands for the collision states left and right bumper
• Fitness: absolute distance traveled (number of rotations of the rotation sensor)
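A sketch of such a controller with one hidden unit H, assuming a sigmoid activation and a hypothetical 7-parameter layout (two input-to-hidden weights, two direct input-to-output weights, one hidden-to-output weight, and two biases); the slides only state that 7 parameters are evolved, not their exact wiring:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def turn_rate(s1, s2, p):
    """omega = f(S1, S2, w_ij, theta_i) with one hidden unit."""
    w_s1h, w_s2h, w_s1o, w_s2o, w_ho, th_h, th_o = p
    h = sigmoid(w_s1h * s1 + w_s2h * s2 - th_h)
    return sigmoid(w_s1o * s1 + w_s2o * s2 + w_ho * h - th_o)  # omega in [0,1]

def motor_commands(omega, dT=50.0):
    """Split the time slice dT: one motor runs omega*dT, the other (1-omega)*dT."""
    return omega * dT, (1.0 - omega) * dT

omega = turn_rate(0.5, 0.5, [1.0, -1.0, 0.5, -0.5, 2.0, 0.0, 0.0])
left, right = motor_commands(omega)
```

A recurrent variant would additionally feed ω and H from the previous time step back into the input layer.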
Network Architectures

Feed-forward network (purely reactive behaviors):
inputs S1, S2 → hidden unit H → output ω, with weights wij

Recurrent network (dynamic behaviors):
the state x(t) = (ω, H, S1, S2) is fed back as additional input to compute x(t+1)
Evolved Behavior
..\..\..\Movies\PB251814.MOV
Distance Maximization

• The fitness function contains an additional penalty term for low proximity to obstacles (Si < Smin).

[Figure: trajectories without vs. with the proximity penalty]