ALVINN: Autonomous Land Vehicle Visually-Guided Robot

Visually-Guided Robot Control

15-486/782: Artificial Neural Networks

David S. Touretzky

Fall 2006

This material is based on earlier lecture notes prepared by Dean Pomerleau.

ALVINN: Autonomous Land Vehicle In a Neural Network

Dean Pomerleau's Ph.D. thesis (1992).

• How ALVINN Works
• Architecture
• Training Procedure
• Performance
• Why ALVINN Works
• Hidden Unit Analysis
• Integrating Multiple Networks
• Other Applications

ALVINN Network Architecture

How many inputs? 30×32 = 960

How many weights? 961×4 + 5×30 = 3994
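The weight count follows from the layer sizes once each layer is given one extra bias input. A minimal sketch of the arithmetic, assuming 4 hidden units and 30 output units (the values implied by the product above; the function name is illustrative):

```python
def count_weights(n_inputs, n_hidden, n_outputs):
    """Count weights in a fully connected one-hidden-layer network with bias units."""
    input_to_hidden = (n_inputs + 1) * n_hidden    # +1 for the bias input
    hidden_to_output = (n_hidden + 1) * n_outputs  # +1 for the hidden-layer bias
    return input_to_hidden + hidden_to_output

n_inputs = 30 * 32  # 960 input units
print(count_weights(n_inputs, n_hidden=4, n_outputs=30))  # 3994
```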

Original Training Scheme

Generate artificial road images mimicking situations the network is expected to encounter, including noise.

Calculate correct steering direction for each image.

Train on artificial images, then test on real roads.

Problem: realistic training images are difficult to produce, so training is expensive.

[Figure: artificial road image, labeled "tree" and "road edges"]

Training on the Fly

Digitize the steering wheel position.

Train the network by having it observe live sensor data as a human drives the vehicle.

The human "teaches" the network how to drive.

Can this really work?

It's not so simple...

Measuring Steering Error

Train with a Gaussian bump centered over the desired steering direction.

To test: fit a Gaussian to the network's output vector.

Measure distance between Gaussian's peak and human steering direction.

Why use a Gaussian for the output pattern?
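The encoding and the test procedure can be sketched in a few lines. This is an illustration under stated assumptions, not ALVINN's actual code: the bump width `SIGMA`, the brute-force fitting method, and all function names are hypothetical, and the 30 output units are taken from the architecture slide.

```python
import math

N_OUTPUTS = 30  # steering output units (from the architecture slide)
SIGMA = 2.0     # bump width in output units (assumed for illustration)

def gaussian_target(center, n=N_OUTPUTS, sigma=SIGMA):
    """Target activations: a Gaussian bump centered on the desired steering unit."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]

def fit_peak(output, sigma=SIGMA, step=0.1):
    """Brute-force fit: return the center whose bump best matches the output vector."""
    n = len(output)
    best_center, best_err = 0.0, math.inf
    for k in range(int(round((n - 1) / step)) + 1):
        center = k * step
        target = gaussian_target(center, n, sigma)
        err = sum((o - t) ** 2 for o, t in zip(output, target))
        if err < best_err:
            best_center, best_err = center, err
    return best_center

def steering_error(output, human_direction):
    """Distance between the fitted peak and the human's direction, in output units."""
    return abs(fit_peak(output) - human_direction)
```

Because the peak is fitted rather than read off the single most active unit, the decoded direction can fall between output units.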

Learning to Correct Steering Errors

If the human drives perfectly, the network never learns to make corrections when it drifts off the desired track.

Crude solution:

• Turn learning off temporarily, and drive off course.
• Turn learning back on, and let the network observe the human making the necessary corrections.
• Repeat.

Relies on the human driver to generate a rich set of steering errors: time consuming and unreliable. Can be dangerous if training in traffic.

Simulating the Steering Errors

Let humans drive as best they can.

Increase training set variety by artificially shifting and rotating the video images, so that the vehicle appears at different orientations relative to the road.

Generate 14 random shift/rotations for each image.

A simple steering model is used to predict how a human driver would react to each transformation.


Road Shifts Lead to Missing Pixels

Rotating and translating the camera can be simulated by copying pixels. But what about pixels in the new field of view that weren't present in the original camera image?

An exaggerated rotation

Effects of a modest shift

Filling In Missing Pixels

1. Fill in (A) from closest image pixel (B).

Problem: smearing.

The smear becomes an image "feature" that the network learns to exploit!

2. Project along a line from (A) parallel to the vehicle's heading to find closest pixel (C).

[Figure panels: Orig, 1, 2; panel 1 shows smearing]

Estimating Correct Steering Direction

The "pure pursuit" steering model generates a fairly good estimate of what the human driver would do.

s = offset distance

θ = offset angle

l = lookahead distance

r_p = person's steering radius
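One way to read these quantities: find the point the person's arc would reach at the lookahead distance, re-express that point in the frame of the shifted and rotated vehicle, then steer along the arc that is tangent to the new heading and passes through it. The sketch below works through that geometry. The sign conventions (positive values mean "to the right", angles clockwise), the function names, and the requirement that the steering radius is at least the lookahead distance are assumptions made for illustration; the slides do not give the exact formula used in ALVINN.

```python
import math

def person_target_offset(r_p, lookahead):
    """Lateral offset of the point the person's arc reaches at the lookahead distance.

    Assumes |r_p| >= lookahead; math.inf means the person is driving straight.
    """
    if math.isinf(r_p):
        return 0.0
    sign = 1.0 if r_p > 0 else -1.0
    return sign * (abs(r_p) - math.sqrt(r_p ** 2 - lookahead ** 2))

def corrected_curvature(r_p, s, theta, lookahead):
    """Curvature (1 / radius) to steer after a lateral shift s and rotation theta."""
    d_p = person_target_offset(r_p, lookahead)
    # Target point relative to the shifted vehicle: x lateral, y forward.
    x, y = d_p - s, lookahead
    # Rotate into the frame of the vehicle whose heading turned clockwise by theta.
    x_new = x * math.cos(theta) - y * math.sin(theta)
    y_new = x * math.sin(theta) + y * math.cos(theta)
    # Arc tangent to the new heading that passes through the target point.
    return 2.0 * x_new / (x_new ** 2 + y_new ** 2)
```

With no shift or rotation the result reduces to the person's own curvature, 1 / r_p, which is a useful sanity check on the geometry.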

Network Weights Evolving

Initial random weights look like "salt and pepper" noise.

During training, the hidden units evolve into a set of complementary feature detectors.


Problem with Online Learning: Network Can "Forget"

The network tends to overlearn recently encountered examples and forget how to drive in situations encountered earlier in training.

After a long right turn, the network will be biased toward turning right, since recent training data focused on right turns.

Solution: Maintain a Buffer of Balanced Training Images

This is a semi-batch learning approach. Keep a buffer of 200 training images.

Replace 15 old exemplars with new ones derived from the current camera image. Replacement strategies:

(1) Replace the image with the lowest error

(2) Replace the image with the closest steering direction
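Both replacement strategies can be sketched as small functions over a list-based buffer. The exemplar layout `(image, steering)`, the separate `errors` list, and the rule that a slot is overwritten at most once per update are assumptions made for illustration.

```python
def replace_lowest_error(buffer, errors, new_exemplars):
    """Strategy 1: overwrite the exemplars the network already handles best."""
    k = len(new_exemplars)
    slots = sorted(range(len(buffer)), key=lambda i: errors[i])[:k]
    for slot, exemplar in zip(slots, new_exemplars):
        buffer[slot] = exemplar
    return slots

def replace_closest_steering(buffer, new_exemplars):
    """Strategy 2: overwrite the exemplar whose steering direction is closest.

    Each exemplar is an (image, steering) pair.
    """
    used = set()
    for exemplar in new_exemplars:
        candidates = [i for i in range(len(buffer)) if i not in used]
        slot = min(candidates, key=lambda i: abs(buffer[i][1] - exemplar[1]))
        buffer[slot] = exemplar
        used.add(slot)
    return sorted(used)
```

Strategy 2 keeps the spread of steering directions in the buffer roughly unchanged, which is what guards against the bias that builds up during a long turn.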

Buffer:

New Exemplars:

"Training on the Fly" Details

1) Take current camera image plus 14 shifted/rotated variants, each with computed steering direction.

2) Replace 15 old exemplars in the 200 element training exemplar buffer with these 15 new ones.

3) Perform one epoch of backpropagation learning on the training exemplar buffer.

4) Repeat steps 1-3 until the network's predicted steering direction reliably matches the person's steering direction.
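The four steps map directly onto a loop. The skeleton below only shows the control flow; every callable passed in (`next_frame`, `make_variants`, `replace_in_buffer`, `train_one_epoch`, `is_reliable`) is a hypothetical placeholder for a component the slides describe but do not specify.

```python
def train_on_the_fly(next_frame, make_variants, replace_in_buffer,
                     train_one_epoch, is_reliable, max_iterations=1000):
    """Run steps 1-3 repeatedly; return the number of iterations performed."""
    buffer = []
    for iteration in range(1, max_iterations + 1):
        # Step 1: current image plus its shifted/rotated variants (1 + 14 = 15).
        image, steering = next_frame()
        exemplars = [(image, steering)] + make_variants(image, steering)
        # Step 2: swap the new exemplars into the training buffer.
        replace_in_buffer(buffer, exemplars)
        # Step 3: one epoch of backpropagation over the whole buffer.
        train_one_epoch(buffer)
        # Step 4: stop once predictions reliably match the person's steering.
        if is_reliable():
            return iteration
    return max_iterations
```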

How long does training take?

Just a few minutes!

(movie)

ALVINN Weight Diagram

This hidden unit is excited by a road on the left of the image.

Its projections to the output layer are voting for a left turn, to bring the vehicle back to the center of the road.


This hidden unit is excited by roads slightly left OR slightly right of center.

It suggests two steering directions: a shallow left turn and a shallow right turn.

In order to determine which is correct, the network must combine outputs from several active hidden units.

A "distributed representation" allows a very simple network to drive accurately.

This unit was taken from a network trained on a road whose width varied.

In this case, the hidden units focus on detecting just one road edge.

The units vote for a relatively wide range of steering directions; their cooperative activity fine-tunes the steering direction.

Multi-Modal Inputs

ALVINN can avoid obstacles using a laser rangefinder. It can drive at night using laser reflectance imaging.

[Figure panels: Regular Video, Laser Rangefinder, Laser Reflectance]

Comparison with the "Traditional Approach"

1) Determine which image features are important, e.g., a yellow stripe down the center of the road.

ALVINN finds the important features itself.

2) Hand-code algorithms to find the important features, e.g., edge detection to find yellow lines.

ALVINN constructs its own feature detectors.

3) Hand-code algorithm to determine steering direction based on feature positions in the image.

ALVINN learns the mapping from feature detector outputs to steering direction.


ALVINN's Shortcomings

The single-network ALVINN architecture can only drive on one type of road (unpaved, single-lane, double-lane, lane-striped, etc.)

Can't transition from one road type to another.

Can't follow a route.

Solution: rule-based multi-network integration.

Hybrid ALVINN Architecture

Symbolic Mapping Module

Provides two kinds of information: (1) current road type, and (2) estimated steering direction.

Map-Based Relevancy Arbitration

Which module should drive the vehicle?

If the map says the vehicle is on a two-lane road, the arbitrator will choose to listen to the two-lane road driving network.

If the map says the vehicle is approaching an intersection, the arbitrator will choose to steer in the direction dictated by the mapping module in order to follow the correct path to the destination.


Obstacle Avoidance Network

Priority Arbitration

The obstacle avoidance network is trained to steer straight ahead unless there is an obstacle in the way.

The arbitrator will ignore the obstacle avoidance network when it says "Steer straight ahead," since the message has low urgency.

But when the obstacle avoidance network suggests a sharp turn, its urgency is high, and the arbitrator will give it precedence over other behaviors.
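Relevancy and priority arbitration together amount to a short rule cascade. In the sketch below, steering is a signed number with 0 meaning straight ahead, and urgency is taken to be the magnitude of the obstacle network's turn; that encoding, the threshold value, and the function name are assumptions made for illustration.

```python
def arbitrate(road_type, driving_outputs, map_steering, obstacle_steering,
              at_intersection=False, urgency_threshold=0.5):
    """Return (source, steering) for the single module chosen to drive."""
    # Priority arbitration: a sharp turn from obstacle avoidance overrides everything.
    if abs(obstacle_steering) > urgency_threshold:
        return "obstacle", obstacle_steering
    # Relevancy arbitration: at an intersection, follow the mapping module.
    if at_intersection:
        return "map", map_steering
    # Otherwise listen to the network trained for the current road type.
    return road_type, driving_outputs[road_type]

outputs = {"one-lane": -0.2, "two-lane": 0.1}
print(arbitrate("two-lane", outputs, map_steering=0.4, obstacle_steering=0.0))
```

Note that each branch returns the output of exactly one module, which is the limitation the next slides raise.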

Problems with Rule-Based Arbitration

Relevancy and priority arbitration each choose a single network to listen to. What if we want to combine results from multiple networks?

Problems with Rule-Based Arbitration

Requires detailed knowledge of each network's areas of expertise.

Requires a detailed and accurate map of the environment.

Assumes precise knowledge of the vehicle's position.

Would like to be able to say: "Go about ¼ mile and turn right at the intersection."


Connectionist Arbitration

Estimate the reliability of all networks in the current situation.

Use the reliability estimate to:

1) Weight the outputs of the networks.

2) Pinpoint the vehicle's location.

3) Control the vehicle's speed.

4) Determine the need for retraining.
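Item 1 is the simplest to illustrate: blend the networks' output vectors, giving each a share proportional to its reliability. The normalization by the sum of reliabilities and the function name are assumptions; the slides do not specify the exact weighting scheme.

```python
def combine_outputs(outputs, reliabilities):
    """Reliability-weighted average of several networks' output vectors."""
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("no network is reliable enough to weight")
    n = len(outputs[0])
    return [
        sum(r * out[i] for out, r in zip(outputs, reliabilities)) / total
        for i in range(n)
    ]

# Two networks with two output units each; the first is three times as reliable.
print(combine_outputs([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))  # [0.75, 0.25]
```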

But how do you estimate a network's reliability?

Output Appearance Reliability Estimation (OARE)

Compare actual network output with the closest "ideal" output pattern.

Calculate the "output appearance error".

The larger the appearance error, the less reliable is the network.

Measuring Output Appearance Error

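$$\text{Output Appearance Error} = \sum_i \left(\text{Actual}_i - \text{Ideal}_i\right)^2$$

$$\text{Steering Error} = \left|\text{curve}_h - \text{curve}_n\right|$$

where $\text{curve}_h$ = human turn curvature

and $\text{curve}_n$ = network turn curvature

OARE predicts steering error.

Correlation coeff. = 0.78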
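The output appearance error can be computed by searching for the closest ideal pattern and summing the squared differences. The sketch assumes the ideal patterns are Gaussian bumps of fixed width (as in training) and finds the closest one by brute force; the width, the search step, and the function names are illustrative choices.

```python
import math

def ideal_pattern(center, n, sigma=2.0):
    """An ideal output: a single Gaussian bump at the given position."""
    return [math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]

def output_appearance_error(actual, sigma=2.0, step=0.25):
    """Sum of squared differences to the closest ideal pattern."""
    n = len(actual)
    best = math.inf
    for k in range(int((n - 1) / step) + 1):
        ideal = ideal_pattern(k * step, n, sigma)
        err = sum((a - b) ** 2 for a, b in zip(actual, ideal))
        best = min(best, err)
    return best

clean = ideal_pattern(15.0, 30)
# A two-peaked output, like the one a fork in the road produces.
fork = [max(a, b) for a, b in zip(ideal_pattern(8.0, 30), ideal_pattern(22.0, 30))]
print(output_appearance_error(clean) < output_appearance_error(fork))  # True
```

No single bump can account for both peaks of the bimodal output, so its error stays large wherever the ideal pattern is placed.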

OARE Predicts Steering Error

A = one lane road (trained)

B = fork in road: output has a bimodal distribution

C = two lane road


Fork Detection

A fork in the road causes a bimodal output pattern, which has high OARE.

If the map shows we're approaching a fork, we can use OARE to detect when the fork has been reached.

Comparative OARE

Comparing OARE for two networks can tell us when we have transitioned from one road type to another.

Input Reconstruction Reliability Estimation (IRRE)

Treat the network as an auto-encoder: require it to reconstruct the input.

Hidden unit "bottleneck" extracts principal components of the image.

Images far from the training set will not be reconstructed accurately.
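To turn reconstruction quality into a reliability score, the input and the reconstructed image need a similarity measure. The sketch below uses the Pearson correlation between the two pixel vectors; that choice is an assumption made for illustration, since the slides do not state which measure is used.

```python
import math

def reconstruction_reliability(image, reconstruction):
    """Pearson correlation between the input pixels and the reconstructed pixels."""
    n = len(image)
    mean_x = sum(image) / n
    mean_y = sum(reconstruction) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(image, reconstruction))
    var_x = sum((x - mean_x) ** 2 for x in image)
    var_y = sum((y - mean_y) ** 2 for y in reconstruction)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant image carries no pattern to compare
    return cov / math.sqrt(var_x * var_y)

image = [0.0, 0.2, 0.9, 0.4]
print(reconstruction_reliability(image, image))        # close to 1.0
print(reconstruction_reliability(image, image[::-1]))  # lower: a poor reconstruction
```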

Auto-Encoder Fills Gap in Lane Stripe


Difference In Reconstructed Image

Drawbacks to IRRE

Hidden units are forced to learn all the features of the input image, not just the ones relevant to the task. (Didn't seem to hurt ALVINN.) Solutions:

Separate set of hiddens for IRRE

Don't propagate error from IRRE outputs to hiddens

Speed Control

If none of the networks appears to be reliable in the current situation, the system slows the vehicle and asks for further training.

Other ways ALVINN controls speed:

• If networks are steering erratically, slow down.
• If networks are steering sharply, slow down.
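These rules can be written as a small decision function. Every threshold and speed value below is a made-up placeholder, and "erratic" is approximated by the spread of recent steering commands; none of these specifics come from the slides.

```python
def choose_speed(reliabilities, recent_steering, cruise_speed=1.0, slow_speed=0.3,
                 reliability_floor=0.5, sharp_turn=0.6, erratic_spread=0.4):
    """Return (speed, needs_training) from reliabilities and recent steering commands."""
    # No reliable network: slow down and ask for further training.
    needs_training = max(reliabilities) < reliability_floor
    # Sharp steering: the latest command is far from straight ahead (0).
    sharp = abs(recent_steering[-1]) > sharp_turn
    # Erratic steering: recent commands vary widely.
    erratic = (max(recent_steering) - min(recent_steering)) > erratic_spread
    slow = needs_training or sharp or erratic
    return (slow_speed if slow else cruise_speed), needs_training

print(choose_speed([0.9, 0.2], [0.0, 0.05, 0.1]))  # (1.0, False)
```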

Lessons Learned

1) Preprocessing Tradeoff: more preprocessing allows the net to solve harder problems, but it will be less flexible in new situations.

2) Importance of Modularity: by only requiring individual networks to handle relatively restricted situations, network training can be made fast and robust.

3) Viability of Hybrid Approach: to achieve high level behavior like route following, ANNs can (and currently must) be combined with symbolic techniques.


Lessons Learned

4) Analyzability: neural nets are not just black boxes. We can look at the internal representations and determine how they work.

5) The techniques developed in ALVINN for robot driving are also applicable to other forms of vision-based robot guidance.

The simplicity of adapting ALVINN to new domains underscores the advantage that learning networks have over conventional hand-coded systems.

Self-Mobile Space Manipulator

SMSM Network Architecture

What the Network Sees


SMSM Learned Weights

Weights start out random.

With even a little training, a clear horizontal or vertical structure emerges in the input to hidden weights.

(movie)

RALPH

• Rapidly Adapting Lateral Position Handler (Pomerleau & Jochem, 1996)
• Non-neural successor to ALVINN.
• Estimate road curvature, then use template match to estimate lateral position.

Lieb et al. (2004): Self-supervised Learning and Reverse Optic Flow.

1. Horizontal cross-sectional road templates found by reverse optic flow.

2. Match templates against road at appropriate vertical heights.

3. Dynamic programming to find globally optimal horizontal road position at each height.
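Step 3 has the shape of a standard shortest-path dynamic program over image rows. The sketch below is a generic version, not the formulation from Lieb et al.: the `match_cost` table (template mismatch per row and column) and the linear `jump_penalty` on horizontal movement between adjacent rows are assumptions made for illustration.

```python
def best_road_path(match_cost, jump_penalty=1.0):
    """Return one column per row minimizing match cost plus horizontal jump penalties."""
    n_cols = len(match_cost[0])
    total = [list(match_cost[0])]  # total[r][c]: best cost of a path ending at (r, c)
    back = []                      # back[r - 1][c]: best previous column for (r, c)
    for row in match_cost[1:]:
        prev_total = total[-1]
        row_total, row_back = [], []
        for c in range(n_cols):
            p = min(range(n_cols),
                    key=lambda q: prev_total[q] + jump_penalty * abs(q - c))
            row_total.append(row[c] + prev_total[p] + jump_penalty * abs(p - c))
            row_back.append(p)
        total.append(row_total)
        back.append(row_back)
    # Trace back from the cheapest final column.
    col = min(range(n_cols), key=lambda c: total[-1][c])
    path = [col]
    for row_back in reversed(back):
        col = row_back[col]
        path.append(col)
    path.reverse()
    return path

costs = [[5, 0, 5],
         [5, 0, 5],
         [0, 5, 5]]
print(best_road_path(costs))  # [1, 1, 0]
```

Raising `jump_penalty` trades match quality for a smoother path: with a penalty of 10 the same table yields `[1, 1, 1]`.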

