
Page 1: Vision - cs.umd.edu

Last update: May 4, 2010

Vision

CMSC 421: Chapter 24


Page 2: Vision - cs.umd.edu

Outline

♦ Perception generally

♦ Image formation

♦ Early vision

♦ 2D → 3D

♦ Object recognition


Page 3: Vision - cs.umd.edu

Perception generally

Stimulus (percept) S, World W

S = g(W)

E.g., g = “graphics.” Can we do vision as inverse graphics?

W = g⁻¹(S)


Page 4: Vision - cs.umd.edu

Perception generally

Stimulus (percept) S, World W

S = g(W)

E.g., g = “graphics.” Can we do vision as inverse graphics?

W = g⁻¹(S)

Problem: massive ambiguity!



Page 7: Vision - cs.umd.edu

Example

[image: an example illustrating the ambiguity]


Page 9: Vision - cs.umd.edu

Better approaches

Bayesian inference of world configurations:

P(W | S) = α P(S | W) P(W)

where P(S | W) is the “graphics” model and P(W) is “prior knowledge”

Better still: there is no need to recover the exact scene! Just extract the information needed for
– navigation
– manipulation
– recognition/identification
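To make the role of α concrete, here is a toy numeric sketch (not from the slides; the three candidate worlds and all probabilities are invented):

```python
# Toy posterior over candidate world configurations W given a stimulus S.
# P(S|W) is the "graphics" model; P(W) is prior knowledge; alpha normalizes.
likelihood = {"cube": 0.80, "flat drawing": 0.80, "noise": 0.001}  # P(S|W), invented
prior      = {"cube": 0.10, "flat drawing": 0.01, "noise": 0.89}   # P(W), invented

unnormalized = {w: likelihood[w] * prior[w] for w in prior}
alpha = 1.0 / sum(unnormalized.values())          # makes the posterior sum to 1
posterior = {w: alpha * p for w, p in unnormalized.items()}

print(posterior)  # prior knowledge resolves the ambiguity left by P(S|W)
```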


Page 10: Vision - cs.umd.edu

Vision “subsystems”

[figure: the vision pipeline. Low-level vision applies filters, edge detection, segmentation, and matching to the image sequence, yielding features such as edges, regions, optical flow, and disparity. Shape-from-motion, shape-from-shading, shape-from-stereo, and shape-from-contour modules build a depth map; object recognition, tracking, and data association then produce objects, scenes, and behaviors in high-level vision.]

Vision requires combining multiple cues


Page 11: Vision - cs.umd.edu

Image formation

[figure: pinhole camera. A scene point P projects through the pinhole to its image P′ on the image plane, which lies at distance f behind the pinhole; the X, Y, Z axes are centered at the pinhole.]

P is a point in the scene, with coordinates (X, Y, Z)
P′ is its image on the image plane, with coordinates (x, y, z)

x = −fX/Z,   y = −fY/Z

where f is a scaling factor (the distance from the pinhole to the image plane).
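A minimal numeric sketch of this projection (my own illustration; the focal length and the points are arbitrary):

```python
def project(X, Y, Z, f=1.0):
    """Pinhole projection of scene point (X, Y, Z) onto the image plane.

    Uses the slide's convention: the image plane lies behind the pinhole,
    hence the negated (inverted) image coordinates."""
    if Z == 0:
        raise ValueError("a point in the pinhole's plane has no image")
    return (-f * X / Z, -f * Y / Z)

print(project(2.0, 1.0, 4.0))   # (-0.5, -0.25)
print(project(4.0, 2.0, 8.0))   # same image point: depth is ambiguous
```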


Page 12: Vision - cs.umd.edu

Images

[image: a photograph whose pixel intensity values appear on the next slide]

Page 13: Vision - cs.umd.edu

Images, continued

195 209 221 235 249 251 254 255 250 241 247 248
210 236 249 254 255 254 225 226 212 204 236 211
164 172 180 192 241 251 255 255 255 255 235 190
167 164 171 170 179 189 208 244 254 234 255 251
162 167 166 169 169 170 176 185 196 232 249 254
153 157 160 162 169 170 168 169 171 176 185 218
126 135 143 147 156 157 160 166 167 171 168 170
103 107 118 125 133 145 151 156 158 159 163 164
095 095 097 101 115 124 132 142 117 122 124 161
093 093 093 093 095 099 105 118 125 135 143 119
093 093 093 093 093 093 095 097 101 109 119 132
095 093 093 093 093 093 093 093 093 093 093 119

I(x, y, t) is the intensity at (x, y) at time t

typical digital camera ≈ 5 to 10 million pixels
human eyes ≈ 240 million pixels; about 0.25 terabits/sec


Page 14: Vision - cs.umd.edu

Color vision

Intensity varies with frequency → infinite-dimensional signal

[plots: intensity of the incoming light as a function of frequency; sensitivity of each receptor type as a function of frequency]

Human eye has three types of color-sensitive cells;
each integrates the incoming signal ⇒ intensity becomes a 3-element vector
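A hedged numeric sketch of that integration (the incoming spectrum and the three sensitivity curves below are invented Gaussians, not measured data):

```python
import numpy as np

freq = np.linspace(4.0e14, 7.5e14, 500)             # visible range, Hz
light = np.exp(-((freq - 5.5e14) / 0.5e14) ** 2)    # invented incoming spectrum

# Invented sensitivity curves; peak frequencies roughly ordered like L, M, S cones
peaks = {"L": 5.2e14, "M": 5.6e14, "S": 6.7e14}
df = freq[1] - freq[0]
response = {
    name: float(np.sum(light * np.exp(-((freq - p) / 0.4e14) ** 2)) * df)
    for name, p in peaks.items()
}
print(response)  # the infinite-dimensional signal collapses to 3 numbers
```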


Page 15: Vision - cs.umd.edu

Primary colors

Did anyone ever tell you that the three primary colors are red, yellow, and blue?


Page 16: Vision - cs.umd.edu

Primary colors

Did anyone ever tell you that the three primary colors are red, yellow, and blue?

Not correct.

Did anyone ever tell you that
• red, green, and blue are the primary colors of light, and
• red, yellow, and blue are the primary pigments?


Page 17: Vision - cs.umd.edu

Primary colors

Did anyone ever tell you that the three primary colors are red, yellow, and blue?

Not correct.

Did anyone ever tell you that
• red, green, and blue are the primary colors of light, and
• red, yellow, and blue are the primary pigments?

Only partially correct.

What do we mean by “primary colors”?


Page 18: Vision - cs.umd.edu

Primary colors

Did anyone ever tell you that the three primary colors are red, yellow, and blue?

They weren’t correct.

Did anyone ever tell you that
• red, green, and blue are the primary colors of light, and
• red, yellow, and blue are the primary pigments?

Only partially correct.

What do we mean by “primary colors”?

Ideally, a small set of colors from which we can generate every color the eye can see

Some colors can’t be generated by any combination of red, yellow, and blue pigment.


Page 19: Vision - cs.umd.edu

Primary colors

Use color frequencies that stimulate the three color receptors

Primary colors of light (additive)
⇔ frequencies where the three color receptors are most sensitive
⇔ red, green, blue

Primary pigments (subtractive)
white minus red = cyan
white minus green = magenta
white minus blue = yellow

“Four color” (or CMYK) printing uses cyan, magenta, yellow, and black.
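In RGB terms, each subtractive primary is the literal complement of an additive one; a one-function sketch (my own, with colors as 0–1 triples):

```python
def white_minus(r, g, b):
    """Complement of an additive color, e.g. white minus red = cyan."""
    return (1 - r, 1 - g, 1 - b)

print(white_minus(1, 0, 0))  # red   -> (0, 1, 1) = cyan
print(white_minus(0, 1, 0))  # green -> (1, 0, 1) = magenta
print(white_minus(0, 0, 1))  # blue  -> (1, 1, 0) = yellow
```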


Page 20: Vision - cs.umd.edu

Edge detection

Edges in image ⇐ discontinuities in scene:
1) depth
2) surface orientation
3) reflectance (surface markings)
4) illumination (shadows, etc.)

Edges correspond to abrupt changes in brightness or color


Page 21: Vision - cs.umd.edu

Edge detection

Edges correspond to abrupt changes in brightness or color
E.g., a single row or column of an image:

[plot: intensity values along one row of an image]

Ideally, we could detect abrupt changes by computing a derivative:

[plot: the derivative, with a spike at each abrupt change]

In practice, this has some problems . . .


Page 22: Vision - cs.umd.edu

Noise

Problem: the image is noisy:

[plot: the same intensity profile with noise]

The derivative looks like this:

[plot: the derivative, dominated by noise spikes]

First need to smooth out the noise


Page 23: Vision - cs.umd.edu

Smoothing out the noise

Convolution g = h ∗ f: for each point f[i, j], compute a weighted average g[i, j] of the points near f[i, j]

g[i, j] = Σ_{u=−k}^{k} Σ_{v=−k}^{k} h[u, v] f[i − u, j − v]

Use convolution to smooth the image, then differentiate:

[plots: noisy profile; after smoothing; derivative of the smoothed profile]
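A runnable 1-D sketch of smooth-then-differentiate (the signal, kernel width, and sigma are my own choices):

```python
import numpy as np

def convolve1d(f, h):
    """Direct 1-D version of the slide's sum: g[i] = sum_u h[u] f[i - u]."""
    k = len(h) // 2
    g = np.zeros(len(f))
    for i in range(len(f)):
        for u in range(-k, k + 1):
            if 0 <= i - u < len(f):
                g[i] += h[u + k] * f[i - u]
    return g

rng = np.random.default_rng(0)
f = np.concatenate([np.full(50, 100.0), np.full(50, 200.0)])  # step edge
f += rng.normal(0.0, 5.0, size=100)                           # plus noise

u = np.arange(-5, 6)
h = np.exp(-u**2 / (2 * 2.0**2))
h /= h.sum()                                 # Gaussian smoothing kernel

deriv = np.diff(convolve1d(f, h))            # differentiate the smoothed signal
print(int(np.argmax(np.abs(deriv))))         # ~49: the edge, despite the noise
```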


Page 24: Vision - cs.umd.edu

Edge detection

Find all pixels at which the derivative is above some threshold:

[image: the edge pixels found by thresholding]

Label above-threshold pixels with edge orientation

Infer “clean” line segments by combining edge pixels with same orientation
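A 2-D sketch of the threshold-and-orient step (the toy image and threshold are my own choices):

```python
import numpy as np

def edge_pixels(image, threshold=30.0):
    """Mark pixels with above-threshold gradient magnitude and
    record the edge orientation at each marked pixel."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)           # radians
    mask = magnitude > threshold
    return mask, np.where(mask, orientation, np.nan)

img = np.zeros((8, 8))
img[:, 4:] = 255.0                             # a vertical step edge
mask, orient = edge_pixels(img)
print(np.argwhere(mask)[:3])                   # edge pixels cluster at columns 3-4
```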


Page 25: Vision - cs.umd.edu

Inferring shape

Can use cues from prior knowledge to help infer shape:

Shape from...   Assumes
motion          rigid bodies, continuous motion
stereo          solid, contiguous, non-repeating bodies
texture         uniform texture
shading         uniform reflectance
contour         minimum curvature


Page 26: Vision - cs.umd.edu

Inferring shape from motion

Is this a 3-d shape? If so, what shape is it?

[image omitted]


Page 27: Vision - cs.umd.edu

Inferring shape from motion

Rotate the shape slightly


Page 28: Vision - cs.umd.edu

Inferring shape from motion

Match the corresponding pixels in the two images
Use this to infer the 3-dimensional locations of the pixels


Page 29: Vision - cs.umd.edu

Inferring shape using stereo vision

Can do something similar using stereo vision

[figure: left and right images of a point; the rays from the two camera centers through the corresponding image points intersect at the perceived object point]
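For two parallel cameras this reduces to depth from disparity, Z = f·B/d. The formula is not stated on the slide, so treat the sketch below as my own illustration, with invented focal length, baseline, and disparities:

```python
def depth_from_disparity(d_pixels, f_pixels=700.0, baseline_m=0.12):
    """Depth Z = f * B / d for two parallel cameras.

    d_pixels:   disparity x_left - x_right (pixels); invented test values below
    f_pixels:   focal length in pixel units (assumed value)
    baseline_m: distance between the camera centers (assumed value)"""
    if d_pixels <= 0:
        raise ValueError("non-positive disparity: point at infinity or a mismatch")
    return f_pixels * baseline_m / d_pixels

for d in (42.0, 10.5):
    print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d):.2f} m")
```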


Page 30: Vision - cs.umd.edu

Inferring shape from texture

[image: a uniformly textured surface seen at a slant; projection distorts the texture elements]

Idea: assume the actual texture is uniform, and compute the surface shape that would produce this distortion

A similar idea works for shading (assume uniform reflectance, etc.), but interreflections make the perceived intensity a nonlocal computation
⇒ hollows seem shallower than they really are


Page 31: Vision - cs.umd.edu

Inferring shape from edges

Sometimes we can infer shape from properties of the edges and vertices:

Let’s consider only solid polyhedral objects with trihedral vertices
⇒ Only certain types of labels are possible, e.g.,

[figure: example edge labels: convex (+), concave (−), and occluding (arrow) edges]


Page 32: Vision - cs.umd.edu

All possible combinations of vertex/edge labels

[figure: the legal label combinations for each type of junction (L, fork, T, and arrow), with the number of possibilities for each]

Given a line drawing, use constraint-satisfaction to assign labels to edges


Page 33: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the line drawing to be labeled]

Use a constraint-satisfaction algorithm to assign labels to edges
variables = edges, constraints = possible node configurations
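A compact backtracking sketch of exactly that CSP formulation. The junction catalog below is a deliberately tiny, illustrative stand-in; a real implementation would enumerate the full trihedral-vertex catalog:

```python
# Edge labels: '+' convex, '-' concave, '>' and '<' occluding boundary.
# ILLUSTRATIVE catalog only: junction type -> legal label tuples for its
# incident edges (a real Huffman-Clowes catalog is larger).
LEGAL = {
    "L":     {(">", ">"), ("<", "<"), ("+", ">"), ("<", "+"), ("-", "<"), (">", "-")},
    "arrow": {("<", "+", ">"), ("+", "-", "+"), ("-", "+", "-")},
}

def consistent(assignment, vertices):
    # Every vertex must still admit at least one legal tuple compatible
    # with the labels assigned so far (unassigned edges act as wildcards).
    return all(
        any(all(assignment.get(e, t[i]) == t[i] for i, e in enumerate(es))
            for t in LEGAL[jtype])
        for jtype, es in vertices
    )

def label_edges(edges, vertices, assignment=None):
    """Backtracking search: variables = edges, values = labels,
    constraints = legal junction configurations."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(edges):
        return dict(assignment)
    e = next(x for x in edges if x not in assignment)
    for lab in "+-<>":
        assignment[e] = lab
        if consistent(assignment, vertices):
            result = label_edges(edges, vertices, assignment)
            if result is not None:
                return result
        del assignment[e]                # backtrack, as on the slides
    return None

# One arrow junction (edges a, b, c) sharing edge a with an L junction
print(label_edges(["a", "b", "c", "d"],
                  [("arrow", ("a", "b", "c")), ("L", ("a", "d"))]))
```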


Page 34: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the drawing with clockwise occluding arrows on its boundary edges]

Start by putting clockwise arrows on the boundary


Page 35: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: one interior edge labeled]


Page 36: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: labels propagated to further edges]


Page 37: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: another edge labeled]


Page 38: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: a T junction reached]


Page 39: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: a label tried at the T junction]


Page 40: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the labeling extended from the T junction]


Page 41: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the partial labeling has become inconsistent]

Need to backtrack


Page 42: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the last label choice retracted]


Page 43: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: an alternative label tried]


Page 44: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the labeling extended again]


Page 45: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: another inconsistency reached]

Need to backtrack again


Page 46: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: the labeling after backtracking once more]


Page 47: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: a new label choice propagated]


Page 48: Vision - cs.umd.edu

Vertex/edge labeling example

[figure: every edge labeled consistently]

Found a consistent labeling


Page 49: Vision - cs.umd.edu

Object recognition

Simple idea:
– extract 3-D shapes from image
– match against “shape library”

Problems:
– extracting curved surfaces from image
– representing shape of extracted object
– representing shape and variability of library object classes
– improper segmentation, occlusion
– unknown illumination, shadows, markings, noise, complexity, etc.

Approaches:
– index into library by measuring invariant properties of objects
– alignment of image feature with projected library object feature
– match image against multiple stored views (aspects) of library object
– machine learning methods based on image statistics


Page 50: Vision - cs.umd.edu

Handwritten digit recognition

3-nearest-neighbor = 2.4% error
400–300–10 unit MLP (multi-layer perceptron) = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error

Current best (kernel machines, vision algorithms) ≈ 0.6% error
♦ A kernel machine is a statistical learning algorithm
♦ The “kernel” is a particular kind of vector function (see Section 20.6)
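To get a feel for the nearest-neighbor baseline, a sketch on scikit-learn’s bundled 8×8 digits (a smaller dataset than the NIST/MNIST digits the error rates above refer to, so the numbers will differ):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)        # 1797 8x8 digit images, flattened
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # the 3-nearest-neighbor classifier
knn.fit(X_tr, y_tr)
print(f"error rate: {1.0 - knn.score(X_te, y_te):.3f}")
```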


Page 51: Vision - cs.umd.edu

Shape-context matching

Do two shapes represent the same object?
♦ Represent each shape as a set of sampled edge points p1, . . . , pn
♦ For each point pi, compute its shape context
– a function that represents pi’s relation to the other points in the shape

[figure: sampled edge points of shape P and shape Q; a log-polar grid (log r, θ) centered on one point]


Page 52: Vision - cs.umd.edu

Shape-context matching contd.

A point’s shape context is a histogram:
a grid of discrete values of log r and θ
the grid divides space into “bins,” one for each pair (log r, θ)

[figure: three example shape-context histograms, plotted over θ and log r]

hi(k) = the number of points in the k’th bin for point pi
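A sketch of computing hi for one point (the bin counts, bin edges, and the random stand-in points are my choices):

```python
import numpy as np

def shape_context(points, i, n_r=5, n_theta=12):
    """Histogram h_i: where the other points fall in log-polar
    coordinates (log r, theta) around points[i]."""
    rel = np.delete(points, i, axis=0) - points[i]
    r = np.hypot(rel[:, 0], rel[:, 1])
    theta = np.arctan2(rel[:, 1], rel[:, 0])
    log_r = np.log(r / r.mean())              # normalize scale before log
    r_edges = np.linspace(log_r.min(), log_r.max(), n_r + 1)
    t_edges = np.linspace(-np.pi, np.pi, n_theta + 1)
    h, _, _ = np.histogram2d(log_r, theta, bins=[r_edges, t_edges])
    return h.ravel()                          # one count per (log r, theta) bin

pts = np.random.default_rng(1).normal(size=(30, 2))   # stand-in edge points
print(int(shape_context(pts, 0).sum()))               # 29: all other points binned
```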


Page 53: Vision - cs.umd.edu

Shape-context matching contd.

For pi in shape P and qj in shape Q, let

Cij = (1/2) Σ_{k=1}^{K} [hi(k) − hj(k)]² / [hi(k) + hj(k)]

Cij is the χ² distance between the histograms of pi and qj. The smaller it is, the more alike pi and qj are.

Find a one-to-one correspondence between the points in shape P and the points in shape Q, in a way that minimizes the total cost

This is a weighted bipartite matching problem; it can be solved in time O(n³)
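A sketch wiring the χ² cost matrix to an optimal one-to-one matching via scipy’s Hungarian-method solver (the histograms are random stand-ins):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_costs(H_p, H_q):
    """C[i, j] = 0.5 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k))."""
    num = (H_p[:, None, :] - H_q[None, :, :]) ** 2
    den = H_p[:, None, :] + H_q[None, :, :]
    return 0.5 * np.where(den > 0, num / np.maximum(den, 1e-12), 0.0).sum(axis=2)

rng = np.random.default_rng(0)
H_p = rng.integers(0, 10, size=(6, 60)).astype(float)  # 6 points x 60 bins
H_q = rng.integers(0, 10, size=(6, 60)).astype(float)

C = chi2_costs(H_p, H_q)
rows, cols = linear_sum_assignment(C)   # minimum-cost bipartite matching
print(list(zip(rows.tolist(), cols.tolist())), round(float(C[rows, cols].sum()), 2))
```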

Simple nearest-neighbor learning gives 0.63% error rate on NIST digit data


Page 54: Vision - cs.umd.edu

Summary

Vision is hard—noise, ambiguity, complexity

Prior knowledge is essential to constrain the problem

Need to combine multiple cues: motion, contour, shading, texture, stereo

“Library” object representation: shape vs. aspects

Image/object matching: features, lines, regions, etc.
