COMP_4190 Artificial Intelligence
Computer Vision
Jacky Baltes
Department of Computer Science
University of Manitoba
Winnipeg, Manitoba
Canada, R3T 2N2
http://www.cs.umanitoba.ca/~jacky
Computer Vision
• Introduction
• Digital images
• Edge detection
• Segmentation
  • Watershed algorithm
  • Snakes
• Hough transform
  • detect lines
  • general Hough transform
• Scene interpretation
  • Waltz Algorithm
Digital Images
• Computer graphics: model -> image
• Computer vision: image -> model
• Two-dimensional array of picture elements (pixels)
• Captured through a sensor
  • camera, scanner
  • structured lighting
Levels of Abstraction
• Lowest level:
  • Pre-processing, noise removal, edge detection
• Middle level:
  • Segmentation into regions
• High level:
  • Object recognition, motion analysis
  • World model
Optical Illusions
PPM File Format
• We will use the PPM file format in this course (easy, lossless)
• Colour format with three channels (red, green, blue)
• ASCII header
  • P6: magic number
  • 312 235: width and height
  • 255: maximum value per channel (depth)
• Followed by width * height * 3 bytes of binary pixel data (one byte per channel when the maximum value is 255)
PPM File Format
P6
312 235
255
<00><D9><D6><D7><D4><D4><D6><D7><DA><DA><E0><DB><E6><D8><E6><D6><E5><D6>
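The header layout above can be read with a few lines of Python. This is a minimal sketch, not the course's reference reader: the function name `parse_ppm` is made up for illustration, and the sketch assumes a maximum value of at most 255 (one byte per channel).

```python
def parse_ppm(data: bytes):
    """Parse a binary (P6) PPM image held in memory.

    Returns (width, height, maxval, pixels), where pixels is a list of
    rows and each row is a list of (r, g, b) tuples.
    """
    tokens = []
    pos = 0
    # The header is four whitespace-separated ASCII tokens:
    # magic number, width, height, maximum channel value.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():        # skip whitespace
            pos += 1
        if data[pos:pos + 1] == b"#":             # skip a comment line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # a single whitespace byte separates header and pixel data

    if tokens[0] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(t) for t in tokens[1:])
    if maxval > 255:
        raise ValueError("this sketch only handles one byte per channel")
    body = data[pos:pos + width * height * 3]
    if len(body) != width * height * 3:
        raise ValueError("truncated pixel data")
    pixels = [
        [tuple(body[(y * width + x) * 3:(y * width + x) * 3 + 3])
         for x in range(width)]
        for y in range(height)
    ]
    return width, height, maxval, pixels
```

Reading a file would then be `parse_ppm(open(path, "rb").read())`, with `path` supplied by the caller.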
Edge Detection
• Colours are very susceptible to lighting
• An important pre-processing step is edge detection
• How do we find an edge?
  • Sharp contrast in the image
  • Derivative of the image function I(i,j)
  • Approximate derivative with I(i+1,j) - I(i-1,j)
Convolution
• Edge detection and many other image pre-processing steps can be implemented as a convolution
• A convolution mask is a matrix that is applied to each pixel in the image
• Specifies weights of the neighbors
0 0 0
-1 0 +1
0 0 0
Derivative
Convolution
• What is the output for [ 0, 255, 0, 0, ... ]?
  • -255?
• Use divisor and offset to normalize result of convolution to 0 .. 255
• How to deal with colour images?
  • Convert to grey scale
  • Handle each channel separately
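A minimal sketch of applying a 3x3 mask with a divisor and an offset to a grey-scale image. The function name `convolve3x3`, the choice to copy border pixels unchanged, and the divisor 2 / offset 128 used for the derivative mask are assumptions made for illustration. The mask is applied without flipping, as in the masks shown on the slides.

```python
def convolve3x3(image, mask, divisor=1, offset=0):
    """Apply a 3x3 mask to a grey-scale image (list of rows of ints 0..255).

    Border pixels are copied unchanged, which is only one of several
    possible border policies.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            total = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += mask[di + 1][dj + 1] * image[i + di][j + dj]
            value = total // divisor + offset
            out[i][j] = max(0, min(255, value))  # clamp to the valid range
    return out


# Derivative mask from the slides; divisor and offset are assumed values
# that map the raw range -255..255 into 0..255.
DERIVATIVE = [[0, 0, 0],
              [-1, 0, 1],
              [0, 0, 0]]

step_edge = [[0, 0, 255, 255, 255] for _ in range(3)]
result = convolve3x3(step_edge, DERIVATIVE, divisor=2, offset=128)
```

With the offset, a flat area maps to 128 (mid grey), so both rising and falling edges stay visible.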
Sobel Edge Detection
• To reduce noise, average over several rows
• Weigh rows differently
• Divisor = 8, Offset = 128
-1 0 +1
-2 0 +2
-1 0 +1
Sobel Edge Detection
• Use separate convolution matrices for horizontal and vertical
Horizontal:
-1 0 +1
-2 0 +2
-1 0 +1
Vertical:
-1 -2 -1
0 0 0
+1 +2 +1
Convolution
• Many other filters can be implemented efficiently as convolution
• One problem: what to do at the borders
• What does the following filter do?
+1 +1 +1
+1 +1 +1
+1 +1 +1
Blurring (Simple)
• Blurring is used to reduce noise in the image
+1 +1 +1
+1 +1 +1
+1 +1 +1
Template Matching Convolution
• Find specific features in the image
0 +1 0
+1 +1 +1
0 0 0
Divisor = 4
Segmentation
Image
"What are the objects to be analyzed?"
Pre-processing, image enhancement
Segmentation
Binary operations
Morphological operations and feature extraction
Classification and matching
Image analysis
Data
Segmentation
• Full segmentation: Individual objects are separated from the background and given individual ID numbers (labels).
• Partial segmentation: The amount of data is reduced (usually by separating objects from background) to speed up the further processing.
• Segmentation is often the most difficult problem to solve in the process; there is no universal solution!
• The problem can be made much easier if solved in cooperation with the constructor of the imaging system (choice of sensors, illumination, background etc.).
Three Types of Segmentation
• Classification – Based on some similarity measure between pixel values. The simplest form is thresholding.
• Edge-based – Search for edges in the image. They are then used as borders between regions.
• Region-based – Region growing, merge & split
Common idea: search for discontinuities and/or similarities in the image
Thresholding
(Global and Local)
• Global: based on some kind of histogram: grey-level, edge, feature etc.
  • Lighting conditions are extremely important, and it will only work under very controlled circumstances.
  • Fixed thresholds: the same value is used in the whole image
• Local (or dynamic thresholding): depends on the position in the image. The image is divided into overlapping sections which are thresholded one by one.
Classical Automatic Thresholding Algorithm
1. Select an initial estimate for T
2. Segment the image using T. This produces two groups: G1, pixels with value > T, and G2, pixels with value <= T
3. Compute µ1 and µ2, the average pixel values of G1 and G2
4. New threshold: T = 1/2 (µ1 + µ2)
5. Repeat steps 2 to 4 until T stabilizes.
• Very easy + very fast
• Assumptions: normal dist. + low noise
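The five steps above can be sketched as follows. The function name, the use of the global mean as the initial estimate, and the stopping tolerance `epsilon` are assumptions for illustration; pixels equal to T are placed in the lower group.

```python
def iterative_threshold(pixels, initial=None, epsilon=0.5):
    """Return a threshold T found by repeatedly averaging the group means."""
    # Step 1: initial estimate (global mean unless the caller supplies one).
    t = sum(pixels) / len(pixels) if initial is None else float(initial)
    while True:
        # Step 2: split the pixels into two groups using T.
        high = [p for p in pixels if p > t]
        low = [p for p in pixels if p <= t]
        if not high or not low:
            return t  # all pixels fall on one side; nothing left to split
        # Steps 3 and 4: new threshold is the mean of the two group means.
        new_t = 0.5 * (sum(high) / len(high) + sum(low) / len(low))
        # Step 5: stop once T no longer changes noticeably.
        if abs(new_t - t) < epsilon:
            return new_t
        t = new_t
```

For a clearly bimodal set such as `[10, 12, 11, 200, 205, 198]` the threshold settles halfway between the two cluster means.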
Optimal Thresholding
• Based on the shape of the current image histogram. Search for valleys, Gaussian distributions etc.
[Figure: real histogram with background and foreground distributions and their sum ("both"); where is the optimal threshold?]
Histograms
To love ... and to hate
Thresholding and illumination
• Solutions:
  • Calibration of the imaging system
  • Percentile filter with very large mask
  • Morphological operators
[Figure: MR non-uniformity; median filtering; thresholding]
More thresholding
• Can also be used on other kinds of histogram: grey-level, edge, feature etc. Multivariate data (→ see next lectures)
• Problems:
  • Only considers the graylevel pixel value, so it can leave "holes" in segmented objects.
    • Solution: post-processing with morphological operators
  • Requires strong assumptions to be efficient
  • Local thresholding is better → see region growing techniques
Edge-based Segmentation
Based on finding discontinuities (local variations of image
intensity)
1. Apply an edge detector, e.g. gradient operator (Sobel) or second derivative (Laplace)
2. Threshold the edge image to get a binary image
3. Depending on the type of edge detector:
  • Link edges together to close shapes (using edge direction for example)
  • Remove spurious edges
[Figure: gradient-based procedure (Sobel); zero-crossing-based procedure (LoG, Laplacian of Gaussian)]
Edge-based Segmentation: examples
Prewitt: needs edge linking. Canny: needs "cleaning".
Region based segmentation
• Work by extending some region based on local similarities between pixels
  • region growing (bottom-up method)
  • region splitting and merging (top-down method)
• Bottom-up: from data to representation
• Top-down: from model to data
Region growing: Bottom Up Method
1. Find starting points
2. Include neighbouring pixels with similar features (grey-level, texture, color).
3. Continue until all pixels have been included with one of the starting points.
• Problems:
  • Not trivial to find good starting points, difficult to automate
  • Need good criteria for similarity.
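A minimal sketch of growing a single region from one starting point. The similarity criterion used here (grey level within `tolerance` of the seed's grey level), the 4-connected neighbourhood, and the function name are assumptions; the slides leave the criterion open.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a 4-connected region from seed=(row, col).

    A neighbour joins the region when its grey level differs from the
    seed's grey level by at most `tolerance` (an assumed criterion).
    Returns the set of (row, col) positions in the region.
    """
    height, width = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < height and 0 <= nc < width
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

A full segmentation would repeat this from several starting points and assign each region its own label.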
Region Growing: Watershed Algorithm
• Think of the grey-level image as a landscape. Let water rise from the bottom of each valley (the water from each valley is given its own label). As soon as the water from two valleys meet, build a dam, or watershed. These watersheds will then define the borders between different regions.
Example of watershed directly on a gray-level image
Example of watershed on a binary image
Watershed: Problems and Solutions
• Oversegmentation
  • Watershed from markers
• Computation
  • new algorithm for fast watershed
• Graylevel might not be the optimal choice as the local similarity measure
  • bigger neighborhood when growing
  • other local features (statistical, edge enhanced image, distance transformed image…)
Region Growing: Split & Merge
1. Set up some criteria for what is a uniform area (e.g. mean, variance, bimodality of histogram, texture, etc.)
2. Start with the full image and split it into 4 sub-images
3. Check each sub-image. If not uniform, divide into 4 new sub-images
4. After each iteration, compare touching regions with neighboring regions and merge if uniform
The method is also called "quadtree" decomposition/division (and is also used for compression)
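The splitting half of the procedure can be sketched as a recursive quadtree decomposition. The uniformity test (grey-level range at most `max_range`), the function name, and the restriction to square images whose side is a power of two are assumptions; the merge step is left out.

```python
def quadtree_split(image, top, left, size, max_range, min_size=1):
    """Recursively split a square block until each leaf is uniform.

    A block counts as uniform when max - min <= max_range (an assumed
    criterion). Returns a list of (top, left, size) leaves.
    Merging of neighbouring leaves is omitted.
    """
    values = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    if size <= min_size or max(values) - min(values) <= max_range:
        return [(top, left, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree_split(image, top + dr, left + dc, half,
                                     max_range, min_size)
    return leaves
```

Calling `quadtree_split(image, 0, 0, len(image), max_range)` starts from the full image, as in step 2.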
Split & Merge
The Hough transform
• A method for finding global relationships between pixels.
Example: We want to find straight lines in an image
1. Apply edge enhancing filter (e.g. Laplace)
2. Set a threshold for what filter response is considered a true "edge pixel"
3. Extract the pixels that are on a straight line using the Hough transform
[Figure: original image; edge enhanced image; thresholded edge image]
The Hough transform
Finding straight lines:
1. consider a pixel in position (xi,yi)
2. equation of a straight line yi=axi+b
3. set b=-axi+yi and draw this (single) line in "ab-space"
4. consider the next pixel with position (xj,yj) and draw the line b=-axj+yj in "ab-space" (also called parameter space). The point (a',b') where the two lines intersect represents the line y=a'x+b' in "xy-space", which goes through both (xi,yi) and (xj,yj).
5. draw the line in ab-space corresponding to each pixel in xy-space.
6. divide ab-space into accumulator cells and find most common (a’, b’) which will give the line connecting the largest number of pixels
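The voting in ab-space can be sketched as below. The function names, the caller-supplied list of candidate slopes, and the rounding of b to integer accumulator cells are assumptions chosen to keep the example short.

```python
from collections import Counter


def hough_lines_ab(points, slopes):
    """Vote in a discretised (a, b) parameter space.

    `points` are (x, y) edge pixels; `slopes` lists the candidate slopes a.
    For each pixel and each a, the intercept b = -a*x + y is rounded to
    the nearest integer and that accumulator cell receives one vote.
    """
    accumulator = Counter()
    for x, y in points:
        for a in slopes:
            b = round(-a * x + y)
            accumulator[(a, b)] += 1
    return accumulator


def best_line(points, slopes):
    """Return (a, b, votes) for the accumulator cell with the most votes."""
    (a, b), votes = hough_lines_ab(points, slopes).most_common(1)[0]
    return a, b, votes
```

With four pixels on y = 2x + 1 and one outlier, the cell (a, b) = (2, 1) collects four votes and wins.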
The Hough transform
[Figure: left, xy-space (axes x, y); right, ab- or parameter space (axes a, b)]
The Hough transform
• In reality we have a problem with y=ax+b because a reaches infinity for vertical lines. Use the normal form $x\cos\theta + y\sin\theta = \rho$ instead.
• It is common to use "filters" for finding the intersection: "butterfly filters"
• Different variations of the Hough transform can also be used for finding other shapes of the form g(v,c)=0, where v is a vector of coordinates and c is a vector of coefficients.
• Possible to find any kind of simple shape, e.g. circle (3D parameter space): $(x - c_1)^2 + (y - c_2)^2 = c_3^2$
The Hough transform
Conclusions
• The segmentation procedure
  • Pre-processing
  • Segmentation
  • Post-processing
  • → Like any IP procedure
• There exists NO universal segmentation method
• Evaluation of segmentation performance is important
Snakes
• Example: segmentation of the brain in MRI
Snake after initialization Snake at equilibrium
User interaction
Snakes (active contours)
• A snake is an active contour parametrically represented by its position v(s)=(x(s), y(s)), where s ranges from 0 to 1
• Each position is associated with an energy:
$E_{snake} = \int_0^1 E_{int}[v(s)]\,ds + \int_0^1 E_{ext}[v(s)]\,ds$
• The final position corresponds to the minimum of the energy
Internal Energy
The internal energy of the snake is due to bending and it is associated with a priori constraints:
• α(s) controls the tension of the contour
• β(s) controls its rigidity
External Energy
• The external energy depends on the image and accounts for a posteriori information
• Several energy forms have been proposed based on features of interest in the image
• An energy commonly used to attract snakes towards edges is:
Applications
(by Terzopoulos)
Considerations
• The number of nodes is an important factor for the behavior of the snake. Ability to resample the contour may be necessary.
• If we want a closed contour, we set the first and the last point equal.
• Anchor points are necessary to keep the snake in position if the image forces are not enough.
• It may be necessary to allow a snake contour to divide into two contours, or two contours to merge into one contour.
• Different applications may need different potential functions and different settings of the control parameters (damping, tension and rigidity).
Applications
• Tracking of a moving object
  • An initial estimate for the contour (e.g. interactively defined) is used in the first frame.
  • The contour at equilibrium is used as the starting contour for the next frame. The snake locks on to the object.
• Reconstruction from serial sections
  • The user draws an approximate contour in the first slice.
  • The contour at equilibrium is used as the starting contour in the next slice.
  • The 3D object is reconstructed from the contours using triangulation.
• …
More segmentation
Important in Image Processing in general:
“If you can use expert knowledge (user interaction,
modelling,…) at relatively low cost (development,
computational,…)”
JUST DO IT!!
Shape descriptions
Goal: description of image information suitable to use for classification, recognition, interpretation
• Based on contours:
  • chain codes
  • polygonal approximation
  • signature
  • Fourier descriptors, …
• Based on regions:
  • area, P2A, …
  • topology
  • moments, …
Representation schemes
• based on boundary or region?
• reconstruction?
• stable under scaling, rotation, and translation?
• recognition by incomplete representation?
Chain coding
• boundary as a sequence of straight lines
• 4- or 8-connectedness
Example of chain coding
4-connectedness 8-connectedness
4-code: 0 0 0 3 0 3 0 3 0 3 2 3 2 1 1 2 2 2 2 3 2 1 1 0 1 1
8-code: 0 0 0 7 7 7 6 5 2 3 4 4 5 4 2 2 1 2
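A minimal sketch of producing a chain code from an ordered, closed list of boundary points. The direction numbering (0 = east, then counter-clockwise, with y increasing upwards) is an assumption; other numberings are in use, and the codes printed above may follow a different one.

```python
# Direction numbering assumed here: 0 = east, then counter-clockwise,
# with y increasing upwards.
DIRS_4 = {(1, 0): 0, (0, 1): 1, (-1, 0): 2, (0, -1): 3}
DIRS_8 = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
          (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(boundary, connectivity=8):
    """Encode an ordered, closed list of (x, y) boundary points.

    Consecutive points must be neighbours under the chosen connectivity;
    the step from the last point back to the first is included.
    """
    table = DIRS_8 if connectivity == 8 else DIRS_4
    code = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
        code.append(table[(x1 - x0, y1 - y0)])
    return code
```

For a unit square traversed counter-clockwise the 4-code is 0 1 2 3; a diamond needs the diagonal steps of the 8-code.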
Problems with chain coding
• Long code
• Dependent on
  • Scale
  • Starting point
  • Rotation
• Small change in the boundary (noise or bad segmentation) can generate a chain code that does not reflect the shape of the object
More stable
• Rescale
• Starting point
  • treat as a circular sequence
  • define as starting point the point giving the least magnitude
• Rotation
  • use difference code
10103322
3133030
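The difference code and the "least magnitude" starting point can be sketched as below, using the 4-direction code 10103322 from the slide. The function names are made up for illustration; each difference counts the counter-clockwise direction changes between consecutive steps, modulo the number of directions.

```python
def difference_code(chain, directions=4, circular=True):
    """First difference: counter-clockwise direction changes between steps."""
    diffs = [(b - a) % directions for a, b in zip(chain, chain[1:])]
    if circular:
        # Treat the code as a closed sequence: compare first with last.
        diffs.insert(0, (chain[0] - chain[-1]) % directions)
    return diffs


def shape_number(chain, directions=4):
    """Rotation of the circular difference code with the smallest magnitude."""
    diff = difference_code(chain, directions, circular=True)
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    return min(rotations)  # lists of equal length compare digit by digit


chain = [1, 0, 1, 0, 3, 3, 2, 2]
```

The non-circular difference of this chain reproduces 3133030; the circular version adds a leading 3.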
Codes from different starting points
Shape A, starting points x, o, •:
  x: 1133345775557110
  o: 3457755571101133
  •: 1101133345775557
Shape B, starting points x, o, •:
  x: 1212232330301010
  o: 2323303010101212
  •: 0101012122323303
Shape number
• Object descriptor by chain coded edge
  • Difference code
  • Starting point giving least magnitude
  • Number of digits gives the order
Rotation independent & unique ... but dependent on orientation of grid
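Shape numbers of order ≤ 8
[Figure: shapes of order 4, 6 and 8; rows labelled in the figure: shape no., difference, chain code. Codes as extracted, three per shape:]
order 4: 3333, 3333, 1230
order 6: 330330, 303303, 122300
order 8: 30303030, 03030303, 11223300
order 8: 33003300, 30033003, 12223000
order 8: 33133030, 03033133, 11223030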
Basic rectangle
• diameter for a boundary B is defined as diam(B) = max(D(pi,pj)), pi, pj ∈ B
  • length
  • orientation
• diameter defines major axis
• minor axis perpendicular to major axis
• basic rectangle – minimal enclosing rectangle using minor & major axes
Shape number –
using basic rectangle
• Shape order n
• Choose rectangle of order n best approximating the basic rectangle
• Rectangle gives grid size
  • n=12: 2×4, 3×3, 1×5
• → a unique shape number
Polygonal approximation
• represent the boundary with a polygon
• reduction of discretization effects
• idea: find a simple polygon that reflects the shape of the boundary
  • fit straight lines to boundary
• often not trivial and time consuming
Minimum perimeter polygon
original minimum perimeter
Polygon: merge, split
Signature
• 1D functional representation of the boundary
  • simplest approach: distance from centroid as function of angle
  • 2D boundary → 1D function
• problems with rotation and scaling
  • select centroid
  • select start point, e.g., as furthest from centroid
  • rescale function, e.g., so values ∈ [0,1]
Signatures for two boundaries
Boundary Segments
• decomposition of boundary into segments
• reduce complexity → simplify description
• often of interest in case of concavities
• use decomposition based on convex hull
Curvature
• rate of change of slope
  • ∂²s(t)/∂t²
• difficult in the discrete case (= images) due to uneven boundary
• use boundary segments
  • fit straight line
  • define as change of slope for n
n1
n2
Using Curvature Information
• Bending energy for boundary of length L
  • energy necessary to bend a rod to desired shape...
  • sum of squared curvature c(k)
$BE = \frac{1}{L} \sum_{k=1}^{L} c^2(k)$
• Decomposition based on points of extreme curvature
Fourier descriptors
• represent boundary (of K pixels) by a sequence of coordinates
  • s(k)=(x(k),y(k)), k=0,1,2,...,K-1
• complex number s(k)=x(k)+iy(k)
  • 2D → 1D
• discrete Fourier Transform (DFT)
a(u) – Fourier descriptors of the boundary
Fourier descriptors – reconstruction
• reconstruction of boundary using inverse DFT
• approximation of boundary using only the first P Fourier coefficients
  • gives the same number of points in boundary
• remember:
  • high-frequency for fine details
  • low-frequency for global shape
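A minimal sketch of computing Fourier descriptors and reconstructing the boundary from the first P of them, using only the standard library. Placing the 1/K factor on the forward transform is an assumption (some texts put it on the inverse), and the function names are made up for illustration.

```python
import cmath


def fourier_descriptors(boundary):
    """DFT of the boundary points taken as complex numbers x + iy.

    The 1/K factor sits on the forward transform here (an assumed
    convention), so a[0] is the centroid of the boundary points.
    """
    K = len(boundary)
    s = [complex(x, y) for x, y in boundary]
    return [
        sum(s[k] * cmath.exp(-2j * cmath.pi * u * k / K) for k in range(K)) / K
        for u in range(K)
    ]


def reconstruct(a, P):
    """Inverse DFT using only the first P coefficients a[0..P-1].

    The result still has K points; with P == K it reproduces the boundary.
    """
    K = len(a)
    return [
        sum(a[u] * cmath.exp(2j * cmath.pi * u * k / K) for u in range(P))
        for k in range(K)
    ]
```

Choosing P smaller than K drops the remaining coefficients and gives a smoother approximation with the same number of points.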
Properties of Fourier descriptors
transformation   boundary                 Fourier descriptor
identity         s(k)                     a(u)
rotation         s_r(k) = s(k) e^{jθ}     a_r(u) = a(u) e^{jθ}
translation      s_t(k) = s(k) + Δ_xy     a_t(u) = a(u) + Δ_xy δ(u)
scaling          s_s(k) = α s(k)          a_s(u) = α a(u)
starting point   s_p(k) = s(k - k0)       a_p(u) = a(u) e^{-j2πk0u/K}
where Δ_xy = Δx + jΔy
Measures
• length of boundary – perimeter
  • count pixels
    • ROUGH approximation
  • chain code
    • simple representation
    • a x # edge steps + b x # point steps
• diameter: largest distance between two pixels on the boundary
  • length
  • direction
a
b
Area
Number of pixels
blue = 10, green = 4
Compactness
• P2/A: perimeter x perimeter / area
• dimensionless
• minimal for disc
• rotation invariant
compact not compact
P2/A
perimeter x perimeter / area; normalization: divide by 4π (the value for a disc)
area = 3591, perimeter = 221: P2/A = 13.60, P2/A norm = 1.08
area = 10538, perimeter = 798: P2/A = 60.43, P2/A norm = 4.81
Examples of P2/A measurements
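The two measurements can be checked with a short calculation. The normalization by 4π is inferred from the printed values (a disc has P2/A = 4π, so its normalized value is 1); the function name is made up for illustration.

```python
import math


def compactness(perimeter, area):
    """Return (P^2/A, P^2/(4*pi*A)); the second value is 1 for a disc."""
    raw = perimeter ** 2 / area
    return raw, raw / (4 * math.pi)


raw_a, norm_a = compactness(221, 3591)
raw_b, norm_b = compactness(798, 10538)
```

Rounded to two decimals, the first shape gives 13.60 and 1.08, the second 60.43 and 4.81.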
Eccentricity
• longest chord/max perpendicular chord
• there are different definitions!
Elongatedness
1. height/width of the minimal bounding rectangle
2. area/(2d²)
  • d is maximum width
3. longest possible path
OK / not OK
Rectangularity
• area of region / area of bounding rectangle
Topology
• Euler number E=C-H
  • C – number of connected components
  • H – number of holes
Invariant under "rubber sheet" transformation
Convex hull
• convex region R
  • for any x1, x2 ∈ R, the straight line between x1 and x2 is in R
• convex hull H of a region R
  • smallest convex set containing R
• convex deficiency D
R=green
H=green +brown
D=brown
Projections
$p_h(i) = \sum_j f(i,j)$
$p_v(j) = \sum_i f(i,j)$
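A minimal sketch of the two projections for an image stored as a list of rows, where `f[i][j]` is row i, column j (an assumed layout).

```python
def projections(f):
    """Return (p_h, p_v) for an image f given as a list of rows."""
    p_h = [sum(row) for row in f]        # p_h(i) = sum over j of f(i, j)
    p_v = [sum(col) for col in zip(*f)]  # p_v(j) = sum over i of f(i, j)
    return p_h, p_v
```

For a binary image, p_h counts the object pixels in each row and p_v in each column.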
Principal component analysis (PCA)
• points as 2D vectors
  • (a,b)
  • a – x-coordinate
  • b – y-coordinate
• mean vector and covariance matrix
• find eigenvectors e1 and e2
• rotate
Hotelling transform
Moments
Given a 2D discrete function f(x,y):
moment of order p+q
central moment of order p+q
centre of mass
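The formulas for this slide did not survive extraction. The sketch below uses the standard textbook definitions, stated here as an assumption about what the slide showed: raw moment m_pq = Σx Σy x^p y^q f(x,y), centre of mass (m10/m00, m01/m00), and central moment µ_pq = Σx Σy (x - x̄)^p (y - ȳ)^q f(x,y). The image is indexed as `f[y][x]`.

```python
def moment(f, p, q):
    """Raw moment m_pq = sum over x, y of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(f)
               for x, value in enumerate(row))


def centroid(f):
    """Centre of mass (m10/m00, m01/m00)."""
    m00 = moment(f, 0, 0)
    return moment(f, 1, 0) / m00, moment(f, 0, 1) / m00


def central_moment(f, p, q):
    """Central moment mu_pq, taken about the centre of mass."""
    cx, cy = centroid(f)
    return sum(((x - cx) ** p) * ((y - cy) ** q) * value
               for y, row in enumerate(f)
               for x, value in enumerate(row))
```

Because central moments are taken about the centre of mass, they do not change when the object is translated.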
Measurements with moments
• µ20 horizontal centralness
• µ02 vertical centralness
• µ11 diagonality: indicates with respect to centroid where R has more "mass"
• µ12 horizontal divergence: indicates the relative extent of the left of R compared to the right
• µ21 vertical divergence: indicates the relative extent of the right of R compared to the left
• µ30 horizontal imbalance: location of center of gravity with respect to half horizontal extent
• µ03 vertical imbalance: location of center of gravity with respect to half vertical extent
Example: Cell Classification
n1,…,n6 = normal
d1,…,d6 = diabetic
Shapes differing
in size, orientation,
border irregularity…
Waltz Algorithm
• One of the earliest examples of a constraint satisfaction problem
• Interpret line drawings of solid polyhedra
Waltz's Algorithm
• Look at all intersections
• What type of intersection is this?
  • Concave intersection of three planes
  • External convex intersection
• Adjacent intersections impose constraints on each other
• CSP to find a unique set of labels
• First step in image understanding
Waltz Algorithm
• Assumptions
  • No shadows, no cracks
  • Three-faced vertices
  • General position: no junctions that move with small movement of the eye
• Then each line in the image is one of:
  • Boundary line of the object (right hand = solid, left hand = outside)
  • Convex edge (+)
  • Concave edge (-)
Waltz Algorithm
• Labeling of the edges
Waltz Algorithm
Types of Junctions
• 18 legal types of junctions
• Label each junction as one of those 18
• Both ends of a line must be consistent (same label)
• Reformulate as a CSP. Constraint propagation always works perfectly
Waltz Examples
References
• Les Kitchen (Lecture Slides)
  • http://www.cs.mu.oz.au/480/lec_intro_part.pdf
• Lucia Ballerini (Lecture slides)
  • http://www.cb.uu.se/~lucia/
• Andrew Moore's lecture slides (CMU)
  • http://www2.cs.cmu.edu/~awm/tutorials/constraint05.