Grand Rehearsal FEV 2018
Prof. Bart ter Haar Romeny
The relation between biological vision and computer vision
Example filters learned by Krizhevsky et al. Each of the 96 filters shown here is of size [11x11x3], and each one is shared by the 55*55 neurons in one depth slice. Notice that the parameter sharing assumption is relatively reasonable: If detecting a horizontal edge is important at some location in the image, it should intuitively be useful at some other location as well due to the translationally-invariant structure of images. There is therefore no need to relearn to detect a horizontal edge at every one of the 55*55 distinct locations in the Conv layer output volume.
Can we understand these filters?
Many classes of receptive fields learned from vision:
• Geometry, N-jet: Gaussian derivatives measure structure to high order. Example: deblurring
• Fundamental constraint between order, scale and accuracy. Example: be careful with too small scales
• Color receptive fields take Gaussian derivatives along the wavelength axis. Example: yellow-blue (1st order), red-green (2nd order)
• Temporal receptive fields take derivatives along logarithmic real-time. Example: skewed temporal cortical receptive fields
• Direction-selective receptive fields measure retinal velocity
• Disparity receptive fields measure small shifts between RF's in both eyes
Can we invert the diffusion equation?
Recall that scale-space is infinitely differentiable due to the regularization properties of the observation process.
We can construct a Taylor expansion of the scale-space in any direction, including the negative scale direction.
Deblurring Gaussian blur
Taylor expansion ‘downwards’:
$$L(x,y,s-\delta s) = L - \frac{\partial L}{\partial s}\,\delta s + \frac{1}{2!}\frac{\partial^2 L}{\partial s^2}\,\delta s^2 - \frac{1}{3!}\frac{\partial^3 L}{\partial s^3}\,\delta s^3 + O(\delta s^4)$$

The derivatives with respect to s (scale) can be expressed in spatial derivatives due to the diffusion equation:

$$\frac{\partial L}{\partial s} = \frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}$$

$$L(x,y,s-\delta s) = L - \left(\frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}\right)\delta s + \frac{1}{2!}\left(\frac{\partial^4 L}{\partial x^4} + 2\,\frac{\partial^4 L}{\partial x^2\partial y^2} + \frac{\partial^4 L}{\partial y^4}\right)\delta s^2 + O(\delta s^3)$$
Deblurring to 4th, 8th, 16th and 32nd order. There are 560 derivative terms in the 32nd order expression!
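The downward Taylor expansion can be tried numerically by replacing each scale derivative with an iterated Laplacian. A minimal Python sketch, assuming the convention s = σ²/2 (so a Gaussian blur with σ = 1 corresponds to diffusion over δs = 0.5); the test image and parameters are made up for the demo:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def deblur_taylor(L, ds, order=4):
    """Approximate L(s - ds) via the downward Taylor series.

    Each derivative d^n L / ds^n is replaced by the n-fold Laplacian,
    using the diffusion equation dL/ds = Lxx + Lyy.
    """
    result = L.copy()
    term = L.copy()
    factorial = 1.0
    for n in range(1, order + 1):
        term = laplace(term)                  # one extra Laplacian per order
        factorial *= n
        result += (-ds) ** n / factorial * term
    return result

# Demo: blur a smooth test image with sigma = 1 (ds = 0.5), then deblur.
rng = np.random.default_rng(0)
img = gaussian_filter(rng.normal(size=(64, 64)), 4)   # smooth test image
blurred = gaussian_filter(img, 1.0)
deblurred = deblur_taylor(blurred, ds=0.5, order=4)
```

Pushing the order higher amplifies noise, which is exactly the order/scale/accuracy constraint mentioned earlier.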
Six layers of primary visual cortex
Blobs: non-orientation-specific cells for color processing
Valverde Salzmann, Matthias F., et al. "Color blobs in cortical areas V1 and V2 of the new world monkey Callithrix jacchus, revealed by non-differential optical imaging." Journal of Neuroscience 32.23 (2012): 7881-7894.
Cytochrome oxidase
The color blobs are sparsely distributed over the cortex.
Bartfeld, Eyal, and Amiram Grinvald. "Relationships between orientation-preference pinwheels, cytochrome oxidase blobs, and ocular-dominance columns in primate striate cortex." Proceedings of the National Academy of Sciences 89.24 (1992): 11905-11909.
More than 80% of the pinwheels were centered along the midline of ocular-dominance columns.
The centers of blobs also lie along the midline of ocular-dominance columns. However, the centers of pinwheels did not coincide with the centers of blobs; these two subsystems are spatially independent.
Colour receptive fields from Eigenpatches
If we extend the PCA to color images, we get color receptive fields. They clearly exhibit spatio-chromatic differential structure.
Multi-scale derivative operators to space and wavelength
Color receptive fields
Multi-scale derivative operators to space and wavelength (Koenderink 1999)
Taylor expansion color model:
• Ê: Luminance
• Êλ: Blue-yellowness
• Êλλ: Purple-greenness
Cone sensitivity curves of the L, M and S cones (wavelength 300–900 nm).
The reflected spectrum is:

$$E(\lambda) = e(\lambda)\,\big(1-\rho_f(n,s,v)\big)^2\,R_\infty(\lambda)$$

v = viewing direction, n = surface patch normal, s = direction of illumination, ρf = Fresnel front surface reflectance coefficient in v, R∞ = body reflectance.
Aim: describe material changes independent of the illumination.
$$E(\lambda,x) = e(\lambda)\,i(x)\,\big(1-\rho_f(x)\big)^2\,R_\infty(\lambda,x)$$

$$\frac{\partial E}{\partial \lambda} = \frac{\partial e}{\partial \lambda}\,i(x)\,\big(1-\rho_f(x)\big)^2\,R_\infty(\lambda,x) + e(\lambda)\,i(x)\,\big(1-\rho_f(x)\big)^2\,\frac{\partial R_\infty}{\partial \lambda}$$

Both equations have many common terms.
$$\hat{E} = \frac{1}{E}\frac{\partial E}{\partial \lambda} = \frac{1}{e}\frac{\partial e}{\partial \lambda} + \frac{1}{R_\infty(\lambda,x)}\frac{\partial R_\infty}{\partial \lambda}$$

The normalized differential determines material changes independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color!
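This cancellation can be checked numerically. In the sketch below the illuminant e(λ) and body reflectance R∞(λ) are hypothetical curves made up for the demo; Ê is unchanged when the intensity i(x) and the Fresnel term vary:

```python
import numpy as np

lam = np.linspace(400e-9, 700e-9, 200)                  # wavelength axis [m]
e = 1.0 + 0.5 * (lam - 400e-9) / 300e-9                 # hypothetical illuminant
R_inf = np.exp(-((lam - 550e-9) / 60e-9) ** 2) + 0.05   # hypothetical body reflectance

def E_hat(i_x, rho_f):
    """Normalized wavelength differential (1/E) dE/dlambda."""
    E = e * i_x * (1 - rho_f) ** 2 * R_inf
    return np.gradient(E, lam) / E

# Different intensity and Fresnel coefficient give the same E_hat:
a = E_hat(i_x=1.0, rho_f=0.1)
b = E_hat(i_x=3.7, rho_f=0.4)
```

The geometric factors enter E only as a λ-independent scale factor, so they cancel in the ratio.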
Some color differential invariants:
• Êλ: first wavelength derivative of Ê (color invariant)
• Êλλ: second wavelength derivative of Ê
• √(Êλx² + Êλy²): yellow-blue edges
• √(Êλλx² + Êλλy²): red-green edges
• √(Êλx² + Êλy² + Êλλx² + Êλλy²): total color edge strength
Note the complete absence of detection of black-white edges.
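A sketch of the yellow-blue edge detector in Python. The linear RGB-to-Gaussian-color-model matrix below is the approximation commonly quoted after Geusebroek et al.; treat the exact coefficients as an assumption of this sketch:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Assumed linear map from (R, G, B) to the Gaussian color model
# (E, E_lambda, E_lambdalambda); coefficients after Geusebroek et al.
M = np.array([[0.06,  0.63,  0.27],    # E: luminance
              [0.30,  0.04, -0.35],    # E_lambda: yellow-blue
              [0.34, -0.60,  0.17]])   # E_lambdalambda: red-green

def yellow_blue_edges(rgb, sigma=1.0):
    """sqrt(E_lambda_x^2 + E_lambda_y^2): yellow-blue edge strength."""
    E_lam = np.tensordot(rgb, M[1], axes=([-1], [0]))
    Ex = gaussian_filter(E_lam, sigma, order=(0, 1))   # Gaussian derivative d/dx
    Ey = gaussian_filter(E_lam, sigma, order=(1, 0))   # Gaussian derivative d/dy
    return np.hypot(Ex, Ey)

# A blue|yellow edge responds strongly; a black|white edge barely at all:
by = np.zeros((16, 16, 3)); by[:, :8] = (0, 0, 1); by[:, 8:] = (1, 1, 0)
bw = np.zeros((16, 16, 3)); bw[:, 8:] = (1, 1, 1)
```

The black-white test image confirms the remark above: a pure luminance edge is nearly invisible to this invariant.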
Blue-yellow edges
Time sequence simple cell RF, separable
Reverse Correlation Technique
Spatio-temporal operators are measured in V1
DeAngelis, Gregory C., Izumi Ohzawa and Ralph D. Freeman. Receptive-field dynamics in the central visual pathways. Trends in Neurosciences, 18.10 (1995): 451-458.
Spatio-temporal receptive field mapping by reverse correlation, by Ohzawa and Freeman (UC Berkeley). Figure panels: Stimulus, RF profile, Temporal RF.
Dimensionless elapsed time, t0 is the present moment. The desired mapping is s(t):

$$s = c\,(\ln t_0 - \ln t) = c\,\ln\frac{t_0}{t}$$
Diffusion should be homogeneous in the s-domain: $F_\sigma = F_{ss}$. The magnification |ds/dμ| should be inversely proportional to μ.
On this logarithmic mapping s(t) we can now go from – infinity to + infinity.
The receptive fields become skewed:

$$K(t, t_0; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\ln^2(t_0/t)}{2\sigma^2}}$$
Figure: temporal receptive field profiles of temporal order 0, 1 and 2 as a function of time, and space-time plots.
A clear skewness is observed in the time direction.
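The skewness can be checked numerically by treating the temporal kernel as a density over absolute time t (a minimal sketch; t0 and σ are illustrative choices):

```python
import numpy as np

def K(t, t0, sigma):
    """Temporal kernel: a Gaussian in s = ln(t0/t), hence skewed in t."""
    return np.exp(-np.log(t0 / t) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

t0, sigma = 1.0, 0.5
t = np.linspace(1e-3, 3.0, 30000)
w = K(t, t0, sigma)
w = w / w.sum()                               # normalize as a density over t
mean = (w * t).sum()
var = (w * (t - mean) ** 2).sum()
skewness = (w * (t - mean) ** 3).sum() / var ** 1.5
# Positive skewness: symmetric Gaussian in s, asymmetric in real time t.
```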
De Valois, R. L., Cottaris, N. P., Mahon, L. E., Elfar, S. D., & Wilson, J. A. (2000). Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vision research, 40(27), 3685-3702.
Geometry-driven diffusion‘adaptive filtering’
Prof. Bart ter Haar Romeny
The relation between biological vision and computer vision
Extensive feedback from primary visual cortex to LGN: why is this?
Feedback in the visual system: geometry-driven diffusion
Edges remain, areas are smoothed: images become cleaner, ‘piecewise homogeneous’
J. Kacur, K. Mikula, "Slowed anisotropic diffusion", in B. ter Haar Romeny, L. Florack, J. Koenderink, M. Viergever (Eds.), Lecture Notes in Computer Science, Vol. 1252, Proc. of 1st Intern. Conf. on Scale-Space Theory in Computer Vision, Springer, Berlin (1997), pp. 357-360.
We now modify the process of diffusion (blurring) with a geometry-dependent term. A conductivity coefficient c is introduced in the diffusion equation.

Theory:

$$\frac{\partial L}{\partial s} = \nabla \cdot (c\,\nabla L)$$

where c is a function of the geometry, i.e. of the derivatives: $c = c(L, L_x, L_{xx}, \ldots)$

The right-hand side is the divergence of a flow; $c\,\nabla L$ is also called the flux function. With c = 1 we have normal linear, isotropic diffusion: the divergence of the gradient flow is the Laplacian:

$$\frac{\partial L}{\partial s} = \frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}$$
The Perona & Malik equation (1990):

$$\frac{\partial L}{\partial s} = \nabla \cdot \big(c(|\nabla L|)\,\nabla L\big)$$

With increasing edge strength we want a smaller c (less diffusion), with two possible choices for c:

$$c_1 = e^{-\frac{|\nabla L|^2}{k^2}} = 1 - \frac{|\nabla L|^2}{k^2} + \frac{|\nabla L|^4}{2k^4} + O(|\nabla L|^5)$$

$$c_2 = \frac{1}{1 + \frac{|\nabla L|^2}{k^2}} = 1 - \frac{|\nabla L|^2}{k^2} + \frac{|\nabla L|^4}{k^4} + O(|\nabla L|^5)$$

These conductivity terms are equivalent to first order.
Working out the differentiations for c₂, we get a strongly nonlinear diffusion equation:

$$\frac{\partial L}{\partial s} = \frac{k^2\left[(k^2 + L_y^2 - L_x^2)\,L_{xx} - 4\,L_x L_y L_{xy} + (k^2 + L_x^2 - L_y^2)\,L_{yy}\right]}{(k^2 + L_x^2 + L_y^2)^2}$$
The solution is not known analytically, so we have to rely on numerical methods, such as the forward Euler method:

$$L(s + \delta s) = L(s) + \delta s\,\nabla \cdot (c\,\nabla L)$$

This process is an evolutionary computation.
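A minimal forward Euler implementation of the Perona & Malik scheme with the c₁ conductivity; step size, k and the test image are illustrative choices, not the slide's parameters:

```python
import numpy as np

def perona_malik(L, n_steps=20, dt=0.2, k=0.1):
    """Forward Euler for dL/ds = div(c grad L), c = exp(-|grad L|^2 / k^2)."""
    L = L.astype(float).copy()
    for _ in range(n_steps):
        gx = np.diff(L, axis=1, append=L[:, -1:])   # forward differences,
        gy = np.diff(L, axis=0, append=L[-1:, :])   # zero gradient at the far edge
        fx = np.exp(-(gx / k) ** 2) * gx            # flux c * grad L (c1 conductivity)
        fy = np.exp(-(gy / k) ** 2) * gy
        div = np.diff(fx, axis=1, prepend=0) + np.diff(fy, axis=0, prepend=0)
        L += dt * div                               # one explicit Euler step
    return L

# Demo: a noisy step edge; flat regions are smoothed, the edge survives.
rng = np.random.default_rng(1)
img = np.zeros((32, 64)); img[:, 32:] = 1.0
noisy = img + rng.normal(0, 0.02, img.shape)
out = perona_malik(noisy)
```

At the edge the gradient is large, so c ≈ 0 and almost no flux crosses it, while the small noise gradients keep c ≈ 1 and are diffused away: the "edges remain, areas are smoothed" behaviour described above.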
Nonlinear scale-space
But: reducing to a small kernel is sometimes not effective, and we need a new solution: locally adaptive elongation of the diffusion kernel, Coherence Enhancing Diffusion. The kernel becomes an anisotropic Gaussian steered by the diffusion tensor D:

$$\propto\; e^{-\frac{\vec{x}\cdot D\cdot \vec{x}}{2\sigma^2}}$$
Critical points, paths and top-points:
• Critical points: ∇L = 0 (maxima, minima, saddles)
• Top-points: critical points with Det(H) = 0
Database image retrieval: compare the top-points of a query image with those of the database images, using the Earth Mover's Distance (EMD).
Differential invariants
• We use the complete set of irreducible 3rd order differential invariants.
• These features are rotation and scaling invariant.
• Top-points are related to SIFT keypoints (Scale Invariant Feature Transform)
In Mathematica: ImageKeypoints[]
• The top-points and differential invariants are calculated for the query object and the scene.
• We now compare the differential invariant features.
compare: distance = 0.5, distance = 0.2, distance = 0.3
The vectors with the smallest distance are paired.
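The pairing step can be sketched as a nearest-neighbour search over the invariant feature vectors; the 2-D feature space below is hypothetical, for illustration only:

```python
import numpy as np

def pair_features(query, scene):
    """Pair each query feature vector with the scene vector at smallest Euclidean distance."""
    d = np.linalg.norm(query[:, None, :] - scene[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)

# Two query features, two scene features:
query = np.array([[0.0, 0.0], [1.0, 1.0]])
scene = np.array([[1.0, 1.1], [0.1, 0.0]])
idx, dist = pair_features(query, scene)
```

In practice the full set of irreducible invariants per top-point is used, but the matching logic is the same.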
• A set of coordinates is formed from the differences in scale (Log(σo1)-Log(σs2)) and in angles (θo1- θs2).
Clustering (∆θ, ∆τ):
• If these coordinates are plotted in a scatter plot, clusters can be identified.
• In this scatter plot we find two dense clusters.
• For these important clusters we calculate the mean ∆θ and ∆τ.
The stability criterion removes much of the scatter
Rotate and scale according to the cluster means.
The translations we find correspond to the location of the objects in the scene.
In this example we have two clusters of correctly matched points.
MR slice of the heart (coronary)
↑ scale
top-points
• graph theory
Edge focusing
2D deep hierarchical structure
Multi-scale signature function of this image line
Marching-cubes isophote surface of the macrophage.
Preprocessing:
- Blur with σ = 3 px
- Detect the N strongest maxima

We interpolate with cubic splines along 35 radial tracks in 35 3D orientations.

The profiles are extremely noisy. Observation: visually we can reasonably point out the steepest edge points.

Edge focusing over all profiles. Choose a start level based on the task, i.e. find a single edge.
Detected 3D points per maximum.
We need a 3D shape fit function.
The 3D points are least-squares fit with 3D spherical harmonics (up to second order):

$$Y_0^0 = \frac{1}{2}\sqrt{\frac{1}{\pi}}, \qquad Y_1^{\pm 1} = \frac{1}{2}\sqrt{\frac{3}{2\pi}}\,\sin\theta\,e^{\pm i\varphi}, \qquad Y_1^0 = \frac{1}{2}\sqrt{\frac{3}{\pi}}\,\cos\theta,$$

$$Y_2^{\pm 2} = \frac{1}{4}\sqrt{\frac{15}{2\pi}}\,\sin^2\theta\,e^{\pm 2i\varphi}, \qquad Y_2^{\pm 1} = \frac{1}{2}\sqrt{\frac{15}{2\pi}}\,\cos\theta\sin\theta\,e^{\pm i\varphi}, \qquad Y_2^0 = \frac{1}{4}\sqrt{\frac{5}{\pi}}\,\big(3\cos^2\theta - 1\big)$$
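The least-squares fit can be sketched with the real-valued counterparts of these harmonics (cos mφ / sin mφ instead of e^{±imφ}); the 35 sample directions and the target shape below are made up for the demo:

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonics up to second order."""
    s, c = np.sin(theta), np.cos(theta)
    return np.stack([
        0.5 * np.sqrt(1 / np.pi) * np.ones_like(theta),        # Y_0^0
        np.sqrt(3 / (4 * np.pi)) * s * np.cos(phi),            # Y_1^1
        np.sqrt(3 / (4 * np.pi)) * c,                          # Y_1^0
        np.sqrt(3 / (4 * np.pi)) * s * np.sin(phi),            # Y_1^-1
        0.25 * np.sqrt(15 / np.pi) * s**2 * np.cos(2 * phi),   # Y_2^2
        0.5 * np.sqrt(15 / np.pi) * s * c * np.cos(phi),       # Y_2^1
        0.25 * np.sqrt(5 / np.pi) * (3 * c**2 - 1),            # Y_2^0
        0.5 * np.sqrt(15 / np.pi) * s * c * np.sin(phi),       # Y_2^-1
        0.25 * np.sqrt(15 / np.pi) * s**2 * np.sin(2 * phi),   # Y_2^-2
    ], axis=-1)

def fit_shape(theta, phi, r):
    """Least-squares fit of radii r(theta, phi) on the harmonic basis."""
    A = sh_basis(theta, phi)
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return coef

# Demo with 35 radial directions (as in the slide) and a made-up shape:
rng = np.random.default_rng(2)
theta = rng.uniform(0.1, np.pi - 0.1, 35)
phi = rng.uniform(0.0, 2 * np.pi, 35)
r = 1.0 + 0.3 * (3 * np.cos(theta) ** 2 - 1)   # lies in the span of Y_0^0, Y_2^0
coef = fit_shape(theta, phi, r)
recon = sh_basis(theta, phi) @ coef
```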
Resulting detection:
Next steps:
• Perceptual grouping layer 2
• World in motion
• Fill-in from models
The visual cascade versus Deep Learningwith neural nets of many layers
Front-End Vision and Deep Learning 2018