Grand Rehearsal FEV 2018
Prof. Bart ter Haar Romeny
The relation between biological vision and computer vision
Example filters learned by Krizhevsky et al. Each of the 96 filters shown here is of size [11x11x3], and each one is shared by the 55*55 neurons in one depth slice. Notice that the parameter sharing assumption is relatively reasonable: If detecting a horizontal edge is important at some location in the image, it should intuitively be useful at some other location as well due to the translationally-invariant structure of images. There is therefore no need to relearn to detect a horizontal edge at every one of the 55*55 distinct locations in the Conv layer output volume.
Can we understand these filters?
Many classes of receptive fields learned from vision:
• Geometry, N-jet: Gaussian derivatives measure structure to high order. Example: deblurring
• Fundamental constraint between order, scale and accuracy. Example: be careful with too small scales
• Color receptive fields take Gaussian derivatives along the wavelength axis. Example: yellow-blue (1st order), red-green (2nd order)
• Temporal receptive fields take derivatives along logarithmic real-time. Example: skewed temporal cortical receptive fields
• Direction-selective receptive fields measure retinal velocity
• Disparity receptive fields measure small shifts between RF's in both eyes
Can we invert the diffusion equation?
Recall that scale-space is infinitely differentiable due to the regularization properties of the observation process.
We can construct a Taylor expansion of the scale-space in any direction, including the negative scale direction.
Deblurring Gaussian blur
Taylor expansion ‘downwards’:
$$L(x,y,s-\delta s) = L - \frac{\partial L}{\partial s}\,\delta s + \frac{1}{2!}\frac{\partial^2 L}{\partial s^2}\,\delta s^2 - \frac{1}{3!}\frac{\partial^3 L}{\partial s^3}\,\delta s^3 + O(\delta s^4)$$

The derivatives with respect to s (scale) can be expressed in spatial derivatives due to the diffusion equation:

$$\frac{\partial L}{\partial s} = \frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}$$

$$L(x,y,s-\delta s) = L - \left(\frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}\right)\delta s + \frac{1}{2!}\left(\frac{\partial^4 L}{\partial x^4} + 2\,\frac{\partial^4 L}{\partial x^2\partial y^2} + \frac{\partial^4 L}{\partial y^4}\right)\delta s^2 + O(\delta s^3)$$
Deblurring to 4th, 8th, 16th and 32nd order. There are 560 derivative terms in the 32nd order expression!
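The downward Taylor expansion can be tried numerically by replacing each scale derivative with an iterated Laplacian. A minimal Python sketch, assuming the convention s = σ²/2 (so a Gaussian blur with σ = 1 corresponds to diffusion over δs = 0.5); the test image and parameters are made up for the demo:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def deblur_taylor(L, ds, order=4):
    """Approximate L(s - ds) via the downward Taylor series.

    Each derivative d^n L / ds^n is replaced by the n-fold Laplacian,
    using the diffusion equation dL/ds = Lxx + Lyy.
    """
    result = L.copy()
    term = L.copy()
    factorial = 1.0
    for n in range(1, order + 1):
        term = laplace(term)                  # one extra Laplacian per order
        factorial *= n
        result += (-ds) ** n / factorial * term
    return result

# Demo: blur a smooth test image with sigma = 1 (ds = 0.5), then deblur.
rng = np.random.default_rng(0)
img = gaussian_filter(rng.normal(size=(64, 64)), 4)   # smooth test image
blurred = gaussian_filter(img, 1.0)
deblurred = deblur_taylor(blurred, ds=0.5, order=4)
```

Pushing the order higher amplifies noise, which is exactly the order/scale/accuracy constraint mentioned earlier.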
Six layers of primary visual cortex
Blobs: non-orientation-specific cells for color processing
Valverde Salzmann, Matthias F., et al. "Color blobs in cortical areas V1 and V2 of the new world monkey Callithrix jacchus, revealed by non-differential optical imaging." Journal of Neuroscience 32.23 (2012): 7881-7894.
Cytochrome oxidase
The color blobs are sparsely distributed over the cortex.
Bartfeld, Eyal, and Amiram Grinvald. "Relationships between orientation-preference pinwheels, cytochrome oxidase blobs, and ocular-dominance columns in primate striate cortex." Proceedings of the National Academy of Sciences 89.24 (1992): 11905-11909.
More than 80% of the pinwheels were centered along the midline of ocular-dominance columns.
The centers of blobs also lie along the midline of ocular-dominance columns. However, the centers of pinwheels did not coincide with the centers of blobs; these two subsystems are spatially independent.
Colour receptive fields from Eigenpatches
If we extend the PCA to color images, we get color receptive fields. They clearly exhibit spatio-chromatic differential structure.
Multi-scale derivative operators to space and wavelength
Color receptive fields
Multi-scale derivative operators to space and wavelength (Koenderink 1999)
Taylor expansion color model:
• Ê: Luminance
• Êλ: Blue-yellowness
• Êλλ: Purple-greenness
Cone sensitivity curves of the L, M and S cones (wavelength 300–900 nm).
The reflected spectrum is:

$$E(\lambda) = e(\lambda)\,\big(1-\rho_f(n,s,v)\big)^2\,R_\infty(\lambda)$$

v = viewing direction, n = surface patch normal, s = direction of illumination, ρf = Fresnel front surface reflectance coefficient in v, R∞ = body reflectance.
Aim: describe material changes independent of the illumination.
$$E(\lambda,x) = e(\lambda)\,i(x)\,\big(1-\rho_f(x)\big)^2\,R_\infty(\lambda,x)$$

$$\frac{\partial E}{\partial \lambda} = \frac{\partial e}{\partial \lambda}\,i(x)\,\big(1-\rho_f(x)\big)^2\,R_\infty(\lambda,x) + e(\lambda)\,i(x)\,\big(1-\rho_f(x)\big)^2\,\frac{\partial R_\infty}{\partial \lambda}$$

Both equations have many common terms.
$$\hat{E} = \frac{1}{E}\frac{\partial E}{\partial \lambda} = \frac{1}{e}\frac{\partial e}{\partial \lambda} + \frac{1}{R_\infty(\lambda,x)}\frac{\partial R_\infty}{\partial \lambda}$$

The normalized differential determines material changes independent of the viewpoint, surface orientation, illumination direction, illumination intensity and illumination color!
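This cancellation can be checked numerically. In the sketch below the illuminant e(λ) and body reflectance R∞(λ) are hypothetical curves made up for the demo; Ê is unchanged when the intensity i(x) and the Fresnel term vary:

```python
import numpy as np

lam = np.linspace(400e-9, 700e-9, 200)                  # wavelength axis [m]
e = 1.0 + 0.5 * (lam - 400e-9) / 300e-9                 # hypothetical illuminant
R_inf = np.exp(-((lam - 550e-9) / 60e-9) ** 2) + 0.05   # hypothetical body reflectance

def E_hat(i_x, rho_f):
    """Normalized wavelength differential (1/E) dE/dlambda."""
    E = e * i_x * (1 - rho_f) ** 2 * R_inf
    return np.gradient(E, lam) / E

# Different intensity and Fresnel coefficient give the same E_hat:
a = E_hat(i_x=1.0, rho_f=0.1)
b = E_hat(i_x=3.7, rho_f=0.4)
```

The geometric factors enter E only as a λ-independent scale factor, so they cancel in the ratio.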
Some color differential invariants:
• Êλ: first wavelength derivative of Ê (color invariant)
• Êλλ: second wavelength derivative of Ê
• √(Êλx² + Êλy²): yellow-blue edges
• √(Êλλx² + Êλλy²): red-green edges
• √(Êλx² + Êλy² + Êλλx² + Êλλy²): total color edge strength
Note the complete absence of detection of black-white edges.
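A sketch of the yellow-blue edge detector in Python. The linear RGB-to-Gaussian-color-model matrix below is the approximation commonly quoted after Geusebroek et al.; treat the exact coefficients as an assumption of this sketch:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Assumed linear map from (R, G, B) to the Gaussian color model
# (E, E_lambda, E_lambdalambda); coefficients after Geusebroek et al.
M = np.array([[0.06,  0.63,  0.27],    # E: luminance
              [0.30,  0.04, -0.35],    # E_lambda: yellow-blue
              [0.34, -0.60,  0.17]])   # E_lambdalambda: red-green

def yellow_blue_edges(rgb, sigma=1.0):
    """sqrt(E_lambda_x^2 + E_lambda_y^2): yellow-blue edge strength."""
    E_lam = np.tensordot(rgb, M[1], axes=([-1], [0]))
    Ex = gaussian_filter(E_lam, sigma, order=(0, 1))   # Gaussian derivative d/dx
    Ey = gaussian_filter(E_lam, sigma, order=(1, 0))   # Gaussian derivative d/dy
    return np.hypot(Ex, Ey)

# A blue|yellow edge responds strongly; a black|white edge barely at all:
by = np.zeros((16, 16, 3)); by[:, :8] = (0, 0, 1); by[:, 8:] = (1, 1, 0)
bw = np.zeros((16, 16, 3)); bw[:, 8:] = (1, 1, 1)
```

The black-white test image confirms the remark above: a pure luminance edge is nearly invisible to this invariant.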
Blue-yellow edges
Time sequence simple cell RF, separable
Reverse Correlation Technique
Spatio-temporal operators are measured in V1
DeAngelis, Gregory C., Izumi Ohzawa and Ralph D. Freeman. Receptive-field dynamics in the central visual pathways. Trends in Neurosciences, 18.10 (1995): 451-458.
Spatio-temporal receptive field mapping by reverse correlation, by Ohzawa and Freeman (UC Berkeley). Figure panels: Stimulus, RF profile, Temporal RF.
Dimensionless elapsed time, t0 is the present moment. The desired mapping is s(t):

$$s = c\,(\ln t_0 - \ln t) = c\,\ln\frac{t_0}{t}$$
Diffusion should be homogeneous in the s-domain: $F_\sigma = F_{ss}$. The magnification |ds/dμ| should be inversely proportional to μ.
On this logarithmic mapping s(t) we can now go from – infinity to + infinity.
The receptive fields become skewed:

$$K(t, t_0; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\ln^2(t_0/t)}{2\sigma^2}}$$
Figure: temporal receptive field profiles of temporal order 0, 1 and 2 as a function of time, and space-time plots.
A clear skewness is observed in the time direction.
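The skewness can be checked numerically by treating the temporal kernel as a density over absolute time t (a minimal sketch; t0 and σ are illustrative choices):

```python
import numpy as np

def K(t, t0, sigma):
    """Temporal kernel: a Gaussian in s = ln(t0/t), hence skewed in t."""
    return np.exp(-np.log(t0 / t) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

t0, sigma = 1.0, 0.5
t = np.linspace(1e-3, 3.0, 30000)
w = K(t, t0, sigma)
w = w / w.sum()                               # normalize as a density over t
mean = (w * t).sum()
var = (w * (t - mean) ** 2).sum()
skewness = (w * (t - mean) ** 3).sum() / var ** 1.5
# Positive skewness: symmetric Gaussian in s, asymmetric in real time t.
```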
De Valois, R. L., Cottaris, N. P., Mahon, L. E., Elfar, S. D., & Wilson, J. A. (2000). Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vision research, 40(27), 3685-3702.
Geometry-driven diffusion‘adaptive filtering’
Prof. Bart ter Haar Romeny
The relation between biological vision and computer vision
Extensive feedback from primary visual cortex to LGN: why is this?
Feedback in the visual system: geometry-driven diffusion
Edges remain, areas are smoothed: images become cleaner, ‘piecewise homogeneous’
J. Kacur, K. Mikula, "Slowed anisotropic diffusion", in B. ter Haar Romeny, L. Florack, J. Koenderink, M. Viergever (Eds.), Lecture Notes in Computer Science, Vol. 1252, Proc. of 1st Intern. Conf. on Scale-Space Theory in Computer Vision, Springer, Berlin (1997), pp. 357-360.
We now modify the process of diffusion (blurring) with a geometry-dependent term. A conductivity coefficient c is introduced in the diffusion equation.

Theory:

$$\frac{\partial L}{\partial s} = \nabla \cdot (c\,\nabla L)$$

where c is a function of the geometry, i.e. of the derivatives: $c = c(L, L_x, L_{xx}, \ldots)$

The right-hand side is the divergence of a flow; $c\,\nabla L$ is also called the flux function. With c = 1 we have normal linear, isotropic diffusion: the divergence of the gradient flow is the Laplacian:

$$\frac{\partial L}{\partial s} = \frac{\partial^2 L}{\partial x^2} + \frac{\partial^2 L}{\partial y^2}$$
The Perona & Malik equation (1990):

$$\frac{\partial L}{\partial s} = \nabla \cdot \big(c(|\nabla L|)\,\nabla L\big)$$

With increasing edge strength we want a smaller c (less diffusion), with two possible choices for c:

$$c_1 = e^{-\frac{|\nabla L|^2}{k^2}} = 1 - \frac{|\nabla L|^2}{k^2} + \frac{|\nabla L|^4}{2k^4} + O(|\nabla L|^5)$$

$$c_2 = \frac{1}{1 + \frac{|\nabla L|^2}{k^2}} = 1 - \frac{|\nabla L|^2}{k^2} + \frac{|\nabla L|^4}{k^4} + O(|\nabla L|^5)$$

These conductivity terms are equivalent to first order.
Working out the differentiations for c₂, we get a strongly nonlinear diffusion equation:

$$\frac{\partial L}{\partial s} = \frac{k^2\left[(k^2 + L_y^2 - L_x^2)\,L_{xx} - 4\,L_x L_y L_{xy} + (k^2 + L_x^2 - L_y^2)\,L_{yy}\right]}{(k^2 + L_x^2 + L_y^2)^2}$$
The solution is not known analytically, so we have to rely on numerical methods, such as the forward Euler method:

$$L(s + \delta s) = L(s) + \delta s\,\nabla \cdot (c\,\nabla L)$$

This process is an evolutionary computation.
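A minimal forward Euler implementation of the Perona & Malik scheme with the c₁ conductivity; step size, k and the test image are illustrative choices, not the slide's parameters:

```python
import numpy as np

def perona_malik(L, n_steps=20, dt=0.2, k=0.1):
    """Forward Euler for dL/ds = div(c grad L), c = exp(-|grad L|^2 / k^2)."""
    L = L.astype(float).copy()
    for _ in range(n_steps):
        gx = np.diff(L, axis=1, append=L[:, -1:])   # forward differences,
        gy = np.diff(L, axis=0, append=L[-1:, :])   # zero gradient at the far edge
        fx = np.exp(-(gx / k) ** 2) * gx            # flux c * grad L (c1 conductivity)
        fy = np.exp(-(gy / k) ** 2) * gy
        div = np.diff(fx, axis=1, prepend=0) + np.diff(fy, axis=0, prepend=0)
        L += dt * div                               # one explicit Euler step
    return L

# Demo: a noisy step edge; flat regions are smoothed, the edge survives.
rng = np.random.default_rng(1)
img = np.zeros((32, 64)); img[:, 32:] = 1.0
noisy = img + rng.normal(0, 0.02, img.shape)
out = perona_malik(noisy)
```

At the edge the gradient is large, so c ≈ 0 and almost no flux crosses it, while the small noise gradients keep c ≈ 1 and are diffused away: the "edges remain, areas are smoothed" behaviour described above.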
Nonlinear scale-space
But: reducing to a small kernel is sometimes not effective, and we need a new solution: locally adaptive elongation of the diffusion kernel, Coherence Enhancing Diffusion. The kernel becomes an anisotropic Gaussian steered by the diffusion tensor D:

$$\propto\; e^{-\frac{\vec{x}\cdot D\cdot \vec{x}}{2\sigma^2}}$$
Critical points, paths and top-points:
• Critical points: ∇L = 0 (maxima, minima, saddles)
• Top-points: critical points with Det(H) = 0
Database image retrieval: compare the top-points of a query image with those of the database images, using the Earth Mover's Distance (EMD).
Differential invariants
• We use the complete set of irreducible 3rd order differential invariants.
• These features are rotation and scaling invariant.
• Top-points are related to SIFT keypoints (Scale Invariant Feature Transform)
In Mathematica: ImageKeypoints[]
• The top-points and differential invariants are calculated for the query object and the scene.
• We now compare the differential invariant features.
compare: distance = 0.5, distance = 0.2, distance = 0.3
The vectors with the smallest distance are paired.
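The pairing step can be sketched as a nearest-neighbour search over the invariant feature vectors; the 2-D feature space below is hypothetical, for illustration only:

```python
import numpy as np

def pair_features(query, scene):
    """Pair each query feature vector with the scene vector at smallest Euclidean distance."""
    d = np.linalg.norm(query[:, None, :] - scene[None, :, :], axis=-1)
    return d.argmin(axis=1), d.min(axis=1)

# Two query features, two scene features:
query = np.array([[0.0, 0.0], [1.0, 1.0]])
scene = np.array([[1.0, 1.1], [0.1, 0.0]])
idx, dist = pair_features(query, scene)
```

In practice the full set of irreducible invariants per top-point is used, but the matching logic is the same.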
• A set of coordinates is formed from the differences in scale (Log(σo1)-Log(σs2)) and in angles (θo1- θs2).
Clustering (∆θ, ∆τ):
• If these coordinates are plotted in a scatter plot, clusters can be identified.
• In this scatter plot we find two dense clusters.
• For these important clusters we calculate the mean ∆θ and ∆τ.
The stability criterion removes much of the scatter
Rotate and scale according to the cluster means.
The translations we find correspond to the location of the objects in the scene.
In this example we have two clusters of correctly matched points.
MR slice of the heart (coronary)
↑ scale
top-points
• graph theory
Edge focusing
2D deep hierarchical structure
Multi-scale signature function of this image line
Marching-cubes isophote surface of the macrophage.
Preprocessing:
- Blur with σ = 3 px
- Detect the N strongest maxima

We interpolate with cubic splines along 35 radial tracks in 35 3D orientations.

The profiles are extremely noisy. Observation: visually we can reasonably point out the steepest edge points.

Edge focusing over all profiles. Choose a start level based on the task, i.e. find a single edge.
Detected 3D points per maximum.
We need a 3D shape fit function.
The 3D points are least-squares fit with 3D spherical harmonics (up to second order):

$$Y_0^0 = \frac{1}{2}\sqrt{\frac{1}{\pi}}, \qquad Y_1^{\pm 1} = \frac{1}{2}\sqrt{\frac{3}{2\pi}}\,\sin\theta\,e^{\pm i\varphi}, \qquad Y_1^0 = \frac{1}{2}\sqrt{\frac{3}{\pi}}\,\cos\theta,$$

$$Y_2^{\pm 2} = \frac{1}{4}\sqrt{\frac{15}{2\pi}}\,\sin^2\theta\,e^{\pm 2i\varphi}, \qquad Y_2^{\pm 1} = \frac{1}{2}\sqrt{\frac{15}{2\pi}}\,\cos\theta\sin\theta\,e^{\pm i\varphi}, \qquad Y_2^0 = \frac{1}{4}\sqrt{\frac{5}{\pi}}\,\big(3\cos^2\theta - 1\big)$$
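The least-squares fit can be sketched with the real-valued counterparts of these harmonics (cos mφ / sin mφ instead of e^{±imφ}); the 35 sample directions and the target shape below are made up for the demo:

```python
import numpy as np

def sh_basis(theta, phi):
    """Real spherical harmonics up to second order."""
    s, c = np.sin(theta), np.cos(theta)
    return np.stack([
        0.5 * np.sqrt(1 / np.pi) * np.ones_like(theta),        # Y_0^0
        np.sqrt(3 / (4 * np.pi)) * s * np.cos(phi),            # Y_1^1
        np.sqrt(3 / (4 * np.pi)) * c,                          # Y_1^0
        np.sqrt(3 / (4 * np.pi)) * s * np.sin(phi),            # Y_1^-1
        0.25 * np.sqrt(15 / np.pi) * s**2 * np.cos(2 * phi),   # Y_2^2
        0.5 * np.sqrt(15 / np.pi) * s * c * np.cos(phi),       # Y_2^1
        0.25 * np.sqrt(5 / np.pi) * (3 * c**2 - 1),            # Y_2^0
        0.5 * np.sqrt(15 / np.pi) * s * c * np.sin(phi),       # Y_2^-1
        0.25 * np.sqrt(15 / np.pi) * s**2 * np.sin(2 * phi),   # Y_2^-2
    ], axis=-1)

def fit_shape(theta, phi, r):
    """Least-squares fit of radii r(theta, phi) on the harmonic basis."""
    A = sh_basis(theta, phi)
    coef, *_ = np.linalg.lstsq(A, r, rcond=None)
    return coef

# Demo with 35 radial directions (as in the slide) and a made-up shape:
rng = np.random.default_rng(2)
theta = rng.uniform(0.1, np.pi - 0.1, 35)
phi = rng.uniform(0.0, 2 * np.pi, 35)
r = 1.0 + 0.3 * (3 * np.cos(theta) ** 2 - 1)   # lies in the span of Y_0^0, Y_2^0
coef = fit_shape(theta, phi, r)
recon = sh_basis(theta, phi) @ coef
```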
Resulting detection:
Next steps:
• Perceptual grouping layer 2
• World in motion
• Fill-in from models
The visual cascade versus Deep Learningwith neural nets of many layers
Front-End Vision and Deep Learning 2018