Spherical CNNs · 2020-01-26 · Rotated MNIST on the sphere Recognition of 3D shapes Prediction of...

Preview:

Citation preview

Spherical CNNs

Credit: Taco S. Cohen1, Mario Geiger2, Jonas Kohler1, Max Welling1,3

1University of Amsterdam

2EPFL

3CIFAR

Presenter: Fuwen Tanhttps://qdata.github.io/deep2Read

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 1 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 2 / 32

If you get confused

Group Equivariant Convolutional Networks [1]T.S. Cohen, M. WellingICML, 2016.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 3 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 4 / 32

Parameterization

Plane : x(u, v) ∈ R2

Sphere : x(α, β) = Z (α)Y (β)n ∈ S2

n : north pole

3D Rotation : R(α, β, γ) = Z (α)Y (β)Z (γ) ∈ SO(3)

Figure: Proper Euler angles geometrical definition. The xyz (fixed) system isshown in blue, the XYZ (rotated) system is shown in red. The line of nodes (N)is shown in green. Credit: https://en.wikipedia.org/wiki/Euler_angles

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 5 / 32

CNNs on planar images

(f ∗ ψ)(x) =

∫R2

f (y)ψ(x − y)dy

f : R2 → R (e.g . feature Maps)

ψ : R2 → R (e.g . locally − supported filter)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 6 / 32

CNNs on planar images

(f ∗ ψ)(x) =

∫R2

f (y)ψ(T−1x (y))dy

Tx(t) = t + x (translation)

T−1x (t) = x − t

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 7 / 32

CNNs on spherical images (first layer)

(f ∗ ψ)(x) =

∫S2

f (y)ψ(R−1x (y))dy

x , y : 3D unit vector ∈ S2

Rx(t) = Rx · t (3Drotation)

Rx : (α, β, γ) ∈ SO(3)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 8 / 32

A subtitle point

(f ∗ ψ)(x) =

∫S2

f (y)ψ(R−1x (y))dy

First layer:

Input: S2 → 2D.Output: SO(3)→ 3D.The output is indexed by an entry in SO(3)

An extra dimension modeling the rotation

Movement over S2: 2 dofRotation around the position x: 1 dofDifferent from [2], which ”restricts the filter to be circularly symmetricabout the Z axis.”

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 9 / 32

CNNs on spherical images (higher layers)

(f ∗ g)(R) =

∫SO(3)

f (Q)g(R−1(Q))dQ

R,Q : (α, β, γ) ∈ SO(3)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 10 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 11 / 32

Transformation of the filter and the feature map

[Lgψ

](t) = ψ(g−1t)[

Lg f](t) = f (g−1t)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 12 / 32

Equivariance properties of CNNs

φ(Tgx) = T ′gφ(x).

transforming an input x by a transformation (e.g. translation) g(forming Tgx) and then passing it through the learned map φ shouldgive the same result as first mapping x through φ and thentransforming the representation.

Planar CNN is equivariant to translations.

([LT f ] ∗ ψ) = LT (f ∗ ψ)f: e.g. earlier CNN layers

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 13 / 32

Proof

Planar CNN is equivariant to translations.T (t) = t + u; T−1(x) = x − udT (t) = d(t + u) = dt

LT (f ∗ ψ)(x) = (f ∗ ψ)(T−1x) = (f ∗ ψ)(x − u)

=

∫R2

f (y)ψ((x − u)− y)dy

=

∫R2

f (y)ψ(x − (u + y))dy

{substitute : v = u + y} =

∫R2

f (v − u)ψ(x − v)dv

=

∫R2

f (T−1v))ψ(x − v)dv

=

∫R2

[LT f ](v))ψ(x − v)dv

= ([LT f ] ∗ ψ)(x)Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 14 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 15 / 32

Spherical CNN is rotation-equivariant

Spherical CNN is equivariant to rotations.

([LQ f ] ∗ ψ) = LQ(f ∗ ψ)Requirement:

dy is the invariant measure on S2

dQ is the invariant measure on SO(3)dRy = dy ; dRQ = dQ∫S2 θ(Ry)dy =

∫S2 θ(Ry)d(Ry) =

∫S2 θ(y)dy

guarantee by the parameterization (appendix A)

(f ∗ ψ)(x) =

∫S2

f (y)ψ(R−1x (y))dy

(f ∗ ψ)(R) =

∫SO(3)

f (Q)ψ(R−1(Q))dQ

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 16 / 32

Proof (appendix B)

([LQ f ] ∗ ψ)(R) =

∫S2

f (Q−1y)ψ(R−1y)dy

{substitute : y = Qt} =

∫S2

f (t)ψ(R−1Qt)d(Qt)

=

∫S2

f (t)ψ((Q−1R)−1t)d(t)

= (f ∗ ψ)(Q−1R)

=[LQ(f ∗ ψ)

](R)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 17 / 32

GFFT

GFFT defined on S2 and SO(3)

SO(3): Wigner D-function

S2: spherical harmonics

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 18 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 19 / 32

Equivariance error

∆ =1

n

n∑i=1

std(LRi(Φ(fi ))− Φ(LRi

fi ))

std(Φ(fi ))

Φ : spherical CNN layers with randomly initialized filters

fi ,Ri := randomly chosen features (with channel K=10) and rotations

n = 500

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 20 / 32

0

0.5

1

1.5

2∆

10−6no act. & 1 layer no act. & res. of 30

0 10 20 30

resolution

0

0.1

0.2

0.3

0.4

0.5

ReLU & 1 layer

1 5 10

layers

ReLU & res. of 30

Figure: ∆ as a function of the resolutionand the number of layers.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 20 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 21 / 32

MNIST on the sphere

Dataset 1 (NR): projected on the northern hemisphere

Dataset 2 (R): projected on the northern hemisphere and thenrandomly rotated

Planar images for baseline methods:

stereographic projection

Figure: Stereographic projection.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 22 / 32

Results

Baseline: conv-ReLU-conv-ReLU-FC

kernel: 5 x 5channels: 32, 64, 10

Spherical CNN: S2conv-ReLU-SO(3)conv-ReLU-FC

bandwidth: 30, 10, 6channels: 20, 40, 10

NR / NR R / R NR / R

planar 0.98 0.23 0.11spherical 0.96 0.95 0.94

Table: Test accuracy for the networks evaluated on the spherical MNIST dataset.Here R = rotated, NR = non-rotated and X / Y denotes, that the network wastrained on X and evaluated on Y.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 23 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 24 / 32

3D recognition

SHREC17 task [3]Training data: 51300 non-aligned 3D modelsClassification: 55 categories

RepresentationRay casting on the surface and its convex hullchannels: 6 ((length, cos, sin) x 2)

ray castingfrom the sphere to the origin

distance sphere-impact

normal at impact

Figure: The ray is cast from the surface of the sphere towards the origin. Thefirst intersection with the model gives the values of the signal. The two images ofthe right represent two spherical signals in (α, β) coordinates. They containrespectively the distance from the sphere and the cosine of the ray with thenormal of the model. The red dot corresponds to the pixel set by the red line.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 25 / 32

Results

Model

S2conv-BN-ReLU (50 features)2 x (SO(3)conv-BN-ReLU) (70/350 features)max-pooling-BN-FCbandwidths: 128, 32, 22, 7

Method P@N R@N F1@N mAP NDCG

Tatsuma ReVGG 0.705 0.769 0.719 0.696 0.783Furuya DLAN 0.814 0.683 0.706 0.656 0.754SHREC16-Bai GIFT 0.678 0.667 0.661 0.607 0.735Deng CM-VGG5-6DB 0.412 0.706 0.472 0.524 0.624

Ours 0.701 (3rd) 0.711 (2nd) 0.699 (3rd) 0.676 (2nd) 0.756 (2nd)

Table: Results and best competing methods for the SHREC17 competition.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 26 / 32

Outline

1 Background reading

2 GoalsCNNs on planar images → CNNs on spherical images

3 Equivariance propertiesPlanar CNN is translation-equivariantSpherical CNN is rotation-equivariant

4 Implementation

5 ExperimentsEquivariance errorRotated MNIST on the sphereRecognition of 3D shapesPrediction of atomization energies from molecular geometry

6 Take-home messages

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 27 / 32

Molecular energy regression

QM7 task

Input: for each molecule, positions pi and charges zi of the atomsN = 23 atoms of T = 5 types (H, C, N, O, S) for each moleculeOutput: atomization energy of the molecule (scalar)

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 28 / 32

Representation as a spherical signal

A sphere Si around pi

Uniform radius such that no intersections among spheres

For each possible z and for each point x ∈ S2:

Uz(x) =∑

j 6=i,zj=zzi ·z|x−pi |

For each atom: a T channel spherical function

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 29 / 32

Model

ResNet block

S2SO(3)con − BN − ReLU − SO(3)conv − BN

Shared weights for all atoms: N x F feature maps

To achieve permutation invariance:

Embedding: MLP φSum poolingRegression: MLP ψ

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 30 / 32

Results

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 31 / 32

Take-home messages

One of the best papers in ICLR 2018

Potential applications on omnidirectional vision (e.g. for AR/VR)

Potential extensions to more transformation groups (e.g. SE(3))

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 32 / 32

Taco Cohen and Max Welling.

Group equivariant convolutional networks.

In International Conference on Machine Learning (ICML), pages 2990–2999.PMLR, 20–22 Jun 2016.

J. R. Driscoll and D. M. Healy.

Computing fourier transforms and convolutions on the 2-sphere.

Adv. Appl. Math., 15(2):202–250, June 1994.

M. Savva, F. Yu, Hao Su, M. Aono, B. Chen, D. Cohen-Or, W. Deng, HangSu, S. Bai, X. Bai, N. Fish, J. Han, E. Kalogerakis, E. G. Learned-Miller,Y. Li, M. Liao, S. Maji, A. Tatsuma, Y. Wang, N. Zhang, and Z. Zhou.

Large-scale 3d shape retrieval from shapenet core55.

In Proceedings of the Eurographics 2016 Workshop on 3D Object Retrieval,3DOR ’16, pages 89–98, Goslar Germany, Germany, 2016. EurographicsAssociation.

Credit: Taco S. Cohen, Mario Geiger, Jonas Kohler, Max Welling (shortinst)Spherical CNNs Presenter: Fuwen Tan https://qdata.github.io/deep2Read 32 / 32

Recommended