
Page 1

DeepONet: Learning nonlinear operators

Lu Lu

Applied Mathematics Instructor, Department of Mathematics, MIT

SIAM Conference on Computational Science and Engineering, Mar 3, 2021


Page 2

From function to operator

Function: R^{d1} → R^{d2}
e.g., image classification: image of a handwritten digit ↦ 5

Operator: function (∞-dim) ↦ function (∞-dim)
e.g., derivative (local): x(t) ↦ x′(t)
e.g., integral (global): x(t) ↦ ∫ K(s, t) x(s) ds
e.g., dynamical systems, biological systems, social systems

⇒ Can we learn operators via neural networks?
⇒ How?



Page 4

Problem setup

G : u ∈ V ⊂ C(K1) ↦ G(u) ∈ C(K2)
G(u) : y ∈ R^d ↦ G(u)(y) ∈ R


Page 5

Universal Approximation Theorem for Operators

G : u ∈ V ⊂ C(K1) ↦ G(u) ∈ C(K2)
G(u) : y ∈ R^d ↦ G(u)(y) ∈ R

Theorem (Chen & Chen, 1995)
Suppose that σ is a continuous non-polynomial function, X is a Banach space, K1 ⊂ X and K2 ⊂ R^d are two compact sets in X and R^d, respectively, V is a compact set in C(K1), and G is a continuous operator that maps V into C(K2). Then for any ε > 0, there are positive integers n, p, m, constants c_i^k, ξ_{ij}^k, θ_i^k, ζ_k ∈ R, w_k ∈ R^d, and x_j ∈ K1, with i = 1, …, n, k = 1, …, p, j = 1, …, m, such that

$$\left| G(u)(y) - \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k \,\sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k u(x_j) + \theta_i^k \right) \sigma\!\left( w_k \cdot y + \zeta_k \right) \right| < \varepsilon$$

holds for all u ∈ V and y ∈ K2.


Page 6

Problem setup

G : u ∈ V ⊂ C(K1) ↦ G(u) ∈ C(K2)
G(u) : y ∈ R^d ↦ G(u)(y) ∈ R

[Figure: (A) Inputs & output: the network takes the sensor values [u(x1), u(x2), …, u(xm)] and a location y ∈ R^d, and outputs G(u)(y) ∈ R. (B) Training data: each input function u is sampled at the same fixed sensors x1, …, xm; the output function G(u) is evaluated at random locations y. (C) Stacked DeepONet: p branch nets map [u(x1), …, u(xm)] to b1, …, bp, a trunk net maps y to t1, …, tp, and their inner product gives G(u)(y). (D) Unstacked DeepONet: a single branch net outputs all of b1, …, bp.]

Inputs: u at sensors x1, x2, …, xm, i.e., [u(x1), …, u(xm)] ∈ R^m, and y ∈ R^d
Output: G(u)(y) ∈ R

Lu et al., Nature Mach Intell, 2021
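To make the data layout of panel B concrete, the sketch below assembles training triples ([u(x1), …, u(xm)], y, G(u)(y)) in NumPy. It is a minimal illustration, not the paper's pipeline: the random-polynomial inputs, the antiderivative stand-in for G, and all sizes are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes: m fixed sensors on [0, 1], P output locations per u.
m, P, n_funcs = 100, 50, 1000
sensors = np.linspace(0, 1, m)

def sample_u():
    # Stand-in for a random input function: a low-order random polynomial.
    return np.polynomial.Polynomial(np.random.randn(5))

def G(u, y):
    # Stand-in operator: the antiderivative of u, evaluated at y.
    return u.integ()(y) - u.integ()(0)

branch_inputs, trunk_inputs, targets = [], [], []
for _ in range(n_funcs):
    u = sample_u()
    u_at_sensors = u(sensors)        # [u(x_1), ..., u(x_m)], same sensors for every u
    for y in np.random.rand(P):      # P random output locations for this u
        branch_inputs.append(u_at_sensors)
        trunk_inputs.append([y])
        targets.append(G(u, y))

branch_inputs = np.array(branch_inputs)  # shape (n_funcs * P, m)
trunk_inputs = np.array(trunk_inputs)    # shape (n_funcs * P, 1)
targets = np.array(targets)              # shape (n_funcs * P,)
```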

Page 7

Deep operator network (DeepONet)

$$G(u)(y) \approx \sum_{k=1}^{p} \underbrace{b_k(u)}_{\text{branch}} \, \underbrace{t_k(y)}_{\text{trunk}}$$


Idea: G(u)(y) is a function of y conditioned on u:
tk(y): basis functions of y
bk(u): u-dependent coefficients, i.e., functionals of u

Lu et al., Nature Mach Intell, 2021
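A minimal sketch of the unstacked variant in PyTorch (layer widths, depths, and activations here are placeholders, not the paper's settings):

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Unstacked DeepONet: G(u)(y) ~ sum_k b_k(u) * t_k(y)."""
    def __init__(self, m, d, p):
        super().__init__()
        # Branch net: sensor values [u(x_1), ..., u(x_m)] -> coefficients b_1..b_p.
        self.branch = nn.Sequential(nn.Linear(m, 40), nn.Tanh(), nn.Linear(40, p))
        # Trunk net: location y in R^d -> basis values t_1(y)..t_p(y).
        self.trunk = nn.Sequential(nn.Linear(d, 40), nn.Tanh(), nn.Linear(40, p), nn.Tanh())
        # Optional output bias (the slides compare DeepONets with and without bias).
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        b = self.branch(u_sensors)   # (batch, p)
        t = self.trunk(y)            # (batch, p)
        return (b * t).sum(dim=1, keepdim=True) + self.bias

# Usage: net = DeepONet(m=100, d=1, p=40); out = net(u_batch, y_batch)
```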

Page 8

Error analysis: the number of sensors

Consider G : u(x) ↦ s(x), x ∈ [0, 1], defined by the ODE system

$$\frac{d}{dx} s(x) = g(s(x), u(x), x), \quad s(0) = s_0.$$

Each u ∈ V, sampled at the m sensors, induces an approximation um ∈ Vm. Let

$$\kappa(m, V) := \sup_{u \in V} \max_{x \in [0,1]} |u(x) - u_m(x)|.$$

e.g., for a Gaussian process with kernel exp(−‖x1 − x2‖² / (2l²)): κ(m, V) ∼ 1/(m²l²).

Theorem (Lu et al., Nature Mach Intell, 2021)
Assume g is Lipschitz continuous (with constant c). Then there exists a DeepONet N such that

$$\sup_{u \in V} \max_{x \in [0,1]} \left\| G(u)(x) - \mathcal{N}([u(x_1), \ldots, u(x_m)], x) \right\|_2 < c e^{c} \kappa(m, V).$$
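The κ(m, V) ∼ 1/(m²l²) scaling can be probed numerically. The sketch below is one hedged reading of the setup, assuming um is the piecewise-linear interpolant of u through the sensor values (one natural choice, not necessarily the paper's) and estimating the sup by Monte Carlo over GRF samples:

```python
import numpy as np

def estimate_kappa(m, l, n_samples=200, n_fine=2000):
    """Monte Carlo estimate of sup_u max_x |u(x) - u_m(x)| over GRF samples."""
    x_fine = np.linspace(0, 1, n_fine)
    sensors = np.linspace(0, 1, m)
    # Squared-exponential covariance on a fine grid (plus jitter for stability).
    cov = np.exp(-(x_fine[:, None] - x_fine[None, :])**2 / (2 * l**2))
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n_fine))
    worst = 0.0
    for _ in range(n_samples):
        u = L @ np.random.randn(n_fine)                 # GRF sample on the fine grid
        u_at_sensors = np.interp(sensors, x_fine, u)    # sensor readings
        u_m = np.interp(x_fine, sensors, u_at_sensors)  # piecewise-linear u_m
        worst = max(worst, np.max(np.abs(u - u_m)))
    return worst

# estimate_kappa(m, l) should shrink roughly like 1/(m^2 l^2) as m grows.
```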


Page 9

Explicit operator: a simple ODE case

$$\frac{ds(x)}{dx} = u(x), \quad x \in [0, 1], \quad s(0) = 0$$

$$G : u(x) \mapsto s(y) = s(0) + \int_0^y u(\tau)\, d\tau$$
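A minimal data-generation sketch for this operator, assuming GRF input samples with a squared-exponential kernel and trapezoidal quadrature (grid size and length scale are placeholders):

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

m = 100                              # number of sensors on [0, 1]
x = np.linspace(0, 1, m)

# Sample u from a GRF with squared-exponential kernel of length scale l.
l = 0.2
cov = np.exp(-(x[:, None] - x[None, :])**2 / (2 * l**2))
u = np.linalg.cholesky(cov + 1e-10 * np.eye(m)) @ np.random.randn(m)

# s(y) = s(0) + int_0^y u(tau) dtau, via the trapezoidal rule on the grid.
s = cumulative_trapezoid(u, x, initial=0.0)

# Each pair (x[i], s[i]) together with the sensor vector u is one training example.
```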

Very small generalization error!

[Figure: (A) Train/test MSE (~10^-6 to 10^-4) for FNN (best), ResNet (best), Seq2Seq (best), Seq2Seq 10x (best), and stacked/unstacked DeepONets with and without bias. (B) Train/test MSE vs. number of iterations. (C) L2 relative error vs. 1/l. (D) Train/test MSE vs. iterations for poly-fractonomial, Legendre, and GRF input spaces. (E) Train/test MSE vs. epochs for DeepONet vs. CNN.]

Lu et al., Nature Mach Intell, 2021

Page 10

Implicit operator: gravity pendulum with an external force

$$\frac{ds_1}{dt} = s_2, \qquad \frac{ds_2}{dt} = -k \sin s_1 + u(t)$$

$$G : u(t) \mapsto s(t)$$
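Reference trajectories for this system can be generated with any ODE solver; a brief sketch with SciPy (k, the initial state, and the stand-in forcing are placeholders):

```python
import numpy as np
from scipy.integrate import solve_ivp

k = 1.0
t_grid = np.linspace(0, 1, 100)
u = np.sin(2 * np.pi * t_grid)          # stand-in sample of the external force

def rhs(t, s):
    u_t = np.interp(t, t_grid, u)       # evaluate the forcing at time t
    return [s[1], -k * np.sin(s[0]) + u_t]

sol = solve_ivp(rhs, (0, 1), [0.0, 0.0], t_eval=t_grid, rtol=1e-8)
s1 = sol.y[0]                           # target output G(u)(t) at the query times
```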

[Figure: test and generalization MSE vs. number of training data. (A) Branch/trunk width 50: fits e^{-x/1400}, x^{-1.5}, x^{-1.2}. (B) Branch/trunk width 200: fits e^{-x/3100}, x^{-1.4}, x^{-1.3}. (C) FNN width 100: fits e^{-x/5800}, x^{-1.5}, x^{-1.1}. (D) Seq2Seq: fit x^{-1.3}.]

Test/generalization error:
small dataset: exponential convergence
large dataset: polynomial rates
a larger network has a later transition point

Lu et al., Nature Mach Intell, 2021

Page 11

Implicit operator: diffusion-reaction system

$$\frac{\partial s}{\partial t} = D \frac{\partial^2 s}{\partial x^2} + k s^2 + u(x), \quad x \in [0, 1],\ t \in [0, 1]$$

# training points = #u × P, where P is the number of output locations sampled per input function u

Small dataset: exponential convergence
[Figure: (A) MSE vs. P for #u = 50, 100, 200, 400 (fits e^{-x/23}, e^{-x/13}, e^{-x/10}, e^{-x/7.6}). (B) MSE vs. #u for P = 50, 100, 200, 400 (fits e^{-x/16}, e^{-x/10}, e^{-x/9.2}, e^{-x/9.0}). (C) Exponential rate 1/k vs. #u and P: 0.04 ln(x) − 0.1 (rate w.r.t. P, Fig. A) and 0.1 − 0.2e^{-x/37} (rate w.r.t. #u, Fig. B). (D) Rate of convergence vs. #u and P: 2.4 − 5.1e^{-x/31} and 0.5 + 0.5 ln(x).]

Large dataset: polynomial convergence
[Figure: (A) Train/test MSE vs. P for #u = 100. (B) Train/test MSE vs. #u for P = 100. (C) MSE vs. P for #u = 50, 100, 200, 400 (rates x^{-1.4}, x^{-2.2}, x^{-2.4}, x^{-2.4}). (D) MSE vs. #u for P = 50, 100, 200, 400 (rates x^{-2.6}, x^{-2.9}, x^{-3.3}, x^{-3.7}).]

Lu et al., Nature Mach Intell, 2021

Page 12

Stochastic PDE

Consider the elliptic problem with multiplicative noise

$$-\operatorname{div}\!\left( e^{b(x;\omega)} \nabla u(x;\omega) \right) = f(x), \quad x \in (0, 1),$$

with zero boundary conditions and f(x) = 10.
Stochastic process: b(x; ω) ∼ GP(0, σ² exp(−‖x1 − x2‖² / (2l²)))

G : b(x; ω) ↦ u(x; ω)

Ideas:
Karhunen-Loève (KL) expansion: $b(x;\omega) \approx \sum_{i=1}^{N} \sqrt{\lambda_i}\, e_i(x)\, \xi_i(\omega)$
Branch net inputs: [√λ1 e1(x), √λ2 e2(x), …, √λN eN(x)] ∈ R^{N×m}, where √λi ei(x) = √λi [ei(x1), ei(x2), …, ei(xm)] ∈ R^m
Trunk net inputs: [x, ξ1, ξ2, …, ξN] ∈ R^{N+1}
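A minimal sketch of computing these inputs numerically via a discrete KL expansion (eigendecomposition of the covariance matrix on the sensor grid; the normalization of the discrete eigenfunctions here is one common convention, not necessarily the paper's):

```python
import numpy as np

m, N = 100, 10                       # sensors, KL truncation
x = np.linspace(0, 1, m)
sigma, l = 1.0, 1.5
cov = sigma**2 * np.exp(-(x[:, None] - x[None, :])**2 / (2 * l**2))

# Discrete KL: eigenpairs of the covariance matrix, largest eigenvalues first.
lam, phi = np.linalg.eigh(cov)
lam, phi = lam[::-1][:N], phi[:, ::-1][:, :N]

# Branch input: the N scaled modes sqrt(lambda_i) e_i(x_j), shape (N, m).
branch_input = (np.sqrt(np.maximum(lam, 0)) * phi).T

# One random realization: b(x) ~ sum_i sqrt(lambda_i) e_i(x) xi_i, xi_i ~ N(0, 1).
xi = np.random.randn(N)
b_sample = branch_input.T @ xi

# Trunk input for a query point x0: [x0, xi_1, ..., xi_N], shape (N + 1,).
x0 = 0.5
trunk_input = np.concatenate([[x0], xi])
```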

Lu et al., Nature Mach Intell, 2021

Page 13

Stochastic PDE

Example: 10 different random samples of u(x; ω) for b(x; ω) with l = 1.5

[Figure: two panels of u(x; ω) vs. x on [0, 1], comparing DeepONet predictions against reference solutions.]

Lu et al., Nature Mach Intell, 2021

Page 14

DeepONet for learning operators

DeepONet (Lu et al., Nature Mach Intell, 2021)

Algorithms & Applications
  - 16 ODEs/PDEs (nonlinear, fractional & stochastic)
  - Multiphysics & multiscale problems
    - Bubble growth dynamics from nm to mm (J Chem Phys, 2021)
    - Hypersonics (arXiv:2011.03349)
    - Electroconvection (arXiv:2009.12935)

Observation: good performance
  - Small generalization error
  - Exponential/polynomial error convergence

Theory
  - Convergence rate for advection-diffusion equations (arXiv:2102.10621)


Page 15

DeepM&Mnet: Multiphysics & multiscale problems

Hypersonics: coupled flow and finite-rate chemistry behind a normal shock (arXiv:2011.03349)
