Reinforcement Learning in Phases of Quantum Control

Marin Bukov (BU)
with A.G.R. Day, P. Weinberg, A. Polkovnikov, D. Sels, P. Mehta
arXiv: 1705.00565 (2017); arXiv: 1711.09109 (2017)
GOAL: teach a reinforcement learning agent to prepare states
in the non-integrable quantum Ising model

H(t) = -\sum_j \left[ J S^z_{j+1} S^z_j + h_z S^z_j + h_x(t) S^x_j \right]

1. Single-qubit System (THIS TALK), J = 0:
   problem setup, RL agent solution; control phase transitions:
   overconstrained phase, correlated (glassy) phase, controllable phase
2. Two-qubit System (POSTER), L = 2:
   spontaneous symmetry breaking in the optimal control landscape
3. Many-Body System (POSTER):
   control phase diagram: overconstrained phase, glassy/correlated phase
Optimal Qubit State Preparation

[Bloch-sphere sketch: axes x, y; states |\uparrow\rangle, |\downarrow\rangle, |\psi_i\rangle, |\psi_*\rangle]

Hamiltonian:  H(t) = -S^z - h_x(t) S^x

initial state:  |\psi_i\rangle : GS of H_i = -S^z - 2 S^x
target state:   |\psi_*\rangle : GS of H_* = -S^z + 2 S^x

GOAL: find a protocol h_x(t), with h(t) \in [-4, 4],
such that |\psi(t = 0)\rangle = |\psi_i\rangle and |\psi(t = T)\rangle = |\psi_*\rangle

measure: fidelity  F_h(T) = |\langle\psi(T)|\psi_*\rangle|^2
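As a concrete illustration (not part of the talk), a minimal numpy sketch of this setup: the two ground states and the fidelity measure, with S^z, S^x the spin-1/2 operators; all variable names are illustrative.

import numpy as np

Sz = np.array([[0.5, 0.0], [0.0, -0.5]])   # S^z for spin-1/2
Sx = np.array([[0.0, 0.5], [0.5, 0.0]])    # S^x for spin-1/2

def ground_state(H):
    # ground state = eigenvector of the lowest eigenvalue (H is Hermitian)
    _, vecs = np.linalg.eigh(H)
    return vecs[:, 0]

psi_i    = ground_state(-Sz - 2.0 * Sx)    # initial state |psi_i>
psi_star = ground_state(-Sz + 2.0 * Sx)    # target state  |psi_*>

def fidelity(psi_T):
    # F_h(T) = |<psi(T)|psi_*>|^2
    return np.abs(np.vdot(psi_T, psi_star)) ** 2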
What is Reinforcement Learning?

[diagram: agent and ENVIRONMENT exchanging actions and rewards]

Silver et al., Nature 529 (2016) [Google DeepMind]
Reinforcement Learning in a Nutshell

• Machine Learning
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning (RL)

[diagram: agent and ENVIRONMENT exchanging actions and rewards]

RL states:          S = \{s\} = \{(t, h(t))\}
available actions:  A = \{a = \delta h\} = \{0, \pm 0.1, \pm 0.2, \pm 0.5, \pm 1.0, \pm 2.0, \pm 4.0, \pm 8.0\}
rewards:            R = \{r(t) \in [0, 1]\}

cumulative expected reward, or Q-function Q(s, a): encodes experience/knowledge
GOAL: maximise the cumulative expected reward (Q-function) Q(s, a), starting from state s and taking action a

Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press (2015)
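To make the Q-function concrete, here is a hedged tabular Q-learning sketch over exactly these ingredients (states s = (t, h), the action set above). The epsilon-greedy rule and the values of alpha and epsilon are generic textbook choices, not the talk's exact agent.

import random
from collections import defaultdict

actions = [0.0] + [s * a for a in (0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0) for s in (+1, -1)]
Q = defaultdict(float)        # Q(s, a), zero-initialised; s = (t, h) must be hashable
alpha, epsilon = 0.1, 0.1     # learning rate and exploration rate (illustrative)

def choose_action(s):
    # epsilon-greedy: explore with probability epsilon, otherwise exploit Q
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next):
    # one-step Q-learning update (discount factor 1 for a finite episode)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + best_next - Q[(s, a)])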
RL Applied to Quantum State Preparation

i \partial_t |\psi(t)\rangle = H(t) |\psi(t)\rangle,  t \in [0, T]
H(t) = S^z + h(t) S^x
|\psi_i\rangle : GS of H_i = S^z - S^x
|\psi_*\rangle : GS of H_* = S^z + S^x

feedback loop between agent and environment, updates Q(s, a):
1. start from state s_0 = (t = 0, h = -1);
   take action a_0 : \delta h = 0.8 and go to state s_1 = (t = 0.05, h = -0.2)
2. solve the Schrödinger Eq. and obtain the QM state |\psi(\delta t)\rangle
3. calculate the reward and use it to update Q(s, a), which in turn is used to choose subsequent actions

reward:  r = 0 for 0 \le t < T;  r = |\langle\psi_*|\psi(t = T)\rangle|^2 at t = T  (episode completed)

[plot: the protocol h(t) is built up step by step over t \in [0, T]]

problems: state space exponentially big; how do we choose actions?
→ biased MC sampling; exploration-exploitation dilemma
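A sketch of one such episode for the qubit, under this slide's conventions (H(t) = S^z + h(t) S^x, time step dt = 0.05, bounded field), reusing Sz, Sx from the numpy sketch above and choose_action from the Q-learning sketch; dt and T are illustrative.

import numpy as np
from scipy.linalg import expm

dt, T = 0.05, 1.0
n_steps = int(round(T / dt))

def run_episode(psi_i, psi_star):
    psi, h = psi_i.astype(complex), -1.0      # step 1: s_0 = (t = 0, h = -1)
    for step in range(n_steps):
        a = choose_action((step, h))          # pick delta_h using Q(s, a)
        h = float(np.clip(h + a, -4.0, 4.0))  # field stays in [-4, 4]
        H = Sz + h * Sx
        psi = expm(-1j * dt * H) @ psi        # step 2: Schroedinger evolution over dt
        # r = 0 while 0 <= t < T; the Q-update of step 3 would go here
    return np.abs(np.vdot(psi, psi_star))**2  # r = |<psi_*|psi(T)>|^2 at t = T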
Reinforcement Learning Quantum State Preparation

bang-bang protocols:  h \in \{\pm 4\}
H(t) = -S^z - h_x(t) S^x

the learning process
[plot: protocol h_x(t) over t \in [0, T]; episode completed; r = |\langle\psi_*|\psi(t = T)\rangle|^2]

How hard is this optimisation problem?

arXiv: 1705.00565 (2017); Chen et al., IEEE 25, 920 (2014)
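One way to see the hardness: with bang-bang protocols on N_T time steps there are 2^{N_T} candidates, so exhaustive search is only feasible for small N_T. A brute-force scan (an illustration, not the talk's method; it reuses the operators and imports from the sketches above):

from itertools import product

def best_bang_bang(n_steps, dt, psi_i, psi_star):
    # exhaustively score all 2**n_steps bang-bang protocols h_j in {+4, -4}
    best_F, best_protocol = 0.0, None
    for protocol in product((+4.0, -4.0), repeat=n_steps):
        psi = psi_i.astype(complex)
        for h in protocol:
            psi = expm(-1j * dt * (-Sz - h * Sx)) @ psi   # H(t) = -S^z - h_x(t) S^x
        F = np.abs(np.vdot(psi, psi_star))**2
        if F > best_F:
            best_F, best_protocol = F, protocol
    return best_F, best_protocol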
RL-inspired Discovery: Phase Diagram of Quantum Control

bang-bang protocols; each protocol is mapped to its infidelity:  h \mapsto 1 - F_h(T)
H(t) = -S^z - h_x(t) S^x

infidelity landscape (schematic); infidelity landscape minima: \{h_\alpha\}
[panels (i)-(iii): bang-bang protocols h(t) \in [-4, 4] found at increasing T, with F_h(T) = 0.633, 0.845 and 1.000]

Edwards-Anderson-like order parameter:
\bar h(t) = \frac{1}{\#\text{real}} \sum_\alpha h_\alpha(t),   q(T) \sim \sum_t \{ h(t) - \bar h(t) \}^2

[phase diagram: F_h(T) and q(T) vs T \in [0, 4]; critical times T_c and T_{min} separate control phases I, II, III, with protocols (i)-(iii) as representatives]

arXiv: 1705.00565 (2017)
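A short sketch of this order parameter, computed from a set of near-optimal protocols {h_alpha} sampled on a common time grid; the normalisation is schematic, matching the ~ above.

import numpy as np

def q_EA(protocols):
    # protocols: array of shape (n_real, n_steps), one row per landscape minimum h_alpha
    h = np.asarray(protocols, dtype=float)
    h_bar = h.mean(axis=0)                         # \bar h(t) = (1/#real) sum_alpha h_alpha(t)
    return ((h - h_bar) ** 2).sum(axis=1).mean()   # q(T) ~ sum_t [h(t) - \bar h(t)]^2, averaged over realisations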
Nature of Control Phase Transitions

H(t) = -S^z - h_x(t) S^x
[plot: a bang-bang protocol h_x(t) \in [-4, 4] alongside the corresponding Ising spin configuration on the lattice sites]

one-to-one correspondence:
  bang-bang protocol  ↔  classical spin state
  infidelity          ↔  energy

an effective classical energy function governs the control phase transitions (j: sites on the time lattice):

H_{eff}(T) = I(T) + \sum_j G_j(T) h_j + \sum_{ij} J_{ij}(T) h_i h_j + \sum_{ijk} K_{ijk}(T) h_i h_j h_k + \ldots

control phase transitions: classical (?), non-equilibrium
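The correspondence can be made literal in a few lines (an illustration under the conventions above, reusing the earlier sketches): a bang-bang field h_j in {+4, -4} maps to an Ising spin sigma_j = h_j/4, and the protocol's infidelity is read as that configuration's classical energy.

def protocol_to_spins(protocol):
    # bang-bang field h_j in {+4, -4}  ->  Ising spin sigma_j in {+1, -1}
    return [int(h / 4) for h in protocol]

def classical_energy(protocol, dt, psi_i, psi_star):
    # "energy" of the spin configuration = infidelity 1 - F_h(T)
    psi = psi_i.astype(complex)
    for h in protocol:
        psi = expm(-1j * dt * (-Sz - h * Sx)) @ psi
    return 1.0 - np.abs(np.vdot(psi, psi_star))**2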
Outlook

• one can teach a reinforcement learning agent to prepare quantum states at short times with high fidelity
• finding the optimal driving protocol is as hard as searching for the absolute GS of a spin glass (even if the system is disorder-free)
• quantum control problems have extremely rich phase diagrams, with overconstrained, controllable, correlated and glassy phases → POSTER (Alex Day); and exhibit symmetry breaking → POSTER (MB)
• control phase transitions: classical & nonequilibrium; generic?

web: mgbukov.github.io
arXiv: 1705.00565 (2017); arXiv: 1711.09109 (2017)
QuSpin: weinbe58.github.io/QuSpin/
open-source Python package for ED and quantum dynamics of arbitrary boson, fermion and spin many-body systems, supporting various (user-defined) symmetries and time evolution
SciPost Phys. 2, 003 (2017)
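A minimal QuSpin usage sketch for a driven Ising chain of the kind above (toy couplings and drive, illustrative only; consult the QuSpin documentation for the exact operator conventions, e.g. spin vs Pauli normalisation):

import numpy as np
from quspin.basis import spin_basis_1d
from quspin.operators import hamiltonian

L = 2
basis = spin_basis_1d(L=L)

def drive(t):
    return -4.0 if t < 0.25 else 4.0                  # toy bang-bang pulse h_x(t)

J_zz = [[-1.0, i, i + 1] for i in range(L - 1)]       # -J S^z_j S^z_{j+1}
h_z  = [[-1.0, i] for i in range(L)]                  # -h_z S^z_j
h_x  = [[-1.0, i] for i in range(L)]                  # coupling for -h_x(t) S^x_j

static  = [["zz", J_zz], ["z", h_z]]
dynamic = [["x", h_x, drive, []]]
H = hamiltonian(static, dynamic, basis=basis, dtype=np.float64)

E, V  = H.eigh(time=0.0)          # spectrum of H at t = 0
psi0  = V[:, 0]                   # ground state
psi_T = H.evolve(psi0, 0.0, 0.5)  # integrate the Schroedinger equation to t = 0.5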