Reinforcement Learning in Phases of Quantum Control

Marin Bukov (BU)
with A.G.R. Day, P. Weinberg, A. Polkovnikov, D. Sels, P. Mehta
arXiv: 1705.00565 (2017); arXiv: 1711.09109 (2017)
GOAL: teach a reinforcement learning agent to prepare states
in the non-integrable quantum Ising model

H(t) = -\sum_j \left[ J S^z_{j+1} S^z_j + h_z S^z_j + h_x(t) S^x_j \right]

1. Single-qubit System (THIS TALK), J = 0:
   problem setup, RL agent solution; control phase transitions:
   overconstrained phase, correlated (glassy) phase, controllable phase
2. Two-qubit System (POSTER), L = 2:
   spontaneous symmetry breaking in the optimal control landscape
3. Many-Body System (POSTER):
   control phase diagram: overconstrained phase, glassy/correlated phase
Optimal Qubit State Preparation

[Bloch-sphere sketch: axes x, y; states |\uparrow\rangle, |\downarrow\rangle, |\psi_i\rangle, |\psi_*\rangle]

Hamiltonian:  H(t) = -S^z - h_x(t) S^x

initial state:  |\psi_i\rangle : GS of H_i = -S^z - 2 S^x
target state:   |\psi_*\rangle : GS of H_* = -S^z + 2 S^x

GOAL: find a protocol h_x(t), with h(t) \in [-4, 4],
such that |\psi(t = 0)\rangle = |\psi_i\rangle and |\psi(t = T)\rangle = |\psi_*\rangle

measure: fidelity  F_h(T) = |\langle\psi(T)|\psi_*\rangle|^2
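As a concrete illustration (not part of the talk), a minimal numpy sketch of this setup: the two ground states and the fidelity measure, with S^z, S^x the spin-1/2 operators; all variable names are illustrative.

import numpy as np

Sz = np.array([[0.5, 0.0], [0.0, -0.5]])   # S^z for spin-1/2
Sx = np.array([[0.0, 0.5], [0.5, 0.0]])    # S^x for spin-1/2

def ground_state(H):
    # ground state = eigenvector of the lowest eigenvalue (H is Hermitian)
    _, vecs = np.linalg.eigh(H)
    return vecs[:, 0]

psi_i    = ground_state(-Sz - 2.0 * Sx)    # initial state |psi_i>
psi_star = ground_state(-Sz + 2.0 * Sx)    # target state  |psi_*>

def fidelity(psi_T):
    # F_h(T) = |<psi(T)|psi_*>|^2
    return np.abs(np.vdot(psi_T, psi_star)) ** 2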
What is Reinforcement Learning?

[diagram: agent and ENVIRONMENT exchanging actions and rewards]

Silver et al., Nature 529 (2016) [Google DeepMind]
Reinforcement Learning in a Nutshell

• Machine Learning
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning (RL)

[diagram: agent and ENVIRONMENT exchanging actions and rewards]

RL states:          S = \{s\} = \{(t, h(t))\}
available actions:  A = \{a = \delta h\} = \{0, \pm 0.1, \pm 0.2, \pm 0.5, \pm 1.0, \pm 2.0, \pm 4.0, \pm 8.0\}
rewards:            R = \{r(t) \in [0, 1]\}

cumulative expected reward, or Q-function Q(s, a): encodes experience/knowledge
GOAL: maximise the cumulative expected reward (Q-function) Q(s, a), starting from state s and taking action a

Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press (2015)
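To make the Q-function concrete, here is a hedged tabular Q-learning sketch over exactly these ingredients (states s = (t, h), the action set above). The epsilon-greedy rule and the values of alpha and epsilon are generic textbook choices, not the talk's exact agent.

import random
from collections import defaultdict

actions = [0.0] + [s * a for a in (0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0) for s in (+1, -1)]
Q = defaultdict(float)        # Q(s, a), zero-initialised; s = (t, h) must be hashable
alpha, epsilon = 0.1, 0.1     # learning rate and exploration rate (illustrative)

def choose_action(s):
    # epsilon-greedy: explore with probability epsilon, otherwise exploit Q
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next):
    # one-step Q-learning update (discount factor 1 for a finite episode)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + best_next - Q[(s, a)])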
RL Applied to Quantum State Preparation

i \partial_t |\psi(t)\rangle = H(t) |\psi(t)\rangle,  t \in [0, T]
H(t) = S^z + h(t) S^x
|\psi_i\rangle : GS of H_i = S^z - S^x
|\psi_*\rangle : GS of H_* = S^z + S^x

feedback loop between agent and environment, updates Q(s, a):
1. start from state s_0 = (t = 0, h = -1);
   take action a_0 : \delta h = 0.8 and go to state s_1 = (t = 0.05, h = -0.2)
2. solve the Schrödinger Eq. and obtain the QM state |\psi(\delta t)\rangle
3. calculate the reward and use it to update Q(s, a), which in turn is used to choose subsequent actions

reward:  r = 0 for 0 \le t < T;  r = |\langle\psi_*|\psi(t = T)\rangle|^2 at t = T  (episode completed)

[plot: the protocol h(t) is built up step by step over t \in [0, T]]

problems: state space exponentially big; how do we choose actions?
→ biased MC sampling; exploration-exploitation dilemma
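A sketch of one such episode for the qubit, under this slide's conventions (H(t) = S^z + h(t) S^x, time step dt = 0.05, bounded field), reusing Sz, Sx from the numpy sketch above and choose_action from the Q-learning sketch; dt and T are illustrative.

import numpy as np
from scipy.linalg import expm

dt, T = 0.05, 1.0
n_steps = int(round(T / dt))

def run_episode(psi_i, psi_star):
    psi, h = psi_i.astype(complex), -1.0      # step 1: s_0 = (t = 0, h = -1)
    for step in range(n_steps):
        a = choose_action((step, h))          # pick delta_h using Q(s, a)
        h = float(np.clip(h + a, -4.0, 4.0))  # field stays in [-4, 4]
        H = Sz + h * Sx
        psi = expm(-1j * dt * H) @ psi        # step 2: Schroedinger evolution over dt
        # r = 0 while 0 <= t < T; the Q-update of step 3 would go here
    return np.abs(np.vdot(psi, psi_star))**2  # r = |<psi_*|psi(T)>|^2 at t = T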
Reinforcement Learning Quantum State Preparation

bang-bang protocols:  h \in \{\pm 4\}
H(t) = -S^z - h_x(t) S^x

the learning process
[plot: protocol h_x(t) over t \in [0, T]; episode completed; r = |\langle\psi_*|\psi(t = T)\rangle|^2]

How hard is this optimisation problem?

arXiv: 1705.00565 (2017); Chen et al., IEEE 25, 920 (2014)
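One way to see the hardness: with bang-bang protocols on N_T time steps there are 2^{N_T} candidates, so exhaustive search is only feasible for small N_T. A brute-force scan (an illustration, not the talk's method; it reuses the operators and imports from the sketches above):

from itertools import product

def best_bang_bang(n_steps, dt, psi_i, psi_star):
    # exhaustively score all 2**n_steps bang-bang protocols h_j in {+4, -4}
    best_F, best_protocol = 0.0, None
    for protocol in product((+4.0, -4.0), repeat=n_steps):
        psi = psi_i.astype(complex)
        for h in protocol:
            psi = expm(-1j * dt * (-Sz - h * Sx)) @ psi   # H(t) = -S^z - h_x(t) S^x
        F = np.abs(np.vdot(psi, psi_star))**2
        if F > best_F:
            best_F, best_protocol = F, protocol
    return best_F, best_protocol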
RL-inspired Discovery: Phase Diagram of Quantum Control

bang-bang protocols; each protocol is mapped to its infidelity:  h \mapsto 1 - F_h(T)
H(t) = -S^z - h_x(t) S^x

infidelity landscape (schematic); infidelity landscape minima: \{h_\alpha\}
[panels (i)-(iii): bang-bang protocols h(t) \in [-4, 4] found at increasing T, with F_h(T) = 0.633, 0.845 and 1.000]

Edwards-Anderson-like order parameter:
\bar h(t) = \frac{1}{\#\text{real}} \sum_\alpha h_\alpha(t),   q(T) \sim \sum_t \{ h(t) - \bar h(t) \}^2

[phase diagram: F_h(T) and q(T) vs T \in [0, 4]; critical times T_c and T_{min} separate control phases I, II, III, with protocols (i)-(iii) as representatives]

arXiv: 1705.00565 (2017)
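A short sketch of this order parameter, computed from a set of near-optimal protocols {h_alpha} sampled on a common time grid; the normalisation is schematic, matching the ~ above.

import numpy as np

def q_EA(protocols):
    # protocols: array of shape (n_real, n_steps), one row per landscape minimum h_alpha
    h = np.asarray(protocols, dtype=float)
    h_bar = h.mean(axis=0)                         # \bar h(t) = (1/#real) sum_alpha h_alpha(t)
    return ((h - h_bar) ** 2).sum(axis=1).mean()   # q(T) ~ sum_t [h(t) - \bar h(t)]^2, averaged over realisations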
Nature of Control Phase Transitions

H(t) = -S^z - h_x(t) S^x
[plot: a bang-bang protocol h_x(t) \in [-4, 4] alongside the corresponding Ising spin configuration on the lattice sites]

one-to-one correspondence:
  bang-bang protocol  ↔  classical spin state
  infidelity          ↔  energy

an effective classical energy function governs the control phase transitions (j: sites on the time lattice):

H_{eff}(T) = I(T) + \sum_j G_j(T) h_j + \sum_{ij} J_{ij}(T) h_i h_j + \sum_{ijk} K_{ijk}(T) h_i h_j h_k + \ldots

control phase transitions: classical (?), non-equilibrium
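The correspondence can be made literal in a few lines (an illustration under the conventions above, reusing the earlier sketches): a bang-bang field h_j in {+4, -4} maps to an Ising spin sigma_j = h_j/4, and the protocol's infidelity is read as that configuration's classical energy.

def protocol_to_spins(protocol):
    # bang-bang field h_j in {+4, -4}  ->  Ising spin sigma_j in {+1, -1}
    return [int(h / 4) for h in protocol]

def classical_energy(protocol, dt, psi_i, psi_star):
    # "energy" of the spin configuration = infidelity 1 - F_h(T)
    psi = psi_i.astype(complex)
    for h in protocol:
        psi = expm(-1j * dt * (-Sz - h * Sx)) @ psi
    return 1.0 - np.abs(np.vdot(psi, psi_star))**2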
Outlook

• one can teach a reinforcement learning agent to prepare quantum states at short times with high fidelity
• finding the optimal driving protocol is as hard as searching for the absolute GS of a spin glass (even if the system is disorder-free)
• quantum control problems have extremely rich phase diagrams, with overconstrained, controllable, correlated and glassy phases → POSTER (Alex Day); and exhibit symmetry breaking → POSTER (MB)
• control phase transitions: classical & nonequilibrium; generic?

web: mgbukov.github.io
arXiv: 1705.00565 (2017); arXiv: 1711.09109 (2017)
QuSpin: weinbe58.github.io/QuSpin/
open-source Python package for ED and quantum dynamics of arbitrary boson, fermion and spin many-body systems, supporting various (user-defined) symmetries and time evolution
SciPost Phys. 2, 003 (2017)
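A minimal QuSpin usage sketch for a driven Ising chain of the kind above (toy couplings and drive, illustrative only; consult the QuSpin documentation for the exact operator conventions, e.g. spin vs Pauli normalisation):

import numpy as np
from quspin.basis import spin_basis_1d
from quspin.operators import hamiltonian

L = 2
basis = spin_basis_1d(L=L)

def drive(t):
    return -4.0 if t < 0.25 else 4.0                  # toy bang-bang pulse h_x(t)

J_zz = [[-1.0, i, i + 1] for i in range(L - 1)]       # -J S^z_j S^z_{j+1}
h_z  = [[-1.0, i] for i in range(L)]                  # -h_z S^z_j
h_x  = [[-1.0, i] for i in range(L)]                  # coupling for -h_x(t) S^x_j

static  = [["zz", J_zz], ["z", h_z]]
dynamic = [["x", h_x, drive, []]]
H = hamiltonian(static, dynamic, basis=basis, dtype=np.float64)

E, V  = H.eigh(time=0.0)          # spectrum of H at t = 0
psi0  = V[:, 0]                   # ground state
psi_T = H.evolve(psi0, 0.0, 0.5)  # integrate the Schroedinger equation to t = 0.5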