
Optimization and Machine Learning in Quantum Information Theory

Peter Wittek

ICFO-The Institute of Photonic Sciences and

University of Borås

09 July 2015

Seminar at the University of Tokyo


Introduction · SDP · Elements of Machine Learning · Quantum Physics and Machine Learning · Conclusions

Introduction

Global, nonconvex optimization problems are pervasive, and quantum information theory is no exception.

Examples: finding the quantum bound of Bell inequalities, estimating the guessing probability, fidelity, ground-state energy, adaptive phase estimation...

Machine learning faces similar problems, so it is worth looking at the interaction of the two fields:

Classical optimization and learning theory applied in quantum information theory.
Using quantum algorithms, protocols, and strategies in machine learning.

Peter Wittek Optimization and Learning in Quantum Information Theory


Polynomial Optimization Problems of Noncommuting Variables

The generic form is:

p⋆ = inf_{X,φ} 〈φ, p(X)φ〉

s.t. ‖φ‖ = 1,
g_i(X) ⪰ 0, i = 1, …, m_g,
〈φ|s_i(X)|φ〉 ≥ 0, i = 1, …, m_s.


Words and Involution

Given n noncommuting variables, words are sequences of letters of x = (x1, x2, …, xn) and x* = (x1*, x2*, …, xn*). E.g., w = x1 x2*.
Involution: similar to complex conjugation on sequences of letters.
A polynomial is a linear combination of words, p = Σ_w p_w w.

Hermitian moment matrix.
Hermitian variables.
Versus the commutative case.


The Relaxation

We replace the optimization problem by the following SDP:

min_y Σ_w p_w y_w    (1)

s.t. M(y) ⪰ 0,
M(g_i y) ⪰ 0, i = 1, …, m_g,
Σ_{|w|≤2k} s_{i,w} y_w ≥ 0, i = 1, …, m_s.

Pironio, S.; Navascués, M. & Acín, A. Convergent relaxations of polynomial optimization problems with noncommuting variables. SIAM Journal on Optimization, 2010, 20, 2157–2180.


Toy Example: Polynomial Optimization

Consider the following polynomial optimization problem:

min_{x,φ} 〈φ| x1x2 + x2x1 |φ〉

such that

‖φ‖ = 1,
−x2² + x2 + 0.5 ⪰ 0,
x1² = x1,
x1x2 = x2x1.
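Under these constraints the optimum can be checked by hand: x1² = x1 together with [x1, x2] = 0 makes x1 a projector, so on a joint eigenvector the objective 2 x1 x2 becomes 2ab with a ∈ {0, 1} and b limited by the inequality constraint. A minimal stdlib sanity check (not the SDP itself; any convergent relaxation hierarchy lower-bounds this value):

```python
import math

# -b^2 + b + 0.5 >= 0 means b lies between the roots of b^2 - b - 0.5 = 0.
lo = (1 - math.sqrt(3)) / 2
hi = (1 + math.sqrt(3)) / 2

# The objective 2ab is linear in b, so the extremes a in {0,1}, b in {lo,hi}
# suffice; the minimum is attained at a = 1, b = (1 - sqrt(3))/2.
optimum = min(2 * a * b for a in (0, 1) for b in (lo, hi))
print(optimum)  # 1 - sqrt(3) ≈ -0.7321
```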


Toy Example: Moment and localizing matrices

min_x 2x1x2

such that

[ 1       x1       x2       x1x2     x2²
  x1      x1       x1x2     x1x2     x1x2²
  x2      x1x2     x2²      x1x2²    x2³
  x1x2    x1x2     x1x2²    x1x2²    x1x2³
  x2²     x1x2²    x2³      x1x2³    x2⁴   ] ⪰ 0

[ −x2² + x2 + 0.5          −x1x2² + x1x2 + 0.5x1       −x2³ + x2² + 0.5x2
  −x1x2² + x1x2 + 0.5x1    −x1x2² + x1x2 + 0.5x1       −x1x2³ + x1x2² + 0.5x1x2
  −x2³ + x2² + 0.5x2       −x1x2³ + x1x2² + 0.5x1x2    −x2⁴ + x2³ + 0.5x2²   ] ⪰ 0.


Toy Example: Corresponding SDP

min_y 2y12

such that

[ 1      y1      y2      y12      y22
  y1     y1      y12     y12      y122
  y2     y12     y22     y122     y222
  y12    y12     y122    y122     y1222
  y22    y122    y222    y1222    y2222 ] ⪰ 0

[ −y22 + y2 + 0.5        −y122 + y12 + 0.5y1      −y222 + y22 + 0.5y2
  −y122 + y12 + 0.5y1    −y122 + y12 + 0.5y1      −y1222 + y122 + 0.5y12
  −y222 + y22 + 0.5y2    −y1222 + y122 + 0.5y12   −y2222 + y222 + 0.5y22 ] ⪰ 0.


Bounding Quantum Correlations

max_{E,φ} 〈φ, Σ_ij c_ij E_i E_j φ〉

subject to

‖φ‖ = 1,
E_i E_j = δ_ij E_i ∀ i, j,
Σ_i E_i = 1,
[E_i, E_j] = 0 ∀ i, j.

Navascués, M.; Pironio, S. & Acín, A. Bounding the set of quantum correlations. Physical Review Letters, 2007, 98, 010401.
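As a concrete illustration, the CHSH Bell operator with a standard (assumed) choice of qubit observables attains Tsirelson's bound 2√2, the value the first level of this hierarchy already certifies for CHSH:

```python
import numpy as np

# Pauli observables; the specific measurement choice below is an assumption
# (it is the standard optimal one for CHSH, not fixed by the slide).
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

A0, A1 = Z, X                      # Alice's observables
B0 = (Z + X) / np.sqrt(2)          # Bob's observables
B1 = (Z - X) / np.sqrt(2)

chsh = (np.kron(A0, B0) + np.kron(A0, B1)
        + np.kron(A1, B0) - np.kron(A1, B1))

# Largest eigenvalue = quantum bound of CHSH = 2*sqrt(2)
print(np.linalg.eigvalsh(chsh).max())  # ≈ 2.8284
```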


Another Example

Hubbard Model:

H = −t Σ_{<r,s>} [c†_r c_s + c†_s c_r] + (U/2) Σ_{<r,s>} n_r n_s,

{c_r, c†_s} = δ_rs I_r,
{c†_r, c†_s} = 0,
{c_r, c_s} = 0.
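The anticommutation relations can be checked numerically in a Jordan-Wigner representation of two modes; a sketch in which the values t = 1, U = 4 are arbitrary illustration choices, not taken from the slide:

```python
import numpy as np

I2 = np.eye(2)
sz = np.diag([1.0, -1.0])
a = np.array([[0.0, 1.0], [0.0, 0.0]])   # single-mode annihilator: |1> -> |0>

c = [np.kron(a, I2), np.kron(sz, a)]     # c_0, c_1 with the Jordan-Wigner string
cd = [m.conj().T for m in c]             # creation operators

def anticomm(A, B):
    return A @ B + B @ A

# {c_r, c_s^dag} = delta_rs * I and {c_r, c_s} = 0, as quoted above.
assert np.allclose(anticomm(c[0], cd[0]), np.eye(4))
assert np.allclose(anticomm(c[0], cd[1]), np.zeros((4, 4)))
assert np.allclose(anticomm(c[0], c[1]), np.zeros((4, 4)))

# Two-site instance of the Hamiltonian above, with n_r = c_r^dag c_r.
t, U = 1.0, 4.0
n = [cd[0] @ c[0], cd[1] @ c[1]]
H = -t * (cd[0] @ c[1] + cd[1] @ c[0]) + (U / 2) * (n[0] @ n[1])

print(np.linalg.eigvalsh(H)[0])          # ground-state energy: -t = -1.0
```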


The Complexity of Translation

Generating the moment and localizing matrices is not a trivial task.
The number of words, i.e., the monomial basis, grows exponentially with the order of the relaxation.
The number of elements in the moment matrix is the square of that.


The Problem of Translation

Ncpol2SDPA: converts a symbolic description of a (non)commutative polynomial optimization problem to a numerical SDP relaxation.
Sparsest possible output.
SDPA:

Parallel and distributed SDP solver.
Arbitrary-precision variant.

Wittek, P. Algorithm 950: Ncpol2sdpa - Sparse Semidefinite Programming Relaxations for Polynomial Optimization Problems of Noncommuting Variables. ACM Transactions on Mathematical Software, 2015, 41(3):21. arXiv:1308.6029


Large-Scale Problems

Structural redundancy is resolved on an ongoing basis.
SDPs with up to 250,000 variables have been solved.
Quantum chemistry problems.

Working towards a more scalable Hubbard model.


Generalizations

Bilevel problems.
Mixed states.
Steering.
A numerically stable way of restricting the dimension of the Hilbert space.


The Roots of Machine Learning

Statistics
Artificial intelligence
Theory of computation
Furthermore:

Optimization
Control


Assumptions, Parameters, and Statistics

Descriptive and inferential statistics.
Assumptions derive from probability theory.
Parameters enter through assumed probability distributions.

It is often assumed that the data is generated by certain probability distributions described by a finite number of unknown parameters.

Statistical models.
E.g., linear regression with a Gaussian error term.


Sample Complexity

Think metrology:
The Cramér-Rao bound and the standard quantum limit: 1/N.
The Heisenberg limit: 1/N².

We can establish guarantees on accuracy based on the sample size.


Theory of Computation

Solving problems efficiently by an algorithm.
The number of steps required to arrive at a solution.

Computational complexity.
Big-O notation: O(n).

Complexity classes: P versus NP.


Artificial Intelligence

Reasoning and deduction.
Formal logic and combinatorial explosion.

∃ clouds ⇒ rain

Knowledge representation and ontologies.
Uncertainty in AI.
Bayesian inference, Bayesian networks.


What Machine Learning Should Be About

Data-driven.
Looking for patterns.
Classes: groups of similar objects.
Mainly quantitative, but can also be qualitative.

Robust: tolerates noise.
Generalizes well beyond the training data.
We seek a balance between

computational complexity,
model complexity, and
sample complexity.


Learning Approach

Supervised: (x1, y1), …, (xn, yn).
Biomedical: recognizing cancer cells.
Recognizing handwriting.
Spam detection.

Unsupervised:
Recommendation engines.
Finding groups of similar patents.
Identifying trends in a dynamic environment.

Transductive learning.
Reinforcement learning.

[Figures: two labeled classes separated by a decision surface; unlabeled instances with a decision boundary]


VC Dimension and Model Complexity

Shattering sets of labelled points.
The XOR problem.
The VC dimension can be infinite.

The VC dimension is not perfect: see Rademacher complexity.


VC Theorem and Structural Risk Minimization

Generalize well beyond the training data.
Bounds relate generalization performance to model complexity.
As opposed to empirical risk minimization.

P( E_N(f) ≤ E + √[ (h(log(2N/h) + 1) − log(η/4)) / N ] ) = 1 − η,

where
E_N(f) is the error of the learned function f over the whole distribution, given the sample;
E is the error on the sample;
h is the VC dimension.
The VC dimension is not perfect: see Rademacher complexity.
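The bound is easy to evaluate; a stdlib sketch with illustrative values (η is the significance level, and the numbers below are made up for demonstration):

```python
import math

def vc_bound(sample_error, h, N, eta=0.05):
    """Upper confidence bound on the true error from the VC theorem above."""
    conf = math.sqrt((h * (math.log(2 * N / h) + 1) - math.log(eta / 4)) / N)
    return sample_error + conf

# The guarantee tightens as the sample size N grows and loosens with the
# VC dimension h.
for N in (100, 1000, 10000):
    print(N, round(vc_bound(0.1, h=10, N=N), 3))
```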


Risk Minimization in Supervised Learning: Support Vector Machines

Maximum margin classifiers.
Training example set:

{(x1, y1), …, (xN, yN)},

xi ∈ R^d are the data points,
yi ∈ {−1, 1} are the binary class labels.

Minimize

(1/2) uᵀu

subject to

yi(uᵀxi + b) ≥ 1, i = 1, …, N.

The output is a hyperplane; the decision function is sgn(uᵀx + b).
We had this result in the 1960s.

[Figure: two classes separated by a decision surface with a margin]
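A minimal sketch (not the deck's method, and the 2D dataset is made up): subgradient descent on the soft-margin primal (1/2)‖u‖² + C Σᵢ max(0, 1 − yᵢ(u·xᵢ + b)); with separable data and a large C this approaches the maximum-margin hyperplane, here u ≈ (1, 0), b ≈ −1.

```python
# Four separable 2D points, labels in {-1, +1}.
data = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((2.0, 0.0), 1), ((2.0, 1.0), 1)]
u, b, C, lr = [0.0, 0.0], 0.0, 10.0, 0.01

for _ in range(5000):
    gu, gb = [u[0], u[1]], 0.0                        # gradient of (1/2)u.u
    for (x, y) in data:
        if y * (u[0] * x[0] + u[1] * x[1] + b) < 1:   # hinge is active
            gu[0] -= C * y * x[0]
            gu[1] -= C * y * x[1]
            gb -= C * y
    u = [u[0] - lr * gu[0], u[1] - lr * gu[1]]
    b -= lr * gb

print([round(v, 1) for v in u], round(b, 1))
```

A constant step size oscillates near the optimum, so the output is only approximate; the learned hyperplane nonetheless separates the training set.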


Making Support Vector Machines Practical

Allow for some mixing of the classes via slack variables ξi ≥ 0.

Minimize

(1/2) uᵀu + C Σ_{i=1}^N ξi

subject to

yi(uᵀxi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, N.

Dual formulation:

max_α Σ_{i=1}^N αi − (1/2) Σ_{i,j} αi αj yi yj xiᵀxj

subject to

0 ≤ αi ≤ C, i = 1, …, N,
Σ_{i=1}^N αi yi = 0.

The importance of the αi and of the positive definite kernel.


Neural networks

Feedforward network:

Connection to spin glasses.
Shallow learners.
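A feedforward network with a single hidden layer already solves the XOR problem mentioned under VC dimension; a tiny sketch with hand-picked weights (threshold units chosen for clarity, not learned):

```python
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # One hidden layer of two threshold units, weights chosen by hand.
    h1 = step(x1 + x2 - 0.5)      # fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)      # fires only if both inputs are on
    return step(h1 - h2 - 0.5)    # "at least one but not both" = XOR

print([xor_net(p, q) for p in (0, 1) for q in (0, 1)])  # [0, 1, 1, 0]
```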


Deep Learning

Many-layered artificial neural networks.

Image is from https://colah.github.io/posts/2015-01-Visualizing-Representations/


Main Research Directions

Classical learning applied to quantum physics problems.
Quantum machine learning (quantum computational learning).
Quantum learning (quantum statistical learning).

Group similar states together according to some fidelity measure.
Quantum template matching.
Learnability of unknown quantum measurements.


Classical Learning in Quantum Physics Problems

Adaptive quantum phase estimation: classical reinforcement learning.

Other attempts: measurement-based quantum computing, quantum logic gates with gradient ascent pulse engineering, simulating quantum circuits on spin systems.


Quantum Machine Learning

Classical data:
Grover's search.
Quantum associative memories.
A form of quantum support vector machines.
Hierarchical clustering.
Adiabatic optimization.

Quantum data:
Solving linear equations and self-analysis.
Quantum principal component analysis.
Quantum support vector machines.
Quantum nearest-neighbors algorithm.
Topological analysis.
Learning of unitary transformations: similar to process tomography.
Regression and transductive learning.


Learning and Grover’s search

Without decoherence, Grover's search finds an element in an unordered set quadratically faster than the classical limit.
There is a variant for finding the minimum and the maximum.
It is a plug-and-play method.
Implementations are not quite clear on the actual speedup.
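The decoherence-free case is easy to simulate directly on the amplitude vector; a sketch for N = 16 items and one marked index (both values arbitrary):

```python
import numpy as np

N, marked = 16, 11
state = np.full(N, 1.0 / np.sqrt(N))             # uniform superposition

iterations = int(round(np.pi / 4 * np.sqrt(N)))  # 3 iterations for N = 16
for _ in range(iterations):
    state[marked] *= -1.0                        # oracle flips the marked amplitude
    state = 2 * state.mean() - state             # diffusion: invert about the mean

print(state[marked] ** 2)                        # success probability ≈ 0.96
```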


Adiabatic Quantum Computing

Find the global minimum of a given function f : {0,1}^n → (0,∞), where min_x f(x) = f0 and f(x) = f0 iff x = x0.
Consider the Hamiltonian H1 = Σ_{x∈{0,1}^n} f(x)|x〉〈x|. Its ground state is |x0〉.
To find this ground state, consider the Hamiltonian H(λ) = (1 − λ)H0 + λH1.
Demonstrations: search engine ranking and binary classification.

[Figure: Hamiltonians Hmem, Hinp, and their sum Hmem + Hinp]
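The interpolation can be sketched numerically for a few bits. The slide leaves H0 unspecified; the code below assumes the usual transverse-field mixer, and the random cost function f is purely illustrative:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
f = rng.uniform(1.0, 2.0, size=2 ** n)      # f : {0,1}^n -> (0, inf)
x0 = int(np.argmin(f))                      # the unique minimizer

X, I = np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(2)

def on_qubit(op, i):
    """op acting on qubit i, identity elsewhere."""
    out = op if i == 0 else I
    for j in range(1, n):
        out = np.kron(out, op if j == i else I)
    return out

H0 = sum((np.eye(2 ** n) - on_qubit(X, i)) / 2 for i in range(n))
H1 = np.diag(f)

# Sweep lambda and track the spectral gap; if it stays open, a slow enough
# sweep ends in the ground state |x0> of H1.
gaps = []
for lam in np.linspace(0.0, 1.0, 21):
    evals = np.linalg.eigvalsh((1 - lam) * H0 + lam * H1)
    gaps.append(evals[1] - evals[0])

ground = np.linalg.eigh(H1)[1][:, 0]        # ground state of the final Hamiltonian
print(min(gaps), int(np.argmax(np.abs(ground))) == x0)
```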


Intermezzo: Least-Squares Support Vector Machines

Minimize

(1/2) uᵀu + (γ/2) Σ_{i=1}^N ei²    (2)

subject to the equality constraints

yi(uᵀφ(xi) + b) = 1 − ei, i = 1, …, N.    (3)
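Because the constraints are equalities, training reduces to a single linear solve of the LS-SVM KKT system [[0, yᵀ], [y, Ω + I/γ]][b; α] = [0; 1] with Ω_ij = yi yj K(xi, xj). A sketch with a linear kernel φ(x) = x; the dataset and γ = 10 are illustrative choices:

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
gamma = 10.0

K = X @ X.T                                   # linear kernel matrix
Omega = np.outer(y, y) * K
N = len(y)

A = np.zeros((N + 1, N + 1))                  # KKT system [[0, y^T], [y, Omega + I/gamma]]
A[0, 1:], A[1:, 0] = y, y
A[1:, 1:] = Omega + np.eye(N) / gamma
sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(N))))
b, alpha = sol[0], sol[1:]

def predict(x):
    return np.sign(np.sum(alpha * y * (X @ x)) + b)

print([predict(x) for x in X])                # recovers the training labels
```

Note that every αi comes out nonzero: this is the loss of sparsity mentioned on the next slide.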


Quantum Least-Squares Support Vector Machines

Use an alternative formulation of support vector machines.
Trade-off: sparsity is lost (model complexity increases).

Core ideas:
Quantum matrix inversion is fast.
Simulation of sparse matrices is efficient.
Non-sparse density matrices reveal the eigenstructure exponentially faster than classical algorithms.


Learning a Unitary Transformation

N uses of a black-box unitary transformation, followed by K uses of the learned function.
A form of quantum process tomography.
Regression problem: unknown function == unknown quantum channel.
Double optimization: over the input state and the strategy.
Transductive learning.

[Figure: unlabeled instances alongside two labeled classes]


Generalization of Causal Networks

Hidden Markov models.
The d-separation theorem and its quantum variants.

Challenges Reichenbach's Common Cause Principle.
Sequential measurements and inference.

Entropic description to linearize the equations.

Connection to nonlocality.


Book

Monograph.
Reached the 1,009,508th bestselling position.


Summary

Nonconvex optimization is ubiquitous in both quantum information theory and machine learning.
Classical and quantum learning can help in quantum physics problems.

Robust heuristics.
Structural risk minimization.
Adaptive techniques: reinforcement learning.
