Machine Learning meets Quantum Foundations: A Brief Survey

Kishor Bharti* and Tobias Haug
Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543

Vlatko Vedral
Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543 and
Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom

Leong-Chuan Kwek
Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543
MajuLab, CNRS-UNS-NUS-NTU International Joint Research Unit, UMI 3654, Singapore
National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore 637616 and
School of Electrical and Electronic Engineering, Block S2.1, 50 Nanyang Avenue, Singapore 639798
(Dated: June 15, 2020)

* [email protected]

The goal of machine learning is to facilitate a computer to execute a specific task without explicit instruction by an external party. Quantum foundations seeks to explain the conceptual and mathematical edifice of quantum theory. Recently, ideas from machine learning have successfully been applied to different problems in quantum foundations. Here, we compile the representative works done so far at the interface of machine learning and quantum foundations. We conclude the survey with potential future directions.

CONTENTS

I. Introduction

II. Machine Learning
   A. Artificial Neural Networks
   B. Relation with Artificial Intelligence

III. Quantum Foundations
   A. Entanglement
   B. Bell Nonlocality
   C. Contextuality
   D. Quantum Steering

IV. Neural Network as Oracle for Bell Nonlocality
   A. Machine Learning Nonlocal Correlations
   B. Oracle for Networks

V. Machine Learning for Optimizing Settings for Bell Nonlocality
   A. Detecting Bell Nonlocality with Many-Body States
   B. Playing Bell Nonlocal Games

VI. Machine Learning-Assisted State Classification
   A. Classification with Bell Inequalities
   B. Classification by Representing Quantum States with Restricted Boltzmann Machines
   C. Classification with Tomographic Data

VII. Neural Networks as "Hidden" Variable Models for Quantum Systems

VIII. A Few More Applications

IX. Conclusion and Future Work

Acknowledgments

Data Availability

References

I. INTRODUCTION

The rise of machine learning in recent times has remarkably transformed science and society. The goal of machine learning is to get computers to act without being explicitly programmed [1, 2]. Some of the typical applications of machine learning are self-driving cars, efficient web search, improved speech recognition, enhanced understanding of the human genome and online fraud detection. This surge of interest has spread to various areas of science and engineering, in part due to the hope that artificial intelligence may supplement human intelligence to understand some of the deep problems in science.

Techniques from machine learning have been used for automated theorem proving, drug discovery and predicting the 3-D structure of proteins from their genetic sequences [3–5]. In physics, techniques from machine learning have been applied to many important problems [6–39], including black hole detection [25], topological codes [40], phase transitions [16], glassy dynamics [27], gravitational lenses [23], Monte Carlo simulation [28, 29], wave analysis [24], quantum state preparation [41, 42], the anti-de Sitter/conformal field theory (AdS/CFT) correspondence [43] and characterizing the landscape of string theories [44]. Conversely, methods from physics have also transformed the field of machine learning, on both the foundational and the practical front [45, 46]. For a comprehensive review of machine learning for physics, refer to Carleo et al. [47] and references therein.


For a thorough review of machine learning and artificial intelligence in the quantum domain, refer to Dunjko et al. [48] or Benedetti et al. [49].

Philosophies in science can, in general, be delineated from the study of the science itself. Yet, in physics, the study of quantum foundations has essentially sprouted an enormously successful area called quantum information science. Quantum foundations concerns the mathematical as well as conceptual understanding of quantum theory. Ironically, this area has provided the seeds for future computation and communication without physicists having reached, as yet, a consensus on what quantum theory tells us about the nature of reality [50].

In recent years, techniques from machine learning have been used to solve some of the analytically and numerically complex problems in quantum foundations. In particular, methods from reinforcement learning and supervised learning have been used for determining the maximal quantum violation of various Bell inequalities, classifying experimental statistics into local and nonlocal sets, training AI agents to play Bell nonlocal games, using hidden neurons as hidden variables for completions of quantum theory, and machine learning-assisted state classification [51–54].

Machine learning also attempts to mimic human reasoning, raising the possibility, however remote, of machine-assisted scientific discovery [55]. Can machine learning do the same with the foundations of quantum theory? At a deeper level, machine learning or artificial intelligence, presumably with some form of quantum computation, may somehow capture the essence of Bell nonlocality and contextuality. Of course, such speculation belies the fact that human abstraction and reasoning could be far more complicated than the capabilities of machines.

In this brief survey, we compile some of the representative works done so far at the interface of quantum foundations and machine learning. The survey includes eight sections excluding the introduction (Section I). In Section II, we discuss the basics of machine learning. Section III contains a brief introduction to quantum foundations. In Sections IV to VII, we discuss various applications of machine learning in quantum foundations. There is a rich catalogue of works which we could not incorporate in detail in Sections IV to VII but which we find worth mentioning; we include such works briefly in Section VIII. Finally, we conclude in Section IX with open questions and some speculations.

II. MACHINE LEARNING

Machine learning is a branch of artificial intelligence which involves learning from data [1, 2]. The purpose of machine learning is to facilitate a computer to achieve a specific task without explicit instruction by an external party. According to Mitchell (1997) [56], "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if the performance at tasks in T, as measured by P, improves with E." Note that the meaning of the word "task" does not involve the process of learning.

For instance, if we are programming a robot to play Go, playing Go is the task. Some examples of machine learning tasks are the following.

• Classification: In classification tasks, the computer program is trained to learn an appropriate function h : R^m → {1, 2, ..., t}. Given an input, the learned program determines which of the t categories the input belongs to via h. Deciding whether a given picture depicts a cat or a dog is a canonical example of a classification task.

• Regression: In regression tasks, the computer program is trained to predict a numerical value for a given input. The aim is to learn an appropriate function h : R^m → R. A typical example of regression is predicting the price of a house given its size, location and other relevant features.

• Anomaly detection: In anomaly detection (also known as outlier detection) tasks, the goal is to identify rare items, events or objects that are significantly different from the majority of the data. A representative example of anomaly detection is credit card fraud detection, where the credit card company can detect misuse of a customer's card by modelling his/her purchasing habits.

• Denoising: Given a noisy example x̃ ∈ R^n, the goal of denoising is to predict the conditional probability distribution P(x | x̃) over noise-free data x ∈ R^n.

The measure of success P of a machine learning algorithm depends on the task T. For example, in the case of classification, P can be measured via the accuracy of the model, i.e., the fraction of examples for which the model produces the correct output. An equivalent description is in terms of the error rate, i.e., the fraction of examples for which the model produces the incorrect output. The goal of machine learning algorithms is to work well on previously unseen data. To estimate the model performance P, it is customary to evaluate P on a separate dataset, called the test set, which the machine has not seen during training. The data employed for training is known as the training set.
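As a minimal illustration of these notions, the following Python sketch trains a classifier on labeled data and evaluates its accuracy on a held-out test set. The nearest-centroid classifier and the Gaussian toy data are our own illustrative assumptions, not methods from the surveyed works; only numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(1)

    # Labeled training data: two Gaussian blobs in R^2, labels 0 and 1.
    X0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(50, 2))
    X1 = rng.normal(loc=[+2.0, 0.0], scale=1.0, size=(50, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * 50 + [1] * 50)

    # "Training": estimate one centroid per class from the labeled examples.
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

    def predict(x):
        """Learned function h: R^2 -> {0, 1}; assign the nearest centroid's label."""
        return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

    # Performance measure P: accuracy on fresh samples the machine never saw.
    X_test = np.vstack([rng.normal([-2, 0], 1, (20, 2)), rng.normal([2, 0], 1, (20, 2))])
    y_test = np.array([0] * 20 + [1] * 20)
    acc = np.mean([predict(x) == t for x, t in zip(X_test, y_test)])
    print(f"test accuracy: {acc:.2f}")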

Depending on the kind of experience E that the machine is permitted to have during the learning process, machine learning algorithms can be categorized into supervised learning, unsupervised learning and reinforcement learning.

1. Supervised learning: The goal is to learn a function y = f(x) that returns the label y given the corresponding unlabeled data x. A prominent example would be images of cats and dogs, with the goal to recognize the correct animal. The machine is trained with labeled example data, such that it learns to correctly identify datasets it has not seen before. Given a finite set of training samples from the joint distribution P(Y, X), the task of supervised learning is to infer the probability of a specific label y given example data x, i.e., P(Y = y | X = x). The function that assigns labels can be inferred from this conditional probability distribution.


2. Unsupervised learning: For this type of machine learning, data x is given without any label. The goal is to recognize possible underlying structures in the data. The task of unsupervised machine learning algorithms is to learn the probability distribution P(x), or some interesting properties of the distribution, when given access to several examples x. It is worth stating that the distinction between supervised and unsupervised learning can be blurry. For example, given a vector x ∈ R^m, the joint probability distribution can be factorized (using the chain rule of probability) as

P(x) = \prod_{i=1}^{m} P(x_i \,|\, x_1, \ldots, x_{i-1}). \qquad (1)

The above factorization (1) enables us to transform an unsupervised learning task of learning P(x) into m supervised learning tasks. Furthermore, given the supervised learning problem of learning the conditional distribution P(y|x), one can convert it into the unsupervised learning problem of learning the joint distribution P(x, y) and then infer P(y|x) via

P(y|x) = \frac{P(x,y)}{\sum_{y_1} P(x,y_1)}. \qquad (2)

This last argument suggests that supervised and unsupervised learning are not entirely distinct. Yet, the distinction between supervised and unsupervised learning is sometimes useful for the classification of algorithms.

3. Reinforcement learning: Here, neither data nor labels are available. The machine has to generate the data itself and improve this data-generation process through the optimization of a given reward function. This method is somewhat akin to a human child playing games: the child interacts with the environment and initially performs random actions. By external reinforcement (e.g., praise or scolding by parents), the child learns to improve itself. Reinforcement learning has shown immense success recently. Through reinforcement learning, machines have mastered games that were initially thought to be too complicated for computers. This was demonstrated, for example, by DeepMind's AlphaGo and its successors, which defeated the best human players in the board game Go [57].

One of the central challenges in machine learning is to devise algorithms that perform well on previously unseen data. The ability of a machine to achieve high performance P on previously unseen data is called generalization. The input data to the machine is a set of variables, called features. A specific instance of data is called a feature vector. The error measure on the feature vectors used for training is called the training error. In contrast, the error measure on the test dataset, which the machine has not seen during training, is called the generalization error or test error. Given an estimate of the training error, how can we estimate the test error? The field of statistical learning theory aptly answers this question.

The training and test data are generated according to a probability distribution over some dataset, called the data-generating distribution. It is conventional to make the independent and identically distributed (IID) assumption, i.e., each example of the dataset is independent of the others, and the training and test sets are identically distributed. Under such assumptions, the performance of a machine learning algorithm depends on its ability to reduce the training error as well as the gap between training and test error. If a machine learning algorithm fails to achieve a sufficiently low training error, the phenomenon is called under-fitting. On the other hand, if the training error is low but the test error is large, the phenomenon is called over-fitting. The capacity of a machine learning model to fit a wide variety of functions is called model capacity. A machine learning algorithm with low model capacity is likely to underfit, whereas too high a model capacity often leads to over-fitting the data. One way to alter the model capacity of a machine learning algorithm is to constrain the class of functions that the algorithm is allowed to choose from. For example, to fit a curve to a dataset, one often chooses a set of polynomials as fitting functions (see Fig. 1). If the degree of the polynomials is too low, the fit may not be able to reproduce the data sufficiently (under-fitting, orange curve). However, if the degree of the polynomials is too high, the fit will reproduce the training dataset too well, such that noise and the finite sample size are captured in the model (over-fitting, red curve).

FIG. 1. Fitting data sampled from a degree-3 polynomial with small noise. A degree-2 polynomial (orange dashed line) under-fits, as it has too low a model capacity to capture the data. The degree-3 model has the right model capacity (blue solid line). The degree-30 polynomial (red dash-dotted line) over-fits the data: it captures the sampled data (low training error) but fails to capture the underlying model (high test error).
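The behavior in Fig. 1 can be reproduced in a few lines. The sketch below (numpy only; the specific degree-3 polynomial and noise level are illustrative assumptions, not the exact data of the figure) compares training and test error for fits of degree 2, 3 and 30. For degree 30, numpy may emit a RankWarning, itself a symptom of excess model capacity.

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy samples from a degree-3 polynomial (toy stand-in for Fig. 1).
    x = np.linspace(-2, 6, 30)
    truth = lambda t: 0.05 * t**3 - 0.3 * t**2 + 0.5 * t
    y = truth(x) + rng.normal(0, 0.3, x.size)

    # Held-out test points from the same underlying model.
    x_test = np.linspace(-2, 6, 100)
    y_test = truth(x_test)

    for degree in (2, 3, 30):
        coeffs = np.polyfit(x, y, degree)   # fit a model of the given capacity
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

The degree-2 fit shows high training error (under-fitting), while the degree-30 fit drives the training error down at the cost of a larger test error (over-fitting).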

A. Artificial Neural Networks

Most of the recent advances in machine learning were facilitated by the use of artificial neurons. The basic structure is a single artificial neuron (AN), which is a real-valued function of the form


AN(x) = \phi\left(\sum_i w_i x_i\right), \quad \text{where } x = (x_i)_i \in \mathbb{R}^k,\ w = (w_i)_i \in \mathbb{R}^k \text{ and } \phi : \mathbb{R} \to \mathbb{R}. \qquad (3)

In equation (3), the function φ is known as the activation function. Some well-known activation functions are

1. Threshold function: φ(a) = 1 if a > 0 and 0 otherwise,

2. Sigmoid function: φ(a) = σ(a) = 1/(1 + exp(−a)), and

3. Rectified linear unit (ReLU) function: φ(a) = max(0, a).

The vector w in equation (3) is known as the weight vector. Many such artificial neurons can be combined via communication links to perform complex computations. This is achieved by feeding the outputs of neurons (weighted by the weights w_i) as inputs to another neuron, where the activation function is applied again. Such a graph structure G = (V, E), with the set of nodes V as artificial neurons and the edges in E as connections, is known as an artificial neural network, or neural network in short. The first layer, where the data is fed in, is called the input layer. The layers of neurons in between are the hidden layers, which are the defining feature of deep learning. The last layer is called the output layer. A neural network with more than one hidden layer is called a deep neural network. Machine learning involving deep neural networks is called deep learning. A feedforward neural network is a directed acyclic graph, which means that the output of the neurons is fed only in the forward direction (see Fig. 2). Apart from the feedforward neural network, some popular neural network architectures are convolutional neural networks (CNN), recurrent neural networks (RNN), generative adversarial networks (GAN), Boltzmann machines and restricted Boltzmann machines (RBM). We provide a brief summary of RBMs here. For a detailed understanding of various other neural networks, refer to [2].
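A forward pass through such a network is a few lines of linear algebra. The sketch below (numpy; the layer sizes and the untrained random weights are arbitrary illustrative choices) chains Eq. (3) through two hidden layers, mirroring the architecture of Fig. 2.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def relu(a):
        return np.maximum(0.0, a)

    rng = np.random.default_rng(2)

    # Random weights and biases for a 4 -> 5 -> 5 -> 1 feedforward network.
    W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
    W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
    W3, b3 = rng.normal(size=(1, 5)), np.zeros(1)

    def forward(x):
        """Propagate the input layer by layer; each neuron applies Eq. (3)."""
        h1 = relu(W1 @ x + b1)        # first hidden layer
        h2 = relu(W2 @ h1 + b2)       # second hidden layer
        return sigmoid(W3 @ h2 + b3)  # output layer

    x = rng.normal(size=4)  # an input feature vector
    print(forward(x))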

An RBM is a bipartite graph with two kinds of binary-valued neuron units, namely visible (v = {v_1, v_2, ...}) and hidden (h = {h_1, h_2, ...}) units (see Fig. 3). The weight matrix W = (w_{ij}) encodes the weight corresponding to the connection between visible unit v_i and hidden unit h_j [58]. Let the bias weight (offset weight) for the visible unit v_i be a_i and for the hidden unit h_j be b_j.

For a given configuration of the visible and hidden neurons (v, h), one can define an energy function E(v, h), inspired by statistical models for spin systems, as

E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j. \qquad (4)

The probability of a configuration (v, h) is given by the Boltzmann distribution,

P(v,h) = \frac{\exp(-E(v,h))}{Z}, \qquad (5)

FIG. 2. A feed-forward neural network is a network of artificial neurons chained together in sequence. Each dot corresponds to an artificial neuron, and the lines indicate the input weights that feed into that neuron. Data fed into the input layer is propagated one layer at a time via the hidden layers (here two hidden layers as an example) to the final output layer.

where Z = ∑_{v,h} exp(−E(v,h)) is a normalization factor, commonly known as the partition function. As there are no intra-layer connections, one can sample from this distribution easily. Given the visible units v, the probability of a specific hidden unit taking the value h_j = 1 is

p(h_j = 1 \,|\, v) = \sigma\left(b_j + \sum_i w_{ij} v_i\right), \qquad (6)

where σ(a) is the sigmoid activation function introduced earlier. A similar relation holds in the reverse direction, i.e., the probability of a visible unit given the hidden units.

FIG. 3. A restricted Boltzmann machine (RBM) is composed of a hidden and a visible layer. Each node has a bias (a_i for the visible, b_j for the hidden layer) and is connected with weight W_{ij} to all the nodes in the opposite layer. There are no intra-layer connections.

With these concepts, complicated probability distributions P(v) over some variable v can be encoded and trained within the RBM. Given a set of training vectors v ∈ V, the goal is to find the weights W' of the RBM that fit the training set best:

\arg\max_{W'} \prod_{v \in V} P(v). \qquad (7)
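The conditional independence expressed in Eq. (6) is what makes RBMs convenient to sample from. Below is a minimal numpy sketch of the energy function and one step of block Gibbs sampling; the layer sizes and the untrained random weights are illustrative assumptions, and the actual training of Eq. (7) (e.g., by contrastive divergence) is omitted.

    import numpy as np

    rng = np.random.default_rng(3)

    n_v, n_h = 4, 3
    W = rng.normal(scale=0.1, size=(n_v, n_h))  # weights w_ij
    a = np.zeros(n_v)                           # visible biases a_i
    b = np.zeros(n_h)                           # hidden biases b_j

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def energy(v, h):
        """E(v, h) of Eq. (4)."""
        return -a @ v - b @ h - v @ W @ h

    def sample_h_given_v(v):
        """Eq. (6): the hidden units are conditionally independent given v."""
        p = sigmoid(b + v @ W)
        return (rng.random(n_h) < p).astype(float)

    def sample_v_given_h(h):
        """Reverse direction, by the symmetry of the bipartite graph."""
        p = sigmoid(a + W @ h)
        return (rng.random(n_v) < p).astype(float)

    # One step of block Gibbs sampling, possible because there are
    # no intra-layer connections.
    v = rng.integers(0, 2, n_v).astype(float)
    h = sample_h_given_v(v)
    v_new = sample_v_given_h(h)
    print("E(v,h) =", energy(v, h))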

RBMs have been shown to be a good Ansatz to represent many-body wavefunctions that are difficult to handle with other methods


due to the exponential scaling of the dimension of the Hilbert space. This method has been successfully applied to problems in quantum many-body physics and quantum foundations [54, 59].

In contexts beyond physics, machine learning with deep neural networks has accomplished many significant milestones, such as mastering various games, image recognition, self-driving cars, and so forth. Its impact on the physical sciences is only beginning [47, 54].

B. Relation with Artificial Intelligence

The term "artificial intelligence" was coined at the famous 1956 Dartmouth conference [60]. Though the term was invented in 1956, the operational idea can be traced back to Alan Turing's influential "Computing Machinery and Intelligence" paper of 1950, where Turing asks if a machine can think [61]. The idea of designing machines that can think dates back to ancient Greece: mythical characters such as Pygmalion, Hephaestus and Daedalus can be interpreted as legendary inventors, and Galatea, Talos and Pandora can be thought of as examples of artificial life [2]. The field of artificial intelligence is difficult to define, as can be seen from four different and popular candidate definitions [62]. The definitions start with "AI is the discipline that aims at building ..."

1. (Reasoning-based and human-based): agents that canreason like humans.

2. (Reasoning-based and ideal rationality): agents thatthink rationally.

3. (Behavior-based and human-based): agents that behave like humans.

4. (Behavior-based and ideal rationality): agents that behave rationally.

Apart from its foundational impact in attempting to understand "intelligence," AI has had practical impact in areas such as the automation of routine labour and automated medical diagnosis, to name a few among many. The real challenge of AI is to execute tasks which are easy for people to perform but hard to express formally. One approach to this problem is to allow machines to learn from experience, i.e., via machine learning. From a foundational point of view, the study of machine learning is vital, as it helps us understand the meaning of "intelligence." It is worth mentioning that deep learning is a subset of machine learning, which can in turn be thought of as a subset of AI (see Fig. 4).

III. QUANTUM FOUNDATIONS

The mathematical edifice of quantum theory has intrigued and puzzled physicists, as well as philosophers, for many years. Quantum foundations seeks to understand and develop the mathematical as well as conceptual understanding of quantum theory.

FIG. 4. Deep learning (DL) is a sub-discipline of machine learning (ML). ML can be considered a sub-discipline of artificial intelligence (AI).

The study concerns the search for non-classical effects such as Bell nonlocality and contextuality, and for different interpretations of quantum theory. It also involves the investigation of physical principles that can put the theory into an axiomatic framework, together with an exploration of possible extensions of quantum theory. In this survey, we will focus on the non-classical features of entanglement, Bell nonlocality, contextuality and quantum steering in some detail.

An interpretation of quantum theory can be viewed as a map from the elements of the mathematical structure of quantum theory to elements of reality. Most interpretations of quantum theory seek to explain the famous measurement problem. Among the prominent interpretations are the Copenhagen interpretation, quantum Bayesianism [63], the many-worlds interpretation [64, 65] and the consistent histories interpretation [66]. The axiomatic reconstruction of quantum theory is generally categorized into two approaches: the generalized probabilistic theory (GPT) approach [67] and the black-box approach [68]. The physical principles underlying the frameworks for axiomatizing quantum theory are non-trivial communication complexity [69, 70], information causality [71], macroscopic locality [72], negating the advantage for nonlocal computation [73], consistent exclusivity [74] and local orthogonality [75]. Extensions of quantum theory include collapse models [76], quantum measure theory [77] and acausal quantum processes [78].

A. Entanglement

Quantum interaction inevitably leads to quantum entanglement. The individual states of two classical systems after an interaction are independent of each other. Yet, this is not the case for two quantum systems [79, 80]. Quantum states can be pure or mixed. A bipartite pure quantum state |ψ⟩ ∈ H_A ⊗ H_B is entangled if it cannot be written as a product state, i.e., |ψ⟩ = |ψ_A⟩ ⊗ |ψ_B⟩ for some |ψ_A⟩ ∈ H_A and |ψ_B⟩ ∈ H_B. A mixed bipartite state is expressed in terms of a density matrix ρ; it is entangled if it cannot be expressed in the form

\rho = \sum_i p_i\, |\psi_i^A\rangle\langle\psi_i^A| \otimes |\psi_i^B\rangle\langle\psi_i^B|.

More generally, an n-partite state ρ_sep is called separable if it can be represented as a convex combination of product states, i.e.,

\rho_{\mathrm{sep}} = \sum_i p_i\, \rho_i^1 \otimes \rho_i^2 \otimes \cdots \otimes \rho_i^n, \qquad (8)

where 0 ≤ p_i ≤ 1 and ∑_i p_i = 1. A quantum state is called entangled if it is not separable.

Computationally, it is not easy to check whether a mixed state, especially in higher dimensions and for more parties, is separable or entangled. Numerous measures of quantum entanglement for both pure and mixed states have been proposed [80]: for bipartite systems where the dimension of H_i (i = A, B) is 2, a good measure is the concurrence [81]. Other measures for quantifying entanglement are the entanglement of formation, robustness of entanglement, Schmidt rank, squashed entanglement and so forth [80]. It turns out that for any entangled state ρ, there exists a Hermitian matrix W (an entanglement witness) such that Tr(ρW) < 0, while Tr(ρ_sep W) ≥ 0 for all separable states ρ_sep [82–85].
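For two qubits, the PPT criterion discussed later in Sec. VI gives a quick numerical entanglement check. The sketch below (numpy; the Werner-state family is an illustrative choice, not a construction from Refs. [82–85]) flags entanglement via a negative eigenvalue of the partial transpose.

    import numpy as np

    def partial_transpose_B(rho):
        """Partial transpose on the second qubit of a 4x4 density matrix."""
        r = rho.reshape(2, 2, 2, 2)             # indices (iA, iB, jA, jB)
        return r.transpose(0, 3, 2, 1).reshape(4, 4)

    # Werner state: p |psi-><psi-| + (1 - p) I/4, entangled iff p > 1/3.
    psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)
    for p in (0.2, 0.5):
        rho = p * np.outer(psi_minus, psi_minus) + (1 - p) * np.eye(4) / 4
        min_eig = np.linalg.eigvalsh(partial_transpose_B(rho)).min()
        verdict = "entangled" if min_eig < 0 else "PPT (separable for two qubits)"
        print(f"p={p}: min eigenvalue of rho^T_B = {min_eig:+.3f} -> {verdict}")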

Quantum entanglement was mooted a long time ago by Erwin Schrödinger, but it took another thirty years or so for John Bell to show that quantum theory imposes strong constraints on statistical correlations in experiments. Yet, correlations are not tantamount to causation, and one wonders if machine learning could do better [86]. In the sixties and seventies, this Gedankenexperiment was given further impetus with some ingenious experimental designs [87–90]. The advent of quantum information in the nineties gave a further push: quantum entanglement became a useful resource for many quantum applications, ranging from teleportation [91] and communication [92] to purification [93], dense coding [94] and computation [95, 96]. Interestingly, quantum entanglement is a fascinating area that has emerged almost serendipitously from the foundations of quantum mechanics into real practical applications in the laboratories.

B. Bell Nonlocality

According to John Bell [97], any theory based on the collective premises of locality and realism must be at variance with experiments conducted by spatially separated parties sharing entanglement, if the underlying measurement events are space-like separated. This phenomenon is known as Bell nonlocality [98]. Apart from its significance in understanding the foundations of quantum theory, Bell nonlocality is a valuable resource for many emerging device-independent quantum technologies like quantum key distribution (QKD), distributed computing, randomness certification and self-testing [92, 99, 100]. Experiments which can potentially manifest Bell nonlocality are known as Bell experiments. A Bell experiment involves N spatially separated parties A_1, A_2, ..., A_N. Each party receives an input x_1, x_2, ..., x_N ∈ X and gives an output a_1, a_2, ..., a_N ∈ A. For the various input-output combinations one obtains statistics of the form

P = \{P(a_1, a_2, \ldots, a_N \,|\, x_1, x_2, \ldots, x_N)\}_{x_1,\ldots,x_N \in X;\ a_1,\ldots,a_N \in A}. \qquad (9)

We will refer to P as a behavior. A Bell experiment involving N space-like separated parties, each party having access to m inputs and each input corresponding to k outputs, is referred to as an (N, m, k) scenario. The famous Clauser-Horne-Shimony-Holt (CHSH) experiment is a (2,2,2) scenario [101], and it is the simplest scenario in which Bell nonlocality can be demonstrated (see Fig. 5).

FIG. 5. The simplest scenario in which Bell nonlocality can be demonstrated is the famous Clauser-Horne-Shimony-Holt (CHSH) experiment. Two parties are involved, and each party can perform two dichotomic measurements. Thus, it is a (2,2,2) scenario. The experimental statistics correspond to probabilities of the form P(a_1, a_2 | x_1, x_2).

A behavior P admits a local hidden variable description if and only if

P(a_1, a_2, \ldots, a_N \,|\, x_1, x_2, \ldots, x_N) = \sum_\lambda P(\lambda)\, P(a_1 | x_1, \lambda) \cdots P(a_N | x_N, \lambda). \qquad (10)

Such a behavior is known as a local behavior. The set of local behaviors L forms a convex polytope, and the facets of this polytope are Bell inequalities (see Fig. 6). In quantum theory, the Born rule governs the probabilities according to

P(a_1, a_2, \ldots, a_N \,|\, x_1, x_2, \ldots, x_N) = \mathrm{Tr}\left[\left(M_{a_1|x_1} \otimes \cdots \otimes M_{a_N|x_N}\right)\rho\right], \qquad (11)

where the \{M_{a_i|x_i}\}_i are positive operator-valued measures (POVMs) and ρ is a shared density matrix. If a behavior satisfying Eq. (11) falls outside L, it violates at least one Bell inequality, and such a behavior is said to manifest Bell nonlocality. The condition that the parties do not communicate during the course of the Bell experiment is known as the no-signalling condition. Intuitively, it means that the choice of input of one party cannot be used for signalling among the parties. Mathematically, it means

\sum_{a_j} P(a_i, a_j \,|\, x_i, x_j) = P(a_i | x_i) \quad \forall\, i, j \in \{1, \ldots, N\},\ i \neq j. \qquad (12)


FIG. 6. The set of local behaviors L forms a convex polytope (a bounded polyhedron). The set of quantum behaviors Q is a convex set. The set of behaviors compatible with the no-signalling condition, N S, is again a convex polytope, with L ⊆ Q ⊆ N S. The hyperplanes defining the boundary of the local set L are Bell inequalities. The violation of a Bell inequality means that a local realistic description of nature is impossible.

The set of behaviors which satisfy the no-signalling condition are known as no-signalling behaviors; we denote this set by N S. The no-signalling set also forms a polytope, and furthermore L ⊆ Q ⊆ N S (see Fig. 6). The no-signalling behaviors which do not lie in Q are known as post-quantum behaviors. In the (2,2,2) scenario, i.e., the CHSH scenario, there is a unique Bell inequality up to relabelling of inputs and outputs, namely the CHSH inequality. It is given by

E_{0,0} + E_{0,1} + E_{1,0} - E_{1,1} \leq 2, \qquad (13)

where E_{x_1,x_2} = P(a_1 = a_2 \,|\, x_1, x_2) - P(a_1 \neq a_2 \,|\, x_1, x_2). All local hidden variable theories satisfy the CHSH inequality. In quantum theory, suitably chosen measurement settings and states lead to a violation of the CHSH inequality, which can therefore be used to witness the Bell nonlocal nature of quantum theory. Quantum behaviors achieve values up to 2√2, known as the Tsirelson bound. The upper bound for no-signalling behaviors (the no-signalling bound) on the CHSH inequality is 4.
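The Tsirelson bound can be verified directly for the singlet state. The sketch below (numpy; the measurement angles are the standard optimal CHSH settings, with observables restricted to the x-z plane of the Bloch sphere) evaluates the correlators E_{x_1,x_2} via the Born rule and prints a CHSH value of 2√2.

    import numpy as np

    # Singlet state |psi-> = (|01> - |10>)/sqrt(2).
    psi = np.array([0, 1, -1, 0]) / np.sqrt(2)
    rho = np.outer(psi, psi)

    Z = np.array([[1, 0], [0, -1]])
    X = np.array([[0, 1], [1, 0]])

    def obs(theta):
        """Dichotomic observable cos(theta) Z + sin(theta) X."""
        return np.cos(theta) * Z + np.sin(theta) * X

    # Standard optimal settings for the singlet.
    A = [obs(0.0), obs(np.pi / 2)]
    B = [obs(5 * np.pi / 4), obs(3 * np.pi / 4)]

    def E(Ax, By):
        """Correlator E_{x1,x2} = <A (x) B> via the Born rule."""
        return np.trace(rho @ np.kron(Ax, By)).real

    S = E(A[0], B[0]) + E(A[0], B[1]) + E(A[1], B[0]) - E(A[1], B[1])
    print(S, "=", 2 * np.sqrt(2))  # Tsirelson bound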

C. Contextuality

An intuitive feature of classical models is non-contextuality, which means that any measurement has a value independent of the other compatible measurements it is performed together with. A set of compatible measurements is called a context. It was shown by Simon Kochen and Ernst Specker (as well as John Bell) [102] that non-contextuality conflicts with quantum theory.

FIG. 7. The Mermin-Peres square in (A) provides a proof of the Kochen-Specker theorem. Each line in the 3×3 grid contains mutually commuting operators. Since the square of the operator at each node is the identity matrix, its eigenvalues are ±1. The product of the operators along each line is the identity, except for the bold line, on which the product is minus the identity. There is no way to assign a definite value ±1 to each operator that reproduces these product rules, since each operator appears in exactly two lines (contexts). (B) shows Mermin's pentagram, in which each line contains four mutually commuting three-qubit operators. Along each line the product of the operators is the identity, except for the bold line, leading to a contradiction: classically, it is impossible to assign the number ±1 to each node such that the product of the numbers along every straight line is 1, except for the bold line, where it is −1.

The contextual nature of quantum theory can be established via simple constructive proofs. In Fig. 7(A), one considers an array of operators on a two-qubit system in any quantum state. There are nine operators, and each of them has eigenvalues ±1. The three operators in each row or column commute, and it is easy to check that each operator is the product of the other two operators in its row or column, with a single exception: the third operator in the third column equals minus the product of the other two. Suppose there exist pre-assigned values (−1 or +1) for the outcomes of the nine operators; then we can replace the nine operators by their pre-assigned values. However, there is no consistent way to assign such values to the nine operators so that the product of the numbers in every row or column yields 1, except along the bold line, where the product yields −1. Notice that each operator (node) appears in exactly two lines, or contexts. Kochen and Specker provided the first proof of quantum contextuality with a complicated construction involving 117 operators in a 3-dimensional space [102]. Another example is the pentagram [103] in Fig. 7.
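The product rules of the Mermin-Peres square are easy to verify numerically. A sketch (numpy; using one standard choice of the nine two-qubit observables):

    import numpy as np

    I2 = np.eye(2)
    X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]])

    # One standard choice of the nine observables; the three operators in
    # each row and each column mutually commute.
    square = [
        [np.kron(I2, Z), np.kron(Z, I2), np.kron(Z, Z)],
        [np.kron(X, I2), np.kron(I2, X), np.kron(X, X)],
        [np.kron(X, Z), np.kron(Z, X), np.kron(Y, Y)],
    ]

    def sign_of_product(ops):
        prod = ops[0] @ ops[1] @ ops[2]
        return "+I" if np.allclose(prod, np.eye(4)) else "-I"

    for i in range(3):
        print(f"row {i}:", sign_of_product(square[i]))   # all rows give +I
    for j in range(3):
        # columns 0 and 1 give +I; column 2 gives -I (the "bold" context)
        print(f"col {j}:", sign_of_product([square[i][j] for i in range(3)]))

Since each observable appears in exactly one row and one column, any pre-assignment of ±1 values would force the product of all nine values to be both +1 (from the rows) and −1 (from the columns), which is the contradiction.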

Bell nonlocality is regarded as a particular case of contextuality, where the space-like separation of the parties involved creates the "context" [104, 105]. The study of contextuality has led not only to insights into the foundations of quantum mechanics, but also to practical implications [106–120]. There are several frameworks for contextuality, including the sheaf-theoretic framework [121], the graph and hypergraph frameworks [104, 122], the contextuality-by-default framework [123–125] and the operational framework [126]. A scenario exhibits contextuality if it does not admit a non-contextual hidden variable (NCHV) model [127].


D. Quantum Steering

Correlations produced by steering lie between the Bell nonlocal correlations and those generated from entangled states [128, 129]. A state that manifests Bell nonlocality for some suitably chosen measurement settings also exhibits steering [130]. Furthermore, a state which exhibits steering must be entangled. A state demonstrates steering if it does not admit a "local hidden state (LHS)" model [129]. We discuss this formally here.

Alice and Bob share some unknown quantum state ρ_AB. Alice can perform a set of POVM measurements {M^x_a}_a. The probability of her getting outcome a after choosing measurement x is given by

P(a|x) = \mathrm{Tr}\left[(M^x_a \otimes I)\rho_{AB}\right] = \mathrm{Tr}\left\{\mathrm{Tr}_A\left[(M^x_a \otimes I)\rho_{AB}\right]\right\} = \mathrm{Tr}\left[\rho^B_{a|x}\right], \qquad (14)

where \rho^B_{a|x} = \mathrm{Tr}_A\left[(M^x_a \otimes I)\rho_{AB}\right] is Bob's residual state up to normalization. A set of operators \{\rho^B_{a|x}\}_{a,x} acting on Bob's space is called an assemblage if

\sum_a \rho^B_{a|x} = \sum_a \rho^B_{a|x'} \quad \forall\, x \neq x' \qquad (15)

and

\sum_a \mathrm{Tr}\left[\rho^B_{a|x}\right] = 1 \quad \forall\, x. \qquad (16)

Condition (15) is the analogue of the no-signalling condition. An assemblage \{\rho^B_{a|x}\}_{a,x} is said to admit an LHS model if there exist a hidden variable λ and quantum states ρ_λ acting on Bob's space such that

\rho^B_{a|x} = \sum_\lambda P(\lambda)\, P(a|x,\lambda)\, \rho_\lambda. \qquad (17)

A bipartite state ρ_AB is said to be steerable from Alice to Bob if there exist measurements for Alice such that the corresponding assemblage does not satisfy equation (17). Determining whether an assemblage admits an LHS model is a semidefinite program (SDP). The concept of steering is asymmetric by definition: even if Alice can steer Bob's state, Bob may not be able to steer Alice's state.
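Equation (14) is straightforward to evaluate numerically. The sketch below (numpy; the shared state |φ+⟩ and Alice's settings Z and X are illustrative choices, and the LHS membership SDP itself is not implemented) builds the assemblage for two of Alice's measurements and verifies the no-signalling condition (15): both sums yield Bob's reduced state I/2.

    import numpy as np

    def ptrace_A(op):
        """Partial trace over Alice's qubit of a 4x4 two-qubit operator."""
        r = op.reshape(2, 2, 2, 2)       # indices (iA, iB, jA, jB)
        return np.einsum('aiaj->ij', r)

    # Shared state |phi+> = (|00> + |11>)/sqrt(2); settings Z and X for Alice.
    phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
    rho_AB = np.outer(phi, phi)
    Z = np.array([[1, 0], [0, -1]])
    X = np.array([[0, 1], [1, 0]])

    def projectors(obs):
        """Rank-1 projectors onto the eigenspaces of a qubit observable."""
        _, vecs = np.linalg.eigh(obs)
        return [np.outer(vecs[:, k], vecs[:, k].conj()) for k in range(2)]

    # Assemblage rho^B_{a|x} = Tr_A[(M^x_a (x) I) rho_AB], Eq. (14).
    assemblage = [[ptrace_A(np.kron(M, np.eye(2)) @ rho_AB)
                   for M in projectors(setting)] for setting in (Z, X)]

    # No-signalling condition, Eq. (15): sum_a rho^B_{a|x} = I/2 for both x.
    for x, sigma in enumerate(assemblage):
        print(f"x={x}:\n{np.round(sum(sigma).real, 3)}")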

IV. NEURAL NETWORK AS ORACLE FOR BELL NONLOCALITY

The characterization of the local set for convex scenarios via Bell inequalities becomes intractable as the complexity of the underlying scenario grows (in terms of the number of parties, measurement settings and outcomes). For networks where several independent sources are shared among many parties, the situation gets even worse: the local set is then remarkably non-convex, and hence a proper analytical or numerical characterization is, in general, lacking. The application of machine learning techniques to tackle these issues was studied by Canabarro et al. [52] and Krivachy et al. [51]. In the work by Canabarro et al., the detection and characterization of nonlocality is done through an ensemble of multilayer perceptrons blended with genetic algorithms (see Sec. IV A).

A. Machine Learning Nonlocal Correlations

Given a behavior, deciding whether it is classical or non-classical is an extremely challenging task, since the underlying scenario grows in complexity very quickly. Canabarro et al. [52] use supervised machine learning with an ensemble of neural networks to tackle an approximate version of the problem (i.e., with a small margin of error) via regression. The authors ask: "How far is a given correlation from the local set?" The input feature vector to the neural network is a random correlation vector. For a given behavior, the output (label) is the distance of the feature vector from the classical, i.e., local, set. The nonlocality quantifier of a behavior q is the minimum trace distance, denoted by NL(q) [131]. For the two-party scenario, the nonlocality quantifier is given by

NL(q) \equiv \frac{1}{2|x||y|} \min_{p \in \mathcal{L}} \sum_{a,b,x,y} |q - p|, \qquad (18)

where L is the local set and |x| = |y| = m is the number of inputs for each party. The training points are generated by sampling the set of no-signalling behaviors randomly and then calculating their distance from the local set via equation (18). Given a behavior q, the distance predicted by the neural network is never exactly equal to the actual distance, i.e., there is always a small error (ε ≠ 0). Let us represent the learned hypothesis as f : q → R. The performance metric of the learned hypothesis f is given by

P(f) \equiv \frac{1}{N} \sum_{i=1}^{N} |NL(q_i) - f(q_i)|. \qquad (19)
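For the CHSH scenario, the minimization in Eq. (18) over the local polytope can be written as a small linear program over the 16 deterministic local behaviors. The sketch below (assuming numpy and scipy) reproduces the kind of distance computation used to label the training data; it is our own formulation, not the authors' code.

    import numpy as np
    from scipy.optimize import linprog

    # The 16 deterministic local behaviors of the CHSH (2,2,2) scenario.
    # A behavior is the vector of P(ab|xy), indexed here by (x, y, a, b).
    def det_behavior(fA, fB):
        q = np.zeros((2, 2, 2, 2))
        for x in range(2):
            for y in range(2):
                q[x, y, fA[x], fB[y]] = 1.0
        return q.ravel()

    V = np.array([det_behavior((a0, a1), (b0, b1))
                  for a0 in (0, 1) for a1 in (0, 1)
                  for b0 in (0, 1) for b1 in (0, 1)]).T   # 16 entries x 16 vertices

    def NL(q):
        """Eq. (18) as a linear program:
        min sum(t)  s.t.  |q - V w| <= t,  w >= 0,  sum(w) = 1."""
        n, m = V.shape
        c = np.concatenate([np.zeros(m), np.ones(n)])
        A_ub = np.block([[-V, -np.eye(n)], [V, -np.eye(n)]])
        b_ub = np.concatenate([-q, q])
        A_eq = np.concatenate([np.ones(m), np.zeros(n)])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0])
        return res.fun / 8.0          # prefactor 1/(2|x||y|) with |x| = |y| = 2

    # A local behavior (uniform noise) has distance 0; the post-quantum
    # PR box (a XOR b = x AND y) has strictly positive distance.
    uniform = np.full(16, 0.25)
    pr = np.zeros((2, 2, 2, 2))
    for x in range(2):
        for y in range(2):
            for a in range(2):
                for b in range(2):
                    if a ^ b == x & y:
                        pr[x, y, a, b] = 0.5
    print(NL(uniform), NL(pr.ravel()))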

In experiments such as entanglement swapping, which involve three separated parties sharing two independent sources of quantum states, the local set admits the following form and is non-convex:

P(a_1, a_2, a_3 \,|\, x_1, x_2, x_3) = \sum_{\lambda_1, \lambda_2} P(\lambda_1) P(\lambda_2)\, P(a_1 | x_1, \lambda_1)\, P(a_2 | x_2, \lambda_1, \lambda_2)\, P(a_3 | x_3, \lambda_2). \qquad (20)

Here, the non-convexity emerges from the independence of the sources, i.e., P(λ_1, λ_2) = P(λ_1)P(λ_2).

In this case, the Bell inequalities are no longer linear. Characterizing the set of classical as well as quantum behaviors gets complicated for such scenarios [132, 133].

Canabarro et al. train the model for convex as well as non-convex scenarios. They also train the model to learn post-quantum correlations. The techniques studied in the paper are valuable for understanding Bell nonlocality in large quantum networks, for example those envisioned for the quantum internet.


FIG. 8. Three separated parties (Alice, Bob and Charlie) with two sources of hidden variables. The sources are independent, P(λ_1, λ_2) = P(λ_1)P(λ_2), and the local set is no longer convex. Such non-convexity arises in experiments such as entanglement swapping. Deciding whether a behavior is classical or quantum gets complicated for such scenarios.

B. Oracle for Networks

Given an observed probability distribution in a scenario where several independent sources are distributed over a network, deciding whether it is classical or non-classical is an important question, from a practical as well as a foundational viewpoint. The boundary separating the classical and non-classical correlations is extremely non-convex, and thus a rigorous analysis is exceptionally challenging.

In reference [51], the authors encode the causal structure into the topology of a neural network and numerically determine whether the target distribution is "learnable". A behavior belongs to the local set if it is learnable. The authors harness the fact that the information flow in feedforward neural networks and in causal structures is in both cases determined by a directed acyclic graph. The topology of the neural network is chosen such that it respects the causal structure. The local set corresponding to even elementary causal networks, such as the triangle network, is profoundly non-convex, and thus its analytical characterization is a notoriously tricky task. Using the neural network as an oracle, Krivachy et al. [51] convert the membership-in-the-local-set problem into a learnability problem. For a neural network with adequate model capacity, a target distribution can be approximated if it is local. The authors examine the triangle network with quaternary outcomes as a proof-of-principle example. In this scenario, there are three independent sources, say α, β and γ. Each of the three parties receives input from two of the three sources and processes the inputs to produce outputs via fixed response functions. The outputs of Alice, Bob and Charlie are denoted by a, b, c ∈ {0,1,2,3}. The scenario is characterized by the probability distribution P(a,b,c) over the random variables a, b and c. If the network is classical, then the distribution can be represented by a directed acyclic graph known as a Bayesian network (BN).

If the distribution P(a,b,c) over the random variables a, b and c is classical, one can assume without loss of generality that each source sends a random variable drawn uniformly from the interval [0, 1].

FIG. 9. Reprinted with permission from Krivachy et al. [51]. (Top) The triangle network configuration. (Bottom) The topology of the neural network is selected such that it reproduces distributions compatible with the triangle configuration.

A classical distribution for such a case admits the following form:

P(a,b,c) = \int_0^1 d\alpha\, d\beta\, d\gamma\; P_A(a|\beta,\gamma)\, P_B(b|\gamma,\alpha)\, P_C(c|\alpha,\beta). \qquad (21)

The neural network is constructed such that it can approximate distributions of the type of Eq. (21). The inputs to the neural network are α, β and γ, drawn uniformly at random, and the outputs are the conditional probabilities P_A(a|β,γ), P_B(b|γ,α) and P_C(c|α,β).
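Equation (21) can be approximated by straightforward Monte Carlo sampling. The sketch below (numpy; the deterministic response functions are arbitrary toy choices, whereas in [51] they are the functions the neural network learns) estimates P(a,b,c) for one classical strategy on the triangle network.

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy deterministic response functions mapping two hidden variables in
    # [0, 1] to a four-valued output (hypothetical choices for illustration).
    def pA(beta, gamma):
        return 2 * int(beta > 0.5) + int(gamma > 0.5)

    def pB(gamma, alpha):
        return 2 * int(gamma > 0.5) + int(alpha > 0.5)

    def pC(alpha, beta):
        return 2 * int(alpha > 0.5) + int(beta > 0.5)

    # Monte Carlo estimate of Eq. (21): sample the sources uniformly and
    # accumulate the joint outcome statistics P(a, b, c).
    counts = np.zeros((4, 4, 4))
    N = 100_000
    for _ in range(N):
        alpha, beta, gamma = rng.random(3)
        counts[pA(beta, gamma), pB(gamma, alpha), pC(alpha, beta)] += 1
    P = counts / N
    print("normalization:", P.sum(), " P(0,0,0) ~", P[0, 0, 0])

By construction, any distribution produced this way is local; the oracle of [51] asks the converse question, namely whether a given target distribution can be produced by some such response functions.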

The cost function is chosen to be any measure of the distance between the target distribution P_t and the network's output P_M. The authors apply the technique to a few other cases, such as the "elegant" distribution and a distribution proposed by Renou et al. [134]. The application of the technique to the elegant distribution suggests that the distribution is indeed nonlocal, as conjectured in [135]. Furthermore, the distribution proposed by Renou et al. appears to have nonlocal features in some parameter regime. For the sake of completeness, we now describe the elegant distribution and the distribution proposed by Renou et al.

Elegant distribution – The distribution is generated by three parties performing entangled measurements on entangled systems. The three parties share singlets |ψ^-⟩ = (|01⟩ - |10⟩)/√2, and every party performs an entangled measurement on its two qubits. The eigenstates of the entangled measurement are given by

|\chi_j\rangle = \frac{\sqrt{3}}{2}\, |\alpha_j, -\alpha_j\rangle + i\, \frac{\sqrt{3}-1}{2}\, |\psi^-\rangle, \qquad (22)

where the |α_j⟩ are vectors distributed symmetrically on the Bloch sphere, i.e., pointing to the vertices of a tetrahedron, for j ∈ {1, 2, 3, 4}.


Renou et al. distribution – The distribution is generated by three parties sharing the entangled states |φ^+⟩ = (|00⟩ + |11⟩)/√2 and performing the same measurement on each of their two qubits. The measurements are characterized by a single parameter u ∈ [1/√2, 1], with eigenstates |01⟩, |10⟩, u|00⟩ + √(1−u²)|11⟩ and √(1−u²)|00⟩ − u|11⟩.

V. MACHINE LEARNING FOR OPTIMIZING SETTINGS FOR BELL NONLOCALITY

Bell inequalities have become a standard tool to reveal the nonlocal structure of quantum mechanics. However, finding the best strategies to violate a given Bell inequality can be a difficult task, especially for many-body settings or non-convex scenarios. The latter setting is particularly challenging, as standard optimization tools cannot be applied to it. To violate a given Bell inequality, two interdependent tasks have to be addressed: Which measurements have to be performed to reveal the nonlocality? And which quantum states show the maximal violation? Recently, Dong-Ling Deng has approached the latter task for convex settings with many-body Bell inequalities using restricted Boltzmann machines [54]. Bharti et al. [53] have approached both tasks in conjunction, for both convex and non-convex inequalities.

A. Detecting Bell Nonlocality with Many-Body States

Several methods from machine learning have been adopted to tackle intricate quantum many-body problems [54, 59, 136, 137]. Dong-Ling Deng [54] employs machine learning techniques to detect quantum nonlocality in many-body systems using the restricted Boltzmann machine (RBM) architecture. The key idea can be split into two parts:

• After choosing appropriate measurement settings, a Bell inequality for a convex scenario can be expressed as an operator. This operator can be thought of as a Hamiltonian. The eigenstate with the maximal eigenvalue is the state that gives the maximal violation of the underlying Bell inequality. This state can be found by computing the ground state of the negative of the Hamiltonian.

• Finding the ground state of a quantum Hamiltonian is QMA-hard and thus, in general, difficult. However, using heuristic techniques involving the RBM architecture, the problem can in some cases be recast into the task of finding a good approximate answer.

Techniques like the density matrix renormalization group (DMRG) [138], projected entangled pair states (PEPS) [139] and the multiscale entanglement renormalization ansatz (MERA) [140] are traditionally used to find (or approximate) ground states of many-body Hamiltonians. However, these techniques only work reliably for optimization problems involving low-entanglement states. Moreover, DMRG works well only for one-dimensional systems with short-range interactions. As evident from reference [59], RBMs can represent quantum many-body states beyond one dimension and beyond low entanglement.
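For a small system, the first step above can be carried out exactly. The sketch below (numpy; CHSH with the standard optimal angles, rather than the many-body inequalities of [54]) builds the Bell operator and reads off the maximal violation and the optimal state from its spectral decomposition.

    import numpy as np

    Z = np.array([[1, 0], [0, -1]])
    X = np.array([[0, 1], [1, 0]])

    def obs(theta):
        """Dichotomic observable cos(theta) Z + sin(theta) X."""
        return np.cos(theta) * Z + np.sin(theta) * X

    # CHSH Bell operator B = A0 (x) (B0 + B1) + A1 (x) (B0 - B1)
    # for the standard optimal angles.
    A0, A1 = obs(0.0), obs(np.pi / 2)
    B0, B1 = obs(5 * np.pi / 4), obs(3 * np.pi / 4)
    bell_op = np.kron(A0, B0 + B1) + np.kron(A1, B0 - B1)

    # The maximal eigenvalue is the maximal quantum violation for these
    # settings (2*sqrt(2), the Tsirelson bound); the corresponding
    # eigenvector is the optimal state.
    vals, vecs = np.linalg.eigh(bell_op)
    print("max violation:", vals[-1], " optimal state:", np.round(vecs[:, -1], 3))

For many qubits, exact diagonalization is intractable; this is where the RBM Ansatz of [54] replaces the exact eigensolver.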

B. Playing Bell Nonlocal Games

The prediction of winning strategies for (classical) games and decision-making processes with reinforcement learning (RL) has made significant progress in game theory in recent years. Motivated partly by these results, the authors of Bharti et al. [53] have looked at a game-theoretic formulation of Bell inequalities (known as Bell games) and applied machine learning techniques to it. To violate a Bell inequality, both the quantum state and the measurements performed on it have to be chosen in a specific manner. The authors transform this problem into a decision-making process. This is achieved by choosing the parameters of a Bell game in a sequential manner, e.g., the angles of the measurement operators, the angles parameterizing the quantum states, or both. Using RL, these sequential actions are optimized to find the configuration corresponding to the optimal or near-optimal quantum violation (see Fig. 10). The authors train the RL agent with a cost function that encourages high quantum violations via proximal policy optimization, a state-of-the-art RL algorithm that combines two neural networks. The approach succeeds for well-known convex Bell inequalities, but it can also handle Bell inequalities corresponding to non-convex optimization problems, such as those arising in larger quantum networks. So far, the field has struggled to solve such inequalities; this approach thus offers a novel route towards finding optimal (or near-optimal) configurations.

FIG. 10. Reprinted with permission from Ref. [53]. Playing Bell games with AI: in [53], the authors train an AI agent to play various Bell nonlocal games. The agent interacts with the quantum system by choosing quantum states and measurement angles, and measures the resulting violation of the Bell inequality. Over repeated games, the agent trains itself using past results to realize the maximal violation of the Bell inequality.


Furthermore, the authors present an approach to find settings corresponding to the maximal quantum violation of Bell inequalities on near-term quantum computers. The quantum state is parameterized by a circuit consisting of single-qubit rotations and CNOT gates acting on neighbouring qubits arranged in a linear fashion. Both the gates and the measurement angles are optimized using a variational hybrid classical-quantum algorithm, where the classical optimization is performed by RL. The RL agent learns by first randomly trying various measurement angles and quantum states. Over the course of the training, the agent improves itself by learning from experience, and is capable of reaching the maximal quantum violation.

VI. MACHINE LEARNING-ASSISTED STATE CLASSIFICATION

A crucial problem in quantum information is identifying the degree of entanglement within a given quantum state. For Hilbert space dimensions up to 6, one can use the Peres-Horodecki criterion, also known as the positive partial transpose (PPT) criterion, to distinguish entangled and separable states. In general, however, there is no generic observable or entanglement witness, as the problem is in fact NP-hard [80]. Thus, one must rely on heuristic approaches. This poses a fundamental question: given partial or full information about a state, are there ways to classify whether it is entangled or not? Machine learning has offered a way to find answers to this question.

A. Classification with Bell Inequalities

In reference [141], the authors blend Bell inequalities with a feed-forward neural network to use them as state classifiers. The goal is to classify states as entangled or separable. If a state violates a Bell inequality, it must be entangled. However, Bell inequalities cannot be used as a reliable tool for entanglement classification: even if a state is entangled, it might not violate an entanglement witness based on a Bell inequality. For example, the density matrix ρ = p|ψ^-⟩⟨ψ^-| + (1-p)I/4 violates the CHSH inequality only for p > 1/√2, but is entangled for p > 1/3 [142]. Moreover, given a Bell inequality, the measurement settings that witness the entanglement of a quantum state (when possible) depend on the quantum state. Prompted by these issues, the authors ask if they can transform Bell inequalities into a reliable separable-versus-entangled state classifier. The coefficients of the terms in the CHSH inequality are (1,1,1,−1), and the local hidden variable bound on the inequality is 2 (see equation (13)). Assuming fixed measurement settings corresponding to the CHSH inequality, the authors examine whether it is possible to get better classification performance by generalizing the values (1,1,1,−1,2). The main challenge in answering such a question in a supervised learning setting is to obtain labelled data that is verified to be either separable or entangled.
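The Werner-state example above can be checked directly. The sketch below (numpy; the angles are the standard optimal CHSH settings for the singlet) computes the correlation vector (E_{0,0}, E_{0,1}, E_{1,0}, E_{1,1}) of the kind fed to the classifier of [141], and the resulting CHSH value for two visibilities: one entangled but non-violating, one violating.

    import numpy as np

    Z = np.array([[1, 0], [0, -1]])
    X = np.array([[0, 1], [1, 0]])
    psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)

    def obs(theta):
        return np.cos(theta) * Z + np.sin(theta) * X

    def correlation_vector(p):
        """(E00, E01, E10, E11) of the Werner state at visibility p,
        for the standard optimal CHSH settings."""
        rho = p * np.outer(psi_minus, psi_minus) + (1 - p) * np.eye(4) / 4
        angles_A, angles_B = (0.0, np.pi / 2), (5 * np.pi / 4, 3 * np.pi / 4)
        return np.array([np.trace(rho @ np.kron(obs(ta), obs(tb))).real
                         for ta in angles_A for tb in angles_B])

    for p in (0.4, 0.8):   # entangled for p > 1/3, violating for p > 1/sqrt(2)
        E = correlation_vector(p)
        S = E[0] + E[1] + E[2] - E[3]
        print(f"p={p}: S = {S:.3f}", "(violation)" if S > 2 else "(no violation)")

The p = 0.4 case illustrates the gap the classifier must bridge: the state is entangled, yet its correlation vector does not violate the CHSH inequality.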

To train the neural network, the correlation vector corresponding to the appropriate Bell inequality was chosen as input, with the state being entangled or separable (1 versus 0) as the corresponding output. The correlation vector contains the expectations of the products of Alice's and Bob's measurement operators. The performance of the network improved as the model capacity was increased, which hints that the hypothesis which separates entangled states from separable ones must be sufficiently "complex." The authors also trained a neural network to distinguish bi-separable and bound-entangled states.

B. Classification by Representing Quantum States with Restricted Boltzmann Machines

Harney et al. [143] use reinforcement learning with RBMs to detect entangled states. RBMs have been shown to be capable of learning complex quantum states. The authors modify RBMs such that they can only represent separable states. This is achieved by splitting the RBM into K partitions that are only connected within themselves, but not with the other partitions. Each partition represents a (possibly internally entangled) subsystem that is not entangled with the other partitions. This choice enforces a specific K-separable Ansatz for the wavefunction. The RBM is trained to represent a target state. If the training converges, the target state must be representable by the Ansatz and is thus K-separable. If the training does not converge, the Ansatz is insufficient and the target state is either of a different K′-separable form or fully entangled.
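The partitioning can be implemented as a block-diagonal mask on the RBM weight matrix. The following sketch of such a mask is our own illustration of the idea, not the authors' code.

```python
import numpy as np

def separable_rbm_mask(n_visible_per_part, n_hidden_per_part, K):
    """Block-diagonal weight mask: visible units of partition k connect only
    to hidden units of partition k, enforcing a K-separable Ansatz."""
    nv, nh = n_visible_per_part * K, n_hidden_per_part * K
    mask = np.zeros((nv, nh))
    for k in range(K):
        mask[k * n_visible_per_part:(k + 1) * n_visible_per_part,
             k * n_hidden_per_part:(k + 1) * n_hidden_per_part] = 1.0
    return mask

# During training, multiply the weights (and their updates) elementwise by
# the mask so that cross-partition couplings stay exactly zero.
W = np.random.randn(4, 6) * separable_rbm_mask(2, 3, K=2)
print(W)
```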

C. Classification with Tomographic Data

Can tools from machine learning help to distinguish entangled and separable states given the full quantum state (e.g. obtained by quantum tomography) as an input? Two recent studies address this question.

In Lu et al. [144], the authors detect entanglement by employing classical (i.e. non-deep-learning) supervised learning. To simplify the generation of entangled and separable states, the authors approximate the set of separable states by a convex hull (CHA). States that lie outside the convex hull are assumed to be most likely entangled. For the decision process, the authors use ensemble training via bootstrap aggregating (bagging): various supervised learning methods are trained on the data and together form a committee that decides whether a given state is entangled or not. The algorithm is trained with the quantum state and information encoding its position relative to the convex hull as inputs. The authors show that accuracy improves when the bagging of machine learning methods is combined with CHA.
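In scikit-learn, the bagging step can be sketched as follows. The feature files and their contents are hypothetical placeholders, and the committee of decision trees merely illustrates bootstrap aggregating; it does not reproduce the authors' ensemble.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

# Hypothetical features: flattened density matrices augmented with a
# distance-to-convex-hull (CHA) feature; labels 1 = entangled, 0 = separable.
X = np.load("state_features.npy")  # placeholder file names
y = np.load("state_labels.npy")

# Bootstrap aggregating: many weak learners (decision trees by default),
# each trained on a resampled dataset, vote as a committee.
committee = BaggingClassifier(n_estimators=100)
committee.fit(X, y)
print(committee.predict(X[:3]))
```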

In a different approach, Goes et al. [145] present an automated machine learning pipeline to classify random states of two qutrits as separable (SEP), entangled with positive partial transpose (PPTES), or entangled with negative partial transpose (NPT). For training, the authors devise a way to generate enough labelled samples: a random quantum state is sampled and then, using the General Robustness of Entanglement (GR) and the PPT criterion, classified as SEP, PPTES or NPT. The GR measures the closeness to the set of separable states. The authors compare various supervised learning methods to distinguish the states. The input features fed into the machine are the components of the quantum state and higher-order combinations thereof, whereas the labels are the type of entanglement. In addition, they train regression models to estimate the GR and use these estimates to validate the classifiers.
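The PPT half of this labelling procedure can be sketched in a few lines (our own illustration); the GR step, which separates SEP from PPTES among the PPT states, requires solving a semidefinite program and is omitted here.

```python
import numpy as np

def random_density_matrix(d=9):
    """Random mixed state of two qutrits via a Ginibre matrix."""
    G = np.random.randn(d, d) + 1j * np.random.randn(d, d)
    rho = G @ G.conj().T
    return rho / np.trace(rho)

def partial_transpose(rho, dA=3, dB=3):
    """Partial transpose over the second qutrit."""
    return rho.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

# NPT states are certified entangled; PPT states are either separable or
# bound entangled (PPTES). Telling those two apart needs the GR step.
labels = []
for _ in range(1000):
    rho = random_density_matrix()
    npt = np.linalg.eigvalsh(partial_transpose(rho)).min() < -1e-12
    labels.append("NPT" if npt else "PPT (SEP or PPTES)")
print(labels[:5])
```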

VII. NEURAL NETWORKS AS “HIDDEN” VARIABLE MODELS FOR QUANTUM SYSTEMS

Understanding why deep neural networks work so well is an active area of research. The appearance of the word “hidden” both in the hidden variables of quantum foundations and in the hidden neurons of deep learning may not be entirely accidental. Using conditional restricted Boltzmann machines (a variant of restricted Boltzmann machines), Steven Weinstein provides a completion of quantum theory in reference [146]. The completion, however, does not contradict Bell's theorem, as the assumption of “statistical independence” is not respected. The statistical independence assumption demands that the complete description of the system before measurement be independent of the final measurement settings. The phenomenon whereby apparent nonlocality arises from violating the statistical independence assumption is known as “nonlocality without nonlocality” [147].

In a Bell experiment corresponding to the CHSH scenario, the detector settings α ∈ {a, a′} and β ∈ {b, b′} and the corresponding measurement outcomes xα ∈ {+1, −1} and xβ ∈ {+1, −1} for a single experimental trial can be represented as a four-dimensional vector {α, β, xα, xβ}. Such a vector can be encoded in a binary vector V = (v1, v2, v3, v4) with vi ∈ {0, 1}, where the outcomes +1/−1 are mapped to 0/1. The four-dimensional binary vector V represents the values taken by the four visible units of an RBM. The dependencies between the visible units are encoded using a sufficient number of hidden units H = (h1, h2, ..., hj). With four hidden neurons, the authors could reproduce the statistics predicted for the EPR experiment with high accuracy. For example, if the vector V = (0, 1, 1, 0) occurs in 3% of the trials, then after training the machine would associate P(V) ≈ 0.03. Quantum mechanics gives us only the conditional probabilities P(xα, xβ | α, β), and thus learning the joint probability with an RBM wastes resources. The authors harness this observation by encoding only the conditional statistics in a conditional RBM (cRBM).

The difference between a cRBM and an RBM is that the units corresponding to the conditioning variables (here, the detector settings) are not dynamical variables. No probabilities are assigned to the conditioning variables, so the only probabilities generated by the cRBM are conditional probabilities. This provides a more compact representation than a full RBM.
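The encoding and the conditional statistics can be made concrete with a small simulation (our own sketch, not Weinstein's code): it samples CHSH trials on the singlet with canonical settings, encodes each trial as a binary visible vector, and estimates the conditional distributions that a cRBM would be asked to model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trial():
    """One CHSH trial on the singlet with canonical settings:
    E(a, b) = -1/sqrt(2) for all setting pairs except E(a', b') = +1/sqrt(2)."""
    alpha, beta = rng.integers(2), rng.integers(2)
    E = 1 / np.sqrt(2) if (alpha, beta) == (1, 1) else -1 / np.sqrt(2)
    outcomes = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    probs = [(1 + xa * xb * E) / 4 for xa, xb in outcomes]
    xa, xb = outcomes[rng.choice(4, p=probs)]
    return alpha, beta, xa, xb

# Encode each trial as a binary visible vector V = (v1, v2, v3, v4),
# mapping the outcomes +1/-1 to 0/1.
counts, setting_counts = {}, {}
for _ in range(100_000):
    a, b, xa, xb = sample_trial()
    V = (a, b, (1 - xa) // 2, (1 - xb) // 2)
    counts[V] = counts.get(V, 0) + 1
    setting_counts[(a, b)] = setting_counts.get((a, b), 0) + 1

# A cRBM clamps (v1, v2) and models only P(v3, v4 | v1, v2); a plain RBM
# would additionally have to learn the uninformative distribution over settings.
for V in sorted(counts):
    print(V, counts[V] / setting_counts[(V[0], V[1])])
```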

VIII. A FEW MORE APPLICATIONS

Work on quantum foundations has led to the birth of quantum computing and quantum information. Recently, the amalgamation of quantum theory and machine learning has led to a new area of research, namely quantum machine learning [148]. For further reading on quantum machine learning, refer to [149] and references therein. For recent trends and exploratory works in quantum machine learning, we refer the reader to [150] and references therein.

Techniques from machine learning have been used to discover new quantum experiments [151, 152]. In reference [152], Krenn et al. provide a computer program, called Melvin, that helps design novel quantum experiments. Melvin produced solutions that were quite counter-intuitive and different from what a human scientist would ordinarily come up with, and these solutions have led to many novel results [153–158]. The ideas behind Melvin have further yielded machine-generated proofs of the Kochen-Specker theorem [159].

IX. CONCLUSION AND FUTURE WORK

In this survey, we discussed various applications of machine learning to problems in the foundations of quantum theory, such as determining the quantum bound for Bell inequalities, classifying behaviors as local or nonlocal, using hidden neurons as hidden variables for completions of quantum theory, training AI to play Bell nonlocal games, and machine learning-assisted state classification. We now discuss a few open questions at this interface; some of them have been raised in references [51–54].

Witnessing Bell nonlocality in many-body systems is an active area of research [54, 160]. However, designing experimentally friendly many-body Bell inequalities is a difficult task. It would be interesting if machine learning could help design optimal Bell inequalities for scenarios involving many-body systems. In reference [54], the author used an RBM-based representation coupled with reinforcement learning to find near-optimal quantum values of Bell inequalities for various Bell scenarios. It is well known that optimization becomes easier once the representation is compact. It would be interesting to use other neural network-based representations, such as convolutional neural networks, for finding optimal (or near-optimal) quantum values.

As mentioned in reference [52], it would be an excellent idea to deploy techniques like anomaly detection to discover non-classical behaviors. This can be done by training the machine on local behaviors only, so that nonlocal behaviors show up as anomalies, as in the sketch below.
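The following sketch (our own illustration of the suggestion, not code from [52]) trains a one-class support vector machine on CHSH correlation vectors of local behaviors, sampled as convex mixtures of the deterministic strategies, and then flags the PR-box behavior, which lies outside the local set.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# The 16 local deterministic strategies of the CHSH scenario: Alice picks
# outputs (a0, a1) and Bob picks (b0, b1), each output in {-1, +1}.
det = np.array([[a0 * b0, a0 * b1, a1 * b0, a1 * b1]
                for a0 in (-1, 1) for a1 in (-1, 1)
                for b0 in (-1, 1) for b1 in (-1, 1)])

# Local behaviors are convex mixtures of deterministic strategies; sample
# mixing weights from a Dirichlet distribution to cover the local polytope.
weights = rng.dirichlet(np.ones(16), size=5000)
X_local = weights @ det

detector = OneClassSVM(nu=0.05).fit(X_local)

# The PR-box behavior (1, 1, 1, -1) reaches CHSH = 4 and lies outside the
# local polytope; the detector should flag it as anomalous (label -1).
print(detector.predict([[1, 1, 1, -1]]))
```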

In many of the applications, e.g. the classification of entangled states, the computer outputs a guess, but we cannot be sure of its correctness. This limitation is rarely made explicit, yet it cannot be overlooked. The intriguing question is to understand how the output of the computer can be employed to provide a certification of the result, for instance an entanglement witness. One could use ideas from probabilistic machine learning in such cases [161]. The probabilistic framework, which explains how to represent and handle uncertainty about models and predictions, plays a fundamental role in scientific data analysis. Harnessing tools from probabilistic machine learning, such as Bayesian optimization and automatic model discovery, could be conducive to understanding how to utilize machine output to provide a certification of the result.

In reference [53], the authors used reinforcement learning to train AI to play Bell nonlocal games and obtain optimal (or near-optimal) performance. The agent is offered a reward at the end of each epoch equal to the expectation value of the Bell operator for the state and measurement settings chosen by the agent. Such a reward scheme is sparse and hence might not be scalable; it would be interesting to come up with better reward schemes. Furthermore, in this approach a single agent learns the optimization landscape and discovers near-optimal (or optimal) measurement settings and state. It would be exciting to extend the approach to a multi-agent setting where every space-like separated party is a separate agent. It is worth mentioning that distributing the actions and observations of a single agent over a list of agents reduces the dimensionality of agent inputs and outputs, and it also dramatically increases the amount of training data produced per step of the environment. Agents tend to learn better when they interact than when they learn alone.

Bell inequalities separate the set of local behaviours from the set of nonlocal behaviours. The analogous boundaries separating the quantum set from the post-quantum set are known as quantum Bell inequalities [162, 163]. Finding quantum Bell inequalities is an interesting and challenging problem. One could aim to obtain approximate expressions for them via supervised learning, treating quantum Bell inequalities as the decision boundary between the quantum and post-quantum sets. Moreover, it would be interesting to see whether physical principles can be guessed by merely opening the neural-network black box.

Driven by the success of machine learning in Bell nonlocality, it is natural to ask whether these methods could be useful for problems in quantum steering and contextuality. Recently, ideas from the exclusivity graph approach to contextuality were used to investigate problems involving causal inference [164]. Ideas from quantum foundations could further assist in developing a deeper understanding of machine learning or, more generally, artificial intelligence.

In artificial intelligence, one of the tests to distinguish between humans and machines is the famous Turing Test (TT) due to Alan Turing [62, 165]. The purpose of the TT is to determine whether a computer is linguistically distinguishable from a human. In the TT, a human and a machine are sealed in different rooms. A human judge, who does not know which room contains the human, asks both of them questions, for example by email. If, based on the returned answers, the judge cannot do better than fifty-fifty, the machine in question is said to have passed the TT. The task of distinguishing the human from the machine based on the statistics of the answers (say, output a) given the questions (say, input x) is a statistical distinguishability test that treats the rooms and their inhabitants as black boxes. In the black-box approach to quantum theory, an experiment is likewise regarded as a black box into which the experimentalist feeds a measurement (input) and obtains a measurement outcome (output). One of the central goals of this approach is to deduce statements about the contents of the black box from input-output statistics [68]. It would be nice to see whether techniques from the black-box approach to quantum theory could be connected to the TT.

ACKNOWLEDGMENTS

We wish to acknowledge the support of the Ministry of Education and the National Research Foundation, Singapore. We thank Valerio Scarani for valuable discussions.

DATA AVAILABILITY

Data sharing is not applicable to this survey as no new data were created or analyzed in this study.

[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.

[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

[3] Mohammed AlQuraishi. AlphaFold at CASP13. Bioinformatics, 35(22):4862–4865, 2019.

[4] James P Bridge, Sean B Holden, and Lawrence C Paulson. Machine learning for first-order theorem proving. Journal of Automated Reasoning, 53(2):141–172, 2014.

[5] Antonio Lavecchia. Machine-learning approaches in drug discovery: methods and applications. Drug Discovery Today, 20(3):318–331, 2015.

[6] Louis-François Arsenault, O Anatole von Lilienfeld, and Andrew J Millis. Machine learning for many-body physics: efficient solution of dynamical mean-field theory. arXiv preprint arXiv:1506.08858, 2015.

[7] Yi Zhang and Eun-Ah Kim. Quantum loop topography for machine learning. Physical Review Letters, 118(21):216401, 2017.

[8] Juan Carrasquilla and Roger G Melko. Machine learning phases of matter. Nature Physics, 13(5):431–434, 2017.

[9] Evert PL Van Nieuwenburg, Ye-Hua Liu, and Sebastian D Huber. Learning phase transitions by confusion. Nature Physics, 13(5):435–439, 2017.

[10] Dong-Ling Deng, Xiaopeng Li, and S Das Sarma. Machine learning topological states. Physical Review B, 96(19):195145, 2017.

[11] Lei Wang. Discovering phase transitions with unsupervised learning. Physical Review B, 94(19):195105, 2016.

[12] Peter Broecker, Juan Carrasquilla, Roger G Melko, and Simon Trebst. Machine learning quantum phases of matter beyond the fermion sign problem. Scientific Reports, 7(1):1–10, 2017.

[13] Kelvin Chng, Juan Carrasquilla, Roger G Melko, and Ehsan Khatami. Machine learning phases of strongly correlated fermions. Physical Review X, 7(3):031038, 2017.

[14] Yi Zhang, Roger G Melko, and Eun-Ah Kim. Machine learning Z2 quantum spin liquids with quasiparticle statistics. Physical Review B, 96(24):245119, 2017.

[15] Sebastian J Wetzel. Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders. Physical Review E, 96(2):022140, 2017.

[16] Wenjian Hu, Rajiv RP Singh, and Richard T Scalettar. Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination. Physical Review E, 95(6):062122, 2017.

[17] Nobuyuki Yoshioka, Yutaka Akagi, and Hosho Katsura. Learning disordered topological phases by statistical recovery of symmetry. Physical Review B, 97(20):205110, 2018.

[18] Giacomo Torlai and Roger G Melko. Learning thermodynamics with Boltzmann machines. Physical Review B, 94(16):165134, 2016.

[19] Hsin-Yuan Huang, Kishor Bharti, and Patrick Rebentrost. Near-term quantum algorithms for linear systems of equations. arXiv preprint arXiv:1909.07344, 2019.

[20] Ken-Ichi Aoki and Tamao Kobayashi. Restricted Boltzmann machines for the long range Ising models. Modern Physics Letters B, 30(34):1650401, 2016.

[21] Yi-Zhuang You, Zhao Yang, and Xiao-Liang Qi. Machine learning spatial geometry from entanglement features. Physical Review B, 97(4):045153, 2018.

[22] Mario Pasquato. Detecting intermediate mass black holes in globular clusters with machine learning. arXiv preprint arXiv:1606.08548, 2016.

[23] Yashar D Hezaveh, Laurence Perreault Levasseur, and Philip J Marshall. Fast automated analysis of strong gravitational lenses with convolutional neural networks. Nature, 548(7669):555–557, 2017.

[24] Rahul Biswas, Lindy Blackburn, Junwei Cao, Reed Essick, Kari Alison Hodge, Erotokritos Katsavounidis, Kyungmin Kim, Young-Min Kim, Eric-Olivier Le Bigot, Chang-Hwan Lee, et al. Application of machine learning algorithms to the study of noise artifacts in gravitational-wave data. Physical Review D, 88(6):062003, 2013.

[25] Benjamin P Abbott, Richard Abbott, TD Abbott, MR Abernathy, Fausto Acernese, Kendall Ackley, Carl Adams, Thomas Adams, Paolo Addesso, RX Adhikari, et al. Observation of gravitational waves from a binary black hole merger. Physical Review Letters, 116(6):061102, 2016.

[26] Sergei V Kalinin, Bobby G Sumpter, and Richard K Archibald. Big–deep–smart data in imaging for guiding materials design. Nature Materials, 14(10):973–980, 2015.

[27] Samuel S Schoenholz, Ekin D Cubuk, Daniel M Sussman, Efthimios Kaxiras, and Andrea J Liu. A structural approach to relaxation in glassy liquids. Nature Physics, 12(5):469–471, 2016.

[28] Junwei Liu, Yang Qi, Zi Yang Meng, and Liang Fu. Self-learning Monte Carlo method. Physical Review B, 95(4):041101, 2017.

[29] Li Huang and Lei Wang. Accelerated Monte Carlo simulations with restricted Boltzmann machines. Physical Review B, 95(3):035105, 2017.

[30] Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, Matthias Troyer, Roger Melko, and Giuseppe Carleo. Neural-network quantum state tomography. Nature Physics, 14(5):447–450, 2018.

[31] Jing Chen, Song Cheng, Haidong Xie, Lei Wang, and Tao Xiang. Equivalence of restricted Boltzmann machines and tensor network states. Physical Review B, 97(8):085104, 2018.

[32] Yichen Huang and Joel E Moore. Neural network representation of tensor network and chiral states. arXiv preprint arXiv:1701.06246, 2017.

[33] Frank Schindler, Nicolas Regnault, and Titus Neupert. Probing many-body localization with neural networks. Physical Review B, 95(24):245134, 2017.

[34] Tobias Haug, Rainer Dumke, Leong-Chuan Kwek, Christian Miniatura, and Luigi Amico. Engineering quantum current states with machine learning. arXiv preprint arXiv:1911.09578, 2019.

[35] Zi Cai and Jinguo Liu. Approximating quantum many-body wave functions using artificial neural networks. Physical Review B, 97(3):035116, 2018.

[36] Peter Broecker, Fakher F Assaad, and Simon Trebst. Quantum phase recognition via unsupervised machine learning. arXiv preprint arXiv:1707.00663, 2017.

[37] Yusuke Nomura, Andrew S Darmawan, Youhei Yamaji, and Masatoshi Imada. Restricted Boltzmann machine learning for solving strongly correlated quantum systems. Physical Review B, 96(20):205152, 2017.

[38] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017.

[39] Tobias Haug, Wai-Keong Mok, Jia-Bin You, Wenzu Zhang, Ching Eng Png, and Leong-Chuan Kwek. Classifying global state preparation via deep reinforcement learning. arXiv preprint arXiv:2005.12759, 2020.

[40] Giacomo Torlai and Roger G Melko. Neural decoder for topological codes. Physical Review Letters, 119(3):030501, 2017.

[41] Marin Bukov. Reinforcement learning for autonomous preparation of Floquet-engineered states: Inverting the quantum Kapitza oscillator. Physical Review B, 98(22):224305, 2018.

[42] Marin Bukov, Alexandre GR Day, Dries Sels, Phillip Weinberg, Anatoli Polkovnikov, and Pankaj Mehta. Reinforcement learning in different phases of quantum control. Physical Review X, 8(3):031086, 2018.

[43] Koji Hashimoto, Sotaro Sugishita, Akinori Tanaka, and Akio Tomiya. Deep learning and the AdS/CFT correspondence. Physical Review D, 98(4):046019, 2018.

[44] Jonathan Carifio, James Halverson, Dmitri Krioukov, and Brent D Nelson. Machine learning in the string landscape. Journal of High Energy Physics, 2017(9):157, 2017.

[45] Henry W Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6):1223–1247, 2017.

[46] Andrzej Cichocki. Tensor networks for big data analytics and large-scale optimization problems. arXiv preprint arXiv:1407.3124, 2014.

[47] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences. Reviews of Modern Physics, 91(4):045002, 2019.

[48] Vedran Dunjko, Jacob M Taylor, and Hans J Briegel. Quantum-enhanced machine learning. Physical Review Letters, 117(13):130501, 2016.

[49] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning models. Quantum Science and Technology, 4(4):043001, 2019.

[50] Lucien Hardy and Robert Spekkens. Why physics needs quantum foundations. arXiv preprint arXiv:1003.5008, 2010.

[51] Tamás Kriváchy, Yu Cai, Daniel Cavalcanti, Arash Tavakoli, Nicolas Gisin, and Nicolas Brunner. A neural network oracle for quantum nonlocality problems in networks. arXiv preprint arXiv:1907.10552, 2019.

[52] Askery Canabarro, Samuraí Brito, and Rafael Chaves. Machine learning nonlocal correlations. Physical Review Letters, 122(20):200401, 2019.

[53] Kishor Bharti, Tobias Haug, Vlatko Vedral, and Leong-Chuan Kwek. How to teach AI to play Bell non-local games: Reinforcement learning. arXiv preprint arXiv:1912.10783, 2019.

[54] Dong-Ling Deng. Machine learning detection of Bell nonlocality in quantum many-body systems. Physical Review Letters, 120(24):240402, 2018.

[55] Raban Iten, Tony Metger, Henrik Wilming, Lídia Del Rio, and Renato Renner. Discovering physical concepts with neural networks. Physical Review Letters, 124(1):010508, 2020.

[56] Tom M Mitchell. Machine learning. McGraw Hill, Burr Ridge, IL, 1997.

[57] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.

[58] Geoffrey E Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade, pages 599–619. Springer, 2012.

[59] Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.

[60] John McCarthy, Marvin L Minsky, Nathaniel Rochester, and Claude E Shannon. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4):12–12, 2006.

[61] Alan M Turing. Computing machinery and intelligence. In Parsing the Turing Test, pages 23–65. Springer, 2009.

[62] Stuart Russell and Peter Norvig. Artificial intelligence: a modern approach. 2002.

[63] Christopher A Fuchs and Rüdiger Schack. A quantum-Bayesian route to quantum-state space. Foundations of Physics, 41(3):345–356, 2011.

[64] Hugh Everett III. “Relative state” formulation of quantum mechanics. Reviews of Modern Physics, 29(3):454, 1957.

[65] Bryce Seligman DeWitt and Neill Graham. The many worlds interpretation of quantum mechanics. Princeton University Press, 2015.

[66] Roland Omnes. Consistent interpretations of quantum mechanics. Reviews of Modern Physics, 64(2):339, 1992.

[67] Jonathan Barrett. Information processing in generalized probabilistic theories. Physical Review A, 75(3):032304, 2007.

[68] Antonio Acín and Miguel Navascués. Black box quantum mechanics. In Quantum [Un]Speakables II, pages 307–319. Springer, 2017.

[69] Wim Van Dam. Implausible consequences of superstrong nonlocality. Natural Computing, 12(1):9–12, 2013.

[70] Gilles Brassard, Harry Buhrman, Noah Linden, André Allan Méthot, Alain Tapp, and Falk Unger. Limit on nonlocality in any world in which communication complexity is not trivial. Physical Review Letters, 96(25):250401, 2006.

[71] Marcin Pawłowski, Tomasz Paterek, Dagomir Kaszlikowski, Valerio Scarani, Andreas Winter, and Marek Żukowski. Information causality as a physical principle. Nature, 461(7267):1101–1104, 2009.

[72] Miguel Navascués and Harald Wunderlich. A glance beyond the quantum model. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 466(2115):881–890, 2010.

[73] Noah Linden, Sandu Popescu, Anthony J Short, and Andreas Winter. Quantum nonlocality and beyond: limits from nonlocal computation. Physical Review Letters, 99(18):180502, 2007.

[74] Bin Yan. Quantum correlations are tightly bound by the exclusivity principle. Physical Review Letters, 110(26):260406, 2013.

[75] Tobias Fritz, Ana Belén Sainz, Remigiusz Augusiak, J Bohr Brask, Rafael Chaves, Anthony Leverrier, and Antonio Acín. Local orthogonality as a multipartite principle for quantum correlations. Nature Communications, 4(1):1–7, 2013.

[76] Kelvin J McQueen. Four tails problems for dynamical collapse theories. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 49:10–18, 2015.

[77] Rafael D Sorkin. Quantum mechanics as quantum measure theory. Modern Physics Letters A, 9(33):3119–3127, 1994.

[78] Ognyan Oreshkov, Fabio Costa, and Caslav Brukner. Quantum correlations with no causal order. Nature Communications, 3(1):1–8, 2012.

[79] Jean-Michel Raimond, M Brune, and Serge Haroche. Manipulating quantum entanglement with atoms and photons in a cavity. Reviews of Modern Physics, 73(3):565, 2001.

[80] Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki. Quantum entanglement. Reviews of Modern Physics, 81(2):865, 2009.

[81] William K Wootters. Entanglement of formation and concurrence. Quantum Information & Computation, 1(1):27–44, 2001.

[82] Michał Horodecki, Paweł Horodecki, and Ryszard Horodecki. On the necessary and sufficient conditions for separability of mixed quantum states. Physics Letters A, 223:1, 1996.

[83] Barbara M Terhal. Detecting quantum entanglement. Theoretical Computer Science, 287(1):313–335, 2002.

[84] Philipp Krammer, Hermann Kampermann, Dagmar Bruß, Reinhold A Bertlmann, Leong Chuang Kwek, and Chiara Macchiavello. Multipartite entanglement detection via structure factors. Physical Review Letters, 103(10):100502, 2009.

[85] Artur K Ekert, Carolina Moura Alves, Daniel KL Oi, Michał Horodecki, Paweł Horodecki, and Leong Chuan Kwek. Direct estimations of linear and nonlinear functionals of a quantum state. Physical Review Letters, 88(21):217901, 2002.

[86] Judea Pearl. Causality. Cambridge: Cambridge University Press, 2000.

[87] Stuart J Freedman and John F Clauser. Experimental test of local hidden-variable theories. Physical Review Letters, 28(14):938, 1972.

[88] Carl A Kocher and Eugene D Commins. Polarization correlation of photons emitted in an atomic cascade. Physical Review Letters, 18(15):575, 1967.

[89] Alain Aspect. Expériences basées sur les inégalités de Bell. Le Journal de Physique Colloques, 42(C2):C2–63, 1981.

[90] Alain Aspect, Jean Dalibard, and Gérard Roger. Experimental test of Bell's inequalities using time-varying analyzers. Physical Review Letters, 49(25):1804, 1982.

[91] Charles H Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and William K Wootters. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Physical Review Letters, 70(13):1895, 1993.

[92] Artur K Ekert. Quantum cryptography based on Bell's theorem. Physical Review Letters, 67(6):661, 1991.

[93] Charles H Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A Smolin, and William K Wootters. Purification of noisy entanglement and faithful teleportation via noisy channels. Physical Review Letters, 76(5):722, 1996.

[94] Charles H Bennett and Stephen J Wiesner. Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states. Physical Review Letters, 69(20):2881, 1992.

[95] David Deutsch. Quantum theory, the Church–Turing principle and the universal quantum computer. Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences, 400(1818):97–117, 1985.

[96] Richard P Feynman. Feynman lectures on computation. CRC Press, 2018.

[97] J. S. Bell. On the Einstein Podolsky Rosen paradox. Physics (Long Island City, N.Y.), 1:195–200, Nov 1964.

[98] Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie Wehner. Bell nonlocality. Reviews of Modern Physics, 86(2):419, 2014.

[99] Stefano Pironio, Antonio Acín, Serge Massar, A Boyer de La Giroday, Dzmitry N Matsukevich, Peter Maunz, Steven Olmschenk, David Hayes, Le Luo, T Andrew Manning, et al. Random numbers certified by Bell's theorem. Nature, 464(7291):1021, 2010.

[100] Dominic Mayers and Andrew Yao. Self testing quantum apparatus. Quantum Information & Computation, 4(4):273–286, July 2004.

[101] John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt. Proposed experiment to test local hidden-variable theories. Physical Review Letters, 23:880–884, Oct 1969.

[102] Simon Kochen and E P Specker. The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics, 17(1):59–87, 1967.

[103] N David Mermin. Hidden variables and the two theorems of John Bell. Reviews of Modern Physics, 65(3):803, 1993.

[104] Adán Cabello, Simone Severini, and Andreas Winter. Graph-theoretic approach to quantum correlations. Physical Review Letters, 112(4):040401, 2014.

[105] Barbara Amaral and Marcelo Terra Cunha. On Graph Approaches to Contextuality and their Role in Quantum Theory. Springer, Cham, Switzerland, 2018.

[106] Adán Cabello, Vincenzo D'Ambrosio, Eleonora Nagali, and Fabio Sciarrino. Hybrid ququart-encoded quantum cryptography protected by Kochen-Specker contextuality. Physical Review A, 84:030302, Sep 2011.

[107] Kishor Bharti, Maharshi Ray, and Leong-Chuan Kwek. Non-classical correlations in n-cycle setting. Entropy, 21(2):134, 2019.

[108] Robert Raussendorf. Contextuality in measurement-based quantum computation. Physical Review A, 88(2):022322, 2013.

[109] Kishor Bharti, Maharshi Ray, Antonios Varvitsiotis, Naqueeb Ahmad Warsi, Adán Cabello, and Leong-Chuan Kwek. Robust self-testing of quantum systems via noncontextuality inequalities. Physical Review Letters, 122(25):250403, 2019.

[110] Shane Mansfield and Elham Kashefi. Quantum advantage from sequential-transformation contextuality. Physical Review Letters, 121(23):230401, 2018.

[111] Debashis Saha, Paweł Horodecki, and Marcin Pawłowski. State independent contextuality advances one-way communication. New Journal of Physics, 21(9):093057, 2019.

[112] Kishor Bharti, Maharshi Ray, Antonios Varvitsiotis, Adán Cabello, and Leong-Chuan Kwek. Local certification of programmable quantum devices of arbitrary high dimensionality. arXiv preprint arXiv:1911.09448, 2019.

[113] Nicolas Delfosse, Philippe Allard Guerin, Jacob Bian, and Robert Raussendorf. Wigner function negativity and contextuality in quantum computation on rebits. Physical Review X, 5(2):021003, 2015.

[114] Jaskaran Singh, Kishor Bharti, et al. Quantum key distribution protocol based on contextuality monogamy. Physical Review A, 95(6):062333, 2017.

[115] Hakop Pashayan, Joel J Wallman, and Stephen D Bartlett. Estimating outcome probabilities of quantum circuits using quasiprobabilities. Physical Review Letters, 115(7):070501, 2015.

[116] Kishor Bharti, Atul Singh Arora, Leong Chuan Kwek, and Jérémie Roland. A simple proof of uniqueness of the KCBS inequality. arXiv preprint arXiv:1811.05294, 2018.

[117] Juan Bermejo-Vega, Nicolas Delfosse, Dan E Browne, Cihan Okay, and Robert Raussendorf. Contextuality as a resource for models of quantum computation with qubits. Physical Review Letters, 119(12):120505, 2017.

[118] Lorenzo Catani and Dan E Browne. State-injection schemes of quantum computation in Spekkens' toy theory. Physical Review A, 98(5):052108, 2018.

[119] Atul Singh Arora, Kishor Bharti, et al. Revisiting the admissibility of non-contextual hidden variable models in quantum mechanics. Physics Letters A, 383(9):833–837, 2019.

[120] Mark Howard, Joel Wallman, Victor Veitch, and Joseph Emerson. Contextuality supplies the 'magic' for quantum computation. Nature, 510:351, Jun 2014.

[121] Samson Abramsky and Adam Brandenburger. The sheaf-theoretic structure of non-locality and contextuality. New Journal of Physics, 13(11):113036, 2011.

[122] Antonio Acín, Tobias Fritz, Anthony Leverrier, and Ana Belén Sainz. A combinatorial approach to nonlocality and contextuality. Communications in Mathematical Physics, 334(2):533–628, 2015.

[123] Ehtibar N Dzhafarov, Víctor H Cervantes, and Janne V Kujala. Contextuality in canonical systems of random variables. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 375(2106):20160389, 2017.

[124] Ehtibar N Dzhafarov. On joint distributions, counterfactual values and hidden variables in understanding contextuality. Philosophical Transactions of the Royal Society A, 377(2157):20190144, 2019.

[125] Janne V Kujala and Ehtibar N Dzhafarov. Measures of contextuality and non-contextuality. Philosophical Transactions of the Royal Society A, 377(2157):20190149, 2019.

[126] Robert W Spekkens. Contextuality for preparations, transformations, and unsharp measurements. Physical Review A, 71(5):052108, 2005.

[127] S. Kochen and E. P. Specker. The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics, 17:59–87, 1967.

[128] Marco Túlio Quintino, Tamás Vértesi, Daniel Cavalcanti, Remigiusz Augusiak, Maciej Demianowicz, Antonio Acín, and Nicolas Brunner. Inequivalence of entanglement, steering, and Bell nonlocality for general measurements. Physical Review A, 92(3):032107, 2015.


[129] Howard M Wiseman, Steve James Jones, and Andrew C Doherty. Steering, entanglement, nonlocality, and the Einstein-Podolsky-Rosen paradox. Physical Review Letters, 98(14):140402, 2007.

[130] Erwin Schrödinger. Discussion of probability relations between separated systems. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 31, pages 555–563. Cambridge University Press, 1935.

[131] Samuraí Gomes de Aguiar Brito, B Amaral, and R Chaves. Quantifying Bell nonlocality with the trace distance. Physical Review A, 97(2):022111, 2018.

[132] Armin Tavakoli, Paul Skrzypczyk, Daniel Cavalcanti, and Antonio Acín. Nonlocal correlations in the star-network configuration. Physical Review A, 90(6):062109, 2014.

[133] Cyril Branciard, Nicolas Gisin, and Stefano Pironio. Characterizing the nonlocal correlations created via entanglement swapping. Physical Review Letters, 104(17):170401, 2010.

[134] Marc-Olivier Renou, Elisa Bäumer, Sadra Boreiri, Nicolas Brunner, Nicolas Gisin, and Salman Beigi. Genuine quantum nonlocality in the triangle network. Physical Review Letters, 123(14):140401, 2019.

[135] Nicolas Gisin. Entanglement 25 years after quantum teleportation: testing joint measurements in quantum networks. Entropy, 21(3):325, 2019.

[136] Hiroki Saito and Masaya Kato. Machine learning technique to find quantum many-body ground states of bosons on a lattice. Journal of the Physical Society of Japan, 87(1):014001, 2018.

[137] Xun Gao and Lu-Ming Duan. Efficient representation of quantum many-body states with deep neural networks. Nature Communications, 8(1):1–6, 2017.

[138] Ulrich Schollwöck. The density-matrix renormalization group. Reviews of Modern Physics, 77(1):259, 2005.

[139] Frank Verstraete, Valentin Murg, and J Ignacio Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008.

[140] Guifré Vidal. Class of quantum many-body states that can be efficiently simulated. Physical Review Letters, 101(11):110501, 2008.

[141] Yue-Chi Ma and Man-Hong Yung. Transforming Bell's inequalities into state classifiers with machine learning. npj Quantum Information, 4(1):1–10, 2018.

[142] Reinhard F Werner. Quantum states with Einstein-Podolsky-Rosen correlations admitting a hidden-variable model. Physical Review A, 40(8):4277, 1989.

[143] Cillian Harney, Stefano Pirandola, Alessandro Ferraro, and Mauro Paternostro. Entanglement classification via neural network quantum states. arXiv preprint arXiv:1912.13207, 2019.

[144] Sirui Lu, Shilin Huang, Keren Li, Jun Li, Jianxin Chen, Dawei Lu, Zhengfeng Ji, Yi Shen, Duanlu Zhou, and Bei Zeng. Separability-entanglement classifier via machine learning. Physical Review A, 98(1):012315, 2018.

[145] Caio BD Goes, Askery Canabarro, Eduardo I Duzzioni, and Thiago O Maciel. Automated machine learning can classify bound entangled states with tomograms. arXiv preprint arXiv:2001.08118, 2020.

[146] Steven Weinstein. Neural networks as “hidden” variable models for quantum systems. arXiv preprint arXiv:1807.03910, 2018.

[147] Steven Weinstein. Nonlocality without nonlocality. Foundations of Physics, 39(8):921–936, 2009.

[148] Maria Schuld and Francesco Petruccione. Supervised learning with quantum computers, volume 17. Springer, 2018.

[149] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An introduction to quantum machine learning. Contemporary Physics, 56(2):172–185, 2015.

[150] Vedran Dunjko and Peter Wittek. A non-review of quantum machine learning: trends and explorations. Quantum Views, 4:32, 2020.

[151] Alexey A Melnikov, Hendrik Poulsen Nautrup, Mario Krenn, Vedran Dunjko, Markus Tiersch, Anton Zeilinger, and Hans J Briegel. Active learning machine learns to create new quantum experiments. Proceedings of the National Academy of Sciences, 115(6):1221–1226, 2018.

[152] Mario Krenn, Mehul Malik, Robert Fickler, Radek Lapkiewicz, and Anton Zeilinger. Automated search for new quantum experiments. Physical Review Letters, 116(9):090405, 2016.

[153] Xuemei Gu, Mario Krenn, Manuel Erhard, and Anton Zeilinger. Gouy phase radial mode sorter for light: Concepts and experiments. Physical Review Letters, 120(10):103601, 2018.

[154] Feiran Wang, Manuel Erhard, Amin Babazadeh, Mehul Malik, Mario Krenn, and Anton Zeilinger. Generation of the complete four-dimensional Bell basis. Optica, 4(12):1462–1467, 2017.

[155] Amin Babazadeh, Manuel Erhard, Feiran Wang, Mehul Malik, Rahman Nouroozi, Mario Krenn, and Anton Zeilinger. High-dimensional single-photon quantum gates: concepts and experiments. Physical Review Letters, 119(18):180510, 2017.

[156] Florian Schlederer, Mario Krenn, Robert Fickler, Mehul Malik, and Anton Zeilinger. Cyclic transformation of orbital angular momentum modes. New Journal of Physics, 18(4):043019, 2016.

[157] Manuel Erhard, Mehul Malik, Mario Krenn, and Anton Zeilinger. Experimental Greenberger–Horne–Zeilinger entanglement beyond qubits. Nature Photonics, 12(12):759–764, 2018.

[158] Mehul Malik, Manuel Erhard, Marcus Huber, Mario Krenn, Robert Fickler, and Anton Zeilinger. Multi-photon entanglement in high dimensions. Nature Photonics, 10(4):248, 2016.

[159] Mladen Pavicic, Mordecai Waegell, Norman D Megill, and PK Aravind. Automated generation of Kochen-Specker sets. Scientific Reports, 9(1):1–11, 2019.

[160] Jordi Tura, Remigiusz Augusiak, Ana Belén Sainz, Tamas Vértesi, Maciej Lewenstein, and Antonio Acín. Detecting nonlocality in many-body quantum states. Science, 344(6189):1256–1258, 2014.

[161] Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015.

[162] Ll Masanes. Extremal quantum correlations for N parties with two dichotomic observables per site. arXiv preprint quant-ph/0512100, 2005.

[163] Le Phuc Thinh. Computing quantum Bell inequalities. arXiv preprint arXiv:1909.05472, 2019.

[164] Davide Poderini, Rafael Chaves, Iris Agresti, Gonzalo Carvacho, and Fabio Sciarrino. Exclusivity graph approach to instrumental inequalities. arXiv preprint arXiv:1909.09120, 2019.

[165] Ayse Pinar Saygin, Ilyas Cicekli, and Varol Akman. Turing test: 50 years later. Minds and Machines, 10(4):463–518, 2000.