Quantum computing models for artificial neural networks

Stefano Mangini,1, 2, ∗ Francesco Tacchino,3 Dario Gerace,1 Daniele Bajoni,4 and Chiara Macchiavello1, 2, 5

1 Dipartimento di Fisica, Università di Pavia, Via Bassi 6, I-27100, Pavia, Italy
2 INFN Sezione di Pavia, Via Bassi 6, I-27100, Pavia, Italy
3 IBM Quantum, IBM Research - Zurich, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
4 Dipartimento di Ingegneria Industriale e dell'Informazione, Università di Pavia, Via Ferrata 1, 27100, Pavia, Italy
5 CNR-INO - Largo E. Fermi 6, I-50125, Firenze, Italy

(Dated: May 21, 2021)

Neural networks are computing models that have been leading progress in Machine Learning (ML) and Artificial Intelligence (AI) applications. In parallel, the first small scale quantum computing devices have become available in recent years, paving the way for the development of a new paradigm in information processing. Here we give an overview of the most recent proposals aimed at bringing together these ongoing revolutions, and particularly at implementing the key functionalities of artificial neural networks on quantum architectures. We highlight the exciting perspectives in this context, and discuss the potential role of near term quantum hardware in the quest for quantum machine learning advantage.

I. INTRODUCTION

Artificial Intelligence (AI) broadly refers to a wide set of algorithms that have shown in the last decade impressive and sometimes surprising successes in the analysis of large datasets [1]. These algorithms are often based on a computational model called Neural Network (NN), which belongs to the family of differential programming techniques.

The simplest realization of a NN is in a “feedforward” configuration, mathematically described as a concatenated application of affine transformations and element-wise nonlinearities, f_j(·) = σ_j(w_j · + b_j), where σ_j is a nonlinear activation function, w_j is a real matrix containing the so-called weights to be optimized, and b_j is a bias vector, which is also to be optimized. The action of the NN is defined through repeated applications of this function to an input vector, x, leading to an output ŷ = f_L ◦ f_{L−1} ◦ · · · ◦ f_1(x), where L denotes the number of layers in the NN architecture, as schematically shown in Fig. 1(a). The success of NNs as a computational model is primarily due to their trainability [2], i.e. the fact that the entries of the weight matrices can be adjusted to learn a given target task. The most common use case is supervised learning, where the algorithm is asked to reproduce the associations between some inputs x_i and the desired correct outputs (or labels) y_i. If properly trained, the NN should also be able to predict well on new data, i.e., data that were not used during training. This crucial property is called generalization, and it represents the key aspect of learning algorithms, which ultimately distinguishes them from standard fitting techniques. Generalization encapsulates the idea that learning models discover hidden patterns and relations in the data, instead of applying pre-programmed procedures [1, 3, 4].
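As a concrete illustration of the composition ŷ = f_L ◦ · · · ◦ f_1(x), the following minimal sketch evaluates a small feedforward network with random, untrained parameters (the layer sizes, the tanh activation and the use of NumPy are illustrative choices, not prescriptions from the text):

    import numpy as np

    def feedforward(x, weights, biases, sigma=np.tanh):
        """Evaluate y_hat = f_L(...f_1(x)) with f_j(v) = sigma(W_j v + b_j)."""
        v = x
        for W, b in zip(weights, biases):
            v = sigma(W @ v + b)
        return v

    # toy example: a 3 -> 4 -> 2 network with random (untrained) parameters
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
    biases = [rng.normal(size=4), rng.normal(size=2)]
    print(feedforward(rng.normal(size=3), weights, biases))

Training would then amount to adjusting the entries of weights and biases so that the outputs match the desired labels.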

[email protected]

The size of the data and features elaborated by classical NNs, and the number of parameters defining their structure, have been steadily increasing over recent years. These advancements have come at the expense of a dramatic rise in the demand for energy and computational power [5] from the field of AI. In this regard, quantum computers may offer a viable route forward to continue such growth in the dimension of the treatable problems and, in parallel, enable completely new functionalities. In fact, quantum information is elaborated by exploiting superpositions in vector spaces (the so-called Hilbert spaces) [6], and quantum processors are expected to soon surpass the computational capabilities of classical supercomputers. Moreover, genuinely quantum correlations represent by themselves a resource to be considered in building new forms of ML applications.

With respect to classical computational paradigms, quantum computing comes with its own peculiarities. One of these is that quantum computation is intrinsically linear. While this property could ensure, e.g., that an increase in computational power does not come with a parallel increase in energy costs, it also means that implementing the nonlinear functions forming the backbone of NNs is a non-trivial task. Also, information cannot be copied with arbitrary precision in quantum algorithms, meaning that tasks such as repeatedly addressing a variable are impossible when the latter is represented by a quantum state. As a consequence, classical NN algorithms cannot be simply ported to quantum platforms.

In this article, we briefly review the main directions that have been taken through the years to develop artificial NNs for quantum computing environments, the strategies used to overcome the possible difficulties, and the future prospects for the field.


FIG. 1. Schematic representation of classical and quantum learning models. (a) Classical feedforward NNs process information through consecutive layers of nodes, or neurons. (b) A quantum neural network in the form of a variational quantum circuit: the data re-uploading procedure (S_i(x)) is explicitly illustrated, and non-parametrized gates (W_i) of Eq. 1 are incorporated inside the variational unitaries (U_i(θ_i)). (c) A possible model for a quantum perceptron [7], which uses operations U_i and U_w to load input data and perform multiplication by weights, and a final measurement is used to emulate the neuron activation function. (d) Quantum Kernel Methods map input data to quantum states [8], and then evaluate their inner product 〈φ(x)|φ(x′)〉 to build the kernel matrix K(x, x′). (e) A Quantum Convolutional NN [9] consists of a repeated application of convolutional layers (CL) and pooling layers (PL), ultimately followed by a fully-connected layer (FCL), i.e., a general unitary on all remaining qubits applied before a measurement (of an observable O). (f) In a Dissipative QNN [10], qubits in one layer are coupled to ancillary qubits from the following layer in order to load the result of the computation, while previous layers are discarded (i.e., dissipated).

II. QUANTUM NEURAL NETWORKS

In gate-based quantum computers, such as those realized with superconducting qubits, ion traps, spins in semiconductors, or photonic circuits [11], a quantum computation is fully described by its quantum circuit representation, i.e., the arrangement of subsequent operations (gates) applied to the qubits in the quantum processing unit. These gates can be parametrized by continuous variables, like a Pauli-X rotation, R_x(θ) = e^{iθX/2}. A quantum circuit that makes use of parametrized gates is known as a Parametrized Quantum Circuit (PQC) [12]. Variational Quantum Algorithms (VQAs) [13–15] are a large class of PQC-based protocols whose goal is to properly tune the parameters in order to solve a target task. For example, the task might be the minimization of the expectation value of an observable 〈O〉 = 〈0|U(θ)†OU(θ)|0〉, where U(θ) is the unitary operator representing the quantum circuit parametrized by angles θ = (θ_1, θ_2, . . .) and acting on a quantum register initialized in a reference state |0〉 = |0〉^⊗n. VQAs are made of three main constituents: data encoding, a variational ansatz, and final measurements with a classical update of the parameters. The first stage accounts for loading input data (classical or quantum) into the quantum circuit using quantum operations that depend on the given input; the second stage consists of a variational circuit with a specific structure, called ansatz, which often consists of repeated applications of similar operations (layers). Finally, a measurement on the qubits is performed to infer some relevant information about the system. Given the outcome, the parameters in the circuit are updated through a classical optimizer to minimize the cost function defining the problem. Such algorithms generally require limited resources to be executed on real quantum devices, thus they are expected to take full advantage of current noisy quantum hardware [16, 17].
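The following sketch illustrates these stages in the simplest possible setting, a single qubit simulated with NumPy: a chain of rotations R_x(θ) = e^{iθX/2} plays the role of the ansatz, the cost is the expectation value 〈0|U(θ)†ZU(θ)|0〉, and a finite-difference gradient descent acts as the classical optimizer (the gate choice, observable and optimizer are illustrative assumptions, not prescriptions from the text):

    import numpy as np

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def rx(theta):
        # Pauli-X rotation, written as exp(i*theta*X/2) to match the convention above
        return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * X

    def cost(thetas):
        """<O> = <0| U(theta)^dagger O U(theta) |0> with O = Z and U a chain of rotations."""
        state = np.array([1, 0], dtype=complex)           # reference state |0>
        for t in thetas:                                  # parametrized circuit U(theta)
            state = rx(t) @ state
        return np.real(state.conj() @ Z @ state)          # measured observable

    # classical update loop: finite-difference gradient descent on the cost
    thetas, eta, eps = np.array([0.1, -0.2, 0.3]), 0.2, 1e-4
    for step in range(200):
        grad = np.array([(cost(thetas + eps * np.eye(3)[k]) -
                          cost(thetas - eps * np.eye(3)[k])) / (2 * eps) for k in range(3)])
        thetas = thetas - eta * grad
    print(cost(thetas))   # approaches -1, the minimum of <Z>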

The definitions above highlight many similarities between PQCs and classical NNs. In both models, information is processed through a sequence of parametrized layers, which are iteratively updated using a classical optimizer. The key difference lies in the way information is processed. Quantum computers promise to possibly achieve some form of computational advantage, i.e., speedups or better performances [18], due to inherently quantum effects not available in classical learning scenarios¹. While many different definitions of Quantum Neural Networks (QNN) are possible, the most general and currently accepted one is that of a parametrized quantum circuit whose structure is directly inspired by classical neural networks. Thus, we formally define a QNN as a PQC whose variational ansatz contains multiple repetitions of self-similar layers of operations, that is:

U_QNN(θ) = ∏_{i=L}^{1} U_i(θ_i) W_i = U_L(θ_L) W_L · · · U_1(θ_1) W_1 ,        (1)

where U_i(θ_i) are variational gates, W_i are (typically entangling) fixed operations that do not depend on any parameter, and L is the number of layers (see Fig. 1(b) for a schematic picture). Depending on the problem under investigation and the connectivity of the quantum device (i.e., related to the native gate set available [11]), various realizations exist for both the entangling structure and the parametrized gates [15]. Note that this definition almost coincides with that of VQAs, and in fact the terms VQAs and QNNs are often used interchangeably.

¹ Earlier works on quantum generalizations of classical machine learning focused on the use of quantum computers to perform linear algebra exponentially faster compared to classical computers [18]. However, due to their unsuitability for NISQ devices and comparable performances with new classical quantum-inspired methods [19–21], interest in this field is declining, with most of the research now redirected towards parametrized quantum circuits as quantum counterparts of classical learning models.

While rather general, such a structure in its naive realization may be poorly expressive, as a growing body of research has recently highlighted, in the sense that it can only represent a few functions — actually, mostly sine functions — of the input data [22–26]. A milestone of classical learning theory, the Universal Approximation Theorem [27], guarantees that classical NNs are powerful enough to approximate any function f(x) of the input data x. Interestingly, it was shown that a similar result also holds for QNNs, provided that input data are loaded into the quantum circuit multiple times throughout the computation. Intuitively, such data re-uploading can be seen as a necessary feature to counterbalance the effect of the no-cloning theorem of quantum mechanics [6]. In fact, while in classical NNs the output of a neuron is copied and transferred to every neuron in the following layer, this is impossible in the quantum regime, thus it is necessary to explicitly introduce some degree of classical redundancy. As a rule of thumb, the more often data are re-uploaded during the computation, the more expressive the quantum learning model becomes, as it is able to represent higher order features of the data. Thus, for a more effective construction, each layer in Eq. 1 should be replaced with U_iW_i → S(x)U_iW_i, where S(x) denotes the data encoding procedure mapping input x to its corresponding quantum state.
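A minimal single-qubit example in the spirit of the data re-uploading construction [24] is sketched below; the choice of R_y for the encoding S(x), the R_z–R_y form of the trainable blocks and the 〈Z〉 readout are illustrative assumptions rather than the specific models of Refs. [22–26]:

    import numpy as np

    def ry(a):
        return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                         [np.sin(a / 2),  np.cos(a / 2)]], dtype=complex)

    def rz(a):
        return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

    def qnn_output(x, thetas):
        """Each layer re-uploads the input via S(x) = Ry(x), then applies a
        trainable block U(theta) = Ry(t2) Rz(t1); the output is <Z>."""
        state = np.array([1, 0], dtype=complex)
        for t1, t2 in thetas:
            state = ry(x) @ state              # S(x): data encoding, repeated every layer
            state = ry(t2) @ rz(t1) @ state    # U(theta): variational block
        return np.abs(state[0])**2 - np.abs(state[1])**2

    thetas = [(0.3, -1.2), (0.7, 0.4), (-0.5, 1.1)]   # three layers, untrained
    print(qnn_output(0.8, thetas))

With more layers, the output contains higher harmonics of the input x, which is precisely the increase in expressivity discussed above [23].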

This formulation still leaves a lot of freedom in choosing how the encoding and the variational circuit are implemented [28, 29]. However, as happens in the classical domain, different architectures for QNNs also exist, and in the following we give a brief overview of some of those that have received the most attention.

1. Quantum Perceptron Models

Early works on quantum models for single artificial neurons (known as perceptrons) were focused on how to reproduce the non-linearities of classical perceptrons with quantum systems, which, on the contrary, obey linear unitary evolutions [30–33]. In more recent proposals (e.g., Fig. 1(c)), quantum perceptrons were shown to be capable of carrying out simple pattern recognition tasks for binary [7] and gray scale [34] images, possibly employing variational unsampling techniques for training and to find adaptive quantum circuit implementations [35]. By connecting multiple quantum neurons, models for quantum artificial neural networks were also realized in proof-of-principle experiments on superconducting devices [36].

2. Kernel methods

Directly inspired by their classical counterparts, the main idea of kernel methods is to map input data into a higher dimensional space, and leverage this transformation to perform classification tasks otherwise unfeasible in the original, low dimensional representation. Typically, any pair of data vectors, x, x′ ∈ X, is mapped into quantum states, |φ(x)〉, |φ(x′)〉, by means of an encoding mapping φ : X → H, where H denotes the Hilbert space of the quantum register. The inner product 〈φ(x)|φ(x′)〉, evaluated in an exponentially large Hilbert space, defines a similarity measure that can be used for classification purposes. The matrix K_ij = 〈φ(x_i)|φ(x_j)〉, constructed over the training dataset, is called the kernel, and can be used along with classical methods such as Support Vector Machines (SVMs) to carry out classification [8, 37] or regression tasks, as schematically displayed in Fig. 1(d). A particularly promising aspect of such quantum SVMs is the possibility of constructing kernel functions that are hard to compute classically, thus potentially leading to quantum advantage in classification.
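A toy version of this pipeline can be sketched as follows; the product-state encoding φ and the two-feature inputs are arbitrary illustrative choices, and on hardware the entries of K are estimated from measured state overlaps rather than computed exactly:

    import numpy as np

    def feature_state(x):
        """Toy feature map phi: encode a 2-dimensional input into the 2-qubit
        product state (Ry(x1)|0>) ⊗ (Ry(x2)|0>)."""
        def ry0(a):
            return np.array([np.cos(a / 2), np.sin(a / 2)], dtype=complex)
        return np.kron(ry0(x[0]), ry0(x[1]))

    def kernel_matrix(data):
        """K_ij = <phi(x_i)|phi(x_j)>, built over the training set."""
        states = [feature_state(x) for x in data]
        K = np.zeros((len(data), len(data)))
        for i, si in enumerate(states):
            for j, sj in enumerate(states):
                K[i, j] = np.real(np.vdot(si, sj))
        return K

    data = np.array([[0.1, 0.5], [1.2, -0.3], [2.0, 0.8]])
    print(kernel_matrix(data))   # symmetric, unit diagonal; can be fed to a classical SVM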

3. Generative Models

A generative model is a probabilistic algorithm whose goal is to reproduce an unknown probability distribution p_X over a space X, given a training set T = {x_i | x_i ∼ p_X, i = 1, . . . , M}. A common classical approach to this task leverages probabilistic NNs called Boltzmann Machines (BMs) [2, 38], which are physically inspired methods resembling Ising models. Upon changing classical nodes with qubits, and substituting the energy function with a quantum Hamiltonian, it is possible to define Quantum Boltzmann Machines (QBMs) [39–41], dedicated to the generative learning of quantum and classical distributions. In parallel, a novel and surprisingly effective classical tool for generative modeling is represented by Generative Adversarial Networks (GANs). Here, the burden of the generative process is offloaded onto two separate neural networks called the generator and the discriminator, respectively, which are programmed to compete against each other. Through a careful and iterative learning routine, the generator learns to produce high-quality new examples, eventually fooling the discriminator. A straightforward translation to the quantum domain led to the formulation of Quantum GAN (QGAN) models [42–44], where the generator and discriminator consist of QNNs. Even if not inspired by any classical neural network model, it is also worth mentioning Born machines [45], where the probability of generating new data follows the inherently probabilistic nature of quantum mechanics represented by Born's rule.
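The defining feature of a Born machine [45], sampling according to p(x) = |〈x|ψ〉|², fits in a few lines; here |ψ〉 is simply a random 3-qubit state standing in for a variationally prepared one:

    import numpy as np

    rng = np.random.default_rng(1)

    # stand-in for a variationally prepared 3-qubit state |psi>
    psi = rng.normal(size=8) + 1j * rng.normal(size=8)
    psi /= np.linalg.norm(psi)

    # Born's rule: bitstring x is generated with probability p(x) = |<x|psi>|^2
    probs = np.abs(psi)**2
    probs /= probs.sum()
    samples = rng.choice(8, size=5, p=probs)
    print([format(int(s), "03b") for s in samples])   # generated "data", e.g. ['101', '010', ...]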

4. Quantum Convolutional Neural Networks

As the name suggests, these models are inspired by the corresponding classical counterpart, nowadays ubiquitous in the field of image processing. In these networks, schematically shown in Fig. 1(e), inputs are processed through a repeated series of so-called convolutional and pooling layers. The former applies a convolution of the input — i.e., a quasi-local transformation in a translationally invariant manner — to extract some relevant information, while the latter compresses such information into a lower dimensional representation. After many iterations, the network outputs a representation that is informative and low dimensional enough to be analyzed with standard techniques, such as fully connected feedforward NNs. Similarly, in Quantum Convolutional NNs [9, 46] a quantum state first undergoes a convolution layer, consisting of a parametrized unitary operation acting on individual subsets of the qubits, and then a pooling procedure obtained by measuring some of the qubits. This procedure is repeated until only a few qubits are left, where all relevant information is stored. This method proved particularly useful for genuinely quantum tasks such as quantum phase recognition and quantum error correction. Moreover, the use of only logarithmically many parameters with respect to the number of qubits is particularly important to achieve efficient training and effective implementations on near-term devices.
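A toy state-vector sketch of this alternation on four qubits is given below; the unitaries are random rather than trained, and pooling is implemented as a projective measurement of one qubit followed by discarding it, which is one simple reading of the measurement-based pooling described above (all structural choices here are illustrative, not the circuits of Refs. [9, 46]):

    import numpy as np

    rng = np.random.default_rng(0)

    def random_two_qubit_unitary():
        # Haar-random 4x4 unitary from the QR decomposition of a complex Gaussian matrix
        z = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

    def apply_two_qubit(psi, u, q0, q1):
        """Apply a 4x4 unitary u to qubits (q0, q1) of a state psi of shape (2,)*n."""
        psi = np.moveaxis(psi, (q0, q1), (0, 1))
        shape = psi.shape
        psi = (u @ psi.reshape(4, -1)).reshape(shape)
        return np.moveaxis(psi, (0, 1), (q0, q1))

    def pool(psi, q):
        """Pooling: measure qubit q in the computational basis and discard it."""
        psi = np.moveaxis(psi, q, 0)
        probs = np.array([np.sum(np.abs(psi[b])**2) for b in (0, 1)])
        outcome = rng.choice(2, p=probs / probs.sum())
        return psi[outcome] / np.sqrt(probs[outcome])

    # toy layer on 4 qubits: convolutions on pairs (0,1) and (2,3), then pool qubits 1 and 3
    psi = np.zeros((2, 2, 2, 2), dtype=complex)
    psi[0, 0, 0, 0] = 1.0                              # initial state |0000>
    psi = apply_two_qubit(psi, random_two_qubit_unitary(), 0, 1)
    psi = apply_two_qubit(psi, random_two_qubit_unitary(), 2, 3)
    psi = pool(psi, 1)                                 # 3 qubits remain
    psi = pool(psi, 2)                                 # original qubit 3; 2 qubits remain
    print(psi.shape)                                   # (2, 2): ready for a final fully-connected block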

5. Quantum Dissipative Neural Networks

Dissipative QNNs, as proposed in [10], consist of models where each qubit represents a node, and edges connecting qubits in different layers represent general unitary operations acting on them. The use of the term dissipative is related to the progressive discarding of layers of qubits after a given layer has interacted with the following one (see, e.g., Fig. 1(f) for a schematic illustration). This model can implement and learn general quantum transformations, carrying out a universal quantum computation. Endowed with a backpropagation-like training algorithm, it represents a possible route towards the realization of quantum analogues of deep neural networks for analyzing quantum data.

III. TRAINABILITY OF QNN

For decades, AI methods remained merely academic tools, due to the lack of computational power and proper training routines, which made the use of learning algorithms practically unfeasible on available devices. This underlines a pivotal problem of all learning models, i.e., the ability to train them efficiently.

1. Barren plateaus

Unfortunately, quantum machine learning models were also shown to suffer from serious trainability issues, which are commonly referred to as barren plateaus [47]. In particular, using a parametrization as in Eq. (1), and assuming that such an architecture forms a so-called unitary 2-design², it can be shown that the derivative of any cost function of the form C(θ) = 〈0|U_QNN(θ)†OU_QNN(θ)|0〉 with respect to any of its parameters θ_k yields 〈∂_kC〉 = 0 and Var[∂_kC] ≈ 2^−n, where n is the number of qubits. This means that, on average, there will be no clear direction of optimization in the cost function landscape upon random initialization of the parameters in a large enough circuit, the landscape being essentially flat everywhere, except in close proximity to the minima. Moreover, barren plateaus were shown to depend strongly on the nature of the cost function driving the optimization routine [48]. In particular, while the use of a global cost function always hinders the trainability of the model (independently of the depth of the quantum circuit), a local cost function allows for efficient trainability for circuit depths of order O(log n). Besides, entanglement also plays a major role in hindering the trainability of QNNs. In fact, it was shown that whenever the objective cost function is evaluated only on a small subset of the available qubits (visible units), with the remaining ones being traced out (hidden units), a barren plateau in the optimization landscape may occur [49]. This arises from the presence of entanglement, which tends to spread information across the whole network instead of its single parts. Moreover, the presence of hardware noise during the execution of the quantum circuit can itself be the reason for the emergence of noise-induced barren plateaus [50].

² Roughly speaking, it is sufficiently random that sampling over its distribution matches the Haar distribution — a generalization of the uniform distribution in the space of unitary matrices — up to the second moment.
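As a rough numerical illustration of the concentration statement above (〈∂_kC〉 = 0, Var[∂_kC] ≈ 2^−n), one can sample Haar-random circuits, as a stand-in for a deep 2-design ansatz, and watch the spread of the cost itself shrink with the same 2^−n scaling; this is only an illustrative NumPy check, not the derivation of Ref. [47]:

    import numpy as np

    rng = np.random.default_rng(0)

    def haar_unitary(dim):
        """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
        z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
        q, r = np.linalg.qr(z)
        return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

    def cost_samples(n_qubits, n_samples=500):
        """Samples of C = <0|U^dagger O U|0> with O = Z on the first qubit and U Haar-random."""
        dim = 2 ** n_qubits
        z_first = np.array([1.0 if (b >> (n_qubits - 1)) == 0 else -1.0 for b in range(dim)])
        costs = []
        for _ in range(n_samples):
            state = haar_unitary(dim)[:, 0]        # U|0...0> is the first column of U
            costs.append(np.sum(z_first * np.abs(state) ** 2))
        return np.array(costs)

    for n in (2, 4, 6, 8):
        c = cost_samples(n)
        print(f"n = {n}: Var[C] = {c.var():.2e}   (compare 2^-n = {2.0 ** -n:.2e})")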

It is worth noticing that most of these results stem from strong assumptions on the shape of the variational ansatz, for example its ability to approximate an exact 2-design. While these assumptions are often well matched by proposed quantum learning models, QNNs should be analyzed case by case in order to make precise statements about their trainability. As an example, dissipative QNNs were shown to suffer from barren plateaus [51] when global unitaries are used, while convolutional QNNs seem to be immune to the barren plateau phenomenon [52]. In general terms, however, recent results indicate that the trainability of quantum models is closely related to their expressivity, as measured by their distance from being an exact 2-design: highly expressive ansatzes generally exhibit flatter landscapes and will be harder to train [53]. Finally, it was recently argued that the training of any VQA is by itself an NP-hard problem [54], thus being intrinsically difficult to tackle. Despite their intrinsic theoretical value, these results should not represent a severe threat, as similar conclusions also hold in general for classical NNs, particularly in worst-case analyses, and did not prevent their successful application in many specific use cases.

2. Avoiding plateaus

Several strategies have been proposed to avoid, or at least mitigate, the emergence of barren plateaus. A possible idea is to initialize and train parameters in batches, in a procedure known as layerwise learning [55]. Similarly, another approach is to reduce the effective depth of the circuit by randomly initializing only a subset of the total parameters, with the others chosen such that the overall circuit implements the identity operation [56]. Introducing correlations between the parameters in multiple layers of the QNN, thus reducing the overall dimensionality of the parameter space [57], has also been proposed. Along a different direction, leveraging classical recurrent NNs to find good parameter initialization heuristics, such that the network starts close to a minimum, has also proven effective [58]. Finally, as previously mentioned, it is worth recalling that the choice of a local cost function and of a specific entanglement scaling between hidden and visible units is of key importance to ultimately avoid barren plateaus [48].
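One variant of the identity-block idea [56] can be illustrated in a single-qubit toy setting: draw random angles for the first half of the layers and choose the second half so that, at initialization, the whole circuit multiplies out to the identity (the ZYZ layer structure below is an arbitrary illustrative choice, not the construction of Ref. [56]):

    import numpy as np

    rng = np.random.default_rng(0)

    def rz(a):
        return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

    def ry(a):
        return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                         [np.sin(a / 2),  np.cos(a / 2)]], dtype=complex)

    def layer(t1, t2, t3):
        """Single-qubit ZYZ block: Rz(t1), then Ry(t2), then Rz(t3)."""
        return rz(t3) @ ry(t2) @ rz(t1)

    def circuit(params):
        u = np.eye(2, dtype=complex)
        for p in params:
            u = layer(*p) @ u
        return u

    # first half of the layers: random parameters
    first_half = [tuple(rng.uniform(-np.pi, np.pi, size=3)) for _ in range(4)]
    # second half: each block undoes one block of the first half (reversed order,
    # negated and reversed angles), so the full circuit starts as the identity
    second_half = [(-t3, -t2, -t1) for (t1, t2, t3) in reversed(first_half)]
    print(np.round(circuit(first_half + second_half), 6))   # ≈ 2x2 identity matrix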

3. Optimization routines

The most common training routine in classical NNs is the backpropagation algorithm [1], which combines the chain rule for derivatives and stored values from intermediate layers to efficiently compute gradients. However, such a strategy cannot be straightforwardly generalized to the quantum case, because intermediate values from a quantum computation can only be accessed via measurements, which disturb the relevant quantum states [59]. For this reason, the typical strategy is to use classical optimizers leveraging gradient-based or gradient-free numerical methods. However, there exist scenarios where the quantum nature of the optimization can be taken into account, leading to strategies tailored to the quantum domain like the parameter shift rule [25, 59], the Quantum Natural Gradient [60], and closed update formulas [26]. Finally, while a fully coherent quantum update of the parameters is certainly appealing [61], this remains out of reach for near-term hardware, which is limited in the number of available qubits and circuit depth [11].
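For a gate generated by a Pauli operator, the parameter shift rule gives the exact derivative of the cost as the difference of two circuit evaluations at shifted parameter values, ∂C/∂θ = [C(θ + π/2) − C(θ − π/2)]/2 [25, 59]. A one-qubit check of the formula, with the gate and observable chosen purely for illustration:

    import numpy as np

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def rx(theta):
        # same convention as in the text, Rx(theta) = exp(i*theta*X/2)
        return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * X

    def cost(theta):
        """C(theta) = <0| Rx(theta)^dagger Z Rx(theta) |0> = cos(theta)."""
        state = rx(theta) @ np.array([1, 0], dtype=complex)
        return np.real(state.conj() @ Z @ state)

    def parameter_shift_grad(theta):
        """Exact derivative from two shifted circuit evaluations (no finite differences)."""
        return 0.5 * (cost(theta + np.pi / 2) - cost(theta - np.pi / 2))

    theta = 0.7
    print(parameter_shift_grad(theta), -np.sin(theta))   # both equal dC/dtheta = -sin(theta)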

IV. QUANTUM ADVANTAGE

Quantum computers are expected to become more powerful than their classical counterparts on many specific applications [62–64], and particularly in sampling from complex probability distributions — a feature particularly relevant for generative modeling applications [65, 66]. It is therefore natural to ask whether this hierarchy also applies to learning models. While a positive answer is widely believed to be true, clear indications on how to achieve such a quantum advantage are still under study, with only a few results in this direction.

1. Power of Quantum Neural Networks

In the previous sections, we used the term expressivity quite loosely. In particular, we used it with two different meanings: the first to indicate the ability of QNNs to approximate any function of the inputs [23], the second to indicate their distance from being an exact 2-design, i.e. the ability to span the space of quantum states uniformly [53]. Each definition conveys a particular facet of an intuitive sense of power, highlighting the need for a unifying measure of such power, yet to be discovered. Nevertheless, some initial results towards a thorough analysis of the capacity (or power) of quantum neural networks have recently been put forward [67–69].

Starting from the Vapnik–Chervonenkis (VC) dimension, which is a classical measure of the richness of a learning model related to the maximum number of independent classifications that the model can implement, a universal metric of capacity, related to the maximum information that can be stored in the parameters of a learning model, be it classical or quantum, was defined [67]. Based on this definition, it was claimed that quantum learning models have the same overall capacity as any classical model having the same number of weights. However, as argued in Ref. [68], measures of capacity like the VC dimension can become vacuous and difficult — if not completely impractical — to measure, and therefore cannot be used as a general faithful measure of the actual power of learning models. For this reason, the use of the effective dimension as a measure of power was proposed, motivated by information-theoretic considerations. This measure can be linked to generalization bounds of the learning models, leading to the result that quantum neural networks can be more expressive and more efficient during training than their classical counterparts, thanks to a more evenly spread spectrum of the Fisher information. Some evidence was provided through the analysis of two particular QNN instances: one having a trivial input data encoding, and the other using the encoding scheme proposed in [8], which is thought to be classically hard to simulate and indeed shows remarkable performances. Despite such specificity, this is one of the first results drawing a clear separation between quantum and classical neural networks, highlighting the great potential of the former.

2. The role of data

In the same spirit, but with different tools, the role of data in analyzing the performances of quantum machine learning algorithms was taken into account [69], by devising geometric tests aimed at clearly identifying those scenarios where quantum models can achieve an advantage. In particular, focusing on kernel methods, a geometric measure was defined that is based exclusively on the training dataset and the kernel function induced by the learning model, and is thus independent of the actual classification/function to be learned. If this quantity is similar for the classical and the quantum learning case, then classical models are guaranteed to perform similarly to or better than quantum models. However, if this geometric distance is large, then there exists some dataset on which the quantum model can outperform the classical one. Notably, one key result of the study is that poor performances of quantum models could be attributed precisely to the careless encoding of data into exponentially large quantum Hilbert spaces, which spreads them too far apart from each other and results in the kernel matrix approximating the identity. To avoid this setback, the authors propose the idea of projected quantum kernels, which aim at reducing the effective dimension of the encoded data set while keeping the advantages of quantum correlations. On engineered data sets, this procedure was shown to outperform all tested classical models in prediction error. From a higher perspective, the most important takeaway of these results is that data themselves play a major role in learning models, both quantum and classical.

3. Quantum data for quantum neural networks

Some data sets could be particularly well suited to QNNs. An example is provided by those proposed in [70] and based on the discrete logarithm problem. Also, it is not hard to suppose that quantum machine learning will prove particularly useful when analyzing data that are inherently quantum, for example coming from quantum sensors, or obtained from quantum chemistry or condensed matter physics simulations. However, even if this sounds theoretically appealing, no clear separation has yet been found in this respect. A particularly relevant open problem concerns, for example, the identification and generation of interesting quantum data sets. Moreover, based on information-theoretic derivations in the quantum communication formalism, it was recently shown [71] that a classical learner with access to quantum data coming from quantum experiments performs similarly to quantum models with respect to sample complexity and average prediction error. Even if an exponential advantage is still attainable when the goal is to minimize the worst-case prediction error, classical methods are again found to be surprisingly powerful even when given access to quantum data.

4. Quantum-inspired methods

When discussing the effectiveness of quantum neural networks, and in order to achieve a true quantum advantage, one should compare not only to previous quantum models but also to the best available classical alternatives. For this reason, and until fault-tolerant quantum computation is achieved, quantum-inspired procedures on classical computers could pave the way towards the discovery of effective solutions [19–21]. One of the most prominent examples is represented by tensor networks [72], initially developed for the study of quantum many-body systems, which show remarkably good performances for the simulation of quantum computers in specific regimes [73]. In fact, many proposals for quantum neural networks are inspired by, or can be directly rephrased in, the language of tensor networks [9, 46], so that libraries developed for those applications could be used to run some instances of these quantum algorithms. Fierce competition is therefore to be expected in the coming years between quantum processors and advanced classical methods, including the so-called neural network quantum states [74, 75] for the study of many-body physics.

V. OUTLOOKS

Quantum machine learning techniques, and particularly quantum neural network models, hold promise to significantly increase our computational capabilities, unlocking whole new areas of investigation. The main theoretical and practical challenges currently encompass trainability, the identification and suitable treatment of appropriate data sets, and a systematic comparison with classical counterparts. Moreover, with steady progress on the hardware side, significant attention will be devoted in the near future to experimental demonstrations of practical relevance. Research evolves rapidly, with seminal results highlighting prominent directions to explore in the quest for quantum computational advantage in AI applications, and new exciting results are expected in the years to come.

VI. ACKNOWLEDGMENTS

This research was partly supported by the Italian Ministry of Education, University and Research (MIUR) through the “Dipartimenti di Eccellenza Program (2018-2022)”, Department of Physics, University of Pavia, and the PRIN-2017 project 2017P9FJBS “INPhoPOL”, and by the EU H2020 QuantERA ERA-NET Cofund in Quantum Technologies project QuICHE.

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
[2] G. E. Hinton, S. Osindero, and Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18, 1527 (2006).
[3] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521, 436–444 (2015).
[4] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics (Springer, 2009).
[5] A. S. G. Andrae and T. Edler, On global electricity usage of communication technology: Trends to 2030, Challenges 6, 117 (2015).
[6] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th ed. (Cambridge University Press, Cambridge, UK, 2010).
[7] F. Tacchino, C. Macchiavello, D. Gerace, and D. Bajoni, An artificial neuron implemented on an actual quantum processor, npj Quantum Inf. 5, 26 (2019).
[8] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Supervised learning with quantum-enhanced feature spaces, Nature 567, 209–212 (2019).
[9] I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nat. Phys. 15, 1273–1278 (2019).
[10] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, Training deep quantum neural networks, Nat. Commun. 11, 808 (2020).
[11] F. Tacchino, A. Chiesa, S. Carretta, and D. Gerace, Quantum computers as universal quantum simulators: State-of-the-art and perspectives, Adv. Quantum Technol. 3, 1900052 (2020).
[12] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, Parameterized quantum circuits as machine learning models, Quantum Sci. Technol. 4, 043001 (2019).
[13] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, The theory of variational hybrid quantum-classical algorithms, New J. Phys. 18, 023023 (2016).
[14] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, A variational eigenvalue solver on a photonic quantum processor, Nat. Commun. 5, 4213 (2014).
[15] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Variational quantum algorithms (2020), arXiv:2012.09265 [quant-ph].
[16] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
[17] K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-C. Kwek, and A. Aspuru-Guzik, Noisy intermediate-scale quantum (NISQ) algorithms (2021), arXiv:2101.08448 [quant-ph].
[18] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195–202 (2017).
[19] S. Aaronson, Read the fine print, Nat. Phys. 11, 291–293 (2015).
[20] E. Tang, A quantum-inspired classical algorithm for recommendation systems, in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019 (Association for Computing Machinery, New York, NY, USA, 2019), pp. 217–228.
[21] J. M. Arrazola, A. Delgado, B. R. Bardhan, and S. Lloyd, Quantum-inspired algorithms in practice, Quantum 4, 307 (2020).
[22] F. J. Gil Vidal and D. O. Theis, Input redundancy for parameterized quantum circuits, Front. Phys. 8, 297 (2020).
[23] M. Schuld, R. Sweke, and J. J. Meyer, Effect of data encoding on the expressive power of variational quantum-machine-learning models, Phys. Rev. A 103, 032430 (2021).
[24] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, Data re-uploading for a universal quantum classifier, Quantum 4, 226 (2020).
[25] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Rev. A 98, 032309 (2018).
[26] M. Ostaszewski, E. Grant, and M. Benedetti, Structure optimization for parameterized quantum circuits, Quantum 5, 391 (2021).
[27] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Netw. 4, 251 (1991).
[28] R. LaRose and B. Coyle, Robust data encodings for quantum classifiers, Phys. Rev. A 102, 032420 (2020).
[29] S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, (2020), arXiv:2001.03622 [quant-ph].
[30] M. Schuld, I. Sinayskiy, and F. Petruccione, The quest for a quantum neural network, Quantum Inf. Process. 13, 2567–2586 (2014).
[31] M. Schuld, I. Sinayskiy, and F. Petruccione, Simulating a perceptron on a quantum computer, Phys. Lett. A 379, 660 (2015).


[32] Y. Cao, G. G. Guerreschi, and A. Aspuru-Guzik, Quantum neuron: an elementary building block for machine learning on quantum computers (2017), arXiv:1711.11240 [quant-ph].
[33] E. Torrontegui and J. J. García-Ripoll, Unitary quantum perceptron as efficient universal approximator, EPL 125, 30004 (2019).
[34] S. Mangini, F. Tacchino, D. Gerace, C. Macchiavello, and D. Bajoni, Quantum computing model of an artificial neuron with continuously valued input data, Mach. Learn.: Sci. Technol. 1, 045008 (2020).
[35] F. Tacchino, S. Mangini, P. K. Barkoutsos, C. Macchiavello, D. Gerace, I. Tavernelli, and D. Bajoni, Variational learning for quantum artificial neural networks, IEEE Transactions on Quantum Engineering 2, 1 (2021).
[36] F. Tacchino, P. Barkoutsos, C. Macchiavello, I. Tavernelli, D. Gerace, and D. Bajoni, Quantum implementation of an artificial feed-forward neural network, Quantum Sci. Technol. 5, 044010 (2020).
[37] M. Schuld and N. Killoran, Quantum machine learning in feature Hilbert spaces, Phys. Rev. Lett. 122, 040504 (2019).
[38] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cogn. Sci. 9, 147 (1985).
[39] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Quantum Boltzmann machine, Phys. Rev. X 8, 021050 (2018).
[40] C. Zoufal, A. Lucchi, and S. Woerner, Variational quantum Boltzmann machines, Quantum Machine Intelligence 3, 7 (2021).
[41] M. Kieferová and N. Wiebe, Tomography and generative training with quantum Boltzmann machines, Phys. Rev. A 96, 062327 (2017).
[42] C. Zoufal, A. Lucchi, and S. Woerner, Quantum generative adversarial networks for learning and loading random distributions, npj Quantum Inf. 5, 103 (2019).
[43] S. Lloyd and C. Weedbrook, Quantum generative adversarial learning, Phys. Rev. Lett. 121, 040502 (2018).
[44] P.-L. Dallaire-Demers and N. Killoran, Quantum generative adversarial networks, Phys. Rev. A 98, 012324 (2018).
[45] B. Coyle, D. Mills, V. Danos, and E. Kashefi, The Born supremacy: quantum advantage and training of an Ising Born machine, npj Quantum Inf. 6, 60 (2020).
[46] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, Hierarchical quantum classifiers, npj Quantum Inf. 4, 65 (2018).
[47] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nat. Commun. 9, 4812 (2018).
[48] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Cost function dependent barren plateaus in shallow parametrized quantum circuits, Nat. Commun. 12, 1791 (2021).
[49] C. O. Marrero, M. Kieferová, and N. Wiebe, Entanglement induced barren plateaus (2020), arXiv:2010.15968 [quant-ph].
[50] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio, and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms (2020), arXiv:2007.14384 [quant-ph].
[51] K. Sharma, M. Cerezo, L. Cincio, and P. J. Coles, Trainability of dissipative perceptron-based quantum neural networks (2020), arXiv:2005.12458 [quant-ph].
[52] A. Pesah, M. Cerezo, S. Wang, T. Volkoff, A. T. Sornborger, and P. J. Coles, Absence of barren plateaus in quantum convolutional neural networks (2020), arXiv:2011.02966 [quant-ph].
[53] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, Connecting ansatz expressibility to gradient magnitudes and barren plateaus (2021), arXiv:2101.02138 [quant-ph].
[54] L. Bittel and M. Kliesch, Training variational quantum algorithms is NP-hard – even for logarithmically many qubits and free fermionic systems (2021), arXiv:2101.07267 [quant-ph].
[55] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, Layerwise learning for quantum neural networks (2020), arXiv:2006.14904 [quant-ph].
[56] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, An initialization strategy for addressing barren plateaus in parametrized quantum circuits, Quantum 3, 214 (2019).
[57] T. Volkoff and P. J. Coles, Large gradients via correlation in random parameterized circuits, Quantum Sci. Technol. 6 (2021).
[58] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, Learning to learn with quantum neural networks via classical neural networks (2019), arXiv:1907.05415 [quant-ph].
[59] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, Evaluating analytic gradients on quantum hardware, Phys. Rev. A 99, 032331 (2019).
[60] J. Stokes, J. Izaac, N. Killoran, and G. Carleo, Quantum Natural Gradient, Quantum 4, 269 (2020).
[61] G. Verdon, J. Pye, and M. Broughton, A universal training algorithm for quantum deep learning (2018), arXiv:1806.09729 [quant-ph].
[62] A. W. Harrow and A. Montanaro, Quantum computational supremacy, Nature 549, 203–209 (2017).
[63] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L. Brandao, D. A. Buell, et al., Quantum supremacy using a programmable superconducting processor, Nature 574, 505–510 (2019).
[64] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, P. Hu, X.-Y. Yang, W.-J. Zhang, H. Li, Y. Li, X. Jiang, L. Gan, G. Yang, L. You, Z. Wang, L. Li, N.-L. Liu, C.-Y. Lu, and J.-W. Pan, Quantum computational advantage using photons, Science 370, 1460 (2020).
[65] R. Sweke, J.-P. Seifert, D. Hangleiter, and J. Eisert, On the Quantum versus Classical Learnability of Discrete Distributions, Quantum 5, 417 (2021).
[66] Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, Expressive power of parametrized quantum circuits, Phys. Rev. Research 2, 033125 (2020).
[67] L. G. Wright and P. L. McMahon, The capacity of quantum neural networks (2019), arXiv:1908.01364 [quant-ph].
[68] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, The power of quantum neural networks (2020), arXiv:2011.00027 [quant-ph].
[69] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, Power of data in quantum machine learning, Nat. Commun. 12, 2631 (2021).


[70] Y. Liu, S. Arunachalam, and K. Temme, A rigorous and robust quantum speed-up in supervised machine learning (2020), arXiv:2010.02174 [quant-ph].
[71] H.-Y. Huang, R. Kueng, and J. Preskill, Information-theoretic bounds on quantum advantage in machine learning, Phys. Rev. Lett. 126, 190505 (2021).
[72] S. Montangero, Introduction to Tensor Network Methods (Springer Nature Switzerland AG, Cham, CH, 2018).
[73] Y. Zhou, E. M. Stoudenmire, and X. Waintal, What limits the simulation of quantum computers?, Phys. Rev. X 10, 041038 (2020).
[74] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
[75] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Neural-network quantum state tomography, Nat. Phys. 14, 447 (2018).