
Relative Entropy and Catalytic Relative Majorization

Soorya Rethinasamy 1,2 and Mark M. Wilde 2,3

1 Birla Institute of Technology and Science, Pilani, Rajasthan 333031, India
2 Hearne Institute for Theoretical Physics, Department of Physics and Astronomy, Louisiana State University, Baton Rouge, Louisiana 70803, USA
3 Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, USA

Given two pairs of quantum states, a fundamental question in the resource theory of asymmetric distinguishability is to determine whether there exists a quantum channel converting one pair to the other. In this work, we reframe this question in such a way that a catalyst can be used to help perform the transformation, with the only constraint on the catalyst being that its reduced state is returned unchanged, so that it can be used again to assist a future transformation. What we find here, for the special case in which the states in a given pair are commuting, and thus quasi-classical, is that this catalytic transformation can be performed if and only if the relative entropy of one pair of states is larger than that of the other pair. This result endows the relative entropy with a fundamental operational meaning that goes beyond its traditional interpretation in the setting of independent and identical resources. Our finding thus has an immediate application and interpretation in the resource theory of asymmetric distinguishability, and we expect it to find application in other domains.

I. INTRODUCTION

The majorization partial ordering has been studied in the light of the following question [HLP52]: What does it mean for one probability distribution to be more disordered than another? While there exist several equivalent characterizations of majorization, the most fundamental is arguably the notion that a distribution p majorizes another distribution p′ when p′ can be obtained from p by the action of a doubly stochastic map. This captures the intuition that p′ is more disordered than p (see Subsection A 4 for technical details).

The theory of majorization has widespread applications, some of which are in quantum resource theories [CG19]. For example, transformations of pure bipartite states by means of local operations and classical communication can be analyzed in terms of majorization of the Schmidt coefficients of the states [Nie99]. Majorization has been shown to determine whether state transformations are possible not just in the resource theory of entanglement, but also in those of coherence [ZMC+17, WY16] and purity [HHO03].

As it turns out, majorization is a special case of a more general concept introduced in [Vei71] and now called relative majorization [Ren16, BG17]. This more general notion also has a rich history in the context of statistics, where it goes by the name of “statistical comparison” or “comparison of experiments” [Bla53]. To recall the notion, a pair (p, q) of probability distributions p and q relatively majorizes another pair (p′, q′) of probability distributions p′ and q′ if there exists a classical channel (conditional probability distribution) that converts p to p′ and q to q′. That is, (p, q) relatively majorizes (p′, q′) if there exists a stochastic matrix N such that p′ = Np and q′ = Nq. In each pair, if the second distribution is the uniform distribution, relative majorization collapses to majorization, demonstrating that relative majorization is indeed a generalization of majorization.
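To make the definition concrete, the following Python sketch applies one column-stochastic matrix N to both members of a pair. The matrix and the two distributions are hypothetical numbers chosen only for illustration; they do not appear in the paper.

```python
def apply_channel(N, p):
    """Apply a column-stochastic matrix N (given as a list of rows) to p."""
    return [sum(N[i][j] * p[j] for j in range(len(p))) for i in range(len(N))]

def is_column_stochastic(N, tol=1e-12):
    """Entries are nonnegative and every column sums to one."""
    cols = len(N[0])
    nonneg = all(x >= 0 for row in N for x in row)
    sums_ok = all(abs(sum(row[j] for row in N) - 1.0) < tol for j in range(cols))
    return nonneg and sums_ok

# Hypothetical channel: merge the last two of three symbols into one.
N = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

p_out = apply_channel(N, p)  # plays the role of p' = Np
q_out = apply_channel(N, q)  # plays the role of q' = Nq
print(p_out, q_out)
```

By construction, the input pair (p, q) relatively majorizes the output pair (p_out, q_out), since the same channel produces both outputs.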

FIG. 1. (p, q) is a pair of distributions to be converted to another pair (p′, q′) by means of a classical channel and with the use of the catalyst pair (r, s). The state t′ after transformation has first marginal p′ and second marginal r. Dotted arrows indicate marginal distributions.

Another generalization of majorization comes in the form of catalysis [JP99]. A catalyst is an ancillary distribution whose presence enables certain transformations that would otherwise be impossible, with the only constraint being that its reduced state is returned unchanged. Catalysis has found application in various resource theories, including entanglement [JP99, DK01], thermodynamics [BHN+15], and purity [BEG+19].

In this paper, we reformulate the transformation task in relative majorization by allowing for a catalyst that can aid the transformation. In this scenario, a catalyst consists of an arbitrary pair (r, s) of probability distributions r and s that can be used in conjunction with the original input pair (p, q) to generate the output pair (p′, q′) by means of a classical channel Λ acting on p ⊗ r and q ⊗ s. To make the catalytic task non-trivial, we demand that the first and second marginals of Λ(p ⊗ r) are p′ and r, respectively. We also demand that Λ(q ⊗ s) be exactly equal to q′ ⊗ s. As a consequence of this demand, the catalyst is returned unchanged and can be used in future catalytic tasks for distributions that are independent of p and p′. We call this task relative majorization assisted by a catalyst with correlations, and the task is succinctly summarized in Figure 1.

arXiv:1912.04254v1 [quant-ph] 9 Dec 2019

Let q and q′ be distributions with rational entries and full support, and suppose that p ≠ p′. One main result of our paper is that it is possible to transform the pair (p, q) to the pair (p′, q′) in the above sense if and only if

$$D(p\|q) \geq D(p'\|q') \quad \text{and} \quad \operatorname{rank}(p) \leq \operatorname{rank}(p'), \tag{1}$$

where the relative entropy D(p‖q) is defined as

$$D(p\|q) := \sum_{x} p(x) \ln\!\left(\frac{p(x)}{q(x)}\right). \tag{2}$$
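As a minimal sketch, the relative entropy in (2) can be evaluated directly for finite distributions. The function name and the example distributions below are illustrative assumptions; the handling of zero entries follows the usual conventions (terms with p(x) = 0 vanish, and the value is infinite if p(x) > 0 where q(x) = 0).

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) ln(p(x)/q(x)), measured in nats."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue            # 0 * ln(0/q) is taken to be 0
        if qx == 0:
            return math.inf     # support of p is not contained in that of q
        total += px * math.log(px / qx)
    return total

# Illustrative values (not taken from the paper).
p = [0.5, 0.5]
q = [0.25, 0.75]
d = relative_entropy(p, q)
print(d)
```

For this example the value works out analytically to (1/2) ln(4/3).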

Another contribution of our paper is that it is possible to perform an inexact, yet arbitrarily accurate transformation if the same conditions hold but q and q′ are not required to have rational entries.

We establish these results by building on the prior work of [BHN+15, M18]. As done in the prior works, we employ an embedding map as a technical tool [BHN+15], and we also allow for the catalyst to become correlated with the target distribution [M18].

We note here that our results apply to the more general “quasi-classical” case in which the first pair consists of commuting quantum states ρ and σ and the second pair consists of commuting quantum states ρ′ and σ′. This follows as a direct consequence of the fact that commuting quantum states are diagonal in the same basis and thus effectively classical, along with the fact that there is a unitary channel that takes the common basis of the first pair to the common basis of the second pair.

II. MAIN RESULT

We begin by stating the main result of our paper and the associated corollary. We present all proofs in Appendices C, D and E.

Theorem 1 (One Shot Characterization of Exact Pair Transformations). Given probability distributions p, q, p′, q′ on the same alphabet where q and q′ have full support and only rational entries, and p ≠ p′, the following conditions are equivalent:

1. D(p‖q) > D(p′‖q′) and rank (p) ≤ rank (p′)

2. For any γ > 0, there exist probability distributions r and s, a joint distribution t′, and a classical channel Λ such that

(a) Λ(p ⊗ r) = t′ with marginals p′ and r

(b) Λ(q ⊗ s) = q′ ⊗ s

(c) D(t′‖p′ ⊗ r) < γ

Furthermore, we can take s = η, where η is the uniform distribution on the support of r.
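Condition 1 of Theorem 1 is easy to test numerically. The sketch below checks only that condition; it does not construct the catalyst or the channel Λ, and it does not verify the side assumptions (rational, full-support q and q′, and p ≠ p′). The helper names and the example distributions are hypothetical.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in nats, assuming q has full support."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def rank(p):
    """Number of nonzero entries of a distribution."""
    return sum(1 for x in p if x > 0)

def satisfies_condition_1(p, q, p_out, q_out):
    """D(p||q) > D(p'||q') and rank(p) <= rank(p')."""
    return (relative_entropy(p, q) > relative_entropy(p_out, q_out)
            and rank(p) <= rank(p_out))

q = [0.5, 0.25, 0.25]           # rational entries, full support
p_a = [1.0, 0.0, 0.0]           # D(p_a||q) = ln 2, rank 1
p_b = [0.5, 0.5, 0.0]           # D(p_b||q) = (1/2) ln 2, rank 2
p_c = [0.1, 0.45, 0.45]         # larger D than p_b, but rank 3

forward = satisfies_condition_1(p_a, q, p_b, q)    # both parts hold
backward = satisfies_condition_1(p_b, q, p_a, q)   # entropy part fails
rank_fail = satisfies_condition_1(p_c, q, p_b, q)  # rank part fails
print(forward, backward, rank_fail)
```

The third case shows why the rank condition is stated separately: the relative entropy decreases, yet the support of the target is smaller than that of the source.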

The theorem states that the relative entropy is the only relevant information quantity that characterizes pair transformations when catalysts are available. Furthermore, the relative entropy D(t′‖p′ ⊗ r) can be made as small as desired and thus the output of the classical channel Λ arbitrarily close to the product of p′ and r. The condition on q and q′ being rational can be relaxed by allowing for errors in the formation of the target state, as follows:

Theorem 2 (One Shot Characterization of Approximate Pair Transformations). Given probability distributions p, q, p′, q′ on the same alphabet where p ≠ p′, and q and q′ are full rank, the following conditions are equivalent:

1. D(p‖q) ≥ D(p′‖q′)

2. For any ε > 0 and γ > 0, there exist probability distributions r, s and p′ε, a joint distribution t′ε, and a classical channel Λ such that

(a) Λ(p ⊗ r) = t′ε with marginals p′ε and r

(b) Λ(q ⊗ s) = q′ ⊗ s

(c) ‖p′ − p′ε‖1 ≤ ε

(d) D(t′ε‖p′ε ⊗ r) < γ

Furthermore, we can take s = η, where η is the uniform distribution on the support of r.

The relative entropy D(t′ε‖p′ε ⊗ r) can be made as small as desired and thus the output of the classical channel Λ arbitrarily close to the product of p′ε and r. Furthermore, the state p′ε can be arbitrarily close to the target distribution p′ in trace distance. Note that allowing for a slight error in the transformation of p while demanding no error in the transformation of q is consistent with the frameworks of [Mat10, WW19, BST19], with the framework of [WW19] being known as the “resource theory of asymmetric distinguishability” due to this asymmetry in the transformation.

Corollary 1. Given probability distributions p and p′ on the same alphabet where p ≠ p′, the following conditions are equivalent.

1. H(p) ≤ H(p′)

2. For any ε > 0 and γ > 0, there exist probability distributions r and p′ε, a joint distribution t′ε, and a unital classical channel Λ such that

(a) Λ(p ⊗ r) = t′ε with marginals p′ε and r

(b) ‖p′ − p′ε‖1 ≤ ε

(c) D(t′ε‖p′ε ⊗ r) < γ

A unital classical channel is one that preserves the uniform distribution. The corollary states that, given a catalyst, the condition on the transformation is based on the Shannon entropy only. Furthermore, the relative entropy D(t′ε‖p′ε ⊗ r) can be made as small as desired and thus the output of the classical channel Λ arbitrarily close to the product of p′ε and r. Lastly, the state p′ε can be arbitrarily close to the target distribution p′ in trace distance.
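The notion of a unital classical channel can be illustrated with a doubly stochastic matrix, which maps the uniform distribution to itself. The following sketch uses a hypothetical 3×3 matrix and input distribution; it only illustrates the easy direction, namely that a unital channel cannot decrease the Shannon entropy, and does not involve a catalyst.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln p_i, in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def apply_channel(M, p):
    """Apply a column-stochastic matrix M (list of rows) to p."""
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]

# Hypothetical doubly stochastic matrix: it mixes the first two symbols.
M = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]

uniform = [1.0 / 3.0] * 3
image_of_uniform = apply_channel(M, uniform)   # stays uniform (unital)

p = [0.7, 0.2, 0.1]
p_out = apply_channel(M, p)                    # [0.45, 0.45, 0.1]
print(shannon_entropy(p), shannon_entropy(p_out))
```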


III. DISCUSSIONS AND IMPLICATIONS

In this section we discuss the various implications of our results. A catalyst is a special ancillary probability distribution whose presence enables certain transformations that would otherwise be impossible. In other words, a catalyst decreases the strength of the condition required for a transformation [JP99]. Additionally, if the catalyst is allowed to become correlated with the target distribution, subject to the condition that its reduced state is returned unchanged, then the strength of the condition is further reduced and previously impossible transformations become possible. Since the catalyst is returned unchanged, the paradigm of catalytic majorization with correlations is physically relevant and justified.

In the resource theories of pure-state entanglement, purity, and pure-state coherence, state transformations in the absence of a catalyst are governed by the majorization partial order, whereas in the resource theory of quasi-classical thermodynamics, transformations are governed by thermomajorization [GJB+18]. The catalytic majorization condition is equivalent to an infinite set of conditions on the Renyi entropy [Kli07, Tur07]. The catalytic majorization condition is strictly weaker than the majorization condition, as there exist distributions p and q where p ⊀ q but p ≺T q [JP99], proving that catalysis provides an advantage.

Catalytic majorization can be further relaxed. In catalytic majorization, the target state, after transformation, exists in a product state with the catalyst; i.e., the catalyst and the target state are independent after the transformation. Relaxing this additional assumption and allowing the final state to be a non-product state, with the condition that their local states remain unchanged, further reduces the strength of the required condition. This is a generalization of catalytic majorization, called catalytic majorization with correlations, as the product state is a special case of a general distribution.

Our result is a necessary and sufficient condition for the last case, where we allow catalysis and correlations. The result is a condition on relative entropy and neatly ties all the resource theories mentioned into a single paradigm. This also enhances the operational meaning of the relative entropy by establishing its relevance in the single-shot regime as well.

IV. RELATED WORK

Several generalizations and operational regimes are related to the ones considered in this work. This section provides an overview of the landscape.

Majorization and relative majorization are most prominent in the single-shot regime. This regime has to do with tasks involving a single system.

There exists another regime that deals with situations in which many identically and independently distributed (i.i.d.) systems are available, often referred to as the asymptotic regime. In the asymptotic regime, perhaps the more pertinent question is that of rate of conversion from initial to target distributions and not necessarily feasibility of the transformation. For the resource theories of pure-state entanglement, pure-state coherence, and quasi-classical thermodynamics, the rate of conversion is given by ratios of the entropy (pure-state entanglement and pure-state coherence) and relative entropy to the Gibbs distribution (quasi-classical thermodynamics). The asymptotic version of Blackwell’s “comparison of experiments” paradigm was studied in [Mat10, WW19, BST19]. In [WW19], the problem was approached by introducing the resource theory of asymmetric distinguishability, which allowed for analyzing this problem in a resource-theoretic way. The result of [WW19, BST19] is that, in the asymptotic regime, the rate of conversion between two pairs of probability distributions (p, q) and (p′, q′) is given by the ratio of the relative entropy of the pairs, thus enhancing the fundamental operational meaning of the relative entropy. The results in the various regimes and resource theories are presented in Table I.

Another generalization of majorization, called quantum majorization, was proposed to generalize classical stochasticity to the quantum regime [GJB+18]. A necessary and sufficient condition for the transformation of a pair of qubits to another has been given in [AU80]. However, it was shown that the conditions are necessary but not sufficient for quantum systems with dimensions greater than two [Mat14], and thus a more general result is lacking.

V. CONCLUSION

The majorization partial order was developed to capture the notion of disorder in probability distributions and governs transformations in various resource theories. The addition of a catalyst reduces the strength of the required condition by allowing for transformations that are not possible in the framework of majorization without catalysis. Furthermore, allowing correlations between the target and the catalyst further reduces the strength of the required condition to one based on relative entropies. The main result of our paper is that a pair (p, q) of probability distributions can be converted to another pair (p′, q′) of probability distributions assisted by a catalyst with correlations if and only if D(p‖q) ≥ D(p′‖q′). This enhances the operational meaning of the relative entropy beyond the traditional independent and identical regime into the single-shot regime. This finding thus complements and enhances the previous finding from [BEG+19] in the context of entropy and the resource theory of purity.

There are several open questions yet to be tackled. The most pertinent is whether our result applies to pairs of quantum states (ρ, σ) and (τ, ω), and quantum catalysts. Another avenue to be explored is whether our result applies to larger sets of distributions, and not just dichotomies, i.e., the conversion of {p1, p2, . . . , pk} to {q1, q2, . . . , qk} for k > 2.

| Resource Theory | Asymptotic | One-shot | One-shot + Catalysis | One-shot + Catalysis + Correlations |
|---|---|---|---|---|
| Entanglement | H(p)/H(q) | p ≻ q | Hα(p) ≤ Hα(q) | H(p) ≤ H(q) |
| Coherence | H(p)/H(q) | p ≻ q | Hα(p) ≤ Hα(q) | H(p) ≤ H(q) |
| Thermodynamics | D(p‖γ)/D(q‖γ) | p ≻thermo q | Fα(p) ≥ Fα(q) | F(p) ≥ F(q) |
| Distinguishability | D(p‖q)/D(p′‖q′) | (p, q) ≻ (p′, q′) | Dα(p‖q) ≥ Dα(p′‖q′) | **D(p‖q) ≥ D(p′‖q′)** |

TABLE I. Results in various regimes and resource theories. (The cell in bold indicates our result.) The last row is a general case of the rows above. The one-shot regime with catalysis depends on Hα and Dα, ∀α ∈ [−∞,∞] (see Subsections A 1, A 2). Furthermore, in the one-shot cases, moving further right reduces the strength of the required condition, by allowing for catalysis and then for correlations.

ACKNOWLEDGMENTS

We thank Vishal Katariya and Aby Philip for discussions. We acknowledge support from NSF Grant No. 1907615. SR thanks the Department of Physics and Astronomy at Louisiana State University for hospitality during his research visit in Fall 2019. SR also thanks the Department of Physics, Birla Institute of Technology and Science for this opportunity.

Appendix A: Notations and Definitions

We introduce the various measures and their properties in this section. Furthermore, we define some useful concepts and constructs used in the various theorems and lemmas.

1. Renyi Divergence

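For probability distributions p = {p_i}, i = 1, …, n, and q = {q_i}, i = 1, …, n, the Renyi divergence is defined ∀α ∈ R \ {0, 1, ∞, −∞} as follows [R61]

$$D_\alpha(p\|q) = \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln \sum_{i=1}^{n} p_i^{\alpha} q_i^{1-\alpha}, \tag{A1}$$

where

$$\operatorname{sgn}(\alpha) = \begin{cases} 1 & \alpha \geq 0 \\ -1 & \alpha < 0. \end{cases} \tag{A2}$$

We also take some special limits for α ∈ {0, 1, ∞, −∞}.

$$\begin{aligned}
D_0(p\|q) &= -\ln \sum_{i : p_i \neq 0} q_i, \\
D_1(p\|q) &= \sum_{i=1}^{n} p_i (\ln p_i - \ln q_i), \\
D_\infty(p\|q) &= \ln \max_i \frac{p_i}{q_i}, \\
D_{-\infty}(p\|q) &= D_\infty(q\|p).
\end{aligned} \tag{A3}$$

D∞(p‖q) can also be expressed as

$$D_\infty(p\|q) = \ln \min\left\{ \lambda \,\middle|\, \forall i,\ \lambda \geq \frac{p_i}{q_i} \right\}. \tag{A4}$$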
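The definitions (A1)–(A3) translate into a short function. The sketch below is an illustration under simplifying assumptions that are not part of the paper: q is taken to have full support, and p is taken to have full support whenever α < 0 or α = −∞ (otherwise the divergence is infinite and the code would raise an error). The example distributions are arbitrary.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p||q) in nats, following (A1)-(A3).

    Assumption of this sketch: q has full support, and p has full
    support for negative orders.
    """
    if alpha == math.inf:
        return math.log(max(pi / qi for pi, qi in zip(p, q)))
    if alpha == -math.inf:
        return renyi_divergence(q, p, math.inf)
    if alpha == 0:
        return -math.log(sum(qi for pi, qi in zip(p, q) if pi != 0))
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi != 0)
    sgn = 1.0 if alpha >= 0 else -1.0
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return sgn / (alpha - 1) * math.log(s)

p = [0.5, 0.5]
q = [0.25, 0.75]
values = {a: renyi_divergence(p, q, a) for a in (0, 0.5, 1, 2, math.inf)}
for a, v in values.items():
    print(a, v)
```

For this pair, D_2 equals ln(4/3) and D_∞ equals ln 2, and the printed values are nondecreasing in α on [0, ∞].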

An important property of the Renyi divergence is that it obeys the data processing property, i.e., for all stochastic maps Λ and ∀α ∈ [−∞,∞], [vEH14]

Dα(p‖q) ≥ Dα(Λ(p)‖Λ(q)). (A5)

Another important property of the Renyi divergence is that for α ∈ [0,∞],

Dα(p‖q) > 0 (A6)

for all probability distributions p ≠ q. Additionally,

D(p‖q) = 0 (A7)

iff p = q.

The Renyi divergence is additive under products, i.e., for probability distributions p1, p2, q1 and q2 [vEH14],

Dα(p1‖q1) + Dα(p2‖q2) = Dα(p1 ⊗ p2‖q1 ⊗ q2). (A8)

2. Renyi Entropy

For a probability distribution p = {p_i}, i = 1, …, n, the Renyi entropy is defined ∀α ∈ R \ {0, 1, ∞, −∞} as follows [R61],

$$H_\alpha(p) = \frac{\operatorname{sgn}(\alpha)}{1 - \alpha} \ln \sum_{i=1}^{n} p_i^{\alpha}. \tag{A9}$$


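Similarly, we also take some special limits for α ∈ {0, 1, ∞, −∞}.

$$\begin{aligned}
H_0(p) &= \ln \operatorname{rank}(p), \\
H_1(p) &= -\sum_{i=1}^{n} p_i \ln p_i, \\
H_\infty(p) &= -\ln p_{\max}, \\
H_{-\infty}(p) &= \ln p_{\min}.
\end{aligned} \tag{A10}$$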
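A direct transcription of (A9) and (A10) is sketched below. As an assumption of this sketch, negative orders are only supported for full-support distributions, since a zero entry makes the sum in (A9) diverge; the example distribution is arbitrary.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) in nats, following (A9)-(A10)."""
    if alpha < 0 and min(p) == 0:
        raise ValueError("negative orders require full support in this sketch")
    if alpha == 0:
        return math.log(sum(1 for x in p if x > 0))       # ln rank(p)
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)   # Shannon entropy
    if alpha == math.inf:
        return -math.log(max(p))
    if alpha == -math.inf:
        return math.log(min(p))
    sgn = 1.0 if alpha >= 0 else -1.0
    s = sum(x ** alpha for x in p if x > 0)
    return sgn / (1 - alpha) * math.log(s)

p = [0.5, 0.25, 0.25]
h = {a: renyi_entropy(p, a) for a in (0, 1, 2, math.inf)}
print(h)
```

For this example, H_0 = ln 3, H_1 = (3/2) ln 2, H_2 = −ln 0.375 and H_∞ = ln 2, which decrease as α grows over nonnegative orders.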

3. Quantum Renyi Divergence

The Renyi divergence from the classical regime can be extended to the quantum regime, but the extension is not unique. Classical probability distributions can be modeled as quantum density operators that are diagonal in an orthonormal basis. Thus, modelling two probability distributions is equivalent to working with two density operators that commute with each other, i.e., that are diagonalizable in the same basis. However, in general, this may not be possible, which leads to the non-uniqueness of the quantum extension. Note that any quantum extension should collapse to the classical case if the states commute.

Here, we use the quantum generalization from [Pet86]:

$$D_\alpha(\rho\|\sigma) = \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[\operatorname{Tr} \rho^{\alpha} \sigma^{1-\alpha}\right]. \tag{A11}$$

The limits of α ∈ {0, 1} are

$$\begin{aligned}
D_0(\rho\|\sigma) &= -\ln\left[\operatorname{Tr}(\Pi_\rho \sigma)\right], \\
D_1(\rho\|\sigma) &= \operatorname{Tr}(\rho \ln \rho - \rho \ln \sigma),
\end{aligned} \tag{A12}$$

where Πρ is the projector onto the support of ρ. The quantum Renyi divergence obeys the data processing inequality for all CPTP maps Λ and 0 ≤ α ≤ 2 [Pet86]:

Dα(ρ‖σ) ≥ Dα(Λ(ρ)‖Λ(σ)). (A13)

Another property of the quantum Renyi divergence is that for ρ ≠ σ, [Pet86]

D(ρ‖σ) > 0. (A14)

4. Majorization and its Extensions

The theory of majorization was developed to capture the notion of disorder. The crux of the theory is to answer the question: When is one probability distribution more disordered than another? The results can be succinctly put forth as follows [HLP52].

For two probability distributions p, q ∈ R^k, we say that “p majorizes q” or p ≻ q if for all n = 1, . . . , k − 1,

$$\sum_{i=1}^{n} p_i^{\downarrow} \geq \sum_{i=1}^{n} q_i^{\downarrow}, \qquad \sum_{i=1}^{k} p_i^{\downarrow} = \sum_{i=1}^{k} q_i^{\downarrow}, \tag{A15}$$

where p↓ is the reordering of p in descending order. If p ≻ q, we say that q is more disordered than p.

There are several formulations for majorization that can be proven to be equivalent (see Lemma 1). Majorization has several applications, especially in entanglement transformations [Nie99]. It has been shown that majorization defines a partial order for states that can be transformed into one another.

An extension of the theory of majorization comes in the form of catalysis. Given two distributions p, q such that p ⊀ q, does there exist a distribution r such that p ⊗ r ≺ q ⊗ r? If such a distribution exists, it can be used to transform q into p and be left unchanged. Since r is unaffected by the process, we call it the catalyst of the transformation. We say that p is catalytically majorized by q, written p ≺T q, if there exists a catalyst r such that p ⊗ r ≺ q ⊗ r. The theory of catalytic majorization is not as well understood as majorization. However, several important results have been shown [Kli07, Tur07].
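The partial-sum test (A15) and the effect of a catalyst can be checked with a few lines of Python. The numbers below are the well-known example of Jonathan and Plenio [JP99]; the function names and the numerical tolerance are choices made for this sketch.

```python
def majorizes(p, q, tol=1e-12):
    """Return True if p majorizes q in the sense of (A15)."""
    a = sorted(p, reverse=True)
    b = sorted(q, reverse=True)
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        return False
    sa = sb = 0.0
    for x, y in zip(a, b):
        sa += x
        sb += y
        if sa < sb - tol:   # a partial sum of p falls below that of q
            return False
    return True

def tensor(p, r):
    """Product distribution p (x) r as a flat list."""
    return [x * y for x in p for y in r]

p = [0.4, 0.4, 0.1, 0.1]
q = [0.5, 0.25, 0.25, 0.0]
r = [0.6, 0.4]              # catalyst

without_catalyst = majorizes(q, p)                      # is p majorized by q?
with_catalyst = majorizes(tensor(q, r), tensor(p, r))   # same question with r
print(without_catalyst, with_catalyst)
```

Without the catalyst, the second partial sum of p (0.8) exceeds that of q (0.75), so q does not majorize p. With the catalyst, every partial sum of q ⊗ r dominates that of p ⊗ r, so p ≺T q.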

5. Embedding Map

A mathematical tool frequently used in the resource theory of thermodynamics is an embedding map, which is a classical channel that maps a thermal distribution to a uniform distribution [BHN+15]. We use this tool in a more general sense needed in our context here, in order to map a probability distribution described by a set of rational numbers to a uniform distribution.

Consider the simplex of probability distributions {p_i}, i = 1, …, k, and a set of natural numbers {d_i}, i = 1, …, k, with N = d_1 + … + d_k. The embedding Γd : R^k → R^N is defined as

$$\Gamma_d(p) := \bigoplus_{i=1}^{k} p_i \eta_i = \Big( \underbrace{\tfrac{p_1}{d_1}, \ldots, \tfrac{p_1}{d_1}}_{d_1}, \ldots, \underbrace{\tfrac{p_k}{d_k}, \ldots, \tfrac{p_k}{d_k}}_{d_k} \Big), \tag{A16}$$

where η_i = (1/d_i, . . . , 1/d_i) ∈ R^{d_i} is the uniform distribution.

Since Γd is an injective map, there exists a left-inverse for Γd denoted by Γ∗d : R^N → R^k and defined as follows:

$$\Gamma_d^{*}(x) = \Big( \sum_{i=1}^{d_1} x_i,\ \sum_{i=d_1+1}^{d_1+d_2} x_i,\ \ldots,\ \sum_{i=d_1+\cdots+d_{k-1}+1}^{N} x_i \Big). \tag{A17}$$


Note that Γ∗d is a classical channel, and furthermore, Γ∗d ◦ Γd = I, where I is the identity channel.
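The embedding (A16) and its left-inverse (A17) can be sketched as two list operations. The function names and the example values of p and d are assumptions made for illustration.

```python
def embed(p, d):
    """Gamma_d: spread each p_i uniformly over a block of d_i entries (A16)."""
    out = []
    for pi, di in zip(p, d):
        out.extend([pi / di] * di)
    return out

def embed_inverse(x, d):
    """Gamma_d^*: sum consecutive blocks of sizes d_1, ..., d_k (A17)."""
    out, start = [], 0
    for di in d:
        out.append(sum(x[start:start + di]))
        start += di
    return out

p = [0.5, 0.3, 0.2]
d = [2, 1, 3]                   # N = 6
x = embed(p, d)                 # [0.25, 0.25, 0.3, 0.2/3, 0.2/3, 0.2/3]
back = embed_inverse(x, d)      # recovers p up to rounding
print(x, back)
```

The round trip illustrates Γ∗d ◦ Γd = I on this example; the reverse composition Γd ◦ Γ∗d is not the identity in general, since it flattens each block.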

Appendix B: Technical Lemmas

The technical lemmas presented in this appendix have been established in prior work. Here we list them for convenience and completeness.

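Lemma 1 (Majorization [MOA11, HLP52]). Let x and y be probability distributions ∈ R^d. Then, the following are equivalent.

1. x ≺ y.

2. For n = 1, . . . , d − 1,

$$\sum_{i=1}^{n} x_i^{\downarrow} \leq \sum_{i=1}^{n} y_i^{\downarrow} \qquad \text{and} \qquad \sum_{i=1}^{d} x_i^{\downarrow} = \sum_{i=1}^{d} y_i^{\downarrow}.$$

3. x = Dy for some d × d doubly stochastic matrix D.

4. For t ∈ R,

$$\sum_{i=1}^{d} |x_i - t| \leq \sum_{i=1}^{d} |y_i - t|.$$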

Lemma 2 (Embedding of distribution with rational entries [BHN+15]). For a distribution γd defined using a set of natural numbers {d_i}, i = 1, …, k, where d_1 + … + d_k = N, as

$$\gamma_d = \left( \frac{d_1}{N}, \ldots, \frac{d_k}{N} \right), \tag{B1}$$

then

$$\Gamma_d(\gamma_d) = \eta_N, \tag{B2}$$

where ηN is the uniform distribution of size N.

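Proof. From the definition of an embedding map Γd (see Subsection A 5), we see that

$$\begin{aligned}
\Gamma_d(\gamma_d) &= \Big( \underbrace{\tfrac{d_1/N}{d_1}, \ldots, \tfrac{d_1/N}{d_1}}_{d_1}, \ldots, \underbrace{\tfrac{d_k/N}{d_k}, \ldots, \tfrac{d_k/N}{d_k}}_{d_k} \Big) \\
&= \Big( \underbrace{\tfrac{1}{N}, \ldots, \tfrac{1}{N}}_{d_1}, \ldots, \underbrace{\tfrac{1}{N}, \ldots, \tfrac{1}{N}}_{d_k} \Big) \\
&= \Big( \underbrace{\tfrac{1}{N}, \ldots, \tfrac{1}{N}}_{N} \Big) \\
&= \eta_N.
\end{aligned} \tag{B3}$$

This concludes the proof.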

Lemma 3 (Preservation of Renyi’s Divergence [BHN+15]). Let p = {p_i}, i = 1, …, k, be a probability distribution and let {d_i}, i = 1, …, k, be natural numbers with N = d_1 + … + d_k and p̃ = Γd(p). Then for all α ∈ (−∞,∞),

$$D_\alpha(p\|\gamma_d) = D_\alpha(\tilde{p}\|\eta_N), \tag{B4}$$

where γd is as defined in Lemma 2 and Γd is as defined in Subsection A 5.

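Proof. We split the proof into separate parts for different α values (see Subsection A 1). Throughout, p̃ = Γd(p), so that each entry p_i/d_i of p̃ appears d_i times.

• For α ∉ {−∞, 0, 1, ∞},

$$\begin{aligned}
D_\alpha(p\|\gamma_d) &= \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[ \sum_{i=1}^{k} p_i^{\alpha} (\gamma_d)_i^{1-\alpha} \right] \\
&= \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[ \sum_{i=1}^{k} p_i^{\alpha} \left( \frac{d_i}{N} \right)^{1-\alpha} \right] \\
&= \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[ \sum_{i=1}^{k} d_i \left( \frac{p_i}{d_i} \right)^{\alpha} \left( \frac{1}{N} \right)^{1-\alpha} \right] \\
&= \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[ \left( \frac{1}{N} \right)^{1-\alpha} \sum_{i=1}^{k} d_i \left( \frac{p_i}{d_i} \right)^{\alpha} \right] \\
&= \frac{\operatorname{sgn}(\alpha)}{\alpha - 1} \ln\left[ \sum_{i=1}^{N} \tilde{p}_i^{\alpha} \left( \frac{1}{N} \right)^{1-\alpha} \right] \\
&= D_\alpha(\tilde{p}\|\eta_N).
\end{aligned} \tag{B5}$$

• For α = 0,

$$\begin{aligned}
D_0(p\|\gamma_d) &= -\ln \sum_{i : p_i \neq 0} (\gamma_d)_i \\
&= -\ln \sum_{i : p_i \neq 0} \frac{d_i}{N} \\
&= -\ln \sum_{i : \tilde{p}_i \neq 0} \frac{1}{N} \\
&= D_0(\tilde{p}\|\eta_N).
\end{aligned} \tag{B6}$$

• For α → ∞,

$$\begin{aligned}
D_\infty(p\|\gamma_d) &= \ln \min\left\{ \lambda : \forall i,\ \lambda \geq \frac{p_i}{d_i/N} \right\} \\
&= \ln \min\left\{ \lambda : \forall i,\ \lambda \geq \frac{p_i/d_i}{1/N} \right\} \\
&= D_\infty(\tilde{p}\|\eta_N).
\end{aligned} \tag{B7}$$

• For α → −∞,

$$\begin{aligned}
D_{-\infty}(p\|\gamma_d) &= D_\infty(\gamma_d\|p) \\
&= \ln \min\left\{ \lambda : \forall i,\ \lambda \geq \frac{d_i/N}{p_i} \right\} \\
&= \ln \min\left\{ \lambda : \forall i,\ \lambda \geq \frac{1/N}{p_i/d_i} \right\} \\
&= D_\infty(\eta_N\|\tilde{p}) \\
&= D_{-\infty}(\tilde{p}\|\eta_N).
\end{aligned} \tag{B8}$$

• For α = 1,

$$\begin{aligned}
D_1(p\|\gamma_d) &= \sum_{i=1}^{k} p_i \ln\left( \frac{p_i}{d_i/N} \right) \\
&= \sum_{i=1}^{k} d_i \frac{p_i}{d_i} \ln\left( \frac{p_i/d_i}{1/N} \right) \\
&= \sum_{i=1}^{N} \tilde{p}_i \ln\left( \frac{\tilde{p}_i}{1/N} \right) \\
&= \sum_{i=1}^{N} \tilde{p}_i \ln\left( \frac{\tilde{p}_i}{(\eta_N)_i} \right) \\
&= D_1(\tilde{p}\|\eta_N).
\end{aligned} \tag{B9}$$

Thus, for all α ∈ (−∞,∞),

$$D_\alpha(p\|\gamma_d) = D_\alpha(\tilde{p}\|\eta_N).$$

This concludes the proof.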
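The statement of Lemma 3 can be checked numerically on a small example. The sketch below is a sanity check rather than a proof; the choice of d, p and the list of orders α are arbitrary, and the divergence routine assumes full-support inputs for simplicity.

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(p||q) per (A1)-(A3); assumes p and q have full support."""
    ratios = [pi / qi for pi, qi in zip(p, q)]
    if alpha == math.inf:
        return math.log(max(ratios))
    if alpha == -math.inf:
        return math.log(max(1.0 / r for r in ratios))
    if alpha == 1:
        return sum(pi * math.log(r) for pi, r in zip(p, ratios))
    if alpha == 0:
        return -math.log(sum(q))   # every p_i is nonzero by assumption
    sgn = 1.0 if alpha > 0 else -1.0
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return sgn / (alpha - 1) * math.log(s)

def embed(p, d):
    """Gamma_d from (A16)."""
    out = []
    for pi, di in zip(p, d):
        out.extend([pi / di] * di)
    return out

d = [2, 1, 3]
N = sum(d)
gamma_d = [di / N for di in d]     # rational distribution (B1)
eta_N = [1.0 / N] * N              # uniform distribution of size N
p = [0.5, 0.3, 0.2]
p_tilde = embed(p, d)

alphas = [-math.inf, -2.0, 0, 0.5, 1, 2.0, math.inf]
gaps = [abs(renyi_divergence(p, gamma_d, a) - renyi_divergence(p_tilde, eta_N, a))
        for a in alphas]
print(max(gaps))
```

For instance, at α = 2 both sides equal ln 1.37 for this choice of p and d.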

Lemma 4 (Mixture reduces Renyi’s Divergence [BHN+15]). Let p ≠ q be probability distributions with q full rank. Then ∀α ∈ (−∞,∞) and 0 < δ < 1,

Dα((1− δ)p+ δq‖q) < Dα(p‖q). (B10)

Proof. We split the proof into separate parts for differentα values.

For α ∈ [0, 1], joint convexity for probability distribu-tions (A1, B1) and (A2, B2) states that, for a parameter0 < δ < 1, [vEH14]

Dα((1− δ)A1 + δA2‖(1− δ)B1 + δB2)

≤ (1− δ)Dα(A1‖B1) + δDα(A2‖B2). (B11)

Setting B1 = B2 = A2 = q and A1 = p, we see that,

Dα((1− δ)p+ δq‖q) ≤ (1− δ)Dα(p‖q)< Dα(p‖q), (B12)

where the first inequality follows because Dα(q‖q) = 0(See (A7)) and the second strict inequality followsbecause Dα(p‖q) > 0 for p 6= q (see (A6)).

For α > 1, joint convexity does not hold. Denote r = (1 − δ)p + δq. Then the statement of the lemma is equivalent to

$$\sum_{i=1}^{k} r_i^{\alpha} q_i^{1-\alpha} < \sum_{i=1}^{k} p_i^{\alpha} q_i^{1-\alpha}. \tag{B13}$$

Since α > 1, f(x) = x^α is a convex function over all x > 0. Since $q_i^{1-\alpha} > 0$, the linear combination

$$F(p) = \sum_{i=1}^{k} q_i^{1-\alpha} p_i^{\alpha} \tag{B14}$$

is convex with respect to p. For p ≠ q, F(p) > 1, which follows from positivity of the Rényi divergence D_α(p‖q). Note that $F(q) = \sum_{i=1}^{k} q_i = 1$. Thus,

$$
\begin{aligned}
F(r) &\leq (1-\delta)F(p) + \delta F(q), \\
&= F(p) - \delta\,(F(p) - F(q)), \\
&< F(p),
\end{aligned}
\tag{B15}
$$

where the first inequality is by convexity and the second inequality is because F(p) > F(q). This establishes (B13).

For α < 0, if p is not full rank, then D_α(p‖q) = ∞. But q is full rank, implying that D_α((1 − δ)p + δq‖q) is finite, and hence the inequality holds. If p is full rank, we use an approach similar to the case α > 1: defining α′ = 1 − α, so that α′ > 1, we apply the argument of the previous case.

This concludes the proof.
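As an illustration of Lemma 4, the strict decrease under mixing can be checked on a small example. This is a minimal sketch, assuming full-rank p and q and finite α ≠ 0; the helper names `renyi_divergence` and `mix` and the sample distributions are hypothetical and chosen only for this illustration.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence for full-rank p, q and finite alpha != 0."""
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    sign = 1.0 if alpha > 0 else -1.0
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return sign / (alpha - 1) * math.log(total)

def mix(p, q, delta):
    """Return the mixture (1 - delta) p + delta q."""
    return [(1 - delta) * pi + delta * qi for pi, qi in zip(p, q)]

p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]

# One boolean per (alpha, delta) pair: does mixing strictly lower D_alpha?
checks = []
for alpha in (-1.5, 0.5, 1, 2.0, 4.0):
    for delta in (0.1, 0.5, 0.9):
        mixed = renyi_divergence(mix(p, q, delta), q, alpha)
        checks.append(mixed < renyi_divergence(p, q, alpha))
```

A numerical check of this kind only illustrates (B10) for the sampled parameters; it does not replace the convexity argument above.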

Lemma 5 (Conversion into rational entries [BHN+15]). Let $q = \{q_i\}_{i=1}^{k}$ be an ordered (descending) probability distribution of full rank. Then, for any ε > 0, there exists a distribution $\tilde{q}$ such that

1. $\|q - \tilde{q}\|_1 \leq \varepsilon$.

2. There exists a set of natural numbers $\{d_i\}_{i=1}^{k}$ such that $\tilde{q} = \{d_i/N\}$ with $N = \sum_{i=1}^{k} d_i$.

3. There exists a classical channel E such that $E(q) = \tilde{q}$ and, for any other distribution p, $\|p - E(p)\|_1 \leq O(\sqrt{\varepsilon})$.

Proof. We construct a $\tilde{q}$ that satisfies statements 1 and 2. Then, we construct a channel E that satisfies statement 3. Since q is decreasing, $\min_i q_i = q_k$. First, we choose an N such that

$$q_k > \frac{k}{\sqrt{N}} > \frac{k}{N}. \tag{B16}$$

We now define $\tilde{q}$. For i ∈ {1, . . . , k − 1},

$$\tilde{q}_i = \frac{\lceil q_i N \rceil}{N} = \frac{d_i}{N}, \tag{B17}$$


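and

$$\tilde{q}_k = 1 - \sum_{i=1}^{k-1} \tilde{q}_i = \frac{d_k}{N}. \tag{B18}$$

Note that $\tilde{q}_i \geq q_i$ for i ∈ {1, . . . , k − 1}, and $\tilde{q}_k \leq q_k$ since both distributions are normalized to 1. Also note that

$$
\begin{aligned}
1 - \tilde{q}_k &= \sum_{i=1}^{k-1} \tilde{q}_i, \\
&= \sum_{i=1}^{k-1} \frac{\lceil q_i N \rceil}{N}, \\
&\leq \sum_{i=1}^{k-1} \frac{q_i N}{N} + \frac{k-1}{N}, \\
&\leq 1 - q_k + \frac{k}{N},
\end{aligned}
\tag{B19}
$$

where the first equality follows from (B18) and the second equality follows from (B17). Thus,

$$\tilde{q}_k \geq q_k - \frac{k}{N} > 0, \tag{B20}$$

which follows from (B16) by the choice of N. Thus $\tilde{q}$ is a valid probability distribution. Furthermore, we see that

$$\sum_{i:\,\tilde{q}_i > q_i} \tilde{q}_i + \sum_{i:\,\tilde{q}_i \leq q_i} \tilde{q}_i = \sum_{i:\,\tilde{q}_i > q_i} q_i + \sum_{i:\,\tilde{q}_i \leq q_i} q_i = 1. \tag{B21}$$

Thus,

$$\sum_{i:\,\tilde{q}_i > q_i} (\tilde{q}_i - q_i) = \sum_{i:\,\tilde{q}_i \leq q_i} (q_i - \tilde{q}_i). \tag{B22}$$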

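Using this result, we show that

$$
\begin{aligned}
\|q - \tilde{q}\|_1 &= \frac{1}{2}\sum_{i=1}^{k} |q_i - \tilde{q}_i|, \\
&= \frac{1}{2}\left[\sum_{i:\,\tilde{q}_i > q_i} (\tilde{q}_i - q_i) + \sum_{i:\,\tilde{q}_i \leq q_i} (q_i - \tilde{q}_i)\right], \\
&= \sum_{i:\,\tilde{q}_i > q_i} (\tilde{q}_i - q_i), \\
&= q_k - \tilde{q}_k, \\
&\leq \frac{k}{N},
\end{aligned}
\tag{B23}
$$

where the third equality follows from (B22) and the last inequality is due to (B20). Thus, if we pick ε = k/N, we can satisfy statements 1 and 2.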

Lastly, we need to construct a channel that takes q to $\tilde{q}$. Such a channel must slightly increase the probabilities $q_i$ for all i ∈ {1, . . . , k − 1} and reduce $q_k$. Channels are characterised by conditional probabilities P(j|i) such that

$$\sum_{j=1}^{k} P(j|i) = 1, \tag{B24}$$

which is just the normalization preservation.

Denote by I = {1, 2, . . . , k − 1} the set of indices for which $\tilde{q}_i \geq q_i$, and let $\Delta_i = \tilde{q}_i - q_i$ and $\Delta = \sum_{i \in I} \Delta_i$. Note that $\tilde{q}_k = q_k - \Delta$.

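Consider a channel E defined by

$$
P(j|i) =
\begin{cases}
1 & i = j,\ i, j \in I \\
0 & i \neq j,\ i, j \in I \\
1 - \dfrac{\Delta}{q_k} & i = j = k \\
0 & j = k,\ i \in I \\
\dfrac{\Delta_j}{q_k} & i = k,\ j \in I
\end{cases}
\tag{B25}
$$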

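We now prove that the normalization condition is met and that $E(q) = \tilde{q}$. We begin with the normalization condition.

• For i ∈ I,

$$
\begin{aligned}
\sum_{j=1}^{k} P(j|i) &= \sum_{j \in I,\, j \neq i} P(j|i) + P(i|i) + P(k|i), \\
&= 0 + 1 + 0, \\
&= 1.
\end{aligned}
\tag{B26}
$$

• For i = k,

$$
\begin{aligned}
\sum_{j=1}^{k} P(j|k) &= \sum_{j \in I} P(j|k) + P(k|k), \\
&= \sum_{j \in I} \frac{\Delta_j}{q_k} + 1 - \frac{\Delta}{q_k}, \\
&= 1.
\end{aligned}
\tag{B27}
$$

Thus,

$$\sum_{j=1}^{k} P(j|i) = 1, \tag{B28}$$

which is the normalization preservation.

Next, we show that $E(q) = \tilde{q}$. Let q′ = E(q). We need to show that $\tilde{q} = q'$.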


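• For j ∈ I,

$$
\begin{aligned}
q'_j &= \sum_{i \in I,\, i \neq j} P(j|i)\,q_i + P(j|j)\,q_j + P(j|k)\,q_k, \\
&= 0 + q_j + \frac{\Delta_j}{q_k}\,q_k, \\
&= q_j + \Delta_j, \\
&= \tilde{q}_j.
\end{aligned}
\tag{B29}
$$

• For j = k,

$$
\begin{aligned}
q'_k &= \sum_{i \in I} P(k|i)\,q_i + P(k|k)\,q_k, \\
&= 0 + \left(1 - \frac{\Delta}{q_k}\right) q_k, \\
&= q_k - \Delta, \\
&= \tilde{q}_k.
\end{aligned}
\tag{B30}
$$

Thus, we have proven that $\tilde{q} = q'$.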

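Finally, we need to show that, for any other distribution p, $\|p - E(p)\|_1 \leq O(\sqrt{\varepsilon})$. Denote $E(p) = \tilde{p}$. We see that

$$
\begin{aligned}
\|p - \tilde{p}\|_1 &= p_k - \tilde{p}_k, \\
&= p_k - \left(1 - \frac{\Delta}{q_k}\right) p_k, \\
&= \frac{\Delta}{q_k}\,p_k, \\
&\leq \frac{\Delta}{q_k}, \\
&\leq \frac{k}{N q_k}, \\
&\leq \varepsilon\,\frac{\sqrt{N}}{k}, \\
&= \varepsilon\,\frac{\sqrt{N}}{\sqrt{k}}\,\frac{1}{\sqrt{k}}, \\
&= \sqrt{\frac{\varepsilon}{k}},
\end{aligned}
\tag{B31}
$$

where the first equality follows by the same reasoning as the last equality of (B23), the first inequality is because $p_k \leq 1$, the second inequality is because $\Delta = q_k - \tilde{q}_k \leq k/N$, the third inequality is because $k/N = \varepsilon$ and $q_k \geq k/\sqrt{N}$, and the last equality is because $N/k = 1/\varepsilon$. Thus, for any other distribution p, $\|p - E(p)\|_1 \leq O(\sqrt{\varepsilon})$. Furthermore, E maps the basis vectors to probability distributions with non-negative entries, proving that it is indeed stochastic. This completes the proof.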
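The construction in this proof can be traced on a concrete example. The following is a minimal sketch using exact rational arithmetic; the function names `rationalize`, `build_channel`, and `apply_channel`, the sample distribution q, and the choice N = 400 are all hypothetical and serve only to illustrate (B16)–(B18) and (B25).

```python
from fractions import Fraction
from math import ceil

def rationalize(q, N):
    """Round the first k-1 entries up to multiples of 1/N, as in (B17),
    and fix the last entry by normalization, as in (B18)."""
    d = [ceil(qi * N) for qi in q[:-1]]
    d.append(N - sum(d))
    return [Fraction(di, N) for di in d]

def build_channel(q, q_tilde):
    """Return P with P[j][i] = P(j|i), following the cases of (B25)."""
    k = len(q)
    last = k - 1
    delta = [q_tilde[i] - q[i] for i in range(last)]
    P = [[Fraction(0)] * k for _ in range(k)]
    for i in range(last):
        P[i][i] = Fraction(1)             # inputs in I are left untouched
        P[i][last] = delta[i] / q[last]   # mass moved from k to i
    P[last][last] = 1 - sum(delta) / q[last]
    return P

def apply_channel(P, p):
    """Output distribution p'(j) = sum_i P(j|i) p(i)."""
    return [sum(P[j][i] * p[i] for i in range(len(p))) for j in range(len(P))]

q = [Fraction(5, 11), Fraction(4, 11), Fraction(2, 11)]  # descending, full rank
k = len(q)
N = 400  # chosen so that q_k > k / sqrt(N), i.e. 2/11 > 3/20
q_tilde = rationalize(q, N)
P = build_channel(q, q_tilde)
column_sums = [sum(P[j][i] for j in range(k)) for i in range(k)]
distance = sum(abs(a - b) for a, b in zip(q, q_tilde)) / 2
```

Because the arithmetic is exact, the identity E(q) = q̃ and the bound ‖q − q̃‖₁ ≤ k/N can be compared without floating-point tolerance.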

Remark: For a full rank distribution p, the distribution E(p) (as defined above) is full rank. The reasoning is as follows:

• The channel increases all entries except the last entry. Thus, all entries except the last entry in E(p) are strictly greater than 0 since p is full rank.

• The last entry in E(p) is $\left(1 - \frac{\Delta}{q_k}\right) p_k$. Since $\Delta = q_k - \tilde{q}_k$, by the choice of N, we see that this entry is strictly greater than 0 as well (see (B20)).

Thus, the distribution E(p) is full rank for a full rank p.

Remark: A classical channel E is defined using a conditional probability distribution. In other words, given an input distribution p(x) and a channel E(y|x), the output distribution p′(y) is given by

$$p'(y) = \sum_{x} E(y|x)\,p(x). \tag{B32}$$

A reversal channel of a classical channel can be defined as follows. From Bayes' theorem, we know that

$$E(y|x)\,p(x) = E'(x|y)\,p'(y). \tag{B33}$$

Summing both sides of (B33) over x and noticing that $\sum_x E'(x|y) = 1$ is the normalization condition, we recover (B32).

Similarly, summing both sides of (B33) over y, we see that

$$p(x) = \sum_{y} E'(x|y)\,p'(y), \tag{B34}$$

and thus, a reversal channel E′ can be defined.
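The reversal channel of (B33) can be computed directly from a channel matrix and an input distribution. This is a minimal sketch, assuming the output distribution p′ has no zero entries so that the division in Bayes' theorem is defined; the names `apply_channel` and `reversal_channel` and the 2×2 example are hypothetical.

```python
from fractions import Fraction as F

def apply_channel(E, p):
    """p'(y) = sum_x E(y|x) p(x), with E[y][x] = E(y|x), as in (B32)."""
    return [sum(E[y][x] * p[x] for x in range(len(p))) for y in range(len(E))]

def reversal_channel(E, p):
    """R[x][y] = E'(x|y) = E(y|x) p(x) / p'(y), from (B33).

    Assumes every p'(y) is strictly positive.
    """
    p_out = apply_channel(E, p)
    return [[E[y][x] * p[x] / p_out[y] for y in range(len(E))]
            for x in range(len(p))]

E = [[F(3, 4), F(1, 3)],
     [F(1, 4), F(2, 3)]]        # columns sum to 1
p = [F(2, 5), F(3, 5)]

p_out = apply_channel(E, p)
R = reversal_channel(E, p)
recovered = apply_channel(R, p_out)   # should reproduce p, as in (B34)
```

Note that the reversal channel depends on the reference input p: it recovers p from p′ exactly, but in general it does not invert E on other inputs.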

Lemma 6 (Splitting of channel [BHN+15]). Consider a channel Λ such that, for some fixed probability distribution $u = (u_1, \ldots, u_\ell, 0, \ldots, 0)$, the channel gives output $u' = (u'_1, \ldots, u'_\ell, 0, \ldots, 0)$. Moreover, Λ(w) = w holds for some full rank distribution w. Then Λ = Λ_1 ⊕ Λ_2, where Λ_1 acts on the first ℓ elements and Λ_2 acts on the remaining n − ℓ elements.

Proof. Consider the joint probability of two random variables X and Y given by the distribution w and the channel. X = 0 denotes that the input is from the first group (items from 1 to ℓ), and X = 1 means that the input is from the second group (items from ℓ + 1 to n). Similarly, Y denotes the corresponding events for the outputs.

Since the channel preserves w, we have that P(Y = 0) = P(X = 0) and P(Y = 1) = P(X = 1). Moreover, since the channel sends u into u′, this means that P(Y = 0|X = 0) = 1. Thus,

$$
\begin{aligned}
P(Y = 0) &= P(Y = 0|X = 0)\,P(X = 0) + P(Y = 0|X = 1)\,P(X = 1), \\
P(Y = 0) &= P(X = 0) + P(Y = 0|X = 1)\,P(X = 1).
\end{aligned}
\tag{B35}
$$


Thus,

$$P(Y = 0|X = 1)\,P(X = 1) = 0. \tag{B36}$$

Since, for arbitrary distributions, P(X = 1) ≠ 0, we see that

$$P(Y = 0|X = 1) = 0. \tag{B37}$$

Rewritten,

$$P(Y = 0|X = 0) = P(Y = 1|X = 1) = 1. \tag{B38}$$

This means that elements of the first group are always mapped to the first group, and similarly for the second group. So the channel can be split into two channels. This concludes the proof.

Lemma 7 (Continuity [vEH14]). The relative entropy D(p‖q) is continuous in both arguments p and q, when q has full rank.

Lemma 8 (Inclusion of support [Ren05]). Let $\rho_{AB}$ be a density operator in $\mathcal{H}_A \otimes \mathcal{H}_B$. Then

$$\operatorname{supp}(\rho_{AB}) \subseteq \operatorname{supp}(\rho_A) \otimes \operatorname{supp}(\rho_B). \tag{B39}$$

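Proof. Suppose that $\rho_{AB}$ is a pure state. Then $\rho_{AB} = |\psi\rangle\langle\psi|$. Let $|\psi\rangle = \sum_{z \in Z} \alpha_z |\phi_z\rangle \otimes |\psi_z\rangle$ be the Schmidt decomposition of $|\psi\rangle$. Then

$$\operatorname{supp}(\rho_{AB}) = \{|\psi\rangle\} \subseteq \operatorname{span}\{|\phi_z\rangle\}_{z \in Z} \otimes \operatorname{span}\{|\psi_z\rangle\}_{z \in Z}. \tag{B40}$$

Since $\operatorname{span}\{|\phi_z\rangle\}_{z \in Z} = \operatorname{supp}(\rho_A)$ and $\operatorname{span}\{|\psi_z\rangle\}_{z \in Z} = \operatorname{supp}(\rho_B)$, we see that the lemma is true.

For mixed states, let $\rho_{AB} = \sum_{x \in X} \rho^x_{AB}$ be the decomposition of $\rho_{AB}$ into pure states. Then

$$
\begin{aligned}
\operatorname{supp}(\rho_{AB}) &= \operatorname{span}\left\{\bigcup_{x \in X} \operatorname{supp}(\rho^x_{AB})\right\}, \\
&\subseteq \operatorname{span}\left\{\bigcup_{x \in X} \operatorname{supp}(\rho^x_A) \otimes \operatorname{supp}(\rho^x_B)\right\}, \\
&\subseteq \left(\operatorname{span}\left\{\bigcup_{x \in X} \operatorname{supp}(\rho^x_A)\right\}\right) \otimes \left(\operatorname{span}\left\{\bigcup_{x \in X} \operatorname{supp}(\rho^x_B)\right\}\right), \\
&= \operatorname{supp}(\rho_A) \otimes \operatorname{supp}(\rho_B).
\end{aligned}
\tag{B41}
$$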

This concludes the proof.

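Lemma 9 (Superadditivity of D_1 [CLPG18]). Let $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$ be a bipartite Hilbert space, and let $\rho_{AB}$, $\sigma_{AB}$ be two density operators. If $\sigma_{AB} = \sigma_A \otimes \sigma_B$, then

$$D(\rho_{AB}\|\sigma_A \otimes \sigma_B) = D(\rho_{AB}\|\rho_A \otimes \rho_B) + D(\rho_A\|\sigma_A) + D(\rho_B\|\sigma_B). \tag{B42}$$

Thus,

$$D(\rho_{AB}\|\sigma_A \otimes \sigma_B) \geq D(\rho_A\|\sigma_A) + D(\rho_B\|\sigma_B). \tag{B43}$$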

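Proof. Using the definition of the quantum Rényi divergence (from Subsection A 3), we see that

$$
\begin{aligned}
D(\rho_{AB}\|\sigma_A \otimes \sigma_B) &= \operatorname{Tr}\left[\rho_{AB}\left(\ln \rho_{AB} - \ln \sigma_A \otimes \sigma_B\right)\right], \\
&= \operatorname{Tr}\left[\rho_{AB}\left(\ln \rho_{AB} - \ln \rho_A \otimes \rho_B\right)\right] + \operatorname{Tr}\left[\rho_{AB}\left(\ln \rho_A \otimes \rho_B - \ln \sigma_A \otimes \sigma_B\right)\right], \\
&= D(\rho_{AB}\|\rho_A \otimes \rho_B) + D(\rho_A \otimes \rho_B\|\sigma_A \otimes \sigma_B), \\
&= D(\rho_{AB}\|\rho_A \otimes \rho_B) + D(\rho_A\|\sigma_A) + D(\rho_B\|\sigma_B).
\end{aligned}
\tag{B44}
$$

Since $D(\rho_{AB}\|\rho_A \otimes \rho_B) \geq 0$ (see (A14)), we see that $D(\rho_{AB}\|\sigma_A \otimes \sigma_B) \geq D(\rho_A\|\sigma_A) + D(\rho_B\|\sigma_B)$, thus concluding the proof.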
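For commuting (classical) states, the decomposition (B42) can be checked numerically on a small joint distribution. This is a minimal sketch restricted to the classical case; the helper `kl`, the variable names, and the sample joint distribution are hypothetical and chosen for illustration.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) = sum_i p_i ln(p_i / q_i), skipping p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(a, b):
    """Flattened product distribution a (x) b, row-major in a."""
    return [x * y for x in a for y in b]

joint = [[0.30, 0.10],
         [0.15, 0.45]]                       # rho_AB as a joint distribution
rho_a = [sum(row) for row in joint]          # marginal on A
rho_b = [sum(col) for col in zip(*joint)]    # marginal on B
sigma_a = [0.5, 0.5]
sigma_b = [0.25, 0.75]
flat_joint = [x for row in joint for x in row]

lhs = kl(flat_joint, product(sigma_a, sigma_b))
mutual_information = kl(flat_joint, product(rho_a, rho_b))
rhs = mutual_information + kl(rho_a, sigma_a) + kl(rho_b, sigma_b)
```

Here `mutual_information` plays the role of D(ρ_AB‖ρ_A ⊗ ρ_B); its non-negativity is what turns the equality (B42) into the inequality (B43).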

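Lemma 10 (Superadditivity of D_0). Let $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$ be a bipartite Hilbert space, and let $\rho_{AB}$, $\sigma_{AB}$ be two density operators. If $\sigma_{AB} = \sigma_A \otimes \sigma_B$, then

$$D_0(\rho_{AB}\|\sigma_A \otimes \sigma_B) \geq D_0(\rho_A\|\sigma_A) + D_0(\rho_B\|\sigma_B). \tag{B45}$$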

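Proof. From Lemma 8, we see that $\operatorname{supp}(\rho_{AB}) \subseteq \operatorname{supp}(\rho_A) \otimes \operatorname{supp}(\rho_B)$ and thus,

$$\Pi_{\rho_{AB}} \leq \Pi_{\rho_A} \otimes \Pi_{\rho_B}. \tag{B46}$$

Then,

$$
\begin{aligned}
D_0(\rho_{AB}\|\sigma_A \otimes \sigma_B) &= -\ln\left(\operatorname{Tr}\left[\Pi_{\rho_{AB}}(\sigma_A \otimes \sigma_B)\right]\right), \\
&\geq -\ln\left(\operatorname{Tr}\left[(\Pi_{\rho_A} \otimes \Pi_{\rho_B})(\sigma_A \otimes \sigma_B)\right]\right), \\
&= -\ln\left(\operatorname{Tr}\left[\Pi_{\rho_A}\sigma_A \otimes \Pi_{\rho_B}\sigma_B\right]\right), \\
&= -\ln\left(\operatorname{Tr}\left[\Pi_{\rho_A}\sigma_A\right]\operatorname{Tr}\left[\Pi_{\rho_B}\sigma_B\right]\right), \\
&= -\ln\left(\operatorname{Tr}\left[\Pi_{\rho_A}\sigma_A\right]\right) - \ln\left(\operatorname{Tr}\left[\Pi_{\rho_B}\sigma_B\right]\right), \\
&= D_0(\rho_A\|\sigma_A) + D_0(\rho_B\|\sigma_B).
\end{aligned}
\tag{B47}
$$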

This concludes the proof.
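A classical instance of (B45) shows that the inequality can be strict when the support of the joint distribution is smaller than the product of the marginal supports. This is a minimal sketch, assuming the classical form D_0(p‖q) = −ln Σ_{i: p_i ≠ 0} q_i used in (B6); the helper `d0` and the perfectly correlated example are hypothetical.

```python
import math

def d0(p, q):
    """D_0(p||q) = -ln of the q-mass on the support of p."""
    return -math.log(sum(qi for pi, qi in zip(p, q) if pi != 0))

joint = [[0.5, 0.0],
         [0.0, 0.5]]                          # perfectly correlated rho_AB
rho_a = [sum(row) for row in joint]           # full-rank marginal on A
rho_b = [sum(col) for col in zip(*joint)]     # full-rank marginal on B
sigma_a = [0.5, 0.5]
sigma_b = [0.25, 0.75]

flat_joint = [x for row in joint for x in row]
sigma_ab = [a * b for a in sigma_a for b in sigma_b]

lhs = d0(flat_joint, sigma_ab)
rhs = d0(rho_a, sigma_a) + d0(rho_b, sigma_b)
```

In this example both marginals are full rank, so the right-hand side vanishes, while the joint distribution is supported on only half of the product space.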

Lemmas 9 and 10 are results based on the quantum Rényi divergence. However, any result that holds for the quantum case must hold for the classical case. This is because, if the quantum states are taken to be commuting, we can diagonalize them in the same basis, and thus they are effectively quasi-classical.

Lemma 11 (Müller [M18]). Let $p_A, p'_A \in \mathbb{R}^m$ be probability distributions with $p_A^{\downarrow} \neq p_A'^{\downarrow}$. Then, there exists a probability distribution $q_B$ and an extension $p'_{AB}$ with marginals $p'_A$ and $q_B$ such that

$$p_A \otimes q_B \succ p'_{AB} \tag{B48}$$

if, and only if, $H_0(p_A) \leq H_0(p'_A)$ and $H(p_A) < H(p'_A)$. Moreover, if these inequalities are satisfied, we can always choose B and $p'_{AB}$ such that $D(p'_{AB}\|p'_A \otimes q_B) < \varepsilon$, for any choice of ε > 0.


Appendix C: Proof of Theorem 1

Proof. We begin with statement 1 implies statement 2; i.e., we suppose D(p‖q) > D(p′‖q′) and rank(p) ≤ rank(p′), and prove the existence of a classical channel Λ, probability distributions r and s, and a joint distribution t′ that satisfy the conditions of statement 2.

Since q and q′ have rational entries, without loss of generality, we pick $q = \left(\frac{d_1}{N}, \ldots, \frac{d_k}{N}\right)$ and $q' = \left(\frac{d'_1}{N}, \ldots, \frac{d'_k}{N}\right)$ for integers $\{d_i\}$ and $\{d'_i\}$ such that

$$\sum_{i=1}^{k} d_i = \sum_{i=1}^{k} d'_i = N.$$

Using Lemma 3 and statement 1, we see that

D(Γd(p)‖ηN ) > D(Γd′(p′)‖ηN ). (C1)

Define $\tilde{p} \equiv \Gamma_d(p)$ and $\tilde{p}' \equiv \Gamma_{d'}(p')$. Thus,

$$D(\tilde{p}\|\eta_N) > D(\tilde{p}'\|\eta_N). \tag{C2}$$

Using the definition of the relative entropy (see Subsection A 1), we see that, for any probability distribution p,

$$D(p\|\eta_N) = \ln(N) - H(p). \tag{C3}$$

Thus, from (C2) and (C3),

$$H(\tilde{p}) < H(\tilde{p}'). \tag{C4}$$
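For completeness, the identity (C3) follows in one line from the definition of the relative entropy, taking H(p) = −Σ_i p_i ln p_i to be the Shannon entropy in nats (the convention assumed here):

```latex
\begin{aligned}
D(p\|\eta_N)
  &= \sum_{i=1}^{N} p_i \ln\!\left(\frac{p_i}{1/N}\right)
   = \sum_{i=1}^{N} p_i \ln p_i + \ln(N) \sum_{i=1}^{N} p_i \\
  &= \ln(N) - H(p).
\end{aligned}
```

Since ln(N) is the same on both sides of (C2), a strictly larger relative entropy to the uniform distribution is equivalent to a strictly smaller entropy.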

Since rank(p) ≤ rank(p′), we see that

$$\operatorname{rank}(\tilde{p}) \leq \operatorname{rank}(\tilde{p}'). \tag{C5}$$

From the definition of the Rényi entropy H_0 (see Subsection A 2) and (C5), we see that

$$H_0(\tilde{p}) \leq H_0(\tilde{p}'). \tag{C6}$$

Using (C4) and (C6), we see that, from Lemma 11, there exists a probability distribution r and an extension v′ with marginals $\tilde{p}'$ and r such that

$$
\begin{aligned}
\tilde{p} \otimes r &\succ v', \\
D(v'\|\tilde{p}' \otimes r) &< \gamma,
\end{aligned}
\tag{C7}
$$

for any choice of γ > 0. In other words, there exists a bistochastic map Φ (see Lemma 1) such that

$$\Phi(\tilde{p} \otimes r) = v'. \tag{C8}$$

Using Lemma 6, we see that r can be considered, without loss of generality, to be of full rank. Concretely, if r is of rank d (less than full rank), then define $u = \tilde{p} \otimes r$, $u' = v'$ and $w = \eta_N \otimes \eta$. Using Lemma 6, Φ can be split into Φ_1 ⊕ Φ_2. If r is of full rank, Φ = Φ_1.

Finally consider

Λ = (Γ∗d′ ⊗ I) ◦ Φ1 ◦ (Γd ⊗ I). (C9)

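Since this is a composition of stochastic maps, the overall map is stochastic.

Proving 2a)

$$
\begin{aligned}
\Lambda(p \otimes r) &= [(\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I)](p \otimes r), \\
&= [(\Gamma^*_{d'} \otimes I) \circ \Phi_1](\tilde{p} \otimes r), \\
&= (\Gamma^*_{d'} \otimes I)\,v'.
\end{aligned}
\tag{C10}
$$

We now define $t' = (\Gamma^*_{d'} \otimes I)\,v'$. We notice that the marginals of t′ are $\Gamma^*_{d'}(\tilde{p}') = p'$ and r.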

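Proving 2b)

We need to show that Λ(q ⊗ s) = q′ ⊗ s for some s. Pick s = η. Then,

$$
\begin{aligned}
\Lambda(q \otimes \eta) &= [(\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I)](q \otimes \eta), \\
&= [(\Gamma^*_{d'} \otimes I) \circ \Phi_1](\eta_N \otimes \eta), \\
&= (\Gamma^*_{d'} \otimes I)(\eta_N \otimes \eta), \\
&= q' \otimes \eta,
\end{aligned}
\tag{C11}
$$

where the second equality is from Lemma 2. Thus, Λ(q ⊗ s) = q′ ⊗ s.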

Proving 2c)

$$
\begin{aligned}
\gamma &> D(v'\|\tilde{p}' \otimes r), \\
&\geq D(t'\|p' \otimes r),
\end{aligned}
\tag{C12}
$$

where the first inequality is from (C7) and the second inequality is from the data-processing inequality (A5) using the channel $\Gamma^*_{d'} \otimes I$. Thus, D(t′‖p′ ⊗ r) ≤ γ.

This completes the proof that statement 1 implies statement 2. We now look at the reverse direction; i.e., we assume a classical channel Λ that satisfies the conditions of statement 2. Let α ∈ {0, 1}. Then,

$$
\begin{aligned}
D_\alpha(p\|q) + D_\alpha(r\|s) &= D_\alpha(p \otimes r\|q \otimes s), \\
&\geq D_\alpha(\Lambda(p \otimes r)\|\Lambda(q \otimes s)), \\
&= D_\alpha(t'\|q' \otimes s), \\
&\geq D_\alpha(p'\|q') + D_\alpha(r\|s),
\end{aligned}
\tag{C13}
$$

where the first equality follows from additivity of the Rényi divergence (A8), the first inequality follows from the data-processing property (A5), and the last inequality follows from Lemmas 9 and 10 for α = 1 and α = 0, respectively.

Thus, since we showed that r, s can be taken to be full rank, D_α(r‖s) is finite and can be subtracted, and thus we can conclude that

$$D_\alpha(p\|q) \geq D_\alpha(p'\|q'). \tag{C14}$$

We now split into two cases.

α = 0. Thus, D_0(p‖q) ≥ D_0(p′‖q′). Since this is true for all q, q′, it must be true for q = q′ = η. From the


definition of the Rényi divergence D_0 (see Subsection A 1), we see that

rank (p) ≤ rank (p′) . (C15)

α = 1. Thus, D(p‖q) ≥ D(p′‖q′). Consider the case of equality. Since it holds true for all q, q′, pick q = q′ = η. Thus,

$$H(p) = H(p'). \tag{C16}$$

Furthermore, equality implies that (see (C13))

$$D(t'\|q' \otimes s) = D(p'\|q') + D(r\|s). \tag{C17}$$

We know that

$$D(t'\|q' \otimes s) = D(t'\|p' \otimes r) + D(p'\|q') + D(r\|s), \tag{C18}$$

from Lemma 9. Thus,

$$D(t'\|p' \otimes r) = 0, \tag{C19}$$

which implies (see (A7))

$$t' = p' \otimes r. \tag{C20}$$

Since we know that p ⊗ r ≻ t′ (since Λ(p ⊗ r) = t′ from statement 2), this implies

$$p \otimes r \succ p' \otimes r \implies p \succ_T p'. \tag{C21}$$

Coupled with H(p) = H(p′), this shows that p = p′, which is against our assumption. Thus D(p‖q) > D(p′‖q′).

This concludes the proof.

Appendix D: Proof of Theorem 2

Proof. We begin with statement 1 implies statement 2; i.e., we suppose D(p‖q) ≥ D(p′‖q′), and prove the existence of a classical channel Λ, probability distributions $p'_\varepsilon$, r and s, and a joint distribution $t'_\varepsilon$ that satisfy the conditions of statement 2.

In general, q and q′ do not have rational entries. However, the set of probability distributions with rational entries is dense in the set of all distributions. Thus, using Lemma 5, we can always pick probability distributions $q_d$ and $q'_{d'}$ (defined below) that are arbitrarily close to q and q′, respectively.

Without loss of generality, we pick $q_d = \left(\frac{d_1}{N}, \ldots, \frac{d_k}{N}\right)$ and $q'_{d'} = \left(\frac{d'_1}{N}, \ldots, \frac{d'_k}{N}\right)$ for integers $\{d_i\}$ and $\{d'_i\}$ such that $\sum_{i=1}^{k} d_i = \sum_{i=1}^{k} d'_i = N$.

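Furthermore, Lemma 5 shows that there exist classical channels E and E′ such that $E(q) = q_d$ and $E'(q') = q'_{d'}$, with

$$
\begin{aligned}
\|q_d - q\|_1 &\leq \varepsilon_1, \\
\|E(p) - p\|_1 &\leq O(\sqrt{\varepsilon_1}),
\end{aligned}
\tag{D1}
$$

for all probability distributions p, and

$$
\begin{aligned}
\|q'_{d'} - q'\|_1 &\leq \varepsilon_2, \\
\|E'(p') - p'\|_1 &\leq O(\sqrt{\varepsilon_2}),
\end{aligned}
\tag{D2}
$$

for all probability distributions p′. Define the reversal channel of E′ as $E'^*$ (see Remark B). Then,

$$E'^*(q'_{d'}) = q'. \tag{D3}$$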

Now define

p′′ = (1− δ)p′ + δq′ (D4)

for 0 < δ < 1. Note that ‖p′′ − p′‖_1 ≤ δ and p′′ is full rank. From Lemma 4 and statement 1, we see that

D(p‖q) > D(p′′‖q′). (D5)
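The role of the mixture (D4) can be illustrated with a rank-deficient p′. This is a minimal sketch, assuming the trace-distance normalization ‖a − b‖₁ = ½ Σ_i |a_i − b_i| used in (B23); the helper names and the sample distributions are hypothetical.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q), skipping entries with p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def trace_distance(a, b):
    """Half the sum of absolute differences, as in (B23)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

p_prime = [0.6, 0.4, 0.0]      # not full rank
q_prime = [0.2, 0.3, 0.5]      # full rank
delta = 0.05

# p'' = (1 - delta) p' + delta q', as in (D4)
p_dprime = [(1 - delta) * a + delta * b for a, b in zip(p_prime, q_prime)]

is_full_rank = all(x > 0 for x in p_dprime)
shift = trace_distance(p_dprime, p_prime)
decrease = kl(p_prime, q_prime) - kl(p_dprime, q_prime)
```

The mixture is full rank, stays within δ of p′, and has strictly smaller relative entropy to q′, which is what turns the non-strict hypothesis of statement 1 into the strict inequality (D5).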

From Lemma 7, we see that D(p‖q) is continuous in both arguments p and q, when q is full rank. Thus in the ε_1 → 0 limit, we see that

D(p‖q) = D(E(p)‖qd). (D6)

Similarly, in the ε2 → 0 limit,

D(p′′‖q′) = D(E′(p′′)‖q′d′). (D7)

Thus, using (D5), (D6), and (D7), we conclude that

D(E(p)‖qd) > D(E′(p′′)‖q′d′). (D8)

Further, using Lemma 3 and (D8), we see that

D(Γd(E(p))‖ηN ) > D(Γd′(E′(p′′))‖ηN ). (D9)

Define $\widetilde{E(p)} \equiv \Gamma_d(E(p))$ and $\widetilde{E'(p'')} \equiv \Gamma_{d'}(E'(p''))$.

Thus,

$$D(\widetilde{E(p)}\|\eta_N) > D(\widetilde{E'(p'')}\|\eta_N). \tag{D10}$$

Using the definition of the relative entropy (see Subsection A 1), we see that, for any probability distribution p,

$$D(p\|\eta_N) = \ln(N) - H(p). \tag{D11}$$

Thus, from (D10) and (D11),

$$H(\widetilde{E(p)}) < H(\widetilde{E'(p'')}). \tag{D12}$$

Since p′′ is full rank, from Remark B, we see that E′(p′′) is full rank and thus $\widetilde{E'(p'')}$ is full rank as well. From the definition of H_0 (see Subsection A 2), we see that

$$H_0(\widetilde{E(p)}) \leq H_0(\widetilde{E'(p'')}). \tag{D13}$$

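Using (D12) and (D13), we see that, from Lemma 11, there exists a probability distribution r and an extension v′′ with marginals $\widetilde{E'(p'')}$ and r such that

$$
\begin{aligned}
\widetilde{E(p)} \otimes r &\succ v'', \\
D(v''\|\widetilde{E'(p'')} \otimes r) &< \gamma,
\end{aligned}
\tag{D14}
$$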


for any choice of γ > 0. In other words, there exists a bistochastic map Φ (see Lemma 1) such that

$$\Phi(\widetilde{E(p)} \otimes r) = v''. \tag{D15}$$

Using Lemma 6, we see that r can be considered, without loss of generality, to be of full rank. Concretely, if r is of rank d (less than full rank), then define $u = \widetilde{E(p)} \otimes r$, $u' = v''$ and $w = \eta_N \otimes \eta$. Using Lemma 6, Φ can be split into Φ_1 ⊕ Φ_2. If r is of full rank, Φ = Φ_1.

Finally consider

Λ = (E′∗ ⊗ I) ◦ (Γ∗d′ ⊗ I) ◦Φ1 ◦ (Γd ⊗ I) ◦ (E ⊗ I). (D16)

Since this is a composition of stochastic maps, the overall map is stochastic.

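Proving 2a) and 2c)

$$
\begin{aligned}
\Lambda(p \otimes r) &= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I) \circ (E \otimes I)](p \otimes r), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I)](E(p) \otimes r), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1](\widetilde{E(p)} \otimes r), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I)]\,v''.
\end{aligned}
\tag{D17}
$$

We now define $t'_\varepsilon = [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I)]\,v''$. We notice that the marginals of $t'_\varepsilon$ are $p'_\varepsilon = E'^*(\Gamma^*_{d'}(\widetilde{E'(p'')}))$ and r.

Note that

$$
\begin{aligned}
\|p'_\varepsilon - p'\|_1 &= \|E'^*(\Gamma^*_{d'}(\widetilde{E'(p'')})) - p'\|_1, \\
&\leq \|E'^*(\Gamma^*_{d'}(\widetilde{E'(p'')})) - \Gamma^*_{d'}(\widetilde{E'(p'')})\|_1 + \|\Gamma^*_{d'}(\widetilde{E'(p'')}) - p'\|_1, \\
&\leq O(\sqrt{\varepsilon_2}) + \|\Gamma^*_{d'}(\Gamma_{d'}(E'(p''))) - p'\|_1, \\
&= O(\sqrt{\varepsilon_2}) + \|E'(p'') - p'\|_1, \\
&\leq O(\sqrt{\varepsilon_2}) + \|E'(p'') - p''\|_1 + \|p'' - p'\|_1, \\
&\leq O(\sqrt{\varepsilon_2}) + O(\sqrt{\varepsilon_1}) + \delta,
\end{aligned}
\tag{D18}
$$

where the first inequality is because of the triangle inequality of the trace distance, the second inequality is from (D2), the second equality is from (A17), the third inequality is because of the triangle inequality of the trace distance, and the last inequality is because of (D2) and (D4).

Since ε_1, ε_2 and δ are arbitrarily chosen, we can set them to ε. Thus $\Lambda(p \otimes r) = t'_\varepsilon$ with marginals $p'_\varepsilon$ and r. Furthermore, $\|p'_\varepsilon - p'\|_1 \leq \varepsilon$.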

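Proving 2b)

We need to show that Λ(q ⊗ s) = q′ ⊗ s for some s. Pick s = η. Then,

$$
\begin{aligned}
\Lambda(q \otimes \eta) &= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I) \circ (E \otimes I)](q \otimes \eta), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1 \circ (\Gamma_d \otimes I)](q_d \otimes \eta), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I) \circ \Phi_1](\eta_N \otimes \eta), \\
&= [(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I)](\eta_N \otimes \eta), \\
&= (E'^* \otimes I)(q'_{d'} \otimes \eta), \\
&= q' \otimes \eta,
\end{aligned}
\tag{D19}
$$

where the third equality follows because Φ_1 is bistochastic. Thus Λ(q ⊗ s) = q′ ⊗ s.

Proving 2d)

$$
\begin{aligned}
\gamma &> D(v''\|\widetilde{E'(p'')} \otimes r), \\
&\geq D(t'_\varepsilon\|p'_\varepsilon \otimes r),
\end{aligned}
\tag{D20}
$$

where the first inequality is from (D14) and the second inequality is from the data-processing inequality (A5) using the channel $(E'^* \otimes I) \circ (\Gamma^*_{d'} \otimes I)$. Thus, $D(t'_\varepsilon\|p'_\varepsilon \otimes r) \leq \gamma$.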

This completes the proof that statement 1 implies statement 2. We now look at the reverse direction; i.e., we assume a stochastic map Λ that satisfies the conditions of statement 2. Then,

$$
\begin{aligned}
D(p\|q) + D(r\|s) &= D(p \otimes r\|q \otimes s), \\
&\geq D(\Lambda(p \otimes r)\|\Lambda(q \otimes s)), \\
&= D(t'_\varepsilon\|q' \otimes s), \\
&\geq D(p'_\varepsilon\|q') + D(r\|s),
\end{aligned}
\tag{D21}
$$

where the first equality follows from additivity of the relative entropy (A8), the first inequality follows from the data-processing property (A5), and the last inequality follows from Lemma 9. Since we showed that r, s can be taken to be full rank, D(r‖s) is finite and can be subtracted, giving

$$D(p\|q) \geq D(p'_\varepsilon\|q'). \tag{D22}$$

Using Lemma 7 (continuity) and taking ε → 0,

$$D(p\|q) \geq D(p'\|q'). \tag{D23}$$

This concludes the proof.

Appendix E: Proof of Corollary 1

Proof. Simply pick q = q′ = η and apply Theorem 2. Since Λ preserves the uniform distribution, it is a unital classical channel.

[AU80] P. M. Alberti and Armin Uhlmann. A problem relating to positive linear maps on matrix algebras. Reports on Mathematical Physics, 18(2):163–176, October 1980.

[BEG+19] Paul Boes, Jens Eisert, Rodrigo Gallego, Markus P. Müller, and Henrik Wilming. Von Neumann entropy from unitarity. Physical Review Letters, 122(21):210402, May 2019. arXiv:1807.08773.

[BG17] Francesco Buscemi and Gilad Gour. Quantum relative Lorenz curves. Physical Review A, 95(1):012110, January 2017. arXiv:1607.05735.

[BHN+15] Fernando G. S. L. Brandão, Michał Horodecki, Nelly Huei Ying Ng, Jonathan Oppenheim, and Stephanie Wehner. The second laws of quantum thermodynamics. Proceedings of the National Academy of Sciences, 112(11):3275–3279, February 2015. arXiv:1305.5278.

[Bla53] David Blackwell. Equivalent comparisons of experiments. The Annals of Mathematical Statistics, 24(2):265–272, June 1953.

[BST19] Francesco Buscemi, David Sutter, and Marco Tomamichel. An information-theoretic treatment of quantum dichotomies. July 2019. arXiv:1907.08539.

[CG19] Eric Chitambar and Gilad Gour. Quantum resource theories. Reviews of Modern Physics, 91(2):025001, April 2019. arXiv:1806.06107.

[CLPG18] Ángela Capel, Angelo Lucia, and David Pérez-García. Superadditivity of quantum relative entropy for general states. IEEE Transactions on Information Theory, 64(7):4758–4765, July 2018. arXiv:1705.03521.

[DK01] Sumit Daftuar and Matthew Klimesh. Mathematical structure of entanglement catalysis. Physical Review A, 64(4):042314, September 2001. arXiv:quant-ph/0104058.

[GJB+18] Gilad Gour, David Jennings, Francesco Buscemi, Runyao Duan, and Iman Marvian. Quantum majorization and a complete set of entropic conditions for quantum thermodynamics. Nature Communications, 9:5352, December 2018. arXiv:1708.04302.

[HHO03] Michał Horodecki, Paweł Horodecki, and Jonathan Oppenheim. Reversible transformations from pure to mixed states and the unique measure of information. Physical Review A, 67(6):062104, June 2003. arXiv:quant-ph/0212019.

[HLP52] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge Mathematical Library. Cambridge University Press, 1952.

[JP99] Daniel Jonathan and Martin B. Plenio. Entanglement-assisted local manipulation of pure quantum states. Physical Review Letters, 83(17):3566–3569, October 1999. arXiv:quant-ph/9905071.

[Kli07] Matthew Klimesh. Inequalities that collectively completely characterize the catalytic majorization relation. September 2007. arXiv:0709.3680.

[M18] Markus P. Müller. Correlating thermal machines and the second law at the nanoscale. Physical Review X, 8(4):041051, December 2018. arXiv:1707.03451.

[Mat10] Keiji Matsumoto. Reverse test and characterization of quantum relative entropy, 2010. arXiv:1010.1030.

[Mat14] Keiji Matsumoto. An example of a quantum statistical model which cannot be mapped to a less informative one by any trace preserving positive map, September 2014. arXiv:1409.5658.

[MOA11] Albert Marshall, Ingram Olkin, and Barry C. Arnold. Inequalities: Theory of majorization and its applications. Springer Science+Business Media, LLC, New York, 2011.

[Nie99] Michael A. Nielsen. Conditions for a class of entanglement transformations. Physical Review Letters, 83(2):436–439, July 1999. arXiv:quant-ph/9811053.

[Pet86] Dénes Petz. Quasi-entropies for finite quantum systems. Reports on Mathematical Physics, 23(1):57–65, 1986.

[Ren05] Renato Renner. Security of quantum key distribution. PhD thesis, ETH Zurich, December 2005. arXiv:quant-ph/0512258.

[Ren16] Joseph M. Renes. Relative submajorization and its use in quantum resource theories. Journal of Mathematical Physics, 57(12):122202, December 2016. arXiv:1510.03695.

[R61] Alfréd Rényi. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pages 547–561, Berkeley, Calif., 1961. University of California Press.

[Tur07] S. Turgut. Catalytic transformations for bipartite pure states. Journal of Physics A, 40(40):12185–12212, September 2007.

[vEH14] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, July 2014. arXiv:1206.2459.

[Vei71] Arthur F. Veinott. Least d-majorized network flows with inventory and statistical applications. Management Science, 17(9):547–567, May 1971.

[WW19] Xin Wang and Mark M. Wilde. Resource theory of asymmetric distinguishability for quantum channels. May 2019. arXiv:1905.11629.

[WY16] Andreas Winter and Dong Yang. Operational resource theory of coherence. Physical Review Letters, 116(12):120404, March 2016. arXiv:1506.07975.

[ZMC+17] Huangjun Zhu, Zhihao Ma, Zhu Cao, Shao-Ming Fei, and Vlatko Vedral. Operational one-to-one mapping between coherence and entanglement measures. Physical Review A, 96(3):032316, September 2017. arXiv:1704.01935.