Entanglement and thermodynamics in general probabilistic theories

Giulio Chiribella∗, Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong
Carlo Maria Scandolo†, Department of Computer Science, University of Oxford, Oxford, UK
∗ giulio@cs.hku.hk † carlomaria.scandolo@st-annes.ox.ac.uk

Entanglement is one of the most striking features of quantum mechanics, and yet it is not specifically quantum. More specific to quantum mechanics is the connection between entanglement and thermodynamics, which leads to an identification between entropies and measures of pure-state entanglement. Here we search for the roots of this connection, investigating the relation between entanglement and thermodynamics in the framework of general probabilistic theories. We first address the question of whether an entangled state can be transformed into another by means of local operations and classical communication. Under two operational requirements, we prove a general version of the Lo-Popescu theorem, which lies at the foundations of the theory of pure-state entanglement. We then consider a resource theory of purity where free operations are random reversible transformations, modelling the scenario where an agent has limited control over the dynamics of a closed system. Our key result is a duality between the resource theory of entanglement and the resource theory of purity, valid for every physical theory where all processes arise from pure states and reversible interactions at the fundamental level. As an application of the main result, we establish a one-to-one correspondence between entropies and measures of pure bipartite entanglement. The correspondence is then used to define entanglement measures in the general probabilistic framework. Finally, we show a duality between the task of information erasure and the task of entanglement generation, whereby the existence of entropy sinks (systems that can absorb arbitrary amounts of information) becomes equivalent to the existence of entanglement sources (correlated systems from which arbitrary amounts of entanglement can be extracted).

I. INTRODUCTION

The discovery of quantum entanglement [1, 2] introduced the revolutionary idea that a composite system can be in a pure state while its components are mixed. In Schrödinger's words: "maximal knowledge of a total system does not necessarily imply maximal knowledge of all its parts" [2]. This new possibility, in radical contrast with the paradigm of classical physics, is at the root of quantum non-locality [3–6] in all its counterintuitive manifestations [7–13]. With the advent of quantum information, it quickly became clear that entanglement was not only a source of foundational puzzles, but also a resource [14]. Harnessing this resource has been the key to the invention of groundbreaking protocols such as quantum teleportation [15], dense coding [16], and secure quantum key distribution [17, 18], whose implications deeply impacted physics and computer science [19, 20].

The key to understanding entanglement as a resource is to consider distributed scenarios where spatially separated parties perform local operations (LO) in their laboratories and exchange classical communication (CC) from one laboratory to another [21–23]. The protocols that can be implemented in this scenario, known as LOCC protocols, provide a means to characterize entangled states and to compare their degree of entanglement. Precisely, a state is i) entangled if it cannot be generated by an LOCC protocol, and ii) more entangled than another if there exists an LOCC protocol that transforms the former into the latter.
Comparing the degree of entanglement of two quantum states is generally a hard problem [24–30]. Nevertheless, the solution is simple for pure bipartite states, where the majorization criterion [31] provides a necessary and sufficient condition for LOCC convertibility. The criterion identifies the degree of entanglement of a bipartite system with the degree of mixedness of its parts: the more entangled a pure bipartite state is, the more mixed its marginals are. Mixed states are compared here according to their spectra, with a state being more mixed than another if the spectrum of the latter majorizes the spectrum of the former [32–35]. The majorization criterion shows that for pure bipartite states the notion of entanglement as a resource beautifully matches Schrödinger's notion of entanglement as non-maximal knowledge about the parts of a pure composite system. Moreover, majorization establishes an intriguing duality between entanglement and thermodynamics [36–40], whereby the reduction of entanglement caused by LOCC protocols becomes dual to the increase of mixedness (and therefore entropy [41]) caused by thermodynamic transformations. This duality has far-reaching consequences, such as the existence of a unique measure of pure-state entanglement in the asymptotic limit [36, 37, 42]. In addition, it has provided guidance for the development of entanglement theory beyond the case of pure bipartite states [39].

The duality between entanglement and thermodynamics is a profound and fundamental fact. As such, one might expect it to follow directly from basic principles. However, what these principles are is far from clear: up to now, the relation between entanglement and thermodynamics has been addressed in a way that depends heavily on the Hilbert space framework, using technical results that lack an operational interpretation (such as the singular value decomposition).
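In quantum theory the majorization criterion can be checked directly on the Schmidt coefficients of the two pure states (equivalently, on the spectra of their marginals). The sketch below is a minimal illustration, assuming states are given by their Schmidt vectors as exact fractions; the helper names `majorizes` and `locc_convertible` are hypothetical. Following the convention in the text, the source can be converted into the target exactly when the target's marginal spectrum majorizes the source's, i.e. when the target's marginals are less mixed.

```python
from fractions import Fraction as F


def majorizes(p, q):
    """Return True if the vector p majorizes q (p ≻ q).

    Both vectors are sorted in decreasing order and zero-padded to the same
    length; p ≻ q iff every partial sum of p is >= the matching partial sum of q.
    """
    if sum(p) != sum(q):
        raise ValueError("vectors must have the same total weight")
    n = max(len(p), len(q))
    ps = sorted(p, reverse=True) + [0] * (n - len(p))
    qs = sorted(q, reverse=True) + [0] * (n - len(q))
    sum_p = sum_q = 0
    for a, b in zip(ps, qs):
        sum_p += a
        sum_q += b
        if sum_p < sum_q:
            return False
    return True


def locc_convertible(schmidt_in, schmidt_out):
    """Majorization criterion for pure bipartite states: the source can be
    converted into the target iff the target's Schmidt vector majorizes the source's."""
    return majorizes(schmidt_out, schmidt_in)


bell = [F(1, 2), F(1, 2)]
partly_entangled = [F(3, 4), F(1, 4)]
print(locc_convertible(bell, partly_entangled))  # True
print(locc_convertible(partly_entangled, bell))  # False

# An incomparable pair: neither state can be converted into the other by LOCC.
psi = [F(2, 5), F(2, 5), F(1, 10), F(1, 10)]
phi = [F(1, 2), F(1, 4), F(1, 4)]
print(locc_convertible(psi, phi), locc_convertible(phi, psi))  # False False
```

The incomparable pair shows that, already for pure states, the entanglement preorder is not total.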
It is then natural to search for a derivation of the entanglement-thermodynamics duality that uses only high-level quantum features, such as the impossibility of instantaneous signalling or the no-cloning theorem. In the same spirit, one can ask whether the duality holds for physical theories other than quantum mechanics, adopting the broad framework of general probabilistic theories [43–52]. In the landscape of general probabilistic theories, entanglement is a generic feature [44, 53], which provides powerful advantages for a variety of information-theoretic tasks [54–59]. But what about its relation with thermodynamics? Is it also generic, or is it a specific feature of quantum theory?

In this paper we explore the relation between entanglement and thermodynamics in an operational, theory-independent way. Our work is part of a larger project that aims at establishing a common axiomatic foundation for quantum information theory and quantum thermodynamics. Within this broad scope, we start our investigation from the resource theory of entanglement, asking which conversions are possible under LOCC protocols. Our first result is a generalization of the Lo-Popescu theorem [23]: we show that, under suitable assumptions, every LOCC protocol acting on a pure bipartite state can be simulated by a protocol using only one round of classical communication. Our assumptions are satisfied by quantum theory on both real and complex Hilbert spaces, and also by all bipartite extreme no-signalling boxes studied in the literature [60–63].

In order to establish the connection with thermodynamics, we then move our attention to mixed states. We consider the scenario where an agent has limited control over the dynamics of a closed system, thus causing it to undergo a random mixture of reversible transformations and degrading it to a more disordered state.
This notion of degradation coincides with the notion of adding noise put forward by Müller and Masanes for the problem of encoding spatial directions into physical systems [64], and represents a natural generalization of the notion of majorization [65]. Provided that every pure state can be reached from any other pure state through some reversible dynamics, we show that the relevant resource in this scenario can be identified with the purity of the state. This observation leads to an operational theory of purity, which in the quantum case turns out to be equivalent to the theory of purity defined by Horodecki and Oppenheim [66].

Once the resource theories of entanglement and purity are in place, we set out to establish a duality between them. To this purpose, we consider physical theories that admit a fundamental level of description where all states are pure and all interactions are reversible. Such theories are identified by the Purification Principle [47], which expresses a strengthened version of the conservation of information [67, 68]. The possibility of a pure and reversible description is particularly appealing for the foundations of thermodynamics, as it reconciles the mixedness of thermodynamic ensembles with the pure and reversible picture provided by fundamental physics. In the quantum case, Purification is the starting point for all recent proposals to derive thermodynamic ensembles from the typicality of pure entangled states [69–75], an idea that has recently been explored also in the broader framework of general probabilistic theories [76, 77]. Building on the Purification Principle, we establish the desired duality between entanglement and thermodynamics, showing that the degree of entanglement of a pure bipartite system coincides with the degree of mixedness of its parts. As a consequence, every measure of single-system mixedness becomes equivalent to a measure of pure bipartite entanglement.
Exploiting this result, we dene a class of measures of entanglement, which can be extended to from pure to mixed state via the the convex roof construction [22, 78, 79], exactly in the same way as in the quantum case. Finally, we apply the duality to the task of information erasure [80], namely the task of converting a mixed state into a xed pure state via a set of allowed operations (in our case, the set of random reversible operations). As a result, erasing information becomes equivalent to generating entanglement. Quite surprisingly, we nd out that the impossibility of erasing information with the assistance of a catalyst implies the existence of a special type of purication, where the purifying system is a twin of the puried system. This observation completes the physical picture of the entanglement-thermodynamics duality, which appears to be a consequence of the possibility to describe every physical process in terms of pure states, reversible interactions, and pure measurementsin particular, modelling physical systems in mixed states through the introduction of a mirror image that completes the description. The paper is organized as follows. In section II we introduce the framework. The resource theory of entanglement is discussed in section III. In section IV we prove an operational version of Lo-Popescu theorem, which provides the starting point for the entanglementthermodynamics duality. In section V we formulate an operational resource theory of dynamical control, which gives rise to a resource theory of purity under the condition that all pure state are equivalent under the allowed reversible dynamics. In section VI we prove the duality between entanglement and thermodynamics, focussing our attention to theories that admit a fundamental level where all processes are pure and reversible. 
The consequences of the duality are examined in section VII: specifically, we discuss the equivalence between measures of mixedness and measures of entanglement for pure bipartite states, and we establish the relation between information erasure and entanglement generation. In section VIII we show that the requirement that information cannot be erased for free leads to the requirement of Symmetric Purification. Finally, section IX draws the conclusions and highlights the implications of our results.

II. FRAMEWORK

In this paper we adopt the framework of operational-probabilistic theories (OPTs) [47, 48, 52], which combines the toolbox of probability theory with the graphical language of symmetric monoidal categories [81–84]. Here we give a quick recap, referring the reader to Refs. [47, 48, 52] and to Hardy's works [50, 51] for a more extended presentation.

The OPT framework describes circuits that can be built up by combining physical processes in sequence and in parallel, as in the following example, where a bipartite state ρ of A⊗B is processed by $\mathcal A$ and then $\mathcal A'$ on the A side and by $\mathcal B$ on the B side, before the effects a and b are applied:

$$(a\otimes b)\,\big(\mathcal A'\mathcal A\otimes\mathcal B\big)\,\rho .$$

Here, A, A′, A″, B, B′ label physical systems, ρ is a bipartite state, $\mathcal A\colon A\to A'$, $\mathcal A'\colon A'\to A''$ and $\mathcal B\colon B\to B'$ are transformations, and a and b are effects on A″ and B′, respectively. The two transformations $\mathcal A$ and $\mathcal A'$ are composed in sequence, while the transformations $\mathcal A$ and $\mathcal B$ and the effects a and b are composed in parallel. The circuit has no external wires; circuits of this form are associated with probabilities. Two transformations that give the same probabilities in all circuits are identified. The shorthand notation (a|ρ) is used to indicate the probability that the effect a takes place on the state ρ, diagrammatically represented by the circuit in which the state ρ of system A is followed by the effect a.

The set of all possible physical systems, denoted by Sys, is closed under composition: given two systems A and B one can form the composite system A⊗B. We denote the trivial system as I, which represents nothing (or, more precisely, nothing that the theory cares to describe).
The trivial system satises the obvious conditions A ⊗ I = I ⊗ A = A, ∀A ∈ Sys. For generic systems A and B, we denote as • St (A) the set of states of system A • Transf (A,B) the set of transformations from system A to system B • Eff (A) the set of eects on system A The sets of states, transformations, and eects span vector spaces over the real numbers, denoted by StR (A), TransfR (A,B), and EffR (A), respectively. We denote by DA the dimension of the vector space StR (A) and say that system A is nite i DA < +∞. Transformations and eects act linearly on the vector space of states. For every system A, we assume the existence of an identity transformation IA, which does nothing on the states of the system. A test is a collection of transformations that can occur as alternatives in an experiment. Specically, a test of type A to B is a collection of transformations {Ci}i∈X with input A and output B. A transformation is called deterministic if it belongs to a test with a single outcome. We will often refer to deterministic transformations as channels, following the standard terminology of quantum information. A channel U from A to B is called reversible if there exists a channel U−1 from B to A such that U−1U = IA and UU−1 = IB. We denote by RevTransf (A,B) the set of reversible transformations from A to B. If there exists a reversible channel transforming A into B we say that A and B are operationally equivalent, denoted by A ' B. The composition of systems is required to be symmetric [8184], meaning that A ⊗ B ' B ⊗ A. The reversible channel that implements the equivalence is the swap channel, SWAP, and satises the condition B B B ′ A A A ′ = B SWAP A A A ′ SWAP B′ A B B B ′ A′ , (1) for every pair of transformations A and B and for generic systems A,A′,B,B′, as well as the conditions A SWAP B SWAP A B A B = A B , the wires in the r.h.s. representing identity transformations, and A SWAP B B C C A = A SWAP B B A SWAP C C A . 
In this paper we restrict our attention to causal theories [47], namely theories where the choice of future measurement settings does not influence the outcome probability of present experiments. Mathematically, causality is equivalent to the fact that for every system A there is only one deterministic effect, which we denote here by Tr_A, in analogy with the trace in quantum mechanics. The uniqueness of the deterministic effect provides a canonical way to define marginal states:

Definition 1 The marginal state of a bipartite state ρ_AB on system A is the state $\rho_A := (\mathcal I_A\otimes\mathrm{Tr}_B)\,\rho_{AB}$ obtained by applying the deterministic effect on B.

Moreover, one can define the norm of a state ρ as ‖ρ‖ := Tr ρ. The set of normalized states of A will be denoted by

$$\mathrm{St}_1(A) := \{\rho\in\mathrm{St}(A)\ |\ \|\rho\| = 1\}.$$

In a causal theory, every state is proportional to a normalized state [47]. In quantum mechanics, St_1(A) is the set of normalized density matrices of system A, while St(A) is the set of all sub-normalized density matrices. In a causal theory, channels admit a simple characterization, which will be useful later in the paper:

Proposition 1 Let $\mathcal C\in\mathrm{Transf}(A,B)$. Then $\mathcal C$ is a channel if and only if $\mathrm{Tr}_B\,\mathcal C = \mathrm{Tr}_A$.

The proof can be found in lemma 5 of Ref. [47].

A. Pure states and transformations

In every probabilistic theory one can define pure states and, more generally, pure transformations. Both concepts are based on the notion of coarse-graining, i.e. the operation of joining two or more outcomes of a test. More precisely, a test $\{\mathcal C_i\}_{i\in X}$ is a coarse-graining of the test $\{\mathcal D_j\}_{j\in Y}$ if there is a partition $\{Y_i\}_{i\in X}$ of Y such that $\mathcal C_i = \sum_{j\in Y_i}\mathcal D_j$ for every i ∈ X. In this case, we say that $\{\mathcal D_j\}_{j\in Y}$ is a refinement of $\{\mathcal C_i\}_{i\in X}$. The refinement of a given transformation is defined via the refinement of a test: if $\{\mathcal D_j\}_{j\in Y}$ is a refinement of $\{\mathcal C_i\}_{i\in X}$, then the transformations $\{\mathcal D_j\}_{j\in Y_i}$ are a refinement of the transformation $\mathcal C_i$.
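In classical probability theory (used here only as an assumed example of a causal OPT), the unique deterministic effect is the all-ones row vector. Proposition 1 then reduces to the familiar statement that a channel is a matrix whose columns each sum to one, and the marginal of Definition 1 is the usual marginal of a joint distribution. The sketch below, with hypothetical helper names, also implements coarse-graining of a test as an entrywise sum of its transformations.

```python
from fractions import Fraction as F

# Classical systems as an illustrative causal OPT (assumption): the unique
# deterministic effect Tr is the all-ones row vector.


def trace(state):
    """Apply the deterministic effect Tr: the total weight (norm) of a state."""
    return sum(state)


def apply(C, state):
    """Apply the transformation (matrix) C to a state (vector)."""
    return [sum(C[i][j] * state[j] for j in range(len(state))) for i in range(len(C))]


def is_channel(C):
    """Proposition 1, classical case: Tr_B C = Tr_A, i.e. every column of C sums to 1."""
    return all(sum(C[i][j] for i in range(len(C))) == 1 for j in range(len(C[0])))


def marginal_A(rho_AB, dA, dB):
    """Definition 1: rho_A = (I_A ⊗ Tr_B) rho_AB, with rho_AB indexed as a*dB + b."""
    return [sum(rho_AB[a * dB + b] for b in range(dB)) for a in range(dA)]


def coarse_grain(test, partition):
    """Join outcomes of a test: C_i = sum_{j in Y_i} D_j (entrywise matrix sum)."""
    rows, cols = len(test[0]), len(test[0][0])
    return [[[sum(test[j][r][c] for j in block) for c in range(cols)] for r in range(rows)]
            for block in partition]


# A two-outcome test on a bit: "read the bit and leave it in place".
D0 = [[1, 0], [0, 0]]
D1 = [[0, 0], [0, 1]]
(C,) = coarse_grain([D0, D1], [[0, 1]])   # coarse-graining into a single outcome
print(is_channel(C), is_channel(D0))      # True False

rho_AB = [F(1, 2), F(0), F(1, 4), F(1, 4)]
rho_A = marginal_A(rho_AB, 2, 2)
print(rho_A, trace(rho_A))                # [Fraction(1, 2), Fraction(1, 2)] 1
print(trace(apply(D0, rho_A)))            # 1/2: D0 alone is not deterministic
```

Coarse-graining all outcomes of a test yields a channel, while a single outcome such as `D0` decreases the norm, in line with Proposition 1.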
A transformation is called pure if it has only trivial refinements:

Definition 2 The transformation $\mathcal C\in\mathrm{Transf}(A,B)$ is pure if for every refinement $\{\mathcal D_j\}$ one has $\mathcal D_j = p_j\,\mathcal C$, where $\{p_j\}$ is a probability distribution.

Pure transformations are those for which the experimenter has maximal information about the evolution of the system. We assume as part of the framework that tests satisfy a pure decomposition property:

Definition 3 A test satisfies the pure decomposition property if it admits a refinement consisting only of pure transformations.

Later in the paper, we will assume one axiom, Purification, that implies the validity of the pure decomposition property for every possible test [47]. The set of pure transformations from A to B will be denoted as PurTransf (A,B). In the special case of states (transformations with no input), the above definition coincides with the usual definition of pure state. We denote the set of pure states of system A as PurSt (A). As usual, non-pure states will be called mixed.

Pure states will play a key role in this paper. An elementary property of pure states is that they are preserved by reversible transformations.

Proposition 2 Let $\mathcal U\in\mathrm{Transf}(A,B)$ be a reversible channel. Then a state ψ ∈ St (A) is pure if and only if the state $\mathcal U\psi\in\mathrm{St}(B)$ is pure.

The proof is standard and is reported in Appendix A for the convenience of the reader.

III. THE RESOURCE THEORY OF ENTANGLEMENT

The resource theory of quantum entanglement [14] is based on the notion of LOCC protocols, that is, protocols in which distant parties are allowed to communicate classically with one another and to perform local operations in their laboratories [21, 22]. Being operational, the notion of LOCC protocol can be directly exported to arbitrary theories. In this paper we consider protocols involving only two parties, Alice and Bob.
A generic LOCC protocol consists of a sequence of tests, performed by Alice and Bob, with the property that the choice of the test at a given step can depend on all the outcomes produced at the previous steps. For example, consider a two-way protocol where
1. Alice performs a test $\{\mathcal A_{i_1}\}$ and communicates the outcome to Bob
2. Bob performs a test $\{\mathcal B^{(i_1)}_{i_2}\}$ and communicates the outcome to Alice
3. Alice performs a test $\{\mathcal A^{(i_1,i_2)}_{i_3}\}$.
An instance of the protocol is identified by the sequence of outcomes (i_1, i_2, i_3) and can be represented by a circuit in which Alice's system undergoes $A_0\xrightarrow{\mathcal A_{i_1}}A_1\xrightarrow{\mathcal A^{(i_1,i_2)}_{i_3}}A_2$, Bob's system undergoes $B_0\xrightarrow{\mathcal B^{(i_1)}_{i_2}}B_1$, and classical communication (the dashed arrows of the diagram) carries i_1 from Alice to Bob and i_2 from Bob to Alice. By coarse-graining over all possible outcomes, one obtains a channel, given by

$$\mathcal L = \sum_{i_1,i_2,i_3}\Big[\mathcal A^{(i_1,i_2)}_{i_3}\mathcal A_{i_1}\otimes\mathcal B^{(i_1)}_{i_2}\Big].$$

Entangled states are those states that cannot be generated using an LOCC protocol. Equivalently, they can be characterized as the states that are not separable, i.e. not of the form

$$\rho = \sum_i p_i\,\alpha^{(i)}\otimes\beta^{(i)},$$

where {p_i} is a probability distribution allowed by the theory, α^{(i)} is a state of A, and β^{(i)} is a state of B. As in quantum theory, LOCC protocols can be used to compare entangled states.

Definition 4 Given two states ρ ∈ St (A⊗B) and ρ′ ∈ St (A′⊗B′), we say that ρ is more entangled than ρ′, denoted by $\rho\succeq_{\rm ent}\rho'$, if there exists an LOCC protocol that transforms ρ into ρ′, i.e. if $\rho' = \mathcal L\rho$ for some LOCC channel $\mathcal L$.

Mathematically, the relation ⪰_ent is a preorder, i.e. it is reflexive and transitive. Moreover, it is stable under tensor products, namely $\rho\otimes\sigma\succeq_{\rm ent}\rho'\otimes\sigma'$ whenever $\rho\succeq_{\rm ent}\rho'$ and $\sigma\succeq_{\rm ent}\sigma'$. In other words, the relation ⪰_ent turns the set of all bipartite states into a preordered monoid, the typical mathematical structure arising in all resource theories [85]. The resource theory of entanglement here fits completely into the framework of Ref. [85], with the LOCC channels as free operations. The states which can be prepared by LOCC (i.e.
the separable states) are free by definition, and all the other states represent resources. If $\rho\succeq_{\rm ent}\rho'$ and $\rho'\succeq_{\rm ent}\rho$, then we say that ρ and ρ′ are equally entangled, denoted by $\rho\simeq_{\rm ent}\rho'$. Note that $\rho\simeq_{\rm ent}\rho'$ does not imply that ρ and ρ′ are equal: for example, any two separable states are equally (un)entangled.

IV. AN OPERATIONAL LO-POPESCU THEOREM

Given two bipartite states, it is natural to ask whether one is more entangled than the other. A priori, answering the question requires one to check all possible LOCC protocols. However, the situation is much simpler when the initial state is pure: here we prove that in this case every LOCC protocol can be replaced without loss of generality by a protocol involving only one round of classical communication, i.e. a one-way LOCC protocol. Our argument is based on two basic operational requirements and provides a generalized version of the Lo-Popescu theorem [23], the key result at the foundation of the quantum theory of pure-state entanglement.

A. Two operational requirements

Our derivation of the operational Lo-Popescu theorem is based on two requirements, the first being

Axiom 1 (Purity Preservation [45, 48, 68]) The composition of two pure transformations yields a pure transformation, namely, connecting the output C of $\mathcal A$ to the input C of $\mathcal B$,

$$\mathcal A\in\mathrm{PurTransf}(A,A'\otimes C),\ \ \mathcal B\in\mathrm{PurTransf}(C\otimes B,B')\ \Longrightarrow\ (\mathcal I_{A'}\otimes\mathcal B)(\mathcal A\otimes\mathcal I_B)\in\mathrm{PurTransf}(A\otimes B,A'\otimes B'),$$

for every choice of systems A, A′, B, B′, C.

As a special case, Purity Preservation implies that the product of two pure states is a pure state. This conclusion could also be obtained from the Local Tomography axiom [43–45, 47]. Nevertheless, there exist theories that satisfy Purity Preservation and violate Local Tomography; an example is quantum theory on real vector spaces [86–88]. In general, we regard Purity Preservation as more fundamental than Local Tomography.
Considering the theory as an algorithm to make deductions about physical processes, Purity Preservation ensures that, when presented with maximal information about two processes, the algorithm outputs maximal information about their composition [68].

Our second requirement imposes a symmetry of pure bipartite states:

Axiom 2 (Local Exchangeability) For every pure bipartite state Ψ ∈ PurSt (A⊗B), there exist two channels $\mathcal C\in\mathrm{Transf}(A,B)$ and $\mathcal D\in\mathrm{Transf}(B,A)$ such that

$$(\mathcal C\otimes\mathcal D)\,\Psi = \mathrm{SWAP}\,\Psi, \qquad (2)$$

where SWAP is the swap operation [cf. Eq. (1)].

Note that, in general, the two channels depend on the specific pure state Ψ. Local Exchangeability is trivially satisfied by classical probability theory, where all pure states are of the product form. Less trivially, it is satisfied by quantum theory, both on complex and on real Hilbert spaces. This fact is illustrated in the following

Example 1 Suppose that A and B are quantum systems, and let H_A and H_B be the corresponding Hilbert spaces. By the Schmidt decomposition, every pure state in the tensor product Hilbert space can be written as

$$|\Psi\rangle = \sum_{i=1}^{r}\sqrt{p_i}\,|\alpha_i\rangle\otimes|\beta_i\rangle,$$

where $\{|\alpha_i\rangle\}_{i=1}^r\subset H_A$ and $\{|\beta_i\rangle\}_{i=1}^r\subset H_B$ are orthonormal vectors. The Schmidt decomposition implies the relation $\mathrm{SWAP}\,|\Psi\rangle = (C\otimes D)\,|\Psi\rangle$, where C and D are the partial isometries $C := \sum_{i=1}^r|\beta_i\rangle\langle\alpha_i|$ and $D := \sum_{i=1}^r|\alpha_i\rangle\langle\beta_i|$. From the partial isometries C and D it is immediate to construct the desired channels $\mathcal C$ and $\mathcal D$, which can be defined as

$$\mathcal C(\rho) := C\rho C^\dagger + \mathrm{Tr}\big[(I_A - C^\dagger C)\,\rho\big]\,\beta_0, \qquad \mathcal D(\sigma) := D\sigma D^\dagger + \mathrm{Tr}\big[(I_B - D^\dagger D)\,\sigma\big]\,\alpha_0,$$

where ρ and σ are generic input states of systems A and B, respectively, and β_0 and α_0 are arbitrary fixed states of B and A. Since $I_A - C^\dagger C$ projects onto the orthogonal complement of the span of the $|\alpha_i\rangle$, the second term makes $\mathcal C$ trace-preserving without affecting its action on the support of Ψ (and similarly for $\mathcal D$). With this definition, one has $(\mathcal C\otimes\mathcal D)(|\Psi\rangle\langle\Psi|) = \mathrm{SWAP}\,|\Psi\rangle\langle\Psi|\,\mathrm{SWAP}$, which is the Hilbert space version of the Local Exchangeability condition of Eq. (2).
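Example 1 can be checked numerically on a small instance. The sketch below makes several illustrative assumptions of our own: H_A = ℂ³ and H_B = ℂ², Schmidt vectors |α_i⟩ taken from the computational basis of ℂ³ and |β_i⟩ = |±⟩ in ℂ², and Schmidt coefficients p = (0.8, 0.2). All amplitudes are real, so the adjoint is simply the transpose. It builds the partial isometries C and D and verifies SWAP|Ψ⟩ = (C⊗D)|Ψ⟩ in plain Python; the helper names are hypothetical.

```python
import math

# Real amplitudes only (assumption), so complex conjugation can be ignored.


def kron_vec(u, v):
    return [x * y for x in u for y in v]


def kron_mat(X, Y):
    return [[X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def outer(u, v):
    """|u><v| for real vectors."""
    return [[x * y for y in v] for x in u]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def swap_vec(v, m, n):
    """SWAP |i>|k> -> |k>|i> for a vector of C^m ⊗ C^n stored with index i*n + k."""
    out = [0.0] * (m * n)
    for i in range(m):
        for k in range(n):
            out[k * m + i] = v[i * n + k]
    return out


s = 1 / math.sqrt(2)
alpha = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # Schmidt vectors in H_A = C^3
beta = [[s, s], [s, -s]]                      # Schmidt vectors in H_B = C^2
p = [0.8, 0.2]                                # Schmidt coefficients

psi = [0.0] * 6
for i in range(2):
    psi = [x + math.sqrt(p[i]) * t for x, t in zip(psi, kron_vec(alpha[i], beta[i]))]

C = mat_add(outer(beta[0], alpha[0]), outer(beta[1], alpha[1]))  # 2x3, maps H_A -> H_B
D = mat_add(outer(alpha[0], beta[0]), outer(alpha[1], beta[1]))  # 3x2, maps H_B -> H_A

lhs = swap_vec(psi, 3, 2)              # SWAP |Psi>, a vector of C^2 ⊗ C^3
rhs = matvec(kron_mat(C, D), psi)      # (C ⊗ D) |Psi>
err = max(abs(x - y) for x, y in zip(lhs, rhs))
print(err < 1e-12)  # True
```

Choosing systems of different dimension makes the point that C and D are only partial isometries: C annihilates the third basis vector of ℂ³, which is outside the support of Ψ.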
Local Exchangeability is also satisfied by all the extreme bipartite non-local correlations characterized in the literature [60–63]:

Example 2 Consider a scenario where two space-like separated parties, Alice and Bob, perform measurements on a pair of systems, A and B, respectively. Let x (y) be the index labelling Alice's (Bob's) measurement setting and let a (b) be the index labelling the outcome of the measurement done by Alice (Bob). In the theory known as box world [44, 89] all no-signalling probability distributions p_{ab|xy} are physically realizable and represent states of the composite system A⊗B. Such probability distributions form a convex polytope [62], whose extreme points are the pure states of the theory. For x, y, a, b ∈ {0, 1} the systems A and B are operationally equivalent. We will denote by $\mathcal I$ the reversible transformation that converts A into B. The extreme non-local correlations have been characterized in [60] and are known to be equal to the standard PR-box correlation [61]

$$p_{ab|xy} = \begin{cases}\tfrac12 & a+b\equiv xy \pmod 2\\ 0 & \text{otherwise}\end{cases} \qquad (3)$$

up to exchange of 0 with 1 in the local settings of Alice and Bob and in the outcomes of their measurements. In the circuit picture, these operations are described by local reversible transformations: denoting by Φ the standard PR-box state, one has that every other pure entangled state Ψ ∈ PurSt (A⊗B) is of the form $\Psi = (\mathcal U\otimes\mathcal V)\,\Phi$, where $\mathcal U$ and $\mathcal V$ are reversible transformations. To see that Local Exchangeability holds, note that swapping systems A and B is equivalent to exchanging x with y and a with b. Now, the standard PR-box correlation of Eq. (3) is invariant under the exchange x ↔ y, a ↔ b, meaning that one has $\mathrm{SWAP}\,\Phi = (\mathcal I\otimes\mathcal I^{-1})\,\Phi$. Then it is clear that every pure state of A⊗B can be swapped by local operations: indeed, using Eq. (1), one has

$$\mathrm{SWAP}\,\Psi = \mathrm{SWAP}\,(\mathcal U\otimes\mathcal V)\,\Phi = (\mathcal V\otimes\mathcal U)\,\mathrm{SWAP}\,\Phi = (\mathcal V\mathcal I\otimes\mathcal U\mathcal I^{-1})\,\Phi = (\mathcal C\otimes\mathcal D)\,\Psi, \qquad (4)$$

where $\mathcal C := \mathcal V\mathcal I\,\mathcal U^{-1}$ and $\mathcal D := \mathcal U\mathcal I^{-1}\mathcal V^{-1}$.
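The two facts used in Example 2 (the PR box is a no-signalling distribution invariant under x ↔ y, a ↔ b, and a locally relabelled PR box is swapped back by local relabellings) can be checked by brute force over all bits. The sketch below is illustrative only: distributions are represented as Python functions p(a, b, x, y), and the particular relabelling a → a ⊕ x used for Ψ is our own choice of a local reversible transformation.

```python
from fractions import Fraction as F
from itertools import product

BITS = (0, 1)


def pr_box(a, b, x, y):
    """Standard PR-box correlation of Eq. (3)."""
    return F(1, 2) if (a + b) % 2 == (x * y) % 2 else F(0)


def swapped(p):
    """Exchange the roles of Alice and Bob: x <-> y and a <-> b."""
    return lambda a, b, x, y: p(b, a, y, x)


def same(p, q):
    return all(p(a, b, x, y) == q(a, b, x, y) for a, b, x, y in product(BITS, repeat=4))


def is_no_signalling(p):
    """Alice's marginal must not depend on y, and Bob's must not depend on x."""
    for x, y, y2, a in product(BITS, repeat=4):
        if sum(p(a, b, x, y) for b in BITS) != sum(p(a, b, x, y2) for b in BITS):
            return False
    for y, x, x2, b in product(BITS, repeat=4):
        if sum(p(a, b, x, y) for a in BITS) != sum(p(a, b, x2, y) for a in BITS):
            return False
    return True


print(is_no_signalling(pr_box), same(pr_box, swapped(pr_box)))  # True True


def psi(a, b, x, y):
    """Another pure state Psi = (U ⊗ V) Phi: Alice relabels her output a -> a XOR x."""
    return pr_box(a ^ x, b, x, y)


print(same(psi, swapped(psi)))  # False: the swap acts non-trivially on Psi


def fixed(a, b, x, y):
    """Swapped Psi followed by local relabellings a -> a XOR x and b -> b XOR y."""
    return swapped(psi)(a ^ x, b ^ y, x, y)


print(same(fixed, psi))  # True: local operations undo the swap
```

The relabelling of Alice's output depends only on Alice's setting x, and that of Bob's output only on y, so both are local reversible transformations, in the spirit of Eq. (4).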
This proves the Local Exchangeability property for all pure bipartite states in the 2-setting/2-outcome scenario.

The situation is analogous in the case of 2 settings and an arbitrary number of outcomes. Let us denote the settings by x, y ∈ {0, 1} and assume that a can take d_A values, whereas b can take d_B values. All the extreme non-local correlations in this scenario are characterized in Ref. [62]. Up to local reversible transformations, they are labelled by a parameter k ∈ {2, …, min{d_A, d_B}} and they are such that

$$p_{ab|xy} = \begin{cases}\tfrac1k & b-a\equiv xy \pmod k\\ 0 & \text{otherwise}\end{cases}. \qquad (5)$$

Thanks to the local equivalence, it is enough to prove the validity of Local Exchangeability for correlations in the standard form of Eq. (5). We distinguish between the two cases xy = 0 and xy = 1. For xy = 0, swapping x with y and a with b has no effect on p_{ab|xy}. For xy = 1, by swapping x with y and a with b, one obtains the probability distribution

$$p'_{ab|xy} = \begin{cases}\tfrac1k & a-b\equiv 1 \pmod k\\ 0 & \text{otherwise}\end{cases}.$$

This probability distribution can be obtained from the original one by relabelling the outputs as a′ := (k − a) mod k and b′ := (k − b) mod k, since a − b ≡ b′ − a′ (mod k); for xy = 0 the same relabelling leaves the condition b − a ≡ 0 (mod k) unchanged. Such a relabelling corresponds to local reversible operations on A and B. In other words, Local Exchangeability holds.

Finally, the last category of extreme non-local correlations characterized in the literature corresponds to the case of an arbitrary number of settings and 2-outcome measurements. In this case, the extreme correlations are characterized explicitly in Ref. [63]. Up to local reversible transformations, the pure states are invariant under swap. Hence, the same argument used in Eq. (4) shows that Local Exchangeability holds.

B. Inverting the direction of classical communication

Purity Preservation and Local Exchangeability have an important consequence.
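The relabelling argument for the boxes of Eq. (5) can be verified exhaustively for small k. The sketch below assumes, for simplicity, that both outputs range over {0, …, k − 1} (i.e. d_A = d_B = k); the function names are hypothetical. It checks that each box is normalized, that the swap alone changes the box whenever k ≥ 3, and that the swap followed by the local relabelling a ↦ (k − a) mod k, b ↦ (k − b) mod k always returns the original box.

```python
from fractions import Fraction as F
from itertools import product

SETTINGS = (0, 1)


def box(k):
    """Eq. (5), restricted (assumption) to outputs a, b in {0, ..., k-1}."""
    def p(a, b, x, y):
        return F(1, k) if (b - a - x * y) % k == 0 else F(0)
    return p


def swapped(p):
    """Exchange Alice and Bob: x <-> y, a <-> b."""
    return lambda a, b, x, y: p(b, a, y, x)


def relabelled(p, k):
    """Local output relabelling a -> (k - a) mod k (Alice) and b -> (k - b) mod k (Bob)."""
    return lambda a, b, x, y: p((k - a) % k, (k - b) % k, x, y)


def same(p, q, k):
    return all(p(a, b, x, y) == q(a, b, x, y)
               for a, b in product(range(k), repeat=2)
               for x, y in product(SETTINGS, repeat=2))


def normalized(p, k):
    return all(sum(p(a, b, x, y) for a, b in product(range(k), repeat=2)) == 1
               for x, y in product(SETTINGS, repeat=2))


for k in range(2, 6):
    p = box(k)
    print(k, normalized(p, k), same(p, swapped(p), k), same(p, relabelled(swapped(p), k), k))
# 2 True True True
# 3 True False True
# 4 True False True
# 5 True False True
```

For k = 2 the swap is already trivial, because 1 ≡ −1 (mod 2); this recovers the PR-box case of Eq. (3).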
For one-way protocols acting on a pure input state, the direction of classical communication is irrelevant: every one-way LOCC protocol with communication from Alice to Bob can be replaced by a one-way LOCC protocol with communication from Bob to Alice, as shown by the following.

Lemma 1 (Inverting CC) Let Ψ be a pure state of A⊗B and let ρ′ be a (possibly mixed) state of A′⊗B′. Under the validity of axioms 1 and 2, the following are equivalent:
1. Ψ can be transformed into ρ′ by a one-way LOCC protocol with communication from Alice to Bob
2. Ψ can be transformed into ρ′ by a one-way LOCC protocol with communication from Bob to Alice.

Proof. Suppose that Ψ can be transformed into ρ′ by a one-way LOCC protocol with communication from Alice to Bob, namely

$$\rho' = \sum_{i\in X}\big(\mathcal A_i\otimes\mathcal B^{(i)}\big)\,\Psi, \qquad (6)$$

where $\{\mathcal A_i\}_{i\in X}$ is a test from A to A′ whose outcome i is communicated to Bob, and, for every outcome i ∈ X, $\mathcal B^{(i)}\colon B\to B'$ is a channel. Note that one can assume without loss of generality that all transformations $\{\mathcal A_i\}_{i\in X}$ are pure: if the transformations are not pure, we can refine them by the pure decomposition property (cf. definition 3) and apply the argument to the refined test consisting of pure transformations. For every fixed i ∈ X, one has

$$(\mathcal A_i\otimes\mathcal I_B)\,\Psi = \mathrm{SWAP}\,(\mathcal I_B\otimes\mathcal A_i)\,\mathrm{SWAP}\,\Psi. \qquad (7)$$

By Local Exchangeability, the first swap can be realized by two local channels $\mathcal C\colon A\to B$ and $\mathcal D\colon B\to A$. Moreover, since $\mathcal A_i$ is pure, Purity Preservation implies that the (unnormalized) state $(\mathcal A_i\otimes\mathcal I_B)\Psi$ is pure, and so is the state $(\mathcal I_B\otimes\mathcal A_i)\,\mathrm{SWAP}\,\Psi = \mathrm{SWAP}\,(\mathcal A_i\otimes\mathcal I_B)\Psi$, by Proposition 2. Hence, also the second swap in Eq. (7) can be realized by two local channels $\mathcal C^{(i)}\colon A'\to B$ and $\mathcal D^{(i)}\colon B\to A'$. Substituting into Eq. (7) one obtains

$$(\mathcal A_i\otimes\mathcal I_B)\,\Psi = \big(\mathcal D^{(i)}\mathcal C\otimes\mathcal C^{(i)}\mathcal A_i\mathcal D\big)\,\Psi$$

and, therefore,

$$\big(\mathcal A_i\otimes\mathcal B^{(i)}\big)\,\Psi = \big(\mathcal D^{(i)}\mathcal C\otimes\mathcal B^{(i)}\mathcal C^{(i)}\mathcal A_i\mathcal D\big)\,\Psi =: \big(\tilde{\mathcal A}^{(i)}\otimes\tilde{\mathcal B}_i\big)\,\Psi, \qquad (8)$$

where the outcome i is now generated on Bob's side and communicated to Alice, having defined $\tilde{\mathcal A}^{(i)} := \mathcal D^{(i)}\mathcal C$ and $\tilde{\mathcal B}_i := \mathcal B^{(i)}\mathcal C^{(i)}\mathcal A_i\mathcal D$.
By construction $\{\tilde{\mathcal B}_i\}_{i\in X}$ is a test, because it can be realized by performing the test $\{\mathcal A_i\}_{i\in X}$ after the channel $\mathcal D$ and subsequently applying the channel $\mathcal B^{(i)}\mathcal C^{(i)}$, depending on the outcome (the ability to perform conditional operations is guaranteed by causality [47]). On the other hand, $\tilde{\mathcal A}^{(i)}$ is a channel for every i ∈ X. Hence, we have constructed a one-way LOCC protocol with communication from Bob to Alice. Combining Eqs. (6) and (8) we obtain

$$\rho' = \sum_{i\in X}\big(\mathcal A_i\otimes\mathcal B^{(i)}\big)\,\Psi = \sum_{i\in X}\big(\tilde{\mathcal A}^{(i)}\otimes\tilde{\mathcal B}_i\big)\,\Psi,$$

meaning that Ψ can be transformed into ρ′ by a one-way LOCC protocol with communication from Bob to Alice. Clearly, the same argument can be applied to prove the converse direction. □

Note that the target state ρ′ need not be pure: the fact that the direction of classical communication can be exchanged relies only on the purity of the input state Ψ.

C. Reduction to one-way protocols

We are now ready to derive the operational version of the Lo-Popescu theorem. Our result shows that the action of an arbitrary LOCC protocol on a pure state can be simulated by a one-way LOCC protocol:

Theorem 1 (Operational Lo-Popescu theorem) Let Ψ be a pure state of A⊗B and ρ′ be a (possibly mixed) state of A′⊗B′. Under the validity of axioms 1 and 2, the following are equivalent:
1. Ψ can be transformed into ρ′ by an LOCC protocol
2. Ψ can be transformed into ρ′ by a one-way LOCC protocol.

Proof. The non-trivial implication is 1 ⟹ 2. Suppose that Ψ can be transformed into ρ′ by an LOCC protocol with N rounds of classical communication. Without loss of generality, we assume that Alice starts the protocol and that all transformations occurring in the first N − 1 rounds are pure. Let s = (i_1, i_2, …, i_{N−1}) be the sequence of all classical outcomes obtained by Alice and Bob up to step N − 1, p_s be the probability of the sequence s, and Ψ_s be the pure state after step N − 1 conditional on the occurrence of s.
For concreteness, suppose that the outcome i_{N−1} has been generated on Alice's side. Then, the rest of the protocol consists of a test $\{\mathcal B^{(s)}_{i_N}\}$, performed on Bob's side, followed by a channel $\mathcal A^{(s,i_N)}$ performed on Alice's side. By hypothesis, one has

$$\rho' = \sum_s p_s\sum_{i_N}\big(\mathcal A^{(s,i_N)}\otimes\mathcal B^{(s)}_{i_N}\big)\,\Psi_s,$$

where Ψ_s is a state of $A_{N-1}\otimes B_{N-1}$. Now, using lemma 1 one can invert the direction of the classical communication in the last round, obtaining

$$\rho' = \sum_s p_s\sum_{i_N}\big(\tilde{\mathcal A}^{(s)}_{i_N}\otimes\mathcal B^{(s,i_N)}\big)\,\Psi_s$$

for a suitable test $\{\tilde{\mathcal A}^{(s)}_{i_N}\}$ and suitable channels $\mathcal B^{(s,i_N)}$. Now, since both the (N − 1)-th and the N-th tests are performed by Alice, they can be merged into a single test, thus reducing the original LOCC protocol to an LOCC protocol with N − 1 rounds. Iterating this argument N − 1 times, one finally obtains a one-way protocol. □

In quantum theory, the Lo-Popescu theorem provides the foundation for the resource theory of pure-state entanglement. Having the operational version of this result will be crucial for our study of the relation between entanglement and thermodynamics. Before turning to that, however, we need to put in place a suitable resource theory of purity, which will provide the basis for our thermodynamic considerations.

V. THE RESOURCE THEORY OF PURITY

A. A resource theory of dynamical control

Consider the scenario where a closed system A undergoes a reversible dynamics governed by some parameters under the experimenter's control. For example, system A could be a charged particle moving in an electric field, whose intensity and direction can be tuned in order to obtain a desired trajectory. In general, the experimenter may not have full control and the actual values of the parameters may fluctuate randomly.
As a result, the evolution of the system will be described by a Random Reversible (RaRe) channel, that is, a channel $\mathcal R$ of the form

$$\mathcal R = \sum_{i\in X} p_i\,\mathcal U^{(i)} ,$$

where $\{\mathcal U^{(i)}\mid i\in X\}$ is a set of reversible transformations of A and $\{p_i\}_{i\in X}$ is their probability distribution. Assuming that the system remains closed during the whole evolution, RaRe channels are the most general transformations the experimenter can implement.

An important question in all problems of control is whether a given input state can be driven to a target state using the allowed dynamics. With respect to this task, an input state is more valuable than another if the set of target states that can be reached from the former contains the set of target states that can be reached from the latter. In our model, this idea leads to the following definition.

Definition 5 (More controllable states) Given two states ρ and ρ′ of system A, we say that ρ is more controllable than ρ′, denoted by $\rho\succeq\rho'$, if ρ′ can be obtained from ρ via a RaRe channel.

This definition appeared independently in an earlier work by Müller and Masanes [64], where the authors explored the use of two-level systems as indicators of spatial directions (cf. definition 8 in the appendix). In this paper we propose to consider it as the starting point for an axiomatic theory of thermodynamics. Definition 5 fits into the general framework of resource theories [85], with RaRe channels playing the role of free operations. Note that at this level of generality there are no free states: since the experimenter can only control the evolution, every state is regarded as a resource. Physically, this is in agreement with the fact that the input state in a control problem is not chosen by the experimenter; for example, it can be a thermal state or the ground state of an unperturbed Hamiltonian. As is always the case in resource theories, the relation $\succeq$ is reflexive and transitive, i.e. it is a preorder.
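A minimal classical sketch of Definition 5, under assumptions of our own choosing: take a classical system with d perfectly distinguishable pure states and assume that its reversible transformations are the d! permutations of those states. A RaRe channel is then a mixture of permutations and, by Birkhoff's theorem combined with the Hardy–Littlewood–Pólya characterization of majorization, $\rho\succeq\rho'$ holds precisely when the probability vector of ρ majorizes that of ρ′. The helper names `apply_rare` and `majorizes` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def apply_rare(p, mixture):
    """Apply the RaRe channel sum_i w_i U^(i) to a classical probability vector p.

    `mixture` is a list of (weight, permutation) pairs; permutation[k] is the
    outcome to which outcome k is sent. The weights are assumed to sum to 1.
    """
    out = [Fraction(0)] * len(p)
    for weight, perm in mixture:
        for k, pk in enumerate(p):
            out[perm[k]] += weight * pk
    return out


def majorizes(p, q):
    """True if p majorizes q: the partial sums of p sorted in decreasing order
    dominate those of q, with equal totals."""
    ps = list(accumulate(sorted(p, reverse=True)))
    qs = list(accumulate(sorted(q, reverse=True)))
    return all(a >= b for a, b in zip(ps, qs)) and ps[-1] == qs[-1]


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mixture = [(Fraction(1, 2), (0, 1, 2)),   # identity
           (Fraction(1, 2), (1, 2, 0))]   # cyclic shift of the outcomes
q = apply_rare(p, mixture)
print(majorizes(p, q))  # True:  p is more controllable than q
print(majorizes(q, p))  # False: p cannot be recovered from q by a RaRe channel
```

Exact `Fraction` arithmetic is used so that the partial-sum comparisons are not affected by rounding.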
Moreover, since the tensor product of two RaRe channels is a RaRe channel, the relation $\succeq$ is stable under tensor products, namely $\rho\otimes\sigma\succeq\rho'\otimes\sigma'$ whenever $\rho\succeq\rho'$ and $\sigma\succeq\sigma'$. For finite systems (i.e. systems with finite-dimensional state space), Müller and Masanes [64] showed the additional property

$$\rho\succeq\sigma,\ \ \sigma\succeq\rho \implies \rho=\mathcal U\sigma , \tag{9}$$

for some reversible transformation $\mathcal U$. In other words, two states that are equally controllable can only differ by a reversible transformation.

B. From dynamical control to purity

There is a close relation between the controllability of a state and its purity. For example, a state that is more controllable than a pure state must also be pure.

Proposition 3 If ψ ∈ St(A) is a pure state and ρ ∈ St(A) is more controllable than ψ, then ρ must be pure. Specifically, $\rho=\mathcal U\psi$ for some reversible channel $\mathcal U$.

Proof. Since ψ is pure, the condition $\sum_i p_i\,\mathcal U^{(i)}\rho=\psi$ implies that $\mathcal U^{(i)}\rho=\psi$ for every i with $p_i>0$, meaning that $\rho=\mathcal V^{(i)}\psi$, where $\mathcal V^{(i)}$ is the inverse of $\mathcal U^{(i)}$. Proposition 2 then guarantees that ρ is pure. □

In other words, pure states can be reached only from pure states. A natural question is whether every state can be reached from some pure state. The answer is positive in quantum theory and in a large class of theories. Nevertheless, counterexamples exist that prevent an easy identification of the resource theory of dynamical control with a resource theory of purity. This fact is illustrated in the following example.

Example 3 Consider a system with the state space depicted in Fig. 1(a). In this case, there are only two reversible transformations, namely the identity and the reflection around the vertical symmetry axis. As a consequence, there is no way to obtain the mixed states on the two vertical sides by applying a RaRe channel to a pure state. These states represent a valuable resource, even though they are not pure. Since some mixed states are maximally resourceful, the resource theory of dynamical control cannot be viewed as a resource theory of purity.
As a second example, consider instead a system whose state space is a half-disk, as in Fig. 1(b). Also in this case there are only two reversible transformations (the identity and the reflection around the vertical axis). However, now every mixed state can be generated from some pure state via a RaRe channel: the state space is foliated into horizontal segments, each generated by a pure state under the action of RaRe channels. As a result, the pure states are the most useful resources and one can interpret the relation $\succeq$ as a way to compare the degree of purity of different states. Nevertheless, pure states on different segments are inequivalent resources: there are different, inequivalent classes of pure states, so purity is not the only relevant resource in play.

Finally, consider a system with a square state space, as in Fig. 1(c), and suppose that all the symmetry transformations in the dihedral group D4 are allowed reversible transformations. In this case, all the pure states are equivalent under reversible transformations and every mixed state can be obtained by applying a RaRe channel to a fixed pure state. Here, the resource theory of dynamical control becomes a full-fledged resource theory of purity.

The above examples show that not every operational theory supports a sensible resource theory of purity. Motivated by the examples, we put forward the following definition:

Definition 6 A theory of purity is a resource theory of dynamical control where every state ρ can be compared with at least one pure state. The theory is called canonical if every pure state is comparable to any other pure state.

In this paper we will focus on canonical theories of purity.

(a) Example of state space leading to a theory of dynamical control that cannot be interpreted as a theory of purity. Due to the shape of the state space, the reversible transformations can only be the identity and the reflection around the vertical axis.
The states on the vertical sides are maximally controllable (and therefore maximally resourceful) even though they are not pure.
(b) Example of state space leading to a non-canonical theory of purity. In this case, maximal purity is equivalent to maximal resourcefulness: only the pure states are maximally controllable. However, some pure states are inequivalent resources, meaning that purity is not the only resource in play.
(c) Example of state space compatible with a canonical theory of purity. Here the set of maximally controllable states coincides with the set of pure states and, in addition, all pure states are equivalent resources.
Figure 1. Three different examples of theories of dynamical control.

Proposition 4 The following are equivalent:
1. the theory is a canonical theory of purity;
2. for every system A, the group of reversible channels acts transitively on the set of pure states;
3. for every system A, there exists at least one state that is more controllable than every state.

The proof is provided in Appendix B. Combining all the statements of proposition 4, one can see that in a canonical theory of purity every pure state is more controllable than any other state.

Starting from Hardy's work [43], the transitivity of the action of reversible channels on pure states has featured in a number of axiomatizations of quantum theory, either directly as an axiom [90–92] or indirectly as a consequence of an axiom, as in the case of the Purification axiom [48]. Proposition 4 provides a new motivation for this axiom, now identified as a necessary and sufficient condition for a well-behaved theory of purity.

In a canonical theory of purity, we say that ρ is purer than ρ′ if $\rho\succeq\rho'$ and we adopt the notation $\rho\succeq_{\rm pur}\rho'$. In this case, we also say that ρ′ is more mixed than ρ, denoted by $\rho'\succeq_{\rm mix}\rho$. When $\rho\succeq_{\rm mix}\rho'$ and $\rho'\succeq_{\rm mix}\rho$ we say that ρ and ρ′ are equally mixed, denoted by $\rho\simeq_{\rm mix}\rho'$. Clearly, every two states that differ by a reversible channel are equally mixed.
The converse is true for finite-dimensional systems, thanks to Eq. (9).

C. Maximally mixed states

We say that a state χ ∈ St(A) is maximally mixed if it satisfies the property

$$\forall\rho\in{\rm St}(A):\quad \rho\succeq_{\rm mix}\chi \implies \rho=\chi .$$

Maximally mixed states can be characterized as the states that are invariant under all reversible channels:

Proposition 5 A state χ ∈ St(A) is maximally mixed if and only if it is invariant, i.e. if and only if $\chi=\mathcal U\chi$ for every reversible channel $\mathcal U: A\to A$.

We omit the proof, which is straightforward. Note that maximally mixed states do not exist in every theory: for example, infinite-dimensional quantum systems have no maximally mixed density operator, i.e. no trace-class operator that is invariant under the action of the full unitary group. For finite-dimensional canonical theories, however, the maximally mixed state exists and is unique under the standard assumption of compactness of the state space [47, 91]. In this case, the state χ is not only a maximal element, but also the maximum of the relation $\succeq_{\rm mix}$, namely,

$$\chi\succeq_{\rm mix}\rho\qquad\forall\rho\in{\rm St}(A) . \tag{10}$$

Figure 2. Mixedness relation for the state space of a square bit: the vertices of the octagon represent the states that can be reached from a given state ρ via reversible transformations. Their convex hull is the set of states that are more mixed than ρ. Note that it contains the invariant state χ, which can be characterized as the maximally mixed state.

This is in analogy with the quantum case, where the maximally mixed state is given by the density matrix χ = I/d, where I is the identity operator on the system's Hilbert space and d is the Hilbert space dimension. Another example of a finite-dimensional canonical theory is provided by the square bit:

Example 4 Consider a system whose state space is a square, as in Fig. 1(c), and pick a generic (mixed) state ρ. The states that are more mixed than ρ are obtained by applying all possible reversible transformations to ρ (i.e.
all the elements of the dihedral group D4) and taking the convex hull of the orbit. The set of all states that are more mixed than ρ is an octagon, depicted in blue in Fig. 2. All the vertices of the octagon are equally mixed. The centre of the square is the maximally mixed state χ, the unique invariant state of the system.

VI. ENTANGLEMENT-THERMODYNAMICS DUALITY

In quantum theory, it is well known that the ordering of pure bipartite states according to the degree of entanglement is equivalent to the ordering of their marginals according to the degree of mixedness [31–34, 93]. In this section we will prove the validity of this equivalence based only on first principles.

A. Purification

In order to establish the desired duality, we consider theories that satisfy the Purification Principle [47, 48]. Let us briefly summarize its content. We say that a state ρ ∈ St(A) has a purification if there exists a system B and a pure state Ψ ∈ PurSt(A ⊗ B) (the purification) such that

$$\rho = {\rm Tr}_B\,\Psi .$$

We say that the purification is essentially unique if every other purification Ψ′ with the same purifying system B satisfies the condition

$$\Psi' = \left(\mathcal I_A\otimes\mathcal U\right)\Psi , \tag{11}$$

for some reversible transformation $\mathcal U: B\to B$. With these definitions, the Purification Principle can be phrased as

Axiom 3 (Purification [47, 48]) Every state has a purification. Every purification is essentially unique.

Purification has a number of important consequences. First of all, it implies that the group of reversible transformations acts transitively on the set of pure states:

Proposition 6 (Transitivity) For every system B and every pair of pure states ψ, ψ′ ∈ PurSt(B) there exists a reversible channel $\mathcal U: B\to B$ such that $\psi'=\mathcal U\psi$.

The existence of a reversible transformation connecting ψ and ψ′ is a consequence of the essential uniqueness of purification [Eq. (11)], in the special case where A is the trivial system I [47].
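The square bit of Example 4 gives a concrete picture of the transitivity in Proposition 6. The sketch below is a toy model of our own: it assumes the state space is represented by the square [−1, 1]² in the plane, with pure states at the four vertices and with D4 realized as the eight signed 2 × 2 permutation matrices (the rotations and reflections of the square about its centre). It checks that D4 maps the vertex (1, 1) onto all four vertices (transitivity), that a generic state has an eight-point orbit (the vertices of the octagon of Fig. 2), and that the uniform mixture over D4, one particular RaRe channel, sends a state to the centre, i.e. to the invariant state χ. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import permutations, product


def d4_matrices():
    """The 8 signed 2x2 permutation matrices, i.e. the symmetry group D4 of the
    square [-1, 1]^2 (our toy affine representation of the square bit)."""
    mats = []
    for i, j in permutations((0, 1)):
        for s, t in product((1, -1), repeat=2):
            m = [[0, 0], [0, 0]]
            m[0][i], m[1][j] = s, t
            mats.append(m)
    return mats


def act(m, v):
    """Apply the 2x2 matrix m to the point v = (x, y)."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def orbit(state):
    """All states reachable from `state` by a single reversible transformation."""
    return {act(m, state) for m in d4_matrices()}


def uniform_rare(state):
    """RaRe channel with equal weights on all 8 elements of D4."""
    pts = [act(m, state) for m in d4_matrices()]
    n = F(len(pts))
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


vertices = {(1, 1), (-1, 1), (-1, -1), (1, -1)}
print(orbit((1, 1)) == vertices)      # True: D4 is transitive on pure states
rho = (F(1, 2), F(1, 5))              # a generic mixed state
print(len(orbit(rho)))                # 8: vertices of the octagon of Fig. 2
print(uniform_rare(rho) == (0, 0))    # True: the twirl yields the centre chi
```

A state on a symmetry axis, such as (1/2, 0), has only a four-point orbit, which is why Example 4 refers to a generic state.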
Since all pure states are equivalent under reversible transformations, every theory with purification gives rise to a canonical theory of purity, in the sense of definition 6. One could take this fact as a further indication that Purification is a good starting point for a well-behaved thermodynamics. Another important consequence of Purification is the existence of entanglement:

Proposition 7 (Existence of entangled states) For every pair of systems A and B, a pure state of A ⊗ B is entangled if and only if its marginal on system A is mixed.

Proof. Let us denote the pure bipartite state by Ψ. If Ψ is not entangled, then it must be a product of two pure states, say Ψ = α ⊗ β. Clearly, this implies that the marginal on system A is pure. Conversely, suppose that the marginal of Ψ on system A is pure and denote it by α. Then, for every pure state β′ ∈ PurSt(B), the product state Ψ′ = α ⊗ β′ is pure, thanks to Purity Preservation. Now, Ψ and Ψ′ are two purifications of α. By the essential uniqueness of purification, one must have $\Psi=(\mathcal I_A\otimes\mathcal U_B)\Psi'$ for some reversible transformation $\mathcal U_B$ acting on system B. Hence, we have Ψ = α ⊗ β, with $\beta=\mathcal U_B\beta'$. □

Finally, Purification implies the steering property [2, 94], stating that every ensemble decomposition of a given state can be generated by a measurement on the purifying system:

Proposition 8 (Steering) Let ρ be a state of system A and let Ψ ∈ PurSt(A ⊗ B) be a purification of ρ. For every ensemble of states $\{\rho_i\}_{i\in X}$ such that $\sum_i\rho_i=\rho$, there exists a measurement $\{b_i\}_{i\in X}$ on the purifying system B such that the following relation holds:

$$\rho_i = \left(\mathcal I_A\otimes b_i\right)\Psi \qquad\forall i\in X .$$

See theorem 6 of Ref. [47] for the proof. The steering property will turn out to be essential in establishing the duality between entanglement and thermodynamics.

B. One-way protocols transforming pure states into pure states

The operational Lo-Popescu theorem guarantees that every LOCC protocol acting on a pure bipartite input state can be simulated by a one-way protocol.
Purication buys us an extra bonus: not only is the protocol one-way, but also all the conditional operations are reversible. Lemma 2 Let Ψ and Ψ′ be pure states of A ⊗ B. Under the validity of Purication and Purity Preservation, every one-way protocol transforming Ψ into Ψ′ can be simulated by a one-way protocol where all conditional operations are reversible. Proof. Suppose that Ψ can be transformed into Ψ′ via a one-way protocol where Alice performs a test {Ai}i∈X and Bob performs a channel B(i) conditional on the outcome i. By denition, we have ∑ i Ψ A Ai && A B B(i) B = Ψ′ A B . Since Ψ′ is pure, this implies that there exists a probability distribution {pi} such that Ψ A Ai && A B B(i) B = pi Ψ′ A B (12) for every outcome i. Now, without loss of generality each transformation Ai can be assumed to be pure (if not, one can always decompose it into pure transformations, thanks to the pure decomposition property). Then, Purity Preservation guarantees that the normalized state Ψi dened by Ψi A B := p−1i Ψ A Ai A B (13) is pure. With this denition, Eq. (12) becomes Ψi A B B(i) B = Ψ′ A B . 12 Tracing out system B on both sides one obtains Ψ′ A B Tr = Ψi A B B(i) B Tr = Ψi A B Tr , the second equality coming from the normalization of the channel B(i) (cf. proposition 1). Hence, the pure states Ψi and Ψ ′ have the same marginal on A. By the essential uniqueness of Purication, they must dier by a reversible channel U (i) on the purifying system B, namely Ψi A B U (i) B = Ψ′ A B (14) In conclusion, we obtained Ψ A Ai && A B B(i) B = = pi Ψ′ A B = pi Ψi A B U (i) B = Ψ A Ai && A B U (i) B , where we have used Eqs. (12), (14), and (13). In other words, the initial protocol can be simulated by a protocol where Alice performs the test {Ai} and Bob performs the reversible transformation U (i) conditionally on the outcome i. 
□

The reduction to one-way protocols with reversible operations is the key to connect the resource theory of entanglement with the resource theory of purity. The duality between these two resource theories will be established in the next subsections.

C. The more entangled a pure state, the more mixed its marginals

We start by proving one direction of the entanglement-thermodynamics duality: if a state is more entangled than another, then the marginals of the former are more mixed than the marginals of the latter.

Lemma 3 Let Ψ and Ψ′ be two pure states of system A ⊗ B and let ρ, ρ′ and σ, σ′ be their marginals on system A and B, respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, if Ψ is more entangled than Ψ′, then ρ (σ) is more mixed than ρ′ (σ′).

Proof. By the operational Lo-Popescu theorem, we know that there exists a one-way protocol transforming Ψ into Ψ′. Moreover, thanks to Purification, the conditional operations in the protocol can be chosen to be reversible (lemma 2). Let us choose a protocol with classical communication from Alice to Bob, in which Alice performs the test $\{\mathcal A_i\}_{i\in X}$ and Bob performs the reversible transformation $\mathcal U^{(i)}$ conditional on the outcome i. Since Ψ′ is pure, we must have

$$\left(\mathcal A_i\otimes\mathcal U^{(i)}\right)\Psi = p_i\,\Psi'\qquad\forall i\in X ,$$

where $\{p_i\}$ is a suitable probability distribution. Denoting by $\mathcal V^{(i)}$ the inverse of $\mathcal U^{(i)}$ and applying it on both sides of the equation, we obtain

$$\left(\mathcal A_i\otimes\mathcal I_B\right)\Psi = p_i\left(\mathcal I_A\otimes\mathcal V^{(i)}\right)\Psi' .$$

Summing over all outcomes, the equality becomes

$$\left(\mathcal A\otimes\mathcal I_B\right)\Psi = \left(\mathcal I_A\otimes\mathcal R\right)\Psi' , \tag{15}$$

with $\mathcal A:=\sum_i\mathcal A_i$ and $\mathcal R:=\sum_i p_i\,\mathcal V^{(i)}$. Finally, we obtain

$$\sigma = {\rm Tr}_A\,\Psi = {\rm Tr}_A\left[\left(\mathcal A\otimes\mathcal I_B\right)\Psi\right] = {\rm Tr}_A\left[\left(\mathcal I_A\otimes\mathcal R\right)\Psi'\right] = \mathcal R\,\sigma' ,$$

where we have used the normalization of the channel $\mathcal A$ in the second equality and Eq. (15) in the third. Since $\mathcal R$ is a RaRe channel by construction, we have proved that σ is more mixed than σ′.
The fact that ρ is more mixed than ρ′ can be proved by the same argument, starting from a one-way protocol with classical communication from Bob to Alice and with reversible operations on Alice's side. □

The relation between the degree of entanglement of a pure state and the degree of mixedness of its marginals holds not only for bipartite states, but also for multipartite states. Indeed, suppose that Ψ and Ψ′ are two pure states of system $A_1\otimes A_2\otimes\cdots\otimes A_N$ and that Ψ is more entangled than Ψ′, in the sense that there exists a (multipartite) LOCC protocol converting Ψ into Ψ′. For every nonempty proper subset $S\subset\{1,\dots,N\}$ one can define $A:=\bigotimes_{n\notin S}A_n$ and $B:=\bigotimes_{n\in S}A_n$ and apply lemma 3, since a multipartite LOCC protocol is, in particular, an LOCC protocol with respect to the bipartition A|B. As a result, one obtains that the marginals of Ψ are more mixed than the marginals of Ψ′ on every subsystem.

D. The more mixed a state, the more entangled its purification

We now prove the converse direction of the entanglement-thermodynamics duality: if a state is more mixed than another, then its purification is more entangled. Remarkably, the proof of this fact requires only the validity of Purification.

Lemma 4 Let ρ and ρ′ be two states of system A and let Ψ (Ψ′) be a purification of ρ (ρ′), with purifying system B. Under the validity of Purification, if ρ is more mixed than ρ′, then Ψ is more entangled than Ψ′.

Proof. By hypothesis, one has

$$\mathcal R\rho' = \rho$$

for some RaRe channel $\mathcal R:=\sum_i p_i\,\mathcal U^{(i)}$. Let us define the bipartite state Θ as

$$\Theta := \left(\mathcal R\otimes\mathcal I_B\right)\Psi' \equiv \sum_i p_i\left(\mathcal U^{(i)}\otimes\mathcal I_B\right)\Psi' . \tag{16}$$

By construction, Θ is an extension of ρ: indeed, one has

$${\rm Tr}_B\,\Theta = \mathcal R\,{\rm Tr}_B\,\Psi' = \mathcal R\rho' = \rho .$$

Let us take a purification of Θ, say Γ ∈ PurSt(A ⊗ B ⊗ C). Clearly, Γ is a purification of ρ, since one has

$${\rm Tr}_{BC}\,\Gamma = {\rm Tr}_B\,\Theta = \rho .$$

Then, the essential uniqueness of purification implies that Γ must be of the form

$$\Gamma = \left(\mathcal I_A\otimes\mathcal U\right)\left(\Psi\otimes\gamma\right) , \tag{17}$$

for some reversible transformation $\mathcal U: B\otimes C\to B\otimes C$ and some pure state γ ∈ PurSt(C). In other words, Ψ can be transformed into Γ by local operations on Bob's side. Now, Eq.
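(16) implies that the states $\left\{p_i\left(\mathcal U^{(i)}\otimes\mathcal I_B\right)\Psi'\right\}_{i\in X}$ are an ensemble decomposition of Θ. Hence, the steering property (proposition 8) implies that there exists a measurement $\{c_i\}_{i\in X}$ on C such that

$$p_i\left(\mathcal U^{(i)}\otimes\mathcal I_B\right)\Psi' = \left(\mathcal I_A\otimes\mathcal I_B\otimes c_i\right)\Gamma \qquad\forall i\in X . \tag{18}$$

Combining Eqs. (17) and (18), and denoting by $\mathcal V^{(i)}$ the inverse of $\mathcal U^{(i)}$, we obtain the desired result:

$$\Psi' = \sum_{i\in X}\left(\mathcal V^{(i)}\otimes\mathcal B_i\right)\Psi ,$$

where $\{\mathcal B_i\}_{i\in X}$ is the test on B defined by

$$\mathcal B_i(\,\cdot\,) := \left(\mathcal I_B\otimes c_i\right)\mathcal U\left(\,\cdot\otimes\gamma\right) ,$$

i.e. Bob appends the pure state γ on system C, applies the reversible transformation $\mathcal U$ on B ⊗ C, and performs the measurement $\{c_i\}$ on C. In conclusion, if the marginal state of Ψ is more mixed than the marginal state of Ψ′, then Ψ can be converted into Ψ′ by a one-way LOCC protocol. □

E. The duality

Combining lemmas 3 and 4 we identify the degree of entanglement of a pure bipartite state with the degree of mixedness of its marginals:

Theorem 2 (Entanglement-thermodynamics duality) Let Ψ and Ψ′ be two pure states of system A ⊗ B and let ρ, ρ′ and σ, σ′ be their marginals on system A and B, respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, the following statements are equivalent:
1. Ψ is more entangled than Ψ′;
2. ρ is more mixed than ρ′;
3. σ is more mixed than σ′.

Proof. The implications 1 ⟹ 2 and 1 ⟹ 3 follow from lemma 3 and require the validity of all three axioms. The implications 2 ⟹ 1 and 3 ⟹ 1 follow from lemma 4 and require only the validity of Purification. □

The duality can be illustrated by the commutative diagrams

$$\begin{array}{ccc}\Psi & \xrightarrow{\ \mathrm{LOCC}\ } & \Psi'\\ \big\downarrow{\scriptstyle \mathrm{Tr}_B} & & \big\downarrow{\scriptstyle \mathrm{Tr}_B}\\ \rho & \xleftarrow{\ \mathrm{RaRe}\ } & \rho'\end{array}
\qquad\qquad
\begin{array}{ccc}\Psi & \xrightarrow{\ \mathrm{LOCC}\ } & \Psi'\\ \big\downarrow{\scriptstyle \mathrm{Tr}_A} & & \big\downarrow{\scriptstyle \mathrm{Tr}_A}\\ \sigma & \xleftarrow{\ \mathrm{RaRe}\ } & \sigma'\end{array}$$

and is implemented operationally by discarding one of the component systems. Another illustration of the duality is via the diagram

$$\begin{array}{ccc}\Psi & \xleftarrow{\ \mathrm{LOCC}\ } & \Psi'\\ \big\uparrow{\scriptstyle \text{purification}} & & \big\uparrow{\scriptstyle \text{purification}}\\ \rho & \xrightarrow{\ \mathrm{RaRe}\ } & \rho'\end{array}$$

Here the map implementing the duality is (a choice of) purification. Such a map cannot be realized as a physical operation [47]. Instead, it corresponds to the theoretical operation of modelling mixed states as marginals of pure states.

VII.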
CONSEQUENCES OF THE DUALITY

In this section we discuss the simplest consequences of the entanglement-thermodynamics duality, including the relation between maximally mixed and maximally entangled states, as well as a link between information erasure and generation of entanglement. From now on, the axioms used to derive the duality will be treated as standing assumptions and will not be written explicitly in the statement of the results.

A. Equivalence under local reversible transformations

Let us start from the easiest consequence of the duality:

Corollary 1 Let Ψ and Ψ′ be two pure states of system A ⊗ B, with A finite-dimensional. Then, Ψ and Ψ′ are equally entangled if and only if they are equivalent under local reversible transformations, namely $\Psi'=(\mathcal U\otimes\mathcal V)\Psi$, where $\mathcal U$ and $\mathcal V$ are reversible transformations acting on A and B, respectively.

This result, proved in appendix C, guarantees that the equivalence classes under the entanglement relation have a simple structure, inherited from the reversible dynamics allowed by the theory. For finite systems, pure bipartite entanglement is completely characterized by the quotient of the set of pure states under local reversible transformations.

B. Duality for states on different systems

Theorem 2 concerns the convertibility of states of the same system. To generalize it to arbitrary systems, it is enough to observe that the tensor product with local pure states does not change the degree of entanglement: for arbitrary pure states Ψ, α′, and β′ of systems A ⊗ B, A′, and B′, one has

$$\Psi\simeq_{\rm ent}\alpha'\otimes\Psi\otimes\beta' , \tag{19}$$

relative to the bipartition (A′ ⊗ A) ⊗ (B ⊗ B′). As a consequence, for every pure state Ψ′ of A′ ⊗ B′ one has the equivalence

$$\Psi\succeq_{\rm ent}\Psi' \iff \alpha'\otimes\Psi\otimes\beta'\succeq_{\rm ent}\alpha\otimes\Psi'\otimes\beta$$

for arbitrary pure states α, α′, β, β′ of A, A′, B, B′, respectively.
This fact leads directly to the generalization of the duality to states of different systems:

Corollary 2 Let Ψ and Ψ′ be two pure states of systems A ⊗ B and A′ ⊗ B′, respectively, and let ρ, ρ′, σ and σ′ be their marginals on systems A, A′, B and B′, respectively. Under the validity of Purification, Purity Preservation, and Local Exchangeability, the following statements are equivalent:
1. Ψ is more entangled than Ψ′;
2. ρ ⊗ α′ is more mixed than α ⊗ ρ′ for every pair of pure states α ∈ PurSt(A) and α′ ∈ PurSt(A′);
3. σ ⊗ β′ is more mixed than β ⊗ σ′ for every pair of pure states β ∈ PurSt(B) and β′ ∈ PurSt(B′).

The duality is now implemented by the operation of discarding systems and preparing pure states, as illustrated by the commutative diagrams

$$\begin{array}{ccc}\Psi & \xrightarrow{\ \mathrm{LOCC}\ } & \Psi'\\ \big\downarrow{\scriptstyle \mathrm{Tr}_B\otimes\alpha'} & & \big\downarrow{\scriptstyle \alpha\otimes\mathrm{Tr}_{B'}}\\ \rho\otimes\alpha' & \xleftarrow{\ \mathrm{RaRe}\ } & \alpha\otimes\rho'\end{array}
\qquad\qquad
\begin{array}{ccc}\Psi & \xrightarrow{\ \mathrm{LOCC}\ } & \Psi'\\ \big\downarrow{\scriptstyle \mathrm{Tr}_A\otimes\beta'} & & \big\downarrow{\scriptstyle \beta\otimes\mathrm{Tr}_{A'}}\\ \sigma\otimes\beta' & \xleftarrow{\ \mathrm{RaRe}\ } & \beta\otimes\sigma'\end{array}$$

At this point, a cautionary remark is in order. Inspired by Eq. (19), one may be tempted to compare the degree of mixedness of states of different systems by postulating the relation

$$\rho\simeq_{\rm mix}\rho\otimes\alpha' \tag{20}$$

for arbitrary states ρ and arbitrary pure states α′. The appeal of this choice is that the duality would maintain the simple form $\Psi\succeq_{\rm ent}\Psi'\iff\rho\succeq_{\rm mix}\rho'$, even for states of different systems. However, Eq. (20) would trivialize the resource theory of purity: as a special case (taking ρ to be the deterministic state 1 of the trivial system), it would imply the relation $1\simeq_{\rm mix}\alpha'$ for a generic pure state α′, meaning that pure states can be freely generated. Since in a canonical theory of purity pure states are the most resourceful, having pure states for free would mean having every state for free.

Another way to compare states of different systems according to their degree of purity would be to postulate the relation

$$\rho\simeq_{\rm pur}\rho\otimes\chi_B , \tag{21}$$

where $\chi_B$ is the maximally mixed state of system B (assuming that such a state exists). The rationale for this choice would be that $\chi_B$ is the minimum-resource state in the resource theory of purity and therefore one may want to consider it as free.
This choice would not trivialize the resource theory of purity, but would break the duality with the resource theory of entanglement. Indeed, Eq. (21) would imply as a special case $1\simeq_{\rm pur}\chi_B$, meaning that maximally mixed states can be freely generated from nothing. Clearly, this is not the case for their purifications, which are entangled and cannot be generated freely by LOCC. In summary, refraining from comparing mixed states on different systems seems to be the best way to approach the duality between the resource theory of entanglement and the resource theory of purity.

C. Measures of mixedness and measures of entanglement

The duality provides the foundation for the definition of quantitative measures of entanglement. In every resource theory, one can define measures of resourcefulness by introducing functions that are non-increasing under the set of free operations [85]. In the resource theory of entanglement, this leads to the notion of entanglement monotones:

Definition 7 An entanglement monotone for system A ⊗ B is a function E: St(A ⊗ B) → ℝ satisfying the condition

$$E(\rho)\ge E(\rho')\qquad\forall\rho,\rho'\in{\rm St}(A\otimes B),\ \ \rho\succeq_{\rm ent}\rho' .$$

More generally, one may want to compare entangled states on different systems. In this case, an entanglement monotone E is a family of functions $E=\{E_{A\otimes B}\mid A,B\in{\rm Sys}\}$ satisfying the condition $E_{A\otimes B}(\rho)\ge E_{A'\otimes B'}(\rho')$ for every pair of states ρ ∈ St(A ⊗ B) and ρ′ ∈ St(A′ ⊗ B′) satisfying $\rho\succeq_{\rm ent}\rho'$. Similarly, one can define monotones in the resource theory of purity:

Definition 8 A purity monotone for system A is a function P: St(A) → ℝ satisfying the condition

$$P(\rho)\ge P(\rho')\qquad\forall\rho,\rho'\in{\rm St}(A),\ \ \rho\succeq_{\rm pur}\rho' .$$

Recall that in our resource theory of purity we abstain from comparing states on different systems, for the reasons discussed at the end of the previous subsection.
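In quantum theory, Theorem 2 reduces to a familiar criterion, which makes Definition 7 easy to test. The sketch below assumes the quantum case and works directly with the Schmidt coefficients λ of Ψ (the eigenvalues of its marginal ρ), taking them as given rather than computing them from a state vector. Since RaRe channels are then mixtures of unitaries, ρ is more mixed than ρ′ exactly when λ′ majorizes λ (a standard result of matrix analysis), so $\Psi\succeq_{\rm ent}\Psi'$ if and only if λ′ majorizes λ, which is Nielsen's theorem. The entropy of entanglement H(λ), computed here in nats, then behaves as an entanglement monotone. The names `majorizes`, `more_entangled` and `entanglement_entropy` are hypothetical.

```python
import math


def majorizes(p, q, tol=1e-12):
    """True if p majorizes q (vectors of equal length and equal total)."""
    sp = sq = 0.0
    for a, b in zip(sorted(p, reverse=True), sorted(q, reverse=True)):
        sp += a
        sq += b
        if sp < sq - tol:
            return False
    return abs(sp - sq) <= tol


def more_entangled(schmidt, schmidt_prime):
    """Quantum instance of Theorem 2 (Nielsen's criterion): Psi >=_ent Psi'
    iff the Schmidt vector of Psi' majorizes that of Psi."""
    return majorizes(schmidt_prime, schmidt)


def entanglement_entropy(schmidt):
    """Entropy of entanglement H(rho) of the marginal, in nats."""
    return sum(-x * math.log(x) for x in schmidt if x > 0)


bell, partial, product = [0.5, 0.5], [0.8, 0.2], [1.0, 0.0]
print(more_entangled(bell, partial))     # True
print(more_entangled(partial, bell))     # False
print(more_entangled(partial, product))  # True
a, b = [0.5, 0.25, 0.25], [0.4, 0.4, 0.2]
print(more_entangled(a, b), more_entangled(b, a))  # False False: incomparable
```

The last pair shows that $\succeq_{\rm ent}$ is only a preorder: two pure states can be incomparable even though any real-valued monotone, such as the entropy of entanglement, assigns both of them a number.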
Purity monotones give a further indication that the definition of purity in terms of random reversible channels is a sensible one: indeed, if we restrict our attention to the classical case, the notion of purity monotone introduced here coincides with the canonical notion of Schur-convex function in the theory of majorization [65] (see appendix D). Schur-convex functions are the key tool to construct entropies and other measures of mixedness in classical statistical mechanics, and have applications in a number of diverse fields [95].

Constructing purity monotones is fairly easy. For example, every function that is convex and invariant under reversible transformations is a purity monotone:

Proposition 9 Let P: St(A) → ℝ be a function satisfying
1. convexity: $P\left(\sum_i p_i\rho_i\right)\le\sum_i p_iP(\rho_i)$ for every set of states $\{\rho_i\}$ and for every probability distribution $\{p_i\}$, and
2. invariance under reversible transformations: $P(\mathcal U\rho)=P(\rho)$ for every state ρ and for every reversible transformation $\mathcal U$.
Then, P is a purity monotone.

The proof is elementary and is presented in appendix D for the convenience of the reader. The above proposition is the natural extension of a well-known result in majorization theory, namely that every convex function that is symmetric in its variables is automatically Schur-convex [65]. Once again, the operational notions discussed here match the canonical results about majorization.

Using proposition 9, one can construct purity monotones aplenty: for every convex function f: ℝ → ℝ one can define the f-purity $P_f: {\rm St}(A)\to\mathbb R$ as

$$P_f(\rho) := \sup_{a\ \text{pure}}\ \sum_{x\in X} f(p_x),\qquad p_x := (a_x|\rho) ,$$

where the supremum runs over all pure measurements $a=\{a_x\}_{x\in X}$ and over all outcome spaces X. It is easy to verify that every f-purity is convex and invariant under reversible transformations, and therefore is a purity monotone.
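Proposition 9 can be probed numerically in the classical case. The sketch below assumes a classical system whose reversible transformations are permutations, and it assumes that the supremum in the f-purity is attained by the finest pure measurement, so that $P_f(p)=\sum_x f(p_x)$. We expect this to hold for convex f with f(0) = 0, since such functions are superadditive on [0, ∞) and splitting an outcome can therefore only lower $\sum_x f(p_x)$. The script applies random RaRe channels and counts how often $P_f$ increases for f(x) = x log x and f(x) = x². A random search of this kind can only find counterexamples, not prove the proposition; all helper names are hypothetical.

```python
import math
import random


def f_purity(p, f):
    """Classical f-purity, assuming the supremum over pure measurements is
    attained by the finest measurement: P_f(p) = sum_x f(p_x)."""
    return sum(f(x) for x in p)


def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0


def square(x):
    return x * x


def random_rare(p, n_terms, rng):
    """Apply a random mixture of n_terms permutations (a RaRe channel) to p."""
    weights = [rng.random() for _ in range(n_terms)]
    total = sum(weights)
    out = [0.0] * len(p)
    for w in weights:
        perm = list(range(len(p)))
        rng.shuffle(perm)
        for k, pk in enumerate(p):
            out[perm[k]] += (w / total) * pk
    return out


rng = random.Random(0)
p = [0.6, 0.25, 0.1, 0.05]
violations = 0
for _ in range(1000):
    q = random_rare(p, 3, rng)
    for f in (xlogx, square):
        # Proposition 9: p is purer than q, so P_f(q) should not exceed P_f(p).
        if f_purity(q, f) > f_purity(p, f) + 1e-12:
            violations += 1
print(violations)  # 0
```

With f(x) = x log x the quantity $P_f$ is minus the Shannon entropy, so the same loop also checks that RaRe channels never decrease entropy in this classical toy model.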
In the special case of the function f(x) = x log x, one has

$$P_f(\rho) = -H(\rho) , \tag{22}$$

where H is the measurement entropy [96–98], namely the minimum over all pure measurements of the Shannon entropy of the probability distribution resulting from the measurement. In the case of f(x) = x² one obtains instead a generalized notion of purity, which in the quantum case coincides with the usual notion $P(\rho)={\rm Tr}(\rho^2)$.

Another way to construct purity monotones is by using norms on the state space: thanks to proposition 9, every norm that is invariant under reversible transformations leads to a purity monotone. For systems that have an invariant state, an easy example is given by the operational distance

$$P(\rho) := \tfrac12\,\|\rho-\chi\| ,$$

where ‖·‖ is the operational norm, defined as $\|\delta\| := \sup_{a_0\in{\rm Eff}(A)}(a_0|\delta)-\inf_{a_1\in{\rm Eff}(A)}(a_1|\delta)$ [47], and χ is the invariant state. Another example of a purity monotone induced by a norm is the notion of purity introduced in Refs. [76, 77], based on the Schatten 2-norm. In the quantum case, this notion of purity coincides with the ordinary notion $P(\rho)={\rm Tr}(\rho^2)$ and therefore coincides with the f-purity with f(x) = x². It is not a priori clear whether the 2-norm purity coincides with the x²-purity for more general theories.

Now, thanks to the duality we can turn every purity monotone into an entanglement monotone. Given a purity monotone P: St(A) → ℝ, we can define the pure-state entanglement monotone E: PurSt(A ⊗ B) → ℝ as

$$E(\Psi) := g\left[P(\rho)\right],\qquad \rho={\rm Tr}_B\,\Psi , \tag{23}$$

where g: ℝ → ℝ is any monotonically decreasing function (g(x) ≤ g(y) for x > y). Here the monotonically decreasing behaviour of g implements the reversal of arrows in the duality. Furthermore, if the functions P and g have suitable convexity properties, the entanglement monotone can be extended from pure states to arbitrary states using the convex roof construction [22, 78, 79].
Specically, one has the following Corollary 3 Let P : St (A) → R be a convex purity monotone, g : R → R be a concave, monotonically decreasing function, and E : PurSt (A⊗ B)→ R be the pure state entanglement monotone dened in Eq. (23). Then, the convex roof extension E : St (A⊗ B)→ R dened by E (Σ) := inf {pi,Ψi}∑ i piΨi=Σ ∑ i piE (Ψi) is a convex entanglement monotone. The proof is the same as in the quantum case [78]. An easy way to generate entanglement measures is to pick an f -purity and take its negative, which corresponds to the choice g (x) = −x. For example, the choice f (x) = x log x leads to a generalization of the entanglement of formation [22] to all theories satisfying the duality. D. Maximally entangled states As a consequence of the duality, there exists a correspondence between maximally mixed and maximally entangled states, the latter being dened as follows Denition 9 A pure state Φ of system A ⊗ B is maximally entangled if no other pure state of A ⊗ B is more entangled than Φ, except for the states that are equivalent to Φ under local reversible transformationsi.e. if for every Ψ ∈ PurSt (A⊗ B) one has Ψ ent Φ =⇒ Ψ = (U ⊗ V) Φ for some reversible transformations U : A → A and V : B→ B. Theorem 2 directly implies the following. Corollary 4 The purication of a maximally mixed state is maximally entangled. Proof. Suppose that Ψ ∈ PurSt (A⊗ B) is more entangled than Φ, where Φ is a purication of the maximally mixed state of system A (assuming such a state exists for system A). By theorem 2, the marginal of Ψ on system A, denoted by ρ, must satisfy ρ mix χ. Since χ is maximally mixed, this implies ρ = χ. The uniqueness of purication then implies the condition Ψ = (IA ⊗ VB) Φ for some reversible transformation VB on B.  As noted earlier in the paper, under the standard assumptions of convexity and compactness of the state space, the maximally mixed state is not only a maximal element of the mixedness relation, but also the maximum [cf. 
Eq. (10)]. Similarly, under the same standard assumptions, it is immediate to obtain that the purification of a maximally mixed state is more entangled than every state, namely

$$\Phi \succeq_{\rm ent} \Sigma \qquad \forall\, \Sigma \in \mathrm{St}(A\otimes B) .$$

The relation follows directly from theorem 2 when $\Sigma$ is a pure state, and in the general case it can be proved by convexity, using the fact that the set of LOCC channels is closed under convex combinations.

E. Duality between information erasure and entanglement generation

The entanglement-thermodynamics duality establishes a link between two tasks: erasing information and generating entanglement. By erasing information we mean resetting a mixed state to a fixed pure state of the same system [80]. Clearly, erasure is a costly operation in the resource theory of purity: there is no way to transform a non-pure state into a pure state by using only RaRe channels (cf. proposition 3). The dual operation in the resource theory of entanglement is the generation of entangled states from product states. By the duality, the impossibility of erasing information by RaRe channels and the impossibility of generating entanglement by LOCC are one and the same thing.

The relation between information erasure and entanglement generation suggests that the cost of erasing a mixed state $\rho$ could be identified with the cost of generating the corresponding entangled state $\Psi$. For example, one may choose a fixed entangled state $\Phi$ as a reference unit of entanglement and ask how many copies of $\Phi$ are needed to generate $\Psi$ through LOCC operations. The number of entanglement units needed to generate $\Psi$ could then be taken as a measure of the cost of erasing $\rho$. We now explore this idea at the heuristic level, discussing first a model of erasure and then connecting it with the generation of entanglement.
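In quantum theory, the pure-state duality reduces to Nielsen's theorem [31]: $\Psi$ can be converted into $\Phi$ by LOCC if and only if the vector of Schmidt coefficients of $\Psi$ is majorized by that of $\Phi$, i.e. if and only if the marginal of $\Psi$ is more mixed. The sketch below (helper names hypothetical) checks this criterion and counts how many ebits, i.e. a uniform Schmidt vector of length $2^k$, are needed to produce a target state exactly in a single shot. This exact single-copy count is only an illustration of the "units of entanglement" idea above; it is not the asymptotic cost discussed in the text.

```python
def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y (x ≺ y).

    Both vectors are padded with zeros to a common length and sorted in
    decreasing order; x ≺ y iff every partial sum of x is at most the
    corresponding partial sum of y, with equal totals."""
    n = max(len(x), len(y))
    xs = sorted(list(x) + [0.0] * (n - len(x)), reverse=True)
    ys = sorted(list(y) + [0.0] * (n - len(y)), reverse=True)
    sx = sy = 0.0
    for a, b in zip(xs, ys):
        sx += a
        sy += b
        if sx > sy + tol:
            return False
    return abs(sx - sy) <= tol


def locc_convertible(schmidt_source, schmidt_target):
    """Nielsen's criterion: source -> target by LOCC iff source ≺ target."""
    return is_majorized(schmidt_source, schmidt_target)


def ebits_for_exact_conversion(schmidt_target):
    """Smallest k such that k ebits (uniform Schmidt vector of length 2**k)
    can be converted exactly into the target by LOCC."""
    if abs(sum(schmidt_target) - 1.0) > 1e-9:
        raise ValueError("Schmidt coefficients must sum to 1")
    k = 0
    while not locc_convertible([1.0 / 2**k] * 2**k, schmidt_target):
        k += 1
    return k


phi_max = [0.25] * 4            # maximally entangled state of two 4-level systems
psi = [0.7, 0.1, 0.1, 0.1]
print(locc_convertible(phi_max, psi))            # True: phi_max is more entangled
print(locc_convertible(psi, phi_max))            # False
print(ebits_for_exact_conversion([0.5, 0.25, 0.25]))  # 2
```

The first check is the quantum instance of $\Phi \succeq_{\rm ent} \Sigma$ for pure $\Sigma$: a uniform Schmidt vector is majorized by every Schmidt vector of the same length.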
Suppose that erasure is implemented by i) performing a pure measurement, ii) writing down the outcome on a classical register, iii) conditionally on the outcome, performing a reversible transformation that brings the system to a fixed pure state, and finally iv) erasing the classical register. Of course, this model assumes that some systems described by the theory can act as classical registers, meaning that they have perfectly distinguishable pure states. Assuming the validity of Landauer's principle at the classical level, the cost of erasing the classical register is then equal to the Shannon entropy of the outcomes multiplied by $k_B T$, $k_B$ and $T$ being the Boltzmann constant and the temperature, respectively [80]. Minimizing the entropy over all possible measurements at step i), one obtains the measurement entropy, as defined in Eq. (22). Hence, the minimum cost for erasing $\rho$ is given by $k_B T\, H(\rho)$.

Note that this heuristic conclusion implicitly assumes that operations i)–iii) can be performed for free. This is the case in quantum theory, where i) the measurement attaining the minimum Shannon entropy is projective and the overall transformation associated with it is a random unitary channel, ii) the measurement outcome can be written down via a unitary operation on the system and the classical register, and iii) the state of the system can be reset via another joint unitary operation. In physical theories other than quantum and classical theory, however, the requirement that operations i)–iii) be free is non-trivial and would need to be further analysed in terms of physical axioms.

Suppose now that we want to erase an unknown state $\rho$. Since the state is unknown, the relevant quantity here is the worst-case cost of erasure, defined as the supremum of the cost over all possible states.
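The heuristic cost $k_B T\, H(\rho)$ and its worst case can be evaluated numerically in quantum theory, where the measurement entropy equals the Shannon entropy of the eigenvalues of $\rho$. The sketch below assumes natural logarithms (entropy in nats, so that a uniformly random bit costs $k_B T \ln 2$) and an illustrative temperature of 300 K; it scans diagonal qubit states $(p, 1-p)$ to locate the worst case. Names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)


def measurement_entropy_nats(spectrum):
    """Shannon entropy (natural log) of the eigenvalue distribution of rho.
    In quantum theory this equals the measurement entropy H(rho)."""
    return -sum(p * math.log(p) for p in spectrum if p > 0)


def erasure_cost(spectrum, temperature):
    """Heuristic minimum erasure cost k_B * T * H(rho), in joules."""
    return K_B * temperature * measurement_entropy_nats(spectrum)


def worst_case_qubit_cost(temperature, steps=1000):
    """Largest cost over the diagonal qubit states (k/steps, 1 - k/steps)."""
    return max(erasure_cost([k / steps, 1 - k / steps], temperature)
               for k in range(steps + 1))


T = 300.0  # kelvin (illustrative choice)
print(erasure_cost([0.9, 0.1], T))
print(worst_case_qubit_cost(T), K_B * T * math.log(2))
```

The scan reaches its maximum at $p = 1/2$, the maximally mixed qubit state, where the cost is $k_B T \ln 2$.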
Since the measurement entropy is monotone under the mixedness relation, in finite dimensions the supremum is attained for the maximally mixed state $\chi$, so that the worst-case cost of erasure is given by $k_B T\, H(\chi)$.

This result allows us to make an interesting connection with the work by Brunner et al. [99], who considered the task of erasure in general probabilistic theories. Specifically, they considered the maximum number of states that can be perfectly distinguished by a measurement and adopted the logarithm of this number as a measure of the cost of erasure. In their analysis they considered classical probability theory and quantum theory, as well as alternative theories with hypercube state spaces, wherein measurements can distinguish at most two states. In all these theories the logarithm of the maximum number of perfectly distinguishable states is exactly equal to the measurement entropy of the maximally mixed state. Thanks to this fact, the erasure cost defined in Ref. [99] coincides with the worst-case erasure cost defined above. It is an open question whether the two definitions coincide in all canonical theories of purity and, if not, which conditions are needed for them to coincide.

Let us now look at erasure from the dual point of view. Since the duality inverts the order, the dual of an erasure protocol consisting of operations i), ii), iii), and iv) will be an entanglement generation protocol consisting of the dual operations in the opposite order iv), iii), ii), and i). The dual of iv) is an operation that generates a purification of the classical register. The duals of the free operations i)–iii) are LOCC operations that convert the initial entangled state into the state $\Psi$. Now, by the duality we can measure the cost of generating $\Psi$ in terms of the measurement entropy $H(\rho)$. But what is the operational meaning of this choice? Again, the duality suggests an answer.
Classically, the Shannon entropy can be interpreted as the asymptotic rate at which random bits can be extracted from a given probability distribution. Dually, the inverse relation must hold between the purifications: referring to the purification of a random bit as an ebit, the Shannon entropy is the asymptotic number of ebits per copy needed to generate the purification of a given probability distribution by LOCC. Minimizing over all probability distributions, one can characterize the measurement entropy as the minimum number of ebits needed to asymptotically generate the state $\Psi$ by LOCC. Although partly based on heuristics, the argument already provides a good illustration of the far-reaching consequences of the entanglement-thermodynamics duality, which allowed us to identify the cost of erasing a state with the number of ebits needed to generate its purification.

F. Entropy sinks and entanglement reservoirs

Let us now consider the task of erasure assisted by a catalyst, namely a system $C$ whose state remains unaffected by the erasure operation. In this case, the operation of erasure transforms the product state $\rho\otimes\gamma \in \mathrm{St}(A\otimes C)$ into the state $\alpha_0\otimes\gamma$ for some pure state $\alpha_0 \in \mathrm{PurSt}(A)$. By duality, it is immediate to see that catalyst-assisted erasure is equivalent to catalyst-assisted entanglement generation:

Corollary 5 Let $\Psi$ and $\Gamma$ be two pure states of systems $A\otimes B$ and $C\otimes D$, respectively, and let $\rho$ and $\gamma$ be their marginals on systems $A$ and $C$, respectively. Then, the following are equivalent:
1. $\rho$ can be erased by a RaRe channel using $\gamma$ as a catalyst;
2. $\Psi$ can be generated by an LOCC channel using $\Gamma$ as a catalyst.

If such catalysts existed, they would behave like entropy sinks, which absorb mixed states without becoming more mixed, or like entanglement reservoirs, from which entanglement can be borrowed indefinitely. For example, suppose that $\Psi$ can be generated freely using $\Gamma$ as a catalyst.
Then, every measure of pure-state entanglement $E$ that is additive on product states would have to satisfy the relation

$$E(\Gamma) \ge E(\Psi) + E(\Gamma) .$$

Assuming that the measure assigns a strictly positive value to every entangled state, the above relation can only be satisfied if $E(\Gamma) = +\infty$. In other words, the catalyst's state must be infinitely entangled.

It is then natural to ask whether the impossibility of infinitely entangled/infinitely mixed states follows from our axioms. The answer is affirmative in the finite-dimensional case, but counterexamples exist in infinite dimensions. For the finite-dimensional case, we have the following

Proposition 10 Let $A\otimes C$ be a finite system. Then, it is impossible to erase a mixed state of $A$ using $C$ as a catalyst.

The proof is presented in appendix E. In the infinite-dimensional case, a heuristic counterexample is as follows: imagine a scenario where system $C$ consists of an infinite chain of identical systems, with each system in the chain equivalent to $A$, namely $C = \bigotimes_{i\in\mathbb{Z}} A_i$, $A_i \simeq A$. Loosely speaking, we may choose the state $\gamma$ to be the product state $\gamma = \gamma_L\otimes\gamma_R$, where $\gamma_L$ is a state on the left side of the chain, consisting of infinitely many copies of the pure state $\alpha_0$, and $\gamma_R$ is a state on the right side of the chain, consisting of infinitely many copies of the mixed state $\rho$. It is then natural to expect that the state $\rho\otimes\gamma$ can be reversibly transformed into the state $\alpha_0\otimes\gamma$, simply by swapping system $A$ with the first system on the left of the chain and subsequently shifting the whole chain by one place to the right. The above counterexample is heuristic, because the notion of infinite tensor product is not defined in our formalism. However, infinite tensor products can be treated rigorously, at least in the quantum case, and the intuition of our counterexample turns out to be correct.
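The heuristic counterexample relies on the chain being infinite. The toy sketch below, in which string labels stand for the local states of identical subsystems and reversible operations are modelled as permutations of subsystems (an assumption made purely for illustration), runs the same swap-and-shift move on a finite ring. System $A$ does end up in $\alpha_0$, but the catalyst gains one extra copy of $\rho$: permutations conserve the number of copies of each local state, so on a finite ring the catalyst cannot be returned unchanged. The function name is hypothetical.

```python
from collections import Counter


def swap_and_shift(system, ring):
    """One swap-and-shift step on a FINITE ring of identical subsystems.

    The system is swapped with site 0 of the ring, then the ring is cyclically
    shifted by one place to the right. Both moves only permute subsystems."""
    ring = list(ring)
    system, ring[0] = ring[0], system
    ring = ring[-1:] + ring[:-1]
    return system, ring


system = "rho"
ring = ["alpha0"] * 4 + ["rho"] * 4   # "left" half pure, "right" half mixed
new_system, new_ring = swap_and_shift(system, ring)

print(new_system)                        # alpha0: the system has been reset
print(Counter(ring), Counter(new_ring))  # but the catalyst now holds one more "rho"
```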
In the dual task of catalytic entanglement generation, the rigorous version of this argument was presented by Keyl, Matsui, Schlingemann, and Werner [100], who exhibited an example of an infinite spin chain from which arbitrarily large amounts of entanglement can be generated for free.

VIII. SYMMETRIC PURIFICATION

As we observed in the previous section, the ability to erase information/generate entanglement for free has undesirable consequences for the resource theories of purity and entanglement. These scenarios can be excluded at the level of first principles, by postulating the following

Axiom 4 (No Entropy Sinks) Random reversible dynamics cannot achieve erasure, even with the assistance of a catalyst.

In addition to being a requirement for a sensible resource theory of entanglement, Axiom 4 has a surprising twist: in the context of the other axioms, it implies that Local Exchangeability is equivalent to the existence of symmetric purifications, defined as follows

Definition 10 Let $\rho$ be a state of system $A$ and let $\Psi$ be a pure state of $A\otimes A$. We say that $\Psi$ is a symmetric purification of $\rho$ if

$$\mathrm{Tr}_{A_2}\Psi = \rho \qquad \text{and} \qquad \mathrm{Tr}_{A_1}\Psi = \rho ,$$

where $A_1$ and $A_2$ denote the first and second copy of system $A$, respectively.

This definition leads us to an upgraded version of the Purification axiom:

Axiom 5 (Symmetric Purification) Every state has a symmetric purification. Every purification is essentially unique.

The key result is then given by the following:

Theorem 3 In a causal theory satisfying Purity Preservation and No Entropy Sinks, the following axioms are equivalent:
1. Local Exchangeability and Purification
2. Symmetric Purification.

The proof is presented in appendix F. This result identifies Purity Preservation and Symmetric Purification as the key axioms at the foundation of the entanglement-thermodynamics duality and, ultimately, as strong candidates for a reconstruction of quantum thermodynamics from first principles.
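In quantum theory a concrete symmetric purification can be written down explicitly. For $\Psi = \sum_{ij} M_{ij}\,|i\rangle|j\rangle$ with a real coefficient matrix $M$, the two marginals are $MM^T$ and $M^TM$, so choosing $M = \sqrt{\rho}$ (a symmetric matrix) makes both marginals equal to $\rho$. The sketch below is restricted, as an assumption, to real $2\times 2$ density matrices, i.e. to the real-Hilbert-space variant of quantum theory, and uses the closed-form square root $\sqrt{\rho} = (\rho + sI)/t$ with $s = \sqrt{\det\rho}$ and $t = \sqrt{\mathrm{Tr}\,\rho + 2s}$, valid for nonzero positive semidefinite $2\times 2$ matrices. Helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sqrt_psd_2x2(rho):
    """Square root of a nonzero 2x2 real symmetric positive semidefinite matrix:
    sqrt(rho) = (rho + s*I) / t, with s = sqrt(det rho), t = sqrt(tr rho + 2*s)."""
    (a, b), (c, d) = rho
    s = math.sqrt(max(a * d - b * c, 0.0))  # clip tiny negative rounding errors
    t = math.sqrt(a + d + 2 * s)
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]


# A real qubit density matrix (symmetric, unit trace, positive definite).
rho = [[0.7, 0.2], [0.2, 0.3]]

# Coefficient matrix of Psi = sum_ij M[i][j] |i>|j> on A (x) A.
M = sqrt_psd_2x2(rho)
marginal_first = matmul(M, transpose(M))   # trace over the second copy: M M^T
marginal_second = matmul(transpose(M), M)  # trace over the first copy:  M^T M
print(marginal_first)
print(marginal_second)
```

Both printed matrices reproduce $\rho$ up to rounding, and $\sum_{ij} M_{ij}^2 = \mathrm{Tr}\,\rho = 1$ confirms that $\Psi$ is normalized.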
We stress that the axiom No Entropy Sinks is needed only for infinite-dimensional systems, while for finite-dimensional systems its validity can be proved (cf. proposition 10). One of the bonuses of Symmetric Purification is that the marginals of a pure state are equivalent, in the following sense:

Proposition 11 Let $\Psi$ be a pure state of system $A\otimes B$ and let $\rho_A$ and $\rho_B$ be its marginals on systems $A$ and $B$, respectively. Then, one has

$$\rho_B\otimes\alpha = \mathcal{U}\,(\rho_A\otimes\beta) ,$$

where $\alpha$ and $\beta$ are pure states of $A$ and $B$, respectively, and $\mathcal{U} : A\otimes B \to B\otimes A$ is a reversible transformation.

The proof can be found in appendix G. As a consequence of this result, the states $\rho_A\otimes\beta$ and $\alpha\otimes\rho_B$ have the same purity, for every possible purity monotone. Equivalently, we can say that the two marginal states have the same mixedness, for every measure of mixedness.

IX. CONCLUSIONS

While entanglement is not a uniquely quantum feature, the remarkable ways in which it is intertwined with thermodynamics appear to be far more specific. Understanding these links at the level of basic principles is expected to reveal new clues to the foundations of quantum theory, as well as to the foundations of thermodynamics. With this motivation in mind, we set out to search for the roots of the relation between entanglement and entropy, adopting an operational, theory-independent approach. We attacked the problem from what is arguably the most primitive link: the duality between the resource theory of entanglement (where the free transformations are those achievable by two spatially separated agents via local operations and classical communication) and the resource theory of purity (where the free transformations are those achievable by an agent who has limited control over the dynamics of the system). By the duality, every free operation in the resource theory of purity admits an equivalent description as a free operation in the resource theory of entanglement.
The duality leads to an identication between measures of mixedness (i.e. lack of purity) and measures of pure bipartite entanglement. Under suitable conditions, the latter can be extended to measures of mixed-state entanglement. Let us elaborate on the implications of our results. Our reconstruction of the entanglement-thermodynamics duality hints at a simple, physically motivated idea: the idea that nature should admit a fundamental level of description where all states are pure, all dynamics reversible, and all measurements pure. Two of our axioms clearly express this requirement: i) Purication is equivalent to the existence of a pure and reversible level of description for states and channels, and ii) Purity Preservation ensures that such a description remains consistent when dierent, possibly non-deterministic processes are connected. The remaining axiom, Local Exchangeability, appeared to be slightly more mysterious at rst sight. Nevertheless, the duality claried its signicance: for every nite system (and more generally, for every system where all states have nite entanglement), Local Exchangeability is equivalent to the existence of symmetric puricationsthat is, purications where the purifying system is a twin of the puried system. In summary, all the axioms used to derive the duality are requirements about the possibility to come up with an ideal description of the world, satisfying simple requirements of purity, reversibility, and symmetry. A natural question is whether these axioms single out quantum theory. Strictly speaking, the answer cannot be armative, because all our axioms are satised by also by the variant of quantum theory based on real Hilbert spaces [86, 87]. Hence, the actual question is whether real and complex quantum theory are the only two examples of theories satisfying the axioms. While an armative answer is logically possible, we do not expect it to be the case. 
The reason is that our axioms do not place any restriction on measurements: for example, our proof of the duality does not require one to assume an operational analogue of Naimark's theorem, stating that every measurement can be implemented as an ideal measurement at the fundamental level. Overall, in the general purification philosophy of our work, it is natural to expect that a full characterization of quantum theory will require at least one requirement about the existence of a class of ideal measurements that generalize projective quantum measurements. Naimark-type axioms for measurements have recently been put forward by one of the authors [101, 102], for the purpose of deriving bounds on quantum nonlocality and contextuality. A natural development of our work is to investigate the consequences of these axioms on the entanglement-thermodynamics duality. From such a development, we expect a solution to most of the outstanding questions arising from the present paper. Among them, an important one concerns the asymptotic limit of many identical copies: in quantum theory, it is well known that asymptotically there exists a unique measure of pure bipartite entanglement [36, 37, 42], namely, the von Neumann entropy. Under which conditions does this result hold in the general probabilistic scenario? In order to address the question, the most promising route is to add an axiom about ideal measurements, which, combined with Purification and Purity Preservation, guarantees that mixed states can be diagonalized, that is, decomposed as random mixtures of perfectly distinguishable pure states [103]. The consequences of this diagonalization result for the entanglement-thermodynamics duality will be discussed in a forthcoming paper [104]. Another open question concerns the physical interpretation of the duality.
So far, the duality has been presented as a one-to-one correspondence between two operational scenarios, one involving a single agent with limited control and the other involving two spatially separated agents performing LOCC operations on a pure state. Inspired by the paradigm of the fundamentally pure and reversible description, one may be tempted to regard the pure-state side of the duality as more fundamental. To push this idea further, one would have to consider a completely-coherent version of the LOCC operations, where Alice's and Bob's operations are replaced by control-reversible channels [105]. Restricting the global dynamics of composite systems to these completely-coherent evolutions appears to be a promising direction in the programme of deriving effective thermodynamic features from the reversible dynamics of a composite system [69–75, 77]. While it is early to predict all the applications of the completely-coherent paradigm, our work provides the basic theoretical framework and motivation to embark on this new exploration.

ACKNOWLEDGMENTS

We are grateful to M. Piani for a useful discussion during QIP 2015 and to M. Müller for drawing our attention to the notion of group majorization in Ref. [64]. We are also grateful to the anonymous referee of NJP who suggested adding a discussion on the cost of erasure. This work is supported by the Foundational Questions Institute through the large grant "The fundamental principles of information dynamics" (FQXi-RFP3-1325), by the National Natural Science Foundation of China through Grants 11450110096, 11350110207, and by the 1000 Youth Fellowship Program of China. GC acknowledges the hospitality of the Simons Center for the Theory of Computation and of Perimeter Institute, where part of this work was done. Research at Perimeter Institute for Theoretical Physics is supported in part by the Government of Canada through NSERC and by the Province of Ontario through MRI.
The research by CMS has been supported by a scholarship from Fondazione Ing. Aldo Gini and by the Chinese Government Scholarship.

Appendix A: Proof of proposition 2

Suppose that $\mathcal{U}\psi$ can be written as a coarse-graining as follows

$$\mathcal{U}\psi = \sum_i \rho_i . \qquad (A1)$$

To prove that the state $\mathcal{U}\psi$ is pure, we now show that the refinement $\{\rho_i\}$ is trivial. Indeed, by applying $\mathcal{U}^{-1}$ to both sides of Eq. (A1), we obtain $\psi = \sum_i \mathcal{U}^{-1}\rho_i$. Since $\psi$ is pure, this implies that $\mathcal{U}^{-1}\rho_i = p_i\,\psi$ for some probability distribution $\{p_i\}$. Hence, by applying $\mathcal{U}$ to both sides, we obtain $\rho_i = p_i\,\mathcal{U}\psi$. This concludes the proof that $\{\rho_i\}$ is a trivial refinement of $\mathcal{U}\psi$ and, therefore, that $\mathcal{U}\psi$ is pure. The converse can be proved in the same way by applying the reversible channel $\mathcal{U}^{-1}$ to $\mathcal{U}\psi$.

Appendix B: Proof of proposition 4

$1 \Longrightarrow 2$. If the theory is canonical, every pure state $\psi \in \mathrm{PurSt}(A)$ is comparable to every pure state $\varphi \in \mathrm{PurSt}(A)$. Suppose, for instance, that $\psi$ is more controllable than $\varphi$. Then, by proposition 3, there exists a reversible channel $\mathcal{U}$ such that $\psi = \mathcal{U}\varphi$, thus showing that the group of reversible transformations acts transitively on the set of pure states.

$2 \Longrightarrow 3$. Every state $\rho$ can be expressed as a convex combination of the form $\rho = \sum_i p_i\varphi_i$, where $\{p_i\}$ is a probability distribution allowed by the theory and the $\varphi_i$ are pure states. Now, suppose that $\psi$ is a pure state. For every $i$, by picking a reversible channel $\mathcal{U}^{(i)}$ such that $\mathcal{U}^{(i)}\psi = \varphi_i$, one obtains the relation $\rho = \sum_i p_i\,\mathcal{U}^{(i)}\psi$, meaning that $\psi$ is more controllable than $\rho$. Since $\rho$ is generic, we conclude that $\psi$ is more controllable than every state.

$3 \Longrightarrow 1$. Suppose there exists a state $\rho$ that is more controllable than every state. Specifically, $\rho$ must be more controllable than every pure state $\psi$. By proposition 3, $\rho$ must be pure and there exists a reversible transformation $\mathcal{U}$ such that $\rho = \mathcal{U}\psi$. This shows that $\psi$ is more controllable than $\rho$, which, in turn, is more controllable than any state.
Hence $\psi$ is more controllable than every state and, specifically, more controllable than every pure state. Since $\psi$ is generic, the theory is canonical.

Appendix C: Proof of corollary 1

Clearly, if $\Psi$ and $\Psi'$ are equivalent under local reversible transformations, then they are equally entangled. To prove the converse, note that, by the duality, the marginals of $\Psi$ and $\Psi'$ on system $A$, denoted by $\rho$ and $\rho'$, are equally mixed. Since $A$ is finite-dimensional, this implies $\rho' = \mathcal{U}\rho$ for some reversible transformation $\mathcal{U}$. As a consequence, $\Psi'$ and $(\mathcal{U}\otimes\mathcal{I}_B)\Psi$ are two purifications of $\rho'$. By the essential uniqueness of purification, we then have $\Psi' = (\mathcal{U}\otimes\mathcal{V})\Psi$ for some reversible transformation $\mathcal{V}$.

Appendix D: Purity monotones and Schur-convex functions

In classical probability theory, states are probability distributions over finite sets and reversible transformations are permutation matrices, of the form $\Pi_{mn} = \delta_{m,\pi(n)}$, where $\pi$ is a permutation. According to definition 5, a state $p = (p_1, \dots, p_n)^T$ is purer than another state $p' = (p'_1, \dots, p'_n)^T$ if

$$p' = \sum_i q_i \Pi_i p ,$$

where $\{q_i\}$ are probabilities and $\{\Pi_i\}$ are permutation matrices. This notion is equivalent to the classical notion of majorization: in short, $p$ is purer than $p'$ if and only if the vector $p'$ is majorized by the vector $p$. Hence, a function $P : \mathbb{R}^n \to \mathbb{R}$ is a purity monotone if and only if it is a Schur-convex function.

The parallel between purity monotones and Schur-convex functions continues with proposition 9. In classical probability theory, a function $P : \mathbb{R}^n \to \mathbb{R}$ is symmetric if $P(x) = P(\Pi x)$ for every permutation matrix $\Pi$. A well-known result is that every convex symmetric function is Schur-convex [65]. Our proposition 9 is the operational version of this statement: every convex function $P : \mathrm{St}(A) \to \mathbb{R}$ satisfying the condition $P(\rho) = P(\mathcal{U}\rho)$ for every reversible transformation $\mathcal{U}$ is a purity monotone. The proof is elementary. Suppose that $\rho$ is purer than $\rho'$, namely $\rho' = \sum_i p_i\,\mathcal{U}_i\rho$.
Then, one has

$$P(\rho') \le \sum_i p_i P(\mathcal{U}_i\rho) = \sum_i p_i P(\rho) = P(\rho) ,$$

having used convexity in the inequality and invariance in the first equality. In the classical case, this (trivial) proof provides a simpler proof of the well-known result for convex symmetric functions (cf. C.2 of Ref. [65]).

Appendix E: Proof of proposition 10

Let us prove the contrapositive: if a state can be erased using system $C$ as a catalyst, then the state must be pure. Specifically, suppose that $\rho \in \mathrm{St}(A)$ can be erased, with the catalyst in the state $\gamma \in \mathrm{St}(C)$. By definition, this means that $\rho\otimes\gamma \preceq_{\rm mix} \alpha_0\otimes\gamma$ for some pure state $\alpha_0$. On the other hand, one has $\rho \succeq_{\rm mix} \alpha_0$, which implies $\rho\otimes\gamma \succeq_{\rm mix} \alpha_0\otimes\gamma$; hence $\rho\otimes\gamma$ and $\alpha_0\otimes\gamma$ are equally mixed. Since $A\otimes C$ is a finite system, this means that there exists a reversible transformation $\mathcal{U}$ such that

$$\mathcal{U}(\alpha_0\otimes\gamma) = \rho\otimes\gamma , \qquad (E1)$$

[cf. Eq. (9)]. Now, let us choose a basis for $\mathrm{St}_{\mathbb{R}}(A\otimes C)$ such that the reversible transformations are represented by orthogonal matrices. Following Ref. [64], we consider the Schatten 2-norm associated with this basis, defined as

$$\|v\|_2 := \sqrt{\sum_{i=1}^{D_{A\otimes C}} v_i^2} ,$$

where $v$ is a generic element of the vector space $\mathrm{St}_{\mathbb{R}}(A\otimes C)$ and $(v_i)_{i=1}^{D_{A\otimes C}}$ are the expansion coefficients of $v$. Choosing a convex decomposition of $\rho$ as $\rho = \sum_i p_i\alpha_i$ for suitable pure states $\{\alpha_i\}$, we have the relation

$$\|\alpha_0\otimes\gamma\|_2 = \|\mathcal{U}(\alpha_0\otimes\gamma)\|_2 = \|\rho\otimes\gamma\|_2 \le \sum_i p_i\|\alpha_i\otimes\gamma\|_2 = \|\alpha_0\otimes\gamma\|_2 .$$

The first and last equalities follow from the invariance of the 2-norm under orthogonal transformations (for the last one, using also that each pure state $\alpha_i$ is obtained from $\alpha_0$ by a reversible transformation), the second equality follows from Eq. (E1), and the inequality follows from the triangle inequality. In conclusion, we must have the equality

$$\Big\|\sum_i p_i(\alpha_i\otimes\gamma)\Big\|_2 = \sum_i p_i\|\alpha_i\otimes\gamma\|_2 .$$

In order for this to be possible, all the terms $\alpha_i\otimes\gamma$ must be proportional to one another: in other words, $\rho$ must be pure.
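The classical facts behind Appendices D and E can be checked numerically. The sketch below builds a classical RaRe channel as a mixture of permutation matrices $\Pi_{mn} = \delta_{m,\pi(n)}$, and verifies that the output is majorized by the input and that the 2-norm (convex and permutation-invariant, the classical analogue of the norm used in Appendix E) strictly decreases when the permuted copies of $p$ are not all equal. The example distribution and weights are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import itertools
import math


def permute(perm, p):
    """Apply the permutation matrix Pi with Pi[m][n] = delta(m, perm[n])."""
    out = [0.0] * len(p)
    for n, m in enumerate(perm):
        out[m] = p[n]
    return out


def rare_channel(weights, perms, p):
    """Classical random reversible channel: p' = sum_i q_i Pi_i p."""
    out = [0.0] * len(p)
    for q, perm in zip(weights, perms):
        for m, value in enumerate(permute(perm, p)):
            out[m] += q * value
    return out


def is_majorized(x, y, tol=1e-12):
    """True if x ≺ y, for two probability vectors of equal length."""
    sx = sy = 0.0
    for a, b in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        sx += a
        sy += b
        if sx > sy + tol:
            return False
    return True


def norm2(v):
    """Euclidean 2-norm: convex and invariant under permutations."""
    return math.sqrt(sum(x * x for x in v))


p = [0.6, 0.3, 0.1]
perms = list(itertools.permutations(range(3)))  # all 6 permutations of 3 outcomes
weights = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
p_prime = rare_channel(weights, perms, p)

print(p_prime, is_majorized(p_prime, p), norm2(p_prime) < norm2(p))
```

Mixing all six permutations with equal weights sends any distribution to the uniform one, the classical maximally mixed state.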
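Appendix F: Proof of theorem 3

Let us show that the first set of axioms (Purity Preservation, Local Exchangeability, Purification, and No Entropy Sinks) implies the second (Purity Preservation, Symmetric Purification, No Entropy Sinks). To this purpose, it is sufficient to show that every state has a symmetric purification. This can be done as follows. Let $\rho$ be a state of system $A$, and let $\Psi \in \mathrm{PurSt}(A\otimes B)$ be one of its purifications. By Local Exchangeability there exist two channels $\mathcal{C} : A \to B$ and $\mathcal{D} : B \to A$ such that

$$(\mathcal{C}\otimes\mathcal{D})\,\Psi = \mathrm{SWAP}\,\Psi .$$

Now, in a theory satisfying Purification, every channel can be realized through a reversible transformation acting on the system and on an environment, initially in a pure state and finally discarded [47]. Specifically, channel $\mathcal{C}$ can be realized as

$$\mathcal{C}(\cdot) = \mathrm{Tr}_{E'}\big[\,\mathcal{U}(\cdot\otimes\eta)\,\big] ,$$

where $E$ and $E'$ are suitable systems, $\mathcal{U} : A\otimes E \to B\otimes E'$ is a reversible transformation, and $\eta \in \mathrm{PurSt}(E)$ is a pure state. Similarly, channel $\mathcal{D}$ can be realized as

$$\mathcal{D}(\cdot) = \mathrm{Tr}_{F'}\big[\,\mathcal{V}(\cdot\otimes\varphi)\,\big] , \qquad (F1)$$

with $\mathcal{V} : B\otimes F \to A\otimes F'$ reversible and $\varphi \in \mathrm{PurSt}(F)$. Inserting the realizations of $\mathcal{C}$ and $\mathcal{D}$ in the local exchangeability condition, we obtain

$$\mathrm{Tr}_{E'F'}\big[(\mathcal{U}\otimes\mathcal{V})(\eta\otimes\Psi\otimes\varphi)\big] = \mathrm{SWAP}\,\Psi .$$

Since the pure state $(\mathcal{U}\otimes\mathcal{V})(\eta\otimes\Psi\otimes\varphi)$ on the l.h.s. is a purification of a pure state, by proposition 7 it must be of the product form

$$(\mathcal{U}\otimes\mathcal{V})(\eta\otimes\Psi\otimes\varphi) = \Gamma_{E'F'}\otimes\mathrm{SWAP}\,\Psi$$

for some pure state $\Gamma$ of $E'\otimes F'$. The above equation shows that the state $\Gamma$ can be generated by LOCC using $\Psi$ as a catalyst. By the No Entropy Sinks requirement, $\Gamma$ must be a product state, i.e. $\Gamma = \eta'\otimes\varphi'$ for two pure states $\eta'$ and $\varphi'$. Hence, the local exchangeability condition becomes

$$(\mathcal{U}\otimes\mathcal{V})(\eta\otimes\Psi\otimes\varphi) = \eta'\otimes\mathrm{SWAP}\,\Psi\otimes\varphi'$$

or, equivalently, applying $\mathcal{U}^{-1}$ to both sides,

$$(\mathcal{I}_{EA}\otimes\mathcal{V})(\eta\otimes\Psi\otimes\varphi) = (\mathcal{U}^{-1}\otimes\mathcal{I}_{AF'})(\eta'\otimes\mathrm{SWAP}\,\Psi\otimes\varphi') ,$$

where $\mathcal{U}^{-1}$ acts on $E'$ and on the output system $B$ of the swap. Discarding system $E$ one obtains

$$(\mathcal{I}_A\otimes\mathcal{V})(\Psi\otimes\varphi) = \Sigma_{AA}\otimes\varphi'_{F'}$$

for some suitable state $\Sigma$ of $A\otimes A$. Since the l.h.s. is a pure state, $\Sigma$ must be a pure state. Now, discarding system $F'$ and the second copy of system $A$, and recalling Eq. (F1), we have

$$\mathrm{Tr}_{A_2}\big[(\mathcal{I}_A\otimes\mathcal{D})\,\Psi\big] = \mathrm{Tr}_{A_2}\Sigma .$$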
Recalling that $\mathcal{D}$ is a channel, and therefore $\mathrm{Tr}_A\mathcal{D} = \mathrm{Tr}_B$ (proposition 1), we conclude that

$$\rho = \mathrm{Tr}_B\Psi = \mathrm{Tr}_{A_2}\Sigma .$$

Hence, the marginal of $\Sigma$ on the first copy of system $A$ is equal to $\rho$. By the same reasoning, we can prove that the marginal on the second copy of system $A$ is also equal to $\rho$. Hence, $\Sigma$ is a symmetric purification of $\rho$. Since $\rho$ is arbitrary, we conclude that every state has a symmetric purification, unique up to local reversible transformations.

Conversely, we now show that the second set of axioms implies the first. To this purpose, we must show the validity of Local Exchangeability. Clearly, symmetric purifications are locally exchangeable: indeed, if $\Psi$ is a symmetric purification of $\rho$, one has

$$\mathrm{Tr}_{A_2}\big[\mathrm{SWAP}\,\Psi\big] = \rho = \mathrm{Tr}_{A_2}\Psi$$

and, by the essential uniqueness of purification,

$$\mathrm{SWAP}\,\Psi = (\mathcal{I}_A\otimes\mathcal{U})\,\Psi$$

for some reversible channel $\mathcal{U}$. Since all purifications of $\rho$ are equivalent to $\Psi$ under local operations and since $\Psi$ is locally exchangeable, we conclude that every purification of $\rho$ is locally exchangeable [by the same argument used in Eq. (4)]. This proves Local Exchangeability.

Appendix G: Proof of proposition 11

Let $\Phi \in \mathrm{PurSt}(A\otimes A)$ be a symmetric purification of $\rho_A$ and let $\alpha$ ($\beta$) be a fixed, but otherwise arbitrary, pure state of $A$ ($B$). By the uniqueness of purification, there must exist a reversible transformation $\mathcal{U} : A\otimes B \to B\otimes A$ such that

$$\Psi_{AB}\otimes\alpha_A = (\mathcal{I}_A\otimes\mathcal{U})(\Phi_{AA}\otimes\beta_B) .$$

Discarding the first copy of system $A$ and using the fact that $\Phi$ is a symmetric purification, we obtain the desired result

$$\rho_B\otimes\alpha = \mathcal{U}(\rho_A\otimes\beta) .$$

[1] A. Einstein, B. Podolsky, and N. Rosen, Phys. Rev. 47, 777 (1935).
[2] E. Schrödinger, Mathematical Proceedings of the Cambridge Philosophical Society 31, 555 (1935).
[3] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics: Collected Papers on Quantum Philosophy, 2nd ed. (Cambridge University Press, Cambridge, 2004).
[4] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Phys. Rev. Lett. 23, 880 (1969).
[5] H. Buhrman, R. Cleve, S. Massar, and R.
de Wolf, Rev. Mod. Phys. 82, 665 (2010).
[6] N. Brunner, D. Cavalcanti, S. Pironio, V. Scarani, and S. Wehner, Rev. Mod. Phys. 86, 419 (2014).
[7] D. M. Greenberger, M. A. Horne, and A. Zeilinger, in Bell's Theorem, Quantum Theory, and Conceptions of the Universe, edited by M. Kafatos (Kluwer, Dordrecht, 1989) pp. 69–72.
[8] N. D. Mermin, Phys. Rev. Lett. 65, 3373 (1990).
[9] A. Peres, Physics Letters A 151, 107 (1990).
[10] L. Hardy, Phys. Rev. Lett. 68, 2981 (1992).
[11] L. Hardy, Phys. Rev. Lett. 71, 1665 (1993).
[12] S. Popescu, Nat. Phys. 6, 151 (2010).
[13] S. Abramsky and L. Hardy, Phys. Rev. A 85, 062114 (2012).
[14] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki, Rev. Mod. Phys. 81, 865 (2009).
[15] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, Phys. Rev. Lett. 70, 1895 (1993).
[16] C. H. Bennett and S. J. Wiesner, Phys. Rev. Lett. 69, 2881 (1992).
[17] C. H. Bennett and G. Brassard, in Proceedings of the IEEE International Conference on Computers, Systems, and Signal Processing (1984) pp. 175–179.
[18] A. K. Ekert, Phys. Rev. Lett. 67, 661 (1991).
[19] D. Mayers and A. Yao, Quantum Info. Comput. 4, 273 (2004).
[20] B. W. Reichardt, F. Unger, and U. Vazirani, arXiv:1209.0448 (2012).
[21] C. H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J. A. Smolin, and W. K. Wootters, Phys. Rev. Lett. 76, 722 (1996).
[22] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Wootters, Phys. Rev. A 54, 3824 (1996).
[23] H.-K. Lo and S. Popescu, Phys. Rev. A 63, 022301 (2001).
[24] A. Peres, Phys. Rev. Lett. 77, 1413 (1996).
[25] M. Horodecki, P. Horodecki, and R. Horodecki, Physics Letters A 223, 1 (1996).
[26] P. Horodecki, Physics Letters A 232, 333 (1997).
[27] M. Horodecki, P. Horodecki, and R. Horodecki, Phys. Rev. Lett. 80, 5239 (1998).
[28] C. H. Bennett, D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin, and B. M. Terhal, Phys. Rev. Lett. 82, 5385 (1999).
[29] W. Dür, G. Vidal, and J. I. Cirac, Phys. Rev. A 62, 062314 (2000).
[30] J. I. de Vicente, C. Spee, and B. Kraus, Phys. Rev. Lett. 111, 110502 (2013).
[31] M. A. Nielsen, Phys. Rev. Lett. 83, 436 (1999).
[32] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 20, 633 (1971).
[33] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 21, 421 (1972).
[34] A. Uhlmann, Wiss. Z. Karl-Marx-Univ. Leipzig 22, 139 (1973).
[35] I. Bengtsson and K. Zyczkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement (Cambridge University Press, Cambridge, 2006).
[36] S. Popescu and D. Rohrlich, Phys. Rev. A 56, R3319 (1997).
[37] V. Vedral and M. B. Plenio, Phys. Rev. A 57, 1619 (1998).
[38] M. Horodecki, J. Oppenheim, and R. Horodecki, Phys. Rev. Lett. 89, 240403 (2002).
[39] F. G. S. L. Brandão and M. B. Plenio, Nat. Phys. 4, 873 (2008).
[40] M. Horodecki, Nat. Phys. 4, 833 (2008).
[41] W. Thirring, Quantum Mathematical Physics: Atoms, Molecules and Large Systems (Springer, Berlin Heidelberg, 2002).
[42] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Phys. Rev. A 53, 2046 (1996).
[43] L. Hardy, arXiv quant-ph/0101012 (2001).
[44] J. Barrett, Phys. Rev. A 75, 032304 (2007).
[45] G. M. D'Ariano, in Philosophy of Quantum Information and Entanglement, edited by A. Bokulich and G. Jaeger (Cambridge University Press, Cambridge, 2010) pp. 85–126.
[46] H. Barnum, J. Barrett, M. Leifer, and A. Wilce, Phys. Rev. Lett. 99, 240501 (2007).
[47] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Phys. Rev. A 81, 062348 (2010).
[48] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Phys. Rev. A 84, 012311 (2011).
[49] H. Barnum and A. Wilce, Electronic Notes in Theoretical Computer Science 270, 3 (2011), proceedings of the Joint 5th International Workshop on Quantum Physics and Logic and 4th Workshop on Developments in Computational Models (QPL/DCM 2008).
[50] L. Hardy, in Deep Beauty: Understanding the Quantum World through Mathematical Innovation, edited by H. Halvorson (Cambridge University Press, Cambridge, 2011) pp. 409–442.
[51] L. Hardy, arXiv:1104.2066 (2011).
[52] G. Chiribella, in Proceedings 11th workshop on Quantum Physics and Logic, Kyoto, Japan, 4-6th June 2014, Electronic Proceedings in Theoretical Computer Science, Vol. 172, edited by B. Coecke, I. Hasuo, and P. Panangaden (Open Publishing Association, 2014) pp. 1–14.
[53] H. Barnum, J. Barrett, M. Leifer, and A. Wilce, in Proceedings of Symposia in Applied Mathematics, Vol. 71 (2012) pp. 25–48.
[54] W. van Dam, Nonlocality & communication complexity, Ph.D. thesis, Faculty of Physical Sciences, University of Oxford (1999).
[55] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger, Phys. Rev. Lett. 96, 250401 (2006).
[56] N. Linden, S. Popescu, A. J. Short, and A. Winter, Phys. Rev. Lett. 99, 180502 (2007).
[57] M. Pawłowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Żukowski, Nature 461, 1101 (2009).
[58] M. L. Almeida, J.-D. Bancal, N. Brunner, A. Acín, N. Gisin, and S. Pironio, Phys. Rev. Lett. 104, 230404 (2010).
[59] T. Fritz, A. B. Sainz, R. Augusiak, J. B. Brask, R. Chaves, A. Leverrier, and A. Acín, Nat. Commun. 4 (2013).
[60] B. Tsirelson, Hadronic J. Suppl. 8, 329 (1993).
[61] S. Popescu and D. Rohrlich, Foundations of Physics 24, 379 (1994).
[62] J. Barrett, N. Linden, S. Massar, S. Pironio, S. Popescu, and D. Roberts, Phys. Rev. A 71, 022101 (2005).
[63] N. S. Jones and L. Masanes, Phys. Rev. A 72, 052312 (2005).
[64] M. P. Müller and L. Masanes, New Journal of Physics 15, 053040 (2013).
[65] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications, Springer Series in Statistics (Springer, New York, 2011).
[66] M. Horodecki, P. Horodecki, and J. Oppenheim, Phys. Rev. A 67, 062104 (2003).
[67] G. Chiribella, G. M. D'Ariano, and P. Perinotti, Entropy 14, 1877 (2012).
[68] G. Chiribella and C. M. Scandolo, EPJ Web of Conferences 95, 03003 (2015).
[69] S. Popescu, A. J. Short, and A. Winter, Nat. Phys. 2, 754 (2006).
[70] E. Lubkin and T. Lubkin, International Journal of Theoretical Physics 32, 933 (1993).
[71] J. Gemmer, A. Otte, and G. Mahler, Phys. Rev. Lett. 86, 1927 (2001).
[72] S. Goldstein, J. L. Lebowitz, R. Tumulka, and N. Zanghì, Phys. Rev. Lett. 96, 050403 (2006).
[73] J. Gemmer, M. Michel, and G. Mahler, Quantum Thermodynamics: Emergence of Thermodynamic Behavior Within Composite Quantum Systems, Lecture Notes in Physics, Vol. 784 (Springer Verlag, Heidelberg, 2009).
[74] M. P. Müller, D. Gross, and J. Eisert, Communications in Mathematical Physics 303, 785 (2011).
[75] F. G. S. L. Brandão and M. Cramer, arXiv:1502.03263 (2015).
[76] M. P. Müller, O. C. Dahlsten, and V. Vedral, Communications in Mathematical Physics 316, 441 (2012).
[77] M. P. Müller, J. Oppenheim, and O. C. O. Dahlsten, Journal of High Energy Physics 2012, 116 (2012), 10.1007/JHEP09(2012)116.
[78] G. Vidal, Journal of Modern Optics 47, 355 (2000).
[79] M. B. Plenio and S. S. Virmani, in Quantum Information and Coherence, Scottish Graduate Series, edited by E. Andersson and P. Öhberg (Springer International Publishing, 2014) pp. 173–209.
[80] R. Landauer, IBM J. Res. Dev. 5, 183 (1961).
[81] S. Abramsky and B. Coecke, in Handbook of Quantum Logic and Quantum Structures: Quantum Logic, edited by K. Engesser, D. M. Gabbay, and D. Lehmann (Elsevier, 2008) pp. 261–324.
[82] B. Coecke, AIP Conference Proceedings 810, 81 (2006).
[83] B. Coecke, Contemporary Physics 51, 59 (2010).
[84] P. Selinger, in New Structures for Physics, Lecture Notes in Physics, Vol. 813, edited by B. Coecke (Springer, Berlin, Heidelberg, 2011) pp. 289–356.
[85] B. Coecke, T. Fritz, and R. W. Spekkens, arXiv:1409.5531 (2014).
[86] E. C. G. Stueckelberg, Helv. Phys. Acta 33, 727 (1960).
[87] W. K. Wootters, in Complexity, entropy and the physics of information, edited by W. H. Zurek (Addison-Wesley, Boston, 1990) pp. 39–46.
[88] L. Hardy and W. K. Wootters, Foundations of Physics 42, 454 (2012).
[89] A. J. Short and J. Barrett, New Journal of Physics 12, 033034 (2010).
[90] B. Dakić and C. Brukner, in Deep Beauty: Understanding the Quantum World through Mathematical Innovation, edited by H. Halvorson (Cambridge University Press, Cambridge, 2011) pp. 365-392.
[91] L. Masanes and M. P. Müller, New Journal of Physics 13, 063001 (2011).
[92] L. Masanes, M. P. Müller, R. Augusiak, and D. Pérez-García, Proceedings of the National Academy of Sciences 110, 16373 (2013).
[93] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2010).
[94] H. Barnum, C. P. Gaebler, and A. Wilce, Foundations of Physics 43, 1411 (2013).
[95] B. C. Arnold, Statist. Sci. 22, 407 (2007).
[96] H. Barnum, J. Barrett, L. Orloff Clark, M. Leifer, R. Spekkens, N. Stepanik, A. Wilce, and R. Wilke, New Journal of Physics 12, 033024 (2010).
[97] A. J. Short and S. Wehner, New Journal of Physics 12, 033023 (2010).
[98] G. Kimura, K. Nuida, and H. Imai, Reports on Mathematical Physics 66, 175 (2010).
[99] N. Brunner, M. Kaplan, A. Leverrier, and P. Skrzypczyk, New Journal of Physics 16, 123050 (2014).
[100] M. Keyl, T. Matsui, D. Schlingemann, and R. F. Werner, Reviews in Mathematical Physics 18, 935 (2006).
[101] G. Chiribella and X. Yuan, arXiv:1404.3348 (2014).
[102] G. Chiribella and X. Yuan, arXiv:1504.02395 (2015).
[103] G. Chiribella and C. M. Scandolo, arXiv:1506.00380 (2015).
[104] G. Chiribella and C. M. Scandolo, in preparation.
[105] A. Harrow, Phys. Rev. Lett. 92, 097902 (2004).