1 Introduction: Completing the Realist Program

The ontic interpretation of quantum mechanics, in which the wavefunction is an element of reality, is physically appealing: experience with other physical theories shows that real objects and fields can be described mathematically, and it would be odd if quantum mechanics alone were different. A number of paradoxes can be avoided by adopting the epistemic interpretation, in which the wavefunction describes a state of knowledge, but such a dramatic departure from the norm has not been found necessary in other fields of physics.

However, the realist approach must address multiple challenges, including EPR correlations, indeterminacy, and state reduction upon measurement (including as challenges both the departure from unitary evolution and the arrival at a state with a single eigenvalue of the relevant operator). It ought to be possible to complete the realist program by providing an unambiguous mathematical description of the evolution of the wavefunction that deals with these challenges in a reasonable way and applies in all circumstances (that is, with or without measurement). To be specific, there must exist a mathematical relationship among variables and functionals thereof (a “wave equation,” for short) that is valid at all times. It should describe the phenomena described by the wave equation in standard QM, but also the measurement-induced transition from a superposition to a single (collapsed) eigenstate.

We believe that such a project is possible, and will demonstrate below that a plausible framework can be constructed and shown to have many of the required properties. For some such properties we have not yet been successful in constructing a proof, but will outline steps that may lead to one.

We argue that the wave equation must be nonlinear, because it must describe the evolution of a system from a superposition of states into a single one of those states. A linear equation cannot do that, even if the measurement apparatus is explicitly accounted for, because the previous sentence still applies with “system” defined as the union of the system being measured and the apparatus.
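
The linearity argument can be made explicit with the standard idealized measurement model. As an illustrative assumption (these symbols are not part of the formalism developed below), let \(\left| A_{0} \right\rangle\) be the apparatus "ready" state, \(\left| A_{j} \right\rangle\) the pointer state indicating outcome j, and U any linear evolution operator that correlates each eigenstate with its pointer state, \(U \left| \psi _{j} \right\rangle \left| A_{0} \right\rangle = \left| \psi _{j} \right\rangle \left| A_{j} \right\rangle\). Linearity then fixes the action of U on a superposition:

$$\begin{aligned} U \Big ( \sum _{j} C_{j} \left| \psi _{j} \right\rangle \Big ) \left| A_{0} \right\rangle = \sum _{j} C_{j} \left| \psi _{j} \right\rangle \left| A_{j} \right\rangle \end{aligned}$$

The result is still a superposition over every j with \(C_{j} \ne 0\), so no linear evolution of the combined system can select a single term.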

The equation must also be nonlocal in space. We see this because for certain measurements (e.g., the energy of a nonrelativistic bound particle), the condition of being in a single eigenstate (or set of degenerate eigenstates) of the corresponding operator is not a local property; that is, the condition is not determined by the value of the wavefunction, its derivatives, and other relevant parameters (e.g., the Schrödinger potential V(x)) at a single point in space (see Appendix A for details). Of course, relativistic frame invariance requires that the wave equation be nonlocal in time as well.

At this point it is apparent that we need more than a new interpretation; since we require a nonlinear, nonlocal wave equation, the formalism of QM must be modified. In fact, the nonlocality requirement cannot be met by any differential equation; the most natural nonlocal form would be an integral or integrodifferential equation (IDE).

We will also require a “realistic” theory to be time-symmetric. Again, the best argument for this is that other fundamental physical theories are symmetric in time. [Footnote 1] Even in conventional QM, the unitary evolution of the wavefunction is time-symmetric; the part that is not is the measurement-induced collapse, that is, the process that is least understood. Symmetry in time implies retrocausality: roughly speaking, the idea that effects may precede their causes in time. (To be more precise, in a retrocausal theory the solution at t is found as a function of, inter alia, variables at \(t'>t\).)

Multiple retrocausal approaches have been explored by various researchers, but the tack we will take is that Nature finds a stationary point of the action

$$\begin{aligned} S \equiv \int _{t_{i}}^{t_{f}}{\mathrm {d}}t \, L \equiv \int {\mathrm {d}}^{4} x \, \mathcal {L} \end{aligned}$$
(1)

where L is the Lagrangian and \(\mathcal {L}\) the Lagrangian density, subject to initial and final conditions at \(t_{i}\) and \(t_{f}\) (and of course spatial boundary conditions as well). The solution corresponding to that stationary point describes the system at times and positions within the enclosed region of spacetime. [Footnote 2] This approach, called the Lagrangian Schema by Wharton [2] (see also [3]), is immediately compatible with Hamilton’s principle, which is the variational approach used by Schwinger [1] to construct quantum field theory. In an ideal measurement, we expect that the experimental preparation provides the initial conditions at \(t_{i}\), and the experimental features that allow the result to be read off provide the final conditions at \(t_{f}\). The final conditions will be less stringent than the initial conditions, and may not be constraining at all, since the system is normally prepared in a single specified state but may end up in any one of multiple states. Accordingly, our variational analysis will employ a “natural boundary condition” (NBC) at \(t_{f}\). [4] Note that this asymmetry in treatment of initial and final times is due to the experimental design (prepare the experiment at \(t_{i}\) and read the result at \(t_{f}\)), not to any time asymmetry in the theory.
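
As a reminder of how a natural boundary condition arises, consider a generic Lagrangian \(L(q,{\dot{q}},t)\) for a single real variable q. This is a textbook illustration under assumed notation, not the specific Lagrangian used below. With q held fixed at \(t_{i}\) but left free at \(t_{f}\), the first variation of the action is

$$\begin{aligned} \delta S = \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \left( \frac{\partial L}{\partial q} - \frac{{\mathrm {d}}}{{\mathrm {d}}t} \frac{\partial L}{\partial {\dot{q}}} \right) \delta q + \left. \frac{\partial L}{\partial {\dot{q}}} \, \delta q \right| _{t_{f}} \end{aligned}$$

Because \(\delta q(t_{f})\) is arbitrary, stationarity requires the Euler equation in the interior and, in addition, \(\partial L / \partial {\dot{q}} = 0\) at \(t_{f}\). The second requirement is the NBC: it is not imposed by hand but follows from leaving the final value unconstrained.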

Our exposition will be nonrelativistic, and we will find it useful to limit our attention to a single reference frame and in particular to dependence of variables on time in that frame. Nevertheless, the variational principle is inherently compatible with special relativity [1], and we expect that it can readily be expressed in a relativistically covariant formulation. Relativistic Lagrangians routinely appear in quantum field theory, [5] and the four-dimensional integration of the Lagrangian density to produce the action is an operation invariant under change of reference frames.

Motivated yet again by a desire to have QM behave as other physical theories do, we propose that genuinely random variables are not root causes of physical phenomena, but underlying physical mechanisms may depend on uncontrolled or poorly-understood parameters such that a variety of outcomes are possible from experiments that appear to be identically prepared. The empirical fact that different outcomes may result from apparently identically-prepared repetitions of the same measurement proves that the boundary conditions underconstrain the problem. At \(t_{i}\), the system is prepared in a given quantum state or superposition of states, but that description falls short of a specification of every possible variable, as it must by quantum complementarity [3, 6]. Possibly the full specification of the initial (ontological) state consists of the given quantum state, plus additional parameters unknown to or uncontrolled by the experimenter. Similarly, the measurement of a quantum state at \(t_{f}\) does not determine the ontological state at that moment; in fact, the measurement readout at \(t_{f}\) is a weaker constraint than the preparation at \(t_{i}\), because it determines only the variable (operator) measured but not its value (eigenvalue). This indeterminacy provides the opportunity for uncontrolled variables to participate in determining the result of the measurement. It ought to be possible to identify such parameters and explain indeterminacy and Born’s rule in terms of their variation.

It may be argued that such uncontrolled parameters constitute “hidden variables” disallowed by Bell’s Theorem. In fact, Bell’s Theorem does not apply because the theory will be nonlocal, and also because the proof of the theorem relies on assumptions contradicted by retrocausality [7].

Ultimately, the frequencies of the different outcomes possible from a single experimental definition must reflect the distribution of uncontrolled parameter values in a large number of realizations of the experiment. The observed fact that those frequencies may be described by a simple law (Born’s rule) presumably reflects a likelihood of an approximately universal distribution of the parameter values in experiments that are likely to be conducted. For instance, suppose the experimental result depends on a high-frequency sinusoidal function of some experimental time. If in an ensemble of experimental realizations that time is naturally distributed over a range large compared to the period of oscillation, it is an excellent approximation to say that that time has a uniform distribution over a single period. In this way, it is reasonable to expect that naturally-occurring ensembles of experiments may be found reliably to give outcome frequencies satisfying Born’s rule.
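
The sinusoidal example can be illustrated numerically. The following is a minimal sketch, assuming (purely for illustration) that the uncontrolled time is Gaussian-distributed with a width of 50 oscillation periods; the function name, the Gaussian choice, and all numerical values are hypothetical and not part of the theory. It bins the phase of each sampled time within one period and reports how far the bin fractions depart from a uniform distribution.

```python
import random

def phase_fractions(period, mean, spread, n_samples, n_bins, seed=0):
    """Fraction of samples whose phase (t mod period) falls in each of n_bins equal bins.

    The times t are drawn from a Gaussian of width `spread`; this distribution
    is an illustrative assumption, not something specified in the text.
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_samples):
        t = rng.gauss(mean, spread)
        phase = (t % period) / period            # position within one period
        index = min(int(phase * n_bins), n_bins - 1)
        counts[index] += 1
    return [c / n_samples for c in counts]

# The spread (50 periods) is large compared to the period, as in the text.
fractions = phase_fractions(period=1.0, mean=123.4, spread=50.0,
                            n_samples=200_000, n_bins=10)
max_deviation = max(abs(f - 1.0 / len(fractions)) for f in fractions)
print(fractions)
print("largest deviation from uniform:", max_deviation)
```

With a broad distribution the bin fractions differ from 1/10 only by sampling noise; narrowing `spread` to a fraction of the period makes the phase distribution visibly non-uniform.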

Our theoretical development will be based on the variational approach used by Schwinger to construct quantum field theory. [Footnote 3] In this approach, the requirement of stationary action leads to a wave equation that is linear and local. We observe that if S is stationary then \(S^{2}\) is also; it would not be wrong [Footnote 4] to write the variational principle in terms of \(S^{2}\). The squared expression is naturally written as a double integral over some time interval of an integrand involving two instances of the wavefunction and two of the Lagrangian operator. This provides us an opportunity to introduce both nonlinearity and nonlocality by modifying which operator operates on which wavefunction.

This is promising by the following reasoning. In its simplest form, a variational principle leads to a differential equation, the Euler equation [4]. In order to describe a nonlocal process, we desire an integrodifferential equation (IDE). An IDE can be found as a stationary point of a more complex functional—a double rather than a single integral in time—but only if that functional does not factor into a product of single integrals. (See Appendix B for mathematical details.) For this reason we cannot produce the nonlocal (or, for that matter, the nonlinear) behavior we seek by simply replacing S by \(S^{2}\) in the variational principle, but we can if we modify the \(S^{2}\) expression so that it does not factor. (For clarity, we will henceforth refer to the resulting functional as the “superaction.”) Since we are trying to describe the measurement process, the modification is naturally limited to those spacetime locations where the measured system and the apparatus interact. Consequently, in any region of spacetime that does not include such interactions, the superaction factors to \(S^{2}\), and the wave equation derivable from the original action S applies. Our approach thus will agree with standard QM in the absence of a measurement.
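
The factorization argument can be checked in one line. Writing the squared action as a double integral (with \(t'\) a second integration variable),

$$\begin{aligned} S^{2} = \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \int _{t_{i}}^{t_{f}} {\mathrm {d}}t' \, L(t) \, L(t') , \qquad \delta (S^{2}) = 2 S \, \delta S \end{aligned}$$

so the variation of \(S^{2}\) is just the variation of S multiplied by the number 2S. Its Euler equation is therefore the original local one, and any nonlocal behavior has to come from an integrand that does not split into a function of t times a function of \(t'\).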

Another characteristic we require for a realistic theory is that we must not treat a measurement differently from other physical processes, or a measurement apparatus differently from other systems, for the simple reason that we do not know how to convincingly justify such distinctions. We will define a measurement apparatus simply as another system that interacts with the system being measured in such a way that their states will tend to agree. When a measurement is being performed, there must be an interaction between the two subsystems; we will adopt a Lagrangian term quadratic in the discrepancy between the eigenvalues of the two systems. This is the simplest form that takes an extreme value when the eigenvalues agree. We may regard the quadratic form simply as proof of principle, in the hope that more complicated interactions will have similar behavior; but we will venture a stronger prediction, that more complicated interactions will be similar because their Taylor expansions about the point of equality of eigenvalues will begin with a quadratic term.
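
The Taylor-expansion remark can be written out briefly. Let \(f(\Delta )\) be a generic smooth interaction, a hypothetical function of the discrepancy \(\Delta\) between the two eigenvalues that is introduced here only for illustration. If f has an extremum at \(\Delta = 0\), then \(f'(0) = 0\) and

$$\begin{aligned} f(\Delta ) = f(0) + \tfrac{1}{2} f''(0) \, \Delta ^{2} + O(\Delta ^{3}) \end{aligned}$$

so, provided \(f''(0) \ne 0\), the leading dependence on the discrepancy is quadratic, which is the form adopted here.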

The modifications made to the (squared) action denote some form of coupling between the two instances of the wavefunction (or/and the two instances of the interaction term in the Lagrangian) in the integrand. At this point it is a largely mathematical construct, but one provocative interpretation is based on the observation that S is time-reversal invariant; it is agnostic as to whether it represents a state traveling “forward” or “backward” in time. Then \(S^{2}\) might be taken to represent a forward and a backward wave passing each other without interacting, and the modified form of \(S^{2}\) describes an interaction between them at the time when the measured system and the apparatus interact, which is appropriate on physical grounds. In fact, this interaction sounds like, and may fulfill the functions of, the “handshake” between offer and confirmation wave in Cramer’s Transactional Interpretation [9,10,11]—although our approach differs from Cramer’s in important ways, some of which we will mention below.

We will show that our formalism predicts that (1) in the absence of a measurement, both subsystems evolve according to the usual unitary relations, but (2) when a measurement is made, the pointer state of the apparatus will come to agree with the measured system, and the measured system will end up in a single eigenstate (or superposition of degenerate eigenstates) of the appropriate operator. We believe it should be possible to show that (3) this theory predicts outcomes distributed according to Born’s rule, based on likely distributions of the values of the governing uncontrolled parameters. That proof has so far eluded us, but in Appendix C we sketch out some of the approaches that may contribute to a successful theoretical analysis.

The retrocausal approach provides other advantages, such as a simple explanation of EPR correlations. Correlations between two entangled and now widely separated objects are easily explained because the measurement of one particle has an influence that can travel backward in time to the time and place where the particle became entangled, and then forward to the other particle. [12] In particular, in an EPR experiment, the final conditions constraining the solution will include the measurement apparatus settings (e.g., the choice of axis with respect to which a given detector will measure the spin of a particle or the helicity of a photon), so the solution describing the complete system between \(t_{i}\) and \(t_{f}\) must be consistent with those settings and with the wave equation (which will enforce, e.g., conservation of total spin angular momentum.)

Another paradox that is trivially resolved is Wheeler’s delayed-choice experiment. [13] That experiment involves some mechanism that interacts with the traveling entity (e.g., photon), and that can be set to measure either a particlelike or a wavelike property of that entity. In our picture, that interaction imposes a spatial or/and final boundary condition on the solution. The boundary condition applies when and where the traveling entity interacts with the mechanism, with the result that the solution of the variational principle over the entire relevant spacetime region is consistent with the setting at that time and place, and displays at \(t_{f}\) the particlelike or wavelike property selected. The fact that the experimenter may have chosen the setting after \(t_{i}\) is completely irrelevant.

We expect that many other (real and thought) experiments that are regarded as paradoxical can be explained by our retrocausal variational approach.

We point out other retrocausal approaches that cannot achieve the goals we have set ourselves here. The principal reason for that is that they are interpretations, which by definition leave the formalism unchanged. For instance, the Two-State Vector Formalism of Aharonov, Bergmann, and Lebowitz [14] is a way to describe quantum states and their properties in a time-symmetric, perfectly even-handed way, but cannot possibly make predictions different from those of standard quantum mechanics. [15]

Cramer’s Transactional Interpretation [9,10,11] is also presented as an interpretation consistent with the standard formalism. It describes interesting quantum phenomena in terms of “offer waves” and “confirmation waves,” all of which indeed satisfy the appropriate wave equation from standard QM. Unfortunately, there is no mathematical description of the important events in those processes, for instance, a boundary in spacetime between a region containing only an offer wave and one containing superposed offer and confirmation waves. Since in the standard formalism the wave equation does not describe those events, we suspect that supplying Cramer’s missing mathematical description would necessarily promote his approach from an interpretation to a genuinely new theory, which might even resemble the theory we present in this article. [Footnote 5] However, he has not taken that step.

Another approach to quantum reasoning that may be termed retrocausal is Griffiths’s “consistent histories.” [16, 17] It is based on a fundamentally probabilistic description of nature, and so does not satisfy our requirements for a realistic theory. (We point out that this objection also applies to another ontic approach, the collapse theory of Ghirardi, Rimini and Weber [18]—but we reject that approach as well on the grounds that it is not time-symmetric.)

In the next section, we will develop the theory based on a variational principle, generalized so as to result in a nonlocal equation. The subsequent section will discuss the predictions of that equation and compare them to the properties that we have argued must appear in a successful theory. In some cases the agreement will be clear, although it will remain for the future to describe the details of approach to the solution, and to prove that the solution is unique. For Born’s rule, we will argue for the possibility of relevant but uncontrolled parameters, and indicate how the expected output frequencies may follow from their distribution; however, analytic proof or numerical demonstration that our theory yields frequencies consistent with Born’s rule remains to be done. In the last section we will summarize what we have done, discuss new perspectives required by retrocausality and nonlocality, and list some of the next steps to be taken to continue developing these ideas. Appendices present more technical material on how the nonlocality requirement follows from the requirement that a measurement finds only states with a single eigenvalue; an extension of variational calculus used in our analysis of the nonlocal variational principle; and the bulk of the speculative Born’s rule analysis.

2 Theoretical Development

2.1 Model of the Measurement Problem

As the principal issues motivating our theoretical development have to do with quantum measurement, we will consider an idealized model of such a measurement. Suppose that the system is prepared in a known superposition \(\sum _{j} C_{j} \left| \psi _{j} \right\rangle\) of eigenstates of the operator \(\sigma _{op}\) at time \(t_{i}\); that is, the eigenstates are well-defined and the coefficients \(C_{j}(t_{i})\) are known. This superposition is known to be initially stable; \({\dot{C}}_{j}(t_{i})=0\) for all j. The eigenstates themselves must be stable, so \(\sigma _{op}\) commutes with the Hamiltonian. Finally, the stability of an (unperturbed, unmeasured) superposition implies that the system is linear when in isolation, that is, it satisfies a wave equation linear in the wavefunction. This consideration will be seen to constrain the form of possible Lagrangians for the system. We will develop our ideas using a particular simple form, as proof of principle.

During all or part of the interval \([t_{i},t_{f}]\), a measurement apparatus (which we will call system 2) interacts with the measured system (system 1). A requirement for generality of the theory—validity of the properties of “quantum measurements” across all types of measurements—excludes all but the most general description of the measurement apparatus and its interaction with the measured system. We therefore use a minimal description, that the apparatus has a “pointer state” variable \(\sigma ^{2}\), and that it is coupled to the measured variable \(\sigma ^{1}\) of the system. Without loss of generality, we define \(\sigma ^{2}\) so that its value in a successful measurement equals the value of \(\sigma ^{1}\). Then the composite (system \(+\) apparatus) Lagrangian must include an interaction term that depends on both measured and pointer state variables, and attains an extreme (or stationary) value when they are equal. The simplest such term is quadratic in the corresponding operators, that is, proportional to \((\sigma _{op}^{1} - \sigma _{op}^{2})^{2}\).

Note that good experimental design dictates that the combined system (1 and 2) be well isolated in spacetime. Spatial isolation is accomplished by physical isolation or other control of the boundaries of the domain, and temporal isolation by system preparation at \(t_{i}\) and measurement readout at \(t_{f}\). This blocks influences from outside the spacetime region, which is important so that the spacetime integrals in this nonlocal theory can legitimately be limited to the experimental domain.

2.2 Variational Approach

An isolated system (system 1 or system 2, in our case, when they are not interacting) is described by a Lagrangian \(L[\psi ,{\dot{\psi }},t]\), which must satisfy the Euler equation

$$\begin{aligned} 0 = \frac{\partial L}{\partial \psi } - \frac{{\mathrm {d}}}{{\mathrm {d}}t} \frac{\partial L}{\partial {\dot{\psi }}} \end{aligned}$$
(2)

Evidently the requirement that (2) yield a linear wave equation implies that the Lagrangian must be quadratic in \(\psi\) and its time derivative.

2.3 Normal-Mode Expansion: Single System

Since the point of the measurement problem is to describe the evolution of a superposition of eigenstates of a given operator to a single eigenstate, it will simplify matters to define a basis set of such eigenstates. This expansion will be specific to a given inertial reference frame—the frame in which the measurement is performed and described by the above characteristics—because that will simplify the analysis and its comparison to those points. However, as explained above, we expect that the general theory (the form of the action, without dependence on the normal-mode expansion we will use here) will be relativistically appropriate and can be expressed in covariant form.

At any given time t, let \(\left| \psi ^{\ell }_{j} \right\rangle\) be for system \({\ell } = 1\) or 2 an eigenstate of a Hermitian operator \(\sigma _{op}^{\ell }\),

$$\begin{aligned} \sigma _{op}^{\ell } \left| \psi ^{\ell }_{j}(t) \right\rangle = \sigma _{j}^{\ell } \left| \psi ^{\ell }_{j}(t) \right\rangle \end{aligned}$$
(3)

satisfying the applicable spatial BCs, and let those eigenstates form an orthonormal basis for states of system \({\ell }\). Since external fields acting on the system may change during the course of the measurement (perhaps due to the measurement process itself), the eigenvalues and eigenstates are in general functions of time. In many interesting cases they are slowly varying functions of time, and for simplicity we will confine ourselves to the case in which the eigenvalues \(\sigma _{j}^{\ell }\) are constant. We expect that the analysis presented below can be readily generalized to the time-dependent case, for sufficiently slow variation.

We will also require each normal mode \(\left| \psi ^{\ell }_{j} \right\rangle\) to satisfy the Euler equation based on its single-system Lagrangian \(L^{\ell }\). This is possible because as stated above, the operator corresponding to the measured variable commutes with the Hamiltonian. The basis states will be taken to be simultaneous eigenstates of both operators, and eigenstates of the Hamiltonian satisfy the variational principle. Since a basis vector \(\left| \psi _{j}^{\ell }(t) \right\rangle\) was defined to be an eigenstate of the Hamiltonian, it has an energy \(E_{j}^{\ell }\) and a time derivative

$$\begin{aligned} \frac{{\mathrm {d}}}{{\mathrm {d}}t} \left| \psi _{j}^{\ell }(t) \right\rangle = -\frac{{\mathrm {i}}}{\hbar }E_{j}^{\ell } \left| \psi _{j}^{\ell }(t) \right\rangle \end{aligned}$$
(4)

(Schrödinger picture). We will also take the energies \(E_{j}^{\ell }\) to be constant; then it follows that

$$\begin{aligned} \left\langle \psi ^{\ell }_{j}(t_{1}) \! \right. \left| \psi ^{\ell }_{k}(t_{2}) \right\rangle = \delta _{jk} \, {\mathrm {e}}^{-\frac{{\mathrm {i}}}{\hbar } E_{j}^{\ell }(t_{2}-t_{1})} \end{aligned}$$
(5)
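
Relation (5) can be checked numerically in the simplest possible setting. The sketch below assumes a finite-dimensional system with a diagonal Hamiltonian, so that eigenstate j is a unit vector multiplied by the phase implied by (4); the energies, the times, the choice \(\hbar = 1\), and all function names are hypothetical values picked for illustration.

```python
import cmath

HBAR = 1.0  # natural units, chosen for illustration

def basis_state(j, energies, t):
    """Eigenstate j at time t: unit vector e_j times exp(-i E_j t / hbar), as in (4)."""
    phase = cmath.exp(-1j * energies[j] * t / HBAR)
    return [phase if n == j else 0j for n in range(len(energies))]

def inner(bra, ket):
    """<bra|ket>, with complex conjugation of the bra components."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))

def expected_overlap(j, k, energies, t1, t2):
    """Right-hand side of (5): delta_jk * exp(-i E_j (t2 - t1) / hbar)."""
    if j != k:
        return 0j
    return cmath.exp(-1j * energies[j] * (t2 - t1) / HBAR)

energies = [0.5, 1.5, 4.0]   # hypothetical constant energies
t1, t2 = 0.3, 2.1            # two arbitrary times
max_error = max(
    abs(inner(basis_state(j, energies, t1), basis_state(k, energies, t2))
        - expected_overlap(j, k, energies, t1, t2))
    for j in range(len(energies))
    for k in range(len(energies))
)
print("largest discrepancy with (5):", max_error)
```

The discrepancy is at the level of floating-point rounding, confirming that basis states taken at different times remain orthogonal and differ only by the phase in (5).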

Now if system \({\ell } = 1\) (measured system) or 2 (measurement apparatus) is isolated, its wavefunction can be expanded

$$\begin{aligned} \left| \psi ^{\ell }(t) \right\rangle = \sum _{j} C^{\ell }_{j}(t) \left| \psi ^{\ell }_{j}(t) \right\rangle \end{aligned}$$
(6)

and the usual normalization condition on \(\psi ^{\ell }(t)\) implies

$$\begin{aligned} \sum _{j} |C_{j}^{\ell }(t) |^{2} = 1 \end{aligned}$$
(7)

At present we expect this condition to hold for any t, but in Sect. 2.8 we will argue for removing this constraint.

The action is

$$\begin{aligned} S^{\ell }\equiv & {} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, L^{\ell }(t) \nonumber \\= & {} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \left\langle \psi ^{\ell }(t) \right| L_{op}^{\ell } \left| \psi ^{\ell }(t) \right\rangle \nonumber \\= & {} \sum _{j,k} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \left\langle \psi ^{\ell }_{j}(t) \right| \, C^{{\ell }*}_{j}(t) \, L_{op}^{\ell } \, C^{\ell }_{k}(t) \left| \psi ^{\ell }_{k}(t) \right\rangle \end{aligned}$$
(8)

Since the wavefunction \(\psi ^{\ell }\) is completely determined by the set of coefficients \(C_{j}^{\ell }(t)\), the condition of stationarity of the action reduces to the problem of finding those coefficients, which must satisfy

$$\begin{aligned} 0 = \frac{\partial L^{\ell }}{\partial C^{\ell }_{j}} - \frac{{\mathrm {d}}}{{\mathrm {d}}t} \frac{\partial L^{\ell }}{\partial {\dot{C}}^{\ell }_{j}} \quad \forall j \end{aligned}$$
(9)

This formulation of the problem replaces (2).

It is traditional in quantum field theory to perform the variational calculus analysis by varying (differentiating with respect to) the physically significant canonical fields and momenta, and that approach is extremely useful in producing intuitively appealing and useful evolution equations. [1, 5] However, the stationarity of the action is a mathematical condition, and as long as our formulation spans the space of its allowed variations, the mathematics do not dictate our choice of the functions in terms of which those variations are expressed. Because we are interested in the eigenstate content of the wavefunction, the corresponding coefficients are particularly useful to us, and we choose them as the description we will use in the variational principle.

2.4 Combined Systems

Now we can write from (8)

$$\begin{aligned} S^{1} + S^{2} = \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \left\langle \psi ^{1}(t) \right| \left\langle \psi ^{2}(t) \right| (L_{op}^{1} + L_{op}^{2}) \left| \psi ^{1}(t) \right\rangle \left| \psi ^{2}(t) \right\rangle \end{aligned}$$
(10)

if there is no interaction or entanglement between the two systems, that is, the combined state factors as \(\left| \psi \right\rangle \equiv \left| \psi ^{1} \right\rangle \left| \psi ^{2} \right\rangle\).

To allow the two subsystems to be entangled, we replace the product of single-system states \(\left| \psi ^{1} \right\rangle\) and \(\left| \psi ^{2} \right\rangle\) by the joint state

$$\begin{aligned} \left| \psi (t) \right\rangle = \sum _{j,k} C_{jk}(t) \left| \psi ^{1}_{j}(t) \right\rangle \left| \psi ^{2}_{k}(t) \right\rangle \end{aligned}$$
(11)

whereupon the normalization condition [Footnote 6]

$$\begin{aligned} \left\langle \psi (t) \! \right. \left| \psi (t) \right\rangle = 1 \end{aligned}$$
(12)

implies

$$\begin{aligned} \sum _{j,k} \, |C_{jk}(t) |^{2} = 1 \quad \forall t \end{aligned}$$
(13)

Then

$$\begin{aligned} S^{1} + S^{2} = \sum _{j,k,{\ell },m} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \,\, \left\langle \psi _{j}^{1}(t) \right| \, \left\langle \psi _{k}^{2}(t) \right| \, C^{*}_{jk}(t) (L_{op}^{1} + L_{op}^{2}) \, C_{{\ell } m}(t) \, \left| \psi _{\ell }^{1}(t) \right\rangle \, \left| \psi _{m}^{2}(t) \right\rangle \end{aligned}$$
(14)

To simplify the single-system terms, suppose \(L^{1}_{op}\) and \(L^{2}_{op}\) are of the form

$$\begin{aligned} L^{\ell }_{op}=& {} A^{\ell } - B^{\ell } \, \frac{{\mathrm {d}}^{2}}{{\mathrm {d}}t^{2}} \nonumber \\=& {} A^{\ell } + \overleftarrow{\frac{{\mathrm {d}}}{{\mathrm {d}}t}} \, B^{\ell } \, \frac{{\mathrm {d}}}{{\mathrm {d}}t} \end{aligned}$$
(15)

so \(L^{1}\) and \(L^{2}\) take the form

$$\begin{aligned} L^{\ell } \equiv \left\langle \psi ^{\ell } \right| L^{\ell }_{op} \left| \psi ^{\ell } \right\rangle = A^{\ell } \left\langle \psi ^{\ell } \! \right. \left| \psi ^{\ell } \right\rangle + B^{\ell } \left\langle {\dot{\psi }}^{\ell } \! \right. \left| {\dot{\psi }}^{\ell } \right\rangle \end{aligned}$$
(16)

with real constants \(A^{\ell }\) and \(B^{\ell }\). Then the fact that \(\left| \psi _{j}^{\ell } \right\rangle\) is an eigenstate of the Hamiltonian means that it satisfies the Euler equation (2), which we can write as

$$\begin{aligned} 0=& {} \frac{\partial L^{\ell }}{\partial \left\langle \psi _{j}^{\ell } \right| } - \frac{{\mathrm {d}}}{{\mathrm {d}}t} \frac{\partial L^{\ell }}{\partial \left\langle {\dot{\psi }}_{j}^{\ell } \right| } \nonumber \\=& {} A^{\ell } \left| \psi _{j}^{\ell } \right\rangle - B^{\ell } \left| \ddot{\psi }_{j}^{\ell } \right\rangle \end{aligned}$$
(17)

At this point we observe that the functional in question is a physical action and therefore real, so it is unchanged if we drop any imaginary part of the integrand. This has a simplifying advantage. When we use variational calculus to find a stationary state with respect to variations of a complex quantity (\(\left| \psi _{j}^{\ell } \right\rangle\) or \(C_{jk}\)), we may treat the real and imaginary parts of that quantity independently, with an Euler equation for each of them. Alternatively, we may treat the quantity and its complex conjugate (\(\left\langle \psi _{j}^{\ell } \right|\) or \(C^{*}_{jk}\)) as the two functions to be varied. In our case, with a real integrand, doing so has the convenient feature that the two resulting Euler equations are complex conjugates of each other and we only need to solve one of them. Here in (17) we choose to vary the bra vector.

Substituting (15) into (14) and using property (17) of the eigenvectors, we find (introducing the shorthand notation \(B \equiv B^{1} + B^{2}\)) that

$$\begin{aligned} S^{1} + S^{2}= & {} \sum _{j,k,{\ell },m} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \,\, \left\langle \psi _{j}^{1}(t) \right| \, \left\langle \psi _{k}^{2}(t) \right| \, C^{*}_{jk}(t) \left[ (L_{op}^{1} + L_{op}^{2}), \, C_{{\ell } m}(t) \right] \, \left| \psi _{\ell }^{1}(t) \right\rangle \, \left| \psi _{m}^{2}(t) \right\rangle \nonumber \\= & {} -B \sum _{j,k,{\ell },m} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \,\, \left\langle \psi _{j}^{1} \right| \, \left\langle \psi _{k}^{2} \right| \, C^{*}_{jk} \left( {\ddot{C}}_{{\ell } m} + 2 \, {\dot{C}}_{{\ell } m} \frac{{\mathrm {d}}}{{\mathrm {d}}t} \right) \, \left| \psi _{\ell }^{1} \right\rangle \, \left| \psi _{m}^{2} \right\rangle \end{aligned}$$
(18)
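
The step from the first to the second line of (18) is the product rule. As a sketch for a single system \({\ell }\) and a single term, with C(t) standing for any one of the coefficients, the first form of (15) gives

$$\begin{aligned} L^{\ell }_{op} \left( C \left| \psi _{j}^{\ell } \right\rangle \right)= & {} C \left( A^{\ell } \left| \psi _{j}^{\ell } \right\rangle - B^{\ell } \left| \ddot{\psi }_{j}^{\ell } \right\rangle \right) - B^{\ell } \left( {\ddot{C}} \left| \psi _{j}^{\ell } \right\rangle + 2 \, {\dot{C}} \left| {\dot{\psi }}_{j}^{\ell } \right\rangle \right) \nonumber \\= & {} -B^{\ell } \left( {\ddot{C}} + 2 \, {\dot{C}} \frac{{\mathrm {d}}}{{\mathrm {d}}t} \right) \left| \psi _{j}^{\ell } \right\rangle \end{aligned}$$

where the first bracket vanishes by (17). Only the terms in which at least one time derivative acts on the coefficient survive, which is why the first line of (18) can be written as a commutator.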

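Then, using (4) for the time derivative of the basis states and writing \(E_{jk} \equiv E_{j}^{1} + E_{k}^{2}\) for the total energy of the product state \(\left| \psi _{j}^{1} \right\rangle \left| \psi _{k}^{2} \right\rangle\),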

$$\begin{aligned} S^{1} + S^{2}= & {} \, -B \! \sum _{j,k,{\ell },m} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \,\, \left\langle \psi _{j}^{1} \right| \, \left\langle \psi _{k}^{2} \right| \, C^*_{jk} \left( {\ddot{C}}_{{\ell } m} - \frac{2{\mathrm {i}}}{\hbar } E_{{\ell } m} \, {\dot{C}}_{{\ell } m} \right) \, \left| \psi _{\ell }^{1} \right\rangle \, \left| \psi _{m}^{2} \right\rangle \nonumber \\= & {} \, -B \sum _{j,k} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \,\, C^{*}_{jk} \, \left( {\ddot{C}}_{jk} - \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, {\dot{C}}_{jk} \right) \nonumber \\= & {} \, B \sum _{j,k} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, \left( |{\dot{C}}_{jk} |^{2} + \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, C^{*}_{jk} \,{\dot{C}}_{jk} \right) \end{aligned}$$
(19)

where in the last step we rely on the hypothesis that \({\dot{C}}_{jk}\) vanishes at \(t_{i}\) as a condition imposed by the experimental preparation, and at \(t_{f}\) since that is implied by the NBC. Finally, as intended, we discard the imaginary part of the integrand:

$$\begin{aligned} S^{1} + S^{2} = \, B \sum _{j,k} \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, \left( |{\dot{C}}_{jk} |^{2} + \mathrm {Re} \left\{ \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, C^{*}_{jk} \,{\dot{C}}_{jk} \right\} \right) \end{aligned}$$
(20)
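The last step of (19) and the passage to (20) can be spelled out as follows. This is a short sketch that uses only the stated hypothesis \({\dot{C}}_{jk}(t_{i}) = {\dot{C}}_{jk}(t_{f}) = 0\) and the assumption that the energies \(E_{jk}\) are real. Integrating by parts,

$$\begin{aligned} -\int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, C^{*}_{jk} \, {\ddot{C}}_{jk} = -\left[ C^{*}_{jk} \, {\dot{C}}_{jk} \right] _{t_{i}}^{t_{f}} + \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, |{\dot{C}}_{jk} |^{2} = \int _{t_{i}}^{t_{f}} {\mathrm {d}}t \, |{\dot{C}}_{jk} |^{2} \end{aligned}$$

and, for the term that is linear in \({\dot{C}}_{jk}\),

$$\begin{aligned} \mathrm {Re} \left\{ \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, C^{*}_{jk} \,{\dot{C}}_{jk} \right\} = \frac{{\mathrm {i}}}{\hbar } E_{jk} \left( C^{*}_{jk} \,{\dot{C}}_{jk} - {\dot{C}}^{*}_{jk} \, C_{jk} \right) = -\frac{2}{\hbar } E_{jk} \, \mathrm {Im} \left\{ C^{*}_{jk} \,{\dot{C}}_{jk} \right\} \end{aligned}$$

The middle expression is the form in which this term reappears in (34). The discarded piece is proportional to \((2/\hbar ) E_{jk} \, \mathrm {Re} \{ C^{*}_{jk} \,{\dot{C}}_{jk} \} = (E_{jk}/\hbar ) \, {\mathrm {d}} |C_{jk} |^{2} / {\mathrm {d}}t\), which is a total time derivative.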

2.5 Interaction Term

As argued above, we must account for interaction by including in the action a term proportional to \((\sigma _{op}^{1} - \sigma _{op}^{2})^{2}\). A simple form for such an interaction term is

$$\begin{aligned} S^{I} = \mu \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t \, \left\langle \psi (t) \right| (\sigma _{op}^{1} - \sigma _{op}^{2})^{2} \left| \psi (t) \right\rangle \end{aligned}$$
(21)

for some constant \(\mu\). Then, defining another shorthand notation \(\Delta _{jk} \equiv \sigma _{j}^{1} - \sigma _{k}^{2}\),

$$\begin{aligned} S^{I}= & {} \, \mu \sum _{j,k,{\ell },m} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t \, \left\langle \psi _{j}^{1}(t) \right| \, \left\langle \psi _{k}^{2}(t) \right| \, C^{*}_{jk}(t) \, (\sigma _{op}^{1} - \sigma _{op}^{2})^{2} \, C_{{\ell } m}(t) \, \left| \psi _{\ell }^{1}(t) \right\rangle \, \left| \psi _{m}^{2}(t) \right\rangle \nonumber \\= & {} \, \mu \sum _{j,k} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t \, \, \Delta _{jk}^{2} \, |C_{jk}(t) |^{2} \end{aligned}$$
(22)

Then we might expect the complete action to be

$$\begin{aligned} S = S^{1} + S^{2} + S^{I} \end{aligned}$$
(23)

2.6 Nonlocal Interaction Term

Since the phenomenon that requires nonlocality (measurement-induced collapse of the wavefunction) is due to the interaction between systems 1 and 2, we suppose that it is the interaction term \(S^{I}\) that must be made nonlocal. We propose to add to it a nonlocal piece involving two integrations on time. We start with an expression resembling \(S^{I}\) in (21) but with two integrations on time:

$$\begin{aligned}&\nu \left[ \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t \, \left\langle \psi (t) \right| (\sigma _{op}^{1}- \sigma _{op}^{2}) \left| \psi (t) \right\rangle \right] ^{2} \nonumber \\&= \nu \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t_{1} \, \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t_{2} \, \left\langle \psi (t_{1}) \right| \left\langle \psi (t_{2}) \right| ' (\sigma _{op}^{1}- \sigma _{op}^{2}) \, (\sigma _{op}^{1'}- \sigma _{op}^{2'}) \left| \psi (t_{1}) \right\rangle \left| \psi (t_{2}) \right\rangle ' \end{aligned}$$
(24)

Here \(\nu\) is a real constant, and the primed \(\sigma ^{\ell }_{op}\) operators combine with the primed bra and ket vectors in an inner product, as do the unprimed operators and bra and ket vectors. Now we make changes so as to couple the \(t_{1}\) and the \(t_{2}\) integrals. We move one of the primes in the operator kernel, changing it from \((\sigma _{op}^{1}- \sigma _{op}^{2}) \, (\sigma _{op}^{1'}- \sigma _{op}^{2'})\) to \((\sigma _{op}^{1}- \sigma _{op}^{2'}) \, (\sigma _{op}^{1'}- \sigma _{op}^2)\). We also move the prime from one ket vector to the other. Finally, we observe that in this form the interaction between the state at \(t_{1}\) and that at \(t_{2}\) is a function of the time difference. It may be that this effect weakens with temporal separation, so a dimensionless non-negative real function \(f(t_{1} - t_{2})\) should be included in the integrand. By symmetry, f must be an even function, and we expect it to be a monotonically decreasing function of the absolute value of its argument. For later convenience, let us suppose that there is a positive real constant \(\tau\) such that \(f(t_{1} - t_{2})=0\) whenever \(|t_{1} - t_{2} |\ge \tau\). These changes result in the term

$$\begin{aligned} R^{I} \equiv \nu \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t_{1} \, \int ^{t_{f}}_{t_{i}}{\mathrm {d}}t_{2} \, f(t_{1} - t_{2}) \left\langle \psi (t_{1}) \right| \left\langle \psi (t_{2}) \right| ' (\sigma _{op}^{1}- \sigma _{op}^{2'}) \, (\sigma _{op}^{1'}- \sigma _{op}^{2}) \left| \psi (t_{1}) \right\rangle ' \left| \psi (t_{2}) \right\rangle \end{aligned}$$
(25)

Physically this expresses an interaction or “auto-entanglement” between the state \(\left| \psi \right\rangle\) at time \(t_{1}\) and the same state at \(t_{2}\); this is an expression of retrocausality in the sense that the state at the later time interacts with its earlier value. As mentioned earlier, a more speculative interpretation, based on the time symmetry of the variational principle, is that this term describes interaction between “forwards” and “backwards” histories, something like the “transaction” in Cramer’s Transactional Interpretation, but it is not quite the same (see Footnote 7).

We point out that for the extreme choice of f

$$\begin{aligned} f(t_{1} - t_{2}) = \delta (t_{1} - t_{2}) \end{aligned}$$
(26)

the integrand takes a more intuitive form in terms of quantum expectation values \(\left\langle \mathcal {O} \right\rangle \equiv \left\langle \psi \right| \mathcal {O} \left| \psi \right\rangle\):

$$\begin{aligned} \left\langle \psi (t) \right| \left\langle \psi (t) \right| ' (\sigma _{op}^{1}- \sigma _{op}^{2'}) \, (\sigma _{op}^{1'}- \sigma _{op}^{2}) \left| \psi (t) \right\rangle \left| \psi (t) \right\rangle '= & {} \left\langle \sigma ^{1} \right\rangle ^{2} -2 \left\langle \sigma ^{1} \sigma ^{2} \right\rangle + \left\langle \sigma ^{2} \right\rangle ^{2} \nonumber \\= & {} \left\langle (\sigma ^{1} - \sigma ^{2})^{2} \right\rangle - \left\langle (\Delta \sigma ^{1})^{2} \right\rangle - \left\langle (\Delta \sigma ^{2})^{2} \right\rangle \qquad \end{aligned}$$
(27)

in which

$$\begin{aligned} \Delta \sigma ^{\ell } \equiv \sigma ^{\ell } - \left\langle \sigma ^{\ell } \right\rangle \quad {\ell }=1,2 \end{aligned}$$
(28)

This suggests that minimizing the term \(\left\langle (\sigma ^1 - \sigma ^2)^2 \right\rangle\) drives the action of measurement (system and apparatus evolve to states with the same eigenvalue) and the other two terms drive wavefunction collapse (until each system ultimately has only a single eigenvalue \(\sigma ^{\ell } = \left\langle \sigma ^{\ell } \right\rangle\)). We will find that the \(\delta\)-function form of f is unsuitable for our objectives, so the physical interpretation of \(R^{I}\) is more subtle, but this limiting case suggests that that term drives both the measurement property and wavefunction collapse.
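Both equalities in (27) can be checked numerically for a randomly chosen state. The following is a minimal sketch, assuming finite-dimensional subsystems in which \(\sigma _{op}^{1}\) and \(\sigma _{op}^{2}\) are diagonal; the function name, the dimensions and the random eigenvalues are hypothetical choices made only for illustration. The primed copy of the state is represented as a second tensor factor.

```python
import numpy as np

def check_identity_27(n1=2, n2=3, seed=0):
    """Return the three expressions in Eq. (27) for a random normalized state."""
    rng = np.random.default_rng(seed)
    s1 = rng.normal(size=n1)                    # eigenvalues of sigma^1 (illustrative)
    s2 = rng.normal(size=n2)                    # eigenvalues of sigma^2 (illustrative)
    C = rng.normal(size=(n1, n2)) + 1j * rng.normal(size=(n1, n2))
    C /= np.linalg.norm(C)                      # enforce sum_jk |C_jk|^2 = 1
    psi = C.reshape(-1)                         # component index j*n2 + k

    sig1 = np.kron(np.diag(s1), np.eye(n2))     # sigma^1 on the two-system space
    sig2 = np.kron(np.eye(n1), np.diag(s2))     # sigma^2 on the two-system space
    one = np.eye(n1 * n2)

    # Doubled space = (unprimed copy) x (primed copy)
    a = np.kron(sig1, one) - np.kron(one, sig2)     # sigma^1 - sigma^2'
    b = np.kron(one, sig1) - np.kron(sig2, one)     # sigma^1' - sigma^2
    big = np.kron(psi, psi)
    lhs = np.vdot(big, a @ b @ big).real

    def ev(op):
        """Expectation value <psi| op |psi> (real for Hermitian op)."""
        return np.vdot(psi, op @ psi).real

    m1, m2 = ev(sig1), ev(sig2)
    middle = m1**2 - 2 * ev(sig1 @ sig2) + m2**2
    var1 = ev(sig1 @ sig1) - m1**2              # <(Delta sigma^1)^2>
    var2 = ev(sig2 @ sig2) - m2**2              # <(Delta sigma^2)^2>
    d = sig1 - sig2
    rhs = ev(d @ d) - var1 - var2
    return lhs, middle, rhs
```

Calling `check_identity_27()` returns three numbers that should agree to rounding error; the check says nothing about the sign of \(\nu\) or about the dynamics, only about the algebra of the \(\delta\)-function limit.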

Next we expand in normal modes according to (11) and use the eigenvalue relation (3):

$$\begin{aligned} R^{I}= & {} \nu \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \, \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, f(t_{1} - t_{2}) \sum _{\begin{array}{c} j,k,{\ell },m,\\ n,p,q,r \end{array}} \left\langle \psi _{j}^{1}(t_{1}) \right| \left\langle \psi _{k}^{2}(t_{1}) \right| C^*_{jk}(t_{1}) \left\langle \psi _{\ell }^{1}(t_{2}) \right| ' \left\langle \psi _{m}^{2}(t_{2}) \right| ' \nonumber \\&C^*_{{\ell } m}(t_{2}) \, \Delta _{qp} \, \Delta _{nr} \, C_{np}(t_{1}) \left| \psi _{n}^{1}(t_{1}) \right\rangle ' \left| \psi _{p}^{2}(t_{1}) \right\rangle ' C_{qr}(t_{2}) \left| \psi _{q}^{1}(t_{2}) \right\rangle \left| \psi _{r}^{2}(t_{2}) \right\rangle \end{aligned}$$
(29)

Then, using (5) and defining \(E_{jk} \equiv E_{j}^{1} + E_{k}^{2}\),

$$\begin{aligned} R^{I}= & {} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \, \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, r^I(t_{1},t_{2}) \nonumber \\= & {} \, \frac{1}{2} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \, \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, \left[ r^I(t_{1},t_{2}) + r^I(t_{2},t_{1}) \right] \end{aligned}$$
(30)

where

$$\begin{aligned} r^I(t_{1},t_{2}) \equiv \, \nu f(t_{1} - t_{2}) \sum _{j,k,{\ell },m} \Delta _{jm} \, \Delta _{{\ell } k} \, C^*_{jk}(t_{1}) \, C^*_{{\ell } m}(t_{2}) \, C_{{\ell } m}(t_{1}) \, C_{jk}(t_{2}) \, {\mathrm {e}}^{-\frac{{\mathrm {i}}}{\hbar } (E_{jk}-E_{{\ell } m}) (t_{2}-t_{1})} \end{aligned}$$
(31)

In the second line of (30) we have replaced the integrand by its real part, for the reasons discussed above, utilizing the property

$$\begin{aligned} \left[ r^I(t_{1},t_{2}) \right] ^* = r^I(t_{2},t_{1}) \end{aligned}$$
(32)
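Property (32) can be verified numerically from the definition (31). The sketch below is illustrative only: the triangular window `f_window` is a hypothetical choice of f that is even, non-negative and vanishes for \(|t_{1} - t_{2} |\ge \tau\), and the function names and default parameter values are assumptions rather than part of the theory.

```python
import numpy as np

def f_window(t, tau=1.0):
    """Hypothetical range function: even, non-negative, zero for |t| >= tau."""
    return np.maximum(0.0, 1.0 - np.abs(t) / tau)

def r_I(C1, C2, t1, t2, E, Delta, nu=1.0, hbar=1.0, tau=1.0):
    """Eq. (31) with C1 = C(t1), C2 = C(t2), E[j, k] = E_jk, Delta[j, k] = Delta_jk."""
    phase = np.exp(-1j * E * (t2 - t1) / hbar)
    a = C1.conj() * C2 * phase           # C*_jk(t1) C_jk(t2) exp(-i E_jk (t2 - t1)/hbar)
    b = C2.conj() * C1 * phase.conj()    # C*_lm(t2) C_lm(t1) exp(+i E_lm (t2 - t1)/hbar)
    total = np.einsum('jm,lk,jk,lm->', Delta, Delta, a, b)
    return nu * f_window(t1 - t2, tau) * total
```

For arbitrary complex arrays `Ca` and `Cb` standing for the coefficients at two times, `np.conj(r_I(Ca, Cb, t1, t2, E, Delta))` should equal `r_I(Cb, Ca, t2, t1, E, Delta)`, which is the statement that the symmetrized integrand in (30) is real.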

2.7 Complete Superaction and Variational Analysis

Then the full superaction is

$$\begin{aligned} S=& {} S^1 + S^{2} + S^{I} + R^{I} \nonumber \\= & {} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t \left[ s^{12}(t) + s^I(t) \right] + \frac{1}{2} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \left[ r^I(t_{1},t_{2}) + r^I(t_{2},t_{1}) \right] \nonumber \\= & {} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \left\{ \frac{1}{2T} \left[ s^{12}(t_{1}) + s^I(t_{1}) + s^{12}(t_{2}) + s^I(t_{2}) \right] + \frac{1}{2} \left[ r^I(t_{1},t_{2}) + r^I(t_{2},t_{1}) \right] \right\} \qquad \end{aligned}$$
(33)

where \(s^{12}\) and \(s^I\) are the integrands (including prefactors) in \((S^1 + S^{2})\) and \(S^{I}\), as given in (20) and (22):

$$\begin{aligned}&s^{12} = \, B \sum _{j,k} \left[ |{\dot{C}}_{jk} |^{2} + \frac{{\mathrm {i}}}{\hbar } E_{jk} \left( C^*_{jk} \,{\dot{C}}_{jk} - {\dot{C}}^*_{jk} \, C_{jk} \right) \right] \end{aligned}$$
(34)
$$\begin{aligned}&s^I = \, \mu \sum _{j,k} \, \Delta _{jk}^2 \, |C_{jk} |^{2} \end{aligned}$$
(35)
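As a consistency check on the third line of (33), and assuming that T denotes the duration of the integration interval, \(T \equiv t_{f} - t_{i}\) (an assumption here, since T is not defined in this section), the single-time terms reduce to the second line as follows, with \(s \equiv s^{12} + s^{I}\):

$$\begin{aligned} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, \frac{1}{2T} \left[ s(t_{1}) + s(t_{2}) \right] = \frac{1}{2T} \left[ T \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{1} \, s(t_{1}) + T \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, s(t_{2}) \right] = \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t \, s(t) \end{aligned}$$

The purpose of this rewriting is only to put every term under the same double integral, so that the whole integrand is manifestly symmetric under exchange of \(t_{1}\) and \(t_{2}\).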

We observe that in this form, the integrand of S is real and symmetric in \(t_{1}\) and \(t_{2}\). It depends on the coefficients \(\{C_{pq}\}\) at two times. We need to find a stationary point of the action subject to the constraint (13). In Appendix B we outline the analysis of such a problem, including the use of a Lagrange multiplier \(\lambda (t)\) to enforce the constraint, leading to integral equation (71). For any given choice of j and k, we vary \(C_{jk}^*\) by that procedure and define the operator

$$\begin{aligned} W \equiv \frac{\partial }{\partial C_{jk}^*(t_{1})} - \left. \frac{\partial }{\partial t_{1}} \right| _{t_{2}} \frac{\partial }{\partial {\dot{C}}_{jk}^*(t_{1})} \end{aligned}$$
(36)

Accordingly, we find

$$\begin{aligned} 0= & {} \frac{1}{2} \, W \, s^{12}(t_{1}) + \frac{1}{2} \, W \, s^I(t_{1}) + \frac{1}{2} \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t_{2} \, W \left[ r^{I}(t_{1},t_{2}) + r^I(t_{2},t_{1}) \right] \nonumber \\&+ \, T \, \lambda (t_{1}) \, \frac{\partial }{\partial C_{jk}^*(t_{1})} \left[ \sum _{{\ell },m} \, |C_{{\ell },m}(t_{1}) |^{2} - 1\right] \end{aligned}$$
(37)

This becomes

$$\begin{aligned} {\ddot{C}}_{jk}(t) = \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, {\dot{C}}_{jk}(t) + \frac{1}{B} \left[ \mu \, \Delta _{jk}^2 + 2T \, \lambda (t) \right] C_{jk}(t) + \frac{\nu }{B} \, {\tilde{C}}_{jk}(t) \end{aligned}$$
(38)

in which we define the function

$$\begin{aligned} {\tilde{C}}_{jk}(t) \equiv \sum _{{\ell },m} \Delta _{jm} \, \Delta _{{\ell } k} \, C_{{\ell } m}(t) \, \int ^{t_{f}}_{t_{i}} {\mathrm {d}}t' \, C^*_{{\ell } m}(t') \, C_{jk}(t') \, f(t - t') \, {\mathrm {e}}^{-\frac{{\mathrm {i}}}{\hbar } (E_{jk}-E_{{\ell } m}) (t'-t)} \end{aligned}$$
(39)
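For numerical work, the memory term (39) can be evaluated on a time grid by direct quadrature. The following is a minimal sketch, not a definitive implementation: it assumes a uniform grid, trapezoidal weights, and a hypothetical triangular window for f; the names `f_window` and `c_tilde` are illustrative.

```python
import numpy as np

def f_window(t, tau):
    """Hypothetical range function: even, non-negative, zero for |t| >= tau."""
    return np.maximum(0.0, 1.0 - np.abs(t) / tau)

def c_tilde(C, t, i, E, Delta, tau=1.0, hbar=1.0):
    """Eq. (39) at time t[i]; C has shape (len(t), n1, n2) on a uniform grid t."""
    dt = t[1] - t[0]
    w = np.full(len(t), dt)
    w[0] = w[-1] = dt / 2                       # trapezoidal weights on [t_i, t_f]
    s = t - t[i]                                # s = t' - t
    wf = w * f_window(s, tau)                   # f(t - t') = f(t' - t) because f is even
    # g[n, j, k] = C_jk(t') exp(-i E_jk (t' - t)/hbar)
    g = C * np.exp(-1j * E[None, :, :] * s[:, None, None] / hbar)
    # M[l, m, j, k] = int dt' f C*_lm(t') C_jk(t') exp(-i (E_jk - E_lm)(t' - t)/hbar)
    M = np.einsum('n,nlm,njk->lmjk', wf, g.conj(), g)
    # Sum over l, m with the factors Delta_jm Delta_lk C_lm(t)
    return np.einsum('jm,lk,lm,lmjk->jk', Delta, Delta, C[i], M)
```

A simple sanity case is a single mode with constant \(C = 1\) and \(\Delta = 2\): the phase cancels, the integral of the triangular window is \(\tau\), and the function should return \(4\tau\) at any time more than \(\tau\) from both ends of the grid.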

It can be seen by varying the action by \(C_{jk}\) instead of \(C_{jk}^*\) that \(\lambda (t)\) must be real. To find it, we note that the second derivative of the normalization condition (13) is

$$\begin{aligned} 2 \sum _{j,k} \left( |{\dot{C}}_{jk} |^{2} + \mathrm {Re} \left\{ C_{jk} ^* \, {\ddot{C}}_{jk} \right\} \right) = 0 \end{aligned}$$
(40)

We eliminate \({\ddot{C}}_{jk}\) between (38) and (40) and then solve for (a constant times) \(\lambda (t)\):

$$\begin{aligned} \frac{2T \lambda }{B} = - \sum _{j,k} \left( |{\dot{C}}_{jk} |^{2} + \frac{\mu }{B} \, \Delta _{jk}^2 \, |C_{jk} |^{2} + \mathrm {Re} \left\{ \frac{2 {\mathrm {i}}}{\hbar } E_{jk} \, C_{jk}^* \, {\dot{C}}_{jk} + \frac{\nu }{B} \, C_{jk}^* \, {\tilde{C}}_{jk} \right\} \right) \end{aligned}$$
(41)

Substituting that expression into (38),

$$\begin{aligned} {\ddot{C}}_{jk}= & {} \frac{2 {\mathrm {i}}}{\hbar } E_{jk} \, {\dot{C}}_{jk} + \frac{\mu }{B} \, C_{jk} \left( \Delta _{jk}^2 - \sum _{{\ell },m} \Delta _{{\ell } m}^2 |C_{{\ell } m} |^{2} \right) \nonumber \\&+ \frac{\nu }{B} \left( {\tilde{C}}_{jk} - C_{jk} \, \mathrm {Re} \left\{ \sum _{{\ell },m} C_{{\ell } m}^* {\tilde{C}}_{{\ell } m} \right\} \right) - C_{jk} \sum _{{\ell },m} \left( |{\dot{C}}_{{\ell } m} |^{2} + \mathrm {Re} \left\{ \frac{2 {\mathrm {i}}}{\hbar } E_{{\ell } m} \, C_{{\ell } m}^* \, {\dot{C}}_{{\ell } m} \right\} \right) \qquad \end{aligned}$$
(42)

This is the equation that we expect describes the evolution of the complete system (that is, system \(+\) apparatus), as described by the coefficients \(\{C_{jk}(t)\}\) in the normal-mode expansion (6).
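That (42) is constructed to preserve normalization can be confirmed algebraically: for any normalized \(\{C_{jk}\}\), and any values of \({\dot{C}}_{jk}\) and \({\tilde{C}}_{jk}\), the right-hand side of (42) satisfies (40) identically. A minimal numerical sketch of that check follows; the function names and the parameter values are hypothetical, and \({\tilde{C}}_{jk}\) is passed in as an arbitrary array because the identity does not depend on its form.

```python
import numpy as np

def cddot_42(C, Cdot, Ctilde, E, Delta, B=1.0, mu=1.0, nu=1.0, hbar=1.0):
    """Right-hand side of Eq. (42); all array arguments have shape (n1, n2)."""
    kin = np.real((2j / hbar) * E * C.conj() * Cdot)
    return ((2j / hbar) * E * Cdot
            + (mu / B) * C * (Delta**2 - np.sum(Delta**2 * np.abs(C)**2))
            + (nu / B) * (Ctilde - C * np.real(np.sum(C.conj() * Ctilde)))
            - C * np.sum(np.abs(Cdot)**2 + kin))

def residual_40(C, Cdot, Cddot):
    """Left-hand side of Eq. (40); zero when normalization is preserved."""
    return 2.0 * np.sum(np.abs(Cdot)**2 + np.real(C.conj() * Cddot))
```

With `C` normalized so that `np.sum(np.abs(C)**2)` equals 1, `residual_40(C, Cdot, cddot_42(C, Cdot, Ctilde, E, Delta))` should vanish to rounding error for random inputs.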

The BCs to be applied with (42) are the specified values of \(\{C_{jk}(t_{i})\}\) from the initial preparation, and the NBC at \(t_{f}\), which takes the form

$$\begin{aligned} 0 = \int _{a}^{b} {\mathrm {d}}t_{2} \left. \left( \frac{\partial L}{\partial {\dot{C}}^*_{jk}(t_{1})} \right) \right| _{t_{1}=t_{f}} \end{aligned}$$
(43)

in which L is the integrand in the full action on the last line of (33). This form of the NBC is derived in Appendix B.

2.8 Alternative Treatment of the Normalization Condition

Comparison of (38) with (42) shows that rigorous enforcement of the normalization condition (13) has complicated the mathematics. Since we hope to show that experimental results of great simplicity and generality (e.g. Born’s rule) follow from this theory, we are suspicious of the additional complexity and wonder whether it is absolutely necessary to satisfy the stated normalization condition at every instant t.

Our skepticism about that requirement is also based on a thought experiment described by Renninger [19, 20], which is equivalent to the following description. An excited atom at the origin is known to emit a photon at \(t=0\), but the direction is unknown, so the photon’s wavefunction satisfies \(|\psi |^2 = \delta (r-ct)/(4\pi r^2)\). A perfectly collecting hemispherical detector screen occupies the upper half of the sphere \(r=1\) light-second. Therefore, if the photon’s emission direction is within \(\theta < \pi /2\), it is collected and extinguished at \(t=1\) second. Otherwise, it is not registered by the detector screen, and its wavefunction changes to satisfy \(|\psi |^{2} = \delta (r-ct) H(\theta -\pi /2)/(2\pi r^2)\), where H is the Heaviside function. The instantaneous change in the denominator from \(4\pi r^2\) to \(2\pi r^2\) at \(t=1\) is not due to any measurement, for there is none, nor to any physical change in the photon; it arises entirely from the normalization requirement. This seems unphysical, and our suspicion deepens when we consider that this description depends on choice of reference frame; for instance, in any other frame the detector screen would not be (hemi)spherical but spheroidal, and so the resulting change in magnitude of the uncollected wavefunction would happen over a nonzero interval of time.

A more physically sound description would be that a photon intercepted by the detector screen does not simply vanish; it interacts with (a) particle(s) of the screen to produce some physical effect, for instance dislodging a photoelectron. A more complete description of the experiment would include that effect. Since half of the outgoing spherical photon wavefunction participates in that effect, it is unreasonable for the uncollected half to double its weight to satisfy a normalization condition. We argue instead that the outgoing uncollected photon wavefunction after \(t=1\) should be normalized to integrate to 1/2, and with that change we see that a discontinuous and unphysical change is no longer needed in that uncollected part at \(t=1\).
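The weights quoted in this thought experiment follow from a short angular integration, sketched here in spherical coordinates for \(t > 0\). For the full outgoing shell,

$$\begin{aligned} \int {\mathrm {d}}^{3}x \, \frac{\delta (r-ct)}{4\pi r^{2}} = \int _{0}^{\infty } {\mathrm {d}}r \, \delta (r-ct) \, \frac{1}{4\pi } \int _{0}^{2\pi } {\mathrm {d}}\phi \int _{0}^{\pi } \sin \theta \, {\mathrm {d}}\theta = 1 \end{aligned}$$

whereas the uncollected lower half alone, with the original prefactor retained, carries

$$\begin{aligned} \int _{0}^{\infty } {\mathrm {d}}r \, \delta (r-ct) \, \frac{1}{4\pi } \int _{0}^{2\pi } {\mathrm {d}}\phi \int _{\pi /2}^{\pi } \sin \theta \, {\mathrm {d}}\theta = \frac{2\pi }{4\pi } = \frac{1}{2} \end{aligned}$$

Replacing \(4\pi r^{2}\) by \(2\pi r^{2}\) in the second integral restores the value 1, which is the discontinuous rescaling that the conventional normalization requirement imposes at \(t=1\).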

Armed with our reasoning that the normalization condition (13) is not absolute, we propose to relax it for the experiment that is the subject of this paper. Although for many experiments we do not expect to lose any of the wavefunction weight in mid-experiment, we point out that the total weight of the wavefunction (unity, meaning one particle of whatever type is being described) is known only at \(t_{i}\) and \(t_{f}\). There is not, nor can there be, any experimental evidence for a unity (or any other) value of the weight at intermediate times. Therefore we propose that (13) (and equivalently (12)) is a constraint only at \(t_{i}\) and \(t_{f}\). This is easily handled mathematically; we simply stipulate that (13) is part of the initial and final conditions (see Footnote 8). Then we can dispense with the Lagrange multiplier altogether, so the IDE to be satisfied is

$$\begin{aligned} {\ddot{C}}_{jk}(t) = \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, {\dot{C}}_{jk}(t) + \frac{\mu }{B} \Delta _{jk}^2 \, C_{jk}(t) + \frac{\nu }{B} \, {\tilde{C}}_{jk}(t) \end{aligned}$$
(44)

Since we regard the simplicity of this equation in comparison to (42) as an argument for its plausibility, we will adopt it rather than the latter in the remaining sections of the paper; nevertheless, much of the following reasoning can be applied to (42) as well at the cost of more algebra.

3 Comparison to Desired Properties

3.1 Stability of a Superposition in the Absence of a Measurement

We observe at this point that (44) predicts the stability of an unperturbed superposition, as it should. When there is no interaction between the system and the measurement apparatus, \(\mu = \nu = 0\). The resulting equation

$$\begin{aligned} {\ddot{C}}_{jk} = \frac{2 {\mathrm {i}}}{\hbar } E_{jk} \, {\dot{C}}_{jk} \end{aligned}$$
(45)

has the solution \({\dot{C}}_{jk} =0 \; \forall j, \! k\), that is, stability of the superposition. Furthermore, since (for each subsystem \({\ell }=1\) or 2) the modes in the expansion (11) were defined as solutions of the no-measurement wave equation, the stable solution resulting from our analysis here agrees with the solution of the ordinary wave equation for each isolated system.
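To make the step explicit, (45) is a first-order linear equation for \({\dot{C}}_{jk}\) and can be integrated once in closed form:

$$\begin{aligned} {\dot{C}}_{jk}(t) = {\dot{C}}_{jk}(t_{i}) \, \exp \left[ \frac{2{\mathrm {i}}}{\hbar } E_{jk} \, (t - t_{i}) \right] \end{aligned}$$

With the preparation condition \({\dot{C}}_{jk}(t_{i}) = 0\) invoked after (19), this gives \({\dot{C}}_{jk}(t) = 0\) and hence \(C_{jk}(t) = C_{jk}(t_{i})\) for all t. Without that condition the general solution would also contain a component oscillating at frequency \(2E_{jk}/\hbar\), so the stability result relies on the initial condition as well as on the equation.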

3.2 Collapse to a Single Eigenstate with \(\sigma _{j}^{1} = \sigma _{k}^{2}\)

This includes three events we expect in a measurement: system 1 must collapse to a single eigenstate of \(\sigma _{op}^1\), or a superposition of eigenstates with the same eigenvalue; system 2 must similarly collapse; and the eigenvalues of the two systems must agree. The third condition (measurement) requires that for any jk,

$$\begin{aligned} \Delta _{jk}=0 \quad \text {or} \quad C_{jk}=0 \end{aligned}$$
(46)

Although we will not analyze the differential equation (44) to describe the approach to these three conditions, we will show that it is consistent with their satisfaction in the steady state, when all time derivatives of \(\{C_{pq}\}\) vanish. Thus it is plausible for the combined system to reach such a state, and having done so, to remain in that state.

We see that condition (46) together with the steady-state condition causes every term in (44) to vanish except possibly the last. To understand that term, consider that after the system attains a steady state, we can replace all the factors \(C_{pq}\) or \(C^*_{pq}\) on the RHS of (39) by their final values, which satisfy (46). Then at times t greater than \(\tau\) after the full system reaches its steady state, any nonzero terms \({\ell },m\) on the RHS must have

$$\begin{aligned} \Delta _{jk} = \Delta _{{\ell } m} = 0 \end{aligned}$$
(47)

If either of systems 1 and 2 has collapsed to a single state (or a set of states with a single eigenvalue), then by (46) the other system has also collapsed, and it is easy to see that (47) implies that \(\Delta _{jm} = \Delta _{{\ell } k} = 0\), so the only possible nonzero term in \({\tilde{C}}_{jk}\) is zero after all. Therefore the last term in (44) vanishes, so the equation is consistent with the supposed late-time steady state. On the other hand, if systems 1 and 2 have not collapsed, there are terms in (39) that do not trivially vanish. We conclude that the evolution equation predicts that a late-time steady state is only possible if both the measurement condition is satisfied (the apparatus state corresponds to the state of the system being measured) and both systems have collapsed to a single eigenvalue.
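The steady-state argument can be illustrated numerically. For time-independent coefficients, and at a time more than \(\tau\) away from both \(t_{i}\) and \(t_{f}\), (39) reduces to \({\tilde{C}}_{jk} = C_{jk} \sum _{{\ell },m} \Delta _{jm} \, \Delta _{{\ell } k} \, |C_{{\ell } m} |^{2} \, F_{jk,{\ell } m}\) with \(F_{jk,{\ell } m} \equiv \int {\mathrm {d}}s \, f(s) \, {\mathrm {e}}^{-\frac{{\mathrm {i}}}{\hbar } (E_{jk}-E_{{\ell } m}) s}\). The sketch below evaluates this for a hypothetical triangular window and hypothetical two-level subsystems with matched eigenvalues; all names and numerical values are assumptions chosen for illustration.

```python
import numpy as np

def c_tilde_steady(C, s1, s2, E, tau=1.0, hbar=1.0, n=2001):
    """Eq. (39) for time-independent C, more than tau away from t_i and t_f.

    Uses the hypothetical window f(s) = max(0, 1 - |s|/tau).
    """
    Delta = s1[:, None] - s2[None, :]                 # Delta_jk = sigma_j^1 - sigma_k^2
    s = np.linspace(-tau, tau, n)                     # s = t' - t
    ds = s[1] - s[0]
    w = np.full(n, ds)
    w[0] = w[-1] = ds / 2                             # trapezoidal weights
    f = np.maximum(0.0, 1.0 - np.abs(s) / tau)
    dE = E[:, :, None, None] - E[None, None, :, :]    # E_jk - E_lm, axes (j, k, l, m)
    phase = np.exp(-1j * dE[None] * s[:, None, None, None, None] / hbar)
    F = np.einsum('n,njklm->jklm', w * f, phase)      # F[j, k, l, m]
    P = np.abs(C)**2                                  # |C_lm|^2
    return C * np.einsum('jm,lk,lm,jklm->jk', Delta, Delta, P, F)
```

With `s1 = s2 = np.array([1.0, -1.0])`, a fully collapsed state such as `C = [[1, 0], [0, 0]]` gives \({\tilde{C}}_{jk} = 0\) for every j and k, whereas the state `C = [[1, 0], [0, 1]] / sqrt(2)`, which satisfies (46) but retains two eigenvalues, gives a nonzero \({\tilde{C}}_{00}\) and \({\tilde{C}}_{11}\). This matches the conclusion above, for this particular choice of f only.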

We would prefer to have a more rigorous analysis, both disposing of the possibility that the combined system never reaches a steady state and describing the approach to the steady state. This analysis must await future work, possibly including numerical studies. Our objective in this paper is to show the possibility that a variational principle of the type we have developed can explain the measurement problem.

3.3 Consistency with Born’s Rule

At this point we take it as given that the system will collapse to a single value of the eigenvalue. Since the system being measured is denoted \({\ell }=1\), the weight corresponding to eigenvalue \(\sigma ^{1}_{j}\) is

$$\begin{aligned} P_{j} \equiv \sum _{k} |C_{jk}(t_{i}) |^{2} \end{aligned}$$
(48)

(More generally, it is \(\sum _{j,k} |C_{jk}(t_{i}) |^{2}\), where the sum on j is over all modes with a single value of the eigenvalue. For simplicity, we will consider only the non-degenerate case, but the extension to the more general case should be straightforward.)

It will be convenient to denote averages over an ensemble of identically prepared experimental realizations by an overbar, and to write \(P_{j}(t) \equiv \sum _{k} |C_{jk}(t) |^{2}\) for the weight (48) evaluated at a general time t. Then, if it is taken as given that the collapse to a single eigenvalue is complete by \(t_{f}\), we can see that the relation

$$\begin{aligned} \overline{P_{j}(t_{i})} = \overline{P_{j}(t_{f})} \quad \forall j \end{aligned}$$
(49)

is equivalent to Born’s rule. This equivalence holds because at the initial time \(t_{i}\), by the requirement of identical preparation, every member of the ensemble contributes the same value \(P_{j}(t_{i})\) to the ensemble average. At \(t_{f}\), \(P_{j} = 1\) in a fraction \(P_{j}(t_{i})\) of the realizations in the ensemble, and 0 in the others. So (49) is the relation that should be predicted by a successful theory.
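The equivalence between (49) and Born's rule can be illustrated with a toy ensemble. In the sketch below the outcome frequencies are put in by hand (each run is assigned outcome j with probability \(P_{j}(t_{i})\)); nothing here is derived from the evolution equation, and the function name, run count and seed are hypothetical. The point is only that, when every run ends with \(P_{j}(t_{f})\) equal to 0 or 1, the ensemble average of \(P_{j}(t_{f})\) is the relative frequency of outcome j.

```python
import numpy as np

def ensemble_average_final_weights(C_init, n_runs=200_000, seed=0):
    """Average P_j(t_f) over runs whose outcomes are drawn with frequencies P_j(t_i)."""
    P_init = np.sum(np.abs(C_init)**2, axis=1)      # Eq. (48): P_j = sum_k |C_jk(t_i)|^2
    rng = np.random.default_rng(seed)
    outcomes = rng.choice(len(P_init), size=n_runs, p=P_init)
    # In each run P_j(t_f) is 1 for the realized j and 0 otherwise,
    # so the ensemble average is the relative frequency of each outcome.
    P_final_mean = np.bincount(outcomes, minlength=len(P_init)) / n_runs
    return P_init, P_final_mean
```

For example, `C_init = np.array([[0.6, 0.0], [0.0, 0.8]])` has initial weights 0.36 and 0.64, and the returned averages should agree with them up to sampling noise of order \(1/\sqrt{N}\) for N runs.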

We would like to be able to prove that Born’s rule (49) follows from our nonlocal wave equation (42). The theoretical proof has eluded us so far; we may ultimately have to rely on numerical studies. However, we sketch out in Appendix C some of the ideas that may contribute to the theoretical analysis.

4 Discussion

4.1 Sensitivity of the System Evolution to a Measurement

The traditional challenge of quantum measurement is to explain the sensitivity of a quantum system to measurement, which changes its evolution from unitary evolution to collapse (state reduction). Our picture provides a quantitative explanation for this sensitivity. The act of measuring a system involves causing it to physically interact with a measurement apparatus, and the variational principle describes the evolution of the combined system. The readout of the measurement at \(t_{f}\) defines the end of the domain of integration of the variational principle. Of course, the theory continues to apply after \(t_{f}\), but the observation at \(t_{f}\), like its preparation at \(t_{i}\) and its spatial boundary conditions, imposes a leakproof barrier to influences from outside the problem domain, so that a solution may be found within that domain without reference to the rest of the universe.

Now if the measurement apparatus were read at some intermediate time \(t_{m}\), the structure of the problem would be different. Instead of applying between \(t_{i}\) and \(t_{f}\), the variational principle would apply twice, from \(t_{i}\) to \(t_{m}\) and from \(t_{m}\) to \(t_{f}\). The appearance of a constraint at \(t_{m}\) as a final condition on the first interval and an initial condition on the second would make this a different problem than the original one from \(t_{i}\) to \(t_{f}\). (As we have explained, the intervention at \(t_{m}\) results in the appearance of an NBC on the solution between \(t_{i}\) and \(t_{m}\), even though it does not dictate the result of the reading at \(t_{m}\).) Consequently, the act of observing the system at \(t_{m}\) changes it, just as in conventional interpretations.

The reader may object that we have not removed the mystery but moved it to a different concept. Instead of declaring by fiat that a measurement changes the system, we have declared that the domain of integration of the variational principle must end at the time (and place) at which the measurement apparatus is read. We haven’t explained what is special about the events at \(t_{f}\) that allow us to end the domain there.

The criticism is valid, but we point out that we have pushed back the mystery, or made it less mysterious, by relating it to considerations of BCs. Certainly the description of a measurement in terms of an action integral bounded at \(t_{i}\) and \(t_{f}\) must be an approximation to a more complete theory that includes a greater time interval before and after \([t_{i},t_{f}]\) and a fuller description of the measurement process. On the other hand, the empirical fact that broad statements of great generality apply to measurements, regardless of the system under study or the mechanism of the process, strongly suggests that a simple description is possible, particularly regarding a point in time before the measurement (\(t_{i}\)) and one after its completion (\(t_{f}\)). The validity of the simple description is not necessarily a surprise; it may be that the interactions that can be so described have been adopted as measurement procedures precisely because of their ability to give repeatable quantitative results. Thus measurements performed by such procedures can be described, to good approximation, in terms of the action within a domain bounded by \(t_{i}\) and \(t_{f}\).

If the simple description proposed in this paper turns out to be successful in description and prediction at some level of approximation, that will be evidence of its usefulness, without denying the possibility of a more complete theory. Eventually such an improved theory may show that collapse/decay to a single eigenvalue occurs at \(t_{f}\) in a physically justifiable way, based on the role of the apparatus in the action, and so it is appropriate to simplify the problem as we have done by terminating the integral at \(t_{f}\) and accepting the NBC there.

An extended analysis of that type would also be appropriate to explore another aspect of the new theory. We have argued that we can solve the variational principle between \(t_{i}\) and \(t_{f}\), which would presumably enable a prediction of the experimental outcome at \(t_{f}\) (based on (a) fixed value(s) of hidden variable(s), of course). We have asserted that the final condition at \(t_{f}\) provides a leakproof barrier to influences from outside that problem domain. But the theory must apply under reversal of the direction of time, so it should also be possible to apply an experimental preparation (initial condition) at \(t_{f2} \equiv t_{f} + T\) and a measurement readout (NBC as a final condition) at \(t_{f}\) to predict an outcome at \(t_{f}\) based on physics between \(t_{f}\) and \(t_{f2}\). We suspect that the theory retains sufficient flexibility to allow the two solutions (for \(t_{i} \le t \le t_{f}\) and \(t_{f} \le t \le t_{f2}\)) to agree at \(t_{f}\). It probably helps that we expect (in both cases) to apply natural BCs at \(t_{f}\), so we are not actually constraining the value of the measured variable. Also, continuity constraints on fields, wavefunctions and derivatives appearing in the action may help to avoid contradictions. Since these two predictions must agree, the barrier at \(t_{f}\) is not completely leakproof. It might instead be described as a partially permeable membrane, as suggested by the applicability of an NBC that constrains some but not all properties of the system at \(t_{f}\). This type of study may give insight into the nature of the constraint imposed by the measurement readout.

4.2 Causality and Time-Ordering Issues

Retrocausality—the dependence of phenomena at a given time on phenomena in their future—conflicts with the usual notion of causality—the concept that causes precede their effects in time. However, multiple authors [10, 21, 22] have pointed out that such a notion of causality is not necessary to avoid contradictions. If event \(A\Rightarrow B\), then \(B \Rightarrow \, \sim \! A\) would produce a contradiction. But if we are somehow prevented from declaring that \(B \Rightarrow \, \sim \! A\) (or an equivalent combination of statements), then in principle \(A\Rightarrow B\) is possible even if B occurs earlier than A.

To apply this to our use of retrocausality in the variational principle, we are asserting that the NBC at \(t_{f}\) (which applies because a measurement result is read off at that time, even though that result is unconstrained) is an event A that constrains the solution between \(t_{i}\) and \(t_{f}\), so that solution at some intermediate time \(t_{m}\) can be considered as event B. But the event B thus chosen is by definition consistent with A, since it is a point along the solution based on A. It is not possible to claim that \(B \Rightarrow \, \sim \! A\), so no contradiction is possible.

Of course, the usual objection to this is that one could intervene at \(t_{m}\) to change the trajectory of events and produce \(\sim \! A\) at \(t_{f}\) (going back in time and shooting one’s grandparent, in the usual cliche). But doing this changes the problem, as described above; now the variational principle applies from \(t_{i}\) to \(t_{m}\) and from \(t_{m}\) to \(t_{f}\), with the intervention imposing new BCs at \(t_{m}\). Since this is a different problem than the original one, the original solution does not apply and no claim of a contradiction can be made.

4.3 Choice of the Function f

We have relied on a supposed interaction between wavefunctions at \(t_{1}\) and \(t_{2}\), as expressed in the nonlocal action term (25). The interaction is a physical process with a temporal range described by the function f. It will be important to determine the form of f; this may be explored numerically, but additional physical insight could be very useful.

Our earlier hypotheses that f is a decreasing function of the absolute value of its argument and that it has a finite range \(\tau\) are intuitively appealing, but they are not the only possibility. In fact, we cannot rule out the opposite extreme, that \(f(t) \equiv 1\). This would mean that the nonlocal interaction has infinite range, but in practice for a given measurement it would be limited to the interval \([t_{i},t_{f}]\). (Without the finite-range limit \(\tau\), our analysis in Sect. 3.2 would have to be revisited.)

4.4 Solving the Integrodifferential Equation

As mentioned above, it will be important to solve, or otherwise study, the IDE (44). That effort may be made theoretically, or numerically if need be. We would like to understand under what conditions the system reaches the collapsed state described in Sect. 3.2, how fast that late-time state is approached, and which of the possible collapsed states is reached, as a function of the uncontrolled parameter(s). It will also be important to test whether the equation produces outcome frequencies consistent with Born’s rule, possibly following ideas in Sect. 3.3.

One question is whether, given a choice of initial conditions and uncontrolled parameter(s), the solution to the IDE is unique (and even whether a solution exists). If there is always a unique solution, the theory may be completely deterministic (although it is unclear what that means for a retrocausal theory!), so we may be able to dispense completely with the idea that quantum mechanical processes depend on intrinsically random variables. Such a discovery might have far-reaching ramifications in quantum information technologies that rely on (supposed) randomness.

If this understanding enables us to make predictions based on the theory, we will look for experimentally testable predictions. Although we have argued that the new theory will agree with many features of conventional theory, it is certainly possible that it could differ in some ways (see Footnote 9). One possibility is that results that have historically been seen to vary, supposedly due to intrinsic randomness, may vary less or not at all if a hidden (that is, historically uncontrolled) variable is controlled in new experiments (guided by new predictions about how well or to what values it must be controlled).

Of course, it is possible that the particular choice of action we have made, and the IDE resulting from it, do not correspond to nature. Even in that case, our exposition here shows that a variational principle of this type, including our assumptions of retrocausality, nonlocality, and one or more uncontrolled parameters, can lead to a plausible theory that avoids, resolves or explains problematic features of conventional quantum theory. If the theory presented here is not borne out, a similarly-constructed theory with a different form of the action may be more successful.

4.5 Summary: Successes, Limitations and Open Questions

Our explorations have touched quite a few issues, some more successfully than others. Let us summarize.

We have shown that it is possible to develop a theory that is nonlinear (so as to enable state reduction), nonlocal (enabling recognition of eigenstates of the operator associated with a measurement), and time-symmetric. We have included the measurement apparatus in the calculation, but without invoking any special treatment of that apparatus. (We do pay particular attention to the interaction between apparatus and measured system, but the same formalism would apply to any pair of interacting subsystems.) There was no need to refer to the presence or activity of any observer. We have indicated mathematically how this theory should agree with the standard QM wave equation in the absence of a measurement, and should describe state reduction when there is a measurement; and that state reduction should both satisfy the measurement property (pointer state corresponds to state of measured system) and reach the endpoint of a state (or degenerate states) with a single eigenvalue.

We have also described how the use of retrocausality (Wharton’s Lagrangian schema) leads to simple explanations for EPR correlations and for the delayed-choice experiment. Finally, we have shown that the normalization of the wavefunction can be constrained in a relativistically acceptable way by limiting that constraint to the spacelike initial and final surfaces of a measurement (that is, the surfaces where the initial and final conditions are imposed by the design of the experiment).

These successes are mitigated somewhat by certain limitations in our analysis. As “proof of principle,” we considered a very simple model of the measured system, the measurement apparatus, and their interaction. We believe that this model faithfully represents the relevant characteristics of more complicated systems, but that is a supposition that remains to be tested. For simplicity we assumed that eigenvalues (of the measured quantity as well as of energy) were constant in time. We considered a spacetime region bounded by initial and final surfaces that were constant in time (at \(t_{i}\) and \(t_{f}\)); we believe, but have not shown, that our results apply to more general surfaces. None of these simplifications invalidates our results, but they show the need for continued work.

More seriously, there are parts of the analysis that are incomplete or questionable. While we showed that the equations tend toward the desired state reduction, we did not solve for the actual time dependence or show that the reduced state was inevitable. The construction of the superaction included some ad hoc operations that could have been done differently (for instance, \(R^{I}\) could have been substituted for \(S^{I}\) rather than added to it)—we have not identified compelling reasons for all the choices we made—and we have not provided a general prescription for the modifications to be made to convert squared action \(S^{2}\) to a non-factorizable superaction (the step that resulted in (25)). We have not shown how to solve the IDE, or determined whether it has a unique or multiple solutions. Most importantly, we have not identified the needed uncontrolled parameter or demonstrated Born’s rule; the analysis in Appendix C is suggestive but has several large gaps.

While these deficiencies are disappointing, we remind the reader that our objective was to see if it is possible to construct a “realistic” theory with the properties we specified at the outset. The answer seems to be yes. Since such a project is not generally regarded as feasible, it is significant that we have come as far as we have. The properties we identified as successes above are important. Although our derivations are far from general (see the “limitations” above), they play the role of an existence proof, a demonstration that a “realistic” theory is a possibility. And while we have not shown how to solve the IDE or derive Born’s rule (and may not even have the IDE right, due to uncertainty about the form of the superaction), it seems possible that this theory (or one similar to it) could be solved, and could be consistent with Born’s rule. The “realist program” as we have defined it could turn out to describe nature, and it would be valuable to pursue this approach to find out if it does.