Stochastic Independence and Causal Connection

  • Original Article
  • Erkenntnis

Abstract

Assumptions of stochastic independence are crucial to statistical models in science. Under what circumstances is it reasonable to suppose that two events are independent? When they are not causally or logically connected, so the standard story goes. But scientific models frequently treat causally dependent events as stochastically independent, raising the question whether there are kinds of causal connection that do not undermine stochastic independence. This paper provides one piece of an answer to this question, treating the simple case of two tossed coins with and without a midair collision.

[Figs. 1–9]

Notes

  1. Science’s physical probability distributions are in the first instance over event types rather than event tokens; in what follows, I use the term “event” to refer to both event types and singular events as the context requires.

  2. A more careful statement of the independence assumption would exclude quantities logically related to velocity and temperature from the independence claim: kinetic theory does not, for example, assume that a molecule’s velocity is independent of its momentum. But there is no need to get delayed by such matters here.

  3. Pollock says “properties” rather than “events”. I remind you that by “event” I sometimes mean “event type”; with this understanding, I think that Pollock would allow my reformulation.

  4. Some philosophers deny the possibility of reducible physical probability. What they cannot deny is the existence of powerful scientific models that represent deterministically produced phenomena using probability distributions that incorporate sweeping independence judgments. The success of this practice must be explained even if reducible probability is in some metaphysical sense not “real” probability.

  5. The microconstancy of an evolution function is always relative to an outcome. It is also relative to a measure, that is, a way of quantifying a setup’s initial conditions. The correct measure to use in the present framework is the measure with respect to which the probability distribution over the setup’s initial conditions is stated. In what follows I take the initial condition distribution, and thus the measure, as given.

  6. For a guide to the older work, see von Plato (1983), and for more recent work, Diaconis et al. (2007). Reichenbach (2008) and Hopf (1934) §9 briefly explore the relationship between microconstancy and independence, deriving a result much like that presented in the next section. For the differences between the approaches taken by the earlier writers and my own approach, see Strevens (2003) §2.A, (2011).

  7. In Strevens (2011) I endorse the second of these assumptions (in a back-handed way; see the end of §4), but I also show how to ground a probability for heads without the first assumption. In the present paper the first assumption is made for purely expository reasons.

  8. Here and in the later results concerning probabilistic independence the term “approximate” is called on to do some hedging: approximate macroperiodicity, approximate linearity, approximate “microlinearity” (Sect. 4), and so on. The proofs I cite and the informal arguments I give tend to assume that “approximate” means “negligible”. You might wonder about cases where deviations from the ideal, though small, are not negligible. Do such deviations have a tendency to get “blown up” to big deviations by, for example, the tossed coin’s sensitivity to initial conditions? They do not: the process by which the deviations are aggregated to produce the result in question (for example, a probability equal to a strike ratio) is in effect a weighted averaging; consequently, the total deviation is the weighted average of the individual deviations. That means that the total deviation cannot “blow up”, and indeed will typically be much smaller than the larger individual deviations.

  9. Or at least, probabilities that are almost equal.

  10. Here I assume also, of course, that the densities for individual trials are macroperiodic, so that the probabilities of outcomes of individual trials are equal to the outcomes’ strike ratios.

  11. Mathematically speaking, almost all macroperiodic joint densities represent some degree of correlation between the events whose probabilities they encode; Fig. 3 depicts one example.

  12. A reader asks: what if the joint initial condition distribution provides only marginal probabilities, that is (roughly) average probabilities for pairs of consecutive spin speeds? Then the independence result given above might hold but consecutive outcomes might not be independent if the single-case probabilities deviated from the averages. For example, it might be that the average probability of obtaining heads, given heads on the immediately preceding toss, is as independence requires one-half, but that in some particular case the single-case probability of heads immediately following heads is higher than one-half. (In some other particular cases it would have to be lower than one-half in order to fix the one-half value for the average.) The answer to this question is that, as with any mathematical argument, you get out what you put in. If you put in a marginal joint distribution, the argument above gets you independence of the outcomes’ marginal probabilities only. If you put in a distribution of single-case probabilities, you get independence of the outcomes’ single-case probabilities. If single-case probabilities exist for the outcomes, they or something equivalent presumably exist for the initial conditions [though see Strevens (2011)]; the argument above should then be applied to these single-case probabilities for the pairs of consecutive spin speeds—a sufficient condition for independence in that case being the macroperiodicity of those probabilities’ distribution. If single-case probabilities do not exist for the outcomes—if the only relevant probabilities are ensemble probabilities—then there can be a fact of the matter about independence only at the ensemble level; thus, it is unreasonable to request the demonstration of anything more.

  13. In the treatment of tossed coins in this section, I begin with probabilities for causally isolated coin tosses—coins that do not collide—and show that most collisions do not undermine independence. That method cannot be applied directly to the probabilities of population genetics, which are inherently extrinsic, and so do not exist in a causally isolated form. The way to demonstrate the independence of the population genetic probabilities is to build them out of probabilities for smaller causal steps that do come in an isolated form, as explained in Strevens (2003), chap. 4.

  14. This informal definition is imprecise: if the areas of constant ratio are of different sizes, for example, it does not specify whether strength is determined by taking the largest size, the mean size, or something else. But given the informal use to which I put the notion in what follows, there is no real benefit to precisifying—though a fuller treatment of independence would certainly do so.

  15. That the transformation of evolution functions is one-to-one is guaranteed by linearity but not by microlinearity. However, because the transformation in question is a one-to-one function of the inverse of the transformation that maps pre-collision speeds onto post-collision speeds, we can be sure that the sort of evolution function transformations we are talking about are one-to-one: if they were not, the spin speed transformation would not be well-defined, or perhaps I should say, would not be deterministic.

  16. Thanks to Jossi Berkovitz for raising this question. The loose generalization stated in this paragraph has its foundation in the results proved in Strevens (2003), §3.B7.

  17. For technical reasons, this created no problems in Strevens (2003).

  18. More technically, the microlinearity of a transformation can be defined as relative to a partition of its domain into connected sets: it is microlinear if it is approximately linear over any member of the partition. What the theorem above requires is microlinearity relative to a partition into sets of equal strike ratio.

  19. A non-zero value for \(c\) would be physically peculiar, but since it is easy to handle, I do not rule it out here.

  20. Two additional physically plausible assumptions are made in the course of the derivation (see “Appendix: Further Proofs”): \(a\) and \(g\) are positive and \(b\) and \(h\) have the same sign.

References

  • Diaconis, P., Holmes, S., & Montgomery, R. (2007). Dynamical bias in the coin toss. SIAM Review, 49, 211–235.

  • Harman, G. (1973). Thought. Princeton, NJ: Princeton University Press.

  • Hopf, E. (1934). On causality, statistics and probability. Journal of Mathematics and Physics, 13, 51–102.

  • Keller, J. (1986). The probability of heads. American Mathematical Monthly, 93, 191–197.

  • Pollock, J. (2007). Probable probabilities. http://philsci-archive.pitt.edu/3340/.

  • Reichenbach, H. (2008). The concept of probability in the mathematical representation of reality. Chicago: Open Court. Reichenbach’s doctoral dissertation, originally published in 1916.

  • Strevens, M. (2003). Bigger than chaos: Understanding complexity through probability. Cambridge, MA: Harvard University Press.

  • Strevens, M. (2005). How are the sciences of complex systems possible? Philosophy of Science, 72, 531–556.

  • Strevens, M. (2011). Probability out of determinism. In C. Beisbart & S. Hartmann (Eds.), Probabilities in physics. Oxford: Oxford University Press.

  • Strevens, M. (2013). Tychomancy: Inferring probability from causal structure. Cambridge, MA: Harvard University Press.

  • von Plato, J. (1983). The method of arbitrary functions. British Journal for the Philosophy of Science, 34, 37–47.

Acknowledgments

For valuable comments, thanks to Jossi Berkovitz and the audience at the IHPST.

Author information

Correspondence to Michael Strevens.

Appendix: Further Proofs

I will show that if a midair collision between two tossed coins effects a non-deflationary linear transformation of the spin speeds, then the equivalent transformation at the beginning of the tosses will also be non-deflationary, provided that some additional, physically plausible conditions hold. I also show that under the same conditions a deflationary mid-toss collision is equivalent to a beginning-of-toss transformation that is no more deflationary.

Some new notation will simplify the algebra. Define \(\hat{t}_{1}\) and \(\hat{t}_2\) as the proportions of the total spin time that elapse before and after the collision, so that

$$\begin{aligned} \hat{t}_{1} = t_{1}/T \quad {\text {and}}\quad \hat{t}_2 = t_2/T.\end{aligned}$$

Then the coefficients \(d\) and \(e\) of the beginning-of-the-toss transformation of \(u\) equivalent to the actual interaction can be written

$$\begin{aligned} d&= a\hat{t}_2 + \hat{t}_{1} \\ e&= b\hat{t}_2 \text {.}\end{aligned}$$

The same notation can be used for the coefficients of the beginning-of-the-toss transformation of \(v\). If the actual linear transformation of \(v\) effected by the collision is:

$$\begin{aligned} v^{\prime} = gv + hu + i \end{aligned}$$

then the beginning-of-toss transformation of \(v\) equivalent to the actual interaction is

$$\begin{aligned} (g\hat{t}_2 + \hat{t}_{1})v + h\hat{t}_2 u + i{\hat{t}_2} {\text {.}}\end{aligned}$$
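The equivalence can be checked numerically. The following is a minimal sketch under assumptions that the appendix does not restate: that the collision maps \(u\) to \(au + bv + c\) (the form suggested by the coefficients \(d\) and \(e\) above), that spin speed is otherwise constant, and that two treatments count as "equivalent" when they yield the same total rotation over the whole toss. The function names are illustrative only.

```python
def rotation_with_midair_collision(u, v, a, b, c, t1, t2):
    """Total rotation of the first coin: speed u for time t1,
    then the post-collision speed a*u + b*v + c for time t2.
    (The form a*u + b*v + c is an assumption of this sketch.)"""
    return u * t1 + (a * u + b * v + c) * t2


def rotation_with_initial_transformation(u, v, a, b, c, t1, t2):
    """Total rotation if the speed is transformed once, at the start
    of the toss, and the coin then spins undisturbed for T = t1 + t2."""
    T = t1 + t2
    t1_hat, t2_hat = t1 / T, t2 / T
    d = a * t2_hat + t1_hat          # coefficient of u
    e = b * t2_hat                   # coefficient of v
    u_equivalent = d * u + e * v + c * t2_hat
    return u_equivalent * T


if __name__ == "__main__":
    args = (10.0, 7.0, 1.2, 0.3, 0.5, 0.4, 0.6)  # u, v, a, b, c, t1, t2
    print(rotation_with_midair_collision(*args))
    print(rotation_with_initial_transformation(*args))
```

The two functions agree up to floating-point rounding for any choice of arguments with \(t_1 + t_2 > 0\), which is all that the equivalence claim requires in the constant-speed case.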

Consider non-deflationary collision dynamics first. The beginning-of-the-toss transformation—the transformation on the \(u\,\times\,v \) space—is non-deflationary just in case the absolute value of its determinant is greater than or equal to 1. Let me simplify matters by assuming that the determinant is positive; non-deflation in that case requires that the determinant is greater than or equal to 1 (otherwise, run the following arguments with strategically placed negation operators).

The beginning-of-toss transformation’s determinant is

$$\begin{aligned} \det \left(\begin{array}{cc} a\hat{t}_2 + \hat{t}_{1} & b\hat{t}_2 \\ h\hat{t}_2 & g\hat{t}_2 + \hat{t}_{1} \end{array}\right)&= ag(\hat{t}_2)^{2}+ a\hat{t}_{1} \hat{t}_2 + g\hat{t}_{1} \hat{t}_2 + (\hat{t}_{1})^{2}- bh(\hat{t}_2)^{2}\\&= (ag - bh)(\hat{t}_2)^{2}+ \frac{(a + g)}{2}2\hat{t}_{1} \hat{t}_2 + (\hat{t}_{1})^{2}\text {.}\end{aligned}$$

Since \((\hat{t}_2)^{2}+ 2\hat{t}_{1} \hat{t}_2 + (\hat{t}_{1})^{2}= (\hat{t}_{1} + \hat{t}_2)^{2}= 1\), a sufficient condition for the determinant to be greater than or equal to 1 is that the coefficients of \((\hat{t}_2)^{2}\) and \(2\hat{t}_{1} \hat{t}_2 \) in the above expression be greater than or equal to 1, that is, that

  1. \(ag - bh \ge 1\), and

  2. \(a + g \ge 2\).

The quantity \(ag - bh\) is the determinant of the actual transformation, that is, the transformation actually effected by the midair collision partway through the toss; it is greater than or equal to 1 just in case the original transformation is non-deflationary.

If it is assumed that \(a\) and \(g\) are positive (i.e., a coin’s pre-interaction spin speed makes a positive rather than a negative contribution to its post-interaction spin speed) and that (because of physical symmetry) the signs of \(b\) and \(h\) are the same, so that \(bh \ge 0\), then (1) entails that \(ag \ge 1\), which in turn entails (2), since \(a + g \ge 2\sqrt{ag}\) for positive \(a\) and \(g\). Under these assumptions, then, if the interaction transformation partway through the toss is non-deflationary, its equivalent at the beginning of the toss is also non-deflationary.
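The non-deflation result can also be probed by brute force. The sketch below is not a proof and not part of the original argument. It samples coefficients satisfying the stated assumptions (positive \(a\) and \(g\), same-signed \(b\) and \(h\), mid-toss determinant at least 1, constant spin speed so that \(\hat{t}_{1} + \hat{t}_2 = 1\)) and looks for a case in which the beginning-of-toss determinant falls below 1. The sampling ranges and function names are arbitrary choices made for illustration.

```python
import random


def initial_determinant(a, b, g, h, t1_hat, t2_hat):
    """Determinant of the beginning-of-toss transformation on the u x v space."""
    return ((a * t2_hat + t1_hat) * (g * t2_hat + t1_hat)
            - (b * t2_hat) * (h * t2_hat))


def find_counterexample(trials=20_000, seed=0):
    """Search for a non-deflationary collision whose beginning-of-toss
    equivalent is deflationary; return None if the search finds nothing."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, g = rng.uniform(0.01, 5), rng.uniform(0.01, 5)   # a, g positive
        sign = rng.choice([-1, 1])                          # b, h share a sign
        b, h = sign * rng.uniform(0, 3), sign * rng.uniform(0, 3)
        if a * g - b * h < 1:      # keep only non-deflationary collisions
            continue
        t1_hat = rng.random()      # constant speed: t2_hat = 1 - t1_hat
        det = initial_determinant(a, b, g, h, t1_hat, 1 - t1_hat)
        if det < 1 - 1e-9:         # small tolerance for rounding error
            return (a, b, g, h, t1_hat, det)
    return None


if __name__ == "__main__":
    print(find_counterexample())
```

Because the argument above establishes the result for every admissible choice of coefficients, the search should come back empty; a non-empty result would indicate a coding error rather than a flaw in the derivation.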

What if the mid-toss collision transformation is deflationary? Its degree of deflation is proportional to its determinant \(d\) (here \(d\) names the determinant, not the coefficient defined earlier), the absolute value of which will be less than one. Assume as before for expository purposes that \(d\) is positive. Then it can be shown that under the same conditions assumed in the previous paragraph, the determinant of the equivalent beginning-of-toss transformation is no less than \(d\), so that the equivalent beginning-of-toss transformation is no more deflationary than the mid-toss transformation. Proof: A sufficient condition for the beginning-of-toss determinant to be greater than or equal to \(d\) can be read off the argument above (using all variable names in the same way):

  1. \(ag - bh \ge d \), and

  2. \(a + g \ge 2d \).

Since \(d\) is by definition equal to \(ag - bh\), condition (1) is always satisfied. Supposing, as before, that \(a\) and \(g\) are positive and that \(b\) and \(h\) have the same sign so that \(bh\) is non-negative, condition (2) is also satisfied (reasoning omitted).
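The omitted reasoning can be reconstructed along the following lines; this is one possible route, not necessarily the one the author has in mind. Since \(bh \ge 0\), \(ag \ge ag - bh = d > 0\); since \(0 < d < 1\), \(\sqrt{d} \ge d\). Combining these with the inequality of arithmetic and geometric means for positive \(a\) and \(g\) gives

$$\begin{aligned} a + g \ge 2\sqrt{ag} \ge 2\sqrt{d} \ge 2d \text {.}\end{aligned}$$

The remaining coefficient in the determinant, that of \((\hat{t}_{1})^{2}\), is 1, which exceeds \(d\) because \(d < 1\).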

Next consider the case where the coin’s spin speed changes significantly over time. As in the main text, I assume that the speed at any time is a linear function of the initial speed: there exists a function \(f(t)\) (not necessarily linear) such that, after time \(t\), the speed of a coin with initial speed \(u\) will be \(f(t) u \). Redefine \(\hat{t}_{1}\) and \(\hat{t}_2\) so that

$$\begin{aligned} \hat{t}_{1} = F(t_{1}) \big / F(T) \quad {\text {and} }\quad {\hat{t}}_2 = F(t_2) \big / F(T) ,\end{aligned}$$

where \(F(t) = \int _{0}^{t}f(x) \,dx \). Then the relevant beginning-of-the-toss transformation can be written just as I have written it above.

It can no longer be assumed that \(\hat{t}_{1} + \hat{t}_2 = 1\). However, if, as I suggested in the main text, \(f(t) \) is a non-increasing function, then, as demonstrated below, \(\hat{t}_{1} + \hat{t}_2 \ge 1\). This is sufficient for the above reasoning to apply to the more general case: if the other conditions stated above hold, then (a) if the actual collision transformation is non-deflationary, the equivalent beginning-of-the-toss transformation is also non-deflationary, and (b) if the actual collision transformation is deflationary, the equivalent beginning-of-the-toss transformation is no more deflationary.

To show that \(\hat{t}_{1} + \hat{t}_2 \ge 1\): because \(f(t)\) is non-increasing, for any \(a > 0\) and \(b > 0\) (here \(a\) and \(b\) are arbitrary durations, not the collision coefficients)

$$\begin{aligned} \int _{0}^{a}f(t) \,dt \ge \int _{b}^{b+a}f(t) \,dt \text {.}\end{aligned}$$

It follows from the definition of \(F(t)\), applying this inequality with \(a = t_2\) and \(b = t_{1}\) (so that \(b + a = T\)), that

$$\begin{aligned} \hat{t}_{1} + \hat{t}_2&= \frac{F(t_{1}) + F(t_2)}{F(T)}\\&= \frac{\int _{0}^{t_{1}}f(t) \,dt + \int _{0}^{t_2}f(t) \,dt }{\int _{0}^{T}f(t) \,dt }\\&\ge \frac{\int _{0}^{t_{1}}f(t) \,dt + \int _{t_{1}}^{T}f(t) \,dt }{\int _{0}^{T}f(t) \,dt }\\&= 1\text {.}\end{aligned}$$
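A small numerical illustration of this inequality follows. It is a sketch under assumptions not drawn from the text: the decay law \(f(t) = e^{-kt}\) with \(k = 0.8\) is chosen purely as an example of a non-increasing \(f\), \(T\) is taken to be \(t_{1} + t_2\), and the function names are hypothetical.

```python
import math


def normalized_times(F, t1, t2):
    """Return (t1_hat, t2_hat) = (F(t1)/F(T), F(t2)/F(T)), with T = t1 + t2."""
    T = t1 + t2
    return F(t1) / F(T), F(t2) / F(T)


def F_decaying(t, k=0.8):
    """Integral from 0 to t of f(x) = exp(-k*x), an example of a
    non-increasing spin-speed factor (the decay law is an assumption)."""
    return (1 - math.exp(-k * t)) / k


def F_constant(t):
    """Integral from 0 to t of f(x) = 1: the constant-speed case."""
    return t


if __name__ == "__main__":
    print(sum(normalized_times(F_decaying, 1.0, 2.0)))  # at least 1
    print(sum(normalized_times(F_constant, 1.0, 2.0)))  # 1 up to rounding
```

With constant speed the two proportions sum to 1, recovering the earlier definitions; with a decaying speed the sum exceeds 1, because the time after the collision is credited at the faster early-toss rate.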

About this article

Cite this article

Strevens, M. Stochastic Independence and Causal Connection. Erkenn 80 (Suppl 3), 605–627 (2015). https://doi.org/10.1007/s10670-014-9682-1

  • DOI: https://doi.org/10.1007/s10670-014-9682-1
