Abstract
Path-dependence offers a promising way of understanding the role historicity plays in explanation, namely, how the past states of a process can matter in the explanation of a given outcome. The two main existing accounts of path-dependence have sought to present it either in terms of dynamic landscapes or branching trees. However, the notions of landscape and tree both have serious limitations and have been criticized. The framework of causal networks is both more fundamental and more general than that of landscapes and trees. Within this framework, I propose that historicity in networks should be understood as symmetry breaking: history matters when an asymmetric bias towards an outcome emerges in a causal network. This permits a quantitative measure of the degree to which a process is path-dependent, and offers suggestive insights into how historicity is intertwined both with causal structure and complexity.
Notes
In this paper, I take path-dependence to be a property of an explanation or a representation of a process, not a property of the process itself. A process is not identical with the representation of it, and epistemological problems arise because of this disjunction; however, this will not be a concern for the purposes of this paper. In the interests of brevity, I will often refer to representations of processes simply as ‘processes’.
This is partially why historical explanations do not fit the mould of deductive (or even inductive) explanations. The explanandum cannot be deduced from a general principle, or inductively inferred with high probability, but maintains some degree of ‘contingency’.
Many processes in statistical physics and the special sciences are modelled as probabilistic, even though the underlying causal processes may be deterministic. See also footnote 1.
We will see later on that contingency as unpredictability is not sufficient either: some probabilistic processes are ahistorical.
In brief, a function is linear when \(f(x+y) = f(x)+f(y)\); thus when a function is nonlinear, a slight change in input will lead to an effect that is not linearly proportionate, and could potentially be very large. When a function is discontinuous, some modifications of the input, no matter how slight, will lead to relatively large effects. If a process is nonlinear but continuous, small changes will still lead to small effects; however, in a discontinuous process, some changes, no matter how small, will lead to large effects, even if the process is otherwise linear.
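The contrast between linearity, nonlinearity, and discontinuity can be made concrete with a small sketch; the particular functions below are illustrative choices of the editor, not taken from the paper:

```python
def linear(x):
    return 3.0 * x            # additivity f(x + y) = f(x) + f(y) holds exactly

def nonlinear_continuous(x):
    return x ** 2             # additivity fails, but small changes -> small effects

def discontinuous(x):
    return 0.0 if x < 1.0 else 100.0   # a step: a tiny change across x = 1 gives a large jump

# Additivity: exact for the linear map, violated by the nonlinear one.
assert linear(2.0 + 3.0) == linear(2.0) + linear(3.0)
assert nonlinear_continuous(2.0 + 3.0) != nonlinear_continuous(2.0) + nonlinear_continuous(3.0)

# Continuity: a tiny perturbation barely moves the continuous map,
# but carries the discontinuous one across its threshold.
eps = 1e-9
assert abs(nonlinear_continuous(1.0 + eps) - nonlinear_continuous(1.0)) < 1e-6
assert discontinuous(1.0) - discontinuous(1.0 - eps) == 100.0
```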
See also Fig. 6.
Effective elimination is what Bassanini and Dosi (1999, p. 15) call asymptotic path-independence, which occurs when two possible trajectories come arbitrarily close within a finite time-span, and for an infinite number of times thereafter. (If the dynamics is Markovian, then this condition reduces to the following: two possible trajectories intersect in finite time, because once there is a single intersection, it is expected that the paths will overlap for all subsequent times.) If this condition is met, then the difference an initial condition makes on a subsequent history is eliminated in finite time. In this way, weak path-independence is a form of ergodicity.
Compare with Doeblin’s theorem in the theory of Markov processes (e.g. Stroock 2005).
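The Doeblin-style behaviour can be illustrated numerically: in a Markov chain whose transition matrix has all entries positive, trajectories started from different initial conditions converge, so the difference the initial condition makes is effectively eliminated. A minimal sketch, with an illustrative 2-state transition matrix chosen by the editor:

```python
# T[i][j] = P(next state = j | current state = i); all entries positive,
# so Doeblin's condition holds and the chain forgets its initial condition.
T = [[0.9, 0.1],
     [0.2, 0.8]]

def step(dist, T):
    """Advance a probability distribution one step through the chain."""
    return [sum(dist[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]

# Two maximally different initial conditions.
d1, d2 = [1.0, 0.0], [0.0, 1.0]
for _ in range(100):
    d1, d2 = step(d1, T), step(d2, T)

# After many steps the two distributions are numerically indistinguishable:
# the initial condition no longer makes a difference.
assert max(abs(a - b) for a, b in zip(d1, d2)) < 1e-9
```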
Compare this with the analysis of conservative vector fields: if a dynamics can be represented as the gradient of a scalar, then it is path-independent.
For a more mathematical characterization, see Desjardins (2011a).
Also, it can be shown that maximal divergence is, perhaps surprisingly, a case of maximal path-independence (see Fig. 8).
Another implication is that while history may matter for the occurrence of some intermediary state, it is impossible for history to matter for an outcome at some time in the past but not ultimately (compare with Desjardins 2015).
In this way, while mass extinctions introduce contingency into evolution (as famously emphasized by Gould 1989), to the extent that they make the reconstruction of the past more difficult, they actually remove some degree of historicity.
By Bayes’ rule, \(P(s_1|o_2) = \frac{P(o_2|s_1)P(s_1)}{P(o_2)} = \frac{2/3 \cdot 1/3}{4/9} = 1/2\).
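The computation can be checked with exact arithmetic. The two-state setup below is what the note's numbers imply, reconstructed by the editor: priors \(P(s_1)=1/3\), \(P(s_2)=2/3\), and likelihoods \(P(o_2|s_1)=2/3\), \(P(o_2|s_2)=1/3\), which together yield \(P(o_2)=4/9\):

```python
from fractions import Fraction

# Assumed priors and likelihoods (reconstructed from the note's figures).
p_s1 = Fraction(1, 3)
p_s2 = Fraction(2, 3)
p_o2_given_s1 = Fraction(2, 3)
p_o2_given_s2 = Fraction(1, 3)

# Total probability of the outcome o2.
p_o2 = p_o2_given_s1 * p_s1 + p_o2_given_s2 * p_s2
assert p_o2 == Fraction(4, 9)

# Bayes' rule: posterior probability of the past state s1 given o2.
p_s1_given_o2 = p_o2_given_s1 * p_s1 / p_o2
assert p_s1_given_o2 == Fraction(1, 2)
```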
The analysis given in Brown (2014) can be seen as dealing with this causal structure.
For the derivation, see e.g. Cover and Thomas (2006), Chapt. 2.
See Cover and Thomas (2006).
The technical expression is that mutual information is the expectation over S of the Kullback-Leibler divergence between the conditional distribution p(O|S) and the distribution p(O):
$$\begin{aligned} I(O;S) = \mathbb {E}_S \left[ D_{KL}\left( p(o|s)\,||\,p(o)\right) \right] . \end{aligned}$$This is simply a quantitative expression of how much the conditional probability distribution is expected to diverge from the unconditional distribution, ‘from the perspective’ of some time in the past.
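As a sanity check, the expected-KL form can be verified against the standard joint-distribution formula for mutual information; the 2-by-2 joint distribution below is an illustrative assumption of the editor:

```python
import math

# Illustrative joint distribution over a past state S and outcome O.
p_joint = {("s1", "o1"): 0.3, ("s1", "o2"): 0.2,
           ("s2", "o1"): 0.1, ("s2", "o2"): 0.4}

# Marginal distributions p(S) and p(O).
p_s = {s: sum(p for (si, _), p in p_joint.items() if si == s) for s in ("s1", "s2")}
p_o = {o: sum(p for (_, oi), p in p_joint.items() if oi == o) for o in ("o1", "o2")}

# I(O;S) as the expectation over S of D_KL(p(O|s) || p(O)).
mi_kl = 0.0
for s in p_s:
    for o in p_o:
        p_o_given_s = p_joint[(s, o)] / p_s[s]
        mi_kl += p_s[s] * p_o_given_s * math.log(p_o_given_s / p_o[o])

# I(O;S) via the standard formula: sum of p(s,o) log[p(s,o) / (p(s)p(o))].
mi_direct = sum(p * math.log(p / (p_s[s] * p_o[o]))
                for (s, o), p in p_joint.items())

assert abs(mi_kl - mi_direct) < 1e-12
```

The two computations agree term by term, since \(p(o|s) = p(s,o)/p(s)\).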
See Sober and Steel (2011) for a related analysis of entropy change in Markov models. Since causal networks are Markovian, many of their results would also be applicable here.
“C’est la dissymétrie qui crée le phénomène.” (Curie 1894, 400).
Or, if one adheres to the Many-Worlds Interpretation, both these paths are realized, but in different parallel universes.
Whether the mathematization corresponds to causal reality is a different question. This is related to the debate whether drift and natural selection actually pick out causal forces in reality, or are just a statistical abstraction from the high-dimensional state space (see e.g. Matthen 2009 for the relation between abstraction and natural selection).
References
Arthur, B. W. (1994). Increasing returns and path dependence in the economy. Ann Arbor: University of Michigan Press.
Bassanini, A., & Dosi, G. (1999). When and how chance and human will can twist the arms of Clio. LEM Working Paper series 05, Sant’Anna School of Advanced Studies, Pisa.
Beatty, J. (1995). The evolutionary contingency thesis. In G. Wolters & J. G. Lennox (Eds.), Concepts, theories and rationality in the biological sciences (pp. 45–81). Pittsburgh: University of Pittsburgh Press.
Beatty, J. (2006). Replaying life’s tape. Journal of Philosophy, 103, 336–362.
Beatty, J., & Desjardins, E. (2009). Natural selection and history. Biology and Philosophy, 24, 231–246. doi:10.1007/s10539-008-9149-3.
Ben-Menahem, Y. (1997). Historical contingency. Ratio, 10(2), 99–107. doi:10.1111/1467-9329.00032.
Brading, K., & Castellani, E. (Eds.). (2003). Symmetries in physics: philosophical reflections. Cambridge: Cambridge University Press.
Brading, K., & Castellani, E. (2007). Symmetries and invariances in classical physics. In J. Butterfield & J. Earman (Eds.), Handbook of the Philosophy of Science. Philosophy of Physics (pp. 1331–1367). Amsterdam: North Holland, Elsevier.
Brown, R. (2014). What evolvability really is. British Journal for the Philosophy of Science, 65, 549–572. doi:10.1093/bjps/axt014.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. Hoboken, NJ: Wiley.
Curie, P. (1894). Sur la symétrie dans les phénomènes physiques, symétrie d’un champ électrique et d’un champ magnétique. Journal de Physique Théorique et Appliquée, 3(1), 393–415.
David, P. A. (1985). Clio and the economics of QWERTY. American Economic Review, 75, 332–337.
Desjardins, E. (2011a). Historicity and experimental evolution. Biology and Philosophy, 26, 339–364.
Desjardins, E. (2011b). Reflections on path dependence and irreversibility: Lessons from evolutionary biology. Philosophy of Science, 78, 724–738.
Desjardins, E. (2015). Historicity and ecological restoration. Biology and Philosophy, 30, 77–98. doi:10.1007/s10539-014-9467-6.
Earman, J. (2004). Curie’s principle and spontaneous symmetry breaking. International Studies in the Philosophy of Science, 18, 173–198. doi:10.1080/0269859042000311299.
Ereshefsky, M. (2012). Homology thinking. Biology and Philosophy, 27, 381–400. doi:10.1007/s10539-012-9313-7.
Gavrilets, S. (2004). Fitness landscapes and the origin of species. Princeton: Princeton University Press.
Gould, S. J. (1989). Wonderful life: The burgess shale and the nature of history. New York: W. W. Norton & Company Ltd.
Kaplan, J. (2008). The end of the adaptive landscape metaphor? Biology and Philosophy, 23, 625–638.
Longo, G., & Montévil, M. (2011). From physics to biology by extending criticality and symmetry breakings. Progress in Biophysics and Molecular Biology, 106(2), 340–347.
MacKay, D. J. C. (2003). Information theory, inference, and learning algorithms. New York: Cambridge University Press.
Matthen, M. (2009). Drift and ‘statistically abstractive explanation’. Philosophy of Science, 76, 464–487.
Moret, B. M. E., Nakhleh, L., Warnow, T., Linder, C. R., Tholse, A., Padolina, A., et al. (2004). Phylogenetic networks: Modeling, reconstructibility, and accuracy. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 13–23.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Pierson, P. (2004). Politics in time: History, institutions, and social analysis. Princeton: Princeton University Press.
Pigliucci, M., & Kaplan, J. (2006). Making sense of evolution. Chicago: University of Chicago Press.
Plutynski, A. (2008). The rise and fall of the adaptive landscape? Biology and Philosophy, 23, 605–623. doi:10.1007/s10539-008-9128-8.
Powell, R. (2012). Convergent evolution and the limits of natural selection. European Journal for the Philosophy of Science, 2, 355–373. doi:10.1007/s13194-012-0047-9.
Ruse, M. (1996). Are pictures really necessary? The case of Sewall Wright’s ‘adaptive landscapes’. In B. S. Baigrie (Ed.), Picturing knowledge: Historical and philosophical problems concerning the use of art in science (pp. 303–337). Toronto: University of Toronto Press.
Skipper, R. A. (2004). The heuristic role of Sewall Wright’s 1932 adaptive landscape diagram. Philosophy of Science, 71, 1176–1188.
Sober, E. (1983). Equilibrium explanation. Philosophical Studies, 43, 201–210.
Sober, E. (1988). Reconstructing the past: Parsimony, evolution, and inference. Cambridge, MA: MIT Press.
Sober, E., & Steel, M. (2011). Entropy increase and information loss in Markov models of evolution. Biology and Philosophy, 26, 223–250. doi:10.1007/s10539-010-9239-x.
Strevens, M. (2006). Bigger than Chaos: Understanding complexity through probability. Cambridge, MA: Harvard University Press.
Stroock, D. W. (2005). An introduction to Markov processes. New York: Springer.
Szathmáry, E. (2006). Path dependence and historical contingency in biology. In A. Wimmer & R. Kössler (Eds.), Understanding change: Models, methodologies, and metaphors (pp. 140–157). New York: Palgrave Macmillan.
Velasco, J. D., & Sober, E. (2010). Testing for treeness: Lateral gene transfer, phylogenetic inference, and model selection. Biology and Philosophy, 25, 675–687. doi:10.1007/s10539-010-9222-6.
Wilkins, J. F., & Godfrey-Smith, P. (2009). Adaptationism and the adaptive landscape. Biology and Philosophy, 24, 199–214. doi:10.1007/s10539-008-9147-5.
Young, N. M., & Hallgrímsson, B. (2005). Serial homology and the evolution of mammalian limb covariation structure. Evolution, 59(12), 2691–2704.
Acknowledgments
I wish to thank Andreas De Block, Grant Ramsey and Michael Strevens for interesting discussions about related topics. Support for this research was generously provided by Research Foundation Flanders (FWO).
Appendix
Theorem 1
A coarse-graining of the explanandum makes an explanation increasingly convergent, and a coarse-graining of the explanans makes an explanation increasingly divergent.
Proof
We will prove it for an explanation that is purely parallel, thus neither convergent nor divergent. The generalization for a random explanation holds analogously.
Assume a deterministic explanation (O, I, f), so that f is a bijection \(f:I\rightarrow O\). Define an equivalence relation \(\sim \) on O such that \(o_1 \sim o_2\) iff \(o_1, o_2 \in A\) for some subset \(A \subseteq O\) (dependent on theoretical interests) with \(\#A >1\). Because f is a bijection, there exists a uniquely defined \(B \subseteq I\) such that \(f(B) = A\). Call B the ‘basin’ and A the ‘attractor’ of f on I.
Then \(O/A\) represents a coarse-graining of the explanandum and \(I/B\) a coarse-graining of the explanans. So define an associated function \(R_c: I \rightarrow O/A: i \mapsto [f(i)]\) and relation \(R_d \subseteq I/B \times O\), \(R_d = \{(f^{-1}(o),o) \mid o\in O\}\). Because f is a bijection, \(\#I = \#O > \#(O/A)\) and \(\#O = \#I > \#(I/B)\); hence \(R_c\) is a non-injective surjection, and \(R_d\) is not a function. Hence the number of convergent structures has increased in the explanation \((O/A, I, R_c)\), and the number of divergent structures has increased in \((O,I/B,R_d)\). \(\square \)
Theorem 2
Let (O, I, R) be symmetric at some instant in time. Then (O, I, R) is symmetric at all prior instants.
Proof
Assume (O, I, R) is symmetric at time t, corresponding to the set of intermediate states S. Let \(S'\) represent some earlier generation of states. From the local symmetry of (O, I, R) at S we can deduce that \(P(o|s^*) = p \in [0,1]\) for all \(s^* \in S\).
Take a random predecessor state \(s' \in S'\), and assume it branches out to a number of states \(s^* \in S\). Then
$$\begin{aligned} P(o|s') = \sum _{s^* \in S} P(o|s^*)P(s^*|s') = p \sum _{s^* \in S} P(s^*|s') = p, \end{aligned}$$since the sum of the probabilities of all paths leaving \(s'\) is 1. Thus the network is symmetric at \(S'\).
This also means that the bias p towards outcome o is preserved as long as the network remains symmetric. \(\square \)
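The preservation of the bias p can be checked numerically. In the sketch below, every intermediate state carries the same bias \(P(o|s^*) = p\), and the randomly drawn branching probabilities for a predecessor state are an illustrative assumption:

```python
import random

random.seed(0)
p = 0.7                       # common bias P(o | s*) for every s* in S

for _ in range(5):            # try several random predecessor states s'
    n_branches = random.randint(2, 6)
    weights = [random.random() for _ in range(n_branches)]
    total = sum(weights)
    branch_probs = [w / total for w in weights]   # P(s* | s'), summing to 1

    # P(o | s') = sum over s* of P(o | s*) P(s* | s') = p * 1 = p:
    # whatever the branching structure, the predecessor inherits the bias.
    p_o_given_s_prime = sum(p * q for q in branch_probs)
    assert abs(p_o_given_s_prime - p) < 1e-12
```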
Desmond, H. Symmetry breaking and the emergence of path-dependence. Synthese 194, 4101–4131 (2017). https://doi.org/10.1007/s11229-016-1130-0