I evaluate Schurz's proposed meta-inductive justification of induction, a refinement of Reichenbach's pragmatic justification that rests on results from the machine learning branch of prediction with expert advice. My conclusion is that the argument, suitably explicated, comes remarkably close to its grand aim: an actual justification of induction. This finding, however, is subject to two main qualifications, and still disregards one important challenge. The first qualification concerns the empirical success of induction. Even though, I argue, Schurz's argument does not need to spell out what the inductive method actually consists in, it does need to postulate that there is something like the inductive or scientific prediction strategy that has so far been significantly more successful than alternative approaches. The second qualification concerns the difference between having a justification for the inductive method as such and having a justification for sticking with induction for now: Schurz's argument can only provide the latter. Finally, the remaining challenge concerns the pool of alternative strategies and the relevant notion of a meta-inductivist's optimality that features in the analytic step of Schurz's argument. Building on the work done here, I argue in a follow-up paper that the argument needs a stronger, dynamic notion of a meta-inductivist's optimality.
This article poses a challenge to Schurz’s proposed meta-inductive justification of induction. It is argued that Schurz’s argument requires a notion of optimality that can deal with an expanding pool of prediction strategies.
The no-free-lunch theorems promote a skeptical conclusion that all possible machine learning algorithms equally lack justification. But how could this leave room for a learning theory that shows that some algorithms are better than others? Drawing parallels to the philosophy of induction, we point out that the no-free-lunch results presuppose a conception of learning algorithms as purely data-driven. On this conception, every algorithm must have an inherent inductive bias that requires justification. We argue that many standard learning algorithms should rather be understood as model-dependent: in each application they also require a model as input, representing a bias. Being generic algorithms themselves, they can be given a model-relative justification.
Douven (in press) observes that Schurz's meta-inductive justification of induction cannot explain the great empirical success of induction, and offers an explanation based on computer simulations of the social and evolutionary development of our inductive practices. In this paper, I argue that Douven's account does not address the explanatory question that Schurz's argument leaves open, and that the assumption of the environment's induction-friendliness that is inherent to Douven's simulations is not justified by Schurz's argument.
In this thesis I investigate the theoretical possibility of a universal method of prediction. A prediction method is universal if it is always able to learn from data: if it is always able to extrapolate given data about past observations to maximally successful predictions about future observations. The context of this investigation is the broader philosophical question of the possibility of a formal specification of inductive or scientific reasoning, a question that also relates to modern-day speculation about a fully automated data-driven science. I investigate, in particular, a proposed definition of a universal prediction method that goes back to Solomonoff and Levin. This definition marks the birth of the theory of Kolmogorov complexity, and has a direct line to the information-theoretic approach in modern machine learning. Solomonoff's work was inspired by Carnap's program of inductive logic, and the more precise definition due to Levin can be seen as an explicit attempt to escape the diagonal argument that Putnam famously launched against the feasibility of Carnap's program. The Solomonoff-Levin definition essentially aims at a mixture of all possible prediction algorithms. An alternative interpretation is that the definition formalizes the idea that learning from data is equivalent to compressing data. In this guise, the definition is often presented as an implementation and even as a justification of Occam's razor, the principle that we should look for simple explanations. The conclusions of my investigation are negative. I show that the Solomonoff-Levin definition fails to unite two necessary conditions to count as a universal prediction method, as turns out to be entailed by Putnam's original argument after all; and I argue that this indeed shows that no definition can.
Moreover, I show that the suggested justification of Occam's razor does not work, and I argue that the relevant notion of simplicity as compressibility is already problematic itself.
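The Solomonoff-Levin definition discussed in this abstract can be stated, in its standard mixture form, roughly as follows (a sketch in the usual notation of the Kolmogorov-complexity literature; the particular choice of weights is one common convention, not the thesis's own):

```latex
% Universal mixture over an effective enumeration (nu_i)_i of all
% lower semicomputable semimeasures, with weights w_i > 0 summing
% to at most 1 (e.g., w_i = 2^{-K(i)} for prefix complexity K):
\[
  M(x) \;=\; \sum_{i} w_i \, \nu_i(x).
\]
% Prediction of the next symbol b after data x proceeds by conditioning:
\[
  M(b \mid x) \;=\; \frac{M(xb)}{M(x)}.
\]
```

Since the enumeration ranges over all lower semicomputable semimeasures, this is one precise sense in which the definition "aims at a mixture of all possible prediction algorithms."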
Putnam construed the aim of Carnap’s program of inductive logic as the specification of a “universal learning machine,” and presented a diagonal proof against the very possibility of such a thing. Yet the ideas of Solomonoff and Levin lead to a mathematical foundation of precisely those aspects of Carnap’s program that Putnam took issue with, and in particular, resurrect the notion of a universal mechanical rule for induction. In this paper, I take up the question whether the Solomonoff–Levin proposal is successful in this respect. I expose the general strategy to evade Putnam’s argument, leading to a broader discussion of the outer limits of mechanized induction. I argue that this strategy ultimately still succumbs to diagonalization, reinforcing Putnam’s impossibility claim.
Algorithmic information theory gives an idealized notion of compressibility that is often presented as an objective measure of simplicity. It is suggested at times that Solomonoff prediction, or algorithmic information theory in a predictive setting, can deliver an argument to justify Occam’s razor. This article explicates the relevant argument and, by converting it into a Bayesian framework, reveals why it has no such justificatory force. The supposed simplicity concept is better perceived as a specific inductive assumption, the assumption of effectiveness. It is this assumption that is the characterizing element of Solomonoff prediction and wherein its philosophical interest lies.
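The argument at issue here typically starts from Solomonoff's convergence theorem, which can be stated as follows (a standard formulation from the literature, not this article's own result; M is the universal mixture and K(mu) the prefix complexity of an index for mu):

```latex
% For any computable measure mu generating the data, the expected
% squared prediction error of M is bounded by mu's complexity:
\[
  \sum_{t=1}^{\infty}
  \mathbf{E}_{\mu}\!\left[\bigl(M(0 \mid x_{<t}) - \mu(0 \mid x_{<t})\bigr)^{2}\right]
  \;\le\; \frac{\ln 2}{2}\, K(\mu).
\]
```

The bound's dependence on K(mu) is what lends the argument its Occamist flavor: simpler (more compressible) environments come with tighter error guarantees.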
An aspect of Peirce’s thought that may still be underappreciated is his resistance to what Levi calls _pedigree epistemology_: the idea that a central focus in epistemology should be the justification of current beliefs. Somewhat more widely appreciated is his rejection of the subjective view of probability. We argue that Peirce’s criticisms of subjectivism, to the extent that they grant that such a conception of probability is viable at all, revert back to pedigree epistemology. A thoroughgoing rejection of pedigree in the context of probabilistic epistemology, however, _does_ challenge prominent subjectivist responses to the problem of the priors.
Wenmackers and Romeijn (2016) formalize ideas going back to Shimony (1970) and Putnam (1963) into an open-minded Bayesian inductive logic that can dynamically incorporate statistical hypotheses proposed in the course of the learning process. In this paper, we show that Wenmackers and Romeijn’s proposal does not preserve the classical Bayesian consistency guarantee of merger with the true hypothesis. We diagnose the problem, and offer a forward-looking open-minded Bayesian inductive logic that does preserve a version of this guarantee.
We study computable PAC (CPAC) learning as introduced by Agarwal et al. (2020). First, we consider the main open question of finding characterizations of proper and improper CPAC learning. We give a characterization of a closely related notion of strong CPAC learning, and provide a negative answer to the COLT open problem posed by Agarwal et al. (2021) of whether all decidably representable VC classes are improperly CPAC learnable. Second, we consider undecidability of (computable) PAC learnability. We give a simple general argument to exhibit such undecidability, and initiate a study of the arithmetical complexity of learnability. We briefly discuss the relation to the undecidability result of Ben-David et al. (2019), which motivated the work of Agarwal et al.
An a priori semimeasure (also known as “algorithmic probability” or “the Solomonoff prior” in the context of inductive inference) is defined as the transformation, by a given universal monotone Turing machine, of the uniform measure on the infinite strings. It is shown in this paper that the class of a priori semimeasures can equivalently be defined as the class of transformations, by all compatible universal monotone Turing machines, of any continuous computable measure in place of the uniform measure. Some consideration is given to possible implications for the association of algorithmic probability with certain foundational principles of statistics.
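The defining transformation in this abstract can be written out as follows (a standard rendering; the notation, with lambda for the uniform measure and U(alpha) ⪰ x for "U's output on input stream alpha extends x," is mine):

```latex
% The a priori semimeasure induced by a universal monotone machine U:
% the probability that U, fed uniformly random input bits, produces an
% output beginning with the finite string x.
\[
  M_U(x) \;=\; \lambda\bigl(\{\alpha \in \{0,1\}^{\infty} : U(\alpha) \succeq x\}\bigr)
\]
% The paper's equivalence: the same class \{M_U : U \text{ universal}\}
% arises when lambda is replaced by any continuous computable measure mu,
% now varying over all compatible universal monotone machines:
\[
  x \;\mapsto\; \mu\bigl(\{\alpha \in \{0,1\}^{\infty} : U(\alpha) \succeq x\}\bigr)
\]
```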