
The Implications of the No-Free-Lunch Theorems for Meta-induction

Journal for General Philosophy of Science

There is only limited value in knowledge derived from experience. The knowledge imposes a pattern, and falsifies, for the pattern is new in every moment.

T.S. Eliot.

Abstract

The important recent book by Schurz (2019) appreciates that the no-free-lunch (NFL) theorems have major implications for the problem of (meta-)induction. Here I review the NFL theorems, emphasizing that they concern not only the case of a uniform prior: they prove that there are "as many priors" (loosely speaking) for which any induction algorithm A out-generalizes some induction algorithm B as vice versa. Importantly though, in addition to the NFL theorems there are many free-lunch theorems. In particular, the NFL theorems can only be used to compare the expected performance of an induction algorithm A, considered in isolation, with the expected performance of an induction algorithm B, considered in isolation. There is a rich set of free lunches which instead concern the statistical correlations among the generalization errors of induction algorithms. As I describe, the meta-induction algorithms that Schurz advocates as a "solution to Hume's problem" are simply examples of such a free lunch, one based on correlations among the generalization errors of induction algorithms. I end by pointing out that the prior Schurz advocates, which is uniform over bit frequencies rather than bit patterns, is contradicted by thousands of experiments in statistical physics and by the great success of the maximum entropy procedure in inductive inference.
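To make the "considered in isolation" point concrete, the following minimal Python sketch (my illustration, not code from the paper) exhibits the uniform-prior form of NFL on a five-element input space: averaged over all 2^5 possible target functions, an "inductivist" learner that projects the majority training label and an "anti-inductivist" learner that projects the minority label have exactly the same off-training-set error. The domain, training set, and both learners are hypothetical choices made purely for illustration.

```python
# Minimal illustration (not from the paper) of the uniform-prior NFL result:
# averaged over ALL target functions f: X -> {0, 1}, any two learners have
# the same expected off-training-set error.
from itertools import product

X = list(range(5))                            # tiny input space
train_x = [0, 1, 2]                           # training inputs
test_x = [x for x in X if x not in train_x]   # off-training-set inputs

def inductivist(train):
    """Predict the majority label seen in training, everywhere."""
    labels = [y for _, y in train]
    guess = int(2 * sum(labels) >= len(labels))
    return lambda x: guess

def anti_inductivist(train):
    """Predict the minority label seen in training, everywhere."""
    labels = [y for _, y in train]
    guess = int(2 * sum(labels) < len(labels))
    return lambda x: guess

def avg_ots_error(learner):
    """Average off-training-set error under a uniform prior over f."""
    errors = []
    for f in product([0, 1], repeat=len(X)):  # all 2**5 target functions
        h = learner([(x, f[x]) for x in train_x])
        errors.append(sum(h(x) != f[x] for x in test_x) / len(test_x))
    return sum(errors) / len(errors)

print(avg_ots_error(inductivist))       # 0.5
print(avg_ots_error(anti_inductivist))  # 0.5 -- identical, as NFL requires
```

The free lunches the paper emphasizes appear only when one stops scoring each learner in isolation, as above, and instead exploits the correlations among different learners' errors, e.g., via a meta-inducer that shifts weight toward whichever learner has been predicting best.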


Notes

  1. To see this relationship, note that cross-validation chooses among a set of learning algorithms (rather than theories), and does so according to which of them performs best at out-of-sample prediction, evaluating that performance by forming "folds" of the single provided data set; a minimal sketch of this procedure appears after these notes.

  2. As a historical aside, it is interesting that Parrondo went on to make some of the seminal contributions to stochastic thermodynamics and non-equilibrium statistical physics (Parrondo et al. 2015).

  3. The interested reader is directed to Wolpert (1995) and Adam et al. (2019) for a wider-ranging discussion of how to integrate Bayesian and non-Bayesian statistics into an overarching probabilistic model of induction.

  4. The interested reader is directed to Adam et al. (2019) for further discussion of how to reconcile the NFL theorems with computational learning theory.

  5. It is also true if we condition on a particular one of the two allowed f’s, as in sampling theory statistics, in which case the prior is irrelevant, and NFL does not apply.
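As promised in note 1, here is a minimal sketch of fold-based selection among learning algorithms. The data set, the polynomial-regression candidates, and the fold count are all hypothetical choices for illustration; the point is only the mechanic the note describes: each candidate algorithm is scored by out-of-sample prediction on held-out folds of the single provided data set.

```python
# Minimal sketch (illustrative only) of note 1: cross-validation chooses
# among learning ALGORITHMS by their out-of-sample predictive performance
# on "folds" carved from the single provided data set.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, 40)  # one data set, truth unknown

def poly_algorithm(degree):
    """A learning algorithm: (training data) -> (prediction function)."""
    def algo(xs, ys):
        coeffs = np.polyfit(xs, ys, degree)
        return lambda t: np.polyval(coeffs, t)
    return algo

candidates = {"deg-1": poly_algorithm(1),
              "deg-3": poly_algorithm(3),
              "deg-9": poly_algorithm(9)}

def cv_error(algo, k=5):
    """Mean squared error of algo's held-out predictions over k folds."""
    idx = np.arange(len(x))
    fold_errors = []
    for fold in np.array_split(idx, k):       # each fold held out once
        train = np.setdiff1d(idx, fold)
        h = algo(x[train], y[train])
        fold_errors.append(np.mean((h(x[fold]) - y[fold]) ** 2))
    return float(np.mean(fold_errors))

scores = {name: cv_error(algo) for name, algo in candidates.items()}
print(min(scores, key=scores.get), scores)    # best out-of-sample predictor
```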

References

  • Adam, S. P., Alexandropoulos, S. A. N., Pardalos, P. M., & Vrahatis, M. N. (2019). No free lunch theorem: A review. In Approximation and Optimization (pp. 57–82). Springer.

  • Breiman, L. (1996). Stacked regressions. Machine Learning, 24(1), 49–64.

  • Cesa-Bianchi, N., Long, P. M., & Warmuth, M. K. (1996). Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3), 604–619.

  • Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. Journal of the ACM (JACM), 44(3), 427–485.

  • Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge University Press.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

  • Clarke, B. (2003). Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. Journal of Machine Learning Research, 4, 683–712.

  • Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M., & Clauset, A. (2020). Stacking models for nearly optimal link prediction in complex networks. Proceedings of the National Academy of Sciences, 117(38), 23393–23400.


  • Guimerà, R. (2020). One model to rule them all in network science? Proceedings of the National Academy of Sciences, 117(41), 25195–25197.


  • Reichenbach, H. (1938). Experience and prediction: An analysis of the foundations and the structure of knowledge. University of Chicago Press.

  • Harmer, G. P., & Abbott, D. (1999). Losing strategies can win by Parrondo's paradox. Nature, 402(6764), 864.


  • Hume, D. (2003). A treatise of human nature. Courier Corporation.

  • Jaynes, E. T., & Bretthorst, G. L. (2003). Probability theory: The logic of science. Cambridge University Press.

  • Jaynes, E. T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, 4(3), 227–241.


  • Kohavi, R., & Wolpert, D. H. (1996). Bias plus variance decomposition for zero-one loss functions. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML) (pp. 275–283).

  • Kroese, D. P., Botev, Z., Taimre, T., & Vaisman, R. (2019). Data science and machine learning: Mathematical and statistical methods. CRC Press.

  • Parrondo, J. M. R., Horowitz, J. M., & Sagawa, T. (2015). Thermodynamics of information. Nature Physics, 11(2), 131–139.

  • Parrondo, J. M. R., & Español, P. (1996). Criticism of Feynman's analysis of the ratchet as an engine. American Journal of Physics, 64(9), 1125–1130.


  • Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.

  • Rubinstein, R. Y., & Kroese, D. P. (2016). Simulation and the Monte Carlo method (Vol. 10). Wiley.

  • Schurz, G. (2019). Hume’s problem solved: The optimality of meta-induction. MIT Press.

  • Smyth, P., & Wolpert, D. (1999). Linearly combining density estimators via stacking. Machine Learning, 36(1–2), 59–83.


  • Sterkenburg, T. F. (2019). The meta-inductive justification of induction. Episteme.

  • Tracey, B., Wolpert, D., & Alonso, J. J. (2013). Using supervised learning to improve Monte Carlo integral estimation. AIAA Journal, 51(8), 2015–2023.


  • Wolpert, D. H. (1995). The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. In The mathematics of generalization (pp. 117–215). Addison-Wesley.

  • Wolpert, D. H. (1990). The relationship between Occam's razor and convergent guessing. Complex Systems, 4, 319–368.


  • Wolpert, D. H. (1996a). The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7), 1341–1390.


  • Wolpert, D. H. (1996b). The existence of a priori distinctions between learning algorithms. Neural Computation, 8(7), 1391–1420.


  • Wolpert, D. H. (1997). On bias plus variance. Neural Computation, 9, 1211–1244.


  • Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.


  • Wolpert, D. H., & Macready, W. G. (2005). Coevolutionary free lunches. IEEE Transactions on Evolutionary Computation, 9(6), 721–735.


  • Wolpert, D., & Rajnarayan, D. (2013). Using machine learning to improve stochastic optimization. In Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence.

  • Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3), 917–1007.



Acknowledgements

I would like to thank the Santa Fe Institute for support.

Author information

Correspondence to David H. Wolpert.


About this article


Cite this article

Wolpert, D.H. The Implications of the No-Free-Lunch Theorems for Meta-induction. J Gen Philos Sci 54, 421–432 (2023). https://doi.org/10.1007/s10838-022-09609-2
