Explanatory and Creative Alternatives to the MDL principle


Abstract

The Minimum Description Length (MDL) principle is the modern formalisation of Occam's razor. It has been used extensively and successfully in machine learning (ML), especially for noisy and long sources of data. However, the MDL principle presents some paradoxes and inconveniences. After discussing these, we address two of the most relevant: lack of explanation and lack of creativity. We present new alternatives to address these problems. The first, intensional complexity, avoids extensional parts in a description, thereby distributing the compression ratio more evenly than the MDL principle. The second, information gain, requires the hypothesis to be informative (or computationally hard to discover) with respect to the evidence, thereby giving a formal definition of what it means to discover.
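For reference, the standard two-part formulation of the MDL principle (the textbook version going back to Rissanen, not this paper's own notation) selects the hypothesis h that minimises the sum L(h) + L(D | h), where L(h) is the code length of the hypothesis itself and L(D | h) is the code length of the data encoded with the help of h. The sketch below is a minimal illustration of this trade-off for polynomial model selection; the fixed bit cost per parameter and the ideal Gaussian code for the residuals are illustrative assumptions, not an encoding proposed by the authors.

```python
import numpy as np

def two_part_mdl_score(x, y, degree, bits_per_param=32):
    """Crude two-part MDL score: L(h) + L(D | h), in bits.

    L(h): a fixed cost per polynomial coefficient (an illustrative
    assumption; real MDL codes are far more careful).
    L(D | h): ideal Gaussian code length of the residuals,
    i.e. (n / 2) * log2(2 * pi * e * sigma^2).
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)
    model_bits = (degree + 1) * bits_per_param
    data_bits = 0.5 * len(y) * np.log2(2 * np.pi * np.e * sigma2)
    return model_bits + data_bits

# Toy usage: noisy quadratic data. The score should favour degree 2,
# since extra coefficients cost more bits than they save on the data.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(0.0, 0.3, x.size)
best = min(range(1, 8), key=lambda d: two_part_mdl_score(x, y, d))
print("degree chosen by the crude MDL score:", best)
```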




About this article

Cite this article

Hernández-Orallo, J., García-Varea, I. Explanatory and Creative Alternatives to the MDL principle. Foundations of Science 5, 185–207 (2000). https://doi.org/10.1023/A:1011350914776
