Abstract
The Minimum Description Length (MDL) principle is the modern formalisation of Occam's razor. It has been extensively and successfully used in machine learning (ML), especially for noisy and long sources of data. However, the MDL principle presents some paradoxes and inconveniences. After discussing all of these, we address two of the most relevant: lack of explanation and lack of creativity. We present new alternatives to address these problems. The first one, intensional complexity, avoids extensional parts in a description, thus distributing the compression ratio more evenly than the MDL principle. The second one, information gain, forces the hypothesis to be informative (or computationally hard to discover) with respect to the evidence, thus giving a formal definition of what it is to discover.
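For context, the MDL principle the abstract refers to is standardly stated as a two-part code-length minimisation (this is the textbook formulation, e.g. as in Rissanen's work, not a formula taken from this paper):

```latex
% Two-part MDL: choose the hypothesis H from a hypothesis class \mathcal{H}
% that minimises the length of H itself plus the length of the data D
% when encoded with the help of H.
H_{\mathrm{MDL}} \;=\; \operatorname*{arg\,min}_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```

The paper's two alternatives modify this trade-off: intensional complexity constrains how the total code length may be distributed between the two terms, and information gain additionally requires that $H$ not be trivially recoverable from $D$.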
Hernández-Orallo, J., García-Varea, I. Explanatory and Creative Alternatives to the MDL Principle. Foundations of Science 5, 185–207 (2000). https://doi.org/10.1023/A:1011350914776