Software Intensive Science

  • Research Article
  • Published in Philosophy & Technology

Abstract

This paper argues that the difference between contemporary software intensive scientific practice and more traditional non-software intensive varieties results from the characteristically high conditionality of software. We explain why the path complexity of programs with high conditionality imposes limits on standard error correction techniques and why this matters. While it is possible, in general, to characterize the error distribution in inquiry that does not involve high conditionality, we cannot characterize the error distribution in inquiry that depends on software. Software intensive science presents distinctive error and uncertainty modalities that pose new challenges for the epistemology of science.

Notes

  1. See Symons 2008 for a discussion of how computational models have figured in the metaphysics and epistemology of science.

  2. Software has begun to perform some of the functions that, in the pre-software era, were considered distinctively human aspects of science. For example, Michael Schmidt and Hod Lipson described how their program Eureqa inferred Newton’s second law and the law of conservation of momentum from descriptions of the behaviour of a double-pendulum system (Schmidt and Lipson 2009). More recently, Eugen Lounkine and colleagues demonstrated a model that was able to predict unforeseen side-effects of pharmaceuticals already approved for consumption (Lounkine et al. 2012). These two papers represent very different examples of software intensive science: one is a system capable of generating theoretical insights and law-like relationships from a data set, while the other makes dramatic progress on a specific practical question of great importance. Examples like these indicate that across a broad swath of scientific endeavor, from highly theoretical to applied science, inquiry itself is no longer purely a matter of individual or collective human effort. Across the sciences, software-intensive systems are increasingly driving the direction of research and in some cases are already beginning to displace human researchers. Unlike previous improvements in scientific technology, computers not only extend our capacities but also take on at least some of the cognitive aspects of theoretical work in the sciences. Fundamental to understanding the character of post-human science is careful attention to the nature of its distinctive kinds of error and uncertainty.

  3. Of course, there are trivial counterexamples, such as a program consisting of the same instruction, say “a = 2”, repeated an arbitrary number of times. Such examples are not representative of typical or even useful software, and they certainly have no role in scientific inquiry.

  4. As we use the term here, a method is effective for a class of problems iff (Hunter 1971, pp. 13–15; see the sketch after this list):

    • it consists of a finite number of exact, finite instructions

    • when applied to a problem from its class, it always finishes (terminates) after a finite number of steps

    • when applied to a problem from its class, it always produces a correct answer
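
To make the definition concrete, here is a minimal sketch (our illustration, not Hunter’s): Euclid’s algorithm for the greatest common divisor, written in Python, meets all three conditions. The function name gcd and the sample inputs are ours.

```python
# Minimal illustration of an effective method (ours, not Hunter's):
# Euclid's algorithm for the greatest common divisor. It is a finite
# list of exact instructions, it terminates on every pair of positive
# integers, and it always returns the correct answer.
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers."""
    while b != 0:        # terminates: the second argument strictly decreases
        a, b = b, a % b
    return a

assert gcd(48, 36) == 12   # a correct answer after finitely many steps
```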

  5. A computer language is Turing complete if it can simulate any single-tape Turing machine (Boolos et al. 2002), i.e., if it can compute anything a Turing machine can compute. Being Turing complete is a condition of adequacy for being a general-purpose computer language.

  6. Here’s why: the equivalent of the “if-then” schema is realizable in a Turing machine; e.g., “not-x or y”, which is logically equivalent to “if x, then y”, is representable in a Turing machine (Boolos et al. 2002). Since “if x, then y, else z” is just the conjunction of “if x, then y” and “if not-x, then z”, any Turing complete language must be able to simulate the “if-then-else” schema. How, specifically, one maps the Turing-machine equivalent of the “if-then” schema into a particular Turing complete language will in general depend on the particulars of the language of interest. For the purposes of this paper, we do not need the precise details of those mappings: it is enough that such mappings exist.
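
The logical equivalence appealed to here can be checked exhaustively. The following sketch (ours, for illustration only) confirms that “not-x or y” and an “if-then-else” realization of “if x, then y” agree on every assignment of truth values:

```python
# Illustration (ours): "not-x or y" and an if-then-else realization of
# "if x, then y" agree on all Boolean inputs.
for x in (False, True):
    for y in (False, True):
        material = (not x) or y       # "not-x or y"
        branched = y if x else True   # "if x then y, else (vacuously) true"
        assert material == branched
print("x -> y is equivalent to (not x) or y for every truth-value assignment")
```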

  7. More precisely, the sample size required to attain a given confidence level is a function of the distribution of interest.
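
For a standard textbook illustration (not specific to this paper): to estimate the mean of a normal distribution with known standard deviation σ to within a margin of error E at confidence level 1 − α, one needs roughly n ≥ (z_{α/2} · σ / E)² observations, whereas estimating a Bernoulli proportion p to the same margin requires roughly n ≥ z_{α/2}² · p(1 − p) / E². Different distributional assumptions thus yield different sample-size requirements.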

  8. One can distinguish several kinds of testing in terms of properties of control-flow graphs (Nielson et al. 1999). By “testing every path” in a software system, we mean “executing, and analyzing the results of executing, all edges and all combinations of condition-edges in the control-flow-graph representation of the software system of interest.”
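
As a schematic illustration (ours) of why exhaustive path testing scales so badly: a function with k independent conditionals already has 2^k combinations of branch outcomes to cover. The toy function classify below is hypothetical.

```python
# Sketch (ours): exhaustive coverage of branch combinations for a toy
# function with three independent conditionals requires 2**3 = 8 cases.
from itertools import product

def classify(a: bool, b: bool, c: bool) -> int:
    score = 0
    if a:
        score += 1
    if b:
        score += 2
    if c:
        score += 4
    return score

cases = list(product((False, True), repeat=3))        # all branch combinations
results = {args: classify(*args) for args in cases}   # "execute and analyze" each
assert len(results) == 2 ** 3
```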

  9. The path-test cases for some software could be executed in parallel (Hennessy and Patterson 2007, p. 68); in theory, given a large enough parallel machine, all path tests in such a case could be executed in the time it takes to execute one test case. But these are special cases. In general, the tests must be executed serially.
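
For the special case in which path-test cases really are independent, a parallel harness is easy to sketch (our illustration; the toy_branching function is hypothetical):

```python
# Sketch (ours): running independent path-test cases in parallel.
# In general, as noted above, tests must be executed serially.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def toy_branching(a: bool, b: bool, c: bool) -> int:
    return (1 if a else 0) + (2 if b else 0) + (4 if c else 0)

cases = list(product((False, True), repeat=3))
with ThreadPoolExecutor(max_workers=len(cases)) as pool:
    outcomes = list(pool.map(lambda args: toy_branching(*args), cases))
assert len(outcomes) == len(cases)
```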

  10. High path complexity is not the only aspect of SIS that has not received adequate attention to date by philosophers. As one anonymous referee for this paper points out, the high variability in the methods, algorithms, and language choices evident in SIS also has no counterpart in NSIS, leading to, among other things, fundamental questions of commensurability among different software systems that nominally concern the same subject matter. For example, there are at least 10 widely used numerical methods for solving systems of partial differential equations, and the results they produce are in general not identical (Morton and Mayers 2005). In addition, simply changing the computer language in which an algorithm is realized is not, for some pairs of languages such as Fortran 77 and C, even well-defined because the language standards do not provide an adequate basis for inter-language translation of certain numerical types (ANSI 1977; ISO/IEC 2005; Feldman et al. 1990). Problems of this kind have led to serious errors whose origin is quite difficult to isolate in practice. (No such problems arise in NSIS.) All these issues clearly bear on the reliability of software and scientific inferences based on the use of software. These topics merit careful treatment in their own right. Here, however, we focus on the distinctively high conditionality of SIS.
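
To illustrate the point about non-identical results in a deliberately simplified setting (our analogue, not one of the partial-differential-equation methods discussed in Morton and Mayers 2005): two standard discretizations of the same toy problem, dy/dt = −y with y(0) = 1, already produce numerically different answers after ten steps.

```python
# Simplified analogue (ours): two standard discretizations of dy/dt = -y,
# y(0) = 1, give numerically different results for the same problem.
def forward_euler(steps: int, h: float = 0.1) -> float:
    y = 1.0
    for _ in range(steps):
        y += h * (-y)           # explicit update
    return y

def backward_euler(steps: int, h: float = 0.1) -> float:
    y = 1.0
    for _ in range(steps):
        y /= (1.0 + h)          # implicit update, solvable in closed form here
    return y

print(forward_euler(10), backward_euler(10))  # ~0.3487 vs ~0.3855 (exact: e**-1 ~ 0.3679)
```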

  11. For a derivation, see Hogg et al. 2005 (Sections 2.6 and 9.4).

  12. An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function (Boolos et al. 2002).

  13. This does not imply, of course, that the software does not have the same error distribution as M: it merely means that we would not have a warrant to make the inference that the software has the error distribution of M, on the basis of the procedure.

  14. The existence of a requirement for high confidence does not, as such, imply that this requirement is satisfied.

  15. In some computer languages, this can be done by implicitly accepting the default specification. So-called “interpreted” languages, which include many of the widely used scripting languages in the UNIX family of operating environments, determine “type” only during execution. Type management in these contexts is obviously fragile.
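
A minimal sketch of the fragility (ours; the variable names are invented): in a dynamically typed scripting language such as Python, a value that arrives as a string, say from a configuration file or an environment variable, is compared lexicographically rather than numerically unless it is explicitly converted.

```python
# Illustration (ours): "type" is fixed only at execution time, so the same
# comparison can silently mean something different than intended.
threshold = "10"     # e.g. read from a configuration file as a string
measured = "9"

print(measured > threshold)              # True  -- lexicographic string comparison
print(int(measured) > int(threshold))    # False -- the numeric comparison intended
```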

References

  • Alexandrova, A. (2008). Making models count. Philosophy of Science, 75(3), 383–404.

  • ANSI. (1977). American National Standard Programming Language Fortran. ANSI, X3, 9–1977.

  • Batterman, R. W. (2009). Idealization and modeling. Synthese, 169(3), 427–446.

  • Black, R., van Veenendaal, E., & Graham, D. (2012). Foundations of software testing: ISTQB certification. Cengage Learning EMEA.

  • Bokulich, A. (2011). How scientific models can explain. Synthese, 180(1), 33–45.

  • Bolinska, A. (2013). Epistemic representation, informativeness and the aim of faithful representation. Synthese, 190(2), 219–234.

  • Boolos, G., Burgess, J., & Jeffrey, R. (2002). Computability and Logic (4th ed.). Cambridge: Cambridge University Press.

  • Boschetti, F., Fulton, E. A., Bradbury, R. H., & Symons, J. (2012). What is a model, why people don’t trust them, and why they should. In Negotiating our future: Living scenarios for Australia to 2050 (Vol. 2, pp. 107–119). Australian Academy of Science.

  • Center for Systems and Software Engineering, University of Southern California. (2013). COCOMO II. http://csse.usc.edu/csse/research/COCOMOII/cocomo_main.html.

  • Chakravartty, A. (2011). Scientific realism. In Stanford encyclopedia of philosophy. E. Zalta (Ed.). http://plato.stanford.edu/entries/scientific-realism/.

  • Chang, C., & Keisler, J. (1990). Model theory. North-Holland.

  • Chung, K. (2001). A course in probability theory (3rd ed.). New York: Academic.

  • Cox, D. (2006). Principles of statistical inference. Cambridge: Cambridge University Press.

  • Diestel, R. (1997). Graph theory. New York: Springer.

  • Lounkine, E., et al. (2012). Large-scale prediction and testing of drug activity on side-effect targets. Nature, 486(7403), 361–367.

  • Feldman, S. I., Gay, D. M., Maimone, M. W., & Schryer, N. (1990). A Fortran to C converter. AT&T Bell Laboratories technical report.

  • Fewster, M., & Graham, D. (1999). Software test automation. Reading: Addison-Wesley.

  • Frigg, R., & Reiss, J. (2009). The philosophy of simulation: hot new issues or same old stew? Synthese, 169(3), 593–613.

  • Giere, R. (1976). Empirical probability, objective statistical methods, and scientific inquiry. In C. A. Hooker & W. Harper (Eds.), Foundations of probability theory, statistical inference, and statistical theories of science (Vol. 2, pp. 63–101). Dordrecht: Reidel.

  • Good, I. J. (1983). Good thinking: The Foundations of probability and its applications. University of Minnesota Press. Republished by Dover, 2009.

  • Graham, R. M., Clancy, G. J., Jr., & DeVaney, D. B. (1973). A software design and evaluation system. Communications of the ACM, 16(2), 110–116. Reprinted in E Yourdon, (Ed.), Writings of the Revolution. New York: Yourdon Press, 1982 (pp. 112–122).

  • Guala, F. (2002). Models, simulations, and experiments. In Model-based reasoning (pp. 59–74). Springer.

  • Gustafson, J. (1998). Computational verifiability and the ASCI Program. Computational Science and Engineering 5, 36–45. http://www.johngustafson.net/pubs/pub55/ASCIPaper.htm.

  • Halmos, P. (1950). Measure theory. D. Van Nostrand Reinhold.

  • Hatton, L. (1997). The T experiments: errors in scientific software. IEEE Computational Science and Engineering 4, 27–38. Also available at http://www.leshatton.org/1997/04/the-t-experiments-errors-in-scientific-software/.

  • Hatton, L. (2013). Power-laws and the conservation of information in discrete token systems: Part 1: General theory. http://www.leshatton.org/Documents/arxiv_jul2012_hatton.pdf.

  • Hennessy, J., & Patterson, D. (2007). Computer architecture: A quantitative approach (4th ed.). New York: Elsevier.

  • Hogg, R., McKean, J., & Craig, A. (2005). Introduction to mathematical statistics (6th ed.). Upper Saddle River: Pearson.

  • Horner, J. K. (2003). The development programmatics of large scientific codes. Proceedings of the 2003 International Conference on Software Engineering Research and Practice (pp. 224–227). Athens: CSREA Press.

  • Horner, J. K. (2013). Persistence of Plummer-distributed small globular clusters as a function of primordial-binary population size. Proceedings of the 2013 International Conference on Scientific Computing (pp. 38–44). Athens: CSREA Press.

  • Humphreys, P. (1994). Numerical experimentation. In Patrick Suppes: Scientific philosopher (pp. 103–121). Kluwer.

  • Hunter, G. (1971). Metalogic: An introduction to the metatheory of standard first-order logic. Berkeley: University of California Press.

  • IEEE. (2000). IEEE-STD-1471-2000. Recommended practice for architectural description of software-intensive systems. http://standards.IEEE.org.

  • ISO/IEC. (2005). ISO/IEC 9899: TC2—Programming languages – C—Open standards.

  • ISO/IEC. (2008). ISO/IEC 12207:2008. Systems and software engineering—Software life cycle processes.

  • Kuhn, T. (1970). The structure of scientific revolutions (2nd ed., enlarged). Chicago: University of Chicago Press.

  • Littlewood, B., & Strigini, L. (2000). Software reliability and dependability: a roadmap. ICSE ‘00 Proceedings of the Conference on the Future of Software Engineering (pp. 175–188).

  • Maxwell, J. (1891). A treatise on electricity and magnetism (3rd ed.). Dover reprint, 1954.

  • Mayo, D., & Spanos, A. (2011). Error statistics. In P.S. Bandyopadhyay & M. R. Forster (volume Eds.). D. M. Gabbay, P. Thagard & J. Woods (general Eds.), Philosophy of statistics, Handbook of philosophy of science, Volume 7, Philosophy of statistics. (pp. 1–46). Elsevier.

  • McCabe, T. (1976). A complexity measure. IEEE Transactions on Software Engineering 2, 308–320. Also available at http://www.literateprogramming.com/mccabe.pdf.

  • Morton, K. W., & Mayers, D. F. (2005). Numerical solution of partial differential equations. Cambridge: Cambridge University Press.

  • National Coordination Office for Networking and Information Technology Research and Development. (2013). DoE’s ASCI Program. http://www.nitrd.gov/pubs/bluebooks/2001/asci.html.

  • Newton, I. (1726). The Principia (Trans: Motte, A., 1848). Prometheus reprint, 1995.

  • Nielson, F., Nielson, H. R., & Hankin, C. (1999). Principles of program analysis. Heidelberg: Springer.

  • Oreskes, N., Shrader-Frechette, K., & Belitz, K. (1994). Verification, validation, and confirmation of numerical models in the earth sciences. Science, 263(5147), 641–646.

  • Parker, W. S. (2009). II—Confirmation and adequacy‐for‐purpose in climate modelling. Aristotelian Society Supplementary Volume, 83 (1).

  • Peled, D., Pelliccione, P., & Spoletini, P. (2008). Model checking. In B. Wah (Ed.). Wiley encyclopedia of computer science and engineering

  • Primiero, G. (2013). A taxonomy of errors for information systems. Minds and Machines. doi:10.1007/s11023-013-9307-5.

  • Reichenbach, H. (1958). The philosophy of space and time. (Trans: Reichenbach, M., & Freund, J). New York: Dover.

  • Salmon, W. (1967). The foundations of scientific inference. Pittsburgh: University of Pittsburgh Press.

  • Schmidt, M., & Lipson, H. (2009). Distilling free-form natural laws from experimental data. Science, 324(5923), 81–85.

  • Silva, J. (2012). A vocabulary of program slicing-based techniques. ACM Computing Surveys 44, Article No. 12.

  • Sorenson, R. (2011). Epistemic paradoxes. In E. Zalta (Ed.), Stanford encyclopedia of philosophy. http://plato.stanford.edu/entries/epistemic-paradoxes/.

  • Symons, J. (2008). Computational models of emergent properties. Minds and Machines, 18(4), 475–491.

  • Symons, J., & Boschetti, F. (2013). How computational models predict the behavior of complex systems. Foundations of Science, 18, 809–821.

  • Taylor, J. (1982). An introduction to error analysis: The study of uncertainties in physical measurements (2nd ed.). Sausalito: University Science.

  • United Nations. (1996). Resolution adopted by the General Assembly: 50/245. Comprehensive Nuclear-Test-Ban Treaty.

  • Waite, W. M., & Goos, G. (1984). Compiler construction. New York: Springer.

  • Winsberg, E. (1999). Sanctioning models: the epistemology of simulation. Science in Context, 12(2), 275–292.

  • Winsberg, E., & Lenhard, J. (2010). Holism and entrenchment in climate model validation. In M. Carrier & A. Nordmann (Eds.), Science in the context of application: Methodological change, conceptual transformation, cultural reorientation. Dordrecht: Springer.

  • Woodward, J. (2009). Scientific explanation. In E. Zalta (Ed.), Stanford encyclopedia of philosophy. http://plato.stanford.edu/entries/scientific-explanation/.

Acknowledgments

This work benefited from discussions with Sam Arbesman, George Crawford, Paul Humphreys, and Tony Pawlicki. We are grateful to the reviewers of earlier versions of this paper for extensive and insightful criticisms. For any errors that remain, we blame the path complexity of our (biological) software.

Author information

Correspondence to John Symons.

About this article

Cite this article

Symons, J., Horner, J. Software Intensive Science. Philos. Technol. 27, 461–477 (2014). https://doi.org/10.1007/s13347-014-0163-x
