Genetic, epigenetic and exogenetic information in development and evolution

Paul E. Griffiths
Department of Philosophy and Charles Perkins Centre, University of Sydney, NSW 2006, Australia.
paul.griffiths@sydney.edu.au

Interface Focus 2017 7 20160152; DOI: 10.1098/rsfs.2016.0152. Published 18 August 2017

Abstract

The idea that development is the expression of information accumulated during evolution and that heredity is the transmission of this information is surprisingly hard to cash out in strict, scientific terms. This paper seeks to do so using the sense of information introduced by Francis Crick in his sequence hypothesis and central dogma of molecular biology. It focuses on Crick's idea of precise determination. This is analysed using an information-theoretic measure of causal specificity. This allows us to reconstruct some of Crick's claims about information in transcription and translation. Crick's approach to information has natural extensions to non-coding regions of DNA, to epigenetic marks, and to the genetic or environmental upstream causes of those epigenetic marks. Epigenetic information cannot be reduced to genetic information. The existence of biological information in epigenetic and exogenetic factors is relevant to evolution as well as to development.

Keywords: Genetic information, Epigenetics, Specificity

1. Genetic Information

That the development of evolved characteristics is the expression of information accumulated in the genome during evolution and that heredity is the transmission of this information from one generation to the next will strike most biologists as common sense. But it is surprisingly difficult to cash out this statement in a way that is grounded in the detailed theory and practice of the biosciences.¹
Biology today is certainly an 'information science', both because it is a science of big data and because many specific models are inspired by the information sciences, but these applications and models do not seem to be unified by a single conception of biological information. If the actual science straightforwardly corresponded to that opening statement, we would expect to find that instructions written in the genetic code are read by gene regulatory networks to make an organism. But the genetic code runs out of steam when it has specified the linear structure of proteins [2]. It is impossible to describe higher levels of biological organisation in the genetic code for the same reason that I cannot write literature using a geodetic coordinate system: the language does not have the expressive power. Nor is it easy to see how the expressive power of the genetic code could be expanded to describe something beyond the order of amino acids in a polypeptide. The 'histone codes' [3] and 'splicing codes' [4] that have been proposed as supplements to the genetic code are not integrated with the genetic code through a shared measure of coded information. As things stand, histone modification and mRNA splicing are molecular mechanisms that interact with the mechanisms of transcription and translation in the straightforward way that any combination of physical mechanisms can interact. This paper outlines a measure of information that allows us to compare the contributions made by each of these mechanisms to determining a final product in a shared, informational currency.

Turning our attention to gene regulatory networks, these are productively modeled as computing Boolean functions and/or differential equations, but these computational operations are not specified in any of the three 'codes' to which we just referred.
Instead, these operations are specified by the stereochemical affinities of genomic regions and gene products. The science that connects the 'codes' with the 'computing networks' is the physics of how stereochemical properties emerge from the linear structure of biomolecules and the cellular contexts in which those biomolecules mature and function. The same is true of the other molecular networks that are at the heart of our understanding of the cell: when we model these networks as performing computations those formal operations do not take as inputs representations written in the genetic code.

¹ In his final book the influential evolutionary theorist George C. Williams called for a new, 'codical' biology founded on the concept of information precisely because that is not the biology we actually have [1].

All this suggests that perhaps 'biology is an information science' only in the sense that it uses many models that start with analogies to some aspect of communication or computing, and makes many direct applications of formalisms from the information sciences. Each of these models or applications stands or falls on its own scientific merits. They do not link together to form a single theory of biological information or a theory of life as an informational phenomenon [5][6][7][2]. On this sceptical view the ubiquity of information talk in biology is only evidence of the power and generality of theories of information and computation, something we can observe in many other areas of science.

This paper defends a more robust view of biological information, however. It argues that there is an important sense of 'information' which is related very closely to the older notion of biological 'specificity'.
Biological information in this sense gives scientific substance to the claim that development is the expression of information accumulated during evolution, and that heredity is the transmission of this information from one generation to the next. These claims turn out to be more or less equivalent to the idea that heredity is the ability of one cell to transmit biological specificity to another and that development is the expression of that specificity in a controlled manner.

The paper builds on Paul Griffiths and Karola Stotz's 'bottom-up' approach to biological information, starting with a simple concept of information that plays a straightforward role at the heart of molecular biology and seeing how many other aspects of biology can be clarified by applying this sense of information. That starting point is what they termed 'Crick information', the sense of information introduced by Francis Crick (1958) in his 'sequence hypothesis' and 'central dogma of molecular biology' [8][9].²

² Griffiths and Stotz used the phrase 'Crick information' to refer to what, in this article, will be called 'sequence specificity'. In more recent work I and my collaborators have reserved the term 'Crick information' for a measure of the intrinsic information content of a sequence, rather than for the measure of the relationship between a sequence and its causes that is the subject of this article.

Given the central role of Crick's ideas in molecular biology it is surprising that previous efforts to explicate the idea of biological information have not adopted Crick's straightforward approach. Instead, they have mostly focused on the richer connotations of the term 'information': ideas like meaning, representation, and semiosis.³ Some authors have even attributed this rich sense of information to Crick: "The sense of information relevant to the central dogma is of course the sort which requires 'intentionality', 'aboutness', 'content', the representation of other states of affairs..." [13][pp. 550-1]. As we will see in the next section, nothing could be further from Crick's intentions. The problem with rich approaches to biological information is that we do not have developed, technical theories of information in this sense. The various terms used in the passage just cited are, as the author admits, merely "one or another facet of a philosophically vexed concept" [13][p. 151]. So the approach amounts to taking this vexed concept, for which we have no developed theory, and placing it at the foundations of an account of living systems. In this paper, in contrast, we will use only the standard formalism of information theory and the idea of biological specificity.

³ Sahotra Sarkar [5] gives a brief history of efforts by molecular biologists to construct a theory of biological information. Key papers in the philosophical literature are [10][11]. For 'biosemiotics' see [12].

2. Crick's conception of information

The key move made by Crick in his work on protein synthesis was to supplement the existing idea of stereochemical specificity, embodied in the three-dimensional structure of biomolecules and underlying the well-known lock-and-key model of interaction between enzymes and their substrates, with the idea of informational specificity, embodied in the linear structure of nucleic acids that determine the linear structure of a gene product [14][5]. This idea is present in Crick's statements of both the sequence hypothesis and the central dogma (Figure 1):

    The Sequence Hypothesis ... In its simplest form it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein.

    The Central Dogma This states that once 'information' has passed into a protein it cannot get out again. In more detail, the transfer of information from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino-acid residues in the protein. [15][pp. 152-153, italics in original]

[Figure 1: The Central Dogma, as it is held today (nodes: DNA, RNA, Protein). After [16], with modifications. In particular, an arrow from DNA to protein has been removed.]

According to Crick the process of protein synthesis involves "the flow of energy, the flow of matter, and the flow of information." While noting the importance of the "exact chemical steps", he separated this transfer of matter and energy from what he regarded as "the crux of the problem", namely how to join the amino acids in the right order: "the crucial act of sequentialization." His solution to this problem would "particularly emphasise the flow of information" where "By information I mean the specification of the amino acid sequence of the protein" [15][p. 144].

Crick maintained the same, straightforward view of information throughout his career.
In his well-known paper clarifying the central dogma he reiterated that his key achievement in 1958 was to reduce the problem of protein synthesis to "the formulation of the general rules for information transfer from one polymer with a defined alphabet to another" [16][p. 561]. Information is a causal concept, referring simply to precise determination. Crick reiterated this forty years later: "... 'Information' in the DNA, RNA, protein sense is merely a convenient shorthand for the underlying causal effect." (Crick to Morgan, March 20 1998). "As to 'information,' I imagine one could avoid the word if one didn't like it and say 'detailed residue-by-residue determination'" (Crick to Morgan, April 3 1998). Moreover, "As to 'meaning' ... I would keep away from the term." (Crick to Morgan, April 3 1998).⁴

So if we take Crick at his word, then information is about (1) precise determination and (2) transfer of biological specificity from one biomolecule to another (in both development and in heredity).

These two aspects of Crick's ideas about information can be made precise using Shannon information measures and algorithmic information measures respectively. This paper concentrates on the first aspect of information and on Shannon measures of information.⁵

⁴ Philosopher Gregory Morgan received two letters from Crick in response to questions about how and why Crick came to use the concept of information in his work. These were kindly made available to us by Morgan. Crick also states that the inspiration for his use of 'code' in the sequence hypothesis was the Morse Code's purely syntactic mapping between two alphabets (Crick to Morgan, April 3 1998).

⁵ A treatment of the second aspect of Crick's ideas about information using algorithmic information measures is in preparation.

3. Information as precise determination

When Crick said that he would emphasise information in his account of protein synthesis, rather than matter and energy, he meant that he would focus on the precise determination of the structure of one biomolecule by another. There are variables through which the cell exercises this precise determination, notably coding sequences of nucleic acid, and other variables through which it does not, such as the presence or absence of an RNA polymerase in the transcription process. Variables of this second kind are absolutely required to construct the downstream biomolecule: without them nothing will happen. But they do not precisely determine the structure of that biomolecule: their role will remain the same no matter what particular structure is produced. Crick's distinction between 'matter and energy' on the one hand and 'information' on the other thus corresponds to the standard distinction between the efficiency and specificity of a molecular process. The efficiency of a molecular process is a matter of how much product is obtained for a given quantity of inputs. The specificity of the process is the extent to which the process produces just one output, rather than other energetically equivalent outputs. A well-designed polymerase chain reaction, for example, will produce just one DNA product (specificity) but many copies of that product (efficiency).

Biological specificity is explained by locating the variables through which cells exercise precise determination of outcomes. In philosophy these variables are known (coincidentally as far as the author can discover) as 'specific causes' [17][18].
In earlier work the present author and collaborators have developed an information-theoretic approach to measuring the specificity of causal relationships [19][20].

This work was a contribution to the so-called 'interventionist' approach to causation [21][22], which is based on the insight that "causal relationships are relationships that are potentially exploitable for purposes of manipulation and control" [17][p. 314]. Interventionists treat causation as relationships between the variables that characterise an organised system. These relationships can be represented by a directed acyclic graph. In such a graph, variable C is a cause of variable E when a suitably isolated manipulation of C would change the value of E. With suitable restrictions on the idea of 'manipulation' this test provides a criterion of causation, distinguishing causal relationships between variables from merely correlational relationships [21][pp. 94-107].

Using this definition most events have many, many causes. But only some of these causal relationships are highly specific. The presence of oxygen in the atmosphere was one cause of the bushfire, but the arsonist was a more specific cause. The intuitive idea of specificity is that interventions on C can be used to produce any one of a large number of values of E, so that the cause variable has what Woodward terms "fine-grained influence" over the effect variable [17][p. 302]. This idea can be quantified using Shannon information theory with the addition of an intervention operator that allows us to isolate the causal component of the correlation between variables:

    SPEC: the specificity of a causal variable is obtained by measuring how much mutual information interventions on that variable carry about the effect variable.⁶

⁶ [19][20]. This measure has been independently proposed in neuroscience [23] and in the computational sciences [24]. For other related measures see [25][26].

Formally, the specificity (I) of C for E against a background of other variables B is:

$$ I(\hat{C};E \mid B) = \sum_{b} p(b) \sum_{c} p(\hat{c} \mid b) \sum_{e} p(e \mid \hat{c}, b)\, \log_2 \frac{p(e \mid \hat{c}, b)}{p(e \mid b)} \qquad (1) $$

Equation 1 is a variant on the equation for Shannon's mutual information, which measures the overlap, or redundancy, in the probability distributions of two variables. The hat (^) on a variable denotes Judea Pearl's intervention operator [22] and indicates that the value of that variable is determined by intervention rather than observation. These interventions transform the symmetrical mutual information measure into an asymmetric measure of causal influence, since it now represents not the observed correlation between the variables, but the effect on E of experimentally intervening on C whilst controlling for background variables B. If two variables are not causally connected, then however strongly they are correlated, I(Ĉ;E|B) = 0.

A more intuitive way to think about the specificity measure is that it measures the extent to which an agent can reduce their uncertainty about the value of the effect variable if they can change the value of the cause, that is, the extent to which the agent can precisely determine the value of E by intervening on C.

SPEC can be used to measure either how specifically two variables are connected (potential causal influence) or how much of the actual variation in E in some data is causally explained by variation in C (actual causal influence) [19][20]. Whilst the use of Shannon information theory means that the measure is restricted to discrete variables, equivalent measures for metric variables are possible. None of these additional complexities need concern us in the present discussion, however.
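The way Equation 1 separates causal influence from mere correlation can be illustrated with a small numerical sketch. The Python below is an assumption-laden illustration, not code from [19] or [20]: the function names `specificity` and `observed_mutual_information`, the single trivial background state "bg", and the two hypothetical two-valued systems (one in which E copies C, one in which a hidden common cause sets both) are all introduced here purely for the example.

```python
from math import log2


def specificity(p_b, p_do_c, p_e):
    """Causal specificity I(C-hat; E | B) in bits, following Equation 1.

    p_b[b]         : probability of background state b
    p_do_c[b][c]   : probability that the intervention sets C to c, given b
    p_e[(c, b)][e] : probability of effect e after setting C to c in background b
    """
    total = 0.0
    for b, pb in p_b.items():
        # p(e|b): effect distribution averaged over the interventions on C
        p_e_b = {}
        for c, pc in p_do_c[b].items():
            for e, pe in p_e[(c, b)].items():
                p_e_b[e] = p_e_b.get(e, 0.0) + pc * pe
        for c, pc in p_do_c[b].items():
            for e, pe in p_e[(c, b)].items():
                if pc > 0 and pe > 0:
                    total += pb * pc * pe * log2(pe / p_e_b[e])
    return total


def observed_mutual_information(joint):
    """Ordinary (observational) mutual information of a joint table {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


# Assumption: one trivial background state and uniform interventions on C.
background = {"bg": 1.0}
uniform_do = {"bg": {0: 0.5, 1: 0.5}}

# Hypothetical system 1: C causes E (E simply copies C).
causal = {(0, "bg"): {0: 1.0}, (1, "bg"): {1: 1.0}}

# Hypothetical system 2: a hidden fair coin sets both C and E, so they always
# agree when merely observed, but setting C by hand leaves E a fair coin.
confounded = {(0, "bg"): {0: 0.5, 1: 0.5}, (1, "bg"): {0: 0.5, 1: 0.5}}
observed_joint = {(0, 0): 0.5, (1, 1): 0.5}

spec_causal = specificity(background, uniform_do, causal)
spec_confounded = specificity(background, uniform_do, confounded)
mi_observed = observed_mutual_information(observed_joint)

print("specificity, C causes E:      ", spec_causal)
print("specificity, common cause:    ", spec_confounded)
print("observed MI in confounded case:", mi_observed)
```

In the confounded system the observed mutual information is a full bit while the specificity under intervention is zero, which is the asymmetry the hat operator is meant to capture.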
We will instead briefly see how SPEC can be used to elucidate the difference between sources of specificity, such as coding sequences of DNA, on the one hand and sources of efficiency, such as RNA polymerase, on the other. We will then turn our attention to generalising this approach to sequence specificity.

4. Genetic and epigenetic information

If biological information is precise determination, as measured by SPEC, then it is easy to see that DNA is a rich source of information in the production of biomolecules in a way that distinguishes it from many other causes of those biomolecules. Varying the sequence of DNA exerts fine-grained control over the structure of the molecules produced. Griffiths and collaborators [19][pp. 539-40] constructed a toy causal model of transcription with three variables: RNA Polymerase (POL), which is either Present or Absent; DNA, whose values are alternative DNA sequences; and RNA, whose values are alternative RNA sequences. The value of RNA depends on both POL and DNA. Nothing is transcribed if POL = absent, and when POL = present each value of DNA determines a unique value of RNA. This is roughly how Crick imagined transcription, although, of course, the chemical nature of the transcription machinery was unknown. Assuming for simplicity a maximum entropy distribution over both POL and DNA, the specificity of POL for RNA can never exceed 1 bit, since POL has an entropy of 1 bit and the mutual information between two variables cannot exceed the entropy of either variable. However, once the number of possible values of DNA, each determining a unique RNA product, exceeds 4, then DNA will always have > 1 bit of specificity for RNA.⁷

Calculations on a toy model are of limited interest. However, the approach that lies behind them has some immediate exciting consequences.
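The toy transcription model described above can be worked through numerically. The following Python sketch is an illustration under stated assumptions rather than the calculation published in [19]: the names `entropy`, `rna_product` and `specificity` are introduced here, interventions on each cause are taken to be uniform, and the other cause is simply averaged over with a uniform distribution.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def rna_product(pol, dna):
    """Toy rule: no transcript without polymerase, otherwise one RNA per DNA."""
    return f"rna_{dna}" if pol == "present" else "none"


def specificity(cause, n_dna):
    """Specificity of 'POL' or 'DNA' for RNA in the toy model, in bits.

    Assumptions: the cause is set by uniform interventions and the other
    variable is uniformly distributed and averaged over.
    """
    pol_values = ["present", "absent"]
    dna_values = list(range(n_dna))
    cause_values = pol_values if cause == "POL" else dna_values
    other_values = dna_values if cause == "POL" else pol_values

    marginal = {}              # distribution of RNA over all interventions
    conditional_entropy = 0.0  # average entropy of RNA once the cause is set
    for c in cause_values:
        conditional = {}
        for o in other_values:
            pol, dna = (c, o) if cause == "POL" else (o, c)
            rna = rna_product(pol, dna)
            conditional[rna] = conditional.get(rna, 0.0) + 1 / len(other_values)
        conditional_entropy += entropy(conditional) / len(cause_values)
        for rna, p in conditional.items():
            marginal[rna] = marginal.get(rna, 0.0) + p / len(cause_values)
    # Mutual information under intervention: H(RNA) - H(RNA | set cause)
    return entropy(marginal) - conditional_entropy


for n in (2, 4, 5, 16):
    print(n, "DNA sequences:",
          "POL", round(specificity("POL", n), 3), "bits;",
          "DNA", round(specificity("DNA", n), 3), "bits")
```

Under these assumptions the specificity of POL stays at 1 bit however many sequences there are, while the specificity of DNA equals half the base-2 logarithm of the number of sequences, so it passes 1 bit only once there are more than four.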
The first consequence is that this measure can be applied to both coding and non-coding regions in the genome to allow a quantitative comparison of the contribution of variables of both kinds to the precise determination of the sequence of a biomolecule. For example, mutations to any of the many well-characterised intronic splicing enhancer (ISE) or silencer (ISS) regions change the probability that one or more exons will be removed from the resulting transcript [27]. We could introduce this process into our toy model by replacing the variable DNA with two variables, INT and EXO, whose values would be the intronic and exonic content of the original DNA sequences respectively. The existence of intronic splicing control regions would be represented by the specificity of INT for RNA. This is an absolutely natural extension of the moves Crick himself made in his 1958 paper in the light of what we now know about how biomolecules are synthesized from the genome. There is sequence specificity in non-coding regions.

Our approach has vindicated the idea that biological information is not restricted to the coding regions of the genome, but can be found in other functional regions as well. But we can go further. Our measure can be extended to variables representing epigenetic (narrow sense; see Box 1) modifications of DNA, insofar as they make a difference to the precise sequence of biomolecules through their role in the regulation of transcription and post-transcriptional and post-translational processing.

⁷ The entropy of RNA is H(RNA) > 2, we have just seen that I(POL;RNA) = 1, and DNA accounts for all the remaining entropy: I(DNA;RNA) = H(RNA|POL) > 1.

Box 1. Definitions of epigenetic. From [8][p. 112].

Epigenesis: the idea that the outcomes of development are created in the process of development, not preformed in the inputs to development; epigenetic can be used in these senses:

Epigenetics (broad sense, Waddington 1940): the study of the causal mechanisms by which genotypes give rise to phenotypes; the integration of the effects of individual genes in development to produce the epigenotype.

Epigenetics (narrow sense, Nanney 1958): the study of the mechanisms that determine which genome sequences will be expressed in the cell; the control of cell differentiation and of mitotically and sometimes meiotically heritable cell identity.

Epigenetic inheritance (narrow sense): the inheritance of genome expression patterns across generations (e.g. through meiosis) in the absence of a continuing stimulus.

Epigenetic inheritance (broad sense): the inheritance of phenotypic features via causal pathways other than the inheritance of nuclear DNA. We refer to this as exogenetic inheritance (West and King 1987).

Numerous mechanisms have been suggested by which epigenetic marks could determine which exons will be included in a mature mRNA. RNA splicing is frequently co-transcriptional, either by splicing actually occurring while the pre-mRNA is still being transcribed or by the recruitment of factors that determine later splicing whilst the pre-mRNA is being transcribed. This creates many opportunities for interaction between the splicing machinery and chromatin. The strongest direct evidence to date of epigenetic determination of alternative splicing is by alternative methylation states of histones. Indirect evidence suggests multiple significant roles for chromatin in determining alternative splicing [28][29][30].

Epigenetic regulation of splicing is another missing variable in the toy model described above.
If we extended the model to include it, variable(s) representing the methylation and acetylation state of histones would have some specificity for the RNA product variable. So, by a direct application of Crick's original reasoning, there is both genetic and epigenetic information in Crick's original sense: both genes and epigenes can have sequence specificity.

Epigenetic modifications of chromatin can have sequence specificity. This will seem unsurprising to many biologists, given the number of papers that described the discovery of such mechanisms as the discovery of 'missing information' for splicing [27][30]. This way of speaking need not be regarded in the deflationary manner described in Section 1. The approach to information outlined here shows that it can be taken literally as a step towards a unified theory of biological information. Sequence specificity is a measurable quantity that plays a causal role in the production of biomolecules, namely the precise determination of their linear structure.

5. Why epigenetic information cannot be reduced to genetic information

A common thought about why epigenetics cannot be a distinct source of information is worth considering, because it throws light on why Crick needed to introduce the idea of information. The thought is that, because the machinery that creates epigenetic modifications consists of molecules transcribed from the genome, the information in the epigenetic marks must ultimately be derived from the genome.

    "The problem with this kind of hair splitting is that ultimately the extra information (e.g. methylation) is provided by enzymes (methylases) encoded by genes in the genome. Epigenetics, per se, doesn't add any new information. It's just a consequence, or outcome, of the information already in the DNA."⁸

⁸ Larry Moran, Sandwalk Blog: http://sandwalk.blogspot.com.au/2016/10/extendingevolutionary-theory-paul-e.html Accessed 2016-12-08. This was a response to the abstract of the conference presentation from which this article is derived.

This informal comment is significant precisely because it is a typical first response to the idea that epigenetic marks contain information that supplements the information in the genome. This response makes it clearer why Crick needed to distinguish "the flow of energy, the flow of matter, and the flow of information" [15][p. 144]. The concept of specificity is a causal concept, not a material one, and identifying the sources of biological specificity requires measuring causal control, not material contributions. Once we look at the matter in this light it becomes clear that some epigenetic modifications are specified by genomes whilst others are not.

To see why the 'matter and energy' side of how epigenetic marks are created is not relevant, consider a case in which epigenetic marks are a site of conflict between multiple genomes. In cases of parental imprinting of genes it is biological common sense that the parent, not merely the offspring, is a source of the biological information expressed in offspring phenotype. If this genetic conflict is mediated by epigenetic mechanisms that contribute to the precise determination of the sequence of gene products, for example by affecting which exons are included in a transcript [31], then it makes no sense to say that the information specifying the splice variant all comes from the offspring genome. The fact that the coding sequences for the enzymes involved in establishing and maintaining the methylation pattern are in the offspring genome is irrelevant.
The relevant issue is where causal control is being exercised over the transcription and processing of those sequences. When parental imprints are established, the offspring provides the efficiency of the reaction, but the parent provides at least part of the specificity of the reaction.

Now consider a case where the epigenetic mechanism that contributes to the precise determination of phenotype is influenced by the offspring's environment. For example, regulation of alternative splicing by temperature seems to be an important mechanism for maintaining circadian rhythms in a wide range of species [32][33]. It seems reasonable to describe this as a mechanism for conveying environmental information to the genome, so that genome expression can be correctly matched to the environment. After all, the adaptive problem facing the organism is to reduce its uncertainty about where it is in the diurnal cycle and it does this by responding to an environmental cue. Our account of information vindicates this idea: we could, at least in principle, measure the contribution of the environmental variable to the precise determination of sequence, just as we did the contribution of the epigenetic marks further along in the causal graph. The fact that the coding sequences for the enzymes involved are in the genome is irrelevant. The real issue is where causal control is being exercised over the transcription and processing of those sequences. In this case, evolution has designed a mechanism which detects and responds to information from the environment.

In this section we have seen that our measure can be used to identify sequence specificity in both coding and non-coding sequences, in epigenetic marks, and in the causes of those marks, whether those causes are other genomes in cases of genetic conflict, or the environment in cases of plasticity. Information in Crick's sense is about precise determination.
We have expanded the class of things that do the determining beyond those Crick originally envisaged. In the following section we will also expand the class of things that get determined.

6. Sequence specificity and other biological information

Crick used 'information' to label the distinctive relationship of precise determination that holds between coding sequences of nucleic acids and the order of elements in their products, a relationship which does not hold between those products and many of their other causes. However, in Sections 4 and 5 we saw that some other causes do have this relationship to the order of elements in gene products. In this section we ask whether this distinctive relationship of precise determination exists for phenotypes more distal than the primary structure of RNAs or proteins. In this context we will not talk of 'sequence specificity', reserving that term for the precise determination of sequence, which was Crick's original concern. We will use the more general term 'biological information' to refer to the precise determination of phenotypes that are causally downstream of the primary structure of gene products, phenotypes such as the tertiary structure of proteins, and still more distally, morphology, and behavior.

As we noted in Section 1, the expressive power of the genetic code is limited to specifying the linear order of elements in a polypeptide. Changes to DNA coding sequences cause a whole chain of events, but they do not code for the more distal events in that chain [2]. The use of 'code' in this extended sense is metaphorical, like saying that when Richard Nixon literally ordered the Watergate cover-up he also 'ordered' his own downfall.

But while the genetic triplet code is limited in this way, the broader idea of information as precise determination is not.
The idea of information as precise determination, whether measured using SPEC or another measure, can be applied to any set of variables arranged in a causal graph. In principle, therefore, our approach can be used to measure biological information in a gene (or an epigene) with respect to any downstream variable affected by that gene. In fact, a range of causal Shannon information measures related to the one introduced here are already used in complex systems science to study a wide spectrum of living and non-living systems [34]. Genes or epigenes may not literally 'code' for morphology and behavior, but they do literally contain biological information that specifies to some measurable degree morphology and behavior.

It is now possible to extend our approach to biological information to mechanisms of exogenetic heredity (broad-sense epigenetic inheritance, see Box 1). We have already seen that environmental factors can have sequence specificity, since they can be specific causes of epigenetic modifications of chromatin and thus contribute to the precise determination of the structure of biomolecules. But there are broader mechanisms of environmental heredity, such as habitat or host imprinting, in which the phenotype of offspring is influenced by parental phenotype but where no epigenetic mark is transmitted through meiosis, so there is no epigenetic inheritance in the standard, narrow sense. These broader mechanisms are still usually referred to as 'epigenetic inheritance' but we will refer to them as exogenetic inheritance to avoid confusion.
The question of whether such environmental variables contribute information to development becomes the considerably more precise question of how specific the causal relationship is between those variables and variables representing morphology and behavior.

At this point we have something like a general theory of biological information. Information refers to a distinctive relationship of precise determination, which we can identify with the older concept of biological specificity. The phenomenon of biological specificity is explained by the existence of causes through which organisms exercise precise determination of outcomes, and the functional expression of this specificity is explained by natural selection acting on those causes. Central to organisms' ability to exercise this highly specific control is the relationship of precise determination originally identified by Crick between the sequence of DNA and the sequences of RNA and protein. Heredity is the transfer of biological specificity from one generation to the next. Central to organisms' ability to transfer specificity in this way is the existence of coding sequences of DNA which contain the information to determine the specificity of their products.⁹

⁹ Comparison of causal roles need not be reduced to a simple 'more or less specific'. For example, elucidating the distinction between permissive and instructive induction events in development requires a more complex application of the tools used here [35].

7. Development and evolution

We have seen that there can be genetic, epigenetic and exogenetic sources of biological information in development. How significant the latter two sources are in development is an empirical question. But even biologists who find it plausible that epigenetic and exogenetic factors are significant in development are often sceptical about whether they are significant in evolution.
The most common reason for this scepticism is that epigenetic marks are relatively unstable when compared to genetic mutations.

"The key point is that if epigenetic states are important to evolution, they are important through stable changes in these states, namely transmissible epimutations. And if epimutations are not transmitted with reasonable stability over generations, they cannot have any long-term evolutionary potential (Slatkin 2009). If an epimutation is to have evolutionary importance, it must persist." [36] [p. 391]

The stability of epigenetic marks is certainly an important question. But whether their evolutionary significance turns on their stability depends on what is meant by 'evolutionary significance'. In at least one important sense of that phrase, epigenetic marks do not need to be stable to be significant. It is surely reasonable to regard a biological phenomenon as having evolutionary significance if it has widespread and substantial impact on the dynamics of evolution, or, to put it another way, if models that do not include this phenomenon are unlikely to correctly predict the course of evolution. But we already know that this is the case from work on the evolutionary genetics of maternal effects [37]. Maternal effects can be defined as the causal influence of maternal genotype or phenotype on offspring phenotype independent of offspring genotype [38], which is in line with the approach taken here to defining epigenetic and exogenetic information. Maternal effects may be either epigenetic or exogenetic, depending on the specific causal pathway by which maternal influence is exerted.

Maternal effects, and parental effects generally, are recognised as a significant factor in evolution [39].
But any form of epigenetic or exogenetic heredity that is a significant source of biological information in the sense defined above will be significant in the same way because it substantially alters the mapping from parent phenotype to offspring phenotype. In this sense, epigenetic and exogenetic heredity are significant for evolution for the same reason that Mendelian models of heredity were significant. The primary significance of Mendelism for the theory of natural selection was that it specified the form of the transmission phase. Epigenetic and exogenetic heredity change this form, and even in the most conventional cases, where maternal effects are simply a one-generation time-lag in the expression of an allele, this has a substantial impact on the dynamics of natural selection.

Since Wilkins is well aware of all these points we can infer that this is not the sense in which he is asking 'if epigenetic states are important to evolution.' Another valid sense of that question is whether epigenetic or exogenetic mutations can be the basis of cumulative adaptation. It is plausible that an unstable inheritance system cannot play this role, but that does not mean that it cannot play an important role in a process of cumulative adaptation that also involves the genetic heredity system [40]. Finally, an important perspective on the relative evolutionary significance of genetic, epigenetic and exogenetic heredity is that they may play complementary roles. For example, it is plausible that genetic and epigenetic heredity allow organisms to adapt themselves to changing environments on different timescales [41].

Other authors have argued that to suppose epigenetic inheritance implies anything for evolutionary theory is to conflate 'proximate' or mechanistic with 'ultimate' or evolutionary biology.
Scott-Phillips et al. [42] draw a useful comparison between the discovery of epigenetic inheritance and the discovery of Mendelian genetics. In the first years of the 20th century some Mendelians saw Mendelian inheritance as a theory of evolutionary change and presented it as a challenge to the Darwinian theory of natural selection. Scott-Phillips et al. suggest that authors who present epigenetic inheritance as a challenge to conventional neo-Darwinism are like those early Mendelians: they are confusing a proximate, mechanistic theory of heredity with an ultimate theory of the causes of evolutionary change. Scott-Phillips et al. are engaged in a wider dispute with authors who question the value of the proximate/ultimate distinction [43] and I will not address that wider dispute here. However, with respect to the specific issue of whether epigenetic inheritance has implications for evolutionary theory, their analogy seems to establish exactly the opposite of their intended conclusion. The founders of modern neo-Darwinism did not dismiss Mendelism as a merely proximate mechanism; they used it to derive the form of the transmission phase in the process of natural selection. As I pointed out above, epigenetic and exogenetic heredity show up in quantitative genetics as parental effects, and the incorporation of parental effects into evolutionary models has a significant effect on evolutionary dynamics. In this way both Mendelian heredity and epigenetic heredity are part of ultimate, not merely proximate, biology.

An interesting aspect of Scott-Phillips et al.'s argument is their insistence that, "Put simply, if we wish to offer an ultimate explanation for the existence of some trait, we must make reference to how that trait contributes to inclusive fitness." [42] [p. 40].
They base this conclusion on the results of Grafen's 'formal Darwinism' project [44] which seeks to show that evolutionary dynamics are in important respects equivalent to the maximisation of inclusive fitness. But what is done in this very impressive program of work is to rigorously compare optimisation models to population genetic models, where the latter models simply assume that there is no epigenetic heredity. This is not a problem for the formal Darwinism program.¹⁰ But it is a problem for Scott-Phillips et al., who are effectively arguing that epigenetic inheritance cannot contribute to ultimate explanation because maximising (genetic) inclusive fitness fully represents evolutionary dynamics in models which assume there is no epigenetic inheritance.

Dickins and Rahman [46] suggest that, while epigenetic inheritance may play a role in evolution, those who present it as a challenge to conventional neo-Darwinism have only presented evidence that it is a significant proximate mechanism. They have failed to present evidence that it is significant in ultimate biology. Once again, this seems to overlook the way that epigenetic and exogenetic heredity show up in conventional, quantitative genetic models, namely as parental effects, and the known impact of such effects on evolutionary dynamics.

8. Conclusion

We set out to define a sense of 'information' that can make sense of the idea that development is the expression of information that accumulated during evolution and that heredity is the transmission of this information. Whilst compelling at a metaphorical level, this is surprisingly hard to cash out in serious, scientific terms.
We began with a simple conception of information that plays a straightforward role at the heart of molecular biology and explored how many other aspects of biology can be clarified using this sense of information. Our starting point was the sense of information introduced by Francis Crick in 1958. We identified two aspects of Crick's conception of information: (1) precise determination and (2) the transfer of biological specificity from one molecule to another. This paper concentrated on the first aspect. We analysed the idea of precise determination using an information theoretic measure of causal specificity. Using this measure we showed that coding sequences of DNA have a distinctive relationship of precise determination to RNAs and polypeptides. This distinguishes coding sequences from many other causes of the same outcomes, such as the presence of an RNA polymerase. This is what Crick meant when he identified coding sequences as containing information and the other causes as not doing so. His distinction is closely related to the distinction between the specificity and efficiency of a biochemical process.

¹⁰ Lu and Bourrat [45] have recently discussed how this program can be extended to include epigenetic inheritance and suggest that because of this epigenetic inheritance does not require any radical revision of conventional neo-Darwinism.

Since 1958, however, a great deal has been learnt about the production of biomolecules. We saw that Crick's approach to information has natural extensions to non-coding regions of DNA, to epigenetic marks, and to the genetic or environmental upstream causes of those epigenetic marks. Any of these variables may have sequence specificity, that is, they may contribute substantially to the precise determination of the linear structure of biomolecules.
Moreover, we saw that it is a mistake to suppose that the sequence specificity of epigenetic marks must always derive from sequence specificity elsewhere in the genome, or in other genomes. Finally, we generalised to a broader concept of 'biological information' that is applicable to more distal phenotypes, and not merely to the linear structure of biomolecules. Relationships of precise determination can exist between genetic, epigenetic and exogenetic factors in development and distal phenotypes, such as morphology and behavior. This gives us a general theory of biological information that can be used to restate more precisely the idea with which we started. Development is the expression of biological specificity, or biological information conceived as precise determination and measured using causal information theory. In heredity, factors which are able to exercise this precise determination are passed on from previous generations. These factors may be genetic, epigenetic or exogenetic. In the penultimate section of the article we argued that the existence of biological information in epigenetic and exogenetic factors is relevant to evolution as well as to development.

Competing Interests

I have no competing interests.

Funding

This publication was made possible through the support of a grant from the Templeton World Charity Foundation. The opinions expressed in this publication are those of the author and do not necessarily reflect the views of the Templeton World Charity Foundation.

Acknowledgements

I thank the members of the Theory and Method in Biosciences group at the Charles Perkins Centre for feedback on an earlier draft, Arnaud Pocheville for drawing Figure 1, and Stefan Gawronski for his assistance with the preparation of the manuscript.

References

[1] Williams GC. Natural Selection: Domains, Levels and Challenges. New York: Oxford University Press; 1992.
[2] Godfrey-Smith P. On the theoretical role of "genetic coding". Philosophy of Science. 2000;p. 26–44.
[3] Jenuwein T, Allis CD. Translating the histone code. Science (New York, NY). 2001 Aug;293(5532):1074–1080.
[4] Barash Y, Calarco JA, Gao W, Pan Q, Wang X, Shai O, et al. Deciphering the splicing code. Nature. 2010 May;465(7294):53–59. Available from: http://www.nature.com/doifinder/10.1038/nature09000.
[5] Sarkar S. Biological information: A sceptical look at some central dogmas of molecular biology. In: Sarkar S, R S C, editors. The Philosophy and History of Molecular Biology: New Perspectives. vol. 183 of Boston Studies in the Philosophy of Science. Dordrecht: Kluwer Academic Publishers; 1996. p. 187–232.
[6] Levy A. Information in Biology: A Fictionalist Account. Nous. 2011;45(4):640–657.
[7] Griffiths PE. Genetic Information: A Metaphor in Search of a Theory. Philosophy of Science. 2001;68(3):394–412.
[8] Griffiths PE, Stotz K. Genetics and Philosophy: An introduction. New York: Cambridge University Press; 2013.
[9] Stotz K, Griffiths PE. Biological Information, causality and specificity: an intimate relationship. In: From Matter to Life: Information and Causality. Cambridge and New York: Cambridge University Press; In Press. p. 000–000.
[10] Maynard Smith J. The concept of information in biology. Philosophy of Science. 2000;67(2):177–194.
[11] Shea N. Representation in the genome and in other inheritance systems. Biology and Philosophy. 2007;22:313–331.
[12] Witzany G, Baluška F. Life's code script does not code itself. EMBO reports. 2012 Dec;13(12):1054–1056.
[13] Rosenberg A. Is epigenetic inheritance a counterexample to the central dogma? History and philosophy of the life sciences. 2006;p. 549–565.
[14] Morange M. A History of Molecular Biology. Cambridge, MA: Harvard University Press; 1998.
[15] Crick FHC. On Protein Synthesis. Symposium of the Society for Experimental Biology. 1958;12:138–163.
[16] Crick FHC. Central Dogma of Molecular Biology. Nature. 1970 Aug;227:561–563.
[17] Woodward J. Causation in biology: stability, specificity, and the choice of levels of explanation. Biology & Philosophy. 2010;25(3):287–318.
[18] Waters CK. Causes that make a difference. Journal of Philosophy. 2007;104(11):551–579.
[19] Griffiths PE, Pocheville A, Calcott B, Stotz K, Kim H, Knight RD. Measuring causal specificity. Philosophy of Science. 2015;82:529–555.
[20] Pocheville A, Griffiths PE, Stotz K. Comparing causes: an information-theoretic approach to specificity, proportionality and stability. In: Leitgeb H, Niiniluoto I, Sober E, Seppälä P, editors. Proceedings of the 15th Congress of Logic, Methodology and Philosophy of Science. London: College Publications; In Press. p. 000–000.
[21] Woodward J. Making Things Happen: A Theory of Causal Explanation. New York & Oxford: Oxford University Press; 2003.
[22] Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press; 2009.
[23] Tononi G, Sporns O, Edelman GM. Measures of degeneracy and redundancy in biological networks. Proceedings of the National Academy of Sciences. 1999;96(6):3257–3262.
[24] Korb KB, Hope LR, Nyberg EP. Information-Theoretic Causal Power. In: Emmert-Streib F, Dehmer M, editors. Information Theory and Statistical Learning. Boston, MA: Springer US; 2009. p. 231–265.
[25] Ay N, Polani D. Information flows in causal networks. Advances in complex systems. 2008;11(01):17–41.
[26] Janzing D, Balduzzi D, Grosse-Wentrup M, Schölkopf B. Quantifying causal influences. The Annals of Statistics. 2013;41(5):2324–2358.
[27] Wang Z, Burge CB. Splicing regulation: From a parts list of regulatory elements to an integrated splicing code. RNA. 2008 Mar;14(5):802–813.
[28] Luco RF, Allo M, Schor IE, Kornblihtt AR, Misteli T. Epigenetics in Alternative Pre-mRNA Splicing. Cell. 2011 Jan;144(1):16–26.
[29] Sorenson MR, Jha DK, Ucles SA, Flood DM, Strahl BD, Stevens SW, et al. Histone H3K36 methylation regulates pre-mRNA splicing in Saccharomyces cerevisiae. RNA Biology. 2016 Apr;13(4):412–426.
[30] de Almeida SF, Carmo-Fonseca M. Design principles of interconnections between chromatin and pre-mRNA splicing. Trends in Biochemical Sciences. 2012 Jun;37(6):248–253.
[31] Cowley M, Wood AJ, Bohm S, Schulz R, Oakey RJ. Epigenetic control of alternative mRNA processing at the imprinted Herc3/Nap1l5 locus. Nucleic Acids Research. 2012 Oct;40(18):8917–8926.
[32] Sanchez SE, Petrillo E, Beckwith EJ, Zhang X, Rugnone ML, Hernando CE, et al. A methyl transferase links the circadian clock to the regulation of alternative splicing. Nature. 2010 Nov;468(7320):112–116.
[33] Syed NH, Kalyna M, Marquez Y, Barta A, Brown JWS. Alternative splicing in plants: coming of age. Trends in Plant Science. 2012 Oct;17(10):616–623.
[34] Bossomaier T, Barnett L, Harré M, Lizier JT. An Introduction to Transfer Entropy. Cham: Springer International Publishing; 2016.
[35] Calcott B. Causal Specificity and the Instructive-Permissive Distinction. Biology & Philosophy. In Press;p. 000–000.
[36] Wilkins A. Epigenetic inheritance: Where does the field stand today? What do we still need to know? In: Gissis SB, Jablonka E, editors. Transformations of Lamarckism: From Subtle Fluids to Molecular Biology. Cambridge, MA: The MIT Press; 2011. p. 389–393.
[37] Wolf JB, Wade MJ. Evolutionary genetics of maternal effects. Evolution. 2016 Apr;70(4):827–839.
[38] Wolf JB, Wade MJ. What are maternal effects (and what are they not)? Philosophical Transactions of the Royal Society B: Biological Sciences. 2009 Apr;364(1520):1107–1115.
[39] Badyaev AV, Uller T. Parental effects in ecology and evolution: Mechanisms, processes, and implications. Philosophical Transactions of the Royal Society, Biological Sciences. 2009;364:1169–1177.
[40] Badyaev AV, Hill GE, Beck ML, Dervan AA, Duckworth RA, McGraw KJ, et al. Sex-Biased Hatching Order and Adaptive Population Divergence in a Passerine Bird. Science. 2002;295(5553):316–318.
[41] Danchin E, Pocheville A. Inheritance Is Where Physiology Meets Evolution. Journal of Physiology. 2014;592:2307–2317.
[42] Scott-Phillips TC, Dickins TE, West SA. Evolutionary Theory and the Ultimate-Proximate Distinction in the Human Behavioral Sciences. Perspectives on Psychological Science. 2011 Jan;6(1):38–47.
[43] Laland KN, Sterelny K, Odling-Smee J, Hoppitt W, Uller T. Cause and Effect in Biology Revisited: Is Mayr's Proximate-Ultimate Dichotomy Still Useful? Science. 2011 Dec;334(6062):1512–1516. Available from: http://science.sciencemag.org.ezproxy1.library.usyd.edu.au/content/334/6062/1512.
[44] Grafen A. Optimization of inclusive fitness. Journal of Theoretical Biology. 2006 Feb;238(3):541–563. Available from: http://linkinghub.elsevier.com/retrieve/pii/S0022519305002559.
[45] Lu Q, Bourrat P. The Evolutionary Gene and the Extended Evolutionary Synthesis. The British Journal for the Philosophy of Science. In Press;p. TBC.
[46] Dickins TE, Rahman Q. The extended evolutionary synthesis and the role of soft inheritance in evolution. Proc R Soc B. 2012 Aug;279(1740):2913–2921. Available from: http://rspb.royalsocietypublishing.org.ezproxy1.library.usyd.edu.au/content/279/1740/2913.