A comprehensive theory of induction and abstraction, part II

Cael L. Hasse∗

January 7, 2019

∗caelthetutor@gmail.com

Abstract

This is part II in a series of papers outlining Abstraction Theory, a theory that I propose provides a solution to the characterisation or epistemological problem of induction. Logic is built from first principles severed from language such that there is one universal logic independent of specific logical languages. A theory of (non-linguistic) meaning is developed which provides the basis for the dissolution of the 'grue' problem and problems of the non-uniqueness of probabilities in inductive logics. The problem of counterfactual conditionals is generalised to a problem of truth conditions of hypotheses and this general problem is then solved by the notion of abstractions. The probability calculus is developed with examples given. In future parts of the series the full decision theory is developed and its properties explored.

Contents

1 Logic and language
  1.1 Logic
  1.2 Formal interpretation
  1.3 Identity and prop functions
2 Abstractions and a problem with truth conditions
  2.1 Substance
  2.2 Abstraction and the web of pictures
3 Probability
  3.1 Bayes' theorem with hypothesis spaces
  3.2 An example calculation
  3.3 The dissolution of arguments against unique probabilities
  3.4 The 'grue' non-problem

1 Logic and language

In Abstraction Theory, probabilities are considered to be logical degrees of validity as opposed to degrees of belief, some physical property or some frequency of events in a population.
This logical conception of probabilities is generally considered to be fatally flawed due to at least two problems: Goodman's 'grue' problem (Goodman, 1983) and the problem of the uniqueness of priors (van Fraassen, 1989). I argue that these problems are not inherent to logic itself but are rather a problem with conceptions of logic in the analytic tradition. The problems are dissolved if the principles of logic are reformulated such that the close relationship between logic and language is severed. To do this, we must consider what logic is conceived for.

There are two roles that logic plays that can be distinguished. The first is as a tool for 'good' argumentation between people. This role is necessarily related to language: one wants to communicate in a way such that arguments that are 'good' are distinguishable from arguments that are faulty in various ways. To do this, a certain agreement on rules within a linguistic community is required, and the goodness of an argument is defined by the rules to be related to the form in which the argument is seen to be expressed. Language, understood as determining relationships between forms of expression and the goals of expression (e.g., communication), is then the context in which to formulate logics. The second role of logic is as a normative constraint on action. Such constraint applies to individual agents, which need not communicate in any way. This second role is not necessarily related to language.

It is my contention that the second role should be the primary role of logic; logic should be considered non-linguistic and its relationship to language oblique. One result of this is that logic becomes greatly simplified. Concepts related to language are disentangled from logic.
For example, the expressivity of a formal language, being a linguistic concept, ceases to be important in the formulation of logic itself; we will not have to worry about whether the formal apparatus we shall use to express and work with the theory is sufficiently expressive. The calculus consequently becomes extremely simple and ostensibly universal.

1.1 Logic

To understand the formulation of logic to be proposed, consider the schema for the decision theory outlined in part I: we have a collection of hypotheses $\{H_k \mid k = 1, \ldots, m\}$, where each hypothesis $H_k$ is associated with outcomes $\{Z^k_j \mid j = 1, \ldots, n(k)\}$. We also have assumptions $Y$ and potential actions $\{A_i \mid i\}$. The appropriate action is the one that maximises the expected utility

$$e'(A_i) := \sum_{k=1}^{m} \sum_{j=1}^{n(k)} U(Z^k_j A_i, H_k)\, p(Z^k_j H_k \mid A_i Y) = \sum_{k=1}^{m} p(H_k \mid A_i Y) \sum_{j=1}^{n(k)} U(Z^k_j A_i, H_k)\, p(Z^k_j \mid H_k A_i Y),$$

where $U$ is a utility function. The expected utility $e'$ is a weighted sum of the subexpected utilities,

$$e'(A_i) = \sum_{k=1}^{m} p(H_k \mid A_i Y)\, e'(A_i, H_k),$$

with subexpected utilities given by

$$e'(A_i, H_k) := \sum_{j=1}^{n(k)} U(Z^k_j A_i, H_k)\, p(Z^k_j \mid H_k A_i Y).$$

Subexpected utilities are the expected utilities one gets upon assumption of a hypothesis. If for some reason $e'(A_i) = C e'(A_i, H_k) + K$, where $C$ and $K$ are constant over the set of actions, then we say one should act as if hypothesis $H_k$ were the appropriate assumption. For example, this could happen if for some $l$, $U(Z^k_j A_i, H_k) = 0$ for all $k \neq l$. A more complicated version of this mechanism is how the theory describes how one is licensed to act as if the world were a certain way when one does not have enough evidence to inductively infer it so.

Note that this decision theory is only a schema used as a stepping stone to the full theory. As logic is developed, the concepts used above will be replaced with more refined ones.
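The schema above can be sketched numerically. The following is a minimal illustration, not the full decision theory: the action names, probabilities and utilities are hypothetical values chosen for the example, and the function names are assumptions introduced here. It computes each subexpected utility first and then weights it by the probability of its hypothesis, mirroring the decomposition of the expected utility into subexpected utilities.

```python
from typing import Sequence


def subexpected_utility(utilities: Sequence[float],
                        outcome_probs: Sequence[float]) -> float:
    """e'(A_i, H_k): sum over outcomes j of U(Z_j^k A_i, H_k) * p(Z_j^k | H_k A_i Y)."""
    return sum(u * p for u, p in zip(utilities, outcome_probs))


def expected_utility(hyp_probs, utilities, outcome_probs) -> float:
    """e'(A_i): sum over hypotheses k of p(H_k | A_i Y) * e'(A_i, H_k)."""
    return sum(
        p_h * subexpected_utility(u_k, p_k)
        for p_h, u_k, p_k in zip(hyp_probs, utilities, outcome_probs)
    )


# Hypothetical inputs: two actions, two hypotheses, two outcomes per hypothesis.
# hyp_probs[k] plays the role of p(H_k | A_i Y). The schema does not assume the
# hypotheses are exclusive, so nothing here requires these values to sum to one.
actions = {
    "A1": dict(hyp_probs=[0.5, 0.5],
               utilities=[[1.0, 0.0], [0.0, 2.0]],
               outcome_probs=[[0.5, 0.5], [0.25, 0.75]]),
    "A2": dict(hyp_probs=[0.5, 0.5],
               utilities=[[2.0, 0.0], [0.0, 1.0]],
               outcome_probs=[[0.5, 0.5], [0.5, 0.5]]),
}

scores = {name: expected_utility(**spec) for name, spec in actions.items()}
best_action = max(scores, key=scores.get)
print(scores, best_action)
```

Keeping the subexpected utility as its own function makes it easy to inspect what acting on a single hypothesis would recommend before the weighting by hypothesis probabilities is applied.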
Two central aspects of the above schema are as follows. (1) We have not assumed that the hypotheses are exclusive, i.e., we have not assumed that it would be inconsistent to assume a conjunction of two different hypotheses. Assumption of exclusivity affects the probabilities in, for example, the principle of total probability (see part I) such that, with the assumption of exhaustivity, the sum over hypotheses has only one term for each hypothesis (as opposed to terms containing conjunctions of multiple hypotheses). The sum over hypotheses in $e'$ comes from the definition of $e'$ as opposed to the assumption of exclusivity and the principle of total probability. (2) Despite not assuming exclusivity, one does not act on the assumption of two hypotheses at the same time. This tells us something about how we are to understand what hypotheses are: different hypotheses are not necessarily exclusive, but we shall say that they are incommensurable with each other. This notion of incommensurability differs in important ways from standard definitions (Kuhn, 1989; Feyerabend, 1962) but has conceptual affinity with the notion of taxonomic incommensurability.

As hypotheses are defined to be incommensurable with each other, there are things that won't be considered hypotheses. Consider the two sentences 'Alice is either at the shops or the movies' and 'Alice is either at the movies or at home.' These sentences can be appropriately associated with something that narrows down possibilities. Moreover, they can be associated with narrowing down the same collection of possibilities. What they are appropriately associated with can be simultaneously assumed (in this case, Alice can be at the movies) and they are hence commensurable, i.e., they are not hypotheses. More generally, no hypothesis is reducible to a collection of possibilities: hypotheses should not be understood as truth functions of atomic propositions or sets of possible worlds.
In contrast to the above sentences, consider 'the possible colours that objects may possess are red and green' and 'the possible colours objects may possess are red, green and blue.' In our theory it is reasonable to associate hypotheses to these sentences. Such hypotheses are incommensurable in the following way: consider the hypotheses to constitute conditions defining the space of possible colours. Despite the fact that I used the word 'red' in both sentences, what is meant by an outcome associated with something possessing the colour red depends on the context of its hypothesis. Thus an outcome reasonably associated with 'the apple is red' under one of the hypotheses considered above is different to an outcome associated with the same sentence under the other hypothesis. Two different hypotheses cannot 'overlap' in their outcomes because their outcomes are by definition different.

With these ideas in mind, let's develop the central principles of logic. Propositions are the central entities of concern, as opposed to sentences or statements. Early Wittgenstein (Wittgenstein, 1994) associates propositions with a logical space. The phrase 'logical space' is metaphorically suggestive, as it reminds one of the concept of geometrical space. I'm going to use this metaphor to twist Wittgenstein's notion to new purposes.

Definition: A logical space consists of points called basic propositions. They have no complexity but differ from atomic propositions in essential ways, hence the different name.

Consider a geometrical space, which defines a domain in which to understand things such as, for example, events. Such a space has points that are given context through, for example, the topology of the space, which determines which points are near each other. Similarly, we say a logical space provides a context which determines which basic propositions are dependent on each other.
A logical space provides the conditions for inference between basic propositions, much like how geometrical space provides the conditions for movement between points. As two points in geometrical space can be considered near each other only if they live in the same space, two basic propositions can be considered dependent only if they live in the same logical space. We say that a logical space determines the (nonlinguistic) meaning of its basic propositions¹. We say basic propositions can have meanings (given by the logical space they are situated in) as opposed to a standard conception where propositions are meanings. This notion of meaning is pragmatic in character. Two sets of basic propositions that have the same intra-set conditions for deductive inference and no dependencies that leave the sets are considered the same; there is nothing else that is considered to distinguish them.

¹ From now on, meaning refers to this understanding of non-linguistic meaning unless stated otherwise.

A geometrical space cannot have two topologies. Similarly, a logical space cannot have two different webs of dependencies. Each possible web of dependencies requires a new logical space, i.e., there is more than one logical space, and deductive or inductive inference can occur only inside one and cannot occur between them. Every hypothesis is associated with a logical space: hypotheses determine the meaning of their basic propositions. A proposition living in one space cannot be reduced to propositions living in another. This is why basic propositions are not atomic.

A proposition can have only one meaning. For example, consider a proposition within the hypothesis of (a version of) Newtonian Gravity. We say such a proposition cannot be within (a version of) General Relativity. In a formal language, propositions associated with all sentences of the language are generally taken to be simultaneously meaningful.
This is not possible unless all of these propositions have meanings due to a single hypothesis. But there are many hypotheses. Thus a formal language as standardly considered cannot be universal and thus is inadequate to our needs. For example, there is no such thing as the conjunction of two propositions from different hypotheses. Consequently, we must alter the relationship between sentences of a language and propositions. Namely, an utterance/inscription/sentence is not considered to automatically be associated with a proposition; it must first be interpreted.

Definition: Interpretation is a mental act of assigning meaning by associating the idea² of a meaningful proposition with an instance of an utterance/inscription/sentence³. An interpreted proposition is one imagined in the context of acting as if some hypothesis were true.

² I do not provide any sophisticated notion of what an idea or mental act may be. My focus is on logic and decision theory; the definition of notions like interpretation is to be used as an informal guide to how we are to understand the relationships between logic, mind and language. It is a contention of mine that a precise description of mind is unnecessary for our purposes.

³ We do not need this association to be perfect, even in principle.

In the analytic tradition, one defines an abstract substitute to the above notion of interpretation capable of mapping meanings to sentences: the idea being that there is some aspect of a sentence/proposition that is independent of us that, when analysed, can identify, or at least constrain, meaning, namely its form. Our notion of interpretation does not compel us to use this abstract substitute for meaning association. Moreover, the use of form artificially constrains us. Thus we shall define the theory such that propositions do not have form. Propositions not having form is a result of the view of the role of language in the theory.
A contrasting view, influenced by the likes of Frege, Russell, Wittgenstein in the Tractatus and others, considers language in terms of its capacity to represent the world. The idea is that there are features such as the form and content of a sentence/proposition/statement that provide this representation and that this representation is somehow reflected in aspects of reality; perhaps, for example, certain terms refer to things in the world. I contend that any such direct reflection is misguided; that language, even in its most idealised, regimented form, is a product of the receiver of a message and their understanding of the intentions of the author. There is no necessary connection between reality and the structure of the message in terms of things like form, and hence no necessary connection between form and what is supposedly being expressed. Any connection is contingent on the receiver and author of the message. This is not to say that some type of representation cannot be achieved, at least to some limited extent, only that it is contingent upon the interpreter.

One aspect of a message that is not dependent upon the author and receiver is the possibility of its individuation. This is reflected in the individuality of the basic propositions. Other aspects of the basic propositions are described in the next section.

1.2 Formal interpretation

Instead of direct expression of a proposition that we interpret upon analysis, consider that we can instead express interpretation itself.

Definition: Formal interpretation is expression of meaning assignment. This is not to be associated with an actual act of interpretation.

Definition: We shall use the term 'prop' as a device to replace expressions usually associated with propositions. Props are not interpreted; they are not propositions. Rather, they will be used for formal interpretation. When formally interpreting basic propositions, we shall use basic props, expressed using lower case letters such as 'a' and 'b'.
We shall define expression of negation, conjunction and disjunction with props: $ab$ expresses 'a and b', $a + b$ expresses (inclusive) 'a or b' and $\bar{a}$ expresses 'not a'. Formally this has some similarities with Boolean algebra with the basic operations of negation, conjunction and disjunction. Using these basic operations we can form sentences with props, such as $a + bc$. Formal rules we take from Boolean algebra are associativity, commutativity, distributivity and idempotence of conjunction and disjunction, and duality. We also have two new rules: for any props $a$ and $b$,

$$a(b + \bar{b}) = a, \quad \text{and} \quad \;\; = b + \bar{b}. \qquad (1)$$

In this second rule, the blank space ' ' counts as a prop. A difference to Boolean algebra is that props are not variables and there is no expression of truth values.

Definition: Meaning of a basic proposition is given by its logical space when no data is given. Often we'll say the logical space providing meaning of a basic proposition makes up a picture of the basic proposition.

Definition: We can express logical spaces with sentences. For example, ab+ ab expresses a logical space. The prop a is not associated with a proposition, but a situated in the space ab+ ab is. When referring to a basic proposition, we express this situatedness using '◦' after a basic prop and before a sentence expressing a logical space. For example, we can write 'a◦ab+ab'. Sentences of this form are used when talking about propositions. For example, we can say 'the proposition a◦ab+ab has meaning ab+ab.'

Definition: When expressing the situatedness of a proposition in a logical space that may or may not be the meaning of the proposition, we use '|' instead of '◦'. These kinds of expression are defined as arguments, with the sentence to the right of the | called the premise and the sentence to the left called the conclusion. A central property of an argument is its validity. Probabilities are measures of degrees of validity of an argument.
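The Boolean-style rules adopted for props earlier in this subsection can be checked mechanically. The sketch below is only a bookkeeping device: it borrows two-valued assignments to compare sentences, whereas the theory itself expresses no truth values. The helper name `same_sentence` and the use of `~` in comments for negation are assumptions introduced for this illustration.

```python
from itertools import product


def same_sentence(f, g, names):
    """True if sentences f and g agree under every assignment to `names`."""
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if f(env) != g(env):
            return False
    return True


# First part of rule (1): a(b + ~b) = a
rule_one = same_sentence(
    lambda v: v["a"] and (v["b"] or not v["b"]),
    lambda v: v["a"],
    ["a", "b"],
)

# Distributivity: a(b + c) = ab + ac
distributive = same_sentence(
    lambda v: v["a"] and (v["b"] or v["c"]),
    lambda v: (v["a"] and v["b"]) or (v["a"] and v["c"]),
    ["a", "b", "c"],
)

# Idempotence of disjunction: a + a = a
idempotent = same_sentence(
    lambda v: v["a"] or v["a"],
    lambda v: v["a"],
    ["a"],
)

print(rule_one, distributive, idempotent)
```

A check of this kind only confirms that two sentences of props are interchangeable; it says nothing about which logical space gives a basic proposition its meaning.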
Definition: The premise of an argument provides conditions necessary for making a valid argument with various basic props as the conclusion. For any particular proposition, these are its validity conditions.

It is of central importance to understand that validity conditions are not truth conditions. There are two reasons for the difference. Firstly, there is no notion of truth in the theory.

Definition: To understand the second reason, let's introduce the notion of a prop function, which is analogous to a truth function but in this theory is used merely as a notational device for concision of expression of sentences of basic props. We use upper case letters for prop functions.

What is analogous to truth conditions of a prop function $X$ can be expressed by, for example, $X = ab + \bar{a}$. They are conditions sufficient to make a valid argument with 'X' as its conclusion. This is as opposed to $X$ expressing (if $X$ expresses a logical space under consideration) conditions necessary to validly infer $a \circ ab + \bar{a}$, i.e., $a \circ ab + \bar{a}$ is validly inferrable only if $b \circ ab + \bar{a}$ is also validly inferrable, but $b \circ ab + \bar{a}$ is not a sufficient assumption to validly infer $a \circ ab + \bar{a}$. $X$ expresses validity conditions for $a \circ ab + \bar{a}$ while the 'truth conditions' for X can be expressed by ab+ab.

Rule: Basic props are not prop functions. For example, it is nonsense to write $a = b + c$, this being akin to writing $1 = 2$. Thus there is nothing analogous to the 'truth conditions' for a basic prop.

The 'truth conditions' of a prop function can be thought of as provided by the basic props that 'make it up'. In contrast, validity conditions of a basic proposition are provided by the configuration of basic propositions it is surrounded by in its logical space. In natural language, we can characterise partial meaning of a proposition interpreted from the statement 'the ball is red' with another statement like 'if the ball is red, then it is not blue and vice versa.'
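The asymmetry between necessary and sufficient conditions in the prop function example above can be illustrated with a small check. This sketch rests on two assumptions: it reads the example space as X = ab + ~a (with `~` for negation), and it models valid inference by two-valued assignments, which the theory itself does not use. The function name `validly_inferrable` is hypothetical.

```python
from itertools import product


def validly_inferrable(conclusion, premise, names):
    """True if every assignment satisfying `premise` also satisfies `conclusion`."""
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if premise(env) and not conclusion(env):
            return False
    return True


def space(v):
    # Assumed reading of the example space: X = ab + ~a
    return (v["a"] and v["b"]) or not v["a"]


names = ["a", "b"]

# Within X, assuming a makes b inferrable (b is necessary for a) ...
b_given_a = validly_inferrable(
    lambda v: v["b"], lambda v: v["a"] and space(v), names)

# ... but assuming b does not make a inferrable (b is not sufficient for a).
a_given_b = validly_inferrable(
    lambda v: v["a"], lambda v: v["b"] and space(v), names)

print(b_given_a, a_given_b)
```

Under these assumptions the first check succeeds and the second fails, which matches the description of X as giving conditions necessary, but not sufficient, for inferring a.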
One can translate an 'argument' made in natural language or a more formal one into an argument as defined in the theory: when presented in an argument, the sentences of an 'argument', whether in a natural or formal language, will be sandwiched between ⌜ . . . ⌝. For example, we could have ⌜the ball is red⌝ and ⌜the ball is not blue⌝. These denote signs assigned to either basic props or prop functions. Most propositions of interest will be basic propositions or conjunctions of basic propositions. The basic props in the argument are formally interpreted by including in the premise a picture which can be represented by a prop function such as $X$. An example of an 'argument' translated into an argument with the sentences formally interpreted is the following:

⌜the ball is not blue⌝ | ⌜the ball is red⌝ X.

Note that (a) this translation process will be generalised in a subsequent section, and (b) one is in principle free to choose whatever picture $X$ one wants; the meaning of the basic propositions is not constrained by the form of the sentences being translated (even though the form intended by the author of a sentence is generally suggestive of the intended meaning of the sentence).

The author of a sentence does not need to intend a precise meaning. Interpretation should be seen as a creative act with no unique answer. The multiple possible interpretations perhaps share a conceptual affinity (though a concrete explanation of this notion is beyond the scope of these papers). In some situations it is possible that all interpretations are a bastardisation of the intuition that was intended; with an appropriately general expected utility, all interpretations will contribute and the resulting appropriate acts may be due to each interpretation equally, so that each interpretation is only one ingredient in the full intuition, making the intuition uninterpretable.
I bring these things up because calculation of probabilities requires formal interpretation, and there is always room for disagreement as to such a choice; moreover, there may even be cases where there is no one good choice of interpretation. Interpretations used to analyse a situation of interest will generally be much simpler than the intuitions we use for said interpretations. We can only hope that (a) our intuitions are interpretable, (b) our interpretations are complex enough to model the features of our intuitions we are interested in exploring, and (c) more complex interpretations that are closer to our intuitions are approximated by these simpler ones. Even the simplest situations can require much introspection into our intuitions.

1.3 Identity and prop functions

Theories of logical probabilities generally require ways to determine the values of said probabilities. Generally, (at least some) values are argued to be determinable by symmetries. It is also common enough to counter-argue (Franklin, 2001; van Fraassen, 1989; Urbach and Howson, 1993; Earman, 1992) that one may make multiple symmetry arguments about a given situation which give inconsistent probability values. I claim that such problems can be dissolved by this theory through a proper accounting of the identities of basic propositions. Such an accounting elucidates what counts as a legitimate symmetry argument and, when one makes two symmetry arguments, whether the probabilities that are determined are probabilities of the same propositions. We go through the principles of such a proper accounting in this subsection.

The signs used in an argument denoting basic props are arbitrarily chosen. The arguments themselves (and more specifically their degree of validity) should be independent of this choice. To express this independence, we shall start by expressing the choice of sign assignment using subscripts with signs separated by commas: $a \mid a + b\,_{a,b}$.
We will generally drop this notation if sign assignation is unambiguous enough or if we have intentionally only partially defined the argument such that the argument itself is ambiguous. We previously noted that it is nonsense to write '$a = b + c$'. We can, however, express equality of basic props, i.e., '$a = b$'. These expressions of equality are expressions about the language we are using, not about props or propositions themselves, i.e., $a = b$ expresses how one should use the signs 'a' and 'b'. This is fundamentally distinct from expressing co-inferrability (that $a$ can be validly inferred given $b$ and vice versa), i.e., $ab + \bar{a}\bar{b}$. For example, $a = b$ may be expressed outside of a partially defined argument, prompting a different definition:

$$a = b,\quad c \mid ac + bd \;\Rightarrow\; c \mid ac + ad\,_{a,c,d}.$$

Note that in this example, the argument is changed as the assumptions represent a different picture. Compare the use of $a = b$ to expression of meaning $ab + \bar{a}\bar{b}$ that creates co-inferrability when assumed. This is expressed in an argument:

$$c \mid (ac + bd)(ab + \bar{a}\bar{b})\,_{a,b,c,d} = c \mid abc + abd\,_{a,b,c,d}.$$

Here the premise expands to $abc + a\bar{a}\bar{b}c + abd + \bar{a}b\bar{b}d$, and the two terms that contain a prop together with its own negation drop out. In this way we understand the identity of basic propositions as determined prior to an argument. Basic propositions provide the building blocks of meaning; one cannot infer that two propositions are the same, only perhaps that they are co-inferrable.

Relabelling Symmetry: An argument is to be considered an expression of structure of validity conditions or geometry of logical spaces. It is validity conditions modulo sign assignment. This will be reflected in a redundancy of expression of our arguments, e.g., the arguments p | pq+ pqp,q and r |rs+ rsr,s are equal: p | pq+ pqp,q = r |rs+ rsr,s. An argument is symmetric upon a relabelling of its basic props. Despite this symmetry, we may still distinguish pq+ pq and rs+ rs within an argument, i.e., p | pq+ pq+ rs+ rsp,q,r,s ≠ p | pq+ pq+ pq+ pqp,q. Any relabelling step such as $r \to p$ must take one label to an unused one.
If we want to do a two-way relabelling $r \leftrightarrow p$, where $r$ and $p$ are currently being used while $m$ and $n$ aren't, this should be construed as four separate relabellings such as $r \to n$, $p \to m$, $n \to p$ and $m \to r$.

Swapping Symmetry: We also have a swapping symmetry for arguments. Basic propositions and their negations are defined only as relative to each other⁴; the negation of a basic proposition has exactly the same opportunity for validity conditions as a basic proposition and should be seen as a kind of basic proposition itself. Thus, to choose to use $p$ in an argument as opposed to $\bar{p}$ does not change the expression of structure of validity conditions. Thus we have symmetries such that replacing $p$ with $\bar{p}$ in both the premises and conclusions does not affect the argument. For example p | pq+ pqp,q = p | pq+ pqp,q.

⁴ See propositions 4.063 and 4.0641 in (Wittgenstein, 1994) for similar thoughts.

Rule: These symmetries do not apply to prop functions.

Prop functions can be used within arguments. For example, we may define prop functions $X := a + b$ and $Y := c$ such that we may express the argument $a \mid (a + b)c$ as $a \mid XY\,_{a,b,c}$. We must, however, be careful about sign assignation. If we write $a \mid XY$, this argument is not fully defined as it may be interpreted with multiple incompatible sign assignations. For example, $a \mid XY\,_{X,Y} \neq a \mid XY\,_{a,b,c}$. If an argument with prop functions is given without an assignment of symbols, i.e., $a \mid XY$, then it is either ambiguous or should be construed with a sign assignment using lower case letters taken from the definitions of the prop functions if possible, i.e., $a \mid XY\,_{a,b,c}$ where $X = a + b$ and $Y = c$. If upper case letters are used for sign assignment, as in the case of the argument $a \mid XY\,_{X,Y}$, then in the argument the upper case letters denote basic props, not prop functions, despite our conventions.

Prop functions may also be defined recursively. For example, we may define for $n \geq 1$, $X_n = X_{n-1} + a_n$ and $X_1 = a_1$. We shall however set implicit limits to possible definitions.
To do this we first need some definitions.

Definition: Full disjunctive normal form of a prop is expressed as a disjunction of terms where each term is a conjunction of some $m$ props (in our case basic props) taken from the one finite set, say $\{a_1, \ldots, a_m\}$. For each term, some of the basic props are negated. No two terms are the same. For example, a full disjunctive normal form of $a + b$ is $ab + a\bar{b} + \bar{a}b$. The set of basic props one may use is not unique. Consequently, full disjunctive normal form is also not unique; another way to express $a + b$ in full disjunctive normal form is by introducing a tautology such as

$$(ab + a\bar{b} + \bar{a}b)(c + \bar{c}) = abc + ab\bar{c} + a\bar{b}c + a\bar{b}\bar{c} + \bar{a}bc + \bar{a}b\bar{c}.$$

We define minimal full disjunctive normal form as the form that requires the least number of basic props to express. The minimal full disjunctive normal form of $a + b$ is $ab + a\bar{b} + \bar{a}b$. Minimal full disjunctive normal form is useful because it gives us a way to say whether two sentences express the same thing or not (assuming consistent sign assignation).

Definition: We define the scope of a prop function as the set of basic props used to express the sentence the prop function is equal to in minimal full disjunctive normal form (with sign assignment implicit). We write the scope of a prop function $X$ as $\mathrm{scope}(X)$. Suppose $\mathrm{scope}(X) = \{a, b\}$. We can write $X = X[a, b]$. Or if $\mathrm{scope}(X) = \{a_i \mid i\}$, we can write $X = X[a_i : i]$. A prop function is considered well defined only if its scope is of finite cardinality, i.e., one may always express it as a string of conjunctions and disjunctions of basic props of finite length. An argument that is expressed with prop functions is well defined only if its prop functions are well defined with unambiguous sign assignation.

By insisting on considering only well defined arguments, we implicitly define limits on possible prop functions. For example, the prop function $X = \sum_{i=1}^{\infty} a_i$ is not well defined and hence is not applicable.
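The definitions of full disjunctive normal form and scope can be sketched computationally. The following is an illustration under stated assumptions rather than part of the theory: it enumerates two-valued assignments purely as a way of listing terms, writes negation as `~`, and the function names `full_dnf` and `scope` are introduced here for the example. The scope is computed as the set of basic props the sentence actually depends on, which is the set needed for the minimal full disjunctive normal form.

```python
from itertools import product


def full_dnf(f, names):
    """Full disjunctive normal form of sentence f over the basic props `names`."""
    terms = []
    for values in product([True, False], repeat=len(names)):
        if f(dict(zip(names, values))):
            # One conjunction per satisfying assignment; '~' marks a negated prop.
            terms.append("".join(n if v else "~" + n for n, v in zip(names, values)))
    return " + ".join(terms)


def scope(f, names):
    """Basic props in `names` that f depends on (those in its minimal full DNF)."""
    needed = set()
    for name in names:
        others = [n for n in names if n != name]
        for values in product([True, False], repeat=len(others)):
            env = dict(zip(others, values))
            # The prop matters if flipping it alone can change the sentence.
            if f({**env, name: True}) != f({**env, name: False}):
                needed.add(name)
                break
    return needed


def a_or_b(v):
    return v["a"] or v["b"]


minimal_form = full_dnf(a_or_b, ["a", "b"])
padded_form = full_dnf(a_or_b, ["a", "b", "c"])
print(minimal_form)
print(padded_form)
print(scope(a_or_b, ["a", "b", "c"]))
```

Padding the set of basic props with c doubles the number of terms without changing the scope, which is why the minimal form is the one used to compare sentences.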
2 Abstractions and a problem with truth conditions

Consider the problem of counterfactual conditionals. I argue that the problem can be generalised to a problem with the use of truth conditions. This general problem can be overcome with notions of abstractions and the process of abstraction.

The problem is generally understood in the context of more traditional notions of logic, so for the moment let's consider a logic where there are truth bearers (say, sentences) that have truth conditions. There are some sentences where it seems like they should have some fairly reasonable truth conditions but, upon further consideration, said truth conditions seem to disagree with other strong intuitions. Namely, consider a sentence $A \to B$ that seems to have truth conditions of material implication for the sentences $A$ and $B$. Moreover, let's consider the situation where $\bar{A}$, or 'not A', is true. Then by the truth conditions of material implication, both $A \to B$ and $A \to \bar{B}$ are true. But suppose for example A = "Dave lets go of the pen" and B = "the pen falls onto the floor". Intuitively, it seems that one should not infer both $A \to B$ and $A \to \bar{B}$ (given appropriate background assumptions such as "Dave was holding the pen in the air"). What sentences with this problem have in common is that they are not just asserting that the truth conditions hold but also that there is a connection that pertains between the antecedent and the consequent. In this case the connection is causal.

I argue that the problem can be understood as not limited to cases of counterfactual conditionals. Suppose that instead of the above situation, Dave lets go of the pen and it falls on the floor. The truth conditions of $A \to B$ are satisfied, but I argue that one still cannot necessarily infer that a connection pertains between $A$ and $B$; it is in principle possible that $B$ is only incidentally true or that it is true for some reason other than there being a connection between $A$ and $B$.
Thus the problem persists even in the case where the antecedent is true. Note that it is more realistic to consider 'hypotheses' of greater scope than $A \to B$; one instance of $A$ and $B$ being true isn't going to be enough to satisfy the truth conditions of a universal 'hypothesis' of a causal connection between events of a certain type. However, no matter the 'hypothesis', the 'argument' still applies; suppose the universe is clockwork and describable at every instant in time by some state. Then suppose our data specifies the state of the universe for all time. It is still possible that the data is incidentally true rather than due to the causal connections between states at different times.

For propositions pertaining to connections between other propositions, the heart of the problem is the use of truth conditions to guide the inductive/deductive inferrability. For any 'hypothesis' $H$ as a proposition with truth conditions, there always exists some data $D$ within the purview of the truth conditions whereby $D = HD$, i.e., there always exists data sufficient to deduce the hypothesis, independent of any other background assumptions. I assert that there must always exist background assumptions allowing for the possibility that no connections pertain between the propositions in $D$, i.e., for any hypothesis there must exist possible background assumptions where no amount of data will allow for deductive inference of the hypothesis. We have an inconsistency. I assert the problem is due to the use of truth conditions for hypotheses.
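Two steps of the truth-conditional picture criticised in this section can be checked directly. The sketch below models only that traditional picture, not the theory proposed in this paper. Reading the 'hypothesis' H as the material conditional from A to B, and taking the data D to be the conjunction of A and B, are assumptions made for the example.

```python
from itertools import product


def implies(antecedent: bool, consequent: bool) -> bool:
    """Truth conditions of material implication."""
    return (not antecedent) or consequent


# (1) With a false antecedent, the conditional and the conditional with a
#     negated consequent are both true, whatever the consequent is.
vacuously_true = all(
    implies(False, b) and implies(False, not b) for b in (False, True)
)


# (2) Assumed example: H is "A -> B" and the data D is "A and B".
def H(a: bool, b: bool) -> bool:
    return implies(a, b)


def D(a: bool, b: bool) -> bool:
    return a and b


# D = HD: conjoining H with D adds nothing, so D alone suffices to deduce H.
data_suffices = all(
    D(a, b) == (H(a, b) and D(a, b))
    for a, b in product([False, True], repeat=2)
)

print(vacuously_true, data_suffices)
```

Both checks succeed on truth conditions alone, without any reference to a connection between A and B, which is the gap the argument above points to.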
I contend that this general problem of truth conditions can be understood with the notion of hypotheses considered in section (1.1): there are some sentences that can be interpreted as narrowing down a set of predetermined possibilities (and hence have a kind of 'truth conditions') and there are others (associated with hypotheses) that are to be associated with defining a set of possibilities or, more generally, defining connections between a set of propositions. A connection is reflected in logical dependencies between basic propositions. For example, basic propositions associated with possibilities are dependent on one another (if something is red then it is not blue, if something is heavy then it is not light, etc.) and hence possibility, in particular modality, is a type of connection. From this understanding of hypotheses, it follows that no hypothesis can have truth conditions.

Hypotheses are not limited to scientific ones. A hypothesis can be mathematical, logical, metaphysical, linguistic or of any category one wishes to assign. When we think of the conditions of material implication associated with $A \to B$, we are imagining the case where, if we accept the hypothesis associated with $A \to B$, then the conditions of material implication hold. But if we are in a situation of uncertainty about $A \to B$, then there is no reason for the conditions to hold. This gives us the freedom to overcome the problem.

Let's consider counterfactual conditionals in the theory being proposed in this paper, as opposed to the kind of logic considered above. There are no truth conditions in the theory being presented here, so there is no equivalent to the question 'what are the truth conditions of $A \to B$?', but we can ask about the nature of inductive inference of hypotheses and in particular what validity conditions are to be associated with hypotheses: a hypothesis contains a picture that determines the validity conditions of its propositions.
A simple and instructive interpretation of the sentence A → B will associate the sentence with a picture giving validity conditions of material implication. These conditions, however, are the validity conditions of the propositions within the picture, not the validity conditions of the picture itself. Moreover, it does not turn out to be the validity conditions of the picture that concern us when considering whether data provides evidence for or against a hypothesis. To distinguish between propositions merely narrowing down possibilities and propositions concerning defining connections, the former is to be associated with a non-basic proposition while the latter is to be associated with an abstraction:

Definition: An abstraction of a picture is a proposition that induces the validity conditions making up the picture, upon assumption of the abstraction.

Abstractions are constrained to be only basic propositions or conjunctions of them. In the latter case, we say the abstraction is a compound of abstractions. Note that an abstraction of a picture is understood only as relative to some logical space (different to the space of the picture). As an example, suppose we have assumed a space X = Z{a(bc+bc)+abc}, where scope(Z) does not contain a, b or c. Consider a ◦ X as a possible abstraction. If we now assume a ◦ X, we change the logical space of the basic propositions as we now have aX = Za(bc+bc). We say bc + bc denotes the picture associated with or induced by abstraction a ◦ X. Note the association is dependent on the logical space one starts from, X.

Definition: A hypothesis consists of a picture and an associated abstraction.

It is the validity conditions of the abstraction that are of interest when considering whether data gives evidence for or against a hypothesis. The precise nature of how this is the case is for later sections.
Suppose that the abstraction of $A \to B$ can be associated with $c \circ Y$ and the picture of $A \to B$ associated with $Z = ab + \bar{a}b + \bar{a}\bar{b}$, where $A = a$ and $B = b$. Our logical space is given by $Y = cZ + \bar{c}Q$ with some prop function $Q$. Now consider the situation where we have $\bar{a}Y$. It is not necessary from this logical space that the abstraction of $A \to B$, $c \circ Y$, be inferrable, i.e., in the case of a counterfactual conditional (and more generally), the picture may be inferrable but that does not imply that the abstraction is, and it is the inferrability of the abstraction that is of concern when considering whether the data is for or against the hypothesis. Thus the problem of counterfactual conditionals and, in a similar manner, the more general problem of truth conditions is dissolved.

Our notion of a critical distinction between hypotheses and pictures has some similarities to Goodman's (Goodman, 1983) notion of a 'hypothesis' being lawlike as opposed to merely accidental.

2.1 Substance

The metaphysics of the theory is intentionally extremely underdefined. This is for the sake of generality. There is however some positing that a world exists and that there is a relationship between the objects of the theory and the world. These are 'assumptions' made by the theory and not assumptions made in the theory. We use these 'assumptions' to motivate certain choices, particularly our understanding of abstractions. Propositions and validity conditions are not the world they are used to reflect.

Definition: The world is substance. It is substance that governs the efficacy of the use of various pictures when making decisions. It is assumed to exist such that there is logical structure to the world.

A basic proposition is a pointer used (if used effectively) to point toward substance. Maximally specific outcomes of a hypothesis are an example of pointing toward substance. But another way of pointing toward substance is to point toward pictures.
For example, a proposition pertaining to a causal connection between events is pointing towards substance because causal connection is substance, and hence the proposition is basic. This is a way to understand abstractions: they are the basic propositions (or are made up of them in conjunction) that point toward pictures.

Substance is transcendental and effable only through pictures. To characterise substance more (for example, to say that there are facts, or that there are objects with properties) is to unnecessarily constrain the theory being presented.

2.2 Abstraction and the web of pictures

I expect (see part I) that inductive inference is insufficient for making decisions from assumptions weak enough to be ideal for our collective search to be free from (unconscious) illusion. If this is so, a mechanism other than inductive inference is required to license us to act as if from reasonably strong assumptions. We saw in our schema for a decision theory an example mechanism that fits these requirements. Instead of a hypothesis being inductively inferred, we introduce the notion that a picture may have various levels of desiderative significance, the greatest significance being the situation where one should act as if one is under the assumption of the picture in question. Measured by the relative sizes of the weights of the subexpected utilities, this desiderative significance is determined by a few factors that shall be explored in a later paper. One factor of relevance here is the probabilities of the various abstractions of the picture relative to various other pictures. So probabilities (which are the concern of inductive inference) contribute to desiderative significance, but not solely.

The mechanism of non-inductive inference will be called abstraction or abstractive inference. The motivations for the notion of abstraction are similar to the ones for abduction (Fann, 1970) or inference to the best explanation (IBE).
There is a fundamental difference between IBE and abstraction, hence our use of a different name.

We may consider multiple possible abstractions at once. Suppose we have assumptions $W$. Define $B := \{a_1, \dots, a_n\} \subset \mathrm{scope}(W)$ as some set of basic props. We say $B$ defines a set of base props $\{B_i \mid i = 1, \dots, 2^n\}$, where each $B_i$ is a conjunction of props constructed from all elements of $B$, where some are chosen to be negated. The above set is defined as the set of all $B_i$ for which one may do this. We can decompose $W$ with these base props:

$$W = Z\Big\{\sum_j B_j X_j\Big\},$$

where the sum is determined by the $B_j$ present in $W$ when written in minimal full disjunctive normal form. The factor $Z$ is chosen as the unique factor of maximal scope that can be made from $W$ written in minimal form, i.e., it contains everything independent of props like $B_i$. We call the conjunctions of abstractions compound abstractions. A proposition $B_i \circ W$, considered as a compound abstraction, has an associated picture $X_i$.

Note that $W$ may be decomposed in different ways depending on the choice of $B$. The choice of $B$ depends on the abstraction one is interested in. If one is interested in abstraction $C \circ W$, then one chooses a $B$ where $C \in B$. Different abstractions need not utilise elements of the same $B$. In the situation where two possible abstractions, $a_1 \circ W$ and $a_2 \circ W$, are independent in $W$, their pictures are unaffected by the assumption of the other. If this is the case, we say the two abstractions are commensurable. Otherwise they are incommensurable. This is our version of taxonomic incommensurability (Kuhn, 1989): the conceptual incompatibility of competing abstractions. Incommensurability is not the same as inconsistency; two abstractions can be consistent, in that they may be assumed simultaneously, but be incommensurable if assuming one abstraction changes the picture induced by assuming the other.
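The decomposition into base props can be made concrete with a toy enumeration. The following is only a sketch under stated assumptions: the example space `W`, the prop names, and the helpers `base_props` and `induced_picture` are all hypothetical, the common factor $Z$ is taken to be trivial, and a picture is represented crudely as the set of truth assignments of the remaining props that stay consistent with `W` once a base prop is assumed.

```python
from itertools import product


def base_props(names):
    """All 2^n sign patterns over `names` (True = asserted, False = negated)."""
    return [dict(zip(names, signs)) for signs in product([True, False], repeat=len(names))]


def induced_picture(W, scope, base):
    """Assignments of the props outside `base` that are consistent with W given `base`."""
    rest = [p for p in scope if p not in base]
    picture = set()
    for values in product([False, True], repeat=len(rest)):
        world = {**base, **dict(zip(rest, values))}
        if W(world):
            picture.add(values)
    return picture


# Hypothetical assumptions: a1 induces "b iff c", a2 induces "b".
def W(w):
    return ((not w["a1"]) or (w["b"] == w["c"])) and ((not w["a2"]) or w["b"])


scope = ["a1", "a2", "b", "c"]
bases = base_props(["a1", "a2"])
pictures = [induced_picture(W, scope, base) for base in bases]
for base, picture in zip(bases, pictures):
    print(base, sorted(picture))
```

With two abstraction props there are $2^2 = 4$ base props, and each one selects its own set of allowed assignments for `b` and `c`, which plays the role of the picture $X_i$ in the decomposition above.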
Consider the case where the abstraction $a_1 \circ W$ and compound abstraction $a_1 a_2 \circ W$ are both of desiderative significance relative to $W$. Suppose that $a_2$ is in the scope of the picture corresponding to $a_1$, i.e., $a_1 \circ W$ induces meaning for $a_2 \circ W$. As pictures $a_1 W$ and $a_1 a_2 W$ are both of desiderative significance, we may construe the picture of $a_1 a_2 \circ W$ as the result of first abstracting $a_1 \circ W$ and then abstracting $a_2 \circ a_1 W$. We may describe $a_1 \circ W$ as a meta-abstraction of $a_2 \circ W$; abstracting $a_1 \circ W$ induces meaning of possible abstraction $a_2 \circ W$. Meta-abstraction is a story we can apply which is useful mainly in how it helps us to understand how to think of relationships between propositions we are interested in; there is a hierarchy of abstraction to our hypotheses. A picture associated with a statement 'the apple is red' may be induced by a meta-abstraction 'the apple has a colour', which may occur in the picture induced by a meta-abstraction associated with many statements including 'colour is a property'.

Hierarchies of abstraction provide context for pictures that differentiate and locate pictures that, considered without context, give the same validity conditional structure and hence are the same. For example, suppose we have pictures of statements 'if x is an apple, then it is a fruit' and 'if x is a raven, then it is black', with validity conditions of component basic propositions being those of material implication. The difference between the pictures is not in their validity conditional structure but rather in the context of meta-abstractions they are embedded in with desideratively significant pictures.

Definition: We shall say a theory consists of several pictures linked by abstractions at multiple levels.

In this way a theory is not just a hypothesis but also context for a hypothesis. A theory has a corresponding hypothesis which consists of a compound abstraction made up of the abstractions of the theory and the induced picture of the compound abstraction.
When considering the desiderative significance of a theory we are generally interested in the picture of the hypothesis made from the theory. However, the desiderative significance of the various pictures of the theory from multiple levels are also worth treating separately. For example, it may be that the picture of a hypothesis is to be considered significant but the context can change in significance by a change in meta-abstraction. A particular example is our theory of abstraction which will implicitly use meta-abstractions defining meaning of numbers which give meaning to propositions expressing the value of a probability. If one invents and uses a new theory of numbers, the context of our theory will change but the picture of our hypothesis may or may not change. At some point there may be a limit to the height of the hierarchy of abstraction. There may be some abstractions that cannot be construed as having meaning induced by other desideratively significant abstractions. These most meta of abstractions can appear almost impossible to describe because there are no statements of desiderative significance one can assert about them. If the range of desideratively significant logical spaces associated with abstractions near the top of the hierarchy is almost universal, we can be fooled into thinking that some of these propositions are 'necessarily true' in some way; there will be almost no good pictures of the world without them. Examples of meta-abstractions include: propositions of rules of inference in a formal system, propositions of equality of other propositions, propositions of mathematics compared to propositions of theories expressed mathematically, and propositions in more abstract fields of mathematics versus ones from less abstract fields such as category theory versus group theory. The picture of an abstraction will generally contain more basic propositions than those directly referred to in an interpreted sentence. 
For example, a picture of 'All emeralds are green' may include material implication for a number of objects, $X = \prod_{i=1}^{n}(e_i g_i + \bar{e}_i g_i + \bar{e}_i \bar{g}_i)$ [5], but also includes some $Q$ which determines the spaces of possible objects and possible colours, and perhaps more complicated dependencies that lay out what it means for objects to be emeralds or green etc. The resulting picture is given by $XQ$. We shall call $X$ and $Q$ subpictures. The subpicture prop function $Q$ can be common to a space of hypotheses where $XQ$ is the picture of only one, i.e., our effective assumptions could be $W = \sum_j A_j X_j Q$ [6], where $A_j$ is the abstraction prop function for picture $X_j Q$. The use of a common $Q$ is one approach to constructing pictures $W$ of meta-abstractions.

[5] where $\prod_{i=1}^{n}$ denotes conjunctions of props labelled by the variable $i$ with values between 1 and $n$.

[6] where $\sum_j$ denotes disjunctions of props over a range of values of the label variable $j$.

3 Probability

Probabilities in Abstraction Theory are understood to be degrees of validity. As arguments are understood to be expressions of validity conditional structure, probabilities are defined as functions of arguments. The probability calculus is very similar to standard approaches except for two important differences: one, probabilities are uniquely determined and two, they are defined only on well defined prop functions, that is, prop functions of finite scope. One consequence is that probabilities cannot be countably additive. The restriction to finite scope will not limit the power of the theory compared to definitions of probability as measures on infinite sets; probabilities will be generalised, but the existence and properties of a generalised probability will not be presupposed and will depend on circumstance.

Suppose we have prop functions $X$, $Y$ and $Z$ of finite scope.
From basic desiderata (Cox, 1946) we have the product and sum rules,

Product rule: $P(XY \mid Z) = P(X \mid Z)\,P(Y \mid XZ) = P(Y \mid Z)\,P(X \mid YZ)$

Sum rule: $P(X \mid Z) + P(\bar{X} \mid Z) = 1$,

from which we have a generalised sum rule

$$P(X + Y \mid Z) = P(X \mid Z) + P(Y \mid Z) - P(XY \mid Z).$$

All probabilities are conditional (Hájek, 2003); unconditional probabilities are merely ones conditioned on the tautology, e.g., $P(X \mid) = P(X \mid a + \bar{a})$.

In section (1.3), arguments were defined to have certain symmetries. Probabilities, being functions of arguments, must be consistent with them:

Relabelling: The meaning of a basic proposition is given by an assumed expression of meaning; it is extrinsic to the basic proposition and independent of the sign associated with it. Thus if we relabel the prop in an argument, we don't change the argument, i.e., $(a \mid a + b)_{a,b} = (c \mid c + b)_{c,b}$. This symmetry must be reflected in the probabilities, i.e., $P(a \mid a + b)_{a,b} = P(c \mid c + b)_{c,b}$.

Swapping symmetry: Negations of basic propositions are defined only relative to the basic proposition. Upon making an argument, it is as legitimate to use $\bar{a}$ as $a$, i.e., $(a \mid a + c)_{a,c} = (\bar{a} \mid \bar{a} + c)_{a,c}$. Thus our probabilities must also be symmetric under replacement of a basic prop with its negation in both the premise and conclusion, i.e., $P(a \mid a + c)_{a,c} = P(\bar{a} \mid \bar{a} + c)_{a,c}$.

These symmetries are ways to impose the notion that inductive inference depends only on validity conditional structure. From these symmetries and the product and sum rules we are led to uniquely determined probabilities: consider $Y$ written in disjunctive normal form such that there are $n$ terms of conjunctions and in each term there are $m$ basic props or their negations. For example, if $Y = ab + a\bar{b} + \bar{a}b$ then $n = 3$ and $m = 2$. One can show (Hasse, 2014) that

$$P(Y \mid) = n\,2^{-m}. \qquad (2)$$

Any well defined argument can be written in the form $Z \mid Y$. The probability corresponding to such an argument can always be reduced to a function of probabilities of the form $P(Y \mid)$, i.e.,

$$P(Z \mid Y) = \frac{P(ZY \mid)}{P(Y \mid)}. \qquad (3)$$

Thus any well defined probability (the probability of a well defined argument) is uniquely given by equation (2). Here is a general method of determining probabilities $P(Z \mid Y)$:

1) Write $Y$ and $ZY$ in terms of only basic props, e.g., $Y = a + b$, $Z = a + c$ so $ZY = (a + c)(a + b) = a + ab + ac + cb$.

2) Find disjunctive normal forms for $Y$ and $ZY$ by introducing new basic props into each term using the rule, for all $a$ and $b$, $a = a(b + \bar{b})$. Keep doing this until each term has the same scope, e.g., $Y = a + b = a(b + \bar{b}) + b(a + \bar{a}) = ab + a\bar{b} + ab + \bar{a}b = ab + a\bar{b} + \bar{a}b$.

3) Use equations (2) and (3).

Indifference between possibilities drops out of our approach (Hasse, 2014). Consider that we have $r$ possibilities [7] $a_1, a_2, \dots, a_r$ that are assumed to be exclusive and exhaustive and nothing else. Let's assign such an assumption a prop function $I_r$. The first three non-trivial examples are

$$I_2 = a_1\bar{a}_2 + \bar{a}_1 a_2,$$
$$I_3 = a_1\bar{a}_2\bar{a}_3 + \bar{a}_1 a_2\bar{a}_3 + \bar{a}_1\bar{a}_2 a_3,$$
$$I_4 = a_1\bar{a}_2\bar{a}_3\bar{a}_4 + \bar{a}_1 a_2\bar{a}_3\bar{a}_4 + \bar{a}_1\bar{a}_2 a_3\bar{a}_4 + \bar{a}_1\bar{a}_2\bar{a}_3 a_4,$$

et cetera. If we consider $a_i$ where $1 \le i \le r$, then we have

$$P(a_i \mid I_r) = \frac{1}{r}. \qquad (4)$$

[7] Note here we are associating, for example, $a_i$ with a possible world (with no metaphysics attached), as opposed to the common way where something like $a_1 a_2 \dots a_n$ is considered a possible world (Carnap, 1947; Gaifman and Snir, 1982).

This is only one set of possible versions of indifference. For example, one could assume that the possibility space has a range of sizes instead of just one.

Spaces of 'infinite' possibilities require a generalisation.

Definition: Instead of prop functions $Z$ and $Y$ we may consider infinite sequences of prop functions $(Z^n)_{n=1}^{\infty}$ and $(Y^n)_{n=1}^{\infty}$. We define an asymptotic probability as

$$p_\star(Z_\star \mid Y_\star) := \lim_{n\to\infty} P(Z^n \mid Y^n),$$

where $Z_\star = (Z^n)_{n=1}^{\infty}$ and $Y_\star = (Y^n)_{n=1}^{\infty}$.

Asymptotic probabilities need not satisfy the rules of probability theory. For example, define

$$p_\star(AB_\star \mid C_\star) := \lim_{n\to\infty} P(A^n B^n \mid C^n) = \lim_{n\to\infty} P(A^n \mid C^n)\,P(B^n \mid A^n C^n),$$
$$p_\star(A_\star \mid C_\star) := \lim_{n\to\infty} P(A^n \mid C^n),$$
$$p_\star(B_\star \mid AC_\star) := \lim_{n\to\infty} P(B^n \mid A^n C^n).$$

A version of Bayes' rule

$$p_\star(AB_\star \mid C_\star) = p_\star(A_\star \mid C_\star)\,p_\star(B_\star \mid AC_\star)$$

exists only if the limits in $p_\star(A_\star \mid C_\star)$ and $p_\star(B_\star \mid AC_\star)$ converge. Moreover, if the limit doesn't converge, the asymptotic probability doesn't exist.

Instead of there being probabilities with pictures of infinite scope, we have asymptotic probabilities with infinite sequences of pictures with finite scope.

Definition: We call an infinite sequence of pictures an asymptotic picture; it provides asymptotic meaning.

This is not to imply that meanings of propositions in asymptotic pictures converge; they don't.

In a realistic asymptotic picture with possibilities given by various $I_r$, the possibilities will have meaning additional to just being possibilities. For example, maybe they are possible positions of a particle at a certain point in time in a universe with many particles; the additional meaning/validity conditional structure then being logical dependencies between the particle's position at this time and the properties of all particles at all other times. This will generally disrupt the permutation symmetry that gives us indifference.

Consider a picture with a (finite) number of possibilities, each with their own meaning. The probabilities of the possibilities can be modelled by a different picture $I_q$ where the possibilities are identified with subsets of the $q$ new possibilities. Now consider an infinite sequence of pictures and their model pictures where the numbers of possibilities increase indefinitely. With these we can get 'integrals over an infinite possibility space' without measure theory: suppose we have a subpicture $I_n = I_n[b_1, \dots, b_n]$ that provides a space of $n$ possibilities. We have abstractions $a_k$, where $k = 1, \dots, m(n) - 1$, which partition the $n$ possibilities into $m$ pieces of interest. As $n \to \infty$, so does $m(n) \to \infty$. The subset associated with $a_k$ is $\{b_i \mid j_n(k) + 1 \le i \le j_n(k+1)\}$, where the function $j_n$ defines the borders between subsets.
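Equations (2), (3) and (4) can be spot-checked by brute force. The sketch below rests on one reading of equation (2): since a full disjunctive normal form over $m$ props has one term per satisfying truth assignment, $n\,2^{-m}$ is the fraction of truth assignments that satisfy the prop function. The function names `prob`, `tautology` and `exactly_one` are hypothetical, and prop functions are modelled as Python predicates on a dictionary of truth values.

```python
from fractions import Fraction
from itertools import product


def prob(conclusion, premise, scope):
    """P(conclusion | premise) by counting truth assignments over `scope`."""
    worlds = [dict(zip(scope, values)) for values in product([False, True], repeat=len(scope))]
    premise_worlds = [w for w in worlds if premise(w)]
    if not premise_worlds:
        raise ValueError("the premise is contradictory, so the argument is not well defined")
    hits = sum(1 for w in premise_worlds if conclusion(w))
    return Fraction(hits, len(premise_worlds))


def tautology(w):
    return True


def exactly_one(names):
    """Prop function I_r: the named possibilities are exclusive and exhaustive."""
    return lambda w: sum(w[name] for name in names) == 1


# Worked example from the method above: Y = a + b, Z = a + c.
Y = lambda w: w["a"] or w["b"]
Z = lambda w: w["a"] or w["c"]

p_Y = prob(Y, tautology, ["a", "b"])            # n = 3 terms, m = 2 props
p_Z_given_Y = prob(Z, Y, ["a", "b", "c"])       # equation (3) as a ratio of counts

# Indifference, equation (4), with r = 4.
names = ["a1", "a2", "a3", "a4"]
p_indifference = prob(lambda w: w["a1"], exactly_one(names), names)

print(p_Y, p_Z_given_Y, p_indifference)
```

Counting over a common scope is the same move as step 2) of the method, where every term is expanded until all terms share one scope. Exact fractions are used so that the comparison with $n\,2^{-m}$ and $1/r$ involves no rounding.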
We associate $a_k$ with its subset [8] using a subpicture

$$R^n_k := a_k\Big\{\sum_{i=j_n(k)+1}^{j_n(k+1)} b_i\Big\} + \bar{a}_k\,\overline{\Big\{\sum_{i=j_n(k)+1}^{j_n(k+1)} b_i\Big\}}.$$

[8] Note the use of abstractions here isn't necessary; one could also use $\sum_{i=j_n(k)+1}^{j_n(k+1)} b_i$ instead of $a_k$.

One can show that

$$P\Big(a_k \,\Big|\, X^n \prod_{l=1}^{m(n)-1} R^n_l\Big) = \frac{j_n(k+1) - j_n(k)}{n}.$$

We may associate a parameter $y$ with abstraction $a_k$ for each step in the sequence, defining a coordinate system for the space. In this way we can say that, for each step in the sequence, $a_k$ is associated with the coordinate interval $[x(k,n), x(k+1,n))$, i.e., for step $n$, $\ulcorner x(k,n) \le y < x(k+1,n) \urcorner := a_k$. Assuming convergence, we can define an 'integral' of some appropriate function $f$ with the probabilities:

$$\int_0^1 dp\, f := \lim_{n\to\infty} \sum_{k=1}^{m(n)-1} f(x(k,n)) \times P\Big(\ulcorner x(k,n) \le y < x(k+1,n) \urcorner \,\Big|\, X^n \prod_{l=1}^{m(n)-1} R^n_l\Big) = \lim_{n\to\infty} \sum_{k=1}^{m(n)-1} f(x(k,n))\,\frac{j_n(k+1) - j_n(k)}{n}.$$

The choice of parameter intervals $x(k,n)$ is relatively arbitrary. However, there may be logical dependencies between the possibilities they point to and others to which one wants to assign another parameter. For example, two parameters may be related by an equation, with pictures in the asymptotic picture potentially only approximating the logical dependencies of the equation and better approximation given by subsequent pictures. The equation relating the two parameters then makes some choices of parameter intervals better than others.

3.1 Bayes' theorem with hypothesis spaces

A version of Bayes' theorem can be proved. To do this we'll prove another theorem first:

Theorem (1): For any two prop functions with scopes that do not overlap, $Y = Y[a_i : i]$ and $Z = Z[b_j : j]$, $P(Y \mid Z) = P(Y \mid)$, i.e., they are independent if one has no other assumptions.

Proof: Decompose $Y$ and $Z$ into their minimal disjunctive normal forms such that we can write $Y = \sum_k H_k$ and $Z = \sum_l G_l$, where the prop functions $H_k$ and $G_l$ are the terms of the minimal disjunctive normal forms of their respective prop functions $Y$ and $Z$. Let $|\mathrm{scope}(Y)| = y$ and $|\mathrm{scope}(Z)| = z$.
We have

$$P(Y \mid Z) = P\Big(Y \,\Big|\, \sum_l G_l\Big) = P\Big(Y \sum_u G_u \,\Big|\, \sum_l G_l\Big) = \sum_u P\Big(Y G_u \,\Big|\, \sum_l G_l\Big) = \sum_u P\Big(G_u \,\Big|\, \sum_l G_l\Big) P(Y \mid G_u)$$
$$= \sum_u \frac{P(G_u \mid)}{P(\sum_l G_l \mid)} \frac{P(Y G_u \mid)}{P(G_u \mid)} = \sum_u \frac{P(G_u \mid)}{P(\sum_l G_l \mid)} \frac{\sum_k P(H_k G_u \mid)}{P(G_u \mid)} = \sum_u \frac{P(G_u \mid)}{P(\sum_l G_l \mid)} \frac{\sum_k 2^{-(y+z)}}{2^{-z}}$$
$$= \sum_u \frac{P(G_u \mid)}{P(\sum_l G_l \mid)} P(Y \mid) = P(Y \mid).$$

Theorem (2): Consider a set of base props $\{B_i \mid i\}$ to be used for compound abstractions. Suppose our assumptions are $W = \sum_j B_j X_j$, where pictures $X_i$ have a shared subpicture $Q$ such that for all $i$, $X_i = Q X_i$. Through judicious use of the product and sum rules and using $W = W \sum_j B_j$, one can find that the probability of a compound abstraction $B_i$ given $DW$ is given by

$$P(B_i \mid DW) = \frac{P(X_i \mid Q)\,P(D \mid X_i Q)}{\sum_j P(X_j \mid Q)\,P(D \mid X_j Q)}. \qquad (5)$$

Proof:

$$P(B_i \mid DW) = \frac{P(B_i X_i \mid DQ)}{\sum_j P(B_j X_j \mid DQ)} = \frac{P(X_i \mid DQ)\,P(B_i \mid X_i DQ)}{\sum_j P(X_j \mid DQ)\,P(B_j \mid X_j DQ)} = \frac{P(X_i \mid DQ)\,P(B_i \mid)}{\sum_j P(X_j \mid DQ)\,P(B_j \mid)}$$
$$= \frac{P(X_i \mid DQ)}{\sum_j P(X_j \mid DQ)} = \frac{P(X_i \mid Q)\,P(D \mid X_i Q)}{\sum_j P(X_j \mid Q)\,P(D \mid X_j Q)},$$

where we used $P(B_i \mid) = P(B_j \mid)$ for all $j$.

Compare equation (5) to a standard version of Bayes' theorem where one considers there to be a space of possible hypotheses. We'll use lower case $p$ for probabilities not defined in our system. Let $\{H_i \mid i = 1, \dots, t\}$ be a set of exclusive and exhaustive 'hypotheses', $D$ be some data and $Q$ be some background 'knowledge'. We have

$$p(H_i \mid DQ) = \frac{p(H_i \mid Q)\,p(D \mid H_i Q)}{\sum_j p(H_j \mid Q)\,p(D \mid H_j Q)}. \qquad (6)$$

The most notable difference is that for us, a hypothesis consists of both an abstraction and a picture, which play different roles in our probabilities. In particular, the left hand side of equation (5) is the probability of an abstraction as opposed to the probability of a picture. Consider again how we solved the problems of material implication; the picture of a statement like $A \to B$ may be deductively inferrable given the non-occurrence of event $A$ but the abstraction of the statement does not need to be.
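The contrast between picture and abstraction can be checked numerically on the counterfactual example of section 2. This is a sketch under stated assumptions: the logical space is taken to be $Y = cZ + \bar{c}Q$ with picture $Z = ab + \bar{a}b + \bar{a}\bar{b}$ as in that section, $Q$ is assumed here to be a tautology (the text leaves $Q$ unspecified), and probabilities are computed by counting truth assignments as a reading of equations (2) and (3). The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob(conclusion, premise, scope):
    """P(conclusion | premise) by counting truth assignments over `scope`."""
    worlds = [dict(zip(scope, values)) for values in product([False, True], repeat=len(scope))]
    premise_worlds = [w for w in worlds if premise(w)]
    hits = sum(1 for w in premise_worlds if conclusion(w))
    return Fraction(hits, len(premise_worlds))


# Picture of A -> B: material implication between a and b.
def Z(w):
    return (not w["a"]) or w["b"]


# Assumed background prop function Q: a tautology (an illustrative choice).
def Q(w):
    return True


# Logical space Y = cZ + (not c)Q, with c the abstraction.
def Y(w):
    return (w["c"] and Z(w)) or ((not w["c"]) and Q(w))


# Data: the antecedent does not occur (Dave does not let go of the pen).
def not_a_and_Y(w):
    return (not w["a"]) and Y(w)


scope = ["a", "b", "c"]
p_picture = prob(Z, not_a_and_Y, scope)
p_abstraction = prob(lambda w: w["c"], not_a_and_Y, scope)
print(p_picture, p_abstraction)
```

Under these assumptions the picture is certain given the non-occurrence of $A$, while the abstraction is left undecided, which is the distinction the comparison between equations (5) and (6) draws.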
We are generally interested in the inferrability of the abstraction, not the picture; given the non-occurrence of event $A$, the picture of $A \to B$ will have probability 1 but the abstraction generally won't.

Another difference is that the exclusivity and exhaustivity of the hypotheses $\{H_i \mid i = 1, \dots, t\}$ is predefined and not represented in the assumptions of the arguments. We may reasonably worry where this exclusivity and exhaustivity comes from. For us, the exclusivity and exhaustivity is explained as a natural outgrowth of our understanding of abstraction and is not related to the exclusivity or exhaustivity of the pictures. Abstractions anchor pictures to a context of meaning given by assumptions $W$, and it is this context that provides exclusivity and exhaustivity.

3.2 An example calculation

Let's run through a non-trivial calculation. As the syntax is (deliberately) low level, calculation can be cumbersome. We shall thus focus on an unrealistic, simple example that has enough significant features to make it potentially interesting.

Imagine that we have gathered all the birds in the world into a single enormous aviary, and we have observed some of the birds and noted their species and colour. Consider a hypothesis for the statement 'being a raven makes ravens black' [9]. Note this means we are not just interested in whether all ravens are black but whether ravenness causes blackness. We are interested in the probability of the corresponding abstraction given various kinds of potential data observed.

The meta-picture in our assumptions will be a picture of the statement 'the colour of ravens may be caused by their genes'. It consists of a space of $d_2 + 1$ abstractions and their pictures; $d_2$ of them correspond to 'all ravens are colour $r$' where $r$ is one of $d_2$ colours. The last one corresponds to there being no substance relating ravenness and something's colours. We assume that there are $n$ birds [10] of $d_1$ possible species and $d_2$ possible colours.
Consider the following basic propositions and associated statements: for various $i$ and $j$ we have $s^i_j = \ulcorner\text{bird } j \text{ is of species } i\urcorner$, $c^i_j = \ulcorner\text{bird } j \text{ is of colour } i\urcorner$, $s^1_j = \ulcorner\text{bird } j \text{ is a raven}\urcorner$ and $c^1_j = \ulcorner\text{bird } j \text{ is black}\urcorner$. We shall have subpictures $X_r$ of the statement 'all ravens are colour $r$' and a common subpicture $Q$ that represents certain background assumptions. Our subpictures $X_r$ will be defined as

$$X_r := \prod_{j=1}^{n} (s^1_j c^r_j + \bar{s}^1_j),$$

where $s^1_j c^r_j + \bar{s}^1_j$ gives the validity conditions of material implication [11], i.e., 'if bird $j$ is a raven, then it is colour $r$'.

[9] This example is directly related to the so called ravens paradox. There are already a number of 'solutions' (Fitelson and Hawthorne, 2010) and the goal of the above example is not to add to these but rather to gain some experience with the calculus and the type of thinking involved in coming up with pictures. We thus won't concern ourselves with the minutiae of the various 'solutions' and how they compare with our example.

[10] One generalisation that one can play with is having a set of hypotheses for various values of $n$ and being uncertain as to the correct value.

[11] More realistic pictures would model the causal mechanism between genes and colour. For example, genes can produce a range of colours (and patterns of colours) in the same species. There are possibly even albino ravens. Our pictures could reflect this diversity of potentialities.

Our subpicture $Q$ will define a space of $d_1$ possible species and $d_2$ possible colours for each bird [12]:

$$Q := \prod_{k=1}^{n} I^k_{d_1}[s^i : i]\; I^k_{d_2}[c^j : j],$$

where $I^k_{d_1}$ and $I^k_{d_2}$ are subpictures that define spaces of size $d_1$ and $d_2$ respectively for bird $k$, e.g., $I^k_2[s^i : i] = s^1_k \bar{s}^2_k + \bar{s}^1_k s^2_k$ and $I^k_3[c^j : j] = c^1_k \bar{c}^2_k \bar{c}^3_k + \bar{c}^1_k c^2_k \bar{c}^3_k + \bar{c}^1_k \bar{c}^2_k c^3_k$. Our first $d_2$ pictures are given by $X_r Q$ and our last picture is given by $Q$.
We shall consider two situations with our data [13]:

$$D_a := \prod_{l=1}^{m} s^1_l c^1_l \quad\text{and}\quad D_b := \prod_{l=1}^{m} \bar{s}^1_l,$$

where $D_a$ corresponds to us discovering that the first $m$ birds are black ravens and $D_b$ corresponds to us discovering that the first $m$ birds are not ravens (their colour and particular species will turn out to be irrelevant in this situation). Note, for simplicity we have not represented data corresponding to 'there are $m$ birds that are black ravens'; this would be data where we haven't been able to distinguish the birds.

[12] Note that we are also assuming that every bird in the aviary could be a raven, and if we just assumed $Q$ every bird would have the same probability of being a raven as any other species. Some potential changes for the enthusiastic reader present themselves: one, we could assume some information about the sizes of the populations of the individual bird species. And two, we could expand the notion of species to all object types and the notion of birds to all objects. We would then have to be careful; suppose I observe the pen sitting on my desk to be a pen. Realistically, my effective assumptions would be such that the probability of it being a raven is much lower than the probability of it being a pen. This has a strong potential, depending on the rest of our assumptions, to change the probability of our abstraction of interest. In the limiting case where I assume that the pen is not a raven before I observe it (and we're using the same pictures and abstractions described in this section), learning that it is a pen will not change the probability of the abstraction of interest.

[13] We are choosing for simplicity data that are 'empirical', but we could also instead use 'sense data' which, from assumptions, are used to infer the 'empirical' data.

We shall use our version of Bayes' theorem to calculate the probabilities of the abstraction of interest. To do this we shall calculate the probabilities of the subpictures $X_r$ given our data and $Q$.
Consider first $X_1$ and $D_a$:

$$P(X_1 \mid D_a Q) = P\Big(\prod_{i=1}^{n}(s^1_i c^1_i + \bar{s}^1_i) \,\Big|\, \prod_{j=1}^{m} s^1_j c^1_j \prod_{k=1}^{n} I^k_{d_1} I^k_{d_2}\Big)$$
$$= \prod_{i=1}^{m} P(s^1_i c^1_i + \bar{s}^1_i \mid s^1_i c^1_i\, I^i_{d_1} I^i_{d_2}) \prod_{j=m+1}^{n} P(s^1_j c^1_j + \bar{s}^1_j \mid I^j_{d_1} I^j_{d_2})$$
$$= \prod_{i=m+1}^{n} P(s^1_i c^1_i + \bar{s}^1_i \mid I^i_{d_1} I^i_{d_2}) = \prod_{i=m+1}^{n} \big\{P(s^1_i c^1_i \mid I^i_{d_1} I^i_{d_2}) + P(\bar{s}^1_i \mid I^i_{d_1} I^i_{d_2})\big\}$$
$$= \Big(\frac{1}{d_1 d_2} + 1 - \frac{1}{d_1}\Big)^{n-m} = \Big(\frac{d_1 d_2 - d_2 + 1}{d_1 d_2}\Big)^{n-m},$$

where we made use of Theorem (1). Similarly, the probabilities of the other pictures given both $D_a Q$ and $D_b Q$ are

$$P(X_{j \neq 1} \mid D_a Q) = 0$$

and, for all $j$,

$$P(X_j \mid D_b Q) = \Big(\frac{d_1 d_2 - d_2 + 1}{d_1 d_2}\Big)^{n-m}.$$

With meta-picture $W = \sum_{i=1}^{d_2} A_i X_i Q + A_{d_2+1} Q$, the probability for the abstraction for 'being a raven makes ravens black' is then given by

$$P(A_1 \mid D_{a/b} W) = \frac{P(X_1 \mid D_{a/b} Q)}{\sum_{i=1}^{d_2} P(X_i \mid D_{a/b} Q) + 1},$$

where the $+1$ comes from the probability of the tautology. Substituting in the probabilities for the pictures we get

$$P(A_1 \mid W) = \frac{1}{d_2 + \big(\frac{d_1 d_2}{d_1 d_2 - d_2 + 1}\big)^{n}},$$
$$P(A_1 \mid D_a W) = \frac{1}{1 + \big(\frac{d_1 d_2}{d_1 d_2 - d_2 + 1}\big)^{n-m}},$$
$$P(A_1 \mid D_b W) = \frac{1}{d_2 + \big(\frac{d_1 d_2}{d_1 d_2 - d_2 + 1}\big)^{n-m}}.$$

We see with $P(A_1 \mid D_a W)$ that as soon as one black raven ($m = 1$) is found, it disqualifies most abstractions, changing the probability of $A_1$ significantly compared to subsequent discoveries of black ravens. This is due to our rather strict subpictures $X_i$, where every raven has to be the same colour with no exceptions. Both $P(A_1 \mid D_a W)$ and $P(A_1 \mid D_b W)$ increase as $m$ increases, i.e., seeing only black ravens increases the probability that 'being a raven makes ravens black', and seeing only non-ravens also increases the probability that 'all ravens are black' because it increases the probability of the trivial possibility that there are no ravens. However, as $m \to n$,

$$P(A_1 \mid D_a W) \to \frac{1}{2} \quad\text{and}\quad P(A_1 \mid D_b W) \to \frac{1}{d_2 + 1}.$$

This aligns with the intuition that finding a black raven in the aviary should provide better evidence for 'all ravens are black' than finding a non-raven. The larger the number of possible colours $d_2$, the smaller the probability increase when finding a non-raven.
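The closed forms above can be compared against a direct count for a small aviary. The sketch below is an illustration under stated assumptions rather than the derivation itself: each world is a choice of one abstraction (index `h`, with `h < d2` meaning 'all ravens are colour h' and `h == d2` the unconstrained picture $Q$) together with one species and one colour per bird, and probabilities are taken to be ratios of counts of consistent worlds. Species 0 stands for raven and colour 0 for black. The function names are hypothetical, and the closed forms for data are only compared for $m \ge 1$.

```python
from fractions import Fraction
from itertools import product

RAVEN, BLACK = 0, 0


def closed_form(n, m, d1, d2, data):
    """Closed-form probability of the abstraction A_1 (use m >= 1 when data is 'a' or 'b')."""
    rho = Fraction(d1 * d2, d1 * d2 - d2 + 1)
    if data == "none":
        return 1 / (d2 + rho ** n)
    if data == "a":
        return 1 / (1 + rho ** (n - m))
    if data == "b":
        return 1 / (d2 + rho ** (n - m))
    raise ValueError(data)


def brute_force(n, m, d1, d2, data):
    """Probability of A_1 as a ratio of counts of worlds consistent with W and the data."""
    bird_options = list(product(range(d1), range(d2)))  # (species, colour) for one bird
    consistent = 0
    favourable = 0
    for h in range(d2 + 1):
        for world in product(bird_options, repeat=n):
            # Picture X_h: every raven has colour h (no constraint when h == d2).
            if h < d2 and any(s == RAVEN and c != h for s, c in world):
                continue
            observed = world[:m]
            # D_a: the first m birds are black ravens.
            if data == "a" and any((s, c) != (RAVEN, BLACK) for s, c in observed):
                continue
            # D_b: the first m birds are not ravens.
            if data == "b" and any(s == RAVEN for s, c in observed):
                continue
            consistent += 1
            if h == BLACK:
                favourable += 1
    return Fraction(favourable, consistent)


for data, m in [("none", 0), ("a", 1), ("a", 2), ("b", 1), ("b", 2)]:
    exact = closed_form(3, m, 3, 2, data)
    counted = brute_force(3, m, 3, 2, data)
    print(data, m, exact, counted)
```

Exact fractions are used so that agreement between the two columns is an equality check rather than a floating-point comparison. The count grows as $(d_1 d_2)^n (d_2 + 1)$, so this only works for very small $n$.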
For realistically large $d_2$, the probability increase becomes negligible, as one may intuit. Also notice the limits of possible evidence: neither $P(A_1 \mid D_a W)$ nor $P(A_1 \mid D_b W)$ can reach 1 because, no matter how many black ravens or non-ravens are found, there are other hypotheses consistent with the evidence.

3.3 The dissolution of arguments against unique probabilities

We now have a universal calculus to uniquely determine logical probabilities from assumptions. There are many 'arguments' (Franklin, 2001; van Fraassen, 1989; Urbach and Howson, 1993; Earman, 1992) made against this possibility, based on applying methods to determine probabilities in different ways and getting different answers for the same probabilities. These 'arguments' do not apply to Abstraction Theory. Consider a simple example: there is a ball in an urn which has one of three possible colours: red, blue or white. What is the probability that, if I pick a ball from the urn in such a way that I have no assumed control in choosing which ball, I will pick the red ball? From a principle of indifference/symmetry of the three possible results, I deduce the probability to be 1/3. Let's now consider a reframing of the situation in which there are instead two possible results: the ball I pick is either red or not red. Using the principle of indifference with these two possibilities gives us a probability of 1/2. We thus arrive at contradictory probabilities. This is not a particularly strong 'argument' but its structure and limitations are the same as the more sophisticated ones. All have at least one of two problems. Firstly, often there is an appeal to symmetry in the assumptions without explicit choice of what those assumptions are. I could say I have drawn a two-dimensional shape with four straight edges and one corner, but such a shape need not exist. Similarly, assumptions with the stated symmetries need not exist either.
Because the assumptions in Abstraction Theory are in principle explicit, the existence of such assumptions can in principle be checked. Secondly, the criteria used for the applicability of the symmetry constraints can be too vague, weak or dependent on the particular formulation of inductive logic used. The above example did not state any particular criteria but suggests that symmetry constraints between any partition of a set of exclusive and exhaustive possibilities are legitimate. I can't imagine a good 'argument' as to why this should be the case. Let's consider the typical kinds of further analysis that are better arguments but which Abstraction Theory is able to sidestep. Suppose for the moment that we are looking at a formulation of probabilities of 'events' (as, for example, sets of possible worlds or, similarly, truth functions of atomic propositions). Ignoring for the moment any semantic differences between events, a reasonable-sounding criterion for indifference constraints is that they are applied to the most fine-grained partition possible. It is generally supposed that the three possibilities are actually partitions of infinitely many possibilities. For example, a ball is not only one of three colours but also has one of an infinite number of possible orientations, etc. Because of the infinite number of events, there is now not necessarily an obvious unique choice of indifference constraint; we could for example try some transformation group method (Jaynes, 2003) but the choice of transformation group is not unique. Or we could apply a symmetry on some transformed parameter rather than the original.
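The non-uniqueness described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, and the interval [0, 1] and the transformation y = x**2 are choices made here for concreteness, not taken from the text.

```python
from fractions import Fraction


def indifference(cells, target):
    """Give equal weight to every cell of a partition and return the target's weight."""
    assert target in cells
    return Fraction(1, len(cells))


# The urn example: the answer depends on which partition indifference is applied to.
p_three_way = indifference(["red", "blue", "white"], "red")
p_two_way = indifference(["red", "not red"], "red")


def uniform_cdf(t):
    """CDF of a uniform distribution on [0, 1], evaluated exactly."""
    return min(max(t, Fraction(0)), Fraction(1))


# A continuous analogue: probability that x <= 1/2 when indifference is applied
# to x itself, versus to the transformed parameter y = x**2 (x <= 1/2 iff y <= 1/4).
threshold = Fraction(1, 2)
p_flat_in_x = uniform_cdf(threshold)
p_flat_in_y = uniform_cdf(threshold ** 2)

if __name__ == "__main__":
    print(p_three_way, p_two_way)
    print(p_flat_in_x, p_flat_in_y)
```

Both pairs disagree, which is the pattern of 'argument' under discussion: the symmetry principle alone does not say which partition or parameter it should be applied to.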
There are two ways Abstraction Theory sidesteps these problems: 1) There is no particular reason why, in the simple case of three possibilities being specified, the picture from interpretation cannot actually have finite scope, e.g., $I_3$; this picture being an approximation of potentially better pictures, much like how a scientific hypothesis is approximated by one that comes along to replace it. The basic props representing the three possibilities are not truth functions of more fine-grained propositions. 2) The symmetry arguments used on infinite spaces require an initial specification of a parameter with respect to which probabilities are constrained to be symmetric. In contrast, asymptotic probabilities are determinable without any specification of parameter. The symmetry methods of indifference or transformation groups applied to continuous parameters/random variables are not directly applicable (and are unnecessary), and hence the problems with the uniqueness of the results are dissolved. It is also worth noting that when specifying a situation in which one wants to calculate an asymptotic probability with some parameter in mind, even with a relatively rich background theory on the meaning of parameter value ranges, the choice of asymptotic picture may not be unique. Another difficulty with finding a good criterion of applicability of symmetry constraints in inductive logics is that generally probabilities are formulated as probabilities of propositions that have semantic content independent of syntax. So, for example, not all possible worlds are semantically the same, so why should probabilities based on them have symmetries based on swapping them? The solution in Abstraction Theory is that the only symmetries used are between basic props or their negations (or any symmetries derived from these ones). Basic props and their negations have no semantic content and hence there is no problem.
Finally, inductive logics such as Carnap's (Carnap, 1950) and its successors are dependent upon the language in which they are formulated. We have formulated Logic in a language-invariant way: by making explicit everything a probability is dependent upon, if we changed the language we used to express arguments then either we could identify arguments from each language such that they have the same probabilities or the language reflects a different theory of Logic. We thus don't have this problem either.

3.4 The 'grue' non-problem

Goodman's 'grue' problem (Goodman, 1983) is often considered to be fatal to theories of logical probabilities. Although Abstraction Theory uses logical probabilities, their 'meaning' and use are very different from those in Confirmation Theories with probabilities. In order for there to be a 'grue' problem in Abstraction Theory, it must be formulated differently, and there's no unique way to do this. I argue that any attempt at a 'standard' formulation of the problem collapses. A Confirmation Theory considers relationships between 'hypotheses' and 'evidence'. This can be given by a confirmation function $c(h,e)$ for 'hypothesis' $h$ and evidence $e$. If $c$ is defined to have a range on a two-element set, $c(h,e) = x$ expresses whether $e$ confirms or disconfirms $h$. If $c$ is defined on the real line, then it expresses a degree of confirmation. Often, the confirmation function is interpreted as a probability function (Carnap, 1950). The general 'argument' is that in a Confirmation Theory there is a situation whereby two 'hypotheses' are equally confirmed, as the confirmation function is symmetric, but the two 'hypotheses' intuitively shouldn't be equally confirmed, giving us a contradiction.
Firstly, these 'hypotheses' in Confirmation Theories are defined syntactically, with the meaning associated with the constituent signs implicitly assigned, while in Abstraction Theory pictures (which are the things that best correspond to 'hypotheses' in Confirmation Theories) are used to explicitly assign meaning to propositions. These meanings are underconstrained by usual formulations of the problem and thus it is not a well-posed problem. Moreover, probabilities of abstractions (these being the probabilities of interest) are not uniquely determined by a picture and evidence, but also require a meta-picture that is not chosen in the setup of the problem, making the problem less well-posed. Finally, the thing in Abstraction Theory that best corresponds to degree of confirmation is not probability but desiderative significance (which shall be explicated in a future paper): the relative weight a picture contributes to the decision process. The probabilities of abstractions of a picture from many (generally infinite) meta-pictures contribute to the desiderative significance of said picture. Desiderative significance is also dependent upon initial assumptions (which are also not unique) and a kind of 'sense data' instead of 'scientific evidence'. A proper formulation of a 'grue' problem in Abstraction Theory must make an argument that two different pictures have the same desiderative significance under various circumstances when they shouldn't. There are four classes of possible approaches to constructing a 'grue' problem in Abstraction Theory. 1) Find a 'grue' problem that applies to probabilities with a wide enough class of meta-pictures such that it affects desiderative significance for any realistic initial assumptions. 2) Find a problem that applies to desiderative significance as a whole even if it doesn't necessarily apply to individual probabilities.
3) Make a strong argument for a certain meta-picture as effective assumption (or a class of them) that allows for two pictures with a 'grue'-like problem. 4) Make a strong argument for the desiderative significance of two pictures (that don't necessarily share a significant meta-picture) with realistic initial assumptions that intuitively shouldn't both be significant. Suppose we wanted to construct an 'argument' as closely aligned as possible with techniques from original formulations of the problem, a 'standard' formulation. We would look at symmetries in probabilities that may be useful to classes 1) and 3) of approaches. Let's attempt to construct such an 'argument'. Suppose that we observe a number of objects before some time $t$ that are emeralds and we note that they are green. We are interested in the hypothesis that 'all emeralds are green.' Suppose we also consider the statement 'object $a$ is grue', where something is grue iff it is green and observed before time $t$ or it is not green and not observed before time $t$. The second hypothesis we're interested in is that 'all emeralds are grue.' Interpreting these as hypotheses means we are not interested in whether all emeralds are accidentally green or grue but whether there is substance that makes them so (such as a cause). The 'argument' utilises symmetries between statements like '$a$ is green' and '$a$ is grue'. The universal symmetry that most closely corresponds to the symmetries generally used is the relabelling symmetry of basic props. Thus our interpretation of these statements will use basic props as opposed to prop functions$^{14}$. Supposing we have $n$ objects, $\{x_i \mid i\}$, for each $i = 1, \dots, n$, we have basic props $\ulcorner Gx_i \urcorner$ for '$x_i$ is green' and $\ulcorner Ux_i \urcorner$ for '$x_i$ is grue'. We shall also have basic props $\ulcorner Ox_i \urcorner$ for '$x_i$ is observed before time $t$'.
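The verbal definition of grue given above can be tabulated directly for a single object. The following is a minimal sketch with hypothetical function names; it treats 'green', 'grue' and 'observed before t' as plain Booleans, which is a simplification made here for illustration and not the basic-prop formalism itself.

```python
from itertools import product


def grue(green: bool, observed: bool) -> bool:
    """Grue iff (green and observed before t) or (not green and not observed before t)."""
    return (green and observed) or (not green and not observed)


def definition_holds(u: bool, g: bool, o: bool) -> bool:
    """Truth value of the biconditional U == (G == O) for one object."""
    return u == (g == o)


BOOLS = [False, True]

# The verbal definition coincides with the biconditional 'green iff observed'.
matches_biconditional = all(
    grue(g, o) == (g == o) for g, o in product(BOOLS, repeat=2)
)

# Exchanging the roles of 'green' and 'grue' leaves the defining relation unchanged.
swap_symmetric = all(
    definition_holds(u, g, o) == definition_holds(g, u, o)
    for u, g, o in product(BOOLS, repeat=3)
)

if __name__ == "__main__":
    print(matches_biconditional, swap_symmetric)
```

Both checks come out true, so at the level of truth tables the relation that defines grue from green equally defines green from grue.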
The choice of grue and green statements corresponding to basic props means the definition relating grue and green together is a prop function

\[
X := \prod_{i=1}^{n} \big\{ \ulcorner Ux_i \equiv (Gx_i \equiv Ox_i) \urcorner \big\}
= \prod_{i=1}^{n} \big\{ \ulcorner Ux_i \urcorner \big( \ulcorner Gx_i \urcorner \ulcorner Ox_i \urcorner + \overline{\ulcorner Gx_i \urcorner}\, \overline{\ulcorner Ox_i \urcorner} \big) + \overline{\ulcorner Ux_i \urcorner} \big( \ulcorner Gx_i \urcorner \overline{\ulcorner Ox_i \urcorner} + \overline{\ulcorner Gx_i \urcorner} \ulcorner Ox_i \urcorner \big) \big\},
\]

where $\equiv$ denotes material biconditionality. The Goodman argument most similar to our construction requires the equivalent of this prop function to be assumed, breaking a requirement of Carnap's theory that atomic sentences be independent. For us there is no inconsistency but rather a natural interpretation: $X$ is a subpicture giving partial meaning to the basic propositions within its scope. However, we run into a different problem: the addition of $X$ in the pictures of interest changes the original meanings of our propositions. We have altered the original situation we were interested in such that it may not be applicable or interesting anymore. But for the sake of argument, let us suppose for the moment that this isn't a problem.

$^{14}$ This choice is similar to a version of Goodman's argument against Carnap's theory where both the green and grue predicates are primitive, as opposed to one of them being elementary. If predicate $M$ were elementary and $a$ an individual, then $Ma$ is similar to a prop function as opposed to a basic prop.

Let the abstractions for 'all emeralds are green' and 'all emeralds are grue' be $A_G$ and $A_U$ respectively, and other abstractions given by $A_j$. Suppose the full meta-picture is given by

\[
Y := X W \Big\{ A_G Z[(\ulcorner Ex_i \urcorner, \ulcorner Gx_i \urcorner : i) \dots] + A_U Z[(\ulcorner Ex_i \urcorner, \ulcorner Ux_i \urcorner : i) \dots] + \sum_{j=3}^{q} A_j Z_j \Big\},
\]

where $W$ is a shared subpicture for doing things like defining the space of possible colours. The pictures corresponding to abstractions are manifest. Also suppose we have data

\[ D_m := \prod_{i=1}^{m} \ulcorner Ex_i \urcorner \ulcorner Gx_i \urcorner \ulcorner Ox_i \urcorner, \]

where we use $D_m X = D'_m X$, where

\[ D'_m := \prod_{i=1}^{m} \ulcorner Ex_i \urcorner \ulcorner Gx_i \urcorner \ulcorner Ux_i \urcorner \ulcorner Ox_i \urcorner. \]

Let's now apply the relabelling symmetries for all $i$, $\ulcorner Gx_i \urcorner \leftrightarrow \ulcorner Ux_i \urcorner$:

\[
P\big(A_G \mid D_m[(\ulcorner Gx_i \urcorner, \ulcorner Ux_i \urcorner : i) \dots]\, Y[(\ulcorner Gx_i \urcorner, \ulcorner Ux_i \urcorner : i) \dots]\big)
= P\big(A_G \mid D_m[(\ulcorner Ux_i \urcorner, \ulcorner Gx_i \urcorner : i) \dots]\, Y[(\ulcorner Ux_i \urcorner, \ulcorner Gx_i \urcorner : i) \dots]\big).
\]

The abstraction $A_G$ now corresponds to the prop function $Z[(\ulcorner Ex_i \urcorner, \ulcorner Ux_i \urcorner : i) \dots]$ which was originally associated with the picture for 'all emeralds are grue'. So it seems like we're almost there. The prop functions $D_m$ and $X$ are symmetric under the relabelling also. The corresponding symmetries in regular Confirmation Theories with probabilities are then sufficient to finish the 'argument' proving that 'all emeralds are green' and 'all emeralds are grue' have the same probability. An obvious problem is that $Y$ does not contain only $X$ and so $Y$ is not necessarily symmetric under the relabelling. But that's not the real problem: for all $i$, the propositions associated with $\ulcorner Gx_i \urcorner$ and $\ulcorner Ux_i \urcorner$ either have the same meaning or they don't. If they have the same meaning then this contradicts the presupposition behind the intuition that the two hypotheses can't be equally confirmed. If they don't have the same meaning, then the relabelling swaps the propositions associated with the props; the proposition originally associated with $\ulcorner Ux_i \urcorner$ becomes associated with $\ulcorner Gx_i \urcorner$, i.e., the relabelling did not change the situation represented in the argument. Thus the 'argument' dissolves. The 'grue' problem exploits ambiguity in the meaning associated with the signs used in a constructed language. Abstraction Theory is devised to take full explicit account of meaning such that no ambiguity remains, and thus it cannot be exploited in a similar way. One virtue of the original 'grue' problem is that one can imagine applying its techniques to a wide variety of 'hypotheses', and hence it is not strongly dependent upon specifics of the situation. We see from the above 'argument' that trying to make an 'argument' similar to versions of the original with a universal symmetry fails. If instead one wants to construct an 'argument' that does not rely on a universal symmetry but rather relies on specific examples, one faces large difficulties.
Such an 'argument' falls into our third and fourth categories of approaches, which means one has to argue for the desiderative significance of various pictures. As will be explicated in more detail in future papers, the desiderative significance of a picture is dependent upon its logical structure and the logical structure of the initial assumptions in a complex way. Without extreme amounts of hard work, one should never expect the example pictures one uses for individual calculations to properly reflect a realistically significant picture; most pictures one imagines are vastly simplified models of realistic pictures which, if constructed well, approximate predictions of realistic pictures within domains of interest, i.e., having similar 'likelihoods' and potentially even similar relative 'priors'. However, despite these approximations, the logical structures of model pictures versus their realistic counterparts need not lead to similar significance at all. Thus one cannot simply use two model pictures for an 'argument' that does not rely on some universal symmetry, making such an 'argument' incredibly difficult.

References

Carnap, R. (1945). On inductive logic. Philosophy of Science, 12(2):72–97.
Carnap, R. (1947). Meaning and Necessity: A Study in Semantics and Modal Logic. Chicago: University of Chicago Press.
Carnap, R. (1950). Logical Foundations of Probability. Chicago: University of Chicago Press.
Cox, R. T. (1946). Probability, frequency and reasonable expectation. American Journal of Physics, 14:1–13.
Earman, J. (1992). Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. MIT Press.
Fann, K. T. (1970). Peirce's Theory of Abduction. The Hague: Martinus Nijhoff.
Feyerabend, P. K. (1962). Explanation, reduction and empiricism. In Feigl, H. and Maxwell, G., editors, Crítica: Revista Hispanoamericana de Filosofía, pages 103–106.
Fitelson, B. and Hawthorne, J. (2010). How Bayesian confirmation theory handles the paradox of the ravens. Springer Netherlands, pages 247–275.
Franklin, J. (2001). Resurrecting logical probability. Erkenntnis, 55(2):277–305.
Gaifman, H. and Snir, M. (1982). Probabilities over rich languages, testing and randomness. Journal of Symbolic Logic, 47(3).
Goodman, N. (1946). A query on confirmation. Journal of Philosophy, 43(14):383–385.
Goodman, N. (1983). Fact, Fiction and Forecast. Harvard University Press.
Hájek, A. (2003). What conditional probability could not be. Synthese, 137(3):273–323.
Hasse, C. L. (2014). In principle determination of generic priors. ArXiv e-prints, (1408.2287).
Hempel, C. G. (1945). Studies in the logic of confirmation I. Mind, 54(213):1–26.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
Kuhn, T. S. (1989). Possible worlds in history of science. In Possible Worlds in Humanities, Arts and Sciences: Proceedings of Nobel Symposium, volume 65, pages 9–32.
Urbach, P. and Howson, C. (1993). Scientific Reasoning: The Bayesian Approach. Open Court.
van Fraassen, B. (1989). Laws and Symmetry. Oxford University Press.
Wittgenstein, L. (1994). Tractatus Logico-Philosophicus. Edusp.