This paper attempts to define Exploratory Data Analysis (EDA) more precisely than usual, and to produce the beginnings of a philosophy of this topical and somewhat novel branch of statistics. A data set is, roughly speaking, a collection of k-tuples for some k. In both descriptive statistics and in EDA, these k-tuples, or functions of them, are represented in a manner matched to human and computer abilities with a view to finding patterns that are not "kinkera". A kinkus is a pattern that has a negligible probability of being even partly potentially explicable. A potentially explicable pattern is one for which there probably exists a hypothesis of adequate "explicativity", which is another technical probabilistic concept. A pattern can be judged to be probably potentially explicable even if we cannot find an explanation. The theory of probability understood here is one of partially ordered (interval-valued), subjective (personal) probabilities. Among other topics relevant to a philosophy of EDA are the "reduction" of data; Francis Bacon's philosophy of science; the automatic formulation of hypotheses; successive deepening of hypotheses; neurophysiology; and rationality of type II.
The form of argument used by Popper and Miller to attack the concept of probabilistic induction is applied to the slightly different situation in which some evidence undermines a hypothesis. The result is seemingly absurd, thus bringing the form of argument under suspicion.
The use of a concept called "explicativity" for (provisionally) accepting a theory or hypothesis H has previously been discussed. That discussion took into account the prior probability of H, and hence implicitly its theoretical simplicity. We suggest here that a modification of explicativity is required to allow for what may be called the pragmatic simplicity of H, that is, the simplicity of using H in applications as distinct from the simplicity of the description of H.
It is shown by means of a simple example that a good explanation of an event is not necessarily corroborated by the occurrence of that event. It is also shown that this contention follows symbolically if an explanation having higher "explicativity" than another is regarded as better.
Good expresses agreement that the controversy between Bayesian and non-Bayesian statistics is more fundamental than that between Carnap and Popper, and points out that his own position is a Bayes/non-Bayes compromise.
The causal propensity of an event F to cause another event E is explicated as the weight of evidence against F if E does not occur, given the state of the universe just before F occurred. This definition, first given in 1961, is sharpened, defended, and applied to several examples. In this definition the concept of weight of evidence in favor of a proposition, provided by another one, is to be understood in a technical sense that is intended to capture its most customary informal meaning.
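
The definition of causal propensity above can be illustrated with a minimal numerical sketch. It assumes the usual log-likelihood-ratio reading of weight of evidence, W(H : E) = log[P(E | H) / P(E | not-H)], so that the weight of evidence against F provided by the non-occurrence of E is log[P(not-E | not-F, U) / P(not-E | F, U)], where U is the state of the universe just before F. The function names, the use of natural logarithms, and the example probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """W(H : E) = log[P(E | H) / P(E | not-H)], in natural-log units.

    Assumes both probabilities are strictly positive.
    """
    return math.log(p_e_given_h / p_e_given_not_h)


def causal_propensity(p_e_given_f, p_e_given_not_f):
    """Sketch of Q(E : F) = W(not-F : not-E | U).

    Both arguments are understood as conditional on U, the state of the
    universe just before F. Assumes both probabilities are strictly less
    than 1, so that the non-occurrence of E has positive probability.
    """
    p_not_e_given_not_f = 1.0 - p_e_given_not_f
    p_not_e_given_f = 1.0 - p_e_given_f
    return weight_of_evidence(p_not_e_given_not_f, p_not_e_given_f)


# Hypothetical numbers: F raises the chance of E from 0.5 to 0.9.
q_promoting = causal_propensity(0.9, 0.5)   # log(0.5 / 0.1) = log 5
# F irrelevant to E: the propensity is zero.
q_neutral = causal_propensity(0.5, 0.5)
# F lowers the chance of E: the propensity is negative.
q_inhibiting = causal_propensity(0.2, 0.6)
```

On this reading the propensity is positive when F makes E more probable, zero when F is irrelevant to E, and negative when F tends to prevent E.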