By Paul Newall (2005)

Suppose we have an idea about the world and put it to the test. Our discussion of falsificationism looked at what we can conclude from a *failure*, but what if an experiment shows us what we expected to find? We usually say that the test has *confirmed* the theory, but what does this mean? There have been several approaches in the philosophy of science to understanding what confirmation involves, some with more success than others. We will examine the main candidates here.

Basic Confirmation

The easiest way to tell a story about testing scientific theories is to say that a successful trial proves that the theory was true. If we set this out in syllogistic form, we have:

- If theory T is true, we would expect to note observations C;
- We observe C;
- Therefore, T is true.

Unfortunately, this reasoning is an example of affirming the consequent. Even if we drop the difficult issue of truth and try to say that observing C merely *confirms* T, we still run up against the same underlying problem: that a theory "works" is no guarantee of its accuracy. After all, it could be that something else is causing the effects we notice. Consider, for example, the case of Brownian motion covered when we looked at Ockham’s Razor. The phenomenological theory of gases explained the behaviour of gases and was highly confirmed by experiment but nevertheless gave way to the kinetic theory; that is, another explanation was found.

Paradoxes of Confirmation

This should not come as any surprise, however. Expecting a single successful experiment to confirm a theory so decisively is perhaps aiming too high, but another difficulty with confirmation was identified by Hempel (1937). Suppose we consider the proposition "all swans are white" (1). This is logically equivalent to the proposition "all non-white things are non-swans" (2); or, to put it another way, "if it isn’t white then it can’t be a swan". Now imagine that we notice a black raven, a creature beloved of philosophical arguments. Although it may seem that this has nothing to do with (1), it actually confirms (2): the black raven isn’t white and isn’t a swan, so (2) holds. Since (1) and (2) are logically equivalent, though, the black raven turns out to confirm that all swans are white.
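
The logical equivalence that drives the paradox can be checked by brute force. Here is a small sketch (mine, not from the text) that enumerates every combination of "is a swan" and "is white", confirms that (1) and (2) agree in all cases, and shows that a black raven satisfies (2):

```python
# (1) "All swans are white": swan -> white
def prop1(is_swan, is_white):
    return (not is_swan) or is_white

# (2) "All non-white things are non-swans": not white -> not a swan
def prop2(is_swan, is_white):
    return is_white or (not is_swan)

# The two propositions agree on every possible kind of object...
for is_swan in (True, False):
    for is_white in (True, False):
        assert prop1(is_swan, is_white) == prop2(is_swan, is_white)

# ...and a black raven (neither white nor a swan) satisfies (2).
print(prop2(is_swan=False, is_white=False))  # True
```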

Notice the way that this example was constructed: we could have chosen any number of ridiculous instances for the confirmation of (2) to arrive at the same result. It seems that (1) is thus confirmed by observations that have nothing at all to do with *whiteness* or being a swan. This result is paradoxical because we tend to think that a proposition like (1) is confirmed by sighting white swans, and further that the more white swans we observe the more likely (1) is to be true; but if a black raven can confirm (1) then this account seems to make little sense.

The Problem of Induction

The issue at the heart of understanding confirmation is of course the famous problem of induction, due to Hume: how can we justify an inductive inference – in the form of a general (scientific) theory – from a finite number of particular instances? A number of solutions have been proposed, including Popper’s falsificationism (claiming that scientific inference is actually *deductive*) and Mill’s *System of Logic* (1843 – actually much the same as Galileo’s and that of Aristotle and the Jesuits before him), but induction is interesting because it seems that any description of what confirmation is must rely on it. After all, if we want to say that a test has confirmed a theory in some way, then we are making an inductive inference.

A more recent version of the problem is Nelson Goodman’s (1983) *New Riddle of Induction*. Suppose we take two propositions: "all emeralds are green" (3) and "all emeralds are *grue*" (4), where "grue" means green until time T and blue thereafter. Now consider what we can say about each observation we make of an emerald *before time T*. (3) says that we should find that each emerald is green, so a green emerald confirms it; but (4) says the same and hence seems to be confirmed as well. This is an example of underdetermination but is also another paradox of confirmation. The obvious response is to say that no one has seen any emeralds change colour in the past, nor do we know of any reason why this could happen, but this is begging the question: if we assume that a causal link exists in the first place and hence that all emeralds are green come what may, then it is trivial to say that an instance of a green emerald confirms what is already certain.
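
The riddle can be made concrete with a toy model. In the sketch below, the changeover time T and the observation times are illustrative assumptions of mine; the point is only that every pre-T observation of a green emerald satisfies both predicates equally:

```python
T = 100  # hypothetical changeover time; purely illustrative

# (3) "All emeralds are green"
def green(colour, t):
    return colour == "green"

# (4) "All emeralds are grue": green before time T, blue thereafter
def grue(colour, t):
    return colour == ("green" if t < T else "blue")

# Observations of green emeralds, all made before T
observations = [("green", t) for t in range(0, 50, 10)]

print(all(green(c, t) for c, t in observations))  # True: confirms (3)
print(all(grue(c, t) for c, t in observations))   # True: confirms (4) too
```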

Goodman’s own solution was linguistic, saying that the predicate *green* is entrenched in our language and our interaction with the world (especially when buying or talking about emeralds); but this is no solution at all, since all it does is acknowledge that we are inclined to think in a certain way without explaining whether or not we are justified in so doing. Another – more promising – possibility is to make a distinction between *weak* and *strong* confirmation, with observation and experiment never providing more than the weak, fallible form. Theoretical reduction, which proposes and explains causal mechanisms at work behind predicates like *greenness*, gives us stronger reasons for believing that (3) is meaningful while (4) is not. This is to say that we have no idea what a *grue* form of science would be like – how could we have a science if we had no way of knowing when emeralds would change colour, or why? – or how it could make any sense; it is thus a realist argument. It has the unfortunate consequence, however, of making non-scientific inferences unjustified; that is, unless we know of a theory that explains why swans are white, we have no reason at all to suppose that the next one we see will be.

Bayesian Probability Theory

Given the difficulties with these understandings of confirmation, an alternative is to appeal to *probabilities* instead. This is perhaps a more intuitive approach, since it aims only to say that a successful test of a theory makes it *more likely*. For example, suppose that someone claimed to be friends with a certain film director and to be able to predict what she would be working on next. If he was correct on his first attempt, we might say it was just a lucky guess; but if he was right again on numerous occasions, we would probably think there was something to the claim after all. Indeed, the more often his guesses proved accurate, the more likely we would judge his claim of friendship to be – or so it seems. Can we justify this kind of thinking, though?

Bayes’ Theorem is a way of evaluating the probability of an hypothesis based on the evidence we have for it. It takes several forms, but the simplest is to consider evidence *e* for an hypothesis *h*. We say that

- P( h | e ) = [P( e | h ) * P(h)] / P(e) (5)

This means that the probability of the hypothesis *h*, given the evidence *e* that we have for it, is equal to the probability of the evidence *given the hypothesis*, multiplied by the probability of the hypothesis, all divided by the probability of the evidence itself. Sometimes this is expressed as

- P( h | e * b ) = [P( e | h * b ) * P( h | b )] / P( e | b ) (6)

where the extra term *b* stands for the background conditions (thus P( e | h * b ) means the probability of the evidence given the hypothesis *and* the background conditions, and so on).
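
As a minimal numerical sketch of (5), with every value assumed purely for illustration (none comes from the text):

```python
def bayes(p_e_given_h, p_h, p_e):
    """Equation (5): P(h|e) = P(e|h) * P(h) / P(e)."""
    return p_e_given_h * p_h / p_e

# Illustrative values only: evidence certain if h holds (P(e|h) = 1.0),
# an assumed prior P(h) = 0.1, and an overall evidence probability
# P(e) = 0.25. The evidence then raises the hypothesis from 0.1 to 0.4.
print(bayes(1.0, 0.1, 0.25))  # 0.4
```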

Bayesian theory is useful because it makes explicit that the likelihood of an hypothesis depends on the evidence for it. The problems arise when we look at the terms on the right-hand side of (5) or (6): P( e | h ) expresses the *conditional probability* of the evidence given the hypothesis; that is, how likely are we to find *e* if we suppose that *h* is true? Similarly, P(h) is the *prior probability* that the hypothesis is true, but this is precisely what we do not know and are using the evidence to evaluate. It is the assigning of these probabilities that poses the most significant challenge to Bayesian ideas.

For example, suppose we are pulling numbers out of a hat, written on slips of paper, and that the first eight have all read "10". How do we then decide the probability of the hypothesis "all the numbers read 10", given that these eight did? Similarly, even before we took any pieces from the hat, how could we determine the prior probability that all would read 10? Bayesians respond that although instances like this are troublesome, typically in science we have a good idea of which probabilities to use. In the *grue* case, say, we would imagine that the likelihood of an emerald turning blue at some point in the future is very small indeed, so we can use Bayes’ theorem. Critics object that the Bayesian approach actually only addresses *part* of the *grue* problem – i.e., the hypothesis *before* T and not after.
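
The hat example can be run numerically only if we grant ourselves exactly what the problem withholds: a prior for the hypothesis and a likelihood for the alternative. In this sketch both numbers (0.01 and 0.1) are bare assumptions of mine, which is just the critics' point:

```python
def update(prior, p_e_h, p_e_not_h):
    """One Bayesian update: P(e) by total probability, then equation (5)."""
    p_e = p_e_h * prior + p_e_not_h * (1 - prior)
    return p_e_h * prior / p_e

p_h = 0.01  # assumed prior that every slip reads "10"
for draw in range(1, 9):  # eight draws, each reading "10"
    # Under h the draw is certain (1.0); under not-h we assume 0.1
    p_h = update(p_h, 1.0, 0.1)
    print(f"after draw {draw}: P(h) = {p_h:.6f}")
```

Eight successive draws push the assumed prior of 0.01 above 0.999, but a different prior or alternative likelihood would give a different trajectory; nothing in the problem fixes either number.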

Inference to the Best Explanation

An alternative method proposed by C.S. Peirce (see his *Collected Papers*, 1931-1958) and others is *inference to the best explanation*, sometimes known as *abduction*. A nice way to understand it is via a contrast between two metaphors: rather than science being like wandering around a beach at night, picking up "observations" to confirm our theories, we instead try to come up with the best explanation of the facts we have and then use this theory like a candle or flashlight, illuminating larger areas of the beach to see what else we can learn about it. This intuitively makes a good deal of sense: when we have the best explanation of a set of evidence, we say that the evidence confirms the theory.

There are several difficulties associated with abduction. In the first place, what do we mean by the *best* explanation? We could say that it is the *most probable*, but then we are back with Bayesianism or something similar. Also, what makes an explanation *good enough*? That it is able to explain the evidence is admirable, but – once again – so can other explanations, since theory is always underdetermined by the evidence. Moreover, sometimes scientists infer several explanations where there are competing possibilities (colour-perception models in neurophysiopsychology, for example) and sometimes they refuse to make any inference at all (the most notable instance being Bohr in his early years, when he struggled with the implications of quantum theory and steadfastly refused to take the easy road to an instrumental interpretation).

More importantly, perhaps, we recognise that we use other (non-empirical) criteria to judge how good our theories are. For example, we tend to prefer theories that are parsimonious; that are not *ad hoc*; that predict novel facts about the world; and so on. Including these in a description of the "best" theory, however, is not easy; after all, there seems to be no reason why the universe should be fundamentally *simple* rather than *complex*, so which of two theories fitting these characterisations is the better one, other things being equal? It seems, then, that to be more accurate we need to replace "best explanation" with "best of the available explanations, where this option is good enough for our purposes", with the latter being open for discussion.

In summary, there are many aspects to confirmation and much debate as to which formulation is most satisfactory. Note, though, that there is no question that we *do* employ inductive inferences and that we regard our ideas as confirmed in some way; the question is how we can *justify* this inevitable practice.

---

Selected References:

- Goodman, N., *Fact, Fiction and Forecast* (Cambridge, MA: Harvard University Press, 1983).
- Hartshorne, C., Weiss, P. and Burks, A. (eds.), *Collected Papers of Charles Sanders Peirce*, 8 vols. (Cambridge, MA: Harvard University Press, 1931-1958).
- Hempel, C.G., "Le problème de la vérité", *Theoria*, 3, pp. 206-246, 1937.
- Mill, J.S., *A System of Logic* (Honolulu: University Press of the Pacific, 2002).