The Duhem-Quine Problem

CHAPTER 3

THE BAYESIAN TURN


The previous chapter concluded with an account of the attempt by Lakatos to retrieve the salient features of falsificationism while accounting for the fact that a research programme may proceed in the face of numerous difficulties, just provided that there is occasional success.  His methodology exploits the ambiguity of refutation (the Duhem-Quine problem) to permit a programme to proceed despite seemingly adverse evidence.   According to a strict or naive interpretation of falsificationism, adverse evidence should cause the offending theory to be ditched forthwith but of course the point of the Duhem-Quine problem is that we do not know which among the major theory and auxiliary assumptions is at fault.  The Lakatos scheme also exploits what is claimed to be an asymmetry in the impact of confirmations and refutations. 

The Bayesians offer an explanation and a justification for Lakatos; at the same time they offer a possible solution to the Duhem-Quine problem.  The Bayesian enterprise did not set out specifically to solve these problems; it offers a comprehensive theory of scientific reasoning.  However, these are the kind of problems that such a comprehensive theory would be required to solve.

Howson and Urbach, well-regarded and influential exponents of the Bayesian approach, provide an excellent all-round exposition and spirited polemics in defence of the Bayesian system in Scientific Reasoning: The Bayesian Approach (1989).  In a nutshell, Bayesianism takes its point of departure from the fact that scientists tend to have degrees of belief in their theories and these degrees of belief obey the probability calculus.  Or if their degrees of belief do not obey the calculus, then they should, in order to achieve rationality.  According to Howson and Urbach probabilities should be 'understood as subjective assessments of credibility, regulated by the requirements that they be overall consistent' (ibid, 39).

They begin with some comments on the history of probability theory, starting with the Classical Theory, pioneered by  Laplace.  The classical theory aimed to provide a foundation for gamblers in their calculations of odds in betting, and also for philosophers and scientists to establish grounds of belief in the validity of inductive inference.  The seminal book by Laplace was Philosophical Essays on Probabilities  (1820) and the leading modern exponents of the Classical Theory have been Keynes  and Carnap.

Objectivity is an important feature of the probabilities in the classical theory.  They arise from a mathematical relationship between propositions and evidence, hence they are not supposed to depend on any subjective element of appraisal or perception.  Carnap's quest for a principle of induction to establish the objective probability of scientific laws foundered on the fact that these laws had to be universal statements, applicable to an infinite domain.  Thus no finite body of evidence could ever raise the probability of a law above zero (e divided by infinity is zero).  

The Bayesian scheme does not depend on the estimation of objective probabilities in the first instance.  The Bayesians start with the probabilities that are assigned to theories by scientists.  There is a serious bone of contention among the Bayesians regarding the way that probabilities are assigned: whether they are a matter of subjective belief, as argued by Howson and Urbach ('belief' Bayesians), or a matter of behaviour, specifically betting behaviour ('betting' Bayesians).

The purpose of the Bayesian system is to explain the characteristic features of scientific inference in terms of the probabilities of the various rival hypotheses under consideration, relative to the available evidence, in particular the most recent evidence.

BAYES'S THEOREM

Bayes's Theorem can be written as follows:

P(h|e) = P(e|h) P(h) / P(e),   where P(h) > 0 and P(e) > 0

In this situation we are interested in the credibility of the hypothesis h relative to empirical evidence e.  That is, the posterior probability, in the light of the evidence.  Written in the above form the theorem states that the probability of the hypothesis conditional on the evidence (the posterior probability of the hypothesis) is equal to the probability of the evidence conditional on the hypothesis multiplied by the probability of the hypothesis  in the absence of the evidence (the prior probability), all divided by the probability of the evidence.

Thus:

e confirms or supports h when P(h|e) > P(h)
e disconfirms or undermines h when P(h|e) < P(h)
e is neutral with respect to h when P(h|e) = P(h)

The prior probability of h, designated P(h), is its probability before e is considered.  This will often be before e is available, but the system is still supposed to work when the evidence is already in hand.  In that case e has to be left out of account in evaluating the prior probability of the hypothesis.  The posterior probability P(h|e) is the probability of h after e is admitted into consideration.

As Bayes's Theorem shows, we can relate the posterior probability of a hypothesis to the terms P(h), P(e|h) and P(e).  If we know the values of these three terms we can determine whether e confirms h and, more to the point, calculate P(h|e).
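As a minimal sketch of the arithmetic, the theorem and the three-way classification of evidence can be written in a few lines of Python.  The function names and the figures in the example are illustrative assumptions and are not taken from Howson and Urbach.

```python
import math

def posterior(prior_h, likelihood_e_given_h, prob_e):
    """Bayes's Theorem: the probability of h conditional on e."""
    if prior_h <= 0 or prob_e <= 0:
        raise ValueError("P(h) and P(e) must both be greater than zero")
    return likelihood_e_given_h * prior_h / prob_e

def verdict(prior_h, posterior_h):
    """Classify e as confirming, disconfirming or neutral with respect to h."""
    if math.isclose(posterior_h, prior_h):
        return "neutral"
    return "confirms" if posterior_h > prior_h else "disconfirms"

# Hypothetical figures: P(h) = 0.3, P(e given h) = 0.8, P(e) = 0.4
p = posterior(0.3, 0.8, 0.4)
print(p, verdict(0.3, p))
```

In this invented example the posterior is about 0.6, double the prior, so e counts as confirming h.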

The capacity of the Bayesian scheme to provide a solution to the Duhem-Quine problem will be appraised in the light of two examples.

CASE 1. DORLING ON THE ACCELERATION OF THE MOON

Dorling (1979) provides an important case study, bearing directly on the Duhem-Quine problem in a paper titled 'Bayesian Personalism, the Methodology of Scientific Research Programmes, and Duhem's Problem'.  He is concerned with two issues which arise from the work of Lakatos and one of these is intimately related to the Duhem-Quine problem.

1(a) Can a theory survive despite empirical refutation?  How can the arrow of modus tollens be diverted from the theory to some auxiliary hypothesis?  This is essentially the Duhem-Quine problem and it raises the closely related question:

1(b) Can we decide on some rational and empirical grounds whether the arrow of modus tollens should point at a (possibly) refuted theory or at (possibly) refuted auxiliaries?

2. How are we to account for the different weights that are assigned to confirmations and refutations?

In the history of physics and astronomy, successful precise quantitative predictions seem often to have been regarded as great triumphs when apparently similar unsuccessful predictions were regarded not as major disasters but as minor discrepancies. (Dorling, 1979, 177).

The case history concerns a clash between the observed acceleration of the moon  and the calculated acceleration based on a hard core of Newtonian theory (T) and an essential auxiliary hypothesis (H) that the effects of tidal friction are too small to influence lunar acceleration.  The aim is to evaluate T and H in the light of new and unexpected evidence (E') which was not consistent with them. 

For the situation prior to the evidence E' Dorling ascribed a probability of 0.9 to Newtonian theory (T) and 0.6 to the auxiliary hypothesis (H). He pointed out that the precise numbers do not matter all that much; we simply had one theory that was highly regarded, with subjective probability approaching 1 and another which was plausible but not nearly so strongly held.

The next step is to calculate the impact of the new evidence E' on the subjective probabilities of T and H.  This is done by calculating (by the Bayesian calculus) their posterior probabilities (after E') for comparison with the prior probabilities (0.9 and 0.6).  One might expect that the unfavourable evidence would lower both by a similar amount, or at least a similar proportion.

Dorling explained that some other  probabilities have to be assigned or calculated to feed into the Bayesian formula.  Eventually we find that the probability of T has hardly shifted (down by 0.0024 to 0.8976) while in striking contrast the probability of H has collapsed by 0.597 to 0.003.  According to Dorling this accords with scientific perceptions at the time and it supports the claim by Lakatos that a vigorous programme can survive refutations provided that it provides opportunities for further work and has some success.  Newtonian theory would have easily survived this particular refutation because on the arithmetic its subjective probability scarcely changed. 

This case is doubly valuable for the evaluation of Lakatos because by a historical accident it provided an example of a confirmation as well as a refutation. For a time it was believed that the evidence E' supported Newton but subsequent work revealed  that there had been an error in the calculations. The point is that before the error emerged, the apparent confirmation of T and H had been treated as a great triumph for the Newtonian programme.  And of course we can run the Bayesian calculus, as though E' had confirmed T and H, to find what the impact of the apparent confirmation would have been on their posterior probabilities.  Their probabilities in this case increased to 0.996 and 0.964 respectively and Dorling uses this result to provide support for the claim that there is a powerfully asymmetrical effect on T between the refutation and the confirmation.  He regards the decrease in P from 0.9 to 0.8976 as negligible while the increase to 0.996 represents a fall in the probability of error from 1/10 to 4/1000. 
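The asymmetry can be checked directly from the figures quoted above by comparing the probability of error, 1 - P(T), before and after each outcome.  The short Python sketch below does only that arithmetic; it does not reproduce Dorling's derivation, and the variable names are illustrative.

```python
prior_T = 0.9
after_refutation = 0.8976    # posterior of T after the adverse evidence
after_confirmation = 0.996   # posterior of T had the evidence been favourable

def error(p):
    """Probability that the theory is false."""
    return 1.0 - p

# Error probability after the evidence, as a multiple of the error probability before it
refutation_ratio = error(after_refutation) / error(prior_T)
confirmation_ratio = error(after_confirmation) / error(prior_T)

print(f"refutation:   error {error(prior_T):.4f} -> {error(after_refutation):.4f} (x{refutation_ratio:.3f})")
print(f"confirmation: error {error(prior_T):.4f} -> {error(after_confirmation):.4f} (x{confirmation_ratio:.3f})")
```

On these figures the refutation raises the probability of error by a little over two per cent of its previous value, whereas the confirmation cuts it to one twenty-fifth of its previous value.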

Thus the evidence has more impact in support than it has in opposition, a result from Bayes that agrees with Lakatos.

This latest result strongly suggests that a theory ought to be able to withstand a long succession of refutations of this sort, punctuated only by an occasional confirmation, and its subjective probability still steadily increase on average (Dorling, 1979,  186).

As to the relevance to the Duhem-Quine problem, the task is to pick between H and T.  In this instance the substantial reduction in P(H) would indicate that H, the auxiliary hypothesis, is the weak link rather than the hard core of Newtonian theory.

CASE 2. HOWSON AND URBACH ON PROUT’S LAW

The point of this example (used by Lakatos himself) is to show how a theory which appears to be refuted by evidence can survive as an active force for further development, being regarded more highly  than the confounding evidence.  When this happens, the Duhem-Quine problem is apparently again resolved in favour of the theory.

In 1815 William Prout suggested that hydrogen was a building block of other elements whose atomic weights were all multiples of the atomic weight of hydrogen.  The fit was not exact: for example, boron had a value of 0.829 when according to the theory it should have been 0.875 (a multiple of the figure 0.125).  The measured figure for chlorine was 35.83 instead of 36.  To overcome these discrepancies Prout and Thomson suggested that the values should be adjusted to fit the theory, with the deviations explained in terms of experimental error.  In this case the ‘arrow’ of modus tollens was diverted from the theory to the experimental techniques.

In setting the scene for use of Bayesian theory, Howson and Urbach designated Prout's hypothesis as 't'.  They refer to 'a' as the hypothesis that the accuracy of measurements was adequate to produce an exact figure.  The troublesome evidence is labelled 'e'.

It seems that chemists of the early nineteenth century, such as Prout and Thomson, were fairly certain about the truth of t, but less so of a, though more sure that a is true than that it is false. (ibid, page 98)

In other words they were reasonably happy with their methods and the purity of their chemicals while accepting that they were not perfect. 

Feeding in various estimates of the relevant prior probabilities, the effect was to shift from the prior probabilities to the posterior probabilities listed as follows:

P(t) = 0.9 shifted to P(t|e) = 0.878      (down 0.022)
P(a) = 0.6 shifted to P(a|e) = 0.073      (down 0.527)

Howson and Urbach argued that these results explain why it was rational for Prout and Thomson to persist with Prout's hypothesis and to adjust atomic weight measurements to come into line with it.  In other words, the arrow of modus tollens is validly directed to a and not t.

Howson and Urbach noted that the results are robust and are not seriously affected by altered initial probabilities: for example if P(t) is changed from 0.9 to 0.7 the posterior probabilities of t and a are 0.65 and 0.21 respectively, still ranking t well above a (though only by a factor of 3 rather than a factor of 10).
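The figures above can be reproduced with a short calculation.  The Python sketch below rests on assumptions that are not stated in the text: that t and a are treated as independent before the evidence, that e is impossible if both t and a are true, and that the remaining likelihoods take the illustrative values 0.02, 0.01 and 0.01.  These values were chosen because they return the posteriors quoted above; they should not be read as a report of the likelihoods that Howson and Urbach actually used.

```python
def posteriors(p_t, p_a, like):
    """Return (posterior of t, posterior of a) after the evidence e.

    like maps (t_true, a_true) to the probability of e given that combination.
    t and a are ASSUMED independent before the evidence.
    """
    joint = {}
    for t_true in (True, False):
        for a_true in (True, False):
            prior = (p_t if t_true else 1 - p_t) * (p_a if a_true else 1 - p_a)
            joint[(t_true, a_true)] = prior * like[(t_true, a_true)]
    p_e = sum(joint.values())  # total probability of the evidence
    post_t = (joint[(True, True)] + joint[(True, False)]) / p_e
    post_a = (joint[(True, True)] + joint[(False, True)]) / p_e
    return post_t, post_a

# ASSUMED likelihoods (illustrative, not given in the text)
LIKELIHOODS = {
    (True, True): 0.0,    # t and a together rule out the anomalous result
    (True, False): 0.02,
    (False, True): 0.01,
    (False, False): 0.01,
}

for p_t in (0.9, 0.7):
    post_t, post_a = posteriors(p_t, 0.6, LIKELIHOODS)
    print(f"P(t) = {p_t}: posterior of t = {post_t:.3f}, posterior of a = {post_a:.3f}")
```

With these inputs the sketch returns approximately 0.878 and 0.073 when P(t) = 0.9, and approximately 0.65 and 0.21 when P(t) = 0.7, in line with the figures quoted.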

In the light of the calculation they noted ‘Prout's hypothesis is still more likely to be true than false, and the auxiliary assumptions are still much more likely to be false than true’ (ibid 101).  Their use of language was a little unfortunate because we now know that Prout was wrong, and so Howson and Urbach would have done better to speak of 'credibility' or 'likelihood' instead of truth.  Indeed, as will be explained, there were dissenting voices at the time.

REVIEW OF THE BAYESIAN APPROACH

Bayesian theory has many admirers, none more so than Howson and Urbach.   In their view, the Bayesian approach should become dominant in the philosophy of science, and it should be taken on board by scientists as well.  Confronted with evidence from research by Kahneman and Tversky that ‘in his evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at all’ (Kahneman and Tversky, 1972, cited in Howson and Urbach, 1989, 293)  they reply that:

...it is not prejudicial to the conjecture that what we ourselves take to be correct inductive reasoning is Bayesian in character that there should be observable and sometimes systematic deviations from Bayesian precepts...we should be surprised if on every occasion subjects  were apparently to employ impeccable Bayesian reasoning, even in the circumstances that they themselves were to regard Bayesian procedures as canonical.  It is, after all, human to err. (Howson and Urbach, 1989, 293-285)

They draw some consolation from the lamentable performance of undergraduates (and a distressing fraction of logicians) in a simple deductive task (page 294).  The task is to nominate which of four cards should be turned over to test the statement ‘if a card has a vowel on one side, then it has an even number on the other side’.  The visible faces of the four cards are 'E', 'K', '4' and '7'.  The most common answers are the pair 'E' and '4' or '4' alone.  The correct answer is 'E' and '7'.
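The logic of the task can be made explicit with a small Python sketch: a card needs to be turned over only if some possible hidden face would falsify the statement.  The sketch assumes, as in the standard version of the task, that each card has a letter on one side and a number on the other; the candidate hidden faces are chosen for illustration.

```python
VOWELS = set("AEIOU")

def falsifies(letter, number):
    """The statement fails only for a vowel paired with an odd number."""
    return letter in VOWELS and number % 2 == 1

def must_turn(visible):
    """True if some hidden face could make this card a counterexample."""
    if visible.isalpha():
        # Hidden side is a number: try one even and one odd candidate
        return any(falsifies(visible, n) for n in (4, 7))
    # Hidden side is a letter: try one vowel and one consonant candidate
    return any(falsifies(letter, int(visible)) for letter in ("E", "K"))

cards = ["E", "K", "4", "7"]
print([c for c in cards if must_turn(c)])  # prints ['E', '7']
```

The card showing '4' drops out because no hidden letter can turn an even number into a counterexample, which is the step most respondents miss.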

The Bayesian approach has some features that give offence to many people.  Some object to the subjective elements, some to the arithmetic and some to the concept of probability which was so tarnished by the debacle of Carnap's programme.

Taking the last point first, Howson and Urbach argue cogently that the Bayesian approach should not be subjected to prejudice due to the failure of the classical theory of objective probabilities. The distinctively subjective starting point for the Bayesian calculus of course raises the objection of excessive subjectivism, with the possibility of irrational or arbitrary judgements.  To this, Howson and Urbach reply  that the structure of argument and calculation that follows after the assignment of prior probabilities resembles the objectivity of deductive inference (including mathematical calculation) from a set of premises.   The source of the premises does not detract from the objectivity of the subsequent manipulations that may be performed upon them. Thus Bayesian subjectivism is not inherently more subjective than deductive reasoning.

EXCESSIVE REFLECTION OF THE INPUT

The input consists of prior probabilities (whether beliefs or betting propensities) and this raises another objection, along the lines that the Bayesians emerge with a conclusion (the posterior probability) which overwhelmingly reflects what was fed in, namely the prior probability.  Against this is the argument that the prior probability (whatever it is) will shift rapidly towards a figure that reflects the impact of the evidence.  Thus any arbitrariness or eccentricity of original beliefs will be rapidly corrected in a 'rational' manner.  The same mechanism is supposed to result in rapid convergence between the belief values of different scientists.
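The convergence argument can be illustrated with a toy calculation.  In the Python sketch below two hypothetical scientists start with very different priors for the same hypothesis h and update on the same run of results.  Every number is invented for illustration, and each result is assumed to be independent of the others and four times as probable if h is true as if it is false.

```python
def update(prior, like_h, like_not_h):
    """One application of Bayes's Theorem for h against its negation."""
    p_e = like_h * prior + like_not_h * (1 - prior)
    return like_h * prior / p_e

# Hypothetical priors for two scientists
sceptic, enthusiast = 0.05, 0.95

# ASSUMED likelihoods: P(e given h) = 0.8, P(e given not-h) = 0.2
for trial in range(1, 7):
    sceptic = update(sceptic, 0.8, 0.2)
    enthusiast = update(enthusiast, 0.8, 0.2)
    print(f"after {trial} results: sceptic {sceptic:.3f}, "
          f"enthusiast {enthusiast:.3f}, gap {enthusiast - sceptic:.3f}")
```

The gap closes quickly here because both parties accept the same likelihoods; where they also disagree about the likelihoods, convergence is not guaranteed.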

To stand up, this latter argument must demonstrate that convergence cannot be equally rapidly achieved by non-Bayesian methods, such as offering a piece of evidence and discussing its implications for the various competing hypotheses or the alternative lines of work without recourse to Bayesian calculations.

As was noted previously, there is a considerable difference of opinion in Bayesian circles about the measure of subjective belief.  Some want to use a behavioural measure (actual betting, or propensity to bet); others, including Howson and Urbach, opt for belief rather than behaviour.  The 'betting Bayesians' need to answer the question: what, in scientific practice, is equivalent to betting?  Is the notion of betting itself really relevant to the scientist's situation?  Betting forces a decision (or the bet does not get placed) but scientists can in principle refrain from a firm decision for ever (for good reasons or bad).  This brings us back to the problems created by the demand to take a stand or make a decision one way or the other.  Even if some kind of behavioural equivalent of betting is invoked, such as working on a particular programme or writing papers related to the programme, there is still the kind of problem, noted below, where a scientist works on a theory which he or she believes to be false.

Similarly formidable problems confront the 'belief Bayesians'. Obviously any retrospective attribution of belief (as in the cases above) calls for heroic assumptions about the consciousness of people long dead.  These assumptions expose  the limitation with the 'forced choice' approach which attempts to  collapse all the criteria for the decision into a single value.  Such an approach (for both betting and belief Bayesians)  seems to preclude a complex appraisal of the theoretical problem situation which might be based on multiple criteria.  Such an appraisal might run along the lines that theory A is better than theory B in solving some problems and C is  better than B on some other criteria,  and so certain types of work are required to test or develop each of the rival theories.  This is the kind of situation envisaged by Lakatos when he developed his methodology of scientific research programmes.

The forced choice cannot comfortably handle the situation of Maxwell who continued to work on his theories even though he knew they had been found wanting in tests.  Maxwell hoped that his theory would come good in the end, despite a persisting run of unfavourable results.   Yet another situation is even harder to comprehend in Bayesian terms.  Consider a scientist at work on an important and well established theory which that scientist believes (and indeed hopes)  to be false.  The scientist is working on the theory with the specific aim of refuting it, thus achieving the fame assigned to those who in some small way change the course of scientific history.  The scientist is really betting on the falsehood of that theory.  These comments reinforce the value of detaching the idea of working on a theory from the need to have belief in it, as noted in the chapter on the Popperians.

REVIEW OF THE CASES

What do the cases do for our appraisal of Bayesian subjectivism?  The Dorling example is very impressive on both aspects of the Lakatos scheme - swallowing an anomaly and thriving on a confirmation.  The case for Bayesianism (and Lakatos) is reinforced by the fact that Dorling set out to criticise Lakatos, not to praise him.   And he remained critical of any attempt to sidestep refutations because he did not accept that his findings provided any justification for ignoring refutations, along the lines of 'anything goes'. 

Finally, let me emphasise that this paper is intended to attack, not to defend, the position of Lakatos, Feyerabend and some of Kuhn's disciples with respect to its cavalier attitude to 'refutations'.  I find this attitude rationally justified only under certain stringent conditions: p(T) must be substantially greater than 1/2, the anomalous result must not be readily explainable by any plausible rival theory to T...(Dorling, 1979,  187).

In this passage Dorling possibly gives the game away.  There must not be a significant rival theory that could account for the aberrant evidence E'.  In the absence of a potential rival to the main theory the battle between a previously successful and wide-ranging theory in one corner (in this case Newton) and a more or less isolated hypothesis and some awkward evidence in another corner is very uneven. 

For this reason, it can be argued that the Bayesian scheme lets us down when we most need help - that is, in a choice between major rival systems, a time of 'crisis' with clashing paradigms, or a major challenge as when general relativity emerged as a serious alternative to Newtonian mechanics.  Presumably the major theories (say Newton and Einstein) would have their prior probabilities lowered by the existence of the other, and the supposed aim of the Bayesian calculus in this situation should be to swing support one way or the other on the basis of the most recent evidence.  The problem would be to determine which particular piece of evidence should be applied to make the calculations.  Each theory is bound to have a great deal of evidence in support and if there is recourse to a new piece of evidence which appears to favour one rather than the other (the situation with the so-called 'crucial experiment') then the Duhem-Quine problem arises to challenge the interpretation of the evidence, whichever way it appears to go.

A rather different approach can be used in this situation.  It derives from a method of analysis of decision making which was referred to by Popper as 'the logic of the situation' but was replaced by talk of 'situational analysis' to take the emphasis off logic.  So far as the Duhem-Quine problem is concerned we can hardly appeal to the logic of the situation for a resolution because it is precisely the logic of the situation that is the problem.   But we can appeal to an appraisal of the situation where choices have to be made from a limited range of options. 

Scientists need to work in a framework of theory.  Prior to the rise of Einstein, what theory could scientists use for some hundreds of years apart from that of Newton and his followers?  In the absence of a rival of comparable scope or at least significant potential there was little alternative to further elaboration of the Newtonian scheme, even if anomalies persisted or accumulated.  Awkward pieces of evidence create a challenge to a ruling theory but they do not by themselves provide an alternative.  The same applies to the auxiliary hypothesis on tidal friction (mentioned in the first case study above), unless this happens to derive from some non-Newtonian theoretical assumptions that can be extended to rival the Newtonian scheme.

The approach by situational analysis is not hostage to any theory of probability (objective or subjective), or likelihood, or certainty or inductive proof.  Nor does it need to speculate about the truth of the ruling theory, in the way that Howson and Urbach speculate about the likelihood that a theory might be true.

This brings us to the Prout example, which is not nearly as impressive as the Dorling case.  Howson and Urbach concluded that the Duhem-Quine problem in that instance was resolved in favour of the theory against the evidence on the basis of a high subjective probability assigned to Prout's law by contemporary chemists.  In the early stages of its career Prout's law may have achieved wide acceptance by the scientific community, at least in England, and for this reason Howson and Urbach assigned a very high subjective probability to Prout's hypothesis (0.9).  However, Continental chemists were always skeptical and by mid-century Stas (and quite likely his Continental colleagues) had concluded that the law was an illusion (Howson and Urbach, 1989, 98).  This potentially damning testimony was not invoked by Howson and Urbach to reduce P(t), but it could have been (and probably should have been).  Stas may well have given Prout the benefit of the doubt for some time over the experimental methodology, but as methods improved the fit with Prout should have improved as well.   Obviously the fit did not improve and under these circumstances Prout should have become less plausible, as indeed was the case outside England.  If the view of Stas was widespread, then a much lower prior probability should have been used for Prout's theory.

Another point can be made about the high prior probability assigned to the hypothesis.   The calculations show that the subjective probability of a, the assumption that the measurements were accurate, sank from 0.6 to 0.073 and this turned the case in favour of the theory.  But there is a flaw of logic there: presumably the whole-number atomic weights were obtained using the same experimental equipment and the same or similar techniques that were used to estimate the atomic weight of chlorine.  And the high prior probability for Prout was based on confidence in the experimental results that were used to pose the whole-number hypothesis in the first place.  The evidence that was good enough to back the Prout conjecture should have been good enough to refute it, or at least dramatically lower its probability.

In the event, Prout turned out to be wrong, even if he was on the right track in seeking fundamental building blocks.  The anomalies were due to isotopes which could not be separated or detected by chemical methods.  So Prout's hypothesis may have provided a framework for ongoing work until the fundamental flaw was revealed by a major theoretical advance.  As was the case with Newtonian mechanics in the light of the evidence on the acceleration of the moon, a simple-minded, pragmatic approach might have provided the same outcome without need of Bayesian calculations.

Consequently it is not true to claim, with Howson and Urbach that  “...the Bayesian model is essentially correct.  By contrast, non-probabilistic theories seem to lack entirely the resources that could deal with Duhem's problem” (Howson and Urbach, 1989, 101).

CONCLUDING COMMENTS

It appears that the Bayesian scheme has revealed a great deal of power in the Dorling example but is quite unimpressive in the Prout example.  The requirement that there should not be a major rival theory on the scene is a great disadvantage because at other times there is little option but to keep working on the theory under challenge, even if some anomalies persist.  Where the serious option exists it appears that the Bayesians do not help us to make a choice. 

Furthermore, internal disagreements call for solutions before the Bayesians can hope to command wider assent; perhaps the most important of these is the difference between the ‘betting’ and the ‘belief’ schools of thought in the allocation of subjective probabilities. There is also the worrying aspect of betting behaviour which is adduced as a possible way of allocating priors but, as we have seen, there is no real equivalent of betting in scientific practice. One of the shortcomings of the Bayesian approach appears to be an excessive reliance on a particular piece of evidence (the latest) whereas the Popperians and especially Lakatos make allowance for time to turn up a great deal of evidence so that preferences may slowly emerge.

This brings us to the point of considering just how evidence does emerge, a topic which has not  yet been mentioned but is an essential part of the situation.  The next chapter will examine a mode of thought dubbed the 'New Experimentalism' to take account of the dynamics of experimental programs.



CHAPTER 4

THE NEW EXPERIMENTALISM


‘Experimentation has a life of its own’ (Hacking)

‘I don't understand any of that [talk of theorising]. I think just sort of messing about is the answer.  You've got to keep messing about at the bench.’  (Epstein)

The previous chapter on Bayesian subjectivism addressed the problem of weighing up the credibility of evidence against the credibility of a theory which is apparently challenged by the evidence.  This chapter continues the theme of evidence, in particular that gleaned from experiments.  The aim is to explore the implications of the so-called ‘new experimentalism’.

According to Ackermann, the new experimentalism has a profound impact on the Duhem-Quine problem. It ceases to exist.  It is no longer a problem because it is rooted in an antiquated philosophy of science.

The Quine-Duhem problem is generated within the old philosophy of science by its reliance on the logical articulation of theory and observation.  A more thorough reworking of philosophical concerns within the new philosophy of replacements suggested by Hacking would make the Quine-Duhem problem, as it is traditionally formulated, simply irrelevant. (Ackermann, 1989, page 189)

In this chapter I will argue that there is an element of truth in Ackermann's claim, in the light of Hacking's arguments for the partial autonomy of experimentation.   Because the Duhem-Quine problem is essentially a matter of theory, of the logical relationship between theories and their articulation with observation statements, if experimentation indeed has a life of its own, in some sense, then, in that sense, the Duhem-Quine problem  is rendered ineffective.  However theory also has a life of its own, and if the focus of attention is the articulation, testing and growth of theories, then the autonomy of experimentation does not render the Duhem-Quine problem irrelevant.  It remains a live issue and one might hope that the new experimentalism will cast light on the way that experimentation can sometimes produce decisive confirmations (or refutations).

THE RISE AND FALL OF EXPERIMENTATION

Experimentation became a central concern of science in modern times under the influence of Bacon and  the Royal Society of London.  A new form of communication came into being to transmit the information gathered by the tireless experimenters of the Society.

Experiment was officially declared to be the royal road to knowledge, and the schoolmen were scorned because they argued from books instead of observing the world around them... (Hacking, 1983, 149)

Boyle emerged as a leading representative of the new experimental philosophy and his papers attracted the hostile attention of the philosopher Thomas Hobbes.  The polemics between Boyle and Hobbes reveal an early anticipation of the Duhem-Quine problem.

Hobbes noted that all experiments carry with them a set of theoretical assumptions embedded in the actual construction and functioning of the apparatus and that, both in principle and in practice, those assumptions could always be challenged.

The resonance with the "Duhem-Quine" thesis is intentional.  We shall see that Hobbes's particular objections to Boyle's experimental systems provide a concrete exemplar of this "modern" thesis concerning the impossibility of crucial experiments.  (Shapin and Schaffer, 1985, 112).

Science and experimentation advanced despite the polemics of Hobbes and subsequently the philosophy of science became primarily concerned with matters of logic, with the validation (or refutation) of theories and the evolution of theoretical schemes.  As Ackermann put it ‘One simply began to philosophise on the assumption that science was capable of delivering a data base of settled observational statements’ (Ackermann, 1989, 185).

Times have changed.  History of the natural sciences is now almost always written as a  history of theory.  Philosophy of science has so much become philosophy of theory that the very existence of pre-theoretical observations or experiments has been denied.  I hope the following chapters might initiate a Back-to-Bacon movement, in which we attend more closely to experimental science.  Experimentation has a life of its own. (Hacking, 1983, 149-50).

THE NEW EXPERIMENTALISM

Hacking and others such as Allan Franklin have initiated something of a resurgence of interest in experimentation. Hacking in his seminal book Representing and Intervening (1983) provided the catchcry for the movement: ‘experimentation has a life of its own’. Central to his concern is the role of intervention and manipulation, with the implication that it is successful intervention that gives experimentation a life of its own.  Success here does not have the same meaning as it has for a theoretician for whom success consists of a successful prediction deduced from theory.  For the new experimentalist, successful intervention is more instrumental, more to do with making things work and feeling ‘at home’ in hitherto novel experimental situations (the submicroscopic world, for example).  It is a matter of manipulation and control rather than explanation.

A strategy to establish the Hacking thesis can be pursued along at least five lines of which he has followed three.   (1) He first sets out to detach the activities of collectors, observers and experimenters from the hegemony of theory, to correct a ‘deductivist’ bias against the collection of information and the conduct of experiments without explicit direction from a theory.  (2) He then proceeds to make the case for ‘noteworthy observations’ which establish certain features of the world more or less independent of theoretical explanation (though this may follow).  (3) With the role of noteworthy observations in place it is possible to discern how  research programs  can be driven by the opportunities presented by advances in experimental techniques and their fruits, again independent of theoretical prediction and explanation.  Two other lines of argument for the autonomy of experimentation are not pursued by Hacking; these are (4) the fact that we do not need to have a good theory of our instruments or equipment to make progress and (5) there may be no deductive closure between existing theory and stable, repeatable interventions (experimental effects).

The point that has to be established by these various lines of argument is that experiments have some kind of life of their own. This claim can be established by demonstrating that at least some experiments can be conducted without making strict deductions from theory.  In this situation the Duhem-Quine problem (a problem of logic) does not arise.

THE CRITIQUE OF DEDUCTIVISM

The strong form of deductivism is offered by Justus von Liebig, the German chemist, and his deductivist soulmate Karl Popper.

In all investigations Bacon attaches a great deal of value to experiments.  But he understands their meaning not at all. He thinks they are a sort of mechanism which once put in motion will bring about a result of their own.  But in science all investigation is deductive or a priori.  Experiment is only an aid to thought, like calculation: the thought must always and necessarily precede it if it is to have any meaning.  An empirical mode of research, in the usual sense of the term, does not exist.  An experiment not preceded by theory, i.e. by an idea, bears the same relation to scientific research as a child's rattle does to music (cited in Hacking, 1983, 153).

Thus it is he [the theorist] who shows the experimenter the way.  But even the experimenter is not in the main engaged in making exact observations; his work is largely of a theoretical kind.  Theory dominates the experimental work from its initial planning up to the finishing touches in the laboratory  (Popper, 1972, 107).

In contrast, Humphry Davy opened his chemistry textbook as follows:

The foundations of chemical philosophy  are observation, experiment, and analogy.  By observation, facts are distinctly and minutely impressed on the mind.  By analogy, similar facts are connected.  By experiment, new facts are discovered; and, in the progression of knowledge, observation, guided by analogy, leads to experiment, and analogy confirmed by experiment, becomes scientific truth. (Elements of Chemical Philosophy, 1812, cited in Hacking, page 152).

Davy went on to explain how he conducted various simple experiments with aquatic plants to obtain a gas, collected in a wine glass inverted over the plant filaments, a gas which flared when exposed to a lighted taper.  Hacking explained that none of this work was guided by theory or designed to test any theory.  At the same time Hacking stated that his target was a strong form of deductivism which categorically denied any autonomous role for curiosity and ‘messing about’ with the material.  Hacking is entirely in favour of speculation and calculation to draw out testable consequences of theories and he favours a symbiotic or synergistic partnership of theory and observation.

It may be argued that Hacking's advocacy of symbiosis between theory and experiment is a fatal concession which destroys his argument for the autonomy of experimentation.  The rejoinder is that experimentation may not proceed in a theoretical vacuum, but the kind of investigation pursued by Davy does not depend on any particular theory and, moreover, does not involve rigorous deduction from any theory.  Experiments of that kind can be regarded as sufficiently autonomous to avoid the Duhem-Quine problem.

To enrich the conventional understanding of the theory-observation complex, Hacking set out to demonstrate that observation is not ‘just one monolithic practice’ (ibid. 210).  He explained that observation using the unaided eye as a primary source of data is not very important, certainly not when sciences progress beyond the ‘nature study’ or ‘stamp collecting’ stage to engage seriously in the articulation of explanatory theories.  When controlled experimentation emerges another type of observation becomes important - the capacity to "see the instructive quirks or unexpected outcomes of this or that bit of the equipment" (167).  This is a point made, from the workbench as it were, by the virologist Anthony Epstein.

You've got to keep messing about at the bench.  You see how to change this just a little bit, and you see how you change that a bit, and you want to tinker with something and find a slightly different and new way of doing it.  You make a little bit of apparatus...You've actually got to be there, seeing it, messing with it...It means registering inside yourself minute changes, tiny things which may have a big influence. (Epstein in Wolpert and Richards, 1988, 165)

NOTEWORTHY OBSERVATIONS AND EXPERIMENTATION

What Hacking calls ‘noteworthy observations’ may stimulate a line of investigation but are rapidly superseded by experimentation. He might have made the point that noteworthy observations are most likely to be made by people with prepared minds, that is, people attuned by certain problems or interests (including theoretical interests) resulting in a tendency to anticipate or search for certain types of things.  The point that he does make is that observation is a skill which some people have to a greater degree than others, a skill which most can improve with practice (page 168).  He then moved on to the important point that seeing with the naked eye has been almost entirely replaced by observations made using more or less complicated instruments, notably the microscope and the telescope.  The overwhelming point to emerge from all this, in addition to the idea that observation is not a monolithic practice, is that observation in its crude form is very much over-rated.  Any serious examination of the ‘observational’ side of the observation/theory complex needs to take a hard look at experimentation and the way that observation or fact gathering is focussed (or diverted) by the availability of equipment and the discipline and ingenuity required to make it work properly.

Some of this insight may perhaps be captured in the notion of ‘theory-impregnated’ observations, and some can be found in Duhem's analysis of the role of theory in setting up and interpreting experiments (the point made by Hobbes in his polemics with Boyle).  But this is not the insight that Hacking is most concerned to consolidate.  He is concerned to establish a leading role for noteworthy observations.

An example of a noteworthy observation is the double refraction of crystals of Iceland Spar or calcite.  A crystal placed on a printed page will produce double lines of print. This phenomenon was first observed by Erasmus Bartholin circa 1669, by which time the laws of (ordinary) refraction were well known.  These crystals offered a puzzle of double refraction, though Hacking does not indicate how this puzzle was resolved, whether by extending the laws of refraction or by assimilating it as a special case within the known principles.  Hacking went on to describe the more significant role of Iceland Spar in the history of optics as the first known polariser of light, an effect which was discovered in 1808 by a colonel in Napoleon's corps of engineers.

E.L. Malus (1775-1812) was experimenting with Iceland Spar and noticed the effect of evening sunlight being reflected from the windows of the nearby Palais de Luxembourg.  The light went through his crystal when it was held in a vertical plane, but was blocked when the crystal was held in a horizontal plane (Hacking, 1983, 157)

THE DYNAMICS OF EXPERIMENTATION

The Malus example demonstrates a happy congruence of favourable circumstances and an experimenter/observer skilful enough to exploit the situation to advantage.  Hacking provided further examples of experimenters who advanced our knowledge of phenomena without obvious theoretical motivation.  David Brewster (1781-1868) according to Hacking was the major figure in experimental optics for some decades; he determined the laws of reflection and refraction for polarized light, induced polarising properties in bodies under stress and established much of the material for developments in wave theory.  All this,  despite the fact that so far as theory was concerned, he was a Newtonian ‘corpuscular’ adherent.  Similarly, R. W. Wood (1868-1955) made fundamental contributions to the experimental exploration of optics which called for quantum mechanical explanations, while he himself remained ‘almost entirely innocent of, and sceptical about, quantum mechanics...(his) contribution arose not from the theory, but, like Brewster's, from a keen ability to get nature to behave in new ways’ (Hacking, 1983, 158).

One of the most striking examples of intervention and manipulation prior to adequate theory is afforded by Hertz and his discovery of radio waves, described in detail by Buchwald (1994).  Admittedly, Hertz had theoretical concerns and he expected that his discoveries would have theoretical impact, if only by contradicting the ruling ideas of Maxwell.  But his experimental work was driven by its own dynamic as he found ways to create and direct these mysterious forms of electromagnetic radiation, even bouncing them off the walls of the laboratory.  He noted:

...no great preparations are essential if one is content with more or less complete indications of the phenomena.  After some practice one can find indications of reflection at any wall (Hertz, cited in Buchwald, ibid, 309).

The point of these examples is that discoveries can be made by following up experimental opportunities, not just by rigorously following theoretical intimations by deductive testing as required by those of the Popper/von Liebig persuasion.  Another example is the development of the ultracentrifuge which won a Nobel Prize for its inventor even though the theories which he set out to test were of little scientific moment.  The prize was presumably awarded on account of the scientific and technological value of the invention for the further development of biology.

We do not necessarily need a good theory of our instruments (such as the eye itself).  The eye is perhaps the paradigm case of an instrument which can be used effectively (most of the time) without any theoretical insight into the way it works.  Some of the most sophisticated uses of the eye are demonstrated by so-called primitive hunters and trackers. Similarly crafts such as cooking, cultivation, working in wood and metal and many others (such as imparting swing or swerve to cricket balls, footballs and the like) can be pursued in profound ignorance of the theoretical principles at work.

The secrets of success are presumably trial and error learning, transmission of successful practices and a degree of stability in the environment.  Moving to the realm of science, Hacking pointed out that the value of microscopic evidence remained virtually unchanged despite significant changes in the theory of microscopy.

We sometimes cannot close the gap by deductive means from theory to some of the most interesting effects observed with actual phenomena.  Hertz could never produce a body of theory to satisfactorily account for the generation and behaviour of the waves that he could produce at will and bounce off his laboratory walls (Buchwald, ibid, 321-324).

These lines of argument add up to a strong case for the notion that experimentation has a life of its own, at least a life prior to a deductive explanation for the effects observed.  Hacking's position is somewhat complex and he backs away from the claim that experimental work is completely independent of theory.

I do not contend that noteworthy observations in themselves do anything...Thus I make no claim that experimental work could exist independent of theory.  That would be the blind work of those whom Bacon mocked as 'mere empirics'. It remains the case, however, that much truly fundamental research precedes any relevant theory whatsoever (Hacking, 1983, 158)

Again, Hacking's statement ‘I make no claim that experimental work could exist independent of theory’ could be regarded as a demolition of his case that experimentation has a life of its own.  However the kind of autonomy required to place experiment outside the ambit of the Duhem-Quine problem  is merely that which does not involve a strict deduction from a particular body of theory.   The criterion of coming before any relevant theory is  much less demanding than the criterion of operating without reference to any theories at all, which is scarcely conceivable. 

To the extent that experimentation does precede theory, the Duhem-Quine problem  is circumvented because it is a problem of the logical articulation of theories and observation statements.  Because certain types of observation and experimentation can precede theory we are left with the conclusion that these phenomena exist as ‘brute facts’ regardless of changes in theories and the explanation (if any) that is provided for them. Solar eclipses and all manner of natural events  would be examples.  We are left in the rather paradoxical situation of returning to the ‘pre-Fall’ condition  mocked by Ackermann, when people ‘began to philosophise on the assumption that science was capable of delivering a data base of settled observational statements’ (1989, 185).

THE POSITIVE CONTRIBUTION  OF EXPERIMENTATION

This section shows how the strength of experimentation can be turned to good account by theorists who are perplexed by the Duhem-Quine problem.  Franklin has addressed a number of experimental sequences to assess the role of evidence in settling theoretical issues. He is concerned with the way that experiments become accepted as reliable in the face of alternative interpretations of the results and the problems that are involved in obtaining valid results.

He is especially sensitive to the Duhem-Quine problem and he produced an example that shows that the problem can be resolved by experiment in at least one kind of situation - that kind depicted by Popper where two systems of theories differ on a single identifiable issue.

Franklin’s example is the proposal of parity non-conservation by Lee and Yang in the mid-1950s, a theoretical proposition with testable consequences which were promptly explored and confirmed by independent teams.  This episode has great interest for the history and philosophy of science, partly for its relevance to the Duhem-Quine problem and also because it shows how crucial evidence can be overlooked if the community of theorists is not ready for it.  Franklin reports that experiments in the 1920s demonstrated parity non-conservation but were not pursued at the time (Franklin, 1986, Chapter 2).  The problem situation was very different in the 1950s, following the discovery of so-called ‘strange particles’ in 1947.  These came in two types, one about half the mass of a proton (K mesons) and the other, heavier than the proton, called hyperons. These particles were called strange because of some perplexing features of their decay modes, leading to speculation about the actual number of separate particles and decay modes which might occur in strong and weak interactions.

A particular pair of particles seemed to have exactly the same lifetime  and comparable masses but opposite spin parity.  Could these be the same particles, with opposite parity, products of a decay where parity was not conserved, in contrast with a parity-conserving decay which produces particles with the same parity? In view of the ‘sacred cow’ nature of the assumption of parity conservation, the non-conservation option was not generally considered, far less explored.

At first, physicists, including Lee and Yang, attempted to solve this puzzle within the framework of conventional theories...those attempts were unsuccessful.  Lee and Yang saw clearly that a possible solution to the problem would be the nonconservation of parity in the weak interactions...They wrote "This prospect did not appeal to us.  Rather we were, so to speak, driven to it through frustration with the various other efforts at understanding the puzzle that had been made". (Franklin, 1986, 14-15)

Franklin (ibid 16) notes that the possibility of parity violation had been mentioned in two earlier papers as a logical possibility but not as the solution of a concrete problem.  The significant contribution by Lee and Yang was to suggest nonconservation of parity as a solution to a pressing problem, and then to take the vital next steps to unpack some of the contents of their parcel and propose experiments to test some of the consequences of nonconservation.  Several research teams went to work to follow up these leads. Some encountered formidable experimental problems but eventually several lines of work came up with the results predicted by Lee and Yang.

Virtually the entire physics community accepted these results as crucial support for nonconservation.  Support came  partly because of the problem-solving power of the idea in the context where it was formulated, and partly because some novel predictions were confirmed by ‘overwhelming statistical evidence’ (Franklin, 1986, 35).  Evidence, moreover, from more than one source.

Franklin speculated that there may have been some attempt to modify background knowledge or use auxiliary hypotheses to preserve parity conservation ‘in line with suggestions made on general grounds by Duhem and Quine’ (ibid 36) but his literature review turned up no attempts of that nature.  Instead, theoretical work concentrated on exploiting the opportunities of the new idea leading to highly successful and important developments in the fundamental theory of weak interactions (the V-A theory).

In discussing the episode of parity nonconservation I have argued that the decision is between classes of theories, those that do and those that do not conserve parity, and that these classes are both exclusive and exhaustive.  Thus the hypothesis "parity is conserved in the weak interactions" was both tested and refuted.  (Franklin, ibid 106)

In this situation the experimental evidence helped to achieve a swift resolution of a theoretical problem situation.  Franklin described a much more troublesome case where the vital experiment is classified as ‘convincing’ (rather than ‘crucial’) because the results were not so statistically clear-cut and long chains of reasoning with many conjectural theories were involved in setting up the experiment and interpreting the outcome ("CP or not CP", Chapter 3 in Franklin, 1986).  Eventually the experiments became sufficiently convincing for the theory under test to be effectively discarded - the theory of invariance of physical processes under CP (combined space inversion and particle-antiparticle interchange).

Franklin noted that the delay in accepting the result of ‘convincing’ experiments represented an example of the Duhem-Quine problem.  In practice the scientific community worked through the problem by taking up alternative ‘saving’ explanations of the effect (at least those considered to be realistic alternatives) until no viable possibilities remained. 

The alternative explanations were precisely attempts to alter parts of the background.  Each was tested and failed, leaving CP unprotected and violated.  In a practical sense, the physics community quickly solved the Duhem-Quine problem.  This does not, however, answer the logical problem posed by Duhem and Quine.  There certainly exist other alternative explanations, but they were not regarded as physically interesting. (Franklin, 1986, 100).


CONCLUSION

It appears that the promise of the new experimentalism to render the Duhem-Quine problem irrelevant has been partly made good.  There is a sense in which experiments take on a life of their own, partly by preceding deduction from theories, partly through the impact of noteworthy observations and partly through the dynamism of experimental programs which permit effective manipulation and intervention, in advance of theoretical explanation.

However theory has a life of its own as well, and the Duhem-Quine problem persists whenever there is a desire to obtain theoretical explanations.  But here experimentation plays a vital role, sometimes by way of genuinely crucial experiments and more usually by the steady accumulation of confirmations (or refutations) which build the credibility of a theory (or undermine its rivals). 
