The Agora

A center for philosophical dialogue and scholarship about education

In the Spirit of Hardie: Scientific Educational Research, Methodolatry, and the Platinum Standard

D.C. Phillips, Stanford University*

Preamble: The Hardian spirit in philosophy of education

I am deeply honored by the invitation to present the first C.D. Hardie lecture at the University of Illinois, and to celebrate, too, the first incumbent of the Hardie Chair, my friend Walter Feinberg. I hope that my presentation today disappoints neither of them!

I was educated, and worked, in the great city of Melbourne, only a few hundred miles to the north of Hobart, Tasmania, where Hardie spent most of his professional life; both of us were foundation members of the Philosophy of Education Society of Australasia, and both of us were associated in various ways with the journal Educational Philosophy and Theory (EPAT) in its earlier days. But Hardie and I never met – he was something of a recluse, and did not (to my knowledge) attend any of the meetings of the Society, even the one we held in Hobart. But I had read, and was greatly influenced by, his pioneering book in analytical philosophy of education, Truth and Fallacy in Educational Theory (Hardie, 1962), a few years after it had been reissued in the early 1960s (for of course the book had first appeared during the dark days of the war, which had severely limited its circulation and impact). The book, and the few essays of his that appeared in the early volumes of EPAT, were written in a somewhat acerbic, no-nonsense style that seemed to me to reflect the Aussie temperament perfectly, and they were marked above all by a dogged determination to be clear and to avoid ambiguity. (I noted early on, too, that he tended to call himself C.D. while I tended to call myself D.C. – a bond indeed!)

Hardie was quite knowledgeable about the educational research of his day. One of his book’s five chapters was a semi-technical discussion of educational measurement; his opening paper in the inaugural issue of EPAT was, essentially, on reductionism in natural and social science; and the last paper of his that I recall reading, in the third volume of EPAT in 1971, was titled “The Philosophy of Educational Research”. I quote a few lines from this paper – written, I stress, almost 35 years ago – to serve as a segue to the topic of my talk today. Hardie wrote:
But to surrender the methods of investigation that have led from success to success in the natural and social sciences is a counsel of despair, and no good reason has been offered as to why we should. If we are to understand education and the social sciences we need to look at and manipulate the world, and not just live and meditate in it. To imagine otherwise is to return to the age of scholasticism. Research in education, then, I shall assume, is similar to research in other fields, but the object of the investigation is, of course, knowledge about education instead of knowledge about, for example, the stars. (Hardie, 1971, p.2)
Here Hardie was endorsing – thirty years beforehand – one of the key principles of that notorious and embattled document, the National Research Council (NRC) report Scientific Research in Education (2002), of which I was one of the co-authors and to which I shall return in more detail in a few moments. Some might even think, in light of his stress on the importance of manipulating the world, that he was a closet advocate of the “gold standard” methodology of the randomized field trial! Perish the thought!

But it is time for me to turn less obliquely to my main topic, the parlous state of educational research in the early 21st century.

Introduction: The new Babel

The Old Testament of the Bible tells us that Babel was the site of a great tower, the construction of which was interrupted because of “the confusion of tongues”, and the resultant inability of its builders to communicate with each other. When the languages being spoken were different, presumably disagreements could not easily be settled about the design of the tower and the methods to be used in its construction. It only requires a slight leap of the imagination to view educational research as the new Babel, although it is decentralized around the globe rather than being located in one particular spot. Nevertheless, scattered as it is, the aspiration has been to construct a grand edifice – a towering international accumulation of reliable and practically useful results. This noble endeavor, unfortunately, could conceivably come to a standstill, the hoped-for grand structure replaced by a multitude of small piles of findings, many of dubious worth, scattered among which are numerous poorly substantiated personal opinions and abstruse ideologies. All this because debate within the research community is beset by a “confusion of tongues”.

It should be made clear at the outset that the problem is not simply a linguistic one – differences in language can, with effort, often be overcome. The problem is more like that of cross-paradigm communication in Thomas S. Kuhn’s original formulation – there is a gulf of incommensurability that makes communication impossible (Kuhn later backed off and said “very difficult”; see Kuhn, 1970). For the point is that differences in terminology mask many other differences, often at a deep-seated level. By way of example, consider what assumptions underlie the following passage by Patti Lather in her essay about the NRC report, titled “This IS your father’s paradigm”; certainly her assumptions are so far removed from my own that communication and discourse become, if not impossible, extremely difficult – I doubt that we could construct an edifice of knowledge together. Lather wants to see an educational research that moves “toward a Nietzschean sort of ‘unnatural science’ that leads to greater health by fostering ways of knowing that escape normativity” (Lather, 2004, p.27). From my perspective, this is so murky and fraught with danger that (if taken seriously) it portends the extinction of the empirical research enterprise – how, for example, could an epistemology that eschews normativity lead to anything but relativistic chaos? (The whole point of a “way of knowing” is that it is normative, otherwise just anything will do – without norms there is no “way”, no way for others to follow, at all!)

Thus, underlying the cacophony of languages in educational research (the babble if not the Babel) there are (what are close to being) paradigmatic differences: There are different ideals or models for inquiry and hence different purposes to which its conclusions can be put, different views about the status of conclusions reached in inquiry (and about whether any conclusions – rather than opinions or political stances – can be formulated at all), different methodologies, different assumptions about the possibility (and desirability) of achieving objectivity, different stances with respect to the concepts of truth and reality, different attitudes towards reason and its efficaciousness, different presuppositions about how the social and educational worlds work and hence about the ways in which these interrelated realms can be understood, different views about the causative agents in the social world (causes that influence the researcher as much as the researcher’s human subjects), different attitudes towards the issue of whether regularities in the social and educational worlds are uncovered by investigators or are constructed by them. There are those who believe that rigorous scientific research can pave the way to educational improvement (see for example Hargreaves, 1997; Mosteller and Boruch, 2002), and there are others who hold the view that “the forms of human association characteristic of educational engagement are not really apt for scientific or empirical study at all” (Carr, 2003, pp. 54-5).
In the discussion that follows I shall try to shed some light on this complex, if not confusing, situation – but of course my aspirations must be modest, for there is too much going on to be discussed in a short presentation.

A preliminary categorization of the different voices

I open by giving a crude mapping of the terrain – the lay of the land around Babel – with the reminder that crude maps can be dangerous if they are used for detailed guidance in one’s travels. But it is useful to have a preliminary overview of the landscape, an advance organizer, before delving more carefully into the details of specific parts. By stressing the crudeness I protect myself from refutation, for crude maps cannot be criticized on the grounds that they have gotten some of the details wrong – if the details were all correct then the map would not be crude! (After the nuanced discussion the map, although initially helpful, may well have outlived its usefulness.)

According to the account I shall use to organize the following discussion, then, there are a number of complex positions that are marked by internal debate and sometimes dissension, but which nevertheless can be arranged in a rough and sometimes overlapping way along a continuum from left to right (using these terms in their spatial and not necessarily their political senses). At the left-hand end or pole I see clustered a number of positions that (I must stress) differ from each other sometimes quite markedly, but which have in common a skeptical (if not a highly negative) view of the validity, and hence the value, of social-science inquiry and empirical educational research, especially insofar as this research takes inspiration from the model that supposedly can be found in the natural sciences. At the very least, the left-handed positions all reject the so-called “physics envy” of the social sciences. (To give a precise rendering of the underpinnings of these left-handed positions would in itself be a mammoth undertaking, which will not be attempted in the present essay. But see Phillips, 2000, 2005b (in press), and Phillips and Burbules, 2000.)

These positions thus stand in strong opposition to those located at the right-hand pole, which not only regard rigorous research as possible, but possible to perform in a scientific manner; the right-hand pole has been struck by “physics envy”, or perhaps by the related disease of “medical research envy”. However, it is important not to stereotype the researchers on the right as mindless positivists, for technically most of them are not positivists (see Phillips, 1987, especially ch.4, for misreadings of positivism), and some of their arguments have considerable weight; but it also should be noted that they sometimes are their own worst enemy and make occasional intemperate outbursts against modes of research that they regard as lacking in the form of rigor that they themselves prefer – attacks made even more intemperate by the narrow account that they give of the nature of “scientific rigor”, a methodologically oriented account centered on the use of the randomized controlled experiment or field trial or RFT (and a few related designs). The fact that the RFT has come to be referred to as the “gold standard” has not served to win them many friends within the wider educational research community.

Situated in the middle of the continuum are a variety of moderate or temperate positions, including the postpositivism that I personally advocate (see Phillips and Burbules, 2000); here research is seen as a fallible enterprise that attempts to construct viable warrants or chains of argument that draw upon diverse bodies of evidence and that support any assertions that are being made. (John Dewey speaks of “warranted assertibility”; see Dewey, 1938/1966.) As so often happens, the positions in the middle of the range are generally scorned by those at either pole, and are sometimes attacked as being extreme by those on the left who regard any view that is somewhat to the right of their own on the continuum as being extreme, while those on the right sometimes see the center as being wishy-washy! (See the symposia in Educational Researcher 2002, Educational Theory 2005, Teachers College Record 2005, and Qualitative Inquiry 2004.)

My purpose in the present discussion is chiefly to expose the excesses of those at the right, and to put forward and illustrate a moderate proposal that builds upon John Dewey’s notion of a warranting argument; but to set the context I need to say something about the other portions of the continuum, and I need to take one contentious topic off the table. So let me start with the NRC report, which develops a position roughly in the center – a position that I regard as bland rather than as wishy-washy (James Gee regards it as full of “vacuous generalities”).

Taking the NRC report off the table

Over the past few years much of the debate over the nature of educational research in the USA has been cast in the form of reactions to the report “Scientific Research in Education” by the National Research Council (NRC), the operating arm of the US National Academies of Sciences (NRC, 2002). In my view the adoption of this focus has been a strategic error on the part of the research community; certainly the report is a document issued by an influential body, and as such it is fitting that it receive scholarly attention (indeed, it was intended to provoke scholarly debate), but too much focus on the report distracts attention from what I take to be the real enemy, namely the policy that is imposing the “gold standard” on the research community. The NRC report did not recommend this policy, and in fact was written in order to oppose it, a fact appreciated by only a few of the critics.

There have been major, lively symposia on the report in the Educational Researcher (2002), Qualitative Inquiry (2004), Educational Theory (2005), and Teachers College Record (2005), and there is an ongoing stream of “stand alone” pieces (for example, Feuer, Towne and Shavelson, 2002; Eisenhart and Towne, 2003; and Eisenhart and DeHaan, 2005), and the NRC recently has produced a follow-up report (NRC, 2005). Some of the articles have been exceptional – the ones I have found most stimulating (although, as a member of the writing committee for the NRC, I did not find them to be altogether flattering) were by Fred Erickson (2005) and James Gee (2005), both in the TC Record symposium. But misconceptions abound; particularly wrongheaded, I feel, is Ken Howe’s remark that “the resilience of the belief in a fail-safe, sublimely pure scientific method that will free us from having to deal with uncertainty and flux is truly remarkable.” (Howe, 2005, p. 320) – for this probably is appropriate and on target if directed at the “gold standard” policy, but it is entirely wide of the mark when directed (as it is) at the report itself (which enunciates no such crude belief). And indeed this encapsulates my criticism of all of these symposia; like Howe’s paper, they appeared several years after the publication of the report, and were directed at it, whereas they should have appeared earlier and been directed at members of Congress and officials in the Federal Department of Education and at the legislation they crafted that enthrones an illiberal view of science that could well be fatal to sensible and fruitful educational research! The NRC report had its problems, and may have been banal and ineffective – but it was well intentioned (as Margaret Eisenhart, 2005, has pointed out) and it did not make a mistake about the identity of the common enemy!

This is not the appropriate place to give a detailed history of the political circumstances that led the NRC to establish its Committee to report, inter alia, on “What are the principles of scientific quality in education research?” But a little historical context might help dispose of the main misconceptions. At the time the climate in Washington was (and of course still is) socially and intellectually conservative and marked by skepticism and concern about the poor quality of education research, especially when it was compared with other domains of research that were regarded as rigorous and that (therefore) helped to frame policy decisions – medical and health-related research were often held up as exemplars that educational research failed to match. (I will not comment here on the exalted opinion that seemed to be current about the status of medical research!) The lack of quality in educational research is a long-standing concern, even within the research community itself; in recent years the historian Carl Kaestle wrote a paper that discusses the “awful reputation of educational research” (Kaestle, 1993); and the OFSTED report in the UK in 1998 created a furor when it asserted that much educational research in that country was technically deficient (in that the conclusions reached were not supported by the evidence presented or the methods used) and that too much was uncritically ideological in its mode of argument (Tooley and Darby, 1998).

The crucial thing is this: In an effort to inject rigor into the research enterprise, the US Congress was on the verge of positioning itself at the far right of the continuum (one of the few times an elected political body has taken a stand on research methodology), by restricting research funds in education to scientifically rigorous research, as defined narrowly (and by itself) in terms of the use of randomized experimental or field trial designs (the so-called RFT, the “gold standard” methodology) – a step that it has since taken! (Various attempts to legislate about scientific rigor are detailed in Eisenhart and Towne, 2003.) So, at the time, some leading members of the research community (those on the then OERI advisory board, for example) were keen to have an independent group jump the Congressional gun, and offer a less narrow and more reasoned account (a more centrist one) of what it was to be “rigorously scientific”. Evidently it was judged that Congress had gone too far to be shaken in its commitment to “scientific rigor”, but the hope was that if the National Academies of Sciences were to step in (via the NRC), at least a reasonable and liberal definition of “science” might replace the “gold standard”. Members of the Committee established by the NRC to undertake this task were quite clear among themselves that they were not defining rigorous education research, simpliciter, for they recognized explicitly that there was much valuable educational inquiry that fell outside the domain of science; their task, by charter, was much more restricted: What are the desiderata for those types of research that purport to be “scientific”, and the findings of which are often taken to offer concrete guidance for policymakers?
In addition – and this is evident in the final report itself – the majority of members of the Committee were committed to a view much more liberal than that being canvassed in the Congress, namely, that any reasonable list of scientific principles or desiderata would be such that academically respectable research fields such as anthropology and ethnography would be included under the mantle of “science” (for of course they would be excluded if this mantle were characterized in terms of the use of the gold standard RFT). But it is important to stress that it was not part of the NRC Committee’s charge to question the wisdom of the course upon which the Congress appeared to be setting out; nevertheless a number of the critiques of the Committee’s report in the symposia mentioned above ignored the task that the Committee specifically had been assigned, and criticized it (unfairly) for not doing what it was not instructed to do.

Having cleared the terrain of a potential minefield, it is time to focus undistracted upon the key issue – the restriction of educational research to a small group of gold standard designs in the name of science – and this takes us back to my rough map and its continuum.

The two poles of the continuum

The “left” pole

At the left-hand end or pole there is a cluster of quite different viewpoints that all reject the natural science approach as a defensible model for educational (and other forms of social) inquiry. Some philosophers of education in the UK, for example, stress that education is a normative field that is not apt at all for empirical investigation (the reasons why a normative field cannot also be studied empirically have not been made clear); others in this professional group assert that the findings of empirical educational research can only be trivial (see Phillips, 2005b, in press). Taking quite a different tack, postmodernists and poststructuralists reject the Enlightenment notion of rational inquiry that underlies modern science, and instead stress that within human societies forms of control and power are legitimated by complex discourses (such as the discourse of modern social science) that need to be treated with “incredulity” (Lyotard, 1984) and debunked or at least defused largely by case studies that are historical or genealogical in nature. Thus Patti Lather writes of the norms of science as “your father’s paradigm” that those in positions of power as funders and commissioners of research are attempting to impose on the scholarly world (her striking phrase clearly was not intended to be complimentary; see Lather, 2004); and a recent commentator writes (admiringly) of Michel Foucault that his work “seeks to uncover not the development of rationality, but the ways new forms of control and power are legitimated by complex discourses which stake a claim to rationality and which are embedded in diverse institutional sites” (Olssen, 2004, p. 58).

There are various overlapping positions that are a little nearer the center, rather than being located completely at the extreme left-hand pole, that hold that social science inquiry is possible, but only if it is based on a different model from the one found in the natural sciences. One of these is developed by Ken Howe; in the interesting symposium in a recent number of that great journal Educational Theory he rejects the experimentism that he regards as characteristic of views I have placed on the right, and he endorses experimentalism, which he characterizes as a “clinical-democratic” orientation that should “absorb” the narrower “experimentism” (Howe, 2005, pp. 316-7). Another view nearer the center, but not quite there, is that held by the Danish social scientist Bent Flyvbjerg; heavily indebted to Foucault, he also stresses that the concept of “power” should be at the center of social science, but crucially he does not see this as spelling the end of empirical social science but rather as pointing the way to a new future, to a phronetic social science that is focused on particular contexts and does not aim to discover cross-contextual generalizations in the mode of the natural sciences (Flyvbjerg, 2001). In the conclusion to his book he writes that if we wish to “re-enchant and empower social science” we must do three things:

First, we must drop the fruitless efforts to emulate natural science’s success in producing cumulative and predictive theory; this simply does not work in social science. Second, we must take up problems that matter to the local, national, and global communities in which we live, and we must do it in ways that matter; we must focus on issues of values and power like great social scientists have advocated from Aristotle and Machiavelli to Max Weber and Pierre Bourdieu. Finally, we must effectively communicate the results of our research to our fellow citizens. (Flyvbjerg, 2001, p. 166)

Flyvbjerg’s case against the natural science model rests largely upon the contextual nature of human action (Ken Howe makes much of this point as well), but one can concede this without giving up on the ideal or retreating to a very limited view of the focus of social inquiry. Human action certainly is shaped by myriad contextual factors, and these make the finding of broad generalizations extremely difficult but not impossible (for one thing, it depends upon the level of analysis one is looking at); furthermore, there is more than human voluntary action to be studied in social science. Both Howe and Flyvbjerg commit the error of assuming that the undoubted contextual nature of human voluntary action – which seems to rule out the discovery of non-vacuous “laws” – is incompatible with the discovery of law-like generalizations that apply to groups or populations or aggregates, that hold true for appreciable periods of time, and that are extremely useful for public policy purposes (economics is an obvious but perhaps dubious example; for others see Phillips, 2000). To say this is not to deny that power relations should be a part of the focus in social and educational inquiry – but the part should not be mistaken for the whole. (For other discussions of the context-dependent nature of human action, see Cronbach, 1975, who develops the notion of webs of interactive effects that change over time – hence “generalizations decay”; and Labaree, 1998, who makes the point that humans can often act so as to contradict any generalization that is made about them – hence researchers must live with a “lesser form of knowledge”. Neither of these authors seems to have serious doubts that educational research is both possible and useful; instead they hold that it must have modest aspirations.)

Another position to the left, but also more moderate than those at the extremity, is that developed by the philosopher of social science Brian Fay, who writes:

Throughout much of its history the basic question in the philosophy of social science has been: is social science scientific, or can it be? Social scientists have historically sought to claim the mantle of science and have modeled their studies on the natural sciences ….
However, although this approach yielded important insights into the study of human beings, it no longer grips philosophers or practitioners of social science. Some new approach more in touch with current intellectual and cultural concerns is required. (Fay, 1996, p. 1)

The “current concerns” he is referring to are those espoused on the left, not those held by those either in the center or on the right!

But the issue Fay highlights is certainly a crucial one that arises when considering the positions that are clustered at the hard left of this pole of my rough continuum: What is the nature of science, especially in social and educational contexts? Fay sets up a straw man in answer to this question, saying there is only the narrow natural science view, which must be rejected as inappropriate for our purposes. And in a sense he is correct – it is apparent that if a very narrow view is accepted, one that is based (say) on the illiberal reading of the methods of physics as given by those on the right of my continuum, then it is reasonable to argue (as he does) that the social sciences and educational research cannot be like that. Insofar as this view of science is the enemy, those who are left of center on the continuum are justified in being appalled. The really crucial point, however, is that those who dismiss scientific educational research as being misguided, as being based on “our father’s paradigm”, together with those on the right who accept a narrow “scientistic” account of science, both have failed to consider carefully enough that there may be other and more fruitful ways to characterize this scientific paradigm. A more valid, postpositivist account of science is possible, one that is situated in the middle of the continuum (see Phillips and Burbules, 2000, and Phillips, 2000, for further discussion); later I shall develop a heretofore neglected, Deweyan aspect of this more moderate way of viewing science.

What has caused the heat in current debates about educational research (what upsets Howe, Lather, Fay and Flyvbjerg, among many others) is the fact that there are many at the opposite (right-hand) pole of the continuum who are pushing a narrow and illiberal view of the nature of science which extols the use of randomized controlled experiments or field trials (RFTs), and whose views are being endorsed by those who are in positions of power and who control the coffers of governmental agency research funds. The remainder of my discussion shall be directed at this other pole, where a narrow reading of the nature of science and its methods is being imposed upon the educational research community.

The right-hand “pole” and the establishment of the “gold standard” for science

Earlier I suggested that the main aim of the NRC report was to liberalize the account of scientific research in education used by Congress and the research branch of the Federal Department of Education. As is well known, it largely failed to achieve this goal, although the report did lead to some softening of the federal rhetoric – phrases indicating that many approaches to research are worthwhile are now occasionally used in official discourse – but the sad fact is that almost all of the Federal discretionary funding for research in the USA still goes to support work that uses the so-called “gold standard” methodology of randomized controlled experimentation (or, less favored, quasi-experimental work or research that uses regression discontinuity designs). Establishing whether an intervention or treatment was causally responsible for producing a desired effect is taken to be the hallmark of useful (and hence fundable) educational research.

Shortly before the NRC report was published, the conservative Brookings Institution, in Washington, D.C., published a volume titled Evidence Matters (Mosteller and Boruch, 2002), the majority of the chapters of which canvassed strongly for the use of the randomized controlled experiment or field trial. (Boruch was also a member of the NRC committee, where unsurprisingly he was a strong advocate for the use of RFTs.) In one typical chapter of this book it was argued that:

Randomized field trials are a sturdy device for generating defensible evidence about relative effectiveness. Nonrandomized trials can do so at times. But the conditions under which they produce the same results as randomized trials depend heavily on assumptions that may or not be plausible and empirically testable. It is then crucial to keep account of whether nonrandomized trials approximate the results of RFTs and to learn why they do or do not. (Boruch, De Moya, and Snyder, 2002).

Here it is clear that the RFT was being used as a “gold standard”: all other designs are to be compared with this standard, although the passage does not consider all other designs and only talks of non-randomized field trials. The volume’s editors, Boruch and Mosteller, do have this to say about other designs in their introductory chapter: “Other kinds of research are important in building up to controlled studies of program effectiveness” (Boruch and Mosteller, 2002, p. 2, emphasis added), which makes clear that other work is only subsidiary to the RFT. A few lines later they make a truly remarkable statement – “Even throat-clearing essays at times contribute to understanding” (Boruch and Mosteller, 2002, p. 2) – a statement that is remarkably denigratory of the work of theorists, philosophers, social critics, and many others.

It is the general position promulgated in many of the chapters of Evidence Matters that won out in Congress, and it is this volume that ought to have been the target of criticism and debate in scholarly symposia rather than the (in comparison, blameless) NRC report. The call to “keep account” of how well research approximates the gold standard also seems to have fallen on receptive ears in Washington, for Boruch is now the Principal Investigator of the multi-million dollar Federally-funded project to build a website where educational research studies are evaluated and rated with respect to how trustworthy their conclusions are about program effectiveness – that is, how closely they approximate an ideal RFT (see the What-Works-Clearinghouse at www.W-W-C.org).

There are a number of vexing issues that arise with this position at the right-hand pole.

Vexing issues when looking right

(1) It is not unreasonable to be concerned about scholarly rigor, especially if one funds research and has to allocate scarce dollars to worthy projects. But the tendency to associate rigorous or reliable scholarship exclusively with scientific research needs to be resisted strongly. There are many respectable disciplines with well-established canons of rigor (some long antedating empirical research in education). Their practitioners ought to feel no need to disguise their work as a type of science, and that work ought not to be dismissed as mere “throat clearing”. Philosophers of education, political theorists, and historians are among those capable of producing well-crafted works that offer vital and sometimes mind-expanding insights about education. These insights are well supported by arguments and warranting considerations that can withstand critical scrutiny, and we ignore them at our peril. (See Phillips, 2005, for some examples.) The fact that after more than two millennia Plato’s “throat clearing” work still stimulates our thinking about the social and intellectual purposes of education, and about the hierarchical organization of the subjects of instruction, should give us pause (and may well lead many to wonder which work from our own day will still be acting as a stimulus in the year 4005). Furthermore, even within well-established scientific fields, establishing “rigor” is not straightforward. These fields often harbor theoretical disagreements that can bubble over into methodological disagreements (as Gee, 2005, points out).

(2) The belief that more rigorous educational research (whether or not carried out via the RFT) is the key to formulating successful educational policies and alleviating the problems besetting schooling is at best charmingly naïve; here some of Ken Howe’s diatribe is quite correct (Howe, 2005). To hold this belief is to accept the so-called rational model of decision making as the norm in educational contexts, where arguably it is by far the exception. (For a fine discussion of rational choice theory, see Elster, 1989.) Political decision makers (such as Federal Secretaries of Education, or School Board Chairpersons) are not irrational, for they pursue their goals quite effectively, and they certainly make use of evidence – but often they use it selectively, and generally only when it supports a policy or practice that appeals to them on other (political or ideological) grounds. Evidence that challenges strongly held, ideologically supported views is routinely ignored or subjected to “spin doctoring”. Patti Lather reports a nice example: she was aghast when she attended a meeting addressed by US government officials who spoke “about the need for policy research that supported the present administration’s initiatives” (Lather, 2004, p. 17). Further examples are probably unnecessary, but I offer three: the medical virtues of marijuana, the efficacy of the death penalty as a deterrent to crime, and the effects of abortion on the mental health of the prospective mother. Information on this last issue that ran counter to the dominant ideology was removed from US Federal websites a couple of years ago.

It is not entirely reprehensible that decision makers act in this way, especially in education. Schooling, at least, takes place within society and uses massive social resources, so it ought to be shaped by the values and suppositions that dominate in that society (or, in a democracy, by those held by the people duly elected to serve as decision makers). The view that rigorously produced evidence will be utterly convincing (and determinative) because it is “believable” needs to be abandoned; believability is often a function of coherence with our pre-existing value and political commitments, not a function of having been produced by an RFT. (Sometimes, if evidence is “believable” but runs counter to the economic or other interests of an interested person or organization, efforts are made to undermine or debunk it. See, for example, Michaels, 2005.)

But there is a further complicating factor: different researchers often produce contradictory findings, because they use different data sets to throw light on a problem or because they use different analytic techniques. This can be taken to show that values and judgment are as much a part of the research process itself as they are of the use of research findings for policy purposes. Many at or near the left-hand pole of the continuum stress this fact, and they also take this kind of phenomenon as undermining the claim of social science and educational research to be value-neutral and objective. (Lee Shulman recently discussed the different conclusions reached by three groups of researchers, all highly competent, about the impact of high-stakes testing on students in the USA; see Shulman, 2005.)

The most that can reasonably be expected is that educational decisions be constrained by the evidence, and that evidence indicating differential harms and benefits be given due weight. (Michael Scriven put the matter in his characteristically blunt fashion: “Who wants their children taught to read using a method that is only half as effective and no more fun than another program of equal cost?” Scriven, 1991, p. 25.) This view is strengthened by the philosophical insight that facts or evidence underdetermine generalizations. Many possible policies can be formulated that are compatible with a given body of evidence, which indicates that empirical evidence may at best be relevant to, but not determinative of, a policy. If we wish to influence the quality of the policies and programs that are promulgated in education, we probably need to pay as much (or more) attention to values and ideology as we do to experimental design. Throat clearing is more efficacious than Boruch and Mosteller imagined.

(3) An important plank in the position of the strong supporters of the gold standard is that the RFT is by far the most effective and most reliable way to establish causation. This itself supposes that establishing the causal efficacy of programs or interventions is the main, or most important, purpose of educational research, a matter that will be discussed below. (They also argue, not unreasonably, that the RFT allows good estimates of effect size to be made, a matter that will not be pursued here.) Certainly the RFT, based as it is on J.S. Mill’s principles of logic, is an excellent way to establish that X causes Y, and it can be used with profit in many educational research and program evaluation studies. But it is not the only way to establish a causal relation, nor is it a necessary one. It is also important to remember that establishing that X causes Y is not the same thing as establishing why it does so (that is, establishing the physical or social mechanism). It is this latter issue that is often of vital interest in science (and in the public policy arena), and it is salutary to remember that the RFT is of little or no value in answering this deeper question about causal mechanisms. The philosopher and social scientist Jon Elster phrases this reminder well:
To cite the cause is not enough: the causal mechanism must also be provided, or at least suggested. In everyday language, in most historical writing and in many social scientific analyses, the mechanism is not explicitly cited. Instead, it is suggested by the way the cause is described.
But, crucially, he adds: “Any given event can be described in many ways” (Elster, 1989, p. 4). The correct description, given the situation and the interests of the researcher and the policy-maker, therefore needs to be the object of further investigation.

To return to the claim that the RFT is not the only way to establish causation: eyewitness testimony about a crime can establish causation, at least to the satisfaction of the law. No RFT is required, and the eyewitness sometimes also sheds light on why the crime was committed. The detailed report of an anthropologist (to take a nineteenth-century example) can establish that if an Australian Aboriginal tribal member has a sharpened human bone (one marked with magical signs) pointed at him, he will walk into the bush and slowly die. That is, the anthropologist can establish the causal efficacy of bone pointing, and can also establish that it does not work on individuals of European background. William Harvey established that the pumping of the heart causes the circulation of the blood. He achieved this insight not by conducting randomized field trials but by developing a model of the circulatory system, testing it systematically using hypothetico-deductive reasoning, and doing small-scale studies to rule out rival hypotheses.

The bone-pointing example is also important because it reveals that causal factors cannot always (or even often) be conceptualized as “interventions” in the manner of a program. Bone pointing leads to death because of the cultural beliefs held in Aboriginal tribes in Australia, beliefs that provided many of the values and metaphysical suppositions that gave meaning to the lives of individuals within the tribe. Consider a controlled experiment in which the members of one group are assigned to have a bone pointed at them and the members of a control group are assigned to be free of bone pointing. If well done, it might establish the efficacy of bone pointing, but it would miss the real point: how on earth did bone pointing produce its dramatic effect? What mechanism, physical or social or both, provides the causal link between bone pointing and death? Documenting the causal efficacy of cultural beliefs is not usually part of the random assigner’s credo! (For deeper discussion of such causes, see Phillips and Burbules, 2000, ch. 4.)

(4) Finally, two interrelated views rest on a serious misreading of the history of the sciences: that scientific research is epitomized by the establishing of causation (“program P causes effects E”), and that the establishment of causation is epitomized by randomized experiments or field trials. Those at the right-hand pole take the natural sciences as the epitome, so looking back at the history of those sciences with open eyes is liberating. One cannot help but be struck by the huge range of activities in which researchers have engaged: establishing what causal factors are operating in a given situation; distinguishing genuine from spurious effects; determining function; determining structure; careful description and delineation of phenomena; accurate measurement; development and testing of theories, hypotheses, and causal models; testing of received wisdom; elucidating unexpected phenomena; production of practically important techniques and artifacts. (Fuller discussion and examples of each can be found in Phillips, 2005a.)
René de Réaumur constructing wire-gauze containers for food to be placed inside the gut of a hawk (to determine whether the food could be digested when protected from mechanical interference by movements of the stomach and intestines), William Harvey blocking a vein in his arm with pressure from a finger, Darwin observing turtles in the Galapagos and breeding pigeons on his farm, Hawking doing calculations, Kinsey and his co-workers administering questionnaires, von Frisch constructing a glass-sided bee-hive, Galileo rolling a ball down an inclined plane, John Snow marking on a single map both the locations of water wells and the cases of cholera in London, Crick and Watson tinkering with a crude metal molecular model in an attempt to unravel the structure of DNA: all these are as much a part of science as is a modern educational psychologist consulting a table of random numbers to assign members to the control and treatment groups of a randomized controlled experiment or field trial.

This suggests that any attempt to delineate “the central method of science”, that is, to give a simple “gold standard” account of the “nature of science”, must always be quite arbitrary. Perhaps it was recognition of all this that led Percy Bridgman, a Nobel Laureate in Physics, to remark that “the scientist has no other method than doing his damnedest” (cited in Kaplan, 1964, p. 27).

A positive, centrist (and Deweyan) suggestion: Replacing gold by the platinum standard

The over-emphasis on using a narrow methodological criterion to delineate “scientific rigor” detracts from the main question at hand when one is assessing the validity or rigor of an inquiry: Has the overall case made by the investigator been established to a degree that warrants the tentative acceptance of the theoretical or empirical claims being put forward? Making a case for tentative belief is, in essence, the point of scientific inquiry. Spelling out the line of reasoning clearly and explicitly was one of the characteristics of science identified in the NRC report, although I now believe we should have stressed it far more than we did. (To allude to examples mentioned previously, William Harvey was making a case about the circulation of the blood, Darwin was making a case about natural selection, and the anthropologist in Australia was making a case about the bone-pointing phenomenon.) The assessment of such cases is what I propose to call the platinum standard.
 
The methodology used in a particular study undoubtedly is an important consideration in case-building. But it is not an “authoritative umpire” (to use Abraham Kaplan’s expression, 1964, p. 25) that should rule in or out of play the diverse considerations that the scientist puts forward in developing his or her case. A weakness here might be compensated for by a strong argument or relevant piece of evidence there. A methodological purist, however, might exert a negative, or at least an unnecessarily restraining, influence because some of the relevant considerations might not be mentioned in his or her rulebook. To repeat: what needs to be judged is the overall case that is made. This means the cohesion, convincingness, and rigor of the often-complex argument that the particular scientist is making and supporting with a diverse body of (hopefully relevant) evidence. It includes, of course, whether what the case asserts appears to hold true of the natural or social phenomena being investigated. To use John Dewey’s felicitous expression (Dewey, 1938/1966), the key point is whether or not a warrant has been established that justifies the assertion of the claim under consideration.

The attitude towards research being advocated here is that it should be recognized as an exercise in argumentation. The philosopher Alvin Goldman defines argumentation as “a complex speech act in which a speaker presents a thesis to a listener or audience, and defends this thesis with reasons or premises” (Goldman, 1994, p. 27; he goes on to spell out several desiderata of good argumentation). Another, slightly different way to formulate this position is that research is a rhetorical activity. Here “rhetorical” is meant in the classic sense, not the modern sense in which the term has become almost synonymous with “non-rational persuasion” or “spinning”. As Nelson, Megill, and McCloskey put it in the opening chapter of The Rhetoric of the Human Sciences,

“Rhetoric” covers at once what is communicated, how it is communicated, what happens when it is communicated, how to communicate it better, and what communication is in general. Rhetoric of inquiry enlarges these meanings to encompass the interdependence of inquiry and communication, and to encourage connecting all the skeins of rhetoric into a commitment for better inquiry to inform action. (Nelson, Megill, and McCloskey, 1987, p. 16) 

This rhetorical aspect of research is most apparent in the published report. The notion also applies to day-to-day activity in the lab, in the field, or in the library, where a case is actually being constructed (for later publication) that will be convincing to other (often skeptical) members of the relevant research community or to the relevant group of “consumers”. As Wayne Booth and his colleagues put it in The Craft of Research, “In a research report you make a claim, back it with reasons based on evidence, acknowledge and respond to other views, and sometimes explain your principles of reasoning. There’s nothing arcane in any of this….” (Booth et al., 2003, p. 114). There is no disguising the fact that assessing a complex piece of rhetoric that purports to warrant a claim can be an extremely difficult task. Many critics of the quality of educational research evade it, whether because they are incapable of tackling it or because they do not feel inclined to. But assessing the case simply by consulting a text (or website) on the proper conduct of RFTs will not suffice.

Yet a third way of putting all this is that scientific research can be regarded as parallel to the work of a trial lawyer. What is crucial is the way the case is built up, how evidence or arguments are marshaled to fill in the “holes”, and how the final argument hangs together. This includes whether it can stand up to the critical scrutiny of peers (trial lawyers working for the other side) and of the independent jurors who need to be convinced “beyond all reasonable doubt”. As Stephen Toulmin put it in a classic discussion (1958, reissued in 2003), argumentation is “generalized jurisprudence”:

A sound argument, a well-grounded or firmly-backed claim, is one which will stand up to criticism, one for which a case can be presented coming up to the standard required if it is to deserve a favorable verdict. How many legal terms find a natural extension here! One may even be tempted to say that our extra-legal claims have to be judged … before the Court of Reason. (Toulmin, 2003, p. 8)

This parallelism between science and jurisprudence is quite obvious in the evidentiary trail-blazing that led to the verdict that smoking was the guilty party in most cases of lung cancer (a verdict now accepted even by cigarette manufacturers). And of course Darwin’s groundbreaking work On the Origin of Species (1859) had a masterly rhetorical structure, presenting a wide variety of evidence and many different arguments to help make the case for evolution. A fine contemporary example is provided by the work of the young Chicago economist Steven D. Levitt. In scholarly papers and in more popular essays and book chapters, he has developed convincing cases on such important social issues as the cause of the decline in serious crime in the United States during the 1990s, and these cases do not depend upon evidence obtained from RFTs! Levitt’s contention has been that the decline in serious crime is attributable more to the legalization of abortion than to such factors as the introduction of innovative police practices or increased use of the death penalty. Using comparative statistics and other evidence, and with careful argumentation, he undermined the credibility of the many traditional, rival explanations. He then demonstrated that states in the USA, and countries overseas, experienced a decline in serious crime when the first cohort of children born after the liberalization of abortion laws reached their late teen years. Furthermore, he found that the opposite was true in places where abortion restrictions had been reintroduced. (The most accessible account of this work is in Levitt and Dubner, 2005, ch. 4.)

Conclusion

Acceptance of what I have called the platinum standard implies that the gold standard, based as it is on whether evidence originates in an RFT, obscures the fundamental point: what is key is how the evidence is used in the course of argumentation. “Good evidence” can be vitiated by being incorporated into a poor or incomplete argument or case. The thrust of a piece of evidence can be countered by other, differing evidence. And the significance of what seems to be a strong piece of evidence can be changed in several ways: by an appeal to context, by showing how value judgments skewed the analytic process that produced the evidence, by constructing a brilliant novel argument that was unforeseen by the purveyors of the evidence, or by pointing to new phenomena that bear on the status of the purported evidence but were not known at the time it was obtained. (Achinstein, 2001, gives a detailed discussion of factors of this sort. He shows that the notion of evidence can be relativized to the epistemic situation of the scientist who accepts it without destroying the objectivity that is so important in the concept of evidence.)

Nothing in the foregoing discussion suggests that evidence should be collected in a slipshod way. How the evidence came to light can (and almost certainly will) become an issue when the case in which it is used faces public scrutiny, just as the evidence presented by the prosecution in a criminal trial can become the object of intense scrutiny. The point, however, is that to put forward evidence that was not collected by means of an RFT is not necessarily to be slipshod! The wise researcher, like the wise prosecutor, will use quality evidence of different kinds and will weld it all into a coherent case whose parts strengthen and support each other.

Finally, by turning to platinum we stand a fighting chance of immunizing ourselves against what the philosopher of social science Abraham Kaplan once identified as “methodolatry”. Kaplan called it a “pervasive trait of American culture”, and it may fast be becoming a trait of the international educational research community as well: an “overemphasis on what methodology can achieve” (Kaplan, 1964, p. 24). In a sense, by using platinum we replace the narrow practice of methodolatry with the broader practice of intelligent argumentation. C.D. Hardie would have been pleased, for the clarification and assessment of arguments and positions was the ideal he promulgated throughout his writings. As he put it in the introduction to his groundbreaking book, “if two educational theorists disagree I think it should be made clear whether the disagreement is factual or verbal or due to some emotional conflict. If this is to be done it is necessary to state each theory in the clearest possible way so that no ambiguity may be allowed to flourish undiscovered…. It is then possible to see to what extent disagreements may legitimately be allowed” (Hardie, 1962, pp. xix-xx).

*This paper is based on my keynote address to the EARLI Conference, Cyprus, August 2005, but it has been substantially revised, with some significant additions.

References

Achinstein, Peter. (2001). The Book of Evidence. (Oxford: Oxford University Press).

Booth, Wayne, et al. (2003). The Craft of Research. (Chicago: University of Chicago Press, 2nd edn).

Boruch, Robert, de Moya, Dorothy, and Snyder, Brooke. (2002). The Importance of Randomized Field Trials in Education and Related Areas. In Mosteller, Frederick and Boruch, Robert. (eds.) Evidence Matters (Washington, D.C.: Brookings Institution Press).

Carr, David. (2003). Making Sense of Education (London and New York: Routledge-Falmer).

Cronbach, Lee J. (1975). Beyond the Two Disciplines of Scientific Psychology, American Psychologist, 30 (2), pp. 116-127.

Dewey, John. (1938/1966). Logic: The Theory of Inquiry (New York: Holt, Rinehart and Winston).

Educational Theory. (2005), 55 (3).

Educational Researcher. (2002), 31 (8).

Eisenhart, Margaret, and Towne, Lisa. (2003). Contestation and Change in National Policy on ‘Scientifically Based’ Education Research, Educational Researcher, 32 (7), pp. 31-38.

Eisenhart, Margaret, and DeHaan, Robert. (2005). Doctoral Preparation of Scientifically Based Education Researchers. Educational Researcher, 34 (4), pp. 3-13.

Elster, Jon. (1989). Nuts and Bolts for the Social Sciences. (Cambridge: Cambridge University Press).

Erickson, Fred. (2005). Arts, Humanities, and Sciences in Educational Research and Social Engineering in Federal Education Policy, Teachers College Record, 107 (1), pp. 4-9.

Fay, Brian. (1996). Contemporary Philosophy of Social Science. (Oxford: Blackwell).

Feuer, Michael, Towne, Lisa, and Shavelson, Richard. (2002). Scientific Culture and Educational Research. Educational Researcher, 31 (8), pp. 4-14.

Flyvbjerg, Bent. (2001). Making Social Science Matter (Cambridge: Cambridge University Press).

Gee, James. (2005). It’s Theories All the Way Down: A Response to Scientific Research in Education, Teachers College Record, 107 (1), pp. 10-18.

Goldman, Alvin. (1994). Argumentation and Social Epistemology, The Journal of Philosophy, 91 (1), pp. 27-49.

Hargreaves, David. (1997). In Defence of Research for Evidence-based Teaching, British Educational Research Journal, 23 (4), pp. 405-419.

Hardie, C.D. (1962). Truth and Fallacy in Educational Theory. (New York: Bureau of Publications, Teachers College, Columbia University).

Hardie, C.D. (1971). The Philosophy of Educational Research, Educational Philosophy and Theory, 3 (1), pp. 1-10.

Howe, Kenneth. (2005). The Question of Education Science: Experimentism versus Experimentalism, Educational Theory, 55 (3), pp. 307-321.

Kaestle, Carl. (1993). The Awful Reputation of Educational Research, Educational Researcher, 22 (1), pp. 23-31.

Kaplan, Abraham. (1964). The Conduct of Inquiry (Scranton, PA: Chandler).

Kuhn, Thomas S. (1970). The Structure of Scientific Revolutions. (Chicago: University of Chicago Press).

Labaree, David. (1998). Educational Researchers: Living with a Lesser Form of Knowledge, Educational Researcher, 27 (8), pp. 4-12.

Lather, Patti. (2004). This IS Your Father’s Paradigm: Government Intrusion and the Case of Qualitative Research in Education, Qualitative Inquiry, 10 (1), pp. 15-34.

Levitt, Steven, and Dubner, Stephen. (2005). Freakonomics (New York: Morrow/Harper Collins).

Lyotard, Jean-Francois. (1984). The Postmodern Condition: A Report on Knowledge. (Manchester: Manchester University Press).

Michaels, David. (2005). Doubt Is Their Product, Scientific American, 292 (6), pp. 96-101.

Mosteller, Frederick, and Boruch, Robert. (eds.) (2002). Evidence Matters (Washington, DC: Brookings Institution Press).

National Research Council. (2002). Scientific Research in Education (Washington, DC: National Academies Press).

National Research Council. (2005). Advancing Scientific Research in Education (Washington, DC: National Academies Press).

Nelson, J., Megill, A., and McCloskey, D. (eds.) (1987). The Rhetoric of the Human Sciences (Madison, WI: University of Wisconsin Press).

Olssen, Mark. (2004). The School as the Microscope of Conduction: Doing Foucauldian Research in Education. In James Marshall (ed.) Poststructuralism, Philosophy, Pedagogy (Dordrecht, The Netherlands: Kluwer).

Phillips, D.C. (1987). Philosophy, Science, and Social Inquiry (Oxford: Pergamon Press).

Phillips, D.C. (2000). The Expanded Social Scientist’s Bestiary (Lanham, MD: Rowman and Littlefield. Original edition, Oxford: Pergamon, 1992.)

Phillips, D.C. (2005a). Muddying the Waters: The Many Purposes of Educational Inquiry. In C. Conrad and R. Serlin (eds.) SAGE Handbook for Research in Education: Engaging Ideas and Enriching Inquiry (Thousand Oaks, CA: SAGE).

Phillips, D.C. (2005b). The Contested Nature of Empirical Educational Research (And Why Philosophy of Education Offers Little Help). Journal of Philosophy of Education, (in press).

Phillips, D.C., and Burbules, Nicholas. (2000). Postpositivism and Educational Research (Lanham, MD: Rowman and Littlefield).

Qualitative Inquiry. (2004). 10 (1).

Scriven, Michael. (1991). Beyond Formative and Summative Evaluation, in M. McLaughlin and D.C. Phillips (eds.) Evaluation and Education: At Quarter Century. Ninetieth Yearbook of the NSSE. (Chicago: University of Chicago Press/NSSE).

Shulman, Lee. (2005). Seek Simplicity … and Distrust It. Education Week, 24 (39), June 8, pp. 36, 48.

St. Pierre, Elizabeth. (2002). “Science” Rejects Postmodernism. Educational Researcher, 31 (8), pp. 25-27.

Teachers College Record. (2005), 107 (1).

Tooley, James, and Darby, D. (1998). Educational Research: an OFSTED Critique. (London: OFSTED).

Toulmin, Stephen. (2003). The Uses of Argument, updated edn. (Cambridge: Cambridge University Press; first published 1958).

 


Department of Educational Policy Studies
College of Education
H