The Final Synthesis [1]

Michael Scriven
 
 

Introduction


 


This is a discussion of the methodology of the final synthesis step in an evaluation: whether to make it, and how. In many evaluations, this synthesis is poorly connected to the data gathering and analysis. Sometimes we have to rely on clinical inference, intuition, professional judgment, connoisseurship, or impressionism, but at considerable peril. A simple empirically based rule beats expert judgment in almost all cases.
 


Rule Governed Synthesis


 


Using a judgment-free algorithm does not guarantee valid conclusions, and of course mere guidelines or rubrics do not either. This is easily demonstrated by the case of drawing conclusions about competence from scoring rubrics for standardized tests. Another example is rating personal competence using a checklist.

Even though rules are not a sure bet, we should still use them where we can, or failing that heuristics and rubrics, and we should train people to use them.

Program evaluation puts the greatest demands on the synthesis process. There especially, we should use rules, rubrics and calibration of judges. These must be carefully developed, not just drawn from personal preference.
 


The Basic Logic of Syntheses


 


The logic of synthesis calls for just two tasks (claims): (1) testing performance and (2) comparing to standards. These are combined (synthesized) to form the final evaluative conclusion. For the expert, standards are so well understood that the task appears to be simply a matter of measuring performance.
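
The two-task logic can be illustrated with a minimal sketch, assuming a single numeric performance score and a short list of cut scores. The names `Standard` and `synthesize`, the grade labels, and the cut scores are hypothetical and do not come from the article; real standards are rarely this simple.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Standard:
    """A standards claim: the minimum score that earns a given grade."""
    grade: str
    minimum: float


def synthesize(performance: float, standards: Sequence[Standard]) -> str:
    """Combine a performance claim with standards claims into one grade."""
    # Check the most demanding standard first.
    for standard in sorted(standards, key=lambda s: s.minimum, reverse=True):
        if performance >= standard.minimum:
            return standard.grade
    return "unacceptable"


# Hypothetical standards for a test scored from 0 to 100.
standards = [Standard("good", 80.0), Standard("adequate", 60.0)]
print(synthesize(85.0, standards))  # good
print(synthesize(40.0, standards))  # unacceptable
```

The performance figure alone says nothing evaluative; the conclusion only appears once it is compared with the standards.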

Sub-evaluations also occur, each going through the two tasks. Each of many components or aspects of the evaluand is examined separately by the qualified evaluator, often very quickly, and an expert judgment is made. The expert is cautious, wary of errors of fact, standards, judgment or inference. (This is not the final synthesis of interest in this article.) These "on the fly" sub-evaluations sometimes are challenged, especially when the standards seem questionable.

For the final synthesis, frequently no standards exist. They have to be developed, and they will have to be multi-dimensional.

Standards claims are the epistemological key to evaluation. Seldom are they just out there. This critical knowledge usually needs to be developed.

Developing standards claims is more difficult for grading than for ranking, often because we are not acquainted with cases (real or hypothetical evaluands) up and down the graduated scale of quality. Making a standards claim for an evaluand means inferring its value on that scale, that is, its quality.

The standards claim can be made using "probative inference," making a prima facie choice of criteria, within conditions. Such a claim takes the form "For these circumstances, this criterion is important" or "If that's the way they will use it, this concept is critical." We create such a standards claim by thoroughly understanding the criterion, the circumstances, and the conditions of use.

With only a few properly chosen standards claims and performance data, we can rank simple cases and even reach some conclusions of "absolute" merit. With nothing heavy going the other way, we can draw conclusions about their overall merit. Of course, with complex policy or program evaluation, simple inferences won't do.

The standards claim presented above is quasi-definitional. Many people presume that a definitional premise cannot add evidential support to the conclusions, since definitions are mere language rules, arbitrary in a way that facts are not. But values properly introduced into such a definition can, in fact, justify or validate an evaluative conclusion.

An important category of statements, partly definitional, partly factual, supports evaluative assertions when they in turn are supported by facts about usages and the external world. For example, "Watches are time-keeping devices, good watches keep good time." Some evaluative premises come from functional analysis, and functional analysis can be based solely on logic and evidence.

Those who argue that you can't get evaluative conclusions from factual premises are wrong. But the general nature of evaluative inferences is not the main issue. Final synthesis is an example of evaluative inference, and when all the sub-conclusions are factual, it follows the model described above. The reasoning goes:

"If the performance of X on criteria A,B,C, etc. is high (these being all the leading criteria), and there is no evidence of Q, R or S (these being all of the serious threats to the inference, that is, most of the possibilities excluded by the ceteris paribus consideration), then one may conclude that the evaluand is a prima facie good [better, best, competent, etc.] wristwatch [judge, clinic, school, analgesic, etc.]." Such a probative inference should be believed until disproved.

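The reasoning pattern quoted above can be written out as a small sketch, assuming each leading criterion and each threat has already been reduced to a yes/no finding. The function name `probative_inference` and the wristwatch criteria and threats are hypothetical, invented only for illustration. The verdict is defeasible: it holds until new evidence of a threat turns up.

```python
from typing import Mapping


def probative_inference(criteria_high: Mapping[str, bool],
                        threats_found: Mapping[str, bool]) -> str:
    """Return a defeasible (prima facie) verdict for one evaluand."""
    if not criteria_high:
        raise ValueError("at least one leading criterion is required")
    # Any serious threat (Q, R, S in the text) blocks the inference.
    if any(threats_found.values()):
        return "no conclusion: a threat to the inference is present"
    # Performance must be high on every leading criterion (A, B, C in the text).
    if all(criteria_high.values()):
        return "prima facie good"
    return "no conclusion: performance is not high on every leading criterion"


# Hypothetical wristwatch: criteria and threats are invented for illustration.
criteria = {"keeps good time": True, "durable": True, "legible": True}
threats = {"unsafe materials": False, "cannot be serviced": False}
print(probative_inference(criteria, threats))  # prima facie good
```
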
To apply this to program evaluation, we need to work on getting a comprehensive list of possible (not actual) merits for the kind of program we're looking at; this typically means using both conceptual analysis and empirical needs assessment. Then we look at performance on each of the criteria.

Final synthesis may come from probative inference; it sometimes may come through deductive reasoning; it may come about because the standards claim is fully factual. But there are other cases too where a more complex inference still is required, where we have to show that the particular configuration of scores by E1 makes it better than E2 with its own configuration of scores. To do that we have to get at the weighting of the criteria by relative importance.
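
To see why weighting matters, here is a minimal sketch that compares two score configurations with a numerical weighted sum. This is only one simple way to bring relative importance into the comparison and is not presented as the article's own method; the function `weighted_score`, the weights, and the scores for E1 and E2 are all hypothetical.

```python
from typing import Mapping


def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Sum each criterion's score multiplied by its importance weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score for criteria: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)


# Hypothetical importance weights: criterion A matters most.
weights = {"A": 3, "B": 2, "C": 1}
e1 = {"A": 5, "B": 2, "C": 2}
e2 = {"A": 3, "B": 4, "C": 3}

print(sum(e1.values()), sum(e2.values()))  # 9 10
print(weighted_score(e1, weights), weighted_score(e2, weights))  # 21 20
```

With the raw totals E2 comes out ahead, but once criterion A is weighted as most important E1 comes out ahead, so the ranking depends on the weights as much as on the scores.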
 
 



Notes:
1.  This is an overview of "The Final Synthesis" by Michael Scriven, published in Evaluation Practice, Vol. 15, No. 3, pp. 367-382.