Products Liability

'Gatekeeping' of Experts and Unreliable Literature
By Michael Hoenig - New York Law Journal - September 12, 2005

Are most published research findings false? Startlingly, an epidemiologist says "yes" in an article issued Aug. 30. If he is correct, the implications for courts policing expert testimony are staggering. More about this arresting new development later in this column.

Judicial "gatekeeping" of expert testimony, i.e., the judge's task to assure that expert testimony is not only relevant but reliable, continues to surge vigorously. Even experts whose qualifications a judge may find acceptable cannot rest on those laurels. Their testimony, exhibits and data must possess solid indicia of trustworthiness. The quest for reliability probes the opinion, methodology, reasoning, foundation and other predicates for the expert's testimony. As one appellate court put it, "A supremely qualified expert cannot waltz into the courtroom and render opinions unless those opinions are relevant and reliable . . . "[1]

The two major lines of inquiry into the reliability of expert testimony are often referred to as Frye and Daubert, subjects this column has often addressed. The Frye "general acceptance" test, with significant impact in New York and in other state courts, was comprehensively reviewed by New York Supreme Court Justice Raymond E. Cornelius of Rochester in DeMeyer v. Advantage Auto, et al.,[2] a decision issued on June 27. In federal and many state courts the Daubert reliability guidelines are applied. Whichever test is used, even experienced experts can be barred by the shutting of the reliability "gate" when trustworthiness is lacking. A hot-off-the-press example is the U.S. Court of Appeals for the Seventh Circuit's ruling Aug. 30 that New York expert James Pugh's opinions on causation and defect were unreliable under the Daubert standard.[3]

The gatekeeping question becomes particularly challenging when experts rely for much of their opinion upon out-of-court writings by others, that is, upon hearsay articles or other literature. That experts may consult hearsay for use in litigation is not strange. After all, much of what we say or do is based on what we learn and much of the latter is based on what we read. Further, much of what we read is, in turn, based on what others have read and written. And much of that stems from what still others have written after their own readings. So, it becomes inevitable that, sooner or later, hearsay — often multiple layers of it — underlies what is presented by experts in the courtroom.

Hearsay Problematical

In and of itself, that may not be bad — if the hearsay used is itself reliable. But if it is "junk," then the expert's testimony is not much better than the junky predicate. And, between what is reliable and what is junky is a vast gray area we have previously referred to as "quasi-reliable" or "not-quite-reliable" or "not-quite-junk."[4] This enormous "gray area" literature even may be published by journals with professional-sounding names or by recognized institutions giving the hearsay an aura of trustworthiness. The poverty of the substantive content or other significant failings in reliability may be masked by the veneer.

Use of hearsay by experts is, of course, problematical for a variety of reasons other than reliability concerns. The out-of-court author is not present to testify or to be cross-examined. The nonauthor witness conveying the hearsay thus benefits from using the literature but then is shielded from critical questions about bias and limitations of the underlying data by the simple expedient of saying, "I don't know." Juries may give the literature, particularly if it sounds prestigious, undue weight. Further, experts cannot simply adopt out-of-court materials and then become a mere conduit or a funnel for the admission of what otherwise would be excluded.[5] As courts have observed, the hearsay ought not to be the "principal basis" for the expert's opinion but, rather, "merely a link in the chain of data on which the expert relied."[6] Sometimes, the hearsay also raises "best evidence" rule issues.

On top of these concerns there is the reinvigorated and potentially explosive question of the hearsay's reliability in and of itself.[7] Our columns of July and November 2002[8] reported that the vaunted Journal of the American Medical Association (JAMA) devoted an entire issue (June 5, 2002) to the question of whether biomedical literature truly meets assumed standards of quality and trustworthiness. JAMA's June 5, 2002 issue also probed the quality of the peer review process by which many such articles qualify for publication. The cumulative impact of JAMA's revelations was breathtaking.

In an eye-popping JAMA article called "Poor-Quality Medical Research: What Can Journals Do?", the author states: "There is considerable evidence that many published reports of randomized-controlled trials (RCTs) are poor or even wrong, despite their clear importance . . . . Poor methodology and reporting are widespread . . . . Similar problems afflict other study types." The author goes on to state: "Errors in published research articles indicate poor research that has survived the peer-review process. But the problems arise earlier, so a more-important question is, why are submitted articles poor?"

A separate article, "The Hidden Research Paper," established through post-publication surveys of authors that "important weaknesses were often admitted on direct questioning but were not included in the published article. Contributors frequently disagreed about the importance of their findings, implications and directions for future research." The article further observed that a "scientific research paper is an exercise in rhetoric, that is, the paper is designed to persuade or at least convey to the reader a particular point of view. When one probes beneath the surface of the published report, one will find a hidden research paper that reveals the true diversity of opinion among contributors about the meaning of their research findings."

'Appalling' Standards

Drummond Rennie, JAMA's deputy editor, wrote in his editorial in 2002 that in 1986 he had noted "appalling standards" of quality despite peer review and that, despite some improvement in the 16 years since, "an unbiased reader, roaming at random through a medical library, would find in abundance all the problems I described in 1986."

In the article quoted earlier, "Poor-Quality Medical Research: What Can Journals Do?", author Douglas G. Altman says that in 1994 he "observed that research papers commonly contain methodological errors, report results selectively, and draw unjustified conclusions." His 2002 JAMA piece revisited the same topic. He found research done without the benefit of adequate training in quantitative methods; inadequate review by research ethics committees; copying of incorrect or inappropriate methods; and absence of post-publication peer review that would identify misleading works after publication. Prestigious journals such as JAMA, New England Journal of Medicine and Lancet have rules limiting post-publication correspondence to only some four to eight weeks, in effect establishing a short statute of limitations immunizing authors from disclosure of methodological weaknesses. Mr. Altman concludes: "Many readers seem to assume that articles published in peer-reviewed journals are scientifically sound, despite much evidence to the contrary."

Other authors disclosed that numerous biomedical articles had "honorary authors" for writings really produced by "ghost authors," thereby distorting objectives of accountability, responsibility and credit. From the standpoint of use of seemingly prestigious studies in the courtroom, such authorship practices can mislead testifying experts and judges into believing that highly qualified or distinguished experts in their field actually wrote the reports when they did not, in effect masking that unqualified specialists wrote the articles.

In Richard Horton's article, "The Hidden Research Paper," he asks, "What happens when scientists disagree? Most times, readers of research papers never know." His qualitative study of multi-author research works showed that published papers rarely represented the full range of opinions of those scientists whose work they claim to report. He found evidence of "censored criticism; obscured views about the meaning of research findings; incomplete, confused and sometimes biased assessment of the implications of a study; and frequent failure to indicate directions for future research . . . . What was striking was the inconsistency in published evaluations, especially regarding weaknesses." What this means is that a published study, seeming to be a blissful consensus among researchers, may hide dissension among the authors about study deficiencies. Numerous other shortcomings were elaborated even with these front-line journals. What might be found in less prestigious publications?

'Litigation Science'?

Then there is the growing phenomenon of litigation-focused research with the goal of creating a body of scientific studies generated for or funded by litigation and conducted for expected use in litigation. Often such research will have been funded by lawyers or litigants or controlled in some manner by the lawyers or their testifying experts. This is not inherently or necessarily bad science. But it calls for scrutiny by courts pursuant to Daubert or Frye standards because there may be potential for bias or pressures to assure the "right" outcome, which can result in manipulated procedures, distorted data, selective reporting of results or even falsified outcomes. This growing phenomenon was described in a law review article entitled, "'Daubert's' Backwash: Litigation-Generated Science," published in summer, 2001.[9] The article provides examples; describes the normal scientific process of falsification to weed out bad science; surveys court approaches to the challenges posed by litigation science; and offers suggestions of a methodology for both scientists and the court system with which to examine litigation-generated studies.

Despite the foregoing revelations, some no doubt may be unalarmed by the trend towards what we have called "trial by literature" or what Ithaca, N.Y., Supreme Court Justice Walter J. Relihan Jr. has dubbed "trial by dossier." Yet, in order to achieve its task, the surge in vigorous gatekeeping inevitably must turn to the next most logical reliability frontier: the hearsay writings upon which experts rely. In New York state courts, for example, the out-of-court materials must be professionally reliable,[10] thereby directly injecting the question of reliability of the hearsay. In federal courts under Daubert, the quality of the peer review process, the validity of the scientific method and other applicable indicia of trustworthiness clearly authorize close scrutiny of the hearsay's reliability.

If the foregoing rumblings about acute shortcomings in biomedical or technical literature or litigation-generated science are not enough to whet the bench and bar's appetite for greater reliability probings, a brand-new reason may have appeared. This is an article published Aug. 30, 2005 in the Public Library of Science Medicine entitled, "Why Most Published Research Findings are False." The author is John P.A. Ioannidis, an epidemiologist at the University of Ioannina School of Medicine in Greece and in the Institute for Clinical Research and Health Policy Studies at Tufts University School of Medicine in Boston.[11]

False Findings

The author notes the increasing concern that in modern biomedical research false findings may be the majority, or even the vast majority of published research claims. This should not be surprising, he says, since it "can be proved that most claimed research findings are false." And, then, he proceeds to show why. There are several key factors that influence this problem. First, the probability that a research finding is true depends on the prior probability of its being true before doing the study. When studies go forth with a low prior probability, the chances for false findings are greater. Then there are problems of "bias," which is defined as "the combination of various design, data, analysis and presentation factors that tend to produce research findings when they should not be produced." Bias can entail manipulation in analysis or reporting of findings. Selective reporting is a typical form of such bias, for example.
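
The interplay of prior probability and bias described above can be made concrete with a short calculation. The following Python sketch applies the post-study probability formula as we understand it from Dr. Ioannidis's essay, where R is the pre-study odds that a tested relationship is true, alpha is the false-positive rate, beta is the false-negative rate and u is the proportion of analyses affected by bias. The function name, parameter names and default values are our own illustrative assumptions, not figures taken from the essay.

```python
def ppv(prior_odds, alpha=0.05, power=0.8, bias=0.0):
    """Probability that a claimed (statistically significant) finding is true.

    prior_odds: pre-study odds R that the tested relationship is real
    alpha:      chance of a false positive when no relationship exists
    power:      chance of detecting a real relationship (1 - beta)
    bias:       fraction u of would-be negative results reported as positive
    All defaults are illustrative assumptions.
    """
    beta = 1.0 - power
    R, u = prior_odds, bias
    # Positive reports that correspond to real relationships.
    true_positive = (1 - beta) * R + u * beta * R
    # Positive reports where no relationship exists.
    false_positive = alpha + u * (1 - alpha)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    for odds in (1.0, 0.1, 0.01):
        for u in (0.0, 0.3):
            print(f"prior odds {odds:<5} bias {u:<4} -> {ppv(odds, bias=u):.3f}")
```

Under this formula, a finding from a well-powered study of a plausible hypothesis with no bias is very likely true, while lowering the prior odds or adding bias pulls the probability down. That is the mechanism behind the essay's claim, and it suggests why a court might ask how plausible a hypothesis was before the study was run.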

Then there may even be "reverse bias," that is, true research findings may be annulled because of large measurement errors or because investigators use data inefficiently or fail to notice statistically significant relationships. Or there may be conflicts of interest that tend to "bury" significant findings. However, says the author, there is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields.

The author additionally observes that several independent teams of researchers may be investigating the same sets of research questions, a growing phenomenon because of globalization of research efforts. Yet, the prevailing mentality has been to focus on isolated discoveries by single teams and interpret research results in isolation. This tends to distort the truth of the isolated finding. Then Dr. Ioannidis sets forth a number of interesting corollaries about the probability that a research finding is indeed true.

1. The smaller the studies conducted in a scientific field the less likely the research findings are to be true. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies. He gives examples.

2. The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Findings are more likely true in scientific fields with large effects (e.g., impact of smoking on cancer or cardiovascular disease) than in fields where the postulated effects are small (e.g., genetic risk factors for multigenetic diseases). Since modern epidemiology is increasingly obliged to target smaller effect sizes, the proportion of true research findings is expected to decrease. Fields chasing small effects also are likely to be plagued by "almost ubiquitous false positive claims."

3. The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.

4. The greater the flexibility in designs, definitions, outcomes and analytical modes in a scientific field, the less likely the research findings are to be true.

5. The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias. Prejudice does not necessarily have financial roots. It can come from belief in a scientific theory or researchers' commitment to their own findings. Prestigious investigators may suppress, via the peer review process, the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. "Empirical evidence on expert opinion shows that it is extremely unreliable."

6. The hotter a scientific field (with more scientific teams involved), the less likely research findings are to be true. Sometimes, however, a hot field promotes larger studies and improved standards of research, enhancing the predictive value of the findings. Small studies in popular fields will tend towards false findings.
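
Corollaries 1 and 6 lend themselves to a small numerical illustration. The Python sketch below uses the multiple-teams formula as we understand it from Dr. Ioannidis's essay: when n independent teams test the same question and a positive result from any one of them is treated as the finding, false positives accumulate faster than true ones. Statistical power stands in here, roughly, for study size, since small studies tend to have low power. The function name, defaults and sample inputs are our own assumptions for illustration only.

```python
def ppv_many_teams(prior_odds, n_teams, alpha=0.05, power=0.8):
    """Probability that a positive claim is true when any one of n_teams
    independent, equally powered studies may report it.

    prior_odds: pre-study odds R that the relationship is real
    n_teams:    number of independent teams testing the same question
    alpha:      per-study false-positive rate (illustrative default)
    power:      per-study chance of detecting a real effect (illustrative)
    """
    beta = 1.0 - power
    R = prior_odds
    # Chance that at least one team detects a relationship that is real.
    true_positive = R * (1 - beta ** n_teams)
    # Chance that at least one team "finds" a relationship that is not real.
    false_positive = 1 - (1 - alpha) ** n_teams
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    for power in (0.8, 0.2):       # large studies vs. small studies
        for n in (1, 5, 10):       # quiet field vs. "hot" field
            value = ppv_many_teams(0.1, n, power=power)
            print(f"power {power}  teams {n:<2} -> {value:.3f}")
```

With a single team the expression reduces to the ordinary one-study case. In this sketch, adding teams or reducing power lowers the probability that a reported finding is true, in line with the first and sixth corollaries.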

Based on his analysis, Dr. Ioannidis posits that "most research findings are false for most research designs and for most fields. Indeed, claimed research findings may often be simply accurate measures of the prevailing bias." Investigators who view large and highly significant results with excitement, as signs of important discoveries, ought instead to do "careful critical thinking" about what might have gone wrong with their data since the significant effects more likely may be signs of large bias. Then the author includes a significant section on how the situation can be improved.

An editorial in the same journal entitled, "Minimizing Mistakes and Embracing Uncertainty,"[12] concedes that Dr. Ioannidis has argued "convincingly" and that his claim that most conclusions are false "is probably correct." The journal editors suggest that published studies should clearly delineate and distinguish between "data," "hypotheses" and "conclusions." Most studies should be viewed as hypothesis-generating, rather than conclusive. Publishers also should issue high-quality negative and confirmatory studies regarding articles they feature.

Scrutiny Required

The foregoing sobering revelations about hearsay writings, punctuated by this new essay, signify that all is not well in "reliability" land. Attempts at "trial by literature" deserve careful probing under Frye or Daubert gatekeeping standards as to whether the hearsay truly is reliable; whether the articles are advancing mere hypotheses; whether bias or other failings are present; and, if conclusions are offered, whether those really are true or valid. Reliability hearings before trial, including a heightened focus on the underlying literature, probably are indicated.



Michael Hoenig is a member of Herzfeld & Rubin

Endnotes:

[1]. Clark v. Takata Corp., 192 F.3d 750, 759 (7th Cir. 1999).
[2]. 797 N.Y.S.2d 743 (Sup. Ct. Wayne Co. 2005), 2005 N.Y. Slip Op. 25252. DeMeyer was discussed in Hoenig, "'Gatekeeping' Gems Give Guidance," New York Law Journal, July 11, 2005, p. 3.
[3]. Fuesting v. Zimmer, Inc., 2005 U.S. App. LEXIS 18759 (7th Cir. Aug. 30, 2005).
[4]. Hoenig, "Questions About Experts and 'Reliable' Hearsay," NYLJ, July 8, 2002, p. 3.
[5]. Hutchinson v. Groskin, 927 F.2d 722, 725-726 (2d Cir. 1991).
[6]. Borden v. Brady, 92 AD2d 983 (3d Dept. 1983); Hornbrook v. Peak Resorts, Inc., 194 Misc.2d 273 (Sup. Ct. Tompkins Co. 2002) (out-of-court material must not be principal basis for expert's opinion on ultimate issue in the case; expert not to be mere conduit by which to funnel out-of-court material into evidence).
[7]. Hoenig, "Experts' Reliance on 'Unreliable' Hearsay," NYLJ, Nov. 12, 2002, p. 3; see also our columns in the NYLJ on Experts and Professionally Reliable Hearsay dated, respectively, April 11, June 18, July 8 and Aug. 12, 2002, each article commencing at p. 3.
[8]. "Questions About Experts and 'Reliable' Hearsay," NYLJ, July 8, 2002, p. 3; "Experts' Reliance on 'Unreliable' Hearsay," NYLJ, Nov. 12, 2002, p. 3.
[9]. W.L. Anderson, B.M. Parsons and Dr. Drummond Rennie, 34 U. Mich. J. L. Reform 619 (Summer 2001).
[10]. Hambsch v. N.Y. City Transit Auth., 63 NY2d 723, 726 (1984).
[11]. John P.A. Ioannidis, "Why Most Published Research Findings are False," Vol. 2, Issue 8, Public Library of Science Medicine (Aug. 30, 2005), DOI: 10.1371/journal.pmed.0020124.
[12]. "Minimizing Mistakes and Embracing Uncertainty," Vol. 2, Issue 8, Public Library of Science Medicine (Aug. 30, 2005), DOI: 10.1371/journal.pmed.0020272.
 
 