
Testifying Experts and Scientific Articles: Reliability Concerns

September 16, 2011 in News

New York Law Journal 

In his Products Liability column, Michael Hoenig, a member of Herzfeld & Rubin, writes that experts increasingly testify from scientific articles whose authors are unavailable to be cross-examined about the reliability of their data or the limits of their conclusions.

There is a trend toward more “trial by literature,” particularly in pharmaceutical drug, toxic tort and complex products litigation. Experts increasingly testify about, interpret and extrapolate from articles, yet the authors are unavailable to be cross-examined about the reliability of the data presented or the limits on conclusions the testifiers should draw from the work product. Indeed, many experts opine well beyond the data and findings in the scientific literature. As a result, significant concerns may exist about the reliability of the scientific testimony and evidence. Sober information from science journalists themselves indicates that even peer-reviewed articles may be tainted by unreliability factors that remain obscure.

Peer review is seriously fallible. This may raise questions about the reliability of an article itself, or of some of its data and conclusions. Courts are tasked with allowing only evidence that is relevant and reliable. Thus, in “trial by literature” litigation, courts must step up their gatekeeping scrutiny to assure reliability.

This article briefly reports on two recent decisions, one by the Texas Supreme Court and one by the U.S. Court of Appeals for the First Circuit, reflecting widely divergent views on the gatekeeping quest when epidemiological and biomedical studies are used by experts. The article then reminds readers, through statements by science writers themselves, of the likelihood that even peer-reviewed literature often contains flaws and weaknesses that generate doubts about the reliability of the articles themselves or portions of them. Thus, the “trial by literature” trend calls for increased vigilance.

A significant Texas Supreme Court decision issued only weeks ago, Merck & Co. v. Garza,1  blazes a clear path in setting requirements for determining whether epidemiological evidence is scientifically reliable to prove causation. The court crisply declares that failure of such evidence to meet reliability standards makes it legally insufficient as proof. The case was filed after a patient with a long history of heart disease, including a heart attack, quadruple bypass surgery and later catheterization procedures, died following ingestion of 25 mg doses of the pain relief drug Vioxx over some 25 days. Plaintiff claimed the drug was defective as designed and as marketed with inadequate warnings.

Causation proof in such cases consists of two components: (1) general causation, i.e., whether the substance is capable of causing a particular injury or condition in the general population; (2) specific causation, i.e., whether a substance caused a particular individual’s injury. Plaintiff in Garza relied on testimony from two cardiologists who based their opinions on data compiled in Merck-sponsored clinical trials of Vioxx, meta-analyses of those trials, and other observational, epidemiological studies regarding the possible cardiovascular risks presented by Vioxx.

The Texas Supreme Court articulated a bright-line boundary for accepting epidemiological studies, including clinical trials, as part of the evidence supporting causation in a toxic tort case. Such studies must show a statistically significant (at the 95 percent confidence level) “more than doubling of the risk” in order to be used as some evidence that a drug more likely than not caused a particular injury.2 This is referred to as a “relative risk of more than 2.0.” Since epidemiological studies show only an “association” of the condition with the drug or toxic agent, the “more than doubling of the risk” boundary, plus consideration of other factors such as biases, comparable conditions and dosages, etc., is needed to assure reliability for the courtroom.
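The arithmetic behind a “relative risk of more than 2.0” can be sketched in a few lines. The figures below are invented for illustration (they are not data from the Vioxx litigation), and the confidence interval uses the standard log-scale normal approximation common in epidemiology:

```python
import math

def relative_risk_ci(exposed_cases, exposed_total, control_cases, control_total):
    """Relative risk with an approximate 95% confidence interval,
    using the standard log-scale normal approximation."""
    rr = (exposed_cases / exposed_total) / (control_cases / control_total)
    # Standard error of ln(RR) for a two-group cohort comparison
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / control_cases - 1 / control_total)
    lower = math.exp(math.log(rr) - 1.96 * se)
    upper = math.exp(math.log(rr) + 1.96 * se)
    return rr, lower, upper

# Hypothetical study (invented numbers): 30 adverse events among 1,000
# patients on the drug vs. 10 among 1,000 unexposed controls
rr, lower, upper = relative_risk_ci(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# prints: RR = 3.00, 95% CI = (1.47, 6.10)
```

Here the point estimate of 3.0 more than doubles the baseline risk and the interval excludes 1.0; whether such a study would satisfy a court’s reliability threshold still depends on the further factors the Garza court identifies, such as bias and comparable conditions and dosages.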

After reviewing each of the studies offered by plaintiff, the court found them unreliable as indicators of general causation. One study involved ingestion of Vioxx at 50 mg, for a median duration of nine months, double the decedent’s dosage of 25 mg for only 25 days. This study “suggested nothing at all about significantly lesser exposure,” and the conditions of the study were not “substantially similar” to the claimant’s circumstances. The study, thus, did not show a statistically significant doubling of the relative risk for a person like the plaintiff. Similarly, a meta-analysis study that combined the results of a number of different studies, with differing dosages, durations and comparison drugs (and which included the first study discussed above) skewed the results and did not meet the 95 percent confidence level of a 2.0 relative risk. The other studies offered by the expert also fell below the declared standard for reliability.

Plaintiff argued, however, that a “totality of the evidence” approach, adding up each of the items, would show general causation. The Texas Court rejected this argument. “The totality of the evidence cannot prove general causation if it does not meet the standards for scientific reliability” established by the Court. “A plaintiff cannot prove causation by presenting different types of unreliable evidence.” The clarity of the Texas Supreme Court’s approach may be contrasted with the First Circuit’s decision in March 2011 in Milward v. Acuity Specialty Products Group Inc.,3 articulating its acceptance of a so-called “weight of the evidence” methodology used by an epidemiological expert. This case is soon likely to be the subject of a petition for certiorari in the U.S. Supreme Court.

The Milward case involved a claim that workplace exposure to benzene-containing products caused plaintiff’s APL, an extremely rare type of leukemia. The “weight of the evidence” approach to making causal determinations, according to the Court, involves a “mode of logical reasoning often described as ‘inference to the best explanation,’ in which the conclusion is not guaranteed by the premises.”4  The Court suggested that “the role of judgment in the weight of the evidence approach…does not mean that the approach is any less scientific.” The Court reasoned that the “use of judgment…is similar to that in differential diagnosis.”5  One problem with the Court’s reasoning, however, is that differential diagnosis, when properly used, usually applies in “specific causation” analysis. Another is that it differs from a slew of circuit court rulings that exclude what is really an expert’s “working hypothesis” as opposed to “scientific knowledge.”6

Yesterday, the Appellate Division, First Department, in Nonnon v. City of New York, 2011 NY Slip Op 06463 (1st Dept. Sept. 15, 2011), a case involving consolidated claims for personal injuries and deaths due to exposure to hazardous substances at a Bronx landfill, discussed the role of epidemiological evidence and recognized the significance of a “greater than 2.0” relative risk. In affirming denial of the city’s motion for summary judgment, the First Department observed that the relative risks proffered by plaintiffs’ experts, when adjusted for confounding factors, were “well in excess of 2.0.” Thus, the Texas Supreme Court’s reliability threshold in Garza, regarding the “more than double relative risk” criterion, seems to accord with the approach expressed in Nonnon by New York’s Appellate Division.

And, of course, if novel scientific evidence is presented, then plaintiffs’ experts certainly must not flunk the Frye test, as occurred in Matter of Bausch & Lomb Contact Lens Solution Prod. Liab. Litig., also decided yesterday. 2011 NY Slip Op 06460 (1st Dept. Sept. 15, 2011). There, the First Department held that the experts failed to prove that their opinions, namely that defendant’s soft contact lens solution caused corneal infections, were generally accepted by the relevant medical or scientific community.

Reliable Hearsay

The tension between the foregoing approaches toward reliability of epidemiological evidence highlights problems associated with testifying experts’ abundant use of scientific literature written by out-of-court authors who cannot be cross-examined. Nor is this only a Daubert or Frye threshold reliability issue. As my columns on the New York Court of Appeals’ pivotal October 2006 decision in Parker v. Mobil Oil Corp.7 observed, Parker’s rejection of a claim that a gas station attendant developed AML leukemia from exposure to benzene in gasoline was a decision involving “foundational reliability,” not a “novel scientific evidence” issue under Frye.8  Thus, the Parker Court observed: “the inquiry here is more akin to whether there is an appropriate foundation for the experts’ opinions, rather than whether the opinions are admissible under Frye.”

To be sure, hearsay, in itself, is not a devil. Life in the courts, or outside, would be paralyzed if reliance on all hearsay were declared off limits. That, in part, is why, with respect to evidence, the hearsay exclusionary rule is riddled with so many exceptions. We need acceptable hearsay—if there are good and valid reasons to trust the specific item of information. Junky, unreliable hearsay, however, infests expert evidence with a taint, an infirmity of uncertainty and doubt making the item of information of little more value than gossip, rumor, whim or conjecture.

Experts testify on the basis of hearsay of one kind or another since it is a practical impossibility for an individual to know everything there is to know about a given scientific or technical subject. Rule 703 of the Federal Rules of Evidence provides that an expert may rely upon inadmissible facts or data, such as hearsay, provided that the data upon which the opinion rests are of a type “reasonably relied upon by experts in the particular field.” Ostensibly, Evidence Rule 703’s “reasonable reliance by experts in the particular field” criterion is supposed to be a safety screen against experts leaning on “junky” hearsay for their opinions. But, as all trial practitioners and judges know, the invigorated emphasis upon Daubert and Frye “gatekeeping” to exclude unreliable expert testimony has revealed widespread violation of expert reliability standards including use or misuse of out-of-court materials that we cannot be sure are reliable.

As detailed in my article, “‘Conduit Hearsay’: A Minefield for Lawyers,”9  one offensive form of hearsay snuck in by experts is so-called “conduit hearsay.” It comes in different disguises and flavors. Sometimes the out-of-court hearsay is outright the only or the major basis for an expert’s opinion. Sometimes, the hearsay comes from other more renowned or distinguished sources. The testifier, employing name or image recognition, then seeks to bolster his own position by suggesting that the famous hearsay creators agree with him.

Sometimes, the testifying expert merely parrots or becomes a mouthpiece for what others have said or written. Yet another kind of problem is presented when an article may be reliable for only some limited informational purpose but the expert unreasonably opines beyond the original author’s limited target. This kind of hearsay abuse is difficult to police because scientific or technical literature has its own specialized language and a silky-smooth, articulate expert can use trade jargon as camouflage. Not surprisingly, these practices increasingly trigger a need for substantial motion in limine practice.

Prior columns have discussed the unique problems associated with judicial gatekeeping of experts who seek to use unreliable literature or to rely on trustworthy articles unreliably.10 Astoundingly, as I reported in a column in September 2005, a noted epidemiologist suggested that most published biomedical research findings are false.11 And, as I reported in the same column, the entire June 5, 2002, issue of the respected Journal of the American Medical Association (JAMA) was devoted to the question whether biomedical literature truly meets assumed standards of quality and trustworthiness. Thus, there exists a very real and potentially explosive question of the reliability of the hearsay itself.12 The following examples from JAMA signify that cautious and diligent inquiry into the reliability of even peer-reviewed scientific articles is needed, a fortiori when testifying experts generate conclusions beyond the author’s own limits.

In the eye-popping JAMA article called, “Poor Quality Medical Research: What Can Journals Do?” the author stated: “There is considerable evidence that many published reports of randomized-controlled trials (RCTs) are poor or even wrong, despite their clear importance…. Poor methodology and reporting are widespread…. Similar problems afflict other study types.” The author goes on to state: “Errors in published research articles indicate poor research that has survived the peer-review process. But the problems arise earlier, so a more-important question is, why are submitted articles poor?” The author concludes: “Many readers seem to assume that articles published in peer-reviewed journals are scientifically sound, despite much evidence to the contrary.”

Articles’ Shortcomings

A separate article, “The Hidden Research Paper,” established through post-publication surveys of authors that “important weaknesses were often admitted on direct questioning but were not included in the published article. Contributors frequently disagreed about the importance of their findings, implications and directions for future research.” Published papers rarely represented the full range of opinions of those scientists whose work they claim to report. There was evidence of “censored criticism; obscured views about the meaning of research findings; incomplete, confused and sometimes biased assessment of the implications of a study; and frequent failure to indicate directions for future research…. What was striking was the inconsistency in published evaluations, especially regarding weaknesses.” A published study, seeming to be a blissful consensus among researchers, may hide dissension among the authors about study deficiencies.

Drummond Rennie, JAMA’s deputy editor, wrote in his editorial in 2002 that in 1986 he had noted “appalling standards” of quality despite peer review and that, despite some improvement in the 16 years since, “an unbiased reader, roaming at random through a medical library, would find in abundance all the problems I described in 1986.” Further, prestigious journals such as JAMA, New England Journal of Medicine and Lancet have rules limiting post-publication correspondence to only some four to eight weeks, in effect establishing a short statute of limitations immunizing authors from disclosure of methodological weaknesses.

Other authors disclosed that numerous biomedical articles had “honorary authors” for writings really produced by “ghost authors,” thereby distorting objectives of accountability, responsibility and credit. From the standpoint of use of seemingly prestigious studies in the courtroom, such authorship practices can mislead testifying experts and judges into believing that highly qualified or distinguished experts in their field actually wrote the reports when they did not, in effect masking that unqualified specialists wrote the articles.

Then there is the growing phenomenon of litigation-focused research with the goal of creating a body of scientific studies generated for or funded by litigation and conducted for expected use in litigation. Often such research will have been funded by lawyers or litigants or controlled in some manner by the lawyers or their testifying experts. This is not inherently or necessarily bad science. But it calls for close scrutiny by courts pursuant to Daubert or Frye standards because there may be potential for bias or pressures to assure the “right” outcome, which can result in manipulated procedures, distorted data, selective reporting of results or even falsified outcomes. This development was described in a law review article titled, “‘Daubert’s’ Backwash: Litigation-Generated Science,” published in summer, 2001.13  One of the co-authors was Drummond Rennie, JAMA’s deputy editor.

More recently, Trevor Ogden, chief editor of the Annals of Occupational Hygiene, a publication of the British Occupational Hygiene Society, issued an editorial in June 2011 titled, “Lawyers Beware! The Scientific Process, Peer Review and the Use of Papers in Evidence.”14 The editorial expressed frustrations over the way publications have been used in lawsuits. In truth, scientific papers are “all about contributing to an ongoing debate as to how we must interpret certain observable facts. Thus, a single paper can never reveal the absolute truth and that is why in each paper we carefully discuss its own pros and cons. Hence, the nature of science is debate and uncertainty about general situations, whereas the nature of a legal process is about a decision in a particular case.”

Mr. Ogden noted that courts’ use of scientific evidence often involves a single paper that, although peer-reviewed and published, “possibly still has weaknesses which will emerge in the ongoing debate….” Federal Rule of Evidence 702’s reference to “peer review and publication” clearly accounts for a number of papers submitted by consulting individuals and experts involved in expert testimony in U.S. courts “who wish to have the basis of their testimony published in a peer reviewed journal.”

Peer review, says Mr. Ogden, is an important but “fallible filter.” It has “shortcomings.” It is important that these flaws, and the role of publications in the ongoing scientific debate, are understood. In one test case demonstrating the “coarse and fallible filter” peer review provides, three papers were sent to 600 reviewers. The papers included randomized trials, and nine major errors and five minor ones were added to each paper. On average, each reviewer missed 6.4 of the major errors. This means that even sending a paper to two reviewers would leave substantial error undetected. Further, it is “well known that peer review will only occasionally detect fraud and may well miss serious genuine mistakes if the data look plausible.” Peer review is “unlikely to detect data which are erroneous but plausible.”
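The force of that test can be illustrated with simple arithmetic, on the assumption (mine, for illustration only) that reviewers miss errors independently: if one reviewer misses 6.4 of 9 major errors on average, two independent reviewers will both miss a given error about half the time.

```python
# Illustrative arithmetic for the reviewer test described above.
# Assumption (hypothetical, for illustration): reviewers miss errors independently.
major_errors = 9
missed_per_reviewer = 6.4                        # average reported in the test

p_miss = missed_per_reviewer / major_errors      # chance one reviewer misses a given error
p_both_miss = p_miss ** 2                        # chance two reviewers both miss it
expected_surviving = major_errors * p_both_miss  # major errors expected to slip past both

print(f"one reviewer misses a given error:  {p_miss:.0%}")
print(f"both reviewers miss it:             {p_both_miss:.0%}")
print(f"errors surviving two reviewers:     {expected_surviving:.1f} of {major_errors}")
```

On these assumed figures, roughly 71 percent of major errors escape one reviewer, about 51 percent escape two, and some 4.6 of the 9 planted errors would survive two-reviewer scrutiny.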

Following publication of an article, letters to the editor commenting on the paper are received and sometimes published. However, peer review of such correspondence is particularly limited. Thus, a journal’s publication of the letter does not validate the opinions expressed. “Science deals in provisional truths.” Papers surviving the “coarse filter” of peer review are exposed to public scrutiny. They can then be criticized in correspondence, examined by further research, “and the next paper can perhaps take the provisional truth a little further.” Such developments in the “ongoing debate” of science eventually may lead to general acceptance in the scientific community.

Michael Hoenig is a member of Herzfeld & Rubin.


  1.  No. 09-0073 (Tex. Sup. Ct., Aug. 25, 2011) (Slip Opinion).
  2.  Garza, Slip Op. at 10-13. The Court built upon its earlier decision in Merrell Dow Pharmaceuticals Inc. v. Havner, 953 S.W.2d 706 (Tex. Sup. Ct. 1997).
  3.  639 F.3d 11 (1st Cir. 2011), 2011 U.S. App. Lexis 5727.
  4.  639 F.3d at 17.
  5.  Id. at 19.
  6.  See Tamraz v. Lincoln Elec. Co., 620 F.3d 665, 669-670 (6th Cir. 2010).
  7.  7 N.Y.3d 434 (2006).
  8.  Hoenig, “Judicial Gatekeeping: ‘Frye,’ ‘Foundational Reliability’,” NYLJ, Feb. 11, 2008, p. 3; “‘Parker,’ ‘Frye’ and Gatekeeping of Experts: an Update,” NYLJ, June 17, 2009, p. 3.
  9.  Hoenig, New York Law Journal, March 13, 2006, p. 3.
  10.  Hoenig, “‘Gatekeeping’ of Experts and Unreliable Literature,” NYLJ, Sept. 12, 2005, p. 3; Hoenig, “Questions About Experts and ‘Reliable’ Hearsay,” NYLJ, July 8, 2002, p. 3; Hoenig, “Experts’ Reliance on ‘Unreliable’ Hearsay,” NYLJ, Nov. 12, 2002, p. 3.
  11.  Hoenig, “‘Gatekeeping’ of Experts and Unreliable Literature,” NYLJ, Sept. 12, 2005, p. 3 (citing John P.A. Ioannidis, “Why Most Published Research Findings Are False,” Vol. 2, Issue 8, Public Library of Science Medicine (Aug. 30, 2005), DOI:10.1371/Journal.pmed.0020124).
  12.  See articles cited in n. 10 supra.
  13.  By authors W.L. Anderson, B.M. Parsons, Dr. Drummond Rennie, in 34 U. Mich. J.L. Reform 619 (Summer 2001).
  14.  Ann. Occup. Hyg., Vol. 55, No. 7, pp. 689-691 (June 2011).