When Experts ‘Cherry-Pick’ Among Competing Studies

May 9, 2016 in  News

New York Law Journal

Just weeks ago, on April 25, a U.S. Court of Appeals for the First Circuit panel split 2-1 on whether a claimant’s expert on specific causation in a toxic tort case offered opinions that were sufficiently reliable under Federal Evidence Rule 702 to withstand exclusion. The case is Milward v. Rust-Oleum,1 a claim by a pipefitter and refrigerator technician who, over 30 years, was exposed to varying levels of benzene from paints and other products made by the (sole remaining) defendant, Rust-Oleum. Brian Milward, the plaintiff, was diagnosed in 2004 with Acute Promyelocytic Leukemia (APL) and sued a number of defendants, contending that their negligence caused his disease.

In 2009, the district court excluded Milward’s general causation expert but that ruling was reversed on appeal,2 so the focus in the trial court on remand shifted to Rust-Oleum’s challenge regarding Milward’s specific causation expert, Dr. Sheila Butler, an occupational medicine physician. As many readers of this column know, in a toxic tort case the plaintiff must establish, through expert testimony, both general and specific causation. In this case, therefore, that meant a sufficiently reliable showing that exposure to benzene can cause APL (general causation) and that, in fact, benzene exposure was a substantial factor in the development of Milward’s APL (specific causation). The instant appellate ruling involved the district court’s rejection of Butler’s specific causation opinions.

This column has frequently reported on federal and state “gatekeeping” decisions in which courts have determined whether expert opinions were “reliable” enough to pass Daubert3 or Frye4 admissibility criteria. Less than two months ago, my March column, “Experts Flunk Reliability Test in BMW Case,”5 reported on the New York Court of Appeals’ decision in a toxic tort case claiming damages to a child from in utero exposure to unleaded gasoline vapor allegedly caused by an automobile’s defective gas hose. The endnotes in that column listed many prior articles on expert “gatekeeping” issues. So, why should this column focus afresh and so soon on the expert reliability/causation calculus?

The reason is that the April 25 First Circuit ruling in Milward (by a 2-1 vote) tees up very well some critical tensions in toxic tort experts’ methodologies and presents recurring crossroads issues under Daubert and Frye. For example, experts can and do rely upon scientific and technical literature. But what if the articles relied upon are themselves partially or wholly unreliable? Or, what if there is inconsistent technical literature, that is, other articles at odds with those favored by the testifying expert? Can the expert simply select the line of literature that supports his position in the case without explaining the reasons for that choice in a manner consistent with reliable methodology?

Does “cherry-picking” favorable articles sufficiently create a jury question, leaving the adverse literature to be explored on cross-examination or introduced during the adversary’s case? Or does the problem of conflicting literature go to the heart of the threshold “reliability” question inherent in Daubert admissibility criteria? And, what should a court do when the expert is, let’s say, “too selective”? These are some of the tensions reflected in the Milward decision.

The majority and dissenting opinions are relatively brief, well-written and quite readable, despite the technical subject matter. The pivotal shortcomings in the expert’s methodology (as elaborated by the majority) offer litigators valuable lessons in challenging or defending expert opinions, lessons that apply to experts’ battles even outside the arena of toxic torts.

The vulnerabilities identified in Milward also inform readers on questions relevant to retention of suitable experts, preparing them for the “reliability” fray sure to come and structuring advocacy on such issues. The dissenting opinion presents a differing point of view which, too, needs to be understood. Therefore, let us take a closer look at this new Milward decision.

As noted above, plaintiff’s theory was that benzene exposure caused his APL disease. His specific causation expert, Dr. Butler, was an employee of the Veterans Administration, specializing in clinical assessments of environmental and occupational exposure in combat-exposed veterans. The dissenting judge, particularly, was impressed by her qualifications. “She has quite the CV,” he said, and proceeded to enthusiastically elaborate her qualifications.6 But Butler’s qualifications were not the issue. Were her theories “reliable” under Federal Evidence Rule 702 and Daubert’s admissibility criteria? That was the focus of the appellate panel majority. Remember, the issue was specific causation, i.e., did the exposure to benzene cause this disease in this person?

Expert’s Theories

Butler presented three theories. First, she testified that, although benzene is naturally occurring, “there is no safe level of benzene exposure.” This was her predominant theory, and “she consistently reiterated her hypothesis,” said the court.7 She emphasized that she reached this conclusion by examining “the biology, the pathophysiology, what the substance does to the person and the disease process.” She was able to do so without relying on any of the relevant epidemiological studies.

Given this no-safe level position, Butler maintained that Milward’s exposure (as detailed by an industrial hygienist expert who had calculated the benzene levels in products plaintiff used) was likely the cause of his APL. The district court rejected this no-safe level hypothesis because it could not properly be tested by any known rate of error—a conclusion the appellate panel assumed was correct.8

Butler offered a second “rather cursorily concluded” position, as described by the court, beyond the no-safe level hypothesis. She contended that an individual’s “relative risk” of developing APL increases when exposed to specified amounts of benzene. She then compared Milward’s exposure levels to those that had been found to be dangerous in studies reporting that research. Since Milward’s exposure levels (as calculated by plaintiff’s industrial hygienist expert) were higher than those found to be dangerous in selected literature, Butler reasoned that benzene exposure was the likely cause of plaintiff’s APL.

However, Butler did not explain why she chose the studies on which she relied, nor did she address any study with contrary findings. During her deposition she was asked, “Are you aware of any studies which find there is no relationship between benzene exposure and APL?” She responded, “Yes…the literature has support for both.” Then counsel asked, “Do you intend in this case to weigh the different epidemiological studies and offer an opinion as to which ones we should rely on and which ones we should discount?” Butler replied, “No.”9

Finally, Butler engaged in “differential diagnosis” to conclude that benzene exposure likely caused Milward’s APL. This method is essentially a process of elimination. Thus, she “ruled out” some of the more common factors associated with APL, among them obesity and smoking. She then determined that, since benzene exposure was a potential cause, she could also “rule out” an idiopathic diagnosis, that is, a diagnosis without a known cause. Thus, since benzene exposure was the “only significant potential cause remaining,” Butler concluded that it likely was the culprit.10

The district court rejected each theory. On appeal, plaintiff did not rely on the “no-safe level” hypothesis (Butler’s first and predominant theory). Instead, Milward pressed on appeal the second conclusion based on “relative risk” and Butler’s third theory, “differential diagnosis.” The appellate court grappled with those. Butler’s “relative risk” methodology was problematical because she expressly disavowed her intent to analyze conflicting epidemiological studies. Here, a number of studies showed a “correlation between APL and benzene exposure at a specific level, while other studies do not show that correlation.”


In order to establish specific causation by the relative risk method, Butler was required to choose a study, or studies, “to serve as a baseline” to which she could then compare Milward’s case. But while one study showed a correlation at exposure levels lower than Milward had experienced, another study exhibited no such correlation even at exposure levels higher than the plaintiff’s. Thus, the latter study yielded “a vastly different comparison.”11

Given that Butler had “anchored her testimony to her no-safe threshold hypothesis,” a theory that did not turn on the validity of any of the epidemiological studies, it was consistent for her to state that she had neither the need nor intent to compare the competing literature. But when an expert’s medical opinion is grounded exclusively on scientific literature, the “gatekeeper” trial court has discretion “to require the expert to explain why she relied on the studies that she did and, similarly, why she disregarded other, incompatible research,” noted the court.12

When an expert engages in relative risk analysis, in the manner Butler did here, the district court “is on firm ground in requiring such an explanation, since the validity of the approach depends on the reliability of the studies chosen.” If the expert is comparing the plaintiff’s condition to a study, and the study is based on an unreliable methodology, then “the comparison itself is futile.” Here, the relevant studies were not only in tension with one another, “but expressly cast each other into doubt.” Butler’s “complete unwillingness” to engage with the conflicting studies (whether she was able to or not) made it “impossible” for the district court “to ensure that her opinion was actually based on scientifically reliable evidence and, correspondingly, that it comported with Rule 702.”13

The “differential diagnosis” opinion likewise had shortcomings. Butler had “ruled out” obesity and smoking as causes of Milward’s APL. But the district judge was concerned about the utility of Butler’s approach “given the high percentage of APL cases that are idiopathic” (according to the record, roughly 70 to 80 percent of all APL diagnoses). The district judge also concluded that Butler’s reasoning was “circular.” She “ruled out” an idiopathic APL by “ruling in” benzene as a cause, but she failed to provide a scientifically reliable method of “ruling in” benzene in the first instance. When “differential diagnosis” is used, it must be shown that the steps taken as part of that analysis (the “ruling out” and the “ruling in” of causes) were accomplished utilizing scientifically valid methods.14

Since Butler was only able to rule out an idiopathic APL because she had “ruled in” benzene as a cause, the validity of her differential diagnosis “turns on the reliability of that latter conclusion.” The reliability of that decision was “particularly critical here given the extensive number of APL cases that are idiopathic.” Indeed, Butler seems to have “ruled in” benzene exposure solely by relying on her other two theories.

But the district court found both of those to be unreliable, and the appellate majority did not disagree. Thus, plaintiff had failed to show how Butler could have reliably utilized either method to “rule in” benzene exposure. The expert needed some other method to “rule out” an idiopathic diagnosis. She did not provide one. As such, the district court acted within its discretion in excluding the opinions, and the summary judgment in favor of Rust-Oleum was correctly granted.15


The Milward decision is instructive reading for litigation specialists, whether representing plaintiffs or defendants. On one level, it teaches that even highly qualified experts, in this case, a strongly opinionated medical specialist, must hew to the reliability standards required by Daubert and Federal Evidence Rule 702. The entire episode teaches that expert credentials do not substitute for the reliable methodologies that, when challenged, have to be specified.

On another level, Milward teaches that experts’ cherry-picking among competing epidemiological or scientific studies is likely to be exposed by attentive opposing counsel. Therefore, an expert’s decision to rely on one study but reject a contradictory study requires reliable, non-speculative reasoning. An expert’s choice based on personal preference, belief, assumptions or hunches will not suffice.


  1.  2016 U.S. App. LEXIS 7470 (1st Cir. April 25, 2016).
  2.  See Milward v. Acuity Specialty Prods. Group, 639 F.3d 11 (1st Cir. 2011).
  3.  Daubert v. Merrell Dow Pharms., 509 U.S. 579 (1993).
  4.  Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).
  5.  NYLJ, March 15, 2016, p. 3.
  6.  Milward, 2016 U.S. App. LEXIS 7470, at *18–*20 (dissenting opinion).
  7.  Id. at *4.
  8.  Id. at *4. The plaintiff did not “meaningfully challenge” that ruling on appeal so the appellate court assumed it was correct and bypassed further discussion.
  9.  Id. at *5.
  10.  Id. at *5–*6.
  11.  Id. at *9–*11.
  12.  Id. at *11–*12.
  13.  Id. at *13–*14.
  14.  Id. at *15–*17.
  15.  Id. at *15–*18.