Case Study 3: Kinky Data

(adapted from a case study presented by Prof. Adam Johnson, HMC)

Professor Dee Lemma is in a quandary.

She is preparing a manuscript for submission to the prestigious journal Nurture. She’s worked for nearly two years on the research and is now working day and night to get the paper ready, in part because she knows that this is a hot research area and that others are likely to “scoop” her if she doesn’t get the paper submitted quickly.

Below is a plot of her results, where the x-axis corresponds to a property called “x-ness” and the y-axis corresponds to a property called “y-ness.” Prof. Lemma’s theoretical models suggest that “y-ness” should be a linear function of “x-ness.” The collected data have an obvious “kink” at x=4 and x=5 that doesn’t conform to the theoretical model.

Each of the eight points in the plot represents the results of an experiment that takes approximately two months to complete (once she gets scheduled for the special instrument that is required and belongs to another research group) and requires several tens of thousands of dollars in materials. Prof. Lemma knows that she doesn’t have the time or resources to re-run any of the experiments.

However, she decides to go back and look at the lab notebooks of the postdoctoral researcher who conducted the experiments. (The former postdoc, by the way, had a falling out with Prof. Lemma and the two are not on speaking terms.) The lab notebooks are a bit sloppy, and several entries strongly suggest that the samples for x=4 and x=5 may have gotten swapped by accident.

When Prof. Lemma replots the data above, swapping the data points for x=4 and x=5, here is the plot that she gets:

This plot looks exactly like what theory would predict!

Prof. Lemma is quite confident that if she submits the manuscript with the “kinky” data (as in the first plot), it won’t get published. Her colleagues tell her that she should clearly “unkink” the two data points (as in the second plot) because it’s quite obvious that this would reflect the right results.

• What’s your advice to Prof. Lemma?

• Can you imagine any slight differences in the scenario that would result in very different advice?