P-Hacking and Intentional Academic Dishonesty: An Overview, Famous Cases, and Solutions

Rafay Malik

Math 89S: Mathematics of the Universe

2/14/16

Introduction

According to Tyler Vigen’s blog, “Spurious Correlations”, US government spending on space, science, and technology correlates strongly with suicides by hanging, strangulation, and suffocation. With a correlation coefficient of r = .99798, the relationship looks undeniable. Similarly, per capita cheese consumption correlates strongly with the number of people killed by becoming tangled in their bed sheets (1). These seemingly ridiculous correlations arise from a process known as p-hacking, or data dredging.

P-hacking is the process of analyzing huge sets of data in multiple ways until a significant result is uncovered that can then be published, and it is frequently done dishonestly. One of the ways this differs from typical statistical analysis is that the result is found before a hypothesis is created to explain the data. This practice is known as HARKing, or hypothesizing after the results are known (2). It is often done intentionally, when a researcher finds his or her original hypothesis to be insignificant and needs a new one that leads to publishable results. Other times, it happens unintentionally when mining through large sets of data. These sets have a degree of randomness to them, and researchers who are not careful can get false positives on correlations that are actually insignificant (3).
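The mechanics are easy to reproduce. Below is a minimal sketch in Python (the data and variable names are invented for illustration, not taken from any study cited in this paper): one random “outcome” is tested against 100 equally random “predictors,” and whatever happens to clear the conventional .05 threshold is kept as if it were a finding.

import numpy as np
from scipy import stats

# Dredge 100 unrelated random "predictors" for a correlation with one
# random outcome, keeping anything that clears p < .05.
rng = np.random.default_rng(0)
outcome = rng.normal(size=50)            # 50 observations of pure noise
predictors = rng.normal(size=(100, 50))  # 100 equally meaningless variables

false_positives = []
for i, predictor in enumerate(predictors):
    r, p = stats.pearsonr(predictor, outcome)
    if p < 0.05:                         # "significant" purely by chance
        false_positives.append((i, r, p))

# With a .05 cutoff, roughly 5 of the 100 tests are expected to come out
# "significant" even though every variable is random noise.
print(len(false_positives), "publishable correlations found in pure noise")

Run enough searches like this and a “publishable” result is essentially guaranteed, which is exactly the behavior the term data dredging describes.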

The P-Value

Much of p-hacking and data dredging revolves around a number in statistics called the p-value. The p-value is used to judge the significance of results: it is the probability of obtaining a result at least as extreme as the one observed, assuming that only chance is at work. For example, imagine a drug company develops a new drug to treat cancer. The company tests the drug and obtains a p-value of .05. This would mean that, assuming there is no relationship between the drug and recovery, there is only a 5% probability that results this favorable would arise from random sampling alone. Since 5% is considered too unlikely to attribute to chance, the result is taken as evidence that the drug actually works (4).
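As a concrete illustration of what such a calculation looks like, the sketch below uses hypothetical counts (invented for this example, not drawn from any real trial). Fisher’s exact test asks how often a split at least this favorable to the drug would arise from random sampling if the drug truly did nothing.

from scipy import stats

# Hypothetical counts: 40 of 100 treated patients respond, versus
# 25 of 100 patients given a placebo.
table = [[40, 60],   # treated: responders, non-responders
         [25, 75]]   # placebo: responders, non-responders

# p-value: probability of a split at least this favorable to the drug
# arising from random sampling if the drug truly had no effect.
odds_ratio, p_value = stats.fisher_exact(table, alternative="greater")
print(f"p = {p_value:.3f}")   # below .05, so it would typically be called "significant"

The number it returns says only how surprising the data would be if there were no effect; as discussed below, it says very little about how large or important the effect actually is.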

While the p-value seems like a great tool for assessing the importance of results, it is misused quite frequently and says very little about the strength of the evidence for a claim. In fact, the p-value is rarely used the way its creator intended. Ronald Fisher meant it simply as a way to judge whether data was worth further consideration and study, not as a decisive, supposedly objective tool for evaluating evidence (6).

The use of the p-value also raises concerns about the replicability of experiments and the possibility of false alarms. Under common assumptions, a p-value of .01 corresponds to a false-alarm probability of at least 11%, and a p-value of .05 to at least 29%. Researcher Matt Motyl ran into this problem while studying whether political extremity affected one’s ability to distinguish shades of grey. On his first trial, Motyl obtained a p-value of .01, giving him dreams of being published in a famous journal. But when he tried to replicate his research, he got a new p-value of .59, nowhere near his original result (6).
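Figures like these are typically derived along the lines sketched below. The calculation assumes the widely used minimum-Bayes-factor bound of −e·p·ln(p) and even 50-50 prior odds that a real effect exists; both assumptions are illustrative choices rather than anything specified in this paper’s sources.

import math

def min_false_alarm_probability(p, prior_null=0.5):
    # Lower bound on P(no real effect | "significant" result), using the
    # -e * p * ln(p) minimum Bayes factor (valid for p < 1/e).
    bayes_factor = -math.e * p * math.log(p)      # best case for a real effect
    prior_odds = prior_null / (1 - prior_null)    # 1.0 for 50-50 prior odds
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1 + posterior_odds)

for p in (0.01, 0.05):
    print(f"p = {p}: false-alarm probability of at least {min_false_alarm_probability(p):.0%}")
# p = 0.01: false-alarm probability of at least 11%
# p = 0.05: false-alarm probability of at least 29%

Under these assumptions, even a “one in a hundred” p-value leaves better than a one-in-ten chance that the apparent effect is not real.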

It is also extremely easy to obtain a significant result. The interactive simulation accompanying the FiveThirtyEight.com article (5) lets readers analyze data sets relating the majority political party in America to the state of the US economy at the time. Just by changing which variables are analyzed, such as GDP or the unemployment rate as the measure of the economy, one can “prove” that Republicans were both beneficial and harmful to the US economy, and the same can be shown for Democrats. In fact, of the 1,800 possible combinations of variables, 1,078 lead to significant results with p < .05, any of which could then be published. These examples indicate that the p-value is not such an excellent tool for data analysis. Unfortunately, it remains probably the most common standard by which data sets are judged.
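A toy version of that exercise is easy to write. In the sketch below, every data series is random noise and the “analyst choices” (which indicator to use, whether to lag it a year, whether to trim outliers) are invented purely for illustration; multiplying even a few such choices together already yields a pile of specifications that can be searched for p < .05.

import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_years = 60
party = rng.integers(0, 2, size=n_years)         # "party in power", chosen at random

# Four made-up economic indicators, all pure noise.
indicators = {name: rng.normal(size=n_years)
              for name in ("gdp", "employment", "stocks", "inflation")}

significant, total = 0, 0
# Analyst choices: which indicator, lag it a year or not, trim "outliers" or not.
for name, lag, trim in itertools.product(indicators, (0, 1), (False, True)):
    y = np.roll(indicators[name], lag)
    x = party
    if trim:
        keep = np.abs(y) < 2                      # drop the most extreme years
        x, y = x[keep], y[keep]
    _, p = stats.ttest_ind(y[x == 0], y[x == 1])  # compare the two "parties"
    total += 1
    significant += p < 0.05

print(f"{significant} of {total} specifications reached p < .05 on pure noise")

The FiveThirtyEight tool works the same way, only with real historical data and 1,800 combinations of choices rather than 16.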

In many scientific journals, a p-value of less than .05 has become the key to getting published, which has led to a decrease in the quality of publications. According to a study done by psychologist Uri Simonsohn, many published psychology papers report p-values suspiciously close to .05, which makes it appear as if the researchers searched (or p-hacked) for a significant p-value that could be published (6). To combat methods like these, journals like Basic and Applied Social Psychology have decided to stop publishing p-values and instead evaluate submissions based on their use of “strong descriptive statistics, including effect sizes” (5).

Why P-Hacking Happens

All of this negativity about the p-value may make science and statistics seem dishonest and fraudulent. But oftentimes, p-hacking is not done intentionally; it occurs for a variety of reasons. One of the most common is confirmation bias, the “tendency to search for or interpret information in a way that confirms one's preconceptions, leading to statistical errors.” Researchers often get attached to their original hypothesis. After all, they are human. Because of this, they actively seek out data and evidence that support their own hypotheses and ignore evidence that contradicts or detracts from their claims (7). As a result, they may resort to p-hacking to find results that support their idea.

In addition to confirmation bias, there are a variety of external factors at play. Researchers often try to pad their number of publications because it can lead to better and more plentiful job opportunities, or help them meet a publishing requirement for tenure at a university. A recent example of this occurred at Dongguk University with one of its researchers, Hyung-In Moon. The editor of the Journal of Enzyme Inhibition and Medicinal Chemistry noticed a peculiarity in the reviews of Moon’s work: almost all of them were favorable and completed in less than 24 hours. When confronted, Moon admitted that he and his colleagues had written most of the peer reviews themselves under the names of real scientists. This led to the retraction of 28 of his papers (9).

Cases of academic dishonesty like these are not isolated; they happen quite frequently. Because of a lack of meaningful review, engineer Alex Smolyanitsky was able to publish a paper filled with nothing but randomly generated text, titled “Fuzzy, Homogeneous Configurations”, with authors such as Maggie Simpson, Edna Krabappel, and Kim Jong Fun (10). So-called “predatory journals”, such as the International Journal of Advanced Chemical Research, exist to let anyone who pays a fee publish papers without any real peer review. Smolyanitsky’s paper was in fact published in one of these predatory journals (5).

One might wonder why scientists turn to dishonest measures like these. Once again, by increasing their number of publications, scientists can build a better-looking CV, which can lead to more funding, more research support, or better jobs. This creates a mindset that one must publish or fail. Another reason for this academic dishonesty is that science is just difficult. Conducting studies accurately is not easy, because the data collected can be obscured by many outside factors. To circumvent this, researchers may rely on fabrication to obtain cleaner, more favorable data. And because fabrication is difficult to prove, many researchers engage in it knowing they can get away with it (11).

While academic dishonesty and p-hacking do happen, not all of it is done purposefully or even dishonestly. Research involves many judgment calls, such as which data to record and compare, which variables to control for, and which to ignore. Because of these choices, differences in results can arise naturally.

According to a 2005 paper by John Ioannidis, most published research findings are false. As Ioannidis put it, “There are so many potential biases and errors and issues that can interfere with getting a reliable, credible result” (5). False findings can also arise simply from differences in statistical analysis methods. Brian Nosek, at the Center for Open Science, crowdsourced a data analysis project to 29 teams of statistical analysts. All of the teams were given the same data set and the same question: do referees give more red cards to dark-skinned players than to light-skinned players? Even though every team worked from the same data, they came to different conclusions because of the variety of methods they used. Twenty of the teams concluded that referees do give more red cards to dark-skinned players, while the other nine concluded that there was no significant difference in the number of red cards given. The key takeaway was that the differences in results came not from academic dishonesty or data fabrication, but from the statistical methods the researchers chose (5).

Famous Examples of Academic Dishonesty

This paper would not be complete without a discussion of one of Duke’s very own cases of fraud and academic dishonesty. Anil Potti, a medical researcher at the Duke Medical Center, was accused of research misconduct and forced to resign in 2010 for falsifying cancer research. In 2005, Potti claimed to have made a revolutionary breakthrough: matching a person’s DNA to a specific cancer drug, since every person’s tumor is unique and not all cancer treatment drugs work for everyone. His results were published in more than nine prestigious journals, including the New England Journal of Medicine and the Journal of the American Medical Association (12). As it turned out, much of Potti’s research was falsified. Potti claimed to have cured 6 of his 33 patients, when in reality his study had only 4 participants. He also purposely altered his data sets to make his treatment seem legitimate. As punishment, Potti settled with the government, agreeing not to pursue any research without government supervision. He now works at a cancer center in North Dakota, which raises the question of whether his punishment was severe enough (13). His actions harmed the cancer patients he treated, giving dying people false hopes of living. Eight lawsuits were filed against Potti, and currently only two of those plaintiffs are still alive. One of the survivors, Joyce Shoffner, is permanently disabled in her joints and suffers from blood clots because of Potti’s incorrect treatment (14).

Shoffner’s case brings the legality and safety of techniques like p-hacking into question. How far is one allowed to p-hack and publish questionable research before it leads to medical trials that put the safety of patients at risk?

Dishonesty in the medical field can have a profound effect. The case of Scott Reuben, one of the leading anesthesiology researchers in the world, is an excellent example. For years, Reuben had been an advocate of multimodal analgesia, the use of several types of pain medicines together to improve patient comfort and recovery. But in 2010, Reuben admitted to having falsified data in at least 21 of his papers dating back to 1996. He also wrongly added the names of researchers who were not involved in the work to his papers (15). Reuben’s work was a major contribution to the field, and many of the practices his papers recommended, such as the use of nonsteroidal anti-inflammatory drugs (NSAIDs), are still used by anesthesiologists today. But with the retraction of his research, a major hole has been left in our knowledge of anesthesiology. Current practices in the field must be reevaluated, including the use of drugs such as Celebrex and Lyrica, of which Reuben was a huge advocate. The Medical Center at the University of Pittsburgh has already stopped using these drugs, as have many other anesthesiologists. Reuben’s work has been cited over 72 times, showing how pervasive the effects of this scandal will be on our current body of knowledge and research (15).

Reuben’s falsifications were discovered through an internal investigation by his employer, Baystate Medical Center, after an independent reviewer raised concerns about his work (15). But this was almost 14 years after he began publishing, which raises the question of why the work was never flagged by peer reviewers in the field. The peer review system, in other words, is not infallible; it either needs to become more rigorous or be supplemented by an entirely new method of evaluating work.

The effects of Reuben’s dishonesty are tremendous. As in the Potti case, it is unclear how many people may have been harmed by this invalid research, or by p-hacking and fraud in general. It also puts the credibility of anesthesiology, and of science itself, on the line: many people will see headlines like these and begin to doubt the legitimacy of science as an unbiased form of inquiry. Academic fraud also affects the careers of other scientists and the well-being of the patients they treat. Much of scientific research builds on the work of others, and when a scandal such as the Reuben or Potti case breaks, the entire body of work of others in the field can be invalidated and discounted. When forgery is committed, as when Reuben added the names of other scientists to his papers, the reputations of innocent scientists can be sullied. Patients treated on the basis of invalidated work suffer as well: they may be physically harmed or disabled by the treatment, and even if not, they may face great emotional and psychological strain knowing that their lives were jeopardized by the work of a dishonest scientist. Because of this, we need to reexamine our views on the legality of p-hacking and the extent to which it can be used. Finally, we also need better methods for evaluating data and for peer reviewing the work of others.

Solutions

To improve the integrity of scientific work and maintain the credibility of the field, several changes need to be made to the research process. Some might suggest increasing the number of peer reviewers and making the entire process more rigorous. But this solution is not ideal, as cheating rings in which scientists agree to approve one another’s work have been an issue in the past. In 2013, Peter Chen, an engineer from Taiwan, was found to be at the center of a cheating ring involving over 60 articles whose authors were reviewing and citing one another. Chen had been exploiting the peer review system through which scholars are invited to review papers; in theory, anyone, even someone without any scientific background, can be invited to review a paper. In the end, all of the articles had to be retracted (9).

A possible solution to the problem of peer review rings would be to open up the review process by providing preprint versions of a paper to the public in addition to conventional peer review. This worked successfully in 2011, when physicists in Italy published a paper claiming that neutrinos had moved faster than light. The claim was quickly debunked thanks to the researchers providing preprint editions of the paper to the public; the process might have taken years had it gone only through typical peer review (8). This may be the ideal solution, as it vastly increases the number of people who can thoroughly check a paper for inconsistencies while still maintaining the conventional peer review process. In a similar case, a researcher attempting to replicate a study that had suggested arsenic could take the place of phosphorus as a building block of DNA documented her replication openly on a blog. The public was satisfied with her work, while the original researchers faced criticism for their lack of transparency about additional evidence for their claim (8).

Another way to prevent dishonesty would be to require all data to be reported, positive or not. Pharmaceutical companies are now required by law to register all of their trials because of their tendency to report positive results and downplay negative ones. Extending this requirement to all fields would help prevent the pernicious form of p-hacking in which researchers present only the results that turned out significant and publishable while ignoring those that did not support their claims (8). This could be accomplished through a new project called the Open Science Framework. In the program, researchers outline what they aim to study and their original hypotheses, agree to analyze their data only in alignment with those hypotheses, and allow the public to track their progress. Doing this would prevent p-hacking and HARKing and keep researchers more honest and less biased, since they would have to maintain their original research goals rather than swap in whatever other result p-hacking happened to turn up (8).