Data reproducibility - is it negotiable?
As a first-year PhD student, I spent my first six months optimizing a multi-step synthetic procedure for a published target molecule. I varied experimental parameters to maximize product yields and to simplify the reaction and purification steps. Replicating the reported quality of data cost a vast amount of resources and, crucially, one semester of valuable PhD time. It was then that I realized data reproducibility is not just beneficial, but a non-negotiable right of researchers.
The benefits of data reproducibility in conserving invaluable scientific resources, manpower, time and money are indisputable. In an era of rapidly progressing science that has unfortunately seen several instances of data fabrication and paper retractions, reproducibility is all the more relevant.
Sources of irreproducibility
The sources of irreproducibility are field-specific, but according to a comprehensive survey by Nature, its impact has been felt less by physicists and chemists than by biologists [1]. Myriad parameters can introduce variability: reagents of differing purity from different suppliers, minor variations in protocol details, the stability of samples such as antibodies and, crucially, the number of times a particular experiment has been repeated.
Beyond experimental conditions, statistics is an integral part of data interpretation. For any dataset, the standard deviation (SD) and p-values provide useful but not conclusive information. In particular, the misuse of the p-value has been widely condemned in recent years [2]. While p-values are useful indicators in hypothesis testing, they are relied on and advocated to excess. Misinterpretation arises from practices such as cherry-picking data by excluding non-significant results, and from conflating 'statistical' with 'therapeutic' significance when drawing conclusions. These biases lead to over-interpretation of the data and thereby to false conclusions.
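To make the danger of cherry-picking concrete, the short simulation below is a minimal sketch (not drawn from any cited study; the trial counts and group sizes are arbitrary assumptions) of what happens when many null comparisons are run and only the smallest p-value is reported: the apparent false-positive rate climbs far above the nominal 5% threshold.

```python
# Illustrative simulation (hypothetical parameters): cherry-picking the best
# p-value among many null comparisons inflates the false-positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 2000        # simulated "studies"
n_comparisons = 10     # endpoints tested per study, all with no true effect
n_per_group = 20       # samples per group

false_positives = 0
for _ in range(n_trials):
    p_values = []
    for _ in range(n_comparisons):
        a = rng.normal(0, 1, n_per_group)  # control group, no true effect
        b = rng.normal(0, 1, n_per_group)  # "treatment" group, same distribution
        _, p = stats.ttest_ind(a, b)
        p_values.append(p)
    # Cherry-picking: report only the smallest p-value and call it significant
    if min(p_values) < 0.05:
        false_positives += 1

print("Nominal false-positive rate: 0.05")
print(f"Observed rate after cherry-picking: {false_positives / n_trials:.2f}")
# With 10 independent comparisons, roughly 1 - 0.95**10 (about 0.40) of the
# simulated studies yield a "significant" result despite no real effect.
```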
Strategies to improve data reproducibility
Measures to address irreproducibility have to be employed at multiple levels, from scientists and peer reviewers to publishers, to minimize the scope for variability. The approach starts at the individual level, with training students in the lab. For example, a new student in a chemistry lab must be trained to use tools like SciFinder to review the literature, to study safety data sheets (MSDS), and to handle light- and temperature-sensitive chemicals. Standardising such training is as important for obtaining reproducible data as performing the experiments themselves. Furthermore, there must be a strong emphasis on providing detailed protocols and analytical data, so that other researchers can reproduce the work. Researcher bias such as p-hacking can be reduced by blinding the statistician to the data labels, which helps prevent over-interpretation of the statistics; a minimal sketch of such blinding follows below.
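As a minimal sketch of label blinding (the column names, coding scheme and helper function here are hypothetical, not a standard tool), the idea is simply to replace the real treatment labels with neutral codes before the data reach the analyst, and to withhold the unblinding key until the analysis is locked.

```python
# Hypothetical sketch: blinding group labels before statistical analysis.
import numpy as np
import pandas as pd

def blind_labels(df, label_column, seed=42):
    """Replace real group labels with neutral codes; return blinded data and the key."""
    rng = np.random.default_rng(seed)
    groups = list(df[label_column].unique())
    rng.shuffle(groups)
    key = {group: f"group_{i}" for i, group in enumerate(groups)}
    blinded = df.copy()
    blinded[label_column] = blinded[label_column].map(key)
    return blinded, key  # the key is withheld until the analysis plan is fixed

# Example with made-up data
data = pd.DataFrame({
    "treatment": ["drug", "placebo", "drug", "placebo"],
    "response":  [4.2, 3.9, 5.1, 4.0],
})
blinded_data, unblinding_key = blind_labels(data, "treatment")
# The statistician analyses `blinded_data`; `unblinding_key` stays with a third party.
```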
Meanwhile, publishers and referees must establish and enforce rigorous guidelines to ensure that published data are of a high standard. The journal Organic Syntheses is an example: synthetic procedures are reported in great detail, and submitted articles are published only after the reported protocol has been reproduced in an authorized lab [3]. This has been an effective strategy for reducing variability in organic synthesis, but it may be infeasible and uneconomical in other fields. To enhance transparency, authors should be required to submit raw data for review, in addition to the manuscript figures and processed data. Finally, efforts to report the irreproducibility of published results should also be encouraged.
Reproducibility forms the very foundation of science. Research scaffolding, the primary mechanism by which science progresses, relies heavily on the reproducibility of reported data. As researchers, we bear the ethical and moral responsibility of employing the best scientific practices, together contributing to the progress of research towards innovation and human advancement.