
Analytic transparency is key for reproducibility of agricultural research

Abstract

There are growing concerns over the failure of attempts to confirm findings from past studies in various disciplines, and this problem is now known as the “reproducibility crisis” or “replication crisis”. In the agricultural sciences, this problem has remained underappreciated and underreported, and efforts to tackle it are deficient. According to a recent analysis, on-farm experiments are difficult to reproduce due to a lack of research transparency. Non-reproducible research not only wastes resources but can also slow down scientific progress and undermine public trust. In this commentary, my aim is to draw attention to the evolving concepts and terminology used in characterizing reproducibility and to the common reasons for non-reproducibility of past research. I argue that analytic transparency is not only key for the reproducibility of research but can also facilitate systematic reviews, meta-analyses and evidence mapping to guide the formulation of evidence-based policies and practices.

Background

There are growing concerns over the failure of attempts to confirm findings from past studies. This problem, now called the “reproducibility crisis” or “replication crisis”, has been documented in the agricultural sciences, biomedical sciences, computer science, economics and many other disciplines (Baker 2016; Begley and Ellis 2012; Breznau et al. 2022; Botvinik-Nezer et al. 2020; Camerer et al. 2018; Chang and Li 2018; Cockburn et al. 2020; Ioannidis 2005; Ioannidis et al. 2017; Kool et al. 2020; Lash 2017; Open Science Collaboration 2015).

Using simulations, Ioannidis (2005) showed that most research results were not reproducible. Since then, empirical evidence supporting this claim has accumulated. For example, Begley and Ellis (2012) were unable to reproduce results in over 89% of 53 landmark animal experiments. Similarly, Prinz et al. (2011) were unable to reproduce 75–80% of the published data from 67 landmark projects in the fields of oncology, cardiovascular disease and women’s health. In an analysis by Freedman et al. (2015), the cumulative prevalence of non-reproducible preclinical research exceeded 50%. In the field of psychology, the Open Science Collaboration (2015) conducted replications of 100 studies published in three journals and found that replication effects were only half the magnitude of the original effects. Camerer et al. (2018) replicated 21 social science studies published in Nature and Science between 2010 and 2015 and found that only 57–67% of the studies were replicable; over 33% were not. Chang and Li (2018) examined 60 published studies from 13 economics journals and found that less than 50% of the results were reproducible. Similarly, Ioannidis et al. (2017) surveyed 64,076 estimates of economic parameters in over 6,700 empirical studies and found that nearly 80% of the reported effects were exaggerated. According to a 2016 Nature survey, approximately 60% of researchers in the field of biology could not reproduce their own findings, while over 70% were unable to reproduce the findings of other researchers (ATCC 2022; Baker 2016).

This low level of reproducibility can slow down scientific progress and undermine public trust in research (Freedman et al. 2015). Non-reproducible research also carries a monetary cost. In the USA alone, approximately US$28 billion is spent annually on non-reproducible preclinical studies (Baker 2015; Freedman et al. 2015). As much as 85% of the expenditure in biomedical research is also wasted (Chalmers and Glasziou 2009) due to factors that contribute to non-reproducibility, such as inappropriate study design, failure to adequately address biases, omission of null results, and insufficient descriptions of interventions and methods (ATCC 2022). As a result, academics, funding agencies, publishers, industry and other stakeholders have called for transparency and reproducibility of research (Feger and Woźniak 2022; Macleod et al. 2021). Several initiatives and joint efforts have also been launched to promote reproducibility in the medical and other biosciences (see Macleod et al. 2021).

In the agricultural sciences, the full extent of the reproducibility crisis has not yet been characterized (Bello and Renter 2018), and the problem remains underappreciated and underreported. This is probably because researchers are more preoccupied with other research-related problems (e.g., funding, manpower, facilities) than with reproducibility. The few reports available (e.g., Beillouin et al. 2019, 2022; Bello and Renter 2018; Kool et al. 2020; Lightfoot and Barker 1988) have revealed deficiencies in the reproducibility of both primary studies and meta-analyses. For example, after reviewing 172 papers on on-farm experiments, Kool et al. (2020) concluded that it was difficult (or impossible) to reproduce the results. As early as the 1980s, Lightfoot and Barker (1988) warned that many, if not most, farming systems research projects had failed to provide useful information to farmers or to station-based researchers. Given the potential role of on-farm experimentation in transforming global agriculture (Lacoste et al. 2022), there is an urgent need to create a common understanding of reproducibility among stakeholders. The available literature suggests a lack of clearly defined concepts, terminology, indicators and metrics for the reproducibility of agricultural research. In this commentary, I will first draw readers’ attention to the evolving concepts and often confusing terminology used in characterizing reproducibility and to the main reasons for failures to reproduce results from past studies.

Multiplicity of concepts and definitions

Several distinct concepts and terms are used interchangeably to characterize reproducibility in the different fields of science (Barba 2018; Freedman et al. 2015; Goodman et al. 2016; Peng and Hicks 2021). Overlapping uses of the same terms and/or different terms for the same concept may cause confusion. There is also an ongoing debate on how the various terms should be defined (McArthur 2019; Plesser 2018), and different disciplines use contradictory terminology (Barba 2018; Feger and Woźniak 2022). In the agricultural literature, the terms repeatability and reproducibility are often used in quantifying measurement errors (see Niero et al. 2020; Svensgaard et al. 2019), which differs from their use in characterizing the reproducibility of research. Agronomic research often generates recommendations based on small-plot trials on research stations (Laurent et al. 2022), while on-farm research is increasingly used under donor pressure to move quickly from piloting to demonstrating impact at scale (de Roo et al. 2019; Kool et al. 2020). In that context, the terms validation and verification are sometimes applied in reference to testing the performance of technologies or varieties on farmers’ fields (Laurent et al. 2022; Johnston et al. 2003). The terms replication, verification and validation are also used loosely and sometimes interchangeably with reproducibility (de Roo et al. 2019; Kool et al. 2020; Johnston et al. 2003). Kool et al. (2020) and de Roo et al. (2019) used the terms “internal validity” and “external validity” in apparent reference to reproducibility and generalizability, respectively. Obviously, confusion of concepts and terminology can create misunderstanding among researchers, funding agencies and other stakeholders. In the following sections, I summarise the current understanding of repeatability, replicability, reproducibility and validity, and suggest their standardisation in agriculture and the biosciences. My aim is not to redefine these terms but to inform the reader about the distinctions between them as applied in the reproducibility literature.

Repeatability

In the reproducibility literature, repeatability pertains to the degree to which the same team can reproduce its own findings. The meaning of “repeatability” here is therefore different from its use in connection with measurement errors (e.g., Niero et al. 2020; Svensgaard et al. 2019). Here, the focus is on a complete study. Accordingly, a study is said to be repeatable if the original team of researchers can reliably reproduce the same results using the same experimental setup. This may involve repeating the same experiment over seasons, years or farms to verify performance. In that context, the terms verification and validation as often used in agricultural research are synonymous with repeatability, because the same team often conducts both the small-plot experiments and the follow-up on-farm verification. Therefore, validation or verification in that sense does not represent confirmation of the research findings by an independent team.

Replicability

Replicability is the degree to which a different team can reproduce the findings of a previous study using the same experimental setup. This should not be confused with the traditional use of “replication” in agricultural research, where a treatment is applied on several farms and a farm is often used as a single replicate (Kool et al. 2020) as part of a single experimental design. Replications in that context do not represent an independent test of the hypothesis, and therefore they cannot be used to provide evidence for the reproducibility of the findings (see Vaux et al. 2012). In the reproducibility literature, a replication is a complete study designed to confirm the findings of another team of researchers (Vaux et al. 2012). A study is said to be replicable if another team can produce the same results using the same source materials and the same experimental setup (FASEB 2016; NASEM 2019; Plesser 2018). According to the Federation of American Societies for Experimental Biology (FASEB 2016), replicability should only be used when referring to repeating the results of a specific experiment rather than an entire study (Barba 2018). On the other hand, the American Statistical Association (ASA) applies replicability to the act of repeating an entire study independently (Broman et al. 2017).

Reproducibility

Reproducibility is the degree to which a different team of researchers can reproduce the findings of a previous study using a different experimental setup (FASEB 2016). The meaning of “reproducibility” here is different from its traditional use in quantifying measurement errors in agricultural research (e.g., Niero et al. 2020; Svensgaard et al. 2019). Here, “reproducibility” is used in reference to the methods, results or inferences of a complete study (for details see Goodman et al. 2016) or a meta-analysis (Beillouin et al. 2019). Accordingly, a study is said to be reproducible if an independent group of researchers can produce similar or nearly identical results using comparable materials but under a completely different experimental setup (FASEB 2016). According to the ASA definition (Broman et al. 2017), a study is also reproducible if one can take the original data and the computer code and reproduce all of the numerical results. This definition is also followed by the National Academies of Sciences, Engineering, and Medicine (NASEM 2019). In the context of on-farm experimentation, Kool et al. (2020) equated reproducibility with “internal validity” and implied that an experiment can be reproducible when the published paper includes descriptions of all relevant factors.
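As a concrete illustration of reproducibility in the ASA/NASEM sense, the short Python sketch below re-runs an archived analysis and checks the recomputed summary statistic against the published value. This is a minimal, hypothetical example: the file name, variable name and reported value are assumptions for illustration and are not drawn from any study cited here.

import pandas as pd

REPORTED_MEAN_YIELD = 3.42  # t/ha; the value claimed in the (hypothetical) paper

# Data archived alongside the paper (hypothetical file and column names)
df = pd.read_csv("archived_trial_data.csv")
recomputed = df["grain_yield_t_ha"].mean()

# Agreement within rounding tolerance supports computational reproducibility
tolerance = 0.005
if abs(recomputed - REPORTED_MEAN_YIELD) <= tolerance:
    print(f"Reproduced the reported mean yield: {recomputed:.2f} t/ha")
else:
    print(f"Mismatch: recomputed {recomputed:.3f} vs reported {REPORTED_MEAN_YIELD}")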

Validity

Validity refers to the extent of systematic error (bias) in a study (CEE 2022; Frampton et al. 2022), and two types have been recognized: internal and external validity. Internal validity refers to the extent of systematic errors inherent to a study resulting from flaws in study design or conduct (CEE 2022; Frampton et al. 2022). Internal validity is not only an issue in the reproducibility of primary studies but also a threat to systematic reviews and meta-analyses (Frampton et al. 2022). Since it is not possible to directly measure the internal validity of primary studies, an indirect approach is used to infer the risk of bias by examining the study design and methods (CEE 2022). Accordingly, the risk of bias is the likelihood that features of the study design or conduct will give misleading results (CEE 2022; Frampton et al. 2022).

On the other hand, generalizability refers to an inference from a sample (S1) to the population (P1) from which it was drawn (Findley et al. 2021). Transportability refers to inferences based on a sample from one population (e.g., S1 of P1) but targeted at a different population (P2) (CEE 2022; Findley et al. 2021). As such, transportability may be viewed as the degree to which a finding is broadly applicable across conditions, systems, populations or contexts that differ significantly from the original study (FASEB 2016; Spake et al. 2022). Transportability may also refer to the degree to which relationships found in a study apply in different situations, or to the transferability of predictions (see Spake et al. 2022). For more details, readers are encouraged to refer to the standards for appraisal of the validity of studies provided by the Collaboration for Environmental Evidence (CEE 2022). The Critical Appraisal Skills Programme (CASP) also provides checklists (https://casp-uk.net/casp-tools-checklists/) for appraising various kinds of studies and systematic reviews.

Reasons for the non-reproducibility of research

The growing body of literature suggests that the causes of non-reproducible research are diverse, ranging from fraudulent research (Ioannidis 2005) to many innocent mistakes, systemic failures and poor reporting (AMS 2015; ATCC 2022; Botvinik-Nezer et al. 2020; Breznau et al. 2022; Open Science Collaboration 2015). For the purpose of this commentary, I will focus on the factors identified by the American Type Culture Collection (ATCC) and the Academy of Medical Sciences (AMS), as both present a more comprehensive list than many other sources. The ATCC (2022) identified six reasons for non-reproducibility in life science research: (1) inaccessibility of methodological details, raw data and research materials; (2) use of misidentified, cross-contaminated or over-passaged cell lines and microorganisms; (3) inability to manage complex datasets; (4) poor research practices and experimental design; (5) cognitive biases, including confirmation bias, selection bias, the bandwagon effect, the clustering illusion and reporting bias; and (6) a competitive culture that rewards novel findings and undervalues negative (null) results. Similarly, the AMS (2015) identified six reasons: (1) omitting null results; (2) data dredging; (3) under-powered studies; (4) technical errors; (5) flawed experimental design; and (6) under-specification of methods.

According to the Science Advisory Council of the USDA (McLellan et al. 2016), the issues identified by the AMS are largely applicable to agricultural research. In the following passages, I describe each of these issues briefly. Omitting null results refers to the malpractice of selectively publishing “significant” findings. Data dredging (also called P-fishing) is the practice of repeatedly searching the data for “significant” differences beyond the original hypothesis (Cockburn et al. 2020; Sileshi 2014). A related problem is post-hoc reframing of experimental intentions (formulating hypotheses after the results are known) to present a P-fished outcome (Cockburn et al. 2020). An under-powered study is one with an insufficient number of replicates or too small a sample size to achieve the desired statistical power of >0.80 to identify a real effect. Recent evidence suggests that some failures to reproduce are the result of low statistical power (Ioannidis et al. 2017; Lash 2017). For example, in a survey of 64,076 estimates of economic parameters in 6,700 empirical studies, Ioannidis et al. (2017) found the median statistical power to be 0.18 or less. The small number of replicates (usually 3–4 on research stations and 1 on-farm) used in agricultural experiments often results in low statistical power.
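To illustrate the point about replicate numbers, the Python sketch below (a simplified example assuming the statsmodels package; the effect size of 0.5 and alpha of 0.05 are illustrative assumptions, not values from the studies cited) computes the statistical power of a two-sample comparison for different numbers of replicates per treatment.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided, two-sample t-test for a medium effect (Cohen's d = 0.5)
for n in (3, 4, 10, 30):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"{n:2d} replicates per treatment -> power = {power:.2f}")

# Replicates per treatment needed to reach the conventional 0.80 power threshold
n_needed = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"Replicates needed for power of 0.80: about {n_needed:.0f} per treatment")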

Technical errors and flaws in experimental design may arise due to differences in the underlying study conditions, the sampled population and unexpected treatment-by-environment interactions in the new setup. For example, in on-farm experimentation, researchers have much less control over the variability in the biophysical and socio-economic environment (Kool et al. 2020). In a meta-analysis of studies on maize yield response to inorganic and organic nutrient inputs across sub-Saharan Africa, Sileshi et al. (2010) found large differences in yields between farmers’ fields and research stations due to factors that are generally non-transferable. For example, research stations were often located on good soils and managed well, while most smallholder farming takes place on marginal soils where agronomic recommendations are not strictly followed (Sileshi et al. 2010).

With the increasing emphasis on moving experimentation from research stations to farmers’ fields, there are emerging challenges relating to experimental design and trial management. On-farm experimentation occurs at scales that are meaningful to farmers and requires joint exploration by researchers, farmers and other stakeholders (Lacoste et al. 2022), but it may not be amenable to conventional experimental design. Different paradigms of on-farm research have recently emerged, including “experimentation at scale”, “research for development” and “research in development”, which do not strictly follow the rules of traditional agricultural experimentation (de Roo et al. 2019; Kool et al. 2020). Some of these approaches do not seek to control variability in experimental conditions or different kinds of bias (Kool et al. 2020). After reviewing three case studies from sub-Saharan Africa and South Asia, de Roo et al. (2019) showed that the ‘research for development’ paradigm introduces biases in trial location, host-farmer selection, trial design, management and evaluation. This may limit not only the reproducibility but also the validity of research results.

How can analytic transparency improve reproducibility?

Many of the causes of non-reproducibility identified in the previous section fall under inaccessibility of methodological details and/or poor quality of reporting, which are dimensions of research transparency (sensu Moravcsik 2014). Lack of clarity in reporting has long been recognized as a hindrance to reviews and meta-analyses. Over the years, checklists and standards for transparent and complete reporting of systematic reviews and meta-analyses have evolved. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) is now widely used by a range of medical journals as a pre-submission checklist. Considering the known limitations of PRISMA for reviews in conservation and environmental management, Haddaway et al. (2018) introduced reporting standards for systematic evidence syntheses (ROSES). For details, I encourage readers to consult the ROSES reporting standards in Haddaway et al. (2018). Here, I focus on the definition of research transparency and its three dimensions (Moravcsik 2014), namely process, data and analytic transparency.

Process transparency

Process transparency is about the disclosure of the process behind the choice of the study population, research design, hypotheses, experimental setup, materials (e.g., animal breed, crop variety, etc.), randomization, field operations and specific procedures used to gather data. Experience shows that this is often not the case in agricultural research. For example, none of the 172 on-farm studies reviewed by Kool et al. (2020) explicitly defined the research population or provided the criteria used for selection of research sites, farmers and specific fields. Some 35–76% of the studies also did not provide information on pest management, land preparation, field history, weed management, water management and nutrient management (Kool et al. 2020). Similarly, in a survey of 271 animal experiments, Kilkenny et al. (2009) found that 87% of the papers did not report the randomization of treatments to subjects, and 86% did not report blinding. In addition to precluding reproducibility, poor process transparency can hamper research synthesis in the form of systematic reviews, meta-analyses and evidence maps (Beillouin et al. 2019; Haddaway and Verhoeven 2015). These kinds of syntheses combine primary studies over time to identify patterns that may go undetected by smaller studies (Haddaway and Verhoeven 2015). However, the evidence produced using systematic reviews and meta-analyses depends strongly on the quantity and quality of the primary studies included. Low process transparency in primary studies and meta-analyses can also compromise the quality of evidence maps. Beillouin et al. (2019) examined 99 meta-analyses of more than 3700 agronomic experiments on crop diversification and concluded that most of the meta-analyses were not fully transparent and reproducible. Beillouin et al. (2022) also found questionable transparency and reproducibility in some meta-analyses of land management effects on soil organic carbon stocks. Out of the 192 meta-analyses they examined, 32% did not present enough details to reproduce the literature search (Beillouin et al. 2022).
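As a simple illustration of process transparency around randomization, one of the items Kilkenny et al. (2009) found to be under-reported, the Python sketch below assigns treatments to farms with a disclosed random seed and archives the allocation so that the assignment step itself can be reproduced and audited. The farm identifiers, treatment names and file name are hypothetical.

import json
import random

SEED = 20230117  # disclosed in the methods so the allocation can be regenerated
farms = [f"farm_{i:02d}" for i in range(1, 13)]
treatments = ["control", "compost", "NPK", "compost+NPK"] * 3  # balanced design

rng = random.Random(SEED)
rng.shuffle(treatments)
allocation = dict(zip(farms, treatments))

# Archive the allocation alongside the raw data for later auditing
with open("treatment_allocation.json", "w") as fh:
    json.dump({"seed": SEED, "allocation": allocation}, fh, indent=2)
print(allocation)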

Data transparency

Data transparency is about providing full disclosure of the input data used to support the claims in a publication (Moravcsik 2014). This includes describing how the raw data were organized, cleaned and processed before analysis, and providing metadata containing the definitions of each variable. Vital information such as background measurements, the rationale and criteria used for data transformation, conversion into different units, and the handling of outliers and missing data points is often left out of publications. By providing such information, authors can maximize the legacy and impact of their research (Haddaway 2014). Data transparency will not only promote reproducibility but also enhance researchers’ ability to produce robust meta-analyses and evidence maps.
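One lightweight way to provide such metadata is a machine-readable data dictionary deposited with the raw data. The Python sketch below is a hypothetical example; the variable names, definitions and processing notes are illustrative assumptions, not taken from any study cited here.

import json

data_dictionary = {
    "grain_yield_t_ha": {
        "definition": "Grain yield adjusted to 12.5% moisture content",
        "units": "t/ha",
        "processing": "Converted from kg/plot; outliers flagged but not removed",
    },
    "soil_p_mg_kg": {
        "definition": "Available soil phosphorus (0-20 cm depth)",
        "units": "mg/kg",
        "processing": "Values below detection limit set to half the detection limit",
    },
}

# Deposit the dictionary alongside the raw data file
with open("data_dictionary.json", "w") as fh:
    json.dump(data_dictionary, fh, indent=2)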

Analytic transparency

Analytic transparency refers to how researchers code and analyse data, interpret results and arrive at specific conclusions (Moravcsik 2014). The value of step-by-step reporting of the analysis in achieving reproducibility in the animal sciences has been highlighted by Bello and Renter (2018). Here, I emphasize the need for the same approach in other areas of agriculture, not only to promote reproducibility but also to support those conducting meta-analyses and evidence mapping. Inadequate reporting of key statistics and the exclusion of null findings in primary studies pose challenges for estimating effect sizes in meta-analysis. Out of the 192 meta-analyses examined by Beillouin et al. (2022), 59% did not present adequate details to reproduce the statistical analyses.

Despite their best efforts, researchers often do not provide a description of the model used, the justification for choosing a specific model and its underlying assumptions, the power of the test, or the uncertainty around estimates. Sometimes researchers choose models arbitrarily, although some models are inadequate for certain datasets (see Sileshi 2006, 2008, 2012, 2014, 2021). Researchers also rarely report the data needed for calculating effect sizes or statistical power. For example, in a reproducibility study of 193 experiments from 53 high-impact papers on cancer biology, Errington et al. (2021) found that the data needed to compute effect sizes and conduct power analyses were accessible for only 2% of the experiments. Even after contacting the authors, Errington et al. (2021) were unable to obtain these data for 68% of the experiments.
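To show what “the data needed to compute effect sizes” amounts to in practice, the Python sketch below computes a log response ratio and its sampling variance from group means, standard deviations and sample sizes, the minimum summary statistics a meta-analyst needs from each primary study. The yield numbers are hypothetical.

import math

def log_response_ratio(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Log response ratio (lnRR) and its sampling variance."""
    lnrr = math.log(mean_t / mean_c)
    var = sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2)
    return lnrr, var

# Treatment vs control maize yield (t/ha) from a hypothetical on-farm trial
lnrr, var = log_response_ratio(mean_t=4.1, sd_t=0.9, n_t=4,
                               mean_c=3.2, sd_c=0.8, n_c=4)
print(f"lnRR = {lnrr:.3f}, variance = {var:.4f}, SE = {math.sqrt(var):.3f}")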

Incomplete reporting of uncertainty is another common problem in primary studies, which hampers meta-analysis (Haddaway 2014). Uncertainty may be expressed either as outcome or inferential uncertainty. Outcome uncertainty is the variation of observations around the sample mean, which is represented by the standard deviation or the prediction interval. Inferential uncertainty is the uncertainty in the estimate of a population mean, which is represented either by the standard error or the confidence interval. In the absence of inferential uncertainty, it is impossible to make claims about the degree to which the sample estimates translate to the population.
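The distinction between outcome and inferential uncertainty can be made explicit with a few lines of Python (the plot yields below are hypothetical): the standard deviation describes the spread of observations, while the standard error and confidence interval describe uncertainty about the mean.

import statistics
from scipy import stats

yields = [3.1, 2.8, 3.6, 3.3, 2.9, 3.4]  # t/ha, hypothetical plot yields
n = len(yields)
mean = statistics.mean(yields)
sd = statistics.stdev(yields)              # outcome uncertainty
se = sd / n**0.5                           # inferential uncertainty
t_crit = stats.t.ppf(0.975, df=n - 1)      # two-sided 95% critical value
ci_low, ci_high = mean - t_crit * se, mean + t_crit * se

print(f"mean = {mean:.2f} t/ha, SD = {sd:.2f}, SE = {se:.2f}, "
      f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")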

There is also some subjectivity in the choice of analytic protocols, workflows, computational tools, algorithms and programming languages (Botvinik-Nezer et al. 2020; Hicks and Peng 2019; Silberzahn et al. 2018). As a result, the conclusions drawn from the same data can vary depending on the specific analytic choices. Recent studies (e.g., Botvinik-Nezer et al. 2020; Breznau et al. 2022; Silberzahn et al. 2018) have demonstrated that even teams provided with identical data to test the same hypothesis may not reliably converge in their findings. For example, Breznau et al. (2022) coordinated 161 researchers in 73 teams provided with the same data to test the same hypothesis and found both widely diverging numerical findings and substantive conclusions among teams. Similarly, Botvinik-Nezer et al. (2020) asked 70 independent teams to test the same hypotheses using the same dataset and found sizeable differences in the results because no two teams chose identical workflows. Workflows in many areas of science have become exceedingly complex, and there are many possible choices at each step of the analysis (Botvinik-Nezer et al. 2020). With the increasing complexities in analytic tools, the risk of human error and bias in data analysis can also increase (Peng and Hicks 2021). This provides a compelling case for analytic transparency.
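A modest first step towards analytic transparency is to archive, with every analysis, a record of the computational environment and the seed used for any stochastic step, so that others can rerun the same workflow. The Python sketch below is one possible way of doing this; the package list and file name are illustrative assumptions rather than a prescribed standard.

import json
import platform
import random
from importlib import metadata

SEED = 42
random.seed(SEED)  # fix stochastic steps so reruns give identical numbers

def pkg_version(name):
    """Return the installed version of a package, if available."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

environment = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    "seed": SEED,
    "packages": {p: pkg_version(p) for p in ("numpy", "pandas", "scipy")},
}

# Archive alongside the analysis script and data
with open("analysis_environment.json", "w") as fh:
    json.dump(environment, fh, indent=2)
print(json.dumps(environment, indent=2))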

Conclusions and recommendations

The literature indicates that most of the causes of non-reproducibility fall under lapses in the dimensions of research transparency. Therefore, research transparency in general, and analytic transparency in particular, can play a vital role in promoting reproducibility while also facilitating systematic reviews, meta-analyses and evidence mapping in agriculture and the biosciences. In that spirit, I strongly recommend full disclosure of: (1) the process behind the choice of study populations, research design, hypotheses, experimental setup, randomization, field operations and specific procedures used to gather data; (2) the input data, metadata and a complete description of how data were organized, cleaned and processed before analysis; (3) the justification for the choice of analytic protocols, workflows, computational tools, algorithms and computer code used; and (4) the power of the test, the inferential uncertainty around estimates and all null findings. While authors may be primarily responsible for the lack of analytic transparency, reviewers, editors and publishers also share some responsibility. I strongly recommend that authors take analytic transparency as their responsibility, while reviewers and editors make an effort to enforce these recommendations in line with the applicable standards set by the journal. Some journals now provide checklists relevant to reproducibility, but they may not allow lengthy presentation of methods and data. In that case, detailed information on process, data and analytic transparency could be presented as Supplementary Online Materials. I also recommend that on-farm experiments be designed in such a way that they are reproducible.

Availability of data and materials

The author is willing to share all data and additional information on materials used in this study upon a written request to the corresponding author.


Acknowledgements

I would like to thank the anonymous reviewers for their invaluable comments.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Contributions

SGW conducted the literature review, wrote the manuscript, and read and approved the final version.

Corresponding author

Correspondence to Gudeta W. Sileshi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.



Cite this article

Sileshi, G.W. Analytic transparency is key for reproducibility of agricultural research. CABI Agric Biosci 4, 2 (2023). https://doi.org/10.1186/s43170-023-00144-8
