This was our second lecture on reproducibility and replication, in which we discussed false discoveries, effect sizes, and p-hacking / researcher degrees of freedom.
The previous lecture provided a high-level overview of the ongoing replication crisis in the sciences. In this lecture we continued that discussion, starting with false discoveries. Following Felix Schönbrodt’s excellent blog post, we looked at how underpowered studies lead to false discoveries. We then turned to effect sizes, specifically Cohen’s d and the AUC, explored through this excellent visual tool.
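To make the false discovery argument concrete, here is a minimal R sketch of the arithmetic (the prior, alpha, and power values are illustrative assumptions, not numbers from the lecture), along with the standard conversion from Cohen's d to the AUC for two equal-variance normal groups:

```r
# Of all "significant" results, what fraction are false discoveries?
# Illustrative assumptions: 10% of tested hypotheses are true,
# alpha = 0.05, and the study has 35% power.
prior <- 0.10   # P(effect is real)
alpha <- 0.05   # P(significant | no effect)
power <- 0.35   # P(significant | effect is real)

true_positives  <- prior * power
false_positives <- (1 - prior) * alpha
fdr <- false_positives / (true_positives + false_positives)
fdr  # ~0.56: most "significant" findings are false under these assumptions

# Cohen's d maps to the AUC (the probability that a random draw from
# one group exceeds a random draw from the other):
#   AUC = pnorm(d / sqrt(2))
d <- 0.5                  # a "medium" effect by Cohen's rule of thumb
auc <- pnorm(d / sqrt(2))
auc                       # ~0.64, only modestly better than a coin flip
```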
Next we spoke about post-hoc data analysis and p-hacking. We looked at the False-Positive Psychology paper by Simmons, Nelson & Simonsohn, which has an illustrative example of how one can arrive at nonsensical conclusions if there’s enough flexibility in data collection and analysis. Gelman and Loken’s The Garden of Forking Paths makes a similar point, noting that this can often occur without any ill intent on the part of the researcher. While these issues are complex, there are a few best practices (e.g., running pilot studies followed by pre-registration of high-powered, large-scale experiments) that can help mitigate these concerns. Registered reports are a particularly attractive solution, wherein researchers write up and submit an experimental study for peer review before the study is conducted. Reviewers make an acceptance decision at this point based on the merit of the study, and, if accepted, it is published regardless of the results. We also discussed how these ideas, which come largely from randomized experiments, might be adapted for observational studies.
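A small simulation makes the point. Below is a sketch in R (the sample sizes and number of "peeks" are arbitrary choices, not the exact design from the paper) showing that repeatedly testing as data accumulate, and stopping at the first p < .05, inflates the false positive rate even when the null is true:

```r
# One researcher degree of freedom: optional stopping. With no true
# effect, peeking at the data after every batch of subjects and stopping
# as soon as p < .05 inflates the false positive rate above the nominal 5%.
set.seed(42)

peek_and_stop <- function(n_start = 10, n_max = 50, step = 10) {
  # null is true: both groups come from the same distribution
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  while (length(x) <= n_max) {
    if (t.test(x, y)$p.value < 0.05) return(TRUE)  # declare "significant", stop
    x <- c(x, rnorm(step))  # otherwise collect another batch and re-test
    y <- c(y, rnorm(step))
  }
  FALSE  # gave up after n_max per group
}

# Five looks at the data (n = 10, 20, 30, 40, 50 per group) yield a
# false positive rate of roughly 0.14, nearly triple the nominal 0.05.
mean(replicate(5000, peek_and_stop()))
```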
We finished up class by talking about a few tools for computational reproducibility, specifically RMarkdown for reproducible documents and Makefiles for efficient workflows. Example files are up on GitHub.
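As a flavor of how these tools fit together, here is a minimal sketch of a Makefile that renders an RMarkdown report (the file names analyze.R, data/raw.csv, and report.Rmd are hypothetical placeholders, not the example files from class):

```make
# Minimal reproducible workflow: `make report.html` re-runs only the
# steps whose inputs have changed. (Recipes must be indented with tabs.)

report.html: report.Rmd results.csv
	Rscript -e 'rmarkdown::render("report.Rmd")'

results.csv: analyze.R data/raw.csv
	Rscript analyze.R
```

Because make tracks dependencies, editing analyze.R triggers a re-run of the analysis and then the report, while an edit to report.Rmd alone re-renders only the document.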
References:
- A guide on effect sizes and related blog post
- Interpreting Cohen’s d effect size
- The New Statistics: Why and How by Cumming
- The Insignificance of Significance Testing by Johnson
- The Insignificance of Null Hypothesis Significance Testing by Gill
- Why Most Published Research Findings Are False by Ioannidis
- Felix Schönbrodt’s blog post and Shiny app on misconceptions about p-values and false discoveries
- Calculating the power of a test
- Power failure: why small sample size undermines the reliability of neuroscience by Button et al.
- False-Positive Psychology by Simmons, Nelson & Simonsohn
- The garden of forking paths by Gelman & Loken
- The cumulative effect of reporting and citation biases on the apparent efficacy of treatments by de Vries et al. (popular coverage)
- Pre-registration portals from the Open Science Framework, Center for Open Science, and AsPredicted.org
- Science magazine’s announcement of registered reports
- Why Use Make by Mike Bostock
- GNU Make for Reproducible Data Analysis
- RMarkdown cheatsheet
- RStudio’s RMarkdown site
- R Markdown: The Definitive Guide by Xie, Allaire & Grolemund