This was our second lecture on reproducibility and replication, in which we discussed false discoveries, effect sizes, and p-hacking / researcher degrees of freedom.
The previous lecture provided a high-level overview of the ongoing replication crisis in the sciences. In this lecture we continued that discussion, starting with false discoveries. Following Felix Schönbrodt’s excellent blog post, we looked at how underpowered studies lead to false discoveries. We then turned to effect sizes, specifically Cohen’s d and the AUC, explored through this excellent visual tool.
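To make the false discovery argument concrete, here is a minimal R sketch of the arithmetic (the prior, alpha, and power values are illustrative assumptions, not numbers from the lecture), along with the standard conversion from Cohen's d to the AUC for two equal-variance normal groups:

```r
# Of all "significant" results, what fraction are false discoveries?
# Illustrative assumptions: 10% of tested hypotheses are true,
# alpha = 0.05, and the study has 35% power.
prior <- 0.10   # P(effect is real)
alpha <- 0.05   # P(significant | no effect)
power <- 0.35   # P(significant | effect is real)

true_positives  <- prior * power
false_positives <- (1 - prior) * alpha
fdr <- false_positives / (true_positives + false_positives)
fdr  # ~0.56: most "significant" findings are false under these assumptions

# Cohen's d maps to the AUC (the probability that a random draw from
# one group exceeds a random draw from the other):
#   AUC = pnorm(d / sqrt(2))
d <- 0.5                  # a "medium" effect by Cohen's rule of thumb
auc <- pnorm(d / sqrt(2))
auc                       # ~0.64, only modestly better than a coin flip
```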
Next we spoke about post-hoc data analysis and p-hacking. We looked at the False-Positive Psychology paper by Simmons, Nelson & Simonsohn, which has an illustrative example of how one can arrive at nonsensical conclusions if there’s enough flexibility in data collection and analysis. Gelman and Loken’s The Garden of Forking Paths makes a similar point, noting that this can often occur without any ill intent on the part of the researcher. While these issues are complex, there are a few best practices (e.g., running pilot studies followed by pre-registration of high-powered, large-scale experiments) that can help mitigate these concerns. Registered reports are a particularly attractive solution, wherein researchers write up and submit an experimental study for peer review before the study is conducted. Reviewers make an acceptance decision at this point based on the merit of the study, and, if accepted, it is published regardless of the results. We also discussed how these ideas, which come largely from randomized experiments, might be adapted for observational studies.
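A small simulation makes the point. Below is a sketch in R (the sample sizes and number of "peeks" are arbitrary choices, not the exact design from the paper) showing that repeatedly testing as data accumulate, and stopping at the first p < .05, inflates the false positive rate even when the null is true:

```r
# One researcher degree of freedom: optional stopping. With no true
# effect, peeking at the data after every batch of subjects and stopping
# as soon as p < .05 inflates the false positive rate above the nominal 5%.
set.seed(42)

peek_and_stop <- function(n_start = 10, n_max = 50, step = 10) {
  # null is true: both groups come from the same distribution
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  while (length(x) <= n_max) {
    if (t.test(x, y)$p.value < 0.05) return(TRUE)  # declare "significant", stop
    x <- c(x, rnorm(step))  # otherwise collect another batch and re-test
    y <- c(y, rnorm(step))
  }
  FALSE  # gave up after n_max per group
}

# Five looks at the data (n = 10, 20, 30, 40, 50 per group) yield a
# false positive rate of roughly 0.14, nearly triple the nominal 0.05.
mean(replicate(5000, peek_and_stop()))
```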
We finished up class by talking about a few tools for computational reproducibility, specifically RMarkdown for reproducible documents and Makefiles for efficient workflows. Example files are up on GitHub.
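As a flavor of how these tools fit together, here is a minimal sketch of a Makefile that renders an RMarkdown report (the file names analyze.R, data/raw.csv, and report.Rmd are hypothetical placeholders, not the example files from class):

```make
# Minimal reproducible workflow: `make report.html` re-runs only the
# steps whose inputs have changed. (Recipes must be indented with tabs.)

report.html: report.Rmd results.csv
	Rscript -e 'rmarkdown::render("report.Rmd")'

results.csv: analyze.R data/raw.csv
	Rscript analyze.R
```

Because make tracks dependencies, editing analyze.R triggers a re-run of the analysis and then the report, while an edit to report.Rmd alone re-renders only the document.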
References:
- A guide on effect sizes and related blog post
- Interpreting Cohen’s d effect size
- The New Statistics: Why and How by Cumming
- The Insignificance of Significance Testing by Johnson
- The Insignificance of Null Hypothesis Significance Testing by Gill
- Why Most Published Research Findings Are False by Ioannidis
- Felix Schönbrodt’s blog post and Shiny app on misconceptions about p-values and false discoveries
- Calculating the power of a test
- Power failure: why small sample size undermines the reliability of neuroscience by Button et al.
- False-Positive Psychology by Simmons, Nelson & Simonsohn
- The garden of forking paths by Gelman & Loken
- The cumulative effect of reporting and citation biases on the apparent efficacy of treatments by de Vries et al. (popular coverage)
- Pre-registration portals from the Open Science Framework, Center for Open Science, and AsPredicted.org
- Science magazine’s announcement of registered reports
- Why Use Make by Mike Bostock
- GNU Make for Reproducible Data Analysis
- RMarkdown cheatsheet
- RStudio’s RMarkdown site
- R Markdown: The Definitive Guide by Xie, Allaire & Grolemund