This was our second lecture on reproducibility and replication, in which we discussed false discoveries, effect sizes, and p-hacking / researcher degrees of freedom.

The previous lecture provided a high-level overview of the ongoing replication crisis in the sciences. In this lecture we continued the discussion, first by talking about false discoveries. Following Felix Schönbrodt’s excellent blog post, we talked about how underpowered studies inflate the share of statistically significant results that are false discoveries. Then we went on to discuss effect sizes, specifically Cohen’s d and the AUC, through this excellent visual tool.
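
To make the link between power and false discoveries concrete, here is a minimal Python sketch. It assumes a simple model in which a fraction `prior` of the hypotheses being tested are truly non-null (the default of 0.3 is purely illustrative, not a number from the lecture), and it converts Cohen’s d to an AUC under the assumption of two normally distributed groups with equal variance. The function names are ours.

```python
import math


def false_discovery_rate(power, alpha=0.05, prior=0.3):
    """Share of significant results that are false positives.

    power: probability of detecting a true effect
    alpha: significance threshold (false positive rate under the null)
    prior: assumed fraction of tested hypotheses that are truly non-null
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return false_positives / (true_positives + false_positives)


def auc_from_cohens_d(d):
    """AUC for two equal-variance normal groups whose means differ by d SDs.

    AUC = Phi(d / sqrt(2)), and Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    so the expression simplifies to 0.5 * (1 + erf(d / 2)).
    """
    return 0.5 * (1 + math.erf(d / 2))


if __name__ == "__main__":
    for power in (0.2, 0.5, 0.8):
        print(f"power={power:.1f}  FDR={false_discovery_rate(power):.3f}")
    for d in (0.2, 0.5, 0.8):
        print(f"d={d:.1f}  AUC={auc_from_cohens_d(d):.3f}")
```

Under these assumptions, lowering power while holding `alpha` and `prior` fixed raises the false discovery rate, because the number of true positives shrinks while the number of false positives stays the same.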

Next we spoke about post-hoc data analysis and p-hacking. We looked at the False-Positive Psychology paper by Simmons, Nelson & Simonsohn, which has an illustrative example of how one can arrive at nonsensical conclusions if there’s enough flexibility in data collection and analysis. Gelman and Loken’s The Garden of Forking Paths makes a similar point, noting that this can often occur without any ill intent on the part of the researcher. While these issues are complex, there are a few best practices (e.g., running pilot studies followed by pre-registration of high-powered, large-scale experiments) that can help mitigate these concerns. Registered reports are a particularly attractive solution, wherein researchers write up and submit an experimental study for peer review before the study is conducted. Reviewers make an acceptance decision at this point based on the merit of the study, and, if accepted, it is published regardless of the results. We also discussed how these ideas, which come largely from randomized experiments, might be adapted for observational studies.
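
The effect of analytic flexibility can be illustrated with a small simulation. The sketch below is not the analysis from the Simmons, Nelson & Simonsohn paper; it is a simplified stand-in that assumes there is no true effect, that a researcher measures several independent outcomes with known standard deviation 1, and that they report a finding whenever any one outcome reaches p < 0.05. The function names and default settings are ours.

```python
import math
import random


def z_test_p_value(group_a, group_b):
    """Two-sided z-test p-value for equal-sized groups with known sd = 1."""
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / math.sqrt(2 / n)  # standard error of a difference in means
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_outcomes, n_per_group=20, n_sims=2000,
                        alpha=0.05, seed=0):
    """Fraction of null experiments with at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p_value(a, b))
        # The flexible analyst reports whichever outcome "worked".
        if min(p_values) < alpha:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(f"outcomes tested={k}  false positive rate={false_positive_rate(k):.3f}")
```

With independent outcomes, the chance of at least one significant result under the null is 1 - (1 - 0.05)^k, which is already about 19% for k = 4. Committing to a single outcome and analysis in advance, as pre-registration does, corresponds to the k = 1 case.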

We finished up class by talking about a few tools for computational reproducibility, specifically R Markdown for reproducible documents and Makefiles for efficient workflows. Example files are up on GitHub.

References: