We had a guest lecture from Çağatay Demiralp on data visualization.

Çağatay discussed both the principles and practice of data visualization. He started with two historical examples: John Snow’s map of the 1854 London cholera outbreak, and Florence Nightingale’s diagram of the causes of death in the British Army during the Crimean War. He emphasized Stuart Card’s point that visualizations represent data in a way that amplifies cognition, making patterns in the data easier to see. Anscombe’s Quartet illustrates this nicely. Its four datasets share nearly identical summary statistics (means, variances, correlation, and fitted regression line), yet they look completely different when plotted.
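To make the Anscombe point concrete, here is a small pure-Python sketch that computes the usual summary statistics for each of the four datasets. The values are Anscombe’s 1973 data, which R also ships as the built-in `anscombe` data frame. The helper names (`summarize`, `fit_line`, and so on) belong to this sketch only.

```python
# Anscombe's Quartet: four x/y datasets of 11 points each.
# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def summarize(xs, ys):
    """Unrounded summary statistics for one dataset."""
    intercept, slope = fit_line(xs, ys)
    return {
        "mean_x": mean(xs),
        "var_x": sample_variance(xs),
        "mean_y": mean(ys),
        "var_y": sample_variance(ys),
        "corr": correlation(xs, ys),
        "intercept": intercept,
        "slope": slope,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

Rounded to two decimals, the four rows are essentially indistinguishable. Only a scatter plot of each dataset reveals that one is roughly linear, one is curved, and two are dominated by a single outlier.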

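We then discussed the perceptual side of visualization. Stevens’ Power Law says that perceived magnitude grows as a power of physical magnitude, so doubling a mark’s area does not make it look twice as large. Experiments by Cleveland and McGill showed that not all visual encodings are created equal: position along a common scale is judged more accurately than length, angle, or area, and the best encoding depends on the type of data being visualized. He closed with a discussion of data visualization tools and the ideas behind them, including Mackinlay’s expressiveness/effectiveness tradeoff and Wilkinson’s grammar of graphics.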
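Stevens’ Power Law can be written as ψ = k · S^a, where S is the physical stimulus (a bar’s length, a circle’s area), ψ is its perceived magnitude, k is a scaling constant, and a is an exponent that depends on the visual channel. The sketch below compares how much larger a mark with twice the physical magnitude appears to be. The exponents are assumptions: they are the approximate values commonly quoted in visualization texts (about 1.0 for length, 0.7 for area, 0.5 for brightness), and measured values vary across studies and observers.

```python
# Approximate Stevens' exponents as commonly quoted in visualization texts.
# Assumption: real values vary across studies, observers, and conditions.
EXPONENTS = {"length": 1.0, "area": 0.7, "brightness": 0.5}


def perceived_magnitude(stimulus, exponent, k=1.0):
    """Stevens' power law: psi = k * S ** a."""
    return k * stimulus ** exponent


def perceived_ratio(physical_ratio, exponent):
    """How many times larger a mark *looks* when it is physical_ratio
    times larger physically. The constant k cancels out of the ratio."""
    return perceived_magnitude(physical_ratio, exponent) / perceived_magnitude(1.0, exponent)


if __name__ == "__main__":
    for channel, a in EXPONENTS.items():
        print(f"{channel:>10}: a 2x stimulus looks {perceived_ratio(2.0, a):.2f}x as large")
```

With an exponent below 1, a circle with twice the area of another looks only about 1.6 times as large. This is one reason why comparisons of bar lengths or positions tend to be read more accurately than comparisons of bubble areas.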

In the second part of class we looked at ggplot2, Hadley Wickham’s popular implementation of Wilkinson’s grammar of graphics, focusing on how to use it to communicate information effectively. Every visualization should convey a point, preferably one that can be summarized in a short sentence. This Jupyter notebook provides an introduction to ggplot2, showing how the choices we make in the visualization process affect the messages our plots and figures convey.
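ggplot2 is an R package, and its plots are built by composing a few independent pieces: a dataset, aesthetic mappings from columns to visual channels, and one or more geometry layers, as in `ggplot(df, aes(x = year, y = cases)) + geom_point()`. As a language-neutral illustration of that layered idea, here is a toy Python sketch. `Plot`, `Layer`, `aes`, `geom_point`, and `geom_line` are hypothetical names chosen to mimic ggplot2’s vocabulary. This is not ggplot2’s API, and the sketch only builds and resolves a plot specification without drawing anything.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    geom: str        # which geometry to draw, e.g. "point" or "line"
    mapping: dict    # layer-specific aesthetic mappings (aesthetic -> column)


@dataclass(frozen=True)
class Plot:
    data: list       # rows as dicts
    mapping: dict    # default aesthetic mappings shared by all layers
    layers: tuple = ()

    def __add__(self, layer):
        # "+" returns a new specification with one more layer;
        # the original Plot is left unchanged.
        return replace(self, layers=self.layers + (layer,))

    def resolve(self):
        """Pair each layer's geom with the data values its aesthetics pull in."""
        resolved = []
        for layer in self.layers:
            # Layer mappings override the plot-level defaults.
            mapping = {**self.mapping, **layer.mapping}
            values = {aes_name: [row[col] for row in self.data]
                      for aes_name, col in mapping.items()}
            resolved.append((layer.geom, values))
        return resolved


def aes(**mapping):
    return mapping


def geom_point(mapping=None):
    return Layer("point", mapping or {})


def geom_line(mapping=None):
    return Layer("line", mapping or {})


if __name__ == "__main__":
    # Made-up toy rows, purely for illustration.
    rows = [
        {"year": 2018, "cases": 3, "region": "A"},
        {"year": 2019, "cases": 5, "region": "B"},
    ]
    p = Plot(rows, aes(x="year", y="cases")) + geom_point(aes(color="region")) + geom_line()
    for geom, values in p.resolve():
        print(geom, values)
```

Separating data, mappings, and geoms like this is what lets you change one design choice, such as switching points to lines or mapping a column to color, without rewriting the rest of the plot.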

Readings and references: