Modeling Social Data

Spring 2017
Department of Applied Physics and Applied Mathematics
Columbia University
Instructor: Jake Hofman
Course Number: Applied Mathematics E4990
Time: Fridays, 10:10am-12:40pm
Location: 633 Seeley W. Mudd Building
310 Fayerweather Hall


This class focuses on data-driven models for social data: data that capture how people behave and interact with each other or with online platforms. The course emphasizes the challenges that arise when working with large-scale observational data. We will present the data science and data engineering methods needed to analyze such real-world data at scale, with particular attention to learning models that balance predictive power and interpretability. In addition to core computational and statistical concepts, the course will address practical issues around collecting, manipulating, and analyzing data with APIs, Unix tools, and statistical programming libraries.

Because the course builds on a wide range of fields, there are no hard prerequisites, but we strongly suggest familiarity with some subset of: statistics, probability, machine learning, linear algebra, and a scripting language such as Python or R.

Course work will include writing R code, Python code, and shell scripts. Code will be distributed and collected via Git, hosted on GitHub.