Modeling Social Data

Spring 2019
Department of Applied Physics and Applied Mathematics
Columbia University
Course Number: Applied Mathematics E4990
Time: Fridays, 10:10am-12:40pm
Location: 602 Hamilton Hall
Instructor: Jake Hofman, Adjunct Assistant Professor &
Senior Researcher at Microsoft Research
Contact: jmh2045 [at] columbia [dot] edu
Course discussion: Piazza


This class focuses on data-driven models for social data: data that capture how people behave and interact with each other or with online platforms. The course emphasizes the challenges that arise when working with large-scale observational data. We will present the data science and data engineering methods needed to analyze such real-world data at scale, with attention to learning models that balance predictive power and interpretability. In addition to core computational and statistical concepts, the course will address practical issues around collecting, manipulating, and analyzing data, with a focus on reproducible research.

Because the course builds on a wide range of fields, we do not have hard prerequisites but strongly suggest familiarity with some subset of: statistics, probability, machine learning, linear algebra, and a scripting language such as Python or R.

Course work will include writing R code, Python code, and shell scripts. Code will be distributed and collected via Git, hosted on GitHub.