Lecture 10: Networks I | Modeling Social Data

We used this lecture to first go through applications of logistic regression and then to discuss the history of network science.

We started off this lecture by revisiting logistic regression, looking at the problem of modeling which passengers survived the Titanic disaster. We saw that interpreting logistic regression results can be challenging, as coefficients give information about changes in log-odds (as opposed to probabilities directly). We stressed the idea of converting back to probabilities and visually comparing predicted and actual values for a range of feature values to better understand the model fit. See this notebook for details.
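To make the log-odds point concrete, here is a minimal sketch of converting coefficients back to probabilities. The intercept and coefficient values are hypothetical, chosen only for illustration; they are not fitted to the Titanic data, and the variable names are assumptions.

```python
import math

def logistic(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: log-odds(survived) = intercept + coef_female * is_female.
# These numbers are made up for illustration, not estimated from real data.
intercept = -1.0
coef_female = 2.5

log_odds_male = intercept
log_odds_female = intercept + coef_female

# Converting back to probabilities makes the effect easier to read.
p_male = logistic(log_odds_male)
p_female = logistic(log_odds_female)

# The coefficient is a difference in log-odds, so exp(coef) is an odds ratio.
odds_ratio = math.exp(coef_female)

# The same coefficient moves the probability by different amounts
# depending on the baseline log-odds it is added to.
shift_near_zero = logistic(0.0 + coef_female) - logistic(0.0)
shift_far_out = logistic(4.0 + coef_female) - logistic(4.0)

print(f"P(survive | male)   = {p_male:.3f}")
print(f"P(survive | female) = {p_female:.3f}")
print(f"odds ratio          = {odds_ratio:.2f}")
print(f"probability shift from baseline 0: {shift_near_zero:.3f}, from baseline 4: {shift_far_out:.3f}")
```

The last two lines show why a single coefficient has no single interpretation on the probability scale: the change it produces depends on where on the curve the other features place an observation.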

Next we discussed Vowpal Wabbit (VW), an open-source tool for various machine learning tasks. VW has many attractive features, such as a flexible input format, speed, scalability, and sensible defaults. For binary classification, VW defaults to fitting a (clipped) linear model to minimize squared loss. We looked at an example of classifying news with VW to get a sense of the interface and performance, which is quite competitive.
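As a rough illustration of what "a clipped linear model fit to minimize squared loss" means, the sketch below trains a linear model with plain stochastic gradient descent on labels of -1 and +1 and clips predictions to that range. This is not VW's actual implementation or interface: the update rule, the learning rate, the choice to clip only at prediction time, and the toy feature names are all assumptions made for the example.

```python
def clip(x, lo=-1.0, hi=1.0):
    """Restrict a raw linear score to the range of the labels."""
    return max(lo, min(hi, x))

def raw_score(weights, features):
    """Linear score: sum of weight * value over the example's sparse features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def predict(weights, features):
    """Clipped linear prediction (clipping applied at prediction time only)."""
    return clip(raw_score(weights, features))

def train(examples, learning_rate=0.1, passes=50):
    """Plain SGD on squared loss 0.5 * (score - label)^2; a simplification."""
    weights = {}
    for _ in range(passes):
        for label, features in examples:
            error = raw_score(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights

# Toy, hypothetical "news" examples: label +1 for sports, -1 for politics.
examples = [
    (1.0, {"bias": 1.0, "sports": 1.0}),
    (-1.0, {"bias": 1.0, "politics": 1.0}),
    (1.0, {"bias": 1.0, "sports": 1.0, "score": 1.0}),
    (-1.0, {"bias": 1.0, "politics": 1.0, "vote": 1.0}),
]

weights = train(examples)
sports_prediction = predict(weights, {"bias": 1.0, "sports": 1.0})
politics_prediction = predict(weights, {"bias": 1.0, "politics": 1.0})
print(sports_prediction, politics_prediction)
```

Thresholding the clipped prediction at zero gives the predicted class. Storing weights in a dictionary keyed by feature name mirrors the sparse, named-feature style of input that makes this kind of model convenient for text.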

Then we moved on to a history of network science.

We talked about some of the earliest studies of networks, such as Jacob Moreno’s sociograms and Mark Granovetter’s work on the strength of weak ties. We contrasted theoretical models of graphs (e.g., Erdős–Rényi random graphs) with real-world networks, which tend to have highly skewed degree distributions, as originally discussed in Derek de Solla Price’s studies of citation networks. At the same time, social networks typically have short path lengths, in the sense that one needs to traverse only a handful of links to connect a randomly selected pair of people in the network.
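The contrast between random graphs and skewed-degree networks can be sketched with a small simulation. The code below builds an Erdős–Rényi graph and, as a stand-in for a real-world network, a graph grown by a simple preferential-attachment rule (new nodes link to existing nodes in proportion to their degree). The growth rule, the graph sizes, and all function names are assumptions made for illustration; no real network data is used.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Each of the n*(n-1)/2 possible edges is present independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def preferential_attachment(n, m, rng):
    """Grow a graph: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    adj = {v: set() for v in range(n)}
    endpoints = []  # every node appears once per edge it touches
    for u in range(m + 1):  # seed: a small complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def bfs_distances(adj, source):
    """Shortest path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

rng = random.Random(0)
n, m = 500, 2
er = erdos_renyi(n, p=2 * m / (n - 1), rng=rng)  # expected mean degree about 2*m
pa = preferential_attachment(n, m, rng)          # mean degree about 2*m

max_deg_er = max(len(nbrs) for nbrs in er.values())
max_deg_pa = max(len(nbrs) for nbrs in pa.values())

# Average distance from one randomly chosen node to all others it can reach.
dist = bfs_distances(pa, rng.randrange(n))
mean_dist_pa = sum(dist.values()) / (len(dist) - 1)

print("max degree, Erdős–Rényi:", max_deg_er)
print("max degree, preferential attachment:", max_deg_pa)
print("mean distance from a random node (preferential attachment):", round(mean_dist_pa, 2))
```

Both graphs have roughly the same average degree, so any difference in maximum degree reflects the shape of the degree distribution rather than its mean. The breadth-first search gives a quick check on path lengths from a single randomly chosen person.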

We finished by discussing different types of networks that we might analyze as well as the various levels of abstraction available for representing them.