This was the first of two lectures on the theory and practice of regression.

We started with a high-level overview of regression, which can be broadly defined as any analysis of how one continuous variable (the “outcome”) changes with others (the “inputs”, “predictors”, or “features”). The goals of a regression analysis can vary, from describing the data at hand, to predicting new outcomes, to explaining the associations between outcomes and predictors. This includes everything from looking at histograms and scatter plots to building statistical models.

We focused on the modeling end of that spectrum and discussed ordinary least squares (OLS) regression. We first motivated OLS as an optimization problem and connected squared-loss minimization to the more general principle of maximum likelihood. We then discussed several ways to solve this optimization problem and estimate the coefficients of a linear model, summarized in the table below.
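To make the maximum likelihood connection concrete, here is a brief sketch in generic notation (n observations with feature vectors x_i and outcomes y_i; this notation is a stand-in, not necessarily the notation used in lecture):

```latex
% OLS chooses coefficients to minimize the sum of squared residuals:
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2

% If we assume y_i = x_i^\top \beta + \epsilon_i with \epsilon_i \sim \mathcal{N}(0, \sigma^2),
% the log-likelihood of the data is
\log L(\beta, \sigma^2)
  = -\frac{n}{2} \log\left( 2\pi\sigma^2 \right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2,

% so maximizing the likelihood over \beta is exactly minimizing the squared loss.
```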

| Method | Space | Time | Comments |
|---|---|---|---|
| Invert normal equations | O(nd + d²) | O(nd² + d³) | Good for medium-sized datasets with a relatively small number (e.g., hundreds or thousands) of features |
| Gradient descent | O(nd) | O(nd) per step | Good for larger datasets that still fit in memory but have more (e.g., millions of) features; requires tuning the learning rate |
| Stochastic gradient descent | O(d) | O(d) per step | Good for datasets that exceed available memory; more sensitive to the learning rate schedule |

(Here n is the number of observations and d the number of features.)
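As a rough illustration of the first two methods, here is a minimal sketch on simulated data (this is not code from class; the variable names and the simulated dataset are made up for illustration):

```r
# Minimal sketch: estimating OLS coefficients two ways on simulated data.
set.seed(1)
n <- 1000
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix with an intercept column
beta_true <- c(1, 2, -3)
y <- drop(X %*% beta_true) + rnorm(n)      # simulated outcomes

# 1. Solve the normal equations: beta_hat = (X'X)^{-1} X'y
#    (solve(A, b) solves the linear system without explicitly forming the inverse)
beta_normal <- solve(t(X) %*% X, t(X) %*% y)

# 2. Batch gradient descent on the mean squared error
beta_gd <- rep(0, ncol(X))
learning_rate <- 0.1
for (step in 1:500) {
  gradient <- -2 * drop(t(X) %*% (y - drop(X %*% beta_gd))) / n
  beta_gd  <- beta_gd - learning_rate * gradient
}

# The two estimates should agree with each other and with lm()
cbind(normal = drop(beta_normal), gd = beta_gd, lm = coef(lm(y ~ X - 1)))
```

Stochastic gradient descent uses the same update, but each step computes the gradient on a single observation (or a small mini-batch) rather than the full dataset, which is what makes it suitable for data that don't fit in memory.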

See also this interactive Shiny App to explore manually fitting a simple model and this notebook by Jongbin Jung with an animation of gradient descent.

In the second half of class we looked at fitting linear models in R, with an application to understanding how internet browsing activity varies by age and gender. See the Jupyter notebook on GitHub for more details. The main lesson is that there's more to modeling than optimization: important steps along the way range from collecting data and specifying outcomes and predictors, to determining the form of the model, to assessing performance and interpreting results.
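For the flavor of that workflow without opening the notebook, here is a minimal, hypothetical sketch (the data frame, column names, and simulated values are invented for illustration; the real analysis lives in the notebook on GitHub):

```r
# Hypothetical sketch of the kind of model fit in class.
set.seed(1)
browsing <- data.frame(
  daily_sessions = rpois(500, lambda = 20),                       # made-up outcome
  age            = sample(18:80, 500, replace = TRUE),            # made-up predictor
  gender         = sample(c("female", "male"), 500, replace = TRUE)
)

# Fit a linear model of browsing activity on age and gender
fit <- lm(daily_sessions ~ age + gender, data = browsing)

# Inspect estimated coefficients, standard errors, and fit statistics
summary(fit)
```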
