This was the first of two lectures on the theory and practice of regression.

In the first part of class we shifted from talking about problems in how science is often done to best practices for doing good science. We went through the pipeline of designing a study, piloting and revising it, doing a power calculation, pre-registering the study, running it, creating a reproducible analysis and report, and thinking critically about the results.

Next we moved on to regression. We started with a high-level overview of regression, which can be broadly defined as any analysis of how one continuous variable (the “outcome”) changes with others (the “inputs”, “predictors”, or “features”). The goals of a regression analysis vary: describing the data at hand, predicting new outcomes, or explaining the associations between outcomes and predictors. Regression in this broad sense covers everything from looking at histograms and scatter plots to building statistical models.

We focused on the latter, building statistical models, and discussed ordinary least squares regression. First, we motivated this as an optimization problem and then connected squared loss minimization to the more general principle of maximum likelihood. Then we discussed several ways to solve this optimization problem and estimate the coefficients of a linear model. These are summarized in the table below, where $N$ is the number of examples and $K$ is the number of features.
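Before the table, here is a short sketch of why minimizing squared loss is a special case of maximum likelihood. It rests on an assumption not spelled out in this summary: each outcome is a linear function of its features plus independent Gaussian noise with fixed variance. The symbols $y_i$, $x_i$, $w$, $\epsilon_i$, and $\sigma^2$ are notation introduced here for illustration.

$$
\begin{aligned}
y_i &= w \cdot x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \\
\log p(y \mid X, w) &= -\frac{N}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} (y_i - w \cdot x_i)^2 \\
\hat{w} &= \arg\max_w \log p(y \mid X, w) = \arg\min_w \sum_{i=1}^{N} (y_i - w \cdot x_i)^2
\end{aligned}
$$

The first term of the log-likelihood does not depend on $w$, and the second is a negative multiple of the squared loss, so under this noise model the maximum likelihood estimate and the least squares estimate coincide.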

| Method | Space | Time | Comments |
|---|---|---|---|
| Invert normal equations | $N K + K^2$ | $K^3$ | Good for medium-sized datasets with a relatively small number of features (e.g., hundreds or thousands) |
| Gradient descent | $N K$ | $N K$ per step | Good for larger datasets that still fit in memory but have more features (e.g., millions); requires tuning the learning rate |
| Stochastic gradient descent | $K$ | $K$ per step | Good for datasets that exceed available memory; more sensitive to the learning rate schedule |
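The three approaches can be compared side by side on a small synthetic problem. What follows is a minimal sketch in Python with NumPy, not code from the lecture: the function names, learning rates, step counts, decay schedule, and generated data are all illustrative assumptions. The first function also solves the normal equations with `np.linalg.solve` rather than forming an explicit inverse, which is the usual numerically safer substitute.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve the normal equations (X^T X) w = X^T y directly."""
    # Uses a linear solve instead of an explicit matrix inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_gradient_descent(X, y, learning_rate=0.1, n_steps=2000):
    """Full-batch gradient descent on the mean squared error."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_steps):
        # Every step uses all N rows, so it costs on the order of N*K operations.
        gradient = (2.0 / n) * X.T @ (X @ w - y)
        w -= learning_rate * gradient
    return w

def fit_sgd(X, y, learning_rate=0.01, n_epochs=50, seed=0):
    """Stochastic gradient descent: one example per update."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    w = np.zeros(k)
    step = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            step += 1
            # Each update touches a single row, so it costs on the order of K operations.
            gradient = 2.0 * (X[i] @ w - y[i]) * X[i]
            # Hypothetical decaying schedule; real problems need tuning here.
            w -= learning_rate / (1.0 + 0.001 * step) * gradient
    return w

# Synthetic data (an assumption for illustration): N examples, K features.
rng = np.random.default_rng(42)
N, K = 500, 3
X = rng.normal(size=(N, K))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)

w_exact = fit_normal_equations(X, y)
w_gd = fit_gradient_descent(X, y)
w_sgd = fit_sgd(X, y)

print("normal equations:", w_exact)
print("gradient descent:", w_gd)
print("stochastic gradient descent:", w_sgd)
```

On a well-conditioned problem like this one, all three estimates should land close to each other. The stochastic version stays the noisiest and depends most on the chosen schedule, which matches the comments in the table.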