This class will involve a good deal of coding, for which you will need some basic tools. Please make sure to set up the following tools after the first day of class.
An interactive bash shell
This will give you the ability to interact with your filesystem via the command line instead of a GUI such as Windows Explorer or Mac Finder. We will also use bash to automate acquiring and cleaning data sets.
If you use Windows, you can try the built-in bash/Ubuntu shell on Windows 10, or you can install Cygwin, which includes bash and a terminal application by default. Mac OS X includes a bash shell by default, along with a terminal application in /Applications/Utilities. Linux also includes a working shell and terminal.
Verify that your environment is properly configured by typing the commands shown after the # symbol below. The # stands for the shell prompt, so do not type it yourself. You should see something similar (although not necessarily identical) to the following:
# echo $SHELL
/bin/bash
# grep --version
grep (BSD grep) 2.5.1-FreeBSD
# cut
usage: cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-s] [-d delim] [file ...]
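As a rough illustration of the kind of data cleaning these tools make possible, the sketch below combines grep and cut on a tiny made-up CSV file. The file name, columns, and values are invented for this example and are not part of the course data.

```shell
#!/bin/bash
# Create a small, made-up CSV file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
cat > "$tmpdir/trips.csv" <<'EOF'
id,city,duration
1,New York,12
2,Boston,7
3,New York,30
EOF

# Keep only the rows that mention New York, then extract the third
# comma-separated field (the duration column).
durations=$(grep 'New York' "$tmpdir/trips.csv" | cut -d, -f3)
echo "$durations"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Here grep selects lines and cut selects columns. Chaining them with a pipe is one common pattern for this sort of task, though by no means the only one.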
If you’re new to the command line, see Codecademy’s interactive tutorial, this crash course, and Software Carpentry’s guide. Lifehacker’s command line primer is also decent.
O’Reilly’s Classic Shell Scripting book is a more complete reference.
A Git client
Git is a version control system that allows you to track modifications to files and code over time. It also facilitates collaborations so that multiple people can share and edit the same code base.
If you are on Windows, you can install Github for Windows, which provides both the command line tool for git and a graphical user interface. Alternatively, you can install git as an optional package under Cygwin. We recommend the Github application, as it makes working with Github easier. Likewise, modern versions of Mac OS X have a command line git client installed by default, but the Github for Mac tool is a recommended addition. Linux users can install git with the appropriate package manager (e.g., yum install git on RedHat or apt-get install git on Debian or Ubuntu), and there are a number of different git GUIs for Linux.
Complete this relatively brief interactive tour of git. See this one page guide for explanations of the usual git workflow and most common commands, or here for a more verbose guide. Github also has an introductory video, some training courses, and a handy cheatsheet.
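To give a feel for the usual workflow before you work through the guides above, here is a minimal sketch that creates a throwaway repository, commits one file, and inspects the history. The file name, commit message, and the name and email passed to git are placeholders chosen for this example.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing else on your machine is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

# Start a new repository and create a file to track.
git init --quiet
echo "hello" > notes.txt

# Stage the file, then record it in a commit.
# The name and email are placeholders (assumptions), set for this command only.
git add notes.txt
git -c user.name="Example Student" -c user.email="student@example.com" \
    commit --quiet -m "Add notes"

# Show the history, one line per commit, and count the commits.
git log --oneline
commit_count=$(git rev-list --count HEAD)
echo "commits so far: $commit_count"
```

In everyday use you would set your name and email once with git config rather than passing them on each commit. They are given inline here only so the sketch does not depend on your existing configuration.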
A Github account
Github is a platform that facilitates collaboration on projects that use git. You can use it to host projects, publish them to the web, and share them with other people. Create a free account if you don’t already have one.
Once you have an account, clone the course repository using your local git client. This is most easily done on the command line as follows:
# git clone https://github.com/jhofman/msd2019.git
Cloning into 'msd2019'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (6/6), done.
When this is complete, verify that you have a local directory called msd2019 containing a README.md file.
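One way to carry out this check from the command line is sketched below. The check_clone function is a hypothetical helper written for this example, not something provided by git or by the course repository.

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory contains a README.md file.
check_clone() {
    if [ -f "$1/README.md" ]; then
        echo "found $1/README.md"
    else
        echo "missing $1/README.md"
        return 1
    fi
}

# Run this from the directory in which you ran git clone.
check_clone msd2019 || echo "Clone the course repository first, then try again."
```

Simply running ls msd2019 and looking for README.md in the listing works just as well. The function form is only convenient if you want to reuse the check in a script.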
R and RStudio
R is a useful programming language for exploratory data analysis, data visualization, and statistical modeling. RStudio is a popular integrated development environment (IDE) for working in R.
First, download and install R from a CRAN mirror. Then download and install RStudio. Finally, install and load some important packages by running the following in the R console:
install.packages('tidyverse')
library(tidyverse)
If you’re new to R, see the Code School and DataCamp online tutorials.
We will discuss all of these tools in more detail in class.