Module 5 Version Control

5.1 Before Class #5

5.1.1 Install git on your computer

Access the git download page and download the appropriate version for your machine.

If you have a Windows 10 machine, you can watch this video that shows you how to install Git on Windows. When installing, note the install location (shown on the “Select Destination Location” window) so you can check that RStudio has the correct path to Git (it’s usually C:\Program Files\Git).

If you have a Mac, you can watch this video that shows you how to install Git on a Mac.
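Once the installer finishes, a quick check from a terminal (Git Bash on Windows, Terminal on a Mac) can confirm that Git is available. This is a minimal sketch. The path it prints should correspond to the Git executable that RStudio lists under “Tools” > “Global Options” > “Git/SVN”, although the exact form of the path may differ on your machine.

```shell
# Print the installed Git version; an error here means Git is not on your PATH.
git --version

# Show where the Git executable lives.
command -v git
```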

5.1.2 Create a GitHub account

  1. Access the GitHub page.

  2. Click on “Sign Up for GitHub.”

  3. Fill out the “Create your account” forms.

  4. A verification message will be sent to your email address. Check your inbox for a “Please verify your email address” message and click on the “Verify email address” button.

If you already have a GitHub account, confirm you know your username and password by logging in at GitHub.

5.2 What is version control?

Version control is a best practice for reproducible analyses and is widely used in industry and research (i.e., you will need to know how to use version control in your future job).

The purpose of version control is to keep track of changes to your files over time, so that you can recall specific versions at any point in your project.

Git is an open source version control system that is very popular: 58% of data scientists use Git (Beckman et al. 2020). Other version control systems are also available (e.g., Perforce).
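To make “recalling a specific version” concrete, here is a small sketch that records two versions of a file and then reads back the older one. It runs in a throwaway directory. The file name `notes.txt`, the variable `demo_dir`, and the name and email are placeholders chosen for illustration, not part of the course setup.

```shell
set -e  # stop at the first error

# Work in a throwaway directory so nothing else on your machine is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init -q
# Identity used for this demo repository only (placeholder values).
git config user.name "Demo Student"
git config user.email "student@example.com"

# Version 1 of the file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Version 2 of the file (-a stages the already-tracked file before committing).
echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# List the recorded versions, newest first.
git log --oneline

# Recall the file as it was one commit ago, without changing the working copy.
git show HEAD~1:notes.txt
```

The last command prints the contents of the first version, while `notes.txt` on disk still holds the second.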

5.3 Submitting assignments

5.3.1 Join our GitHub classroom

  1. Access our first assignment and click on “Accept this assignment”

  2. A window with information about what GitHub Classroom wants to access from your GitHub profile will appear. Click on “Authorize github”.

5.3.3 Clone assignment repository

  1. Go to our first assignment GitHub repository

  2. Click on the “Code” button and copy the Git URL (e.g., https://github.com/ua-dt-sci/test-assignment-yournamehere.git)

  3. Open RStudio

  4. Go to “File” > “New Project…”

  5. In the pop-up window, select “Version Control”

  6. Then choose “Git”

  7. In the “Repository URL:” field enter the link to the first assignment repository from your GitHub account.

  8. Click “Create”
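For reference, the RStudio steps above correspond roughly to a single `git clone` command. The sketch below uses a local, empty repository as a stand-in for GitHub so that it runs without a network connection. The names `work_dir`, `remote.git`, and `test-assignment` are hypothetical. For the real assignment, `repo_url` would be the URL copied from the “Code” button.

```shell
set -e  # stop at the first error

# Stand-in for the GitHub repository (assumption: used only so the sketch runs offline).
work_dir="$(mktemp -d)"
git init -q --bare "$work_dir/remote.git"
repo_url="$work_dir/remote.git"

# Copy the repository into a new folder, as RStudio does when you click "Create".
cd "$work_dir"
git clone -q "$repo_url" test-assignment
cd test-assignment

# Confirm which remote repository the local copy is linked to.
git remote -v
```

Because the stand-in repository is empty, Git warns that you appear to have cloned an empty repository. A real assignment repository would already contain files.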

5.3.4 Modify files

The first assignment is a test assignment to make sure you are set up to submit all of your assignments for this class. For it, you need to modify README.md only. For other assignments, you will need to edit .R scripts.

5.3.5 Commit changes

  1. On the top right panel in RStudio (i.e., Environment quadrant), click on the “Git” tab

  2. You will see a list of files, indicating which files have been modified (a blue “M” shows next to each modified file).

  3. Click on “Commit” on the top of this tab

  4. A new window will pop up. Stage the files you want to commit (click on the check box next to each file) and enter a commit message.

  5. Press “Commit” and, if everything looks good, close the commit window.

  6. Click “Push” at the top right
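The same stage, commit, and push sequence can be sketched on the command line. This is an illustration under stated assumptions, not a required workflow for the class: a local empty repository stands in for GitHub so the sketch runs offline, and the paths, name, email, and commit message are placeholders.

```shell
set -e  # stop at the first error

# Stand-in remote and local clone (hypothetical paths, used only for this demo).
work_dir="$(mktemp -d)"
git init -q --bare "$work_dir/remote.git"
git clone -q "$work_dir/remote.git" "$work_dir/assignment"
cd "$work_dir/assignment"
git config user.name "Demo Student"
git config user.email "student@example.com"

# Edit a file, as in "Modify files".
echo "My test assignment" > README.md

# See which files changed. A new file is listed as "??";
# an edited file that Git already tracks is listed with "M".
git status --short

# Stage the file and record it with a commit message.
git add README.md
git commit -q -m "Update README for test assignment"

# Send the commit on the current branch to the remote repository.
git push -q origin HEAD
```

After the push, the commit exists in the remote repository as well as in the local copy, which is what makes the submission visible on GitHub in the real assignment.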

5.4 DATA CHALLENGE 01

Accept data challenge 01 assignment

References

Beckman, Matthew D, Mine Çetinkaya-Rundel, Nicholas J Horton, Colin W Rundel, Adam J Sullivan, and Maria Tackett. 2020. “Implementing Version Control with Git as a Learning Objective in Statistics Courses.” arXiv Preprint arXiv:2001.01988.