Module 4 R Basics
4.1 Before Class #4
Read "A Million Lines of Bad Code," a blog post by David Robinson. (549 words, 5 minutes)
Read "What is Statistics Good For?" (398 words, 3 minutes)
4.2 Dataframes
You will rarely work with individual numeric values, or even individual numeric vectors. More often, information is organized in dataframes, which are R’s version of a spreadsheet.
Let’s go back to my imaginary pets’ ages (make sure you have the correct vector in your global environment).
We will now create a vector of strings (characters) that holds my imaginary pets’ names. We have to be careful to keep the names in the same order as the my_pets_ages vector.
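Here is a minimal sketch of what the two vectors could look like. The values are taken from the dataframe printed further down; the exact code used to create them is an assumption.

```r
# Ages, in the order they appear in the printed dataframe (assumed)
my_pets_ages <- c(8, 2, 6, 3, 1)

# Names, in the same order as the ages
my_pets_names <- c("Daisy", "Violet", "Lily", "Iris", "Poppy")

# Both vectors should have the same length before they go into a dataframe
length(my_pets_names) == length(my_pets_ages)
```

If the two vectors had different lengths, data.frame() would either stop with an error or recycle the shorter one, so it is worth checking first.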
Let’s now create a dataframe that contains info about my pets.
# create dataframe
my_pets <- data.frame(name = my_pets_names, age = my_pets_ages)
# print out dataframe
my_pets
##     name age
## 1  Daisy   8
## 2 Violet   2
## 3   Lily   6
## 4   Iris   3
## 5  Poppy   1
CHALLENGE
There are a number of functions you can run on dataframes. Try running the following functions on my_pets:
summary()
nrow()
ncol()
dim()
What other functions do you know of or can you think of?
4.3 Slicing your dataframe
There are different ways you can slice or subset your dataframe.
You can use indices for rows and columns.
##    name age
## 1 Daisy   8
## [1] "Daisy" "Violet" "Lily" "Iris" "Poppy"
## [1] "Daisy"
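The output above is consistent with index-based calls like the ones sketched here. The dataframe is rebuilt so the block runs on its own, and the variable names first_row, all_names, and first_name are made up for illustration.

```r
# Rebuild the dataframe so this block is self-contained
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# my_pets[row_number, column_number]
first_row <- my_pets[1, ]    # first row, all columns
all_names <- my_pets[, 1]    # all rows, first column
first_name <- my_pets[1, 1]  # first row, first column

first_row
all_names
first_name
```

Leaving the row or column position empty means "give me all of them".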
You can use a column name or a row name instead of an index.
## [1] 8 2 6 3 1
##    name age
## 1 Daisy   8
## [1] 8
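Here is a sketch of name-based calls that would give output like the above. It assumes the default row names ("1", "2", ...), which R assigns when none are provided; the variable names are hypothetical.

```r
# Rebuild the dataframe so this block is self-contained
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# Use the column name instead of the column index
all_ages <- my_pets[, "age"]

# Use the row name (a string) instead of the row index
daisy_row <- my_pets["1", ]

# Combine a row name and a column name
daisy_age <- my_pets["1", "age"]

all_ages
daisy_row
daisy_age
```

Column names tend to be safer than indices, because they keep working if the column order changes.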
Or you can use $ to retrieve values from a column.
## [1] 8 2 6 3 1
## [1] 8
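A sketch of how $ could produce the two results above: $ returns the whole column as a vector, which you can then index like any other vector. The variable names are hypothetical.

```r
# Rebuild the dataframe so this block is self-contained
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# Retrieve the whole age column as a numeric vector
all_ages <- my_pets$age

# Index into that vector to get a single value
first_age <- my_pets$age[1]

all_ages
first_age
```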
You can also use comparisons to filter your dataframe. The which() function returns the positions where a comparison is true.
## [1] 1
# use which() inside dataframe indexing my_pets[row_number, column_number]
my_pets[which(my_pets$age == 8),]
##    name age
## 1 Daisy   8
## [1] "Daisy"
## [1] "Daisy"
## [1] "Daisy"
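There are several ways to get from a comparison to a single value like "Daisy". The sketch below shows three that are equivalent for this data; whether these are the exact calls behind the output above is an assumption, and the variable names are made up.

```r
# Rebuild the dataframe so this block is self-contained
my_pets <- data.frame(
  name = c("Daisy", "Violet", "Lily", "Iris", "Poppy"),
  age = c(8, 2, 6, 3, 1)
)

# which() gives the row position(s) where the comparison is TRUE
matching_rows <- which(my_pets$age == 8)

# Option 1: row positions from which(), column picked by name
name_a <- my_pets[which(my_pets$age == 8), "name"]

# Option 2: logical vector used directly as the row filter
name_b <- my_pets[my_pets$age == 8, "name"]

# Option 3: pull the column with $ first, then filter the vector
name_c <- my_pets$name[my_pets$age == 8]

matching_rows
name_a
name_b
name_c
```

One difference worth knowing: if the column contains missing values (NA), filtering with a logical vector keeps NA entries in the result, while which() drops them.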
CHALLENGE
Print out the names of the pets that are older than 3.
4.4 Adding new variables (i.e., columns) to your dataframe
So far the my_pets dataframe has two columns: name and age.
Let’s add a third column with the pets’ ages in human years. For that, we are going to use $ with a variable (or column) name that does not exist in our dataframe yet. We will then assign to this variable the values in the age column multiplied by 4.
# create new column called human_years
my_pets$human_years <- my_pets$age * 4
# print dataframe
my_pets
##     name age human_years
## 1  Daisy   8          32
## 2 Violet   2           8
## 3   Lily   6          24
## 4   Iris   3          12
## 5  Poppy   1           4
Inspect the new my_pets dataframe. What dimensions does it have now? How could you get just the human years values from the dataframe?
4.5 Descriptive stats on dataframes
Let’s explore some functions for descriptive statistics.
CHALLENGE
Try running the following functions on my_pets$age and my_pets$human_years:
mean()
sd()
median()
max()
min()
range()
What other functions do you know of or can you think of?
4.6 Note on coding style
Coding style refers to how you name your objects and functions, how you comment your code, how you use spacing throughout your code, etc. If your coding style is consistent, your code is easier to read and, as a result, easier to debug. Here are some guides to help you develop your own coding style: