3. Baby steps: Basics of coding in RStudio, part 2

Mason A. Wirtz https://masonwirtz.github.io

Exercise 1

Alright, so we’ve gotten to know a few functions. Let’s go ahead and review some of the important ones, specifically the ones that we use really often, like the summary statistics.

Compute the mean, sd, min and max of the ageOfVampire variable in the Vampires data frame.


SOLUTION

mean(Vampires$ageOfVampire)
[1] 84.5
sd(Vampires$ageOfVampire)
[1] 32.79366
min(Vampires$ageOfVampire)
[1] 14
max(Vampires$ageOfVampire)
[1] 198

Remember, when we want to reach into a data frame, we need to use the $ operator. It tells R to pull the variable named after the $ out of the data frame, so the function is carried out on that variable only.
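To see what $ does on its own, here is a minimal sketch on a made-up toy data frame (the name toy and its values are hypothetical, not part of the Vampires data). toy$age returns the age column as an ordinary numeric vector, and the summary functions then work on that vector.

```r
# Hypothetical toy data frame (made-up values, NOT the Vampires data)
toy = data.frame(
  name = c("Vlad", "Carmilla", "Lestat"),
  age  = c(120, 80, 40)
)

# toy$age pulls the age column out as a plain numeric vector ...
ages = toy$age
print(ages)

# ... so the summary functions run on that column only
mean(toy$age)  # 80
sd(toy$age)    # 40
min(toy$age)   # 40
max(toy$age)   # 120
```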

Exercise 2

Nice job, you’re doing fantastic! So, since you were able to complete the last activity, let’s kick it up a notch and get to some fun stuff.

You’ve had a hard day, because, let’s be honest, that’s academia. You need some encouragement, but no one is home to tell you how amazing you are. Let’s fix this!

Load in the package called praise and then call the praise() function. Run this as many times as you need until you feel like the awesomest person on Earth, because you are!!


SOLUTION

# install.packages("praise")
library(praise)
praise()
[1] "You are stupendous!"

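Remember, if we haven’t installed a package yet, we ALWAYS need to run the install.packages() function first, with the package name in quotes inside the parentheses. After that, we need to LOAD the package with library(), because otherwise we have ‘downloaded’ it, but we haven’t actually ‘opened’ it yet. Installing only has to happen once per computer, but loading has to happen in every new R session.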
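If you want R to handle the "install once, load every time" rule for you, one option is a small helper like the one below. The name load_or_install() is hypothetical (it is not from any package); it only combines base R functions: requireNamespace() checks whether a package is installed, install.packages() downloads it if not, and library() loads it. The repos URL is an assumption; any CRAN mirror should work.

```r
# Hypothetical helper (not part of any package): install a package only if
# it is missing, then load it for the current session.
load_or_install = function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of an error) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # 'download' step: once per computer
  }
  # 'open' step: needed in every new R session
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage (needs an internet connection the first time):
# load_or_install("praise")
# praise()
```

With this sketch, the download only happens the first time; in later sessions the helper just loads the package.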

Exercise 3

Alright, so now that we know how to load in packages and call some useful functions, what happens if we forget a function, or want to do something but don’t know a helpful function for it? Well, GOOGLE is our friend!

In our Vampires data frame, we want to know how many male and female vampires there are. There are a few important steps we need to take to do this.

Exercise 3.1

FIRST, we NEED to make sure that all of our variables that should be treated as factor vectors are, indeed, factors. If you have read in the Vampires data set, chances are the gender variable was saved as a character vector, which we don’t want.

Go on Google and try to find out which function we can use to change a CHARACTER vector in a data frame to a FACTOR vector (googling something like “change character to factor in r” should do the trick). Your GOAL is to change the gender variable in the Vampires data frame from a CHARACTER vector to a FACTOR vector (you can check whether it worked with class(Vampires$gender)).


SOLUTION

So, there are actually a lot of ways to do this, depending on whether you are using the tidyverse (we will learn about this in the next section) or base R. I’ll show you a base R example.

So, in base R, we work with our $ operator. We apply as.factor() to Vampires$gender and assign the result back to Vampires$gender. Since gender already exists in the Vampires data frame, this assignment overwrites the character vector with the new factor vector.

Vampires$gender = as.factor(Vampires$gender)
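Here is the same idea on a hypothetical toy data frame (toy and its values are made up for illustration), so you can watch the class change before and after the conversion. Note that as.factor() sorts the levels, usually alphabetically, by default.

```r
# Hypothetical toy data frame standing in for Vampires
toy = data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

class(toy$gender)   # "character"

# Overwrite the column with its factor version
toy$gender = as.factor(toy$gender)

class(toy$gender)   # "factor"
levels(toy$gender)  # "Female" "Male"
```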

Exercise 3.2

Awesome, you’re doing so well! Now that we have changed our gender variable to a factor, we can count how many observations fall into each factor level (in our case, I only entered two levels, female and male, for the sake of simplicity). Use the table() function to count the observations per level of the factor gender (what do we need to feed into the table() function to make it do this?)


SOLUTION

You guessed it! All we need to do is enter the classic Vampires$gender into the table() function, and it counts how many vampires fall into each factor level.

table(Vampires$gender)

Female   Male 
    56     44 
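Counting observations per level is not the same as counting levels. A minimal sketch on a hypothetical toy factor shows the difference: nlevels() and levels() describe the levels themselves, while table() counts how often each level occurs.

```r
# Hypothetical toy factor (made-up values)
gender = factor(c("Male", "Female", "Female", "Male", "Female"))

nlevels(gender)  # 2: how many different levels there are
levels(gender)   # "Female" "Male"

counts = table(gender)  # how many observations fall into each level
counts                  # Female: 3, Male: 2
```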

Exercise 3.3

Let’s use what we just learned to answer the following questions:

  1. How many vampires in the data frame are dead, and how many alive?

  2. How many vampires were born on each continent?

  3. How many vampires are married and how many divorced?


SOLUTION

  1. How many vampires in the data frame are dead, and how many alive?

Well, we first have to convert our variable to a factor, then use the table() function.

Vampires$deadOrAlive = as.factor(Vampires$deadOrAlive)
table(Vampires$deadOrAlive)

Alive  Dead 
   59    41 

  2. How many vampires were born on each continent?

Vampires$bornIn = as.factor(Vampires$bornIn)
table(Vampires$bornIn)

       Africa    Antarctica          Asia     Australia        Europa 
            2            30             2            15             5 
North America South America 
           19            27 

  3. How many vampires are married and how many divorced?

Vampires$maritalStatus = as.factor(Vampires$maritalStatus)
table(Vampires$maritalStatus)

Divorced  Married   Single 
      47       20       33 

Oh my God, why are so many divorced?!
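In Exercise 3.3 we repeated the same as.factor() line for every variable. As a sketch (using a hypothetical toy data frame and a column list you would adapt to your own data), base R's lapply() can convert several columns in one go:

```r
# Hypothetical mini data frame with three character columns
toy = data.frame(
  deadOrAlive   = c("Dead", "Alive", "Alive", "Dead"),
  bornIn        = c("Asia", "Africa", "Asia", "Antarctica"),
  maritalStatus = c("Single", "Married", "Divorced", "Divorced"),
  stringsAsFactors = FALSE
)

cols = c("deadOrAlive", "bornIn", "maritalStatus")

# lapply() applies as.factor() to each selected column; assigning the result
# back to toy[cols] overwrites those columns in one step
toy[cols] = lapply(toy[cols], as.factor)

sapply(toy, class)        # every column is now "factor"
table(toy$maritalStatus)  # Divorced 2, Married 1, Single 1
```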

Exercise 4

Alright, we’re going to do some really fun statistics, cause why not?

Try to do the following (feel free to group up for these!)

  1. Which variables in the Vampires data frame are NUMERIC?

  2. Choose two NUMERIC variables and run a correlation using the cor.test() function. This is a fantastic chance to use the help system (type ?cor.test in the console) to find out what you should enter into the cor.test() function!

  3. Install and load the package report. Go back and save your correlation test as the variable cor. Then run report(cor) and thank me later.

  4. INTERMEDIATE: Change the type of the correlation to a Spearman’s correlation.


SOLUTION

  1. Which variables in the Vampires data frame are NUMERIC?

So, this one is pretty easy, but we need to know the right function! We can use the str() function, because it shows us the class of every variable in a data frame.

str(Vampires)
spec_tbl_df [100 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ idVampire          : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
 $ gender             : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 1 1 2 2 ...
 $ ageOfVampire       : num [1:100] 89 192 67 23 40 105 58 67 88 122 ...
 $ deadOrAlive        : Factor w/ 2 levels "Alive","Dead": 2 2 2 2 1 1 1 1 1 2 ...
 $ hasFangs           : chr [1:100] "No" "Yes" "Yes" "No" ...
 $ bornIn             : Factor w/ 7 levels "Africa","Antarctica",..: 6 6 7 5 4 2 4 6 2 2 ...
 $ vampType           : chr [1:100] "hybrid" "hybrid" "sanguinarian" "psychic" ...
 $ wellbeing          : num [1:100] 61.6 71.4 64.2 24.7 47.2 ...
 $ maritalStatus      : Factor w/ 3 levels "Divorced","Married",..: 3 1 1 2 1 2 3 1 3 3 ...
 $ employment         : chr [1:100] "Employed" "Employed" "Not Employed" "Employed" ...
 $ income             : num [1:100] 131670 153860 154087 113842 144047 ...
 $ visitedCities      : num [1:100] 10 4 3 97 33 41 43 119 16 11 ...
 $ numberOfChildren   : num [1:100] 3 1 1 2 3 1 5 1 4 1 ...
 $ numberChangedToVamp: num [1:100] 23 7 9 11 3 5 13 21 4 8 ...
 - attr(*, "spec")=
  .. cols(
  ..   idVampire = col_double(),
  ..   gender = col_character(),
  ..   ageOfVampire = col_double(),
  ..   deadOrAlive = col_character(),
  ..   hasFangs = col_character(),
  ..   bornIn = col_character(),
  ..   vampType = col_character(),
  ..   wellbeing = col_double(),
  ..   maritalStatus = col_character(),
  ..   employment = col_character(),
  ..   income = col_double(),
  ..   visitedCities = col_double(),
  ..   numberOfChildren = col_double(),
  ..   numberChangedToVamp = col_double()
  .. )
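 - attr(*, "problems")=<externalptr>

In this output, the variables marked num are the numeric ones: idVampire, ageOfVampire, wellbeing, income, visitedCities, numberOfChildren and numberChangedToVamp (although idVampire is just an ID number).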
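Reading the str() output works, but you can also ask R directly which columns are numeric. A minimal sketch on a hypothetical toy data frame: sapply() applies is.numeric() to every column and returns one TRUE/FALSE per column, which we can then use to pick out the column names. Factors and character columns count as non-numeric here.

```r
# Hypothetical toy data frame mixing numeric and non-numeric columns
toy = data.frame(
  gender = factor(c("Female", "Male", "Male")),
  age    = c(100, 200, 50),
  income = c(120000, 150000, 90000),
  fangs  = c("No", "Yes", "Yes")
)

is_num = sapply(toy, is.numeric)  # one TRUE/FALSE per column
is_num
names(toy)[is_num]                # "age" "income"
```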

  2. Choose two NUMERIC variables and run a correlation using the cor.test() function. This is a fantastic chance to use the help system (type ?cor.test in the console) to find out what you should enter into the cor.test() function!

So, if you know a little about statistics, you know that a correlation measures the strength (and direction) of the association between two variables. Logically, this means we need to enter two variables into the correlation analysis. Let’s say we want to know whether income correlates with wellbeing in vampires. REMEMBER, we need to tell R which data frame our variables come from using the $ operator!!

cor.test(Vampires$wellbeing, Vampires$income)

    Pearson's product-moment correlation

data:  Vampires$wellbeing and Vampires$income
t = 2.5368, df = 98, p-value = 0.01276
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.05447568 0.42398305
sample estimates:
      cor 
0.2482377 

Nice, looks like there is something going on there! (Remember, the data here are simulated anew every time I rebuild the website, so the results you get could be entirely different. Don’t let this scare you!! If you got a result, then you did it just fine, go you!)
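If you want to convince yourself what cor.test() is computing, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks this on two short made-up vectors (x and y are hypothetical, not the Vampires data); cor() and the estimate stored by cor.test() should agree with the hand computation.

```r
# Hypothetical toy vectors (made-up values)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

# Pearson's r = cov(x, y) / (sd(x) * sd(y))
r_by_hand = cov(x, y) / (sd(x) * sd(y))

r_by_hand                    # about 0.775
cor(x, y)                    # same value
unname(cor.test(x, y)$estimate)  # same value again
```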

  3. Install and load the package report. Go back and save your correlation test as the variable cor. Then run report(cor) and thank me later.
# install.packages("report")
library(report)

cor = cor.test(Vampires$income, Vampires$wellbeing)
report(cor)
Effect sizes were labelled following Funder's (2019) recommendations.

The Pearson's product-moment correlation between Vampires$income and
Vampires$wellbeing is positive, statistically significant, and medium
(r = 0.25, 95% CI [0.05, 0.42], t(98) = 2.54, p = 0.013)
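report() reads everything it needs from the saved test object. If you ever want individual numbers instead of the full sentence, the object returned by cor.test() is a list of class htest, so you can reach into it with the same $ operator we use for data frames. A sketch on made-up vectors (res, x and y are hypothetical names):

```r
# Hypothetical toy vectors (made-up values)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

res = cor.test(x, y)  # an object of class "htest" (a list)

class(res)    # "htest"
res$estimate  # the correlation r (named "cor")
res$p.value   # the p-value
res$conf.int  # the 95% confidence interval
```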
  4. INTERMEDIATE: Change the type of the correlation to a Spearman’s correlation.
cor.test(Vampires$income, Vampires$wellbeing, method = "spearman")

    Spearman's rank correlation rho

data:  Vampires$income and Vampires$wellbeing
S = 119390, p-value = 0.004373
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.2835884
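Spearman's rho is simply Pearson's r computed on the ranks of the data, which is why it only assumes a monotonic relationship and is less sensitive to outliers. The sketch below (made-up vectors again, hypothetical names) checks this with rank(); by default rank() gives tied values their average rank, which matches what cor() does for method = "spearman".

```r
# Hypothetical toy vectors (made-up values, y contains ties)
x = c(1, 2, 3, 4, 5)
y = c(2, 4, 5, 4, 5)

# Spearman's rho = Pearson's r on the ranks
rho_by_hand = cor(rank(x), rank(y))
rho_builtin = cor(x, y, method = "spearman")

all.equal(rho_by_hand, rho_builtin)  # TRUE
```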