Homework 5

Key

Click link above for example solutions to this assignment.

Instructions

Answer each of the following questions. Be sure to display all your code in the rendered version (use echo: true throughout). You can make this a global option for your whole document by putting it directly in the YAML of your qmd:

    title: "Homework 5"
    execute:
      echo: true
      warning: false

Download the billboard data set introduced in lecture to the same folder where you’re saving your qmd for this homework, or set the file path in read_csv to the correct location.
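
If read_csv cannot find the file, a quick check like the one below can help. This is only a minimal sketch: the file name billboard.csv is an assumption, so substitute the name of the file you actually downloaded.

```r
library(readr)

# Assumed file name: replace with the actual name of the file you downloaded.
billboard_path <- "billboard.csv"

# By default, a relative path is resolved from the folder that contains the qmd
# when the document renders.
if (file.exists(billboard_path)) {
  billboard_raw <- read_csv(billboard_path, show_col_types = FALSE)
} else {
  # Print where R is looking, which helps when the path is wrong.
  message("No file found at ", normalizePath(billboard_path, mustWork = FALSE))
}
```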

Exercises

  1. Read in the data, clean up the names, and pivot it so that the first few rows look like this:
> # A tibble: 5,307 × 6
>    artist  track                   time   date_entered  week  rank
>    <chr>   <chr>                   <time> <date>       <int> <dbl>
>  1 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       1    87
>  2 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       2    82
>  3 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       3    72
>  4 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       4    77
>  5 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       5    87
>  6 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       6    94
>  7 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       7    99
>  8 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       1    91
>  9 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       2    87
> 10 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       3    92
> # ℹ 5,297 more rows
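
As a reminder of how a wide-to-long pivot behaves, here is a small sketch on made-up data. The tibble songs_wide and its column names are hypothetical and only loosely mimic the shape of the billboard file, so treat this as an illustration of the idea and not as the expected solution.

```r
library(dplyr)
library(tidyr)

# Toy data (made up for illustration): one row per song, one column per week.
songs_wide <- tibble(
  Artist = c("Artist A", "Artist B"),
  Track  = c("Song 1", "Song 2"),
  wk1 = c(87, 91),
  wk2 = c(82, NA),
  wk3 = c(72, NA)
)

songs_long <- songs_wide |>
  rename_with(tolower) |>                        # simple name clean-up
  pivot_longer(
    cols = starts_with("wk"),                    # the week columns
    names_to = "week",
    names_prefix = "wk",                         # drop the "wk" prefix
    names_transform = list(week = as.integer),   # store week as an integer
    values_to = "rank",
    values_drop_na = TRUE                        # drop weeks with no chart position
  )

songs_long
```

Dropping the NA values is a design choice: it leaves one row for each week a song was actually on the chart.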
  2. Create a variable named date that gives the calendar date of each week on the chart, based on date_entered. For instance, if date_entered is 1-13-2000, then date is 1-13-2000 when week is 1 and 1-20-2000 when week is 2. (Hint: Try using if_else() here.) The first few rows should look something like this:
billboard_tidy_date
> # A tibble: 5,307 × 7
>    artist  track                   time   date_entered  week  rank date      
>    <chr>   <chr>                   <time> <date>       <int> <dbl> <date>    
>  1 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       1    87 2000-02-26
>  2 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       2    82 2000-03-04
>  3 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       3    72 2000-03-11
>  4 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       4    77 2000-03-18
>  5 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       5    87 2000-03-25
>  6 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       6    94 2000-04-01
>  7 2 Pac   Baby Don't Cry (Keep... 04:22  2000-02-26       7    99 2000-04-08
>  8 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       1    91 2000-09-02
>  9 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       2    87 2000-09-09
> 10 2Ge+her The Hardest Part Of ... 03:15  2000-09-02       3    92 2000-09-16
> # ℹ 5,297 more rows

If you get stuck on Problem 2, feel free to share in your submitted homework what you figured out and where you got stuck. You are also welcome to create a different column using any of the skills we have learned; describe what column you are creating and how you did it. You do not need Problem 2 in order to do the rest of the homework.
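
If date arithmetic in R is unfamiliar, the sketch below shows the idea on made-up data, using the example dates from Problem 2. The tibble toy and the column names date_plain and date_ifelse are hypothetical, and this is only one of several ways to approach it.

```r
library(dplyr)

# Toy data (made up): one song that entered the chart on 2000-01-13.
toy <- tibble(
  date_entered = as.Date("2000-01-13"),
  week = 1:3
)

toy_dates <- toy |>
  mutate(
    # Adding a number to a Date adds that many days.
    date_plain = date_entered + 7 * (week - 1),
    # The same idea written with if_else(): week 1 keeps the entry date.
    date_ifelse = if_else(week == 1, date_entered, date_entered + 7 * (week - 1))
  )

toy_dates
```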

  3. Create a dataset of the song(s) with the most weeks in the top 3 by month for the year 2000. The final dataset for Problem 3 should look like this, though you can call the columns whatever you want:
> # A tibble: 19 × 4
>    month artist              track                   peak_weeks
>    <dbl> <chr>               <chr>                        <dbl>
>  1     1 Aguilera, Christina What A Girl Wants                3
>  2     2 Savage Garden       I Knew I Loved You               4
>  3     3 Lonestar            Amazed                           4
>  4     4 Hill, Faith         Breathe                          5
>  5     4 Santana             Maria, Maria                     5
>  6     5 Hill, Faith         Breathe                          4
>  7     5 Santana             Maria, Maria                     4
>  8     6 Aaliyah             Try Again                        2
>  9     6 Anthony, Marc       You Sang To Me                   2
> 10     6 Hill, Faith         Breathe                          2
> 11     6 Santana             Maria, Maria                     2
> 12     6 Vertical Horizon    Everything You Want              2
> 13     7 Aaliyah             Try Again                        4
> 14     8 Sisqo               Incomplete                       4
> 15     8 matchbox twenty     Bent                             4
> 16     9 Janet               Doesn't Really Matte...          5
> 17    10 Madonna             Music                            4
> 18    11 Creed               With Arms Wide Open              4
> 19    12 Destiny's Child     Independent Women Pa...          5
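
Notice that several months in the table above list more than one song, so ties are kept. A minimal sketch of a "top rows per group, keeping ties" pattern on made-up data follows. The tibble toy_chart is hypothetical, and this is one possible approach, not necessarily the one used in the key.

```r
library(dplyr)

# Toy data (made up): one row per song per week, with the month of that week.
toy_chart <- tibble(
  month = c(1, 1, 1, 1, 2, 2, 2, 2),
  track = c("A", "A", "B", "C", "B", "B", "C", "C"),
  rank  = c(2, 1, 3, 10, 1, 1, 2, 3)
)

toy_top <- toy_chart |>
  filter(rank <= 3) |>                           # keep only top-3 weeks
  count(month, track, name = "peak_weeks") |>    # top-3 weeks per song per month
  slice_max(peak_weeks, n = 1, by = month)       # ties are kept by default

toy_top
```

In month 2 of the toy data, songs B and C both have two top-3 weeks, so both rows are returned.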
  4. Pick one month of 2000 and visualize the entire charting trajectory of the songs that spent at least 1 week in the top 3 during that month. Hint: Start with the data set created in question 3. An example of what this could look like for April is provided below. Your plot does not have to look just like this one; it should look polished and be easy to interpret, but it does not have to be fancy.

Note: This is one way to do this problem but there are many approaches to every coding puzzle in R. If this skeleton code is useful, use it. If not, I’m happy to chat through your approach in office hours 🤓

Replace all instances of function, variable, and value with what you think the correct answer should be. Additional hints for each numbered code annotation are listed below the code.

billboard_top3_month_viz <- billboard_tidy_date |> 
  mutate(month = function(variable), # <1>
         year = function(variable), # <2>
         top3 = if_else(variable <= value & variable == value, 1, 0)) |> # <3>
  mutate(month_peak = ifelse(variable > 0, variable, NA), # <4>
         .by = c(month, artist, track)) |> 
  filter(function(month_peak == "value"), # <5>
         .by = c(track, artist)) 

library(ggrepel) # <6>
ggplot(billboard_top3_month_viz, 
       aes(variable, variable, group = variable, color = variable)) + # <7>
  function + # <8>
  geom_label_repel(data = billboard_top3_month_viz |> function(variable, by = track), # <9>
                   mapping = aes(label = variable)) # <10>

  1. What month is associated with each row’s chart position?
  2. Are there multiple years in this dataset?! Given that we’re interested in the Billboard Top 100 for 2000, it might be useful to have a variable that allows us to discriminate between years.
  3. To create an indicator of top 3 status, you’ll need to provide two conditions (one variable needs to be less than or equal to a certain value and another needs to equal a certain value).
  4. You need to create a variable that reflects when a particular song charted in the top 3 and is NA otherwise.
  5. In order to get the entire trajectory of a song, we can’t simply filter for the month when it peaked. Then we’d only be able to plot that snippet of its trajectory. Hint: Which function returns TRUE if even 1 element it’s given is TRUE?
  6. Load this package if you want to add labels.
  7. You want to visualize the ranking trajectory of a song over time. Hint: group is the variable you want to visualize.
  8. What geometry would be appropriate here?
  9. To properly label this plot, you’ll need to subset the data; otherwise it will try to plot a label for every date available.
  10. Which variable’s text are you trying to label?
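
The hints about filtering by group and about subsetting the data for labels can be sketched on made-up data as follows. The tibble toy_ranks and the object names are hypothetical. This shows one possible reading of those hints and is not a filled-in version of the skeleton.

```r
library(dplyr)

# Toy data (made up): weekly ranks for two songs.
toy_ranks <- tibble(
  track = c("A", "A", "A", "B", "B"),
  date  = as.Date(c("2000-04-01", "2000-04-08", "2000-04-15",
                    "2000-04-01", "2000-04-08")),
  rank  = c(10, 3, 8, 20, 15)
)

# With .by, the condition is evaluated once per group, so a song's rows are
# either all kept or all dropped.
whole_trajectory <- toy_ranks |>
  filter(any(rank <= 3), .by = track)

# One row per song (its latest date), e.g. to place a single label per line.
label_rows <- whole_trajectory |>
  slice_max(date, by = track)

whole_trajectory
label_rows
```

Song A reaches rank 3 once, so all three of its rows are kept. Song B never does, so it is dropped entirely.
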
Before you submit:

Have you remembered to add embed-resources: true to your YAML?