Chapter 4 Lab 1: Data vis, wrangling, and programming (45 pts)

You can find the repo for this lab here

As we have done previously, go to Code, download zip and put folder on your computer. Open the project in that folder and then open the .Rmd. Remember that opening the project from that folder will set that folder as the “working directory”. Doing so means that when you read in your data, R will “look” in the right place to find it. Remember to rename the .Rmd file with your name in the file name.

This lab covers workflow and functions that you have seen and a few things that haven’t been explicitly demonstrated. Using functions that haven’t been explicitly demonstrated will help you in developing your ability to use new functions and workflows. Understanding how to utilize resources like Stack Overflow and package vignettes will help you solve coding problems. You can also get function and package help from the console by typing “?”. For example, you could type “?ggplot” or “?tidyverse” into the console and a help window will appear in the lower right. Also, when you are searching for coding help it is useful to be explicit about whether you are in base R or tidyverse.

4.1 Problem 1 (1 pts)

Load the tidyverse library.

Read in the pine_nfdr csv using read_csv(). Name this “pine_nfdr”.

Make a plot with the date on the x axis, discharge on the y axis. Show the discharge of the two watersheds as a line, coloring by watershed (StationID). Use a theme to remove the grey background. Label the axes appropriately with labels and units. Bonus: change the title of the legend to “Gauging Station”

4.2 Problem 2 (1 pts)

Make a boxplot to compare the discharge of Pine to NFDR for February 2010.

Hint: use the pipe operator and the filter() function.

4.3 Problem 3 (2 pts)

Read in the flashy csv. Name this “flashy”.

Create a new df called flashy_west that includes data for: MT, ID, WY, UT, CO, NV, AZ, and NM

For only sites in flashy_west: Plot PET (Potential Evapotranspiration) on the X axis and RBI (flashiness index) on the Y axis. Color the points based on what state they are in. Use the linedraw ggplot theme.

4.4 Problem 4 (2 pts)

We want to look at the amount of snow for each site in the flashy_west df. Problem is, we are only given the average amount of total precip (PPTAVG_BASIN) and the percentage of snow (SNOW_PCT_PRECIP).

Create a new column in the df called “snow_avg_basin” and make it equal to the average total precip times the percentage of snow (careful with the percentage number).

Make a barplot showing the amount of snow for each site in MT. Put station name on the x axis and snow amount on the y. You have to add something to geom_bar() to use it for a 2 variable plot. Use “?geom_bar” in the console and the internet to investigate.

The x axis of the resulting plot looks terrible! Rotate the X axis labels so we can read them.

4.5 Problem 5 (2 pts)

Create a new tibble called “flashy_west_thin” that contains the min, max, and mean PET for each state in flashy_west. Sort/arrange the tibble by mean PET from high to low. Give your columns meaningful names within the summarize function or using rename(). You haven’t seen rename yet. Use ?rename in the console or search for examples. When searching you need to indicate that you are looking for “tidyverse rename”. Rename is part of dplyr, which is part of tidyverse.

Be sure your code outputs the tibble.

4.6 Problem 6 (2 pts)

Take the tibble from problem 5 and create a new df by first creating a new column that is the Range of the PET (max PET - min PET). Then get rid of the max PET and min PET columns so the tibble just has columns for State, mean_PET, and PET_range.

Be sure your code outputs the tibble.

Save new_tib_thin as .csv files to your folder for this lab using the write_csv function. To get help on this function type “?write_csv” into the console.

4.7 Summary

  • How long did this lab take you?
  • Is there anything in particular you would like more practice with?
  • Knit this .Rmd as an .html and submit your lab_1_your_name.html on D2L.