Chapter 8 Lab 2: Statistics in hydrology (30 pts)
The repo for this lab can be found here
Address each of the questions in the code chunk below and/or by typing outside the chunk (for written answers).
8.1 Problem 1 (5 pts)
Load the tidyverse and patchwork libraries and read in the flashy and pine_nfdr datasets.
Using the flashy dataset, generate two new dataframes. One for the “WestMnts” and one for the “NorthEast” AGGECOREGION. Name these flashy_west and flasy_ne. Next make pdfs of the average basin rainfall (PPTAVG_BASIN) for the WestMnts (flashy_west) and NorthEast (flashy_ne) agricultural ecoregions. On each pdf add vertical lines showing the mean and median. Label the x axis “Average precipitation (mm)” and the y “Density”. Set the x scale limits to 0 - 500 using xlim(c(0, 500)). Save each ggplot as an object and stack them on top of each other. Provide a caption below the figure that states which is on top, which is on bottom, and which color is the mean and which is the median.
Use patchwork to stack the NE and West ggplots.
8.2 Problem 2 (5 pts)
Calculate the SD and IQR for precipitation for the MtnsWest and Northeast ag-ecoregions. Using the SD, IQR and density plots from above, comment on the distributions of precipitation for the MtnsWest and Northeast ag-ecoregions. Which has a larger spread?
8.3 Problem 3 (5 pts)
Next, make Q-Q plots and perform a Shapiro-Wilk test for normality on the precipitation data sets for the MtnsWest and Northeast ag-ecoregions. Using the results from these tests discuss whether or not the distributions are normal. Also if you based your decision as to whether the data sets were normal on the pdfs you developed in problem 1, the Q-Q test, and Shapiro-Wilk test would each lead you to same conclusion? Please comment.
8.4 Problem 4 (5 pts)
Make a plot that shows the distribution of the data from the PINE watershed and the NFDR watershed (two pdfs on the same plot). Log the x axis, label the x axis “Flow (cfs)” and the y axis “Density”.
8.5 Problem 5 (5 pts)
You want to compare how variable the discharge is in each of the watersheds in question 4. Which measure of spread would you use and why? If you wanted to measure the central tendency which measure would you use and why?