#Packages needed for this project
library(dataRetrieval)
library(broom)
library(tidyverse)
library(lubridate)

Here we can use the dataRetrieval package to get data from the internet. The whatNWISdata tells us what data are avalailable. The readNWISmeas provides the field measurement data. Info on the package can be found here:

And a link to the USGS parameter codes

For hydrology you want the physical parameters shown here

availableData <- whatNWISdata(siteNumber = '06752280')

meas <- readNWISmeas(siteNumbers = '06752280')

#Now that we have obtained the data, let's clean this down to only the columns we want/need. Remember that select(something) will do that for us. The new command you see here "rename" will rename the column headers. What we are doing here is saying rename column 1 as Date, and so on. The next thing is that we are going to filter the data down to only include the years (2008-2020) and "Good" measurements. When the USGS takes a flow measurement they rate it as Good, Fair, Poor, or Unspecified. We only want the good ones.  
meas <- meas %>% 
  select(measurement_dateTime, gage_height_va, discharge_va,  measured_rating_diff) %>% 
  rename(Date = 1, gage_height = 2, discharge = 3, rating = 4) %>% 
  mutate(year = year(Date)) %>% 
  filter(year %in% 2008:2021, rating == "Good")

str(meas)

Now let’s fit a non-linear model to the data. Rembmer that stage-discharge relations (the rating curve) are typically non-linear. We will use nls, which you can read up on with “?nls”. In this model a1, C, and n are parameters and we have to give them initial values. After that R will find the best fit for each parameter. The equation will be Q = C(Stage - a1)^n where C is a fitting parameter, a1 is the depth at which flow is 0 (so if flow was zero at 1 ft depth a1 = 1), and n is an exponent that gives the function its concave up shape.

#Here we give a1 an initial value
a1 <- 0

#Next we run the model. This line says run a nls model where Discharge = C*(gage height - a1)^n. In R statistical models the ~ is similar to equals. It basically means predict discharge using this function. 
model <- nls(discharge~C*(gage_height-a1)^n, start = list(C = 1, n = 1.2), data = meas, trace = FALSE)

#Now that we have run the model we can get a summary of the ouput. And we can also pull out the parameters that we are intersted in (C and n).
summary(model)
C1 <- coef(model)[1]
n1 <- coef(model)[2]

#Now we can plot the data and add a geom_smooth. This is just for visual, it isn't the actual model we just fit. Next we annotate the figure with the equations (i.e., the rating curve).
ggplot(meas, aes(x = gage_height, y = discharge, color = Date)) +
  geom_point() +
  theme_bw(base_size = 16) +
  xlab("Gage height (ft)") +
  ylab("Discharge (cfs)") +
  geom_smooth(color = "red") +
  annotate("text", x = 3, y = 1800, size = 7, parse = TRUE, label = as.character(expression(paste("Q =", 5.97, "*", Stage^{2.98}))))

#Here are other functions from the broom package that make it easier to look at the output of a model.
tidy(model)
glance(model)
augment(model)

Last have a look at all of the data. What do you see? Do you think the rating curve is consistent through time? Why or why not?

all_meas <- readNWISmeas(siteNumbers = '06752280')

all_meas <- all_meas %>% 
  select(measurement_dateTime, gage_height_va, discharge_va,  measured_rating_diff) %>% 
  rename(Date = 1, gage_height = 2, discharge = 3, rating = 4) 

Let’s plot the data and have a look.

#Here we have to transform the date format in order to change the color ramp from the default (blue) to something with more contrast. 
all_meas$Date <- as.Date(all_meas$Date, format = "%m-%d-%Y-h-m-s")

#Then make the plot
ggplot(all_meas, aes(x = gage_height, y = discharge, color = Date)) +
  geom_point() +
  theme_linedraw(base_size = 16) +
  xlab("Gage height (ft)") +
  ylab("Discharge (cfs)") +
  geom_smooth(color = "red") +
  scale_color_gradient(low = 'cyan', high = 'deeppink', trans = "date")

#Remember that you can always check the structure of the data frame with:
str(all_meas)