Chapter 12 Lab 5: Low Flow Analysis (15 pts)

In this lab I am giving you most of the code. Your task will be to use that code to analyze low flows at the Gallatin Gateway gage and answer the summary questions at the end of this markdown. For you deliverable submit your answers to the summary questions in a word doc.

Link to repo here

Reading: Definitions and characteristics of low flows in EPA Environmental Modeling Community of Practice

Load packages for analysis. zoo will allow us to easily perform rolling means, and moments will allow easy calculation of skewness.

library(zoo)
library(tidyverse)
library(dataRetrieval)
library(moments)
library(patchwork)

theme_set(theme_linedraw())

12.1 What are low flow statistics?

Low flow design flows can be specified based on hydrological or biological data. Biological methods look more at water quality standards relevant to biota. The hydrologic method just looks at the statistical distribution of low flows over a period of time.

  • Just from this simple definition, can you think of a management situation where it would make sense to use the biological method? The hydrologic method? What are the advantages to each?

We will focus on hydrologic methods. You will most frequently see low flow stats in the format of xQy. So for example 7Q10 or 1Q10 are common design flows. Let’s look at the EPA definition of these and then break them down.

“The 1Q10 and 7Q10 are both hydrologically based design flows. The 1Q10 is the lowest 1-day average flow that occurs (on average) once every 10 years. The 7Q10 is the lowest 7-day average flow that occurs (on average) once every 10 years.” -EPA https://www.epa.gov/ceam/definition-and-characteristics-low-flows#1Q10

So the first number, the 7 in 7Q10 is how many days we will average flow over to calculate the statistic. Why does does this matter? Why not always use a 1 day flow record?

Then the second number is the return-interval of the flow, or the probability that a flow of that magnitude or lower will occur any given year. The 10 in 7Q10 means there is a 10 percent chance that the associated 7-day average flow or below will occur in any given year. Another way of saying this is that a flow of that magnitude or below occurs on average once every 10 years. However expressing it this way can be dangerous, especially with the opposite type of extreme flows: Floods. Why do you think it could be dangerous to say a flow of this magnitude or below will happen on average once every 10 years?

So, to calculate a 7Q10 we need: * 7-day mean-daily flows * The minumum value per year of those 7-day mean-daily flows * The return intervals of those flows minimum yearly flows

Because a 7Q10 flow means * There is a 10% chance (return interval = 10) that a river will have a average weekly flow of that level or below in a given year.

12.2 Get data

We will calculate the 7Q10 low flow statistic for the Gallatin Gateway USGS gage (06043500) using daily discharge data from 1990-2022. (parameter = 00060)

In the code below you will see that I have added a line of code to add a water year (addWaterYear). This allows us to start doing our analyes on the water year (Oct - Oct) as opposed to calendar year like we have in the past.

site_no <- "06043500"
startDate <- as_date("1990-10-01")
endDate <- as_date("2022-10-01")
parameter <- "00060"

Qdat <- readNWISdv(site = site_no, parameterCd = parameter, startDate = startDate, endDate = endDate) %>%
  addWaterYear() %>% 
  renameNWISColumns() %>% 
  select(-agency_cd)

12.3 Create the X days average flow record

Remember the 7 in 7Q10 means we are looking at the 7-day average flow. We just have daily values from the USGS gage, so we need to create this data record.

To do this we will calculate a rolling average, also called a moving-window average. This just means you grab the first 7 days, average them, then move the window of the days you are averaging forward a day, and average again… all the way through the record.

For your rolling mean you can have the window look forward, backward, or forward and backward. For example, a forward window takes the average of X number of records and places the value at the beginning. Backward places that value at the end, and both would put it in the middle. In the function we will use to do this, forward is a left align, backward is right align, and both is centered.

For example

data window = 1, 2, 3, 4, 5 (lots of values before and after this)

mean = 3

forward window/left align: 3, NA, NA, NA, NA

backward window/right align: NA, NA, NA, NA, 3

both/center align: NA, NA, 3, NA, NA

We could certainly set up some code to calculate this, but there is a nice and fast function in the zoo package for calculating rolling means. It is called rollmean().

As we write the code to do this analysis, we are going to keep in mind that we may want to calculate a different type of low flow, like a 1Q10, so we are going to store the x and y of the xQy low flow statistic as objects rather than including them several places in the code. That way we can just change them in one place and run the analysis to compute a different statistic.

Use rollmean() to create a column called xdaymean that is the 7-day rolling average.

#set x and y for xQy design flow
Xday <- 7
YrecInt <- 10

#X day rolling mean, don't fill the ends of the timeseries,
#don't ignore NAs, use a backward-looking window (right align)

12.4 Look at what a rolling mean does.

We just added a new column with the rolling mean, so let’s plot it and see what it did to the discharge record.

Let’s look at June-August 1991. You can’t see too well what is going on in the full record.

Create a line plot (geom_line) of daily flow and rolling mean flow for 6/1/1991 to 9/1/1991.

What’s the rolling mean doing?

12.5 Calculate yearly minimums

Okay, we have our X-day rolling mean. Now we need to calculate the probability that a given magnitude flow or below will happen in a given year. Because we are concerned with a given year we need the lowest flow per year.

We will calculate minimum flow per year by gouping by the waterYear column, and using the summarize function to calculate the minimum flow per water year.

12.5.1 Challenge

  • create a new df called Q_yearly_mins that has year and minimum annual Q calculated from your 7-day rolling mean.

12.6 Calculate return interval

Now that we have a df that contains our yearly minimum flows, we can calculate the return interval as

\(Return Interval = (n + 1) / rank\)

Where n is the number of records in the data (number of years) and rank is the rank of each year’s low flow (lowest flow = rank 1 and so on). We can calculate the rank with the rank() function in base R. In the rank function we will specify that in the case of a tie, the first value gets the lower rank using ties.method = “first”. That code will look like:

mutate(rank = rank(minQ, ties.method = “first”))

You will also need to create a column of return interval with:

mutate(ReturnInterval = (length(rank) + 1)/rank)

  • the length() function returns the length (n) of the column.

We can then transform that to an exceedence probability as

\(Exceedence Probability = 1 / Return Interval\)

# add rank column and return interval column (uncomment the code below)

# Q_yearly_mins <- Q_yearly_mins %>%
#                 mutate(rank = rank(minQ, ties.method = "first")) %>%
#                 mutate(ReturnInterval = (length(rank) + 1)/rank) %>%
#                 mutate(ExceedProb = 1 / ReturnInterval)
  • use patchwork to create a figure with minQ on the y and return on the x and another with minQ on the y and exceedence probability on the x.

12.7 Fit to Pearson Type III distribution

Source for these calculations: https://water.usgs.gov/osw/bulletin17b/dl_flow.pdf

We now have everything we need to calculate what the 10-year return interval flow is (the 0.1 probability flow). To do this, we have to fit a distribution to our data and then use that fitted distribution to predict the value of the 10-year return interval flow.

This may sound a little complex, but let’s think about it this way:

  • You have some data, let’s say: heights of students
  • You did some tests on it and know it is a normal distribution
  • If you measure the mean and standard deviation of that distribution, you could create a “fitted” representation of your distrubtion by generating a normal distribution with the same mean and standard deviation with the rnorm() function.
  • Now you could plot that fitted, synthetic distribution as an ECDF and read the plot to determine, say, 10% of students (0.1 probability) are at or above what height?

Assume the average height from your data was 65 inches and the standard deviation was 6 inches (this is 100% made up), let’s look at it.

# fitteddistribution <- rnorm(1000, mean = 65, sd = 6) %>%
#   as_tibble()
# 
# ggplot(fitteddistribution, aes(x = value))+
#   stat_ecdf()+
#   xlab("height (in)")+
#   ylab("probability")

To get our 10 year return period (0.1 exceedence probability) we are going to do the same thing, except we know the distribution of the data isn’t normal, so we have to use a different distribution.

There are a bunch of “extreme value” distributions used in these types of analyses. When we talk about floods we will use the Gumbel distribution, for example. For this type of analysis, it is common to use the Pearson Type III distribution.

When we used the normal distribution example, we let R produce the distribution that fit our data. In this case we will use the equation that describes the Person Type III distribution. To predict flow at a given recurrence interval we will need the mean of the logged discharges (Xbar), the frequency factor (K), the standard deviation of the log discharges (S), skewness (g), and the standard normal variate (z). We will first compute this for all of the values in our dataset to see how the fitted values fit our calculated values.

Pearson Type III

\(Flow = exp(Xbar + KS)\)

where:

Xbar = mean of the log discharge you are investigating

K = frequency factor

S = standard deviation of log discharges

Frequency Factor

\(K = (2 / g) * ((1 +( g * z)) / 6 - ((g ^ 2) / 36)) ^ 3 - 1)\)

Skewness

g = skewness() from moments package

Standard normal variate

\(z = 4.91 * ((1 / y) ^ {0.14} - (1 - (1 / y)) ^ {0.14})\)

y = recurrence interval

#Measures of the distribution
# Xbar <- mean(log(Q_yearly_mins$minQ))
# S    <- sd(log(Q_yearly_mins$minQ))
# g    <- skewness(log(Q_yearly_mins$minQ))
 
#calculate z, K, to plot the fitted Pearson Type III (uncomment the code below)

# Q_yearly_mins <- Q_yearly_mins %>%
#   mutate(z = 4.91 * ((1 / ReturnInterval) ^ 0.14 - (1 - 1 / ReturnInterval) ^ 0.14)) %>%
#   mutate(K = (2 / g) * (((1 + (g * z) / 6 - (g ^ 2) / 36) ^ 3) - 1) ) %>%
#   mutate(Qfit = exp(Xbar + K * S))

Let’s look our results and see how they fit. Plot the return interval on the x axis and flow on the y. Plot minQ, the minimum Q data, and Qfit, the “data” from the the model fit.

Above we calculated z, K and the flow for each return interval in our data record to see how the distribution fit our data. We can see it fits quite well.

We can use the same calculations as we used on the entire record to calculate a specific return period of interest. In our case, the 10 year return period for the 7Q10.

We will set y equal to YrecInt, which we set above. This way we can just change it at the top of the code to run whatever xQy metric we want.

Here I am giving you the code, for fitting these models but look at the code above where we fit the distribution to all the data and the code below where we are specfically calculating the 7Q10 and be sure you understand what the code is doing. If you don’t, reach out to me so we can step through it.

#xQy ei: 7Q10
# y = YrecInt
# 
# #Find these values based on established relationships
# z    <- 4.91 * ((1 / y) ^ 0.14 - (1 - 1 / y) ^ 0.14)
# K    <- (2 / g) * (((1 + (g * z) / 6 - (g ^ 2) / 36) ^ 3) - 1) 
# 
# PearsonxQy <- exp(Xbar + K * S)

So, our 7Q10 flow in cfs for this gage is….

#Low flow stat (7Q10 in this case)
#PearsonxQy

12.8 Deliverable - Summary questions (12 pts)

  • submit your answers to these questions as a word doc on D2L.

12.8.1 Question 1 (6 pts)

  • Interpret your plots of minQ vs return interval and minQ vs exceedence probability. What does each tell you? Using these figures estimate the probability of having a year with minimum annual flow ~ 250 cfs. What return interval would you estimate for a year with miminum annual flow ~ 250 cfs?

12.8.2 Question 2 (3 pts)

  • what is your 7Q10?
  • In the context of low flows, describe what the 7Q10 is and how it might be used in water resource management.

12.8.3 Question 3 (6 pts)

  • You are a water resource planner for the city of Bozeman. Part of your job is to monitor low flows. You have noticed that the 7-day moving average has dropped below the 7Q10. What does this tell you? And what recommendations would you make to the city regarding water use?