Chapter 14 Climate trend analysis (18 pts)

See here for the repo.

In this lab you will work with weather / climate data from the MSU weather station, which has records from about 1900 to the present. You will use the Mann-Kendall test to determine whether there are significant trends in various climate data, and you will estimate the magnitude of any trend using the Sen's slope.

We will evaluate trends over two time frames: the 100+ year record and the climate normal period (1991 – 2020). As you work on this lab, think about what an appropriate time frame would be for various questions. Longer (at least 30 y) is generally better; a few years really wouldn't cut it. But think critically about why you would use different timescales. As an example, dendro-hydrology might use thousands of years to evaluate longer-term climate trends.

We will also start to discuss functions and iteration. We will work on iteration and automation in the coming weeks.

14.1 Summary questions and deliverable

For this lab you will answer the following questions and submit your lab on Canvas as a Word doc. Insert tables and figures into your Word doc as appropriate. Always provide a caption for any tables and figures.

  1. Describe what the Mann-Kendall (MK) test does, what the Sen's slope is, and why they are appropriate for climate data. (2 pts)

  2. Provide a table (Table 1) of MK p-values and Sen’s slopes over the entire period (1900 – current). (4 pts)

  3. For the average temperature, provide a figure with time on the x-axis and average T on the y-axis. Fit a linear smooth to these data with stat_smooth(method = "lm"). (2 pts)

  4. Provide a table (Table 2) of MK p-values and Sen’s slopes over the climate normal period 1991 – 2020. Compare and contrast this to what you found (p-value and Sen’s slope) over the entire period from table 1. (5 pts)

  5. In this lab we have evaluated significance (i.e., MK p-values) of trends in climate data. For the data in Table 1, communicate: A) what this tells you about climate at the MSU weather station over the past 100+ years; B) any similarities and/or differences in the statistics (MK p-value and Sen's slope) for the entire record (1900 – current) vs. the normal period (1991 – 2020). Last, comment on what drives differences in these trends over 100 y vs. 30 y. What is/are the mechanism(s) behind what you are seeing? (5 pts)
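As a hedged sketch of the Question 3 figure: the code below assumes an annual data frame met_data_an (which you will build later in the lab), and the column names wyear and med_tavg are placeholders for whatever you name your water-year and median-average-temperature columns.

```r
library(tidyverse)

# placeholder names: wyear = water year, med_tavg = median average temperature
met_data_an %>%
  ggplot(aes(x = wyear, y = med_tavg)) +
  geom_point() +
  stat_smooth(method = "lm") + # linear fit with a confidence band
  labs(x = "Water year", y = "Median average T (deg C)")
```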

14.2 New packages

We will use the packages below. You will need to install the rnoaa package for downloading NOAA data, and the trend package for doing Mann-Kendall trend tests and computing the Sen’s slope.

rnoaa is deprecated, but still works. The replacement NOAA package is still in development and doesn't really work yet. Hence, we will install rnoaa as a remote from GitHub.

# #install.packages("remotes")
# #remotes::install_github("ropensci/rnoaa")
# #install.packages("trend")
# library(tidyverse)
# library(plotly)
# library(rnoaa) # for downloading data from GHCN
# library(trend) # for Mann-Kendall trend analysis and Sen's slope
# 
# site <- "USC00241044" # the MSU weather station
# vars <- c("prcp", "tmax", "tmin") # GHCN parameters to pull
# end <- as_date("2023-09-30") # can change to a more recent date; may or may not work since the package is deprecated

14.3 Data download, exploration and cleaning

# met_data <- meteo_pull_monitors(
#   monitors = site,
#   keep_flags = FALSE,
#   date_min = NULL,
#   date_max = end,
#   var = vars
# )

Now that you have downloaded the data, plot it: date vs. prcp, date vs. tmax, and date vs. tmin. What do you notice? What are the units? Does everything seem OK or not? Does an internet search of NOAA data and/or rnoaa help you understand what these units are?

Use this as a practice for rattling off ggplots.
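As a sketch (assuming the met_data frame from the download step above), one of these plots could be rattled off like this; repeat with prcp and tmin on the y:

```r
library(tidyverse)

# date vs. tmax, straight from the downloaded data (still in raw GHCN units)
met_data %>%
  ggplot(aes(x = date, y = tmax)) +
  geom_point() +
  labs(x = "Date", y = "tmax (raw GHCN units)")
```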

Now that you have had some batting practice with ggplot, there is another option: writing a function.

I haven’t shown you functions yet, but here is a very simple function you can use for EDA and making quick plots.

Use functions when you do something over and over again. Functions work like this.

my_function <- function(x) { do_stuff() }

OR, more generally:

my_function <- function(...) { do_stuff() }

then you would call the function. More on functions in coming weeks.

# plot_fun <- function(x, y){
#   ggplot(met_data, aes(x = x, y = y)) +
#     geom_point()
# }
# 
# # the arguments are now actually used inside aes(), so you can plot any pair
# plot_fun(met_data$date, met_data$tmax)

Now you need to adjust the data. What do you need to do?

# The units of these data are prcp = tenths of mm, snow = mm (we aren't using snow data today but just fyi), tmin = tenths of C, tmax = tenths of C. So we need to convert the prcp, tmin, and tmax data.   
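A minimal base-R sketch of the conversion, assuming a data frame holding prcp, tmax, and tmin in GHCN tenths units; dividing by 10 gives mm and degrees C. The function name tenths_to_si is a placeholder.

```r
# divide the GHCN tenths columns by 10 to get mm and degrees C
tenths_to_si <- function(df, cols = c("prcp", "tmax", "tmin")) {
  df[cols] <- df[cols] / 10
  df
}

toy <- data.frame(prcp = 250, tmax = 156, tmin = -31)
tenths_to_si(toy) # prcp = 25, tmax = 15.6, tmin = -3.1
```

With the tidyverse you could do the same thing with mutate(across(c(prcp, tmax, tmin), ~ .x / 10)).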

Add a water year column and filter the data to start on 10/1/1900. Use logic and think like a computer to reason through how to add the water year. Chat with your neighbors.
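Here is one minimal base-R sketch of the water-year logic, assuming the USGS convention that the water year starting 1 Oct is labeled with the calendar year in which it ends:

```r
# label each date with its water year: Oct-Dec belong to the following year
water_year <- function(dates) {
  yr <- as.integer(format(dates, "%Y"))
  mo <- as.integer(format(dates, "%m"))
  ifelse(mo >= 10, yr + 1L, yr)
}

water_year(as.Date(c("2000-09-30", "2000-10-01"))) # 2000 2001
```

In the tidyverse you would wrap the same test in a mutate() with lubridate's month() and year().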

14.4 Computing annual values

Now that you have a cleaned data frame, you can begin the climate data analysis.

The first step is to create a data frame of annual values, called met_data_an, that includes:

  • total annual prcp

  • the min of the minimum temperatures

  • the median of the minimum temperatures

  • the max of the minimum temperatures

  • the min of the maximum temperatures

  • the median of the maximum temperatures

  • the max of the maximum temperatures

  • the median of the average temperatures

There are many other summaries we could compute, like the mean of the mins and the maxes, but we will stick to the ones listed here for this project. Note: if you are cruising, you can do the means as well and compare. We will start getting into iteration soon, which would be helpful here, but we will save that for later.
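The annual roll-up can be sketched with group_by() / summarize(). This assumes the converted met_data with a water-year column; wyear and all the summary column names are placeholders, and the average temperature is assumed here to be (tmin + tmax) / 2.

```r
library(tidyverse)

# one row per water year; na.rm = TRUE skips missing daily values
met_data_an <- met_data %>%
  group_by(wyear) %>%
  summarize(tot_p    = sum(prcp, na.rm = TRUE),
            min_tmin = min(tmin, na.rm = TRUE),
            med_tmin = median(tmin, na.rm = TRUE),
            max_tmin = max(tmin, na.rm = TRUE),
            min_tmax = min(tmax, na.rm = TRUE),
            med_tmax = median(tmax, na.rm = TRUE),
            max_tmax = max(tmax, na.rm = TRUE),
            med_tavg = median((tmin + tmax) / 2, na.rm = TRUE))
```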

14.5 Exploratory data analysis (EDA) of annual values

For your own insight, make plots of the variables in met_data_an over time to see if there appear to be trends in any of the data.

If you want practice making a function you can do so here. Remember functions look like:

my_fun <- function(x){ do_stuff() }

Can also be written as

my_fun <- function(...){ do_stuff() }

14.6 Trend analysis and creating Table 1

Next, you will fill in a table. The table will have columns for:

  • total annual prcp
  • the min of the minimum temperatures
  • the median of the minimum temperatures
  • the max of the minimum temperatures
  • the min of the maximum temperatures
  • the median of the maximum temperatures
  • the max of the maximum temperatures
  • the median of the average temperatures

And values for:

  • the p-value for the Mann-Kendall test. We will use p < 0.05 as an indicator of a significant trend.

  • the slope of the trend as given by the Sen's slope.

# t_sens_tot_p <- sens.slope(met_data_an$tot_p) # this is how you would get the MK p-value and Sen's slope for one of the variables (tot_p) in the data frame. Can you use apply or map to do this for all the variables in the data frame? (we haven't seen apply and/or map yet, but give it a shot and see) 

# sens.slope stores the information in a list. Have a look. Try to understand how the data / info are stored and how you access things. 

# this is indexing to get p-value from that list. 
# t_sens_tot_p[3]

# how would you use indexing to pull the Sen's slope from that list? 
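As a hedged sketch of that exploration: the object returned by sens.slope() is a list (class "htest"), so besides positional indexing you can pull elements by name. This assumes the t_sens_tot_p object from the chunk above.

```r
library(trend)

str(t_sens_tot_p)      # see how the list is laid out
t_sens_tot_p$p.value   # MK p-value, pulled by name
t_sens_tot_p$estimates # Sen's slope, pulled by name
```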

Rather than pulling each p-value and slope individually (which you will likely do today), you could use either apply or map to automate the process; lapply and purrr::map both return a list. A list in R is an ordered, changeable collection that can hold many different data types.

Here is a link with information about lists and how to access elements within the list https://data-flair.training/blogs/r-list-tutorial/

Chapters 19 & 21 in RDS (R for Data Science) are both very useful resources for functions and iteration (e.g., apply, map).

Moving forward, I will show a tutorial on apply, map, and pulling information from lists. For now, work with a partner to test them out and see what they do.
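A hedged sketch of the map approach, assuming met_data_an with a water-year column named wyear; every other column gets its own sens.slope() result, and the p-values and slopes are then pulled out of the resulting list:

```r
library(tidyverse)
library(trend)

# one sens.slope() result per variable, stored in a named list
mk_results <- met_data_an %>%
  select(-wyear) %>%
  map(sens.slope)

map_dbl(mk_results, "p.value")        # vector of MK p-values
map_dbl(mk_results, ~ .x$estimates)   # vector of Sen's slopes
```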

14.7 See the summary section at the top for the questions to answer and the deliverable.

You will need to make a table. You can do that manually in Word or build it here using kable. There are other ways to make tables, but kable is a good one.
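A minimal kable sketch, assuming you have assembled a data frame (here called table1, a placeholder name) with one row per variable and columns for the MK p-value and Sen's slope:

```r
library(knitr)

table1 %>%
  kable(digits = 3,
        caption = "Table 1. MK p-values and Sen's slopes, 1900 - present.")
```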