Introduction to the weathergen Package
Jeffrey D Walker, PhD
August 02, 2015
Contents
1 Introduction
2 Load Historical Data
3 Daily Weather Generator
4 Algorithm Description
4.1 Annual Precipitation Simulation
4.2 Daily Weather Simulation
4.2.1 KNN Annual Sampling
4.2.2 Daily Simulation
4.3 Adjustment Factors
4.3.1 Precipitation Adjustment
4.3.2 Temperature Adjustment
1 Introduction
This document provides an introduction to the weathergen R package. This package provides functions for generating synthetic weather data using the method described by Steinschneider and Brown (2013):
Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generator with low-frequency variability for use in climate risk assessments. Water Resources Research, 49(11), 7205–7220. doi:10.1002/wrcr.20528
The code in this package is based on the original scripts written by Scott Steinschneider, PhD and modified by Jeffrey D Walker, PhD.
In order to run the code below, a few packages must be loaded first.
library(dplyr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(zoo)
library(moments)
library(weathergen)
theme_set(theme_bw())
# set random seed for reproducibility
set.seed(1)
2 Load Historical Data
The weathergen package includes historical daily climate data for three cities in the Northeast US. The data are provided in the climate_cities dataset, which is a list containing one dataframe for each of the 3 cities: Boston, Providence, and Worcester:
data(climate_cities)
str(climate_cities)
## List of 3
##  $ boston    :'data.frame': 22645 obs. of 5 variables:
##   ..$ DATE: Date[1:22645], format: "1949-01-01" ...
##   ..$ PRCP: num [1:22645] 0.35 0 0 0.17 16.67 ...
##   ..$ TMAX: num [1:22645] 5.99 2.14 3.75 5.46 7.07 14.3 9.21 12.9 9 7.33 ...
##   ..$ TMIN: num [1:22645] -0.84 -1.51 -3.15 -1.49 0.61 3.26 1.5 2.21 2.12 4.26 ...
##   ..$ WIND: num [1:22645] 10.93 13.11 9.57 7.68 3.31 ...
##  $ providence:'data.frame': 22645 obs. of 5 variables:
##   ..$ DATE: Date[1:22645], format: "1949-01-01" ...
##   ..$ PRCP: num [1:22645] 0 0 0 0.65 12.9 18.4 0 0 0 0 ...
##   ..$ TMAX: num [1:22645] 3.68 0.93 4.06 5.44 9.95 ...
##   ..$ TMIN: num [1:22645] -1.96 -2.19 -4.68 -3.78 -1.26 2.65 0.66 2.25 0.18 2.87 ...
##   ..$ WIND: num [1:22645] 12.32 13.82 10.05 7.52 3.2 ...
##  $ worcester :'data.frame': 22645 obs. of 5 variables:
##   ..$ DATE: Date[1:22645], format: "1949-01-01" ...
##   ..$ PRCP: num [1:22645] 0.47 0 0 0 9.23 ...
##   ..$ TMAX: num [1:22645] 6.6 -0.06 3.26 3.82 3.82 ...
##   ..$ TMIN: num [1:22645] -3.39 -3.39 -5.62 -8.39 -5.62 -1.73 -3.95 -2.28 -2.28 2.71 ...
##   ..$ WIND: num [1:22645] 11.21 13.18 9.58 7.06 3.83 ...
For this vignette, we will use the dataset for Boston, MA.
# extract dataset for boston
obs_day <- climate_cities[['boston']]
# subset complete water years using complete_water_years() function
obs_day <- obs_day[complete_water_years(obs_day$DATE),]
obs_day <- mutate(obs_day,
                  WYEAR=wyear(DATE, start_month=10), # extract water year
                  MONTH=month(DATE),                 # extract month
                  TEMP=(TMIN+TMAX)/2) %>%            # compute mean temperature
  select(WYEAR, MONTH, DATE, PRCP, TEMP, TMIN, TMAX, WIND)
# compute annual timeseries by water year
obs_wyr <- group_by(obs_day, WYEAR) %>%
  summarise(PRCP=sum(PRCP),
            TEMP=mean(TEMP),
            TMIN=mean(TMIN),
            TMAX=mean(TMAX),
            WIND=mean(WIND))
# save the daily and annual timeseries to a list
obs <- list(day=obs_day, wyr=obs_wyr)
# summary of the daily timeseries
summary(obs[['day']])
##      WYEAR          MONTH             DATE                 PRCP
##  Min.   :1950   Min.   : 1.000   Min.   :1949-10-01   Min.   :  0.000
##  1st Qu.:1965   1st Qu.: 4.000   1st Qu.:1964-12-30   1st Qu.:  0.000
##  Median :1980   Median : 7.000   Median :1980-03-31   Median :  0.000
##  Mean   :1980   Mean   : 6.523   Mean   :1980-03-31   Mean   :  3.259
##  3rd Qu.:1995   3rd Qu.:10.000   3rd Qu.:1995-07-01   3rd Qu.:  2.120
##  Max.   :2010   Max.   :12.000   Max.   :2010-09-30   Max.   :185.850
##       TEMP             TMIN              TMAX             WIND
##  Min.   :-18.68   Min.   :-25.130   Min.   :-14.05   Min.   : 0.020
##  1st Qu.:  3.09   1st Qu.: -0.850   1st Qu.:  6.98   1st Qu.: 4.090
##  Median : 10.86   Median :  6.070   Median : 15.64   Median : 6.330
##  Mean   : 10.62   Mean   :  5.903   Mean   : 15.33   Mean   : 6.724
##  3rd Qu.: 18.86   3rd Qu.: 13.810   3rd Qu.: 23.98   3rd Qu.: 8.920
##  Max.   : 31.99   Max.   : 26.240   Max.   : 38.15   Max.   :23.980
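The water-year bookkeeping used above can be sketched in base R. This is a minimal illustration, assuming (consistent with the summary above, where 1949-10-01 falls in water year 1950) that a water year is labelled by the calendar year in which it ends. The helper name water_year() is hypothetical and is not part of weathergen.

```r
# Hypothetical helper (not part of weathergen): label each date with the
# calendar year in which its water year ends.
water_year <- function(dates, start_month = 10) {
  lt <- as.POSIXlt(dates)
  yr <- lt$year + 1900   # calendar year
  mo <- lt$mon + 1       # calendar month, 1-12
  # Months at or after start_month belong to the next water year,
  # unless the water year coincides with the calendar year.
  ifelse(start_month > 1 & mo >= start_month, yr + 1, yr)
}

d <- as.Date(c("1949-10-01", "1950-09-30", "1950-10-01"))
water_year(d)
```

With start_month=10, the first two dates fall in water year 1950 and the third opens water year 1951.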
This figure shows the daily timeseries of each climate variable.
gather(obs[['day']], VAR, VALUE, PRCP:WIND) %>%
  ggplot(aes(DATE, VALUE)) +
  geom_line() +
  facet_wrap(~VAR, scales='free_y', ncol=1) +
  labs(x='', y='', title='Historical Daily Weather Data for Boston, MA')
[Figure: Historical Daily Weather Data for Boston, MA. Panels: PRCP, TEMP, TMIN, TMAX, WIND; x-axis: date, 1950 to 2010.]
This figure shows the annual sum of precipitation and the annual means of mean temperature, minimum temperature, maximum temperature, and wind speed by water year.
gather(obs[['wyr']], VAR, VALUE, PRCP:WIND) %>%
  ggplot(aes(WYEAR, VALUE)) +
  geom_line() +
  facet_wrap(~VAR, scales='free_y', ncol=1) +
  labs(x='', y='', title='Historical Annual Weather Data for Boston, MA')
[Figure: Historical Annual Weather Data for Boston, MA. Panels: PRCP, TEMP, TMIN, TMAX, WIND; x-axis: water year.]
3 Daily Weather Generator
The primary function for running the daily weather generator on a single site is wgen_daily(). This function depends on a number of other functions, which are described in the next section.
The first argument to wgen_daily() is the daily timeseries of historical climate variables. This argument must be provided as a zoo object, which is designed to store regular and irregular timeseries (see the zoo package homepage for more details).
This code converts the daily timeseries (currently in obs[['day']]) to a zoo object. The first argument provides the values of each variable; the second argument provides the associated dates.
zoo_day <- zoo(x = obs[['day']][, c('PRCP', 'TEMP', 'TMIN', 'TMAX', 'WIND')],
               order.by = obs[['day']][['DATE']])
class(zoo_day)
## [1] "zoo"
summary(zoo_day)
##      Index                 PRCP              TEMP             TMIN
##  Min.   :1949-10-01   Min.   :  0.000   Min.   :-18.68   Min.   :-25.130
##  1st Qu.:1964-12-30   1st Qu.:  0.000   1st Qu.:  3.09   1st Qu.: -0.850
##  Median :1980-03-31   Median :  0.000   Median : 10.86   Median :  6.070
##  Mean   :1980-03-31   Mean   :  3.259   Mean   : 10.62   Mean   :  5.903
##  3rd Qu.:1995-07-01   3rd Qu.:  2.120   3rd Qu.: 18.86   3rd Qu.: 13.810
##  Max.   :2010-09-30   Max.   :185.850   Max.   : 31.99   Max.   : 26.240
##       TMAX             WIND
##  Min.   :-14.05   Min.   : 0.020
##  1st Qu.:  6.98   1st Qu.: 4.090
##  Median : 15.64   Median : 6.330
##  Mean   : 15.33   Mean   : 6.724
##  3rd Qu.: 23.98   3rd Qu.: 8.920
##  Max.   : 38.15   Max.   :23.980
The daily historical timeseries is then passed as the first argument to the wgen_daily() function. The other arguments specify a number of parameters for the simulation. These parameters include:
• n_year - number of simulation years
• start_month - first month of the water year (e.g. 10=October)
• start_water_year - first water year to use for the simulated timeseries
• include_leap_days - boolean flag to include or exclude leap days in the simulation timeseries (currently only allows FALSE)
• n_knn_annual - number of years to use in the annual k-nearest neighbors (knn) sampling algorithm
• dry_wet_threshold - daily precipitation threshold between the dry and wet Markov states
• wet_extreme_quantile_threshold - daily precipitation quantile used as threshold between the wet and extreme Markov states
• adjust_annual_precip - boolean flag to adjust simulated daily precipitation amounts to match the simulated annual precipitation
• annual_precip_adjust_limits - lower and upper limits for the daily precipitation adjustment factor to match simulated annual precipitation (e.g. c(0.9, 1.1) will only adjust daily values between 90-110%)
• dry_spell_changes - adjustment factor for changing the transition probabilities of dry spells
• wet_spell_changes - adjustment factor for changing the transition probabilities of wet spells
• prcp_mean_changes - multiplicative adjustment factor for changing the annual mean precipitation
• prcp_cv_changes - multiplicative adjustment factor for changing the coefficient of variation for precipitation
• temp_mean_changes - additive adjustment factor for changing the annual mean temperature
The last five parameters for assigning adjustment factors can each be either a single number, which is applied to all months, or a vector of length 12 where each value is the change factor for a specific month (starting with January).
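The scalar-or-monthly convention for the change factors can be illustrated with a few lines of base R. This is only a sketch of the behaviour described above; expand_monthly() is a hypothetical helper, not a weathergen function, and the package may validate its inputs differently.

```r
# Hypothetical helper: expand a change factor to one value per calendar month.
expand_monthly <- function(x) {
  if (length(x) == 1) {
    rep(x, 12)   # same factor applied to every month
  } else if (length(x) == 12) {
    x            # already monthly, January first
  } else {
    stop("change factor must have length 1 or 12")
  }
}

expand_monthly(1.1)
# a 10% increase in June through August only
expand_monthly(c(1, 1, 1, 1, 1, 1.1, 1.1, 1.1, 1, 1, 1, 1))
```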
sim <- wgen_daily(zoo_day,
                  n_year=2,
                  start_month=10,
                  start_water_year=2000,
                  include_leap_days=FALSE,
                  n_knn_annual=100,
                  dry_wet_threshold=0.3,
                  wet_extreme_quantile_threshold=0.8,
                  adjust_annual_precip=TRUE,
                  annual_precip_adjust_limits=c(0.9, 1.1),
                  dry_spell_changes=1,
                  wet_spell_changes=1,
                  prcp_mean_changes=1,
                  prcp_cv_changes=1,
                  temp_mean_changes=0)
The wgen_daily() function returns a list object with the following elements:
str(sim, max=1)
## List of 6
##  $ obs                :'data.frame': 22280 obs. of 9 variables:
##  $ state_thresholds   :'data.frame': 12 obs. of 3 variables:
##  $ transition_matrices:List of 12
##  $ state_equilibria   :List of 12
##  $ change_factors     :List of 6
##  $ out                :'data.frame': 730 obs. of 11 variables:
The first element, sim[['obs']], is the historical daily timeseries, with additional columns for the water year (WYEAR), month (MONTH), and Markov state (d=Dry, w=Wet, e=Extreme).
head(sim[['obs']])
##         DATE WYEAR MONTH STATE PRCP   TEMP  TMIN  TMAX WIND
## 1 1949-10-01  1950    10     d 0.00 13.765  8.89 18.64 8.53
## 2 1949-10-02  1950    10     d 0.00 10.225  4.94 15.51 7.31
## 3 1949-10-03  1950    10     d 0.00 12.665  6.06 19.27 3.50
## 4 1949-10-04  1950    10     d 0.00 16.555 11.06 22.05 2.76
## 5 1949-10-05  1950    10     d 0.28 18.970 14.02 23.92 5.59
## 6 1949-10-06  1950    10     d 0.00 14.135  7.84 20.43 6.76
The second element, sim[['state_thresholds']], is a dataframe with the monthly precipitation thresholds for each Markov state. For example, in June the Markov state on a given day is dry if the precipitation on that day is less than 0.3 mm/day, wet if it is between 0.3 and 3.33 mm/day, and extreme if it is greater than 3.33 mm/day.
sim[['state_thresholds']]
##    MONTH DRY_WET WET_EXTREME
## 1      1     0.3        4.18
## 2      2     0.3        3.80
## 3      3     0.3        4.93
## 4      4     0.3        4.03
## 5      5     0.3        3.60
## 6      6     0.3        3.33
## 7      7     0.3        2.70
## 8      8     0.3        3.12
## 9      9     0.3        2.60
## 10    10     0.3        2.88
## 11    11     0.3        4.75
## 12    12     0.3        4.60
The third element, sim[['transition_matrices']], is a list of length 12 where each element is the Markov state transition matrix for each month. For example, the transition matrix for June is:
sim[['transition_matrices']][[6]]
##
##             d         w         e
##   d 0.6977169 0.1634703 0.1388128
##   w 0.5189189 0.2621622 0.2189189
##   e 0.3863014 0.2520548 0.3616438
The fourth element, sim[['state_equilibria']], is a list of length 12 where each element is the state equilibrium of each monthly transition matrix. For example, the equilibrium for June is:
sim[['state_equilibria']][[6]]
##         d         w         e
## 0.5997119 0.2009608 0.1993273
The fifth element, sim[['change_factors']], is a list containing the monthly values for each adjustment factor as well as an additional set of adjustments that are computed from the simulated state transitions (ratio_probability_wet). Each of these elements is a vector of length 12 (even if the corresponding parameter to wgen_daily() was a single value, in which case the same value is used for all months).
str(sim[['change_factors']])
## List of 6
##  $ ratio_probability_wet: num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
##  $ dry_spell_changes    : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
##  $ wet_spell_changes    : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
##  $ prcp_mean            : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
##  $ prcp_cv              : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...
##  $ temp_mean            : num [1:12] 0 0 0 0 0 0 0 0 0 0 ...
The last element, sim[['out']], contains the simulated daily timeseries as a dataframe with the following columns:
• SIM_YEAR - simulation year starting at 1
• DATE - simulation date (starting on the first day of the start_month and start_water_year parameters, e.g. 1999-10-01 for water year 2000 starting in month 10 (October))
• MONTH - simulation month
• WDAY - simulation “water day”, which is the Julian day counted from the beginning of the water year (e.g. water day 1 = Oct 1)
• SAMPLE_DATE - the historical date sampled by the KNN sampling algorithm and used for each simulation date
• STATE - simulated Markov state
• PRCP, TEMP, TMIN, TMAX, WIND - simulated climate variables
str(sim[['out']])
## 'data.frame': 730 obs. of 11 variables:
##  $ SIM_YEAR   : int 1 1 1 1 1 1 1 1 1 1 ...
##  $ DATE       : Date, format: "1999-10-01" "1999-10-02" ...
##  $ MONTH      : num 10 10 10 10 10 10 10 10 10 10 ...
##  $ WDAY       : int 1 2 3 4 5 6 7 8 9 10 ...
##  $ SAMPLE_DATE: Date, format: "2004-10-01" "1985-09-30" ...
##  $ STATE      : Ord.factor w/ 3 levels "d"<"w"<"e": 1 1 1 1 2 3 3 1 1 1 ...
##  $ PRCP       : num 0.284 0 0 0 2.25 ...
##  $ TEMP       : num 15.8 17.3 15.2 12.7 12.5 ...
##  $ TMIN       : num 12.07 10.11 10.19 7.48 7.84 ...
##  $ TMAX       : num 19.5 24.6 20.3 17.9 17.1 ...
##  $ WIND       : num 1.8 3.56 5.44 5.95 7.41 6.29 9.69 7.3 6.37 9.68 ...
This figure shows the simulated timeseries for each variable:
select(sim[['out']], DATE, PRCP, TEMP, TMIN, TMAX, WIND) %>%
  gather(VAR, VALUE, -DATE) %>%
  ggplot(aes(DATE, VALUE)) +
  geom_line() +
  facet_wrap(~VAR, ncol=1, scales='free_y') +
  labs(x='Simulation Date', y='Simulated Value', title='Simulated Timeseries') +
  theme_bw()
[Figure: Simulated Timeseries. Panels: PRCP, TEMP, TMIN, TMAX, WIND; x-axis: Simulation Date; y-axis: Simulated Value.]
4 Algorithm Description
The daily weather generator uses the following algorithm:
• Simulate annual precipitation using an ARMA model (wavelet model coming soon)
• For each simulation year:
    – Use KNN sampling to create a historical daily timeseries that includes only years where annual precipitation is similar to the current simulated annual precipitation
    – Compute the monthly Markov state transition thresholds and assign states to the daily timeseries
    – Fit the monthly Markov chain transition probability matrices
    – Adjust the transition matrices based on the dry and wet spell change factors
    – For each simulation day in the current year, use KNN sampling to sample the synthetic historical daily values based on the state, precipitation and mean temperature of the current and previous days
• Combine the daily simulation for each year
• Adjust daily simulated precipitation values for each year to equal the simulated annual values (if adjust_annual_precip=TRUE)
• Adjust daily simulated precipitation values based on the precipitation adjustment factors
• Adjust daily simulated minimum, maximum and mean temperature based on the temperature adjustment factor
4.1 Annual Precipitation Simulation
The sim_annual_arima() function is used to simulate annual precipitation using an ARIMA model. The arguments to this function include:
• x - historical annual precipitation as a zoo object
• start_year - first year of the simulated timeseries
• n_year - number of years to simulate
The function first fits the ARIMA model to the historical annual precipitation. The ARIMA model is then used to generate a simulation of annual precipitation of length n_year. The sim_annual_arima() function returns a list of length 3 containing the ARIMA model object (model), the historical annual precipitation (x), and the simulated annual precipitation (out).
zoo_precip_wyr <- zoo(obs[['wyr']][['PRCP']], order.by = obs[['wyr']][['WYEAR']])
sim_annual_prcp <- sim_annual_arima(x = zoo_precip_wyr, start_year = 2000, n_year = 40)
This function returns a list of length 3 containing elements:
• model - the ARIMA model object fitted to the historical data
• x - the historical annual precipitation used to fit the model as a zoo object
• out - the simulated annual precipitation as a zoo object
str(sim_annual_prcp, max.level=1)
## List of 3
##  $ model:List of 17
##   ..- attr(*, "class")= chr [1:2] "ARIMA" "Arima"
##  $ x    :'zoo' series from 1950 to 2010
##   Data: num [1:61] 798 1235 1248 1186 1686 ...
##   Index: num [1:61] 1950 1951 1952 1953 1954 ...
##  $ out  :'zoo' series from 2000 to 2039
##   Data: num [1:40] 1571 1404 1071 1234 1437 ...
##   Index: num [1:40] 2000 2001 2002 2003 2004 ...
The model element contains an ARIMA model object, which contains the parameters and error statistics of the fitted model:
summary(sim_annual_prcp[['model']])
## Series: x
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
##       intercept
##       1190.3175
## s.e.    27.0648
##
## sigma^2 estimated as 44683: log likelihood=-413.13
## AIC=830.26 AICc=830.47 BIC=834.48
##
## Training set error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 8.013399e-14 211.3826 162.3144 -3.399893 14.55972 0.6760569
##                    ACF1
## Training set -0.0452353
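For the model reported above, ARIMA(0,0,0) with a non-zero mean, a simulated series is simply the fitted intercept plus independent noise. The following base-R sketch illustrates that special case, assuming Gaussian innovations with the reported sigma^2. It is not the internals of sim_annual_arima(), and because the random draws differ it will not reproduce the values shown in this document.

```r
# Values reported in the model summary above
intercept <- 1190.3175   # fitted mean annual precipitation (mm/yr)
sigma2    <- 44683       # estimated innovation variance

set.seed(1)
n_year <- 40
# ARIMA(0,0,0) with non-zero mean: independent normal draws around the mean
# (Gaussian innovations are an assumption of this sketch)
sim_prcp <- intercept + rnorm(n_year, mean = 0, sd = sqrt(sigma2))
names(sim_prcp) <- 2000 + seq_len(n_year) - 1

round(head(sim_prcp), 1)
```

The innovation standard deviation, sqrt(44683), is about 211 mm/yr, which matches the RMSE in the summary. A model with autoregressive or moving-average terms would also carry dependence between years.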
The out element is an annual precipitation timeseries as a zoo object:
sim_annual_prcp[['out']]
##      2000      2001      2002      2003      2004      2005      2006
## 1570.8299 1404.1184 1071.1577 1233.7399 1436.6759 1663.0372 1254.2111
##      2007      2008      2009      2010      2011      2012      2013
##  969.9498  982.4138 1614.2915  752.6348 1836.2482 1135.0726 1094.2667
##      2014      2015      2016      2017      2018      2019      2020
## 1223.6231 1387.6197 1254.3302  776.8215 1265.0490 1285.5294 1329.7351
##      2021      2022      2023      2024      2025      2026      2027
##  972.2932  689.1246 1121.7078  990.7092 1028.4217  988.7052 1106.1863
##      2028      2029      2030      2031      2032      2033      2034
## 1124.5317 1358.5977 1398.8313 1022.3674 1125.0388 1266.7207 1485.9976
##      2035      2036      2037      2038      2039
## 1178.4652  831.2053 1239.3271 1165.1438 1564.9916
This figure plots the simulated annual precipitation and shows the mean annual precipitation based on the historical dataset for reference.
data.frame(WYEAR=time(sim_annual_prcp[['out']]),
           PRCP=zoo::coredata(sim_annual_prcp[['out']])) %>%
  ggplot(aes(WYEAR, PRCP)) +
  geom_line() +
  # yintercept spelled out in full (the original relied on the abbreviation yint)
  geom_hline(yintercept=mean(obs[['wyr']][['PRCP']]), color='red') +
  geom_text(aes(label=TEXT),
            data=data.frame(WYEAR=2000, PRCP=mean(obs[['wyr']][['PRCP']]), TEXT="Historical Mean"),
            hjust=0, vjust=-1, color='red') +
  labs(x="Simulation Water Year", y="Annual Precip (mm/yr)", title="Simulated Annual Precipitation")
[Figure: Simulated Annual Precipitation. x-axis: Simulation Water Year (2000 to 2040); y-axis: Annual Precip (mm/yr); red horizontal line labelled "Historical Mean".]
To determine whether the annual precipitation generator sufficiently reproduces the statistics of the historical timeseries, a Monte Carlo simulation is performed for 50 trials using the same historical timeseries.
# batch simulation to compare mean/sd/skew
sim_wyr_batch <- lapply(1:50, function(i) {
  sim_annual_arima(x = obs[['wyr']][['PRCP']], start_year = 2000, n_year = 20)[['out']]
})
names(sim_wyr_batch) <- paste0(seq(1, length(sim_wyr_batch)))
# one row per trial, with columns mean, sd, skew
sim_wyr_batch <- do.call(merge, sim_wyr_batch) %>%
  apply(MARGIN=2, FUN=function(x) { c(mean=mean(x), sd=sd(x), skew=skewness(x)) }) %>%
  t
sim_wyr_stats <- rbind(t(colMeans(sim_wyr_batch)),
                       c(mean=mean(obs[['wyr']][['PRCP']]),
                         sd=sd(obs[['wyr']][['PRCP']]),
                         skew=skewness(obs[['wyr']][['PRCP']])))
sim_wyr_stats <- as.data.frame(sim_wyr_stats)
# row 1 holds the means over the simulated trials, row 2 the historical statistics
sim_wyr_stats$dataset <- c('sim', 'obs')
sim_wyr_stats <- gather(sim_wyr_stats, stat, value, mean, sd, skew)
The following figure shows the distributions of the mean, standard deviation, and skewness statistics over the simulation trials. The overall mean value of each statistic (blue point) is also compared to the statistic value of the historical timeseries (red point). This figure shows that the annual precipitation generator creates simulated timeseries with similar statistics as the historical timeseries.
as.data.frame(sim_wyr_batch) %>%
  mutate(trial=row_number()) %>%
  gather(stat, value, mean:skew) %>%
  ggplot(aes(x=stat, value)) +
  geom_boxplot() +
  geom_point(aes(x=stat, y=value, color=dataset), data=sim_wyr_stats, size=3) +
  scale_color_manual('', values=c('sim'='deepskyblue', 'obs'='orangered'),
                     labels=c('sim'='Mean of Simulated Trials', 'obs'='Mean of Historical')) +
  labs(x='', y='') +
  facet_wrap(~stat, scales='free')
[Figure: boxplots of the mean, sd, and skew statistics over the simulation trials, one panel per statistic. Legend: Mean of Historical, Mean of Simulated Trials.]
4.2 Daily Weather Simulation
The annual precipitation generated in the previous step is then used to drive a daily simulation for all weather variables. For each year, a k-nearest neighbors algorithm samples n_knn_annual years from the historical record with replacement. The daily historical timeseries for each year is then extracted and combined into a daily timeseries, where the sequence of years is based on the KNN sampling. A Markov Chain model is then fitted to the sampled historical daily timeseries and used to simulate a sequence of states for each simulation year. Finally, the simulated states are used to sample the historical daily values using a daily KNN algorithm.
4.2.1 KNN Annual Sampling
For each simulation year, a set of n_knn_annual years is sampled with replacement from the historical record using inverse distance weighted sampling. Historical years with annual precipitation similar to the current simulated annual precipitation are given a higher weight and are thus more likely to be sampled than historical years where the annual precipitation was much higher or lower than the simulated precipitation.
n_knn_annual <- 100
# generate vector of sampled years
sampled_years <- knn_annual(prcp=coredata(sim_annual_prcp[['out']])[1],
                            obs_prcp=zoo_precip_wyr,
                            n=n_knn_annual)
sampled_years
##  [1] 1984 1998 1998 1990 2006 1998 1954 2006 1984 2006 1998 1958 1998 1958
## [15] 1984 1960 1951 1984 1984 1967 1984 1970 1974 1956 1998 1998 1955 1998
## [29] 2004 1955 1984 1998 1960 1998 1994 1952 1958 1998 1998 1998 2010 1996
## [43] 1984 2006 1998 1998 1955 1961 1996 1954 1958 1998 1970 1984 1998 1983
## [57] 2008 1967 1984 2004 1954 1958 2003 1998 1955 2009 1955 1998 1998 1987
## [71] 1998 1998 2008 1996 1998 1998 2006 1976 2010 1998 1984 2004 1998 1998
## [85] 1998 1954 1954 1967 1998 2003 1998 1987 2006 1984 1998 2006 2006 1974
## [99] 1970 1984
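The weighting idea behind this step can be sketched in base R on a toy record. The sketch assumes weights proportional to 1/distance from the target precipitation and samples from all years. knn_annual() may instead restrict sampling to the k nearest years or use a rank-based kernel, so treat this as an illustration rather than the package's implementation.

```r
# Toy historical record (illustrative values, not the Boston data)
obs_years <- 1950:1959
obs_prcp  <- c(800, 1235, 1250, 1190, 1690, 1100, 950, 1400, 1575, 1300)
target    <- 1571   # simulated annual precipitation for the current year

# Assumed weighting: proportional to 1 / distance from the target
dist    <- abs(obs_prcp - target)
weights <- 1 / pmax(dist, 1e-6)   # guard against division by zero
weights <- weights / sum(weights)

set.seed(1)
sampled <- sample(obs_years, size = 100, replace = TRUE, prob = weights)
head(sort(table(sampled), decreasing = TRUE), 3)
```

The year closest to the target (1958 in this toy record) receives by far the largest weight, mirroring how 1998 and 1984 dominate the sampled years shown above.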
This figure shows the historical annual precipitation for each water year. The red line shows the annual precipitation for the first simulated year. The red points show which historical years were included in the KNN sampling. Note that these points tend to be close to the target simulated precipitation.
ggplot() +
  geom_point(aes(WYEAR, PRCP, color=SAMPLED),
             data=obs[['wyr']] %>% mutate(SAMPLED=WYEAR %in% sampled_years)) +
  # yintercept spelled out in full (the original relied on the abbreviation yint)
  geom_hline(yintercept=sim_annual_prcp[['out']][1], linetype=2, color='red') +
  geom_text(aes(x=x, y=y, label=label),
            data=data.frame(x=min(obs[['wyr']][['WYEAR']]),
                            y=sim_annual_prcp[['out']][1],
                            label='Target Simulated Annual Precip'),
            vjust=-0.5, hjust=0) +
  scale_color_manual('Sampled in KNN', values=c('TRUE'='orangered', 'FALSE'='grey30')) +
  labs(x='Water Year', y='Annual Precipitation (mm/yr)')
[Figure: historical annual precipitation by water year. x-axis: Water Year; y-axis: Annual Precipitation (mm/yr); dashed red line labelled "Target Simulated Annual Precip"; points colored by "Sampled in KNN" (FALSE, TRUE).]
For each of the sampled years in sampled_years, the corresponding daily values are extracted from the historical timeseries and then combined into a single timeseries. The following figure shows the resulting daily timeseries for the 100 years sampled for the first simulated water year. The points are colored by their historical water year, which shows that years where the historical annual precipitation is closest to the simulated precipitation are repeated.
# loop through population years and extract historical daily values
sampled_days <- lapply(sampled_years, function(yr) {
  obs[['day']][which(obs[['day']][['WYEAR']]==yr), ]
}) %>%
  bind_rows() %>%  # replaces rbind_all(), which is no longer available in current dplyr
  mutate(SAMPLE_INDEX=row_number())
# plot combined daily timeseries based on the KNN sampled years
ggplot(sampled_days, aes(SAMPLE_INDEX, PRCP, color=WYEAR)) +
  geom_point(size=1) +
  labs(x='Index Day', y='Daily Precip (mm/day)', title='Synthetic Daily Timeseries from KNN Sampled Years')
[Figure: Synthetic Daily Timeseries from KNN Sampled Years. x-axis: Index Day; y-axis: Daily Precip (mm/day); points colored by WYEAR.]
4.2.2 Daily Simulation
The resulting timeseries of daily values for the sampled years is then used to run the sim_daily() function. In this function, a Markov Chain model is fitted to the synthetic daily timeseries.
4.2.2.1 State Thresholds
The first step is to compute the state thresholds using the mc_state_threshold() function, which takes a vector of daily precipitation, a vector of corresponding months, and the dry_wet_threshold and wet_extreme_quantile_threshold parameters.
thresh <- mc_state_threshold(sampled_days[['PRCP']], sampled_days[['MONTH']],
                             dry_wet_threshold=0.3, wet_extreme_quantile_threshold=0.8)
thresh_df <- do.call(rbind, thresh) %>%
  as.data.frame() %>%
  mutate(month=row_number()) %>%
  select(month, dry_wet, wet_extreme)
thresh_df
##    month dry_wet wet_extreme
## 1      1     0.3       4.600
## 2      2     0.3       5.000
## 3      3     0.3       4.970
## 4      4     0.3       5.356
## 5      5     0.3       8.100
## 6      6     0.3       5.970
## 7      7     0.3       3.280
## 8      8     0.3       5.150
## 9      9     0.3       1.800
## 10    10     0.3       3.150
## 11    11     0.3       7.350
## 12    12     0.3       4.694
This figure shows the threshold amounts between the Dry/Wet and Wet/Extreme states for each month.
gather(thresh_df, var, value, -month) %>%
  ggplot(aes(factor(month), value, fill=var)) +
  geom_bar(stat='identity', position='dodge') +
  scale_fill_manual('Threshold', values=c('dry_wet'='orangered', 'wet_extreme'='deepskyblue'),
                    labels=c('dry_wet'='Dry-Wet', 'wet_extreme'='Wet-Extreme')) +
  labs(x='Month', y='Threshold Amount (mm/day)', title='Markov State Thresholds by Month')
[Figure: Markov State Thresholds by Month. x-axis: Month (1 to 12); y-axis: Threshold Amount (mm/day); bars for the Dry-Wet and Wet-Extreme thresholds.]
4.2.2.2 Assign States
After the state thresholds are determined, a Markov state is assigned to each day of the synthetic daily timeseries, sampled_days.
sampled_days$STATE <- mc_assign_states(sampled_days$PRCP, sampled_days$MONTH, c('d', 'w', 'e'), thresh)
head(sampled_days)
## Source: local data frame [6 x 10]
##
##   WYEAR MONTH       DATE PRCP   TEMP  TMIN  TMAX WIND SAMPLE_INDEX STATE
## 1  1984    10 1983-10-01 3.15 15.420 11.41 19.43 4.91            1     e
## 2  1984    10 1983-10-02 5.35 16.630 12.47 20.79 1.33            2     e
## 3  1984    10 1983-10-03 0.00 19.825 14.59 25.06 5.89            3     d
## 4  1984    10 1983-10-04 0.00 20.760 13.24 28.28 7.72            4     d
## 5  1984    10 1983-10-05 0.55 15.110 10.38 19.84 8.08            5     w
## 6  1984    10 1983-10-06 0.10 16.965 13.49 20.44 6.39            6     d
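The classification rule can be reproduced by hand for the six October days listed above, using the October thresholds from the threshold table (0.3 and 3.15 mm/day). The sketch below is a base-R illustration, not the code of mc_assign_states(). The boundary handling (dry below the dry-wet threshold, extreme at or above the wet-extreme threshold) is an assumption chosen to be consistent with the rows shown.

```r
# October thresholds from the table above and the first six sampled days
dry_wet     <- 0.3
wet_extreme <- 3.15
prcp <- c(3.15, 5.35, 0.00, 0.00, 0.55, 0.10)

# Assumed boundary handling: below dry_wet is dry,
# at or above wet_extreme is extreme, anything in between is wet
state <- ifelse(prcp < dry_wet, "d",
                ifelse(prcp >= wet_extreme, "e", "w"))
state <- factor(state, levels = c("d", "w", "e"), ordered = TRUE)
state
```

This gives e, e, d, d, w, d, matching the STATE column above.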
This figure shows the Markov state for each day in the first 5 sampled years of the synthetic daily timeseries.
filter(sampled_days, SAMPLE_INDEX <= 365*5) %>%
  ggplot(aes(SAMPLE_INDEX, PRCP, color=STATE)) +
  geom_point(size=2) +
  scale_color_manual('Markov State',
                     values=c('d'='orangered', 'w'='chartreuse3', 'e'='deepskyblue'),
                     labels=c('d'='Dry', 'w'='Wet', 'e'='Extreme')) +
  ylim(0, 50) +
  labs(x='Sample Date Index', y='Daily Precipitation (mm/day)', title='Markov State of Synthetic Daily Timeseries')
## Warning: Removed 17 rows containing missing values (geom_point).
[Figure: Markov State of Synthetic Daily Timeseries. x-axis: Sample Date Index; y-axis: Daily Precipitation (mm/day); points colored by Markov State (Dry, Wet, Extreme).]
4.2.2.3 Fit Markov Chain Transition Matrix
After the states have been assigned to the sampled daily timeseries, the mc_fit() function can be used to fit a Markov Chain model for each month.
transition_matrices <- mc_fit(states=sampled_days[['STATE']], months=sampled_days[['MONTH']])
For example, the transition matrix for June is shown below. Each row gives the current state and each column the state on the following day: the probability of a wet day following a dry day is 0.26, and the probability of two dry days in a row is 0.66.
transition_matrices[[6]]
##
##              d          w          e
##   d 0.65725542 0.25607354 0.08667104
##   w 0.44909502 0.28167421 0.26923077
##   e 0.24957841 0.39797639 0.35244519
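How such a matrix is estimated from a sequence of states can be sketched in base R: count transitions from each day's state to the next day's state, then normalise each row. The toy sequence below is illustrative. Unlike mc_fit(), the sketch ignores the split by month, and the package's own estimator may differ in its details.

```r
# Toy sequence of daily states (illustrative only)
states <- c("d", "d", "w", "e", "d", "d", "d", "w", "w", "d")
lev    <- c("d", "w", "e")

# Pair each day's state with the next day's state
from <- factor(head(states, -1), levels = lev)
to   <- factor(tail(states, -1), levels = lev)

counts <- table(from, to)                  # transition counts
P      <- prop.table(counts, margin = 1)   # row-normalise to probabilities
round(P, 3)
```

Each row of P sums to 1. A state that never occurs as a starting state would produce a row of NaN, so a real implementation needs a rule for that case.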
The equilibrium state can be determined for each month using the mc_state_equilibrium() function. This function finds the eigenvector of a transition matrix. The equilibrium state for June is thus:
mc_state_equilibrium(transition_matrices[[6]])
## d w e## 0.5192105 0.2905118 0.1902777
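The eigenvector calculation can be checked in base R with the June matrix printed above. The equilibrium distribution is the left eigenvector of the transition matrix for eigenvalue 1 (equivalently, an eigenvector of the transposed matrix), rescaled to sum to 1. This is a sketch of the standard calculation, not necessarily the exact code inside mc_state_equilibrium().

```r
lev <- c("d", "w", "e")
# June transition matrix from the output above (rows: today, columns: tomorrow)
P <- matrix(c(0.65725542, 0.25607354, 0.08667104,
              0.44909502, 0.28167421, 0.26923077,
              0.24957841, 0.39797639, 0.35244519),
            nrow = 3, byrow = TRUE, dimnames = list(lev, lev))

ev <- eigen(t(P))                     # left eigenvectors of P
k  <- which.min(abs(ev$values - 1))   # eigenvalue closest to 1
v  <- Re(ev$vectors[, k])
pi_eq <- v / sum(v)                   # rescale so the probabilities sum to 1
names(pi_eq) <- lev
round(pi_eq, 4)
```

The result agrees with the equilibrium shown above (about 0.519, 0.291, 0.190), and multiplying pi_eq by P returns pi_eq again.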
Once the transition matrices have been fit to the sampled historical daily values, the daily generator can be run.
4.2.2.4 KNN and Markov Chain Simulation
Given the fitted monthly transition matrices, the daily simulation for each simulation year can be performed using the sim_mc_knn_day() function. This function simulates a chain of Markov states using the transition matrices, and uses a KNN sampling algorithm to choose the values of each weather variable.
sampled_days <- dplyr::mutate(sampled_days,
                              WDAY=waterday(DATE, start_month=10),
                              STATE_PREV=lag(STATE),
                              PRCP_PREV=lag(PRCP),
                              TEMP_PREV=lag(TEMP),
                              TMAX_PREV=lag(TMAX),
                              TMIN_PREV=lag(TMIN),
                              WIND_PREV=lag(WIND))
sim_days <- sim_mc_knn_day(x=sampled_days, n_year=1, states=c('d', 'w', 'e'),
                           transitions=transition_matrices,
                           start_month=10, start_water_year=2000, include_leap_days=FALSE)
str(sim_days, max=1)
## 'data.frame': 365 obs. of 11 variables:
##  $ SIM_YEAR   : num 1 1 1 1 1 1 1 1 1 1 ...
##  $ DATE       : Date, format: "1999-10-01" "1999-10-02" ...
##  $ MONTH      : num 10 10 10 10 10 10 10 10 10 10 ...
##  $ WDAY       : int 1 2 3 4 5 6 7 8 9 10 ...
##  $ SAMPLE_DATE: Date, format: "1957-10-01" "1957-10-02" ...
##  $ STATE      : Ord.factor w/ 3 levels "d"<"w"<"e": 2 1 1 1 2 1 1 1 1 1 ...
##  $ PRCP       : num 0.35 0 0 0 0.88 0 0 0 0 0 ...
##  $ TEMP       : num 19 14.7 12 10.9 14.9 ...
##  $ TMIN       : num 12.39 9.66 5.89 6.51 9.93 ...
##  $ TMAX       : num 25.7 19.6 18.1 15.3 19.8 ...
##  $ WIND       : num 0.4 8.55 9.28 8.58 8.02 ...
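The state-chain part of this step can be sketched in base R. The example draws a 30-day sequence from the June transition matrix shown earlier; simulate_states() is a hypothetical helper. The sketch deliberately omits what sim_mc_knn_day() adds on top: switching matrices by month and the KNN sampling of historical days conditional on the simulated state.

```r
lev <- c("d", "w", "e")
# June transition matrix from the example above
P <- matrix(c(0.65725542, 0.25607354, 0.08667104,
              0.44909502, 0.28167421, 0.26923077,
              0.24957841, 0.39797639, 0.35244519),
            nrow = 3, byrow = TRUE, dimnames = list(lev, lev))

# Hypothetical helper: simulate n states from a single transition matrix
simulate_states <- function(P, n, init = "d") {
  lev <- rownames(P)
  out <- character(n)
  out[1] <- init
  for (i in seq_len(n - 1)) {
    # draw tomorrow's state from the row belonging to today's state
    out[i + 1] <- sample(lev, size = 1, prob = P[out[i], ])
  }
  factor(out, levels = lev, ordered = TRUE)
}

set.seed(1)
chain <- simulate_states(P, n = 30)
table(chain)
```

Over a long chain the state frequencies approach the equilibrium distribution of the matrix.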
This figure shows the simulated weather variables for the current simulation year.
select(sim_days, DATE, PRCP:WIND) %>%
  gather(VAR, VALUE, -DATE) %>%
  ggplot(aes(DATE, VALUE)) +
  geom_line() +
  facet_wrap(~VAR, scales='free_y', ncol=1) +
  labs(x='Simulate Date', y='Value', title='Daily Simulation for One Year')
[Figure: Daily Simulation for One Year. Panels: PRCP, TEMP, TMIN, TMAX, WIND; x-axis: Simulate Date (Oct 1999 to Oct 2000); y-axis: Value.]
4.2.2.5 Annual Precipitation Adjustment
Because the daily precipitation values are sampled directly from historical values, the total precipitation for each simulation year will likely differ from the corresponding simulated annual precipitation.
The adjust_daily_to_annual() function is used to adjust the simulated daily precipitation values so that the sum of these values equals the simulated annual total precipitation.
For example, the current simulated annual precipitation is 1571 mm/yr, but the sum of the simulated daily precipitation is 1101 mm/yr. To adjust the daily values such that their sum equals the simulated annual value, the daily values can all be multiplied by a factor of 1.43. The following code block uses the adjust_daily_to_annual() function to perform this adjustment for the first simulation year. Note that this function can be applied to the entire simulated timeseries containing multiple years, and will adjust each individual year accordingly.
orig_prcp <- sim_days[['PRCP']]
adj_prcp <- adjust_daily_to_annual(values_day=sim_days[['PRCP']],
                                   years_day=wyear(sim_days[['DATE']]),
                                   values_yr=coredata(sim_annual_prcp[['out']])[1],
                                   years_yr=time(sim_annual_prcp[['out']])[1],
                                   min_ratio=0.5, max_ratio=1.5)
c(Original=sum(orig_prcp), Adjusted=sum(adj_prcp$adjusted), Annual=coredata(sim_annual_prcp[['out']])[1])
## Original Adjusted   Annual
##  1100.57  1570.83  1570.83
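The arithmetic behind the adjustment factor is a single ratio, shown here with the totals printed above. How min_ratio and max_ratio are applied is an assumption of this sketch (the factor is clamped to the limits); the package may handle out-of-range ratios differently.

```r
sim_annual <- 1570.83   # simulated annual precipitation (mm/yr)
sum_daily  <- 1100.57   # sum of simulated daily precipitation (mm/yr)

ratio <- sim_annual / sum_daily
# Assumed behaviour of min_ratio / max_ratio: clamp the factor to the limits
limits <- c(0.5, 1.5)
ratio_clamped <- min(max(ratio, limits[1]), limits[2])
round(ratio_clamped, 2)   # 1.43

# Under this assumption, tighter limits such as c(0.9, 1.1) would cap the
# factor at 1.1 and the adjusted total would fall short of the annual target
min(max(ratio, 0.9), 1.1) * sum_daily
```

Because 1.43 lies inside the limits c(0.5, 1.5) used in the call above, the adjusted daily values sum exactly to the simulated annual value.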
This figure shows the original and adjusted daily precipitation values. Note how the adjusted values (dashed red line) are linearly scaled by a factor of 1.43.
data.frame(DATE=sim_days[['DATE']],
           ORIGINAL=orig_prcp,
           ADJUSTED=adj_prcp$adjusted) %>%
  gather(VAR, VALUE, -DATE) %>%
  ggplot(aes(DATE, VALUE, color=VAR, linetype=VAR)) +
  geom_line() +
  scale_color_manual('', values=c('ORIGINAL'='black', 'ADJUSTED'='orangered')) +
  scale_linetype_manual('', values=c('ORIGINAL'=1, 'ADJUSTED'=2)) +
  labs(x='Simulation Date', y='Daily Precipitation (mm/day)')
[Figure: ORIGINAL and ADJUSTED daily precipitation (mm/day) by simulation date, October 1999 to October 2000.]
4.3 Adjustment Factors
After generating the simulated daily timeseries, the precipitation and temperature variables can be adjusted using the change factor parameters.
For this example, a 5-year simulated timeseries is first generated without applying any change factors (each multiplicative change factor is set to 1 and the temperature change is set to 0):
sim <- wgen_daily(zoo_day,
                  n_year=5,
                  start_month=10,
                  start_water_year=2000,
                  include_leap_days=FALSE,
                  n_knn_annual=100,
                  dry_wet_threshold=0.3,
                  wet_extreme_quantile_threshold=0.8,
                  adjust_annual_precip=TRUE,
                  annual_precip_adjust_limits=c(0.5, 1.5),
                  dry_spell_changes=1,
                  wet_spell_changes=1,
                  prcp_mean_changes=1,
                  prcp_cv_changes=1,
                  temp_mean_changes=0)
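The start_month and start_water_year arguments refer to water years rather than calendar years. As a rough illustration, the hypothetical water_year() helper below (a stand-in, not the package's wyear() function) assumes that a water year is labeled by the calendar year in which it ends. That convention appears consistent with a simulation for water year 2000 that begins in October 1999, but it is an assumption here.

```r
# Hypothetical helper (not weathergen's wyear()): assign a water year to dates.
# ASSUMPTION: a water year is labeled by the calendar year in which it ends,
# so with start_month = 10, October 1999 belongs to water year 2000.
water_year <- function(dates, start_month = 10) {
  yr <- as.integer(format(dates, "%Y"))
  mo <- as.integer(format(dates, "%m"))
  # months on or after the start month roll over into the next water year;
  # with start_month = 1 the water year equals the calendar year
  ifelse(start_month > 1 & mo >= start_month, yr + 1L, yr)
}

dates <- as.Date(c("1999-09-30", "1999-10-01", "2000-09-30", "2000-10-01"))
wy <- water_year(dates, start_month = 10)
data.frame(DATE = dates, WYEAR = wy)
```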
4.3.1 Precipitation Adjustment
The simulated precipitation timeseries can then be adjusted using the adjust_daily_gamma() function. This function performs a quantile mapping: it first fits a gamma distribution to the non-zero precipitation values, then adjusts the scale and shape parameters of that distribution, and finally replaces each precipitation value by mapping the original value through the adjusted distribution.
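The idea of gamma quantile mapping can be sketched with base R alone. The function gamma_quantile_map() below is hypothetical and is not the adjust_daily_gamma() implementation: it assumes a method-of-moments fit, ignores the month argument, and treats the whole series as one group, any of which may differ from the package. It relies on the gamma relations mean = shape * scale and CV = 1 / sqrt(shape).

```r
# Hypothetical sketch (not the weathergen implementation) of quantile mapping
# through a gamma distribution with an adjusted mean and CV.
gamma_quantile_map <- function(x, mean_change = 1, cv_change = 1) {
  wet <- x > 0                      # only non-zero values are adjusted
  m <- mean(x[wet])
  s <- sd(x[wet])

  # ASSUMPTION: method-of-moments fit of the original gamma distribution
  shape <- (m / s)^2
  scale <- s^2 / m

  # target mean and CV, converted back to gamma parameters
  new_mean  <- m * mean_change
  new_cv    <- (s / m) * cv_change
  new_shape <- 1 / new_cv^2
  new_scale <- new_mean / new_shape

  # map each wet value to its probability under the fitted distribution,
  # then to the same probability's quantile under the adjusted distribution
  p <- pgamma(x[wet], shape = shape, scale = scale)
  x[wet] <- qgamma(p, shape = new_shape, scale = new_scale)
  x
}

# synthetic daily precipitation: roughly 30% wet days, made up for illustration
set.seed(1)
prcp <- ifelse(runif(1000) < 0.3, rgamma(1000, shape = 0.8, scale = 10), 0)

adj <- gamma_quantile_map(prcp, mean_change = 1.2, cv_change = 1)
c(original = mean(prcp[prcp > 0]), adjusted = mean(adj[adj > 0]))
```

When only the mean is changed, the shape parameter stays the same and the scale parameter is multiplied by the change factor, so in this sketch the mapping reduces to multiplying every wet-day value by 1.2. Dry days are left at zero.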
The following code block computes three sets of adjusted timeseries by adjusting the mean, the coefficient of variation, or both parameters.
prcp.unadj <- sim[["out"]][, c("DATE","MONTH","PRCP")]
prcp.adj.mean <- adjust_daily_gamma(prcp.unadj[["PRCP"]], prcp.unadj[["MONTH"]],
                                    mean_change=1.2, cv_change=1)
prcp.adj.cv <- adjust_daily_gamma(prcp.unadj[["PRCP"]], prcp.unadj[["MONTH"]],
                                  mean_change=1, cv_change=1.2)
prcp.adj.both <- adjust_daily_gamma(prcp.unadj[["PRCP"]], prcp.unadj[["MONTH"]],
                                    mean_change=1.2, cv_change=1.2)
prcp.df <- data.frame(DATE=prcp.unadj[["DATE"]],
                      UNADJESTED=prcp.unadj[["PRCP"]],
                      ADJUST_MEAN=prcp.adj.mean[["adjusted"]],
                      ADJUST_CV=prcp.adj.cv[["adjusted"]],
                      ADJUST_BOTH=prcp.adj.both[["adjusted"]]) %>%
  gather(GROUP, VALUE, -DATE)
This figure shows the unadjusted and three adjusted daily precipitation timeseries.
ggplot(prcp.df, aes(DATE, VALUE)) +
  geom_line() +
  facet_wrap(~GROUP) +
  labs(x="", y="Daily Precipitation (mm/day)", title="Unadjusted and Adjusted Daily Precipitation")
[Figure: Unadjusted and Adjusted Daily Precipitation. Four panels (UNADJESTED, ADJUST_MEAN, ADJUST_CV, ADJUST_BOTH) showing daily precipitation (mm/day) for 2000 to 2004.]
This figure shows the annual precipitation for the unadjusted and adjusted precipitation timeseries by water year.
mutate(prcp.df, WYEAR=wyear(DATE, start_month=10)) %>%
  group_by(WYEAR, GROUP) %>%
  summarise(VALUE=sum(VALUE)) %>%
  ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +
  geom_bar(stat="identity", position="dodge") +
  scale_fill_discrete("") +
  labs(x="Water Year", y="Annual Precipitation (mm/yr)", title="Annual Precipitation With Adjustments")
[Figure: Annual Precipitation With Adjustments. Annual precipitation (mm/yr) by water year, 2000 to 2004, for the UNADJESTED, ADJUST_MEAN, ADJUST_CV, and ADJUST_BOTH series.]
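This figure shows the cumulative frequency distribution of the non-zero daily values in each timeseries. Note that the y-axis has been truncated to a maximum of 60 mm/day in order to show the lower part of the distribution; values above this limit are dropped from the plot, which produces the warning shown after the code.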
prcp.df %>%
  filter(VALUE>0) %>%
  group_by(GROUP) %>%
  arrange(VALUE) %>%
  mutate(ROW=row_number(),
         PROB=(ROW-0.5)/n()) %>%
  ggplot(aes(PROB, VALUE, color=GROUP)) +
  geom_line() +
  ylim(0, 60) +
  scale_color_discrete("") +
  labs(y="Daily Precipitation (mm/day)", x="Non-Exceedence Probability", title="Distributions of Adjusted Precipitation Timeseries")
## Warning: Removed 31 rows containing missing values (geom_path).
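The PROB column in the code above is an empirical non-exceedance probability computed from the rank of each sorted value as (rank - 0.5) / n, a formula commonly known as the Hazen plotting position. The short base R sketch below applies the same formula to a handful of made-up values, without the grouping by timeseries used in the plot.

```r
# made-up non-zero daily precipitation values (mm/day)
x <- c(5.1, 0.3, 12.7, 2.4, 8.0)

# sort ascending, then assign each value the probability (rank - 0.5) / n
sorted <- sort(x)
prob <- (seq_along(sorted) - 0.5) / length(sorted)

data.frame(VALUE = sorted, PROB = prob)
```

Because of the 0.5 offset, the smallest and largest values are never assigned probabilities of exactly 0 or 1.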
[Figure: Distributions of Adjusted Precipitation Timeseries. Daily precipitation (mm/day, truncated at 60) versus non-exceedance probability for the UNADJESTED, ADJUST_MEAN, ADJUST_CV, and ADJUST_BOTH series.]
4.3.2 Temperature Adjustment
The temperature timeseries can also be adjusted using a linear additive factor through the adjust_daily_additive() function. This function simply adds the adjustment factor to each daily value. In this example, the adjustment factor is 5, meaning each daily mean temperature is increased by 5 degrees C.
temp.unadj <- sim[["out"]][, c("DATE","MONTH","TEMP")]
temp.adj <- adjust_daily_additive(temp.unadj[["TEMP"]], temp.unadj[["MONTH"]],
                                  mean_change=5)
temp.df <- data.frame(DATE=temp.unadj[["DATE"]],
                      UNADJESTED=temp.unadj[["TEMP"]],
                      ADJUSTED=temp.adj[["adjusted"]]) %>%
  gather(GROUP, VALUE, -DATE)
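As a quick check of what an additive adjustment does to a series, the base R sketch below adds a constant to a synthetic temperature series. It does not call adjust_daily_additive(); the series and the single constant shift are assumptions made for illustration only.

```r
# synthetic daily mean temperature (degC): a seasonal cycle plus noise
set.seed(1)
temp <- 10 + 12 * sin(2 * pi * (1:365) / 365) + rnorm(365, sd = 3)

# additive adjustment: every daily value is shifted by the same amount
temp_adj <- temp + 5

c(mean_shift = mean(temp_adj) - mean(temp),
  sd_ratio   = sd(temp_adj) / sd(temp))
```

The mean shifts by the adjustment amount while the spread of the daily values is unchanged, in contrast to the multiplicative adjustments used for precipitation.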
This figure shows the unadjusted and adjusted daily mean temperature.
ggplot(temp.df, aes(DATE, VALUE, color=GROUP)) +
  geom_line() +
  scale_color_discrete("") +
  labs(x="", y="Daily Mean Temperature (degC)", title="Unadjusted and Adjusted Daily Mean Temperature")
[Figure: Unadjusted and Adjusted Daily Mean Temperature. Daily mean temperature (degC) for 2000 to 2004, UNADJESTED and ADJUSTED series.]
This figure shows the unadjusted and adjusted annual mean temperature. Note how the adjusted annual mean temperatures are 5 degrees higher than the unadjusted temperatures.
mutate(temp.df, WYEAR=wyear(DATE, start_month = 10)) %>%
  group_by(WYEAR, GROUP) %>%
  summarise(VALUE=mean(VALUE)) %>%
  ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +
  geom_bar(stat="identity", position="dodge") +
  scale_fill_discrete("") +
  labs(x="", y="Annual Mean Temperature (degC)", title="Unadjusted and Adjusted Annual Mean Temperature")
[Figure: Unadjusted and Adjusted Annual Mean Temperature. Annual mean temperature (degC) by water year, 2000 to 2004, UNADJESTED and ADJUSTED series.]