Introduction to the weathergen Package - Amazon S3 · Introduction to the weathergen Package Jerey D Walker, PhD August 02, 2015 Contents 1 Introduction 1 2 Load Historical Data 2

Introduction to the weathergen Package

Je�rey D Walker, PhDAugust 02, 2015

Contents

1 Introduction 1

2 Load Historical Data 2

3 Daily Weather Generator 5

4 Algorithm Description 10

4.1 Annual Precipitation Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4.2 Daily Weather Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2.1 KNN Annual Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2.2 Daily Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.3 Adjustment Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.3.1 Precipitation Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.3.2 Temperature Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1 Introduction

This document provides an introduction to the weathergen R package. This package provides functions forgenerating synthetic weather data using the method described by Steinschneider and Brown (2013):

Steinschneider, S., & Brown, C. (2013). A semiparametric multivariate, multisite weather generatorwith low-frequency variability for use in climate risk assessments. Water Resources Research,49(11), 7205–7220. doi:10.1002/wrcr.20528

The code in this package is based on the original scripts written by Scott Steinschneider, PhD and modifiedby Je�rey D Walker, PhD.

In order to run the code below, a few packages must be loaded first.

library(dplyr)library(lubridate)library(tidyr)library(ggplot2)library(gridExtra)library(zoo)library(moments)library(weathergen)theme_set(theme_bw())

1

Appendix A

doi:10.1002/wrcr.20528

# set random seed for reproducibility

set.seed(1)

2 Load Historical Data

The weathergen package includes a dataset of historical climate data for three cities in the Northeast US.These datasets are provided in the climate_cities dataset, which is a list containing dataframes for 3 cities:Boston, Providence, and Worcester:

data(climate_cities)str(climate_cities)

## List of 3## $ boston :�data.frame�: 22645 obs. of 5 variables:## ..$ DATE: Date[1:22645], format: "1949-01-01" ...## ..$ PRCP: num [1:22645] 0.35 0 0 0.17 16.67 ...## ..$ TMAX: num [1:22645] 5.99 2.14 3.75 5.46 7.07 14.3 9.21 12.9 9 7.33 ...## ..$ TMIN: num [1:22645] -0.84 -1.51 -3.15 -1.49 0.61 3.26 1.5 2.21 2.12 4.26 ...## ..$ WIND: num [1:22645] 10.93 13.11 9.57 7.68 3.31 ...## $ providence:�data.frame�: 22645 obs. of 5 variables:## ..$ DATE: Date[1:22645], format: "1949-01-01" ...## ..$ PRCP: num [1:22645] 0 0 0 0.65 12.9 18.4 0 0 0 0 ...## ..$ TMAX: num [1:22645] 3.68 0.93 4.06 5.44 9.95 ...## ..$ TMIN: num [1:22645] -1.96 -2.19 -4.68 -3.78 -1.26 2.65 0.66 2.25 0.18 2.87 ...## ..$ WIND: num [1:22645] 12.32 13.82 10.05 7.52 3.2 ...## $ worcester :�data.frame�: 22645 obs. of 5 variables:## ..$ DATE: Date[1:22645], format: "1949-01-01" ...## ..$ PRCP: num [1:22645] 0.47 0 0 0 9.23 ...## ..$ TMAX: num [1:22645] 6.6 -0.06 3.26 3.82 3.82 ...## ..$ TMIN: num [1:22645] -3.39 -3.39 -5.62 -8.39 -5.62 -1.73 -3.95 -2.28 -2.28 2.71 ...## ..$ WIND: num [1:22645] 11.21 13.18 9.58 7.06 3.83 ...

For this vignette, we will use the dataset for Boston, MA.

# extract dataset for boston

obs_day <- climate_cities[[�boston�]]

# subset complete water years using complete_water_years() function

obs_day <- obs_day[complete_water_years(obs_day$DATE),]

obs_day <- mutate(obs_day,WYEAR=wyear(DATE, start_month=10), # extract water year

MONTH=month(DATE), # extract month

TEMP=(TMIN+TMAX)/2) %>% # compute mean temperature

select(WYEAR, MONTH, DATE, PRCP, TEMP, TMIN, TMAX, WIND)

# compute annual timeseries by water year

obs_wyr <- group_by(obs_day, WYEAR) %>%summarise(PRCP=sum(PRCP),

TEMP=mean(TEMP),

2

TMIN=mean(TMIN),TMAX=mean(TMAX),WIND=mean(WIND))

# save the daily and annual timeseries to a list

obs <- list(day=obs_day, wyr=obs_wyr)

# summary of the daily timeseries

summary(obs[[�day�]])

## WYEAR MONTH DATE PRCP## Min. :1950 Min. : 1.000 Min. :1949-10-01 Min. : 0.000## 1st Qu.:1965 1st Qu.: 4.000 1st Qu.:1964-12-30 1st Qu.: 0.000## Median :1980 Median : 7.000 Median :1980-03-31 Median : 0.000## Mean :1980 Mean : 6.523 Mean :1980-03-31 Mean : 3.259## 3rd Qu.:1995 3rd Qu.:10.000 3rd Qu.:1995-07-01 3rd Qu.: 2.120## Max. :2010 Max. :12.000 Max. :2010-09-30 Max. :185.850## TEMP TMIN TMAX WIND## Min. :-18.68 Min. :-25.130 Min. :-14.05 Min. : 0.020## 1st Qu.: 3.09 1st Qu.: -0.850 1st Qu.: 6.98 1st Qu.: 4.090## Median : 10.86 Median : 6.070 Median : 15.64 Median : 6.330## Mean : 10.62 Mean : 5.903 Mean : 15.33 Mean : 6.724## 3rd Qu.: 18.86 3rd Qu.: 13.810 3rd Qu.: 23.98 3rd Qu.: 8.920## Max. : 31.99 Max. : 26.240 Max. : 38.15 Max. :23.980

This figure shows the daily timeseries of each climate variable.

gather(obs[[�day�]], VAR, VALUE, PRCP:WIND) %>%ggplot(aes(DATE, VALUE)) +geom_line() +facet_wrap(~VAR, scales=�free_y�, ncol=1) +labs(x=��, y=��, title=�Historical Daily Weather Data for Boston, MA�)

3

PRCP

TEMP

TMIN

TMAX

WIND

0

50

100

150

−20−10

0102030

−20−10

01020

0

20

40

05

10152025

1950 1960 1970 1980 1990 2000 2010

Historical Daily Weather Data for Boston, MA

This figure shows the annual sum of precipitation, and means of minimum temperature, maximum temperature,and wind speed by water year.

gather(obs[[�wyr�]], VAR, VALUE, PRCP:WIND) %>%ggplot(aes(WYEAR, VALUE)) +

geom_line() +facet_wrap(~VAR, scales=�free_y�, ncol=1) +labs(x=��, y=��, title=�Historical Annual Weather Data for Boston, MA�)

4

PRCP

TEMP

TMIN

TMAX

WIND

750

1000

1250

1500

10

11

12

5

6

7

14

15

16

17

6.0

6.5

7.0

1960 1980 2000

Historical Annual Weather Data for Boston, MA

3 Daily Weather Generator

The primary function for running the daily weather generator on a single site is wgen_daily(). The functionthen depends on a number of other functions, which are described in the next section.

The first argument to wgen_daily() is the daily timeseries of historical climate variables. This argumentmust be provided as a zoo object which is designed to store regular and irregular timeseries (see the zoopackage homepage for more details).

This code converts the daily timeseries (currently in obs[[�day�]]) to a zoo object. The first argumentprovides the values of each variable, the second argument provides the associated dates.

zoo_day <- zoo(x = obs[[�day�]][, c(�PRCP�, �TEMP�, �TMIN�, �TMAX�, �WIND�)],order.by = obs[[�day�]][[�DATE�]])

class(zoo_day)

## [1] "zoo"

summary(zoo_day)

5

http://cran.r-project.org/web/packages/zoo/index.html

## Index PRCP TEMP TMIN## Min. :1949-10-01 Min. : 0.000 Min. :-18.68 Min. :-25.130## 1st Qu.:1964-12-30 1st Qu.: 0.000 1st Qu.: 3.09 1st Qu.: -0.850## Median :1980-03-31 Median : 0.000 Median : 10.86 Median : 6.070## Mean :1980-03-31 Mean : 3.259 Mean : 10.62 Mean : 5.903## 3rd Qu.:1995-07-01 3rd Qu.: 2.120 3rd Qu.: 18.86 3rd Qu.: 13.810## Max. :2010-09-30 Max. :185.850 Max. : 31.99 Max. : 26.240## TMAX WIND## Min. :-14.05 Min. : 0.020## 1st Qu.: 6.98 1st Qu.: 4.090## Median : 15.64 Median : 6.330## Mean : 15.33 Mean : 6.724## 3rd Qu.: 23.98 3rd Qu.: 8.920## Max. : 38.15 Max. :23.980

The daily historical timeseries is then passed as the first argument to the wgen_daily() function. The otherarguments specify a number of parameters for the simulation. These parameters include:

• n_year - number of simulation years• start_month - first month of the water year (e.g. 10=October)• start_water_year - first water year to use for the simulated timeseries• include_leap_days - boolean flag to include or exclude leap days in the simulation timeseries (currently

only allows FALSE)• n_knn_annual - number of years to use in the annual k-nearest neighbors (knn) sampling algorithm• dry_wet_threshold - daily precipitation threshold between the dry and wet Markov states• wet_extreme_quantile_threshold - daily precipitation quantile used as threshold between the wet

and extreme Markov states• adjust_annual_precip - boolean flag to adjust simulated daily precipitation amounts to match the

simulated annual precipitation• annual_precip_adjust_limits - lower and upper limits for the daily precipitation adjustment factor

to match simulated annual precipitation (e.g. c(0.9, 1.1) will only adjust daily values between90-110%)

• dry_spell_changes - adjustment factor for changing the transition probabilities of dry spells• wet_spell_changes - adjustment factor for changing the transition probabilities of wet spells• prcp_mean_changes - multiplicative adjustment factor for changing the annual mean precipitation• prcp_cv_changes - multiplicative adjustment factor for changing the coe�cient of variation for precipi-

tation• temp_mean_changes - additive adjustment factor for changing the annual mean temperature

The last five parameters for assigning adjustment factors can each be either a single number, which is appliedto all months, or a vector of length 12 where each value is the change factor for a specific month (startingwith January).

sim <- wgen_daily(zoo_day,n_year=2,start_month=10,start_water_year=2000,include_leap_days=FALSE,n_knn_annual=100,dry_wet_threshold=0.3,wet_extreme_quantile_threshold=0.8,adjust_annual_precip=TRUE,

6

annual_precip_adjust_limits=c(0.9, 1.1),dry_spell_changes=1,wet_spell_changes=1,prcp_mean_changes=1,prcp_cv_changes=1,temp_mean_changes=0)

The wgen_daily() function returns a list object with the following elements:

str(sim, max=1)

## List of 6## $ obs :�data.frame�: 22280 obs. of 9 variables:## $ state_thresholds :�data.frame�: 12 obs. of 3 variables:## $ transition_matrices:List of 12## $ state_equilibria :List of 12## $ change_factors :List of 6## $ out :�data.frame�: 730 obs. of 11 variables:

The first element, sim[[�obs�]], is the historical daily timeseries, with additional columns for the wateryear (WYEAR), month (MONTH), and Markov state (d=Dry, w=Wet, e=Extreme).

head(sim[[�obs�]])

## DATE WYEAR MONTH STATE PRCP TEMP TMIN TMAX WIND## 1 1949-10-01 1950 10 d 0.00 13.765 8.89 18.64 8.53## 2 1949-10-02 1950 10 d 0.00 10.225 4.94 15.51 7.31## 3 1949-10-03 1950 10 d 0.00 12.665 6.06 19.27 3.50## 4 1949-10-04 1950 10 d 0.00 16.555 11.06 22.05 2.76## 5 1949-10-05 1950 10 d 0.28 18.970 14.02 23.92 5.59## 6 1949-10-06 1950 10 d 0.00 14.135 7.84 20.43 6.76

The second element, sim[[�state_thresholds�]], is a dataframe with the monthly precipitation thresholdsfor each Markov state. For example, in June the Markov state on a given day is dry if the precipitation onthat day is less than 0.3 mm/day, wet if it is between 0.3 and 3.33 mm/day, and extreme if it is greater than3.33 mm/day.

sim[[�state_thresholds�]]

## MONTH DRY_WET WET_EXTREME## 1 1 0.3 4.18## 2 2 0.3 3.80## 3 3 0.3 4.93## 4 4 0.3 4.03## 5 5 0.3 3.60## 6 6 0.3 3.33## 7 7 0.3 2.70## 8 8 0.3 3.12## 9 9 0.3 2.60## 10 10 0.3 2.88## 11 11 0.3 4.75## 12 12 0.3 4.60

7

The third element, sim[[�transition_matrices�]], is a list of length 12 where each element is the Markovstate transition matrix for each month. For example, the transition matrix for June is:

sim[[�transition_matrices�]][[6]]

#### d w e## d 0.6977169 0.1634703 0.1388128## w 0.5189189 0.2621622 0.2189189## e 0.3863014 0.2520548 0.3616438

The fourth element, sim[[�state_equilibria]]‘, is a list of length 12 where each element is the stateequilibrium of each monthly transition matrix. For example, the equilibrium for June is:

sim[[�state_equilibria�]][[6]]

## d w e## 0.5997119 0.2009608 0.1993273

The fifth element, sim[[�change_factors�]], is a list containing the monthly values for each adjustmentfactor as well as an additional set of adjustments that are computed from the simulated state transitions(ratio_probability_wet). Each of these elements will each be a vector of length 12 (even if the correspondingparameter to wgen_daily() was a single value, in which case the same value will be used for all months).

str(sim[[�change_factors�]])

## List of 6## $ ratio_probability_wet: num [1:12] 1 1 1 1 1 1 1 1 1 1 ...## $ dry_spell_changes : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...## $ wet_spell_changes : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...## $ prcp_mean : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...## $ prcp_cv : num [1:12] 1 1 1 1 1 1 1 1 1 1 ...## $ temp_mean : num [1:12] 0 0 0 0 0 0 0 0 0 0 ...

The last element, sim[[�out�]], contains the simulated daily timeseries as a dataframe with the followingcolumns:

• SIM_YEAR - simulation year starting at 1• DATE - simulation date (starting on the first day of the start_month and start_water_year parameters,

e.g. 1999-10-01 for water year 2000 starting in month 10 (October))• MONTH - simulation month• WDAY - simulation “water day” which is the julian day starting at the beginning of the water year

(e.g. water day 1 = Oct 1)• SAMPLE_DATE - the historical date sampled from the KNN sampling algorithm and used for the each

simulation date• STATE - simulated Markov state• PRCP, TEMP, TMIN, TMAX, WIND - simulated climate variables

8

str(sim[[�out�]])

## �data.frame�: 730 obs. of 11 variables:## $ SIM_YEAR : int 1 1 1 1 1 1 1 1 1 1 ...## $ DATE : Date, format: "1999-10-01" "1999-10-02" ...## $ MONTH : num 10 10 10 10 10 10 10 10 10 10 ...## $ WDAY : int 1 2 3 4 5 6 7 8 9 10 ...## $ SAMPLE_DATE: Date, format: "2004-10-01" "1985-09-30" ...## $ STATE : Ord.factor w/ 3 levels "d"<"w"<"e": 1 1 1 1 2 3 3 1 1 1 ...## $ PRCP : num 0.284 0 0 0 2.25 ...## $ TEMP : num 15.8 17.3 15.2 12.7 12.5 ...## $ TMIN : num 12.07 10.11 10.19 7.48 7.84 ...## $ TMAX : num 19.5 24.6 20.3 17.9 17.1 ...## $ WIND : num 1.8 3.56 5.44 5.95 7.41 6.29 9.69 7.3 6.37 9.68 ...

This figure shows the simulated timeseries for each variable:

select(sim[[�out�]], DATE, PRCP, TEMP, TMIN, TMAX, WIND) %>%gather(VAR, VALUE, -DATE) %>%ggplot(aes(DATE, VALUE)) +geom_line() +facet_wrap(~VAR, ncol=1, scales=�free_y�) +labs(x=�Simulation Date�, y=�Simulated Value�, title=�Simulated Timeseries�) +theme_bw()

9

PRCP

TEMP

TMIN

TMAX

WIND

0

20

40

60

−10

0

10

20

30

−20−10

01020

0

10

20

30

0

5

10

15

2000−01 2000−07 2001−01 2001−07Simulation Date

Sim

ulat

ed V

alue

Simulated Timeseries

4 Algorithm Description

The daily weather generator uses the following algorithm:

10

• Simulate annual precipitation using an ARMA model (wavelet model coming soon)• For each simulation year:• Use KNN sampling to create a historical daily timeseries that only includes only years where annual

precipitation is similar to the current simulated annual precipitation• compute the monthly Markov state transition thresholds and assign states to daily timeseries• fit the monthly Markov chain transition probability matrices• adjust the transition matrices based on the dry and wet spell change factors• for each simulation day in the current year, use KNN sampling to sample the synthetic historical daily

values based on the state, precipitation and mean temperature of the current and previous days• Combine the daily simulation for each year• Adjust daily simulated precipitation values for each year to equal the simulated annual values (if

adjust_annual_precip=TRUE)• Adjust daily simulated precipitation values based on the precipitation adjustment factors• Adjust daily simulated minimum, maximum and mean temperature based on the temperature adjustment

factor

4.1 Annual Precipitation Simulation

the sim_annual_arima() function is used to simulate annual precipitation using an ARIMA model. Thearguments to this function include:

• x - historical annual precipitation as a zoo object

function first fits the ARIMA model to the historical annual precipitation. The ARIMA model is then used togenerate a simulation of annual precipitation for length n_year. The sim_annual_arma() function returns alist of length 3 containing the ARIMA model object (model), the historical annual precipitation (observed),and the simulated annual precipitation (values).

zoo_precip_wyr <- zoo(obs[[�wyr�]][[�PRCP�]], order.by = obs[[�wyr�]][[�WYEAR�]])sim_annual_prcp <- sim_annual_arima(x = zoo_precip_wyr, start_year = 2000, n_year = 40)

This function returns a list of length 3 containing elements:

• model - the ARIMA model object fitted to the historical data• x - the historical annual precipitation used to fit the model as a zoo object• out - the simulated annual precipitation as a zoo object

str(sim_annual_prcp, max.level=1)

## List of 3## $ model:List of 17## ..- attr(*, "class")= chr [1:2] "ARIMA" "Arima"## $ x :�zoo� series from 1950 to 2010## Data: num [1:61] 798 1235 1248 1186 1686 ...## Index: num [1:61] 1950 1951 1952 1953 1954 ...## $ out :�zoo� series from 2000 to 2039## Data: num [1:40] 1571 1404 1071 1234 1437 ...## Index: num [1:40] 2000 2001 2002 2003 2004 ...

11

The model element contains an ARIMA model object, which contains the parameters and error statistics ofthe fitted model:

summary(sim_annual_prcp[[�model�]])

## Series: x## ARIMA(0,0,0) with non-zero mean#### Coefficients:## intercept## 1190.3175## s.e. 27.0648#### sigma^2 estimated as 44683: log likelihood=-413.13## AIC=830.26 AICc=830.47 BIC=834.48#### Training set error measures:## ME RMSE MAE MPE MAPE MASE## Training set 8.013399e-14 211.3826 162.3144 -3.399893 14.55972 0.6760569## ACF1## Training set -0.0452353

The out element is an annual precipitation timeseries as a zoo object:

sim_annual_prcp[[�out�]]

## 2000 2001 2002 2003 2004 2005 2006## 1570.8299 1404.1184 1071.1577 1233.7399 1436.6759 1663.0372 1254.2111## 2007 2008 2009 2010 2011 2012 2013## 969.9498 982.4138 1614.2915 752.6348 1836.2482 1135.0726 1094.2667## 2014 2015 2016 2017 2018 2019 2020## 1223.6231 1387.6197 1254.3302 776.8215 1265.0490 1285.5294 1329.7351## 2021 2022 2023 2024 2025 2026 2027## 972.2932 689.1246 1121.7078 990.7092 1028.4217 988.7052 1106.1863## 2028 2029 2030 2031 2032 2033 2034## 1124.5317 1358.5977 1398.8313 1022.3674 1125.0388 1266.7207 1485.9976## 2035 2036 2037 2038 2039## 1178.4652 831.2053 1239.3271 1165.1438 1564.9916

This figure plots the simulated annual precipitation and shows the mean annual precipitation based on thehistorical dataset for reference.

data.frame(WYEAR=time(sim_annual_prcp[[�out�]]),PRCP=zoo::coredata(sim_annual_prcp[[�out�]])) %>%

ggplot(aes(WYEAR, PRCP)) +geom_line() +geom_hline(yint=mean(obs[[�wyr�]][[�PRCP�]]), color=�red�) +geom_text(aes(label=TEXT), data=data.frame(WYEAR=2000, PRCP=mean(obs[[�wyr�]][[�PRCP�]]), TEXT="Historical Mean"),

hjust=0, vjust=-1, color=�red�) +labs(x="Simulation Water Year", y="Annual Precip (mm/yr)", title="Simulated Annual Precipitation")

12

Historical Mean

900

1200

1500

1800

2000 2010 2020 2030 2040Simulation Water Year

Annu

al P

reci

p (m

m/y

r)Simulated Annual Precipitation

To determine whether the annual precipitation generator su�ciently reproduces the statistics of the historicaltimeseries, a Monte Carlo simulation is performed for 50 trials using the same historical timeseries.

# batch simulation to compare mean/sd/skew

sim_wyr_batch <- lapply(1:50, function(i) {sim_annual_arima(x = obs[[�wyr�]][[�PRCP�]], start_year = 2000, n_year = 20)[[�out�]]

})names(sim_wyr_batch) <- paste0(seq(1, length(sim_wyr_batch)))sim_wyr_batch <- do.call(merge, sim_wyr_batch) %>%

apply(MARGIN=2, FUN=function(x) { c(mean=mean(x), sd=sd(x), skew=skewness(x))}) %>%t

sim_wyr_stats <- rbind(t(colMeans(sim_wyr_batch)),c(mean=mean(obs[[�wyr�]][[�PRCP�]]),

sd=sd(obs[[�wyr�]][[�PRCP�]]),skew=skewness(obs[[�wyr�]][[�PRCP�]])))

sim_wyr_stats <- as.data.frame(sim_wyr_stats)sim_wyr_stats$dataset <- c(�obs�, �sim�)sim_wyr_stats <- gather(sim_wyr_stats, stat, value, mean, sd, skew)

The following figure shows the distributions of the mean, standard deviation, and skewness statistics over thesimulation trials. The overall mean value of each statistic (blue point) is also compared to the statistic valueof the historical timeseries (red point). This figure shows that the annual precipitation generator createssimulated timeseries with similar statistics as the historical timeseries.

as.data.frame(sim_wyr_batch) %>%mutate(trial=row_number()) %>%gather(stat, value, mean:skew) %>%ggplot(aes(x=stat, value)) +geom_boxplot() +geom_point(aes(x=stat, y=value, color=dataset), data=sim_wyr_stats, size=3) +scale_color_manual(��, values=c(�sim�=�deepskyblue�, �obs�=�orangered�),

labels=c(�sim�=�Mean of Simulated Trials�, �obs�=�Mean of Historical�)) +labs(x=��, y=��) +facet_wrap(~stat, scales=�free�)

13

mean sd skew

1100

1150

1200

1250

1300

200

250

300

−1.0

−0.5

0.0

0.5

1.0

mean sd skew

Mean of HistoricalMean of Simulated Trials

4.2 Daily Weather Simulation

The annual precipitation generated in the previous step is then used to drive a daily simulation for all weathervariables. For each year, a k-nearest neighbors algorithm samples n_knn_annual years from the historicalrecord with replacement. The daily historical timeseries for each year is then extracted and combined intoa daily timeseries, where the sequence of years is based on the KNN sampling. A Markov Chain model isthen fitted to the sampled historical daily and used to simulate a sequence of states for each simulation year.Finally, the simulated states are used to sample the historical daily values using a daily KNN algorithm.

4.2.1 KNN Annual Sampling

For each simulation year, a set of n_knn_annual years are sampled with replacement from the historicalrecord using inverse distance weighted sampling. Historical years with annual precipitation similar to thecurrent simulated annual precipitation are given a higher weight and thus more likely to be sampled thanhistorical years where the annual precipitation was much higher or lower than the simulated precipitation.

n_knn_annual <- 100

# generate vector of sampled years

sampled_years <- knn_annual(prcp=coredata(sim_annual_prcp[[�out�]])[1],obs_prcp=zoo_precip_wyr,n=n_knn_annual)

sampled_years

## [1] 1984 1998 1998 1990 2006 1998 1954 2006 1984 2006 1998 1958 1998 1958## [15] 1984 1960 1951 1984 1984 1967 1984 1970 1974 1956 1998 1998 1955 1998## [29] 2004 1955 1984 1998 1960 1998 1994 1952 1958 1998 1998 1998 2010 1996## [43] 1984 2006 1998 1998 1955 1961 1996 1954 1958 1998 1970 1984 1998 1983## [57] 2008 1967 1984 2004 1954 1958 2003 1998 1955 2009 1955 1998 1998 1987## [71] 1998 1998 2008 1996 1998 1998 2006 1976 2010 1998 1984 2004 1998 1998

14

## [85] 1998 1954 1954 1967 1998 2003 1998 1987 2006 1984 1998 2006 2006 1974## [99] 1970 1984

This figure shows the historical annual precipitation for each water year. The red line shows the annualprecipitation for the first simulated year. The red points show which historical years were included in theKNN sampling. Note that these points tend to be close to the target simulated precipitation.

ggplot() +geom_point(aes(WYEAR, PRCP, color=SAMPLED), data=obs[[�wyr�]] %>% mutate(SAMPLED=WYEAR %in% sampled_years)) +geom_hline(yint=sim_annual_prcp[[�out�]][1], linetype=2, color=�red�) +geom_text(aes(x=x, y=y, label=label),

data=data.frame(x=min(obs[[�wyr�]][[�WYEAR�]]),y=sim_annual_prcp[[�out�]][1],label=�Target Simulated Annual Precip�),

vjust=-0.5, hjust=0) +scale_color_manual(�Sampled in KNN�, values=c(�TRUE�=�orangered�, �FALSE�=�grey30�)) +labs(x=�Water Year�, y=�Annual Precipitation (mm/yr)�)

Target Simulated Annual Precip

750

1000

1250

1500

1960 1980 2000Water Year

Annu

al P

reci

pita

tion

(mm

/yr)

Sampled in KNNFALSETRUE

For each of the sampled years in sampled_years, the corresponding daily values are extracted from thehistorical time series and then combined into a single timeseries. The following figure shows the resultingdaily timeseries for the 100 years sampled for the first simulated water year. The points are colored by theirhistorical water year, and thus show years where the historical annual precip is closest to the simulated precipare repeated.

# loop through population years and extract historical daily values

sampled_days <- lapply(sampled_years, function(yr) {obs[[�day�]][which(obs[[�day�]][[�WYEAR�]]==yr), ]

}) %>%rbind_all() %>%mutate(SAMPLE_INDEX=row_number())

# plot combined daily timeseries based on the KNN sampled years

ggplot(sampled_days, aes(SAMPLE_INDEX, PRCP, color=WYEAR)) +

15

geom_point(size=1) +labs(x=�Index Day�, y=�Daily Precip (mm/day)�, title=�Synthetic Daily Timeseries from KNN Sampled Years�)

0

50

100

150

0 10000 20000 30000Index Day

Dai

ly P

reci

p (m

m/d

ay)

196019701980199020002010

WYEAR

Synthetic Daily Timeseries from KNN Sampled Years

4.2.2 Daily Simulation

The resulting timeseries of daily values for the sampled years is then used to run the sim_daily() function.In this function, a Markov Chain simulation is fitted to the synthetic daily timeseries.

4.2.2.1 State Thresholds The first step is to compute the state thresholds using the mc_state_threshold()function which takes a vector of daily precipitation, a vector of corresponding months, and thedry_wet_threshold and wet_extreme_quantile_threshold.

thresh <- mc_state_threshold(sampled_days[[�PRCP�]], sampled_days[[�MONTH�]],dry_wet_threshold=0.3, wet_extreme_quantile_threshold=0.8)

thresh_df <- do.call(rbind, thresh) %>%as.data.frame() %>%mutate(month=row_number()) %>%select(month, dry_wet, wet_extreme)

thresh_df

## month dry_wet wet_extreme## 1 1 0.3 4.600## 2 2 0.3 5.000## 3 3 0.3 4.970## 4 4 0.3 5.356## 5 5 0.3 8.100## 6 6 0.3 5.970## 7 7 0.3 3.280## 8 8 0.3 5.150## 9 9 0.3 1.800

16

## 10 10 0.3 3.150## 11 11 0.3 7.350## 12 12 0.3 4.694

This figure shows the threshold amounts between the Dry/Wet and Wet/Extreme states for each month.

gather(thresh_df, var, value, -month) %>%ggplot(aes(factor(month), value, fill=var)) +geom_bar(stat=�identity�, position=�dodge�) +scale_fill_manual(�Threshold�, values=c(�dry_wet�=�orangered�, �wet_extreme�=�deepskyblue�),

labels=c(�dry_wet�=�Dry-Wet�, �wet_extreme�=�Wet-Extreme�)) +labs(x=�Month�, y=�Threshold Amount (mm/day)�, title=�Markov State Thresholds by Month�)

0

2

4

6

8

1 2 3 4 5 6 7 8 9 10 11 12Month

Thre

shol

d Am

ount

(mm

/day

)

ThresholdDry−WetWet−Extreme

Markov State Thresholds by Month

4.2.2.2 Assign States After the state thresholds are determined, they are assigned to the syntheticdaily timeseries, sampled_days.

sampled_days$STATE <- mc_assign_states(sampled_days$PRCP, sampled_days$MONTH, c(�d�, �w�, �e�), thresh)head(sampled_days)

## Source: local data frame [6 x 10]#### WYEAR MONTH DATE PRCP TEMP TMIN TMAX WIND SAMPLE_INDEX STATE## 1 1984 10 1983-10-01 3.15 15.420 11.41 19.43 4.91 1 e## 2 1984 10 1983-10-02 5.35 16.630 12.47 20.79 1.33 2 e## 3 1984 10 1983-10-03 0.00 19.825 14.59 25.06 5.89 3 d## 4 1984 10 1983-10-04 0.00 20.760 13.24 28.28 7.72 4 d

17

## 5 1984 10 1983-10-05 0.55 15.110 10.38 19.84 8.08 5 w## 6 1984 10 1983-10-06 0.10 16.965 13.49 20.44 6.39 6 d

This figure shows the Markov state for each day in the first 5 simulated years.

filter(sampled_days, SAMPLE_INDEX <= 365*5) %>%ggplot(aes(SAMPLE_INDEX, PRCP, color=STATE)) +

geom_point(size=2) +scale_color_manual(�Markov State�,

values=c(�d�=�orangered�, �w�=�chartreuse3�, �e�=�deepskyblue�),labels=c(�d�=�Dry�, �w�=�Wet�, �e�=�Extreme�)) +

ylim(0, 50) +labs(x=�Sample Date Index�, y=�Daily Precipitation (mm/day)�, title=�Markov State of Synthetic Daily Timeseries�)

## Warning: Removed 17 rows containing missing values (geom_point).

0

10

20

30

40

50

0 500 1000 1500Sample Date Index

Dai

ly P

reci

pita

tion

(mm

/day

)

Markov StateDryWetExtreme

Markov State of Synthetic Daily Timeseries

4.2.2.3 Fit Markov Chain Transition Matrix After the states have been assigned ot the sampleddaily timeseries, the mc_fit() function can be used to fit a Markov Chain model for each month.

transition_matrices <- mc_fit(states=sampled_days[[�STATE�]], months=sampled_days[[�MONTH�]])

For example, the transition matrix for June is shown below. The probability of having a wet day follow a dryday is 0.26, and the probability of two dry days in a row is 0.66.

transition_matrices[[6]]

#### d w e

18

## d 0.65725542 0.25607354 0.08667104## w 0.44909502 0.28167421 0.26923077## e 0.24957841 0.39797639 0.35244519

The equilibrium state can be determined for each month using the mc_state_equilibrium() function. Thisfunction finds the eigenvector of a transition matrix. The equilibrium state for June is thus:

mc_state_equilibrium(transition_matrices[[6]])

## d w e## 0.5192105 0.2905118 0.1902777

Once the transition matrices have been fit to the sampled historical daily values, the daily generator can berun.

4.2.2.4 KNN and Markov Chain Simulation Given the fitted monthly transition matrices, the dailysimulation for each simulation year can be performed using the sim_mc_knn_day() function. This functionsimulates a chain of Markov states using the transition matrices, and uses a KNN sampling algorithm tochoose the values of each weather variable.

sampled_days <- dplyr::mutate(sampled_days,WDAY=waterday(DATE, start_month=10),STATE_PREV=lag(STATE),PRCP_PREV=lag(PRCP),TEMP_PREV=lag(TEMP),TMAX_PREV=lag(TMAX),TMIN_PREV=lag(TMIN),WIND_PREV=lag(WIND))

sim_days <- sim_mc_knn_day(x=sampled_days, n_year=1, states=c(�d�, �w�, �e�), transitions=transition_matrices,start_month=10, start_water_year=2000, include_leap_days=FALSE)

str(sim_days, max=1)

## �data.frame�: 365 obs. of 11 variables:## $ SIM_YEAR : num 1 1 1 1 1 1 1 1 1 1 ...## $ DATE : Date, format: "1999-10-01" "1999-10-02" ...## $ MONTH : num 10 10 10 10 10 10 10 10 10 10 ...## $ WDAY : int 1 2 3 4 5 6 7 8 9 10 ...## $ SAMPLE_DATE: Date, format: "1957-10-01" "1957-10-02" ...## $ STATE : Ord.factor w/ 3 levels "d"<"w"<"e": 2 1 1 1 2 1 1 1 1 1 ...## $ PRCP : num 0.35 0 0 0 0.88 0 0 0 0 0 ...## $ TEMP : num 19 14.7 12 10.9 14.9 ...## $ TMIN : num 12.39 9.66 5.89 6.51 9.93 ...## $ TMAX : num 25.7 19.6 18.1 15.3 19.8 ...## $ WIND : num 0.4 8.55 9.28 8.58 8.02 ...

This figure shows the simulated weather variables for the current simulation year.

19

select(sim_days, DATE, PRCP:WIND) %>%gather(VAR, VALUE, -DATE) %>%ggplot(aes(DATE, VALUE)) +geom_line() +facet_wrap(~VAR, scales=�free_y�, ncol=1) +labs(x=�Simulate Date�, y=�Value�, title=�Daily Simulation for One Year�)

PRCP

TEMP

TMIN

TMAX

WIND

0

20

40

60

80

−10

0

10

20

30

−10

0

10

20

0102030

0

5

10

15

Oct 1999 Jan 2000 Apr 2000 Jul 2000 Oct 2000Simulate Date

Valu

e

Daily Simulation for One Year

4.2.2.5 Annual Precipitation Adjustment Because the daily precipitation values are sampled directlyfrom historical values, the total precipitation for each simulation year will likely di�er from the correspondingsimulated annual precipitation.

The adjust_daily_to_annual() function is used to adjust the simulated daily precipitation values so thatthe sum of these values equals the simulated annual total precipitation.

20

For example, the current simulated annual precipitation is 1571 mm/yr, but the sum of the simulateddaily precipitation is 1101 mm/yr. To adjust the daily values such that their sum equals the simmulatedannual value, the daily values can be all multiplied by a factor of 1.43. The following code block uses theadjust_daily_to_annual() function to perform this adjustment for the first simulation year. Note thatthis function can be applied to the entire simulated timeseries containing multiple years, and adjust eachindividual accordingly.

orig_prcp <- sim_days[[�PRCP�]]adj_prcp <- adjust_daily_to_annual(values_day=sim_days[[�PRCP�]],

years_day=wyear(sim_days[[�DATE�]]),values_yr=coredata(sim_annual_prcp[[�out�]])[1],years_yr=time(sim_annual_prcp[[�out�]])[1],min_ratio=0.5, max_ratio=1.5)

c(Original=sum(orig_prcp), Adjusted=sum(adj_prcp$adjusted), Annual=coredata(sim_annual_prcp[[�out�]])[1])

## Original Adjusted Annual## 1100.57 1570.83 1570.83

This figure shows the original and adjusted daily precipitation values. Note how the adjusted values (dashedred line) are linearly scaled by a factor of 1.43.

data.frame(DATE=sim_days[[�DATE�]],ORIGINAL=orig_prcp,ADJUSTED=adj_prcp$adjusted) %>%

gather(VAR, VALUE, -DATE) %>%ggplot(aes(DATE, VALUE, color=VAR, linetype=VAR)) +geom_line() +scale_color_manual(��, values=c(�ORIGINAL�=�black�, �ADJUSTED�=�orangered�)) +scale_linetype_manual(��, values=c(�ORIGINAL�=1, �ADJUSTED�=2)) +labs(x=�Simulation Date�, y=�Daily Precipitation (mm/day)�)

0

30

60

90

Oct 1999 Jan 2000 Apr 2000 Jul 2000 Oct 2000Simulation Date

Dai

ly P

reci

pita

tion

(mm

/day

)

ORIGINALADJUSTED

21

4.3 Adjustment Factors

After generating the simulated daily timeseries, the precipitation and temperature variables can be adjustedusing the change factor parameters.

For this example, a 5-year simulated timeseries is first generated without using any change factors

sim <- wgen_daily(zoo_day,n_year=5,start_month=10,start_water_year=2000,include_leap_days=FALSE,n_knn_annual=100,dry_wet_threshold=0.3,wet_extreme_quantile_threshold=0.8,adjust_annual_precip=TRUE,annual_precip_adjust_limits=c(0.5, 1.5),dry_spell_changes=1,wet_spell_changes=1,prcp_mean_changes=1,prcp_cv_changes=1,temp_mean_changes=0)

4.3.1 Precipitation Adjustment

The simulated precipitation timeseries can then be adjusted using the adjust_daily_gamma() function.This function performs a quantile mapping by first fitting the non-zero precipitation values to a gammadistribution, adjusting the scale and shape parameters of the distribution, and replacing the precipitationvalues by mapping the original values through the adjusted distribution.

The following code block computes three sets of adjusted timeseries by adjusting the mean, the coe�cient ofvariation, or both parameters.

prcp.unadj <- sim[[�out�]][, c(�DATE�,�MONTH�,�PRCP�)]prcp.adj.mean <- adjust_daily_gamma(prcp.unadj[[�PRCP�]], prcp.unadj[[�MONTH�]],

mean_change=1.2, cv_change=1)prcp.adj.cv <- adjust_daily_gamma(prcp.unadj[[�PRCP�]], prcp.unadj[[�MONTH�]],

mean_change=1, cv_change=1.2)prcp.adj.both <- adjust_daily_gamma(prcp.unadj[[�PRCP�]], prcp.unadj[[�MONTH�]],

mean_change=1.2, cv_change=1.2)prcp.df <- data.frame(DATE=prcp.unadj[[�DATE�]],

UNADJESTED=prcp.unadj[[�PRCP�]],ADJUST_MEAN=prcp.adj.mean[[�adjusted�]],ADJUST_CV=prcp.adj.cv[[�adjusted�]],ADJUST_BOTH=prcp.adj.both[[�adjusted�]]) %>%

gather(GROUP, VALUE, -DATE)

This figure shows the unadjusted and three adjusted daily precipitation timeseries.

ggplot(prcp.df, aes(DATE, VALUE)) +geom_line() +facet_wrap(~GROUP) +labs(x=��, y=�Daily Precipitation (mm/day)�, title=�Unadjusted and Adjusted Daily Precipitation�)

22

UNADJESTED ADJUST_MEAN

ADJUST_CV ADJUST_BOTH

0

50

100

150

0

50

100

150

2000 2001 2002 2003 2004 2000 2001 2002 2003 2004

Dai

ly P

reci

pita

tion

(mm

/day

)Unadjusted and Adjusted Daily Precipitation

This figure shows the annual precipitation for the unadjusted and adjusted precipitation timeseries by wateryear.

mutate(prcp.df, WYEAR=wyear(DATE, start_month=10)) %>%group_by(WYEAR, GROUP) %>%summarise(VALUE=sum(VALUE)) %>%ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +geom_bar(stat=�identity�, position=�dodge�) +scale_fill_discrete(��) +labs(x=�Water Year�, y=�Annual Precipitation (mm/yr)�, title=�Annual Precipitation With Adjustments�)

23

0

500

1000

1500

2000 2001 2002 2003 2004Water Year

Annu

al P

reci

pita

tion

(mm

/yr)

UNADJESTEDADJUST_MEANADJUST_CVADJUST_BOTH

Annual Precipitation With Adjustments

This figure shows the cumulative distribution frequency of each timeseries. Note that the y-axis has beentruncated to a maximum of 60 mm/yr in order to show the lower part of the distribution.

prcp.df %>%filter(VALUE>0) %>%group_by(GROUP) %>%arrange(VALUE) %>%mutate(ROW=row_number(),

PROB=(ROW-0.5)/n()) %>%ggplot(aes(PROB, VALUE, color=GROUP)) +geom_line() +ylim(0, 60) +scale_color_discrete(��) +labs(y=�Daily Precipitation (mm/day)�, x=�Non-Exceedence Probability�, title=�Distributions of Adjusted Precipitation Timeseries�)

## Warning: Removed 31 rows containing missing values (geom_path).

24

0

20

40

60

0.00 0.25 0.50 0.75 1.00Non−Exceedence Probability

Dai

ly P

reci

pita

tion

(mm

/day

)

UNADJESTEDADJUST_MEANADJUST_CVADJUST_BOTH

Distributions of Adjusted Precipitation Timeseries

4.3.2 Temperature Adjustment

The temperature timeseries can also be adjusted using a linear additive factor through the adjust_daily_additive()function. This function simply adds the adjustment factor to each daily value. In this example, the meantemperature is adjusted by a factor of 5 meaning each daily mean temperature is increased by 5 degrees.

temp.unadj <- sim[[�out�]][, c(�DATE�,�MONTH�,�TEMP�)]temp.adj <- adjust_daily_additive(temp.unadj[[�TEMP�]], temp.unadj[[�MONTH�]],

mean_change=5)temp.df <- data.frame(DATE=temp.unadj[[�DATE�]],

UNADJESTED=temp.unadj[[�TEMP�]],ADJUSTED=temp.adj[[�adjusted�]]) %>%

gather(GROUP, VALUE, -DATE)

This figure shows the unadjusted and adjusted daily mean temperature.

ggplot(temp.df, aes(DATE, VALUE, color=GROUP)) +geom_line() +scale_color_discrete(��) +labs(x=��, y=�Daily Mean Temperature (degC)�, title=�Unadjusted and Adjusted Daily Mean Temperature�)

25

−10

0

10

20

30

2000 2001 2002 2003 2004

Dai

ly M

ean

Tem

pera

ture

(deg

C)

UNADJESTEDADJUSTED

Unadjusted and Adjusted Daily Mean Temperature

This figure shows the unadjusted and adjusted annual mean temperature. Note how the adjusted annualmean temperatures are 5 degrees higher than the unadjustd temperatures.

mutate(temp.df, WYEAR=wyear(DATE, start_month = 10)) %>%group_by(WYEAR, GROUP) %>%summarise(VALUE=mean(VALUE)) %>%ggplot(aes(factor(WYEAR), VALUE, fill=GROUP)) +geom_bar(stat=�identity�, position=�dodge�) +scale_fill_discrete(��) +labs(x=��, y=�Annual Mean Temperature (degC)�, title=�Unadjusted and Adjusted Annual Mean Temperature�)

0

5

10

15

2000 2001 2002 2003 2004

Annu

al M

ean

Tem

pera

ture

(deg

C)

UNADJESTEDADJUSTED

Unadjusted and Adjusted Annual Mean Temperature

26

Documents

Introduction to the weathergen Package - Amazon S3 · Introduction to the weathergen Package Jerey D Walker, PhD August 02, 2015 Contents 1 Introduction 1 2 Load Historical Data 2