4 ‘wave0’ - parameter ranges, targets and design points

A video presentation of this section can be found here.

In this section we set up the emulation task, defining the input parameter ranges, the calibration targets and all the data necessary to build the first wave of emulators.

Note that, for the sake of clarity, in this workshop we will use the word ‘data’ only when referring to the set of model runs used to train emulators. Empirical observations, which would inform our choice of targets if we were modelling a real-world scenario, will instead be referred to as ‘observations.’

First of all, let us set the parameter ranges:

ranges = list(
  b = c(1e-5, 1e-4), # birth rate
  mu = c(1e-5, 1e-4), # rate of death from other causes
  beta1 = c(0.05, 0.3), # infection rate at time t=0
  beta2 = c(0.1, 0.2), # infection rate at time t=100
  beta3 = c(0.3, 0.5), # infection rate at time t=270
  epsilon = c(0.01, 0.21), # rate of becoming infectious after infection
  alpha = c(0.01, 0.025), # rate of death from the disease
  gamma = c(0.01, 0.08), # recovery rate
  omega = c(0.002, 0.004) # rate at which immunity is lost following recovery
)
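
Each range is a length-two vector of the form c(lower, upper). As a quick optional check, not part of the original workflow, we can verify that every lower bound sits below its upper bound:

stopifnot(all(purrr::map_lgl(ranges, ~ .[[1]] < .[[2]])))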

We then turn to the targets we will match: the number of infectious individuals \(I\) and the number of recovered individuals \(R\) at times \(t=25, 40, 100, 200\). A target can be specified either by a pair (val, sigma), where ‘val’ is the measured value of the output and ‘sigma’ its standard deviation, or by a pair (min, max), where ‘min’ and ‘max’ are the lower and upper bounds for the target. Since Workshop 1 used the former formulation, here we show the latter:

targets <- list(
  I25 = c(98.51, 133.25),
  I40 = c(117.17, 158.51),
  I100 = c(22.39, 30.29),
  I200 = c(0.578, 0.782),
  R25 = c(106.34, 143.9),
  R40 = c(218.28, 295.32),
  R100 = c(458.14, 619.84),
  R200 = c(377.6, 510.86)
)
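
For comparison, here is how the first of these targets might look in the (val, sigma) formulation. The helper below is purely illustrative and assumes that the bounds span val \(\pm 3\) sigma, a common convention but not necessarily the one used to produce the intervals above:

bounds_to_val_sigma <- function(bounds) {
  # Illustrative helper (not part of the workshop): midpoint as the value,
  # one sixth of the interval width as the standard deviation.
  list(val = mean(bounds), sigma = diff(bounds) / 6)
}
bounds_to_val_sigma(targets$I25)
# $val
# [1] 115.88
#
# $sigma
# [1] 5.79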

The widths of the intervals in our targets list represent the uncertainty we have about the observations, playing the role of the ‘sigmas’ in the (val, sigma) formulation. Note that in general we can also choose to widen them to account for model uncertainty, reflecting how accurate we think our model is.

For this workshop, we generate parameter sets using a Latin Hypercube design. A rule of thumb is to select at least \(10p\) parameter sets to train emulators, where \(p\) is the number of parameters (9 in this workshop). Using the function maximinLHS in the package lhs, we create one hypercube design with 100 parameter sets for the training set and another with 50 parameter sets for the validation set. We could have chosen \(100\) parameter sets for the validation set too: we chose \(50\) simply to highlight that it is not necessary to have as many validation points as training points. This is especially useful when working with expensive models, since a smaller validation set helps manage the computational burden of the procedure. After creating the two hypercube designs, we bind them together to create initial_LHS:

library(lhs) # provides maximinLHS (if not already loaded earlier in the workshop)

initial_LHS_training <- maximinLHS(100, 9)  # training design: 100 points in [0,1]^9
initial_LHS_validation <- maximinLHS(50, 9) # validation design: 50 points in [0,1]^9
initial_LHS <- rbind(initial_LHS_training, initial_LHS_validation)
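
A quick look at the dimensions confirms the combined \(150 \times 9\) design:

dim(initial_LHS)
# [1] 150   9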

Note that in initial_LHS each parameter is distributed on \([0,1]\). This is not exactly what we need, since each parameter has a different range. We therefore re-scale each component in initial_LHS by multiplying it by the difference between the upper and lower bounds of the corresponding parameter's range and then adding the lower bound for that parameter. In this way we obtain initial_points, which contains parameter values in the correct ranges.

# Rescale each column from [0,1] to the corresponding parameter range:
# multiply by the width of the range, then shift by its lower bound.
initial_points <- setNames(data.frame(t(apply(initial_LHS, 1, 
                                        function(x) x * purrr::map_dbl(ranges, diff) + 
                                        purrr::map_dbl(ranges, ~.[[1]])))), names(ranges))
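
As an optional check, not in the original workflow, we can confirm that every column of initial_points lies within its intended range:

apply(initial_points, 2, range) # compare each column with the bounds in ranges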

We then run the model for the parameter sets in initial_points through the get_results function, specifying that we are interested in the outputs for \(I\) and \(R\) at times \(t=25, 40, 100, 200\). Note that we use the progressr package to create a progress bar that updates every time a new parameter set is run: when you run the code on your own, this will show how far along the process is.

library(progressr) # provides with_progress and progressor (if not already loaded)

initial_results <- list()
with_progress({
  p <- progressor(nrow(initial_points))
  for (i in 1:nrow(initial_points)) {
    model_out <- get_results(unlist(initial_points[i, ]), nreps = 25, outs = c("I", "R"),
                             times = c(25, 40, 100, 200))
    initial_results[[i]] <- model_out
    p(message = sprintf("Run %g", i))
  }
})
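
Before binding everything together, it can be useful to confirm the structure of initial_results; assuming get_results returns one row per repetition (which the binding step below relies on), we expect:

length(initial_results)    # one element per parameter set: 150
nrow(initial_results[[1]]) # one row per repetition: 25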

Note that initial_results is a list of length \(150\): one element per parameter set, each containing a row for each of the 25 repetitions, giving \(25 \times 150 = 3750\) rows in total. Finally, we bind all elements in initial_results to obtain a data frame wave0 and split it in two parts: the first 2500 rows (corresponding to the first 100 parameter sets) form the training set, all_training, and the last 1250 rows (corresponding to the last 50 parameter sets) form the validation set, all_valid.

wave0 <- data.frame(do.call('rbind', initial_results))
all_training <- wave0[1:2500,] # first 100 parameter sets (25 repetitions each)
all_valid <- wave0[2501:3750,] # last 50 parameter sets (25 repetitions each)
output_names <- c("I25", "I40", "I100", "I200", "R25", "R40", "R100", "R200")
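
As a final illustrative check, not part of the original code, we can verify the split:

stopifnot(nrow(all_training) == 100 * 25, nrow(all_valid) == 50 * 25)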