Surrogate Model Construction
In this tutorial we will illustrate how to construct a (small) surrogate in serial on any computer, and our workflow for building large surrogates on clusters (specific to the Grid Engine queuing system based cluster we use).
Setup
We begin by installing the packages we need in a clean environment:
using Pkg
# create a new environment for the surrogate constructions
Pkg.activate("surrogate_env")
Pkg.add(url="https://github.com/isaacsas/SPRFittingPaper2023.jl.git")
Pkg.add("JLD")
Serial Surrogate Construction
We first demonstrate how to build a (small) surrogate in serial (i.e. on a single CPU core). We start by loading the needed packages:
using SPRFittingPaper2023, JLD
We next define the parameter ranges we wish to tabulate the surrogate over. Reaction rates are specified via a range in log10
space, i.e. log10(kon)
, log10(koff)
, and log10(konb)
. The reach is specified in linear space:
logkon_range = (-3.0, 2.0)
logkoff_range = (-4.0, -1.0)
logkonb_range = (-3.0, 1.5)
reach_range = (2.0, 35.0) # in nm
surpars = SurrogateParams(; logkon_range, logkoff_range, logkonb_range, reach_range)
SurrogateParams{Float64}
logkon_range = (-3.0, 2.0)
logkoff_range = (-4.0, -1.0)
logkonb_range = (-3.0, 1.5)
reach_range = (2.0, 35.0)
[antigen] = 125.23622683286348
[antibody] = 1.0
CP = 1.0
We collect these parameters into a SurrogateParams
structure. Note that here we will build the surrogate with a fixed internal antibody concentration of 1.0
nM and antigen concentration of
SPRFittingPaper2023.DEFAULT_SIM_ANTIGENCONCEN
125.23622683286348
in units of μM. This is feasible to do because the antibody concentration only arises in product with $k_{\text{on}}$, so when fitting we can (internally) interpret the surrogate logkon
values as representing $\log_{10}(k_{\text{on}} [\text{Ab}])$. Similarly, as explained in the methods section of [1], using a fixed antigen concentration is not problematic as we can analytically transform any fit reach value using the internal antigen concentration to a physical reach value corresponding to the true experimental antigen concentration.
Next we specify temporal information for the surrogate:
tstop = 600.0 # simulation end time
tsavelen = 601 # number of time points to save (must be integer currently)
tstop_AtoB = 150.0 # time to remove free antibodies
tsave = collect(range(0.0, tstop, tsavelen)) # times to save at
simpars = SimParams(; tstop, tstop_AtoB, tsave)
SimParams{Float64, 3}
number of particles (N) = 1000
tstop = 600.0
tstop_AtoB = 150.0
number of save points = 601
domain length (L) = 236.68625663347206
resample_initlocs = true
nsims = 1000
We collect these parameters into a SimParams
object. Note that it contains many other parameters for which we typically just use the default value when building a surrogate (for example, by default distributing antigen particles uniformly within a cube). The default number of antigen is
simpars.N
1000
Finally, we specify how many points to tabulate over for each of the four parameters. Parameters are spaced uniformly in log10
space for the rates and linear space for the reach:
# here the order is number of points for
# [logkon, logkoff, logkonb, reach, time]
surrogate_size = (3, 3, 3, 3, tsavelen)
(3, 3, 3, 3, 601)
Note, this is a very small surrogate, which we would not use in any practical fitting assay. More typical values are given in our manuscript (often 30-50 points per parameter).
We are now ready to build and save the surrogate
outfile = tempname() # just use a temporary file name
save_surrogate(outfile, surrogate_size, surpars, simpars)
Note that the surrogate by default saves curves that correspond to the
\[\frac{\text{average number of bound antibodies}}{\text{ the number of antigen in the system}}.\]
Surrogate Format
The surrogate is stored in a Julia JLD file. We can see the raw data in the surrogate via the JLD load
command:
surdata = load(outfile)
Dict{String, Any} with 15 entries:
"CP" => 1.0
"logkonb_range" => (-3.0, 1.5)
"FirstMoment" => [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.…
"logkoff_range" => (-4.0, -1.0)
"L" => 236.686
"surrogate_size" => (3, 3, 3, 3, 601)
"antibodyconcen" => 1.0
"tstop" => 600.0
"antigenconcen" => 125.236
"DIM" => 3
"N" => 1000
"logkon_range" => (-3.0, 2.0)
"reach_range" => (2.0, 35.0)
"tsave" => [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 … 591…
"tstop_AtoB" => 150.0
To access a given field we can say
surdata["tstop"]
600.0
In particular, surdata["FirstMoment"]
will correspond to the table of solution curves.
To load the surrogate for use in fitting we instead use
sur = Surrogate(outfile)
Surrogate{NTuple{5, Int64}, Interpolations.BSplineInterpolation{Float64, 5, Array{Float64, 5}, Interpolations.BSpline{Interpolations.Linear{Interpolations.Throw{Interpolations.OnGrid}}}, NTuple{5, Base.OneTo{Int64}}}, Float64, Float64, 3}((3, 3, 3, 3, 601), [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;; 0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;;;; 0.0010920000000000003 0.0009960000000000001 0.0009160000000000007; 0.2746818181818181 0.2710588235294119 0.25921999999999995; 1.0 0.9999032258064516 0.9990560000000002;;; 0.0009760000000000004 0.0010520000000000004 0.0009199999999999997; 0.2682592592592592 0.27013333333333334 0.2574799999999998; 1.0 0.9999487179487179 0.999;;; 0.0010159999999999991 0.0009400000000000002 0.0009199999999999996; 0.26850000000000007 0.2744516129032258 0.25867200000000007; 0.9998 0.9996111111111111 0.9986199999999997;;;; 0.000924 0.0009440000000000002 0.000888; 0.27166666666666667 0.2691034482758621 0.2584159999999999; 1.0 0.9999629629629629 0.9989880000000005;;; 0.0010959999999999998 0.0010040000000000008 0.0009760000000000002; 0.26899999999999996 0.26749999999999996 0.25578000000000023; 0.9984666666666666 0.9984666666666666 0.9973840000000006;;; 0.0009799999999999998 0.0008360000000000002 0.0010320000000000004; 0.23969565217391303 0.2427352941176471 0.24318750000000006; 0.8436666666666666 0.8426 0.8453684210526317;;;; 0.0009720000000000003 0.0010040000000000003 0.0009199999999999996; 0.27469565217391306 0.2703636363636363 0.2576800000000001; 1.0 0.9998666666666667 0.9989439999999999;;; 0.0010360000000000005 0.0011039999999999989 0.0009240000000000002; 0.25381818181818183 0.2576428571428572 0.24841999999999995; 0.9880666666666665 0.9888 0.9874400000000003;;; 0.0010599999999999997 0.0010679999999999997 0.0010080000000000002; 0.2363684210526316 0.233 0.2381578947368421; 0.6203333333333333 0.6201333333333334 0.6427999999999999;;;;; 0.002115999999999999 0.001956 0.0018239999999999997; 0.4695909090909091 0.46702941176470586 0.429936; 1.0 0.9998709677419356 0.998884;;; 0.001983999999999999 0.002039999999999999 0.0017599999999999996; 0.4662592592592593 0.46626666666666666 0.4283120000000001; 1.0 0.9999743589743589 0.9990279999999999;;; 0.0019039999999999999 0.0019080000000000006 0.001768; 0.4676785714285715 0.4684516129032259 0.4294279999999999; 0.9998 0.9995555555555555 0.9987000000000003;;;; 0.001856 0.001976000000000003 0.001736000000000001; 0.46646666666666664 0.4651379310344828 0.42974000000000023; 1.0 0.9999629629629629 0.9991399999999997;;; 0.002124 0.002044 0.0019320000000000006; 0.45141176470588235 0.44779166666666664 0.41685599999999995; 0.9984666666666666 0.9984000000000001 0.9974399999999998;;; 0.00196 0.0017799999999999988 0.002016000000000001; 0.38243478260869557 0.3825882352941177 0.3750625; 0.8436666666666666 0.8426666666666667 0.8473157894736841;;;; 0.0019199999999999994 0.002096000000000002 0.0017999999999999997; 0.4718260869565217 0.4651515151515152 0.4285519999999999; 1.0 0.9997999999999999 0.9989320000000003;;; 0.0020519999999999987 0.0020759999999999997 0.001924; 0.4026818181818182 0.4003214285714286 0.3896720000000003; 0.9880666666666665 0.9887333333333332 0.987468;;; 0.002016 0.002120000000000001 0.0018720000000000002; 0.35957894736842105 0.3587058823529412 0.3603157894736842; 0.6203333333333333 0.6211333333333334 0.6552;;;;; … ;;;;; 0.13204400000000005 0.026944000000000003 0.0; 0.956409090909091 0.23926470588235293 0.0; 0.9553333333333334 0.24335483870967745 0.0;;; 0.13286400000000007 0.027032000000000018 0.0; 0.9555925925925927 0.23576666666666668 0.0; 0.9568000000000001 0.24661538461538454 0.0;;; 0.13200399999999995 0.026759999999999996 0.0002560000000000001; 0.9543214285714287 0.24241935483870974 0.0009399999999999998; 0.9532666666666668 0.23900000000000002 0.0008839999999999998;;;; 0.13357200000000002 0.037571999999999994 0.0; 0.9527333333333333 0.2720689655172414 0.0; 0.9554666666666667 0.26862962962962966 0.0;;; 0.13021999999999992 0.109584 0.0; 0.7689411764705882 0.431875 4.000000000000001e-6; 0.9555333333333333 0.4506000000000001 4.0000000000000015e-6;;; 0.12982800000000003 0.113976 0.08997999999999996; 0.6300869565217392 0.43335294117647055 0.2945; 0.8142000000000001 0.4458 0.2846315789473684;;;; 0.13309600000000002 0.071384 0.0; 0.9392173913043478 0.35739393939393954 0.0; 0.9574666666666666 0.3658 0.0;;; 0.13039200000000006 0.12843999999999992 0.0026440000000000005; 0.5788636363636364 0.49353571428571424 0.008704000000000002; 0.9494666666666667 0.5191333333333334 0.008732000000000007;;; 0.1292080000000001 0.129836 0.12510800000000005; 0.5204736842105265 0.49688235294117644 0.4074736842105263; 0.6126666666666667 0.5087333333333335 0.4076;;;;; 0.13202800000000003 0.02687199999999999 0.0; 0.9563636363636364 0.23873529411764707 0.0; 0.9553333333333334 0.24283870967741938 0.0;;; 0.13284800000000002 0.026956000000000025 0.0; 0.9555555555555558 0.2349 0.0; 0.9568000000000001 0.24587179487179492 0.0;;; 0.13200399999999995 0.026664 0.0002560000000000001; 0.95425 0.24196774193548384 0.0009399999999999998; 0.9532666666666668 0.23822222222222222 0.0008839999999999998;;;; 0.13355999999999987 0.037507999999999965 0.0; 0.9526666666666667 0.2714137931034483 0.0; 0.9552666666666668 0.26788888888888884 0.0;;; 0.130208 0.10957199999999996 0.0; 0.7689117647058823 0.4316666666666666 4.000000000000001e-6; 0.9554666666666667 0.4503333333333333 4.0000000000000015e-6;;; 0.12982800000000003 0.11397199999999999 0.08996400000000003; 0.6300869565217392 0.43317647058823533 0.2943125; 0.8140000000000002 0.4455333333333333 0.28442105263157896;;;; 0.13309200000000004 0.07131600000000003 0.0; 0.9391739130434783 0.3569090909090909 0.0; 0.9572000000000002 0.36533333333333334 0.0;;; 0.13039200000000006 0.12843999999999992 0.002636; 0.5788636363636364 0.49346428571428563 0.008643999999999999; 0.9494 0.5188666666666667 0.008672000000000001;;; 0.1292080000000001 0.129836 0.1250960000000001; 0.5204736842105265 0.49688235294117644 0.40742105263157896; 0.6126666666666667 0.5084666666666666 0.4074;;;;; 0.13200799999999993 0.02678399999999998 0.0; 0.956318181818182 0.23814705882352943 0.0; 0.9552 0.24196774193548382 0.0;;; 0.13283599999999993 0.02688399999999999 0.0; 0.9554074074074074 0.23426666666666668 0.0; 0.9565333333333335 0.24510256410256406 0.0;;; 0.131992 0.026539999999999987 0.0002560000000000001; 0.9541428571428571 0.2413225806451613 0.0009399999999999998; 0.9532666666666668 0.23750000000000002 0.0008839999999999998;;;; 0.13355599999999992 0.03740399999999998 0.0; 0.9526000000000001 0.27086206896551723 0.0; 0.955 0.267074074074074 0.0;;; 0.13020400000000001 0.10956000000000002 0.0; 0.7688529411764706 0.43133333333333335 4.000000000000001e-6; 0.9552000000000002 0.44986666666666664 4.0000000000000015e-6;;; 0.12982399999999997 0.11395999999999994 0.089924; 0.6299999999999999 0.43305882352941183 0.29387499999999994; 0.8140000000000002 0.44506666666666667 0.2843684210526316;;;; 0.13309200000000004 0.07125200000000004 0.0; 0.9390869565217391 0.3563030303030303 0.0; 0.9571333333333335 0.36506666666666665 0.0;;; 0.13039200000000006 0.12843999999999992 0.0026120000000000006; 0.5787727272727273 0.4933214285714285 0.008580000000000006; 0.9492000000000002 0.5188000000000001 0.008596000000000005;;; 0.1292080000000001 0.129836 0.125092; 0.520421052631579 0.49688235294117644 0.40731578947368424; 0.6126666666666667 0.5084 0.4073333333333334], SurrogateParams{Float64}((-3.0, 2.0), (-4.0, -1.0), (-3.0, 1.5), (2.0, 35.0), 125.23622683286348, 1.0, 1.0), SimParams{Float64, 3}(1000, 600.0, 150.0, [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 … 591.0, 592.0, 593.0, 594.0, 595.0, 596.0, 597.0, 598.0, 599.0, 600.0], 236.68625663347206, StaticArraysCore.SVector{3, Float64}[[-26.578240497475704, -69.33731233012644, -17.62388388486869], [-79.04499072353985, -28.003015114823626, -21.011899569778592], [34.521464087559224, -7.421758163818382, -92.63938238361715], [-59.53685078225315, -50.69606662593509, 93.60662043432481], [58.51277098735781, 111.743039689602, 103.7862702725096], [-61.79631694046582, 96.98022379615512, -105.99633145061163], [100.57563394403337, -110.5662500236813, -30.111395948573673], [-26.23458819182403, -116.05225786275396, 98.54123651686788], [49.026719193255246, 105.70396576937657, 16.715640706169808], [6.8019793035159495, 63.55513696417802, -26.529368063886423] … [99.0755682619459, -8.545358667847267, 70.32405765265736], [-26.985196746439215, 13.579641213685022, 53.178352463648764], [-48.94656946880711, 0.4612015632962141, -110.48131800754413], [-58.403036649730815, -89.54667936140076, -39.75601568715756], [-88.23483911982605, -100.37127919211906, -2.1547411150136497], [73.8010439211539, 18.131436355122503, -31.01783516827595], [-44.29265853780559, -57.027841453262575, 49.12697452410322], [19.565948802584415, 43.70724386599639, -9.887380501172672], [-95.00010747808975, 6.869720891280991, -13.438643462252173], [-107.09508428110678, 99.27206213732089, -50.32102658635627]], true, 1000))
The use of the loaded surrogate for fitting will be illustrated in the next tutorial.
Parallel Surrogate Construction
Below we explain our workflow for constructing the surrogate via parallel simulations using the Grid Engine queuing system. The basic workflow and scripts we link were designed for this system, but should be adaptable to other queue-based clusters.
The basic approach is
- Save a metadata file with all information needed to build the surrogate.
- Construct pieces of the surrogate as independent single-core jobs on the cluster.
- Merge the pieces of the surrogate back together into a single complete surrogate.
The surrogate used in [1] can be downloaded from here. Below we give the basic scripts and commands used in its construction (note constructing such a surrogate generally requires in total 500-2000 hours of cpu time on the Boston University cluster).
Our workflow is
- Edit "make_surrogate_metadata.jl" for your system and then run it in Julia (via
include("make_surrogate_metadata.jl")
). - Edit the bash scripts "queue_job.sh" and "run_sim.sh" for your system as appropriate and run queue_job.sh to submit the parallel surrogate jobs.
- After all jobs finish, edit "merge_surrogate_slices.jl" as appropriate and run it in Julia via
include("merge_surrogate_slices.jl")
to merge the output from each cluster job into one complete surrogate.
The linked scripts should correspond to those used to construct the surrogate in [1].
Bibliography
- A. Huhn, D. Nissley, ..., C. M. Deane, S. A. Isaacson, and O. Dushek, Analysis of emergent bivalent antibody binding identifies the molecular reach as a critical determinant of SARS-CoV-2 neutralisation potency, in review, available on bioRxiv (2024).