Introduction

This “Getting Started with bbr” vignette takes you through some basic scenarios for modeling with NONMEM using bbr, introducing its standard workflow and functionality.

bbr is an R interface for running bbi. Together they provide a solution for managing modeling and simulation projects across a number of software tools used in pharmaceutical sciences. Currently, only NONMEM is supported, so this vignette addresses only NONMEM.

Setup

Installing bbi

The first time you use bbr on a system or disk, you must install bbi.

Setting bbi_exe_path

The bbi executable is a single binary file; you just need to tell bbr where to find it. This can be done by setting the bbr.bbi_exe_path option like so:

options("bbr.bbi_exe_path" = "/data/apps/bbi")

Please note that this must be set for every new R session, so it’s recommended to include the above snippet in your .Rprofile, replacing "/data/apps/bbi" with the absolute path on your system/project if you have installed it somewhere different.

Also note that this can be pointed to anywhere on your system; installing to /data/apps/bbi is merely a convention used at Metrum Research Group, where bbr was developed.

Installing with use_bbi()

bbr provides an installer function named use_bbi() that makes it easy to install bbi from within R.

bbr::use_bbi()

With no arguments passed, use_bbi() will default to installing bbi at the path you have set in options("bbr.bbi_exe_path"). This is why we recommend setting this first (and putting it in your .Rprofile so that it is sourced for every new R session). If this option is not set, use_bbi() will fall back to the bbi in your $PATH (accessed via Sys.which("bbi")) and then an OS-dependent default, in that order. See ?use_bbi if you would like more details.

The use_bbi() function also optionally takes the absolute path where you want to install bbi, though we recommend that most users follow the pattern above: first specify a path in options("bbr.bbi_exe_path"), then let use_bbi() install to that path by default.

Updating bbi or checking if it’s installed

If you’re not sure if you have bbi installed, you can use bbi_version() to check. This will return the version number of your installation, if one is found, or an empty string if you do not have bbi installed. You can also use bbi_current_release() to see the most current version available and run use_bbi() as specified above if you want to update.
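
A quick check-and-update pattern might look like the following (a sketch; the exact version-string formats returned by bbi_version() and bbi_current_release() may differ, so compare them with that in mind):

```r
# check whether bbi is installed and up to date
if (!nzchar(bbr::bbi_version())) {
  bbr::use_bbi()  # not installed; install to options("bbr.bbi_exe_path")
} else if (bbr::bbi_version() != bbr::bbi_current_release()) {
  bbr::use_bbi()  # installed but out of date; update in place
}
```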

You can also check the bbi documentation for manual installation instructions from the command line.

bbi.yaml configuration file

To actually submit models with bbi, you will also need a bbi.yaml configuration file. Think of this as containing “global default settings” for your bbi runs. Those settings will not be discussed here, but know that they can be modified globally by editing that file, or model-by-model as described in the “Passing arguments to bbi” section below.

bbi_init()

The bbi_init() function will create a bbi.yaml, with the default settings, in the specified directory. Optionally, you can pass the path to a bbi.yaml to the .config_path argument of any bbr function that needs one. However, by default these functions will look for one in the same folder as the model being submitted or manipulated. Therefore, if you will have a number of models in the same folder, it is easiest to put a bbi.yaml in that folder.

MODEL_DIR <- "../model/nonmem/basic"  # this should be relative to your working directory
bbi_init(.dir = MODEL_DIR,            # the directory to create the bbi.yaml in
         .nonmem_dir = "/opt/NONMEM", # location of NONMEM installation
         .nonmem_version = "nm74gf")  # default NONMEM version to use

Here we define a variable MODEL_DIR with the path to the directory containing our models. Notice that you must also pass the path to a working installation of NONMEM, and a default NONMEM version to use.

Note this only needs to be done once for each folder you are modeling in. Once the bbi.yaml exists, you will not need to run bbi_init() again unless you want to create another one; for example if you move to modeling in a different directory, or if you want a different set of default settings for some specific subset of models.

Initial modeling run

Create model object

To begin modeling, first create a model object using new_model(). This is an S3 object which you will pass to all of the other bbr functions that submit, summarize, or manipulate models.

The first argument (.path) must be the path to your model control stream without file extension. For instance, the call below assumes you have a control stream named 1.ctl or 1.mod in the directory you just assigned to MODEL_DIR above.

mod1 <- new_model(file.path(MODEL_DIR, 1))

This new_model() call will also create a 1.yaml file in that same directory, which stores model metadata like description and tags (discussed below). If you ever need to recreate this model object in memory, just run mod1 <- read_model(file.path(MODEL_DIR, 1)) to rebuild it from the YAML file on disk.

The model object you have just created can be passed to various functions which you will see in a minute. Now that we’ve created a model, the first thing we will do is submit the model to be run.

Inspect initial estimates

The initial parameter estimates can be extracted and formatted using initial_estimates(). Additional columns indicate any bounds on $THETA records, as well as the record number (relevant when, for example, multiple $OMEGA records are used). You can optionally flag which estimates were FIXED.

initial_est_df <- initial_estimates(mod1, flag_fixed = TRUE)
initial_est_df
#> # A tibble: 8 × 7
#>   parameter_names  init lower_bound upper_bound fixed record_type record_number
#>   <chr>           <dbl>       <dbl>       <dbl> <lgl> <chr>               <int>
#> 1 THETA(1)         2              0          NA FALSE theta                   1
#> 2 THETA(2)         3              0          NA FALSE theta                   1
#> 3 THETA(3)        10              0          NA FALSE theta                   1
#> 4 THETA(4)         0.02          NA          NA FALSE theta                   1
#> 5 THETA(5)         1             NA          NA FALSE theta                   1
#> 6 OMEGA(1,1)       0.05          NA          NA FALSE omega                   1
#> 7 OMEGA(2,2)       0.2           NA          NA FALSE omega                   1
#> 8 SIGMA(1,1)       1             NA          NA TRUE  sigma                   1

If the result is stored as an R object, you can extract the full $OMEGA or $SIGMA matrices from its attributes. This diagonally concatenates all matrices of the same record type, where NA values indicate the value was not specified in the control stream file.

attr(initial_est_df, "omega_mat") # attr(initial_est_df, "sigma_mat") for SIGMA
#>      [,1] [,2]
#> [1,] 0.05   NA
#> [2,]   NA  0.2
#> attr(,"nmrec_record_size")
#> [1] 2
#> attr(,"nmrec_flags")
#> attr(,"nmrec_flags")$fixed
#>       [,1]  [,2]
#> [1,] FALSE    NA
#> [2,]    NA FALSE

Submit model
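
To submit the model, pass the model object to submit_model(). A minimal call looks like this (the name proc1 is arbitrary; you can also let the process object print instead of capturing it):

```r
# submit the model to be run; capture the returned process object
proc1 <- mod1 %>% submit_model()
```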

Submitting the model with submit_model() returns a process object. We won’t discuss this object in this vignette, but it contains some information about the submission call. Please note that checking on a model run in progress is not fully implemented. For now, users should check on their runs manually (by looking at the output directory) and only proceed to the next steps once the run has successfully completed.

Passing arguments to bbi

There are a number of arguments that bbi can take to modify how models are run. You can print a list of available arguments using the print_bbi_args() helper function.

print_bbi_args()
#> additional_post_work_envs (character): Any additional values (as ENV KEY=VALUE)
#>   to provide for the post execution environment. Command-line option:
#>   --additional_post_work_envs
#> background (logical): RAW NMFE OPTION - Tells nonmem not to scan StdIn for
#>   control characters. Command-line option: --background
#> clean_lvl (numeric): clean level used for file output from a given (set of)
#>   runs (default 1). Command-line option: --clean_lvl
#> config (character): Path (relative or absolute) to another bbi.yaml to load.
#>   Command-line option: --config
#> copy_lvl (numeric): copy level used for file output from a given (set of) runs.
#>   Command-line option: --copy_lvl
#> debug (logical): debug mode. Command-line option: --debug
#> delay (numeric): Selects a random number of seconds between 1 and this value to
#>   stagger / jitter job execution. Assists in dealing with large volumes of work
#>   dealing with the same data set. May avoid NMTRAN issues about not being able
#>   read / close files. Command-line option: --delay
#> ext_file (character): name of custom ext-file. Command-line option: --ext-file
#> git (logical): whether git is used. Command-line option: --git
#> json (logical): json tree of output, if possible. Command-line option: --json
#> licfile (character): RAW NMFE OPTION - Specify a license file to use with NMFE
#>   (Nonmem). Command-line option: --licfile
#> log_file (character): If populated, specifies the file into which to store the
#>   output / logging details from bbi. Command-line option: --log_file
#> maxlim (numeric): RAW NMFE OPTION - Set the maximum values for the buffers used
#>   by Nonmem (if 0, don't pass -maxlim to nmfe) (default 2). Command-line
#>   option: --maxlim. Compatibility note: Default changed from unset to 2 with
#>   bbi v3.2.0.
#> mpi_exec_path (character): The fully qualified path to mpiexec. Used for nonmem
#>   parallel operations (default '/usr/local/mpich3/bin/mpiexec'). Command-line
#>   option: --mpi_exec_path
#> nm_version (character): Version of nonmem from the configuration list to use.
#>   Command-line option: --nm_version
#> nm_qual (logical): Whether or not to execute with nmqual (autolog.pl).
#>   Command-line option: --nmqual
#> nobuild (logical): RAW NMFE OPTION - Skips recompiling and rebuilding on nonmem
#>   executable. Command-line option: --nobuild
#> no_ext_file (logical): do not use ext file. Command-line option: --no-ext-file
#> no_grd_file (logical): do not use grd file. Command-line option: --no-grd-file
#> no_shk_file (logical): do not use shk file. Command-line option: --no-shk-file
#> overwrite (logical): Whether or not to remove existing output directories if
#>   they are present. Command-line option: --overwrite
#> parafile (character): Location of a user-provided parafile to use for parallel
#>   execution. Command-line option: --parafile
#> parallel (logical): Whether or not to run nonmem in parallel mode. Command-line
#>   option: --parallel
#> parallel_timeout (numeric): The amount of time to wait for parallel operations
#>   in nonmem before timing out (default 2147483647). Command-line option:
#>   --parallel_timeout
#> post_work_executable (character): A script or binary to run when job execution
#>   completes or fails. Command-line option: --post_work_executable
#> prcompile (logical): RAW NMFE OPTION - Forces PREDPP compilation. Command-line
#>   option: --prcompile
#> prdefault (logical): RAW NMFE OPTION - Do not recompile any routines other than
#>   FSUBS. Command-line option: --prdefault. Compatibility note: This option
#>   isn't available in bbi until v3.2.0.
#> prsame (logical): RAW NMFE OPTION - Indicates to nonmem that the PREDPP
#>   compilation step should be skipped. Command-line option: --prsame
#> preview (logical): preview action, but don't actually run command. Command-line
#>   option: --preview
#> save_config (logical): Whether or not to save the existing configuration to a
#>   file with the model (default true). Command-line option: --save_config
#> threads (numeric): number of threads to execute with locally or nodes to
#>   execute on in parallel (default 4). Command-line option: --threads
#> tprdefault (logical): RAW NMFE OPTION - Test if is okay to do -prdefault.
#>   Command-line option: --tprdefault. Compatibility note: This option isn't
#>   available in bbi until v3.2.0.
#> verbose (logical): verbose output. Command-line option: --verbose

As discussed in “Setup” above, these can be set globally in the bbi.yaml file, where you can also see their default values. However, specific arguments can also be set or changed for each model. This can be done in two ways: by attaching arguments to the model object with add_bbi_args(), or by passing .bbi_args directly to a submit_model() or model_summary() call.

Note that any bbi_args attached to a model object will override the relevant settings in bbi.yaml, and that .bbi_args passed into a submit_model() or model_summary() call will override the relevant settings in both bbi.yaml and any bbi_args attached to the model object.
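
As a sketch of both approaches (the specific argument values here are illustrative, chosen from the print_bbi_args() list above):

```r
# 1. Attach arguments to the model object; these persist with the model
#    and apply to all subsequent submissions
mod1 <- mod1 %>% add_bbi_args(list(threads = 4, parallel = TRUE))

# 2. Pass arguments for a single call; these take precedence over both
#    bbi.yaml and any bbi_args attached to the model object
mod1 %>% submit_model(.bbi_args = list(clean_lvl = 2))
```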

See the docs for any of the functions just mentioned for more details on usage and syntax.

Overwriting output from a previously run model

It is common to run a model, make some tweaks to it, and then run it again. However, to avoid accidentally deleting model outputs, bbr will error by default if it sees existing output when trying to submit a model. To automatically overwrite any previous model output, just pass .overwrite = TRUE to submit_model().

mod1 %>% submit_model(.overwrite = TRUE)

You can also make this setting persistent, either by attaching overwrite = TRUE to a model with add_bbi_args() or by setting it globally with overwrite: true in the bbi.yaml file for your project. However, either of these settings will be overridden by directly passing the argument as shown above.

Summarize model

Once the model run has completed, users can get a summary object containing much of the commonly used diagnostic information in a named list.

sum1 <- mod1 %>% model_summary()
print(names(sum1))
#> [1] "absolute_model_path" "run_details"         "run_heuristics"     
#> [4] "parameters_data"     "parameter_names"     "ofv"                
#> [7] "condition_number"    "shrinkage_details"   "success"

These elements can be accessed manually or extracted with built-in helper functions like so:

param_df1 <- sum1 %>% param_estimates()
param_df1
#> # A tibble: 9 × 8
#>   parameter_names estimate   stderr random_effect_sd random_effect_sdse fixed
#>   <chr>              <dbl>    <dbl>            <dbl>              <dbl> <lgl>
#> 1 THETA1            2.32   8.72e- 2           NA              NA        FALSE
#> 2 THETA2           54.6    3.38e+ 0           NA              NA        FALSE
#> 3 THETA3          463.     3.03e+ 1           NA              NA        FALSE
#> 4 THETA4           -0.0820 5.61e- 2           NA              NA        FALSE
#> 5 THETA5            4.18   1.38e+ 0           NA              NA        FALSE
#> 6 OMEGA(1,1)        0.0985 2.03e- 2            0.314           3.24e- 2 FALSE
#> 7 OMEGA(2,1)        0      1   e+10            0               1   e+10 TRUE 
#> 8 OMEGA(2,2)        0.157  2.72e- 2            0.396           3.44e- 2 FALSE
#> 9 SIGMA(1,1)        1      1   e+10            1               1   e+10 TRUE 
#> # ℹ 2 more variables: diag <lgl>, shrinkage <dbl>

Specific parameter estimates can also be extracted and formatted using the get_omega(), get_theta(), and get_sigma() helpers:

sum1 %>% get_theta()
#>      THETA1      THETA2      THETA3      THETA4      THETA5 
#>   2.3171600  54.6151000 462.5140000  -0.0820082   4.1795900

sum1 %>% get_omega()
#>           OMEGA_1  OMEGA_2
#> OMEGA_1 0.0985328 0.000000
#> OMEGA_2 0.0000000 0.156825

sum1 %>% get_sigma()
#>         SIGMA_1
#> SIGMA_1       1

To see how to load summaries of multiple models to an easy-to-read tibble, see the Creating a Model Summary Log vignette.

You can also use the nm_join() function to load the input data and any output tables from your model into a single tibble.

join_df <- nm_join(mod1) 
join_df %>% 
  select(ID, TIME, DV, CL, V, NPDE, CWRES)
#> # A tibble: 779 × 7
#>       ID  TIME    DV    CL     V  NPDE CWRES
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1  0     1.22  43.4  463. 0.185 0.291
#>  2     1  0.25 12.6   43.4  463. 2.47  3.01 
#>  3     1  0.5  11.2   43.4  463. 2.71  2.68 
#>  4     1  0.75 17.7   43.4  463. 2.94  4.24 
#>  5     1  1    21.9   43.4  463. 2.94  5.25 
#>  6     1  1.25 16.2   43.4  463. 2.94  3.88 
#>  7     1  1.5  18.7   43.4  463. 2.94  4.47 
#>  8     1  1.75 23.8   43.4  463. 2.94  5.69 
#>  9     1  2    16.5   43.4  463. 2.94  3.94 
#> 10     1  2.25 17.3   43.4  463. 2.94  4.15 
#> # ℹ 769 more rows

Iteration

Much of the benefit of bbr is leveraged in the model iteration workflow, and the run log summaries that can be created afterwards. For example, imagine you look at these model results and want to begin iterating on them with a new model.

If you are now in a new R session and no longer have your mod1 object in memory, you can easily rebuild it from the YAML file on disk with read_model():

mod1 <- read_model(file.path(MODEL_DIR, 1))

copy_model_from()

Now you can create a new model object, based on the original, copying and renaming the control stream in the process. By default, the copy_model_from() function will increment the name of the new model to the next available integer in your modeling directory, but you can also pass a custom name to the .new_model argument. See ?copy_model_from for more details and other optional arguments.

The copy_model_from() call below will create both 2.ctl and 2.yaml files in the same directory as the parent model, and return the model object corresponding to them. (copy_model_from() also stores the model’s “ancestry” which can be useful later in the project, as shown in the Using the based_on field vignette.)

mod2 <- copy_model_from(mod1) %>%
  update_model_id()
#> replacing 1 with 2 in /data/home/kylem/.cache/R/renv/library/bbr-e43ce16f/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/2.ctl

The new control stream file 2.ctl will be identical to its parent model, though you can use update_model_id() (as above) to update to the new model ID in places like $TABLE statements.

Your new control stream can now be edited with any desired changes and then submitted and summarized exactly as above. At any time, you can also see the difference between a model and its parent using model_diff().

# manually edit control stream, then...
model_diff(mod2)
mod2 %>% submit_model()
mod2 %>% model_summary()

Adding tags and notes

After looking at these results, you can add tags, which can later be used to organize your modeling runs.

mod1 <- mod1 %>% add_tags("orig acop model")
mod2 <- mod2 %>% add_tags("2 compartment")

Note that using free text for your tags is discouraged, for reasons mentioned in the ?modify_tags help page. For simplicity’s sake, we ignore that advice here, but please read it before using tags extensively in the wild.

In addition to tags, the user can add notes that can be referenced later.

mod2 <- mod2 %>% 
  add_notes("2 compartment model more appropriate than 1 compartment")

The model_diff() and tags_diff() functions make it easy to compare models to their parent model. See the “Using based_on” vignette for examples.
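
For instance, both comparisons against the parent model can be run directly on the child (a sketch; the output depends on the current state of your models):

```r
model_diff(mod2)  # line-by-line diff of control streams: mod2 vs. its parent
tags_diff(mod2)   # tags added or removed relative to the parent model
```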

Continue to iterate…

Now the iteration process continues with a third model. Note that you can tell copy_model_from() to inherit the tags from the parent model and automatically add them to the new model.

mod3 <- copy_model_from(mod2, .inherit_tags = TRUE)

Submit and summarize as before.

# manually edit control stream, then...
mod3 %>% submit_model()
mod3 %>% model_summary()

Add tags and notes for filtering in run log, described next.

mod3 <- mod3 %>% 
  add_tags(c("combined RUV", "iiv CL")) %>%
  add_notes("Added combined error structure because it seemed like a good idea")

Inheriting parameter estimates

As suggested above, the typical iteration process involves using copy_model_from() to create a new model object, followed by manually editing the control stream file. However, in some instances it can be valuable to set the initial parameter estimates of a model using the final estimates of a previously executed model. Often this is used to carry forward the final estimates of a “parent” model to be used as the initial estimates of the “child” model; for example a model created with copy_model_from().

This is supported via inherit_param_estimates(), which automates the overwriting of the control stream file of your new model, using the final parameter estimates from the previously executed parent model. If no parent model is specified, it will default to using the first entry in get_based_on(.mod). You also have the option to keep or discard existing bounds when setting the initial estimates for THETA records specifically.

mod4 <- copy_model_from(mod1, "4") %>%
  inherit_param_estimates(bounds = "keep", digits = 3) %>%
  add_notes("Initial estimates inherited from mod1 results")

You can also tweak or ‘jitter’ the initial parameter estimates using tweak_initial_estimates(). This can be used both in conjunction with inherit_param_estimates() or as a standalone call:

mod5 <- copy_model_from(mod1, "5") %>% 
  inherit_param_estimates(digits = 3) %>% 
  tweak_initial_estimates(digits = 3, .p = 0.2)

mod6 <- copy_model_from(mod1, "6") %>% 
  tweak_initial_estimates(digits = 3, .p = 0.2)

Run log

At any point, the user can easily construct a “run log” tibble to summarize all models run up to this point.

Before we move on, note that you can get even more information about your models from the config_log() and summary_log() functions, as well as add_config() and add_summary() which automatically join the columns output from those functions against the tibble output from run_log(). See the “Further Reading” section below for links to vignettes demonstrating those functions.

log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 6 × 10
#>   absolute_model_path         run   yaml_md5 model_type description bbi_args    
#>   <chr>                       <chr> <chr>    <chr>      <chr>       <list>      
#> 1 /data/home/kylem/.cache/R/… 1     3ff1756… nonmem     original a… <named list>
#> 2 /data/home/kylem/.cache/R/… 2     41f8d47… nonmem     NA          <named list>
#> 3 /data/home/kylem/.cache/R/… 3     03a1f9d… nonmem     NA          <named list>
#> 4 /data/home/kylem/.cache/R/… 4     09bed41… nonmem     NA          <named list>
#> 5 /data/home/kylem/.cache/R/… 5     36d4841… nonmem     NA          <named list>
#> 6 /data/home/kylem/.cache/R/… 6     36d4841… nonmem     NA          <named list>
#> # ℹ 4 more variables: based_on <list>, tags <list>, notes <list>, star <lgl>

run_log() returns a tibble which can be manipulated like any other tibble. However, several of the columns (tags, notes, and based_on, for example) are list columns, which complicates how you can interact with them. We provide some helper functions to more seamlessly interact with these log tibbles, as well as some sample tidyverse code below.
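
For instance, if you prefer plain tidyverse code over the helpers, a list column like tags can be expanded with tidyr::unnest() (a sketch; without keep_empty = TRUE, models with no tags would drop out of the result):

```r
library(dplyr)
library(tidyr)

# one row per (model, tag) pair
log_df %>%
  select(run, tags) %>%
  unnest(tags, keep_empty = TRUE)
```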

Viewing tags example

The bbr::collapse_to_string() function can collapse any list column into a string representation of its contents. It is specifically designed for collapsing columns like tags, notes, and based_on into a more human-readable format.

log_df %>% 
  collapse_to_string(tags, notes) %>%
  select(run, tags, notes)
#> # A tibble: 6 × 3
#>   run   tags                                 notes                              
#>   <chr> <chr>                                <chr>                              
#> 1 1     acop tag, other tag, orig acop model NA                                 
#> 2 2     2 compartment                        2 compartment model more appropria…
#> 3 3     2 compartment, combined RUV, iiv CL  Added combined error structure bec…
#> 4 4     NA                                   Initial estimates inherited from m…
#> 5 5     NA                                   NA                                 
#> 6 6     NA                                   NA

Filtering tags example

All *_log() functions have an .include argument that will filter the resulting log. Only models that contain a tag or name (run) matching an element of the vector passed to .include will be included.

run_log(MODEL_DIR, .include = "2 compartment") %>%
  collapse_to_string(tags, notes) %>%
  select(run, tags, notes)
#> # A tibble: 2 × 3
#>   run   tags                                notes                               
#>   <chr> <chr>                               <chr>                               
#> 1 2     2 compartment                       2 compartment model more appropriat…
#> 2 3     2 compartment, combined RUV, iiv CL Added combined error structure beca…

Further reading

Hopefully this has given you a good start on understanding the capabilities and basic workflow of bbr. Please see the other vignettes for demonstrations of more advanced or specific functionality.