Using the based_on field

Introduction

This vignette demonstrates how to use the based_on field to track a model’s ancestry through the model development process. You will also see one common use for this: using the tibble output from config_log() to check that your models are up-to-date. By “up-to-date” we mean that none of the model files or data files have changed since the model was run.

If you are new to bbr, the “Getting Started with bbr” vignette will take you through some basic scenarios for modeling with NONMEM using bbr, introducing you to its standard workflow and functionality.

Setup

Some initial setup is necessary before using bbr. Please refer to the “Getting Started” vignette, mentioned above, if you have not done this yet. Once this is done, load the libraries.

library(bbr)
library(dplyr)
library(purrr)

Modeling process

The modeling process will always start with an initial model, which we create with the new_model() call.

MODEL_DIR <- "../nonmem"

mod1 <- new_model(file.path(MODEL_DIR, 1))

From there, the iterative model development process proceeds. The copy_model_from() function will do several things, including creating a new model file and filling in some relevant metadata. Notably, it will also add the model that you copied from into the based_on field for the new model.

mod2 <- copy_model_from(.parent_mod = mod1, .new_model = 2)

get_based_on(mod2)
#> [1] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/1"

diff helpers

There are several helper functions for comparing a model to its parent. tags_diff() compares the tags attached to the two models, while model_diff() compares the model files on disk.

mod1 <- add_tags(mod1, c("the same tag", "an old tag"))
mod2 <- add_tags(mod2, c("the same tag", "a new tag"))
tags_diff(mod2)
#> In 2 but not parent(s):  a new tag
#> In parent(s) but not 2:  an old tag

model_diff(mod2)
#> < 2                                               > 1
#> @@ 1,3 @@                                         @@ 1,3 @@
#> < $PROBLEM From bbr: see 2.yaml for details       > $PROBLEM PK model 1 cmt base
#>
#>   $INPUT ID TIME MDV EVID DV AMT  SEX WT ETN NUM    $INPUT ID TIME MDV EVID DV AMT  SEX WT ETN NUM

By default both functions compare the model to whatever model is returned from get_based_on(), but they also have a .mod2 argument which can take any arbitrary model object and compare with that instead. See ?tags_diff and ?model_diff for more details on usage.
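Conceptually, the tag comparison shown above amounts to two set differences. The following is a minimal base R sketch of that idea, not bbr's implementation; the variables child_tags and parent_tags are hypothetical stand-ins for the tags stored on model 2 and its parent.

```r
# Hypothetical stand-ins for the tags on model 2 and on its parent (model 1).
child_tags  <- c("the same tag", "a new tag")
parent_tags <- c("the same tag", "an old tag")

# Tags present on the child but not the parent, and the reverse.
only_in_child  <- setdiff(child_tags, parent_tags)
only_in_parent <- setdiff(parent_tags, child_tags)

cat("In 2 but not parent(s): ", only_in_child, "\n")
cat("In parent(s) but not 2: ", only_in_parent, "\n")
```

Tags shared by both models drop out of both results, which is why "the same tag" does not appear in either line.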

More iteration

Now we continue iterating on our model. NOTE: In a real model development process, these models would obviously be run and the diagnostics examined before moving on. For the sake of brevity, imagine that all happens “behind the curtain” in this example. In other words, in between each of the calls to copy_model_from() you would be doing all of the normal iterative modeling work.

# ...submit mod2...look at diagnostics...decide on changes for next iteration...

mod3 <- copy_model_from(mod2, 3)

# ...submit mod3...look at diagnostics...decide on changes for next iteration...

mod4 <- copy_model_from(mod3, 4)

# ...submit mod4...look at diagnostics...decide to go back to mod2 as basis for next iteration...

mod5 <- copy_model_from(mod2, 5)

# ...submit mod5...look at diagnostics...decide on changes for next iteration...

mod6 <- copy_model_from(mod5, 6)

# ...submit mod6...look at diagnostics...decide you're done!

Now that you have arrived at your final model, you can add a description to identify it, which will be used shortly for filtering the run_log() tibble.

mod6 <- mod6 %>% add_description("Final model")

Operating on a model object

You can inspect mod$based_on directly to see what is stored in the based_on field of a given model. However, there are two helper functions that are often more useful.

get_based_on

First, get_based_on() retrieves the absolute path of every model in the based_on field.

mod6 %>% get_based_on()
#> [1] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/5"

This is useful because the path(s) retrieved will unambiguously identify the parent model(s) and can therefore be passed to things like read_model() or model_summary() like so:

parent_mod <- mod6 %>% get_based_on() %>% read_model()
str(parent_mod)
#> List of 4
#>  $ model_type         : chr "nonmem"
#>  $ based_on           : chr "2"
#>  $ absolute_model_path: chr "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/5"
#>  $ yaml_md5           : chr "ac0ba292b017a72bb316359f0df09bb2"
#>  - attr(*, "class")= chr [1:4] "bbi_nonmem_model" "bbi_base_model" "bbi_model" "list"

get_model_ancestry

The second helper function walks up the tree of inheritance by iteratively calling get_based_on() on each parent model to determine the full set of models that led up to the current model.

mod6 %>% get_model_ancestry()
#> [1] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/1"
#> [2] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/2"
#> [3] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/5"

In this case, model 6 was based on 5, which was based on 2, which in turn was based on 1. You will see one example of how this can be useful in the “Final model family” section below.
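To make the "walk up the tree" idea concrete, here is a minimal base R sketch that does not use bbr at all. It assumes the parent relationships from this vignette are stored in a plain named list (based_on_map, a hypothetical name), and walk_ancestry() is an illustrative helper, not a bbr function.

```r
# Hypothetical parent map mirroring this vignette: each entry lists the
# model(s) that the named model was copied from.
based_on_map <- list(
  "1" = character(0),
  "2" = "1",
  "3" = "2",
  "4" = "3",
  "5" = "2",
  "6" = "5"
)

# Illustrative helper: repeatedly look up parents until none are left.
walk_ancestry <- function(model, parents) {
  ancestors <- character(0)
  queue <- parents[[model]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% ancestors)) {
      ancestors <- c(ancestors, current)
      queue <- c(queue, parents[[current]])
    }
  }
  sort(ancestors) # character sort; fine for single-digit run names
}

ancestry_of_6 <- walk_ancestry("6", based_on_map)
print(ancestry_of_6)
```

Models 3 and 4 never enter the queue because nothing on the path from model 6 back to model 1 lists them as a parent.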

Using the run log

While it may be useful to look at the ancestry of a single model object, it may be even more useful to use the based_on field later in the modeling process when you are looking back and trying to summarize the model activities as a whole. The run_log() function is helpful for this. It returns a tibble with metadata about each model.

log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 6 × 10
#>   absolute_model_path    run   yaml_md5 model_type description bbi_args based_on
#>   <chr>                  <chr> <chr>    <chr>      <chr>       <list>   <list>  
#> 1 /tmp/RtmpBFC8RE/temp_… 1     5ec8e22… nonmem     NA          <NULL>   <NULL>  
#> 2 /tmp/RtmpBFC8RE/temp_… 2     9adbb90… nonmem     NA          <NULL>   <chr>   
#> 3 /tmp/RtmpBFC8RE/temp_… 3     ac0ba29… nonmem     NA          <NULL>   <chr>   
#> 4 /tmp/RtmpBFC8RE/temp_… 4     6818db1… nonmem     NA          <NULL>   <chr>   
#> 5 /tmp/RtmpBFC8RE/temp_… 5     ac0ba29… nonmem     NA          <NULL>   <chr>   
#> 6 /tmp/RtmpBFC8RE/temp_… 6     feb9bf3… nonmem     Final model <NULL>   <chr>   
#> # ℹ 3 more variables: tags <list>, notes <list>, star <lgl>

Among other things, the run log contains any descriptions that have been assigned to each model. Here we use dplyr::filter() and dplyr::pull() to get the path to the final model.

final_model_path <- 
  log_df %>% 
  filter(description == "Final model") %>%
  pull(absolute_model_path)

final_model_path
#> [1] "/tmp/RtmpBFC8RE/temp_libpath322f861338b52/bbr/model/nonmem/basic/6"

Next we can use the get_model_ancestry() function to filter the tibble to only the models that led up to the final model.

log_df %>% 
  filter(absolute_model_path %in% get_model_ancestry(final_model_path)) %>%
  collapse_to_string(based_on) %>% # collapses list column for easier printing
  select(run, based_on)
#> # A tibble: 3 × 2
#>   run   based_on
#>   <chr> <chr>   
#> 1 1     NA      
#> 2 2     1       
#> 3 5     2

As you can see, models 3 and 4 are discarded because they did not lead to the final model. Review the “Modeling process” section above if you are not sure why this is the case. We will use the two techniques together in the “Final model family” section below.

Checking if models are up-to-date with check_up_to_date() and config_log()

Now imagine you are coming back to this project some time later and want to make sure that all of the outputs you have are still up-to-date with the model files and data currently in the project.

When bbi runs a model, it creates a file named bbi_config.json in the output directory. This file contains a lot of information about the state and configuration at the time when the model was run. Notably, it contains an md5 digest of both the model file and the data file at execution time. The following functions compare the md5 digests (stored during model execution) against the model and data files as they currently exist on disk at the time these functions are called.
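The digest comparison described above can be illustrated with base R alone. This is a sketch of the general technique, not how bbr or bbi implement it: it uses tools::md5sum() on a temporary file, and the names model_file, stored_md5, and is_current() are hypothetical.

```r
# A temporary file standing in for a model file.
model_file <- tempfile(fileext = ".ctl")
writeLines("$PROBLEM PK model 1 cmt base", model_file)

# Record the digest "at execution time".
stored_md5 <- unname(tools::md5sum(model_file))

# Illustrative helper: TRUE if the file on disk still matches the stored digest.
is_current <- function(path, digest) {
  unname(tools::md5sum(path)) == digest
}

before_edit <- is_current(model_file, stored_md5)

# Edit the file after the "run"; the digest no longer matches.
writeLines("$PROBLEM edited after the run", model_file)
after_edit <- is_current(model_file, stored_md5)

print(c(before_edit = before_edit, after_edit = after_edit))
```

Any change to the file contents, however small, produces a different digest, which is what makes this a reliable staleness check.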

check_up_to_date() for a single model

The check_up_to_date() function takes a model object and invisibly returns a two-element logical vector. The return is invisible because, if there are any changes, it will also print a message telling you which files have changed. Either way, you can also inspect the returned object like so.

res <- check_up_to_date(mod1)
print(res)
#> model  data 
#>  TRUE  TRUE

config_log() for multiple models

The config_log() function parses these bbi_config.json files and extracts some relevant information to a bbi_config_log_df tibble. It contains model_has_changed and data_has_changed columns that serve as a check that the outputs are up-to-date with the current model and data.

You can call config_log() directly, but it is often useful to join its output to a run log with run_log() %>% add_config().

log_df <- log_df %>% add_config()
log_df %>% select(run, model_has_changed, data_has_changed)
#> # A tibble: 6 × 3
#>   run   model_has_changed data_has_changed
#>   <chr> <lgl>             <lgl>           
#> 1 1     FALSE             FALSE           
#> 2 2     FALSE             FALSE           
#> 3 3     TRUE              FALSE           
#> 4 4     TRUE              FALSE           
#> 5 5     FALSE             FALSE           
#> 6 6     FALSE             FALSE

One important note: check_up_to_date() and config_log() return opposite boolean values. check_up_to_date() returns TRUE if the files are up to date, whereas the *_has_changed columns contain TRUE if something has changed.
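Because the two conventions are inverted, it can help to derive a single "up-to-date" flag before mixing them. The sketch below uses a plain data frame with the values from the table above as a stand-in for the real config log; the up_to_date column is a hypothetical name, not something bbr creates.

```r
# Stand-in for the relevant config log columns, using the values shown above.
config_df <- data.frame(
  run = as.character(1:6),
  model_has_changed = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  data_has_changed  = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# A run is up to date only if neither the model nor the data has changed.
config_df$up_to_date <- !(config_df$model_has_changed | config_df$data_has_changed)

stale_runs <- config_df$run[!config_df$up_to_date]
print(stale_runs)
```

With this flag, TRUE means the same thing as it does in the vector returned by check_up_to_date().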

Final model family

From the model_has_changed column in the previous example, you can see that some of the model files have changed since they were run. However, you may only care about your final model and the models that led to it. You can use the description and based_on columns from the run_log() to filter to only those models.

final_model_family <- bind_rows(
  log_df %>% 
    filter(absolute_model_path %in% get_model_ancestry(final_model_path)), # the ancestors of the final model
  log_df %>% 
    filter(description == "Final model") # the final model itself
)

final_model_family %>% 
  collapse_to_string(based_on) %>%
  select(run, based_on, description, model_has_changed, data_has_changed)
#> # A tibble: 4 × 5
#>   run   based_on description model_has_changed data_has_changed
#>   <chr> <chr>    <chr>       <lgl>             <lgl>           
#> 1 1     NA       NA          FALSE             FALSE           
#> 2 2     1        NA          FALSE             FALSE           
#> 3 5     2        NA          FALSE             FALSE           
#> 4 6     5        Final model FALSE             FALSE

When we filter to only those models, you can see that they are all still up-to-date. Great news.

Seth Green