This vignette demonstrates how to use the based_on field to track a model’s ancestry through the model development process. You will also see one common use for this: using the tibble output from config_log() to check that your models are up-to-date. By “up-to-date” we mean that none of the model files or data files have changed since the model was run.
If you are new to bbr, the “Getting Started with bbr” vignette will take you through some basic scenarios for modeling with NONMEM using bbr, introducing you to its standard workflow and functionality.
The modeling process will always start with an initial model, which we create with the new_model() call.
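MODEL_DIR <- "../nonmem"
mod1 <- new_model(file.path(MODEL_DIR, 1))  # assumes the control stream for model 1 already exists in MODEL_DIR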
From there, the iterative model development process proceeds. The copy_model_from() function will do several things, including creating a new model file and filling in some relevant metadata. Notably, it will also add the model that you copied from into the based_on field for the new model.
mod2 <- copy_model_from(.parent_mod = mod1, .new_model = 2)
get_based_on(mod2)
#> [1] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/1"
There are several helper functions for comparing a model to its parent. tags_diff() compares the tags attached to the two models, while model_diff() compares the model files on disk.
mod1 <- add_tags(mod1, c("the same tag", "an old tag"))
mod2 <- add_tags(mod2, c("the same tag", "a new tag"))
tags_diff(mod2)
#> In 2 but not parent(s): a new tag
#> In parent(s) but not 2: an old tag
model_diff(mod2)
#> @@ 1,3 @@                                          @@ 1,3 @@
#> < $PROBLEM From bbr: see 2.yaml for details        > $PROBLEM PK model 1 cmt base
#>   $INPUT ID TIME MDV EVID DV AMT SEX WT ETN NUM      $INPUT ID TIME MDV EVID DV AMT SEX WT ETN NUM
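The two lines printed by tags_diff() boil down to a pair of set differences. The base-R sketch below reproduces them; tags_diff_sketch() is a hypothetical helper written for illustration, not bbr's implementation, and its message text simply copies the output shown above.

```r
# Hypothetical helper, NOT part of bbr: mimics the two lines printed by
# tags_diff() using base R set differences.
tags_diff_sketch <- function(child_tags, parent_tags, child_name) {
  new_tags <- setdiff(child_tags, parent_tags)  # tags only on the child
  old_tags <- setdiff(parent_tags, child_tags)  # tags only on the parent
  cat(sprintf("In %s but not parent(s): %s\n", child_name, paste(new_tags, collapse = ", ")))
  cat(sprintf("In parent(s) but not %s: %s\n", child_name, paste(old_tags, collapse = ", ")))
  invisible(list(new = new_tags, old = old_tags))
}

# Tag values copied from the add_tags() calls above.
res <- tags_diff_sketch(
  child_tags  = c("the same tag", "a new tag"),
  parent_tags = c("the same tag", "an old tag"),
  child_name  = "2"
)
#> In 2 but not parent(s): a new tag
#> In parent(s) but not 2: an old tag
```

A tag present on both models ("the same tag") falls out of both set differences, which is why it does not appear in either line.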
By default both functions compare the model to whatever model is returned from get_based_on(), but they also have a .mod2 argument which can take any arbitrary model object and compare with that instead. See ?tags_diff and ?model_diff for more details on usage.
Now we continue iterating on our model. NOTE: In a real model development process, each of these models would be run and its diagnostics examined before moving on. For the sake of brevity, imagine that all of this happens “behind the curtain” in this example. In other words, between each of the calls to copy_model_from() you would be doing all of the normal iterative modeling work.
# ...submit mod2...look at diagnostics...decide on changes for next iteration...
mod3 <- copy_model_from(mod2, 3)
# ...submit mod3...look at diagnostics...decide on changes for next iteration...
mod4 <- copy_model_from(mod3, 4)
# ...submit mod4...look at diagnostics...decide to go back to mod2 as basis for next iteration...
mod5 <- copy_model_from(mod2, 5)
# ...submit mod5...look at diagnostics...decide on changes for next iteration...
mod6 <- copy_model_from(mod5, 6)
# ...submit mod6...look at diagnostics...decide you're done!
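The copy_model_from() calls above produce a small tree rather than a straight line, because model 5 branches off model 2 a second time. As a plain-R illustration (a named vector, not a bbr object), the parent of each model can be written down and inverted to list each model's children:

```r
# Parent of each model, as set by the copy_model_from() calls above.
# Illustration only: bbr itself keeps this in each model's based_on field.
parent_of <- c("2" = "1", "3" = "2", "4" = "3", "5" = "2", "6" = "5")

# Invert the mapping: group child model names by their parent.
children_of <- split(names(parent_of), unname(parent_of))
children_of[["2"]]   # "3" "5": model 2 was the starting point for two branches

# Models that never served as a parent are the tips of the branches.
tips <- setdiff(names(parent_of), names(children_of))
tips                 # "4" "6"
```

Only one of those tips, model 6, ends up as the final model; model 4 sits at the end of an abandoned branch.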
Now that you have arrived at your final model, you can add a description to identify it, which will be used shortly for filtering the run_log() tibble.
mod6 <- mod6 %>% add_description("Final model")
You can simply use mod$based_on to see what is stored in the based_on field of a given model. However, there are two additional helper functions that are useful to know.
First, by using get_based_on() you can retrieve the absolute path to all models in the based_on field.
mod6 %>% get_based_on()
#> [1] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/5"
This is useful because the path(s) retrieved will unambiguously identify the parent model(s) and can therefore be passed to functions like read_model() or model_summary(), like so:
parent_mod <- mod6 %>% get_based_on() %>% read_model()
str(parent_mod)
#> List of 4
#> $ model_type : chr "nonmem"
#> $ based_on : chr "2"
#> $ absolute_model_path: chr "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/5"
#> $ yaml_md5 : chr "ac0ba292b017a72bb316359f0df09bb2"
#> - attr(*, "class")= chr [1:4] "bbi_nonmem_model" "bbi_base_model" "bbi_model" "list"
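The str() output shows that the based_on value stored for model 5 is just "2", while get_based_on() returned an absolute path. A minimal sketch of that resolution follows, assuming (not confirmed in this vignette) that stored entries are interpreted relative to the directory containing the model. resolve_based_on_sketch() and the /project/nonmem paths are hypothetical.

```r
# Hypothetical helper, NOT bbr's code: resolve stored based_on entries
# relative to the directory of the model that references them.
resolve_based_on_sketch <- function(model_path, based_on) {
  file.path(dirname(model_path), based_on)
}

resolve_based_on_sketch("/project/nonmem/5", "2")
# returns "/project/nonmem/2"

# Vectorised over several parents (a hypothetical model 7 with two parents).
resolve_based_on_sketch("/project/nonmem/7", c("4", "6"))
# returns c("/project/nonmem/4", "/project/nonmem/6")
```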
The second helper function walks up the tree of inheritance by iteratively calling get_based_on() on each parent model to determine the full set of models that led up to the current model.
mod6 %>% get_model_ancestry()
#> [1] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/1"
#> [2] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/2"
#> [3] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/5"
In this case, model 6 was based on 5, which was based on 2, which in turn was based on 1.
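The walk itself can be sketched in a few lines of base R. get_ancestry_sketch() is hypothetical and is not bbr's get_model_ancestry(); it works on model names in a plain list rather than on model objects or paths, and it allows several parents per model because the based_on field can hold more than one entry.

```r
# Parent(s) of each model from the example above; models with no entry
# (model 1) have no parent. Illustration only, not a bbr data structure.
parent_of <- list("2" = "1", "3" = "2", "4" = "3", "5" = "2", "6" = "5")

# Hypothetical sketch: breadth-first walk from a model up through its parents.
get_ancestry_sketch <- function(model, parent_of) {
  ancestors <- character(0)
  queue <- model
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    parents <- parent_of[[current]]          # NULL when the model has no parent
    unseen <- setdiff(parents, ancestors)    # skip models already visited
    ancestors <- c(ancestors, unseen)
    queue <- c(queue, unseen)
  }
  sort(ancestors)
}

get_ancestry_sketch("6", parent_of)   # "1" "2" "5", matching the output above

# A hypothetical model 7 built from both branches pulls in every model.
get_ancestry_sketch("7", c(parent_of, list("7" = c("4", "6"))))
```

Tracking visited models with setdiff() keeps the loop finite even if a based_on field were ever edited into a cycle.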
You will see one example of how this can be useful in the “Final model
family” section below.
While it may be useful to look at the ancestry of a single model object, it may be even more useful to use the based_on field later in the modeling process when you are looking back and trying to summarize the model activities as a whole. The run_log() function is helpful for this. It returns a tibble with metadata about each model.
log_df <- run_log(MODEL_DIR)
log_df
#> # A tibble: 6 × 10
#> absolute_model_path run yaml_md5 model_type description bbi_args based_on
#> <chr> <chr> <chr> <chr> <chr> <list> <list>
#> 1 /data/home/barrettk/.… 1 5ec8e22… nonmem NA <NULL> <NULL>
#> 2 /data/home/barrettk/.… 2 9adbb90… nonmem NA <NULL> <chr>
#> 3 /data/home/barrettk/.… 3 ac0ba29… nonmem NA <NULL> <chr>
#> 4 /data/home/barrettk/.… 4 6818db1… nonmem NA <NULL> <chr>
#> 5 /data/home/barrettk/.… 5 ac0ba29… nonmem NA <NULL> <chr>
#> 6 /data/home/barrettk/.… 6 feb9bf3… nonmem Final model <NULL> <chr>
#> # ℹ 3 more variables: tags <list>, notes <list>, star <lgl>
Among other things, the run log contains any descriptions that have been assigned to each model. Here we use dplyr::filter() and dplyr::pull() to get the path to the final model.
final_model_path <-
log_df %>%
filter(description == "Final model") %>%
pull(absolute_model_path)
final_model_path
#> [1] "/data/home/barrettk/.cache/R/renv/library/bbr-e68b8766/R-4.1/x86_64-pc-linux-gnu/bbr/model/nonmem/basic/6"
Next we can use the get_model_ancestry() function to filter the tibble to only the models that led up to the final model.
log_df %>%
filter(absolute_model_path %in% get_model_ancestry(final_model_path)) %>%
collapse_to_string(based_on) %>% # collapses list column for easier printing
select(run, based_on)
#> # A tibble: 3 × 2
#> run based_on
#> <chr> <chr>
#> 1 1 NA
#> 2 2 1
#> 3 5 2
As you can see, models 3 and 4 are excluded because they did not lead to the final model. Review the “Modeling Process” section above if you are not sure why this is the case. We will use the two techniques together in the “Final model family” section below.
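The based_on column is a list column because a model can have zero, one, or several parents. A hypothetical base-R stand-in for what collapse_to_string() does to it is sketched below; the ", " separator and the use of NA for an empty entry are assumptions chosen to match the printed table, not a description of bbr's code.

```r
# Hypothetical helper, NOT bbr's collapse_to_string(): turn each list
# element into a single string, using NA when a model has no parent.
collapse_sketch <- function(col, sep = ", ") {
  vapply(
    col,
    function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = sep),
    character(1)
  )
}

based_on_col <- list(NULL, "1", "2")   # runs 1, 2 and 5 from the table above
collapse_sketch(based_on_col)          # returns c(NA, "1", "2")
collapse_sketch(list(c("4", "6")))     # two parents become "4, 6"
```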
check_up_to_date() and config_log()
Now imagine you are coming back to this project some time later and want to make sure that all of the outputs you have are still up-to-date with the model files and data currently in the project.
When bbi runs a model, it creates a file named bbi_config.json in the output directory. This file contains a lot of information about the state and configuration at the time when the model was run. Notably, it contains an md5 digest of both the model file and the data file at execution time. The following functions compare the md5 digests (stored during model execution) against the model and data files as they currently exist on disk at the time these functions are called.
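The comparison itself is easy to picture. The base-R sketch below uses tools::md5sum() on a throwaway file standing in for a model file; it illustrates comparing a digest recorded at run time with one computed now, and it does not read bbi_config.json or reproduce bbr's internals.

```r
# Stand-in for a model file; the contents are illustrative only.
model_file <- tempfile(fileext = ".ctl")
writeLines(c("$PROBLEM PK model 1 cmt base",
             "$INPUT ID TIME MDV EVID DV AMT SEX WT ETN NUM"), model_file)

# Digest recorded "at execution time".
md5_at_run <- unname(tools::md5sum(model_file))

# Later, someone edits the model file.
cat("; an added comment\n", file = model_file, append = TRUE)

# Digest of the file as it exists on disk now.
md5_now <- unname(tools::md5sum(model_file))
model_has_changed <- md5_now != md5_at_run
model_has_changed   # TRUE
```

Any edit, even a comment, changes the digest, so a TRUE here says only that the file differs, not whether the change matters to the model.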
check_up_to_date() for a single model
The check_up_to_date() function takes a model object and invisibly returns a two-element logical vector named model and data. The return is invisible because, if there are any changes, it will also print a message telling you which files have changed. Either way, you can also inspect the returned object like so.
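The pattern of returning a named logical vector invisibly while messaging about changes can be sketched as follows. check_files_sketch() and the placeholder digests are hypothetical; the sketch shows only the return-value convention described above, not how check_up_to_date() locates or hashes its files.

```r
# Hypothetical helper, NOT bbr's check_up_to_date().
# stored / current: named character vectors of digests for "model" and "data".
check_files_sketch <- function(stored, current) {
  up_to_date <- c(
    model = stored[["model"]] == current[["model"]],
    data  = stored[["data"]]  == current[["data"]]
  )
  if (!all(up_to_date)) {
    message("Changed since the run: ", paste(names(up_to_date)[!up_to_date], collapse = ", "))
  }
  invisible(up_to_date)   # nothing prints unless the result is printed explicitly
}

stored  <- c(model = "aaa", data = "bbb")   # placeholder digests
current <- c(model = "aaa", data = "ccc")
res <- check_files_sketch(stored, current)  # message: Changed since the run: data
res                                         # c(model = TRUE, data = FALSE)
```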
res <- check_up_to_date(mod1)
print(res)
#> model data
#> TRUE TRUE
config_log() for multiple models
The config_log() function parses these bbi_config.json files and extracts some relevant information into a bbi_config_log_df tibble. It contains model_has_changed and data_has_changed columns that serve as a check that the outputs are up-to-date with the current model and data.
You can call config_log() directly, but it is often useful to join it to a run log automatically with run_log() %>% add_config().
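One way to picture this is as a left join of the config columns onto the run log. The base-R sketch below uses merge() on toy data frames whose values are copied from the tables in this vignette, plus a hypothetical run 7 that has not been executed yet. Joining on run is a simplification for brevity; which column add_config() actually joins on is not shown here.

```r
# Toy run log: values from the run_log() output, plus a hypothetical run 7.
runs <- data.frame(
  run = as.character(1:7),
  description = c(NA, NA, NA, NA, NA, "Final model", NA)
)
# Toy config log: only runs that have been executed have a bbi_config.json.
config <- data.frame(
  run = as.character(1:6),
  model_has_changed = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  data_has_changed = rep(FALSE, 6)
)

# all.x = TRUE keeps every run-log row, like dplyr::left_join();
# run 7 gets NA in the config columns because it has no config entry.
joined <- merge(runs, config, by = "run", all.x = TRUE)
joined[, c("run", "model_has_changed", "data_has_changed")]
```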
log_df <- log_df %>% add_config()
log_df %>% select(run, model_has_changed, data_has_changed)
#> # A tibble: 6 × 3
#> run model_has_changed data_has_changed
#> <chr> <lgl> <lgl>
#> 1 1 FALSE FALSE
#> 2 2 FALSE FALSE
#> 3 3 TRUE FALSE
#> 4 4 TRUE FALSE
#> 5 5 FALSE FALSE
#> 6 6 FALSE FALSE
One important note: check_up_to_date() and config_log() return opposite boolean values. check_up_to_date() returns TRUE if the files are up to date, whereas the *_has_changed columns contain TRUE if something has changed.
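Converting between the two conventions is a single negation. The values below are illustrative, not output from a real model.

```r
# check_up_to_date() convention: TRUE means "matches what was run".
up_to_date <- c(model = TRUE, data = FALSE)

# *_has_changed convention: TRUE means "something changed".
# `!` flips every element and keeps the names.
has_changed <- !up_to_date
has_changed   # c(model = FALSE, data = TRUE)

# "Everything is up to date" is all(up_to_date), or equivalently !any(has_changed).
identical(all(up_to_date), !any(has_changed))   # TRUE
```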
From the model_has_changed column in the previous example, you can see that some of the model files have changed since they were run. However, you may only care about your final model and the models that led to it. You can use the description and based_on columns from the run_log() tibble to filter to only those models.
final_model_family <- bind_rows(
log_df %>%
filter(absolute_model_path %in% get_model_ancestry(final_model_path)), # the ancestors of the final model
log_df %>%
filter(description == "Final model") # the final model itself
)
final_model_family %>%
collapse_to_string(based_on) %>%
select(run, based_on, description, model_has_changed, data_has_changed)
#> # A tibble: 4 × 5
#> run based_on description model_has_changed data_has_changed
#> <chr> <chr> <chr> <lgl> <lgl>
#> 1 1 NA NA FALSE FALSE
#> 2 2 1 NA FALSE FALSE
#> 3 5 2 NA FALSE FALSE
#> 4 6 5 Final model FALSE FALSE
When we filter to only those models, you can see that they are all still up-to-date. Great news.
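The same family filter and up-to-date check can be sketched in base R on a toy copy of the printed run log. The ancestor runs "1", "2" and "5" are copied from the get_model_ancestry() output above rather than computed, and a single %in% test on the combined run IDs stands in for the two filters joined with bind_rows().

```r
# Toy copy of the run log values printed above.
log_df <- data.frame(
  run = as.character(1:6),
  description = c(NA, NA, NA, NA, NA, "Final model"),
  model_has_changed = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  data_has_changed = rep(FALSE, 6)
)

# which() drops the NA comparisons from models without a description.
final_run <- log_df$run[which(log_df$description == "Final model")]
ancestors <- c("1", "2", "5")   # copied from get_model_ancestry() above

family <- log_df[log_df$run %in% c(ancestors, final_run), ]
all_up_to_date <- !any(family$model_has_changed | family$data_has_changed)
all_up_to_date   # TRUE: the changed models 3 and 4 are not in the family
```

A single %in% filter also avoids duplicate rows if the final model's ID ever appeared among its own ancestors.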