This vignette demonstrates how to use `summary_log()` to extract model diagnostics like the objective function value, condition number, and parameter counts.
If you are new to `bbr`, the “Getting Started with bbr” vignette will take you through some basic scenarios for modeling with NONMEM using `bbr`, introducing you to its standard workflow and functionality.
There is a lot of information in the `bbi_summary_log_df` tibble that is output from `summary_log()`. However, it is important to note that all of this, and quite a bit more, is contained in the `bbi_nonmem_summary` object output from `model_summary()`. If you are trying to dig deep into the outputs of a small number of models, see the Summarize section of the “Getting Started” vignette for an introduction to that functionality.
`summary_log()` is more useful for getting a slightly higher-level view of a larger batch of models: potentially all the models in a given project, or something like a large group of bootstrapped runs.
There is some initial setup necessary for using `bbr`. Please refer to the “Getting Started” vignette, mentioned above, if you have not done this yet. Once that is done, load the library.
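The examples below also use `dplyr`, which supplies the pipe and verbs like `select()` and `filter()`.

library(bbr)
library(dplyr)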
As mentioned above, the `bbi_summary_log_df` tibble contains a lot of information, which is a subset of what is contained in the `bbi_nonmem_summary` object returned from `model_summary()`.
MODEL_DIR <- system.file("model", "nonmem", "complex", package = "bbr")
sum1 <-
read_model(file.path(MODEL_DIR, "iovmm")) %>%
model_summary()
names(sum1)
#> [1] "absolute_model_path" "run_details" "run_heuristics"
#> [4] "parameters_data" "parameter_names" "ofv"
#> [7] "condition_number" "shrinkage_details" "success"
For example, the `run_details` section alone contains a wealth of information about this model run:
str(sum1$run_details)
#> List of 16
#> $ version : chr "7.4.4"
#> $ run_start : num NA
#> $ run_end : chr "Tue Aug 25 16:50:27 EDT 2020"
#> $ estimation_time : num 132
#> $ postprocess_time : num 1.04
#> $ cpu_time : num 134
#> $ function_evaluations : int 380
#> $ significant_digits : int 3
#> $ problem_text : chr "10 mixture model and IOV on CL"
#> $ mod_file : num NA
#> $ estimation_method : chr "First Order Conditional Estimation"
#> $ data_set : chr "../MixSim.csv"
#> $ number_of_subjects : int 300
#> $ number_of_obs : int 12600
#> $ number_of_data_records: int 13500
#> $ output_files_used : chr [1:5] "iovmm.lst" "iovmm.cpu" "iovmm.ext" "iovmm.grd" ...
Much of this is very useful, but it’s also a bit intimidating, and it can take some work to unpack it all and find the bits and pieces you’re looking for.
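For instance, individual values can be pulled out of the summary object with standard list indexing:

sum1$run_details$estimation_time
#> [1] 132
sum1$run_details$number_of_subjects
#> [1] 300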
The `summary_log()` function is designed to extract some of the most relevant diagnostics and model outputs from a batch of model summaries and organize them into a more easily digestible tibble. Like `run_log()` and `config_log()`, it takes two arguments:

- `.base_dir` – Directory to look for models in.
- `.recurse` – Logical indicating whether to search recursively in subdirectories. This is `TRUE` by default.
sum_df <- summary_log(MODEL_DIR)
names(sum_df)
#> [1] "absolute_model_path" "run"
#> [3] "bbi_summary" "needed_fail_flags"
#> [5] "error_msg" "problem_text"
#> [7] "estimation_method" "number_of_subjects"
#> [9] "number_of_obs" "ofv"
#> [11] "param_count" "condition_number"
#> [13] "any_heuristics" "covariance_step_aborted"
#> [15] "large_condition_number" "eigenvalue_issues"
#> [17] "correlations_not_ok" "parameter_near_boundary"
#> [19] "hessian_reset" "has_final_zero_gradient"
#> [21] "minimization_terminated" "eta_pval_significant"
#> [23] "prderr"
The specific columns returned are described below, though there is also a list of them, with brief definitions, in the `summary_log()` docs that can be accessed any time with `?summary_log()` in the console.
The first column is `absolute_model_path`, which contains an absolute path that unambiguously identifies each model. This serves as the primary key for the tibble. The second column, `run`, is simply the `basename` of this path, which is just a convenience for printing and viewing.
The third column, `bbi_summary`, contains the `bbi_nonmem_summary` object, discussed above, for each model. This can be extracted and manipulated if you would like more detailed data from it.
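For example, here is one way to pull the full summary object for a single run back out of the tibble; this sketch assumes the column is a list of `bbi_nonmem_summary` objects, as described above (output not shown):

# extract the summary object for the "iovmm" run and inspect its run details
iovmm_sum <- sum_df$bbi_summary[[which(sum_df$run == "iovmm")]]
str(iovmm_sum$run_details)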
The `error_msg` and `needed_fail_flags` columns describe whether `bbi` had any trouble parsing the model outputs. These won't be discussed in detail here; refer to the `summary_log()` docs for more information.
The next batch of columns contains the core diagnostics and model outputs. As mentioned above, descriptions of what each column contains can be found in the `summary_log()` docs.
sum_df %>%
collapse_to_string(estimation_method) %>%
select(
run,
ofv,
param_count,
estimation_method,
problem_text,
number_of_subjects,
number_of_obs,
condition_number
)
#> # A tibble: 6 × 8
#> run ofv param_count estimation_method problem_text number_of_subjects
#> <chr> <dbl> <int> <chr> <chr> <int>
#> 1 1001 3843. 15 MCMC Bayesian An… Run# 1001.1 240
#> 2 acop-fa… 2675. 7 First Order Cond… PK model 1 … 40
#> 3 acop-iov 44159. 9 First Order Cond… PK model 1 … 39
#> 4 acop-on… NA NA NA PK model 1 … 40
#> 5 example… -10839. 21 Stochastic Appro… RUN# exampl… 400
#> 6 iovmm 14722. 11 First Order Cond… 10 mixture … 300
#> # ℹ 2 more variables: number_of_obs <int>, condition_number <dbl>
The `run_heuristics` element of the `bbi_nonmem_summary` object contains a number of logical values indicating whether particular heuristic issues were found in the model. Note that these are not necessarily errors with the model run, but are closer to warning flags that should possibly be investigated. Each heuristic is described in more detail in the `summary_log()` docs.
Note that all heuristics will be `FALSE` by default (and never `NA`) and will only be `TRUE` if they are explicitly triggered. For example, `large_condition_number` will be `FALSE` even in the case when a condition number was not calculated at all.
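To see the raw flags for a single model, you can inspect the `run_heuristics` element of the summary object directly (output not shown):

str(sum1$run_heuristics)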
All of the heuristic flags are pivoted out to their own columns in the `bbi_summary_log_df` tibble. It's useful to note that, except for `needed_fail_flags` (discussed above), these are the only logical columns in the tibble and can therefore be easily selected with `tidyselect::where(is.logical)`. (Note: `where()` only became available in `tidyselect (>= 1.1.0)`, released May 2020.)
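For example, a selection like the following keeps `run` alongside every logical column (this chunk is a sketch that reproduces the output shown below):

sum_df %>%
  select(run, where(is.logical))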
#> # A tibble: 6 × 13
#> run needed_fail_flags any_heuristics covariance_step_aborted
#> <chr> <lgl> <lgl> <lgl>
#> 1 1001 TRUE TRUE FALSE
#> 2 acop-fake-bayes FALSE TRUE FALSE
#> 3 acop-iov FALSE TRUE FALSE
#> 4 acop-onlysim FALSE FALSE FALSE
#> 5 example2_saemimp FALSE FALSE FALSE
#> 6 iovmm FALSE TRUE FALSE
#> # ℹ 9 more variables: large_condition_number <lgl>, eigenvalue_issues <lgl>,
#> # correlations_not_ok <lgl>, parameter_near_boundary <lgl>,
#> # hessian_reset <lgl>, has_final_zero_gradient <lgl>,
#> # minimization_terminated <lgl>, eta_pval_significant <lgl>, prderr <lgl>
Notice that there is also an `any_heuristics` column, which can easily be used to filter to only runs that had at least one heuristic flag triggered.
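For example (a sketch; output not shown):

sum_df %>% filter(any_heuristics)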
Just like `config_log()` has `add_config()`, you can also use `add_summary()` to join all of these columns onto an existing `bbi_run_log_df` (the tibble output from `run_log()`). This can be useful if you have a run log that you have previously filtered on something like the `tags` or `based_on` columns, and you would like to append some simple diagnostics.
# filter to two specific runs
log_df <- run_log(MODEL_DIR, .include = c("acop-iov", "iovmm"))
# add summary columns
log_df <- log_df %>% add_summary()
log_df %>% select(run, ofv, param_count, any_heuristics)
#> # A tibble: 2 × 4
#> run ofv param_count any_heuristics
#> <chr> <dbl> <int> <lgl>
#> 1 acop-iov 44159. 9 TRUE
#> 2 iovmm 14722. 11 TRUE