This vignette demonstrates how to use `summary_log()` to extract model diagnostics like the objective function value, condition number, and parameter counts.
If you are new to `bbr`, the “Getting Started with bbr” vignette will take you through some basic scenarios for modeling with NONMEM using `bbr`, introducing you to its standard workflow and functionality.
There is a lot of information in the `bbi_summary_log_df` tibble that is output from `summary_log()`. However, it is important to note that all of this, and quite a bit more, is contained in the `bbi_nonmem_summary` object output from `model_summary()`. If you are trying to dig deep into the outputs of a small number of models, see the Summarize section of the “Getting Started” vignette for an introduction to that functionality.
`summary_log()` is more useful for getting a slightly higher-level view of a larger batch of models: potentially all the models in a given project, or something like a large group of bootstrapped runs.
There is some initial setup necessary for using `bbr`. Please refer to the “Getting Started” vignette, mentioned above, if you have not done this yet. Once that is done, load the library.
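The examples below also use `dplyr`, which supplies the pipe and verbs like `select()` and `filter()`.

library(bbr)
library(dplyr)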
As mentioned above, the `bbi_summary_log_df` tibble contains a lot of information, which is a subset of what is contained in the `bbi_nonmem_summary` object returned from `model_summary()`.
MODEL_DIR <- system.file("model", "nonmem", "complex", package = "bbr")
sum1 <-
read_model(file.path(MODEL_DIR, "iovmm")) %>%
model_summary()
names(sum1)
#> [1] "absolute_model_path" "run_details" "run_heuristics"
#> [4] "parameters_data" "parameter_names" "ofv"
#> [7] "condition_number" "shrinkage_details" "success"
For example, the `run_details` section alone contains a wealth of information about this model run:
str(sum1$run_details)
#> List of 16
#> $ version : chr "7.4.4"
#> $ run_start : num NA
#> $ run_end : chr "Tue Aug 25 16:50:27 EDT 2020"
#> $ estimation_time : num 132
#> $ postprocess_time : num 1.04
#> $ cpu_time : num 134
#> $ function_evaluations : int 380
#> $ significant_digits : int 3
#> $ problem_text : chr "10 mixture model and IOV on CL"
#> $ mod_file : num NA
#> $ estimation_method : chr "First Order Conditional Estimation"
#> $ data_set : chr "../MixSim.csv"
#> $ number_of_subjects : int 300
#> $ number_of_obs : int 12600
#> $ number_of_data_records: int 13500
#> $ output_files_used : chr [1:5] "iovmm.lst" "iovmm.cpu" "iovmm.ext" "iovmm.grd" ...
Much of this is very useful, but it’s also a bit intimidating, and it can take some work to unpack it all and find the bits and pieces you’re looking for.
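For instance, individual values can be pulled out of the summary object with standard list indexing:

sum1$run_details$estimation_time
#> [1] 132
sum1$run_details$number_of_subjects
#> [1] 300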
The `summary_log()` function is designed to extract some of the most relevant diagnostics and model outputs from a batch of model summaries and organize them into a more easily digestible tibble. Like `run_log()` and `config_log()`, it takes two arguments:

- `.base_dir` – Directory to look for models in.
- `.recurse` – Logical indicating whether to search recursively in subdirectories. This is `TRUE` by default.
sum_df <- summary_log(MODEL_DIR)
names(sum_df)
#> [1] "absolute_model_path" "run"
#> [3] "bbi_summary" "needed_fail_flags"
#> [5] "error_msg" "problem_text"
#> [7] "estimation_method" "number_of_subjects"
#> [9] "number_of_obs" "ofv"
#> [11] "param_count" "condition_number"
#> [13] "any_heuristics" "covariance_step_aborted"
#> [15] "large_condition_number" "eigenvalue_issues"
#> [17] "correlations_not_ok" "parameter_near_boundary"
#> [19] "hessian_reset" "has_final_zero_gradient"
#> [21] "minimization_terminated" "eta_pval_significant"
#> [23] "prderr"
The specific columns returned are described below, though there is also a list of them, with brief definitions, in the `summary_log()` docs that can be accessed any time with `?summary_log()` in the console.
The first column is `absolute_model_path`, which contains an absolute path that unambiguously identifies each model. This serves as the primary key for the tibble. The second column, `run`, is simply the `basename` of this path, which is just a convenience for printing and viewing.
The third column, `bbi_summary`, contains the `bbi_nonmem_summary` object, discussed above, for each model. This can be extracted and manipulated if you would like more detailed data from it.
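For example, here is one way to pull the full summary object for a single run back out of the tibble; this sketch assumes the column is a list of `bbi_nonmem_summary` objects, as described above (output not shown):

# extract the summary object for the "iovmm" run and inspect its run details
iovmm_sum <- sum_df$bbi_summary[[which(sum_df$run == "iovmm")]]
str(iovmm_sum$run_details)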
The `error_msg` and `needed_fail_flags` columns describe whether `bbi` had any trouble parsing the model outputs. These won't be discussed in detail here; refer to the `summary_log()` docs for more information.
The next batch of columns contains the core diagnostics and model outputs. As mentioned above, descriptions of what each column contains can be found in the `summary_log()` docs.
sum_df %>%
collapse_to_string(estimation_method) %>%
select(
run,
ofv,
param_count,
estimation_method,
problem_text,
number_of_subjects,
number_of_obs,
condition_number
)
#> # A tibble: 6 × 8
#> run ofv param_count estimation_method problem_text number_of_subjects
#> <chr> <dbl> <int> <chr> <chr> <int>
#> 1 1001 3843. 15 MCMC Bayesian An… Run# 1001.1 240
#> 2 acop-fa… 2675. 7 First Order Cond… PK model 1 … 40
#> 3 acop-iov 44159. 9 First Order Cond… PK model 1 … 39
#> 4 acop-on… NA NA NA PK model 1 … 40
#> 5 example… -10839. 21 Stochastic Appro… RUN# exampl… 400
#> 6 iovmm 14722. 11 First Order Cond… 10 mixture … 300
#> # ℹ 2 more variables: number_of_obs <int>, condition_number <dbl>
The `run_heuristics` element of the `bbi_nonmem_summary` object contains a number of logical values indicating whether particular heuristic issues were found in the model. Note that these are not necessarily errors with the model run, but are closer to warning flags that should possibly be investigated. Each heuristic is described in more detail in the `summary_log()` docs.
Note that all heuristics will be `FALSE` by default (and never `NA`) and will only be `TRUE` if they are explicitly triggered. For example, `large_condition_number` will be `FALSE` even in the case when a condition number was not calculated at all.
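To see the raw flags for a single model, you can inspect the `run_heuristics` element of the summary object directly (output not shown):

str(sum1$run_heuristics)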
All of the heuristic flags are pivoted out to their own columns in the `bbi_summary_log_df` tibble. It's useful to note that, except for `needed_fail_flags` (discussed above), these are the only logical columns in the tibble and can therefore be easily selected with `tidyselect::where(is.logical)`. (Note: `where()` only became available in `tidyselect (>= 1.1.0)`, released May 2020.)
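For example, a selection like the following keeps `run` alongside every logical column (this chunk is a sketch that reproduces the output shown below):

sum_df %>%
  select(run, where(is.logical))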
#> # A tibble: 6 × 13
#> run needed_fail_flags any_heuristics covariance_step_aborted
#> <chr> <lgl> <lgl> <lgl>
#> 1 1001 TRUE TRUE FALSE
#> 2 acop-fake-bayes FALSE TRUE FALSE
#> 3 acop-iov FALSE TRUE FALSE
#> 4 acop-onlysim FALSE FALSE FALSE
#> 5 example2_saemimp FALSE FALSE FALSE
#> 6 iovmm FALSE TRUE FALSE
#> # ℹ 9 more variables: large_condition_number <lgl>, eigenvalue_issues <lgl>,
#> # correlations_not_ok <lgl>, parameter_near_boundary <lgl>,
#> # hessian_reset <lgl>, has_final_zero_gradient <lgl>,
#> # minimization_terminated <lgl>, eta_pval_significant <lgl>, prderr <lgl>
Notice that there is also an `any_heuristics` column, which can easily be used to filter to only runs that had at least one heuristic flag triggered.
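For example (a sketch; output not shown):

sum_df %>% filter(any_heuristics)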
Just like `config_log()` has `add_config()`, you can also use `add_summary()` to join all of these columns onto an existing `bbi_run_log_df` (the tibble output from `run_log()`). This can be useful if you have a run log that you have previously filtered on something like the `tags` or `based_on` columns, and you would like to append some simple diagnostics.
# filter to two specific runs
log_df <- run_log(MODEL_DIR, .include = c("acop-iov", "iovmm"))
# add summary columns
log_df <- log_df %>% add_summary()
log_df %>% select(run, ofv, param_count, any_heuristics)
#> # A tibble: 2 × 4
#> run ofv param_count any_heuristics
#> <chr> <dbl> <int> <lgl>
#> 1 acop-iov 44159. 9 TRUE
#> 2 iovmm 14722. 11 TRUE