resampling

resample_df(df, key_cols, strat_cols = NULL, n = NULL,
  key_col_name = "KEY", replace = TRUE)

Arguments

df	data frame to resample
key_cols	key columns to resample on
strat_cols	columns whose proportions are maintained (stratification columns)
n	number of unique sampled keys; defaults to the number of unique keys in the dataset
key_col_name	name of the output key column. Defaults to "KEY"
replace	whether to sample with replacement

Details

This function is useful when generating a large simulated population, where your goal is to create resampled sub-populations while maintaining the stratification of certain factors, such as covariate distributions.
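
The idea of resampling whole keys within strata can be sketched in base R. This is an illustration under stated assumptions, not the PKPDmisc implementation: `sketch_resample()` and the `toy` data frame are hypothetical names, and the sketch handles only a single key column and a single stratification column.

```r
# Toy data: 4 individuals with 2 rows each (hypothetical, not from PKPDmisc)
toy <- data.frame(
  ID     = rep(1:4, each = 2),
  Time   = rep(c(0, 1), times = 4),
  Gender = rep(c("Female", "Female", "Male", "Male"), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper that illustrates the idea only
sketch_resample <- function(df, key_col, strat_col, key_col_name = "KEY") {
  # One row per unique key, carrying its stratum
  keys <- unique(df[, c(key_col, strat_col)])
  # Draw keys with replacement within each stratum, keeping stratum sizes
  sampled <- unlist(
    lapply(split(keys[[key_col]], keys[[strat_col]]), function(ids) {
      ids[sample.int(length(ids), size = length(ids), replace = TRUE)]
    }),
    use.names = FALSE
  )
  # Give every draw its own new key and pull all rows of the drawn unit
  pieces <- lapply(seq_along(sampled), function(i) {
    rows <- df[df[[key_col]] == sampled[i], , drop = FALSE]
    cbind(setNames(data.frame(rep(i, nrow(rows))), key_col_name), rows)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)
res <- sketch_resample(toy, "ID", "Gender")
table(res$Gender)
```

Because keys are drawn within each stratum and every toy individual has the same number of rows, the row counts per `Gender` in `res` match those in `toy`, even though some individuals may appear more than once and others not at all.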

A new key column is created (named 'KEY' by default) that uniquely identifies each newly sampled unit. This makes it easy to compare against the original key columns. For example, to see how many times a particular individual was resampled, count the number of distinct keys associated with that individual's original ID.
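
The check described above can be sketched in base R. The `resampled` data frame here is a hand-built, hypothetical stand-in for the output of `resample_df()` (same `KEY` and `ID` column layout, made-up values), so the snippet runs without the package.

```r
# Hypothetical stand-in for resample_df() output: 4 sampled units, 2 rows each
resampled <- data.frame(
  KEY  = rep(1:4, each = 2),
  ID   = rep(c(5, 2, 5, 7), each = 2),
  Time = rep(c(0, 1), times = 4)
)

# Reduce to one row per KEY so that rows within a unit are not double counted
key_id <- unique(resampled[, c("KEY", "ID")])

# Number of distinct keys per original ID, i.e. how often each ID was drawn
draws_per_id <- table(key_id$ID)
draws_per_id
# ID 5 appears under two keys (1 and 3), so its count is 2
```

Reducing to unique `KEY`/`ID` pairs first matters: counting rows directly would scale each individual's count by the number of observations they have.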

Examples

library(PKPDmisc)
library(dplyr, quietly = TRUE)

# simple example resampling by ID maintaining Gender distribution, with 10 individuals
resample_df(sd_oral_richpk, key_cols = "ID", strat_cols = "Gender", n = 10)
#> # A tibble: 120 x 10
#>      KEY    ID  Time   Amt  Conc   Age Weight Gender Race       Dose
#>    <int> <int> <dbl> <int> <dbl> <dbl>  <dbl> <fct>  <fct>     <int>
#>  1     1     5  0     5000   0    50.9   73.8 Female Caucasian  5000
#>  2     1     5  0.25     0  17.6  50.9   73.8 Female Caucasian  5000
#>  3     1     5  0.5      0  28.7  50.9   73.8 Female Caucasian  5000
#>  4     1     5  1        0  46.8  50.9   73.8 Female Caucasian  5000
#>  5     1     5  2        0  51.6  50.9   73.8 Female Caucasian  5000
#>  6     1     5  3        0  50.8  50.9   73.8 Female Caucasian  5000
#>  7     1     5  4        0  49.1  50.9   73.8 Female Caucasian  5000
#>  8     1     5  6        0  25.6  50.9   73.8 Female Caucasian  5000
#>  9     1     5  8        0  24.4  50.9   73.8 Female Caucasian  5000
#> 10     1     5 12        0  17.3  50.9   73.8 Female Caucasian  5000
#> # ... with 110 more rows

# for a more complex example, let's resample "simulated" data with multiple replicates
subset_data <- sd_oral_richpk %>%
   filter(ID < 20)

# make 'simulated' data with 5 replicates and combine into a single data frame
rep_dat <- lapply(1:5, function(x) {
  subset_data %>%
    mutate(REP = x)
}) %>% bind_rows()

# when we resample, we also want to maintain the ID+REP relationship: resampling
# on ID alone would return all rows for that ID across every replicate, rather
# than a single "unit" of ID/REP
resample_df(rep_dat, key_cols = c("ID", "REP"))
#> # A tibble: 1,140 x 11
#>      KEY    ID  Time   Amt  Conc   Age Weight Gender Race   Dose   REP
#>    <int> <int> <dbl> <int> <dbl> <dbl>  <dbl> <fct>  <fct> <int> <int>
#>  1     1     8  0     5000   0    56.7   85.0 Male   Asian  5000     5
#>  2     1     8  0.25     0  16.3  56.7   85.0 Male   Asian  5000     5
#>  3     1     8  0.5      0  23.0  56.7   85.0 Male   Asian  5000     5
#>  4     1     8  1        0  32.4  56.7   85.0 Male   Asian  5000     5
#>  5     1     8  2        0  36.4  56.7   85.0 Male   Asian  5000     5
#>  6     1     8  3        0  44.4  56.7   85.0 Male   Asian  5000     5
#>  7     1     8  4        0  37.0  56.7   85.0 Male   Asian  5000     5
#>  8     1     8  6        0  27.6  56.7   85.0 Male   Asian  5000     5
#>  9     1     8  8        0  26.4  56.7   85.0 Male   Asian  5000     5
#> 10     1     8 12        0  18.8  56.7   85.0 Male   Asian  5000     5
#> # ... with 1,130 more rows

# check to see that stratification is maintained
rep_dat %>% group_by(Gender) %>% tally()
#> # A tibble: 2 x 2
#>   Gender     n
#>   <fct>  <int>
#> 1 Female   300
#> 2 Male     840
resample_df(rep_dat, key_cols = c("ID", "REP"), strat_cols = "Gender") %>%
  group_by(Gender) %>% tally()
#> # A tibble: 2 x 2
#>   Gender     n
#>   <fct>  <int>
#> 1 Female   300
#> 2 Male     840

rep_dat %>% group_by(Gender, Race) %>% tally()
#> # A tibble: 8 x 3
#> # Groups:   Gender [?]
#>   Gender Race          n
#>   <fct>  <fct>     <int>
#> 1 Female Caucasian   180
#> 2 Female Hispanic     60
#> 3 Female Other        60
#> 4 Male   Asian       120
#> 5 Male   Black       180
#> 6 Male   Caucasian    60
#> 7 Male   Hispanic    180
#> 8 Male   Other       300

resample_df(rep_dat, key_cols = c("ID", "REP"), strat_cols = c("Gender", "Race")) %>%
  group_by(Gender, Race) %>% tally()
#> # A tibble: 8 x 3
#> # Groups:   Gender [?]
#>   Gender Race          n
#>   <fct>  <fct>     <int>
#> 1 Female Caucasian   180
#> 2 Female Hispanic     60
#> 3 Female Other        60
#> 4 Male   Asian       120
#> 5 Male   Black       180
#> 6 Male   Caucasian    60
#> 7 Male   Hispanic    180
#> 8 Male   Other       300
