resampling

resample_df(df, key_cols, strat_cols = NULL, n = NULL,
  key_col_name = "KEY", replace = TRUE)

Arguments

df

data frame

key_cols

key columns to resample on

strat_cols

columns to maintain proportion for stratification

n

number of unique sampled keys; defaults to the number of unique keys in the dataset

key_col_name

name of the output key column. Defaults to "KEY"

replace

whether to sample with replacement

Details

This function is valuable when generating a large simulated population where your goal is to create resampled sub-populations while maintaining certain stratifications, such as covariate distributions.
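To make the mechanics concrete, here is a minimal base R sketch of stratified key resampling. It illustrates the idea under stated assumptions and is not PKPDmisc's actual implementation: `resample_sketch()` and the `toy` data are hypothetical, and the helper handles only one key column and one stratification column.

```r
# Hypothetical helper illustrating stratified key resampling.
# NOT the PKPDmisc implementation; one key column and one stratum column only.
resample_sketch <- function(df, key_col, strat_col, replace = TRUE) {
  # one row per unique key, together with its stratum
  keys <- unique(df[c(key_col, strat_col)])

  # within each stratum, draw as many keys as the stratum originally had
  drawn <- lapply(split(keys, keys[[strat_col]], drop = TRUE), function(stratum) {
    idx <- sample.int(nrow(stratum), size = nrow(stratum), replace = replace)
    stratum[idx, key_col, drop = FALSE]
  })
  drawn <- do.call(rbind, drawn)

  # every draw gets its own new key, even when an original key repeats
  drawn$KEY <- seq_len(nrow(drawn))

  # attach all original rows to each drawn key, keeping the original row order
  df$.row <- seq_len(nrow(df))
  out <- merge(drawn, df, by = key_col)
  out <- out[order(out$KEY, out$.row), c("KEY", setdiff(names(df), ".row"))]
  rownames(out) <- NULL
  out
}

# toy data: 6 individuals with 2 rows each; IDs 1-2 are Female, IDs 3-6 are Male
set.seed(1)
toy <- data.frame(
  ID     = rep(1:6, each = 2),
  Time   = rep(c(0, 1), times = 6),
  Gender = rep(c("Female", "Male"), times = c(4, 8))
)

res <- resample_sketch(toy, key_col = "ID", strat_col = "Gender")

# row counts per stratum match because every individual has the same number of rows
table(toy$Gender)
table(res$Gender)
```

Sampling keys rather than rows keeps all observations for an individual together, and assigning a fresh `KEY` to each draw keeps repeated draws of the same individual distinguishable.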

A new key column is created (named 'KEY' by default) that uniquely identifies each newly drawn sample. This makes it easy to compare against the original key columns. For example, to see how many times a particular individual was resampled, count the number of keys associated with that individual's original ID.
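The check described above might look like the following sketch. The `resampled` data frame is a hand-built, hypothetical stand-in for resampled output, with a `KEY` column alongside the original `ID`; the column name `times_sampled` is likewise an illustrative choice.

```r
library(dplyr)

# Hypothetical resampled output: KEY is the new sampling unit,
# ID is the original individual it was drawn from.
resampled <- data.frame(
  KEY  = c(1, 1, 2, 2, 3, 3, 4, 4),
  ID   = c(5, 5, 8, 8, 5, 5, 2, 2),
  Time = rep(c(0, 1), times = 4)
)

# reduce to one row per KEY, then count how many keys map to each original ID
id_counts <- resampled %>%
  distinct(KEY, ID) %>%
  count(ID, name = "times_sampled")

id_counts
```

Collapsing to distinct `KEY`/`ID` pairs first matters: counting rows directly would mix the number of draws with the number of observations per individual.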

Examples

library(PKPDmisc)
library(dplyr, quietly = TRUE)

# simple example resampling by ID maintaining Gender distribution, with 10 individuals
resample_df(sd_oral_richpk, key_cols = "ID", strat_cols = "Gender", n = 10)
#> # A tibble: 120 x 10
#>      KEY    ID  Time   Amt  Conc   Age Weight Gender Race       Dose
#>    <int> <int> <dbl> <int> <dbl> <dbl>  <dbl> <fct>  <fct>     <int>
#>  1     1     5     0  5000     0  50.9   73.8 Female Caucasian  5000
#>  2     1     5  0.25     0  17.6  50.9   73.8 Female Caucasian  5000
#>  3     1     5   0.5     0  28.7  50.9   73.8 Female Caucasian  5000
#>  4     1     5     1     0  46.8  50.9   73.8 Female Caucasian  5000
#>  5     1     5     2     0  51.6  50.9   73.8 Female Caucasian  5000
#>  6     1     5     3     0  50.8  50.9   73.8 Female Caucasian  5000
#>  7     1     5     4     0  49.1  50.9   73.8 Female Caucasian  5000
#>  8     1     5     6     0  25.6  50.9   73.8 Female Caucasian  5000
#>  9     1     5     8     0  24.4  50.9   73.8 Female Caucasian  5000
#> 10     1     5    12     0  17.3  50.9   73.8 Female Caucasian  5000
#> # ... with 110 more rows

# for a more complex example, let's resample "simulated" data with multiple replicates
subset_data <- sd_oral_richpk %>%
  filter(ID < 20)

# make 'simulated' data with 5 replicates and combine into a single data frame
rep_dat <- lapply(1:5, function(x) {
  subset_data %>% mutate(REP = x)
}) %>% bind_rows()

# now when we resample we also want to maintain the ID+REP relationship, as resampling
# just the ID would give all rows associated with an ID across all reps, rather than
# a single "unit" of ID/REP
resample_df(rep_dat, key_cols = c("ID", "REP"))
#> # A tibble: 1,140 x 11
#>      KEY    ID  Time   Amt  Conc   Age Weight Gender Race   Dose   REP
#>    <int> <int> <dbl> <int> <dbl> <dbl>  <dbl> <fct>  <fct> <int> <int>
#>  1     1     8     0  5000     0  56.7   85.0 Male   Asian  5000     5
#>  2     1     8  0.25     0  16.3  56.7   85.0 Male   Asian  5000     5
#>  3     1     8   0.5     0  23.0  56.7   85.0 Male   Asian  5000     5
#>  4     1     8     1     0  32.4  56.7   85.0 Male   Asian  5000     5
#>  5     1     8     2     0  36.4  56.7   85.0 Male   Asian  5000     5
#>  6     1     8     3     0  44.4  56.7   85.0 Male   Asian  5000     5
#>  7     1     8     4     0  37.0  56.7   85.0 Male   Asian  5000     5
#>  8     1     8     6     0  27.6  56.7   85.0 Male   Asian  5000     5
#>  9     1     8     8     0  26.4  56.7   85.0 Male   Asian  5000     5
#> 10     1     8    12     0  18.8  56.7   85.0 Male   Asian  5000     5
#> # ... with 1,130 more rows

# check to see that stratification is maintained
rep_dat %>% group_by(Gender) %>% tally()
#> # A tibble: 2 x 2
#>   Gender     n
#>   <fct>  <int>
#> 1 Female   300
#> 2 Male     840

resample_df(rep_dat, key_cols = c("ID", "REP"), strat_cols = "Gender") %>%
  group_by(Gender) %>%
  tally()
#> # A tibble: 2 x 2
#>   Gender     n
#>   <fct>  <int>
#> 1 Female   300
#> 2 Male     840

rep_dat %>% group_by(Gender, Race) %>% tally()
#> # A tibble: 8 x 3
#> # Groups:   Gender [?]
#>   Gender Race          n
#>   <fct>  <fct>     <int>
#> 1 Female Caucasian   180
#> 2 Female Hispanic     60
#> 3 Female Other        60
#> 4 Male   Asian       120
#> 5 Male   Black       180
#> 6 Male   Caucasian    60
#> 7 Male   Hispanic    180
#> 8 Male   Other       300

resample_df(rep_dat, key_cols = c("ID", "REP"), strat_cols = c("Gender", "Race")) %>%
  group_by(Gender, Race) %>%
  tally()
#> # A tibble: 8 x 3
#> # Groups:   Gender [?]
#>   Gender Race          n
#>   <fct>  <fct>     <int>
#> 1 Female Caucasian   180
#> 2 Female Hispanic     60
#> 3 Female Other        60
#> 4 Male   Asian       120
#> 5 Male   Black       180
#> 6 Male   Caucasian    60
#> 7 Male   Hispanic    180
#> 8 Male   Other       300