Given a set of bin ranges, assign each value to a bin
set_bins(
  x,
  breaks = stats::quantile(x, na.rm = TRUE),
  lower_bound = -Inf,
  upper_bound = Inf,
  quiet = TRUE,
  between = NULL,
  inclusive = TRUE
)
x: numeric vector of values to assign to bins
breaks: breaks for each bin; defaults to the quantiles of x
lower_bound: lower bound for the first bin; defaults to -Inf
upper_bound: upper bound for the last bin; defaults to Inf
quiet: controls whether additional information about the bins and the range assigned to each is given; defaults to TRUE
between: defaults to NULL; a special case in which all values inside the specified range are assigned to a single bin
inclusive: whether to include the maximum value of the largest user-defined bin in that bin, even though lower bins are non-inclusive; defaults to TRUE
Given a set of quantiles, bins, or other breakpoints established from a separate dataset, it can be useful to assign the same bins to new or simulated data, either for comparison or for additional analysis such as assigning dropouts. This function takes those breakpoints and assigns the bins quickly and easily.
If there is concern that data may fall outside the range of the assigned breaks, set lower_bound to -Inf and/or upper_bound to Inf to make sure every value is assigned to a bin.
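The effect of open-ended bounds can be sketched with base R's cut(). This is only an analogy and is not necessarily how set_bins() works internally; in particular, cut() closes intervals on the right by default, which may differ from set_bins(). The values and breaks below are made up for illustration.

```r
# Illustrative values and breaks (hypothetical, not taken from Theoph)
new_x  <- c(-2, 0.5, 3, 12)
breaks <- c(0, 1, 5, 10)

# Finite outer breaks: values outside [0, 10] fall in no bin and become NA
finite_bins <- as.integer(cut(new_x, breaks = breaks, include.lowest = TRUE))

# Replace the outer breaks with -Inf and Inf so that every value gets a bin
open_breaks <- c(-Inf, breaks[-c(1, length(breaks))], Inf)
open_bins   <- as.integer(cut(new_x, breaks = open_breaks))

finite_bins
open_bins
```

In this sketch finite_bins is NA for -2 and 12, while open_bins places every value in a bin.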
To use the between functionality, specify the range you wish to bin between. Values in that range are assigned to bin 1, all values below it to 0, and all values above it to 2. See the examples for more details.
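The 0/1/2 coding described above can be reproduced for a handful of values with base R's findInterval(). This is a minimal sketch rather than the function's own implementation, and the treatment of values that fall exactly on 3 or 7 is findInterval()'s, which may not match set_bins().

```r
# First six Theoph concentrations, as printed by head(x) in the examples
x <- c(0.74, 2.84, 6.57, 10.50, 9.66, 8.58)
between <- c(3, 7)

# findInterval() counts how many cut points each value has reached:
# 0 = below 3, 1 = from 3 up to (not including) 7, 2 = 7 or above
between_bins <- findInterval(x, between)
between_bins
#> [1] 0 0 1 2 2 2
```

For these six values the result agrees with head(res) in the between example.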
set_bins_df: This function creates bins from a data frame and outputs both the binning column and a label column with the range of values associated with each bin
x <- Theoph$conc
head(x)
#> [1] 0.74 2.84 6.57 10.50 9.66 8.58
# basic example
res <- set_bins(x)
head(res)
#> [1] 1 1 3 4 4 4
table(res)
#> res
#> 1 2 3 4
#> 33 33 32 34
res
#> [1] 1 1 3 4 4 4 4 4 3 3 2 1 1 4 4 4 3 3 3 2 2 1 1 2 3 4 4 4 3 3 2 2 1 1 1 2 4
#> [38] 4 4 3 3 3 2 1 1 1 3 4 4 4 4 3 3 2 1 1 1 2 3 3 3 2 2 2 1 1 1 1 1 2 3 3 3 2
#> [75] 2 2 1 1 2 2 4 4 3 3 2 2 2 1 1 4 4 4 3 3 3 2 2 2 1 1 2 2 3 4 4 4 4 4 3 1 1
#> [112] 2 4 4 3 3 2 2 2 1 1 1 1 2 4 4 4 4 3 3 2 1
# assign all obs < lower bound to NA
res <- set_bins(x,
                breaks = stats::quantile(x, na.rm = TRUE, probs = c(0.1, 0.5, 1)),
                lower_bound = 1)
head(res)
#> [1] NA 0 1 1 1 1
table(res)
#> res
#> 0 1
#> 52 66
res
#> [1] NA 0 1 1 1 1 1 1 1 1 0 NA 0 1 1 1 1 1 1 0 0 0 NA 0 1
#> [26] 1 1 1 1 1 0 0 0 NA 0 0 1 1 1 1 1 1 0 0 NA 0 1 1 1 1
#> [51] 1 1 1 0 0 NA 0 0 1 1 1 0 0 0 0 0 NA NA 0 0 1 1 1 0 0
#> [76] 0 0 NA 0 0 1 1 1 1 0 0 0 0 NA 1 1 1 1 1 1 0 0 0 0 NA
#> [101] 0 0 1 1 1 1 1 1 1 0 NA 0 1 1 1 1 0 0 0 0 NA NA 0 0 1
#> [126] 1 1 1 1 1 0 0
# use inclusive argument to get desired bins
## include max value of largest user-defined bin
xbreak <- stats::quantile(x, na.rm = TRUE, probs = c(0, 0.5, 1))
xupper <- Inf
res1 <- set_bins(x, breaks = xbreak, upper_bound = xupper, inclusive = TRUE)
table(res1)
#> res1
#> 1 2
#> 66 66
## do not include max value of largest user-defined bin; create a new bin for it
res2 <- set_bins(x, breaks = xbreak, upper_bound = xupper, inclusive = FALSE)
table(res2)
#> res2
#> 1 2 3
#> 66 65 1
res2
#> [1] 1 1 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 2
#> [38] 2 2 2 2 2 1 1 1 1 2 3 2 2 2 2 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 1
#> [75] 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1
#> [112] 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1
# use the between argument to cut obs at certain values, e.g. to get a bin of conc between 3 and 7
res <- set_bins(x, between = c(3, 7))
head(res)
#> [1] 0 0 1 2 2 2
table(res)
#> res
#> 0 1 2
#> 34 62 36