Given a set of bin ranges, assign each value to a bin

set_bins(
  x,
  breaks = stats::quantile(x, na.rm = TRUE),
  lower_bound = -Inf,
  upper_bound = Inf,
  quiet = TRUE,
  between = NULL,
  inclusive = TRUE
)

Arguments

x

numeric vector of values to assign to bins

breaks

breaks for each bin, defaults to quantiles

lower_bound

set a lower bound for the first bin, defaults to -Inf

upper_bound

set an upper bound for the last bin, defaults to Inf

quiet

if FALSE, give additional information about the bins and the range assigned to each; defaults to TRUE

between

defaults to NULL; a special case in which all values inside the specified range are assigned to a single bin (see Details)

inclusive

whether to include the max value of the largest user-defined bin in that bin, even though lower bins are non-inclusive; if FALSE, the max value is assigned to a new bin

Details

Given a set of quantiles, bins, or other breakpoints established from a separate dataset, it can be useful to assign the same bins to new or simulated data for comparison, or for additional analysis such as assigning dropouts. This function takes those breakpoints and establishes the bins quickly and easily.
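The idea of reusing breakpoints from one dataset on another can be sketched in base R. This is only an illustration of the concept using findInterval, not the implementation of set_bins; the vectors reference and new_data are made-up values, and the bin numbering and edge handling shown are those of findInterval.

```r
# Hypothetical reference data used to establish the breakpoints
reference <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
breaks <- stats::quantile(reference, probs = c(0, 0.5, 1), names = FALSE)
# breaks are 1, 5.5, 10

# Hypothetical new data to be placed into the same bins
new_data <- c(1.2, 5.4, 5.5, 9.9, 10)

# rightmost.closed = TRUE keeps the maximum break inside the last bin
bins <- findInterval(new_data, breaks, rightmost.closed = TRUE)
bins
#> [1] 1 1 2 2 2
```

Because the breakpoints are computed once and passed in explicitly, the new data are binned on the reference scale rather than on their own quantiles.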

If there is concern that data may fall outside the range of the assigned breaks, one can assign -Inf to lower_bound and/or Inf to upper_bound to make sure all values will be assigned to a bin.
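The effect of infinite bounds can be illustrated with base R's cut. This is a sketch under the assumption that values outside finite bounds end up unassigned (NA), as in the lower_bound example further down; the breaks and x values here are invented, and the bin numbers are those produced by cut, not necessarily those of set_bins.

```r
# Hypothetical breakpoints and data; 1 and 7 fall outside the breaks
breaks <- c(2, 4, 6)
x <- c(1, 3, 5, 7)

# Finite bounds only: values outside [2, 6) are not assigned to any bin
bounded <- cut(x, breaks = breaks, labels = FALSE, right = FALSE)
bounded
#> [1] NA  1  2 NA

# Padding the breaks with -Inf and Inf guarantees every value gets a bin
unbounded <- cut(x, breaks = c(-Inf, breaks, Inf), labels = FALSE, right = FALSE)
unbounded
#> [1] 1 2 3 4
```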

To use the between functionality, specify the range you wish to bin between as a vector of two values. Values inside that range are assigned to bin 1, all values below it to bin 0, and all values above it to bin 2. See the examples for more details.
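The 0/1/2 coding described above can be sketched with a small helper. The function between_bins is hypothetical and is not part of the package; in particular, treating both endpoints of the range as inside bin 1 is an assumption, since the boundary handling of set_bins is not stated here. The input values are the first six concentrations shown in the examples.

```r
# Hypothetical helper mimicking the between coding:
# 0 = below the range, 1 = inside the range, 2 = above the range.
# Assumption: both endpoints of the range count as inside.
between_bins <- function(x, range) {
  stopifnot(is.numeric(x), length(range) == 2, range[1] <= range[2])
  ifelse(x < range[1], 0L, ifelse(x > range[2], 2L, 1L))
}

x <- c(0.74, 2.84, 6.57, 10.50, 9.66, 8.58)
res <- between_bins(x, c(3, 7))
res
#> [1] 0 0 1 2 2 2
```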

See also

set_bins_df: This function creates bins from a data frame and outputs both the binning column and a label column with the range of values associated with each bin

Examples

x <- Theoph$conc

head(x)
#> [1]  0.74  2.84  6.57 10.50  9.66  8.58

#basic example
res <- set_bins(x)

head(res)
#> [1] 1 1 3 4 4 4

table(res)
#> res
#>  1  2  3  4 
#> 33 33 32 34 
res
#>   [1] 1 1 3 4 4 4 4 4 3 3 2 1 1 4 4 4 3 3 3 2 2 1 1 2 3 4 4 4 3 3 2 2 1 1 1 2 4
#>  [38] 4 4 3 3 3 2 1 1 1 3 4 4 4 4 3 3 2 1 1 1 2 3 3 3 2 2 2 1 1 1 1 1 2 3 3 3 2
#>  [75] 2 2 1 1 2 2 4 4 3 3 2 2 2 1 1 4 4 4 3 3 3 2 2 2 1 1 2 2 3 4 4 4 4 4 3 1 1
#> [112] 2 4 4 3 3 2 2 2 1 1 1 1 2 4 4 4 4 3 3 2 1

#assign all obs < lower bound to NA
res <- set_bins(x,
    breaks = stats::quantile(x, na.rm = TRUE, probs = c(0.1, 0.5, 1)),
    lower_bound = 1)

head(res)
#> [1] NA  0  1  1  1  1

table(res)
#> res
#>  0  1 
#> 52 66 
res
#>   [1] NA  0  1  1  1  1  1  1  1  1  0 NA  0  1  1  1  1  1  1  0  0  0 NA  0  1
#>  [26]  1  1  1  1  1  0  0  0 NA  0  0  1  1  1  1  1  1  0  0 NA  0  1  1  1  1
#>  [51]  1  1  1  0  0 NA  0  0  1  1  1  0  0  0  0  0 NA NA  0  0  1  1  1  0  0
#>  [76]  0  0 NA  0  0  1  1  1  1  0  0  0  0 NA  1  1  1  1  1  1  0  0  0  0 NA
#> [101]  0  0  1  1  1  1  1  1  1  0 NA  0  1  1  1  1  0  0  0  0 NA NA  0  0  1
#> [126]  1  1  1  1  1  0  0

#use inclusive argument to get desired bins
## include max value of largest user defined bin
xbreak <- stats::quantile(x, na.rm = TRUE, probs = c(0, 0.5, 1))
xupper <- Inf

res1 <- set_bins(x, breaks = xbreak, upper_bound = xupper, inclusive = TRUE)

table(res1)
#> res1
#>  1  2 
#> 66 66 

## do not include max value of largest user-defined bin; create a new bin for it
res2 <- set_bins(x, breaks = xbreak, upper_bound = xupper, inclusive = FALSE)

table(res2)
#> res2
#>  1  2  3 
#> 66 65  1 
res2
#>   [1] 1 1 2 2 2 2 2 2 2 2 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 2
#>  [38] 2 2 2 2 2 1 1 1 1 2 3 2 2 2 2 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 1
#>  [75] 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1
#> [112] 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1

# use the between argument to cut obs at certain values, e.g. to get a bin for conc between 3 and 7
res <- set_bins(x, between = c(3, 7))

head(res)
#> [1] 0 0 1 2 2 2

table(res)
#> res
#>  0  1  2 
#> 34 62 36