Summarize Data — summarize

Summarize input data to prepare for passing to plot_forest(). Takes a data.frame or tibble, calculates the relevant confidence intervals, and returns a tibble that can be passed directly to plot_forest(). See Details section for data specification and format.

summarize_data(
  data,
  value,
  group,
  group_level = NULL,
  metagroup = NULL,
  replicate = NULL,
  probs = c(0.05, 0.95),
  statistic = c("median", "mean", "geo_mean"),
  rep_probs = c(0.025, 0.975),
  rep_statistic = c("median", "mean", "geo_mean")
)

Arguments

data: A dataframe or tibble to summarize. See Details section for required format.
value: name of the column in data to perform calculations on (i.e. median/mean, lower, and upper CI)
group: name of the column in data that defines groups within the data. Often, this will contain the names of the covariates you are grouping by.
group_level: (optional) name of the column in data that contains subgroups to group by. For example, if your group column contains covariates like WEIGHT and AGE, this column could contain categories like underweight, average, overweight, young, mid, elderly, etc.
metagroup: (optional) name of the column in data that contains metagroups. Similar to facet wrap, if passed, this will cause plot_forest() to produce independent plots per metagroup.
replicate: (optional) name of the column in data that contains an index of replicates, for example from multiple simulations or bootstrapping. If specified, plot_forest() will draw additional CIs of the individual statistics, as small lines above each primary line.
probs: numeric vector of length two, both between 0 and 1, corresponding to your lower and upper tail probabilities. Defaults to c(0.05, 0.95)
statistic: the summary statistic to compute as the central value; one of "median", "mean", or "geo_mean".
rep_probs: same as probs, but used only when replicate is passed. Applies to the minor intervals (i.e. the small lines) above the major interval (i.e. the big lines).
rep_statistic: same as statistic, but used only when replicate is passed. Applies to the minor intervals (i.e. the small lines) above the major interval (i.e. the big lines).
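
To make the statistic and probs arguments concrete, the sketch below computes the three statistic options and a pair of tail quantiles for a single numeric vector in base R. The values in x are made up, and geo_mean() here is a hypothetical helper using the usual definition exp(mean(log(x))); neither it nor the use of quantile() is taken from the package source, so treat this as an illustration of the concepts rather than the package's implementation.

```r
# Hypothetical values for a single group (not from the package).
x <- c(1, 2, 4, 8, 16)

# Assumed definition of the geometric mean; requires strictly positive x.
geo_mean <- function(x) exp(mean(log(x)))

# The three candidate central statistics.
mids <- c(median = median(x), mean = mean(x), geo_mean = geo_mean(x))

# Lower and upper bounds at the tail probabilities, as in probs = c(0.05, 0.95).
# quantile() uses its default interpolation (type 7) here.
bounds <- quantile(x, probs = c(0.05, 0.95), names = FALSE)

print(mids)
print(bounds)
```

For skewed data such as these, the mean sits well above the median and geometric mean, which is one reason the choice of statistic matters for the plotted point.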

Details

Input Data

The tibble passed to data must be in a "long" format and have two to five columns: value, group, and optionally any of group_level, metagroup, and/or replicate. Each is described in detail in the Arguments section.
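
As an illustration of the "long" layout, the sketch below builds a small data.frame with one row per observation. The column names (stat, covariate, category, sim) and the simulated values are hypothetical, chosen only to show which column would play which role.

```r
# Hypothetical long-format input: one row per observation.
set.seed(1)
long_data <- data.frame(
  # would be passed as `value`
  stat = rlnorm(24),
  # would be passed as `group`
  covariate = rep(c("WEIGHT", "AGE"), each = 12),
  # would be passed as `group_level`
  category = rep(
    c("underweight", "average", "overweight", "young", "mid", "elderly"),
    each = 4
  ),
  # would be passed as `replicate`
  sim = rep(1:4, times = 6)
)

str(long_data)
```

This example only shows the shape of the data. In practice each replicate would usually contain many observations per group.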

Output Data

The tibble output from this function has one of two formats, depending on whether replicate was passed (details below).

Either way, the output tibble has a column named group, containing the values in the column you passed to the group argument, and optionally analogous columns for group_level and metagroup if those were passed.

Without replicate: If replicate is not passed, the output data has three additional columns, mid, lo, and hi, containing the summarized values corresponding to what was passed to statistic (mid) and probs (lo/hi).
With replicate: If replicate is passed, the output data has nine additional columns: mid_mid, mid_lo, and mid_hi, plus three more each for lo_* and hi_*, containing the summarized values. In this case, mid_mid, lo_mid, and hi_mid correspond to the values of the major interval (i.e. the big lines and data point), and, for each prefix, the *_mid, *_lo, and *_hi columns correspond to the values for that minor interval (i.e. the small lines).
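
The two output shapes can be sketched in base R as follows. This is not the package's code. It assumes the median for both statistic and rep_statistic, uses quantile() for the bounds, and reads the column names as "first-stage statistic, then across-replicate summary" (so lo_mid is the central value of the per-replicate lower bounds). The data and the helper summ() are hypothetical.

```r
# Hypothetical data: 2 groups x 50 replicates x 20 observations each.
set.seed(42)
d <- expand.grid(obs = 1:20, sim = 1:50, covariate = c("WEIGHT", "AGE"))
d$stat <- rlnorm(nrow(d))

# Hypothetical helper: mid/lo/hi for one numeric vector.
summ <- function(x, probs) {
  c(mid = median(x),
    lo  = quantile(x, probs[1], names = FALSE),
    hi  = quantile(x, probs[2], names = FALSE))
}

# Without replicate: one mid/lo/hi triple per group.
no_rep <- do.call(
  rbind,
  lapply(split(d$stat, d$covariate), summ, probs = c(0.05, 0.95))
)

# With replicate: summarize within each replicate, then across replicates.
stats <- c("mid", "lo", "hi")
with_rep <- do.call(rbind, lapply(split(d, d$covariate), function(g) {
  # Stage 1: one mid/lo/hi row per replicate (50 x 3 matrix).
  per_rep <- t(sapply(split(g$stat, g$sim), summ, probs = c(0.05, 0.95)))
  # Stage 2: mid/lo/hi of each stage-1 column across replicates (3 x 3).
  stage2 <- apply(per_rep, 2, summ, probs = c(0.025, 0.975))
  # Flatten column by column: mid_mid, mid_lo, mid_hi, lo_mid, ...
  setNames(as.vector(stage2), paste(rep(stats, each = 3), stats, sep = "_"))
}))

print(round(no_rep, 3))
print(round(with_rep, 3))
```

Under this reading, the major interval for a group runs from lo_mid to hi_mid with the point at mid_mid, and each of those three values carries its own minor interval from *_lo to *_hi.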