library(purrr)
library(dplyr)
library(yspec)
This vignette shows you how to add labels to the columns of a data set. A label is a short description of a column; it is attached as an attribute to that column in the data frame.
Labels must be ≤ 40 characters long.
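As a minimal sketch of what "attached as attributes" means, the base R snippet below sets a label attribute on one column of a toy data frame by hand and checks it against the 40-character limit. The data frame toy and its label text are made up for illustration; no yspec function is involved here.

```r
# Toy data frame; names and values are illustrative only
toy <- data.frame(WT = c(55.2, 71.8, 64.0))

# A label is an ordinary attribute stored on the column vector
attr(toy$WT, "label") <- "weight"

# Read the label back from the column
wt_label <- attr(toy$WT, "label")
wt_label

# Check the length limit stated above
nchar(wt_label) <= 40
```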
6.1 Load specification object and data set
We’ll use the example data and spec provided in the package.
data <- ys_help$data()
spec <- ys_help$spec()
The data
as_tibble(data)
# A tibble: 4,360 × 29
C NUM ID SUBJ TIME SEQ CMT EVID AMT DV AGE WT CRCL
<lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 NA 1 1 1 0 0 1 1 5 0 28.0 55.2 114.
2 NA 2 1 1 0.61 1 2 0 NA 61.0 28.0 55.2 114.
3 NA 3 1 1 1.15 1 2 0 NA 91.0 28.0 55.2 114.
4 NA 4 1 1 1.73 1 2 0 NA 122. 28.0 55.2 114.
5 NA 5 1 1 2.15 1 2 0 NA 126. 28.0 55.2 114.
6 NA 6 1 1 3.19 1 2 0 NA 84.7 28.0 55.2 114.
7 NA 7 1 1 4.21 1 2 0 NA 62.1 28.0 55.2 114.
8 NA 8 1 1 5.09 1 2 0 NA 49.1 28.0 55.2 114.
9 NA 9 1 1 6.22 1 2 0 NA 64.2 28.0 55.2 114.
10 NA 10 1 1 8.09 1 2 0 NA 59.6 28.0 55.2 114.
# ℹ 4,350 more rows
# ℹ 16 more variables: ALB <dbl>, BMI <dbl>, AAG <dbl>, SCR <dbl>, AST <dbl>,
# ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>, TAD <dbl>, LDOS <int>,
# MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>, RF <chr>
The spec
spec
name info unit short source
C cd- . comment character ysdb_internal
NUM --- . record number ysdb_internal
ID --- . subject identifier ysdb_internal
SUBJ c-- . subject identifier ysdb_internal
TIME --- hour TIME look
SEQ -d- . SEQ .
CMT --- . compartment number ysdb_internal
EVID -d- . event ID ysdb_internal
AMT --- mg dose amount ysdb_internal
DV --- micrograms/L dependent variable ysdb_internal
AGE --- years age ysdb_internal
WT --- kg weight ysdb_internal
CRCL --- ml/min CRCL .
ALB --- g/dL albumin ysdb_internal
BMI --- m2/kg BMI ysdb_internal
AAG --- mg/dL alpha-1-acid glycoprotein .
SCR --- mg/dL serum creatinine .
AST --- . aspartate aminotransferase .
ALT --- . alanine aminotransferase .
HT --- cm height ysdb_internal
CP -d- . Child-Pugh score look
TAFD --- hours time after first dose .
TAD --- hours time after dose .
LDOS --- mg last dose amount .
MDV -d- . MDV ysdb_internal
BLQ -d- . below limit of quantification .
PHASE --- . study phase indicator .
STUDY -d- . study number .
RF cd- . renal function stage .
6.2 Use ys_add_labels
data <- ys_add_labels(data, spec)
It isn’t obvious that anything was done here
as_tibble(data)
# A tibble: 4,360 × 29
C NUM ID SUBJ TIME SEQ CMT EVID AMT DV AGE WT CRCL
<lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 NA 1 1 1 0 0 1 1 5 0 28.0 55.2 114.
2 NA 2 1 1 0.61 1 2 0 NA 61.0 28.0 55.2 114.
3 NA 3 1 1 1.15 1 2 0 NA 91.0 28.0 55.2 114.
4 NA 4 1 1 1.73 1 2 0 NA 122. 28.0 55.2 114.
5 NA 5 1 1 2.15 1 2 0 NA 126. 28.0 55.2 114.
6 NA 6 1 1 3.19 1 2 0 NA 84.7 28.0 55.2 114.
7 NA 7 1 1 4.21 1 2 0 NA 62.1 28.0 55.2 114.
8 NA 8 1 1 5.09 1 2 0 NA 49.1 28.0 55.2 114.
9 NA 9 1 1 6.22 1 2 0 NA 64.2 28.0 55.2 114.
10 NA 10 1 1 8.09 1 2 0 NA 59.6 28.0 55.2 114.
# ℹ 4,350 more rows
# ℹ 16 more variables: ALB <dbl>, BMI <dbl>, AAG <dbl>, SCR <dbl>, AST <dbl>,
# ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>, TAD <dbl>, LDOS <int>,
# MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>, RF <chr>
How can you tell that the labels were added?
labs <- map(data, attr, "label")
labs[1:5]
$C
[1] "comment character"
$NUM
[1] "record number"
$ID
[1] "subject identifier"
$SUBJ
[1] "subject identifier"
$TIME
[1] "time after first dose"
Or do this
str(data)
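For a more compact check, one option is a small base R helper that reports which columns carry a label attribute. This is only a sketch on a toy data frame; has_label() is a hypothetical helper written for this example, not a yspec function.

```r
# Hypothetical helper: TRUE for each column with a non-NULL "label" attribute
has_label <- function(df) {
  vapply(df, function(col) !is.null(attr(col, "label")), logical(1))
}

# Toy data frame with one labeled and one unlabeled column
toy <- data.frame(ID = 1:3, WT = c(55.2, 71.8, 64.0))
attr(toy$ID, "label") <- "subject identifier"

# Named logical vector, one entry per column
has_label(toy)

# Names of the columns that are still missing a label
names(toy)[!has_label(toy)]
```

Applied to the labeled data from this section, all(has_label(data)) should return TRUE if every column received a label.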
6.3 Where does label come from?
Ideally, we’d write a label entry for every column in the data set. You can set the ys.require.label option to TRUE to require this when loading the spec (an error is generated otherwise). But yspec has a function called ys_get_label() that will form a label for you. Here are the rules:
- If label exists for a column, it will be used
- Otherwise, if long is found and it is <= 40 characters, it will be used
- Otherwise, short will be used; reminder that short defaults to the column name (col) too
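The fallback rules above can be restated as a short base R function. This is a hypothetical sketch for illustration, not yspec's own code: pick_label() and the toy column entries are assumptions, and each entry is taken to be a plain list with optional label, long, short, and col fields.

```r
# Hypothetical restatement of the rules above; not yspec's implementation.
# `x` is a list describing one column.
pick_label <- function(x, max_chars = 40) {
  # Rule 1: an explicit label always wins
  if (!is.null(x[["label"]])) return(x[["label"]])
  # Rule 2: use long when it exists and fits within the limit
  long <- x[["long"]]
  if (!is.null(long) && nchar(long) <= max_chars) return(long)
  # Rule 3: fall back to short, which itself defaults to the column name
  if (!is.null(x[["short"]])) return(x[["short"]])
  x[["col"]]
}

# Illustrative column entries (made up for this sketch)
a <- pick_label(list(col = "TIME", short = "TIME", label = "time after first dose"))
b <- pick_label(list(col = "WT", short = "weight", long = "baseline body weight"))
c <- pick_label(list(col = "SEQ"))
c(a, b, c)
```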
Let’s look at some examples
ys_get_label(spec)[1:3]
$C
[1] "comment character"
$NUM
[1] "record number"
$ID
[1] "subject identifier"
ys_get_label(spec$NUM)
[1] "record number"
spec$NUM$label
NULL
spec$C$label
NULL
6.4 Custom label formation
As an example, we can supply a custom labeling function. Suppose we want the label to be the column name.
Set up a function that takes the column data as the first argument
label_fun <- function(x, ...) x[["col"]]
Now, pass that function into ys_add_labels()
data <- ys_add_labels(data, spec, fun = label_fun)
And check the output
map(data, attr, "label")[1:5]
$C
[1] "C"
$NUM
[1] "NUM"
$ID
[1] "ID"
$SUBJ
[1] "SUBJ"
$TIME
[1] "TIME"
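As a further illustration, a labeling function with the same shape (column data as the first argument) could combine the short name with the unit. The sketch below is hypothetical: label_with_unit() is not part of yspec, it assumes each column entry carries short and, optionally, unit fields, and it is exercised here on toy lists rather than on the real spec.

```r
# Hypothetical labeling function: short name plus unit, when a unit is present.
# Assumes `x` has a `short` field and an optional `unit` field.
label_with_unit <- function(x, ...) {
  unit <- x[["unit"]]
  # No unit recorded: use the short name on its own
  if (is.null(unit) || !nzchar(unit)) return(x[["short"]])
  paste0(x[["short"]], " (", unit, ")")
}

# Toy column entries for illustration
with_unit <- label_with_unit(list(col = "WT", short = "weight", unit = "kg"))
no_unit <- label_with_unit(list(col = "ID", short = "subject identifier"))
c(with_unit, no_unit)
```

A function like this could presumably be passed through the fun argument in the same way as label_fun; whether the resulting labels stay within the 40-character limit depends on the spec.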
6.5 Extract the label field
Recall that the yspec object is just a list. We can always map across that list and grab the label field
map(spec, "label")[1:5]
$C
NULL
$NUM
NULL
$ID
NULL
$SUBJ
NULL
$TIME
[1] "time after first dose"