6 Label dataset columns

This vignette shows you how to add labels to the columns of a data set. A label is a short description of a column; it is attached as an attribute to the corresponding column (list element) of the data frame.

Labels must be \(\le\) 40 characters long.
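To make the idea concrete, here is a minimal base R sketch of a label attribute on a data frame column, together with a length check. The toy data frame `df` and the helper `label_ok()` are hypothetical names used only for illustration; yspec applies its own checks.

```r
# Toy data frame; the columns are illustrative only
df <- data.frame(WT = c(55.2, 70.1), AGE = c(28, 41))

# A label is simply an attribute stored on the column vector
attr(df$WT, "label") <- "weight"
attr(df$AGE, "label") <- "age"

# Hypothetical helper: TRUE when a label fits the 40-character limit
label_ok <- function(x, max_chars = 40) nchar(x) <= max_chars

attr(df$WT, "label")            # read the label back
label_ok(attr(df$WT, "label"))  # a short label passes the check
label_ok(strrep("x", 41))       # a 41-character label does not
```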

library(purrr)
library(dplyr)
library(yspec)

6.1 Load specification object and data set

We’ll use the example data and specification provided in the package.

data <- ys_help$data()
spec <- ys_help$spec()

The data

as_tibble(data)
# A tibble: 4,360 × 29
   C       NUM    ID  SUBJ  TIME   SEQ   CMT  EVID   AMT    DV   AGE    WT  CRCL
   <lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
 1 NA        1     1     1  0        0     1     1     5   0    28.0  55.2  114.
 2 NA        2     1     1  0.61     1     2     0    NA  61.0  28.0  55.2  114.
 3 NA        3     1     1  1.15     1     2     0    NA  91.0  28.0  55.2  114.
 4 NA        4     1     1  1.73     1     2     0    NA 122.   28.0  55.2  114.
 5 NA        5     1     1  2.15     1     2     0    NA 126.   28.0  55.2  114.
 6 NA        6     1     1  3.19     1     2     0    NA  84.7  28.0  55.2  114.
 7 NA        7     1     1  4.21     1     2     0    NA  62.1  28.0  55.2  114.
 8 NA        8     1     1  5.09     1     2     0    NA  49.1  28.0  55.2  114.
 9 NA        9     1     1  6.22     1     2     0    NA  64.2  28.0  55.2  114.
10 NA       10     1     1  8.09     1     2     0    NA  59.6  28.0  55.2  114.
# … with 4,350 more rows, and 16 more variables: ALB <dbl>, BMI <dbl>,
#   AAG <dbl>, SCR <dbl>, AST <dbl>, ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>,
#   TAD <dbl>, LDOS <int>, MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>,
#   RF <chr>

The spec

spec
 name  info unit         short                         source       
 C     cd-  .            comment character             ysdb_internal
 NUM   ---  .            record number                 ysdb_internal
 ID    ---  .            subject identifier            ysdb_internal
 SUBJ  c--  .            subject identifier            ysdb_internal
 TIME  ---  hour         TIME                          look         
 SEQ   -d-  .            SEQ                           .            
 CMT   ---  .            compartment number            ysdb_internal
 EVID  -d-  .            event ID                      ysdb_internal
 AMT   ---  mg           dose amount                   ysdb_internal
 DV    ---  micrograms/L dependent variable            ysdb_internal
 AGE   ---  years        age                           ysdb_internal
 WT    ---  kg           weight                        ysdb_internal
 CRCL  ---  ml/min       CRCL                          .            
 ALB   ---  g/dL         albumin                       ysdb_internal
 BMI   ---  m2/kg        BMI                           ysdb_internal
 AAG   ---  mg/dL        alpha-1-acid glycoprotein     .            
 SCR   ---  mg/dL        serum creatinine              .            
 AST   ---  .            aspartate aminotransferase    .            
 ALT   ---  .            alanine aminotransferase      .            
 HT    ---  cm           height                        ysdb_internal
 CP    -d-  .            Child-Pugh score              look         
 TAFD  ---  hours        time after first dose         .            
 TAD   ---  hours        time after dose               .            
 LDOS  ---  mg           last dose amount              .            
 MDV   -d-  .            MDV                           ysdb_internal
 BLQ   -d-  .            below limit of quantification .            
 PHASE ---  .            study phase indicator         .            
 STUDY -d-  .            study number                  .            
 RF    cd-  .            renal function stage          .            

6.2 Use ys_add_labels

data <- ys_add_labels(data, spec)

The printed data look unchanged, so it isn’t obvious that anything was done here:

as_tibble(data)
# A tibble: 4,360 × 29
   C       NUM    ID  SUBJ  TIME   SEQ   CMT  EVID   AMT    DV   AGE    WT  CRCL
   <lgl> <int> <int> <int> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
 1 NA        1     1     1  0        0     1     1     5   0    28.0  55.2  114.
 2 NA        2     1     1  0.61     1     2     0    NA  61.0  28.0  55.2  114.
 3 NA        3     1     1  1.15     1     2     0    NA  91.0  28.0  55.2  114.
 4 NA        4     1     1  1.73     1     2     0    NA 122.   28.0  55.2  114.
 5 NA        5     1     1  2.15     1     2     0    NA 126.   28.0  55.2  114.
 6 NA        6     1     1  3.19     1     2     0    NA  84.7  28.0  55.2  114.
 7 NA        7     1     1  4.21     1     2     0    NA  62.1  28.0  55.2  114.
 8 NA        8     1     1  5.09     1     2     0    NA  49.1  28.0  55.2  114.
 9 NA        9     1     1  6.22     1     2     0    NA  64.2  28.0  55.2  114.
10 NA       10     1     1  8.09     1     2     0    NA  59.6  28.0  55.2  114.
# … with 4,350 more rows, and 16 more variables: ALB <dbl>, BMI <dbl>,
#   AAG <dbl>, SCR <dbl>, AST <dbl>, ALT <dbl>, HT <dbl>, CP <int>, TAFD <dbl>,
#   TAD <dbl>, LDOS <int>, MDV <int>, BLQ <int>, PHASE <int>, STUDY <int>,
#   RF <chr>

How can you tell that the labels were added?

labs <- map(data, attr, "label")

labs[1:5]
$C
[1] "comment character"

$NUM
[1] "record number"

$ID
[1] "subject identifier"

$SUBJ
[1] "subject identifier"

$TIME
[1] "time after first dose"

Alternatively, inspect the structure of the data frame; the label attributes are listed under each column:

str(data)
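If you prefer a compact overview, the labels can also be collected into a named character vector. The sketch below uses base R on a toy data frame; `get_labels()` is a hypothetical helper, not a yspec function, and it returns NA for any column that has no label.

```r
# Toy data frame with one labeled and one unlabeled column
df <- data.frame(ID = 1:2, WT = c(55.2, 70.1))
attr(df$ID, "label") <- "subject identifier"

# Hypothetical helper: one label per column, NA when no label is attached
get_labels <- function(data) {
  vapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else lab
  }, character(1))
}

labs <- get_labels(df)
labs
```

Any NA in the result points to a column that was not labeled.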

6.3 Where does label come from?

Ideally, the spec contains a label entry for every column in the data set. To enforce this, set the ys.require.label option to TRUE; loading a spec in which a column lacks a label then generates an error.
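A minimal sketch of setting the option for a limited stretch of code and restoring the previous value afterwards. Only base R `options()` and `getOption()` are used; the spec-loading step is left as a comment because it depends on your own spec file.

```r
# Turn the requirement on, keeping the previous setting for later
old <- options(ys.require.label = TRUE)

# Confirm the option is active
is_required <- isTRUE(getOption("ys.require.label"))
is_required

# ... load your spec here; a column without a label should now error ...

# Restore the previous setting
options(old)
```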

yspec also has a function called ys_get_label() that will form a label for you. Here are the rules:

  1. If label exists for a column, it will be used.
  2. Otherwise, if long is found and it is <= 40 characters, it will be used.
  3. Otherwise, short will be used; recall that short defaults to the column name (col).
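The rules above can be sketched in plain R. This is not yspec's implementation; `pick_label()` is a hypothetical function that works on hand-built lists, and the field names `label`, `long`, and `short` simply mirror the rules.

```r
# Hypothetical re-statement of the fallback rules for one column entry
pick_label <- function(col, max_chars = 40) {
  label <- col[["label"]]
  long  <- col[["long"]]
  if (!is.null(label)) return(label)                            # rule 1
  if (!is.null(long) && nchar(long) <= max_chars) return(long)  # rule 2
  col[["short"]]                                                # rule 3
}

# Hand-built column entries for illustration
col_a <- list(col = "WT", short = "weight", label = "body weight")
col_b <- list(col = "AAG", short = "AAG", long = "alpha-1-acid glycoprotein")
col_c <- list(col = "X", short = "X", long = strrep("very long ", 5))

pick_label(col_a)  # label is present, so it wins
pick_label(col_b)  # no label; long is short enough
pick_label(col_c)  # no label; long has 50 characters, so short is used
```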

Let’s look at some examples

ys_get_label(spec)[1:3]
$C
[1] "comment character"

$NUM
[1] "record number"

$ID
[1] "subject identifier"

ys_get_label(spec$NUM)
[1] "record number"

spec$NUM$label
NULL

spec$C$label
NULL

6.4 Custom label formation

As an example, we can supply a custom labeling function. Here, I want the label to be the column name.

Set up a function that takes the spec entry for a column as its first argument:

label_fun <- function(x, ...) x[["col"]]

Now, pass that function into ys_add_labels()

data <- ys_add_labels(data, spec, fun = label_fun)

And check the output

map(data, attr, "label")[1:5]
$C
[1] "C"

$NUM
[1] "NUM"

$ID
[1] "ID"

$SUBJ
[1] "SUBJ"

$TIME
[1] "TIME"
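A custom function can do more than return a single field. The sketch below builds a label from the short name plus the unit and falls back to the short name when the result would exceed 40 characters. `label_with_unit()` is a hypothetical function, and it assumes each column entry carries `short` and, optionally, `unit` fields, as the printed spec suggests; it is tried here on hand-built lists only.

```r
# Hypothetical labeling function: "short (unit)", capped at 40 characters
label_with_unit <- function(x, ...) {
  base <- x[["short"]]
  unit <- x[["unit"]]
  if (is.null(unit)) return(base)  # no unit recorded: use short as is
  lab <- paste0(base, " (", unit, ")")
  if (nchar(lab) <= 40) lab else base
}

# Hand-built column entries for illustration
with_unit    <- label_with_unit(list(col = "WT", short = "weight", unit = "kg"))
without_unit <- label_with_unit(list(col = "C", short = "comment character"))

with_unit
without_unit
```

A function like this would be passed through the fun argument in the same way as label_fun above.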

6.5 Extract the label field

Recall that the yspec object is just a list, so we can always map across it and grab the label field. Columns without an explicit label entry return NULL:

map(spec, "label")[1:5]
$C
NULL

$NUM
NULL

$ID
NULL

$SUBJ
NULL

$TIME
[1] "time after first dose"
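The same idea can be used to list the columns that have no explicit label. This is a base R sketch on a hand-built list; `toy_spec` is a hypothetical stand-in for a spec object, with entries modeled on the output above.

```r
# Hand-built stand-in for a spec: a named list of column entries
toy_spec <- list(
  C    = list(short = "comment character"),
  NUM  = list(short = "record number"),
  TIME = list(short = "TIME", label = "time after first dose")
)

# Pull the label field from every entry; missing labels come back as NULL
labels <- lapply(toy_spec, function(x) x[["label"]])

# Names of the columns whose label field is NULL
missing_label <- names(Filter(is.null, labels))
missing_label
```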