This document shows how data set column definitions can be entered into a lookup file that can be accessed by multiple data specification files within a project. It also discusses an internal lookup data base that is always available for individual data sets to look up standardized column information for commonly used data items in our workflow.
5 Create two files
5.1 The lookup file
The lookup file that is to be accessed by other data specification files is just another data specification file. For example, create a file called lookup.yml and enter this information:
# inline/lookup.yml
AMT:
  short: dose amount
  unit: nmol
  type: numeric
AMTMG:
  short: dose amount
  unit: mg
  type: numeric
WT:
  short: patient weight
  unit: lbs
This information must be valid yspec data specification format and (generally) valid yaml.
5.2 The specification file
This is just the standard data specification file:
# inline/look-spec-1.yml
SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C:
  short: comment character
AMT: !look
Notice two things about this file. First, we included a lookup_file field in the SETUP__ section that references our lookup.yml file. By default, yspec expects the lookup file to be in the same directory as the spec file. Second, in the AMT column, we used the !look handler to indicate that we want that column to be looked up.
Alternatively, we could just pass in empty data and yspec will assume that we want to try to look up that column:
SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C:
  short: comment character
AMT:
Finally, we can import a column from the lookup file under a new name in the working spec:
SETUP__:
  description: PKPD analysis data set
  lookup_file: lookup.yml
C:
  short: comment character
AMT:
  lookup: AMTMG
In this snippet, we are asking for the AMTMG column from the lookup and bringing it in as AMT in the working spec.
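The three ways of requesting a lookup described above (the !look handler, an empty column, and a lookup field naming another column) can be illustrated with a small base R sketch. This is only a toy model under stated assumptions, not yspec's implementation: resolve_column() is a hypothetical helper, the lookup list is typed in by hand from lookup.yml, and the !look handler is represented here as list(lookup = TRUE).

```r
# Toy model of lookup resolution; NOT yspec's actual implementation.
# The lookup list mirrors the AMT and AMTMG entries of lookup.yml.
lookup <- list(
  AMT   = list(short = "dose amount", unit = "nmol", type = "numeric"),
  AMTMG = list(short = "dose amount", unit = "mg",   type = "numeric")
)

# resolve_column() is a hypothetical helper, not a yspec function.
#   name  : column name in the working spec
#   entry : fields written for that column (NULL for an empty column)
resolve_column <- function(name, entry, lookup) {
  if (is.null(entry)) entry <- list()
  # A lookup is requested by an empty column, `lookup: true`
  # (standing in for !look), or `lookup: <other column name>`.
  wants_lookup <- length(entry) == 0 ||
    isTRUE(entry$lookup) || is.character(entry$lookup)
  if (!wants_lookup) return(entry)
  # A character value in `lookup` imports a differently named column.
  key <- if (is.character(entry$lookup)) entry$lookup else name
  found <- lookup[[key]]
  # Assumption of this sketch: fall back to the entry when nothing is found.
  if (is.null(found)) entry else found
}

amt_plain   <- resolve_column("AMT", NULL, lookup)
amt_renamed <- resolve_column("AMT", list(lookup = "AMTMG"), lookup)
own_data    <- resolve_column("C", list(short = "comment character"), lookup)
```

In this sketch, amt_plain carries the nmol definition, amt_renamed carries the mg definition from AMTMG, and own_data is returned untouched because the column supplies its own fields.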
6 Multiple lookup files
yspec supports multiple file locations in the lookup_file field. When listing multiple files, the same column might appear in more than one file. In that case, the lookup file that you list first takes priority.
For example, here’s a data specification file that references two lookup files:
# inline/spec-multi-lookup.yml
SETUP__:
  lookup_file: ["lookup-multi-2.yml", "lookup-multi-1.yml"]
A:
  lookup: true
B:
  lookup: true
There isn’t much to this file: we have two columns (A and B) and they both inherit all of their information from a lookup file. If any column appears in both files, then the data from lookup-multi-2.yml takes precedence.
The first lookup file listed is
# inline/lookup-multi-2.yml
A:
  short: Apple
The second lookup file listed is
# inline/lookup-multi-1.yml
A:
  short: letter A
B:
  short: letter B
We can list both files in the lookup_file field, putting the highest priority file first. Another way to say this is that the first file gets the last word.
We load the main data specification file as usual to verify how the lookup worked:
spec <- ys_load("inline/spec-multi-lookup.yml")
spec
 name info unit short    source
 A    ---  .    Apple    lookup-multi-2
 B    ---  .    letter B lookup-multi-1
In this example, the A column was in both lookups, but it was the entry from lookup-multi-2.yml (listed first) that was found in the lookup process.
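The "first file listed wins" rule can be sketched in base R. This is a minimal illustration and not how yspec is implemented internally: merge_lookups() is a hypothetical helper, and the two lookup files are entered by hand as named lists rather than read from disk.

```r
# Hand-entered stand-ins for the two lookup files shown above.
lookup_multi_2 <- list(A = list(short = "Apple"))
lookup_multi_1 <- list(
  A = list(short = "letter A"),
  B = list(short = "letter B")
)

# merge_lookups() is a hypothetical helper, not a yspec function.
# `files` is a named list of lookups, ordered as in lookup_file.
merge_lookups <- function(files) {
  merged <- list()
  for (file_name in names(files)) {
    entries <- files[[file_name]]
    # Only take columns that an earlier (higher priority) file
    # has not already provided.
    new_cols <- setdiff(names(entries), names(merged))
    for (col in new_cols) {
      merged[[col]] <- c(entries[[col]], list(source = file_name))
    }
  }
  merged
}

# Same order as lookup_file in spec-multi-lookup.yml
files <- list(
  "lookup-multi-2" = lookup_multi_2,
  "lookup-multi-1" = lookup_multi_1
)
merged <- merge_lookups(files)
```

Here merged$A comes from lookup-multi-2 and merged$B from lookup-multi-1, matching the source column in the printed spec above.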
7 Internal lookup data base
There is an internal data base of common data set columns that yspec will attach by default. So, with no lookup file defined, we could write the following in our specification file:
# inline/look-spec-2.yml
SETUP__:
  description: PKPD analysis data set
  use_internal_db: true
C:
AMT:
MDV:
EVID:
WT:
EGFR:
ALB: !look
ZIP_CODE:
  values: 55378
We can read this data in and have the columns defined:
library(yspec)
library(dplyr)
spec <- ys_load("inline/look-spec-2.yml")
spec
 name     info unit          short             source
 C        cd-  .             comment character ysdb_internal
 AMT      ---  .             dose amount       ysdb_internal
 MDV      -d-  .             MDV               ysdb_internal
 EVID     -d-  .             event ID          ysdb_internal
 WT       ---  kg            weight            ysdb_internal
 EGFR     ---  ml/min/1.73m2 eGFR              ysdb_internal
 ALB      ---  g/dL          albumin           ysdb_internal
 ZIP_CODE ---  .             ZIP_CODE          .
8 Tracking the lookup status of each column
It can get confusing to keep track of where each column is coming from. You can audit the spec object to find out where a lookup event happened:
ys_lookup_source(spec)
# A tibble: 8 × 2
col lookup_source
<chr> <chr>
1 C ysdb_internal.yml
2 AMT ysdb_internal.yml
3 MDV ysdb_internal.yml
4 EVID ysdb_internal.yml
5 WT ysdb_internal.yml
6 EGFR ysdb_internal.yml
7 ALB ysdb_internal.yml
8 ZIP_CODE look-spec-2.yml
Here, we can see that most of the columns came from the internal data base and that one column (ZIP_CODE) came from our own specification.
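For larger specifications, it may help to summarize this audit table rather than read it row by row. The sketch below is only an illustration in base R: it rebuilds the table shown above by hand as a data frame (the name audit is an assumption of this example); with a loaded spec object, the result of ys_lookup_source(spec) would take its place.

```r
# Hand-built copy of the audit table printed above; with a live spec,
# this would come from ys_lookup_source(spec) instead.
audit <- data.frame(
  col = c("C", "AMT", "MDV", "EVID", "WT", "EGFR", "ALB", "ZIP_CODE"),
  lookup_source = c(rep("ysdb_internal.yml", 7), "look-spec-2.yml"),
  stringsAsFactors = FALSE
)

# How many columns came from each source?
source_counts <- table(audit$lookup_source)

# Which columns were NOT taken from the internal data base?
local_cols <- audit$col[audit$lookup_source != "ysdb_internal.yml"]
```

In this example, local_cols contains only ZIP_CODE, which is the column defined directly in look-spec-2.yml.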
You can also re-create the lookup object (just a named list) for a specification object.
ys_get_lookup(spec) %>% glimpse()
List of 23
$ C :List of 6
..$ short : chr "comment character"
..$ values : chr [1:2] "." "C"
..$ decode : chr [1:2] "analysis row" "commented row"
..$ type : chr "character"
..$ col : chr "C"
..$ lookup_source: chr "ysdb_internal.yml"
$ ID :List of 4
..$ short : chr "subject identifier"
..$ type : chr "numeric"
..$ col : chr "ID"
..$ lookup_source: chr "ysdb_internal.yml"
$ USUBJID:List of 4
..$ short : chr "unique subject identifier"
..$ type : chr "character"
..$ col : chr "USUBJID"
..$ lookup_source: chr "ysdb_internal.yml"
$ SUBJ :List of 4
..$ short : chr "subject identifier"
..$ type : chr "character"
..$ col : chr "SUBJ"
..$ lookup_source: chr "ysdb_internal.yml"
$ STUDYID:List of 4
..$ short : chr "study identifier"
..$ type : chr "character"
..$ col : chr "STUDYID"
..$ lookup_source: chr "ysdb_internal.yml"
$ CMT :List of 4
..$ short : chr "compartment number"
..$ type : chr "numeric"
..$ col : chr "CMT"
..$ lookup_source: chr "ysdb_internal.yml"
$ EVID :List of 4
..$ short : chr "event ID"
..$ values :List of 2
.. ..$ observation: int 0
.. ..$ dose : int 1
..$ col : chr "EVID"
..$ lookup_source: chr "ysdb_internal.yml"
$ AMT :List of 4
..$ short : chr "dose amount"
..$ type : chr "numeric"
..$ col : chr "AMT"
..$ lookup_source: chr "ysdb_internal.yml"
$ RATE :List of 4
..$ short : chr "infusion rate"
..$ type : chr "numeric"
..$ col : chr "RATE"
..$ lookup_source: chr "ysdb_internal.yml"
$ II :List of 4
..$ short : chr "inter-dose interval"
..$ type : chr "numeric"
..$ col : chr "II"
..$ lookup_source: chr "ysdb_internal.yml"
$ SS :List of 5
..$ short : chr "steady state indicator"
..$ values : int [1:2] 0 1
..$ decode : chr [1:2] "non-steady state indicator" "steady state indicator"
..$ col : chr "SS"
..$ lookup_source: chr "ysdb_internal.yml"
$ MDV :List of 6
..$ values :List of 2
.. ..$ non-missing: int 0
.. ..$ missing : int 1
..$ type : chr "numeric"
..$ long : chr "missing DV indicator"
..$ comment : chr "per NONMEM specifications"
..$ col : chr "MDV"
..$ lookup_source: chr "ysdb_internal.yml"
$ DV :List of 4
..$ short : chr "dependent variable"
..$ type : chr "numeric"
..$ col : chr "DV"
..$ lookup_source: chr "ysdb_internal.yml"
$ WT :List of 5
..$ short : chr "weight"
..$ unit : chr "kg"
..$ type : chr "numeric"
..$ col : chr "WT"
..$ lookup_source: chr "ysdb_internal.yml"
$ EGFR :List of 5
..$ short : chr "eGFR"
..$ long : chr "estimated glomerular filtration rate"
..$ unit : chr "ml/min/1.73m2"
..$ col : chr "EGFR"
..$ lookup_source: chr "ysdb_internal.yml"
$ BMI :List of 5
..$ long : chr "body mass index"
..$ unit : chr "m2/kg"
..$ type : chr "numeric"
..$ col : chr "BMI"
..$ lookup_source: chr "ysdb_internal.yml"
$ HT :List of 5
..$ about : chr [1:2] "height" "cm"
..$ long : chr "Height"
..$ type : chr "numeric"
..$ col : chr "HT"
..$ lookup_source: chr "ysdb_internal.yml"
$ ALB :List of 6
..$ long : chr "serum albumin"
..$ unit : chr "g/dL"
..$ short : chr "albumin"
..$ type : chr "numeric"
..$ col : chr "ALB"
..$ lookup_source: chr "ysdb_internal.yml"
$ AGE :List of 4
..$ about : chr [1:2] "age" "years"
..$ type : chr "numeric"
..$ col : chr "AGE"
..$ lookup_source: chr "ysdb_internal.yml"
$ SEX :List of 3
..$ values :List of 2
.. ..$ male : int 0
.. ..$ female: int 1
..$ col : chr "SEX"
..$ lookup_source: chr "ysdb_internal.yml"
$ NUM :List of 4
..$ short : chr "record number"
..$ type : chr "numeric"
..$ col : chr "NUM"
..$ lookup_source: chr "ysdb_internal.yml"
$ BQL :List of 5
..$ short : chr "data point below the LOQ"
..$ type : chr "numeric"
..$ values :List of 2
.. ..$ 0: chr "not below quantitation limit"
.. ..$ 1: chr "below quantitation limit"
..$ col : chr "BQL"
..$ lookup_source: chr "ysdb_internal.yml"
$ LOQ :List of 4
..$ short : chr "assay limit of quantification"
..$ type : chr "numeric"
..$ col : chr "LOQ"
..$ lookup_source: chr "ysdb_internal.yml"