The yspec package will read your data specification file when it is
written in a specific YAML format. Using the object created from this
file, you can validate data assembly outputs, label the data frame to be
written in SAS XPORT format, create a define.pdf document, and more.
For each data set in your project that needs documentation, create a
YAML file that lists the columns in the data set along with details
about the data in each column. This YAML file can be loaded into your R
session as an object that you can work with, referred to as a spec
object. The term spec refers to a single documented data object / data
file.
Once all of your data sets have been documented with their own YAML
file, you can create another object called a yproj object. This is used
to template the rendering of a single integrated data definitions file
for your entire project.
Keep reading the vignette to see how it works.
An example specification file looks like:
SETUP__:
  description: Example PopPK analysis data set
  sponsor: example-project
  projectnumber: EXAMPK1011F
  use_internal_db: true
  glue:
    bmiunits: "kg/m$^2$"
  flags:
    covariate: [AGE:SCR, HT, AST:ALT]
  lookup_file: "look.yml"
  extend_file: "analysis1-ext.yml"
C:
NUM:
ID:
SUBJ: !look
TIME: !look
  label: time after first dose
  unit: hour
SEQ:
  label: data type
  values: {observation: 0, dose: 1}
CMT:
EVID:
  make_factor: true
  lookup: true
AMT: !look
  unit: mg
DV: !look
  unit: "micrograms/L"
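A single column entry can carry several descriptive fields at once. The fragment below is a sketch with hypothetical WT and SEX entries; it is not taken from the example file. It assumes the field names used elsewhere in this document (short, label, unit, range, values, decode) are written as plain keys under the column name.

```yaml
# Hypothetical column entries, for illustration only
WT:
  short: weight
  label: patient weight at baseline
  unit: kg
  range: [40, 100]
SEX:
  values: [0, 1]
  decode: [male, female]
```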
Once the data specification YAML file is written, it can be loaded in R:
spec <- ys_load(specfile)
spec
name  info unit         short                         source
C     cd-  .            comment character             ysdb_internal
NUM   ---  .            record number                 ysdb_internal
ID    ---  .            subject identifier            ysdb_internal
SUBJ  c--  .            subject identifier            ysdb_internal
TIME  ---  hour         TIME                          look
SEQ   -d-  .            SEQ                           .
CMT   ---  .            compartment number            ysdb_internal
EVID  -d-  .            event ID                      ysdb_internal
AMT   ---  mg           dose amount                   ysdb_internal
DV    ---  micrograms/L dependent variable            ysdb_internal
AGE   ---  years        age                           ysdb_internal
WT    ---  kg           weight                        ysdb_internal
CRCL  ---  ml/min       CRCL                          .
ALB   ---  g/dL         albumin                       ysdb_internal
BMI   ---  m2/kg        BMI                           ysdb_internal
AAG   ---  mg/dL        alpha-1-acid glycoprotein     .
SCR   ---  mg/dL        serum creatinine              .
AST   ---  .            aspartate aminotransferase    .
ALT   ---  .            alanine aminotransferase      .
HT    ---  cm           height                        ysdb_internal
CP    -d-  .            Child-Pugh score              look
TAFD  ---  hours        time after first dose         .
TAD   ---  hours        time after dose               .
LDOS  ---  mg           last dose amount              .
MDV   -d-  .            MDV                           ysdb_internal
BLQ   -d-  .            below limit of quantification .
PHASE ---  .            study phase indicator         .
STUDY -d-  .            study number                  .
RF    cd-  .            renal function stage          .
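The specfile path used above is not defined in this excerpt. As a minimal sketch, assuming the yspec package is installed and that ys_help provides a file() helper returning the path to the bundled example specification (an assumption here; only ys_help$data() appears in this document), the example could be reproduced along these lines:

```r
# Sketch only: the body runs when the yspec package is available.
if (requireNamespace("yspec", quietly = TRUE)) {
  library(yspec)

  # Assumption: ys_help$file() returns the path to the bundled example yaml file.
  specfile <- ys_help$file()

  # Load the specification file into a spec object and print it.
  spec <- ys_load(specfile)
  print(spec)
}
```

For your own project, specfile would simply be the path to the YAML file you wrote for that data set.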
Data from specific columns can be printed:
spec$WT
name  value
col   WT
type  numeric
short weight
unit  kg
range 40 to 100
or summarized:
summary(spec, WT, DV, EGFR)
   name  info unit         short                         source
1  C     cd-  .            comment character             ysdb_internal
2  NUM   ---  .            record number                 ysdb_internal
3  ID    ---  .            subject identifier            ysdb_internal
4  SUBJ  c--  .            subject identifier            ysdb_internal
5  TIME  ---  hour         TIME                          look
6  SEQ   -d-  .            SEQ                           .
7  CMT   ---  .            compartment number            ysdb_internal
8  EVID  -d-  .            event ID                      ysdb_internal
9  AMT   ---  mg           dose amount                   ysdb_internal
10 DV    ---  micrograms/L dependent variable            ysdb_internal
11 AGE   ---  years        age                           ysdb_internal
12 WT    ---  kg           weight                        ysdb_internal
13 CRCL  ---  ml/min       CRCL                          .
14 ALB   ---  g/dL         albumin                       ysdb_internal
15 BMI   ---  m2/kg        BMI                           ysdb_internal
16 AAG   ---  mg/dL        alpha-1-acid glycoprotein     .
17 SCR   ---  mg/dL        serum creatinine              .
18 AST   ---  .            aspartate aminotransferase    .
19 ALT   ---  .            alanine aminotransferase      .
20 HT    ---  cm           height                        ysdb_internal
21 CP    -d-  .            Child-Pugh score              look
22 TAFD  ---  hours        time after first dose         .
23 TAD   ---  hours        time after dose               .
24 LDOS  ---  mg           last dose amount              .
25 MDV   -d-  .            MDV                           ysdb_internal
26 BLQ   -d-  .            below limit of quantification .
27 PHASE ---  .            study phase indicator         .
28 STUDY -d-  .            study number                  .
29 RF    cd-  .            renal function stage          .
Each column entry in the specification file can include fields such as:
- short: a short name for the column (e.g. weight); this defaults to the
  column name (col), and sometimes that default makes sense
- label: used to label the data set and to populate the define.pdf
  document (e.g. patient weight at baseline)
- unit: when it's appropriate (e.g. kg)
- decode: for discrete data items (e.g. if SEX is values: [0,1], then
  use decode: [male, female]); or include the make_factor: true field

To check a data frame against the spec, use the ys_check() function,
with the data frame as the first argument and the spec object as the
second argument:
data <- ys_help$data()
ys_check(data, spec)
## The data set passed all checks.
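To make the idea of checking data against a spec concrete, here is a conceptual sketch in base R. It is not yspec's implementation: toy_spec is a hypothetical plain list standing in for a spec object, and check_against_spec() is an illustrative helper that only checks column names and order, declared ranges, and declared discrete values.

```r
# Hypothetical stand-in for a spec: one entry per column, in data set order.
# This is NOT the structure of a real yspec object.
toy_spec <- list(
  ID  = list(type = "numeric"),
  WT  = list(type = "numeric", range = c(40, 100)),
  SEX = list(type = "numeric", values = c(0, 1))
)

# Return a character vector of problems; an empty vector means no problems.
check_against_spec <- function(data, spec) {
  problems <- character(0)
  # Column names must match the spec, in the same order.
  if (!identical(names(data), names(spec))) {
    return("column names or order differ from the spec")
  }
  for (col in names(spec)) {
    x <- data[[col]]
    x <- x[!is.na(x)]  # missing values are ignored in this sketch
    rng <- spec[[col]]$range
    if (!is.null(rng) && any(x < rng[1] | x > rng[2])) {
      problems <- c(problems, paste0(col, ": value outside declared range"))
    }
    vals <- spec[[col]]$values
    if (!is.null(vals) && !all(x %in% vals)) {
      problems <- c(problems, paste0(col, ": value not in declared values"))
    }
  }
  problems
}

good <- data.frame(ID = 1:3, WT = c(55, 70, 82), SEX = c(0, 1, 1))
bad  <- data.frame(ID = 1:3, WT = c(55, 170, 82), SEX = c(0, 1, 2))

check_against_spec(good, toy_spec)  # empty: nothing to report
check_against_spec(bad, toy_spec)   # reports the WT and SEX problems
```

In practice ys_check() does this work for you against the real spec object; the sketch only shows why a spec with ranges and values is enough to validate a data set mechanically.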
The specification object can be rendered to a specification file with
the ys_document() function:
ys_document(spec, stem = "working_document")
With output here.
ys_document() will pass along arguments to rmarkdown::render() so that
you can control how the document is rendered. You can also create custom
output formats to get the data table to render in the way that you like.
To create a project-wide listing of documented data sets, we create a
yproj or project object. We create this from the spec objects that we
read about in the previous section. Let’s load another object to use
along with the object loaded in the previous section.
pdspec <- load_spec_ex("DEM104101F_PKPD.yml")
Now, we have two objects to work with:
head(spec)
##    name  info unit         short                         source
## 1  C     cd-  .            comment character             ysdb_internal
## 2  NUM   ---  .            record number                 ysdb_internal
## 3  ID    ---  .            subject identifier            ysdb_internal
## 4  SUBJ  c--  .            subject identifier            ysdb_internal
## 5  TIME  ---  hour         TIME                          look
## 6  SEQ   -d-  .            SEQ                           .
## 7  CMT   ---  .            compartment number            ysdb_internal
## 8  EVID  -d-  .            event ID                      ysdb_internal
## 9  AMT   ---  mg           dose amount                   ysdb_internal
## 10 DV    ---  micrograms/L dependent variable            ysdb_internal
head(pdspec)
##    name info unit           short       source
## 1  C    c--  .              C           .
## 2  MDV  ---  .              MDV         .
## 3  SEQ  -d-  .              SEQ         .
## 4  AMT  ---  mg             AMT         .
## 5  II   ---  hours          II          .
## 6  CMT  ---  .              Compartment .
## 7  TAFD ---  hours          TAFD        .
## 8  WT   ---  kg             Weight      .
## 9  EGFR ---  ml/min/1.73 m2 eGFR        .
## 10 SEX  -d-  .              SEX         .
We can create a project object from both objects:
proj <- ys_project(spec, pdspec)
proj
## projectnumber: EXAMPK1011F
## sponsor: example-project
## --------------------------------------------
## datafiles:
##   name            description                       data_stem
##   analysis1       Example PopPK analysis data set   analysis1
##   DEM104101F_PKPD Population PKPD analysis data set DEM104101F_PKPD
To render the project file we’ll use the same ys_document() function.
This time, we’ll add some extra (optional) arguments that will help us
get the document to look the way we want:
ys_document(
  proj,
  stem = "project_document",
  build_dir = definetemplate(),
  author = "Michelle Johnson",
  title = "Analysis data specification"
)
Using the build_dir argument gets us the document rendered with Metrum
Research Group branding. Also, author and title are passed into the
configuration fields for this document.
To get a document that is formatted according to FDA requirements, use:
ys_document(
  proj,
  type = "regulatory",
  stem = "fda_document",
  build_dir = definetemplate(),
  author = "Michelle Johnson",
  title = "Analysis data specification"
)