2 Get started – The yspec Book

2.1 Introduction

The yspec package reads your data specification file when it is written in a specific YAML format. Using the object created from this file, you can validate data assembly outputs, label the data frame to be written in SAS XPORT format, create a define.pdf document, and more.

For each data set in your project that needs documentation, create a YAML file that lists the columns in the data set along with details about the data in each column. This YAML file can be loaded into your R session as an object that you can work with, referred to as a spec object. The term spec refers to a single documented data object / data file.

Once all of your data sets have been documented with their own YAML file, you can create another object called a yproj object. This is used to template the rendering of a single integrated data definitions file for your entire project.

Keep reading the vignette to see how it works.

library(yspec)

2.2 Example spec file

An example specification file looks like:

specfile <- ys_help$file()

SETUP__:
  description: Example PopPK analysis data set
  sponsor: example-project
  projectnumber: EXAMPK1011F
  use_internal_db: true
  glue: 
    bmiunits: "kg/m$^2$"
  flags: 
    covariate: [AGE:SCR, HT, AST:ALT]
  lookup_file: "look.yml"
  extend_file: "analysis1-ext.yml"
C:
NUM:
ID:
SUBJ: !look
TIME: !look
  label: time after first dose
  unit: hour
SEQ: 
  label: data type
  values: {observation: 0, dose: 1}
CMT:
EVID:
  make_factor: true

Once the data specification YAML file is written, it can be loaded in R:

spec <- ys_load(specfile)

head(spec)

     name info         unit              short        source
  1     C  cd-            .  comment character ysdb_internal
  2   NUM  ---            .      record number ysdb_internal
  3    ID  ---            . subject identifier ysdb_internal
  4  SUBJ  c--            . subject identifier ysdb_internal
  5  TIME  ---         hour               TIME          look
  6   SEQ  -d-            .                SEQ             .
  7   CMT  ---            . compartment number ysdb_internal
  8  EVID  -d-            .           event ID ysdb_internal
  9   AMT  ---           mg        dose amount ysdb_internal
  10   DV  --- micrograms/L dependent variable ysdb_internal

Data from specific columns can be printed:

spec$WT

   name  value    
   col   WT       
   type  numeric  
   short weight   
   unit  kg       
   range 40 to 100

spec$EVID

 name  value          
 col   EVID           
 type  numeric        
 short event ID       
 value 0 : observation
       1 : dose
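Column metadata can also be pulled out programmatically, for example to label plot axes. The following is a minimal sketch, assuming that ys_get_short() and ys_get_unit() each return a list keyed by column name for the example spec shipped with the package; check the function help pages for the exact return structure before relying on it.

```r
library(yspec)

# Load the example spec that ships with the package
spec <- ys_load(ys_help$file())

# Short names for the columns in the spec (assumed to be keyed by column name)
shorts <- ys_get_short(spec)
shorts$WT

# Units for the columns in the spec (assumed to be keyed by column name)
units <- ys_get_unit(spec)
units$WT
```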

2.2.1 Items that you should be including for most columns

short: a short name for the column (e.g. weight); this defaults to the column name (col), which is sometimes appropriate
label: to be used to label the data set and to populate the define.pdf document (e.g. patient weight at baseline)
unit: when it’s appropriate (e.g. kg)
decode: for discrete data items (e.g. if SEX has values: [0, 1], then use decode: [male, female]); or include the make_factor: true field
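To see these fields together, the sketch below writes a small spec to a temporary file and loads it. The two columns, their labels, and the temporary file are hypothetical and chosen only for illustration; the sketch also assumes that a spec file without a SETUP__ block loads cleanly.

```r
library(yspec)

# Hypothetical two-column spec using the fields listed above
yaml_text <- '
WT:
  short: weight
  label: patient weight at baseline
  unit: kg
  range: [40, 100]
SEX:
  short: sex
  label: patient sex
  values: [0, 1]
  decode: [male, female]
'

# Write the spec to a temporary file so that ys_load() can read it
demo_file <- tempfile(fileext = ".yml")
writeLines(yaml_text, demo_file)

demo_spec <- ys_load(demo_file)

# Print the details recorded for each column
demo_spec$WT
demo_spec$SEX
```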

2.3 Check a data set against the spec

Use the ys_check() function, with the data frame as the first argument and the spec object as the second argument:

data <- ys_help$data()

ys_check(data, spec)
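For intuition about what checking a data set against a spec can involve, here is a base R sketch. It is not the ys_check() implementation; the helper check_against_spec(), the plain-list toy spec, and the three rules it applies (matching column names, values inside a stated range, discrete values drawn from a stated set) are all assumptions made for illustration.

```r
# Hypothetical stand-ins: a plain list describing two columns and a small data frame
toy_spec <- list(
  WT  = list(range = c(40, 100)),
  SEX = list(values = c(0, 1))
)
toy_data <- data.frame(WT = c(55, 72, 98), SEX = c(0, 1, 1))

# Return a character vector describing every problem found (empty if none)
check_against_spec <- function(data, spec) {
  problems <- character(0)
  # Column names must match the spec, in the same order
  if (!identical(names(data), names(spec))) {
    problems <- c(problems, "column names do not match the spec")
  }
  for (col in intersect(names(spec), names(data))) {
    x <- data[[col]][!is.na(data[[col]])]
    # Continuous columns: non-missing values must fall inside the range
    rng <- spec[[col]]$range
    if (!is.null(rng) && any(x < rng[1] | x > rng[2])) {
      problems <- c(problems, paste0(col, ": values outside the stated range"))
    }
    # Discrete columns: non-missing values must be listed in the spec
    vals <- spec[[col]]$values
    if (!is.null(vals) && !all(x %in% vals)) {
      problems <- c(problems, paste0(col, ": values not listed in the spec"))
    }
  }
  problems
}

check_against_spec(toy_data, toy_spec)
```

Collecting all problems in one vector, rather than stopping at the first, lets every issue be reported in a single pass over the data.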

2.4 Example to render spec

The specification object can be rendered to a specification document with the ys_document() function:

ys_document(spec, stem = "working_document")

ys_document() passes additional arguments along to rmarkdown::render(), so you can control how the document is rendered. You can also create custom output formats to get the data table to render the way you like.

2.5 Example project object

To create a project-wide listing of documented data sets, we create a yproj or project object. We create this from the spec objects introduced in the previous sections. Let’s load another spec object to use along with the one loaded earlier.

pdspec <- load_spec_ex("DEM104101F_PKPD.yml")

Now, we have two objects to work with:

head(spec)

   name info         unit              short        source
1     C  cd-            .  comment character ysdb_internal
2   NUM  ---            .      record number ysdb_internal
3    ID  ---            . subject identifier ysdb_internal
4  SUBJ  c--            . subject identifier ysdb_internal
5  TIME  ---         hour               TIME          look
6   SEQ  -d-            .                SEQ             .
7   CMT  ---            . compartment number ysdb_internal
8  EVID  -d-            .           event ID ysdb_internal
9   AMT  ---           mg        dose amount ysdb_internal
10   DV  --- micrograms/L dependent variable ysdb_internal

head(pdspec)

   name info           unit       short source
1     C  c--              .           C      .
2   MDV  ---              .         MDV      .
3   SEQ  -d-              .         SEQ      .
4   AMT  ---             mg         AMT      .
5    II  ---          hours          II      .
6   CMT  ---              . Compartment      .
7  TAFD  ---          hours        TAFD      .
8    WT  ---             kg      Weight      .
9  EGFR  --- ml/min/1.73 m2        eGFR      .
10  SEX  -d-              .         SEX      .

We can create a project object from both objects:

proj <- ys_project(spec, pdspec)

proj

projectnumber:  EXAMPK1011F 
sponsor:        example-project 
--------------------------------------------
datafiles: 
 name            description                       data_stem      
 analysis1       Example PopPK analysis data set   analysis1      
 DEM104101F_PKPD Population PKPD analysis data set DEM104101F_PKPD

2.6 Render a project file

2.6.1 Working document

To render the project file, we’ll use the same ys_document() function. This time, we’ll add some extra (optional) arguments that help us get the document to look the way we want:

ys_document(
  proj, 
  stem = "project_document", 
  build_dir = definetemplate(),
  author = "Michelle Johnson", 
  title = "Analysis data specification"
)

Setting build_dir = definetemplate() renders the document with Metrum Research Group branding. The author and title arguments are passed into the configuration fields for this document.

2.6.2 Regulatory document

To get a document that is formatted according to FDA requirements, use:

ys_document(
  proj, 
  type = "regulatory",
  stem = "fda_document", 
  build_dir = definetemplate(),
  author = "Michelle Johnson", 
  title = "Analysis data specification"
)