SETUP__:
description: PKPD analysis data set
use_internal_db: true
projectnumber: FOO123
sponsor: MetrumRG3.1 General considerations
yspec uses standard yaml syntax to state the data set column definitions.
Here are some important things to remember about yaml syntax as you code your data specification file
- Use double-quotes around any value you want to be a string
trueandyesby themselves will be rendered asTRUE; use"yes"if you need that word by itself as a value for a fieldfalseandnoby themselves will be returned asFALSE; use"no"if you need that word by itself as a value for a field- If a string starts with a piece of punctuation, make sure to put the entire string in double quotes
- Use
short: "> QL"notshort: > QL - Use
values: [".", C]notvalues: [.,C] - Use
address: "123 Main St."notaddress: 123 Main St. - Use
label: "line 1 \\n line 2"notlabel: line 1 \n line2
- Use
Instructions for including TeX in the yaml specification code are provided in a section below.
3.2 Organization
Save your data specification code in a file, typically with a .yaml or .yml file extension.
At the top of the file, include a block called SETUP__:; this is where the data set meta data is stored. For example
See the details below for other files that can be included here.
Next, list each data set column in order, with the data column name starting in the first column and ending with a colon. For example:
WT:
short: weight
unit: kg
range: [50, 150]This specifies a “short” name for this column as well as a unit and a range. A complete listing is provided below.
You can see an fully worked example by running
ys_help$yaml()See the ?ys_help help topic for more information.
Or, you can export a collection of package assets with this command
ys_help$export(output="assets")See the [ys_help] topic for more information.
3.3 SETUP__ specification fields
description:<chr>; a short, label-like description of the data setprojectnumber:<chr>the project reference number; may be incorporated into rendered define documents; when the project number is given in the first yspec object in a project object, that project number will be rendered in the project-wide define documentsponsor:<chr>the project sponsor; when the project sponsor is given in the first yspec object in a project object, that project sponsor name will be rendered in the project-wide define documentdata_path:<chr>; a path locating the data set associated with the specdata_stem:<chr>; the stem (no extension) for the data set associated with the spec; usually the stem of the data file is the same as the stem of the spec, but they can also be differentlookup_file:<chr>; a yaml array of other yaml files where yspec will look for column lookup informationuse_internal_db:<logical> (true/false); iftrue, then yspec will load the internal column lookup databaseimport:<chr>; give the name of a<file>to import into the current data spec; all columns from<file>are imported as is; additional columns may also be listed with the normal syntax and these columns will appear after the imported columnscharacter_last:<logical>; if true, automatically push all non-numeric columns to the end of the data specification listcomment_col:<chr>; identify the column that is used to store comments; the comment column will not be pushed to the back whencharacter_lastis trueglue:<map>; specify name/value pairs; in the yaml data specification, use<<name>>in the text andvaluewill glued into the text after it has been sanitized; intended use is to allow LaTeX code to evade the sanitizermax_nchar_label:integer; the maximum number of characters allowed in thelabelfieldmax_nchar_short:integer; the maximum number of characters allowed in theshortfieldmax_nchar_col:integer; the maximum number of characters allowed in the data set column nameflags:<map>; for each key, an array of column names where a logical data item will be set in thedotslistextend_file:<character>; the name of another data definitions file in the same directory that can be used to extend the current data specification object
3.4 Data column specification fields
short: short-name- a short name for the column; don’t include unit here
unit: numeric- the unit of measure
range: [min-value, max-value]- indicates continuous data
values: [val1, val2, valn]- specify each valid value
- indicates discrete data
values: {decode1: val1, decode22: val2}- put the decode into the specification of values using a default yaml map structure (the decode is to the left of the
:) - Note the curly brackets, not square brackets
- put the decode into the specification of values using a default yaml map structure (the decode is to the left of the
decode: [decode1, decode2, decode3]- separate the
decodefrom thevaluesspecification - see example below for clearer way to input very long decodes
- separate the
longvalues: true- print the values in a (long)
yaml-formatted list
- print the values in a (long)
comment: just whatever you want to saycomment: > say something on multiple lines of textsource: ADSL.xpt- where the data came from
- include both the sdtm domain and variable name
about: [short-name, unit]- this is a convenience structure
label: a label for the column; the label must be 40 or fewer characters and will get written into the define file as well as the data frame prior to writing out to sas xport formatlong: a longer name to describe the columndots:- a named list of whatever you want to carry along in the object; the
dotslist isn’t used by any rendering function in the yspec package, but might be used by a custom rendering function
- a named list of whatever you want to carry along in the object; the
axis:- a short-ish name that can be used for axis titles for plots
- generally, don’t include unit; yspec helpers will add that automatically by default
- if
shortwill work for your axis title (as it is … with no modification), yspec will use that if noaxisfield is used
type:- can be
numeric,character, orinteger - this is optional; the default is
numeric
- can be
make_factor: iftrue, then the column will be able to be converted to a factor regardless of whetherdecodeis included or notlookup:- logical; if
truethen the definition for the column is looked up in thelookup_files(specified inSETUP__:) - use the
!lookhandler to indicate lookup
- logical; if
3.5 Namespaces
Namespaces are alternative representation of certain column data fields
unitshortlabellongdecodecomment
You can create namespaces by attaching a .<name> suffix to eligible fields.
For example, we can create a “tex” representation for unit like this
DV:
short: dependent variable
unit: "microgram/mL"
unit.tex: "$\\mu$g/mL"Here, the unit: entry states the value for unit in the base namespace, the default data you get on load. Using unit.tex: introduces an entry for the tex namespace. After loading the spec, you can change to this namespace using
spec <- ys_load(...)
spec_tex <- ys_namespace(spec, "tex") Any time you attach a .<name> suffix to a field, yspec will interpret that as an attempt to enter namespace data. The user is responsible for creating and organizing namespaces and naming them. yspec will create the base namespace. Also, when rendering a data specification document, yspec will attempt to switch to the tex namespace if it exists. Beyond that, yspec is agnostic to the names of the namespaces you create.
As another example, we can have alternate short names depending on whether or not we are using that name to create axis titles for a plot
EGFR:
short: estimated creatinine clearance
short.plot: eGFRor decode
SEX:
values: [0, 1]
decode: [male, female]
decode.letter: [m, f]3.6 Defaults
- If
typeis not given, then it will default tonumeric
3.7 Examples
3.7.1 Continuous values
- The
aboutarray provides a short name and unit - Any time
rangeis given, the data is assumed to be continuous
WT:
about: [weight, kg]
range: [5, 300]This is equivalent to
WT:
short: weight
unit: kg
range: [5,300]3.7.2 Character data
- Using
valuesindicates discrete data
RACE:
values: [White, Black, Native American, Other]Any other array input structure can be used. For example
RACE:
values:
- White
- Black
- Native American
- OtherBy default, values are printed as comma-separated list. To get them to print in long format
RACE:
values: [White, Black, Native American, Other]
longvalues: true3.7.3 Discrete data with decode
Method 1
- Notice that the yaml key can only be simple character value
- Also, we use curly braces to specify a list like this
- Finally it is a
:that separates decode (on the left) and the value (on the right).
SEX:
values: {dude: 0, gal: 1}Special handlers are available that add some flexibility to this value / decode specification.
The !value:decode handler allows you to put the value on the left and decode on the right
SEX:
values: !value:decode
0 : dude
1 : galThe default behavior can be achieved with
SEX:
values: !value:decode
dude: 0
gal: 1The handlers also allow associating multiple values with a single decode
To get multiple values with the same decode
STUDY:
values: !decode:value
phase 1 : [101, 102, 103]
phase 2 : [201, 202, 203]
phase 3 : [301, 302, 303]Method 2
- These are more complicated decodes
- Put the
valuesanddecodein brackets (array)
BQL:
values: [0,1]
decode: [not below quantitation limit, below quantitation limit]Method 3 Really, it’s the same as method 2, but easier to type and read when the decode gets really long
BQL:
values: [0, 1]
decode:
- not below the quantitation limit of 2 ng/ml
- below the quantitation limit of 2 ng/ml3.7.4 Look up column definition
Either fill in the lookup field or use the !look handler
CMT:
lookup: trueCMT: !lookYou can also give the column name to import
HT:
lookup: HT_INCHESIn this example, there would be a column called HT_INCHES in the lookup file that would be imported under the name HT.
3.7.5 Include TeX in data specification document
Most define documents get rendered via xtable and the text gets processed by a sanitize function. yspec implements a custom sanitize function called ys_sanitize(), which is similar to xtable::sanitize, but whitelists some symbols so they do not get sanitized.
To protect TeX code from the sanitizer, first create a field in SETUP__ called glue with a map between a name and some corresponding TeX code. In the following example, we with to write \(\mu\)g/L, so we create a name called mugL and map it to $\\mu$g/L:
SETUP__:
glue: {mugL: "$\\mu$g/L"}Once the map is in place, we can write the data set column definition like this:
DV:
unit: "<<mugL>>"When the table for the define document is rendered, first the sanitizer will run, but it won’t find anything in the unit field for the DV column. Then yspec will call glue() and replace <<mugL>> with $\\mu%g/L.
Notice that we put all of the values in quotes; this is good practice to ensure that yaml will parse the value as a character data item when reading in the spec.
3.7.6 flags
The flags section in SETUP__: is available for you to name sets of columns in the work in spec. For example, the following code defines a flag called covariate and it names three columns (WT, AGE, and CRCL) to carry this tag
SETUP__:
flags:
covariate: [WT, AGE, CRCL]When yspec loads a yaml file that contains flags, it will go into every column in the spec and add a logical flag in dots that indicates whether or not that column is a member of that covariate set. For this example, all columns in the spec will have dots$covariate set to FALSE except for WT, AGE, and CRCL where it will be set to TRUE.
The user can appear to this information when filtering the spec. Filtering like this will return a yspec object containing only WT, AGE, and CRCL.
ys_filter(spec, covariate)Note that this flagging process will not overwrite a flag that the user already set in a specific column. In this example, AGE will not be flagged as a covariate, but WT and CRCL will.
SETUP__:
flags:
covariate: [WT, AGE, CRCL]
WT:
short: weight
AGE:
short: age
dots: {covariate: false}
CRCL:
short: creatinine clearanceIt’s recommended that flags are given in the SETUP__ information only, but the user can override as needed.