yspec uses standard yaml syntax to state the data set column definitions.
NOTES
- true and yes by themselves will be rendered as TRUE; use "yes" if you need that word by itself as a value for a field
- false and no by themselves will be returned as FALSE; use "no" if you need that word by itself as a value for a field
- use short: "> QL", not short: > QL
- use values: [".", C], not values: [.,C]
- use address: "123 Main St.", not address: 123 Main St.
- use label: "line 1 \\n line 2", not label: line 1 \n line2
Instructions for including TeX in the yaml specification code are provided in a section below.
Save your data specification code in a file, typically with a .yaml file extension.
At the top of the file, include a block called SETUP__:, where the data set metadata is stored. For example:
SETUP__:
  description: PKPD analysis data set
  use_internal_db: true
  projectnumber: FOO123
  sponsor: MetrumRG
See the details below for other fields that can be included here.
Next, list each data set column in order, with the data column name starting in the first column and ending with a colon. For example:
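A minimal sketch of one such column entry, assuming a hypothetical body-weight column named WT; the short name, unit, and range values here are illustrative only:

```yaml
# Column name starts in the first column and ends with a colon
WT:
  short: weight      # short name for the column
  unit: kg           # unit of measure
  range: [40, 150]   # [min-value, max-value]
```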
This specifies a “short” name for this column as well as a unit and a range. A complete listing is provided below.
You can see a fully worked example by running
ys_help$yaml()
See the ?ys_help help topic for more information.
Or, you can export a collection of package assets with this command
ys_help$export(output="assets")
See the ?ys_help topic for more information.
SETUP__ specification fields
- description: <chr>; a short, label-like description of the data set
- projectnumber: <chr>; the project reference number; may be incorporated into rendered define documents; when the project number is given in the first yspec object in a project object, that project number will be rendered in the project-wide define document
- sponsor: <chr>; the project sponsor; when the project sponsor is given in the first yspec object in a project object, that project sponsor name will be rendered in the project-wide define document
- data_path: <chr>; a path locating the data set associated with the spec
- data_stem: <chr>; the stem (no extension) for the data set associated with the spec; usually the stem of the data file is the same as the stem of the spec, but they can also be different
- lookup_file: <chr>; a yaml array of other yaml files where yspec will look for column lookup information
- use_internal_db: <logical> (true/false); if true, then yspec will load the internal column lookup database
- import: <chr>; give the name of a <file> to import into the current data spec; all columns from <file> are imported as is; additional columns may also be listed with the normal syntax and these columns will appear after the imported columns
- character_last: <logical>; if true, automatically push all non-numeric columns to the end of the data specification list
- comment_col: <chr>; identify the column that is used to store comments; the comment column will not be pushed to the back when character_last is true
- glue: <map>; specify name/value pairs; in the yaml data specification, use <<name>> in the text and value will be glued into the text after it has been sanitized; intended use is to allow LaTeX code to evade the sanitizer
- max_nchar_label: integer; the maximum number of characters allowed in the label field
- max_nchar_short: integer; the maximum number of characters allowed in the short field
- max_nchar_col: integer; the maximum number of characters allowed in the data set column name
- flags: <map>; for each key, an array of column names where a logical data item will be set in the dots list

Data column specification fields
- short: short-name
- unit: numeric
- range: [min-value, max-value]
- values: [val1, val2, valn]
- values: {decode1: val1, decode22: val2}; the decode is to the left of the : and the value is to the right
- decode: [decode1, decode2, decode3]; gives the decode separately from the values specification
- longvalues: true; prints the values as a yaml-formatted list
- comment: just whatever you want to say
- comment: > say something on multiple lines of text
- source: ADSL.xpt
- about: [short-name, unit]
- label: a label for the column; the label must be 40 or fewer characters and will get written into the define file as well as the data frame prior to writing out to sas xport format
- long: a longer name to describe the column
- dots: the dots list isn’t used by any rendering function in the yspec package, but might be used by a custom rendering function
- axis: if short will work for your axis title (as it is … with no modification), yspec will use that if no axis field is used
- type: numeric, character, or integer; the default is numeric
- make_factor: if true, then the column will be able to be converted to a factor regardless of whether decode is included or not
- lookup: if true, then the definition for the column is looked up in the lookup_files (specified in SETUP__:); the !look handler can also be used to indicate lookup

Namespaces are alternative representations of certain column data fields:
- unit
- short
- label
- long
- decode
- comment
You can create namespaces by attaching a .<name> suffix to eligible fields.
For example, we can create a “tex” representation for unit like this:
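A minimal sketch, assuming a hypothetical DV column whose unit should render as TeX in define documents; the column name and unit values are illustrative:

```yaml
DV:
  unit: "ug/L"             # value in the base namespace
  unit.tex: "$\\mu$g/L"    # value in the tex namespace
```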
Here, the unit: entry states the value for unit in the base namespace, the default data you get on load. Using unit.tex: introduces an entry for the tex namespace. After loading the spec, you can change to this namespace using
spec <- ys_load(...)
spec_tex <- ys_namespace(spec, "tex")
Any time you attach a .<name> suffix to a field, yspec will interpret that as an attempt to enter namespace data. The user is responsible for creating and organizing namespaces and naming them. yspec will create the base namespace. Also, when rendering a data specification document, yspec will attempt to switch to the tex namespace if it exists. Beyond that, yspec is agnostic to the names of the namespaces you create.
As another example, we can have alternate short names depending on whether or not we are using that name to create axis titles for a plot, or alternate decode values.
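A sketch of both cases, assuming hypothetical namespaces named plot and long; these namespace names, the columns, and the values are made up for illustration:

```yaml
WT:
  short: weight
  short.plot: Weight            # alternate short name, e.g. for axis titles
SEX:
  values: [0, 1]
  decode: [m, f]
  decode.long: [male, female]   # alternate decode
```

Following the pattern shown earlier, ys_namespace(spec, "plot") would then switch to the plot namespace.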
- If type is not given, then it will default to numeric
- The about array provides a short name and unit
- If range is given, the data is assumed to be continuous

This is equivalent to

- values indicates discrete data

Any other array input structure can be used. For example
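The array forms mentioned above can be sketched as follows; the columns WT and SEX and their values are hypothetical. Flow style (brackets) and block style (one item per line) are two standard yaml spellings of the same array:

```yaml
WT:
  about: [weight, kg]   # short name, then unit
  range: [40, 150]      # continuous data
SEX:
  values:               # discrete data, written as a block-style array
    - 0
    - 1
```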
By default, values are printed as a comma-separated list. To get them to print in long format, set longvalues: true.
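A minimal sketch using the longvalues field from the column field listing, with a hypothetical STUDY column:

```yaml
STUDY:
  values: [101, 102, 201]
  longvalues: true   # print values as a yaml-formatted list
```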
Method 1: give values as a map, using a : that separates the decode (on the left) and the value (on the right).
Special handlers are available that add some flexibility to this value / decode specification.
The !value:decode handler allows you to put the value on the left and the decode on the right
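For illustration, a hypothetical SEX column written with this handler, so that each value sits to the left of the colon and its decode to the right:

```yaml
SEX:
  values: !value:decode
    0: male
    1: female
```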
The default behavior can be achieved with the !decode:value handler
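A sketch of the same hypothetical SEX column with the !decode:value handler (the handler name is taken from the STUDY example in this section), where the decode is on the left and the value on the right:

```yaml
SEX:
  values: !decode:value
    male: 0
    female: 1
```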
The handlers also allow associating multiple values with a single decode:
STUDY:
  values: !decode:value
    phase 1 : [101, 102, 103]
    phase 2 : [201, 202, 203]
    phase 3 : [301, 302, 303]
Method 2: give values and decode in brackets (array).
Method 3: really, it’s the same as method 2, but easier to type and read when the decode gets really long.
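Methods 2 and 3 might look like the following sketch; the columns and values are hypothetical, and reading method 3 as a block-style decode array is an assumption based on the description above:

```yaml
# Method 2: values and decode as bracketed arrays
SEX:
  values: [0, 1]
  decode: [male, female]

# Method 3: same structure, with decode written one item per line
RF:
  values: [0, 1, 2]
  decode:
    - normal renal function
    - mild renal impairment
    - moderate renal impairment
```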
Either fill in the lookup field or use the !look handler.
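A minimal sketch of the two options, assuming hypothetical columns WT and AGE that are defined in a lookup file; the exact spelling of the !look form is an assumption:

```yaml
WT:
  lookup: true   # fill in the lookup field
AGE: !look       # or tag the column with the !look handler
```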
You can also give the column name to import
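A sketch of what this could look like, assuming that the lookup field accepts the name of the column in the lookup file (an assumption, not confirmed by the text here):

```yaml
HT:
  lookup: HT_INCHES   # import the HT_INCHES definition under the name HT
```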
In this example, there would be a column called HT_INCHES in the lookup file that would be imported under the name HT.
Most define documents get rendered via xtable and the text gets processed by a sanitize function. yspec implements a custom sanitize function called ys_sanitize(), which is similar to xtable::sanitize, but whitelists some symbols so they do not get sanitized.
To protect TeX code from the sanitizer, first create a field in SETUP__ called glue with a map between a name and some corresponding TeX code. In the following example, we wish to write μg/L, so we create a name called mugL and map it to $\\mu$g/L:
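A sketch of the glue map, following the SETUP__ field listing above:

```yaml
SETUP__:
  glue:
    mugL: "$\\mu$g/L"   # name: TeX code
```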
Once the map is in place, we can write the data set column definition like this:
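A sketch of the corresponding column definition, using the <<name>> syntax described for the glue field; the short name is illustrative:

```yaml
DV:
  short: "concentration"
  unit: "<<mugL>>"   # replaced with the TeX code after sanitizing
```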
When the table for the define document is rendered, first the sanitizer will run, but it won’t find anything in the unit field for the DV column. Then yspec will call glue() and replace <<mugL>> with $\\mu$g/L.
Notice that we put all of the values in quotes; this is good practice to ensure that yaml will parse the value as a character data item when reading in the spec.
flags
The flags section in SETUP__: is available for you to name sets of columns in the working spec. For example, the following code defines a flag called covariate and it names three columns (WT, AGE, and CRCL) to carry this tag:
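A sketch of such a flags block, consistent with the fuller example later in this section:

```yaml
SETUP__:
  flags:
    covariate: [WT, AGE, CRCL]
```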
When yspec loads a yaml file that contains flags, it will go into every column in the spec and add a logical flag in dots that indicates whether or not that column is a member of that covariate set. For this example, all columns in the spec will have dots$covariate set to FALSE except for WT, AGE, and CRCL, where it will be set to TRUE.
The user can appeal to this information when filtering the spec. Filtering like this will return a yspec object containing only WT, AGE, and CRCL:
ys_filter(spec, covariate)
Note that this flagging process will not overwrite a flag that the user already set in a specific column. In this example, AGE will not be flagged as a covariate, but WT and CRCL will.
SETUP__:
  flags:
    covariate: [WT, AGE, CRCL]
WT:
  short: weight
AGE:
  short: age
  dots: {covariate: false}
CRCL:
  short: creatinine clearance
It’s recommended that flags are given in the SETUP__ information only, but the user can override as needed.