This is a vignette to help you become familiar with using
assign_id(). Check out the
function documentation for additional background and
examples.
assign_id() helps you assign a unique numeric identifier
to each subject in your data set. Its purpose is to ensure the same
numeric identifier is consistently assigned to each subject throughout a
project.
Below, we will go through the two main use cases of
assign_id(): assigning IDs for the first time, and assigning
IDs to match an existing file.
We will use the example derived PK data set from
mrgda.
For the purposes of this vignette, a subject identifier column
USUBJID is created and the ID column is
removed.
When creating a new data set where IDs have not been assigned
previously, assign_id() only requires two arguments: the data set to add an
ID column to (.data) and the name of the column that identifies each subject (.subject_col).
## ┌ ID Summary ───────────────────────────────────────────┐
## │ │
## │ Number of subjects detected and assigned IDs: 178 │
## │ │
## └───────────────────────────────────────────────────────┘
A summary is shown in the R console, detailing how many unique
subjects were assigned an ID. Notice as well that the only change to
pk_data is one additional column.
## [1] "Size of pk_data 2534 x 28"
## [1] "Size of data_w_id 2534 x 29"
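To make the first-time case concrete, here is a minimal base R sketch of the idea. It is not mrgda's implementation: the helper name `sketch_first_ids()` and the toy data are hypothetical, and numbering subjects in order of first appearance is an assumption made only for illustration.

```r
# Hypothetical helper, not part of mrgda: give each distinct subject one
# integer ID, numbered in order of first appearance (an assumption).
sketch_first_ids <- function(.data, .subject_col) {
  subjects <- .data[[.subject_col]]
  .data$ID <- match(subjects, unique(subjects))
  .data
}

# Toy data with repeated rows per subject, as in a PK data set
toy <- data.frame(
  USUBJID = c("S-1", "S-1", "S-2", "S-3", "S-2"),
  DV = c(0.5, 1.2, 0.7, 0.9, 1.1)
)

toy_w_id <- sketch_first_ids(toy, "USUBJID")

# Same number of rows, exactly one additional column
dim(toy)
dim(toy_w_id)
toy_w_id$ID
```

As with pk_data and data_w_id, the only difference between `toy` and `toy_w_id` is the added ID column, and every row belonging to the same subject carries the same ID.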
When data sets are updated or new subsets are created,
assign_id() ensures the ID mapping is consistent
throughout. To do this, we can leverage the
.previously_derived_path argument.
First we will create a new data set pk_data_reordered
which has the same contents as pk_data but is arranged in a
different order.
## # A tibble: 6 × 1
## USUBJID
## <chr>
## 1 STUDY-X-1
## 2 STUDY-X-10
## 3 STUDY-X-11
## 4 STUDY-X-12
## 5 STUDY-X-13
## 6 STUDY-X-14
## # A tibble: 6 × 1
## USUBJID
## <chr>
## 1 STUDY-X-1
## 2 STUDY-X-2
## 3 STUDY-X-3
## 4 STUDY-X-4
## 5 STUDY-X-5
## 6 STUDY-X-6
Although the subjects now appear in a different order, the mapping
stays consistent when we tell assign_id() that the ID
assignment should match that of data_w_id.
data_w_id_reordered <-
  pk_data_reordered %>%
  assign_id(.previously_derived_path = lookup_path, .subject_col = "USUBJID")
## ┌ ID Summary ───────────────────────────────────────────┐
## │ │
## │ Number of subjects detected and assigned IDs: 178 │
## │ │
## └───────────────────────────────────────────────────────┘
Notice that despite the sorting differences, both
data_w_id and data_w_id_reordered have the
same ID assignment.
## # A tibble: 6 × 2
## USUBJID ID
## <chr> <dbl>
## 1 STUDY-X-1 1
## 2 STUDY-X-2 2
## 3 STUDY-X-3 3
## 4 STUDY-X-4 4
## 5 STUDY-X-5 5
## 6 STUDY-X-6 6
## # A tibble: 6 × 2
## USUBJID ID
## <chr> <int>
## 1 STUDY-X-1 1
## 2 STUDY-X-2 2
## 3 STUDY-X-3 3
## 4 STUDY-X-4 4
## 5 STUDY-X-5 5
## 6 STUDY-X-6 6
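One way to check this kind of agreement programmatically, rather than by eye, is to reduce each data set to its distinct subject-to-ID pairs and compare them. The sketch below uses base R only; the helper name `mapping_of()` and the toy data are hypothetical and not part of mrgda.

```r
# Hypothetical helper: reduce a data set to its distinct subject-to-ID pairs,
# sorted by subject so that row order no longer matters.
mapping_of <- function(.data, .subject_col = "USUBJID") {
  pairs <- unique(.data[, c(.subject_col, "ID")])
  pairs <- pairs[order(pairs[[.subject_col]]), ]
  rownames(pairs) <- NULL
  pairs
}

a <- data.frame(USUBJID = c("S-1", "S-1", "S-2", "S-3"), ID = c(1, 1, 2, 3))

# Same rows in a different order, with ID stored as integer instead of double
b <- a[c(4, 3, 1, 2), ]
b$ID <- as.integer(b$ID)

same_mapping <- isTRUE(all.equal(mapping_of(a), mapping_of(b)))
same_mapping
```

all.equal() compares numeric values rather than storage types, so a `<dbl>` ID column and an `<int>` ID column holding the same values still count as the same mapping.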
If your data contain new subjects that were not present in the previously derived data set, each of them is assigned a new unique ID.
assign_id() does this by finding the maximum ID present
in the previous data and setting all new IDs higher than it. An example
of this is shown below.
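# Previously derived data: four subjects with non-consecutive IDs
old_data <- dplyr::tibble(USUBJID = c("A", "B", "C", "D"), ID = c(1, 2, 4, 19))
# Write the previous data to a temporary CSV file to serve as the lookup
lookup_path <- paste0(tempfile(), ".csv")
mrgda:::write_csv_dots(
  x = old_data,
  file = lookup_path
)
# New data containing only subjects that are absent from old_data
data <- dplyr::tibble(USUBJID = c("E", "F", "G"))
assign_id(.data = data, .subject_col = "USUBJID", .previously_derived_path = lookup_path)
## ┌ ID Summary ─────────────────────────────────────────┐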
## │ │
## │ Number of subjects detected and assigned IDs: 3 │
## │ │
## └─────────────────────────────────────────────────────┘
## # A tibble: 3 × 2
## USUBJID ID
## <chr> <dbl>
## 1 E 20
## 2 F 21
## 3 G 22
Notice that the largest ID in the previous data set was 19. Since
subjects E, F, and G were not
present in old_data, their IDs were all set to be larger
than 19.
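The rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not mrgda's actual code: the helper name `sketch_ids_from_lookup()` is hypothetical, the lookup is passed as a data frame rather than a file path, and new subjects are assumed to be numbered consecutively in order of first appearance.

```r
# Hypothetical helper, not part of mrgda: reuse IDs found in a lookup table
# and number unseen subjects upward from the largest existing ID.
sketch_ids_from_lookup <- function(.data, .subject_col, lookup) {
  subjects <- .data[[.subject_col]]

  # IDs for subjects already in the lookup; NA for subjects not seen before
  ids <- lookup$ID[match(subjects, lookup[[.subject_col]])]
  is_new <- is.na(ids)

  # One new ID per distinct unseen subject, starting just above the old maximum
  new_subjects <- unique(subjects[is_new])
  new_ids <- max(lookup$ID) + seq_along(new_subjects)
  ids[is_new] <- new_ids[match(subjects[is_new], new_subjects)]

  .data$ID <- ids
  .data
}

old_lookup <- data.frame(USUBJID = c("A", "B", "C", "D"), ID = c(1, 2, 4, 19))

# A mix of one known subject (B) and three new ones, with E repeated
new_data <- data.frame(USUBJID = c("B", "E", "F", "E", "G"))

result <- sketch_ids_from_lookup(new_data, "USUBJID", old_lookup)
result$ID
```

In this sketch, subject B keeps its existing ID of 2, while E, F, and G receive 20, 21, and 22. Starting from the maximum rather than filling gaps such as 3 means an ID is never reused for a different subject, even if that ID was dropped from an earlier data set.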