By stream we mean a list that pre-specifies the output file names,
replicate numbers and possibly input objects for a simulation. Passing
locker initiates a call to setup_locker(), which sets up or resets
the output directories. It is the responsibility of the user to take
advantage of the features provided by paquet to ensure the safety of outputs
stored in locker space.
Usage
new_stream(x, ...)
# S3 method for list
new_stream(x, locker = NULL, format = NULL, ask = FALSE, noreset = FALSE, ...)
# S3 method for data.frame
new_stream(
x,
nchunk,
cols = "ID",
locker = NULL,
format = NULL,
ask = FALSE,
noreset = FALSE,
...
)
# S3 method for numeric
new_stream(x, ...)
# S3 method for character
new_stream(x, ...)
Arguments
- x
A list or vector to template the stream; for the numeric method, passing a single number will fill x with a sequence of that length.
- ...
Additional arguments passed to file_set().
- locker
Passed to setup_locker() as dir; important to note that the directory will be unlinked if it exists and is an established locker directory.
- format
Passed to format_stream().
- ask
If TRUE, then config_locker() will be called on the locker space; once this is called, all future attempts to reset the locker contents will require user confirmation via utils::askYesNo(); the ask requirement can be revoked by calling config_locker().
- noreset
If TRUE then config_locker() will be called on the locker directory with noreset = TRUE to prevent future resets; note that this is essentially a dead end; there is no way to make the locker space writable using public API; use this option if you really want to safeguard the output and assume complete control over the fate of these files.
- nchunk
The number of chunks.
- cols
The name(s) of the column(s) specifying unique IDs to use to split the data.frame into chunks; this could be a unique ID or a combination of columns that when pasted together form a unique ID.
Value
A list with the following elements:
- i: the position number
- file: the output file name
- x: the input object
The list has class file_stream as well as locker_stream (if locker was
passed) and a class attribute for the output if format was passed.
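To make the shape of the return value concrete, here is a minimal base R sketch that walks over a stream-like list. The object mock_stream is a hand-built stand-in whose element names (i, file, x) follow the description above; it is not created by new_stream() and carries none of the classes or attributes mentioned here. The multiplication is only a placeholder for a real simulation step.

```r
# Hand-built stand-in for a stream of length 3 (assumption: element names
# i, file and x as described above; NOT created by new_stream()).
mock_stream <- lapply(1:3, function(i) {
  list(i = i, file = paste0(i, "-3"), x = i)
})

# Collect the pre-specified output file name from each position.
files <- vapply(mock_stream, function(s) s$file, character(1))

# Apply a placeholder "simulation" to the input object at each position.
results <- lapply(mock_stream, function(s) s$x * 10)
```

Because each position carries its own file name and input, the worker function only needs the single list element it is given.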
Details
All methods contain ask and noreset arguments which get passed to
setup_locker(). Set ask to TRUE in order to require confirmation
(using utils::askYesNo()) every time the command is run again; set
noreset to TRUE to immediately revoke permission to reset the locker
space. Be sure to consider using these options to prevent accidentally
resetting the locker space.
For the data.frame method, the data are chunked into a list by columns
listed in cols. Ideally, this is a single column that operates as
a unique ID across the data set and is used by chunk_by_id() to
form the chunks. Alternatively, cols can be multiple column names which
are pasted together to form a unique ID that is used for splitting
via chunk_by_cols().
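The idea of pasting several columns into one unique ID and splitting on it can be illustrated in base R. This is only a sketch of the concept under assumed data; it is not how chunk_by_id() or chunk_by_cols() are implemented, and the column names STUDY and TIME as well as the use of cut() to assign groups are choices made for this example.

```r
# Hypothetical data: STUDY and ID together identify one unit.
df <- data.frame(
  STUDY = c(1, 1, 1, 1, 2, 2, 2, 2),
  ID    = c(1, 1, 2, 2, 1, 1, 2, 2),
  TIME  = c(0, 1, 0, 1, 0, 1, 0, 1)
)

# Paste the columns together to form one key per row.
key <- paste(df$STUDY, df$ID, sep = "-")

# Assign each unique key, in order of appearance, to one of nchunk groups.
nchunk <- 2
ukey <- unique(key)
group <- cut(seq_along(ukey), breaks = nchunk, labels = FALSE)

# Split the rows so that all rows sharing a key land in the same chunk.
chunks <- split(df, group[match(key, ukey)])
```

Splitting on the unique keys rather than on row positions keeps every record for a given unit in a single chunk.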
Examples
x <- new_stream(3)
x[[1]]
#> $i
#> [1] 1
#>
#> $file
#> [1] "1-3"
#>
#> $x
#> [1] 1
#>
#> attr(,"file_set_item")
#> [1] TRUE
new_stream(2, locker = file.path(tempdir(), "foo"))
#> [[1]]
#> [[1]]$i
#> [1] 1
#>
#> [[1]]$file
#> [1] "/var/folders/5w/2ky5lwcj1zq7kyk4c3zg3zpw0000gp/T//Rtmpx1IZit/foo/1-2"
#>
#> [[1]]$x
#> [1] 1
#>
#> attr(,"file_set_item")
#> [1] TRUE
#>
#> [[2]]
#> [[2]]$i
#> [1] 2
#>
#> [[2]]$file
#> [1] "/var/folders/5w/2ky5lwcj1zq7kyk4c3zg3zpw0000gp/T//Rtmpx1IZit/foo/2-2"
#>
#> [[2]]$x
#> [1] 2
#>
#> attr(,"file_set_item")
#> [1] TRUE
#>
#> attr(,"class")
#> [1] "file_stream" "locker_stream" "list"
df <- data.frame(ID = c(1,2,3,4))
x <- new_stream(df, nchunk = 2)
x[[2]]
#> $i
#> [1] 2
#>
#> $file
#> [1] "2-2"
#>
#> $x
#> ID
#> 3 3
#> 4 4
#>
#> attr(,"file_set_item")
#> [1] TRUE
format_is_set(x[[2]])
#> [1] FALSE
x <- new_stream(3, format = "fst")
format_is_set(x[[2]])
#> [1] TRUE