
A stream is a list that pre-specifies the output file names, replicate numbers, and possibly input objects for a simulation. Passing locker initiates a call to setup_locker(), which sets up or resets the output directories. It is the user's responsibility to use the features provided by paquet to keep outputs stored in locker space safe.

Usage

new_stream(x, ...)

# S3 method for list
new_stream(x, locker = NULL, format = NULL, ask = FALSE, noreset = FALSE, ...)

# S3 method for data.frame
new_stream(
  x,
  nchunk,
  cols = "ID",
  locker = NULL,
  format = NULL,
  ask = FALSE,
  noreset = FALSE,
  ...
)

# S3 method for numeric
new_stream(x, ...)

# S3 method for character
new_stream(x, ...)

Arguments

x

A list or vector to template the stream; for the numeric method, passing a single number will fill x with a sequence of that length.

...

Additional arguments passed to file_set().

locker

Passed to setup_locker() as dir. Note that the directory will be unlinked if it already exists and is an established locker directory.

format

Passed to format_stream().

ask

If TRUE, then config_locker() will be called on the locker space. Once this is called, all future attempts to reset the locker contents will require user confirmation via utils::askYesNo(). The ask requirement can be revoked by calling config_locker().

noreset

If TRUE, then config_locker() will be called on the locker directory with noreset = TRUE to prevent future resets. Note that this is essentially a dead end: there is no way to make the locker space writable again using the public API. Use this option if you really want to safeguard the output and assume complete control over the fate of these files.

nchunk

The number of chunks.

cols

The name(s) of the column(s) specifying unique IDs to use to split the data.frame into chunks; this could be a unique ID or a combination of columns that when pasted together form a unique ID.

Value

A list with the following elements:

  • i the position number

  • file the output file name

  • x the input object.

The list has class file_stream as well as locker_stream (if locker was passed) and a class attribute for the output if format was passed.
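
The i / file / x layout described above lends itself to a simple loop in which each element carries both its input and its destination. The following is a minimal base R sketch of that idea, not the package's own code: mock_stream is a hand-built stand-in with the same element names, and run_one() and the .rds file names are hypothetical.

```r
# Hand-built stand-in with the same i / file / x shape as a stream
# (assumption: a real stream would come from new_stream()).
out_dir <- tempfile("stream-sketch-")
dir.create(out_dir)
mock_stream <- lapply(1:3, function(i) {
  list(i = i, file = file.path(out_dir, paste0(i, "-3.rds")), x = i)
})

# Hypothetical worker: build a result from the input object and
# save it to the pre-specified output file.
run_one <- function(item) {
  result <- data.frame(rep = item$i, value = item$x * 10)
  saveRDS(result, item$file)
  item$file
}

# Process every element; each call knows where its output belongs.
written <- vapply(mock_stream, run_one, character(1))
all(file.exists(written))
```

Because output locations are fixed before any work starts, the same loop could be handed to a parallel apply function without the workers having to coordinate file names.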

Details

All methods contain ask and noreset arguments which get passed to setup_locker(). Set ask to TRUE in order to require confirmation (using utils::askYesNo()) every time the command is run again; set noreset to TRUE to immediately revoke permission to reset the locker space. Be sure to consider using these options to prevent accidentally resetting the locker space.
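
To make the difference between a resettable and a locked directory concrete, here is a base R sketch of one way such a guard could work. It is an illustration only: the marker file names and reset_dir_sketch() are hypothetical, and the mechanism actually used by setup_locker() and config_locker() is not described on this page and may differ.

```r
# Hypothetical marker file names; the package's real bookkeeping may differ.
marker <- ".locker-sketch"
lock   <- ".noreset-sketch"

reset_dir_sketch <- function(dir, noreset = FALSE) {
  if (dir.exists(dir)) {
    # Refuse to touch directories that this function did not create
    if (!file.exists(file.path(dir, marker))) {
      stop("directory exists but is not a sketch locker: ", dir)
    }
    # Refuse if an earlier call disabled resets
    if (file.exists(file.path(dir, lock))) {
      stop("reset has been disabled for: ", dir)
    }
    unlink(dir, recursive = TRUE)
  }
  dir.create(dir, recursive = TRUE)
  file.create(file.path(dir, marker))
  if (noreset) file.create(file.path(dir, lock))
  invisible(dir)
}

d <- tempfile("locker-sketch-")
reset_dir_sketch(d)                       # creates the directory
writeLines("x", file.path(d, "a.txt"))
reset_dir_sketch(d, noreset = TRUE)       # clears a.txt, then locks
locked <- inherits(try(reset_dir_sketch(d), silent = TRUE), "try-error")
```

In this sketch the second call silently discards a.txt, which is the kind of accidental loss that ask and noreset are meant to prevent.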

For the data.frame method, the data are chunked into a list by columns listed in cols. Ideally, this is a single column that operates as a unique ID across the data set and is used by chunk_by_id() to form the chunks. Alternatively, cols can be multiple column names which are pasted together to form a unique ID that is used for splitting via chunk_by_cols().
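
The multi-column case can be pictured with a short base R sketch. This is not the implementation of chunk_by_cols(); chunk_by_key_sketch() is a hypothetical helper that only shows the idea of pasting columns into a key and keeping all rows for a key in the same chunk.

```r
# Hypothetical helper, not part of the package: split a data.frame into
# `nchunk` pieces so that all rows sharing a key stay in the same chunk.
chunk_by_key_sketch <- function(data, cols, nchunk) {
  # Build one key per row by pasting the ID columns together
  key <- do.call(paste, c(unname(as.list(data[cols])), sep = "-"))
  ukey <- unique(key)
  # Assign each unique key to one of nchunk contiguous groups
  group <- ceiling(seq_along(ukey) * nchunk / length(ukey))
  # Map each row to its key's group, then split the rows
  unname(split(data, group[match(key, ukey)]))
}

df <- data.frame(
  STUDY = c(1, 1, 1, 2, 2, 2),
  ID    = c(1, 1, 2, 1, 2, 2)
)
chunks <- chunk_by_key_sketch(df, cols = c("STUDY", "ID"), nchunk = 2)
length(chunks)
```

Here ID alone is not unique across studies, so STUDY and ID are combined; the four resulting keys are divided evenly between the two chunks.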

Examples

x <- new_stream(3)
x[[1]]
#> $i
#> [1] 1
#> 
#> $file
#> [1] "1-3"
#> 
#> $x
#> [1] 1
#> 
#> attr(,"file_set_item")
#> [1] TRUE

new_stream(2, locker = file.path(tempdir(), "foo"))
#> [[1]]
#> [[1]]$i
#> [1] 1
#> 
#> [[1]]$file
#> [1] "/var/folders/5w/2ky5lwcj1zq7kyk4c3zg3zpw0000gp/T//Rtmpx1IZit/foo/1-2"
#> 
#> [[1]]$x
#> [1] 1
#> 
#> attr(,"file_set_item")
#> [1] TRUE
#> 
#> [[2]]
#> [[2]]$i
#> [1] 2
#> 
#> [[2]]$file
#> [1] "/var/folders/5w/2ky5lwcj1zq7kyk4c3zg3zpw0000gp/T//Rtmpx1IZit/foo/2-2"
#> 
#> [[2]]$x
#> [1] 2
#> 
#> attr(,"file_set_item")
#> [1] TRUE
#> 
#> attr(,"class")
#> [1] "file_stream"   "locker_stream" "list"         

df <- data.frame(ID = c(1,2,3,4))
x <- new_stream(df, nchunk = 2)
x[[2]]
#> $i
#> [1] 2
#> 
#> $file
#> [1] "2-2"
#> 
#> $x
#>   ID
#> 3  3
#> 4  4
#> 
#> attr(,"file_set_item")
#> [1] TRUE

format_is_set(x[[2]])
#> [1] FALSE

x <- new_stream(3, format = "fst")
format_is_set(x[[2]])
#> [1] TRUE