Cleans data by:
- formatting the cameras and records tables: casting specified columns
- splitting the records data into records and cameras (if needed)
- if only_shared_cam is TRUE: selecting the subset of cameras present in both the records and cameras tables
- if reorder is TRUE: moving the columns in rec_type and cam_type to the beginning of the table.
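The splitting, shared-camera and reordering steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy tables, the column names `camera` and `lat`, and the choice to keep the camera name in the records table after the split are all made up for the example.

```r
# Toy records table that still contains a camera-level column (hypothetical data)
records <- data.frame(species = c("pigeon", "mouse", "pigeon"),
                      camera  = c("A", "A", "B"),
                      lat     = c("20.12", "20.12", "20.22"),
                      stringsAsFactors = FALSE)

# Split: camera-level columns go to a cameras table, one row per camera
cam_cols <- c("camera", "lat")
cameras <- unique(records[, cam_cols, drop = FALSE])
rownames(cameras) <- NULL
# Assumption: the camera name stays in records so the two tables remain linked
records <- records[, setdiff(names(records), "lat"), drop = FALSE]

# Add a camera that has no records, to make the next step visible
cameras <- rbind(cameras,
                 data.frame(camera = "Z", lat = "0", stringsAsFactors = FALSE))

# Shared cameras: keep only cameras present in both tables
shared <- intersect(records$camera, cameras$camera)
records <- records[records$camera %in% shared, , drop = FALSE]
cameras <- cameras[cameras$camera %in% shared, , drop = FALSE]

# Reorder: move selected columns to the beginning of the table
first <- "camera"
records <- records[, c(first, setdiff(names(records), first)), drop = FALSE]
```

In this sketch, camera "Z" is dropped because it has no matching records, and `camera` becomes the first column of `records`.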
Arguments
- dat
The data to clean. It can be either a list with one component $data or a datapackage object (inheriting list). Either way, the data are in the $data slot with two components: $deployments (cameras table) and $observations (records table).
- cam_type
A named list containing the name of the function to cast types for the cameras table. It is used only if split = FALSE. If NULL, the cameras table will not be modified or its columns reordered. The list's names are the names of the columns to cast in dat$data$deployments. For details on the content of this list, see the documentation of the cast_columns function.
- rec_type
A named list containing the name of the function to cast types for the records table. If split = TRUE, the type conversion is performed before the split, so columns that will end up in the cameras table and need casting should also be in this list. If NULL, the records table will not be modified or its columns reordered. The list's names are the names of the columns to cast in dat$data$observations. For details on the content of this list, see the documentation of the cast_columns function.
- only_shared_cam
Logical; restrict final data to shared cameras that are in dat$data$deployments and in dat$data$observations?
- cam_col_dfrec
Name of the column with camera names in the records table (needed only if split or only_shared_cam is TRUE).
- cam_col_dfcam
Name of the column with camera names in the cameras table (needed only if only_shared_cam is TRUE). Defaults to cam_col_dfrec if only_shared_cam is TRUE. If NULL, it is assumed to be the same as cam_col_dfrec.
- split
Logical; should the camera data be extracted from the records table by splitting the data?
- cam_cols
A character vector of the columns in dfrec that should be moved to the dat$data$deployments dataframe if split = TRUE.
- reorder
Move the columns indicated in cam_type or rec_type to the beginning of the table?
- add_rowid
Should row IDs be added to the observations dataframe? If yes, row names in the form of "ID_xx" are added to the dataframe.
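As a rough illustration of how a type specification such as rec_type or cam_type can be read, the sketch below applies a named list of function names to the matching columns with `do.call`. The helper `cast_with_spec` and the toy `obs` table are hypothetical and written only for this example; the package's own cast_columns function may behave differently. The row ID numbering (starting at 1) is also an assumption.

```r
# Hypothetical helper: apply a named list of casting specifications to a data frame.
# Each element is either a function name, or a list whose first element is the
# function name and whose remaining elements are extra arguments for that function.
cast_with_spec <- function(df, spec) {
  for (col in names(spec)) {
    s <- spec[[col]]
    if (is.list(s)) {
      fun <- s[[1]]
      extra <- s[-1]
    } else {
      fun <- s
      extra <- list()
    }
    # Call the function on the column, passing any extra arguments
    df[[col]] <- do.call(fun, c(list(df[[col]]), extra))
  }
  df
}

# Toy observations table with everything stored as character (hypothetical data)
obs <- data.frame(species = c("pigeon", "mouse"),
                  count = c("1", "2"),
                  date = c("2022-01-01", "2022-01-05"),
                  stringsAsFactors = FALSE)

spec <- list(count = "as.integer",
             date = list("as.Date", format = "%Y-%m-%d"))
obs <- cast_with_spec(obs, spec)
str(obs)

# Row IDs of the form "ID_xx" can be built with paste0 (assumed numbering from 1)
rownames(obs) <- paste0("ID_", seq_len(nrow(obs)))
```

Columns not named in the specification (here `species`) are left untouched, which mirrors the documented behaviour that only listed columns are cast.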
Value
An object of the same type as the original input,
but where dat$data$deployments and dat$data$observations have been
cleaned as described above.
Examples
# Create synthetic dataset
records <- data.frame(foo = 1:6,
                      species = c("pigeon", "mouse", "pigeon", "mouse", "mouse", "pigeon"),
                      date = c("2022-01-01", "2022-03-01",
                               "2022-01-02", "2022-01-12", "2022-01-22",
                               "2022-01-03"),
                      time = c("10:22:01", "22:12:01",
                               "11:54:33", "07:14:38", "18:01:34",
                               "12:11:34"),
                      camera = c("A", "A", "B", "B", "B", "C"))
cameras <- data.frame(camera = c("A", "B", "C"),
                      lat = c("20.12", "20.22", "22.34"),
                      lon = c("33.44", "33.45", "33.42"))
dat <- list(data = list(observations = records,
                        deployments = cameras))
# Define the type conversions for each table
rec_type <- list(date = list("as.Date",
                             format = "%Y-%m-%d"),
                 time = "times")
cam_type <- list(lat = "as.numeric",
                 lon = "as.numeric")
# clean_data() converts columns to the appropriate types
# (columns are moved to the beginning only if reorder is TRUE)
clean_data(dat,
           rec_type = rec_type,
           cam_type = cam_type)
#> $data
#> $data$observations
#>   foo species       date     time camera
#> 1   1  pigeon 2022-01-01 10:22:01      A
#> 2   2   mouse 2022-03-01 22:12:01      A
#> 3   3  pigeon 2022-01-02 11:54:33      B
#> 4   4   mouse 2022-01-12 07:14:38      B
#> 5   5   mouse 2022-01-22 18:01:34      B
#> 6   6  pigeon 2022-01-03 12:11:34      C
#>
#> $data$deployments
#>   camera   lat   lon
#> 1      A 20.12 33.44
#> 2      B 20.22 33.45
#> 3      C 22.34 33.42
#>
#>
