Cleans data by:
- formatting the cameras and records tables: casting the specified columns
- splitting the records data into records and cameras tables (if needed)
- if only_shared_cam is TRUE: selecting the subset of cameras present in both the records and cameras tables
- if reorder is TRUE: moving the columns in rec_type and cam_type to the beginning of the table.
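The splitting and shared-camera steps can be sketched in base R. This is only an illustration on hypothetical toy tables, not the function's actual implementation; in particular, keeping the camera identifier in the records table after the split is an assumption.

```r
# Hypothetical toy records table that also carries camera-level columns
records <- data.frame(species = c("pigeon", "mouse", "pigeon", "mouse"),
                      camera  = c("A", "A", "B", "D"),
                      lat     = c(20.12, 20.12, 20.22, 21.00))
cam_col  <- "camera"            # column identifying the camera
cam_cols <- c("camera", "lat")  # columns to move to the cameras table

# Split: build a cameras table with one row per distinct camera
cameras_split <- unique(records[, cam_cols, drop = FALSE])
rownames(cameras_split) <- NULL

# Assumption: records keep the camera identifier but lose the other camera columns
records_split <- records[, setdiff(names(records), setdiff(cam_cols, cam_col)),
                         drop = FALSE]

# Shared cameras: compare records with a separately supplied cameras table
deployments <- data.frame(camera = c("A", "B", "C"),
                          lon    = c(33.44, 33.45, 33.42))
shared <- intersect(records_split[[cam_col]], deployments[[cam_col]])

# Keep only the rows whose camera appears in both tables
records_shared     <- records_split[records_split[[cam_col]] %in% shared, , drop = FALSE]
deployments_shared <- deployments[deployments[[cam_col]] %in% shared, , drop = FALSE]
```

Here camera D (records only) and camera C (cameras only) are dropped, which is the behaviour the only_shared_cam option describes.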
Arguments
- dat
The data to clean. It can be either a list with one component $data or a datapackage object (inheriting list). Either way, the data are in the $data slot with two components: $deployments (cameras table) and $observations (records table).
- cam_type
A named list containing the name of the function used to cast types for the cameras table. It is used only if split = FALSE. If NULL, the cameras table will not be modified and its columns will not be reordered. The list's names are the names of the columns to cast in dat$data$deployments. For details on the content of this list, see the documentation of the cast_columns function.
- rec_type
A named list containing the name of the function used to cast types for the records table. If split = TRUE, the type conversion is performed before the split, so columns that will end up in the cameras table and need casting should be included in this list. If NULL, the records table will not be modified and its columns will not be reordered. The list's names are the names of the columns to cast in dat$data$observations. For details on the content of this list, see the documentation of the cast_columns function.
- only_shared_cam
Logical; should the final data be restricted to the cameras that are present in both dat$data$deployments and dat$data$observations?
- cam_col_dfrec
Name of the column containing camera names in the records table (needed only if split or only_shared_cam is TRUE).
- cam_col_dfcam
Name of the column containing camera names in the cameras table (needed only if only_shared_cam is TRUE). If NULL, it is assumed to be the same as cam_col_dfrec.
- split
Logical; should the camera data be extracted from the records table by splitting the data?
- cam_cols
A character vector of the columns in dfrec (the records table) that should be moved to the dat$data$deployments dataframe if split = TRUE.
- reorder
Logical; should the columns listed in cam_type or rec_type be moved to the beginning of the table?
- add_rowid
Logical; should row IDs be added to the observations dataframe? If yes, row names of the form "ID_xx" are added to the dataframe.
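To make the shape of the cam_type and rec_type lists more concrete, here is a minimal base R sketch of how such a specification could be applied and how the cast columns could then be moved to the front. The helper cast_one and the toy setup column are hypothetical; the real casting is done by cast_columns, whose documentation remains the reference.

```r
# Hypothetical helper: apply one cast specification to a vector.
# `spec` is either a function name (e.g. "as.numeric") or a list whose first
# element is the function name and whose other elements are extra arguments.
cast_one <- function(x, spec) {
  if (is.list(spec)) {
    fun <- match.fun(spec[[1]])
    do.call(fun, c(list(x), spec[-1]))
  } else {
    match.fun(spec)(x)
  }
}

# Toy cameras table with everything stored as character
cameras <- data.frame(camera = c("A", "B"),
                      lat    = c("20.12", "20.22"),
                      setup  = c("2022-01-01", "2022-01-05"))

# Names are the columns to cast; values say how to cast them
cam_type <- list(lat   = "as.numeric",
                 setup = list("as.Date", format = "%Y-%m-%d"))

# Cast each listed column in place
for (col in names(cam_type)) {
  cameras[[col]] <- cast_one(cameras[[col]], cam_type[[col]])
}

# Reordering: listed columns first, remaining columns in their original order
front   <- names(cam_type)
cameras <- cameras[, c(front, setdiff(names(cameras), front)), drop = FALSE]
```

Using function names rather than function objects keeps the specification easy to write by hand; extra arguments such as format ride along in the inner list.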
Value
An object of the same type as the original input, but where dat$data$deployments and dat$data$observations have been cleaned as described above.
Examples
# Create synthetic dataset
records <- data.frame(foo = 1:6,
                      species = c("pigeon", "mouse", "pigeon", "mouse", "mouse", "pigeon"),
                      date = c("2022-01-01", "2022-03-01",
                               "2022-01-02", "2022-01-12", "2022-01-22",
                               "2022-01-03"),
                      time = c("10:22:01", "22:12:01",
                               "11:54:33", "07:14:38", "18:01:34",
                               "12:11:34"),
                      camera = c("A", "A", "B", "B", "B", "C"))
cameras <- data.frame(camera = c("A", "B", "C"),
                      lat = c("20.12", "20.22", "22.34"),
                      lon = c("33.44", "33.45", "33.42"))
dat <- list(data = list(observations = records,
                        deployments = cameras))
# Define how columns should be cast in each table
rec_type <- list(date = list("as.Date",
                             format = "%Y-%m-%d"),
                 time = "times")
cam_type <- list(lat = "as.numeric",
                 lon = "as.numeric")
# clean_data() converts columns to the appropriate types
# (columns are moved to the front only when reorder = TRUE)
clean_data(dat,
           rec_type = rec_type,
           cam_type = cam_type)
#> $data
#> $data$observations
#>   foo species       date     time camera
#> 1   1  pigeon 2022-01-01 10:22:01      A
#> 2   2   mouse 2022-03-01 22:12:01      A
#> 3   3  pigeon 2022-01-02 11:54:33      B
#> 4   4   mouse 2022-01-12 07:14:38      B
#> 5   5   mouse 2022-01-22 18:01:34      B
#> 6   6  pigeon 2022-01-03 12:11:34      C
#>
#> $data$deployments
#>   camera   lat   lon
#> 1      A 20.12 33.44
#> 2      B 20.22 33.45
#> 3      C 22.34 33.42
#>
#>