
Cleans data by:

  • formatting the cameras and records tables by casting the specified columns

  • splitting the records data into a records table and a cameras table (if needed)

  • if only_shared_cam is TRUE: selecting the subset of cameras present in both the records and cameras tables

  • if reorder is TRUE: moving the columns listed in rec_type and cam_type to the beginning of their table.

Usage

clean_data(
  dat,
  cam_type = NULL,
  rec_type = NULL,
  only_shared_cam = FALSE,
  cam_col_dfrec = NULL,
  cam_col_dfcam = ifelse(only_shared_cam, cam_col_dfrec, NULL),
  split = FALSE,
  cam_cols = ifelse(split, cam_col_dfrec, NULL),
  reorder = FALSE,
  add_rowid = FALSE
)

Arguments

dat

The data to clean. It can be either a list with one component $data or a datapackage object (inheriting list). Either way, the data are in the $data slot with two components:

  • $deployments (cameras table)

  • $observations (records table)

cam_type

A named list giving, for each column to cast in the cameras table, the name of the function used to cast it. The list's names are the names of the columns to cast in dat$data$deployments. It is used only if split = FALSE. If NULL, the cameras table is not modified and its columns are not reordered. For details on the content of this list, see the documentation of the cast_columns function.

rec_type

A named list giving, for each column to cast in the records table, the name of the function used to cast it. The list's names are the names of the columns to cast in dat$data$observations. If split = TRUE, the type conversion is performed before the split, so any columns that will end up in the cameras table and need casting should be listed here. If NULL, the records table is not modified and its columns are not reordered. For details on the content of this list, see the documentation of the cast_columns function.
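To make the named-list format concrete, here is a minimal base-R sketch of how such a list could drive type conversion. The helper cast_sketch is hypothetical and written only for illustration; the package's own cast_columns function may behave differently.

```r
# Hypothetical helper (not part of the package): cast the columns of `df`
# named in `types`. Each element of `types` is either a function name
# (e.g. "as.integer") or a list whose first element is the function name
# and whose remaining elements are extra arguments for that function.
cast_sketch <- function(df, types) {
  for (col in names(types)) {
    spec <- types[[col]]
    if (is.list(spec)) {
      fun <- spec[[1]]
      extra_args <- spec[-1]
    } else {
      fun <- spec
      extra_args <- list()
    }
    # Call the function on the column, passing any extra arguments
    df[[col]] <- do.call(fun, c(list(df[[col]]), extra_args))
  }
  df
}

records <- data.frame(date = c("2022-01-01", "2022-03-01"),
                      count = c("1", "2"),
                      stringsAsFactors = FALSE)
types <- list(date = list("as.Date", format = "%Y-%m-%d"),
              count = "as.integer")
casted <- cast_sketch(records, types)
str(casted)
```

Here the names of types select the columns, and the values say how to convert them, mirroring the rec_type and cam_type lists used in the Examples section.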

only_shared_cam

Logical; restrict the final data to the cameras present in both dat$data$deployments and dat$data$observations?

cam_col_dfrec

Name of the column with camera names in the records table (needed only if split or only_shared_cam is TRUE).

cam_col_dfcam

Name of the column with camera names in the cameras table (needed only if only_shared_cam is TRUE). If only_shared_cam is TRUE, it defaults to cam_col_dfrec; if NULL, it is assumed to be the same as cam_col_dfrec.
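As an illustration of what restricting to shared cameras means, the following base-R sketch keeps only the cameras found in both tables. It is an assumption about the general idea, not the package's implementation, and the toy tables are invented for the example.

```r
# Toy tables: camera "D" appears only in records, camera "C" only in cameras
records <- data.frame(species = c("pigeon", "mouse", "pigeon"),
                      camera = c("A", "B", "D"),
                      stringsAsFactors = FALSE)
cameras <- data.frame(camera = c("A", "B", "C"),
                      lat = c(20.12, 20.22, 22.34),
                      stringsAsFactors = FALSE)

# Column holding camera names in each table
cam_col_dfrec <- "camera"
cam_col_dfcam <- "camera"

# Cameras present in both tables
shared <- intersect(records[[cam_col_dfrec]], cameras[[cam_col_dfcam]])

# Keep only rows that refer to a shared camera
records_shared <- records[records[[cam_col_dfrec]] %in% shared, , drop = FALSE]
cameras_shared <- cameras[cameras[[cam_col_dfcam]] %in% shared, , drop = FALSE]
```

The two column-name variables play the roles of cam_col_dfrec and cam_col_dfcam; they can differ when the tables name the camera column differently.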

split

Logical; should the camera data be extracted from the records table by splitting the data?

cam_cols

A character vector of the columns in the records table that should be moved to the dat$data$deployments data frame if split = TRUE. Defaults to cam_col_dfrec if split is TRUE.
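The split can be pictured with the base-R sketch below, which builds a cameras table from the unique combinations of the camera columns. This is only one plausible reading: whether the package keeps the camera name column in the records table (as done here, so the two tables can still be joined) is an assumption.

```r
# Toy records table that also carries camera-level information
records <- data.frame(species = c("pigeon", "mouse", "pigeon"),
                      camera = c("A", "A", "B"),
                      lat = c(20.12, 20.12, 20.22),
                      stringsAsFactors = FALSE)

cam_col_dfrec <- "camera"        # column with camera names
cam_cols <- c("camera", "lat")   # columns describing cameras

# One row per camera: unique combinations of the camera columns
cameras <- unique(records[, cam_cols, drop = FALSE])
rownames(cameras) <- NULL

# Assumption: the camera name stays in records as a key;
# the other camera columns are dropped from it
drop_cols <- setdiff(cam_cols, cam_col_dfrec)
records_only <- records[, setdiff(names(records), drop_cols), drop = FALSE]
```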

reorder

Logical; move the columns listed in cam_type and rec_type to the beginning of their respective tables?

add_rowid

Logical; should row IDs be added to the observations table? If TRUE, row names of the form "ID_xx" are added to the data frame.
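For reference, row names of this form can be produced in base R as sketched below. The exact numbering scheme used by the package (for example, whether numbers are zero-padded) is not stated here, so the unpadded sequence is an assumption.

```r
observations <- data.frame(species = c("pigeon", "mouse", "pigeon"),
                           stringsAsFactors = FALSE)

# Assumption: IDs are "ID_" followed by the row index, without padding
rownames(observations) <- paste0("ID_", seq_len(nrow(observations)))
observations
```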

Value

An object of the same type as the original input, but where dat$data$deployments and dat$data$observations have been cleaned as described above.

Examples

# Create synthetic dataset
records <- data.frame(foo = 1:6,
                      species = c("pigeon", "mouse", "pigeon", "mouse", "mouse", "pigeon"),
                      date = c("2022-01-01", "2022-03-01", 
                               "2022-01-02", "2022-01-12", "2022-01-22",
                               "2022-01-03"),
                      time = c("10:22:01", "22:12:01",
                               "11:54:33", "07:14:38", "18:01:34", 
                               "12:11:34"),
                      camera = c("A", "A", "B", "B", "B", "C"))
cameras <- data.frame(camera = c("A", "B", "C"),
                      lat = c("20.12", "20.22", "22.34"),
                      lon = c("33.44", "33.45", "33.42"))
dat <- list(data = list(observations = records,
                        deployments = cameras))
                        
# Define the type conversions for each table
rec_type <- list(date = list("as.Date",
                             format = "%Y-%m-%d"),
                 time = "times")
cam_type <- list(lat = "as.numeric",
                 lon = "as.numeric")

# clean_data converts columns to the requested types
# (columns are not reordered here because reorder = FALSE by default)
clean_data(dat,
           rec_type = rec_type,
           cam_type = cam_type)
#> $data
#> $data$observations
#>   foo species       date     time camera
#> 1   1  pigeon 2022-01-01 10:22:01      A
#> 2   2   mouse 2022-03-01 22:12:01      A
#> 3   3  pigeon 2022-01-02 11:54:33      B
#> 4   4   mouse 2022-01-12 07:14:38      B
#> 5   5   mouse 2022-01-22 18:01:34      B
#> 6   6  pigeon 2022-01-03 12:11:34      C
#> 
#> $data$deployments
#>   camera   lat   lon
#> 1      A 20.12 33.44
#> 2      B 20.22 33.45
#> 3      C 22.34 33.42
#> 
#>