This vignette demonstrates how the functions included in this package can be used to read and clean different data formats.

Write data to temporary files

# records and cameras in separate files ------------------------------------------
data(recordTableSample, package = "camtrapR")
data(camtraps, package = "camtrapR")

# Create subfolder
dir.create(paste0(tempdir(), "/csv"))

# Write files
recordfile <- paste0(tempdir(), "/csv/records.csv")
camtrapfile <- paste0(tempdir(), "/csv/camtraps.csv")

write.csv(recordTableSample, recordfile, 
          row.names = FALSE)
write.csv(camtraps, camtrapfile, 
          row.names = FALSE)
# records and cameras in same file ------------------------------------------
# Create file
recordcam <- recordTableSample |>
  dplyr::left_join(camtraps, by = "Station")

# Create subfolder
dir.create(paste0(tempdir(), "/csvcam"))

# Write file
recordcamfile <- paste0(tempdir(), "/csvcam/recordcam.csv")
write.csv(recordcam, recordcamfile, 
          row.names = FALSE)
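The sketch below uses only base R and is independent of this package. It shows why a cleaning step is needed after reading csv files: csv stores everything as text, so a Date column comes back as character. The small data frame is made up for illustration.

```r
# Write a small data frame with a Date column to a temporary csv file
tmp <- tempfile(fileext = ".csv")
df <- data.frame(Station = "StationA",
                 Setup_date = as.Date("2009-04-02"))
write.csv(df, tmp, row.names = FALSE)

# Read it back: the date is stored as text, so it is no longer a Date
back <- read.csv(tmp, stringsAsFactors = FALSE)
class(back$Setup_date)
#> [1] "character"
```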

Records and cameras in separate csv files

First, we see how data import and cleaning are performed with two csv files (one for records, one for cameras):

Read data

dat <- read_data(path_rec = recordfile,
                 path_cam = camtrapfile,
                 sep_rec = ",", sep_cam = ",")
head(dat$data$observations)
Station Species DateTimeOriginal Date Time delta.time.secs delta.time.mins delta.time.hours delta.time.days Directory FileName
StationA PBE 2009-04-21 00:40:00 2009-04-21 00:40:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-21__00-40-00(1).JPG
StationA PBE 2009-04-22 20:19:00 2009-04-22 20:19:00 157140 2619 43.6 1.8 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-22__20-19-00(1).JPG
StationA PBE 2009-04-23 00:07:00 2009-04-23 00:07:00 13560 226 3.8 0.2 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-23__00-07-00(1).JPG
StationA PBE 2009-05-07 17:11:00 2009-05-07 17:11:00 1270920 21182 353.0 14.7 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-05-07__17-11-00(1).JPG
StationA VTA 2009-04-10 05:07:00 2009-04-10 05:07:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-04-10__05-07-00(1).JPG
StationA VTA 2009-05-06 19:06:00 2009-05-06 19:06:00 2296740 38279 638.0 26.6 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-05-06__19-06-00(1).JPG
head(dat$data$deployments)
Station utm_y utm_x Setup_date Retrieval_date Problem1_from Problem1_to
StationA 604000 526000 02/04/2009 14/05/2009 NA NA
StationB 606000 523000 03/04/2009 16/05/2009 NA NA
StationC 607050 525000 04/04/2009 17/05/2009 12/05/2009 17/05/2009

The imported object is a list with one component, $data, containing two data frames:

  • $observations contains the records
  • $deployments contains the camera information

Clean data

This step ensures that columns have the desired type. It also moves these columns to the beginning of the table.

To cast data to the appropriate types, clean_data takes two arguments, created below: rec_type (for the records table) and cam_type (for the cameras table).

rec_type <- list(Station = "as.character",
                 Date = list("as_date",
                             format = "%Y-%m-%d"),
                 Time = "times",
                 DateTimeOriginal = list("as.POSIXct",
                                         tz = "Etc/GMT-8"))

cam_type <- list(Station = "as.character",
                 Setup_date = list("as.Date",
                                   format = "%d/%m/%Y"), 
                 Retrieval_date = list("as.Date",
                                       format = "%d/%m/%Y"))

These lists specify how to convert column types:

  • Values give the casting function to apply (e.g. "as.Date" translates to as.Date(x)). Values can also be lists that provide additional arguments: for instance, list("as.Date", format = "%d/%m/%Y") translates to as.Date(x, format = "%d/%m/%Y").
  • The names of the list give the columns of the data that should be cast.

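To make the translation rule above concrete, here is a base R sketch using do.call. It assumes the rule works as described (first element is the function name, remaining named elements are extra arguments); it is an illustration, not the package's internal code.

```r
# A casting specification in the list form described above
spec <- list("as.Date", format = "%d/%m/%Y")
x <- c("02/04/2009", "14/05/2009")

# First element: function name; remaining named elements: extra arguments
cast <- do.call(spec[[1]], c(list(x), spec[-1]))
cast
#> [1] "2009-04-02" "2009-05-14"

# Same result as writing the call by hand
identical(cast, as.Date(x, format = "%d/%m/%Y"))
#> [1] TRUE
```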
dat_clean <- clean_data(dat, 
                        rec_type = rec_type,
                        cam_type = cam_type)
head(dat_clean$data$observations)
Station Species DateTimeOriginal Date Time delta.time.secs delta.time.mins delta.time.hours delta.time.days Directory FileName
StationA PBE 2009-04-21 00:40:00 2009-04-21 00:40:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-21__00-40-00(1).JPG
StationA PBE 2009-04-22 20:19:00 2009-04-22 20:19:00 157140 2619 43.6 1.8 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-22__20-19-00(1).JPG
StationA PBE 2009-04-23 00:07:00 2009-04-23 00:07:00 13560 226 3.8 0.2 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-23__00-07-00(1).JPG
StationA PBE 2009-05-07 17:11:00 2009-05-07 17:11:00 1270920 21182 353.0 14.7 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-05-07__17-11-00(1).JPG
StationA VTA 2009-04-10 05:07:00 2009-04-10 05:07:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-04-10__05-07-00(1).JPG
StationA VTA 2009-05-06 19:06:00 2009-05-06 19:06:00 2296740 38279 638.0 26.6 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-05-06__19-06-00(1).JPG
head(dat_clean$data$deployments)
Station utm_y utm_x Setup_date Retrieval_date Problem1_from Problem1_to
StationA 604000 526000 2009-04-02 2009-05-14 NA NA
StationB 606000 523000 2009-04-03 2009-05-16 NA NA
StationC 607050 525000 2009-04-04 2009-05-17 12/05/2009 17/05/2009
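In the output above, Setup_date and Retrieval_date are now dates, while Problem1_from and Problem1_to keep their original dd/mm/yyyy text because cam_type does not list them. The base R sketch below reproduces that behaviour on a one-row example; cast_columns is a hypothetical helper written for illustration, not a function of this package.

```r
# Hypothetical helper (not part of the package): cast only the columns
# named in a type list and leave all other columns untouched
cast_columns <- function(df, type_list) {
  for (col in names(type_list)) {
    spec <- type_list[[col]]
    if (!is.list(spec)) spec <- list(spec)
    # First element: function name; remaining named elements: extra arguments
    df[[col]] <- do.call(spec[[1]], c(list(df[[col]]), spec[-1]))
  }
  df
}

deployments <- data.frame(Station = "StationC",
                          Setup_date = "04/04/2009",
                          Problem1_from = "12/05/2009",
                          stringsAsFactors = FALSE)
types <- list(Station = "as.character",
              Setup_date = list("as.Date", format = "%d/%m/%Y"))

out <- cast_columns(deployments, types)
class(out$Setup_date)
#> [1] "Date"
out$Problem1_from
#> [1] "12/05/2009"
```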

If the cameras in the records table and in the cameras table do not match, clean_data can keep only the cameras shared by both tables. To illustrate, we create a new dataset where the observations table has stations A, B and C and the deployments table has stations B, C and D:

# Initialize new data
dat_diffcam <- dat

# Replace a camera in deployments
newcam <- dat_diffcam$data$deployments[1, ]
newcam$Station <- "StationD"

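dat_diffcam$data$deployments <- dat_diffcam$data$deployments |> 
  dplyr::filter(Station != "StationA") |> 
  rbind(newcam)

# Stations in each table
unique(dat_diffcam$data$observations$Station)
#> [1] "StationA" "StationB" "StationC"
unique(dat_diffcam$data$deployments$Station)
#> [1] "StationB" "StationC" "StationD"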

Cleaning the data keeps only the cameras common to both tables (B and C):

dat_diffcam_clean <- clean_data(dat_diffcam, 
                                rec_type = rec_type,
                                cam_type = cam_type,
                                cam_col_dfrec = "Station",
                                cam_col_dfcam = "Station", 
                                only_shared_cam = TRUE)

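# Stations in each table after cleaning
unique(dat_diffcam_clean$data$observations$Station)
#> [1] "StationB" "StationC"
unique(dat_diffcam_clean$data$deployments$Station)
#> [1] "StationB" "StationC"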
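The idea behind only_shared_cam = TRUE can be sketched in base R with intersect() and %in%. This is an illustration on made-up toy tables, not the package's implementation.

```r
# Toy tables (made up for illustration)
obs <- data.frame(Station = c("StationA", "StationB", "StationC", "StationB"),
                  Species = c("PBE", "VTA", "PBE", "PBE"),
                  stringsAsFactors = FALSE)
dep <- data.frame(Station = c("StationB", "StationC", "StationD"),
                  stringsAsFactors = FALSE)

# Cameras present in both tables
shared <- intersect(obs$Station, dep$Station)

# Keep only rows whose camera is shared
obs_shared <- obs[obs$Station %in% shared, , drop = FALSE]
dep_shared <- dep[dep$Station %in% shared, , drop = FALSE]

unique(obs_shared$Station)
#> [1] "StationB" "StationC"
unique(dep_shared$Station)
#> [1] "StationB" "StationC"
```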

Records and cameras in the same csv file

Next, we see how data import and cleaning are performed with a single csv file containing both records and camera information:

Read data

dat <- read_data(path_rec = recordcamfile,
                 sep_rec = ",")
head(dat$data$observations)
Station Species DateTimeOriginal Date Time delta.time.secs delta.time.mins delta.time.hours delta.time.days Directory FileName utm_y utm_x Setup_date Retrieval_date Problem1_from Problem1_to
StationA PBE 2009-04-21 00:40:00 2009-04-21 00:40:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-21__00-40-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
StationA PBE 2009-04-22 20:19:00 2009-04-22 20:19:00 157140 2619 43.6 1.8 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-22__20-19-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
StationA PBE 2009-04-23 00:07:00 2009-04-23 00:07:00 13560 226 3.8 0.2 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-23__00-07-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
StationA PBE 2009-05-07 17:11:00 2009-05-07 17:11:00 1270920 21182 353.0 14.7 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-05-07__17-11-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
StationA VTA 2009-04-10 05:07:00 2009-04-10 05:07:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-04-10__05-07-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
StationA VTA 2009-05-06 19:06:00 2009-05-06 19:06:00 2296740 38279 638.0 26.6 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-05-06__19-06-00(1).JPG 604000 526000 02/04/2009 14/05/2009 NA NA
head(dat$data$deployments)
#> NULL

Again, the imported object is a list with one component, $data:

  • $data$observations contains both the records and the camera information
  • $data$deployments is NULL (because only one file was imported)

Clean data

In this step, clean_data splits the information in the observations table between observations and deployments. To do this (with split = TRUE), it moves all columns listed in cam_cols to the deployments table. The column containing camera IDs must be specified in the cam_col_dfrec argument, so that this column is also kept in the observations table.

Since all columns are initially in the observations data frame, all casting specifications go in the rec_type argument.
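The split described above can be sketched in base R: camera columns are deduplicated into one row per camera, and every other column stays with the records, along with the camera ID. split_table is a hypothetical helper and the toy table is made up; this is not the package's implementation.

```r
# Hypothetical helper (not part of the package)
split_table <- function(df, cam_cols, cam_id) {
  # Camera-level columns: one row per camera
  deployments <- unique(df[, cam_cols, drop = FALSE])
  rownames(deployments) <- NULL
  # Record-level columns: drop camera columns but keep the camera ID
  obs_cols <- setdiff(names(df), setdiff(cam_cols, cam_id))
  observations <- df[, obs_cols, drop = FALSE]
  list(observations = observations, deployments = deployments)
}

joined <- data.frame(Station = c("StationA", "StationA", "StationB"),
                     Species = c("PBE", "VTA", "PBE"),
                     utm_y = c(604000, 604000, 606000),
                     utm_x = c(526000, 526000, 523000),
                     stringsAsFactors = FALSE)

res <- split_table(joined, cam_cols = c("Station", "utm_y", "utm_x"),
                   cam_id = "Station")
names(res$observations)
#> [1] "Station" "Species"
nrow(res$deployments)
#> [1] 2
```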

cam_cols <- c("Station", "Setup_date", "Retrieval_date", 
              "utm_y", "utm_x", "Problem1_from", "Problem1_to")

rec_type2 <- list(Station = "as.character",
                  Date = list("as_date",
                             format = "%Y-%m-%d"),
                  Time = "times",
                  DateTimeOriginal = list("as.POSIXct",
                                         tz = "Etc/GMT-8"),
                  Setup_date = list("as.Date",
                                    format = "%d/%m/%Y"), 
                  Retrieval_date = list("as.Date",
                                        format = "%d/%m/%Y"),
                  Problem1_from = list("as.Date",
                                       format = "%d/%m/%Y"),
                  Problem1_to = list("as.Date",
                                     format = "%d/%m/%Y"))
dat_clean <- clean_data(dat, 
                        rec_type = rec_type2,
                        cam_col_dfrec = "Station",
                        cam_cols = cam_cols,
                        split = TRUE)
head(dat_clean$data$observations)
Station Species DateTimeOriginal Date Time delta.time.secs delta.time.mins delta.time.hours delta.time.days Directory FileName
StationA PBE 2009-04-21 00:40:00 2009-04-21 00:40:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-21__00-40-00(1).JPG
StationA PBE 2009-04-22 20:19:00 2009-04-22 20:19:00 157140 2619 43.6 1.8 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-22__20-19-00(1).JPG
StationA PBE 2009-04-23 00:07:00 2009-04-23 00:07:00 13560 226 3.8 0.2 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-04-23__00-07-00(1).JPG
StationA PBE 2009-05-07 17:11:00 2009-05-07 17:11:00 1270920 21182 353.0 14.7 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/PBE StationA__2009-05-07__17-11-00(1).JPG
StationA VTA 2009-04-10 05:07:00 2009-04-10 05:07:00 0 0 0.0 0.0 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-04-10__05-07-00(1).JPG
StationA VTA 2009-05-06 19:06:00 2009-05-06 19:06:00 2296740 38279 638.0 26.6 C:/Users/niedballa/Documents/R/win-library/3.1/camtrapR/pictures/sample_images/StationA/VTA StationA__2009-05-06__19-06-00(1).JPG
head(dat_clean$data$deployments)
Station Setup_date Retrieval_date utm_y utm_x Problem1_from Problem1_to
StationA 2009-04-02 2009-05-14 604000 526000 NA NA
StationB 2009-04-03 2009-05-16 606000 523000 NA NA
StationC 2009-04-04 2009-05-17 607050 525000 2009-05-12 2009-05-17

CamtrapDP format (json file)

Finally, we see how data import and cleaning are performed with a dataset in camtrapDP format.

Read data

The read_data function can also read the json file (datapackage.json) of a dataset in camtrapDP format.

camtrap_dp_file <- system.file(
  "extdata", "mica", "datapackage.json", 
  package = "camtraptor"
)
dat <- read_data(path_rec = camtrap_dp_file)

Internally, read_data uses the function read_camtrap_dp from the camtraptor package (here, calling that function directly would give the same result).

The imported object is a list of class datapackage with several components; the observations and deployments are stored in the $data component.

class(dat)
#> [1] "datapackage" "list"
names(dat)
#>  [1] "name"          "id"            "profile"       "created"      
#>  [5] "sources"       "contributors"  "organizations" "project"      
#>  [9] "spatial"       "temporal"      "taxonomic"     "platform"     
#> [13] "resources"     "directory"     "data"

head(dat$data$observations)
observationID deploymentID sequenceID mediaID timestamp observationType cameraSetup taxonID taxonIDReference scientificName count countNew lifeStage sex behaviour individualID classificationMethod classifiedBy classificationTimestamp classificationConfidence comments _id vernacularNames.en
ef2f7140-ae97-4b44-8309-ab1ffbc02879 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 8f5ffbf2-52c4-4dd2-b502-d93a8aa64640 NA 2020-07-29 05:29:41 unknown FALSE NA NA NA NA NA NA NA NA NA human NA NA NA NA NA NA NA
68686a75-ad44-4676-b45e-2b85f60e4d11 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 1d98da96-c3ce-4479-9b97-8883cd33724f NA 2020-07-29 05:38:55 blank FALSE NA NA NA NA NA NA NA NA NA human NA NA NA NA NA NA NA
3d065f23-426a-449b-8693-f3c6b2ac9c7a 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 89b807ca-697f-4452-8349-b235622c94fa NA 2020-07-29 05:46:48 animal FALSE DGPL Anas strepera 4 NA subadult unknown NA NA human Danny Van der beeck 2020-08-17 06:57:28 NA NA NA gadwall krakeend
1fa8b00a-8f0e-485e-b98e-eb9bb1ee5a9f 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 710eac2a-e261-478d-84e8-de99f418138c NA 2020-07-30 04:29:31 animal FALSE DGPL Anas strepera 1 NA adult female NA NA human Danny Van der beeck 2020-08-17 06:57:50 NA NA NA gadwall krakeend
e57d97b3-f15b-48f5-a86c-477952bf8b7a 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 149f42ec-37bb-499b-9dac-24a90979e92c NA 2020-07-31 04:43:33 animal FALSE DGP6 Anas platyrhynchos 2 NA unknown unknown NA NA human Danny Van der beeck 2020-08-17 07:03:19 NA NA NA mallard wilde eend
f5707f70-c264-4f81-9e2f-06d55ad23d37 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 45ee3031-93c2-4684-a917-d33d5274780f NA 2020-08-02 05:00:14 animal FALSE DGP6 Anas platyrhynchos 5 NA subadult unknown NA NA human Danny Van der beeck 2020-08-18 21:09:30 NA NA NA mallard wilde eend
head(dat$data$deployments)
deploymentID locationID locationName longitude latitude coordinateUncertainty start end setupBy cameraID cameraModel cameraInterval cameraHeight cameraTilt cameraHeading timestampIssues baitUse session array featureType habitat tags comments _id
29b7d356-4bb4-4ec4-b792-2af5cc32efa8 2df5259b-b4b4-4f43-8cf7-effcced06d6f B_DL_val 5_beek kleine vijver 5.655 51.181 NA 2020-07-29 05:29:41 2020-08-08 04:20:40 Danny Van der beeck NA NA 0 0.7 NA NA NA NA NA NA NA NA boven de stroom van 29/07/2020 tot 08/08/2020 120 foto’s NA
577b543a-2cf1-4b23-b6d2-cda7e2eac372 ff1535c0-6b5d-44be-b3ef-c4d4204dad74 B_DL_val 3_dikke boom 5.659 51.184 NA 2020-06-19 21:00:00 2020-06-28 23:33:22 Danny Van der beeck NA NA 0 0.8 NA NA NA NA NA NA NA NA linkeroever van 19/06/2020 tot 29/06/2020 63 foto’s NA
62c200a9-0e03-4495-bcd8-032944f6f5a1 ce943ced-1bcf-4140-9a2e-e8ee5e8c10e6 B_DM_val 4_’t WAD 4.013 50.699 NA 2021-03-27 20:38:18 2021-04-18 21:25:00 Davy NA NA 0 1.0 NA NA NA NA NA NA NA NA NA NA NA
7ca633fa-64f8-4cfc-a628-6b0c419056d7 3232bcfd-5dfa-496e-b7ab-14593bb1b7f1 Mica Viane 3.898 50.742 NA 2019-10-09 11:18:07 2019-10-23 10:00:16 Axel Neukermans NA NA 0 2.0 NA NA NA NA NA NA NA NA boven de stroom CAM_244 HC600 Boven de Stroom NA

Clean data

Here, the data follow the camtrapDP standard and do not need cleaning. However, for this demonstration, we convert the timestamp column to character:

dat$data$observations$timestamp <- as.character(dat$data$observations$timestamp)
class(dat$data$observations$timestamp)
#> [1] "character"

rec_type <- list(timestamp = list("as.POSIXct",
                                  tz = "UTC"))

dat_clean <- clean_data(dat, 
                        rec_type = rec_type)

In the cleaned data, timestamp is converted back to POSIXct:

class(dat_clean$data$observations$timestamp)
#> [1] "POSIXct" "POSIXt"

The timezone is UTC, as we specified in the casting function:

attr(dat_clean$data$observations$timestamp, "tzone")
#> [1] "UTC"
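The tz argument matters: the same clock time parsed in different time zones gives different instants. The base R sketch below uses the casting rule described earlier (via do.call) on two timestamp strings taken from the table above, and compares UTC with Etc/GMT-8 (the zone used for the csv example, i.e. UTC+8). It illustrates base R behaviour only, not the package's internals.

```r
ts_chr <- c("2020-07-29 05:29:41", "2020-07-29 05:38:55")

# Equivalent to as.POSIXct(ts_chr, tz = "UTC")
ts_utc <- do.call("as.POSIXct", list(ts_chr, tz = "UTC"))
class(ts_utc)
#> [1] "POSIXct" "POSIXt"
attr(ts_utc, "tzone")
#> [1] "UTC"

# The same clock time read as UTC+8 is 8 hours earlier in absolute time
ts_gmt8 <- as.POSIXct(ts_chr, tz = "Etc/GMT-8")
as.numeric(difftime(ts_utc[1], ts_gmt8[1], units = "hours"))
#> [1] 8
```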

Otherwise, the data are unchanged.

head(dat_clean$data$observations)
observationID deploymentID sequenceID mediaID timestamp observationType cameraSetup taxonID taxonIDReference scientificName count countNew lifeStage sex behaviour individualID classificationMethod classifiedBy classificationTimestamp classificationConfidence comments _id vernacularNames.en
ef2f7140-ae97-4b44-8309-ab1ffbc02879 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 8f5ffbf2-52c4-4dd2-b502-d93a8aa64640 NA 2020-07-29 05:29:41 unknown FALSE NA NA NA NA NA NA NA NA NA human NA NA NA NA NA NA NA
68686a75-ad44-4676-b45e-2b85f60e4d11 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 1d98da96-c3ce-4479-9b97-8883cd33724f NA 2020-07-29 05:38:55 blank FALSE NA NA NA NA NA NA NA NA NA human NA NA NA NA NA NA NA
3d065f23-426a-449b-8693-f3c6b2ac9c7a 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 89b807ca-697f-4452-8349-b235622c94fa NA 2020-07-29 05:46:48 animal FALSE DGPL Anas strepera 4 NA subadult unknown NA NA human Danny Van der beeck 2020-08-17 06:57:28 NA NA NA gadwall krakeend
1fa8b00a-8f0e-485e-b98e-eb9bb1ee5a9f 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 710eac2a-e261-478d-84e8-de99f418138c NA 2020-07-30 04:29:31 animal FALSE DGPL Anas strepera 1 NA adult female NA NA human Danny Van der beeck 2020-08-17 06:57:50 NA NA NA gadwall krakeend
e57d97b3-f15b-48f5-a86c-477952bf8b7a 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 149f42ec-37bb-499b-9dac-24a90979e92c NA 2020-07-31 04:43:33 animal FALSE DGP6 Anas platyrhynchos 2 NA unknown unknown NA NA human Danny Van der beeck 2020-08-17 07:03:19 NA NA NA mallard wilde eend
f5707f70-c264-4f81-9e2f-06d55ad23d37 29b7d356-4bb4-4ec4-b792-2af5cc32efa8 45ee3031-93c2-4684-a917-d33d5274780f NA 2020-08-02 05:00:14 animal FALSE DGP6 Anas platyrhynchos 5 NA subadult unknown NA NA human Danny Van der beeck 2020-08-18 21:09:30 NA NA NA mallard wilde eend
head(dat_clean$data$deployments)
deploymentID locationID locationName longitude latitude coordinateUncertainty start end setupBy cameraID cameraModel cameraInterval cameraHeight cameraTilt cameraHeading timestampIssues baitUse session array featureType habitat tags comments _id
29b7d356-4bb4-4ec4-b792-2af5cc32efa8 2df5259b-b4b4-4f43-8cf7-effcced06d6f B_DL_val 5_beek kleine vijver 5.655 51.181 NA 2020-07-29 05:29:41 2020-08-08 04:20:40 Danny Van der beeck NA NA 0 0.7 NA NA NA NA NA NA NA NA boven de stroom van 29/07/2020 tot 08/08/2020 120 foto’s NA
577b543a-2cf1-4b23-b6d2-cda7e2eac372 ff1535c0-6b5d-44be-b3ef-c4d4204dad74 B_DL_val 3_dikke boom 5.659 51.184 NA 2020-06-19 21:00:00 2020-06-28 23:33:22 Danny Van der beeck NA NA 0 0.8 NA NA NA NA NA NA NA NA linkeroever van 19/06/2020 tot 29/06/2020 63 foto’s NA
62c200a9-0e03-4495-bcd8-032944f6f5a1 ce943ced-1bcf-4140-9a2e-e8ee5e8c10e6 B_DM_val 4_’t WAD 4.013 50.699 NA 2021-03-27 20:38:18 2021-04-18 21:25:00 Davy NA NA 0 1.0 NA NA NA NA NA NA NA NA NA NA NA
7ca633fa-64f8-4cfc-a628-6b0c419056d7 3232bcfd-5dfa-496e-b7ab-14593bb1b7f1 Mica Viane 3.898 50.742 NA 2019-10-09 11:18:07 2019-10-23 10:00:16 Axel Neukermans NA NA 0 2.0 NA NA NA NA NA NA NA NA boven de stroom CAM_244 HC600 Boven de Stroom NA