Occupancy models in R

Geek meeting

January 13th, 2026

Lisa Nicvert

Postdoctoral researcher
@FRB-CESAB

Introduction

Simple occupancy model

Occupancy models in R

Conclusion

References

Introduction

Presence $\neq$ detection

Present, but not detected

Introduction

First published by MacKenzie et al. (2002) in the context of species occurrence modelling
Many extensions: dynamic occupancy (MacKenzie et al. 2003), multiple species (Rota et al. 2016), continuous detection process (MacKenzie et al. 2003)…
Here: original simple occupancy model

Simple occupancy model

Model

For site $i$ and visit $j$: \[ y_{ij} \sim Bern(z_i~p) \] \[ z_i \sim Bern(\psi) \]

Occupancy models in R

Simulate occupancy models in R

set.seed(42)

M <- 100 # Number of sites
p <- 0.4 # Detection probability
psi <- 0.8 # Occupancy

# Simulate a number of visits for each site
nvisit <- rpois(n = M, lambda = 3)
nvisit[nvisit == 0] <- 1 # Don't allow zero visits

# Initialize vectors
z <- vector(mode = "numeric", length = M)
y <- vector(mode = "list", length = M)

for (i in 1:M) { # For each site
  # Simulate true presence/absence at site i
  zi <- rbinom(n = 1, size = 1, prob = psi)
  
  # Simulate observed presence/absence at site i 
  # for all visits
  yij <- rbinom(n = nvisit[i], 
                size = 1, prob = p*zi)
  
  z[i] <- zi # True sites states
  y[[i]] <- yij # Detections
}

Simulate occupancy models in R

True proportion of occupied sites ($z_i = 1$)

sum(z)/M

[1] 0.73

Proportion of sites with at least one detection (naive occupancy)

sum(sapply(y, function(yi) any(yi != 0)))/M

[1] 0.51

Inference hypotheses

Repeated visits (parameter identifiability)
Site remains in the same state during the entire study period (closure assumption)
No false detections

Inference with `unmarked`

unmarked: R package for occupancy inference (maximum likelihood estimation = frequentist statistics).

Format data

library(unmarked)

# Format the list of observed detections y
max_visit <- max(sapply(y, length)) # Get maximum number of detections

# Transform y to matrix
y_matrix <- matrix(data = NA,
                   nrow = M,
                   ncol = max_visit)
for (i in 1:M) { # For each site
  nvisit_i <- length(y[[i]]) # Get number of visits
  y_matrix[i, 1:nvisit_i] <- y[[i]] # Fill n-th first rows with detection history
}

# Cast y_matrix to unmarkedFrameOccu
y_occu <- unmarked::unmarkedFrameOccu(y_matrix)

Format data

library(unmarked)

# Format the list of observed detections y
max_visit <- max(sapply(y, length)) # Get maximum number of detections

# Transform y to matrix
y_matrix <- matrix(data = NA,
                   nrow = M,
                   ncol = max_visit)
for (i in 1:M) { # For each site
  nvisit_i <- length(y[[i]]) # Get number of visits
  y_matrix[i, 1:nvisit_i] <- y[[i]] # Fill n-th first rows with detection history
}

# Each row contains the detections at a site, filled with NAs for 
# sites that have less visits than the most visited one
head(y_matrix, 3)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    0    0    1    1   NA   NA   NA
[2,]    1    0    0    1    1    1   NA   NA
[3,]    0    0   NA   NA   NA   NA   NA   NA

# Cast y_matrix to unmarkedFrameOccu
y_occu <- unmarked::unmarkedFrameOccu(y_matrix)

Format data

library(unmarked)

# Format the list of observed detections y
max_visit <- max(sapply(y, length)) # Get maximum number of detections

# Transform y to matrix
y_matrix <- matrix(data = NA,
                   nrow = M,
                   ncol = max_visit)
for (i in 1:M) { # For each site
  nvisit_i <- length(y[[i]]) # Get number of visits
  y_matrix[i, 1:nvisit_i] <- y[[i]] # Fill n-th first rows with detection history
}

# Each row contains the detections at a site, filled with NAs for 
# sites that have less visits than the most visited one
# head(y_matrix, 3)

# Cast y_matrix to unmarkedFrameOccu
y_occu <- unmarked::unmarkedFrameOccu(y_matrix)
head(y_occu, 3)

Data frame representation of unmarkedFrame object.
  y.1 y.2 y.3 y.4 y.5 y.6 y.7 y.8
1   0   0   0   1   1  NA  NA  NA
2   1   0   0   1   1   1  NA  NA
3   0   0  NA  NA  NA  NA  NA  NA

Inference with `unmarked::occu`

# Infer occupancy
occ <- unmarked::occu(formula = ~1 ~1, 
                      data = y_occu)

formula: gives the formulas for the logit of $p$ and $\psi$.
data: observed detections

summary(occ)


Call:
unmarked::occu(formula = ~1 ~ 1, data = y_occu)

Occupancy (logit-scale):
 Estimate    SE    z P(>|z|)
     1.03 0.406 2.54  0.0112

Detection (logit-scale):
 Estimate    SE     z  P(>|z|)
   -0.623 0.184 -3.38 0.000722

AIC: 360.8983 
Number of sites: 100

Get estimates

Get estimates on the natural scale:

# Get detection (p)
unmarked::backTransform(occ, type = "det")

# Get occupancy (state parameter, psi)
unmarked::backTransform(occ, type = "state")

Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.349 0.0419  -0.623           1

Transformation: logistic 
Backtransformed linear combination(s) of Occupancy estimate(s)

 Estimate     SE LinComb (Intercept)
    0.737 0.0788    1.03           1

Transformation: logistic

Inference with `cmdstanr`

Stan: Bayesian software (optimized algorithms for the MCMC samplers)
cmdstanr R package
Binomial model specification (more efficient): \[n_i \sim Binom({n_{\text{visit}}}_i, z_i~p) \quad \text{and} \quad z_i \sim Bern(\psi)\]

Format data

n <- sapply(y, sum) # number of detections
nvisit <- sapply(y, length) # number of visits

# List of parameters for Stan
dat <- list(M = M,
            n = n,
            nvisit = nvisit)

Stan code blocks

data {
  // Code block to define data variables
}

parameters {
  // Code block to define parameters
}

model {
  // Code block to infer parameters
}

Data block

Defining variables (typed)

data {
  int<lower=1> M; // Number of sites
  array[M] int nvisit; // Number of visits per sites-years
  array[M] int<lower=0> n; // Observations vector
}

Parameters block

For parameters we want to infer (logit scale, more efficient in Stan)

parameters {
  real psi_logit; // value of psi on the logit scale
  real p_logit; // value of p on the logit scale
}

Model block

Priors: flat normal priors centered on zero = initializing the log-posterior with a log-transformed normal density

model {
  // 1. Priors
  target += normal_lpdf(psi_logit | 0, 3);
  target += normal_lpdf(p_logit | 0, 3);

  // 2. Variables
  int nvi;

  // 3. Model specification
  for (i in 1:M) { // iterate over sites
    nvi = nvisit[i]; // number of visits
    if (n[i] > 0) { // the species was seen
      // Update log-likelihood: species was detected | present
      target += log_inv_logit(psi_logit) + 
        binomial_logit_lpmf(n[i] | nvi, p_logit);
    } else {
      // Update log-likelihood: species was non-detected | present
      // or non-detected | absent
      target += log_sum_exp(log_inv_logit(psi_logit) + binomial_logit_lpmf(0 | nvi, p_logit), 
        log1m_inv_logit(psi_logit));
    }
  }
}

Model block

Priors: flat normal priors centered on zero = initializing the log-posterior with a log-transformed normal density
(Optional) Intermediate variables: nvi (number of visits for site $i$)

model {
  // 1. Priors
  target += normal_lpdf(psi_logit | 0, 3);
  target += normal_lpdf(p_logit | 0, 3);

  // 2. Variables
  int nvi;

  // 3. Model specification
  for (i in 1:M) { // iterate over sites
    nvi = nvisit[i]; // number of visits
    if (n[i] > 0) { // the species was seen
      // Update log-likelihood: species was detected | present
      target += log_inv_logit(psi_logit) + 
        binomial_logit_lpmf(n[i] | nvi, p_logit);
    } else {
      // Update log-likelihood: species was non-detected | present
      // or non-detected | absent
      target += log_sum_exp(log_inv_logit(psi_logit) + binomial_logit_lpmf(0 | nvi, p_logit), 
        log1m_inv_logit(psi_logit));
    }
  }
}

Model block

Priors: flat normal priors centered on zero = initializing the log-posterior with a log-transformed normal density
(Optional) Intermediate variables: nvi (number of visits for site $i$)
Model: Stan cannot model discrete variables, so we have to update the posterior probability density manually

model {
  // 1. Priors
  target += normal_lpdf(psi_logit | 0, 3);
  target += normal_lpdf(p_logit | 0, 3);

  // 2. Variables
  int nvi;

  // 3. Model specification
  for (i in 1:M) { // iterate over sites
    nvi = nvisit[i]; // number of visits
    if (n[i] > 0) { // the species was seen
      // Update log-likelihood: species was detected | present
      target += log_inv_logit(psi_logit) + 
        binomial_logit_lpmf(n[i] | nvi, p_logit);
    } else {
      // Update log-likelihood: species was non-detected | present
      // or non-detected | absent
      target += log_sum_exp(log_inv_logit(psi_logit) + binomial_logit_lpmf(0 | nvi, p_logit), 
        log1m_inv_logit(psi_logit));
    }
  }
}

Inference with `$sample`

compile the Stan model (Stan code is written in a file)

library(cmdstanr)

model <- cmdstanr::cmdstan_model(stan_file = stan_file)

infer the model object

stanfit <- model$sample(data = dat, # Data list
                        iter_warmup = 200, # Number of burn-in iterations
                        iter_sampling = 500, # Number of post-burn-in iterations
                        chains = 4, # Number of chains
                        parallel_chains = 4 # Number of parallel cores
                        )

Get estimates

Get parameter estimates:

(stan_inf <- stanfit$summary())

# A tibble: 5 × 10
  variable      mean   median     sd    mad       q5      q95  rhat ess_bulk
  <chr>        <dbl>    <dbl>  <dbl>  <dbl>    <dbl>    <dbl> <dbl>    <dbl>
1 lp__      -115.    -114.    1.02   0.801  -117.    -114.     1.01     882.
2 psi_logit    1.18     1.11  0.484  0.440     0.491    2.06   1.00    1011.
3 p_logit     -0.651   -0.649 0.190  0.183    -0.970   -0.332  1.00     771.
4 p            0.344    0.343 0.0427 0.0409    0.275    0.418  1.00     771.
5 psi          0.754    0.753 0.0795 0.0808    0.620    0.887  1.00    1011.
# ℹ 1 more variable: ess_tail <dbl>

Conclusion

simple occupancy model equations
infer with unmarked and cmdstanr
$p$ and $\psi$ often not constant: can be modeled as $\text{logit}(\psi_i) = \beta_0 + \beta_1 x_i$ or $\text{logit}(p_{ij}) = \beta_0 + \beta_1 x_{ij}$
blog post on the Tips and tricks blog!

Thank you!

References

MacKenzie, Darryl I., James D. Nichols, James E. Hines, Melinda G. Knutson, and Alan B. Franklin. 2003. “Estimating Site Occupancy, Colonization, and Local Extinction When a Species Is Detected Imperfectly.” Ecology 84 (8): 2200–2207. https://doi.org/10.1890/02-3090.

MacKenzie, Darryl I., James D. Nichols, Gideon B. Lachman, Sam Droege, J. Andrew Royle, and Catherine A. Langtimm. 2002. “Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One.” Ecology 83 (8): 2248–55. https://doi.org/10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2.

Rota, Christopher T., Marco A. R. Ferreira, Roland W. Kays, Tavis D. Forrester, Elizabeth L. Kalies, William J. McShea, Arielle W. Parsons, and Joshua J. Millspaugh. 2016. “A Multispecies Occupancy Model for Two or More Interacting Species.” Methods in Ecology and Evolution 7 (10): 1164–73. https://onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12587.

Other resources

Richard A. Erickson’s GitHub repository with Stan occupancy models tutorials
Olivier Gimenez’s workshop with resources on occupancy modelling with unmarked

Table of contents

Introduction

Presence \(\neq\) detection

Present, but not detected

Introduction

Simple occupancy model

Model

Occupancy models in R

Simulate occupancy models in R

Simulate occupancy models in R

Inference hypotheses

Inference with `unmarked`

Format data

Format data

Format data

Inference with `unmarked::occu`

Get estimates

Inference with `cmdstanr`

Format data

Stan code blocks

Data block

Parameters block

Model block

Model block

Model block

Inference with `$sample`

Get estimates

Conclusion

Conclusion

References

References

Other resources

Table of contents

Introduction

Presence \(\neq\) detection

Present, but not detected

Introduction

Simple occupancy model

Model

Occupancy models in R

Simulate occupancy models in R

Simulate occupancy models in R

Inference hypotheses

Inference with unmarked

Format data

Format data

Format data

Inference with unmarked::occu

Get estimates

Inference with cmdstanr

Format data

Stan code blocks

Data block

Parameters block

Model block

Model block

Model block

Inference with $sample

Get estimates

Conclusion

Conclusion

References

References

Other resources

Inference with `unmarked`

Inference with `unmarked::occu`

Inference with `cmdstanr`

Inference with `$sample`