
occupancyTuts: Goodness of fit test

Prerequisites

tutPrePost(tut = "gof", type = "pre")

The tutorial prerequisites are listed below. To run a tutorial, use learnr::run_tutorial("name", package = "occupancyTuts").

single_season: An introduction to the single-season patch occupancy model with no covariates, as presented in the seminal paper by MacKenzie et al. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83(8):2248–2255.

Suggested/Potential Readings
MacKenzie, D., and L. Bailey. 2004. Assessing fit of site occupancy models. Journal of Agricultural, Biological and Environmental Statistics 9:300–318.

Warton, D., J. Stoklosa, G. Guillera-Arroita, D. MacKenzie, and A. Welsh. 2017. Graphical diagnostics for occupancy models with imperfect detection. Methods in Ecology and Evolution 8:408–419. (https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12761)

Background
To introduce this tutorial, let’s return to some very first steps we covered in previous tutorials. We’ll start
by looking at an object called “obs” that was loaded when you launched the tutorial.

obs

This object is an 8 row by 4 column dataframe for a single season occupancy model with 3 surveys. With
3 surveys, there are 2^3 = 8 possible encounter histories. The column “Frequency” gives the number of
sites where a given history was observed. For example, 45 sites had a “000” history. In this example, a
total of 75 sites were surveyed.
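The tutorial prints this table interactively. For reference, the frequencies (which also appear later in the GOF output) look like this, with hypothetical survey column names:

##   V1 V2 V3 Frequency
## 1  0  0  0        45
## 2  0  0  1         1
## 3  0  1  0         1
## 4  0  1  1         1
## 5  1  0  0         2
## 6  1  0  1         7
## 7  1  1  0         3
## 8  1  1  1        15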

A while back, we said you can build a model with perfect fit. Prove it, you say? OK. Let’s start by finding
the probability of getting a ‘000’, ‘001’, ‘010’, etc. history directly from the raw data. How do we compute
these probabilities? Simple! Just divide the frequency of each history by the total number of sites
surveyed, which is 75. This is added to the table below as the column “Probability”. The final column
takes the natural log of the probabilities:

# add in the probabilities


obs$Probability <- obs$Frequency/sum(obs$Frequency)

# add the natural log of the probabilities


obs$LN_Prob <- log(obs$Probability)

# print the table


obs

This model perfectly fits the observed data. If we say that the probability of “rolling” a 000 history is 0.6, then we expect that 45 of 75 sites (60%) would exhibit this history. Voila! A perfect fit. The same approach applies to the other histories.

👉🏾 The log likelihood of this model is the frequency-weighted sum of the final column (each history’s log probability multiplied by the number of sites with that history), and -2 times this result is the -2 log likelihood. By convention, this is called the “saturated model”. It is a benchmark model that is perfectly fit to the data.
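A quick check of this in R (a minimal sketch, assuming the Frequency and LN_Prob columns built above):

# -2 log likelihood of the saturated model: the frequency-weighted sum of LN_Prob
neg2LL_sat <- -2 * sum(obs$Frequency * obs$LN_Prob)
neg2LL_sat

With the frequencies shown above, this works out to roughly 187.2, a bit lower than the 193.13 you will see from the “dot” model below.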

👉🏼 However, keep in mind that the saturated model does not include parameters we’re interested in,
namely occupancy and detection probabilities.

By now, you have some idea of how to run a single season occupancy model. Let’s create a “pao” object
and run a very simple single season occupancy model on these same data. Below, we use createPao()
to create the “pao” object, and then pass this object to the occMod() function to run the “dot” model
(where both ψ and p are constants):

# create the pao object


pao <- createPao(
data = obs[, 1:3],
frq = obs[, 'Frequency', drop = TRUE]
)

# run the dot model


dot <- occMod(
data = pao,
model = list(psi ~ 1, p ~ 1),
type = "so"
)

# print the summary


summary(dot)

## Model name=psi(1)p(1)
## AIC=197.1277
## -2*log-likelihood=193.1277
## num. par=2
## Warning: Numerical convergence may not have been reached. Parameter estimates converged

Here, we have a model that maximizes the log likelihood of the data (or minimizes the -2 log likelihood) within the single-season occupancy framework. Notice that the -2 log likelihood from this model differs from the -2 log likelihood of the saturated model. Let’s have a look at the coefficients with the coef() function:

# print the beta coefficients for psi


coef(dot, "psi")

# print the beta coefficients for p


coef(dot, "p")

Here, the coef() function returns the unbounded beta coefficients that maximized the log likelihood.
Now, let’s look at their back-transformed probabilities with the print_one_site_estimates() function,
where we pass in the dot model and ask for the first site by index number. This is appropriate as we have
no covariate values.

print_one_site_estimates(mod = dot, site = 1)

The results shown are back-transformed probabilities, squishing the unbounded beta values back to the
0-1 probability scale. You could convey these results in a report. But . . .

Quiz

(Quiz questions appear in the interactive tutorial.)
👉🏿 We need a test that will allow us to answer this question, and that test is called a “goodness of fit” test.
Goodness of fit is the extent to which observed data match the values expected by theory. You do not
want to use a model that could not produce your observed data.

Until recently, a method of assessing goodness of fit for occupancy models was lacking. Fortunately, Darryl MacKenzie and Larissa Bailey outlined a method that is fairly straightforward. Their paper, published in the Journal of Agricultural, Biological and Environmental Statistics, is titled “Assessing fit of site occupancy models.”

What might make the data not fit the occupancy framework? Let’s return to the single season occupancy
model assumptions:

1. The system is closed to changes in the occupancy status of a site during the sampling period. This means that the species cannot go locally extinct at a site, nor colonize a vacant site, over the course of sampling. Individuals within the population can be born, die, or move, but these processes cannot change the occupancy status of a site between sampling periods.
2. Species are not falsely detected.
3. Detection at a site is independent of detection at other sites.

Larissa Bailey notes that lack of fit generally comes in two “flavors” of problems: (1) violations of the assumptions listed above, or (2) an inappropriate model structure, in which the data are shaped by covariates that weren’t considered or were improperly modeled. When a model doesn’t fit, the variances of parameter estimates are often underestimated (the standard errors look better than they actually are), leading to false inferences about species occupancy and/or detection. Bad!

The MacKenzie and Bailey goodness-of-fit test not only informs about the lack of fit, but also where the
lack of fit occurs. Examining the detailed output of a goodness of fit test can reveal data-coding errors or
outliers. Goodness of fit is also tied to model selection approaches that rely on the most general model
fitting reasonably well; we will introduce model selection in a separate tutorial.

In this tutorial, we’ll examine ways to determine whether an occupancy model generated from field data fits our expectations. By that we mean that we’ll compare the field-collected data with the data we would expect from a model where we know the true values of occupancy and detection.

Objectives
Understand the need to formally test for goodness of fit.
Learn how to compute goodness of fit statistics.
Understand what can go wrong in fitting models.
Learn how to interpret the output of goodness of fit tests in RPresence.

Chi Square goodness of fit test


The GOF test in MacKenzie and Bailey is based on a Pearson’s Chi-square statistic:

$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}$$

where there are $N$ unique detection histories, $O_i$ is the number of observed sites with the $i$th unique detection history, and $E_i$ is the expected number of sites with the $i$th unique detection history. This value is calculated and then located on the appropriate theoretical Chi-Square Distribution; we use this location to draw conclusions about fit.

Because the Chi-Square Distribution is central to our discussion, let’s take a moment to overview what
this probability distribution actually is before diving into the occupancy tests:

https://youtu.be/zHpSX0S7CTg (https://youtu.be/zHpSX0S7CTg)


The spreadsheet that is featured in this video comes with the occupancyTuts package, and may be
found on your machine at this filepath:

# find the filepath to the featured Excel file


paste0(find.package("occupancyTuts"), "/extdata/gof.xlsx")

## [1] "C:/Users/WCS Indonesia/AppData/Local/R/win-library/4.4/occupancyTuts/extdata/gof.x

In R, there is a “family” of functions for working with the chi-square distribution. These are:

1. dchisq()
2. pchisq()
3. qchisq()
4. rchisq()

For this tutorial, our focus is on the dchisq() function, which is the probability density function, and the pchisq() function, which returns the proportion of the area under the curve to the left of a given chi-square value. For example, the code below sets up a dataframe with all combinations of “x” (spanning from 0.1 to 20 in increments of 0.1) and “df”, which stands for “degrees of freedom” (with values of 1, 2, 3, 5, and 10). We then add a column called “Probability”, which calls the dchisq() function to return the probability density for each “x” under the given chi-square distribution. See if you can follow the code:

# create a dataframe with combinations of x and df


chidata <- expand.grid(
x = seq(from = 0.1, to = 20, by = 0.1),
df = c(1, 2, 3, 5, 10)
)

# add in the probability


chidata$Probability <- dchisq(chidata$x, chidata$df)

# plot with ggplot


ggplot(
data = chidata,
aes(x = x, y = Probability, color = as.factor(df))
) +
geom_line(linewidth = 2) +
theme(legend.position = "top") +
labs(
title = "Alternative Chi-Square distributions",
x = "x",
y = "Probability Density") +
scale_color_discrete("Degrees of Freedom: ") +
ylim(0, 0.5) +
geom_vline(xintercept = 6, linetype = "dashed")

Each of these distributions is generated from the dchisq() probability density function; the area under
the curve is 1 in each case.

The other function of interest is the pchisq() , in which you provide a value of “x” and the function will
return the proportion of the curve that is to the left of the given value.

pchisq(6, df = 3)

## [1] 0.8883898

Here, we pass in a value of 6, and find where this value occurs in a chi-square distribution with 3 degrees of freedom. This value is highlighted by a dashed vertical line in the above graph. The answer is 0.8883898, meaning that almost 89% of the distribution lies to the left of the value 6. We can find the proportion of the curve that falls to the right of the x value by setting the “lower.tail” argument to FALSE:

pchisq(6, df = 3, lower.tail = FALSE)

## [1] 0.1116102

👉🏻 Don’t forget that the Chi-Square Distribution is a “null distribution” – as we’ve seen, its shape is
controlled by a single parameter called “degrees of freedom” which reflects the number of randomly drawn
z scores that are squared and then added together. We can use this distribution to draw inferences
(conclusions) about “fit”.
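A quick simulation makes that definition concrete (a minimal sketch; nothing here is specific to occupancy models):

# a chi-square(df = 3) value is the sum of 3 squared standard normal draws
set.seed(1)
draws <- replicate(10000, sum(rnorm(3)^2))
mean(draws) # the mean of a chi-square distribution equals its df, so roughly 3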

Let’s see how this works in a very general sense:

https://youtu.be/ZK0rLZT_IC8 (https://youtu.be/ZK0rLZT_IC8)


Occupancy goodness of fit test


Now that you have a general understanding of the theoretical Chi-Square distribution, let’s see how it is used in a simple single-season occupancy framework. We’ll work with the R object called “obs” to begin. There were 75 sites in total and three surveys. The frequency of each history is given in the column “Frequency”. If you recall, we computed each history’s “Probability” from the raw data to obtain the log likelihood of the saturated model.

obs

After all that work, we can now dismiss the saturated model, as it does not provide probabilities
that are expressed in terms of the parameters we are most interested in, namely occupancy and
detection.

In the code below, we start building a goodness of fit table to be filled in step by step. First, we use the unite() function from the tidyr package to combine the three surveys into a new column called “EH”, and then keep only this new column and the frequencies. Along the way we rename the column “Frequency” to “Observed” (meaning, observed frequencies):

gof_table <- obs %>%


tidyr::unite("EH", -c(4:6), sep = "") %>%
rename(Observed = Frequency) %>%
select("EH", "Observed")

# show the observed data


gof_table

The next step is to compute the expected frequencies, not based on the saturated model, but based on
the parameters from the occupancy model of interest. In the tutorial background section, we ran the
model psi ~ 1 and p ~ 1, and named the resulting output “dot”. Let’s look at the summary:

# look at the dot model


summary(dot)

## Model name=psi(1)p(1)
## AIC=197.1277
## -2*log-likelihood=193.1277
## num. par=2
## Warning: Numerical convergence may not have been reached. Parameter estimates converged

Remember, this is the simplest occupancy model that we can run. It estimates two parameters: a beta
estimate for “psi” and a beta estimate for “p”. These are on the logit scale. Because this is a dot model,
we can use the print_one_site_estimates() function to show the back-transformed values as
probabilities. Let’s try it:

(Interactive R exercise.)

Now, create an object called “psi” that stores the probability that a site is occupied, and create an
object called “p” that stores the probability of detection.

(Interactive R exercise.)
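If you are following along outside the tutorial, here is a minimal sketch (assuming the estimate column is named Beta_est, as in the coef() output shown later in this tutorial):

# back-transform the logit-scale betas to probabilities (inverse logit)
psi <- plogis(coef(dot, "psi")$Beta_est) # probability a site is occupied
p   <- plogis(coef(dot, "p")$Beta_est)   # probability of detection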

Well done.

👉🏽 Now we have stored estimates of occupancy and detection probability for our trolls dataset. Remember from the encounter-history tutorial when we computed the probability of observing each history using the occupancy parameters? We’ll use this approach now for our goodness of fit test.

Write equations that yield the probability of observing each history, then add the probabilities to the
gof_table as a column called “Probability”.

(Interactive R exercise.)
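One possible solution (a sketch assuming the histories in gof_table are ordered 000 through 111, as above, and that psi and p are the scalars created in the previous exercise):

# probability of observing each encounter history under the dot model
gof_table$Probability <- c(
  psi * (1 - p)^3 + (1 - psi),  # 000: occupied but never detected, or unoccupied
  psi * (1 - p)^2 * p,          # 001
  psi * (1 - p) * p * (1 - p),  # 010
  psi * (1 - p) * p^2,          # 011
  psi * p * (1 - p)^2,          # 100
  psi * p * (1 - p) * p,        # 101
  psi * p^2 * (1 - p),          # 110
  psi * p^3                     # 111
)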

How did you do? Below is the goodness of fit table so far, which includes the encounter histories, the observed frequencies, and the probability of observing each history under the model “dot”:

# show the history probabilities


gof_table

Since we’ve listed every possible history, the sum of the probabilities should equal 1.0. This is easily
confirmed:

# sum the history probabilities


sum(gof_table$Probability)

## [1] 1

Did the sum of your histories equal 1.0? If not, check for typos, or whether you forgot a component of the 000 history. To get the expected number of sites with each history, we just multiply those probabilities by the number of sites (75).

# compute the expected number of sites


gof_table$Expected <- 75 * gof_table$Probability

# show the goodness of fit table


gof_table

Now our table shows the observed frequencies and the expected frequencies under the model “dot”, so we can compute a chi-square test statistic. The chi-squared goodness-of-fit test compares the observed frequencies with the expected frequencies and outputs a “chi-squared test statistic”, designated $\chi^2$. To compute the observed test statistic, the following formula is used:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Each cell is designated by the letter $i$, which takes values from 1 to $k = 8$ cells. For each cell, we take the observed frequency ($O_i$) minus the expected frequency ($E_i$), square the result, and divide by the expected value. This is done on a cell-by-cell basis. The results are summed to yield the observed $\chi^2$ test statistic.

$$\chi^2 = \frac{(45 - 45.0000056)^2}{45.0000056} + \frac{(1 - 1.1397273)^2}{1.1397273} + \dots + \frac{(15 - 14.419177)^2}{14.419177} = 5.4226$$
Because the calculation of the numerator for each cell involves squaring, the $\chi^2$ result will always be positive (or 0 if the observed frequencies perfectly match the expected frequencies).

Enter code to calculate the chi-square test statistic. Call this result “chi_stat”:

(Interactive R exercise.)
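If you get stuck, a minimal sketch using the columns built above:

# observed chi-square test statistic: sum over cells of (O - E)^2 / E
chi_stat <- sum((gof_table$Observed - gof_table$Expected)^2 / gof_table$Expected)
chi_stat # should be about 5.4226, matching the hand calculation above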

We now have a test statistic assessing the fit of our model. Since it is a chi-square statistic, we can look up (or compute) the probability of getting a chi-square value at least that high. To compute the probability, we need to know the degrees of freedom.

👉🏾 You might guess that the degrees of freedom should be 8 - 1 = 7. You’re close, but not correct. The test statistic is the sum of 8 chi-square values, only 7 of which are independent (remember, we can get the number of 000 histories as the number of sites minus the sum of the other histories), so 7 degrees of freedom is a reasonable guess. However, to generate the probabilities, we estimated 2 parameters in the model, so we need to subtract 2 more degrees of freedom. So, the appropriate chi-square distribution is one with 8 - 1 - 2 = 5 degrees of freedom.

Let’s plot a Chi Square distribution with 5 degrees of freedom, and locate our calculated test statistic on
it:

# set up a vector of x values


x = seq(from = 0.1, to = 20, by = 0.1)

# create a dataframe
chidata <- data.frame(
x = x,
df = 5,
Probability = dchisq(x = x, df = 5)
)

# plot the data with ggplot


ggplot(chidata,
aes(x = x, y = Probability)) +
geom_line(linewidth = 2) +
labs(
title = "Chi-Square distribution with 5 DF",
x = "x",
y = "Probability Density") +
ylim(0, 0.5) +
  geom_vline(xintercept = chi_stat, linetype = "dashed")

Use the pchisq() function to determine what proportion of the distribution falls to the right of our
observed chi square test statistic. Watch your tail!

(Interactive R exercise.)
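A minimal sketch, using the degrees of freedom worked out above:

# proportion of the chi-square(5) distribution to the right of chi_stat
pchisq(chi_stat, df = 5, lower.tail = FALSE)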

Quiz

(Quiz questions appear in the interactive tutorial.)

Remember that the Chi-Square distribution is a “null” distribution. If the observed data perfectly match the
expected data, the test statistic ends up being 0, which is in the left tail of the distribution. We’re
concerned with test statistics that are unusually high – those located in the right tail of the distribution.
These values are possible to obtain just by chance, but are rarely observed in the null distribution. If our
test statistic falls in the right tail (conventionally in the 5% tail), we say the observed data (i.e., our
encounter histories) are inconsistent with the model (in this case, the dot model). That is, the data do not
fit the model. If the data do not fit the model, the model is a poor characterization of the data for any
number of reasons, and should not be used for inference.

What can go wrong, part I


The preceding example seems fairly straightforward, but most studies are not that simple. As we learned in a previous section, the number of possible detection histories grows exponentially with the number of surveys. We have 8 possible detection histories in this example, which is 2^3. Each additional survey doubles the number of possible histories. For example, with 4 surveys there are 2^4 = 16 possible histories, and with 5 surveys there are 2^5 = 32 possible histories; that’s 32 cells that make up the final chi-square calculation.
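A quick check of that growth:

# number of possible encounter histories for 2 through 8 surveys
2^(2:8)

## [1]   4   8  16  32  64 128 256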

Let’s see what happens to our chi-square table as we increase the number of surveys on a simulated
dataset from 100 sites. Use the slider to set the number of surveys, and inspect the columns called
“Observed” and “Expected”. Note the chi-square value, degrees of freedom, and the probability
returned.

(Interactive element: the slider in the tutorial varies the number of surveys from 2 to 8.)

👉🏽 A well-known fact about this chi-square test is that when expected values are small, the test does not perform well. As you can see from the table, when you get to 6-8 surveys, there are many histories with small (< 2.0) expected values. These small expected values cause unusually large chi-square values, leading to a significant probability level (even though the data were generated to fit the model).

One resolution to this problem is to pool observed and expected values which are < 2.0. This could be done here, but each time cells are pooled, 1 degree of freedom is lost. So, many datasets would end up with zero or fewer degrees of freedom, making it impossible to compute a probability level.
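To make the idea concrete, here is a minimal pooling sketch using the gof_table built earlier (the < 2.0 threshold and the single pooled cell are illustrative choices):

# pool all cells with expected values < 2.0 into one cell
small <- gof_table$Expected < 2
pooled_O <- c(gof_table$Observed[!small], sum(gof_table$Observed[small]))
pooled_E <- c(gof_table$Expected[!small], sum(gof_table$Expected[small]))

# recompute the chi-square statistic on the pooled cells
chi_pooled <- sum((pooled_O - pooled_E)^2 / pooled_E)

# each cell pooled away costs one degree of freedom relative to the unpooled test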

What can go wrong, part II


Another problem when testing for goodness-of-fit crops up when the dataset contains missing values. This can be the result of many things, from survey detection apparatus failure (e.g., camera batteries died or memory card full) to the site being temporarily inaccessible (e.g., road to site blocked by orcs). When this happens, the number of possible detection histories goes from 2^K (2 possible values for each survey: 0 or 1) to 3^K (3 possible values for each survey: 0, 1, or NA). This makes the problem of small expected values mentioned previously even greater. In practice, only a subset of the possible histories which include missing data will occur in a dataset, but those must be accounted for.

To handle missing data, the approach used by RPresence is to treat each group of histories which
contain missing data as a separate “cohort”. As a simple example, consider a study where we have
complete data except for a few sites which were inaccessible during survey 2. We would compute

expected values and chi-square values for all histories with complete data as before, then compute
expected values and chi-square values for the histories with missing data. Then we add the chi-square
values to get a total chi-square statistic for the model.

How do you compute the expected number of histories when you have missing data? First, we need to compute the probability of those histories. This is done just as with non-missing data, except that the p or (1 - p) term corresponding to the survey(s) with missing data is omitted from the history probability. For example, if the detection history is 1 0 0 _ _ 1 (species detected in survey 1, not detected in surveys 2 or 3, not surveyed in surveys 4 or 5, detected in survey 6), the probability of the history would be:

$$P(h_{100--1}) = \psi\, p_1 (1 - p_2)(1 - p_3)\, p_6$$

Notice that we have information about detection from surveys 1, 2, 3, and 6, so the p’s from those surveys appear in the probability formula. We don’t have any information from surveys 4 or 5, so $p_4$ and $p_5$ don’t appear in the formula.

If another site had the history, 1 0 1 _ _ 0, the formula would be:

$$P(h_{101--0}) = \psi\, p_1 (1 - p_2)\, p_3 (1 - p_6)$$

If another site had the history, 0 0 0 _ _ 0, the formula would be:

$$P(h_{000--0}) = \psi (1 - p_1)(1 - p_2)(1 - p_3)(1 - p_6) + (1 - \psi)$$
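These rules are easy to capture in a small helper function. Here is a minimal sketch with hypothetical parameter values (this is not an RPresence function):

# probability of an encounter history h (a vector of 0/1/NA) given psi and
# survey-specific p; surveys with NA contribute nothing to the product
eh_prob <- function(h, psi, p) {
  surveyed <- !is.na(h)
  det <- prod(ifelse(h[surveyed] == 1, p[surveyed], 1 - p[surveyed]))
  if (all(h[surveyed] == 0)) psi * det + (1 - psi) else psi * det
}

# history 1 0 0 _ _ 1 with psi = 0.5 and constant p = 0.3:
eh_prob(c(1, 0, 0, NA, NA, 1), psi = 0.5, p = rep(0.3, 6))

## [1] 0.02205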

Let’s assume that those are the only three sites with missing data in our dataset. We would have two
cohorts for our GOF test: one for all sites without missing data, and one cohort with histories where
surveys 4 and 5 have missing data.

Computation of the chi-square values for the first cohort would be done as usual but would not include these three sites. So, to get expected values for that cohort, we would multiply the history probabilities by $(N - 3)$ instead of $N$. To get expected values for the second cohort, we would multiply the history probabilities by 3, which is the number of sites in that cohort. The total test statistic is obtained by adding the chi-square values for the two cohorts.

If there were some sites which had missing data in surveys 1 and 3, then those histories would make up a third cohort. When you have lots of combinations of surveys with missing data, there will be many cohorts. These additional cohorts reduce the number of sites in the first cohort (the non-missing-data cohort), reducing the power of that part of the GOF test. Since the size of the missing-data cohorts tends to be small, those expected values tend to be small as well. This can compound the problem mentioned earlier with small expected values.

The MacKenzie and Bailey GOF


👉🏻 The solution proposed by Darryl MacKenzie and Larissa Bailey for the single-season model is to simulate many datasets with occupancy and detection parameters equal to those estimated from the “real” data. Since we know the simulated datasets fit our model, we can compare the chi-square value from our real data against the distribution of chi-square values from the simulated datasets.

If the real data fit the model, the chi-square value should fall within the distribution of chi-square values from the simulated datasets. If our chi-square value is much higher than those from the simulated datasets, then we conclude that the real data do not fit the model.
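In R, the core of this idea can be sketched in a few lines. (This is a simplified sketch: the full MacKenzie-Bailey test refits the model to each simulated dataset and computes expected values from that refit, whereas here we reuse the expected values from the original fit to keep the code short.)

set.seed(321)
probs <- gof_table$Probability # history probabilities under the dot model
expct <- 75 * probs            # expected counts

# simulate 2000 datasets from the fitted model and compute each chi-square
sim_stats <- replicate(2000, {
  o <- as.vector(rmultinom(1, size = 75, prob = probs))
  sum((o - expct)^2 / expct)
})

# bootstrap-style p-value: how often a "fitting" dataset beats our statistic
mean(sim_stats >= chi_stat)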

Let’s see how this works conceptually in a spreadsheet environment (which lets us focus on the concepts so we don’t get lost in the coding weeds):

https://youtu.be/qngXdBx4-Qc (https://youtu.be/qngXdBx4-Qc)


The spreadsheet that is featured in this video comes with the occupancyTuts package, and may be
found on your machine at this filepath:

paste0(find.package("occupancyTuts"), "/extdata/gof.xlsm")

## [1] "C:/Users/WCS Indonesia/AppData/Local/R/win-library/4.4/occupancyTuts/extdata/gof.x

This simulation can easily be carried out in R. Press the button to begin. The histogram represents the test statistics from 2000 simulations of a model where psi = 0.75, p = 0.4, with 100 sites and 6 surveys. Each time the button below is clicked, it simulates a “real” dataset generated from the same model (potentially your dataset). A GOF test statistic is computed on the dataset and is represented by the red line. The proportion of the bars to the right of the red line indicates the probability level (the probability of getting a test statistic as high or higher than the observed test statistic). If the x value of the red line falls in the upper 5% tail of this simulated distribution, you could conclude the data do not fit the model. We have simulated data here which we know fit the model, but you’ll find that sometimes the red line is far to the right, leading to the false conclusion that the data do not fit the model. That’s random variation!

(Interactive element: a button in the tutorial simulates a new dataset each time it is clicked.)

GOF test in RPresence


For the single-season model, RPresence can compute the GOF test by including the argument modfitboot = N in the function call to occMod(), where N is the number of simulated datasets to generate. As noted above, many of the expected values can be quite small, so a single observed frequency value can cause the GOF test statistic to become very large. To make sure the simulated datasets include cases where a few of the histories have small expected values, it is recommended to set N quite large (e.g., 10,000). With that many simulated datasets, the distribution of chi-square stats should include quite a few large values, giving the real-data chi-square a better chance of falling within the range of simulated-data chi-square values. If you’d like to be able to reproduce your GOF test, make sure to set a random number seed in the occMod() function call by passing in a seed to the “modfitseed” argument.

In the code below, we show you how easy it is to run a GOF test in RPresence. Let’s rerun our “dot” model as we did before, but add the arguments modfitboot = 200 and modfitseed = 321 to the call of the occMod() function. (We’re using just 200 here to save time and space in the output document; in practice this is far too few simulations.)

# run the dot model and include the goodness of fit test
dot <- occMod(
model = list(psi ~ 1, p ~ 1),
data = pao,
modfitboot = 200,
modfitseed = 321
)

# print the list element named "gof" to view the fit statistics
print(dot$gof)

## $chat
## [1] 1.045
##
## $TS
## TestStat TS_low TS_High
## 5.4226 0.2929 18.2146
##
## $Prb
## [1] 0.3881
##
## $tbl
## History Observed Expected Chi-square
## [1,] "000( 0 0)" "45" "45.000000012" "0"
## [2,] "001( 0 1)" "1" "1.139727227" "0.02"
## [3,] "010( 0 2)" "1" "1.139727227" "0.02"
## [4,] "011( 0 3)" "1" "4.053878877" "2.3"
## [5,] "100( 0 4)" "2" "1.139727227" "0.65"
## [6,] "101( 0 5)" "7" "4.053878877" "2.14"
## [7,] "110( 0 6)" "3" "4.053878877" "0.27"
## [8,] "111( 0 7)" "15" "14.419181676" "0.02"
##
## $bootstraps
## [1] 200

The GOF output includes 5 list items.

1. “tbl”. Let’s start with the observed and expected value table, which is in the list element named
“tbl”. This looks a lot like the table we computed. Can you see that it is a matrix of characters?

2. “TS”. The “TS” item lists the chi-square test-statistic from the data (TestStat), followed by the
lowest test-statistic (TS_low) from the 200 simulated datasets, followed by the highest (TS_High).
You may think that the “TestStat” should be the sum of the table’s “Chi-square” column. Let’s
confirm that now:

sum(as.numeric(dot$gof$tbl[,"Chi-square"]))

## [1] 5.42

3. “Prb”. The list element “Prb” gives the probability of getting a test statistic from data which fit the model (simulated data) as high or higher than the test statistic from the real data. (It is the proportion of the area of the bars in the histogram which is to the right of the red line in the previous graph.)

4. “bootstraps”. This is the number of simulated datasets created to perform the analysis (the
modfitboot argument we used in the occMod function call).

5. “chat”. Now have a look at the list element named “chat”, which is pronounced “c-hat”. Remember earlier when we mentioned that a poor-fitting model could lead to biased standard errors? The “chat” output variable (ĉ) is called the variance inflation factor and is commonly used to adjust variances to account for lack of fit. The idea is that variances from a mis-specified or poor-fitting model will be under-estimated. We can multiply our variances by ĉ to adjust the variances from the model and make them less biased. (This statistic is also used in model selection, which we’ll cover in a different tutorial.) If our data fit the model, we would expect ĉ to be equal to 1.0, as our real-data chi-square statistic should be equal to the mean of the simulated chi-square values. In that case, we would be multiplying our variances by 1.0, meaning that no adjustment to the variances is necessary.
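To see the arithmetic, here is a minimal sketch with a hypothetical standard error (the coef() and fitted() methods shown below do this for you via the chat argument):

chat <- dot$gof$chat # 1.045 in the output above
se <- 0.25           # a hypothetical unadjusted standard error

var_adj <- se^2 * max(1, chat) # variances are multiplied by c-hat (when > 1)
se_adj  <- sqrt(var_adj)       # equivalently, SEs are multiplied by sqrt(c-hat)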

The MacKenzie and Bailey GOF test was run behind the scenes, and this output summarizes the results.
As usual, the output contains a lot of information!

Some important things to keep in mind:

👉🏻 The “Prb” element is typically the first thing to inspect. Conventionally, if it is < 0.05, you have good
reason to suspect a lack of fit.

👉🏽 The “modfitseed” argument in the occMod function call should be supplied if you want to ensure that
you get the same result when rerunning an analysis. Remember, the MacKenzie and Bailey GOF relies on
simulation, so unless you set the random number seed, you will not get the exact same results should you
re-run the model.

👉🏿 Variances should be inflated when the c-hat estimate exceeds 1; values less than 1 can be safely ignored. The variances can be inflated directly by setting the “chat” argument when running the coef() and fitted() methods, as illustrated below.

# adjust the standard errors and confidence intervals of a model by changing the chat argument
coef(
object = dot,
param = "psi",
prob = 0.95,
chat = 1.1674
)

##              Beta_est       se   lowerCI    upperCI      lower     upper
## psi_A1.Int -0.387695 0.256785 -0.853504 0.07811396 -0.8909935 0.1156035

# adjust the standard errors and confidence intervals for the fitted function
fitted(
object = dot,
param = "psi",
chat = 1.1674
)

##            Real_est         se lower_0.95 upper_0.95
## psi_unit1 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit2 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit3 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit4 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit5 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit6 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit7 0.4042723 0.06184299    0.28306  0.5254846
## psi_unit8 0.4042723 0.06184299    0.28306  0.5254846

Lack of fit
What should be done if you suspect a lack of fit? The first course of action is to check for errors in the
data. Looking at the observed and expected values can sometimes indicate coding errors in the data. For
example, if most sites contain only a few detections over the set of surveys, but one site has mostly
detections, it may indicate a transcription error, or perhaps a malfunctioning camera-trap.

If the dataset consists of a large number of surveys, another potential solution to the problem of large $\chi^2$ GOF statistics is to “pool” surveys. If survey data are collected daily for 30 days, the data could be pooled into weekly surveys, where each weekly survey would contain a “1” if any of the daily surveys in that week contained a “1”. This would greatly reduce the number of surveys without affecting the occupancy estimate.
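A minimal sketch of that pooling, using a hypothetical 100-site by 28-day detection matrix (28 days gives four clean weeks; a 30-day dataset would need a ragged final week):

# simulate a daily detection matrix for illustration
set.seed(1)
daily <- matrix(rbinom(100 * 28, 1, 0.1), nrow = 100, ncol = 28)

# a weekly survey scores 1 if any daily survey in that week scored 1
weeks  <- rep(1:4, each = 7)
weekly <- sapply(split(seq_len(28), weeks), function(cols)
  as.integer(rowSums(daily[, cols], na.rm = TRUE) > 0))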

If lack of fit is due to violations of the model assumptions, a more complex model might be required.
Remember the key assumptions of the single season occupancy model:

The system is closed to changes in the occupancy status of site during the sampling period. This
means that the species cannot go extinct from a site, or cannot colonize a vacant site over the
course of sampling. Individuals within the population can be born, die, or move, but these
processes cannot influence the occupancy status of a site between sampling periods.

Species are not falsely detected.

Detection at a site is independent of detection at other sites.

Violations could point you to a different kind of occupancy analysis. For example, if there is unmodeled
heterogeneity in detection probabilities, the “heterogeneity” model could be used instead of the simple
single-season model. If occupancy changes within the survey periods, the multi-season model may be
required. These alternatives are discussed in separate tutorials.

A sort-of “last resort” for poor-fitting models is to accept the fact that the model does not fit well, but try to determine whether the poor fit creates bias in the occupancy estimates or variances. We said it is possible that lack of fit may cause biased occupancy estimates and/or variances, but it is also possible that the lack of fit does not affect them. If the estimates or variances are affected, you can adjust the variances by multiplying them by c-hat (discussed in the previous section).

If the reason for the poor fit of the data is known, simulation studies could help determine the effect on
occupancy estimates and variances.

Chapter Summary
We’ve shown the reasoning on why GOF tests are needed and the mechanics on how a test statistic is
computed. Hopefully, this will help you understand the RPresence GOF output. We’ve also shown how
the GOF test can work well with sufficient expected values and tried to show the potential pitfalls in doing
a GOF test when expected values are small.

The goodness-of-fit test is a statistical test and is based on probabilities, and just like parameter
estimates, there is random variation. Even without the possible problems discussed earlier, the result of
the GOF test is a probability that the data fit the model. It is entirely possible for the GOF test to indicate
that a dataset does not fit a model when in fact, it was generated exactly according to the model. We’ve
already seen this in action! Some simulated datasets end up having very high chi square test statistics
just by chance. It is comparable to flipping a coin ten times and getting tails all ten times and concluding
that it is probably not a fair coin. If thousands of people flip fair coins, some of them will get 10 tails in a
row.

With modeling, if you try 20 models with random or unrelated covariates, we would expect 1 of the 20 models to produce a significant (at the 5% level) covariate effect. With respect to GOF tests, we would expect 1 of 20 datasets which in reality fit the model to produce a significant GOF test, indicating failure of the data to fit the model. Thus, it’s possible your observed data really do fit the model. Exercise all due diligence when inspecting your GOF results.

👉🏽 If you’d like a pdf of this document, click the “Open in browser” button (upper-left of tutorial) and then
use the browser “print” function (right-click, print) and print to pdf. If you want to include quiz questions
and R exercises, make sure to provide answers to them before printing.

What’s next?
tutPrePost(tut = "gof", type = "post")

Absolutely fabulous! Suggested follow-ups are listed below. To run a tutorial, use learnr::run_tutorial("name", package = "occupancyTuts").

sitecovs (Optional): Analyzing occupancy data with covariates that affect site occupancy, including the analysis of null, continuous, categorical, additive, polynomial, and interactive models in RPresence.

ss_corr_det (Optional): Simulating and analyzing the single-season correlated detections occupancy model, in which detections are spatially correlated with each other.

ss_false_pos (Optional): Simulating and analyzing the single-season occupancy model with identification errors, including both false negatives and false positives.

ss_mixture (Optional): Simulating and analyzing single-season occupancy data with heterogeneous detections (mixtures of detection probabilities that are not captured by covariates).

ss_multi_method (Optional): Simulating and analyzing single-season occupancy models where multiple methods are used in detection.

ss_multistate (Optional): Simulating and analyzing the single-season occupancy model in which occupancy includes more than just one state (presence); e.g., age, sex, or breeding status.

ss_species_richness (Optional): Single-season species richness occupancy tutorial, in which a pool of species is monitored in a single season and analyzed with occupancy approaches.

ss_two_species (Optional): Single-season two-species interaction occupancy model, in which the goal is to determine whether a site is occupied by two different species, and to assess if the species affect each other’s detection and occupancy probabilities.

study_design (Optional): An introduction to how to design an occupancy study for a target species.

surveycovs (Optional): Analyzing occupancy data with covariates that affect detection rates, including the analysis of null, continuous, categorical, additive, polynomial, and interactive models in RPresence.

wrangling (Optional): An introduction to common data wrangling techniques for occupancy analysis, including brief overviews of the packages lubridate, dplyr, tidyr, and ggplot2.
