Journal of Statistical Software: Modeling Population Growth in R With The
Journal of Statistical Software: Modeling Population Growth in R With The
Journal of Statistical Software: Modeling Population Growth in R With The
Abstract
The growth of populations is of interest in a broad variety of fields, such as epidemi-
ology, economics or biology. Although a large variety of growth models are available in
the scientific literature, their application usually requires advanced knowledge of mathe-
matical programming and statistical inference, especially when modelling growth under
dynamic environmental conditions. This article presents the biogrowth package for R,
which implements functions for modelling the growth of populations. It can predict growth
under static or dynamic environments, considering the effect of an arbitrary number of
environmental factors. Moreover, it can be used to fit growth models to data gathered
under static or dynamic environmental conditions. The package allows the user to fix
any model parameter prior to the fit, an approach that can mitigate identifiability issues
associated to growth models. The package includes common S3 methods for visualization
and statistical analysis (summary of the fit, predictions, . . . ), easing result interpretation.
It also includes functions for model comparison/selection. We illustrate the functions in
biogrowth using examples from food science and economy.
Keywords: kinetic modelling, model fitting, predictions, dynamic modeling, R, predictive mi-
crobiology, uncertainty.
1. Introduction
The analysis of growth is a research focus of many scientific fields. However, the actual
meaning of the term “growth” can vary widely. In biology, growth can refer to an increase
in the size of an individual organism or in the number of members in a population due to
2 biogrowth: Modeling Population Growth in R
replication, whereas in economic studies growth may relate to the change in the monetary
value of the goods produced by an economy. Regardless of the field, a usual attribute of growth
is that its underlying mechanisms are complex and not fully understood. For instance, the
replication of bacterial cells is determined by a large network of biochemical reactions that has
not been described completely (Notebaart, Kintses, Feist, and Papp 2018). This complexity
also applies to other fields outside of microbiology, such as socio-economic systems. For that
reason, growth is usually described using mathematical models at the population level, where
the meaning of the term “population” can vary broadly between fields (number of cells, gross
domestic product of a country, . . . ).
Mathematical models describing growth at a population level have many applications. Nonethe-
less, these can be roughly divided in two categories: prediction and inference. Model predic-
tion refers to the estimation of the future state of a system. For instance, growth models can
be used to predict the increase in the population size of a spoilage microorganism, supporting
shelf life estimation (García, Vilas, Herrera, Bernárdez, Balsa-Canto, and Alonso 2015). In-
ference, on the other hand, is related to analyzing the parameters of growth models estimated
under different conditions. For instance, changes in the growth rate of a population can be
used to assess whether some actions are effective at controlling a disease outbreak.
At the moment of writing this manuscript, several R packages for modeling growth are already
available in the Comprehensive R Archive Network (CRAN), although they have some limi-
tations. The packages grofit (Kahm, Hasenbrink, Lichtenberg-Fraté, Ludwig, and Kschischo
2010, no longer available in CRAN) and growthcurver (Sprouffske and Wagner 2016) include
functions for model fitting, but cannot be used for making predictions. On the contrary,
growthmodels (Perez 2023) can be used for making predictions, but does not include functions
for parameter estimation. The package nlsMicrobio (Baty, Delignette-Muller, and Siberchicot
2021) is an extension of nlstools (Baty, Ritz, Charles, Brutsche, Flandrois, and Delignette-
Muller 2015) that defines several growth models from predictive microbiology that can be
used for model fitting or making predictions under static conditions. The recent package
growthrates (Petzoldt 2022) includes functions for both parameter estimation and making
predictions, however, this package (as well as the ones mentioned before) cannot include the
effect of changes in the environmental conditions on the growth of the population, which can
be highly relevant in some case studies. For instance, in the context of food safety, storage
temperature is a major limiting factor of microbial growth that varies largely along the food
chain, being highly influential on the shelf life of a product (González-Tejedor, Garre, Esnoz,
Artés-Hernández, and Fernández 2018). Although it is not specifically designed for modeling
microbial growth drc includes functions for fitting and making predictions for sigmoid curves
that could also be adapted to the description of population growth (Ritz, Baty, Streibig,
and Gerhard 2015). On the other hand, this package cannot be used to simulate dynamic
environmental conditions.
Outside of the R environment, several software applications have been developed to describe
the growth of populations. BGFit (Veríssimo, Paixão, Neves, and Vinga 2013) is an on-
line tool that includes several functions for fitting and sharing growth models. PRECOG
(Fernandez-Ricaud, Kourtchenko, Zackrisson, Warringer, and Blomberg 2016) is also an on-
line application that can estimate various growth parameters of a population. However, it is
not based on predictive models, so it cannot be used for making predictions. This is similar
to the pyphe-growthcurves module of the Pyphe Python toolbox (Kamrad, Rodríguez-López,
Cotobal, Correia-Melo, Ralser, and Bähler 2020), which can describe growth curves based on
Journal of Statistical Software 3
non-parametric methods. DMFit is a web tool for model fitting included within ComBase
(Baranyi and Tamplin 2004) that can be used to fit growth models to data gathered under
dynamic environmental conditions.
In this article, we present the biogrowth package (version 1.0.3) for R. It includes functions
to describe the growth of populations using parametric models at the population level. It
implements functions for model fitting and for the calculation of model predictions. One
of its main advantages with respect to packages with a similar scope is the inclusion of
secondary models to describe the relationship between the growth rate and the environmental
conditions. Moreover, this package enables fitting growth models from data gathered under
varying environmental conditions conditions, a methodology that is considered by several
authors to be superior for parameter estimation (Telen, Logist, Van Derlinden, Tack, and Van
Impe 2012; Huang and Li 2020), or the estimation of secondary models from a set of growth
rates. Furthermore, the fit can be done using either non-linear regression or an adaptive
Monte Carlo algorithm. The growth models (either fitted or user-defined) can be used to
make predictions under static or dynamic environmental conditions. A second advantage
of biogrowth with respect to other approaches is the possibility to include uncertainty in
model predictions, a topic whose relevance for microbial risk assessment is nowadays strongly
emphasized by regulatory bodies (EFSA Scientific Committee et al. 2018; Schendel, Jung,
Lindtner, and Greiner 2018). Furthermore, the package allows choosing between different
model equations and implements functions for model selection/comparison (visualization and
statistical indexes). The package defines S3 classes as the output of the main functions. This
type of R classes is often included in this type of package (many of the packages listed above
include them) to facilitate the inspection of the model fits and predictions using S3 methods
based on generic functions (such as plot() or summary()). Considering these features, we
believe that biogrowth is a step forward with respect to the other R packages and other
applications already available for modeling population growth.
The biogrowth package follows the modeling approach of predictive microbiology, where mod-
els are defined in two steps: primary models and secondary models (Buchanan and Whiting
1996). In this approach, primary models describe the relationship between the population
size and the elapsed time. Therefore, they have a single explanatory variable (time). In
most cases, the maximum specific growth rate of the primary model depends on the envi-
ronmental conditions. A typical example is the relationship between the maximum specific
growth rate of microbial populations and temperature. In predictive microbiology, this type
of relationship is described using secondary models. Although this approach was initially
intended for describing microbial populations, its principles are applicable to a large number
of scientific fields. For instance, some of the models included in biogrowth have been used
in biotechnology (Gonzales, Kim, and Kim 2019; Martínez-Hernández, Taboada-Rodríguez,
Garre, Marín-Iniesta, and López-Gómez 2021), ecology (Jiang et al. 2018), medicine (Adam
and Bellomo 2012), astrobiology (Spada and Melini 2020), economics (Tsai 2013), operations
research (Cabecinhas et al. 2018), or social sciences (Gupta, Kumar, Sangam, and Karisid-
dappa 2002). Therefore, we consider that the functions included in biogrowth can be applied
to a large variety of fields with (in some cases) minor modifications, as illustrated in the case
studies below.
The following section of this paper describes the modeling approach used in biogrowth to
describe the growth of populations. Next, Section 3 describes the software architecture,
presenting the main functions and classes of the package, as well as the mathematical methods
4 biogrowth: Modeling Population Growth in R
used for model fitting and prediction. This is followed by Section 4 illustrating four common
ways to build and apply growth models: based on parameters from the literature, by fitting
primary growth models, by fitting secondary growth models, or by fitting both primary and
secondary growth models to data gathered under dynamic conditions. The article finalizes
with a section summarizing the functions included in biogrowth and contextualizing its use
in a variety of fields. For reasons of space, not every aspect of the package is described in full
detail. The interested reader is referred to the package manual and vignettes for an in-depth
description.
Elapsed time
The modified Gompertz model (Zwietering et al. 1990) is defined in Equation 1, where t is
the elapsed time, and N (t) is the size of the population at time t. This model is defined by
four model parameters: the logarithm of the population size at t = 0 (log10 N0 ), the duration
of the lag phase (λ), the slope of the growth curve during the exponential phase (µ) and
the difference between log10 N0 and the logarithm of the maximum population size in the
stationary phase (C).
e·µ
log10 N (t) = log10 N0 + C exp − exp (λ − t) + 1 (1)
C
Note that through this article we follow a common convention in predictive microbiology
where µ is defined as the slope of the growth curve during the exponential phase in the
log10 N versus t plot. This value is different from the specific growth rate (the slope in the
ln N vs t plot), which can be calculated by multiplying µ by ln(10). By default, the functions
in biogrowth also follow this convention. Nonetheless, the package includes arguments to
make the calculations using either scale (described in Section 3 and in the relevant vignette).
The logistic model is defined in Equation 2, where the variables and parameters have the
same interpretation as in the modified Gompertz model.
C
log10 N (t) = log10 N0 + (2)
1 + exp 4µ
C (λ − t) + 2
The Richards model is defined in Equation 3. This model has an additional parameter with
respect to the modified Gompertz and logistic models (ν), which defines the sharpness of the
transition between the different growth phases.
−1/ν
µ
log10 N (t) = log10 N0 + C 1 + ν · exp 1 + ν + (1 + ν)1+1/ν (λ − t) (3)
C
Regarding the Baranyi model, biogrowth uses the algebraic solution under static conditions
by (Öksüz and Buzrul 2020). This model uses log10 Nmax (the maximum population size)
instead of C. The remaining model parameters are the same as the ones in the modified
Gompertz model.
1 + 10µ(t−λ) − 10−µλ
log10 N (t) = log10 Nmax + log10
10µ(t−λ) − 10−µλ + 10(log10 Nmax −log10 N0 )
The triphasic linear model was suggested as a simplified version of other growth models
(Buchanan et al. 1997; Zwietering, De Wit, Cuppers, and Van ’t Riet 1994), where the
sigmoidal curve is described using three straight lines. It has the same parameters as the
Baranyi growth model.
log10 N0
if t < λ
log10 N (t) = log10 max N if t > tstat
log N + µ(t − λ)
otherwise
10 0
with tstat = (log10 Nmax −log10 N0 )/µ+λ standing for the time required to reach the stationary
phase.
6 biogrowth: Modeling Population Growth in R
dt = ln(10) · µ · N (t)
dN
µ = α · µopt · β
Q(t)
α = 1+Q(t) (4)
β = 1 − NNmax
(t)
dQ
= µopt · Q(t)
dt
In the Baranyi model, the lag phase is modeled based on the hypothesis of an ideal substance
(Q(t)), which must be produced before exponential growth begins. This is implemented by
coefficient α(t) = Q(t)/(1 + Q(t)), which acts as a scaling factor of the specific growth rate.
The duration of the lag phase is determined by the initial value of Q(t) (Q0 ) and the growth
rate according to the identity
ln (1 + 1/Q0 )
λ=
µopt
In this model, the stationary phase is introduced with the coefficient β(t), which takes the
form proposed by (Verhulst 1838). Note that when α = 1 and β = 1, the Baranyi model is
reduced to the exponential growth model.
The biogrowth package models the impact of the environmental conditions on the growth rate
using the so-called gamma concept (Zwietering, Wijtzes, De Wit, and Van ’t Riet 1992), also
called functional response or dimensionless moderators elsewhere. This modeling approach
considers that each environmental factor acts as an independent scaling factor for µ with
respect to the value of µ observed under optimal growth conditions (µopt ). This can be
described for k environmental conditions using the following equation
where Xmin , Xopt and Xmax are the minimum, optimum and maximum values of X for
the population growth. Note that Xmin and Xmax usually do not exactly match the values
Journal of Statistical Software 7
0.5
0.0
0 10 20 30 40 50
X
Figure 2: Gamma factor according to the cardinal parameter model (CPM) for an arbitrary
environmental factor, X. The lines represent the effect of changing the shape factor (n),
while the remaining parameters are fixed: Xmin = 10 (minimum value for growth), Xopt = 35
(optimum value for growth), Xmax = 40 (maximum value for growth).
3.1. Installation
The biogrowth package (Garre, Koomen, Den Besten, and Zwietering 2023) is developed in
R and available from CRAN at https://CRAN.R-project.org/package=biogrowth. Hence,
it can be installed in any computer with R (> 2.1.0) with the command:
R> install.packages("biogrowth")
The development version of the package can be installed from GitHub using devtools (Wick-
ham, Hester, Chang, and Bryan 2022b) with the following command:
R> devtools::install_github("albgarre/biogrowth")
These commands also install the dependencies of biogrowth. The code is written according
to the tidy philosophy, so it imports most packages from tidyverse (Wickham et al. 2019).
Visualizations are made in ggplot2, using also cowplot (Wilke 2020) for some visualizations
that require subplots. Differential equations are solved numerically using deSolve (Soetaert,
Petzoldt, and Setzer 2010), and model fitting is done using FME (Soetaert and Petzoldt
2010). Finally, MASS (Venables and Ripley 2002) and lamW (Adler 2023) are used for some
statistical calculations. The functions are thoroughly documented using roxygen2 (Wickham,
Danenberg, Csárdi, and Eugster 2022a), and several vignettes have been prepared using knitr
(Xie 2015). The guidelines of lifecycle have been followed to describe the maturity of the
different functions (Henry and Wickham 2022).
Model predictions
Calculations under constant environmental conditions are calculated using predict_growth()
setting environment = "constant". The growth model is passed as a list to primary_model,
defining the primary model equation and the values of the model parameters. The primary
models in biogrowth have an algebraic solution, so the prediction is calculated by substituting
the values of the model parameter into the equation. Nonetheless, it is calculated only at
discrete time points (passed as a numeric vector to times). Hence, the precision of posterior
analysis (e.g. visualizations or when calculating the time to reach a given size) can be low if
times is not dense enough.
The function returns an instance of ‘GrowthPrediction’ with the results of the calculation.
It is a subclass of ‘list’ that provides direct access the results of the calculation. It also
implements S3 methods for print(), plot(), summary() and coef() to facilitate analysis of
the prediction.
The calculations are done in two steps. First, the function generates a parameter sample
of the size defined by the argument n_sims, in the selected scale using MASS:mvrnorm().
By default, the function does not consider any parameter correlation, although a correlation
matrix can be passed to the cor argument. Once the parameter sample has been generated,
it is converted to the original parameter scale before calling predict_growth() for each
sampled parameter vector. This generates a family of curves that estimate the distribution
of the population size for the time points defined in the times argument.
The results are returned as an instance of ‘GrowthUncertainty’. Although this subclass of
‘list’ provides direct access to the results of the calculation, it also implements S3 methods
for print() and plot() to ease the visualization and analysis of the results.
Model predictions
Passing environment = "dynamic" to predict_growth() accounts for the effect on the
growth rate of changes in the environmental conditions during the experiment. In this case,
the growth curve is calculated by solving the differential equation of the Baranyi model (Equa-
tion 4) substituting µ(t) by the selected secondary model according to the gamma approach
(Equation 5). The solution is approximated numerically using deSolve::ode(). By default,
the function uses the default settings (algorithm, tolerances, . . . ) of ode(), although the user
can define other controls using the ... argument.
12 biogrowth: Modeling Population Growth in R
As well as for predictions based on primary models only, this mode of calculation estimates
the solution at the discrete time points passed through the argument times. Additionally, it
requires the definition of the variation of the environmental conditions through the experiment
(argument env_conditions). This must be defined as a data.frame (or tibble) with one
column providing the elapsed time and as many columns as needed defining the values of the
environmental conditions. For values of time not included in the data, the function calculates
the values of the environmental conditions by linear interpolation.
The argument secondary_models serves to map a secondary model equation to each environ-
mental factor using a nested list. Due to the use of the gamma approach, predict_growth()
does not impose any limit to the number of environmental conditions. Nonetheless, every
environmental factor must be defined at the same values of time.
Although models including secondary models are designed for describing the growth under dy-
namic environmental conditions, this modeling approach can also predict growth under static
conditions. For that, one has to define a table with constant values for every environmental
factor. The advantages of this approach are discussed in a dedicated package vignette.
As well as for predictions for environment = "constant", the function returns an instance
of ‘GrowthPrediction’ with several S3 methods to ease the analysis of the results.
Fitting both primary and secondary models under dynamic environmental conditions
Passing environment = "dynamic" to the fit_growth() function allows the estimation of
the parameters of both primary and secondary growth models from a dataset. In this case, the
function requires two types of data. The first one, fit_data is the variation in the logarithm
of the population size through the experiment. The second one, env_conditions, provides
the variation of the environmental conditions. Both inputs are defined as a tibble (or
data.frame) following the same conventions as the ones defined above for making predictions.
The fitting can be done using two different algorithms, according to the value of the argu-
ment algorithm: non-linear regression (using FME::modFit()), or an Adaptive Monte Carlo
algorithm (Haario, Laine, Mira, and Saksman 2006, using FME::modMCMC()). Both fitting al-
gorithms require the definition of initial guesses for every model parameter to estimate from
the data. These are defined as a named numeric vector using the argument start. This can be
defined based on information from the scientific literature or by a visual inspection of the pre-
dictions corresponding to a vector of parameter values. The function check_growth_guess()
facilitates this second approach. Given a dataset and a list of model parameters, it generates
a plot comparing the model prediction against the data.
Unlike for fitting when environment = "constant", the package does not include any func-
tion to generate initial guesses of the model parameters based on some heuristic. The rea-
son for this is the possible high parameter correlation for some experimental designs, where
different parameter combinations results in similar predictions. For instance, under some
conditions, the population growth could stop either because it has reached the stationary
phase (defined by log Nmax ) or because the value of some environmental condition is above
the one allowing growth (defined by Xmax ). Consequently, in order to avoid pushing the
fitting algorithm towards an arbitrary solution, a function based on heuristic rules is not
implemented.
As already mentioned, this high parameter correlation can also cause poor parameter iden-
tifiability. In order to mitigate this, fit_growth() allows fixing any model parameter to
Journal of Statistical Software 13
any value through the known argument. If the fitting was successful, the function returns an
instance of ‘GrowthFit’ with the same S3 methods as those implemented for models based
on algebraic equations.
Note that fit_growth() does not implement any convergence check for models fitted by
a Monte Carlo (MC) algorithm. Therefore, the convergence of the Markov chain has to be
evaluated independently. This needs to be done by inspecting the instance of ‘modMCMC’ that is
included in .$fit_results. For details on how to evaluate the convergence of the algorithm,
the reader is referred to the documentation of FME and references therein (Soetaert and
Petzoldt 2010).
Fitting both primary and secondary models from several experiments (global fitting)
The fit_growth() function can also fit a unique growth model (based on primary and sec-
ondary model equations) to the results of several experiments obtained under different (static
or dynamic) environmental conditions. This approach, sometimes called “global fitting” in
predictive microbiology, is triggered by passing approach = "global". This method is also
recommended when the experimental method has low precision, potentially causing relevant
changes in the environmental conditions between independent repetitions.
In this case, the input must be formatted in a similar way as when approach = "single".
It also requires two types of information: the variation in the population size through the
experiment and the experimental conditions during the experiment. However, because the
model is fitted to several experiments, the arguments fit_data and env_conditions must be
defined as (preferably named) lists, where each entry defines the observations or environmental
conditions as a tibble (or data.frame) following the same conventions as described above.
As well as for the standard approach, the model can be fitted either using regression or an
Adaptive Monte Carlo algorithm, according to the value of the argument algorithm. In order
to facilitate the definition of initial guesses for the model parameters, check_growth_guess()
can compare the predictions corresponding to a model with the experimental data.
Global fitting is affected by similar issues related to poor parameter identifiability as described
above for growth models fitted to data gathered under dynamic environmental conditions.
Consequently, biogrowth does not include a function to automatically generate initial guesses
based on heuristic rules. Furthermore, in order to mitigate potential parameter identifiability
issues, any model can be fixed to any value using the known argument.
The function returns an instance of ‘GlobalGrowthFit’, which implements similar S3 methods
to facilitate the visualization and inspection of the model as those defined for ‘GrowthFit’.
growth curve at the discrete time points defined in times by calling predict_growth(). The
calculation is done for the environmental conditions defined as a tibble (or data.frame) in
the env_conditions argument, following the conventions described above. Note that this
input must have the same environmental conditions as those used for model fitting.
The function uses the distribution of the Monte Carlo simulations as an estimate for the
distribution of the population size. The precision of this estimate depends both on the
number of iterations and the precision of the posterior distribution estimated by the adaptive
Monte Carlo algorithm. For this reason, it is essential when using predictMCMC() to also
analyze in detail the convergence of the Markov Chain.
The function returns an instance of ‘MCMCgrowth’. This subclass of ‘list’ provides direct
access to the results of the calculations. It also implements S3 methods for print() and
plot() to facilitate the analysis of the results of the simulations.
for the selected model (summarized in Table 2). The output of these functions also includes
identifier (the character key identifying the model), name (the whole name of the model),
model (function with the model equation) and ref (reference to the scientific article where
the model was defined). This meta-information is used by the functions in biogrowth to make
several consistency check of the models before doing any calculation. These tests can be
disabled by passing check = FALSE to the relevant function.
The functions fit_growth() and predict_growth() use slightly different approaches for
mapping the parameters of secondary models. In fit_growth(), each secondary model is
defined as a named list. This list has an entry named model defining the model equation.
Then, the values of the model parameters are defined as additional entries whose names match
those defined in secondary_model_data(). Once each secondary model has been defined,
they are mapped to the environmental conditions by gathering them in a single list. In this
list, each name must match the column names in env_conditions.
On the other hand, fit_growth() uses named numeric vectors to define model parameters.
Then, each parameter of the secondary model must be defined as factor name + *_* +
parameter key (as per secondary_model_data()). For instance, the parameter Xmin for
the environmental factor “temperature” would be defined as temperature_xmin. Then, the
model equations are defined using the model_keys argument. The illustrations below show
a practical illustration of both procedures. The package documentation shows additional
examples.
By default, all the calculations in biogrowth are done assuming that both log N and µ
are defined in log10 base. Nonetheless, this can be changed by passing different values to
logbase_logN and logbase_mu of the relevant functions.
error (RMSE)
1X
ME = ei
n i
1X 2
s
RMSE = e
n i i
where ei is the residual corresponding to observation i and n is the number of data points.
The analysis of the results of these functions is done by examining the objects they return
(an instance of either ‘GrowthComparison’ or ‘SecondaryComparison’). Both implement an
S3 method for print() that provides a short summary of the models analyzed, as well as a
table of the models arranged by AIC. They also implement an S3 method for plot() that
can be used to compare the models visually. This includes comparing the fitted curves, as
well as the model parameters. They also implement methods for coef() and summary() to
retrieve the statistical indexes used for model comparison and selection.
3.7. Method to estimate the elapsed time to reach a given population size
The time to reach a given population size is often a quantity of great interest in studies related
to population growth. In biogrowth, this can be estimated using time_to_size(). The func-
tion has two mandatory arguments. The first one (model) is an instance of ‘GrowthPrediction’,
‘GrowthFit’, ‘GlobalGrowthFit’, ‘GrowthUncertainty’ or ‘MCMCgrowth’ with the growth
model. Therefore, time_to_size() can be used both for models parameters defined manually
or estimated from data. The second mandatory argument, size, defines the target population
size in log units. By default, time_to_size() assumes that size is defined in the same units
as those used in model. Nonetheless, this can be changed with the logbase_logN argument.
By default, time_to_size() estimates the time by linear interpolation using approx() from
base R, setting the population size as the explanatory variable (x) and the elapsed time as
response variable (y). Note that the precision of the estimate depends strongly on the number
of time points used for the growth predictions. Therefore, it is advisable to plot the growth
curve before calling time_to_size() (e.g. using the plot() methods). If the curvature of the
growth curve is not well described in the vicinity of the target population size, the prediction
should be repeated increasing the number of time points. If the target population size is
outside the range of the simulations, time_to_size() returns NA.
On the other hand, if type = "distribution", the function accounts for parameter uncer-
tainty in the calculations. As of biogrowth 1.0.3, this type of calculation is only compatible
when the growth model is an instance of ‘GrowthUncertainty’ or ‘MCMCgrowth’. The dis-
tribution is calculated by calling time_to_size(type = "discrete") for the growth curve
corresponding to each Monte Carlo simulation of the model. Therefore, it generates as many
values of time as iterations in the original model, which serves as an estimate for the dis-
tribution of the time to reach the target population accounting for parameter uncertainty.
In this case, the function returns an instance of ‘TimeDistribution’ with the results of the
simulations that implements S3 methods for print(), summary() and plot() to ease the
analysis of the results.
Note that the precision of these calculations depend on several simulation settings that must
be checked by the user. First of all, the elapsed time is calculated several times with type =
"discrete". Hence, it has the same dependency on the number of time points. Moreover,
Journal of Statistical Software 17
it also depends on the size of the parameter sample. It is thus advisable to repeat the
simulations several times to evaluate the stability of the solution before defining a seed (using
set.seed()). Finally, care must be taken when the target population size is close to the
initial or maximum population size because it is possible that some of the Monte Carlo
simulations did not reach the target count. In that case, the calculation would return NA for
some Monte Carlo samples. Because the quantiles are calculated omitting NAs, the summary
table generated by time_to_size() would be biased. Therefore, it is advisable to carefully
evaluate the results of predict_growth_uncertainty() before using this function.
4. Illustrations
# A tibble: 6 x 3
time A1 A2
<dbl> <dbl> <dbl>
1 0 5.5 7
2 0.167 6.6 7
3 0.333 7.3 7
4 0.5 6.8 6.9
5 0.667 6 6.8
6 0.833 6.1 6.9
with a valid key, which can be retrieved by calling secondary_model_data() without any
argument.
R> secondary_model_data()
In this case, we will use a “CPM” model (the model used by the original authors). The values
of the model parameters are defined using the same list, identifying them using valid keys
that can be retrieved passing a model key to secondary_model_data()
Therefore, the secondary model with Topt = 36.8◦ C , Tmin = 1.4◦ C, Tmax = 41.0◦ C, n = 2
would be defined as
The next step is mapping this secondary model to a column in the data. First, we will
make the calculation for refrigerator “A1”. Therefore, we need to map the temperature in
the model to column A1 in the data using a named list. Note that fit_growth() uses the
gamma approach, so it can account for any number of environmental factors. If we wanted
to account for additional environmental factors, we would include additional entries in this
list (see the package vignettes for examples).
Then, we need to define the parameters of the primary model. Namely, the values of µopt ,
Nmax , N0 and Q0 . The value of µopt was reported as 2.61 ln CFU/h by (Carlin et al. 2013).
For illustration purposes, we will define an initial population of 100 CFU/g, whereas we will
define an Nmax of 10,000,000 CFU/g, a typical value for B. cereus. As described above,
parameter Q0 defines the duration of the lag phase. We will assign a large value to this
parameter (1,000) to have a model without a lag phase.
Before calling predict_growth(), we just need to define the time points where the solution
is calculated. We will define 1,000 uniformly distributed points between 0 and the maximum
time in the data. Then, we can pass these arguments to predict_growth(). Note that, by
default, the calculations are done in base 10. Because µopt was reported in base e by the
original authors, we need to pass the argument logbase_mu = exp(1).
8.0
2.8
Temperature (ºC)
7.0
2.4
6.5
2.2
6.0
2.0 5.5
0 5 10 15 20 25
Storage time (h)
Figure 3: Prediction for the growth of B. cereus in a domestic refrigerator (black line). The
grey line is the temperature profile.
In order to make predictions for the other refrigerator, we can just change the mapping in
the secondary model and call again predict_growth().
The fact that the plot() method returns an instance of ggplot has several advantages. For
instance, we can pass them to cowplot::plot_grid() to compare the model predictions
under the three conditions, to evaluate whether the frequency of the temperature oscillations
have a relevant effect in the model predictions.
A1 A2
3.00 3.00 7.6
8
Microbial count (log CFU/g)
Temperature (ºC)
Temperature (ºC)
7.2
2.50 7 2.50
6.8
2.25 2.25
6
2.00 2.00
0 5 10 15 20 25 0 5 10 15 20 25
Storage time (h) Storage time (h)
Figure 4: Prediction for the growth of B. cereus in two domestic refrigerators (black line).
The grey line is the temperature profile.
We will apply two transformations to the data before fitting the models. First, because the
models implemented in biogrowth usually describe the logarithm of the population size, the
Journal of Statistical Software 21
We will use the fit_growth() function to estimate the model parameters of a logistic growth
model. As already mentioned, models are defined in biogrowth using a character key, which
can be retrieved calling primary_model_data() without any arguments.
R> primary_model_data()
Passing a valid model key to primary_model_data() returns a list with meta-data of the
model, including the identifiers of each model parameter:
The fitting algorithms included in this function require the definition of initial guesses for
every model parameter fitted from the data (log N0 , C, µ and λ). Although this can be done
by visual inspection or from historical data, the make_guess_primary() function can be used
to obtain initial guesses automatically. Note that we are passing the formula argument to
indicate the column names for the population size and the elapsed time.
logN0 mu lambda C
4.35468455 0.02717687 5.00000000 1.05989775
R> check_growth_guess(greek_tractors,
+ model_keys = list(primary = "Logistic"), guess = my_guess,
+ formula = logtractors ~ t_model
+ )
The plot shows that the line is reasonably close for an initial guess to be used for the fitting
algorithm. Therefore, we can call fit_growth() to get the parameter estimates.
22 biogrowth: Modeling Population Growth in R
5.25
logN (in log−10)
5.00
4.75
4.50
0 10 20 30 40
time
Figure 5: Visualization of the initial guess generated automatically for the data on the number
of tractors.
Estimated parameters:
logN0 mu lambda C
3.60457777 0.04781977 -16.04746607 1.84390683
Fixed parameters:
NULL
We can use the S3 plot method included in the package to visually evaluate that the model
correctly describes the data. From the figure, we can see that the function was able to fit the
model to the data.
R> plot(greek_logistics,
+ label_x = "Year", label_y1 = "log10 of the number of tractors")
Then, we can use the summary() method to retrieve the values of the parameter estimates,
as well as their standard errors.
Journal of Statistical Software 23
5.00
4.75
4.50
0 10 20 30 40
Year
Figure 6: Fit of the logistic model to the number of tractors in Greece between 1961 (year 0
in the plot) and 2006.
R> summary(greek_logistics)
Parameters:
Estimate Std. Error t value Pr(>|t|)
logN0 3.604578 0.165268 21.810 < 2e-16 ***
mu 0.047820 0.001897 25.212 < 2e-16 ***
lambda -16.047466 2.787844 -5.756 8.90e-07 ***
C 1.843907 0.172679 10.678 1.53e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameter correlation:
logN0 mu lambda C
logN0 1.0000 -0.9069 0.9892 -0.9992
mu -0.9069 1.0000 -0.8370 0.8947
lambda 0.9892 -0.8370 1.0000 -0.9921
C -0.9992 0.8947 -0.9921 1.0000
It could seem remarkable that a negative value is estimated for parameter λ, which quantifies
the duration of the lag phase. In a different context, this could indicate a deficiency of the
model. However, agricultural tractors have been available since the beginning of the XXth
century, whereas the earliest observation in the data is from 1961. Therefore, in this case,
it is more appropriate to use the mathematical interpretation of parameter λ, defined as the
value of x, where a line with slope µ tangent to the point with maximum growth rate cuts a
horizontal line with y intercept logN0 . Then, a negative value for λ implies that the onset of
exponential growth for the population took place approximately in 1945 (1961 − 16 = 1945).
24 biogrowth: Modeling Population Growth in R
To better illustrate this, we will also fit a growth model fixing parameter λ to zero using the
known argument.
Although the fit of the new model could also be analyzed using the S3 methods for plotting or
statistical summary defined for a growth fit, the function compare_growth_fits() includes
several functions for model comparison/selection. It takes as only input a list of model fits and
returns an instance of ‘GrowthComparison’ with several S3 methods for model comparison
and selection. In this sense, the print() method lists all the models arranged by Akaike
information criterion.
# A tibble: 2 x 5
model AIC df ME RMSE
<chr> <dbl> <int> <dbl> <dbl>
1 Logistic -239. 42 -0.000000156 0.0163
2 Logistic - no lag -179. 43 -0.0000000655 0.0321
The results show that the “Logistic” model (i.e., the one with the negative value of λ) would
be preferred according to the AIC. Note that the models are named according to the names
defined in the list passed to compare_growth_fits(). For this reason, it is recommended to
name this list.
Besides the comparison by statistical indexes, GrowthComparison also includes a plot()
method to visually compare the model fits.
R> plot(tractor_comparison)
This plot shows that the model where we fixed λ = 0 clearly deviates with respect to the
observations at the beginning of the time series. This is further emphasized, when type = 3
is passed to the plotting method to generate a plot of the residuals.
5.25
5.00 model
logN
Logistic
Logistic − no lag
4.75
4.50
0 10 20 30 40
time
Figure 7: Comparison of the model fits of the two logistic models to the tractor data
0.10
0.05
model
res
Logistic
0.00 Logistic − no lag
−0.05
0 10 20 30 40
x
Figure 8: Comparison of the residuals of the two logistic models fitted to the tractor data.
This figure shows that the residuals are larger at the beginning of the experiment for the
Logistic – no lag model, further confirming the conclusions drawn from the previous figure.
The effect of this deviation on the parameter estimates can be observed by passing type = 2
to the plot() method.
This plot shows that, although both models estimate similar values for µ, the model without
lag estimates a larger value for log N0 and smaller for C. This result can be visualized using
the predict() method to calculate the model predictions for time points on a broader time
range than the ones included in the data. We will calculate the predictions in the time range
26 biogrowth: Modeling Population Growth in R
C lambda
−13
2.00
−14
1.75 −15
1.50 −16
−17
1.25
−18
estimate
−19
logN0 mu
4.2 0.049
0.048
4.0
0.047
3.8
0.046
3.6
0.045
3.4
Logistic Logistic − no lag Logistic Logistic − no lag
model
Figure 9: Comparison of the model parameters estimated with both logistic models from the
tractor data. Dots represent estimated values and the error bars their standard errors.
between 1950 and 2010, so the model will be extrapolated both forwards and backwards in
time.
R> library("ggplot2")
R> tibble(
+ year = 1950:2020,
+ t_model = year - min(greek_tractors$Year)
+ ) %>%
+ mutate(Logistic = predict(greek_logistics, times = t_model),
+ `Logistic no lag` = predict(greek_logistics_noLag, times = t_model)
+ ) %>%
+ pivot_longer(-c(t_model, year),
+ names_to = "model", values_to = "logtractors") %>%
+ ggplot() +
+ geom_line(aes(x = year, y = logtractors, colour = model), size = 1) +
+ geom_point(aes(x = Year, y = logtractors),
+ data = greek_tractors, size = 2) +
+ ylab("log10 of the number of tractors") + xlab("Year") +
+ theme_cowplot() +
+ theme(legend.title = element_blank(), legend.position = "top")
This plot shows that, by fixing λ = 0, the model estimates that the number of tractors in
Greece before 1961 is practically equal to the number observed at that time point. On the
other hand, the model that fits every parameter predicts that the number of tractors in the
country increased exponentially since 1950. Considering that tractors were available since the
early XXth century, and that there was economic growth in Greece between 1950 and 1960,
Journal of Statistical Software 27
5.0
4.5
4.0
Figure 10: Comparison between the fit of both models (including extrapolation) to the number
of tractors in Greece between 1961 and 2006 when the lag phase is fixed to one.
the prediction of the model fitting every model parameter seems more reasonable. Together
with the fact the model without a lag phase deviates from the trend of the data points at the
beginning of the experiment, it is reasonable to conclude that this dataset is better described
by the model that considers a lag phase, even if the value of λ is negative.
The function predict_growth_uncertainty() enables including parameter uncertainty in
the model predictions. It is thus intended for cases where variation is a relevant part of the
problem studied, either because the system is highly uncertain or because variability is an
inherent part of it (e.g. in microbial risk assessment, EFSA Scientific Committee et al. 2018).
Although these distributions can be defined by hand, they can also be easily generated from
an instance of ‘GrowthComparison’ using the coef() method.
# A tibble: 4 x 4
par mean sd scale
<chr> <dbl> <dbl> <chr>
1 logN0 3.60 0.165 original
2 mu 0.0478 0.00190 original
3 lambda -16.0 2.79 original
4 C 1.84 0.173 original
5.0
4.5
4.0
0 10 20 30 40 50
Year
Figure 11: Prediction band for the number of tractors in Greece accounting for parameter
uncertainty. The lightblue ribbon shows the interval for the 0.9 level and the darkblue the
0.8 interval.
the analysis. It includes a plot() method to visualize the prediction band, where the line
represents the median of the simulations, and the shaded area the 5th, 10th, 90th and 95th
quantiles.
R> set.seed(12412)
R> greek_uncertainty <- predict_growth_uncertainty(
+ greek_logistics$primary_model,
+ times = seq(0, 50, length = 100),
+ n_sims = 1500,
+ pars)
R> plot(greek_uncertainty,
+ ribbon80_fill = "darkblue", ribbon90_fill = "lightblue") +
+ xlab("Year") + ylab("log10 of the number of tractors")
A final aspect that is often of great interest when using growth models is the elapsed time
required to reach a given population size. This can be estimated using the time_to_size()
function, which can make either a discrete calculation or estimate a distribution for this
variable. As an example, we will calculate the distribution of the time for a target size of
104.7 (chosen arbitrarily for this illustration).
# A tibble: 1 x 5
Journal of Statistical Software 29
150
100
count
50
0
0 10 20 30
time
Figure 12: Distribution of the time to reach a population size of 4.7 log for the tractor data.
The vertical, red line represents the median of the results. The vertical, gray lines the 10th
and 90th percentiles.
R> plot(distrib_to_4p7)
temperature pH mu
1 0.000000 5 9.768505e-04
2 5.714286 5 2.624919e-03
3 11.428571 5 0.000000e+00
4 17.142857 5 1.530706e-04
5 22.857143 5 2.301817e-05
6 28.571429 5 3.895598e-04
The dataset includes simulated values of µ for several experiments at different values of
pH and temperature. Each row of the dataset (64 in total) would represent one individual
experiment. We need to assign a secondary model equations to each environmental factor.
In this example, we will assign a cardinal parameter model (key CPM) to both factors.
As a first step, we will estimate every parameter in the model passing an empty vector to
known_pars.
The fitting algorithm returns several warnings when doing the calculation. This is often
an indication that the model fitted should be analyzed with special care. The instance of
‘FitSecondaryGrowth’ returned by the function implements several S3 methods to help in
this analysis. Namely, the summary() method provides the statistical information on the
model fit.
Journal of Statistical Software 31
R> summary(fit_cardinal)
Parameters:
Estimate Std. Error t value Pr(>|t|)
mu_opt 1.026 NA NA NA
temperature_xmin 5.089 NA NA NA
temperature_xopt 34.855 NA NA NA
temperature_xmax 40.000 NA NA NA
temperature_n 1.140 NA NA NA
pH_xmin 5.138 NA NA NA
pH_xopt 6.476 NA NA NA
pH_xmax 7.000 NA NA NA
pH_n 3.010 NA NA NA
In this case, the function returns an error and calculates NA’s for the standard errors of
every model parameter. This is yet another indication of parameter identifiability issues. To
mitigate this issue, fit_secondary_growth() gives the possibility to fix any model parameter
(and is often necessary in this type of model fit). Often, the first candidates to fix in this type
of model are the order of the models (n). In this case, we will fix the one for temperature
to 1 and the one for pH to 2. We will also fix the Xmax for temperature to 40◦ C, and Xmax
and Xmin for pH to 7 and 5.2 respectively. Note that we also remove these parameters from
my_start to indicate they should not be fitted.
Fixing these model parameters resolves some of the identifiability issues. This results in the
regression algorithm being able to estimate standard errors for every model parameters.
R> summary(fixed_cardinal)
Parameters:
Estimate Std. Error t value Pr(>|t|)
temperature_xmin 6.11686 1.20602 5.072 4.06e-06 ***
32 biogrowth: Modeling Population Growth in R
Parameter correlation:
temperature_xmin temperature_xopt pH_xopt mu_opt
temperature_xmin 1.000e+00 -6.296e-01 1.241e-08 -0.3830
temperature_xopt -6.296e-01 1.000e+00 -2.909e-09 0.8884
pH_xopt 1.241e-08 -2.909e-09 1.000e+00 0.1732
mu_opt -3.830e-01 8.884e-01 1.732e-01 1.0000
The instance of ‘FitSecondaryGrowth’ also includes plot() methods to visualize the good-
ness of the model fit. By default it generates a scatter plot of observed against predicted
values, including a gray line showing a linear regression model fitted to fits vs observations.
This line can be compared against the black, dashed line showing a perfect fit (slope = 1,
intercept = 0). In cases where there is a good agreement between both lines (as is the case
here), one can conclude that the fitted model describes the general trend in the data.
R> plot(fixed_cardinal)
The plot includes an alternative plotting method. Passing which = 2, shows a scatter plot
individually for each environmental factor, showing both the model fits (black) and the obser-
vations (gray). Note that, because the gamma model considers several environmental factors,
there are different model fits for each value of X (representing the effect of the other environ-
mental factors). For this reason, in order to facilitate the comparison, passing add_segment =
TRUE joins the observation and its corresponding fitted value. Moreover, passing add_trend
= TRUE adds a trend line for both the fits and the observations by local regression.
Both plots show that the model describe adequately the data. Nonetheless, there are some
conditions for which the model fits a value of µ = 0, whereas the experiments observed some
growth. Considering that the values of Xmax for both pH and temperature were fixed, this
could indicate that they were fixed to too narrow values. Also, note that the trend line shown
in the second plot is not a model fit. The reason for this is that the gamma approach can
have an arbitrary number of environmental condition, whereas each facet can only have one
environmental factor. Therefore, this plot is only intended to aid in evaluating the goodness
of the fit. For making model predictions, the predict() method should be used.
Journal of Statistical Software 33
0.6
0.3
0.0
0.0 0.3 0.6 0.9
Observed square root of the growth rate
Figure 13: Comparison between the model fits and the predictions for the secondary model
fitted to simulated data. The solid grey line is a linear regression model for fitted vs. observed
and the dashed black line represents a perfect fit.
pH temperature
Square root of the growth rate
0.8
0.4
0.0
Figure 14: Comparison between the trend of the fitted model and the one of the observations
for the secondary model fitted to simulated data. Observations are shown in gray and model
fits in black.
34 biogrowth: Modeling Population Growth in R
When building a model based on data gathered under dynamic conditions, it is important to
consider different hypotheses that may describe the observations. The microbial counts show
the type of sigmoidal shape that can be described using primary models, with growth slowing
down at the beginning and at the end of the experiment. If temperature was constant, this
behavior could be attributed to the adaptation phase required before growth onset and the
depletion of nutrients at the end of the exponential phase. However, the end of the “apparent”
lag phase matches the moment when the temperature was raised from 4 to 7◦ C . In a similar
way, the beginning of the “apparent” stationary phase takes place when temperature is lowered
from 11 to 4◦ C. Considering that the minimum temperature for growth of Listeria spp. is
slightly below 4◦ C (Mataragas, Drosinos, Siana, Skandamis, and Metaxopoulus 2006a), it
is reasonable to assume that the apparent lag and stationary phases are not due to the
mechanisms usually observed in isothermal experiments. Instead, they could be attributed
to the effect of temperature on microbial growth. Furthermore, there is a slight increase in
the population size during the first period at 4◦ C that is not observed in the second period
at this temperature. This could be attributed to the effect of the pH on the growth rate,
as this factor decreases from approximately 6.25 to 5.25 through the experiment (a common
observation during the growth of L. monocytogenes due to the production of lactate). This
effect could also be described using the gamma concept, defining a second gamma factor for
pH.
Journal of Statistical Software 35
5 10
Temperature
4
Logc
8
3
6
2
4
0 500 1000 0 500 1000
Time Time
6.25
6.00
pH
5.75
5.50
5.25
0 500 1000
Time
Figure 15: Illustration of the data retrieved from ComBase on the growth of L. monocytogenes
in pork.
The fit_growth() function can be used to test whether these different hypotheses could
explain the data. We can exploit the possibility to fix some parameters to fit a model that
attributes the reduction in the growth rate to temperature changes. Then, we can use the S3
methods included in the package to evaluate whether the proposed models can describe the
data. ComBase defines the temperature and pH profiles in two different Excel sheets, which
must be joined before they can be used by biogrowth.
Next, we need to define the secondary models. Due to its simplicity, we will use a Zwietering-
type secondary model for both environmental factors.
We can test different model hypotheses by fixing some model parameters to selected values.
To test whether the initial lag phase is due to the temperature inhibition, we can fix the
initial value of Q(t) to a high value (Q0 = 1e10), resulting in a model without lag phase. In
a similar way, we can fix the value of Nmax to a value several orders of magnitude higher than
the maximum microbial load one observed in the experiment (Nmax = 1e8), resulting in a
model without a stationary phase.
Apart from that, we will fix the initial population size to the initial value of the observations.
In predictive microbiology, the order of the model (n) is most often fixed to an integer value.
In this model, the order of both secondary models is set to n = 2.
36 biogrowth: Modeling Population Growth in R
5
logN (in log−10)
0 500 1000
time
Figure 16: Initial guess for fitting the growth model to the growth of L. monocytogenes in
pork observed under dynamic conditions.
The remaining model parameters (µopt and the cardinal parameters) are estimated from the
dynamic data. Due to the use of regression, fit_growth() needs initial guesses for these
parameters. In this example, we will use typical values for the growth of L. monocytogenes.
Nonetheless, we can use check_growth_guess() to check that our initial guess is reasonably
close to the data points. Note that, because ComBase does not use the default column names
for biogrowth, they are defined using the formula argument.
The plot shows that the growth curve is reasonably close to the data points. Then, we can
call fit_growth() with these arguments.
Temperature (ºC)
4
8
3
6
4
0 500 1000
Time (h)
Figure 17: Fit of the Baranyi model with secondary model based on the gamma concept
to the data on growth of Listeria monocytogenes under dynamic conditions (black). The
temperature profile is shown as a gray line.
the growth rate, without introducing hypotheses related to the adaptation of cell metabolism
(lag phase) or nutrient depletion (stationary phase).
The summary S3 method can be used to retrieve the values of the parameter estimates and
their standard errors.
R> summary(listeria_model)
Parameters:
Estimate Std. Error t value Pr(>|t|)
mu_opt 1.1669 0.5461 2.137 0.058330 .
Temperature_xopt 33.0219 27.8768 1.185 0.263578
Temperature_xmin 2.4712 0.4009 6.165 0.000106 ***
pH_xmin 5.2016 0.1039 50.068 2.44e-13 ***
pH_xopt 6.6221 1.0146 6.527 6.66e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameter correlation:
mu_opt Temperature_xopt Temperature_xmin pH_xmin pH_xopt
mu_opt 1.0000 0.8461 -0.51629 -0.61135 -0.62524
Temperature_xopt 0.8461 1.0000 -0.27314 -0.28023 -0.93517
Temperature_xmin -0.5163 -0.2731 1.00000 0.76472 -0.03807
pH_xmin -0.6113 -0.2802 0.76472 1.00000 -0.04806
pH_xopt -0.6252 -0.9352 -0.03807 -0.04806 1.00000
It shows that there is extremely high uncertainty associated to the value of Topt (relative
standard error of 0.84). This could be attributed to the experimental design, which covers
temperatures in a relatively short range (4 to 11◦ C), far from the parameter estimate (33◦ C).
Another parameter with high uncertainty in this model is µopt (relative standard error of
0.47). This could be attributed to the high correlation between this parameter and Topt ,
which inflates the uncertainty of both parameters. In order to reduce this uncertainty, we
could fix parameter Topt to 35◦ C, a value that has been reported for Listeria spp. in meat
products (Liu, Wang, Liu, and Dong 2019). For the same reasons, we also fix pHopt to 7.
Parameters:
Estimate Std. Error t value Pr(>|t|)
mu_opt 1.677883 0.214475 7.823 4.72e-06 ***
Temperature_xmin 2.189692 0.254278 8.611 1.76e-06 ***
pH_xmin 5.143662 0.001194 4308.702 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameter correlation:
mu_opt Temperature_xmin pH_xmin
mu_opt 1.00000 0.97696 0.01896
Temperature_xmin 0.97696 1.00000 -0.09215
pH_xmin 0.01896 -0.09215 1.00000
The summary table shows that fixing the model parameters results in a strong reduction of
parameter uncertainty for µopt compared to the initial model. The compare_growth_fits()
function can be used to assess whether this parameter reduction reduces the precision of the
model fit. This function accepts a (named) list of models.
4
model
logN
Reduced
Standard
3
0 500 1000
time
Figure 18: Comparison of the two model fits to the data on the growth of L. monocytogenes
under dynamic conditions.
The plot() method compares both models fitted. In this case, it shows that there is practi-
cally no difference between the fitted curves.
R> plot(comparison_listeria)
Calling the print() method returns a table of the AIC for each model fit, showing that the
“reduced” model is preferred over the standard one.
R> print(comparison_listeria)
# A tibble: 2 x 5
model AIC df ME RMSE
<chr> <dbl> <int> <dbl> <dbl>
1 Reduced -0.469 12 0.0140 0.179
2 Standard 8.01 10 0.00706 0.175
An additional comparison that can be of interest, is the one provided by the plot() method
when type = 2. This plot compares the expected values and standard errors of the param-
eter estimates for both models. The figure shows that both models practically estimate the
same parameter values. However, the standard model estimates every parameter with higher
uncertainty. This is often an indication of poor identifiability, being additional evidence that
not all the parameters of the standard model can be identified with this experimental design.
5.20
6.5
1.0 5.15
6.0
estimate
5.10
Reduced Standard
Temperature_xmin Temperature_xopt
60
2.75
2.50 40
2.25
20
2.00
Figure 19: Comparison of the parameters estimates for the two models fitted to the data on
the growth of L. monocytogenes under dynamic conditions. Note that the parameters that
were fixed in the reduced model are not shown in their respective facets.
It is worth mentioning that the fitted model estimates a value of 5.14 for pHmin . This value is
higher than the minimum pH enabling growth of L. monocytogenes at optimal temperature
conditions, which usually ranges 4.4 (George, Lund, and Brocklehurst 1988). This type of
deviation is typical when fitting a model to data gathered under dynamic conditions because
experimental conditions rarely include sufficient data points in the vicinity of Xmin or Xmax .
Therefore, the behavior of the population at extreme conditions has little leverage on the
model fitted for many experimental designs. For this reason, in predictive microbiology, the
parameters estimated in this approach are usually called “theoretical” Xmin or Xmax and must
be further validated using dedicated growth boundary experiments (George et al. 1988; Le
Marc, Huchet, Bourgeois, Guyonnet, Mafart, and Thuault 2002).
For global fitting, the fit_growth() function requires that the data is saved as a list, where
each entry describes the information from each experiment. The microbial counts can be
converted directly.
Regarding the environmental conditions, in the same way as in the previous example, we need
to put both temperature and pH in the same table before converting it into a list.
Now, we can fit the growth model using fit_growth(). In this case, we pass the argument
approach = "global" to indicate that we are fitting a unique model to different growth
experiments. We also use a Monte Carlo algorithm because we later intend to make predic-
tions including parameter uncertainty. Note that, when calling the function we are passing
additional arguments to FME::modMCMC(). Namely, the number of iterations for updating the
covariance matrix and the burn-in length.
R> set.seed(12421)
R> global_fit <- fit_growth(multiple_listeria, sec_models, start,
+ known, environment = "dynamic", algorithm = "MCMC",
+ approach = "global", env_conditions = multiple_profile,
+ formula = Logc ~ Time, niter = 6000, updatecov = 500,
+ burninlength = 1000
+ )
Although the algorithm converges to a solution, the percentage of accepted runs is relatively
low. This often indicates that the convergence of the algorithm could be improved by modi-
fying its parameters. Nonetheless, this topic is highly specific and is out of the scope of this
article. The interested reader is referred to the specific literature on the topic (e.g. (Brooks,
Gelman, Jones, and Meng 2011)).
We can call the plot() method of the instance of GlobalGrowthFit returned by the function
to check that the fitted model properly describes the trend in both experiments.
A B
5 6.25
Microbial concentration (log CFU/g)
6.00
6.00 4
4
pH value
pH value
5.75 5.75
3 3
5.50
5.50
2 2
5.25
Figure 20: Global fit of the growth model for L. monocytogenes in pork meat from two
independent experiments (black line). The pH profile is shown in maroon.
This figure illustrates that the fitted model can describe the variation in the population size
observed in both experiments. This is further support of the hypothesis that the sigmoidal
shape in the population size is due to the effect of the environmental conditions rather than
the usual causes for a lag or stationary phases.
Then, we can use the fitted model for making predictions. In this illustration, we will include
parameter uncertainty in the predictions using predictMCMC(). This prediction can be made
for any type of environmental condition, constant or not. As an example, we will use an
environmental profile with constant temperature (5◦ C) and pH decaying from pH 7 to pH 5
linearly over 1,000 hours. This functions also allows modifying any model parameter through
the newpars argument. Hence, we will also modify the initial population size (N0 ) to 1
CFU/g, to represent conditions different to those used under laboratory conditions.
8 7.0
pH value
4 6.0
2 5.5
0 5.0
0 250 500 750 1000
time
Figure 21: Prediction band for the growth of L. monocytogenes under dynamic environmental
conditions including parameter uncertainty. The grey and blue ribbons represent, respectively,
the credible intervals at the 0.9 and 0.8 levels. The dashed, black line represents the pH profile.
Once this prediction has been calculated, we can also estimate the distribution of the time
required to reach a population size of 2 log CFU/g. This value has been selected because it
is often used as a microbiological criterion to define the shelf life of food products.
# A tibble: 1 x 5
m_time sd_time med_time q10 q90
<dbl> <dbl> <dbl> <dbl> <dbl>
1 231. 141. 192. 101. 414.
Note that passing type = "distribution" calculates a distribution for the elapsed time
instead of a discrete value. This distribution represents the uncertainty in the parameter
estimates, and can also be represented as a histogram using the S3 method for plot().
R> plot(time_to_100)
200
150
count
100
50
0
0 250 500 750 1000
time
Figure 22: Estimated distribution for the time to reach 100 CFU/g for L. monocytogenes.
The vertical, red line represents the median of the simulations, whereas the vertical, grey lines
show the 10th and 90th percentiles.
5. Closing remarks
Mathematical models are applied in a large variety of fields to describe the growth of popula-
tions. However, their application usually requires the use of advanced mathematical methods
to build the model (e.g. inference) and/or to make prediction (e.g. solving differential equa-
tions). This can be specially troublesome when analyzing growth under dynamic environ-
mental conditions. The biogrowth package can make these models more accessible, providing
an R interface to the functions required for building the models and making predictions. Al-
though the methods implemented in the package are based on predictive microbiology, the
mathematical equations and methods are also applied to a large variety of fields, such as
biotechnology or economy.
Nonetheless, the biogrowth package also has some limitations. This is due to the fact that the
effect of changes in the environmental conditions on the duration of the lag phase is highly
complex, depending both on factors of the population (e.g. physiological state of the cells) and
external factors (e.g., temperature or pH). Although some models have been published to de-
scribe this effect (Augustin, Rosso, and Carlier 2000; Yue et al. 2019; Delignette-Muller, Baty,
Cornu, and Bergis 2005), they are quite specific and hard to extrapolate to non-laboratory
conditions. Consequently, these modeling approaches are not included in biogrowth. For
the same reason, the package does not include the possibility to define interactions between
gamma factors. Another limitation of the package is that it does not include the possibil-
ity to implement user-defined models. The reason for this is that it is designed as a tool
to facilitate the application of growth models that are broadly applied in the literature.
We believe that more general packages for model fitting (e.g. FME) are more suited for
tuning model equations. Nonetheless, for advanced users, the wiki from its GitHub page
Journal of Statistical Software 45
Acknowledgements
The authors would like to thank Prof. Pablo S. Fernandez from Technical University of Carta-
gena; Prof. Fernando Perez-Rodriguez, Dr. Aricia Possas and Ms. Cristina Diaz Martinez
from University of Cordoba; Dr. Ignacio Alvarez Lanzarote and Dr. Guillermo Cebrian from
University of Zaragoza; and Dr. Annemarie Pielaat and Dr. Joost Smid from Unilever for
reviewing and providing valuable feedback on early versions of the software. They also thank
Dr. Sara Bover-Cid, from IRTA for sharing the data regarding the temperature profiles in
refrigerators.
Alberto Garre was supported by the European Union’s Horizon 2020 research and innova-
tion programme under the Marie Sklodowska-Curie Individual Fellowship grant No 844423
(FANTASTICAL).
46 biogrowth: Modeling Population Growth in R
References
Adam JA, Bellomo N (2012). A Survey of Models for Tumor-Immune System Dynamics.
Springer-Verlag. doi:10.1007/978-0-8176-8119-7.
Adler A (2023). lamW: Lambert-W Function. R package version 2.1.2, URL https://CRAN.
R-project.org/package=lamW.
Antolinos V, Muñoz-Cuevas M, Ros-Chumillas M, Periago PM, Fernández PS, Le Marc Y
(2012). “Modelling the Effects of Temperature and Osmotic Shifts on the Growth Kinetics
of Bacillus weihenstephanensis in Broth and Food Products.” International Journal of Food
Microbiology, 158(1), 36–41. doi:10.1016/j.ijfoodmicro.2012.06.017.
Augustin JC, Rosso L, Carlier V (2000). “A Model Describing the Effect of Temperature His-
tory on Lag Time for Listeria Monocytogenes.” International Journal of Food Microbiology,
57(3), 169–181. doi:10.1016/s0168-1605(00)00260-9.
Baranyi J, Roberts TA (1994). “A Dynamic Approach to Predicting Bacterial Growth
in Food.” International Journal of Food Microbiology, 23(3-4), 277–294. doi:10.1016/
0168-1605(94)90157-0.
Baranyi J, Tamplin ML (2004). “ComBase: A Common Database on Microbial Responses
to Food Environments.” Journal of Food Protection, 67(9), 1967–1971. doi:10.4315/
0362-028x-67.9.1967.
Baty F, Delignette-Muller ML, Siberchicot A (2021). nlsMicrobio: Data Sets and Nonlinear
Regression Models Dedicated to Predictive Microbiology. R package version 0.0-3, URL
https://CRAN.R-project.org/package=nlsMicrobio.
Baty F, Ritz C, Charles S, Brutsche M, Flandrois JP, Delignette-Muller ML (2015). “A Tool-
box for Nonlinear Regression in R: The Package nlstools.” Journal of Statistical Software,
66(5), 1–21. doi:10.18637/jss.v066.i05.
Brooks S, Gelman A, Jones G, Meng XL (2011). Handbook of Markov Chain Monte Carlo.
CRC press. doi:10.1201/b10905.
Buchanan RL, Whiting RC (1996). “Risk Assessment and Predictive Microbiology.” Journal
of Food Protection, 59(13), 31–36. doi:10.4315/0362-028x-59.13.31.
Buchanan RL, Whiting RC, Damert WC (1997). “When Is Simple Good Enough: A Com-
parison of the Gompertz, Baranyi, and Three-Phase Linear Models for Fitting Bacterial
Growth Curves.” Food Microbiology, 14(4), 313–326. doi:10.1006/fmic.1997.0125.
Cabecinhas M, Domingues P, Sampaio P, Bernardo M, Franceschini F, Galetto M, Gianni M,
Gotzamani K, Mastrogiacomo L, Hernandez-Vivanco A (2018). “Integrated Management
Systems Diffusion Models in South European Countries.” International Journal of Quality
& Reliability Management, 35(10), 2289–2303. doi:10.1108/ijqrm-03-2017-0044.
Carlin F, Albagnac C, Rida A, Guinebretière MH, Couvert O, Nguyen-the C (2013). “Varia-
tion of Cardinal Growth Parameters and Growth Limits According to Phylogenetic Affilia-
tion in the Bacillus cereus Group. Consequences for Risk Assessment.” Food Microbiology,
33(1), 69–76. doi:10.1016/j.fm.2012.08.014.
Journal of Statistical Software 47
Delignette-Muller ML, Baty F, Cornu M, Bergis H (2005). “Modelling the Effect of a Temper-
ature Shift on the Lag Phase Duration of Listeria monocytogenes.” International Journal
of Food Microbiology, 100(1), 77–84. doi:10.1016/j.ijfoodmicro.2004.10.021.
EFSA Scientific Committee, Benford D, Halldorsson T, Jeger MJ, Knutsen HK, More S,
Naegeli H, Noteborn H, Ockleford C, Ricci A, Rychen G, Schlatter JR, Silano V, Solecki
R, Turck D, Younes M, Craig P, Hart A, Von Goetz N, Koutsoumanis K, Mortensen A,
Ossendorp B, Martino L, Merten C, Mosbach-Schulz O, Hardy A (2018). “Guidance on
Uncertainty Analysis in Scientific Assessments.” EFSA Journal, 16(1). doi:10.2903/j.
efsa.2018.5123.
García MR, Vilas C, Herrera JR, Bernárdez M, Balsa-Canto E, Alonso AA (2015). “Quality
and Shelf-Life Prediction for Retail Fresh Hake (Merluccius merluccius).” International
Journal of Food Microbiology, 208, 65–74. doi:10.1016/j.ijfoodmicro.2015.05.012.
Garre A, Egea JA, Iguaz A, Palop A, Fernandez PS (2018a). “Relevance of the Induced Stress
Resistance When Identifying the Critical Microorganism for Microbial Risk Assessment.”
Frontiers in Microbiology, 9, 1663. doi:10.3389/fmicb.2018.01663.
Garre A, Fernández PS, Lindqvist R, Egea JA (2017). “bioinactivation: Software for Mod-
elling Dynamic Microbial Inactivation.” Food Research International, 93, 66–74. doi:
10.1016/j.foodres.2017.01.012.
Garre A, Koomen J, Den Besten HMW, Zwietering MH (2023). biogrowth: Modelling of Pop-
ulation Growth. R package version 1.0.3, URL https://CRAN.R-project.org/package=
biogrowth.
George SM, Lund BM, Brocklehurst TF (1988). “The Effect of pH and Temperature on
Initiation of Growth of Listeria Monocytogenes.” Letters in Applied Microbiology, 6(6),
153–156. doi:10.1111/j.1472-765x.1988.tb01237.x.
Gonzales RR, Kim JS, Kim SH (2019). “Optimization of Dilute Acid and Enzymatic
Hydrolysis for Dark Fermentative Hydrogen Production from the Empty Fruit Bunch
of Oil Palm.” International Journal of Hydrogen Energy, 44(4), 2191–2202. doi:
10.1016/j.ijhydene.2018.08.022.
Gupta BM, Kumar S, Sangam SL, Karisiddappa CR (2002). “Modeling the Growth of
World Social Science Literature.” Scientometrics, 53(1), 161–164. doi:10.1023/a:
1014844222898.
Haario H, Laine M, Mira A, Saksman E (2006). “DRAM: Efficient Adaptive MCMC.” Statis-
tics and Computing, 16(4), 339–354. doi:10.1007/s11222-006-9438-0.
Henry L, Wickham H (2022). lifecycle: Manage the Life Cycle of Your Package Functions.
R package version 1.0.3, URL https://CRAN.R-project.org/package=lifecycle.
Le Marc Y, Huchet V, Bourgeois CM, Guyonnet JP, Mafart P, Thuault D (2002). “Mod-
elling the Growth Kinetics of Listeria as a Function of Temperature, pH and Organic
Acid Concentration.” International Journal of Food Microbiology, 73(2), 219–237. doi:
10.1016/s0168-1605(01)00640-7.
Liu Y, Wang X, Liu B, Dong Q (2019). “One-Step Analysis for Listeria Monocytogenes
Growth in Ready-to-Eat Braised Beef at Dynamic and Static Conditions.” Journal of Food
Protection, 82(11), 1820–1827. doi:10.4315/0362-028x.jfp-18-574.
Nguimkeu P (2014). “A Simple Selection Test between the Gompertz and Logistic Growth
Models.” Technological Forecasting and Social Change, 88, 98–105. doi:10.1016/j.
techfore.2014.06.017.
Notebaart RA, Kintses B, Feist AM, Papp B (2018). “Underground Metabolism: Network-
Level Perspective and Biotechnological Potential.” Current Opinion in Biotechnology, 49,
108–114. doi:10.1016/j.copbio.2017.07.015.
Öksüz HB, Buzrul S (2020). “Monte Carlo Analysis for Microbial Growth Curves.” Journal
of Microbiology, Biotechnology and Food Sciences, 10(3), 418–423. doi:10.15414/jmbfs.
2020.10.3.418-423.
Perez DR (2023). growthmodels: Nonlinear Growth Models. R package version 1.3.1, URL
https://CRAN.R-project.org/package=growthmodels.
Petzoldt T (2022). growthrates: Estimate Growth Rates from Experimental Data. R package
version 0.8.4, URL https://CRAN.R-project.org/package=growthrates.
Ratkowsky DA, Lowry RK, McMeekin TA, Stokes AN, Chandler RE (1983). “Model for
Bacterial Culture Growth Rate throughout the Entire Biokinetic Temperature Range.”
Journal of Bacteriology, 154(3), 1222–1226. doi:10.1128/jb.154.3.1222-1226.1983.
Ratkowsky DA, Olley J, McMeekin TA, Ball A (1982). “Relationship between Temperature
and Growth Rate of Bacterial Cultures.” Journal of Bacteriology, 149(1), 1–5. doi:
10.1128/jb.149.1.1-5.1982.
Ritz C, Baty F, Streibig JC, Gerhard D (2015). “Dose-Response Analysis Using R.” PLOS
One, 10(12), e0146021. doi:10.1371/journal.pone.0146021.
Rosso L, Lobry JR, Bajard S, Flandrois JP (1995). “Convenient Model to Describe the Com-
bined Effects of Temperature and pH on Microbial Growth.” Applied and Environmental
Microbiology, 61(2), 610–616. doi:10.1128/aem.61.2.610-616.1995.
Soetaert K, Petzoldt T (2010). “Inverse Modelling, Sensitivity and Monte Carlo Analysis in
R Using Package FME.” Journal of Statistical Software, 33(3), 1–28. doi:10.18637/jss.
v033.i03.
50 biogrowth: Modeling Population Growth in R
Telen D, Logist F, Van Derlinden E, Tack I, Van Impe J (2012). “Optimal Experiment Design
for Dynamic Bioprocesses: A Multi-Objective Approach.” Chemical Engineering Science,
78, 82–97. doi:10.1016/j.ces.2012.05.002.
Tsai BH (2013). “Predicting the Diffusion of LCD TVs by Incorporating Price in the Extended
Gompertz Model.” Technological Forecasting and Social Change, 80(1), 106–131. doi:
10.1016/j.techfore.2012.07.006.
Vásquez GA, Busschaert P, Haberbeck LU, Uyttendaele M, Geeraerd AH (2014). “An Edu-
cationally Inspired Illustration of Two-Dimensional Quantitative Microbiological Risk As-
sessment (QMRA) and Sensitivity Analysis.” International Journal of Food Microbiology,
190, 31–43. doi:10.1016/j.ijfoodmicro.2014.07.034.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. 4th edition. Springer-
Verlag, New York. URL https://www.stats.ox.ac.uk/pub/MASS4/.
Verhulst PF (1838). “Notice sur la loi que la population suit dans son accroissement.” Cor-
respondance Mathématique et Physique, 10, 113–121.
Veríssimo A, Paixão L, Neves AR, Vinga S (2013). “BGFit: Management and Automated
Fitting of Biological Growth Curves.” BMC Bioinformatics, 14(1), 283. doi:10.1186/
1471-2105-14-283.
Vilas C, Arias-Mendez A, Garcia MR, Alonso AA, Balsa-Canto E (2018). “Toward Predictive
Food Process Models: A Protocol for Parameter Estimation.” Critical Reviews in Food
Science and Nutrition, 58(3), 436–449. doi:10.1080/10408398.2016.1186591.
Wilke CO (2020). cowplot: Streamlined Plot Theme and Plot Annotations for ggplot2. R pack-
age version 1.1.1, URL https://CRAN.R-project.org/package=cowplot.
Xie Y (2015). Dynamic Documents with R and knitr. Chapman & Hall/CRC, Boca Raton.
Yue S, Liu Y, Wang X, Xu D, Qiu J, Liu Q, Dong Q (2019). “Modeling the Effects of the
Preculture Temperature on the Lag Phase of Listeria Monocytogenes at 25°C.” Journal of
Food Protection, 82(12), 2100–2107. doi:10.4315/0362-028x.jfp-19-117.
Zwietering MH, De Wit JC, Cuppers HGAM, Van ’t Riet K (1994). “Modeling of Bacterial
Growth with Shifts in Temperature.” Applied and Environmental Microbiology, 60(1), 204–
213. doi:10.1128/aem.60.1.204-213.1994.
Zwietering MH, Jongenburger I, Rombouts FM, Van ’t Riet K (1990). “Modeling of the
Bacterial Growth Curve.” Applied and Environmental Microbiology, 56(6), 1875–1881.
doi:10.1128/aem.56.6.1875-1881.1990.
Zwietering MH, Wijtzes T, De Wit JC, Van ’t Riet K (1992). “A Decision Support System
for Prediction of the Microbial Spoilage in Foods.” Journal of Food Protection, 55(12),
973–979. doi:10.4315/0362-028x-55.12.973.
Affiliation:
Alberto Garre
Food Microbiology
Wageningen University & Research
and
Department of Agronomical Engineering & Institute of Plant Biotechnology
Universidad Politécnica de Cartagena
Murcia, Paseo Alfonso XIII, 48, 30203, Spain
E-mail: alberto.garre@upct.es