1 Introduction

Theoretical models are derived through deductive (e.g., reasoning from first principles) and inductive (e.g., generalizing from observations of natural populations) logic. For example, the density-independent models (synonymous with Malthusian growth) begin with the premise that individuals have a capacity to reproduce at rates \(r\) or \(\lambda\). As another example, the density-dependent models begin with an observation about how populations tend to change (they grow when small, but only up to a maximum size), and we infer that crowding must be negatively affecting population growth. We then use these and other models to learn about other observed behaviors or predict the behavior of the dependent state variable(s) of interest (predominantly population size in this course). In practice, the science of creating models, learning from and about the models, and reevaluating our models is cyclical and indefinite. A “model” of model building can be depicted in a multiplicity of ways. Consider these two examples:

Example 1: a general mathematical modeling cycle.
[Figure: Model Cycle 1]


Example 2: a more realistic ecological forecasting model.
[Figure: Model Cycle 2]

In this course, we’ve primarily focused on 6 models so far:

|                      | Discrete time                                | Continuous time                                                                                  |
|----------------------|----------------------------------------------|--------------------------------------------------------------------------------------------------|
| Density-independence | \(N_{t+1}=\lambda N_t\)                      | \(\frac{dN}{dt}=rN\)                                                                             |
| Density-dependence   | \(N_{t+1}=\lambda N_t\left(1 - N_t\right)\)  | \(\frac{dN}{dt}=rN - \alpha N^2\), and \(\frac{dN}{dt}=rN\left(\frac{K-N}{K}\right)\)            |
| Population structure | \(\mathbf{N}_{t+1}=\mathbf{P} \mathbf{N}_t\) |                                                                                                  |

The models you’ve learned about so far are all mathematical models that describe how single-species populations change. If we are interested in how these models depict population growth in a real population, we’d need to ask two questions:

  1. What are the numerical values of each of the parameters in a model?

  2. How would we know, among models, which to choose if we were interested in understanding how a population is changing or will continue to change in the future?

In this lab, you’ll be answering both questions from a data set on the population growth of a single species.

2 Species and experimental background for Lemna minor (common duckweed)

[Figure: Lemna minor thalli]
Lemna minor (hereafter Lemna) is an aquatic plant distributed across most of Africa, Eurasia, and North America. It is a floating flowering plant that grows thalli (the leaf-like structures pictured above) and then divides into clones. It also propagates through sexual reproduction, but that occurs on a longer timescale than the experiment. The aims of the experiment were (1) to grow Lemna and see if it followed a logistic-type curve and (2) if it did grow according to a logistic-type curve, to estimate the parameters of the logistic equation.

3 Estimating parameters

3.1 Statistical parameter estimation from your past: linear regression

Models can be used to tell you about the underlying ecology of a population. In statistical models you’ve used in the past, such as regression, you’ve generated parameter values that describe the relationship between two variables in a linear model. The parameters of a regression are the intercept and slope; in other words, a constant parameter and a coefficient describing the proportional relationship between your independent and dependent variables. This statistical model, generalized, is: \[y_i = \beta _0 + \beta _1x_i + \epsilon.\] Here, \(y_i\) is the dependent variable, \(x_i\) is the independent variable, \(\beta _0\) is the intercept (the value of the dependent variable when the independent variable equals 0), \(\beta _1\) describes the linear relationship between the independent and dependent variables, and (you can ignore this term) \(\epsilon\) is the residual variation not accounted for by the \(\beta\) terms. Regression estimates \(\beta _0\) and \(\beta _1\), which then allows you to state the best estimated relationship between the two variables.
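As a refresher, here is a minimal sketch of that estimation in R using lm() and made-up data (the variables x and y below are hypothetical, not from the lab data):

x <- 1:20
y <- 2 + 0.5 * x + rnorm(n = 20, mean = 0, sd = 1)  # fake data: true intercept 2, slope 0.5, plus noise

fit <- lm(formula = y ~ x)  # lm() estimates beta_0 (intercept) and beta_1 (slope) by least squares
coef(fit)                   # the estimated intercept and slope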

3.2 Nonlinear parameter estimation using nonlinear least squares estimates

Today, you’ll be doing the same thing as linear regression, conceptually, but you will be fitting a model that is nonlinear instead of linear. For the logistic equation, for example, you’ll be estimating the parameters \(r\) and \(\alpha\) (or \(r\) and \(K\)) instead of an intercept and slope. The way that you’ll do this is by using a method called nonlinear least squares (Wikipedia). In R, the function is nls(), and the nonlinear least squares parameter estimation method is quite intuitive. In sum, it is a search algorithm that finds the values of the parameters that minimize the sum of the squared distances of the data points from your curve (a line is a special curve, too).

As a graphical depiction of the method, imagine I have some fake logistic growth data (points) and the logistic equation with some parameters I made up (dotted curve). I can measure the distances between the points and the curve, square them, and sum them up to generate a value:

Different parameters with the same data will yield something that looks like this:

Notice how much more red (the vertical differences between the curve and the points) you see. This means that the curve did not approximate the points as well. This is essentially how nonlinear least squares evaluates fit.
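In code, the red segments amount to a one-line calculation. Here is a sketch with made-up numbers (both vectors below are hypothetical):

obs  <- c(5, 12, 30, 55, 80)  # hypothetical observed counts
pred <- c(6, 15, 28, 50, 85)  # hypothetical values predicted by some curve
sum((obs - pred)^2)           # the squared vertical distances, summed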

Now, your turn! Again, the function that you will use to generate nonlinear least squares estimates is nls(). This function takes two arguments that will be important: (i) the formula of the equation and (ii) values for where you’d like the parameter search to start. We need to give the search algorithm somewhere to start; otherwise it would have to search between \(\infty\) and \(-\infty\) and would consequently take \(\infty\) seconds to find the optimal parameter estimates. The formula will be the solution to the logistic equation, just like you’ve been using. The format needs to be

data ~ equation

with the “data” being a vector of data and the “equation” being the logistic equation solution. The solution will include some knowns (i.e., your initial population size \(N(0)\) and the time sequence \(t\)) and some unknowns (i.e., \(r\), \(\alpha\)). Oh, and the format for a formula in R uses the tilde (pronounced: ˈtil-də): ~. Last, you need to give R some starting values for each of the parameter estimates. Remember, this algorithm can theoretically search all real numbers (\(\pm \infty\)), so we must supply it with some guesses. Ultimately, you should produce something that looks like the following if I had data on energy (E) and wanted to estimate the values of mass (m) and the speed of light (c):

nls(formula = E ~ m*c^2, start = c(m = 0.1, c = 2))

Here’s the solution to the logistic equation (remember that the parentheses denote \(N\) as a function of time, not multiplication): \[N(t) = \frac{rN(0)e^{rt}}{r + \alpha N(0)(e^{rt} - 1)}\]

and the exponential growth equation: \[N(t) = N(0)\left(\exp (rt)\right).\]

Using nls(), please estimate \(r\) and \(\alpha\) from the logistic equation and \(r\) from the exponential equation. The data can be downloaded here: link. Please place the data in the same directory as your .Rmd and read it in using the function read.csv(). That is, read.csv(file = "/home/[your username]/Private/[directory between Private and your Lemna file]/Lemna.csv"). Look at the data. Rows correspond to sequential days when Lemna was counted, and each column is a different group’s data. Choose a group’s data randomly by using sample(x = 1:48, size = 1) (that will give you the column), fit the logistic and exponential solutions to the data, use those parameters to draw two fitted lines, and then add the points. What are the parameter values for each model equation? Are they the same? Should they be?
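If you get stuck, here is a minimal sketch of the two fits. The starting values are guesses of mine (if nls() fails to converge, try different ones), and I’ve used the initial density of 4 plants and the 31 daily counts described below:

lemna.dat <- read.csv(file = "Lemna.csv")
dat <- lemna.dat[, sample(x = 1:48, size = 1)]  # a random group's counts
t   <- 1:31                                      # one time point per datum
N0  <- 4                                         # initial population size (a known)

## logistic solution: estimate r and alpha
fit.log <- nls(formula = dat ~ r * N0 * exp(r * t) / (r + alpha * N0 * (exp(r * t) - 1)),
               start = c(r = 0.2, alpha = 0.001))

## exponential solution: estimate r
fit.exp <- nls(formula = dat ~ N0 * exp(r * t), start = c(r = 0.2))

coef(fit.log)  # estimates of r and alpha
coef(fit.exp)  # estimate of r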

So now you know how to fit your models to data: excellent!! That means we have estimates for \(r\) and, in the case of logistic growth, \(\alpha\). \(r\) is in units of \(\text{time}^{-1}\) and \(\alpha\) is in units of \(\text{individuals}^{-1}\times \text{time}^{-1}\). This means that you calculated the intrinsic rate of growth per unit of time and the effect of crowding per individual (per unit of time, too). (And because the two continuous-time logistic forms are equivalent, the carrying capacity follows as \(K = r/\alpha\).) Ultimately, you just learned something about biology from fitting your model to data!

4 Comparing models

Return again to the modeling cycle: we have models that act as hypotheses, we fit them to data to learn about the population biology, and now we want to evaluate the models against one another. Below, I outline several ways of comparing models: reasoning, graphical, and residual sum-of-squares.

4.1 Comparing through reasoning

I am trying to be as inclusive as possible here, so I am including “reasoning” as a way of comparing models. In some sense we have done this already. For instance, I didn’t include populations of herbivores that consume Lemna in my models. There is an entire universe of models that weren’t included because we rejected them by presumption right off the bat. The models that remained were subjected to more rigorous comparisons.

4.2 Graphical comparison

Visually, you probably saw a big difference in the fit of the logistic and exponential solutions. Like reasoning, this is a valid way of making comparisons between models. In some cases, like the logistic and exponential solutions, it’s very simple. But, as I am certain you can imagine, this is not always the case, and we want methods that are more rigorous. Epistemologically, the approach of science is a powerhouse: we use quantitative information to evaluate our hypotheses. So, let’s go ahead and do that below.

4.3 Residual sum-of-squares comparison

If you run them again, notice that there is a metric in the output of nls(): the residual sum-of-squares. The residual sum-of-squares is exactly what the fake data above measured in terms of fit: the sum of the squared distances from the data to the best-fit model. This metric identifies which model best fits the data! This is much more rigorous than reasoning or visually comparing models. Which of the two models, the logistic or exponential solution, had the lower sum of squared differences?
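You can also pull that number out of a fitted nls() object directly. A quick sketch, assuming fit.log and fit.exp are the fitted objects from Section 3.2 (those names are mine):

deviance(fit.log)      # residual sum-of-squares of the logistic fit
deviance(fit.exp)      # residual sum-of-squares of the exponential fit
sum(resid(fit.log)^2)  # the same number, computed by hand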

4.4 Fitting models without solutions

For the simplest models we could integrate and simply use their solutions to find least squares estimates. For the vast majority of ecological models, however, it’s not as simple. What we then need to do is

  1. use numerical solvers to generate a time series
  2. calculate the sum-of-squared differences between the time series and the data
  3. try other parameter values until we find optimal parameter values that minimize the sum-of-squared differences between the time series and the data

4.4.1 Use numerical solvers to generate a time series

You already know how to do this, as that’s what last week’s lab was all about. Just like in nls() we need to give ode() some initial conditions. I’ll do this with the logistic equation:

library(package = "deSolve")  # provides the numerical solver ode()
lemna.dat <- read.csv(file = "Lemna.csv")

## right-hand side of the logistic equation, written in the format ode() expects
logistic <- function (t, y, params) {
  with(as.list(c(y, params)), {
    dN <- r * y * (K - y) / K
    return(list(c(dN)))
  })
}

N0 <- 4                      # initial density: the experiment started with 4 plants
par0 <- c(r = 0.5, K = 100)  # initial guesses for the parameters
times <- 1:31                # one time point per datum
obs.data <- lemna.dat[, sample(x = 1:48, size = 1)]  # a random group's counts
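With those pieces in place, a single call to ode() generates the time series that we will compare against the data (a quick check; printing it is my choice):

out <- as.data.frame(ode(y = N0, times = times, func = logistic, parms = par0))
head(out)  # column 1 is time, column 2 is the predicted N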

4.4.2 Calculate the sum-of-squared differences between the time series and the data

Next, we want to write a function that will calculate the difference between the time series and the data. It’s a pretty simple calculation, thankfully. The particular form it takes includes parameter values, parms =, some data, data =, a vector of times with one time point per datum (31 with this dataset), sse.time =, the model function to solve, func =, and a starting density (we started with 4 plants), N0 =.

sse.log <- function(parms, data, sse.time, func, N0) {
  ## numerically solve the model with the candidate parameter values
  out <- as.data.frame(ode(y = N0, times = sse.time, func = func, parms = parms, hmax = 0.01))
  ## sum the squared differences between the predicted N (column 2) and the data
  sse <- sum((out[, 2] - data)^2)
  return(sse)
}

Give sse.log() parms, data, sse.time, func, and N0. Notice it returns the sum-of-squares estimate. Change the initial parameter values and notice what happens—the sum-of-squares estimate changes! As you move farther away from better-fitting parameters, the sum-of-squares estimates increase! Now, notice you could just “find” the parameter values that minimize the sum-of-squares estimate by searching and running over and over and over. BUT, why don’t we again let our computers do all of the hard work for us and find the parameter values that minimize the sum-of-squares estimate.
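For example (the second set of guesses below is deliberately arbitrary):

sse.log(parms = c(r = 0.5, K = 100), data = obs.data, sse.time = times,
        func = logistic, N0 = N0)
sse.log(parms = c(r = 0.1, K = 500), data = obs.data, sse.time = times,
        func = logistic, N0 = N0)  # a worse guess should return a larger value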

4.4.3 Find the optimized parameter values

Now that you’ve written your function for sum-of-squares estimates, you can plug it into optim().

opt.log <- optim(par = par0, fn = sse.log, data = obs.data, sse.time = times, func = logistic, N0 = N0)

Notice that if you look at the str() of opt.log, you see the parameter estimates, opt.log$par = 0.2277044, 115.8067344 (for \(r\) and \(K\), respectively), and a function value (the sum-of-squares), opt.log$value = 926.1034135.
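To see how good that optimized fit looks, you can solve the model once more with the optimized parameters and plot it over the data (a sketch; the plotting choices are mine):

best <- as.data.frame(ode(y = N0, times = times, func = logistic, parms = opt.log$par))
plot(x = times, y = obs.data, xlab = "Day", ylab = "Number of thalli")
lines(x = best[, 1], y = best[, 2], lty = 2)  # the optimized logistic curve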

4.5 Comparing other single-species population models

Below is a list of other single-species population models that I would like for you to fit to the data. Fit each one using the method outlined in Section 4.4; a sketch for one of them follows the equations.

\[\begin{align}\frac{dN}{dt} &= Nr \\ \frac{dN}{dt} &= N\left(r - b\log(N)\right) \\ \frac{dN}{dt} &= N\left(r - \log (1 + aN) \right) \\ \frac{dN}{dt} &= N\left(r - aN \right) \\ \frac{dN}{dt} &= N\left(r - aN^b \right) \\ \frac{dN}{dt} &= N\left(r - b\log(1 + aN) \right) \\ \frac{dN}{dt} &= N\left(r - \log (1 + (aN)^b) \right) \end{align}\]
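Here is a sketch for the second model in the list, \(\frac{dN}{dt} = N\left(r - b\log(N)\right)\) (a Gompertz-type model); the others follow the same recipe with their own right-hand sides and parameters. The starting values are guesses of mine, sse.log() works unchanged because it takes the model through its func = argument, and I assume the objects from Section 4.4 are still in your workspace:

gompertz <- function (t, y, params) {
  with(as.list(c(y, params)), {
    dN <- y * (r - b * log(y))
    return(list(c(dN)))
  })
}

opt.gomp <- optim(par = c(r = 0.5, b = 0.1), fn = sse.log, data = obs.data,
                  sse.time = times, func = gompertz, N0 = N0)
opt.gomp$par    # parameter estimates
opt.gomp$value  # residual sum-of-squares, comparable across models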

“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
—John von Neumann, attributed by E. Fermi, as written by Freeman Dyson, no date
[Figure: elephant trunk]
(from link)