Getting ready for another semester, thinking about how to communicate to students about statistical philosophy while they are trying to learn about the tool box. The mind wanders, and I was curious about the prevalence of Akaike Information Criterion (AIC) versus Bayesian Information Criterion (BIC) in the literature.
In toxicology, the dose of a pathogen, radiation, or toxin required to kill half the members of a tested population of animals or cells is called the lethal dose, 50%, or LD50. This measure is also known as the lethal concentration, LC50, or properly after a specified test duration, the LCt50 indicating the lethal concentration and time of exposure. LD50 figures are frequently used as a general indicator of a substance’s acute toxicity. A lower LD50 is indicative of increased toxicity.
More generally, the point at which 50% response of studied organisms to range of doses of a substance (e.g., agonist, antagonist, inhibitor, etc.) to any response, from change in behavior or life history characteristics up to and including death can be described by the methods described in this chapter. The procedures outlined below assume that there is but one inflection point, i.e., an “s-shaped” curve, either up or down; if there are more than one inflection points, then the logistic equations described will not fit the data well and other choices need to be made (see Di Veroli et al 2015). We will use the drc package (Ritz et al 2015).
After starting R, load the drc library.
Consider some hypothetical 24-hour survival data for yeast exposed to salt solutions. Let resp equal the variable for frequency of survival (e.g., estimated from OD600 readings) and NaCl equal the millimolar (mm) salt concentrations or doses.
At the R prompt type
resp=c(1,1,1,.9,.7,.3,.4,.2,0,0,0) NaCl=seq(0,1000,100) #To check to make sure that the sequence has been correctly created; alternatively, just enter the values. NaCl  0 100 200 300 400 500 600 700 800 900 1000 #Make a plot plot(NaCl,resp,pch=19,cex=1.2,col=”blue”,xlab=”NaCl [mm]”,ylab=”Survival frequency”)
And here is the plot of the simulated data (Figure 1).
Note the sigmoidal shape — we’ll need an logistic equation to describe the relationship between survival of yeast and NaCl doses.
The equation for the four parameter logistic curve, also called the Hill-Slope model, is (Figure 2)
where c is the parameter for the lower limit of the response, d is the parameter for the upper limit of the response, e is the relative EC50, the dose fitted half-way between the limits c and d, and b is the relative slope around the EC50. The slope, b, is also known as the Hill slope. Because this experiment included a dose of zero, a three parameter logistic curve would be appropriate. The equation simplifies to (Figure 3)
EC50 from 4 parameter model
First, make a data frame
dose = data.frame(NaCl,resp)
Next, call up a function, drm, from the drc library and specify the model as the four parameter logistic equation, specified as LL.4(). We follow with a call to the summary command to retrieve output from the drm function. Note that we are using the four-parameter logistic equation (Figure 2)
As long as the plot you made in earlier steps is still available, R will add the line specified in the lines command. Here is the plot with the predicted logistic line displayed (Figure 4).
While there are additional steps we can take to decide is the fit of the logistic curve was good to the data, visual inspection suggests that indeed the curve fits the data reasonably well.
More work to do
Because the EC50 calculations are an estimate, we should also obtain confidence intervals. The drc library provides a function called ED which will accomplish this. We can also ask what the survival was at 10% and 90% in addition to 50%, along with the confidence intervals for each.
Thus, the 95% confidence interval for the EC50 calculated from the four parameter logistic curve was between the lower limit of 379.9 and an upper limit of 605.39 mm NaCl.
EC50 from 3 parameter model
Looking at the summary output from the four parameter logistic function we see that the value for c was -0.085 and p-value was 0.53, which suggests that the lower limit was not statistically different from zero. We would expect this given that the experiment had included a control of zero mm added salt. Thus, we can explore by how much the EC50 estimate changes when the additional parameter c is no longer estimated by calculating a three parameter model with LL.3(). At the R prompt type
How do the four and three parameter models compare? We can rephrase this as as statistical test of fit; which model fits the data better, a three parameter or a four parameter model?
At the R prompt type
The output is below
1st model fct: LL.3() 2nd model fct: LL.4()
ModelDf RSS Df F value p value 2nd model 8 0.054684 1st model 7 0.050012 1 0.6539 0.4453
Because the p-value is much greater than 5% we may conclude that the fit of the four parameter model was not significantly better than the fit of the three parameter model. Thus, based on your criteria of model fit (e.g., select a more complicated model if it demonstrates an improvement over a model with fewer predictors), we would conclude that the three parameter model is the preferred model.
The plot below (Figure 5) now includes the fit of the four parameter model (red line) and the three parameter model (green line) to the data.
The R command to make this addition to our active plot was
Thus, the 95% confidence interval for the EC50 calculated from the three parameter logistic curve was between the lower limit of 409.7 and an upper limit of 526.1 mm NaCl. The difference between upper and lower limits was 116.4 mm NaCl, a smaller difference than the interval calculated for the 95% confidence intervals from the four parameter model (225.5 mm NaCl). This demonstrates the estimation trade-off: more parameters to estimate reduces the confidence in any one parameter estimate.
Additional notes of EC50 calculations
Care must be taken that the model fits the data well. What if we did not have observations throughout the range of the sigmoidal shape? We can explore this by taking a subset of the data. At the R prompt type
Conclusion? The estimate is different, but only just so, 447 vs. 468 mm NaCl. Thus, within reason, the drc function performs well for the calculation of EC50. Not all tools available to the student will do as well.
Use the subset function instead:
dd = subset(dose, NaCl <= 500)
Di Veroli G. Y., Fornari C., Goldlust I., Mills G., Koh S. B., Bramhall J. L., Richards, F. M., Jodrell D. I. (2015) An automated fitting procedure and software for dose-response curves with multiphasic features. Scientific Reports 5: 14701. (doi: 10.1038/srep14701)