Category Archives: R language

LD50 & dose response calculations with R

In toxicology, the dose of a pathogen, radiation, or toxin required to kill half the members of a tested population of animals or cells is called the lethal dose, 50%, or LD50. This measure is also known as the lethal concentration, LC50, or properly after a specified test duration, the LCt50 indicating the lethal concentration and time of exposure. LD50 figures are frequently used as a general indicator of a substance’s acute toxicity. A lower LD50 is indicative of increased toxicity.

More generally, the point at which 50% response of studied organisms to range of doses of a substance (e.g., agonist, antagonist, inhibitor, etc.) to any response, from change in behavior or life history characteristics up to and including death can be described by the methods described in this chapter. The procedures outlined below assume that there is but one inflection point, i.e., an “s-shaped” curve, either up or down; if there are more than one inflection points, then the logistic equations described will not fit the data well and other choices need to be made (see Di Veroli et al 2015). We will use the drc package (Ritz et al 2015).


After starting R, load the drc library.


Consider some hypothetical 24-hour survival data for yeast exposed to salt solutions. Let resp equal the variable for frequency of survival (e.g., estimated from OD600 readings) and NaCl equal the millimolar (mm) salt concentrations or doses.

At the R prompt type

#To check to make sure that the sequence has been correctly created; alternatively, just enter the values.
NaCl [1] 0 100 200 300 400 500 600 700 800 900 1000
#Make a plot
plot(NaCl,resp,pch=19,cex=1.2,col=”blue”,xlab=”NaCl [mm]”,ylab=”Survival frequency”)

And here is the plot of the simulated data (Figure 1).

Figure 1. Plot of yeast survival in different amounts of salt, simulated data

Note the sigmoidal shape — we’ll need an logistic equation to describe the relationship between survival of yeast and NaCl doses.

The equation for the four parameter logistic curve, also called the Hill-Slope model, is (Figure 2)

Figure 2. Equation of the four parameter logistic curve.

where c is the parameter for the lower limit of the response, d is the parameter for the upper limit of the response, e is the relative EC50, the dose fitted half-way between the limits c and d, and b is the relative slope around the EC50. The slope, b, is also known as the Hill slope. Because this experiment included a dose of zero, a three parameter logistic curve would be appropriate. The equation simplifies to (Figure 3)

Figure 3. Equation of the three parameter logistic curve

EC50 from 4 parameter model

First, make a data frame

dose = data.frame(NaCl,resp)

Next, call up a function, drm, from the drc library and specify the model as the four parameter logistic equation, specified as LL.4(). We follow with a call to the summary command to retrieve output from the drm function. Note that we are using the four-parameter logistic equation (Figure 2)

model.dose1 = drm(dose,fct=LL.4())

And here is the output from R.

Model fitted: Log-logistic (ED50 as parameter) (4 parms)

Parameter estimates:

                   Estimate  Std. Error    t-value  p-value
b:(Intercept)      3.753415    1.074050   3.494636   0.0101
c:(Intercept)     -0.084487    0.127962  -0.660251   0.5302
d:(Intercept)      1.017592    0.052460  19.397441   0.0000
e:(Intercept)    492.645128   47.679765  10.332373   0.0000

Residual standard error:

0.0845254 (7 degrees of freedom)

The EC50, technically, because the data were for survival, the LD50 is e = 492.65 mM NaCl, where e, again, is the dose fitted half-way between the limits c and d.

You should always plot the predicted line from your model against the real data and inspect the fit.

At the R prompt type

lines(dose,predict(model.dose1, data.frame(x=dose)),col=”red”)

As long as the plot you made in earlier steps is still available, R will add the line specified in the lines command. Here is the plot with the predicted logistic line displayed (Figure 4).

Figure 4. Plot with the predicted logistic line displayed

While there are additional steps we can take to decide is the fit of the logistic curve was good to the data, visual inspection suggests that indeed the curve fits the data reasonably well.

More work to do

Because the EC50 calculations are an estimate, we should also obtain confidence intervals. The drc library provides a function called ED which will accomplish this. We can also ask what the survival was at 10% and 90% in addition to 50%, along with the confidence intervals for each.

At the R prompt type

ED(model.dose1,c(10,50,90), interval=”delta”)

And the output is shown below.

Estimated effective doses
(Delta method-based confidence interval(s))

Estimate Std. Error   Lower   Upper
1:10  274.348     38.291 183.803  364.89
1:50  492.645     47.680 379.900  605.39
1:90  884.642    208.171 392.395 1376.89

Thus, the 95% confidence interval for the EC50 calculated from the four parameter logistic curve was between the lower limit of 379.9 and an upper limit of 605.39 mm NaCl.

EC50 from 3 parameter model

Looking at the summary output from the four parameter logistic function we see that the value for c was -0.085 and p-value was 0.53, which suggests that the lower limit was not statistically different from zero. We would expect this given that the experiment had included a control of zero mm added salt. Thus, we can explore by how much the EC50 estimate changes when the additional parameter c is no longer estimated by calculating a three parameter model with LL.3(). At the R prompt type

model.dose2 = drm(dose,fct=LL.3())

And here is the output.

Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)

Parameter estimates:

Estimate Std. Error   t-value p-value
b:(Intercept)   4.46194    0.76880   5.80378   4e-04
d:(Intercept)   1.00982    0.04866  20.75272   0e+00
e:(Intercept) 467.87842   25.24633  18.53253   0e+00

Residual standard error:

0.08267671 (8 degrees of freedom)

The EC50 is the value of e: 467.88 mM NaCl.

How do the four and three parameter models compare? We can rephrase this as as statistical test of fit; which model fits the data better, a three parameter or a four parameter model?

At the R prompt type

anova(model.dose1, model.dose2)

The output is below

1st model
fct:      LL.3()
2nd model
fct:      LL.4()

ANOVA table

          ModelDf      RSS Df F value p value
2nd model       8 0.054684
1st model       7 0.050012  1  0.6539  0.4453

Because the p-value is much greater than 5% we may conclude that the fit of the four parameter model was not significantly better than the fit of the three parameter model. Thus, based on your criteria of model fit (e.g., select a more complicated model if it demonstrates an improvement over a model with fewer predictors), we would conclude that the three parameter model is the preferred model.

The plot below (Figure 5) now includes the fit of the four parameter model (red line) and the three parameter model (green line) to the data.

Figure 5. Plot now includes the fit of the four parameter model (red line) and the three parameter model (green line) to the data.

The R command to make this addition to our active plot was

lines(dose,predict(model.dose2, data.frame(x=dose)),col=”green”)

We continue with our analysis of the three parameter model and produce the confidence intervals for the EC50 (modify the ED() statement above for model.dose2 in place of model.dose1).

Estimated effective doses
(Delta method-based confidence interval(s))

     Estimate Std. Error   Lower  Upper
1:10  285.937     33.154 209.483 362.39
1:50  467.878     25.246 409.660 526.10
1:90  765.589     63.026 620.251 910.93

Thus, the 95% confidence interval for the EC50 calculated from the three parameter logistic curve was between the lower limit of 409.7 and an upper limit of 526.1 mm NaCl. The difference between upper and lower limits was 116.4 mm NaCl, a smaller difference than the interval calculated for the 95% confidence intervals from the four parameter model (225.5 mm NaCl). This demonstrates the estimation trade-off: more parameters to estimate reduces the confidence in any one parameter estimate.

Additional notes of EC50 calculations

Care must be taken that the model fits the data well. What if we did not have observations throughout the range of the sigmoidal shape? We can explore this by taking a subset of the data. At the R prompt type

dd = dose[1:6,]

Here, all values greater than dose 500 were dropped (see below for a more general approach to subset).

> dd
resp dose
1  1.0    0
2  1.0  100
3  1.0  200
4  0.9  300
5  0.7  400
6  0.3  500

and the plot does not show an obvious sigmoidal shape (Figure 6)

Figure 6. Plot of subset of data. No longer “sigmoidal” curve.

We run the three parameter model again, this time on the subset of the data.

model.dosedd = drm(dd,fct=LL.3())

Output from the results are

Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)

Parameter estimates:

                Estimate Std. Error    t-value p-value
b:(Intercept)   6.989842   0.760112   9.195801  0.0027
d:(Intercept)   0.993391   0.014793  67.153883  0.0000
e:(Intercept) 446.882542   5.905728  75.669344  0.0000

Residual standard error:

0.02574154 (3 degrees of freedom)

Conclusion? The estimate is different, but only just so, 447 vs. 468 mm NaCl. Thus, within reason, the drc function performs well for the calculation of EC50. Not all tools available to the student will do as well.

Use the subset function instead:

dd = subset(dose, NaCl <= 500)


Di Veroli G. Y., Fornari C., Goldlust I., Mills G., Koh S. B., Bramhall J. L., Richards, F. M., Jodrell D. I. (2015) An automated fitting procedure and software for dose-response curves with multiphasic features. Scientific Reports 5: 14701. (doi: 10.1038/srep14701)

Ritz, C., Baty, F., Streibig, J. C., Gerhard, D. (2015) Dose-Response Analysis Using R. PLOS ONE, 10(12),
e0146021 (doi: 10.1371/journal.pone.0146021