# Use double sampling to justify and calibrate cheap sampling methods

Monitoring plays an important role in environmental management. Ideally, our monitoring will be both precise and accurate. However, the natural environment is variable and only partly observable, so monitoring data collected using typical methods are imprecise, and commonly biased.

For example, imagine we are trying to count the number of hollows in eucalypt trees (Harper et al. 2004). Tree hollows are an important resource for hundreds of animal species, but they are very difficult to count when standing on the ground. Some hollows will remain unobserved, while some mere dents in trees will be counted incorrectly as hollows. Is counting the number of tree hollows from the ground justified?

No matter how hard we look, some tree hollows will be hard to see.

Of course, an unlimited budget of time, money and expertise would provide data that are both precise and accurate. In the case of counting tree hollows, we could climb every tree, and examine every branch. But that would be expensive – perhaps prohibitively so. Trade-offs between the budget, precision and accuracy are required. One way to think about this trade-off is to consider whether to use a cheaper monitoring method that might be less precise, and possibly biased, or a more expensive method that is more precise. Should we stand on the ground and count hollows, climb trees to count them reliably, or do a bit of both?

Counting hollows by climbing the tree and examining every branch would be precise and accurate but time consuming.

The advantage of using the cheaper method is that, for the same budget, we can collect a sample of a larger size. And if we can calibrate results from that method so that they conform with the more expensive method, we can correct for any bias. For example, we could count the number of hollows on many more trees with the same amount of time by standing on the ground than if we climbed them. But the ground-based surveys would need to be calibrated by doing both ground-based and climbing surveys of a set of trees.
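To make the calibration step concrete, here is a minimal sketch of one common choice, the linear regression estimator often used with double sampling. The post does not say which estimator to use, so this choice, the function name `calibrated_mean` and the argument names are all assumptions made for illustration. The counts in the usage lines are hypothetical, not real survey data.

```python
def calibrated_mean(cheap_all, cheap_sub, accurate_sub):
    """Estimate the mean of the accurate measurement by double sampling.

    cheap_all    : cheap (e.g. ground-based) counts for every sampled tree
    cheap_sub    : cheap counts for the subset of trees that were also climbed
    accurate_sub : accurate (e.g. climbing) counts for that same subset
    """
    n = len(cheap_sub)
    if n < 2 or n != len(accurate_sub):
        raise ValueError("need at least two paired measurements")

    x_bar_sub = sum(cheap_sub) / n
    y_bar_sub = sum(accurate_sub) / n
    x_bar_all = sum(cheap_all) / len(cheap_all)

    # Least-squares slope of accurate counts on cheap counts (the calibration).
    s_xy = sum((x - x_bar_sub) * (y - y_bar_sub)
               for x, y in zip(cheap_sub, accurate_sub))
    s_xx = sum((x - x_bar_sub) ** 2 for x in cheap_sub)
    if s_xx == 0:
        raise ValueError("cheap counts in the subset must vary")
    slope = s_xy / s_xx

    # Shift the subset mean by the difference between the cheap means
    # of the full sample and of the double-sampled subset.
    return y_bar_sub + slope * (x_bar_all - x_bar_sub)


# Hypothetical counts: five trees surveyed from the ground, three also climbed.
ground_all = [1, 2, 3, 4, 5]
ground_climbed = [1, 2, 3]
climb_counts = [2, 4, 6]
print(calibrated_mean(ground_all, ground_climbed, climb_counts))
```

In this toy example the climbing counts are exactly twice the ground counts, so the ground-based mean of the full sample is scaled up accordingly.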

So, how much effort should we put into developing the calibration versus just measuring more samples? A technique known as “double sampling” answers this question (Gilbert 1987). Pause and think about this for a moment. How would the relative costs of the two sampling methods influence how much double sampling we did? How would the correlation between the two methods influence the level of sampling? Have you thought about that? If so, read on.

Intuitively, you might think that as the cost of each expensive sample increases, we should use the expensive method less. And further, you might think that as the strength of the relationship between the cheap and expensive methods increases, it becomes easier to calibrate the cheaper method; hence, we might need to invest less in the expensive method. If that was your train of thought, then you were on the correct track.

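It turns out that the answer to the double sampling problem depends on two quantities: the ratio of the per-sample costs of the two methods, $R = C_\text{A}/C_\text{I}$, where $C_\text{A}$ is the cost of a sample from the accurate (expensive) method and $C_\text{I}$ is the cost of a sample from the inaccurate (cheap) method, and the correlation between the two methods (ρ). We use double sampling (i.e., both the cheap and expensive methods, rather than just the expensive method) if (Gilbert 1987)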

$R=\dfrac{C_\text{A}}{C_\text{I}}>\dfrac{(1+\sqrt{1-{\rho}^2})^2}{{\rho}^2}$

Graphically, this is depicted by a curve that separates parameter values for which simple sampling (the expensive method alone) is most efficient from parameter values for which double sampling is most efficient. We use the cheap method at least sometimes when it is sufficiently cheap relative to the expensive method and when the correlation between the two methods is sufficiently large.

When to use double sampling (using a cheap method that is calibrated to a more expensive but more reliable method), and when to simply use the more expensive method only.

So, if we had a cheap method that cost one tenth as much per sample as a reliable method (R = 10), we would use the cheaper method, and calibrate it with the expensive method, only if the correlation between the two methods was greater than ~0.6. Otherwise, we’d simply use the expensive method.
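The decision rule can be checked numerically. The sketch below codes the inequality from Gilbert (1987) as given above; the function names are made up for illustration. It also uses a rearrangement that is not in the original text: writing $s=\sqrt{1-\rho^2}$, the threshold simplifies to $(1+s)/(1-s)$, so for a given $R > 1$ the smallest correlation that justifies double sampling is $2\sqrt{R}/(R+1)$.

```python
import math


def double_sampling_threshold(rho):
    """Cost ratio R = C_A / C_I that must be exceeded for double sampling to pay off."""
    if not 0 < abs(rho) <= 1:
        raise ValueError("rho must be non-zero and lie between -1 and 1")
    return (1 + math.sqrt(1 - rho ** 2)) ** 2 / rho ** 2


def use_double_sampling(cost_ratio, rho):
    """True if double sampling is more efficient than the expensive method alone."""
    return cost_ratio > double_sampling_threshold(rho)


def min_correlation(cost_ratio):
    """Correlation that must be exceeded for double sampling to pay off at a given R.

    Obtained by rearranging the inequality above (an added derivation,
    not taken from the original text).
    """
    if cost_ratio <= 1:
        raise ValueError("double sampling is never favoured unless R > 1")
    return 2 * math.sqrt(cost_ratio) / (cost_ratio + 1)


# The cheap method costs one tenth as much per sample as the reliable method.
print(min_correlation(10))           # roughly 0.575, i.e. the ~0.6 quoted above
print(use_double_sampling(10, 0.6))  # True: the threshold at rho = 0.6 is 9
print(use_double_sampling(10, 0.5))  # False: the correlation is too weak
```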

When double sampling is the more efficient option, it is also possible to specify the proportion of samples that should be measured using both the cheap and expensive methods. Double sampling this fraction of the sample allows us to calibrate the cheap method against the expensive one. Under double sampling, we measure all individuals in a sample using the cheap method and a fraction of them with the expensive method as well. The fraction that we sample using the expensive method is given by (Gilbert 1987)

$f=\sqrt{\dfrac{1-{\rho}^2}{{\rho}^2 R}}$

The fraction of individuals in a sample that are “double sampled” (measured using both methods with a view to calibrating them) as a function of the cost ratio (R) and the correlation between the accurate and inaccurate methods (ρ).
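As a rough illustration of how the fraction might be used in planning, the sketch below codes the formula for $f$ and then splits a fixed budget between the two methods. The budget split assumes a simple cost model that the text above does not spell out: every sampled individual incurs the cheap cost $C_\text{I}$, and each double-sampled individual additionally incurs $C_\text{A}$. The function names and the example numbers are hypothetical.

```python
import math


def double_sampled_fraction(cost_ratio, rho):
    """Fraction f of the sample that is also measured with the expensive method.

    Only meaningful when double sampling is favoured; when the cost ratio is
    too low for the given correlation, the formula can return values above 1.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    if not 0 < abs(rho) <= 1:
        raise ValueError("rho must be non-zero and lie between -1 and 1")
    return math.sqrt((1 - rho ** 2) / (rho ** 2 * cost_ratio))


def allocate_budget(budget, cost_accurate, cost_inaccurate, rho):
    """Split a budget into (n_cheap, n_both) under the assumed cost model.

    Assumed cost model: budget = n_cheap * cost_inaccurate + n_both * cost_accurate,
    with n_both = f * n_cheap. Results are not rounded to whole individuals.
    """
    f = double_sampled_fraction(cost_accurate / cost_inaccurate, rho)
    n_cheap = budget / (cost_inaccurate + f * cost_accurate)
    n_both = f * n_cheap
    return n_cheap, n_both


# Hypothetical example: the expensive method costs 10 units per tree,
# the cheap method 1 unit, the correlation is 0.8 and the budget is 100 units.
f = double_sampled_fraction(10, 0.8)
n_cheap, n_both = allocate_budget(100, 10, 1, 0.8)
print(f"fraction double sampled: {f:.2f}")
print(f"trees surveyed cheaply: {n_cheap:.1f}, of which also climbed: {n_both:.1f}")
```

In practice the two sample sizes would be rounded to whole numbers, which changes the total cost slightly.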

This idea could be used much more frequently in environmental science. In many cases, indicators and indices are used to measure environmental features. They are used because they are simple and easy to collect, compared with more exhaustive methods. Double sampling could be used to tell us whether these indicators and indices are actually efficient, and how much effort should be put into calibrating them.

References

Gilbert, R.O. (1987). Statistical methods for environmental pollution monitoring. Wiley, New York.

Harper, M.J., McCarthy, M.A., van der Ree, R., and Fox, J.C. (2004). Overcoming bias in ground-based surveys of hollow-bearing trees using double sampling. Forest Ecology and Management 190: 291-300.