How well did we measure our mean?

Problem

We’re just never going to get enough time and money to cross every experiment off our to-do lists. Given the choice between trying something new and repeating an experiment we’ve already done once before, the lure of the new is hard to resist: maybe one of those untested conditions will give us the home run we’ve been looking for? At least we’ll get the satisfaction of crossing one more thing off our to-do list.

But the noise buffeting our research processes means some humdrum “N of 1” results might prove much more exciting upon replication – just as we know from so much painful experience that exciting “N of 1” results often disappoint upon replication. We know that whatever we’ve measured is wrong to some degree, but precisely how wrong is it?

Solution

Confidence intervals are what we need to estimate how well our sample of N independent measurements has measured the true mean we intended to measure. For example, a 95% confidence interval of the mean is the region centered on the sample mean that contains 95% of the area under the curve tracing the probability that the true mean lies elsewhere on the number line. This probability distribution, and its X% confidence intervals, are calculated as a function of sample size (N), signal (mean) and noise. That noise can be summarized as a standard deviation (stdev), which has the same units as the signal, or as a unitless percentage of the mean, often referred to as a “coefficient of variation” (%CV = 100*stdev/mean).
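To make this concrete, here is a minimal sketch in Python (the triplicate values and the scipy-based calculation are illustrative assumptions, not the visualization’s code):

    import numpy as np
    from scipy import stats

    # Hypothetical triplicate measurements of a new condition (made-up values)
    sample = np.array([98.0, 104.0, 101.0])

    n = len(sample)
    mean = sample.mean()
    stdev = sample.std(ddof=1)            # sample standard deviation
    cv = 100 * stdev / mean               # coefficient of variation, in %

    # 95% confidence interval of the mean, using the sample's own noise estimate
    alpha = 0.05
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half_width = t_crit * stdev / np.sqrt(n)

    print(f"mean = {mean:.1f}, CV = {cv:.1f}%")
    print(f"95% CI of the mean: {mean:.1f} ± {half_width:.1f}")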

The solid lines in the interactive visualization above illustrate confidence intervals calculated using the CV (or stdev) of the same sample of data we are using to estimate the true mean. The dashed lines illustrate confidence intervals calculated using an independent and presumably much more accurate estimate of the true process CV (or stdev). The confidence interval calculated using the sampled CV is inherently wider than the confidence interval calculated using the true CV because the former must accommodate the tendency of sampled CVs to underestimate true process CVs, especially when N = 2 or 3.
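Assuming the visualization uses the standard approach (a Student’s t multiplier for the sampled CV, a normal z multiplier for the true CV), a quick Python sketch shows how much wider the sampled-CV interval is at small N; in each case the half-width is the multiplier times stdev/√N:

    from scipy import stats

    alpha = 0.05
    z_crit = stats.norm.ppf(1 - alpha / 2)     # multiplier when the true stdev/CV is known

    print(" N   t multiplier   z multiplier   ratio")
    for n in (2, 3, 5, 10, 30):
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
        print(f"{n:2d}   {t_crit:12.2f}   {z_crit:12.2f}   {t_crit / z_crit:4.1f}x")

At N = 2 the sampled-CV interval comes out about 6.5 times as wide as the true-CV interval, at N = 3 about 2.2 times as wide, and the gap has nearly vanished by N = 30.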

By default, the visualization plots the 95% confidence intervals calculated using either the Sampled CV (solid line) or an independent and accurate estimate of the True Process CV (dashed line). Because both confidence intervals are symmetric around the measured mean, only one half of each interval is shown, and the y-axis is labeled ±. For example, when the text box above reports “there is >5% chance that the true mean is >2.4*CV distant from the measured value,” there is both (1) a >2.5% chance the true mean lies >2.4*CV above the sample mean, and (2) a >2.5% chance the true mean lies >2.4*CV below it.

By default, the units of the y-axis are multiples of an unspecified CV. The y-axis will update appropriately if a specific value of CV is entered into the empty field at the upper right corner or the adjacent “Measured” toggle is flipped from “CV” to “stdev”.

If the box between the default values of 95% and 80% is checked, the visualization will update to illustrate both the 95% and 80% confidence intervals (black and gray lines, respectively) calculated according to whichever method is specified by the “Highlight” toggle, either “Sampled CV” (default) or “True CV”. Rolling over the underlined text within the text box above will highlight the corresponding inputs and/or graph elements.

Insights

  • Both kinds of confidence intervals illustrated above assume that our process noise is normally distributed with a single, unchanging CV (or standard deviation). If our process noise is not actually normally distributed, or simply not stable, even the wider interval calculated using the sampled CV could give us a false sense of confidence in our estimate of the mean. If we haven’t already proven that our process noise is normally distributed and stable – e.g., using a control chart – we should assume it isn’t. No process is normally distributed until we make it normally distributed!
  • Whenever we make triplicate measurements of a protein, chemical or process condition the world has never seen before, our ability to measure the true mean is greatly improved – the confidence interval is narrowed by >50%! – if we have already stabilized our process and obtained an accurate estimate of its true variation.
  • 90% of what we do in R&D is 90% identical to what we did last time. Although there are definitely exceptions, usually it’s safe to assume the process CV is not changed by whatever factors we have tweaked since our last experiment. We may already have an excellent estimate of the true process CV if we have collected triplicate measurements for each of 50 similar conditions, or 5 replicates for each of 25 similar conditions, or 11 replicates for each of 10 similar conditions. In each case a single, pooled value of the process CV can be calculated with 100 degrees of freedom (a sketch of this pooling appears after this list).
  • When N = 1 for a new condition, we can still estimate a confidence interval as long as we have already stabilized our process and obtained an accurate estimate of its true variation.
  • Error bars expressed as ±2*(sample standard deviation) – and especially ±1*(sample standard deviation) – should not be interpreted as 95% confidence intervals of the mean. Even error bars expressed as ±2*(sample standard error) should not be interpreted this way, because even when N is as large as 30 there is still >10% chance the true standard deviation is >1.3 times larger than its measured value.
  • Considering that there are so many different “conventions” for calculating error bars, no graph with error bars is complete without a clear and prominent description of how those error bars were calculated.
  • In order to make predictions with 95% confidence intervals that are truly accurate >95% of the time, it’s critical that the replicates sample the largest sources of process variation. If we don’t already know (1) our process CV and (2) our sub-process CVs, it is strongly recommended that each replicate in the experiment is executed as an independent instance of the full end-to-end process, that supposedly insignificant factors such as operator (WHO) or location (WHERE) are randomized among the replicates, and that sequence (WHEN) is tracked (so we can test how much of the observed variation, if any, is explained by the order of the replicates). Although injecting these potential sources of variation into our experiments can widen both our sample CVs and our 95% confidence intervals, the resulting predictions are less likely to fail when these processes are scaled up or otherwise transferred to other operators in other places with other reagents and equipment.
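A minimal sketch of the pooling and N = 1 interval mentioned above, in Python with made-up data (the df-weighted pooling of per-condition CVs is one reasonable way to do this, not necessarily the visualization’s exact method):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical dataset: triplicates for each of 50 similar conditions,
    # simulated with a true process CV of about 5% (made-up values)
    conditions = [rng.normal(loc=100, scale=5, size=3) for _ in range(50)]

    # Per-condition %CV and degrees of freedom
    cvs = np.array([100 * np.std(c, ddof=1) / np.mean(c) for c in conditions])
    dfs = np.array([len(c) - 1 for c in conditions])

    # Pool as a df-weighted root-mean-square of the per-condition CVs
    pooled_cv = np.sqrt(np.sum(dfs * cvs**2) / np.sum(dfs))
    pooled_df = int(np.sum(dfs))                   # 50 conditions * 2 df each = 100
    print(f"pooled CV = {pooled_cv:.1f}% ({pooled_df} degrees of freedom)")

    # With a well-estimated process CV, even a single (N = 1) measurement of a
    # new condition gets a usable 95% confidence interval
    new_measurement = 103.0                        # hypothetical single result
    stdev = pooled_cv / 100 * new_measurement      # convert %CV back to stdev units
    t_crit = stats.t.ppf(0.975, df=pooled_df)      # ~1.98, nearly the z value of 1.96
    print(f"95% CI of the mean: {new_measurement:.1f} ± {t_crit * stdev:.1f}")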

Equations
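
Presumably the intervals plotted above follow the standard formulas for a confidence interval of the mean (stated here as an assumption):

    Sampled CV (solid lines):  mean ± t(1−α/2, N−1) * stdev_sampled / √N
    True CV (dashed lines):    mean ± z(1−α/2) * stdev_true / √N

where α = 0.05 for a 95% interval, t and z are quantiles of the Student’s t and standard normal distributions, and stdev = (CV/100) * mean.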

 Interactive visualization was created by Chris @ engaging-data.com.
