Figure legend: Means with error bars for three cases: n = 3, n = 10, and n = 30. The small black dots are data points, and the column denotes the data mean, M. The bars on the left of each column show range, and the bars on the right show standard deviation (SD). M and SD are the same for every case, but notice how much the range increases with n.

Note also that although the range error bars encompass all of the experimental results, they do not necessarily cover all the results that could possibly occur. SD error bars include about two thirds of the sample, and 2 x SD error bars would encompass roughly 95% of the sample.

Descriptive error bars can also be used to see whether a single result fits within the normal range. For example, if you wished to see if a red blood cell count was normal, you could see whether it was within 2 SD of the mean of the population as a whole. Less than 5% of all red blood cell counts are more than 2 SD from the mean, so if the count in question is more than 2 SD from the mean, you might consider it to be abnormal.

As you increase the size of your sample, or repeat the experiment more times, the mean of your results (M) will tend to get closer and closer to the true mean, or the mean of the whole population, μ. We can use M as our best estimate of the unknown μ. Similarly, as you repeat an experiment more and more times, the SD of your results will tend to more and more closely approximate the true standard deviation (σ) that you would get if the experiment was performed an infinite number of times, or on the whole population. However, the SD of the experimental results will approximate to σ, whether n is large or small. Like M, SD does not change systematically as n changes, and we can use SD as our best estimate of the unknown σ, whatever the value of n.

Statistical significance tests and P values

If you carry out a statistical significance test, the result is a P value, where P is the probability that, if there really is no difference, you would get, by chance, a difference as large as the one you observed, or even larger. Other things (e.g., sample size, variation) being equal, a larger difference in results gives a lower P value, which makes you suspect there is a true difference. By convention, if P > 0.05, and you therefore cannot conclude there is a statistically significant effect, you may not conclude that the effect is zero. There may be a real effect, but it is small, or you may not have repeated your experiment often enough to reveal it. It is a common and serious error to conclude "no effect exists" just because P is greater than 0.05. If you measured the heights of three male and three female Biddelonian basketball players, and did not see a significant difference, you could not conclude that sex has no relationship with height, as a larger sample size might reveal one.

A big advantage of inferential error bars is that their length gives a graphic signal of how much uncertainty there is in the data: the true value of the mean μ we are estimating could plausibly be anywhere in the 95% CI. Wide inferential bars indicate large error; short inferential bars indicate high precision.

Replicates or independent samples - what is n?

Science typically copes with the wide variation that occurs in nature by measuring a number (n) of independently sampled individuals, independently conducted experiments, or independent observations. Rule 2: the value of n (i.e., the sample size, or the number of independently performed experiments) must be stated in the figure legend. It is essential that n (the number of independent results) is carefully distinguished from the number of replicates, which refers to repetition of measurement on one individual in a single condition, or multiple measurements of the same or identical samples. Consider trying to determine whether deletion of a gene in mice affects tail length. We could choose one mutant mouse and one wild type, and perform 20 replicate measurements of each of their tails.
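The figure's point, that the range grows with n while SD stays roughly constant, can be checked with a small simulation. This is a sketch using made-up population parameters (mean 10, SD 2), not the article's data; averaging over many simulated samples makes the trend visible.

```python
import random
import statistics

random.seed(2)

def avg_range_and_sd(n, reps=2000, mu=10.0, sigma=2.0):
    """Average the sample range and sample SD over many simulated
    samples of size n drawn from a normal population."""
    ranges, sds = [], []
    for _ in range(reps):
        s = [random.gauss(mu, sigma) for _ in range(n)]
        ranges.append(max(s) - min(s))
        sds.append(statistics.stdev(s))
    return statistics.mean(ranges), statistics.mean(sds)

for n in (3, 10, 30):
    r, sd = avg_range_and_sd(n)
    print(f"n={n:2d}  mean range={r:.2f}  mean SD={sd:.2f}")
```

The range keeps growing with n because larger samples have more chances to include extreme values, while the SD settles near the assumed population value of 2.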
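The 2 SD normal-range check described for red blood cell counts can be sketched as follows. The population values are invented for illustration and are not real reference data.

```python
import statistics

# Invented red blood cell counts for a reference population
# (units: 10^12 cells/L); illustration only, not clinical data.
population = [4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 4.9, 4.7, 5.2, 4.6]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population SD

def within_normal_range(count, mean, sd, k=2):
    """True if `count` lies within k SD of the population mean."""
    return abs(count - mean) <= k * sd

print(within_normal_range(4.9, mu, sigma))   # a count near the mean
print(within_normal_range(9.0, mu, sigma))   # a count far outside 2 SD
```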
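The claims about estimation above, that M homes in on μ as n grows while SD estimates σ at any n, can be illustrated with simulated draws from an assumed population (μ = 100, σ = 15; numbers chosen only for this demo).

```python
import random
import statistics

random.seed(0)
MU, SIGMA = 100.0, 15.0  # assumed "true" population parameters

for n in (3, 30, 3000):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    m = statistics.mean(sample)    # best estimate of MU
    sd = statistics.stdev(sample)  # best estimate of SIGMA
    print(f"n={n:5d}  M={m:7.2f}  SD={sd:6.2f}")
```

M for the largest sample lands very close to 100, and SD does not systematically shrink as n grows, although for tiny n the SD is itself a noisy estimate of σ.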
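The definition of P above can be made concrete with a permutation test, which directly asks: if there were really no difference, how often would shuffling the group labels produce a difference as large as the one observed, or larger? The tail-length numbers below are invented, and the permutation procedure is a generic sketch, not the article's own analysis; note the groups are five independently sampled mice each (n = 5), not replicate measurements of one mouse.

```python
import random
import statistics

# Invented tail lengths (cm) for five mice per group.
wild_type = [7.2, 7.5, 7.1, 7.4, 7.3]
mutant = [7.9, 8.1, 7.8, 8.0, 7.7]

observed = abs(statistics.mean(mutant) - statistics.mean(wild_type))

random.seed(1)
pooled = wild_type + mutant
n_extreme = 0
N_SHUFFLES = 10_000
for _ in range(N_SHUFFLES):
    random.shuffle(pooled)  # pretend the group labels are arbitrary
    diff = abs(statistics.mean(pooled[:5]) - statistics.mean(pooled[5:]))
    if diff >= observed:    # as large as observed, or larger
        n_extreme += 1

p_value = n_extreme / N_SHUFFLES
print(f"observed difference = {observed:.2f} cm, permutation P = {p_value:.4f}")
```

A small P here means a difference this large rarely arises from label shuffling alone, which is exactly the probability the text describes.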
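For the 95% CI mentioned in the inferential-bars passage, a simple large-sample interval is M ± 1.96 × SE, where SE = SD/√n. The sample below is invented for illustration; for small n a t critical value would widen the interval slightly, but the idea is the same.

```python
import statistics
from statistics import NormalDist

# Invented sample of eight measurements, for illustration only.
sample = [9.1, 10.4, 9.8, 10.9, 9.5, 10.2, 9.9, 10.6]
n = len(sample)
m = statistics.mean(sample)
se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
lo, hi = m - z * se, m + z * se
print(f"M = {m:.2f}, 95% CI roughly ({lo:.2f}, {hi:.2f})")
```

The true mean μ could plausibly lie anywhere in (lo, hi); a wider interval signals more uncertainty about μ, which is what long inferential bars convey graphically.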