Statistical significance

Be cautious of results reported as “statistically significant” or “non-significant”.

Claims that results were significant or non-significant usually mean that they were statistically significant or statistically non-significant. This is not the same as important or not important. A small, unimportant effect can be “statistically significant”. Similarly, a large, important effect can be “statistically non-significant”. Focusing on statistical significance can be misleading.

Explanation

In comparisons of health actions, statistical significance is based on the probability (the p-value) that a difference at least as large as the observed difference between the comparison groups (the effect estimate) would have occurred by chance if there were in fact no difference. A “statistically significant” result is one that is unlikely to have happened by chance. The usual threshold for this judgement is a probability of less than 5% (p < 0.05).
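
To make this concrete, here is a minimal sketch in Python, with made-up numbers, of how a p-value can be calculated when comparing outcomes in two groups. It uses a simple two-proportion z-test; real analyses may use other tests.

    import math

    def two_proportion_p_value(events_a, n_a, events_b, n_b):
        """Two-sided p-value for the difference between two proportions."""
        pooled = (events_a + events_b) / (n_a + n_b)   # proportion assuming no true difference
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (events_a / n_a - events_b / n_b) / se     # how many standard errors apart the groups are
        return math.erfc(abs(z) / math.sqrt(2))        # two-sided tail probability

    # Hypothetical trial: 30/200 outcomes with the treatment, 50/200 without.
    p = two_proportion_p_value(30, 200, 50, 200)
    print(f"p = {p:.3f}")   # prints p = 0.012, below the usual 0.05 threshold
    print("statistically significant" if p < 0.05 else "statistically non-significant")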

Descriptions of effect estimates as “statistically significant” can be misleading because “significant” and “important” are similar words, and statistical significance is often confused with importance, especially when “significant” is used rather than “statistically significant”.

In addition, the cut-off for statistical significance (p < 0.05) is arbitrary, and statistical significance does not give any information about the size of the effect. A “statistically significant” effect may or may not be important. Similarly, an observed effect that is “statistically non-significant” may or may not be important, and the results may or may not rule out an important effect. Being told that the results are “not statistically significant” does not tell you whether they were informative (showing that it is very unlikely that a health action has an important effect) or inconclusive (showing that the results are uncertain).
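
As an illustration (a sketch with made-up numbers, not from any real study), the same trivially small difference, 51% versus 50%, is “non-significant” in a small study but becomes “statistically significant” once enough people are studied, even though a one-percentage-point difference may be unimportant:

    import math

    def two_sided_p(events_a, n_a, events_b, n_b):
        """Two-sided p-value from a two-proportion z-test."""
        pooled = (events_a + events_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (events_a / n_a - events_b / n_b) / se
        return math.erfc(abs(z) / math.sqrt(2))

    for n in (500, 5_000, 50_000):   # people per comparison group
        p = two_sided_p(round(0.51 * n), n, round(0.50 * n), n)
        verdict = "significant" if p < 0.05 else "non-significant"
        print(f"{n:>6} per group: p = {p:.4f} -> {verdict}")

The observed difference is identical in all three cases; only the amount of data changes the verdict.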

Reporting results as “statistically significant” may also result in reporting and publication bias, because “statistically non-significant” results are often not reported.

Example

A study of a possible adverse effect of anti-inflammatory drugs on the risk of heart rhythm abnormalities (atrial fibrillation) was reported as having “statistically non-significant” results. The researchers concluded that exposure to the drugs was “not associated” with an increased risk and that the results were the opposite of those from an earlier study with a “statistically significant” result. However, the effect estimates in the two studies were the same. The difference was that the earlier study was more precise, with a narrower confidence interval around the effect estimate. The confidence interval in the second study was wider: most of it suggested that anti-inflammatory drugs increased the risk of heart rhythm abnormalities, but its lower end was consistent with the drugs having no effect. Concluding that the second study showed “no effect” based on the significance level was misleading, as was concluding that its results were the opposite of those from an earlier study with an identical effect estimate. Yet misleading interpretations like these, based on an arbitrary cut-off for “statistical significance”, are common.
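
The pattern can be reproduced with a small sketch (the numbers below are hypothetical, not those of the actual studies): two studies with the same effect estimate, differing only in the width of the confidence interval.

    import math

    def ci_95(rate_ratio, se_log):
        """95% confidence interval for a rate ratio, given its standard error on the log scale."""
        return (math.exp(math.log(rate_ratio) - 1.96 * se_log),
                math.exp(math.log(rate_ratio) + 1.96 * se_log))

    studies = {
        "earlier study (more precise)": (1.20, 0.05),   # same estimate, small standard error
        "second study (less precise)":  (1.20, 0.12),   # same estimate, large standard error
    }
    for name, (rr, se) in studies.items():
        lo, hi = ci_95(rr, se)
        verdict = "significant" if lo > 1 else "non-significant"
        print(f"{name}: rate ratio {rr}, 95% CI {lo:.2f} to {hi:.2f} -> {verdict}")

    # Both studies estimate roughly a 20% increase in risk; calling them
    # "opposite" because one interval crosses 1.0 is misleading.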

Remember: When results are reported as “statistically significant” or “not statistically significant”, this does not necessarily mean they are important or unimportant.
