If you want to interpret nursing research outcomes, you need to understand statistical power. Few nurses are familiar with the concepts of statistical power and power analysis. Learning about statistical power and related concepts will help you more accurately interpret research findings and determine what influence, if any, these findings should have on nursing practice.
What is statistical power?
Statistical power is the ability of a statistical test to detect an effect (caused by an intervention in the study), given that the effect actually exists (is not due to chance). Viewed another way, it’s the probability that a false null hypothesis will be rejected. (The null hypothesis states that the results of an intervention don’t differ from what might have occurred as a result of chance.) Essentially, power is the ability of a statistical test to detect true differences between groups.
What determines statistical power?
A study’s statistical power is determined primarily by:
- significance level
- sample size
- effect size.
Together with power itself, these factors form four linked quantities: when any three are known, the fourth can be calculated. This property allows researchers to determine the power of their statistical test before analysis, or to calculate the sample size needed to reach a desired power.
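The sketch below makes that relationship concrete. It fixes the significance level, the sample size, and the effect size, then calculates power. The helper `power_two_sample` is hypothetical and does not come from any statistics package. It assumes a two-sided comparison of two equal-sized groups and uses a normal (z) approximation rather than an exact t-test, so its results are estimates. Here `d` is the standardized effect size: the difference between group means expressed in standard deviation units.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)

def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two group means.

    d           -- standardized effect size (difference in means / SD)
    n_per_group -- number of subjects in each of two equal-sized groups
    alpha       -- two-sided significance level
    Uses a normal (z) approximation, not an exact t-test calculation.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    shift = d * (n_per_group / 2) ** 0.5         # expected z statistic if the effect is real
    # Power = chance that the test statistic falls beyond either critical value
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

# Fix three factors (alpha = .05, 64 subjects per group, d = 0.5); compute the fourth.
p = power_two_sample(d=0.5, n_per_group=64, alpha=0.05)
print(f"power = {p:.2f}")
```

Under these assumptions the estimated power is about .81. An exact t-test calculation gives a slightly lower value, close to .80.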
What is the significance level?
The significance level is the fixed probability, set in advance, of wrongly rejecting the null hypothesis when that hypothesis is true. Researchers keep the significance level small to protect the null hypothesis and avoid making false claims. However, when everything else is held constant, a smaller significance level also lowers the power of the test.
Type I error
Before starting a study, researchers establish their tolerance for committing a type I error. This error occurs when the researcher concludes that an observed effect is real when it is actually due to chance. In this case, the researcher has wrongly rejected the null hypothesis.
Researchers can establish the maximum chance of committing a type I error at any rate they wish, but .05 is the accepted convention. The probability of committing a type I error is called the alpha (α) level, or the level of statistical significance.
Suppose, for example, a researcher sets an α level of .05 when designing a study to determine how classical music affects blood pressure in preoperative patients. This means 5% is the maximum chance of incorrectly rejecting the null hypothesis (inferring that classical music is associated with blood pressure changes when it isn’t). The researcher is willing to tolerate a risk level of .05 when using various statistical tests to analyze study data. Put another way, if classical music truly has no effect, the researcher accepts that the null hypothesis will be wrongly rejected about 1 time in 20.
This example relates to determining differences between two groups (intervention vs. control). However, power analysis also can be used with other designs and statistical techniques.
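A short simulation shows what an α level of .05 means in practice for the classical music example. The sketch makes several illustrative assumptions. Blood pressure scores in both groups are drawn from the same standard normal population, so the null hypothesis is true. The groups are compared with a z-test that treats the standard deviation as known. The helper name `simulate_type_i_error`, the 30 subjects per group, and the 20,000 simulated studies are all arbitrary, hypothetical choices.

```python
import random
from statistics import NormalDist

def simulate_type_i_error(n_per_group=30, alpha=0.05, trials=20_000, seed=1):
    """Estimate how often a two-sided z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)                      # seeded so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # rejection threshold for |z|
    se_diff = (2 / n_per_group) ** 0.5             # SE of a difference in means when SD = 1
    false_rejections = 0
    for _ in range(trials):
        # Both groups come from the same population (mean 0, SD 1),
        # so any observed difference is due to chance alone.
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        music = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (sum(music) / n_per_group - sum(control) / n_per_group) / se_diff
        if abs(z) > z_crit:
            false_rejections += 1                  # a type I error
    return false_rejections / trials

rate = simulate_type_i_error()
print(f"Observed type I error rate: {rate:.3f}")
```

The null hypothesis is true in every simulated study. The proportion of studies that reject it should therefore land close to .05, or about 1 in 20.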
Type II error
A type II error occurs when the researcher believes the effect observed in a study resulted from chance when, in fact, it was an actual effect of the intervention. In this case, the researcher fails to reject the null hypothesis when it’s false. In other words, the researcher accepts that the intervention had no effect when it actually did. Some people find it easier to think of a type II error as erroneously concluding that no real difference exists between groups because the study found no statistically significant difference.
The probability of making a type II error is called beta (β). The β level relates directly to the concept of statistical power: the quantity (1 – β) is known as the power of a test.
Just as researchers can set the α level of a study, they can also set the β level. Contemporary convention usually sets β between .05 and .20.
Suppose the researcher in the study described above sets β at .20. This means he’s willing to accept a 20% chance of finding no statistically significant difference in blood pressure between groups when classical music did have an effect. Equivalently, the study has a power of .80 to detect the effect it was designed to detect.
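The two error rates and power can be written as conditional probabilities. In this notation, added here for brevity, H₀ stands for the null hypothesis:

```latex
\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ true}) \\
\beta &= P(\text{fail to reject } H_0 \mid H_0 \text{ false}) \\
\text{power} &= 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ false})
\end{aligned}
```

In the classical music study, α = .05 and β = .20, so power = 1 – .20 = .80.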
How does sample size relate to statistical power?
Sample size—the number of subjects in a study—plays an important role in determining the power of a study. Because researchers can use only a sample of the population when studying a phenomenon, the number of observations made in the sample must be large enough so the outcomes generated by the sample approximate the outcomes that would be generated if the entire population were studied. The larger the sample size, the higher the probability of rejecting a false null hypothesis. Thus, increasing the sample size increases the power of a study.
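The sketch below shows how power grows with sample size. It uses a normal-approximation power formula for a two-sided comparison of two equal-sized groups, defined here so the block stands on its own. The function name `power_two_sample` is hypothetical. The example holds α at .05 and the effect size at a hypothetical d = 0.5, meaning a difference in means of half a standard deviation. Only the number of subjects per group changes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sample(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison of means."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5          # grows with the square root of n
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

# Hold alpha (.05) and effect size (d = 0.5) fixed; vary only the sample size.
powers = {n: power_two_sample(0.5, n) for n in (10, 20, 40, 80, 160)}
for n, p in powers.items():
    print(f"{n:>3} subjects per group -> power {p:.2f}")
```

Under these assumptions, power rises from about .20 with 10 subjects per group to above .99 with 160.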
What is effect size?
Effect size represents the magnitude of the difference between two groups (for instance, the intervention group vs. the control group). In experimental studies, effect size is important because it tells us how large an effect an intervention has on the phenomenon being studied. A statistically significant outcome tells us only that a difference between groups is unlikely to be due to chance. It doesn’t tell us the magnitude of the difference. That’s where effect size comes in.
Determining what constitutes a clinically significant effect is up to the researcher.
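One widely used effect size for comparing two group means is Cohen's d. It is the difference between the means divided by the pooled standard deviation. Cohen suggested 0.2, 0.5, and 0.8 as rough benchmarks for small, medium, and large effects, but these don't replace the researcher's judgment about clinical significance. The sketch below is illustrative only. The helper `cohens_d` is hypothetical, and the blood pressure values are invented, not study data.

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)          # sample SDs (n - 1 denominator)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical systolic blood pressures (mm Hg), invented for illustration only
control = [142, 138, 150, 145, 139, 147]
music = [135, 132, 141, 138, 130, 140]

d = cohens_d(control, music)
print(f"Cohen's d = {d:.2f}")
```

In these made-up numbers, the control group's mean is 7.5 mm Hg higher, which works out to roughly 1.6 pooled standard deviations.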
When is power analysis conducted?
Power analysis can be conducted either before data collection begins (a priori) or after data collection ends (post hoc). Typically, a priori power analysis is done to determine the sample size needed to achieve adequate power. It conserves valuable resources, such as money and time, by calculating a sample size sufficient to detect a clinically significant effect. It also spares research subjects from exposure to potentially harmful agents and interventions in studies insufficiently powered to produce statistically significant results.
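An a priori calculation can be sketched with the common normal-approximation formula for two equal-sized groups, n = 2(z₁₋α/₂ + z₁₋β)² / d², rounded up. The function name `n_per_group` is hypothetical. The sketch assumes a two-sided test with α = .05 and β = .20, which corresponds to a power of .80.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """A priori sample size per group for a two-sided, two-group comparison of means.

    Normal-approximation formula: n = 2 * (z_(1 - alpha/2) + z_(power))**2 / d**2,
    rounded up to the next whole subject.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

# alpha = .05 and beta = .20 (power = .80), for small, medium, and large effects
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} subjects per group")
```

For a medium effect (d = 0.5), the sketch asks for 63 subjects per group. Exact t-test calculations, such as those in Cohen's tables, give slightly larger numbers (64 per group for d = 0.5). Researchers often enroll extra subjects to allow for dropouts.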
Post hoc power analysis, on the other hand, uses the observed sample size and effect size to determine the power of the study (assuming that the effect size in the sample equals the effect size in the population). Post hoc power analysis has no merit except to satisfy the curiosity of a researcher who is concerned about not achieving statistically significant results. The literature describes why post hoc power analysis is inappropriate (see Goodman and Berlin; Levine and Ensom).
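One reason post hoc power adds little is that "observed" power is simply a restatement of the p-value. The sketch below assumes a two-sided z-test and treats the sample effect as the true effect, which is exactly the assumption post hoc power analysis makes. The function name `observed_power` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def observed_power(p_value, alpha=0.05):
    """'Observed' post hoc power of a two-sided z-test.

    Treats the effect seen in the sample as if it were the true population effect.
    """
    z_obs = STD_NORMAL.inv_cdf(1 - p_value / 2)   # |z| that yields this two-sided p-value
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(z_obs - z_crit) + STD_NORMAL.cdf(-z_obs - z_crit)

for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
```

Under these assumptions, observed power depends only on the p-value. A result with p = .05 always has observed power near 50%, and any nonsignificant result has observed power of roughly 50% or less. The calculation therefore can't tell the researcher anything the p-value didn't already show.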
Why is power analysis important?
Power analysis can be used to determine whether a study has a good chance of providing a statistically significant result if a difference truly exists in the population. For nurse researchers, understanding the likelihood of achieving statistically significant results is important. Studies that don’t achieve such results fail to support the researcher’s hypothesis. They may be considered unreliable research and aren’t likely to be published.
The concept of statistical power is crucial to conducting responsible research. By understanding statistical power, nurses at all levels will become better consumers of sound scientific studies. Ultimately, when studies are used to shape delivery of patient care, it’s our patients who benefit the most.
Bausell RB, Li YF. Power Analysis for Experimental Research: A Practical Guide for the Biological, Medical, and Social Sciences. New York, NY: Cambridge University Press; 2002.
Browner W, Newman T, Hearst N, Hulley S. Getting ready to estimate sample size: hypotheses and underlying principles. In: Hulley S, Cummings S, Browner W, Grady D, Hearst N, Newman T, eds. Designing Clinical Research. Philadelphia, PA: Lippincott Williams & Wilkins; 2001:51-64.
Cashen LG, Geiger S. Statistical power and the testing of null hypothesis: a review of contemporary management research and recommendations for future research. Organ Res Methods. 2004;7(2):151-167.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. Revised ed. New York, NY: Academic Press; 1977.
Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121(3):200-206.
Levine M, Ensom MH. Post hoc power analysis: an idea whose time has passed? Pharmacotherapy. 2001;21(4):405-409.
Thomas L, Juanes F. The importance of statistical power analysis: an example from Animal Behaviour. Anim Behav. 1996;52:856-859.
Kenneth J. Rempher is Director of Professional Nursing Practice at Sinai Hospital in Baltimore, Maryland. Stephanie Miller is a direct care nurse in the operating room at the same facility.