As nurses, we must administer nursing care based on the best available scientific evidence. But for many nurses, critical appraisal, the process used to determine the best available evidence, can seem intimidating. To make critical appraisal more approachable, let’s examine the P value and make sure we know what it is and what it isn’t.
Defining P value
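The P value is the probability of obtaining results at least as extreme as those observed in a study if chance alone were responsible, that is, if there were truly no difference between the groups being compared. To better understand this definition, consider the role of chance.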
The concept of chance is illustrated with every flip of a coin. The true probability of obtaining heads in any single flip is 0.5, meaning that over a very large number of flips, heads would come up in about half of the flips and tails in about half. But if you were to flip a coin 10 times, you likely would not obtain exactly five heads and five tails; that result occurs only about one time in four. You'd be more likely to see some other result, such as a six-to-four split or a seven-to-three split. Chance is responsible for this variation in results.
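To put numbers on the coin example, here is a minimal sketch that computes the probability of each split in 10 flips of a fair coin. The function name prob_heads is ours, chosen for illustration, and a "split" is counted in either direction (six heads and four tails, or four heads and six tails).

```python
from math import comb

def prob_heads(k: int, n: int = 10, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

exactly_five = prob_heads(5)                  # five-to-five split
six_four = prob_heads(6) + prob_heads(4)      # six-to-four split, either direction
seven_three = prob_heads(7) + prob_heads(3)   # seven-to-three split, either direction

print(f"5-5 split: {exactly_five:.3f}")   # 0.246
print(f"6-4 split: {six_four:.3f}")       # 0.410
print(f"7-3 split: {seven_three:.3f}")    # 0.234
```

An even split is the single most likely head count, yet it still turns up in fewer than one of every four sets of 10 flips.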
Just as chance plays a role in determining the flip of a coin, it plays a role in the sampling of a population for a scientific study. When subjects are selected, chance may produce an unequal distribution of a characteristic that can affect the outcome of the study. Statistical inquiry and the P value are designed to help us determine just how large a role chance plays in study results. We begin a study with the assumption that there will be no difference between the experimental and control groups. This assumption is called the null hypothesis. When the results of the study indicate that there is a difference, the P value tells us how likely a difference at least that large would be if chance alone were at work.
In every study, researchers put forth two kinds of hypotheses: the research or alternative hypothesis and the null hypothesis. The research hypothesis reflects what the researchers hope to show—that there is a difference between the experimental group and the control group. The null hypothesis directly competes with the research hypothesis. It states that there is no difference between the experimental group and the control group.
It may seem logical that researchers would test the research hypothesis, that is, that they would test what they hope to show. But probability theory requires that they test the null hypothesis instead. To support the research hypothesis, the data must contradict the null hypothesis. By demonstrating a difference between the two groups that chance alone would rarely produce, the data contradict the null hypothesis.
Testing the null hypothesis
Now that you know why we test the null hypothesis, let’s look at how we test the null hypothesis.
After formulating the null and research hypotheses, researchers decide on a test statistic they can use to determine whether to accept or reject the null hypothesis. They also propose a fixed-level P value (also called the significance level, or alpha). The fixed-level P value is often set at .05 and serves as the value against which the test-generated P value is compared. (See Why .05?)
A comparison of the two P values determines whether the null hypothesis is rejected or accepted. If the P value associated with the test statistic is less than the fixed-level P value, the null hypothesis is rejected and the difference between the two groups is declared statistically significant. If the P value associated with the test statistic is greater than the fixed-level P value, the null hypothesis is accepted because there's no statistically significant difference between the groups. Strictly speaking, accepting the null hypothesis means only that the data did not provide enough evidence to reject it, not that the null hypothesis has been shown to be true.
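The comparison described above can be sketched in a few lines. This is an illustration only: the function name decide is hypothetical, and treating a P value exactly equal to the fixed level as "do not reject" is our assumption, because the text covers only the less-than and greater-than cases.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a test-generated P value with the fixed-level P value (alpha)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    # Assumption: a P value equal to alpha is treated as not significant.
    return "do not reject the null hypothesis"

print(decide(0.01))   # reject the null hypothesis
print(decide(0.20))   # do not reject the null hypothesis
```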
Why .05?
The decision to use .05 as the threshold in testing the null hypothesis is completely arbitrary. The researchers credited with establishing this threshold warned against strictly adhering to it.
Remember that warning when appraising a study in which the P value is greater than .05. The savvy reader will consider other important measurements, including effect size, confidence intervals, and power analyses, when deciding whether to accept or reject scientific findings that could influence nursing practice.
Real-world hypothesis testing
How does this play out in real life? Let’s assume that you and a nurse colleague are conducting a study to find out if patients who receive backrubs fall asleep faster than patients who do not receive backrubs.
1. State your null and research hypotheses
Your null hypothesis will be that there will be no difference in the average amount of time it takes patients in each group to fall asleep. Your research hypothesis will be that patients who receive backrubs fall asleep, on average, faster than those who do not receive backrubs. You will be testing the null hypothesis in hopes of supporting your research hypothesis.
2. Propose a fixed-level P value
Although you can choose any value as your fixed-level P value, you and your research colleague decide you'll stay with the conventional .05. If you were testing a new medical product or a new drug, you might choose a much smaller P value (perhaps as small as .0001). That's because you would want to be as sure as possible that any difference you see between groups is attributable to the new product or drug and not to chance. A fixed-level P value of .0001 would mean that, if the product or drug truly had no effect, a difference large enough to be declared significant would arise by chance only 1 time out of 10,000. For a study on backrubs, however, .05 seems appropriate.
3. Conduct hypothesis testing to calculate a probability value
You and your research colleague agree that a randomized controlled study will help you best achieve your research goals, and you design the process accordingly. After consenting to participate in the study, patients are randomized to one of two groups:
- the experimental group that receives the intervention—the backrub group
- the control group—the non-backrub group.
After several nights of measuring the number of minutes it takes each participant to fall asleep, you and your research colleague find that on average, the backrub group takes 19 minutes to fall asleep and the non-backrub group takes 24 minutes to fall asleep.
Now the question is: Would you have the same results if you conducted the study using two different groups of people? That is, what role did chance play in helping the backrub group fall asleep 5 minutes faster than the non-backrub group? To answer this, you and your colleague will use an independent samples t-test to calculate a probability value.
An independent samples t-test is a kind of hypothesis test that compares the mean values of two groups (backrub and non-backrub) on a given variable (time to fall asleep).
Hypothesis testing is really nothing more than testing the null hypothesis. In this case, the null hypothesis is that the amount of time needed to fall asleep is the same for the experimental group and the control group. The hypothesis test addresses this question: If there’s really no difference between the groups, what is the probability of observing a difference of 5 minutes or more, say 10 minutes or 15 minutes?
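One way to make that question concrete is a permutation test, sketched below. Note that this is a different technique from the independent samples t-test used in this example; we substitute it here only because it answers the same question by brute force and needs no statistical tables. The sleep times are invented for illustration (chosen so the group means are 19 and 24 minutes), and the function name and number of shuffles are our assumptions.

```python
import random

# Hypothetical minutes-to-fall-asleep data, invented for illustration only.
backrub = [15, 22, 18, 12, 25, 20, 17, 21, 16, 24]      # mean 19
no_backrub = [20, 28, 23, 19, 30, 25, 22, 27, 21, 25]   # mean 24

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random relabeling of patients produces a mean
    difference at least as large (in either direction) as the observed one."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # If the null hypothesis is true, group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

observed_diff, p_value = permutation_p_value(backrub, no_backrub)
print(f"observed difference: {observed_diff} minutes")
print(f"estimated P value: {p_value:.3f}")
```

The estimated P value is simply the share of shuffles in which chance alone produced a gap of 5 minutes or more between the two groups.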
We can define the P value as the probability of observing a time difference at least as large as the one you found if chance alone were at work. Some find it easier to understand the P value when they think of it in relationship to error. In this case, the fixed-level P value is the probability of committing a Type 1 error that you are willing to accept. (Type 1 error occurs when a true null hypothesis is incorrectly rejected.)
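To see how a fixed-level P value of .05 relates to Type 1 error, the sketch below simulates many studies in which the null hypothesis is true (both groups are drawn from the same population) and counts how often it is rejected anyway. For simplicity it uses a two-sample z-test with a known standard deviation rather than a t-test, and the population mean, standard deviation, group size, and number of studies are all assumed values.

```python
import random
from statistics import NormalDist

def two_sided_z_p_value(group_a, group_b, sigma):
    """P value from a two-sample z-test with a known common standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(alpha=0.05, n_studies=2000, n_per_group=30, seed=1):
    """Simulate studies in which the null hypothesis is true; count rejections."""
    rng = random.Random(seed)
    mu, sigma = 24.0, 4.0   # hypothetical: both groups share the same true mean
    rejections = 0
    for _ in range(n_studies):
        a = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        b = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        if two_sided_z_p_value(a, b, sigma) < alpha:
            rejections += 1   # a Type 1 error: a true null hypothesis rejected
    return rejections / n_studies

rate = false_positive_rate()
print(f"share of true-null studies rejected at .05: {rate:.3f}")
```

The share of rejections should land close to .05, which is the sense in which the fixed level caps the risk of a Type 1 error.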
4. Compare and interpret the P value
Early on in your study, you and your colleague selected a fixed-level P value of .05, meaning that you were willing to accept a 5% risk of declaring a difference when none truly exists. Also, you used an independent samples t-test to arrive at a probability value that will help you determine the role chance played in obtaining your results. Let's assume, for the sake of this example, that the probability value generated by the independent samples t-test is .01 (P = .01). Because this P value associated with the test statistic is less than the fixed-level P value (.01 < .05), you can reject the null hypothesis. By doing so, you declare that there is a statistically significant difference between the experimental and control groups. (See Putting the P value in context.)
In effect, you're saying that the chance of observing a difference of 5 minutes or more, when in fact there is no difference, is less than 5 in 100. If the P value associated with the test statistic had been greater than .05, then you would accept the null hypothesis, which would mean that there is no statistically significant difference between the control and experimental groups. Accepting the null hypothesis would mean that, if there were truly no difference, a difference of 5 minutes or more between the two groups would occur more than 5 times in 100.
Putting the P value in context
Although the P value helps you interpret study results, keep in mind that many factors can influence the P value—and your decision to accept or reject the null hypothesis. These factors include the following:
- Insufficient power. The study may not have been designed appropriately to detect an effect of the independent variable on the dependent variable. Therefore, a real effect may have gone undetected, causing you to incorrectly accept the null hypothesis (a Type 2 error).
- Unreliable measures. Instruments that don’t meet consistency or reliability standards may have been used to measure a particular phenomenon.
- Threats to internal validity. Various biases, such as selection of patients, regression, history, and testing bias, may unduly influence study outcomes.
A decision to accept or reject study findings should focus not only on the P value but also on other metrics, including the following:
- Confidence intervals (an estimated range of values with a high probability of including the true population value of a given parameter)
- Effect size (a value that measures the magnitude of a treatment effect)
Remember, the P value tells you only whether a difference between groups is statistically significant. It doesn't tell you the magnitude of the difference.
5. Communicate your findings
The final step in hypothesis testing is communicating your findings. When sharing research findings (hypotheses) in writing or discussion, understand that they are statements of relationships or differences in populations. Your findings are not proved or disproved. Scientific findings are always subject to change. But each study leads to better understanding and, ideally, better outcomes for patients.
The P value isn’t the only concept you need to understand to analyze research findings. But it is a very important one. And chances are that understanding the P value will make it easier to understand other key analytical concepts.
Kenneth J. Rempher, PhD, RN, MBA, CCRN, APRN,BC, is Director, Professional Nursing Practice at Sinai Hospital of Baltimore (Md.). Kathleen Urquico, BSN, RN, is a Direct Care Nurse in the Rubin Institute of Advanced Orthopedics at Sinai Hospital of Baltimore.