How to understand the p-value in a research paper
Submitted by blackbear
Posted Oct 05 2009 05:30 AM
When you read a published research paper, one of the questions you need to ask yourself is, "How statistically valid is the finding?" The readership of these papers is largely limited to people with postgraduate degrees in the field being covered who have had a basic grounding in statistics. However, as more and more people are looking at basic scientific research, on issues ranging from genomics to global climate change, it is important that people who haven't been exposed as rigorously to the discipline of statistics be able to judge the merits of the work. In a typical scientific paper, you commonly see two statistical values mentioned: the p-value and the confidence interval (or CI).
We'll start by looking at the p-value. Let's say you have a hypothesis: that people who habitually use HP calculators with Reverse Polish Notation end up speaking like Yoda, because they get used to putting their nouns before their verbs. You might more formally state the hypothesis as "Subjects who use an RPN calculator more than 10 hours a week will have an increased incidence of putting their verbs last in a sentence." At the same time, you also state what's called the null hypothesis, which is what a negative finding in your research would be, i.e., "Subjects who use an RPN calculator more than 10 hours a week will not have an increased incidence of putting their verbs last in a sentence."
You then gather a group of test subjects, record them talking for an hour, and count the number of times they talk like Yoda. Let's say that, on average, non-RPN calculator users put their verbs at the end of their sentences 5 times in an hour, while RPN calculator users do so 7 times an hour, on average. "Aha," you shout, "my hypothesis was proven!" But not so fast, Charlie...
Let's step away from this experiment for a moment and consider a simple coin flip. Suppose you have a theory that silver dollar coins, because of some weird off-balance factor in their design, always come up heads. You flip the coin 4 times, and all 4 are heads. My god, you were right! Time to make some bar bets!
But with such a small number of samples, there's a significant chance that it could have been just dumb luck that got you that result. In fact, in this case, it's the same chance that you would flip four heads in a row, or 1 in 16. This is where the p-value comes in. The p-value is the likelihood that, had the results been purely random, you would have come up with the same (or a more extreme) result. It's expressed as a decimal number between 0 and 1, where 0.01 is 1%, 0.25 is 25%, etc. So, in this case, P = 0.0625.
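That 1-in-16 figure is easy to verify for yourself. Here's a minimal sketch in Python (the function name is mine, just for illustration): each fair-coin flip is independent with probability 0.5, so you multiply the probabilities together.

```python
def p_value_all_heads(num_flips: int) -> float:
    """Chance of seeing ALL heads in num_flips flips of a fair coin.

    Each flip is independent with probability 0.5, so the combined
    probability is 0.5 multiplied by itself num_flips times.
    """
    return 0.5 ** num_flips

p = p_value_all_heads(4)
print(p)  # 0.0625, i.e. a 1-in-16 chance under pure luck
```

Note that this is exactly the p-value for the silver-dollar experiment: the probability of getting a result at least this extreme if the coin were actually fair.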
Normally, before the experiment is run, you determine a threshold P value (often called the significance level), below which you will reject the null hypothesis and consider the result significant. For the "soft" sciences, where results can be a little fuzzy, a P value of 0.05 is considered a reasonable threshold. For physical science, and for experiments with implications on health or safety, a value of 0.01 is more common, since you'd like to be really sure the results are statistically valid. In general, the larger the sample size, the less likely that chance could have generated the results. In other words, a 6 to 4 ratio of positive results will have a lower P value if there were 10,000 samples, as opposed to 10. That's one reason that researchers like to have as large a sample size as is practical and affordable.
Note that the P value is NOT the likelihood that the hypothesis is correct! You can have a statistically valid result, but for an entirely different reason than the thesis you proposed. For example, in our Yoda example, maybe there was an HP Calculator Users of America meeting, and they all watched Star Wars. All the P value says is, "this is the likelihood that you could have gotten this result in a random sample."
The other magic number is the confidence interval, or CI. You see this in research with quantitative results. For example, in our Yoda example, we might report "RPN calculator users are 3.4 times as likely to speak like Yoda as non-RPN users." But chance rears its ugly head here too, and you need to take into account the possibility that random factors have skewed your data. That's why, using statistical formulas, quantitative results are usually reported with a confidence interval, typically 95%. The way it is written in a report is, "RPN calculator users are 3.4 times (2.5-4.3, 95% CI) as likely to speak like Yoda as non-RPN users." You read that as "we're 95% sure that the value is somewhere between 2.5 times and 4.3 times as likely."
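To make the confidence interval less magical, here's a minimal sketch of one common way such an interval is computed: the normal approximation for a sample mean, where the interval is the mean plus or minus 1.96 standard errors. The function name and the Yoda-ism counts are hypothetical, purely for illustration:

```python
from math import sqrt

def mean_ci_95(samples):
    """Approximate 95% confidence interval for the mean of `samples`,
    using the normal approximation: mean +/- 1.96 standard errors."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    half_width = 1.96 * sqrt(variance / n)
    return mean - half_width, mean + half_width

# Hypothetical Yoda-isms per hour for ten RPN calculator users:
counts = [7, 5, 8, 6, 9, 7, 6, 8, 7, 7]
low, high = mean_ci_95(counts)
print(f"mean {sum(counts) / len(counts):.1f}, 95% CI ({low:.1f}-{high:.1f})")
# mean 7.0, 95% CI (6.3-7.7)
```

The more samples you collect, the smaller the standard error, and the tighter the interval becomes, which is the same reason a larger sample size drives the p-value down.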
Alternative solution by pkratka
Posted Jan 26 2013 11:15 AM
Hello Mr. Turner ("blackbear"),
Great overview of understanding p-values.
I got lost in the last paragraph on a couple of counts: (1) how did the "3.4 times as likely" get calculated? (2) how was "(2.5-4.3, 95% CI)" determined/calculated - both the range and percentage?
I apologize if this is a somewhat elementary question.