The Significance of P-Value in Medical Research
Muhammad Ibrahim[1]
Abstract
Nowadays, in medical and biological research, most hypotheses are rejected or not rejected on the basis of the P-value. The result for any data set, whether obtained by measuring or by counting, is easy to interpret by calculating the P-value associated with a test statistic and using it to determine the level of significance, rather than fixing the level of significance in advance. It is also important to understand the P-value because most computer software reports the result of a test statistic in the form of a P-value. In some cases the P-value is difficult to calculate directly, especially for the chi-square and F statistics. In this article, a short description of the P-value, its interpretation and its calculation are discussed using numerical examples, comments and the normal curve.
Key words: P-value, test statistic, hypothesis, level of significance, critical region
Introduction
Traditional theory of testing of hypothesis
Traditionally, the simplest method of deciding whether to reject a null hypothesis on the basis of sample observations is to calculate a test statistic (Z, χ², t, F) and then select a critical region from a pre-determined level of significance (α). If the calculated value of the test statistic falls in the critical region, we reject the null hypothesis; otherwise we do not reject it.¹
In the traditional method of testing a hypothesis, the researcher cannot tell how strongly the data contradict or support the hypothesis. Suppose we say that our hypothesis is rejected at the 5% level of significance. We then know that the sample mean is at least 1.96 standard errors away from the population mean, but we have no idea of the exact distance. Such a conclusion is often inadequate, because it gives the researcher or analyst no idea whether the calculated value of the test statistic is just barely in the rejection region or very far into it. This approach is sometimes unsatisfactory for decision makers.
Steps in testing of hypothesis²
Hypothesis testing is a well-defined activity that follows a fairly regular pattern. A general procedure for testing a hypothesis consists of the following steps:
i. Setting up the null hypothesis
ii. Deciding the level of significance
iii. Selecting the test statistic
iv. Determining the critical region
v. Making the statistical and clinical decision
i. Null Hypothesis
In testing a hypothesis, the first step is the formulation of the null hypothesis (which may be true or false). The null hypothesis is the statement that is tested for possible rejection on the basis of sample observations. It is denoted by Ho, while the statement covering the other possibilities, which are not included in the null hypothesis, is called the alternative hypothesis and is denoted by H1. For example, the statement that the average stay of patients in the cardiac ward of Mayo Hospital is 15 days is a null hypothesis; mathematically it is written as
Ho: µ = 15 days
The alternative hypothesis may be stated as
1. H1 : µ > 15 days
2. H1 : µ < 15 days
3. H1 : µ ≠ 15 days
Alternative hypotheses 1 and 2 are called directional or one-sided, while 3 is known as two-sided. It is assumed that rejecting the null hypothesis is equivalent to accepting the alternative hypothesis. Further, if the null hypothesis is not rejected, we should not write "accept Ho"; the more appropriate wording is "do not reject the null hypothesis". As far as the alternative hypothesis is concerned, it is acceptable to speak of accepting it.
ii. Level of Significance
All hypothesis testing procedures are based on the selection of a test statistic and on the critical region formulated for that procedure. Different test statistics are used in different situations, according to the requirements of the research. Every test statistic has a sampling distribution, so it is treated as a variable. Because the procedure rests on a test statistic that is itself variable, we cannot hope to make the correct decision every time, and whenever a conclusion is drawn from a statistical test there is a risk of committing an error. For example, it is an error to reject a true hypothesis; in a medical diagnostic test, a "false-positive" result is an error of this kind, called a type-I error. The probability of a type-I error is called the level of significance and is denoted by α. In other words, an α-error occurs when a treatment or drug has no effect but we wrongly conclude that it has one. The second type of error in hypothesis testing is the type-II error (β-error), which occurs when a false null hypothesis is not rejected; whatever the reason, it usually arises from a small sample size or procedural mishandling (a type-I error is considered more serious than a type-II error). In keeping with tradition, we generally keep the chance of a type-I error at or below 5%. This is the maximum acceptable risk of such an error, and it is normally specified before the sample is drawn or the experiment is conducted.
This means that if our level of significance is 5% (α = 0.05), there are at most 5 chances in 100 that we would reject the hypothesis when it is true. Equivalently, when the null hypothesis is true, the procedure leads to the right decision at least 95% of the time.
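The meaning of α as a long-run error rate can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: it assumes a normal population whose true mean equals the hypothesized value (15 days, as in the hospital-stay example), with an illustrative standard deviation of 3 and sample size of 70, and counts how often a two-tailed Z test at level α rejects the true null hypothesis. The function name and all default values are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=15.0, sigma=3.0, n=70,
                        trials=20000, seed=1):
    """Fraction of simulated samples that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5                           # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed cut-off
    rejections = 0
    for _ in range(trials):
        # Under Ho the sample mean is normal with mean mu0 and SD se,
        # so it is drawn directly instead of simulating n observations.
        xbar = rng.gauss(mu0, se)
        z = (xbar - mu0) / se
        if abs(z) > z_crit:
            rejections += 1                         # a type-I error
    return rejections / trials

rate = type_one_error_rate()
print(f"Observed type-I error rate at alpha = 0.05: {rate:.3f}")
```

With a large number of trials, the observed proportion of false rejections should settle close to the chosen α.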
iii. Test Statistic
A test statistic is a single value computed from the sample observations that provides the basis for rejecting or not rejecting the null hypothesis. The most common test statistics are Z, χ², t and F. A test statistic is a variable and has its own sampling distribution. For example, the Z statistic follows the standard normal distribution with zero mean and unit variance, Z ~ N(0, 1). The choice of test statistic depends on the assumed probability model and on the hypothesis in question.
iv. Critical Region
Once a suitable test statistic has been identified for testing the hypothesis, we must decide which values of the test statistic should lead us to reject the null hypothesis. The set of values that leads us to reject the null hypothesis is known as the critical region or region of rejection. Sometimes the entire critical region is located in one tail of the distribution of the test statistic, either the lower or the upper; the test is then called a one-tailed test. When the critical region is located in both tails of the distribution, the test is known as a two-tailed test.
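Ho: µ = µo against H1: µ < µo (one-tailed, critical region in the lower tail)
Ho: µ = µo against H1: µ > µo (one-tailed, critical region in the upper tail)
Ho: µ = µo against H1: µ ≠ µo (two-tailed, critical region in both tails)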
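The three critical regions can be sketched in code for a Z statistic. This is a minimal illustration that assumes the test statistic follows the standard normal distribution; the function names `cutoff` and `in_critical_region` and the tail labels are illustrative choices, not part of the original text.

```python
from statistics import NormalDist

TAILS = ("lower", "upper", "two")

def cutoff(alpha=0.05, tail="two"):
    """Positive cut-off c of the critical region for a standard normal Z."""
    if tail not in TAILS:
        raise ValueError(f"tail must be one of {TAILS}")
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def in_critical_region(z, alpha=0.05, tail="two"):
    """True if the calculated Z leads to rejection of Ho."""
    c = cutoff(alpha, tail)
    if tail == "lower":
        return z < -c
    if tail == "upper":
        return z > c
    return abs(z) > c

for tail in TAILS:
    print(tail, round(cutoff(0.05, tail), 3))
```

For α = 0.05 the two-tailed cut-off is the familiar 1.96, while a one-tailed test places the whole 5% in a single tail and therefore uses a smaller cut-off.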
v. P-Value
Before conducting research of any sort, the researcher starts by framing a null hypothesis (denoted by Ho), which states that there is "no difference between two treatments" or "no effect of treatment", together with a testing level (the level of significance), denoted by α, which is normally 5% or 1%. The researcher then collects data by drawing a sample from the population and measures the consistency of the data with the null hypothesis. The question is how this consistency should be measured.
The P-value is a statistical tool that measures this consistency by calculating the probability of observing results at least as extreme as those in the data, assuming that the null hypothesis is true. The smaller the P-value, the greater the inconsistency. The P-value thus measures how much evidence we have against the null hypothesis when it is assumed to be true; a larger P-value indicates little or no evidence against the null hypothesis. According to D. N. Gujarati,³ “A p-value (probability value) also known as the observed or exact level of significance or exact probability of type-I error. P-value is defined as the lowest significance level at which a null hypothesis is rejected.” A P-value is also known as the observed significance level⁴ (a detailed discussion follows).
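For a Z statistic, the definition above translates directly into a tail area under the standard normal curve. The following minimal sketch assumes a standard normal test statistic; the function name `p_value` and the tail labels are illustrative assumptions.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """Probability, under Ho, of a Z at least as extreme as the one observed."""
    cdf = NormalDist().cdf
    if tail == "lower":      # H1: mu < mu0, extreme means far to the left
        return cdf(z)
    if tail == "upper":      # H1: mu > mu0, extreme means far to the right
        return 1 - cdf(z)
    if tail == "two":        # H1: mu != mu0, extreme in either direction
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

print(round(p_value(-1.40, "lower"), 3))
print(round(p_value(-1.40, "two"), 3))
```

Note that the two-tailed P-value is twice the one-tailed value, so the choice of alternative hypothesis affects the strength of evidence reported.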
Disadvantages of traditional approach
The traditional approach to hypothesis testing has the disadvantage that it does not provide any precise measure of the strength of evidence against the null hypothesis. The researcher decides on the basis of a pre-decided level of significance, which raises the question: what should that level be? This is why the traditional approach is no longer preferred.
P-Value, a new approach in testing of hypothesis
While testing a hypothesis in the medical and health sciences, the researcher needs to develop a critical region in order to reject or not reject the hypothesis. As discussed earlier, the traditional way is first to construct a critical region from a value of α (the level of significance) chosen in advance. The most commonly used values are 0.05 (5%) and 0.01 (1%). We then decide whether to reject the hypothesis depending on whether the test statistic falls in the critical region.⁵
To understand this, consider an example. A researcher wishes to test whether a new drug is more effective than an existing drug in lowering the blood pressure (BP) of hypertensive patients. He takes two similar groups of patients; one group is treated with the new drug and the other with the existing drug (or a placebo). After administering the drugs, he measures the BP of both groups and calculates the mean and standard deviation of BP in each group.
The mean responses of the two groups will almost certainly differ, regardless of whether the drug has an effect. A question now arises: is the difference in mean BP between the two groups likely to be due to sampling or random variation associated with the allocation of patients to the groups, or is it due to the drug?
To answer this question, we must quantify the observed difference between the two groups. For this purpose the statistician uses a tool called a test statistic, such as Z, t, χ² or F.
Most test statistics have the property that the greater the observed difference, the greater their value. In the above example, if the drug has no effect, the test statistic will tend to be small; a small value of the test statistic means a small difference between the observed value and the hypothesized value. But what counts as "big" or "small"? Identifying the boundary between "big" and "small" requires careful thought.
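The way a test statistic grows with the observed difference can be sketched for the two-group drug example. The article gives no data for this example, so every number below is hypothetical and chosen only for illustration, and the large-sample two-sample Z statistic is an assumed choice of test statistic rather than the one used by the author.

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample Z statistic for the difference between two group means."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    return (mean1 - mean2) / se

# Hypothetical summary statistics (mean systolic BP in mmHg, SD, group size).
existing = dict(mean=150.0, sd=12.0, n=50)

small_diff = two_sample_z(existing["mean"], existing["sd"], existing["n"],
                          145.0, 12.0, 50)     # new drug lowers mean BP by 5
large_diff = two_sample_z(existing["mean"], existing["sd"], existing["n"],
                          140.0, 12.0, 50)     # new drug lowers mean BP by 10

print(f"Z for a 5 mmHg difference:  {small_diff:.2f}")
print(f"Z for a 10 mmHg difference: {large_diff:.2f}")
```

With the same spread and group sizes, doubling the observed difference doubles the statistic, which is why a cut-off between "big" and "small" values is needed.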
As discussed above, the traditional method rejects the hypothesis when the test statistic falls in the critical region for a certain level of significance α, and this level is arbitrary. On the basis of that arbitrary level we set a cut-off point. For example, when using the Z statistic with a 5% level of significance, the cut-off points are ±1.96; the calculated Z value is "big" if it lies beyond these cut-off points, and we reject the hypothesis on this criterion. We then state, "We reject the hypothesis at the 5% level" or "the difference is significant at the 5% level". Such a conclusion is often inadequate, because it gives no idea how far the test statistic is from the cut-off point. This approach may be unsatisfactory for decision makers who are uncomfortable with the risk implied by α = 5%.
To overcome these difficulties, researchers now routinely adopt the P-value approach. A P-value (the lowest significance level at which the hypothesis can be rejected) conveys sufficient information about the weight of evidence against Ho, so the researcher can draw a valid conclusion at any specified level of significance. Once the P-value of the test statistic is known, the researcher can assess the significance of the test without fixing a level of significance in advance. To understand this, consider the following examples.
Let the mean of a population for a certain variable be 15, with known standard deviation 3. A sample of size 70 from this population shows a mean value of 14.5. To see whether we should reject the null hypothesis, the test statistic Z is calculated as
Z = (x̄ − µ) / (σ/√n) = (14.5 − 15) / (3/√70) ≈ −1.4
The tabulated values of Z are ±1.96 (α = 0.05), and the calculated value of Z does not fall in the critical region, so we do not reject the null hypothesis. Alternatively, we can calculate the P-value of the calculated Z. The lower-tail probability is 0.081, which is greater than 0.05.
P(Z < −1.40) = 0.081
Now it is easy to decide: since P = 0.081 lies between 0.05 and 0.10, we would reject the hypothesis at α = 10% but not at α = 5% (here the P-value is a one-tailed probability). This means that there is no strong evidence at α = 5% to reject the hypothesis.
In another example, let µ = 1800 with SD = 100, and let a sample of size 50 taken from this population have mean 1850.
The value of the test statistic Z is calculated as
Z = (1850 − 1800) / (100/√50) ≈ 3.54
P(Z > 3.54) ≈ 0.0002
The calculated value is greater than 1.96, so we reject the null hypothesis at the 5% level of significance. The P-value of this Z value is practically zero.
Here the hypothesis is rejected at both the 5% and the 1% level of significance, because both 5% and 1% are greater than P ≈ 0.0002.
It is clear from the above examples that the level of significance can be fixed after the P-value has been calculated.
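The two numerical examples can be reproduced with a few lines of code. This is a minimal sketch using the figures given in the text; the function name `z_statistic` is an illustrative assumption, and one-tailed P-values are used to match the probabilities quoted above. Because the code does not round Z before looking up the tail area, the first P-value comes out close to, rather than exactly equal to, the tabulated 0.081.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Z statistic for a sample mean when the population SD is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

std_normal = NormalDist()

# Example 1: mu = 15, sigma = 3, n = 70, sample mean = 14.5
z1 = z_statistic(14.5, 15, 3, 70)
p1 = std_normal.cdf(z1)            # lower-tail probability P(Z < z1)

# Example 2: mu = 1800, sigma = 100, n = 50, sample mean = 1850
z2 = z_statistic(1850, 1800, 100, 50)
p2 = 1 - std_normal.cdf(z2)        # upper-tail probability P(Z > z2)

print(f"Example 1: Z = {z1:.2f}, P = {p1:.4f}")
print(f"Example 2: Z = {z2:.2f}, P = {p2:.4f}")
```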
Definition of P-Value
The P-value is the smallest level of significance at which the null hypothesis could be rejected in a test of significance; it can also be described as the highest level of significance at which the null hypothesis is not rejected.⁶
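This definition implies a simple decision rule: the null hypothesis is rejected at level α whenever the P-value does not exceed α. The short sketch below applies the rule to the two P-values obtained in the examples above; the function name `decisions` and the set of α levels are illustrative assumptions.

```python
def decisions(p_value, alphas=(0.10, 0.05, 0.01)):
    """Map each significance level to True (reject Ho) or False (do not reject)."""
    return {alpha: p_value <= alpha for alpha in alphas}

for p in (0.081, 0.0002):
    for alpha, reject in decisions(p).items():
        verdict = "reject Ho" if reject else "do not reject Ho"
        print(f"P = {p}: at alpha = {alpha:.2f}, {verdict}")
```

Reporting the P-value itself therefore lets each reader apply the rule at whatever level of significance he or she considers appropriate.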
How to calculate P-Value
Most computer software that performs hypothesis tests, such as SPSS, Minitab, MS-Excel and SAS, also reports the P-value in its final results. Manual calculation of the P-value, however, is difficult except for tests based on the normal distribution (Z statistic). Here is a brief discussion of how to find the P-value. Suppose we wish to test the hypothesis (Ho) µ = 8.0 against the alternative hypothesis (H1) µ < 8.0 with known variance of 1.0. A sample of size 9 is taken with mean