Sample size and power when estimating a probability


Concepts


Reference value of a probability: A value that we want to verify against the evidence in a sample. For instance, we could say that the population proportion of people infected by a virus is 0.7. Using an appropriate sample, we want to verify whether this value is admissible.

Alternative value of a probability: A value that we want to be able to distinguish from the reference value. For instance, we could say that the true population proportion is more than 0.7, say at least 0.8.

Significance level: The probability of obtaining a sample that leads to rejecting the reference value when the reference value is in fact the true value.

Statistical power: The probability of rejecting the reference value of a probability when an alternative value is the true value. A power close to 1 ensures that, if the alternative value is true, almost any sample will lead to rejecting the reference value.
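Both definitions can be checked empirically by simulation. The following is a minimal Monte Carlo sketch, assuming a two-sided Wald z-test whose standard error uses the sample proportion; this may not be the test used by this app. The function name `rejection_rate` and the sample size n = 126 are illustrative choices. When the reference value is true, the proportion of rejected samples estimates the significance level; when the alternative value is true, it estimates the power.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(p_true, p0, n, alpha=0.05, reps=4000, seed=1):
    """Monte Carlo estimate of P(reject p0) when samples come from p_true.

    Assumes (for illustration) a two-sided Wald z-test that uses the sample
    proportion in the standard error.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # qnorm(1 - alpha/2)
    rejections = 0
    for _ in range(reps):
        # One sample of size n: count successes, each with probability p_true.
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if se == 0:
            # Degenerate sample (all 0s or all 1s): reject unless p_hat equals p0.
            rejected = p_hat != p0
        else:
            rejected = abs(p_hat - p0) / se > z_crit
        rejections += rejected
    return rejections / reps


# Reference value true: the rejection rate estimates the significance level.
print(rejection_rate(p_true=0.7, p0=0.7, n=126))
# Alternative value true: the rejection rate estimates the power.
print(rejection_rate(p_true=0.8, p0=0.7, n=126))
```

Because the binomial distribution is discrete and the test is only approximate, the simulated rates will be close to, but not exactly equal to, the nominal significance level and the power predicted by the formulas.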

Formulas


\(\alpha\): Significance level

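\(\beta\): \(1-\text{power}\)

\(z_{1-\alpha/2}\): the \(1-\alpha/2\) quantile of the standard normal distribution, qnorm\((1-\alpha/2)\)

\(z_{\beta}\): the upper \(\beta\) quantile of the standard normal distribution, qnorm\((1-\beta)\)

\(p_0\): reference value of the probability; \(p_1\): alternative value

\(n=\frac{(z_{1-\alpha/2}+z_\beta)^2\times p_1 \times (1-p_1)}{(p_1-p_0)^2}\)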
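As a worked illustration, take the example values from the concepts above (\(p_0=0.7\), \(p_1=0.8\)), together with \(\alpha=0.05\) and a power of 0.8 (so \(\beta=0.2\)). This sketch assumes that \(p\) in the formula is the alternative value \(p_1\) and that the result is rounded up to a whole number of observations:

```latex
z_{1-\alpha/2} = \text{qnorm}(0.975) \approx 1.9600, \qquad
z_{\beta} = \text{qnorm}(0.8) \approx 0.8416

n = \frac{(1.9600 + 0.8416)^2 \times 0.8 \times (1 - 0.8)}{(0.8 - 0.7)^2}
  \approx \frac{7.849 \times 0.16}{0.01}
  \approx 125.6
  \;\Rightarrow\; n = 126
```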

Sample size required for estimating a probability


To compute the required sample size for estimating a probability, you should indicate the significance level \((\alpha)\), the desired power, the value of the reference probability \((p_0)\), and the alternative value for which you require power to discriminate \((p_1)\).

The difference between \(p_0\) and \(p_1\) is the minimum difference that you would like to detect with the indicated power.
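The calculation can be sketched in a few lines of Python, using `statistics.NormalDist().inv_cdf` in place of R's `qnorm`. This is a minimal sketch, assuming that \(p\) in the formula is the alternative value \(p_1\) and that the result is rounded up to the next integer; the function name `sample_size` is hypothetical and not part of this app.

```python
import math
from statistics import NormalDist


def sample_size(alpha, power, p0, p1):
    """n = (z_{1-alpha/2} + z_beta)^2 * p1 * (1 - p1) / (p1 - p0)^2, rounded up.

    Assumes p in the formula is the alternative value p1.
    """
    if p0 == p1:
        raise ValueError("p0 and p1 must differ")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # qnorm(1 - alpha/2)
    beta = 1 - power
    z_beta = std_normal.inv_cdf(1 - beta)        # qnorm(1 - beta)
    n = (z_alpha + z_beta) ** 2 * p1 * (1 - p1) / (p1 - p0) ** 2
    return math.ceil(n)  # round up to a whole number of observations


print(sample_size(alpha=0.05, power=0.80, p0=0.7, p1=0.8))  # 126
```

Asking for more power, or for a smaller difference between \(p_0\) and \(p_1\), increases the required sample size.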




Power for a given sample size when estimating a probability


To compute the resulting power for estimating a probability given a sample size, you should indicate the significance level \((\alpha)\), the sample size \((n)\), the value of the reference probability \((p_0)\), and the alternative value for which you require power to discriminate \((p_1)\).

The difference between \(p_0\) and \(p_1\) is the minimum difference that you would like to detect; the power is computed for this difference.
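One way to obtain the power is to solve the sample-size formula for \(z_\beta\): \(z_\beta = \sqrt{n}\,|p_1-p_0|/\sqrt{p_1(1-p_1)} - z_{1-\alpha/2}\), and then power \(=1-\beta=\Phi(z_\beta)\). The sketch below does this, assuming again that \(p\) in the formula is \(p_1\) and ignoring the very small probability of rejecting in the opposite tail; the function name `power_for_n` is hypothetical.

```python
import math
from statistics import NormalDist


def power_for_n(alpha, n, p0, p1):
    """Approximate power from solving the sample-size formula for z_beta.

    Assumes p in the formula is p1; the opposite rejection tail is ignored.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # qnorm(1 - alpha/2)
    z_beta = math.sqrt(n) * abs(p1 - p0) / math.sqrt(p1 * (1 - p1)) - z_alpha
    # Since z_beta = qnorm(1 - beta), the power 1 - beta equals pnorm(z_beta).
    return std_normal.cdf(z_beta)


print(round(power_for_n(alpha=0.05, n=126, p0=0.7, p1=0.8), 3))
```

With n = 126 (the size returned by the sample-size formula for a power of 0.8 in the same setting), the computed power is just above 0.80, as expected from inverting that formula.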




It is important to interpret the results of a confidence interval appropriately. Here we explore the case of estimating the difference between two probabilities, \(\pi_1-\pi_2\).


Concepts


Confidence interval: A confidence interval \((a,b)\) is the result of a procedure that, over repeated samples, contains the true value of the target parameter in a specified proportion of cases (for instance, in 95% of the cases).

Then, given a particular sample, the interval \((a,b)\) can be taken as containing the true value of the parameter with a given degree of confidence. Different samples will produce different confidence intervals, so conclusions based on a single confidence interval should be taken with caution.
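To make this concrete, here is a minimal sketch of one common procedure for \(\pi_1-\pi_2\), the Wald interval \((\hat p_1-\hat p_2)\pm z_{1-\alpha/2}\sqrt{\hat p_1(1-\hat p_1)/n_1+\hat p_2(1-\hat p_2)/n_2}\) for two independent samples. This is an assumed choice for illustration, not necessarily the interval computed by this app, and the function name `wald_ci_difference` is hypothetical. Running it on a different sample gives a different interval, which is why a single interval should be read with caution.

```python
import math
from statistics import NormalDist


def wald_ci_difference(x1, n1, x2, n2, confidence=0.95):
    """Wald confidence interval for pi1 - pi2 from two independent samples.

    x1, x2: number of events in each group; n1, n2: group sizes.
    Other interval methods behave better for small samples or extreme proportions.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. qnorm(0.975)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# 60/100 events in group 1 versus 50/100 in group 2.
a, b = wald_ci_difference(60, 100, 50, 100)
print(round(a, 4), round(b, 4))
```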

Equivalence range: \(\Delta\) is the margin such that an absolute difference \(|\pi_1-\pi_2|<\Delta\) leads us to consider both probabilities clinically equivalent.

Interpretation


Suppose that the confidence interval obtained is CI: \((a,b)\).

\(-\Delta < a\) and \(b < \Delta\): Clinically equivalent results

\(-\Delta < a < 0\) and \(b > \Delta\): Non-inferior results

\(0 < a < \Delta\): Superior results

\( a > \Delta \): Clinically superior results
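The four rules can be expressed as a small function. This is a sketch that applies the rules literally, assuming (as the rules imply) that larger values of \(\pi_1-\pi_2\) favor group 1; the function name `classify_ci` is hypothetical. Because an interval can satisfy more than one rule (for example, a narrow interval just above 0) or none of them (for example, when \(a \le -\Delta\)), it returns a list of labels rather than a single one.

```python
def classify_ci(a, b, delta):
    """Return every interpretation label whose rule the interval (a, b) meets."""
    labels = []
    if -delta < a and b < delta:
        labels.append("Clinically equivalent")
    if -delta < a < 0 and b > delta:
        labels.append("Non-inferior")
    if 0 < a < delta:
        labels.append("Superior")
    if a > delta:
        labels.append("Clinically superior")
    return labels  # an empty list means none of the four rules applies


print(classify_ci(-0.05, 0.15, delta=0.10))  # ['Non-inferior']
```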



You can explore this interpretation in the tab: Equivalence and non-inferiority









This exercise is based on an idea by Kristoffer Magnusson: https://rpsychologist.com/d3/equivalence/




(c) Albert Sorribas, Ester Vilaprino, Montse Rue, Rui Alves. Biomodels Grup, Departament de Ciencies Mediques Basiques. Universitat de Lleida, Institut de Recerca Biomedica de Lleida (IRBLleida)