Confidence interval for probabilities


Methods for estimating a probability from a sample


Introduction

In a sample of size \(n\), we observe the event of interest \(x\) times. The relative frequency of the event in the sample is \(p_0=x/n\). The problem is to estimate the probability \(\pi\) of the event in the population.

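In the formulas, \(z_q\) denotes the quantile \(q\) of the standard normal distribution, computed in R with qnorm(q). For instance, \(z_{0.025}\)=qnorm(0.025), which is approximately \(-1.96\).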
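As a small illustration of the quantile \(z_q\), the sketch below evaluates it with Python's standard library. The helper name `z` is hypothetical, and `NormalDist().inv_cdf` is used here as a stand-in for R's qnorm.

```python
from statistics import NormalDist

def z(q):
    """Quantile q of the standard normal distribution (hypothetical helper,
    playing the role of qnorm(q) in R)."""
    return NormalDist().inv_cdf(q)

print(f"{z(0.025):.2f}")  # -1.96
print(f"{z(0.975):.2f}")  # 1.96
```

The two quantiles are symmetric around zero, so \(z_{\alpha/2} = -z_{1-\alpha/2}\).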

Confidence interval

A confidence interval \((a,b)\) is computed so that, of all possible samples of size \(n\), \(100(1-\alpha)\)% yield an interval that contains the actual parameter value. We are therefore confident that the interval computed from a given sample contains the value of \(\pi\). We will be mistaken only if the sample is one of the \(100\alpha\)% of cases in which it does not.

Exact method

This method computes the confidence interval \((a,b)\) from the binomial distribution. The lower limit \(a\) is the value of \(\pi\) for which the probability of obtaining \(x\) or more events is \(\alpha/2\), and the upper limit \(b\) is the value of \(\pi\) for which the probability of obtaining \(x\) or fewer events is \(\alpha/2\).
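The exact limits can be found numerically from the binomial tail probabilities. The following is a minimal sketch, assuming the usual construction in which the lower limit \(a\) solves \(P(X \ge x \mid \pi=a)=\alpha/2\) and the upper limit \(b\) solves \(P(X \le x \mid \pi=b)=\alpha/2\). The function names (`binom_cdf`, `bisect`, `exact_ci`) are hypothetical, and plain bisection is just one convenient way to solve the two equations, not necessarily the one used by this page.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect(f, target, increasing, tol=1e-12):
    """Find p in [0, 1] with f(p) == target, for a monotone function f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move the lower end up when the root lies to the right of mid.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact interval (a, b) for x events observed in a sample of size n."""
    # Lower limit: P(X >= x | pi = a) = alpha/2 (taken as 0 when x == 0).
    if x == 0:
        a = 0.0
    else:
        a = bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper limit: P(X <= x | pi = b) = alpha/2 (taken as 1 when x == n).
    if x == n:
        b = 1.0
    else:
        b = bisect(lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return a, b

a, b = exact_ci(3, 20)
print(f"{a:.4f} {b:.4f}")
```

When \(x=0\) the upper limit has the closed form \(1-(\alpha/2)^{1/n}\), which gives a quick check on the numerical solution.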

Wilson's method

This method computes the confidence interval \((a,b)\) as: $$ a = \frac{p_0+\frac{z_{\alpha/2}^2}{2 n}+z_{ \alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{ \alpha/2}^2}{4 n^2}}} {1+\frac{z_{ \alpha/2}^2}{n}}$$ $$ b = \frac{p_0+\frac{z_{1-\alpha/2}^2}{2 n}+z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{1-\alpha/2}^2}{4 n^2}}} {1+\frac{z_{1-\alpha/2}^2}{n}}$$
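Because \(z_{\alpha/2} = -z_{1-\alpha/2}\), the two limits share the same center and denominator and differ only in the sign of the square-root term. The sketch below uses that symmetry. The function name `wilson_ci` is hypothetical, and `NormalDist().inv_cdf` stands in for qnorm.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, alpha=0.05):
    """Wilson interval (a, b) for x events observed in a sample of size n."""
    p0 = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}, positive
    center = p0 + z**2 / (2 * n)
    half = z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    # The lower limit subtracts the square-root term, the upper limit adds it.
    return (center - half) / denom, (center + half) / denom

a, b = wilson_ci(3, 20)
print(f"{a:.4f} {b:.4f}")
```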

Approximated method (large samples)

This method computes the confidence interval \((a,b)\) as: $$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$
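A direct transcription of this formula could look as follows. The function name `approx_ci` is hypothetical, and `NormalDist().inv_cdf` stands in for qnorm.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(x, n, alpha=0.05):
    """Large-sample (normal approximation) interval for x events in n trials."""
    p0 = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    half = z * sqrt(p0 * (1 - p0) / n)
    return p0 - half, p0 + half

a, b = approx_ci(30, 100)
print(f"{a:.3f} {b:.3f}")  # 0.210 0.390
```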

Comparison of methods

Both the Exact and Wilson methods produce limits that respect the range of possible values of \(\pi\), i.e. \([0,1]\). The Approximated method is useful for large samples and for probabilities that are not close to the extremes. When the observed number of events is low or high, that is, close to \(0\) or to \(n\), this method can give limits below \(0\) or above \(1\).

You can specify the sample size and the number of observed events and obtain the corresponding confidence interval using the three methods. You can compare them in the figure.
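To see where the Approximated method leaves the range \([0,1]\), the sketch below lists, for a given \(n\), the values of \(x\) whose limits fall below \(0\) or above \(1\). The names `approx_limits` and `out_of_range` are hypothetical, and the choice \(n=20\) is only an example.

```python
from math import sqrt
from statistics import NormalDist

def approx_limits(x, n, alpha=0.05):
    """Limits p0 -/+ z_{1-alpha/2} * sqrt(p0 (1 - p0) / n)."""
    p0 = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p0 * (1 - p0) / n)
    return p0 - half, p0 + half

def out_of_range(n, alpha=0.05):
    """Values of x for which a limit is below 0 or above 1."""
    bad = []
    for x in range(n + 1):
        a, b = approx_limits(x, n, alpha)
        if a < 0 or b > 1:
            bad.append(x)
    return bad

print(out_of_range(20))  # [1, 2, 3, 17, 18, 19]
```

For \(x=0\) and \(x=n\) the interval collapses to a single point, which stays inside the range but is not informative either.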

Comparing confidence intervals obtained from different samples


Here, we simulate 15 samples from a given situation defined by the indicated value of \(\pi\) and the size \(n\) of each sample. You can select the method for obtaining the confidence interval.
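A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code behind the page: the function names, the fixed seed, the example values \(\pi=0.3\) and \(n=50\), and the choice of Wilson's method are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, alpha=0.05):
    """Wilson interval, used here as one possible choice of method."""
    p0 = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = p0 + z**2 / (2 * n)
    half = z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - half) / denom, (center + half) / denom

def simulate(pi, n, samples=15, alpha=0.05, seed=1):
    """Draw `samples` binomial samples and report each interval and
    whether it contains the true value pi."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(samples):
        # Number of events in n independent trials with probability pi.
        x = sum(rng.random() < pi for _ in range(n))
        a, b = wilson_interval(x, n, alpha)
        results.append((x, a, b, a <= pi <= b))
    return results

results = simulate(pi=0.3, n=50)
for x, a, b, covers in results:
    print(f"x={x:2d}  ({a:.3f}, {b:.3f})  contains pi: {covers}")
print("intervals containing pi:", sum(r[3] for r in results), "of", len(results))
```

With only 15 samples, the number of intervals that miss \(\pi\) varies from run to run; the \(100(1-\alpha)\)% figure describes the long-run proportion.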

Sample size for estimating a probability in large samples


The approximated method computes the confidence interval \((a,b)\) as: $$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$

The precision of the interval is: $$\delta = z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$ Then, if we consider a given tentative value of \(p_0 \rightarrow \pi\), the required sample size for attaining a given value of \(\delta\) is: $$n=\frac{z_{1-\alpha/2}^2 p_0 (1-p_0) }{\delta^2}$$

A value of \(p_0\) can be taken from a pilot study or from an educated guess. The resulting sample size is the one that will provide the indicated precision if the observed \(p_0\) has the value considered in computing the sample size. In practice, the sample will have a different value of \(p_0\), so the precision attained will differ somewhat.

If there is no information on the possible value of \(\pi\), we can use \(p_0=0.5\), which maximizes \(p_0(1-p_0)\) and therefore gives the largest required sample size. If the confidence is 0.95, then \(z_{1-\alpha/2}=1.96\). Thus, the sample size in the most uninformative situation would be: $$n=\frac{1.96^2 \cdot 0.5 (1-0.5) }{\delta^2}$$
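The sample size formula is easy to evaluate directly. In the sketch below the function name `sample_size` is hypothetical, and rounding up to the next integer is an added convention (the formula itself gives a real number).

```python
from math import ceil
from statistics import NormalDist

def sample_size(delta, p0=0.5, alpha=0.05):
    """Sample size n = z_{1-alpha/2}^2 * p0 * (1 - p0) / delta^2,
    rounded up to the next integer (an assumed convention)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z**2 * p0 * (1 - p0) / delta**2)

print(sample_size(0.05))          # 385, most uninformative case p0 = 0.5
print(sample_size(0.05, p0=0.2))  # 246
```

For a precision of \(\delta=0.05\) at 95% confidence, a tentative value of \(p_0=0.2\) reduces the requirement from 385 to 246 observations.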


(c) Albert Sorribas, Ester Vilaprinyo, Montse Rue, Rui Alves. Biomodels Grup, Departament de Ciencies Mediques Basiques. Universitat de Lleida, Institut de Recerca Biomedica de Lleida (IRBLleida