Confidence interval for probabilities

Methods for estimating a probability from a sample

Introduction

In a sample of size $n$, the event of interest is observed $x$ times. The relative frequency of the event in the sample is $p_0=x/n$. The problem is to estimate the probability $\pi$ of the event in the population.

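In the formulas, $z_q$ is the $q$-quantile of the standard normal distribution, computed in R with qnorm(q). For instance, with $\alpha=0.05$, $z_{\alpha/2}=z_{0.025}$=qnorm(0.025) $\approx -1.96$ and $z_{1-\alpha/2}=z_{0.975}$=qnorm(0.975) $\approx 1.96$.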
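As a minimal sketch, assuming Python rather than R, the quantile $z_q$ can be obtained from the standard library's `statistics.NormalDist().inv_cdf`, which plays the role of `qnorm` here. The helper name `z` is hypothetical.

```python
from statistics import NormalDist

def z(q: float) -> float:
    """q-quantile of the standard normal distribution (analogue of R's qnorm(q))."""
    return NormalDist().inv_cdf(q)

alpha = 0.05
print(round(z(alpha / 2), 4))      # z_{alpha/2}   -> -1.96
print(round(z(1 - alpha / 2), 4))  # z_{1-alpha/2} ->  1.96
```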

Confidence interval

A confidence interval $(a,b)$ is computed so that, over all possible samples of size $n$, a proportion $(1-\alpha)$ of them (that is, $100(1-\alpha)\%$ of the samples) yield an interval that contains the actual value of the parameter. So we are confident that, in a given sample, the computed confidence interval contains the value of $\pi$. We will be mistaken only if the sample is one of the $100\alpha\%$ of cases for which this is not so.

Exact method

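This method (also known as the Clopper–Pearson interval) computes the confidence interval $(a,b)$ directly from the binomial distribution. The lower limit $a$ is the value of $\pi$ for which the probability of obtaining $x$ or more events is $\alpha/2$. The upper limit $b$ is the value of $\pi$ for which the probability of obtaining $x$ or fewer events is $\alpha/2$. When $x=0$ the lower limit is $a=0$, and when $x=n$ the upper limit is $b=1$.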
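The definition above can be turned into a computation by searching numerically for the two values of $\pi$. The sketch below is an illustration and not the page's own implementation. It assumes Python, evaluates binomial probabilities with `math.comb`, and finds each limit by bisection. The names `binom_cdf`, `bisect_root` and `exact_interval` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2

def exact_interval(x, n, conf=0.95):
    alpha = 1 - conf
    # Lower limit a: P(X >= x | pi = a) = alpha/2; this tail grows with pi.
    a = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit b: P(X <= x | pi = b) = alpha/2; this tail shrinks as pi
    # grows, so it is negated to give an increasing function for bisection.
    b = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return a, b

print(exact_interval(5, 20))  # roughly (0.087, 0.491)
```

Bisection is used only because it is simple and robust. The same limits can also be written as quantiles of Beta distributions, which is a common way to compute them when a statistics library is available.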

Wilson's method

This method computes the confidence interval $(a,b)$ as:

$$ a = \frac{p_0+\frac{z_{\alpha/2}^2}{2 n}+z_{\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{\alpha/2}^2}{4 n^2}}} {1+\frac{z_{\alpha/2}^2}{n}}$$

$$ b = \frac{p_0+\frac{z_{1-\alpha/2}^2}{2 n}+z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{1-\alpha/2}^2}{4 n^2}}} {1+\frac{z_{1-\alpha/2}^2}{n}}$$

Both limits use the same expression. Since $z_{\alpha/2}=-z_{1-\alpha/2}$, they differ only in the sign of the square-root term.
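The following is a direct transcription of the two Wilson formulas. It is a sketch that assumes Python, with `NormalDist().inv_cdf` standing in for `qnorm`, and uses a hypothetical function name. A single inner helper evaluates the common expression, and the sign of the square-root term comes from the quantile passed in.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, conf=0.95):
    """Wilson interval, using the quantile convention z_q = qnorm(q)."""
    alpha = 1 - conf
    p0 = x / n

    def limit(z):
        # Same expression for both limits; only the quantile z changes.
        centre = p0 + z**2 / (2 * n)
        spread = z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2))
        return (centre + spread) / (1 + z**2 / n)

    nd = NormalDist()
    a = limit(nd.inv_cdf(alpha / 2))      # z_{alpha/2} < 0 gives the lower limit
    b = limit(nd.inv_cdf(1 - alpha / 2))  # z_{1-alpha/2} > 0 gives the upper limit
    return a, b

print(wilson_interval(5, 20))  # about (0.112, 0.469)
```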

Approximated method (large samples)

This method computes the confidence interval $(a,b)$ as: $$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$
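The approximated interval is a one-liner once $z_{1-\alpha/2}$ is known. It is commonly called the Wald interval. The following is a sketch in Python with a hypothetical `wald_interval` helper.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Approximated interval p0 ± z_{1-alpha/2} * sqrt(p0 (1 - p0) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}
    p0 = x / n
    half_width = z * sqrt(p0 * (1 - p0) / n)
    return p0 - half_width, p0 + half_width

print(wald_interval(45, 100))  # about (0.352, 0.548)
```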


Comparison of methods

Both the Exact and Wilson methods produce limits that respect the range of possible values of $\pi$, i.e. $[0,1]$. The Approximated method is useful for large samples and for probabilities that are not close to the extremes. When the observed number of events is low or high, that is, close to $0$ or to $n$, this method can give limits below $0$ or above $1$. When $x=0$ or $x=n$, it collapses to a single point, because $p_0(1-p_0)=0$.
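The following sketch illustrates this difference by evaluating both the approximated and the Wilson intervals for $n=10$ at several values of $x$. It assumes Python and uses hypothetical helper names. It writes Wilson's limits in the equivalent symmetric form $\text{centre} \pm h$ with the positive quantile $z_{1-\alpha/2}$. This form matches the formulas above because $z_{\alpha/2}=-z_{1-\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{1-alpha/2} for a 95% interval

def wald(x, n, z=Z):
    p0 = x / n
    h = z * sqrt(p0 * (1 - p0) / n)
    return p0 - h, p0 + h

def wilson(x, n, z=Z):
    # Symmetric form of Wilson's limits with the positive quantile z.
    p0 = x / n
    denom = 1 + z**2 / n
    centre = (p0 + z**2 / (2 * n)) / denom
    h = z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2)) / denom
    return centre - h, centre + h

for x in (0, 1, 5, 9, 10):
    print(x, [round(v, 3) for v in wald(x, 10)], [round(v, 3) for v in wilson(x, 10)])
```

For $x=1$, the approximated lower limit is about -0.086, outside $[0,1]$, while Wilson's lower limit is about 0.018. For $x=0$ and $x=10$, the approximated interval has zero width.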

You can specify the sample size and the number of observed events and obtain the corresponding confidence interval using the three methods. You can compare them in the figure.

Comparing confidence intervals obtained from different samples

Here, we simulate 15 samples from a given situation defined by the indicated value of $\pi$ and the size $n$ of each sample. You can select the method used to obtain the confidence interval.
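The simulation can be sketched as follows. The sketch assumes Python's `random` module as the random-number source and uses a fixed seed for reproducibility. The names `simulate` and `wald_interval` are hypothetical, and any interval function with the same signature could be passed as `interval`. Repeating the draw many times also illustrates the definition of a confidence interval: the share of intervals containing $\pi$ should be near the confidence level, though only approximately for the approximated method used here.

```python
import random
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, conf=0.95):
    """Approximated interval p0 ± z_{1-alpha/2} sqrt(p0 (1 - p0) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p0 = x / n
    half_width = z * sqrt(p0 * (1 - p0) / n)
    return p0 - half_width, p0 + half_width

def simulate(pi, n, n_samples=15, interval=wald_interval, conf=0.95, seed=1):
    """Draw n_samples samples of size n; return (x, (a, b), covers_pi) for each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    results = []
    for _ in range(n_samples):
        # Number of events in one sample: n independent Bernoulli(pi) trials.
        x = sum(rng.random() < pi for _ in range(n))
        a, b = interval(x, n, conf)
        results.append((x, (a, b), a <= pi <= b))
    return results

for x, (a, b), covers in simulate(pi=0.3, n=50):
    print(f"x={x:2d}  CI=({a:.3f}, {b:.3f})  contains pi: {covers}")

# Long-run share of intervals that contain pi, over many simulated samples.
many = simulate(pi=0.3, n=50, n_samples=10_000)
print(sum(covers for _, _, covers in many) / len(many))
```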


Sample size for estimating a probability in large samples

The approximated method computes the confidence interval $(a,b)$ as: $$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$

The precision of the interval is its half-width: $$\delta = z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$ Solving for $n$, and using a tentative value of $p_0$ in place of $\pi$, the sample size required to attain a given value of $\delta$ is: $$n=\frac{z_{1-\alpha/2}^2\, p_0 (1-p_0)}{\delta^2}$$ This result is rounded up to the next whole number.

A value of $p_0$ can be taken from a pilot study or from educated guesses. The resulting sample size provides the indicated precision only if the observed $p_0$ has the value used in computing the sample size. In practice, the sample will yield a different value of $p_0$.

If there is no information on the possible value of $\pi$, we can use $p_0=0.5$. This value maximizes $p_0(1-p_0)$ and therefore gives the largest, most conservative sample size. If the confidence is 0.95, then $z_{1-\alpha/2}=1.96$. Thus, the sample size in the most uninformative situation is: $$n=\frac{1.96^2 \cdot 0.5\,(1-0.5)}{\delta^2}=\frac{0.9604}{\delta^2}$$
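The sample-size formula takes only a few lines. The following sketch assumes Python and a hypothetical `sample_size` helper. The result is rounded up because a fraction of a subject is not possible.

```python
from math import ceil
from statistics import NormalDist

def sample_size(p0, delta, conf=0.95):
    """n = z_{1-alpha/2}^2 p0 (1 - p0) / delta^2, rounded up to whole subjects."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2}
    return ceil(z**2 * p0 * (1 - p0) / delta**2)

print(sample_size(0.5, 0.05))  # most uninformative guess, precision 0.05 -> 385
print(sample_size(0.2, 0.05))  # tentative pi = 0.2, same precision       -> 246
```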
