Methods for estimating a probability from a sample
Introduction
In a sample of size \(n\), we observe the event of interest \(x\) times. The
relative frequency of the event in the sample is \(p_0=x/n\). The problem is to
estimate the probability \(\pi\) of the event in the population.
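In the formulas, \(z_q\) is the \(q\)-quantile of the standard normal distribution, computed in R with qnorm(q).
For instance, for \(\alpha=0.05\), \(z_{\alpha/2}=z_{0.025}\)=qnorm(0.025)\(\approx -1.96\).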
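The same quantiles can be obtained outside R. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist().inv_cdf` plays the role of R's qnorm; the helper name `z` is only illustrative.

```python
from statistics import NormalDist

def z(q: float) -> float:
    """q-quantile of the standard normal distribution (the role of qnorm(q) in R)."""
    return NormalDist().inv_cdf(q)

alpha = 0.05
z_low = z(alpha / 2)       # z_{alpha/2}, negative
z_high = z(1 - alpha / 2)  # z_{1-alpha/2}, positive
print(round(z_low, 2), round(z_high, 2))  # -> -1.96 1.96
```

Because the normal distribution is symmetric, \(z_{\alpha/2}=-z_{1-\alpha/2}\), which is why both values appear in the formulas below.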
Confidence interval
A confidence interval \((a,b)\) with confidence level \(1-\alpha\) is computed so that, of all
possible samples of size \(n\), a proportion \(1-\alpha\), i.e. \(100(1-\alpha)\)%, yield an interval
that contains the actual parameter value.
So, we are confident that the confidence interval computed in a given sample contains
the value of \(\pi\). We will be mistaken only if the sample is one of the \(100\alpha\)% of
cases for which this is not so.
Exact method
This method computes the confidence interval \((a,b)\) directly from the binomial distribution:
\(a\) is the value of \(\pi\) for which the probability of obtaining \(x\) or more events is \(\alpha/2\),
and \(b\) is the value of \(\pi\) for which the probability of obtaining \(x\) or fewer events is \(\alpha/2\)
(with \(a=0\) when \(x=0\) and \(b=1\) when \(x=n\)).
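The two defining conditions can be solved numerically. This is a minimal sketch, assuming only the Python standard library: binomial probabilities are summed with `math.comb`, and each limit is found by bisection on \([0,1]\). The helper names (`binom_cdf`, `bisect_root`, `exact_ci`) are hypothetical; R users would normally get the same limits from `binom.test`.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of a monotone function f on [lo, hi] found by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # sign unchanged: root lies in the upper half
        else:
            hi = mid               # sign changed: root lies in the lower half
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Exact limits (a, b) for x events observed in n trials."""
    # a: value of pi with P(X >= x) = alpha/2, i.e. 1 - P(X <= x - 1) = alpha/2
    a = 0.0 if x == 0 else bisect_root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # b: value of pi with P(X <= x) = alpha/2
    b = 1.0 if x == n else bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return a, b

print(tuple(round(v, 4) for v in exact_ci(5, 20)))  # -> (0.0866, 0.491)
```

Bisection is used because both tail probabilities are monotone in \(\pi\), so a simple sign check is enough to locate each limit.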
Wilson's method
This method computes the confidence interval \((a,b)\) as follows; since \(z_{\alpha/2}<0\), the square-root term is subtracted in \(a\) and added in \(b\):
$$ a = \frac{p_0+\frac{z_{\alpha/2}^2}{2 n}+z_{ \alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{ \alpha/2}^2}{4 n^2}}}
{1+\frac{z_{ \alpha/2}^2}{n}}$$
$$ b = \frac{p_0+\frac{z_{1-\alpha/2}^2}{2 n}+z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}+\frac{z_{1-\alpha/2}^2}{4 n^2}}}
{1+\frac{z_{1-\alpha/2}^2}{n}}$$
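The Wilson formulas translate directly into code. In this sketch (standard-library Python; the function name `wilson_ci` is hypothetical) one expression is evaluated twice, once with \(z_{\alpha/2}\) and once with \(z_{1-\alpha/2}\), exactly as in the two formulas above.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, alpha=0.05):
    """Wilson limits (a, b) for x events observed in n trials."""
    p0 = x / n

    def limit(z):
        # Same expression for both limits; only the normal quantile changes.
        numerator = p0 + z**2 / (2 * n) + z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2))
        return numerator / (1 + z**2 / n)

    a = limit(NormalDist().inv_cdf(alpha / 2))      # z_{alpha/2} is negative
    b = limit(NormalDist().inv_cdf(1 - alpha / 2))  # z_{1-alpha/2} is positive
    return a, b

print(tuple(round(v, 4) for v in wilson_ci(5, 20)))  # -> (0.1119, 0.4687)
```

With \(x=0\) the lower limit is exactly \(0\), which shows how the method stays inside \([0,1]\).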
Approximated method (large samples)
This method computes the confidence interval \((a,b)\) as:
$$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$
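A minimal sketch of the approximated interval, assuming standard-library Python (`approx_ci` is a hypothetical name):

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(x, n, alpha=0.05):
    """Approximated (large-sample) limits p0 -/+ z_{1-alpha/2} * sqrt(p0 (1 - p0) / n)."""
    p0 = x / n
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p0 * (1 - p0) / n)
    return p0 - half_width, p0 + half_width

print(tuple(round(v, 4) for v in approx_ci(5, 20)))  # -> (0.0602, 0.4398)
```

Unlike the two previous methods, the interval is always symmetric around \(p_0\).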
Comparison of methods
Both the Exact and Wilson methods produce limits that respect the range
of possible values of \(\pi\), i.e. \([0,1]\). The Approximated method is
useful for large samples and for probabilities that are not close to
the extremes. When the observed number of events is low or high, that is, close to \(0\) or
to \(n\), this method can give limits below \(0\) or above \(1\).
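The claim about out-of-range limits can be checked by applying the approximated formula to every possible count \(x\) for a fixed \(n\). This is a sketch under the assumption of a 95% confidence level and \(n=20\); the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_ci(x, n, alpha=0.05):
    """Approximated (large-sample) limits p0 -/+ z_{1-alpha/2} * sqrt(p0 (1 - p0) / n)."""
    p0 = x / n
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(p0 * (1 - p0) / n)
    return p0 - half_width, p0 + half_width

def out_of_range(n, alpha=0.05):
    """Observed counts x whose approximated limits fall below 0 or above 1."""
    bad = []
    for x in range(n + 1):
        a, b = approx_ci(x, n, alpha)
        if a < 0 or b > 1:
            bad.append(x)
    return bad

print(out_of_range(20))  # -> [1, 2, 3, 17, 18, 19]
```

For \(x=0\) and \(x=n\) the interval collapses to a single point (\(0\) or \(1\)), which is a different failure of the same approximation.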
You can specify the sample size and the number of observed events and obtain the
corresponding confidence interval with each of the three methods, and compare them in
the figure.
Comparing confidence intervals obtained from different samples
Here, we simulate 15 samples from a given situation defined by the
indicated value of \(\pi\) and the size \(n\) of each sample. You can select
the method used to obtain the confidence interval.
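A sketch of such a simulation, under stated assumptions: it uses the Wilson method, draws each binomial count as a sum of \(n\) Bernoulli trials with Python's `random` module, and the seeds and function names are arbitrary choices, not the page's own implementation. Over many samples, the share of intervals containing \(\pi\) should be close to the nominal \(1-\alpha\).

```python
import random
from math import sqrt
from statistics import NormalDist

def wilson_ci(x, n, alpha=0.05):
    """Wilson limits (a, b) for x events in n trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p0 = x / n
    center = (p0 + z**2 / (2 * n)) / (1 + z**2 / n)
    half = z * sqrt(p0 * (1 - p0) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
    return center - half, center + half

def simulate(pi, n, n_samples=15, alpha=0.05, seed=1):
    """Confidence intervals from n_samples simulated samples of size n."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_samples):
        x = sum(rng.random() < pi for _ in range(n))  # one Binomial(n, pi) count
        intervals.append(wilson_ci(x, n, alpha))
    return intervals

intervals = simulate(pi=0.3, n=50)
hits = sum(a <= 0.3 <= b for a, b in intervals)
print(f"{hits} of {len(intervals)} intervals contain pi = 0.3")
```

With only 15 samples, the number of intervals that miss \(\pi\) varies noticeably from run to run; increasing `n_samples` shows the long-run proportion.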
Sample size for estimating a probability in large samples
The approximated method computes the confidence interval \((a,b)\) as:
$$ p_0 \pm z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$
The precision of the interval is its half-width:
$$\delta = z_{1-\alpha/2} \sqrt{\frac{p_0 (1-p_0)}{n}}$$
Then, if we take a tentative value of \(p_0\) as a guess for \(\pi\), the
sample size required to attain a given value of \(\delta\) is:
$$n=\frac{z_{1-\alpha/2}^2 p_0 (1-p_0) }{\delta^2}$$
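The sample-size formula is a one-liner once \(z_{1-\alpha/2}\) is available. This sketch assumes standard-library Python and rounds the result up, since a fractional number of subjects is not possible; `sample_size` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist

def sample_size(p0, delta, conf=0.95):
    """Smallest n whose approximated half-width is at most delta when the sample gives p0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1-alpha/2} with alpha = 1 - conf
    return ceil(z**2 * p0 * (1 - p0) / delta**2)  # round up to a whole number of subjects

print(sample_size(0.2, 0.05))  # -> 246
print(sample_size(0.5, 0.05))  # -> 385
```

Rounding up rather than to the nearest integer guarantees that the half-width does not exceed \(\delta\) when the observed \(p_0\) equals the guessed value.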
A value of \(p_0\) can be obtained from a pilot study or from educated guesses. The
resulting sample size is the one that will provide the indicated precision if the
observed \(p_0\) equals the value used in computing the sample size. In practice,
the sample will yield a different value of \(p_0\), so the attained precision will differ somewhat.
If there is no information on the possible value of \(\pi\), we can use \(p_0=0.5\), which maximizes
\(p_0(1-p_0)\) and therefore gives the largest sample size. If the confidence level is 0.95, then
\(z_{1-\alpha/2}=1.96\). Thus, the sample size in the least informative situation would be:
$$n=\frac{1.96^2 \cdot 0.5 \cdot (1-0.5)}{\delta^2}$$
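As a worked example, taking an illustrative precision of \(\delta=0.05\) (that is, \(\pm 5\) percentage points), the formula gives:
$$n=\frac{1.96^2 \cdot 0.5 \cdot (1-0.5)}{0.05^2}=\frac{0.9604}{0.0025}=384.16,$$
which is rounded up to \(n=385\).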
Sample size with the specified settings
CI obtained in the different samples