A population mean \(\mu\) can be estimated by appropriately sampling the population. As
each sample will produce a different result, we need a procedure for dealing
with the uncertainty of the estimate. Confidence intervals are such a tool.
Confidence interval (CI) for a population mean
The \((1-\alpha)\) CI for a population mean is computed as:
\(\mu \rightarrow m \pm t_{1-\alpha/2,n-1} \times \frac{s}{\sqrt{n}}\)
where \(m\) is the sample mean and \(s\) the sample standard deviation. The sample size
\(n\) indicates how many individuals have been included in the sample.
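As a minimal sketch of this computation in Python (standard library only), the code below evaluates \(m\), \(s\) and the interval for one sample. The names `t_cdf`, `t_quantile` and `mean_ci` and the data values are illustrative assumptions, not part of the application. The \(t\) quantile is obtained numerically (Simpson's rule plus bisection) as a stand-in for a library routine such as `scipy.stats.t.ppf`.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """Student t CDF for x >= 0, integrating the density with Simpson's rule."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of the t distribution for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(sample, alpha=0.05):
    """(1 - alpha) CI for the mean: m +/- t * s / sqrt(n)."""
    n = len(sample)
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    half = t_quantile(1 - alpha / 2, n - 1) * s / math.sqrt(n)
    return m - half, m + half

sample = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.0]  # made-up data
low, high = mean_ci(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The interval is symmetric around the sample mean, and its half-width shrinks as \(n\) grows because both \(t_{1-\alpha/2,n-1}\) and \(s/\sqrt{n}\) decrease.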
Exercises in this application
Explore the distribution of the means of different samples
Estimate a population mean from data
Perform simulations to compare the CI of different samples when estimating a population mean
Estimate the difference of the means of two populations
Compute the required sample sizes for obtaining precise estimations
The method of maximum likelihood gives estimators that maximize the
likelihood function. It can be interpreted as choosing the parameter values that
make the observed sample most likely. For a normal variable,
the likelihood function is:
$$L=\prod_{i=1}^n \left(\frac{1}{\sigma \sqrt{2\pi}}
e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}\right)$$
The maximum likelihood estimators of \(\mu\) and \(\sigma\) are:
$$\hat \mu = \frac{\sum\limits_{i=1}^n x_i}{n}$$
$$\hat \sigma = \sqrt{\frac{\sum\limits_{i=1}^n (x_i-\hat \mu)^2}{n}}$$
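The following Python sketch illustrates the idea numerically. It works with the logarithm of \(L\), which has the same maximum and is easier to compute, and checks that nearby parameter values do not give a higher value than the estimates. For a normal sample, the maximum likelihood estimate of \(\sigma\) is the square root of the average squared deviation, with divisor \(n\). The function names and the data are illustrative assumptions.

```python
import math

def normal_log_likelihood(xs, mu, sigma):
    """Logarithm of the likelihood L of an N(mu, sigma) sample."""
    n = len(xs)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def normal_mle(xs):
    """Maximum likelihood estimates of mu and sigma for a normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)  # divisor n, not n - 1
    return mu_hat, sigma_hat

xs = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8, 10.1, 9.5]  # made-up data
mu_hat, sigma_hat = normal_mle(xs)
best = normal_log_likelihood(xs, mu_hat, sigma_hat)

# Perturb both estimates on a small grid: none should beat the ML estimates.
is_max = all(
    normal_log_likelihood(xs, mu_hat + d_mu, sigma_hat + d_sigma) <= best + 1e-12
    for d_mu in (-0.2, 0.0, 0.2)
    for d_sigma in (-0.1, 0.0, 0.1)
)
print(f"mu_hat={mu_hat:.4f}  sigma_hat={sigma_hat:.4f}  maximum on grid: {is_max}")
```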
Select parameter values and a sample size
Maximum likelihood estimates
Contour of likelihood values as a function
of \(\mu\) and \(\sigma\).
The red point indicates the actual parameters.
The black point indicates the ML estimates.
If \(X\) is \(N(\mu,\sigma)\), then the mean of samples of size \(n\),
\(\bar X\), has a distribution
\(N(\mu,\frac{\sigma}{\sqrt{n}})\). According to this, when \(n \rightarrow \infty\),
\(\bar X \rightarrow \mu\)
95% prediction intervals
\( X \rightarrow \mu \pm z_{1-\alpha/2} \sigma\)
\(\bar X \rightarrow \mu \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}\)
\(z_{1-\alpha/2}=1.96\)
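A short simulation can make these statements concrete. The Python sketch below draws many samples of size \(n\) from \(N(\mu,\sigma)\), then compares the standard deviation of the sample means with \(\sigma/\sqrt{n}\) and counts how many means fall inside \(\mu \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}\). The parameter values and the seed are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
mu, sigma, n, n_samples = 50.0, 10.0, 25, 4000  # illustrative values

# Mean of each simulated sample of size n
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
         for _ in range(n_samples)]

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
se = sigma / math.sqrt(n)                   # theoretical SD of the sample mean
inside = sum(mu - z * se <= m <= mu + z * se for m in means) / n_samples

print(f"SD of sample means: {statistics.stdev(means):.3f} (theory: {se:.3f})")
print(f"Share of means inside the 95% interval: {inside:.3f}")
```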
The population mean is estimated by the sample mean and the corresponding CI.
Here we consider the case of a normally distributed variable.
\(\mu \rightarrow m \pm t_{1-\alpha/2,n-1} \times \frac{s}{\sqrt{n}}\)
The required computations are:
Results
Result
$$\mu \rightarrow m \pm t_{1-\alpha/2,n-1} \times \frac{s}{\sqrt{n}}$$
In a practical case, we are interested in obtaining a precise CI. This precision
depends strongly on the sample size. Here you can compute the sample size
required to obtain the desired precision, given prior knowledge of the
approximate value of the standard deviation.
The required sample size for a \((1-\alpha)\) CI with precision \(\delta\)
is computed as:
$$n \rightarrow z_{1-\alpha/2}^2 \frac{\hat \sigma^2}{\delta ^2}$$
where \(\hat \sigma \approx s\)
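The formula can be evaluated directly, as in this Python sketch. The function name `required_n` and the guessed standard deviation of 10 are illustrative assumptions. Rounding up to the next integer is a choice made here so that the precision is at least as good as requested.

```python
import math
from statistics import NormalDist

def required_n(sigma_guess, delta, alpha=0.05):
    """Sample size so that the (1 - alpha) CI half-width is about delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * sigma_guess ** 2 / delta ** 2)  # round up

for delta in (4.0, 2.0, 1.0, 0.5):
    print(f"delta={delta:4.1f}  n={required_n(sigma_guess=10.0, delta=delta)}")
```

Because \(\delta\) enters squared, halving the desired precision roughly quadruples the required sample size.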
Required sample size for different \(\delta\) values
You can set a theoretical distribution \(N(\mu,\sigma)\) and obtain a
number of samples with a given sample size. The application computes the corresponding
CI for \(\mu\) and compares the results for the different samples. According
to the definition of a CI, about 95% of the computed 95% intervals are expected to include the
true value of the population parameter.
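The same experiment can be sketched in a few lines of Python. The population values, the number of samples and the seed are assumptions chosen for illustration. For simplicity the sample size is fixed at \(n=10\), so that the tabulated quantile \(t_{0.975,9}=2.262\) can be used directly.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible
mu, sigma, n, n_samples = 100.0, 15.0, 10, 4000  # illustrative values
T_CRIT = 2.262  # tabulated t quantile for 1 - alpha/2 = 0.975 and n - 1 = 9

def covers(sample):
    """True if the 95% CI computed from this sample contains the true mean."""
    m = statistics.mean(sample)
    half = T_CRIT * statistics.stdev(sample) / math.sqrt(len(sample))
    return m - half <= mu <= m + half

hits = sum(covers([random.gauss(mu, sigma) for _ in range(n)])
           for _ in range(n_samples))
coverage = hits / n_samples
print(f"Share of intervals that include mu: {coverage:.3f}")
```

The observed share fluctuates around 0.95 from run to run. Any single interval either contains \(\mu\) or it does not; the 95% refers to the long-run behaviour of the procedure.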
Population values
Results
One of the most basic experimental designs compares the results of a control group and a
treatment group. This is done by estimating the difference of the population means. The same
procedure could be used to compare a biomarker among two populations. In this exercise we
provide simulations for understanding the interpretation of the CI for the difference of two
population means.
The \((1-\alpha)\) confidence interval for the difference of two means is computed as:
$$\bar X_1-\bar X_2 \pm t_{1-\alpha/2,\nu} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} $$
Assuming large samples, the precision of the CI can be written as:
$$\delta=z_{1-\alpha/2} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$
Let \(r=n_2/n_1\), so that \(n_2=r \times n_1\). Then:
$$n_1=z_{1-\alpha/2}^2 \frac{r \times s_1^2 + s_2^2}{r \times \delta^2}$$
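As a quick check of this result, the Python sketch below computes \(n_1\) and \(n_2\) for some assumed values of \(s_1\), \(s_2\), \(\delta\) and \(r\), and then substitutes them back into the expression for \(\delta\). The function name `required_n1` and all numeric values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def required_n1(s1, s2, delta, r=1.0, alpha=0.05):
    """Size of group 1 for precision delta when n2 = r * n1 (large samples)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2 * (r * s1 ** 2 + s2 ** 2) / (r * delta ** 2)

s1, s2, delta, r = 8.0, 12.0, 3.0, 2.0  # illustrative values
n1 = required_n1(s1, s2, delta, r)
n2 = r * n1

# Substituting n1 and n2 back into the precision formula should return delta.
z = NormalDist().inv_cdf(0.975)
check = z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
print(f"n1={math.ceil(n1)}  n2={math.ceil(n2)}  recovered delta={check:.6f}")
```

In practice both sizes are rounded up, which makes the achieved precision slightly better than \(\delta\).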
You can define the parameters of two populations and obtain samples for estimating
the difference \(\mu_1-\mu_2\). You can also fix the minimum difference for
considering a relevant effect for that difference.
The blue line indicates the actual value of the difference of means defined by the user.
The red line indicates the null hypothesis that the population means are equal.
Try to figure out the effect of sample size in providing enough information
to identify the true difference of means.
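The effect of sample size can also be explored with a small simulation. This Python sketch uses equal group sizes and the large-sample quantile \(z_{1-\alpha/2}\) in place of \(t_{1-\alpha/2,\nu}\), which is an approximation adopted here for simplicity. The population parameters, the sample sizes and the seed are assumptions.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible
mu1, sigma1, mu2, sigma2 = 12.0, 4.0, 10.0, 4.0  # true difference is 2
z = statistics.NormalDist().inv_cdf(0.975)

def diff_ci(n):
    """Approximate 95% CI for mu1 - mu2 from two simulated groups of size n."""
    g1 = [random.gauss(mu1, sigma1) for _ in range(n)]
    g2 = [random.gauss(mu2, sigma2) for _ in range(n)]
    d = statistics.mean(g1) - statistics.mean(g2)
    half = z * math.sqrt(statistics.variance(g1) / n + statistics.variance(g2) / n)
    return d - half, d + half

results = {n: diff_ci(n) for n in (50, 200, 800, 3200)}
for n, (low, high) in results.items():
    verdict = "excludes 0" if low > 0 or high < 0 else "includes 0"
    print(f"n={n:5d}  CI=({low:6.2f}, {high:6.2f})  {verdict}")
```

The interval width shrinks roughly with \(1/\sqrt{n}\), so small samples may fail to separate the true difference from zero even when the difference is real.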
Concerns about misusing intervals
It is important to understand the difference between various types of intervals. Here, we
compare them in a sample of two groups.
We assume that the studied variable is distributed as a \(N(\mu_i,\sigma_i)\) in each group \(i\). We
can obtain two classes of intervals:
Reference intervals: Predictions on the expected values of individual observations.
Confidence intervals: Approximate regions that would include the value of the population parameter.
Definitions
Population mean (\(\mu\)): Is a parameter of the distribution.
Population standard deviation (\(\sigma\)): Is a parameter of the distribution.
Population standard error of the mean (\(\frac{\sigma}{\sqrt{n}}\)): Is the standard deviation of the sampling distribution of the mean.
Sample size (n): The number of observations in the sample.
Sample mean (m): Is an estimate of the population mean (\(\mu\)).
Sample standard deviation (s): Is an estimate of the population standard deviation (\(\sigma\)).
Sample standard error of the mean (sem: \(\frac{s}{\sqrt{n}}\)): Is an estimate of the population standard error of the mean.
Reference intervals
\(\mu \pm \sigma \): This interval is where we expect to observe approximately 68% of the values in a sample of any size.
\(\mu \pm 1.96 \times \sigma \): This interval is where we expect to observe approximately 95% of the values in a sample of any size.
\(m \pm s \): This interval is an estimate of the reference interval
where we expect to observe approximately 68% of the values in a sample of any size.
\(m \pm 1.96 \times s \): This interval is an estimate of the reference interval
where we expect to observe approximately 95% of the values in a sample of any size.
Confidence intervals
\(m \pm sem\) is an approximate 68% confidence interval for \(\mu\) in large samples.
\(m \pm 1.96 \times sem\) is an approximate 95% confidence interval for \(\mu\) in large samples.
\(m \pm t_{1-\alpha/2,n-1} \times sem\) is an approximate \((1-\alpha)\) confidence interval for \(\mu\).
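The difference between the two classes of intervals is easy to see numerically. The Python sketch below simulates one large sample and counts how many individual observations fall inside the estimated reference interval \(m \pm 1.96 \times s\) and inside the confidence interval \(m \pm 1.96 \times sem\). The population values, the sample size and the seed are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the run is reproducible
mu, sigma, n = 70.0, 12.0, 400  # illustrative values
sample = [random.gauss(mu, sigma) for _ in range(n)]

m = statistics.mean(sample)
s = statistics.stdev(sample)
sem = s / math.sqrt(n)

def share_inside(low, high):
    """Fraction of individual observations that fall in [low, high]."""
    return sum(low <= x <= high for x in sample) / n

ref_share = share_inside(m - 1.96 * s, m + 1.96 * s)     # reference interval
ci_share = share_inside(m - 1.96 * sem, m + 1.96 * sem)  # confidence interval
print(f"Observations inside m +/- 1.96 s:   {ref_share:.3f}")
print(f"Observations inside m +/- 1.96 sem: {ci_share:.3f}")
```

The reference interval contains most individual values, while the confidence interval contains only a small fraction of them: it describes the uncertainty about \(\mu\), not the spread of individuals.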
Simulation and comparison of intervals
(c) Albert Sorribas, Ester Vilaprinyo, Montse Rue, Rui Alves.
Biomodels Grup, Departament de Ciencies Mediques Basiques.
Universitat de Lleida, Institut de Recerca Biomedica de Lleida (IRBLleida)