Model for one fixed factor


In a design with one fixed factor, we define several experimental groups (factor levels) and measure a variable in different subjects within each group. An example is a control group treated with a placebo compared to three treatments. In this case the factor is the treatment, with four levels (placebo and three treatments). A linear model for one fixed factor is defined as: $$ y_{ij}=\mu + \alpha_i + \epsilon_{ij}$$ where \(y_{ij}\) is the \(j^{th}\) observation in the \(i^{th}\) group (level), \(\mu\) is the mean of the results if the factor has no effect, and \(\alpha_i\) is the additive effect of level \(i\). Thus, the expected mean in the \(i^{th}\) group is \(\mu_i=\mu+\alpha_i\). Finally, \(\epsilon_{ij}\) is the random variation around the mean. It is assumed that the random variation follows a \(N(0,\sigma)\) distribution, with the same \(\sigma\) in every group.

How to interpret the sum of squares?

In this illustrative example, we have three treatment groups, Control, T1, and T2, and we measure the concentration of a given biomolecule in the blood. You can indicate the effect of the treatments on the mean, the sample size, and the standard deviation for the observations of each group.
After obtaining simulated samples for each group, the application computes the total sum of squares (SST), the residual sum of squares (SSR), and the sum of squares between treatments (SSTreat).
Factor levels: Each of the groups defined by the experimenter (e.g. control, treatment 1, treatment 2). The number of levels is denoted \(k\). In this example \(k=3\). We also refer to each level as an experimental group or condition.
Observations: \(y_{ij}\) indicates the \(j^{th}\) observation in factor level \(i\). There are \(n_i\) observations in level \(i\). So, for instance, \(y_{23}\) indicates the \(3^{rd}\) observation in the \(2^{nd}\) experimental group.
Mean of observations (overall mean): The mean of all observations, ignoring the group they belong to. It is calculated as: $$ y_{..}=\frac {\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} y_{ij}} {\displaystyle\sum_{i=1}^k n_i}$$
Mean of level \(i\): The mean of the observed values in factor level \(i\). It is calculated as: $$y_{i.}= \frac {\displaystyle\sum_{j=1}^{n_i} y_{ij}} {n_i}$$
Total sum of squares (SST): The sum of the squared differences between each observation and the overall mean. It is computed as: $$SST=\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} \left(y_{ij}-y_{..} \right)^2$$
Residual sum of squares (SSR): The variability of the observations with respect to the mean of their own level. That is: $$SSR=\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} \left(y_{ij} - y_{i.}\right)^2$$
Treatment sum of squares (SSTreat): The variability of the group means with respect to the overall mean, weighted by the number of observations in each group. It is computed as: $$SSTreat= \displaystyle\sum_{i=1}^k \left(n_i \times\left(y_{i.} - y_{..}\right)^2\right)$$
The total variability splits into the part between groups and the part within groups: $$SST=SSTreat+SSR$$
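The definitions above can be checked numerically. The following is a minimal sketch in R, assuming a small set of made-up observations (the values and the variable names `y`, `group`, `overall_mean`, `group_means`, and `n_i` are illustrative and not taken from the application). It computes each sum of squares directly from its formula and compares the result with the ANOVA table.

```r
# Hypothetical observations for three groups (values chosen only for illustration)
y <- c(10, 12, 11, 14, 15, 13, 16, 9, 8, 10)
group <- factor(c(rep("Control", 3), rep("T1", 4), rep("T2", 3)))

overall_mean <- mean(y)                   # y..
group_means  <- tapply(y, group, mean)    # y_i. for each level
n_i          <- tapply(y, group, length)  # n_i for each level

# Total, residual, and treatment sums of squares, following the formulas above
SST     <- sum((y - overall_mean)^2)
SSR     <- sum((y - ave(y, group))^2)     # ave() returns each observation's group mean
SSTreat <- sum(n_i * (group_means - overall_mean)^2)

# The decomposition SST = SSTreat + SSR holds up to floating-point error
all.equal(SST, SSTreat + SSR)

# The "Sum Sq" column of the ANOVA table contains SSTreat and SSR, in that order
anova(lm(y ~ group))
```

Computing the quantities by hand once and then reading them from `anova()` is a convenient way to see which row of the table corresponds to which formula.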

Generate random data that simulates an experiment with one fixed factor.


You can indicate the levels of the factor and the size of the effects of each level with respect to the global mean (that is the mean without any effect).
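A comparable data set can be generated directly in R. This is only a sketch of the model \(y_{ij}=\mu+\alpha_i+\epsilon_{ij}\) and not the code used by the application: the values of `mu`, `alpha`, `n`, and `sigma` are assumptions chosen for illustration, and a common standard deviation is used for all groups.

```r
set.seed(1)  # arbitrary seed, only to make the example reproducible

mu    <- 10                               # mean without any effect (assumed value)
alpha <- c(Control = 0, T1 = 2, T2 = -1)  # additive effect of each level (assumed)
n     <- c(Control = 8, T1 = 8, T2 = 8)   # observations per level (assumed)
sigma <- 1.5                              # common standard deviation (assumed)

# One label per observation, keeping the levels in the order given in alpha
Group <- factor(rep(names(alpha), times = n), levels = names(alpha))

# y_ij = mu + alpha_i + epsilon_ij, with epsilon_ij drawn from N(0, sigma)
Biomarker <- mu + alpha[as.character(Group)] + rnorm(sum(n), mean = 0, sd = sigma)

dat <- data.frame(Group = Group, Biomarker = unname(Biomarker))

# Observed group means; they should scatter around mu + alpha_i
aggregate(Biomarker ~ Group, data = dat, FUN = mean)
```

A data frame of this shape can then be analysed with `lm(Biomarker ~ Group, data = dat)`.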




                  


Coefficients of the fitted linear model



                  


95% CI for the mean differences (Tukey)



                  


                  

Statistical power


Here, we simulate a number of samples from the selected conditions. For each sample, a linear model is fitted and the corresponding p-value is stored. When the effects are different from 0, the percentage of samples with p<0.05 approximates the statistical power for detecting the difference. If all the effects are 0, we expect about 5% of the samples to produce p<0.05 (the significance level). The histogram reflects the results obtained. If the power is low, you need to increase the sample size.
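The procedure described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the application's own code: the helper `simulate_p` is hypothetical, the parameter values are arbitrary, and the stored p-value is assumed to be that of the overall F test from the ANOVA table.

```r
set.seed(123)  # arbitrary seed, only to make the example reproducible

# Simulate one experiment and return the p-value of the overall F test.
# mu, alpha, n (per group), and sigma are assumed inputs.
simulate_p <- function(mu, alpha, n, sigma) {
  group <- factor(rep(seq_along(alpha), each = n))
  y <- mu + alpha[as.integer(group)] + rnorm(length(group), mean = 0, sd = sigma)
  anova(lm(y ~ group))[["Pr(>F)"]][1]
}

n_sim <- 1000  # number of simulated samples
p_values <- replicate(
  n_sim,
  simulate_p(mu = 10, alpha = c(0, 2, -1), n = 8, sigma = 1.5)
)

# Proportion of simulated samples with p < 0.05: the estimated power
power_estimate <- mean(p_values < 0.05)
power_estimate

# Distribution of the simulated p-values
hist(p_values, breaks = 20, main = "Simulated p-values", xlab = "p-value")
```

Setting all elements of `alpha` to 0 in the call turns the same code into a check of the significance level, since the proportion of p-values below 0.05 should then be close to 5%.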



First, fit a linear model for the mean of Biomarker by Group:

res <- lm(Biomarker~Group)

The ANOVA table can be obtained as:

anova(res)

A summary will produce the coefficients of the linear model:

summary(res)

The confidence intervals for the mean differences are obtained as:

TukeyHSD(aov(res))

And the plot of the confidence intervals as:

plot(TukeyHSD(aov(res)))

(c) Albert Sorribas, Ester Vilaprino, Montse Rue, Rui Alves. Biomodels Grup, Departament de Ciencies Mediques Basiques. Universitat de Lleida, Institut de Recerca Biomedica de Lleida (IRBLleida)