Design with one fixed factor

Model for one fixed factor

In a design with one fixed factor, we define several experimental groups (factor levels) and measure a variable in different subjects within each group. For example, a control group treated with a placebo may be compared with three treatments. In this case the factor is the treatment, with four levels (placebo and three treatments).

A linear model for one fixed factor is defined as: $$ y_{ij}=\mu + \alpha_i + \epsilon_{ij}$$

where $y_{ij}$ is the $j^{th}$ observation in the $i^{th}$ group (level), $\mu$ is the mean of the results if there is no effect of the factor, and $\alpha_i$ is the (additive) effect of level $i$. Thus, the expected mean in the $i^{th}$ group is $\mu_i=\mu+\alpha_i$. Finally, $\epsilon_{ij}$ indicates the random variation around the mean. It is assumed that the random variation follows a $N(0,\sigma)$ distribution, with the same standard deviation $\sigma$ in every group.
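As a minimal sketch of what this model implies, the R code below draws one simulated data set for the placebo-plus-three-treatments example. The values of `mu`, `alpha`, `sigma`, and `n`, and the level names, are illustrative assumptions and do not come from the application.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Illustrative (assumed) parameter values
mu    <- 10                                        # mean when the factor has no effect
alpha <- c(Placebo = 0, T1 = 2, T2 = -1, T3 = 3)   # additive effect of each level
sigma <- 1.5                                       # common standard deviation
n     <- 8                                         # observations per group

# One factor with four levels, n observations in each
group <- factor(rep(names(alpha), each = n), levels = names(alpha))

# y_ij = mu + alpha_i + epsilon_ij, with epsilon_ij ~ N(0, sigma)
y <- mu + unname(alpha[as.character(group)]) +
  rnorm(length(group), mean = 0, sd = sigma)

expected_means <- mu + alpha                # mu_i = mu + alpha_i
observed_means <- tapply(y, group, mean)    # sample mean of each group

print(rbind(expected = expected_means, observed = observed_means))
```

The observed group means scatter around the expected means $\mu_i$; the scatter shrinks as `n` grows or `sigma` decreases.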

Mean of controls

Effect of T1

Effect of T2

Sigma

Sample size

Range for x

How to interpret the sum of squares?

In this illustrative example, we have three experimental groups (Control, T1, and T2), and we measure the concentration of a given biomolecule in the blood. You can set the effect of each treatment on the mean, the sample size, and the standard deviation of the observations in each group.
After obtaining simulated samples for each group, the application computes the total sum of squares (SST), the residual sum of squares (SSR), and the sum of squares between treatments (SSTreat).

Definitions
Simulation of experiments

Factor levels: Each of the groups defined by the experimenter (e.g. control, treatment 1, treatment 2). The number of levels is denoted $k$. In this example $k=3$. We also refer to each level as an experimental group or condition.
Observations: $y_{ij}$ indicates the $j^{th}$ observation in factor level $i$. There are $n_i$ observations in level $i$. So, for instance, $y_{23}$ indicates the 3rd observation in the 2nd experimental group.
Mean of observations (overall mean): The mean of all observations without considering the group. It is calculated as: $$ y_{..}=\frac {\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} y_{ij}} {\displaystyle\sum_{i=1}^k n_i}$$
Mean of level $i$: The mean of the observed values in factor level $i$. It is calculated as: $$y_{i.}= \frac {\displaystyle\sum_{j=1}^{n_i} y_{ij}} {n_i}$$

Total sum of squares (SST): The sum of the squared differences between each observation and the overall mean. It is computed as: $$SST=\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} \left(y_{ij}-y_{..} \right)^2$$
Residual sum of squares (SSR): The variability of the observations with respect to the mean of their own level. That is: $$SSR=\displaystyle\sum_{i=1}^k \displaystyle\sum_{j=1}^{n_i} \left(y_{ij} - y_{i.}\right)^2$$
Treatment sum of squares (SSTreat): The variability of the group means with respect to the overall mean, weighted by the group sizes. It is computed as: $$SSTreat= \displaystyle\sum_{i=1}^k \left(n_i \times\left(y_{i.} - y_{..}\right)^2\right)$$
These three quantities satisfy the decomposition: $$SST=SSTreat+SSR$$
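The definitions above can be checked numerically. The following sketch uses a small made-up data set (the numbers are illustrative, not output from the application) with unequal group sizes. It computes each sum of squares directly from its formula and then compares the results with the table returned by `anova()`.

```r
# Small made-up data set: k = 3 groups with n_i = 3, 4, 3 observations
y     <- c(4.1, 5.0, 4.6,   6.2, 5.8, 6.9, 6.4,   5.1, 4.7, 5.6)
group <- factor(rep(c("Control", "T1", "T2"), times = c(3, 4, 3)))

overall_mean <- mean(y)                      # y..
group_means  <- tapply(y, group, mean)       # y_i.
n_i          <- tapply(y, group, length)     # n_i

# Sums of squares, written exactly as in the definitions
SST     <- sum((y - overall_mean)^2)
SSR     <- sum((y - group_means[as.character(group)])^2)
SSTreat <- sum(n_i * (group_means - overall_mean)^2)

# The decomposition SST = SSTreat + SSR holds up to rounding error
print(c(SST = SST, SSTreat = SSTreat, SSR = SSR))
print(all.equal(SST, SSTreat + SSR))

# The same quantities appear in the "Sum Sq" column of the ANOVA table
tab <- anova(lm(y ~ group))
print(tab)
```

In the ANOVA table, the row for `group` holds SSTreat and the `Residuals` row holds SSR.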

Sample size per group

Mean of control group

Common sigma value

Effect of treatment 1

Effect of treatment 2

Range of y axis

Generate random data that simulates an experiment with one fixed factor.

You can set the levels of the factor and the size of the effect of each level with respect to the global mean (that is, the mean in the absence of any effect).
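A simulation of this kind could be sketched in R as follows. The function `simulate_one_factor()` and its arguments are hypothetical names chosen for illustration, not the application's own code. The column names `Group` and `Biomarker` are chosen to match the model formula `Biomarker~Group` used for the analysis, and the file name is an arbitrary example.

```r
# Hypothetical helper: simulate one experiment with one fixed factor.
#   mu      mean in the absence of effects
#   sigma   common standard deviation
#   n       vector of sample sizes, one per level
#   effects vector of additive effects, one per level
#   labels  vector of group labels, one per level
simulate_one_factor <- function(mu, sigma, n, effects, labels) {
  stopifnot(length(n) == length(effects), length(n) == length(labels))
  group     <- factor(rep(labels, times = n), levels = labels)
  biomarker <- mu + rep(effects, times = n) +
    rnorm(sum(n), mean = 0, sd = sigma)
  data.frame(Group = group, Biomarker = biomarker)
}

set.seed(42)  # reproducible example
dat <- simulate_one_factor(
  mu      = 10,
  sigma   = 2,
  n       = c(10, 12, 8),
  effects = c(0, 3, -1),
  labels  = c("Control", "T1", "T2")
)

# Save the simulated data to a CSV file (written to a temporary directory here)
out_file <- file.path(tempdir(), "one_factor_simulation.csv")
write.csv(dat, out_file, row.names = FALSE)

print(head(dat))
```

Passing the sample sizes as a vector allows unbalanced designs, in which the groups have different numbers of observations.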

Levels of factor

Mean in absence of effects

Common sigma

Vector of sample sizes

Vector of effects

Vector of group labels

Name for data file

Save data to csv file

Range of biomarker axes

Coefficients of the fitted linear model

Range of CI axis

95% CI for the mean differences (Tukey)

Select a contrast

Helmert

Treatment

Polynomial

SAS

Sum
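The contrast names listed above presumably correspond to R's built-in contrast functions `contr.helmert`, `contr.treatment`, `contr.poly`, `contr.SAS`, and `contr.sum`; that mapping is an assumption, since the application's code is not shown. The sketch below prints each coding matrix for $k=3$ levels and fits the same simulated data (illustrative values) under two codings, to show that the coefficients change meaning while the fitted group means do not.

```r
k <- 3  # number of factor levels

# Coding matrices of R's built-in contrast functions
print(contr.helmert(k))    # each level vs. the mean of the previous levels
print(contr.treatment(k))  # each level vs. the first (reference) level
print(contr.poly(k))       # orthogonal polynomial (linear, quadratic) trends
print(contr.SAS(k))        # each level vs. the last level
print(contr.sum(k))        # deviations from the mean of the group means

# Illustrative balanced data set
set.seed(7)
dat <- data.frame(Group = factor(rep(c("Control", "T1", "T2"), each = 6)))
dat$Biomarker <- 10 + c(0, 3, -1)[as.integer(dat$Group)] + rnorm(18, sd = 1)

fit_treatment <- lm(Biomarker ~ Group, data = dat,
                    contrasts = list(Group = "contr.treatment"))
fit_sum       <- lm(Biomarker ~ Group, data = dat,
                    contrasts = list(Group = "contr.sum"))

# Treatment coding: the intercept is the Control mean.
# Sum coding: the intercept is the mean of the group means.
print(coef(fit_treatment))
print(coef(fit_sum))

# The fitted values are identical under both codings
print(all.equal(fitted(fit_treatment), fitted(fit_sum)))
```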

Statistical power

Here, we simulate a number of samples from the selected conditions. For each sample, a linear model is fitted and the corresponding p-value is stored. When the effects are different from 0, the percentage of samples with p<0.05 approximates the statistical power to detect the difference. If all the effects are 0, we expect about 5% of the samples to produce p<0.05 (the significance level). The histogram reflects the results obtained. If the power is low, you need to increase the sample size.
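The procedure described above can be sketched in R as follows. The helper `simulate_p_value()`, the parameter values, and the number of simulations are illustrative assumptions; the application may proceed differently. The p-value stored for each sample is the one from the overall F test of the fitted linear model.

```r
set.seed(123)  # reproducible example

# Hypothetical helper: simulate one balanced experiment and return
# the p-value of the overall F test of the one-factor linear model
simulate_p_value <- function(mu, sigma, n, effects) {
  group <- factor(rep(seq_along(effects), each = n))
  y     <- mu + rep(effects, each = n) +
    rnorm(n * length(effects), mean = 0, sd = sigma)
  anova(lm(y ~ group))[["Pr(>F)"]][1]
}

n_sim <- 500  # number of simulated samples (assumed value)

# Effects different from 0: the proportion of p < 0.05 estimates the power
p_alt  <- replicate(n_sim, simulate_p_value(mu = 10, sigma = 2, n = 10,
                                            effects = c(0, 3, -1)))
# All effects equal to 0: the proportion of p < 0.05 estimates the
# significance level
p_null <- replicate(n_sim, simulate_p_value(mu = 10, sigma = 2, n = 10,
                                            effects = c(0, 0, 0)))

power_estimate <- mean(p_alt < 0.05)
type1_estimate <- mean(p_null < 0.05)
print(c(power = power_estimate, type1 = type1_estimate))

# Histogram counts of the p-values; set plot = TRUE to draw the histogram
h <- hist(p_alt, breaks = seq(0, 1, by = 0.05), plot = FALSE)
print(h$counts)
```

Increasing `n` in the call under non-zero effects shows how the estimated power rises with the sample size per group.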

First, fit a linear model for the mean of Biomarker by group:

res <- lm(Biomarker~Group)

The anova table can be obtained as:

anova(res)

A summary will produce the coefficients of the linear model:

summary(res)

The CIs for the mean differences are obtained as:

TukeyHSD(aov(res))

And the plot of the CI as:

plot(TukeyHSD(aov(res)))

(c) Albert Sorribas, Ester Vilaprino, Montse Rue, Rui Alves. Biomodels Grup, Departament de Ciencies Mediques Basiques. Universitat de Lleida, Institut de Recerca Biomedica de Lleida (IRBLleida)