When observations fall into known groups, many researchers first apply Principal Components Analysis (PCA) to identify patterns in the data, and then try to map those patterns onto differences among the groups or attribute biological signal to them. Is this appropriate, given that PCA is not designed to discriminate between groups? Is a group-oriented multivariate method such as multivariate analysis of variance (MANOVA) or Canonical Discriminant Analysis (CDA) preferable? Which method is more relevant when investigating factor effects on biochemical pathways? We explore these questions via a biological example.
Two biological materials (B1 and B2) were analysed for the same 19 primary metabolites in a design with three factors: Methods (M1 and M2), Treatment (T1, T2, T3 and T4) and Age (A1 and A2). With three replicates per factor combination, this gave a total of 48 observations for each biological material. Univariate and multivariate analyses of variance (ANOVA and MANOVA, respectively) were carried out and showed many statistically significant interaction effects. In addition, other multivariate techniques such as PCA and CDA were used to explore relationships between the variables.
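As a quick check on the design arithmetic, the layout for one biological material can be enumerated in a few lines. This is only an illustrative sketch: the factor levels are those named above, while the variable names and replicate labels are assumptions made here for the example.

```python
from itertools import product

# Factor levels as named in the text; replicate labels 1-3 are hypothetical.
methods = ["M1", "M2"]
treatments = ["T1", "T2", "T3", "T4"]
ages = ["A1", "A2"]
replicates = [1, 2, 3]

# One record per observation for a single biological material.
design = [
    {"method": m, "treatment": t, "age": a, "replicate": r}
    for m, t, a, r in product(methods, treatments, ages, replicates)
]

# Number of distinct factor combinations (cells) and total observations.
n_cells = len(methods) * len(treatments) * len(ages)
print(n_cells, len(design))  # prints "16 48"
```

The 2 x 4 x 2 crossing gives 16 cells, and three replicates per cell give the 48 observations per material.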
The question remains whether PCA is appropriate for exploring biochemical pathways. The comparison is between tailoring the pattern extraction a priori to match the known groupings within the data and starting with an unrestrained pattern analysis, then seeking to explain the detected patterns post-analysis.
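The contrast between the two approaches can be stated through the criterion each one optimises. The notation below is introduced here for illustration and is not taken from the study: $T$ is the total sums-of-squares-and-cross-products matrix of the centred data, and $B$ and $W$ are its between-group and within-group parts, so that $T = B + W$.

```latex
% PCA: first component maximises total variance, ignoring group labels
\mathbf{w}_1 = \arg\max_{\mathbf{w}^{\top}\mathbf{w} = 1} \; \mathbf{w}^{\top} T \, \mathbf{w}

% CDA: first canonical variate maximises between-group relative to within-group variation
\mathbf{a}_1 = \arg\max_{\mathbf{a} \neq \mathbf{0}} \; \frac{\mathbf{a}^{\top} B \, \mathbf{a}}{\mathbf{a}^{\top} W \, \mathbf{a}}
```

The PCA directions are eigenvectors of $T$, whereas the CDA directions are eigenvectors of $W^{-1}B$ (assuming $W$ is invertible). PCA therefore recovers group differences only when they happen to coincide with the directions of largest overall variance, while CDA uses the group labels from the outset.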