A Multi-Step Classifier Addressing Cohort Heterogeneity Improves Performance Of Prognostic Biomarkers In Complex Disease

Recent studies in cancer and other complex diseases continue to highlight the extensive genetic diversity between and within cohorts. This intrinsic heterogeneity poses one of the central challenges to predicting patients' clinical outcomes and personalizing treatment. Here, we discuss the concept of ‘classifiability’ observed in multi-omics studies, where an individual patient's samples may be considered either ‘hard’ or ‘easy’ to classify by different platforms, which is reflected in moderate error rates with large ranges. We demonstrate in a cohort of 45 AJCC stage III melanoma patients that clinico-pathologic biomarkers can identify the patients who are most likely to be misclassified by a molecular biomarker. The process of modelling the classifiability of patients was then replicated in independent data from other diseases.

A multi-step procedure incorporating this information not only improved overall classification accuracy but also indicated the specific clinical attributes that had made classification problematic in each cohort. In statistical terms, our strategy models cohort heterogeneity by identifying interaction effects in a high-dimensional setting. At the translational level, these findings show that, even when cohorts are of moderate size, including features that explain the patient-specific performance of a prognostic biomarker in a classification framework can significantly improve the modelling and estimation of survival, and can clarify which clinical attributes make patients hard to classify.
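
The idea of a multi-step procedure can be illustrated with a minimal sketch. It is not the method used in the study: the toy cohort, the single binary clinical flag, the fixed molecular threshold of 0.5, the 0.4 error-rate cut-off, and the majority-outcome fallback are all assumptions chosen for illustration, and everything is evaluated in-sample, whereas a real analysis would cross-validate each step.

```python
from collections import defaultdict

# Toy cohort with invented values: (molecular_score, clinical_flag, outcome).
# The clinical flag stands in for any clinico-pathologic attribute.
patients = [
    (0.9, 0, 1), (0.8, 0, 1), (0.7, 0, 1),
    (0.2, 0, 0), (0.1, 0, 0), (0.3, 0, 0),
    (0.2, 1, 1), (0.3, 1, 1), (0.8, 1, 1), (0.1, 1, 0),
]


def molecular_predict(score, threshold=0.5):
    """Hypothetical molecular biomarker: predict outcome 1 above the threshold."""
    return 1 if score > threshold else 0


def accuracy(predictions, cohort):
    """Fraction of patients whose predicted outcome matches the observed one."""
    hits = sum(pred == outcome for pred, (_, _, outcome) in zip(predictions, cohort))
    return hits / len(cohort)


# Step 1: classify every patient with the molecular biomarker alone.
baseline = [molecular_predict(score) for score, _, _ in patients]

# Step 2: model classifiability by computing the molecular error rate
# within each clinical subgroup and flagging subgroups that are "hard".
errors = defaultdict(list)
for pred, (_, flag, outcome) in zip(baseline, patients):
    errors[flag].append(pred != outcome)
hard_groups = {flag for flag, errs in errors.items() if sum(errs) / len(errs) > 0.4}


def majority_outcome(flag):
    """Fallback rule (an assumption): the most common outcome in the subgroup."""
    outcomes = [outcome for _, f, outcome in patients if f == flag]
    return 1 if 2 * sum(outcomes) >= len(outcomes) else 0


# Step 3: keep the molecular prediction for "easy" patients and use the
# fallback rule for patients in a "hard" subgroup.
multi_step = [
    majority_outcome(flag) if flag in hard_groups else molecular_predict(score)
    for score, flag, _ in patients
]

baseline_acc = accuracy(baseline, patients)
multi_step_acc = accuracy(multi_step, patients)
print(f"hard subgroups: {sorted(hard_groups)}")
print(f"molecular only: {baseline_acc:.2f}, multi-step: {multi_step_acc:.2f}")
```

In this toy data the molecular score separates outcomes only when the clinical flag is 0, which is a simple instance of an interaction effect: the usefulness of one feature depends on the value of another.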