Across the biological disciplines, one commonly asks whether biological objects are spatially correlated. For example, in ecology, it may be of interest whether different species co-occur across geographical regions. In molecular detection imaging studies, one may wish to statistically test whether molecules, typed by fluorescent markers, co-localise. At a finer level, bioinformaticians studying genomic annotations such as transposable elements, CpG islands, or transcription factor binding sites may ask whether there are functionally relevant or otherwise interesting interactions between annotations. Various methods have been proposed for statistically identifying whether two sets of biological objects are spatially correlated. In a recent paper, Favorov et al. [1] described a set of methods to statistically assess co-localisation between pairs of genomic annotations in a genome. Curiously, inference with these methods depends upon which annotation set is chosen as the reference, as well as upon the choice of distance metric (relative distance or absolute distance). The biology researcher is confronted with a dilemma: if inference depends on such choices, how should one interpret the data when a test concludes that “A differs from B” but “B does not differ from A”? In this talk, I explore this dilemma and propose two invariant methods: one based upon spatial cross-correlation, and another based upon mutual information. For both of these methods, inference is independent of distance measure and reference set.
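The asymmetry described above can be illustrated with a small Python sketch. It is not the method of Favorov et al. [1], nor the construction presented in the talk; it assumes, purely for illustration, that each annotation is reduced either to a list of positions or to a binary occupancy track over fixed-width genomic bins. The function names `mean_nearest_distance` and `mutual_information` and all data values are hypothetical.

```python
import math
from collections import Counter


def mean_nearest_distance(query, reference):
    """Mean absolute distance from each query position to its nearest reference position.

    This statistic depends on which set plays the role of the reference.
    """
    return sum(min(abs(q - r) for r in reference) for q in query) / len(query)


def mutual_information(track_a, track_b):
    """Mutual information (in bits) between two equal-length binary occupancy tracks.

    Each track holds 1 where a bin overlaps the annotation and 0 otherwise
    (an assumed encoding). The result does not depend on argument order.
    """
    if len(track_a) != len(track_b):
        raise ValueError("tracks must cover the same bins")
    n = len(track_a)
    joint = Counter(zip(track_a, track_b))
    marginal_a = Counter(track_a)
    marginal_b = Counter(track_b)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = marginal_a[x] / n
        p_y = marginal_b[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Made-up positions: the distance statistic changes when the reference is swapped.
positions_a = [10, 12, 14]
positions_b = [11, 500]
a_to_b = mean_nearest_distance(positions_a, positions_b)
b_to_a = mean_nearest_distance(positions_b, positions_a)

# Made-up occupancy tracks over eight bins: mutual information is symmetric.
track_a = [1, 1, 0, 0, 1, 0, 0, 0]
track_b = [1, 1, 0, 0, 0, 0, 0, 1]
mi_ab = mutual_information(track_a, track_b)
mi_ba = mutual_information(track_b, track_a)
```

In this toy setting `a_to_b` and `b_to_a` differ because a single distant element in B dominates one direction only, whereas `mi_ab` and `mi_ba` agree by construction. A significance test would still be needed on top of either statistic; the sketch only shows which quantities are sensitive to the choice of reference.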
1. Favorov A, Mularoni L, Cope LM, et al. (2012) Exploring Massive, Genome Scale Datasets with the GenometriCorr Package. PLoS Computational Biology 8(5):e1002529. doi:10.1371/journal.pcbi.1002529.