20 Statistics: colocalization and co-occurrence measures

PyRanges can compute a normalized Jaccard statistic (ranging from 0 to 1) that measures the similarity between two sets of ranges.

import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
print(gr.stats.jaccard(gr2, strandedness="same"))
## 6.657609543683281e-05
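
To make the statistic concrete, here is a minimal pure-Python sketch of the usual Jaccard definition for intervals: the number of bases covered by both sets divided by the number of bases covered by either set. The helper names (`merge`, `covered_length`, `jaccard`) and the toy intervals are hypothetical, and it is an assumption that PyRanges follows exactly this definition internally; strand and chromosome handling are ignored.

```python
def merge(intervals):
    """Merge overlapping half-open (start, end) intervals (hypothetical helper)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_length(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a, b):
    """Intersection length divided by union length (assumed definition)."""
    len_a = covered_length(a)
    len_b = covered_length(b)
    union = covered_length(list(a) + list(b))
    intersection = len_a + len_b - union  # inclusion-exclusion
    return intersection / union if union else 0.0


# Toy intervals on a single chromosome, not the ChIP-seq data above.
a = [(0, 10), (20, 30)]
b = [(5, 15), (40, 50)]
print(jaccard(a, b))  # 5 shared bases / 35 covered bases, about 0.143
```

Identical sets give 1 and disjoint sets give 0, which is why a value as small as the one printed above suggests very little shared coverage.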

To measure the similarity between two sets of ranges that do not necessarily overlap much, we can use the relative distance function. It describes the relative distance between each interval in one set and the two closest intervals in the other. Any deviation from a uniform distribution indicates spatial correlation.

print(gr.stats.relative_distance(gr2, strandedness="same"))
##     reldist  count  total  fraction
## 0      0.00    254   9930  0.025579
## 1      0.01    210   9930  0.021148
## 2      0.02    222   9930  0.022356
## 3      0.03    240   9930  0.024169
## 4      0.04    212   9930  0.021349
## 5      0.05    191   9930  0.019235
## 6      0.06    192   9930  0.019335
## 7      0.07    205   9930  0.020645
## 8      0.08    162   9930  0.016314
## 9      0.09    189   9930  0.019033
## 10     0.10    186   9930  0.018731
## 11     0.11    212   9930  0.021349
## 12     0.12    209   9930  0.021047
## 13     0.13    189   9930  0.019033
## 14     0.14    201   9930  0.020242
## 15     0.15    178   9930  0.017925
## 16     0.16    203   9930  0.020443
## 17     0.17    224   9930  0.022558
## 18     0.18    196   9930  0.019738
## 19     0.19    212   9930  0.021349
## 20     0.20    208   9930  0.020947
## 21     0.21    196   9930  0.019738
## 22     0.22    203   9930  0.020443
## 23     0.23    198   9930  0.019940
## 24     0.24    223   9930  0.022457
## 25     0.25    186   9930  0.018731
## 26     0.26    189   9930  0.019033
## 27     0.27    192   9930  0.019335
## 28     0.28    163   9930  0.016415
## 29     0.29    204   9930  0.020544
## 30     0.30    210   9930  0.021148
## 31     0.31    202   9930  0.020342
## 32     0.32    211   9930  0.021249
## 33     0.33    195   9930  0.019637
## 34     0.34    197   9930  0.019839
## 35     0.35    175   9930  0.017623
## 36     0.36    214   9930  0.021551
## 37     0.37    178   9930  0.017925
## 38     0.38    176   9930  0.017724
## 39     0.39    193   9930  0.019436
## 40     0.40    192   9930  0.019335
## 41     0.41    179   9930  0.018026
## 42     0.42    209   9930  0.021047
## 43     0.43    184   9930  0.018530
## 44     0.44    198   9930  0.019940
## 45     0.45    208   9930  0.020947
## 46     0.46    192   9930  0.019335
## 47     0.47    184   9930  0.018530
## 48     0.48    183   9930  0.018429
## 49     0.49    201   9930  0.020242
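
The idea behind the table above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the PyRanges implementation: each interval is reduced to a single position (for example its midpoint), and the relative distance of a query position is its distance to the nearer of the two flanking reference positions, divided by the distance between those two reference positions. The function name `relative_distances` and the toy positions are hypothetical.

```python
from bisect import bisect_right


def relative_distances(query, reference):
    """Relative distance of each query position to its two flanking
    reference positions (hypothetical helper; values lie in [0, 0.5])."""
    ref_points = sorted(reference)
    result = []
    for q in query:
        i = bisect_right(ref_points, q)
        if i == 0 or i == len(ref_points):
            # No reference position on one side: skip this query position.
            continue
        lower, upper = ref_points[i - 1], ref_points[i]
        nearest = min(q - lower, upper - q)
        result.append(nearest / (upper - lower))
    return result


# Toy positions: 150 lies outside the reference span and is skipped.
reference = [0, 100]
query = [10, 50, 90, 150]
print(relative_distances(query, reference))  # [0.1, 0.5, 0.1]
```

If the two sets were placed independently of each other, these values would be spread roughly evenly between 0 and 0.5, so each 0.01-wide bin in the table would hold about 2% of the intervals.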

PyRanges also provides another method, still in beta, for computing colocalization statistics: the Forbes coefficient.

print(gr.stats.forbes(gr2, strandedness="same"))

Please report any issues you encounter using it :)
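
For intuition, the Forbes coefficient is commonly defined as the observed overlap divided by the overlap expected if the two sets were placed independently: N * |A ∩ B| / (|A| * |B|), where N is the genome length and |.| counts covered bases. The sketch below implements that textbook formula on toy intervals. The names `positions`, `forbes` and `genome_length` are hypothetical, and it is an assumption that the beta PyRanges method computes the same quantity.

```python
def positions(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def forbes(a, b, genome_length):
    """Observed overlap relative to the overlap expected under independence
    (textbook definition, assumed here; fine for toy data only)."""
    pos_a, pos_b = positions(a), positions(b)
    if not pos_a or not pos_b:
        return 0.0
    return genome_length * len(pos_a & pos_b) / (len(pos_a) * len(pos_b))


# Toy intervals on a hypothetical 100-base genome.
a = [(0, 10), (20, 30)]
b = [(5, 15), (40, 50)]
print(forbes(a, b, genome_length=100))  # 100 * 5 / (20 * 20) = 1.25
```

Under this definition, values above 1 mean more overlap than expected by chance and values below 1 mean less. Storing every position in a set is only practical for small examples like this one.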

See this paper for a discussion of the Jaccard and Forbes measures: https://doi.org/10.1093/bib/bbz083