20 Statistics: colocalization and co-occurrence measures
PyRanges can compute a normalized Jaccard statistic, ranging from 0 to 1, to measure the similarity between two sets of ranges: the number of bases covered by both sets divided by the number of bases covered by either.
import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
print(gr.stats.jaccard(gr2, strandedness="same"))
## 6.657609543683281e-05
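To make the statistic concrete, here is a minimal pure-Python sketch of the interval Jaccard index. It assumes half-open `(start, end)` intervals on a single chromosome and strand, and the helper names (`merge`, `intersection_length`, `jaccard`) are hypothetical. It merges each set, counts the bases covered by both, and divides by the bases covered by either. It illustrates the formula only and is not PyRanges' implementation.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous run
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def total_length(intervals):
    """Number of bases covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Bases covered by both sorted, merged interval lists (two-pointer sweep)."""
    i = j = covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            covered += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def jaccard(a, b):
    """|A intersect B| / |A union B|, measured in bases."""
    a, b = merge(a), merge(b)
    inter = intersection_length(a, b)
    union = total_length(a) + total_length(b) - inter
    # convention chosen for this sketch: two empty sets count as identical
    return 1.0 if union == 0 else inter / union


print(jaccard([(0, 10), (20, 30)], [(5, 25)]))  # 10 shared bases / 30 covered bases
```

Read this way, the very small value printed above means that only a tiny fraction of the bases covered by either set are covered by both.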
To compare two sets of ranges that do not necessarily overlap much, we can use the relative distance function. For each interval in one set, it takes the distance to the nearer of its two closest intervals in the other set and divides it by the distance between those two intervals, giving a value between 0 and 0.5. If the two sets are unrelated, these relative distances are uniformly distributed, so any deviation from a uniform distribution indicates spatial correlation.
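The idea can be sketched in pure Python, assuming half-open intervals on one chromosome and measuring positions by interval midpoints (the convention used by the relative-distance measure in bedtools `reldist`; whether PyRanges matches it in every detail is an assumption). The function names are hypothetical. The second half simulates independently placed intervals to show the flat distribution expected when there is no spatial relationship.

```python
import bisect
import random
from collections import Counter


def midpoint(interval):
    """Midpoint of a half-open (start, end) interval."""
    start, end = interval
    return (start + end) / 2


def relative_distances(query, reference):
    """Relative distance of each query interval to its two flanking reference
    intervals, using midpoints. Results lie in [0, 0.5]; queries lacking a
    reference interval on either side are skipped."""
    refs = sorted(midpoint(iv) for iv in reference)
    result = []
    for iv in query:
        q = midpoint(iv)
        k = bisect.bisect_right(refs, q)  # refs[k - 1] <= q < refs[k]
        if k == 0 or k == len(refs):
            continue
        left, right = refs[k - 1], refs[k]
        result.append(min(q - left, right - q) / (right - left))
    return result


def binned_fractions(dists, n_bins=50):
    """Fraction of relative distances in each of n_bins equal bins on [0, 0.5]."""
    counts = Counter(min(int(d * 2 * n_bins), n_bins - 1) for d in dists)
    return [counts[i] / len(dists) for i in range(n_bins)]


print(relative_distances([(2, 4), (4, 6)], [(0, 2), (8, 10)]))  # [0.25, 0.5]

# Independently placed intervals: each bin should hold roughly 1/50 = 2%.
random.seed(0)
ref = [(s, s + 100) for s in random.sample(range(10_000_000), 1_000)]
qry = [(s, s + 100) for s in random.sample(range(10_000_000), 10_000)]
fractions = binned_fractions(relative_distances(qry, ref))
print(min(fractions), max(fractions))
```

With 50 bins of width 0.01, as in the PyRanges output, a uniform distribution puts about 2% of the intervals in each bin.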
print(gr.stats.relative_distance(gr2, strandedness="same"))
## reldist count total fraction
## 0 0.00 254 9930 0.025579
## 1 0.01 210 9930 0.021148
## 2 0.02 222 9930 0.022356
## 3 0.03 240 9930 0.024169
## 4 0.04 212 9930 0.021349
## 5 0.05 191 9930 0.019235
## 6 0.06 192 9930 0.019335
## 7 0.07 205 9930 0.020645
## 8 0.08 162 9930 0.016314
## 9 0.09 189 9930 0.019033
## 10 0.10 186 9930 0.018731
## 11 0.11 212 9930 0.021349
## 12 0.12 209 9930 0.021047
## 13 0.13 189 9930 0.019033
## 14 0.14 201 9930 0.020242
## 15 0.15 178 9930 0.017925
## 16 0.16 203 9930 0.020443
## 17 0.17 224 9930 0.022558
## 18 0.18 196 9930 0.019738
## 19 0.19 212 9930 0.021349
## 20 0.20 208 9930 0.020947
## 21 0.21 196 9930 0.019738
## 22 0.22 203 9930 0.020443
## 23 0.23 198 9930 0.019940
## 24 0.24 223 9930 0.022457
## 25 0.25 186 9930 0.018731
## 26 0.26 189 9930 0.019033
## 27 0.27 192 9930 0.019335
## 28 0.28 163 9930 0.016415
## 29 0.29 204 9930 0.020544
## 30 0.30 210 9930 0.021148
## 31 0.31 202 9930 0.020342
## 32 0.32 211 9930 0.021249
## 33 0.33 195 9930 0.019637
## 34 0.34 197 9930 0.019839
## 35 0.35 175 9930 0.017623
## 36 0.36 214 9930 0.021551
## 37 0.37 178 9930 0.017925
## 38 0.38 176 9930 0.017724
## 39 0.39 193 9930 0.019436
## 40 0.40 192 9930 0.019335
## 41 0.41 179 9930 0.018026
## 42 0.42 209 9930 0.021047
## 43 0.43 184 9930 0.018530
## 44 0.44 198 9930 0.019940
## 45 0.45 208 9930 0.020947
## 46 0.46 192 9930 0.019335
## 47 0.47 184 9930 0.018530
## 48 0.48 183 9930 0.018429
## 49 0.49 201 9930 0.020242
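One way to quantify "deviation from a uniform distribution" is a chi-square goodness-of-fit test on the bin counts. This is our choice of test, not something PyRanges reports. The sketch copies the `count` column from the table above and compares it with the flat expectation of total / 50 intervals per bin. With 50 bins there are 49 degrees of freedom, for which the 5% critical value is about 66.34. If SciPy is installed, `scipy.stats.chisquare(counts)` computes the same statistic against a uniform expectation and also returns a p-value.

```python
# Counts per 0.01-wide reldist bin, copied from the relative_distance output above.
counts = [
    254, 210, 222, 240, 212, 191, 192, 205, 162, 189,
    186, 212, 209, 189, 201, 178, 203, 224, 196, 212,
    208, 196, 203, 198, 223, 186, 189, 192, 163, 204,
    210, 202, 211, 195, 197, 175, 214, 178, 176, 193,
    192, 179, 209, 184, 198, 208, 192, 184, 183, 201,
]

total = sum(counts)               # should equal the "total" column
expected = total / len(counts)    # uniform expectation per bin
chi2 = sum((obs - expected) ** 2 / expected for obs in counts)
dof = len(counts) - 1

CRITICAL_5PCT_DOF49 = 66.34       # chi-square 95th percentile for 49 degrees of freedom
print(f"total={total} chi2={chi2:.2f} dof={dof}")
print("reject uniformity at 5%" if chi2 > CRITICAL_5PCT_DOF49 else "consistent with uniform at 5%")
```

A chi-square test treats every bin alike; inspecting which bins are over- or under-filled (for example the lowest reldist bins) shows where any departure from uniformity comes from.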
PyRanges also provides a third colocalization statistic, the Forbes coefficient, which is still in beta:
print(gr.stats.forbes(gr2, strandedness="same"))
Please report any issues you encounter using it :)
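The textbook Forbes coefficient compares the observed overlap with the overlap expected if the two sets were placed independently: F = N·|A∩B| / (|A|·|B|), where N is the genome length and all sizes are in bases. F is 1 under independence, above 1 for colocalization and below 1 for avoidance. Below is a minimal pure-Python sketch, assuming a single chromosome of known length passed as the hypothetical parameter `genome_length` and half-open intervals. It illustrates the formula and is not necessarily how PyRanges computes it.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def covered(intervals):
    """Number of bases covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def overlap(a, b):
    """Bases covered by both sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total


def forbes(a, b, genome_length):
    """Observed shared bases divided by the number expected under independence.
    Undefined (ZeroDivisionError) if either set is empty."""
    a, b = merge(a), merge(b)
    expected = covered(a) * covered(b) / genome_length
    return overlap(a, b) / expected


print(forbes([(0, 10), (20, 30)], [(5, 25)], genome_length=100))  # 10 / (20 * 20 / 100) = 2.5
```

Unlike the Jaccard statistic, the Forbes coefficient depends on the genome length, because the expected overlap shrinks as the genome grows.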
See this paper for a discussion of the Jaccard and Forbes coefficients: https://doi.org/10.1093/bib/bbz083