21 Statistics: Simes method

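The Simes method lets you combine several, possibly dependent, p-values into a single p-value. The function pr.stats.simes takes three arguments: a dataframe, the column (or list of columns) identifying which rows to merge, and the name of the column containing the p-values. It returns one row per group, with the combined p-value in a column named Simes.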
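To make the combination step concrete, here is a minimal sketch of the textbook Simes formula for a single group: sort the n p-values in ascending order and take the minimum of n * p_(i) / i over the ranks i. The helper name simes_combine is hypothetical and is not part of PyRanges; the sketch assumes the library applies this standard formula within each group.

```python
import numpy as np


def simes_combine(pvalues):
    """Textbook Simes combination: min over i of n * p_(i) / i.

    Hypothetical helper for illustration only; not a PyRanges function.
    Assumes at least one p-value is given.
    """
    # Sort ascending so that p[0] is the smallest p-value, p_(1).
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    ranks = np.arange(1, n + 1)  # 1-based ranks 1..n
    # Scale each sorted p-value by n / rank and keep the smallest result.
    return float(np.min(n * p / ranks))


combined = simes_combine([0.01, 0.04, 0.03, 0.20])
print(combined)
```

With four p-values, the smallest one is multiplied by 4, the second smallest by 4/2, and so on, so the combined value can never be smaller than the smallest input p-value.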

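import numpy as np
import pyranges as pr
# Build a PyRanges of random intervals to experiment with
gr = pr.random()
# Attach a uniformly random p-value to every interval
gr.P = np.random.random(len(gr))
# Assign every interval to one of 20 clusters (labelled 0-19)
gr.Cluster = np.random.randint(20, size=len(gr))
print(gr)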
## +--------------+-----------+-----------+--------------+-------+
## | Chromosome   | Start     | End       | Strand       | +2    |
## | (category)   | (int32)   | (int32)   | (category)   | ...   |
## |--------------+-----------+-----------+--------------+-------|
## | chr1         | 27293486  | 27293586  | +            | ...   |
## | chr1         | 127383507 | 127383607 | +            | ...   |
## | chr1         | 225512616 | 225512716 | +            | ...   |
## | chr1         | 176131037 | 176131137 | +            | ...   |
## | ...          | ...       | ...       | ...          | ...   |
## | chrY         | 38491895  | 38491995  | -            | ...   |
## | chrY         | 26337904  | 26338004  | -            | ...   |
## | chrY         | 40786076  | 40786176  | -            | ...   |
## | chrY         | 26500663  | 26500763  | -            | ...   |
## +--------------+-----------+-----------+--------------+-------+
## Stranded PyRanges object has 1,000 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 2 hidden columns: P, Cluster
# Combine the p-values within each cluster: one row per Cluster
print(pr.stats.simes(gr.df, "Cluster", "P"))
##     Cluster     Simes
## 0         0  0.294263
## 1         1  0.194612
## 2         2  0.138653
## 3         3  0.242226
## 4         4  0.547232
## 5         5  0.831406
## 6         6  0.474479
## 7         7  0.862408
## 8         8  0.551776
## 9         9  0.114838
## 10       10  0.426132
## 11       11  0.055800
## 12       12  0.540235
## 13       13  0.880630
## 14       14  0.492563
## 15       15  0.276158
## 16       16  0.266277
## 17       17  0.667854
## 18       18  0.916523
## 19       19  0.102858
# Group on several columns: one row per (Cluster, Strand) pair
print(pr.stats.simes(gr.df, ["Cluster", "Strand"], "P"))
##     Cluster Strand     Simes
## 0         0      +  0.815705
## 1         0      -  0.122011
## 2         1      +  0.647093
## 3         1      -  0.111207
## 4         2      +  0.195375
## 5         2      -  0.043096
## 6         3      +  0.998096
## 7         3      -  0.159766
## 8         4      +  0.539287
## 9         4      -  0.442816
## 10        5      +  0.875164
## 11        5      -  0.423856
## 12        6      +  0.468396
## 13        6      -  0.414288
## 14        7      +  0.934276
## 15        7      -  0.688575
## 16        8      +  0.425527
## 17        8      -  0.523601
## 18        9      +  0.955002
## 19        9      -  0.061247
## 20       10      +  0.213066
## 21       10      -  0.759273
## 22       11      +  0.023644
## 23       11      -  0.211788
## 24       12      +  0.598323
## 25       12      -  0.401967
## 26       13      +  0.607969
## 27       13      -  0.617153
## 28       14      +  0.468841
## 29       14      -  0.321674
## 30       15      +  0.155036
## 31       15      -  0.345685
## 32       16      +  0.313127
## 33       16      -  0.211012
## 34       17      +  0.444700
## 35       17      -  0.594878
## 36       18      +  0.911100
## 37       18      -  0.776099
## 38       19      +  0.370946
## 39       19      -  0.051429
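The two calls above differ only in the grouping argument, which can be a single column name or a list of names. The following sketch reproduces that grouping behaviour with plain pandas on a small hand-made table, so the result can be checked by eye. The helper simes_by_group and the toy data are hypothetical; the sketch assumes the textbook Simes formula (minimum of n * p_(i) / i over the sorted p-values of each group) and is not the library's own implementation.

```python
import numpy as np
import pandas as pd


def simes_by_group(df, groupby, pcol):
    """Hypothetical helper: one textbook Simes p-value per group."""

    def combine(p):
        # Sort the group's p-values and apply min_i(n * p_(i) / i).
        p = np.sort(p.to_numpy(dtype=float))
        return float(np.min(p.size * p / np.arange(1, p.size + 1)))

    # groupby may be one column name or a list of names;
    # observed=True skips unused category combinations.
    out = df.groupby(groupby, observed=True)[pcol].apply(combine)
    return out.rename("Simes").reset_index()


# Toy data, made up for illustration.
toy = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1],
    "Strand": ["+", "+", "-", "+", "-"],
    "P": [0.01, 0.04, 0.30, 0.50, 0.02],
})

by_cluster = simes_by_group(toy, "Cluster", "P")
by_cluster_strand = simes_by_group(toy, ["Cluster", "Strand"], "P")
print(by_cluster)
print(by_cluster_strand)
```

Grouping on more columns produces smaller groups, so each combined value is based on fewer p-values; a group with a single row simply returns that row's p-value.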