8 Joining Ranges
The join method combines every pair of overlapping intervals from two PyRanges objects into a single row. Columns from the second object get a suffix so that their names stay distinct; if you do not pass a suffix, the default _b is used.
import pyranges as pr
gr = pr.load_dataset("aorta")
gr2 = pr.load_dataset("aorta2")
print(gr.join(gr2, suffix="_2"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand | Start_2 | End_2 | Name_2 | Score_2 | Strand_2 |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 9916 | 10115 | H3K27me3 | 5 | - | 10073 | 10272 | Input | 1 | + |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | + | 10073 | 10272 | Input | 1 | + |
## | chr1 | 9951 | 10150 | H3K27me3 | 8 | - | 10073 | 10272 | Input | 1 | + |
## | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
## | chr1 | 10246 | 10445 | H3K27me3 | 4 | + | 10079 | 10278 | Input | 1 | - |
## | chr1 | 10246 | 10445 | H3K27me3 | 4 | + | 10082 | 10281 | Input | 1 | - |
## | chr1 | 10246 | 10445 | H3K27me3 | 4 | + | 10149 | 10348 | Input | 1 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 49 sequences from 1 chromosomes.
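To make the pairing rule concrete, here is a minimal pure-Python sketch of an overlap join. It is not how PyRanges implements join; the helper name overlap_join is hypothetical, the intervals are assumed to be half-open, and the toy rows borrow two coordinate pairs from the output above plus one made-up non-overlapping interval.

```python
# Hypothetical helper, not part of PyRanges: a naive O(n*m) overlap join.
def overlap_join(rows_a, rows_b, suffix="_b"):
    """Pair every row in rows_a with every overlapping row in rows_b."""
    joined = []
    for a in rows_a:
        for b in rows_b:
            if a["Chromosome"] != b["Chromosome"]:
                continue
            # Assumption: half-open intervals, which overlap when each
            # one starts before the other one ends.
            if a["Start"] < b["End"] and b["Start"] < a["End"]:
                row = dict(a)
                # Columns from the second table get the suffix;
                # the shared Chromosome column is kept only once.
                for key, value in b.items():
                    if key != "Chromosome":
                        row[key + suffix] = value
                joined.append(row)
    return joined


a_rows = [
    {"Chromosome": "chr1", "Start": 9916, "End": 10115, "Strand": "-"},
    {"Chromosome": "chr1", "Start": 10246, "End": 10445, "Strand": "+"},
]
b_rows = [
    {"Chromosome": "chr1", "Start": 10073, "End": 10272, "Strand": "+"},
    # Made-up interval that overlaps nothing in a_rows.
    {"Chromosome": "chr1", "Start": 10500, "End": 10600, "Strand": "-"},
]

result = overlap_join(a_rows, b_rows, suffix="_2")
for row in result:
    print(row)
```

Each row of a_rows can appear several times in the result, once per overlapping partner, which is why the join above repeats the same H3K27me3 interval on consecutive rows.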
The join method also takes a strandedness option, which can be "same", "opposite", or False/None to ignore strand.
print(gr.join(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand | Start_b | End_b | Name_b | Score_b | Strand_b |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | + | 9988 | 10187 | Input | 1 | - |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | + | 10079 | 10278 | Input | 1 | - |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | + | 10082 | 10281 | Input | 1 | - |
## | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
## | chr1 | 10127 | 10326 | H3K27me3 | 1 | - | 10280 | 10479 | Input | 1 | + |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | - | 10073 | 10272 | Input | 1 | + |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | - | 10280 | 10479 | Input | 1 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 22 sequences from 1 chromosomes.
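The strandedness option can be read as an extra filter applied to each candidate pair. The sketch below illustrates that reading in plain Python. The function strand_ok is hypothetical and not a PyRanges call, and it assumes every strand is either "+" or "-".

```python
# Hypothetical helper, not part of PyRanges.
def strand_ok(strand_a, strand_b, strandedness=None):
    """Return True if a pair of strands passes the strandedness filter."""
    if strandedness == "same":
        return strand_a == strand_b
    if strandedness == "opposite":
        # Assumption: strands are only "+" or "-", so "different" means "opposite".
        return strand_a != strand_b
    # False or None: strand is ignored.
    return True


# (Strand, Strand_b) combinations of the kind seen in the outputs above.
pairs = [("-", "+"), ("+", "+"), ("+", "-")]

same = [p for p in pairs if strand_ok(*p, strandedness="same")]
opposite = [p for p in pairs if strand_ok(*p, strandedness="opposite")]
ignored = [p for p in pairs if strand_ok(*p, strandedness=None)]

print(same)
print(opposite)
print(ignored)
```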
The join method also takes a how argument, which currently accepts the option "containment". It requires that each interval in self lie completely within an interval in other.
print(f2.join(f1, how="containment"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand | Start_b | End_b | Name_b | Score_b | Strand_b |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 6 | 7 | b | 0 | - | 5 | 7 | interval2 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
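Containment is a stricter test than overlap. The following sketch contrasts the two conditions in plain Python. The helper names are hypothetical, the intervals are assumed to be half-open, and the (3, 6) interval is a made-up example. Only the (6, 7) and (5, 7) pair comes from the output above.

```python
# Hypothetical helpers, not part of PyRanges.
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals share at least one position."""
    return start_a < end_b and start_b < end_a


def contained_in(start_a, end_a, start_b, end_b):
    """True if interval a lies completely within interval b."""
    return start_b <= start_a and end_a <= end_b


# The pair reported above: self is (6, 7), other is (5, 7).
print(overlaps(6, 7, 5, 7), contained_in(6, 7, 5, 7))

# A made-up pair that overlaps without containment: (3, 6) against (5, 7).
print(overlaps(3, 6, 5, 7), contained_in(3, 6, 5, 7))
```

Note that the test is not symmetric: (6, 7) is contained in (5, 7), but (5, 7) is not contained in (6, 7), so swapping self and other can change the result.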
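The join method also takes the argument new_pos, which can be either "union" or "intersection". With new_pos set, Start and End hold the combined coordinates and the original columns of both objects get suffixes. The default suffixes are ["_a", "_b"], but a suffixes argument overrides this.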
print(f2.join(f1, new_pos="intersection"))
## +--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Start_a | End_a | Name_a | Score_a | Strand | Start_b | End_b | Name_b | Score_b | Strand_b |
## | (category) | (int64) | (int64) | (int64) | (int64) | (object) | (int64) | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 6 | 7 | 6 | 7 | b | 0 | - | 5 | 7 | interval2 | 0 | - |
## +--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
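The new coordinates can be understood as simple min/max arithmetic on the two original intervals. The sketch below shows that arithmetic under this assumption. The function new_position is hypothetical and is not the PyRanges implementation.

```python
# Hypothetical helper, not part of PyRanges.
def new_position(start_a, end_a, start_b, end_b, new_pos):
    """Combine two overlapping intervals into one (Start, End) pair."""
    if new_pos == "intersection":
        # The region covered by both intervals.
        return max(start_a, start_b), min(end_a, end_b)
    if new_pos == "union":
        # The smallest region covering both intervals.
        return min(start_a, start_b), max(end_a, end_b)
    raise ValueError(f"unknown new_pos: {new_pos!r}")


# The pair from the output above: a is (6, 7), b is (5, 7).
intersection = new_position(6, 7, 5, 7, "intersection")
union = new_position(6, 7, 5, 7, "union")

print(intersection)
print(union)
```

For this pair the intersection is (6, 7), matching the Start and End columns printed above, while the union would span (5, 7).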