8 Joining Ranges

The join method combines each pair of overlapping intervals from two PyRanges objects into a single row. Columns from the second object get a suffix to tell them apart from the first; if you do not pass a suffix, the default _b is used.

import pyranges as pr
gr = pr.load_dataset("aorta")
gr2 = pr.load_dataset("aorta2")
print(gr.join(gr2, suffix="_2"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       | Start_2   | End_2     | Name_2     | Score_2   | Strand_2     |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 9916      | 10115     | H3K27me3   | 5         | -            | 10073     | 10272     | Input      | 1         | +            |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | +            | 10073     | 10272     | Input      | 1         | +            |
## | chr1         | 9951      | 10150     | H3K27me3   | 8         | -            | 10073     | 10272     | Input      | 1         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr1         | 10246     | 10445     | H3K27me3   | 4         | +            | 10079     | 10278     | Input      | 1         | -            |
## | chr1         | 10246     | 10445     | H3K27me3   | 4         | +            | 10082     | 10281     | Input      | 1         | -            |
## | chr1         | 10246     | 10445     | H3K27me3   | 4         | +            | 10149     | 10348     | Input      | 1         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 49 sequences from 1 chromosomes.
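
To make the pairing rule concrete, here is a minimal pure-Python sketch of an overlap join. It is an illustration only, not how pyranges implements join; the helper names overlaps and join_rows are made up for this example, intervals are assumed to be half-open, and the last row of right is hypothetical.

```python
# Illustrative sketch only; not the pyranges implementation.
def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a["Start"] < b["End"] and b["Start"] < a["End"]

def join_rows(left, right, suffix="_b"):
    """Pair every row in left with every overlapping row in right."""
    rows = []
    for a in left:
        for b in right:
            if a["Chromosome"] == b["Chromosome"] and overlaps(a, b):
                merged = dict(a)
                # Columns from the second table get the suffix;
                # the shared Chromosome column is kept once.
                merged.update({k + suffix: v for k, v in b.items()
                               if k != "Chromosome"})
                rows.append(merged)
    return rows

left = [
    {"Chromosome": "chr1", "Start": 9916, "End": 10115, "Strand": "-"},
    {"Chromosome": "chr1", "Start": 10246, "End": 10445, "Strand": "+"},
]
right = [
    {"Chromosome": "chr1", "Start": 10073, "End": 10272, "Strand": "+"},
    {"Chromosome": "chr1", "Start": 10500, "End": 10600, "Strand": "-"},  # hypothetical, overlaps nothing
]

joined = join_rows(left, right, suffix="_2")
for row in joined:
    print(row)
```

Both rows of left overlap the first row of right, so the result has two rows, each carrying Start_2, End_2 and Strand_2 columns. The nested loop is quadratic and only meant to show the semantics.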

The join method also takes a strandedness option, which can be "same", "opposite" or False/None.

print(gr.join(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       | Start_b   | End_b     | Name_b     | Score_b   | Strand_b     |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | +            | 9988      | 10187     | Input      | 1         | -            |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | +            | 10079     | 10278     | Input      | 1         | -            |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | +            | 10082     | 10281     | Input      | 1         | -            |
## | ...          | ...       | ...       | ...        | ...       | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr1         | 10127     | 10326     | H3K27me3   | 1         | -            | 10280     | 10479     | Input      | 1         | +            |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | -            | 10073     | 10272     | Input      | 1         | +            |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | -            | 10280     | 10479     | Input      | 1         | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 22 sequences from 1 chromosomes.
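
The effect of strandedness can be pictured as an extra filter applied to each overlapping pair. The sketch below is a hedged illustration, not pyranges code; strand_ok is a hypothetical helper, and it assumes every strand is either "+" or "-".

```python
# Illustrative sketch only; pyranges' own strand handling may differ in detail.
def strand_ok(strand_a, strand_b, strandedness=None):
    """Decide whether a pair of strands passes the strandedness filter."""
    if strandedness == "same":
        return strand_a == strand_b
    if strandedness == "opposite":
        # Assumes every strand is either "+" or "-".
        return strand_a != strand_b
    # Any other value (False or None here): strand is ignored.
    return True

# Strand pairs in the style of the table rows above.
pairs = [("+", "-"), ("-", "+"), ("+", "+")]
kept = [p for p in pairs if strand_ok(*p, strandedness="opposite")]
print(kept)  # [('+', '-'), ('-', '+')]
```

With "opposite", only pairs on different strands survive, which matches the Strand and Strand_b columns in the output above.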

The join method also takes a how argument. It currently accepts the option "containment", which requires that each interval in self lie completely within an interval in other.

print(f2.join(f1, how="containment"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |   Start_b |     End_b | Name_b     |   Score_b | Strand_b     |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         6 |         7 | b          |         0 | -            |         5 |         7 | interval2  |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
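
As a rough sketch of what "completely within" means, the check below compares both endpoints. It is an assumption-laden illustration rather than the library's code: contained_in is a hypothetical helper, and treating a shared start coordinate as contained is an assumption (the table above only shows that a shared end is accepted).

```python
# Illustrative sketch only, assuming half-open (Start, End) intervals.
def contained_in(inner, outer):
    """True if inner lies completely within outer; shared endpoints are allowed."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

b = (6, 7)          # Start, End of the row named "b" in the table above
interval2 = (5, 7)  # Start_b, End_b of the matching row
partial = (4, 6)    # hypothetical interval that overlaps interval2 but sticks out

print(contained_in(b, interval2))        # True
print(contained_in(partial, interval2))  # False
```

An ordinary join would pair partial with interval2 because they overlap; a containment join would drop that pair.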

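The join method also takes the argument new_pos, which can be either "union" or "intersection". In the result, Start and End hold the new coordinates, and the original coordinates of both objects are kept in suffixed columns. The default suffixes are ["_a", "_b"], but a suffixes argument overrides this.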

print(f2.join(f1, new_pos="intersection"))
## +--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End |   Start_a |     End_a | Name_a     |   Score_a | Strand       |   Start_b |     End_b | Name_b     |   Score_b | Strand_b     |
## | (category)   |   (int64) |   (int64) |   (int64) |   (int64) | (object)   |   (int64) | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         6 |         7 |         6 |         7 | b          |         0 | -            |         5 |         7 | interval2  |         0 | -            |
## +--------------+-----------+-----------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
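
The two new_pos options can be read as simple min/max arithmetic on the paired coordinates. The following is a minimal sketch under that reading, not the pyranges implementation; new_position is a hypothetical helper name.

```python
# Illustrative sketch only; not the pyranges implementation.
def new_position(a, b, new_pos):
    """Compute new (Start, End) for an overlapping pair of intervals."""
    if new_pos == "union":
        # Smallest span that covers both intervals.
        return min(a[0], b[0]), max(a[1], b[1])
    if new_pos == "intersection":
        # Only the part shared by both intervals.
        return max(a[0], b[0]), min(a[1], b[1])
    raise ValueError("new_pos must be 'union' or 'intersection'")

a = (6, 7)  # Start_a, End_a from the table above
b = (5, 7)  # Start_b, End_b from the table above

print(new_position(a, b, "intersection"))  # (6, 7)
print(new_position(a, b, "union"))         # (5, 7)
```

The intersection result (6, 7) agrees with the Start and End columns in the output above.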