6 Intersecting Ranges
PyRanges objects can be intersected with other PyRanges to find the subset of the genome that is contained in both. The regular intersection method computes the intersection for every combination of overlapping ranges from the two objects:
import pyranges as pr
gr = pr.load_dataset("aorta")
gr2 = pr.load_dataset("aorta2")
print(gr.intersection(gr2))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 10073 | 10115 | H3K27me3 | 5 | - |
## | chr1 | 10073 | 10138 | H3K27me3 | 7 | + |
## | chr1 | 10073 | 10150 | H3K27me3 | 8 | - |
## | ... | ... | ... | ... | ... | ... |
## | chr1 | 10246 | 10278 | H3K27me3 | 4 | + |
## | chr1 | 10246 | 10281 | H3K27me3 | 4 | + |
## | chr1 | 10246 | 10348 | H3K27me3 | 4 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 49 sequences from 1 chromosomes.
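To make "all combinations" concrete, here is a minimal plain-Python sketch of a pairwise intersection. It is an illustration of the idea only, not how PyRanges implements it; the function name `pairwise_intersection`, the toy intervals, and the half-open `(start, end)` convention are assumptions made for this example.

```python
from typing import List, Tuple

# (start, end), assumed half-open: the interval covers start <= x < end.
Interval = Tuple[int, int]


def pairwise_intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping part of every (a, b) pair that overlaps."""
    out = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                out.append((start, end))
    return out


# Hypothetical toy intervals, not taken from the aorta datasets.
a = [(1, 10), (5, 12)]
b = [(3, 8), (9, 20)]
pairs = pairwise_intersection(a, b)
print(pairs)  # [(3, 8), (9, 10), (5, 8), (9, 12)]
```

Each overlapping pair contributes its own row, which is why the result can contain several rows that share a start or end coordinate.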
The set_intersection method first clusters the intervals in each PyRanges (i.e. merges overlapping intervals into one) and then finds the intersection:
print(gr.set_intersection(gr2))
## +--------------+-----------+-----------+
## | Chromosome | Start | End |
## | (category) | (int64) | (int64) |
## |--------------+-----------+-----------|
## | chr1 | 9988 | 10445 |
## +--------------+-----------+-----------+
## PyRanges object has 1 sequences from 1 chromosomes.
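The merge-then-intersect behaviour can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's code: the helper names `merge` and `set_intersection_sketch` are hypothetical, intervals are assumed half-open, and touching intervals are merged together with overlapping ones.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into single intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def set_intersection_sketch(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Merge each input, then sweep both sorted lists for shared regions."""
    a_m, b_m = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a_m) and j < len(b_m):
        start = max(a_m[i][0], b_m[j][0])
        end = min(a_m[i][1], b_m[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a_m[i][1] < b_m[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical toy intervals, not taken from the aorta datasets.
a = [(1, 10), (5, 12)]
b = [(3, 8), (9, 20)]
shared = set_intersection_sketch(a, b)
print(shared)  # [(3, 8), (9, 12)]
```

Because the inputs are merged first, every shared region appears exactly once, and the result carries only coordinates rather than per-interval metadata.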
Both methods also take a strandedness option, which can be "same", "opposite", or False/None:
print(gr.set_intersection(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+--------------+
## | Chromosome | Start | End | Strand |
## | (category) | (int64) | (int64) | (category) |
## |--------------+-----------+-----------+--------------|
## | chr1 | 9988 | 10223 | + |
## | chr1 | 10246 | 10348 | + |
## | chr1 | 10073 | 10272 | - |
## | chr1 | 10280 | 10440 | - |
## +--------------+-----------+-----------+--------------+
## PyRanges object has 4 sequences from 1 chromosomes.
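As a rough illustration of what the strandedness option controls, the plain-Python sketch below filters interval pairs by strand before intersecting them. It is not PyRanges' implementation: the names `strands_match` and `stranded_intersection`, the toy data, and the choice to report the strand of the first argument are all assumptions made for this example.

```python
from typing import List, Optional, Tuple, Union

StrandedInterval = Tuple[int, int, str]  # (start, end, strand), half-open


def strands_match(s_a: str, s_b: str, strandedness: Union[str, bool, None]) -> bool:
    """Decide whether two strands are compatible under the given mode."""
    if not strandedness:  # False or None: strand is ignored
        return True
    if strandedness == "same":
        return s_a == s_b
    if strandedness == "opposite":
        return {s_a, s_b} == {"+", "-"}
    raise ValueError(f"unknown strandedness: {strandedness!r}")


def stranded_intersection(
    a: List[StrandedInterval],
    b: List[StrandedInterval],
    strandedness: Optional[Union[str, bool]] = None,
) -> List[StrandedInterval]:
    """Pairwise intersection restricted to strand-compatible pairs."""
    out = []
    for a_start, a_end, a_strand in a:
        for b_start, b_end, b_strand in b:
            if not strands_match(a_strand, b_strand, strandedness):
                continue
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                out.append((start, end, a_strand))  # assumed: keep strand of `a`
    return out


# Hypothetical toy intervals, not taken from the aorta datasets.
a = [(1, 10, "+"), (5, 12, "-")]
b = [(3, 8, "+"), (9, 20, "-")]
same = stranded_intersection(a, b, "same")
opposite = stranded_intersection(a, b, "opposite")
unstranded = stranded_intersection(a, b, None)
print(same)      # [(3, 8, '+'), (9, 12, '-')]
print(opposite)  # [(9, 10, '+'), (5, 8, '-')]
```

With strandedness set to False or None, all four overlapping pairs in this toy data are reported.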
The intersection method also takes a how argument, which currently accepts the option "containment", which requires that the intervals in self be completely within the intervals in other.
f1 = pr.load_dataset("f1")
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 3 | 6 | interval1 | 0 | + |
## | chr1 | 5 | 7 | interval2 | 0 | - |
## | chr1 | 8 | 9 | interval3 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
f2 = pr.load_dataset("f2")
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 1 | 2 | a | 0 | + |
## | chr1 | 6 | 7 | b | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 2 sequences from 1 chromosomes.
result = f2.intersection(f1, how="containment")
print(result)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 6 | 7 | b | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
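The containment rule can be checked by hand with a small plain-Python sketch that reuses the coordinates of f1 and f2 shown above. It illustrates the rule only and ignores strand, names, and scores; the function name `contained_in` and the tuple representation are assumptions made for this example, not part of PyRanges.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end)


def contained_in(self_ivs: List[Interval], other_ivs: List[Interval]) -> List[Interval]:
    """Keep intervals of `self_ivs` that lie entirely inside some interval of `other_ivs`."""
    kept = []
    for s_start, s_end in self_ivs:
        if any(o_start <= s_start and s_end <= o_end for o_start, o_end in other_ivs):
            kept.append((s_start, s_end))
    return kept


# Start/End coordinates copied from the f1 and f2 tables above.
f1_coords = [(3, 6), (5, 7), (8, 9)]
f2_coords = [(1, 2), (6, 7)]

contained = contained_in(f2_coords, f1_coords)
print(contained)  # [(6, 7)]
```

Only the f2 interval 6-7 lies fully inside an f1 interval (5-7), which agrees with the single row in the result above. The interval 1-2 overlaps nothing in f1 and is dropped.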