6 Intersecting Ranges

PyRanges objects can be intersected with other PyRanges objects to find the parts of the genome that are covered by both. The regular intersection method finds the intersection of all combinations of overlapping ranges [1]:

import pyranges as pr
# Load two example datasets bundled with PyRanges
gr = pr.load_dataset("aorta")
gr2 = pr.load_dataset("aorta2")
print(gr.intersection(gr2))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 10073     | 10115     | H3K27me3   | 5         | -            |
## | chr1         | 10073     | 10138     | H3K27me3   | 7         | +            |
## | chr1         | 10073     | 10150     | H3K27me3   | 8         | -            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr1         | 10246     | 10278     | H3K27me3   | 4         | +            |
## | chr1         | 10246     | 10281     | H3K27me3   | 4         | +            |
## | chr1         | 10246     | 10348     | H3K27me3   | 4         | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 49 sequences from 1 chromosomes.
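
To make "all combinations of ranges" concrete, here is a minimal pure-Python sketch of a pairwise intersection. It is an illustration only, not how PyRanges implements the method: the function name pairwise_intersection and the toy intervals are made up for this example, and intervals are assumed to be half-open (start, end) tuples on a single chromosome.

```python
# Illustrative sketch only; this is not the PyRanges implementation.
# Intervals are (start, end) tuples, treated as half-open: [start, end).

def pairwise_intersection(self_ivs, other_ivs):
    """Return the overlap of every overlapping (self, other) pair."""
    result = []
    for s_start, s_end in self_ivs:
        for o_start, o_end in other_ivs:
            start = max(s_start, o_start)
            end = min(s_end, o_end)
            if start < end:  # the pair shares at least one position
                result.append((start, end))
    return result

a = [(1, 10), (20, 30)]
b = [(5, 8), (7, 25)]
overlaps = pairwise_intersection(a, b)
print(overlaps)  # [(5, 8), (7, 10), (20, 25)]
```

Note that the first interval in a yields two output rows, one for each interval in b that it overlaps.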

The set_intersection method first clusters the intervals in each PyRanges (i.e. merges overlapping intervals into one) and then finds the intersection [2]:

print(gr.set_intersection(gr2))
## +--------------+-----------+-----------+
## | Chromosome   |     Start |       End |
## | (category)   |   (int64) |   (int64) |
## |--------------+-----------+-----------|
## | chr1         |      9988 |     10445 |
## +--------------+-----------+-----------+
## PyRanges object has 1 sequences from 1 chromosomes.
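
The merge-then-intersect idea can be sketched in plain Python as follows. Again, this is only a sketch under stated assumptions, not the PyRanges implementation: the helper names merge and set_intersection_sketch are hypothetical, intervals are half-open (start, end) tuples on one chromosome, and the sketch chooses to merge intervals that touch as well as those that overlap.

```python
# Illustrative sketch only; this is not the PyRanges implementation.
# Intervals are (start, end) tuples, treated as half-open: [start, end).

def merge(ivs):
    """Merge overlapping or touching intervals (a choice made for this sketch)."""
    merged = []
    for start, end in sorted(ivs):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def set_intersection_sketch(self_ivs, other_ivs):
    """Merge both inputs, then sweep through them to collect the overlaps."""
    a, b = merge(self_ivs), merge(other_ivs)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

a = [(1, 10), (20, 30)]
b = [(5, 8), (7, 25)]
merged_overlaps = set_intersection_sketch(a, b)
print(merged_overlaps)  # [(5, 10), (20, 25)]
```

Because the two intervals in b are merged into (5, 25) first, each covered region is reported once, rather than once per overlapping pair.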

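Both methods also take a strandedness option, which can be "same", "opposite", or False/None. With "same", only intervals on the same strand are intersected; with "opposite", only intervals on opposite strands; with False/None, strand is ignored: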

print(gr.set_intersection(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+--------------+
## | Chromosome   |     Start |       End | Strand       |
## | (category)   |   (int64) |   (int64) | (category)   |
## |--------------+-----------+-----------+--------------|
## | chr1         |      9988 |     10223 | +            |
## | chr1         |     10246 |     10348 | +            |
## | chr1         |     10073 |     10272 | -            |
## | chr1         |     10280 |     10440 | -            |
## +--------------+-----------+-----------+--------------+
## PyRanges object has 4 sequences from 1 chromosomes.
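
The effect of the strandedness option can be sketched as a filter on which pairs of intervals are allowed to intersect. The code below is a hypothetical illustration, not PyRanges code: the function names, the (start, end, strand) tuple layout, and the decision to report the strand of the interval in self are all assumptions made for this example.

```python
# Illustrative sketch only; this is not the PyRanges implementation.
# Intervals are (start, end, strand) tuples, half-open: [start, end).

def strands_compatible(strand_self, strand_other, strandedness=None):
    """Decide whether two intervals may be intersected given the option."""
    if not strandedness:  # False or None: ignore strand entirely
        return True
    if strandedness == "same":
        return strand_self == strand_other
    if strandedness == "opposite":
        return {strand_self, strand_other} == {"+", "-"}
    raise ValueError(f"unknown strandedness: {strandedness!r}")

def stranded_intersection(self_ivs, other_ivs, strandedness=None):
    """Pairwise intersection restricted to strand-compatible pairs."""
    result = []
    for s_start, s_end, s_strand in self_ivs:
        for o_start, o_end, o_strand in other_ivs:
            if not strands_compatible(s_strand, o_strand, strandedness):
                continue
            start, end = max(s_start, o_start), min(s_end, o_end)
            if start < end:
                # Assumption of this sketch: report the strand of self
                result.append((start, end, s_strand))
    return result

a = [(1, 10, "+"), (20, 30, "-")]
b = [(5, 8, "+"), (7, 25, "-")]
same = stranded_intersection(a, b, strandedness="same")
opposite = stranded_intersection(a, b, strandedness="opposite")
unstranded = stranded_intersection(a, b)
print(same)      # [(5, 8, '+'), (20, 25, '-')]
print(opposite)  # [(7, 10, '+')]
```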

The intersection method also takes a how argument. It currently accepts the option "containment", which keeps only the intervals in self that lie completely within an interval in other:

f1 = pr.load_dataset("f1")
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         3 |         6 | interval1  |         0 | +            |
## | chr1         |         5 |         7 | interval2  |         0 | -            |
## | chr1         |         8 |         9 | interval3  |         0 | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
f2 = pr.load_dataset("f2")
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         1 |         2 | a          |         0 | +            |
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 2 sequences from 1 chromosomes.
# Keep only the f2 intervals that lie completely within an f1 interval
result = f2.intersection(f1, how="containment")
print(result)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 1 sequences from 1 chromosomes.
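
The containment rule can be checked by hand with a small pure-Python sketch that uses the start and end coordinates of f1 and f2 shown above. This is an illustration rather than PyRanges code: the function name contained_in is hypothetical, strand is ignored, and intervals are assumed to be half-open (start, end, name) tuples.

```python
# Illustrative sketch only; this is not the PyRanges implementation.
# Coordinates are copied from the f1 and f2 tables above; strand is ignored.

def contained_in(self_ivs, other_ivs):
    """Keep intervals of self that lie completely within some interval of other."""
    kept = []
    for start, end, name in self_ivs:
        if any(o_start <= start and end <= o_end for o_start, o_end, _ in other_ivs):
            kept.append((start, end, name))
    return kept

f1_ivs = [(3, 6, "interval1"), (5, 7, "interval2"), (8, 9, "interval3")]
f2_ivs = [(1, 2, "a"), (6, 7, "b")]

contained = contained_in(f2_ivs, f1_ivs)
print(contained)  # [(6, 7, 'b')]
```

Interval b (6 to 7) fits inside interval2 (5 to 7) and is kept, while interval a (1 to 2) does not fall inside any interval of f1 and is dropped, which agrees with the result printed above.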

  1. This is the same behavior as bedtools intersect.

  2. This is the same behavior as Bioconductor GenomicRanges intersect.