14 Intersecting Ranges

PyRanges objects can be intersected with other PyRanges to find the subset of the genome that is contained in both. The regular intersect method finds the intersection of all combinations of ranges:

import pyranges as pr
gr = pr.data.aorta()
gr2 = pr.data.aorta2()
print(gr.intersect(gr2))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 9988      | 10138     | H3K27me3   | 7         | +            |
## | chr1         | 10073     | 10138     | H3K27me3   | 7         | +            |
## | chr1         | 10079     | 10138     | H3K27me3   | 7         | +            |
## | chr1         | 10082     | 10138     | H3K27me3   | 7         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr1         | 10241     | 10278     | H3K27me3   | 6         | -            |
## | chr1         | 10241     | 10281     | H3K27me3   | 6         | -            |
## | chr1         | 10241     | 10348     | H3K27me3   | 6         | -            |
## | chr1         | 10280     | 10440     | H3K27me3   | 6         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 49 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
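
To make "all combinations of ranges" concrete, here is a minimal pure-Python sketch of the idea. It is not how PyRanges implements intersect; the function name pairwise_intersect and the toy intervals are hypothetical, and intervals are assumed to be half-open (start, end) tuples on a single chromosome.

```python
def pairwise_intersect(a, b):
    """Return the overlap of every (x, y) pair with x in a and y in b.

    Intervals are assumed to be half-open (start, end) tuples.
    """
    out = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                out.append((start, end))
    return out


a = [(1, 10), (20, 30)]
b = [(5, 8), (7, 25)]
print(pairwise_intersect(a, b))
```

Because every pair is compared, one interval in the first list can contribute several rows to the result, which is why the output above contains many overlapping rows.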

The set_intersect method merges the intervals in each PyRanges before finding the intersection:

print(gr.set_intersect(gr2))
## +--------------+-----------+-----------+
## | Chromosome   |     Start |       End |
## | (category)   |   (int32) |   (int32) |
## |--------------+-----------+-----------|
## | chr1         |      9988 |     10445 |
## +--------------+-----------+-----------+
## Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome.
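
The merge-then-intersect behaviour can be sketched in plain Python as follows. This is an illustration only, not the PyRanges implementation: the helper names merge and set_intersect_sketch are hypothetical, intervals are assumed to be half-open (start, end) tuples on one chromosome, and treating touching intervals as one merged interval is an assumption of this sketch.

```python
def merge(intervals):
    """Collapse overlapping (or touching) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def set_intersect_sketch(a, b):
    """Merge each input first, then intersect the two disjoint lists."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


a = [(1, 10), (5, 15), (20, 30)]
b = [(8, 12), (11, 25)]
print(set_intersect_sketch(a, b))
```

Since both inputs are merged first, the result contains no overlapping rows, and per-interval metadata such as Name and Score no longer has a single source row to come from.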

Both methods also take a strandedness option, which can be "same", "opposite", or False/None:

print(gr.set_intersect(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+--------------+
## | Chromosome   |     Start |       End | Strand       |
## | (category)   |   (int32) |   (int32) | (category)   |
## |--------------+-----------+-----------+--------------|
## | chr1         |      9988 |     10223 | +            |
## | chr1         |     10246 |     10348 | +            |
## | chr1         |     10073 |     10272 | -            |
## | chr1         |     10280 |     10440 | -            |
## +--------------+-----------+-----------+--------------+
## Stranded PyRanges object has 4 rows and 4 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
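
As a rough illustration of how a strandedness option restricts which pairs are compared, consider the following pure-Python sketch. The names strands_compatible and intersect_stranded are hypothetical, the sketch assumes that a falsy value (False or None) means strand is ignored, and it assumes the result keeps the strand of the first operand; none of this is taken from the PyRanges source.

```python
def strands_compatible(strand_a, strand_b, strandedness=None):
    """Decide whether two intervals may be compared at all."""
    if not strandedness:  # assumption: False/None ignores strand
        return True
    if strandedness == "same":
        return strand_a == strand_b
    if strandedness == "opposite":
        return strand_a != strand_b
    raise ValueError(f"unknown strandedness: {strandedness!r}")


def intersect_stranded(a, b, strandedness=None):
    """Pairwise overlap of (start, end, strand) tuples, filtered by strand."""
    out = []
    for a_start, a_end, a_strand in a:
        for b_start, b_end, b_strand in b:
            if not strands_compatible(a_strand, b_strand, strandedness):
                continue
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                # Assumption: report the strand of the first operand.
                out.append((start, end, a_strand))
    return out


a = [(1, 10, "+"), (20, 30, "-")]
b = [(5, 25, "+")]
for option in (None, "same", "opposite"):
    print(option, intersect_stranded(a, b, strandedness=option))
```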

The intersect method also takes a how argument, which currently accepts the options "containment", "first", or "last". "containment" returns only the intervals in self that lie completely within an interval in other, while "first" and "last" return the first and last overlap, respectively.

f1 = pr.data.f1()
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         3 |         6 | interval1  |         0 | +            |
## | chr1         |         8 |         9 | interval3  |         0 | +            |
## | chr1         |         5 |         7 | interval2  |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
f2 = pr.data.f2()
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         1 |         2 | a          |         0 | +            |
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
result = f2.intersect(f1, how="containment")
print(result)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 1 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
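
The containment rule can be checked by hand with a small pure-Python sketch that reuses the coordinates of f1 and f2 printed above. The function name contained is hypothetical, strand is ignored, and the sketch only mirrors the rule as described here rather than the PyRanges implementation.

```python
# (start, end, name) tuples copied from the f1 and f2 tables above.
f1 = [(3, 6, "interval1"), (8, 9, "interval3"), (5, 7, "interval2")]
f2 = [(1, 2, "a"), (6, 7, "b")]


def contained(self_intervals, other_intervals):
    """Keep intervals of self that lie completely within an interval of other."""
    return [
        (start, end, name)
        for start, end, name in self_intervals
        if any(
            o_start <= start and end <= o_end
            for o_start, o_end, _ in other_intervals
        )
    ]


print(contained(f2, f1))  # only "b" (6-7) fits inside interval2 (5-7)
print(contained(f1, f2))  # no interval of f1 fits inside an interval of f2
```

Containment is not symmetric: swapping self and other changes the result, so the order of the two PyRanges in the call matters.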