17 Finding the closest intervals

With the nearest method, you can find, for each interval in self, the closest interval in other. The suffix argument sets the suffix appended to the names of the columns taken from other.

import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
print(gr.nearest(gr2, suffix="_Input"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   | Start     | End       | Name       | Score     | +7    |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 226987592 | 226987617 | U0         | 0         | ...   |
## | chr1         | 1541598   | 1541623   | U0         | 0         | ...   |
## | chr1         | 1599121   | 1599146   | U0         | 0         | ...   |
## | chr1         | 3504032   | 3504057   | U0         | 0         | ...   |
## | ...          | ...       | ...       | ...        | ...       | ...   |
## | chrY         | 21751211  | 21751236  | U0         | 0         | ...   |
## | chrY         | 21910706  | 21910731  | U0         | 0         | ...   |
## | chrY         | 22054002  | 22054027  | U0         | 0         | ...   |
## | chrY         | 22210637  | 22210662  | U0         | 0         | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_Input, End_Input, Name_Input, Score_Input, Strand_Input, ... (+ 1 more.)
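
To make the idea of "nearest" concrete, here is a minimal pure-Python sketch of the search on a single chromosome. It is not how PyRanges implements the method. The names `gap` and `nearest_interval` are hypothetical, intervals are assumed to be half-open `(start, end)` tuples, and the distance convention used here (number of bases between the two intervals, 0 when they overlap or touch) is an assumption that may differ from the values PyRanges reports in its Distance column.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumption of this sketch


def gap(a: Interval, b: Interval) -> int:
    """Bases separating a and b; 0 if they overlap or touch."""
    return max(0, a[0] - b[1], b[0] - a[1])


def nearest_interval(
    query: Interval, candidates: List[Interval]
) -> Optional[Tuple[Interval, int]]:
    """Return the candidate closest to query and its gap, or None if there are none."""
    if not candidates:
        return None
    # Brute force: compare the query against every candidate.
    best = min(candidates, key=lambda c: gap(query, c))
    return best, gap(query, best)


self_intervals = [(1, 2), (10, 12)]
other_intervals = [(3, 6), (8, 9), (5, 7)]

for q in self_intervals:
    print(q, "->", nearest_interval(q, other_intervals))
```

A brute-force scan like this is quadratic in the number of intervals, which is fine for illustration but not for genome-scale data.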

The nearest method takes a strandedness option, which can be "same", "opposite", or False/None (strand is ignored).

print(gr.nearest(gr2, suffix="_Input", strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   | Start     | End       | Name       | Score     | +7    |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 226987592 | 226987617 | U0         | 0         | ...   |
## | chr1         | 1541598   | 1541623   | U0         | 0         | ...   |
## | chr1         | 1599121   | 1599146   | U0         | 0         | ...   |
## | chr1         | 3504032   | 3504057   | U0         | 0         | ...   |
## | ...          | ...       | ...       | ...        | ...       | ...   |
## | chrY         | 21751211  | 21751236  | U0         | 0         | ...   |
## | chrY         | 21910706  | 21910731  | U0         | 0         | ...   |
## | chrY         | 22054002  | 22054027  | U0         | 0         | ...   |
## | chrY         | 22210637  | 22210662  | U0         | 0         | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_Input, End_Input, Name_Input, Score_Input, Strand_Input, ... (+ 1 more.)
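
The effect of the strandedness option can be thought of as a filter on the candidate intervals before the nearest one is chosen. The sketch below illustrates that reading in plain Python; the helper names `strand_matches` and `nearest_stranded` are hypothetical, features are assumed to be `(start, end, strand)` tuples with half-open coordinates, and nothing here is taken from the PyRanges source.

```python
from typing import List, Optional, Tuple

Feature = Tuple[int, int, str]  # (start, end, strand); layout assumed for this sketch

OPPOSITE = {"+": "-", "-": "+"}


def strand_matches(query_strand: str, other_strand: str, strandedness) -> bool:
    """Decide whether a candidate's strand is allowed for the given option."""
    if strandedness == "same":
        return query_strand == other_strand
    if strandedness == "opposite":
        return OPPOSITE.get(query_strand) == other_strand
    return True  # False or None: strand is ignored


def gap(a: Feature, b: Feature) -> int:
    """Bases separating a and b; 0 if they overlap or touch."""
    return max(0, a[0] - b[1], b[0] - a[1])


def nearest_stranded(
    query: Feature, candidates: List[Feature], strandedness=None
) -> Optional[Feature]:
    """Closest candidate whose strand satisfies the strandedness option."""
    allowed = [c for c in candidates if strand_matches(query[2], c[2], strandedness)]
    if not allowed:
        return None
    return min(allowed, key=lambda c: gap(query, c))


features = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
query = (1, 2, "+")

for option in ("same", "opposite", None):
    print(option, "->", nearest_stranded(query, features, option))
```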

The nearest method also takes two further arguments, how and overlap. how can take the values None, "upstream", "downstream", "next", and "previous". "upstream" and "downstream" are always relative to the PyRanges the method is called on. "next" (to the right) and "previous" (to the left) are most useful for implementing your own custom nearest methods. The default is None, which means that PyRanges looks in both directions. The overlap argument is a bool that indicates whether overlapping intervals should be included.

f1 = pr.data.f1()
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         3 |         6 | interval1  |         0 | +            |
## | chr1         |         8 |         9 | interval3  |         0 | +            |
## | chr1         |         5 |         7 | interval2  |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
f2 = pr.data.f2()
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         1 |         2 | a          |         0 | +            |
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(f2.nearest(f1, strandedness="opposite", how="next"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +7    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         1 |         2 | a          |         0 | ...   |
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 2 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
print(f2.nearest(f1, how="upstream"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +7    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
print(f2.nearest(f1, strandedness="opposite", how="next", overlap=False))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +7    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         1 |         2 | a          |         0 | ...   |
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 2 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
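
The interplay of how and overlap described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PyRanges implementation: the function names are hypothetical, features are half-open `(start, end, strand)` tuples, "upstream" is assumed to mean "previous" for a + strand query and "next" for a - strand query (and the reverse for "downstream"), and the distance is the number of bases between the intervals, which may differ from the Distance column PyRanges reports.

```python
from typing import List, Optional, Tuple

Feature = Tuple[int, int, str]  # (start, end, strand), half-open coordinates


def overlaps(a: Feature, b: Feature) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def resolve_direction(how: Optional[str], strand: str) -> Optional[str]:
    """Translate upstream/downstream into next/previous for the query's strand."""
    if how == "upstream":
        return "previous" if strand == "+" else "next"
    if how == "downstream":
        return "next" if strand == "+" else "previous"
    return how  # "next", "previous" or None (both directions)


def nearest_directional(
    query: Feature,
    candidates: List[Feature],
    how: Optional[str] = None,
    overlap: bool = True,
) -> Optional[Tuple[Feature, int]]:
    # With overlap=True, an overlapping candidate wins with distance 0.
    if overlap:
        hits = [c for c in candidates if overlaps(query, c)]
        if hits:
            return hits[0], 0
    direction = resolve_direction(how, query[2])
    scored = []
    for c in candidates:
        if overlaps(query, c):
            continue  # only reached when overlap=False
        if c[0] >= query[1]:  # candidate lies entirely to the right
            side, dist = "next", c[0] - query[1]
        else:  # candidate lies entirely to the left
            side, dist = "previous", query[0] - c[1]
        if direction in (None, side):
            scored.append((dist, c))
    if not scored:
        return None
    dist, best = min(scored, key=lambda item: item[0])
    return best, dist


f1 = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
a, b = (1, 2, "+"), (6, 7, "-")

print(nearest_directional(b, f1, how="next"))
print(nearest_directional(b, f1, how="next", overlap=False))
print(nearest_directional(a, f1, how="upstream"))
```

In this sketch the + strand interval a has nothing to its left, so an upstream search returns None for it, which agrees with the single-row result of `f2.nearest(f1, how="upstream")` shown above.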