17 Finding the closest intervals
With the nearest method, you can search, for each interval in self, for the interval in other that is nearest to it.
import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
print(gr.nearest(gr2, suffix="_Input"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987592 | 226987617 | U0 | 0 | ... |
## | chr1 | 1541598 | 1541623 | U0 | 0 | ... |
## | chr1 | 1599121 | 1599146 | U0 | 0 | ... |
## | chr1 | 3504032 | 3504057 | U0 | 0 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 21751211 | 21751236 | U0 | 0 | ... |
## | chrY | 21910706 | 21910731 | U0 | 0 | ... |
## | chrY | 22054002 | 22054027 | U0 | 0 | ... |
## | chrY | 22210637 | 22210662 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_Input, End_Input, Name_Input, Score_Input, Strand_Input, ... (+ 1 more.)
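Conceptually, nearest looks in both directions from each interval in self and keeps the closest interval in other. The sketch below illustrates only that idea in plain Python. It uses hypothetical (start, end) pairs in half-open coordinates and ignores chromosomes and strands. Its gap convention (0 for overlapping or touching intervals) is an assumption and may not match the Distance column PyRanges reports.

```python
from typing import List, Optional, Tuple

# Hypothetical toy representation: an interval is a (start, end) pair
# in half-open coordinates, i.e. end is exclusive.
Interval = Tuple[int, int]


def gap(a: Interval, b: Interval) -> int:
    """Number of bases between a and b; 0 if they overlap or touch.

    Assumption: this toy convention may differ from PyRanges' Distance column.
    """
    a_start, a_end = a
    b_start, b_end = b
    if b_start >= a_end:      # b lies entirely to the right of a
        return b_start - a_end
    if a_start >= b_end:      # b lies entirely to the left of a
        return a_start - b_end
    return 0                  # the intervals overlap


def nearest(query: Interval, others: List[Interval]) -> Optional[Tuple[Interval, int]]:
    """Return the interval in others closest to query, looking both ways."""
    if not others:
        return None
    best = min(others, key=lambda o: gap(query, o))
    return best, gap(query, best)


print(nearest((8, 10), [(1, 5), (20, 30), (12, 14)]))  # ((12, 14), 2)
```

Here (12, 14) wins because it is 2 bases to the right, while (1, 5) is 3 bases to the left.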
The nearest method takes a strandedness option, which can be either "same", "opposite" or False/None.
print(gr.nearest(gr2, suffix="_Input", strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987592 | 226987617 | U0 | 0 | ... |
## | chr1 | 1541598 | 1541623 | U0 | 0 | ... |
## | chr1 | 1599121 | 1599146 | U0 | 0 | ... |
## | chr1 | 3504032 | 3504057 | U0 | 0 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 21751211 | 21751236 | U0 | 0 | ... |
## | chrY | 21910706 | 21910731 | U0 | 0 | ... |
## | chrY | 22054002 | 22054027 | U0 | 0 | ... |
## | chrY | 22210637 | 22210662 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_Input, End_Input, Name_Input, Score_Input, Strand_Input, ... (+ 1 more.)
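The strandedness options can be pictured as a filter on which intervals in other are eligible at all. Here is a minimal pure-Python sketch of that idea, not PyRanges' implementation. The tuple layout and the function name candidates are hypothetical, and the sketch assumes every interval carries a "+" or "-" strand.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical toy interval: (start, end, strand), half-open coordinates.
Interval = Tuple[int, int, str]

OPPOSITE = {"+": "-", "-": "+"}


def candidates(query: Interval, others: List[Interval],
               strandedness: Optional[Union[str, bool]]) -> List[Interval]:
    """Keep only the intervals in others allowed by the strandedness option."""
    strand = query[2]
    if not strandedness:                          # None or False: ignore strand
        return list(others)
    if strandedness == "same":
        return [o for o in others if o[2] == strand]
    if strandedness == "opposite":
        return [o for o in others if o[2] == OPPOSITE[strand]]
    raise ValueError(f"unknown strandedness: {strandedness!r}")


# The f1 rows printed later in this chapter, as toy tuples.
f1 = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
print(candidates((1, 2, "+"), f1, "opposite"))  # [(5, 7, '-')]
```

The nearest search then only runs over the intervals that survive this filter.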
The nearest method also takes two further arguments, how and overlap. how can take the values None, "upstream", "downstream", "next" and "previous". "upstream" and "downstream" are always in reference to the PyRanges the method is called on. "next" (to the right) and "previous" (to the left) are most useful for implementing your own custom nearest methods. The default is None, which means that PyRanges looks in both directions. The overlap argument is a bool which indicates whether overlapping intervals should be included or not.
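A minimal sketch of how how and overlap could restrict the candidates, assuming "+"/"-" strands, half-open coordinates, and that allowed overlaps count in every direction. The helpers direction and allowed are hypothetical and not part of PyRanges. The strand mapping follows the usual convention that upstream of a "+" interval lies to its left and upstream of a "-" interval lies to its right.

```python
from typing import Optional


def direction(how: Optional[str], strand: str) -> Optional[str]:
    """Hypothetical helper: turn strand-relative how into "next"/"previous".

    "next" means to the right, "previous" to the left; None means both ways.
    """
    if how in (None, "next", "previous"):
        return how
    if how == "upstream":
        return "previous" if strand == "+" else "next"
    if how == "downstream":
        return "next" if strand == "+" else "previous"
    raise ValueError(f"unknown how: {how!r}")


def allowed(q_start: int, q_end: int, o_start: int, o_end: int,
            side: Optional[str], overlap: bool) -> bool:
    """Hypothetical helper: is the other interval a valid candidate?"""
    overlaps = o_start < q_end and q_start < o_end
    if overlaps:
        return overlap                # assumption: overlaps ignore direction
    if side == "next":
        return o_start >= q_end       # entirely to the right
    if side == "previous":
        return o_end <= q_start       # entirely to the left
    return True                       # side is None: both directions


print(direction("upstream", "-"))                     # next
print(allowed(6, 7, 5, 7, "next", overlap=False))     # False
```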
f1 = pr.data.f1()
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 3 | 6 | interval1 | 0 | + |
## | chr1 | 8 | 9 | interval3 | 0 | + |
## | chr1 | 5 | 7 | interval2 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
f2 = pr.data.f2()
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 1 | 2 | a | 0 | + |
## | chr1 | 6 | 7 | b | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(f2.nearest(f1, strandedness="opposite", how="next"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 1 | 2 | a | 0 | ... |
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 2 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
print(f2.nearest(f1, how="upstream"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
print(f2.nearest(f1, strandedness="opposite", how="next", overlap=False))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 1 | 2 | a | 0 | ... |
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 2 rows and 12 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
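To see why the outputs above keep the rows they do, here is a toy pure-Python reimplementation run on the f1 and f2 rows printed earlier. It is a hedged sketch, not PyRanges' algorithm. toy_nearest and its tie-breaking are hypothetical, and it assumes overlapping intervals are candidates in every direction when overlap=True. For each f2 row it returns the name of the f1 interval it picks. In the real output those columns are hidden, so only which rows survive can be compared.

```python
from typing import Dict, List, Optional, Tuple

# Toy copies of the f1 and f2 tables printed above: (name, start, end, strand).
Row = Tuple[str, int, int, str]
F1: List[Row] = [("interval1", 3, 6, "+"), ("interval3", 8, 9, "+"),
                 ("interval2", 5, 7, "-")]
F2: List[Row] = [("a", 1, 2, "+"), ("b", 6, 7, "-")]
FLIP = {"+": "-", "-": "+"}


def gap(q: Row, o: Row) -> int:
    """Bases between two half-open intervals; 0 if they overlap or touch."""
    return max(0, o[1] - q[2], q[1] - o[2])


def toy_nearest(query: List[Row], other: List[Row],
                strandedness: Optional[str] = None, how: Optional[str] = None,
                overlap: bool = True) -> Dict[str, str]:
    """Hypothetical: map each query name to its nearest allowed interval in other."""
    hits: Dict[str, str] = {}
    for q in query:
        _, qs, qe, qstrand = q
        # 1. strandedness filter
        pool = [o for o in other
                if not strandedness
                or (strandedness == "same" and o[3] == qstrand)
                or (strandedness == "opposite" and o[3] == FLIP[qstrand])]
        # 2. translate upstream/downstream into right ("next") or left ("previous")
        side = how
        if how == "upstream":
            side = "previous" if qstrand == "+" else "next"
        elif how == "downstream":
            side = "next" if qstrand == "+" else "previous"
        # 3. direction and overlap filter
        keep = []
        for o in pool:
            _, o_start, o_end, _ = o
            if o_start < qe and qs < o_end:          # overlapping
                if overlap:
                    keep.append(o)
            elif (side is None or (side == "next" and o_start >= qe)
                  or (side == "previous" and o_end <= qs)):
                keep.append(o)
        # 4. closest surviving candidate, if any
        if keep:
            hits[q[0]] = min(keep, key=lambda o: gap(q, o))[0]
    return hits


print(toy_nearest(F2, F1, strandedness="opposite", how="next"))
# {'a': 'interval2', 'b': 'interval3'}
print(toy_nearest(F2, F1, how="upstream"))
# {'b': 'interval2'}
print(toy_nearest(F2, F1, strandedness="opposite", how="next", overlap=False))
# {'a': 'interval2', 'b': 'interval3'}
```

In the toy, a has nothing upstream because it is on the "+" strand and no f1 interval ends at or before position 1, which matches only b surviving the how="upstream" call. Setting overlap=False changes nothing in the last call, since neither chosen partner overlaps its query.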