18 Finding the k closest intervals
With the k_nearest method, you can find, for each interval in self, the k nearest intervals in other.
import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
print(gr.k_nearest(gr2, suffix="_Input"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 212609534 | 212609559 | U0 | 0 | ... |
## | chr1 | 169887529 | 169887554 | U0 | 0 | ... |
## | chr1 | 216711011 | 216711036 | U0 | 0 | ... |
## | chr1 | 144227079 | 144227104 | U0 | 0 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 15224235 | 15224260 | U0 | 0 | ... |
## | chrY | 13517892 | 13517917 | U0 | 0 | ... |
## | chrY | 8010951 | 8010976 | U0 | 0 | ... |
## | chrY | 7405376 | 7405401 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
The nearest method takes a strandedness option, which can be "same", "opposite" or False/None:
print(gr.nearest(gr2, suffix="_Input", strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987592 | 226987617 | U0 | 0 | ... |
## | chr1 | 1541598 | 1541623 | U0 | 0 | ... |
## | chr1 | 1599121 | 1599146 | U0 | 0 | ... |
## | chr1 | 3504032 | 3504057 | U0 | 0 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 21751211 | 21751236 | U0 | 0 | ... |
## | chrY | 21910706 | 21910731 | U0 | 0 | ... |
## | chrY | 22054002 | 22054027 | U0 | 0 | ... |
## | chrY | 22210637 | 22210662 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 10,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_Input, End_Input, Name_Input, Score_Input, Strand_Input, ... (+ 1 more.)
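To make the three strandedness settings concrete, here is a minimal pure-Python sketch of the pairing rule they describe. The helper strand_compatible is hypothetical and is not part of PyRanges; it assumes every interval is on either the "+" or the "-" strand and only illustrates which pairs of strands would be considered under each setting.

```python
def strand_compatible(query_strand, other_strand, strandedness=None):
    """Return True if an interval on other_strand may be paired with a
    query interval on query_strand.

    Hypothetical helper for illustration only, not PyRanges API.
    """
    if strandedness in (None, False):
        # Strand is ignored: every pair is a candidate.
        return True
    if strandedness == "same":
        return query_strand == other_strand
    if strandedness == "opposite":
        return query_strand != other_strand
    raise ValueError(f"unknown strandedness: {strandedness!r}")


# Show which (query, other) strand pairs survive under each setting.
pairs = [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")]
for mode in (None, "same", "opposite"):
    kept = [p for p in pairs if strand_compatible(*p, strandedness=mode)]
    print(mode, kept)
```

With strandedness="opposite", as in the example above, only the ("+", "-") and ("-", "+") pairs remain candidates.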
The k_nearest method takes four further options, namely how, overlap, ties and k.
how can take the values None, "upstream" and "downstream". "upstream" and "downstream" are always in reference to the PyRanges the method is called on. The default is None, which means that PyRanges looks in both directions.
overlap is a bool which indicates whether you want to include overlaps or not.
ties is the method used to resolve ties, that is, intervals with an equal distance to your query interval. The option None means that you get all ties, which might be more than k intervals if several intervals have the same distance. The options "first" and "last" give you the first or last interval for each separate distance. The option "different" gives you all nearest intervals from k different distances.
k is the number of different intervals you want to find. It can be a vector with the same length as the query vector.
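The tie-handling options are easier to compare side by side. The following is a minimal pure-Python sketch of one reading of the description above; it is not the PyRanges implementation. The function resolve_ties and the (name, distance) candidate tuples are hypothetical, and the sketch assumes that candidates are ranked by absolute distance and that k is at least 1.

```python
from itertools import groupby


def resolve_ties(candidates, k, ties=None):
    """Pick neighbours from (name, distance) tuples under a ties rule.

    Hypothetical illustration of the options described in the text,
    not PyRanges code.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank by absolute distance; the sort is stable, so candidates with
    # equal distance keep their input order.
    ordered = sorted(candidates, key=lambda c: abs(c[1]))
    # One group per distinct distance.
    groups = [list(g) for _, g in groupby(ordered, key=lambda c: abs(c[1]))]
    if ties == "first":
        return [g[0] for g in groups][:k]
    if ties == "last":
        return [g[-1] for g in groups][:k]
    if ties == "different":
        # Everything from the k smallest distinct distances.
        return [c for g in groups[:k] for c in g]
    if ties is None:
        if not ordered:
            return []
        # Keep every candidate at least as close as the k-th nearest,
        # which can return more than k entries when there are ties.
        cutoff = abs(ordered[min(k, len(ordered)) - 1][1])
        return [c for c in ordered if abs(c[1]) <= cutoff]
    raise ValueError(f"unknown ties option: {ties!r}")


candidates = [("a", 5), ("b", -5), ("c", 10), ("d", 20)]
print(resolve_ties(candidates, k=1))                    # both tied neighbours
print(resolve_ties(candidates, k=2, ties="first"))
print(resolve_ties(candidates, k=2, ties="last"))
print(resolve_ties(candidates, k=2, ties="different"))
```

In this toy input, "a" and "b" are equally far from the query, so k=1 with ties=None returns two entries, while "first" and "last" each keep only one of them.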
import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
gr.k_nearest(gr2, suffix="_Input", k=[1, 2] * 5000).print()
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +7 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 212609534 | 212609559 | U0 | 0 | ... |
## | chr1 | 169887529 | 169887554 | U0 | 0 | ... |
## | chr1 | 169887529 | 169887554 | U0 | 0 | ... |
## | chr1 | 216711011 | 216711036 | U0 | 0 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 13517892 | 13517917 | U0 | 0 | ... |
## | chrY | 8010951 | 8010976 | U0 | 0 | ... |
## | chrY | 7405376 | 7405401 | U0 | 0 | ... |
## | chrY | 7405376 | 7405401 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 15,000 rows and 12 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 7 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b, Distance
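The row count reported above can be checked with simple arithmetic. This is a minimal sketch, assuming each query interval contributes exactly its own k rows (no extra rows from ties, and enough neighbours for every query) and taking the 10,000 query intervals from the first output in this section.

```python
n_queries = 10_000          # rows in gr, per the first output in this section
k = [1, 2] * 5000           # one k per query interval, alternating 1 and 2

assert len(k) == n_queries  # the k vector has one entry per query interval
expected_rows = sum(k)      # 5000 * 1 + 5000 * 2
print(expected_rows)        # prints 15000
```

This matches the 15,000 rows shown, and also explains why every second query interval appears twice in the printed table.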
Note that nearest intervals that are upstream of the query interval have a negative distance.
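The sign convention can be illustrated with a small pure-Python sketch. The function signed_distance is hypothetical and not PyRanges code. It assumes half-open (start, end) coordinates, assumes that "upstream" means lower coordinates on the "+" strand and higher coordinates on the "-" strand, and measures the gap as a plain coordinate difference; the exact offset PyRanges reports for adjacent intervals may differ.

```python
def signed_distance(query, other, strand="+"):
    """Signed gap between two half-open (start, end) intervals.

    Negative means `other` is upstream of `query`, positive means
    downstream, 0 means the intervals overlap or touch.
    Hypothetical illustration only, not PyRanges code.
    """
    q_start, q_end = query
    o_start, o_end = other
    if o_start < q_end and q_start < o_end:
        return 0  # overlapping intervals
    if o_end <= q_start:
        gap = q_start - o_end   # other lies at lower coordinates
        lower = True
    else:
        gap = o_start - q_end   # other lies at higher coordinates
        lower = False
    # Assumption: strand decides which side counts as upstream.
    upstream = lower if strand == "+" else not lower
    return -gap if upstream else gap


print(signed_distance((100, 200), (50, 80)))               # upstream on "+"
print(signed_distance((100, 200), (250, 300)))             # downstream on "+"
print(signed_distance((100, 200), (50, 80), strand="-"))   # downstream on "-"
```

Under these assumptions, filtering a result for negative values in the Distance column would keep only the upstream neighbours.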