9 Finding the closest intervals

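With the nearest method, you can find, for each interval in self, the nearest interval in other. The columns from other are appended with the given suffix, and a Distance column reports how far apart the two intervals are (0 when they overlap).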
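To make the idea concrete, here is a minimal pure-Python sketch of a nearest search on a single chromosome, ignoring strand. It is not the PyRanges implementation. It assumes half-open (start, end) intervals and a distance of 0 for overlaps and "gap + 1" otherwise, which is the convention the Distance values in this chapter appear to follow (e.g. 212609534 - 212410584 + 1 = 198951). The helper names `distance` and `nearest` are hypothetical.

```python
# Sketch of "nearest" semantics on one chromosome, ignoring strand.
# Assumption: half-open (start, end) intervals; distance is 0 for overlaps
# and gap + 1 otherwise, as the example output in this chapter suggests.

def distance(a, b):
    """Distance between half-open intervals a and b (hypothetical helper)."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start < b_end and b_start < a_end:
        return 0  # the intervals overlap
    if b_start >= a_end:
        return b_start - a_end + 1  # b lies after a
    return a_start - b_end + 1  # b lies before a


def nearest(query, others):
    """Return (closest interval in others, distance); ties keep the first one."""
    best = min(others, key=lambda o: distance(query, o))
    return best, distance(query, best)


# Coordinates taken from the f1 and f2 example datasets further down
f1 = [(3, 6), (5, 7), (8, 9)]
print(nearest((1, 2), f1))  # ((3, 6), 2)
print(nearest((6, 7), f1))  # ((5, 7), 0)
```

Using `min` with a key keeps the search a single linear scan per query; a real implementation would instead sort or index the intervals so it scales to the 10000-row example above.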

import pyranges as pr
gr = pr.load_dataset("chipseq")
gr2 = pr.load_dataset("chipseq_background")
print(gr.nearest(gr2, suffix="_Input"))
## +--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       | Start_Input   | End_Input   | Name_Input   | Score_Input   | Strand_Input   | Distance   |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   | (int64)       | (int64)     | (object)     | (int64)       | (category)     | (int64)    |
## |--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------|
## | chr1         | 226987592 | 226987617 | U0         | 0         | +            | 226987603     | 226987628   | U0           | 0             | -              | 0          |
## | chr15        | 26105515  | 26105540  | U0         | 0         | +            | 26105493      | 26105518    | U0           | 0             | +              | 0          |
## | chr8         | 38747226  | 38747251  | U0         | 0         | -            | 38747236      | 38747261    | U0           | 0             | +              | 0          |
## | ...          | ...       | ...       | ...        | ...       | ...          | ...           | ...         | ...          | ...           | ...            | ...        |
## | chrY         | 8316773   | 8316798   | U0         | 0         | +            | 20557165      | 20557190    | U0           | 0             | +              | 2312314    |
## | chrY         | 7463444   | 7463469   | U0         | 0         | +            | 20557165      | 20557190    | U0           | 0             | +              | 3165643    |
## | chrY         | 7405376   | 7405401   | U0         | 0         | -            | 20557165      | 20557190    | U0           | 0             | +              | 3223711    |
## +--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------+
## PyRanges object has 10000 sequences from 24 chromosomes.
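The layout of the output row follows directly from the header above: the columns of self are kept, every column of the matched interval from other except Chromosome gets the suffix, and Distance is appended. The sketch below assembles one such row with plain dictionaries; `join_row` is a hypothetical helper for illustration, not PyRanges internals. The default suffix `"_b"` mirrors the column names in the f1/f2 example later in this chapter, where no suffix is passed.

```python
# Sketch of how one nearest() output row is laid out (illustrative only).

def join_row(self_row, other_row, dist, suffix="_b"):
    """Merge a self row with its nearest other row (hypothetical helper)."""
    row = dict(self_row)
    for col, value in other_row.items():
        if col != "Chromosome":  # both intervals share the chromosome
            row[col + suffix] = value
    row["Distance"] = dist
    return row


# First row of the output above
query = {"Chromosome": "chr1", "Start": 226987592, "End": 226987617,
         "Name": "U0", "Score": 0, "Strand": "+"}
hit = {"Chromosome": "chr1", "Start": 226987603, "End": 226987628,
       "Name": "U0", "Score": 0, "Strand": "-"}
print(list(join_row(query, hit, 0, suffix="_Input")))
```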

The nearest method takes a strandedness option, which can be "same", "opposite", or False/None (strand is ignored).
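One way to picture the strandedness option is as a filter on which intervals in other are candidates for each interval in self. The sketch below is an assumption about the semantics, not PyRanges code; `candidates` and `OPPOSITE` are hypothetical names, and intervals are (start, end, strand) tuples. It also explains why, for example, the chr15 row (both intervals on "+") drops out of the "opposite" output.

```python
# Sketch of the three strandedness modes as a candidate filter (hypothetical).
OPPOSITE = {"+": "-", "-": "+"}


def candidates(query, others, strandedness=None):
    """Keep only the intervals in others that nearest may pair with query."""
    strand = query[2]
    if strandedness == "same":
        return [o for o in others if o[2] == strand]
    if strandedness == "opposite":
        return [o for o in others if o[2] == OPPOSITE.get(strand)]
    if strandedness in (None, False):
        return list(others)  # strand is ignored
    raise ValueError(f"unknown strandedness: {strandedness!r}")


f1 = [(3, 6, "+"), (5, 7, "-"), (8, 9, "+")]
print(candidates((1, 2, "+"), f1, "opposite"))  # [(5, 7, '-')]
print(candidates((1, 2, "+"), f1, "same"))      # [(3, 6, '+'), (8, 9, '+')]
```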

print(gr.nearest(gr2, suffix="_Input", strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       | Start_Input   | End_Input   | Name_Input   | Score_Input   | Strand_Input   | Distance   |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   | (int64)       | (int64)     | (object)     | (int64)       | (category)     | (int64)    |
## |--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------|
## | chr1         | 226987592 | 226987617 | U0         | 0         | +            | 226987603     | 226987628   | U0           | 0             | -              | 0          |
## | chr8         | 38747226  | 38747251  | U0         | 0         | -            | 38747236      | 38747261    | U0           | 0             | +              | 0          |
## | chr1         | 212609534 | 212609559 | U0         | 0         | +            | 212410559     | 212410584   | U0           | 0             | -              | 198951     |
## | ...          | ...       | ...       | ...        | ...       | ...          | ...           | ...         | ...          | ...           | ...            | ...        |
## | chrY         | 13517892  | 13517917  | U0         | 0         | -            | 11776321      | 11776346    | U0           | 0             | +              | 1741547    |
## | chrY         | 8010951   | 8010976   | U0         | 0         | -            | 20557165      | 20557190    | U0           | 0             | +              | 2632551    |
## | chrY         | 7405376   | 7405401   | U0         | 0         | -            | 20557165      | 20557190    | U0           | 0             | +              | 3238126    |
## +--------------+-----------+-----------+------------+-----------+--------------+---------------+-------------+--------------+---------------+----------------+------------+
## PyRanges object has 10000 sequences from 24 chromosomes.

The nearest method also takes two further arguments, how and overlap. how can be None, "next", or "previous". The default, None, means that PyRanges looks in both directions; "next" only considers intervals that come after each interval in self, and "previous" only those that come before it. overlap is a bool that indicates whether overlapping intervals should be reported.

f1 = pr.load_dataset("f1")
print(f1)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         3 |         6 | interval1  |         0 | +            |
## | chr1         |         5 |         7 | interval2  |         0 | -            |
## | chr1         |         8 |         9 | interval3  |         0 | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
f2 = pr.load_dataset("f2")
print(f2)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         |         1 |         2 | a          |         0 | +            |
## | chr1         |         6 |         7 | b          |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 2 sequences from 1 chromosomes.
print(f2.nearest(f1, strandedness="opposite", how="next"))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |   Start_b |     End_b | Name_b     |   Score_b | Strand_b     |   Distance |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |    (int64) |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------|
## | chr1         |         1 |         2 | a          |         0 | +            |         5 |         7 | interval2  |         0 | -            |          4 |
## | chr1         |         6 |         7 | b          |         0 | -            |         8 |         9 | interval3  |         0 | +            |          2 |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------+
## PyRanges object has 2 sequences from 1 chromosomes.
print(f2.nearest(f1, strandedness="opposite", how="next", overlap=False))
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |   Start_b |     End_b | Name_b     |   Score_b | Strand_b     |   Distance |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |    (int64) |
## |--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------|
## | chr1         |         1 |         2 | a          |         0 | +            |         5 |         7 | interval2  |         0 | -            |          4 |
## | chr1         |         6 |         7 | b          |         0 | -            |         8 |         9 | interval3  |         0 | +            |          2 |
## +--------------+-----------+-----------+------------+-----------+--------------+-----------+-----------+------------+-----------+--------------+------------+
## PyRanges object has 2 sequences from 1 chromosomes.
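The how and overlap options can be sketched by extending the earlier idea with a direction check and an overlap switch. This reproduces the two f2.nearest(f1, ...) results above under stated assumptions: half-open intervals, "next" meaning larger coordinates and "previous" smaller ones, distance 0 for overlaps and gap + 1 otherwise, and overlapping intervals qualifying whichever direction how asks for. Strandedness is hard-coded to "opposite" to match the example. The function is hypothetical, not the PyRanges implementation.

```python
# Sketch of how= and overlap= with strandedness="opposite" (hypothetical).
OPPOSITE = {"+": "-", "-": "+"}  # only "+" and "-" strands are handled


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


def nearest(query, others, how=None, overlap=True):
    """Return (nearest opposite-strand interval, distance) for query."""
    best, best_dist = None, None
    for o in others:
        if o[2] != OPPOSITE[query[2]]:
            continue  # strandedness="opposite"
        if overlaps(query, o):
            if not overlap:
                continue  # overlap=False: skip overlapping intervals
            dist = 0
        elif o[0] >= query[1]:  # o lies after the query
            if how == "previous":
                continue
            dist = o[0] - query[1] + 1
        else:  # o lies before the query
            if how == "next":
                continue
            dist = query[0] - o[1] + 1
        if best_dist is None or dist < best_dist:
            best, best_dist = o, dist
    return best, best_dist


f1 = [(3, 6, "+"), (5, 7, "-"), (8, 9, "+")]  # interval1..interval3
f2 = [(1, 2, "+"), (6, 7, "-")]               # a, b
for q in f2:
    print(q, nearest(q, f1, how="next"), nearest(q, f1, how="next", overlap=False))
```

Under these assumptions the sketch returns interval2 at distance 4 for a and interval3 at distance 2 for b, with or without overlap=False. That matches the two identical outputs above: b (6 to 7) only touches interval1 (3 to 6) and overlaps no opposite-strand interval, so excluding overlaps changes nothing in this small example.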