5 Methods for manipulating single PyRanges
There are several methods for manipulating the contents of a PyRanges.
cluster
is a set operation that merges overlapping intervals, producing the union of all the intervals in the PyRanges:
import pyranges as pr
# Load the small example dataset used in this chapter
f1 = pr.load_dataset("f1")
print(f1.cluster())
## +--------------+-----------+-----------+
## | Chromosome | Start | End |
## | (category) | (int64) | (int64) |
## |--------------+-----------+-----------|
## | chr1 | 3 | 7 |
## | chr1 | 8 | 9 |
## +--------------+-----------+-----------+
## PyRanges object has 2 sequences from 1 chromosomes.
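To make the merge rule concrete, here is a minimal pure-Python sketch of the same idea. It is not how PyRanges implements cluster. It assumes half-open (start, end) tuples on a single chromosome, and the three input intervals are an assumption chosen to be consistent with the output above. `merge_overlapping` is a hypothetical helper name.

```python
def merge_overlapping(intervals):
    """Merge overlapping half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous cluster: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Assumed input intervals (illustrative, not read from the f1 dataset).
intervals = [(3, 6), (5, 7), (8, 9)]
print(merge_overlapping(intervals))  # [(3, 7), (8, 9)]
```

Whether intervals that merely touch (one ends where the next starts) should be merged is a design choice. This sketch keeps them separate.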
tssify
finds the start of each region, taking the direction of transcription into account. The -ify suffix makes clear that it does not find actual TSSs (transcription start sites), which would require metadata indicating which intervals represent transcripts.
f1.tssify()
print(f1.tssify(slack=5))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 0 | 9 | interval1 | 0 | + |
## | chr1 | 2 | 13 | interval2 | 0 | - |
## | chr1 | 3 | 14 | interval3 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
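The strand-aware choice of "start" can be sketched in plain Python. This is an illustration of the rule only, under the assumption that strands are encoded as "+" and "-". `five_prime_boundary` is a hypothetical helper, and how PyRanges converts the boundary into an interval and applies the slack is not modelled here.

```python
def five_prime_boundary(start, end, strand):
    """Return the coordinate where a region begins in the direction of transcription."""
    if strand == "+":
        # Forward strand: transcription begins at the left edge.
        return start
    if strand == "-":
        # Reverse strand: transcription begins at the right edge.
        return end
    raise ValueError(f"unknown strand: {strand!r}")


# Assumed example coordinates (illustrative only).
print(five_prime_boundary(3, 6, "+"))  # 3
print(five_prime_boundary(5, 7, "-"))  # 7
```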
tesify
finds the ends of the regions (taking direction of transcription into account).
f1.tesify()
print(f1.tesify(slack=5))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 1 | 12 | interval1 | 0 | + |
## | chr1 | 2 | 13 | interval2 | 0 | - |
## | chr1 | 4 | 15 | interval3 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
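For the end of a region the strand logic is mirrored. The sketch below is a plain-Python illustration under the assumption that strands are encoded as "+" and "-". `three_prime_boundary` is a hypothetical helper, and it does not model how PyRanges builds the resulting interval or applies the slack.

```python
def three_prime_boundary(start, end, strand):
    """Return the coordinate where a region ends in the direction of transcription."""
    if strand == "+":
        # Forward strand: transcription ends at the right edge.
        return end
    if strand == "-":
        # Reverse strand: transcription ends at the left edge.
        return start
    raise ValueError(f"unknown strand: {strand!r}")


# Assumed example records (illustrative only): (start, end, strand).
records = [(3, 6, "+"), (5, 7, "-"), (8, 9, "+")]
print([three_prime_boundary(*r) for r in records])  # [6, 5, 9]
```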
slack
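extends the start and end of each interval by the given number of positions. As the output shows, a start that would become negative is set to 0.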
print(f1.slack(5))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 0 | 11 | interval1 | 0 | + |
## | chr1 | 0 | 12 | interval2 | 0 | - |
## | chr1 | 3 | 14 | interval3 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 3 sequences from 1 chromosomes.
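The arithmetic behind this output can be sketched in a few lines of plain Python. The input intervals are an assumption chosen to be consistent with the table above, and `extend` is a hypothetical helper rather than part of the PyRanges API.

```python
def extend(start, end, slack):
    """Widen a (start, end) interval by `slack` on both sides, clipping the start at 0."""
    return max(0, start - slack), end + slack


# Assumed input intervals (illustrative, not read from the f1 dataset).
intervals = [(3, 6), (5, 7), (8, 9)]
print([extend(s, e, 5) for s, e in intervals])
# [(0, 11), (0, 12), (3, 14)]
```

The sketch clips only at the lower bound. Clipping at the chromosome end would need the chromosome lengths, which are not available here.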