25 Turning Ranges into RLEs

Ranges can be turned into dicts of run-length encodings (RLEs) of their coverage, one per chromosome (or chromosome/strand pair), with the to_rle method:

import pyranges as pr
gr = pr.data.aorta()
print(gr)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | +            |
## | chr1         | 9953      | 10152     | H3K27me3   | 5         | +            |
## | chr1         | 10024     | 10223     | H3K27me3   | 1         | +            |
## | chr1         | 10246     | 10445     | H3K27me3   | 4         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr1         | 9978      | 10177     | H3K27me3   | 7         | -            |
## | chr1         | 10001     | 10200     | H3K27me3   | 5         | -            |
## | chr1         | 10127     | 10326     | H3K27me3   | 1         | -            |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 11 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr.to_rle())
## chr1 +
## --
## +--------+--------+------+------+-------+-------+---------+-------+
## | Runs   | 9939   | 14   | 71   | ...   | 199   | 99801   | 199   |
## |--------+--------+------+------+-------+-------+---------+-------|
## | Values | 0.0    | 1.0  | 2.0  | ...   | 1.0   | 0.0     | 1.0   |
## +--------+--------+------+------+-------+-------+---------+-------+
## Rle of length 110445 containing 10 elements (avg. length 11044.5)
## 
## chr1 -
## --
## +--------+--------+------+------+------+-------+------+------+------+-------+
## | Runs   | 9916   | 35   | 27   | 23   | ...   | 23   | 41   | 85   | 114   |
## |--------+--------+------+------+------+-------+------+------+------+-------|
## | Values | 0.0    | 1.0  | 2.0  | 3.0  | ...   | 2.0  | 1.0  | 2.0  | 1.0   |
## +--------+--------+------+------+------+-------+------+------+------+-------+
## Rle of length 10440 containing 12 elements (avg. length 870.0)
## RleDict object with 2 chromosomes/strand pairs.
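To make the output above more concrete, here is a minimal pure-Python sketch of what a coverage RLE holds. It assumes half-open [start, end) intervals and an encoding that starts at position 0, which is why the first run above is a zero-valued run of length 9939 (the first Start). This is not how pyranges computes its RLEs; coverage_rle, to_rle_dict and the toy rows are hypothetical names and data used only for illustration.

```python
from collections import defaultdict


def coverage_rle(intervals):
    """Run-length encode the coverage of half-open [start, end) intervals.

    Returns (runs, values) spanning positions 0..max_end, so the encoding
    begins with a zero-valued run if the first interval starts after 0.
    """
    deltas = defaultdict(int)
    for start, end in intervals:
        deltas[start] += 1  # coverage rises by one at each start
        deltas[end] -= 1    # and falls by one at each end
    runs, values = [], []
    position, depth = 0, 0
    for pos in sorted(deltas):
        if pos > position:
            length = pos - position
            if values and values[-1] == depth:
                runs[-1] += length  # extend the previous run of equal depth
            else:
                runs.append(length)
                values.append(float(depth))
            position = pos
        depth += deltas[pos]
    return runs, values


def to_rle_dict(rows):
    """Group (chromosome, start, end, strand) rows and encode each group."""
    groups = defaultdict(list)
    for chrom, start, end, strand in rows:
        groups[(chrom, strand)].append((start, end))
    return {key: coverage_rle(ivs) for key, ivs in sorted(groups.items())}


# Hypothetical toy rows, not the aorta data.
rows = [("chr1", 2, 5, "+"), ("chr1", 4, 7, "+"), ("chr1", 0, 3, "-")]
for key, (runs, values) in to_rle_dict(rows).items():
    print(key, runs, values)
# ('chr1', '+') [2, 2, 1, 2] [0.0, 1.0, 2.0, 1.0]
# ('chr1', '-') [3] [1.0]
```

The sweep over sorted start/end positions only visits coordinates where coverage can change, so its cost depends on the number of intervals rather than on chromosome length.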
print(gr.to_rle(strand=True))
## chr1 +
## --
## +--------+--------+------+------+-------+-------+---------+-------+
## | Runs   | 9939   | 14   | 71   | ...   | 199   | 99801   | 199   |
## |--------+--------+------+------+-------+-------+---------+-------|
## | Values | 0.0    | 1.0  | 2.0  | ...   | 1.0   | 0.0     | 1.0   |
## +--------+--------+------+------+-------+-------+---------+-------+
## Rle of length 110445 containing 10 elements (avg. length 11044.5)
## 
## chr1 -
## --
## +--------+--------+------+------+------+-------+------+------+------+-------+
## | Runs   | 9916   | 35   | 27   | 23   | ...   | 23   | 41   | 85   | 114   |
## |--------+--------+------+------+------+-------+------+------+------+-------|
## | Values | 0.0    | 1.0  | 2.0  | 3.0  | ...   | 2.0  | 1.0  | 2.0  | 1.0   |
## +--------+--------+------+------+------+-------+------+------+------+-------+
## Rle of length 10440 containing 12 elements (avg. length 870.0)
## RleDict object with 2 chromosomes/strand pairs.
print(gr.to_rle(strand=True, rpm=True))
## chr1 +
## --
## +--------+--------+-------------------+-------+---------+-------------------+
## | Runs   | 9939   | 14                | ...   | 99801   | 199               |
## |--------+--------+-------------------+-------+---------+-------------------|
## | Values | 0.0    | 90909.09090909091 | ...   | 0.0     | 90909.09090909091 |
## +--------+--------+-------------------+-------+---------+-------------------+
## Rle of length 110445 containing 10 elements (avg. length 11044.5)
## 
## chr1 -
## --
## +--------+--------+-------+-------------------+
## | Runs   | 9916   | ...   | 114               |
## |--------+--------+-------+-------------------|
## | Values | 0.0    | ...   | 90909.09090909091 |
## +--------+--------+-------+-------------------+
## Rle of length 10440 containing 12 elements (avg. length 870.0)
## RleDict object with 2 chromosomes/strand pairs.

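Because gr is stranded, to_rle() already returns one RLE per chromosome/strand pair, so passing strand=True gives the same result here. To get coverage normalized to reads per million (RPM), use the rpm argument; in the output above, a coverage of 1 becomes 90909.09, i.e. 10^6 divided by the 11 intervals in gr.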
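The scaling behind rpm=True can be sketched as follows, assuming (as the rpm output above suggests) that every value is multiplied by 10^6 divided by the total number of intervals across all chromosome/strand pairs, while run lengths stay the same. rpm_normalize and the toy dict are hypothetical and not part of pyranges.

```python
def rpm_normalize(rle_dict, n_intervals):
    """Scale every RLE value by 1e6 / n_intervals; run lengths are unchanged.

    rle_dict maps a (chromosome, strand) key to a (runs, values) pair.
    n_intervals is the total interval count across *all* keys, so every
    chromosome/strand pair is scaled by the same factor.
    """
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    factor = 1e6 / n_intervals
    return {
        key: (list(runs), [value * factor for value in values])
        for key, (runs, values) in rle_dict.items()
    }


# Hypothetical RleDict-like input; assume 11 intervals in total, as in gr.
toy = {("chr1", "+"): ([3, 2], [0.0, 1.0]), ("chr1", "-"): ([4, 1], [0.0, 2.0])}
scaled = rpm_normalize(toy, 11)
print(scaled[("chr1", "+")])
print(scaled[("chr1", "-")])
```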

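You can also compute coverage weighted by any numeric column in your PyRanges by passing the column name; each position then holds the sum of that column over the intervals overlapping it (for example, 7 + 5 = 12 where the first two + strand intervals overlap):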

print(gr.to_rle("Score"))
## chr1 +
## --
## +--------+--------+------+------+-------+-------+---------+-------+
## | Runs   | 9939   | 14   | 71   | ...   | 199   | 99801   | 199   |
## |--------+--------+------+------+-------+-------+---------+-------|
## | Values | 0.0    | 7.0  | 12.0 | ...   | 4.0   | 0.0     | 1.0   |
## +--------+--------+------+------+-------+-------+---------+-------+
## Rle of length 110445 containing 10 elements (avg. length 11044.5)
## 
## chr1 -
## --
## +--------+--------+------+------+------+-------+------+------+------+-------+
## | Runs   | 9916   | 35   | 27   | 23   | ...   | 23   | 41   | 85   | 114   |
## |--------+--------+------+------+------+-------+------+------+------+-------|
## | Values | 0.0    | 5.0  | 13.0 | 20.0 | ...   | 6.0  | 1.0  | 7.0  | 6.0   |
## +--------+--------+------+------+------+-------+------+------+------+-------+
## Rle of length 10440 containing 12 elements (avg. length 870.0)
## RleDict object with 2 chromosomes/strand pairs.
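The weighted version can be sketched the same way: instead of adding 1 at each start and subtracting 1 at each end, each interval adds and removes its own weight (here standing in for the Score column). This is a hypothetical pure-Python illustration, not the pyranges implementation; weighted_coverage_rle and the toy intervals are assumptions.

```python
from collections import defaultdict


def weighted_coverage_rle(intervals, weights):
    """Run-length encode coverage where each interval contributes its weight.

    intervals: half-open (start, end) pairs; weights: one number per interval
    (for example a Score column). Returns (runs, values) over 0..max_end.
    """
    deltas = defaultdict(float)
    for (start, end), weight in zip(intervals, weights):
        deltas[start] += weight  # the interval's weight switches on here
        deltas[end] -= weight    # and switches off at its end
    runs, values = [], []
    position, level = 0, 0.0
    for pos in sorted(deltas):
        if pos > position:
            length = pos - position
            if values and values[-1] == level:
                runs[-1] += length  # merge with the previous equal-valued run
            else:
                runs.append(length)
                values.append(level)
            position = pos
        level += deltas[pos]
    return runs, values


# Hypothetical toy data: two overlapping intervals with scores 7 and 5.
runs, values = weighted_coverage_rle([(2, 5), (4, 7)], [7, 5])
print(runs, values)  # [2, 2, 1, 2] [0.0, 7.0, 12.0, 5.0]
```

Merging runs by exact float comparison is fine for integer scores like these; with arbitrary floating-point weights, accumulated rounding error could leave adjacent runs unmerged, so rounding the level first may be preferable.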