27 Run Length Encoding dicts
Describing a genome with multiple chromosomes takes more than one run length encoding, so pyranges provides a data structure called PyRles for collections of Rles (it prints as RleDict in the output below). It can be created from a PyRanges object by calling its to_rle method.
RleDicts support the arithmetic operations +, -, /, and *.
import pyranges as pr
gr = pr.data.chipseq()
gr_bg = pr.data.chipseq_background()
cs = gr.to_rle()
print(cs)
## chr1 +
## +--------+-----------+------+---------+-------+------+-----------+------+
## | Runs | 1541598 | 25 | 57498 | ... | 25 | 1156833 | 25 |
## |--------+-----------+------+---------+-------+------+-----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+---------+-------+------+-----------+------+
## Rle of length 247134924 containing 894 elements (avg. length 276437.275)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+----------+------+
## | Runs | 7046809 | 25 | 358542 | ... | 25 | 156610 | 25 |
## |--------+-----------+------+----------+-------+------+----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+----------+-------+------+----------+------+
## Rle of length 22210662 containing 32 elements (avg. length 694083.188)
## RleDict object with 48 chromosomes/strand pairs.
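To make the printed Runs and Values rows concrete, here is a minimal pure-Python sketch of how interval coverage can be turned into one run length encoding per chromosome/strand pair. It is an illustration only, not how pyranges implements to_rle: the helper names coverage_rle and to_rle_dict and the toy coordinates are hypothetical, and intervals are assumed to be half-open, [start, end).

```python
from collections import defaultdict


def coverage_rle(intervals):
    """Return (runs, values) describing how many intervals cover each position.

    Intervals are assumed to be half-open: [start, end).
    """
    # Net change in coverage at every interval boundary.
    changes = defaultdict(int)
    for start, end in intervals:
        changes[start] += 1
        changes[end] -= 1

    runs, values = [], []
    position, depth = 0, 0
    for boundary in sorted(changes):
        length = boundary - position
        if length > 0:
            if values and values[-1] == depth:
                runs[-1] += length  # same coverage as the previous run: extend it
            else:
                runs.append(length)
                values.append(float(depth))
            position = boundary
        depth += changes[boundary]
    return runs, values


def to_rle_dict(rows):
    """Group (chromosome, strand, start, end) rows and encode each group."""
    grouped = defaultdict(list)
    for chromosome, strand, start, end in rows:
        grouped[(chromosome, strand)].append((start, end))
    return {key: coverage_rle(ivs) for key, ivs in sorted(grouped.items())}


# Hypothetical toy reads, not taken from the chipseq data set.
reads = [
    ("chr1", "+", 3, 6),
    ("chr1", "+", 5, 8),
    ("chrY", "-", 2, 4),
]
rle_dict = to_rle_dict(reads)
for key, (runs, values) in rle_dict.items():
    print(key, runs, values)
```

In this sketch nothing is recorded after the final interval ends, so each encoding finishes on a covered run. The printed Rles above have the same shape: each ends in a run of 25 with value 1.0.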
bg = gr_bg.to_rle()
print(bg)
## chr1 +
## +--------+-----------+------+-----------+-------+------+-----------+------+
## | Runs | 1041102 | 25 | 1088232 | ... | 25 | 1774357 | 25 |
## |--------+-----------+------+-----------+-------+------+-----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+-----------+-------+------+-----------+------+
## Rle of length 245614348 containing 724 elements (avg. length 339246.337)
## ...
## chrY -
## +--------+------------+------+--------+-------+------+------------+------+
## | Runs | 10629111 | 25 | 3320 | ... | 25 | 45465323 | 25 |
## |--------+------------+------+--------+-------+------+------------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+------------+------+--------+-------+------+------------+------+
## Rle of length 57402239 containing 10 elements (avg. length 5740223.9)
## RleDict object with 50 chromosomes/strand pairs.
print(cs + bg)
## chr1 +
## +--------+-----------+------+----------+-------+------+-----------+------+
## | Runs | 1041102 | 25 | 500471 | ... | 25 | 1156833 | 25 |
## |--------+-----------+------+----------+-------+------+-----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+----------+-------+------+-----------+------+
## Rle of length 247134924 containing 1618 elements (avg. length 152740.991)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+------------+------+
## | Runs | 7046809 | 25 | 358542 | ... | 25 | 35191552 | 25 |
## |--------+-----------+------+----------+-------+------+------------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+----------+-------+------+------------+------+
## Rle of length 57402239 containing 42 elements (avg. length 1366719.976)
## RleDict object with 50 chromosomes/strand pairs.
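The printed result of cs + bg suggests two things: the sum is as long as the longer operand (247134924 for chr1 +), and it contains every chromosome/strand pair found in either operand (50). The following pure-Python sketch reproduces that behaviour on toy data by padding the shorter encoding with zeros. It rests on those assumptions and is not pyrle's implementation; the names pad, add_rle, and add_rle_dicts are hypothetical.

```python
def pad(rle, length):
    """Extend (runs, values) with a trailing zero run up to `length`."""
    runs, values = list(rle[0]), list(rle[1])
    missing = length - sum(runs)
    if missing > 0:
        runs.append(missing)
        values.append(0.0)
    return runs, values


def add_rle(a, b):
    """Add two run length encodings position by position."""
    length = max(sum(a[0]), sum(b[0]))
    if length == 0:
        return [], []
    (runs_a, vals_a), (runs_b, vals_b) = pad(a, length), pad(b, length)

    runs, values = [], []
    i = j = 0
    left_a, left_b = runs_a[0], runs_b[0]
    while True:
        # Advance by the stretch over which both inputs stay constant.
        step = min(left_a, left_b)
        total = vals_a[i] + vals_b[j]
        if values and values[-1] == total:
            runs[-1] += step  # merge with the previous run of equal value
        else:
            runs.append(step)
            values.append(total)
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            if i == len(runs_a):
                break  # both inputs have equal length, so b is exhausted too
            left_a = runs_a[i]
        if left_b == 0:
            j += 1
            left_b = runs_b[j]
    return runs, values


def add_rle_dicts(x, y):
    """Add two dicts of encodings; a key missing on one side counts as all zeros."""
    empty = ([], [])
    return {key: add_rle(x.get(key, empty), y.get(key, empty))
            for key in sorted(set(x) | set(y))}


# Hypothetical toy data keyed by (chromosome, strand).
toy_cs = {("chr1", "+"): ([3, 2], [0.0, 1.0])}
toy_bg = {("chr1", "+"): ([1, 2, 4], [0.0, 1.0, 0.0]),
          ("chrY", "-"): ([2, 1], [0.0, 1.0])}
toy_sum = add_rle_dicts(toy_cs, toy_bg)
print(toy_sum)
```

The two-pointer walk never expands the encodings to one value per base, which is the point of working with runs on chromosome-sized data.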
When an arithmetic operation combines a stranded and an unstranded PyRles, the stranded one is automatically demoted to an unstranded PyRles.
bg_stranded = gr_bg.to_rle(strand=True)
print(bg_stranded)
## chr1 +
## +--------+-----------+------+-----------+-------+------+-----------+------+
## | Runs | 1041102 | 25 | 1088232 | ... | 25 | 1774357 | 25 |
## |--------+-----------+------+-----------+-------+------+-----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+-----------+-------+------+-----------+------+
## Rle of length 245614348 containing 724 elements (avg. length 339246.337)
## ...
## chrY -
## +--------+------------+------+--------+-------+------+------------+------+
## | Runs | 10629111 | 25 | 3320 | ... | 25 | 45465323 | 25 |
## |--------+------------+------+--------+-------+------+------------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+------------+------+--------+-------+------+------------+------+
## Rle of length 57402239 containing 10 elements (avg. length 5740223.9)
## RleDict object with 50 chromosomes/strand pairs.
print(cs + bg_stranded)
## chr1 +
## +--------+-----------+------+----------+-------+------+-----------+------+
## | Runs | 1041102 | 25 | 500471 | ... | 25 | 1156833 | 25 |
## |--------+-----------+------+----------+-------+------+-----------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+----------+-------+------+-----------+------+
## Rle of length 247134924 containing 1618 elements (avg. length 152740.991)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+------------+------+
## | Runs | 7046809 | 25 | 358542 | ... | 25 | 35191552 | 25 |
## |--------+-----------+------+----------+-------+------+------------+------|
## | Values | 0.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 |
## +--------+-----------+------+----------+-------+------+------------+------+
## Rle of length 57402239 containing 42 elements (avg. length 1366719.976)
## RleDict object with 50 chromosomes/strand pairs.
Like Rles, PyRles support arithmetic operations with numbers.
print((0.67 + cs) * 5)
## chr1 +
## +--------+-----------+------+---------+-------+------+-----------+------+
## | Runs | 1541598 | 25 | 57498 | ... | 25 | 1156833 | 25 |
## |--------+-----------+------+---------+-------+------+-----------+------|
## | Values | 3.35 | 8.35 | 3.35 | ... | 8.35 | 3.35 | 8.35 |
## +--------+-----------+------+---------+-------+------+-----------+------+
## Rle of length 247134924 containing 894 elements (avg. length 276437.275)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+----------+------+
## | Runs | 7046809 | 25 | 358542 | ... | 25 | 156610 | 25 |
## |--------+-----------+------+----------+-------+------+----------+------|
## | Values | 3.35 | 8.35 | 3.35 | ... | 8.35 | 3.35 | 8.35 |
## +--------+-----------+------+----------+-------+------+----------+------+
## Rle of length 22210662 containing 32 elements (avg. length 694083.188)
## RleDict object with 48 chromosomes/strand pairs.
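The unchanged Runs row and the shifted Values row in the output above (0.0 becomes 3.35, 1.0 becomes 8.35) indicate that scalar arithmetic only touches the values. A minimal sketch of that idea follows, using a hypothetical ToyRle class that is not part of pyranges or pyrle. The reflected methods __radd__ and __rmul__ are what allow the number to appear on the left, as in 0.67 + cs.

```python
class ToyRle:
    """Hypothetical stand-in for a run length encoding (not the pyrle class)."""

    def __init__(self, runs, values):
        self.runs = list(runs)
        self.values = [float(v) for v in values]

    def _map(self, func):
        # Run lengths are untouched; only the values change.
        return ToyRle(self.runs, [func(v) for v in self.values])

    def __add__(self, number):
        return self._map(lambda v: v + number)

    def __mul__(self, number):
        return self._map(lambda v: v * number)

    # Python falls back to these when the number is the left operand.
    __radd__ = __add__
    __rmul__ = __mul__

    def __repr__(self):
        return f"ToyRle(runs={self.runs}, values={self.values})"


# First three runs of chr1 + as printed for cs above.
toy = ToyRle([1541598, 25, 57498], [0.0, 1.0, 0.0])
scaled = (0.67 + toy) * 5
print(scaled.runs)
print([round(v, 2) for v in scaled.values])
```

A fuller version would also merge neighbouring runs whose values become equal after the operation, for example after multiplying by 0; the sketch omits that step.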
You can subset an RleDict with a PyRanges object:
print(bg[gr_bg])
## +--------------+-----------+-----------+-----------+-----------+-------+
## | Chromosome | Start | End | ID | Run | +2 |
## | (object) | (int64) | (int64) | (int64) | (int64) | ... |
## |--------------+-----------+-----------+-----------+-----------+-------|
## | chr1 | 39036822 | 39036847 | 0 | 25 | ... |
## | chr1 | 224145989 | 224146014 | 1 | 25 | ... |
## | chr1 | 167802964 | 167802989 | 2 | 25 | ... |
## | chr1 | 69101066 | 69101091 | 3 | 25 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 11936866 | 11936891 | 1 | 25 | ... |
## | chrY | 10629111 | 10629136 | 2 | 25 | ... |
## | chrY | 10632456 | 10632481 | 3 | 25 | ... |
## | chrY | 11918814 | 11918839 | 4 | 25 | ... |
## +--------------+-----------+-----------+-----------+-----------+-------+
## Stranded PyRanges object has 10,004 rows and 7 columns from 25 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 2 hidden columns: Value, Strand
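The table above lists, for each interval of gr_bg, the runs of bg that fall inside it. Here is a minimal pure-Python sketch of that lookup for a single chromosome, assuming half-open intervals. The helper subset_rle is hypothetical, its output tuples only loosely mirror the Start, End, ID, Run, and Value columns, and the real numbering of ID may differ.

```python
def subset_rle(rle, intervals):
    """Return (start, end, interval_id, run, value) for runs overlapping each interval.

    `rle` is a (runs, values) pair; intervals are half-open [start, end).
    Runs that cross an interval boundary are clipped to the interval.
    """
    runs, values = rle
    rows = []
    for interval_id, (start, end) in enumerate(intervals):
        position = 0
        for run, value in zip(runs, values):
            run_start, run_end = position, position + run
            position = run_end
            # Overlap between this run and the query interval.
            lo, hi = max(start, run_start), min(end, run_end)
            if lo < hi:
                rows.append((start, end, interval_id, hi - lo, value))
    return rows


# Hypothetical toy encoding: positions 0-2 -> 0, 3-4 -> 1, 5 -> 2, 6-7 -> 1.
toy_rle = ([3, 2, 1, 2], [0.0, 1.0, 2.0, 1.0])
rows = subset_rle(toy_rle, [(2, 6), (7, 8)])
for row in rows:
    print(row)
```

This version rescans all runs for every interval, which keeps the sketch short; sorted intervals and a single forward pass over the runs would avoid the repeated work.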