27 Run Length Encoding dicts

Since a single run length encoding cannot describe a genome with multiple chromosomes, pyranges has a data structure called PyRles for collections of Rles. It can be created from a PyRanges object by calling the to_rle method.
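To make the idea of "one Rle per chromosome" concrete, here is a minimal pure-Python sketch of turning intervals into a dictionary of run length encoded coverage. It is an illustration only and does not use pyranges: the helper names coverage_rle and to_rle_dict, and the (runs, values) tuple layout, are assumptions made for this example and not part of the library.

```python
from collections import defaultdict


def coverage_rle(intervals):
    """Run length encode the coverage of half-open (start, end) intervals.

    Returns (runs, values). Positions before the first interval get value 0.0.
    """
    # Record +1 where an interval starts and -1 where it ends.
    deltas = defaultdict(int)
    for start, end in intervals:
        deltas[start] += 1
        deltas[end] -= 1

    runs, values = [], []
    position, depth = 0, 0
    for point in sorted(deltas):
        if point > position:
            if values and values[-1] == depth:
                # Same coverage as the previous run: extend it.
                runs[-1] += point - position
            else:
                runs.append(point - position)
                values.append(float(depth))
            position = point
        depth += deltas[point]
    return runs, values


def to_rle_dict(rows):
    """Group (chromosome, start, end, strand) rows and encode each group."""
    grouped = defaultdict(list)
    for chromosome, start, end, strand in rows:
        grouped[(chromosome, strand)].append((start, end))
    return {key: coverage_rle(ivs) for key, ivs in sorted(grouped.items())}


# Hypothetical toy data: two overlapping reads on chr1 +, one read on chrY -.
rows = [("chr1", 5, 10, "+"), ("chr1", 8, 12, "+"), ("chrY", 3, 6, "-")]
rles = to_rle_dict(rows)
for key, (runs, values) in rles.items():
    print(key, runs, values)
```

Each key of the resulting dictionary is a chromosome/strand pair, which mirrors how the printed objects in this chapter list one Rle per pair.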

PyRles support the arithmetic operations +, -, /, and *.

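import pyranges as pr
# Load the example ChIP-seq reads and the matching background reads
gr = pr.data.chipseq()
gr_bg = pr.data.chipseq_background()
# Convert the intervals to run length encoded coverage
cs = gr.to_rle()
print(cs)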
## chr1 +
## +--------+-----------+------+---------+-------+------+-----------+------+
## | Runs   | 1541598   | 25   | 57498   | ...   | 25   | 1156833   | 25   |
## |--------+-----------+------+---------+-------+------+-----------+------|
## | Values | 0.0       | 1.0  | 0.0     | ...   | 1.0  | 0.0       | 1.0  |
## +--------+-----------+------+---------+-------+------+-----------+------+
## Rle of length 247134924 containing 894 elements (avg. length 276437.275)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+----------+------+
## | Runs   | 7046809   | 25   | 358542   | ...   | 25   | 156610   | 25   |
## |--------+-----------+------+----------+-------+------+----------+------|
## | Values | 0.0       | 1.0  | 0.0      | ...   | 1.0  | 0.0      | 1.0  |
## +--------+-----------+------+----------+-------+------+----------+------+
## Rle of length 22210662 containing 32 elements (avg. length 694083.188)
## RleDict object with 48 chromosomes/strand pairs.
bg = gr_bg.to_rle()
print(bg)
## chr1 +
## +--------+-----------+------+-----------+-------+------+-----------+------+
## | Runs   | 1041102   | 25   | 1088232   | ...   | 25   | 1774357   | 25   |
## |--------+-----------+------+-----------+-------+------+-----------+------|
## | Values | 0.0       | 1.0  | 0.0       | ...   | 1.0  | 0.0       | 1.0  |
## +--------+-----------+------+-----------+-------+------+-----------+------+
## Rle of length 245614348 containing 724 elements (avg. length 339246.337)
## ...
## chrY -
## +--------+------------+------+--------+-------+------+------------+------+
## | Runs   | 10629111   | 25   | 3320   | ...   | 25   | 45465323   | 25   |
## |--------+------------+------+--------+-------+------+------------+------|
## | Values | 0.0        | 1.0  | 0.0    | ...   | 1.0  | 0.0        | 1.0  |
## +--------+------------+------+--------+-------+------+------------+------+
## Rle of length 57402239 containing 10 elements (avg. length 5740223.9)
## RleDict object with 50 chromosomes/strand pairs.
print(cs + bg)
## chr1 +
## +--------+-----------+------+----------+-------+------+-----------+------+
## | Runs   | 1041102   | 25   | 500471   | ...   | 25   | 1156833   | 25   |
## |--------+-----------+------+----------+-------+------+-----------+------|
## | Values | 0.0       | 1.0  | 0.0      | ...   | 1.0  | 0.0       | 1.0  |
## +--------+-----------+------+----------+-------+------+-----------+------+
## Rle of length 247134924 containing 1618 elements (avg. length 152740.991)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+------------+------+
## | Runs   | 7046809   | 25   | 358542   | ...   | 25   | 35191552   | 25   |
## |--------+-----------+------+----------+-------+------+------------+------|
## | Values | 0.0       | 1.0  | 0.0      | ...   | 1.0  | 0.0        | 1.0  |
## +--------+-----------+------+----------+-------+------+------------+------+
## Rle of length 57402239 containing 42 elements (avg. length 1366719.976)
## RleDict object with 50 chromosomes/strand pairs.
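In the output above, the sum has as many chromosome/strand pairs as the larger operand, and each Rle is as long as the longer of the two inputs. A minimal pure-Python sketch of that behaviour follows, assuming that a missing key or a shorter Rle is treated as zeros. The function names add_rles and add_rle_dicts and the list-of-(run, value) layout are hypothetical and are not the pyranges implementation.

```python
def add_rles(a, b):
    """Add two run length encodings given as lists of (run, value) pairs."""
    total_a = sum(run for run, _ in a)
    total_b = sum(run for run, _ in b)
    # Pad the shorter encoding with a trailing run of zeros (assumption).
    if total_a < total_b:
        a = a + [(total_b - total_a, 0.0)]
    elif total_b < total_a:
        b = b + [(total_a - total_b, 0.0)]

    result = []
    i = j = 0
    used_a = used_b = 0  # how much of the current run is already consumed
    while i < len(a) and j < len(b):
        step = min(a[i][0] - used_a, b[j][0] - used_b)
        value = a[i][1] + b[j][1]
        if result and result[-1][1] == value:
            # Merge with the previous run when the value is unchanged.
            result[-1] = (result[-1][0] + step, value)
        else:
            result.append((step, value))
        used_a += step
        used_b += step
        if used_a == a[i][0]:
            i, used_a = i + 1, 0
        if used_b == b[j][0]:
            j, used_b = j + 1, 0
    return result


def add_rle_dicts(x, y):
    """Add per-key; a key missing from one side counts as all zeros."""
    keys = sorted(set(x) | set(y))
    return {key: add_rles(x.get(key, []), y.get(key, [])) for key in keys}


# Hypothetical toy data.
x = {("chr1", "+"): [(3, 0.0), (2, 1.0)]}
y = {("chr1", "+"): [(1, 0.0), (4, 1.0), (3, 0.0)],
     ("chrY", "-"): [(2, 0.0), (1, 1.0)]}
total = add_rle_dicts(x, y)
for key, rle in total.items():
    print(key, rle)
```

Walking both encodings run by run keeps the work proportional to the number of runs, not to the length of the chromosome.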

When an arithmetic operation combines a stranded and an unstranded PyRles, the stranded PyRles is automatically demoted to an unstranded one.

bg_stranded = gr_bg.to_rle(strand=True)
print(bg_stranded)
## chr1 +
## +--------+-----------+------+-----------+-------+------+-----------+------+
## | Runs   | 1041102   | 25   | 1088232   | ...   | 25   | 1774357   | 25   |
## |--------+-----------+------+-----------+-------+------+-----------+------|
## | Values | 0.0       | 1.0  | 0.0       | ...   | 1.0  | 0.0       | 1.0  |
## +--------+-----------+------+-----------+-------+------+-----------+------+
## Rle of length 245614348 containing 724 elements (avg. length 339246.337)
## ...
## chrY -
## +--------+------------+------+--------+-------+------+------------+------+
## | Runs   | 10629111   | 25   | 3320   | ...   | 25   | 45465323   | 25   |
## |--------+------------+------+--------+-------+------+------------+------|
## | Values | 0.0        | 1.0  | 0.0    | ...   | 1.0  | 0.0        | 1.0  |
## +--------+------------+------+--------+-------+------+------------+------+
## Rle of length 57402239 containing 10 elements (avg. length 5740223.9)
## RleDict object with 50 chromosomes/strand pairs.
print(cs + bg_stranded)
## chr1 +
## +--------+-----------+------+----------+-------+------+-----------+------+
## | Runs   | 1041102   | 25   | 500471   | ...   | 25   | 1156833   | 25   |
## |--------+-----------+------+----------+-------+------+-----------+------|
## | Values | 0.0       | 1.0  | 0.0      | ...   | 1.0  | 0.0       | 1.0  |
## +--------+-----------+------+----------+-------+------+-----------+------+
## Rle of length 247134924 containing 1618 elements (avg. length 152740.991)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+------------+------+
## | Runs   | 7046809   | 25   | 358542   | ...   | 25   | 35191552   | 25   |
## |--------+-----------+------+----------+-------+------+------------+------|
## | Values | 0.0       | 1.0  | 0.0      | ...   | 1.0  | 0.0        | 1.0  |
## +--------+-----------+------+----------+-------+------+------------+------+
## Rle of length 57402239 containing 42 elements (avg. length 1366719.976)
## RleDict object with 50 chromosomes/strand pairs.
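What demotion could look like is sketched below in pure Python. The sketch assumes that demoting means summing the + and - coverage of each chromosome; the text above does not spell this out, so treat it as an assumption. The names expand, encode and demote_to_unstranded are hypothetical, and the sketch expands runs position by position, which is only sensible for toy data.

```python
from itertools import groupby


def expand(rle):
    """Turn (run, value) pairs into one value per position."""
    return [value for run, value in rle for _ in range(run)]


def encode(values):
    """Turn a list of per-position values back into (run, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def demote_to_unstranded(stranded):
    """Collapse {(chromosome, strand): rle} into {chromosome: rle}.

    Assumption: the coverage of both strands is summed per position,
    and the shorter strand is padded with zeros.
    """
    merged = {}
    for (chromosome, _strand), rle in stranded.items():
        values = expand(rle)
        previous = merged.get(chromosome, [])
        width = max(len(values), len(previous))
        values = values + [0.0] * (width - len(values))
        previous = previous + [0.0] * (width - len(previous))
        merged[chromosome] = [p + v for p, v in zip(previous, values)]
    return {chromosome: encode(values) for chromosome, values in merged.items()}


# Hypothetical toy data.
stranded = {
    ("chr1", "+"): [(2, 0.0), (2, 1.0)],
    ("chr1", "-"): [(1, 0.0), (2, 1.0), (3, 0.0)],
}
unstranded = demote_to_unstranded(stranded)
print(unstranded)
```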

Like Rles, PyRles support arithmetic operations with numbers.

print((0.67 + cs) * 5)
## chr1 +
## +--------+-----------+------+---------+-------+------+-----------+------+
## | Runs   | 1541598   | 25   | 57498   | ...   | 25   | 1156833   | 25   |
## |--------+-----------+------+---------+-------+------+-----------+------|
## | Values | 3.35      | 8.35 | 3.35    | ...   | 8.35 | 3.35      | 8.35 |
## +--------+-----------+------+---------+-------+------+-----------+------+
## Rle of length 247134924 containing 894 elements (avg. length 276437.275)
## ...
## chrY -
## +--------+-----------+------+----------+-------+------+----------+------+
## | Runs   | 7046809   | 25   | 358542   | ...   | 25   | 156610   | 25   |
## |--------+-----------+------+----------+-------+------+----------+------|
## | Values | 3.35      | 8.35 | 3.35     | ...   | 8.35 | 3.35     | 8.35 |
## +--------+-----------+------+----------+-------+------+----------+------+
## Rle of length 22210662 containing 32 elements (avg. length 694083.188)
## RleDict object with 48 chromosomes/strand pairs.
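As the output shows, arithmetic with a number changes only the Values row; the Runs row is untouched. A minimal pure-Python sketch of that idea follows. The names map_values and shift_and_scale and the list-of-(run, value) layout are assumptions for illustration, not the pyranges implementation.

```python
def map_values(rle, func):
    """Apply func to every value of a list of (run, value) pairs."""
    result = []
    for run, value in rle:
        new_value = func(value)
        if result and result[-1][1] == new_value:
            # Neighbouring runs that end up equal are merged.
            result[-1] = (result[-1][0] + run, new_value)
        else:
            result.append((run, new_value))
    return result


def shift_and_scale(rle_dict, offset, factor):
    """Compute (offset + rle) * factor for every Rle in the dictionary."""
    return {
        key: map_values(rle, lambda value: (offset + value) * factor)
        for key, rle in rle_dict.items()
    }


# Hypothetical toy data.
example = {("chr1", "+"): [(4, 0.0), (2, 1.0), (3, 0.0)]}
scaled = shift_and_scale(example, 0.67, 5)
print(scaled)
```

Because only the values are touched, the cost depends on the number of runs and not on the chromosome length.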

You can subset a PyRles with a PyRanges object:

print(bg[gr_bg])
## +--------------+-----------+-----------+-----------+-----------+-------+
## | Chromosome   | Start     | End       | ID        | Run       | +2    |
## | (object)     | (int64)   | (int64)   | (int64)   | (int64)   | ...   |
## |--------------+-----------+-----------+-----------+-----------+-------|
## | chr1         | 39036822  | 39036847  | 0         | 25        | ...   |
## | chr1         | 224145989 | 224146014 | 1         | 25        | ...   |
## | chr1         | 167802964 | 167802989 | 2         | 25        | ...   |
## | chr1         | 69101066  | 69101091  | 3         | 25        | ...   |
## | ...          | ...       | ...       | ...       | ...       | ...   |
## | chrY         | 11936866  | 11936891  | 1         | 25        | ...   |
## | chrY         | 10629111  | 10629136  | 2         | 25        | ...   |
## | chrY         | 10632456  | 10632481  | 3         | 25        | ...   |
## | chrY         | 11918814  | 11918839  | 4         | 25        | ...   |
## +--------------+-----------+-----------+-----------+-----------+-------+
## Stranded PyRanges object has 10,004 rows and 7 columns from 25 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 2 hidden columns: Value, Strand
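The result above has one row per run inside each query interval, with the run length and value as extra columns. A minimal pure-Python sketch of that kind of lookup follows. The names slice_rle and subset_rle_dict are hypothetical, and the ID here simply counts query intervals from zero, which may differ from how pyranges numbers them.

```python
def slice_rle(rle, start, end):
    """Return the (run, value) pairs covering the half-open window [start, end)."""
    pieces = []
    position = 0
    for run, value in rle:
        run_start, run_end = position, position + run
        overlap = min(run_end, end) - max(run_start, start)
        if overlap > 0:
            pieces.append((overlap, value))
        position = run_end
        if position >= end:
            break  # later runs lie entirely after the window
    return pieces


def subset_rle_dict(rle_dict, intervals):
    """Return one row per run found inside each query interval."""
    rows = []
    for interval_id, (chromosome, start, end, strand) in enumerate(intervals):
        rle = rle_dict.get((chromosome, strand), [])
        for run, value in slice_rle(rle, start, end):
            rows.append((chromosome, start, end, strand, interval_id, run, value))
    return rows


# Hypothetical toy data.
rle_dict = {("chr1", "+"): [(5, 0.0), (3, 1.0), (4, 0.0)]}
queries = [("chr1", 4, 9, "+"), ("chr1", 5, 8, "+"), ("chr2", 0, 3, "+")]
for row in subset_rle_dict(rle_dict, queries):
    print(row)
```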