11 Summarizing PyRanges

The summary() function reports descriptive statistics on the lengths of the intervals in a PyRanges:

import pyranges as pr
import pandas as pd
from io import StringIO
gr = pr.data.exons()
print(gr)
## +--------------+-----------+-----------+-------+
## | Chromosome   | Start     | End       | +3    |
## | (category)   | (int32)   | (int32)   | ...   |
## |--------------+-----------+-----------+-------|
## | chrX         | 135721701 | 135721963 | ...   |
## | chrX         | 135574120 | 135574598 | ...   |
## | chrX         | 47868945  | 47869126  | ...   |
## | chrX         | 77294333  | 77294480  | ...   |
## | ...          | ...       | ...       | ...   |
## | chrY         | 15409586  | 15409728  | ...   |
## | chrY         | 15478146  | 15478273  | ...   |
## | chrY         | 15360258  | 15361762  | ...   |
## | chrY         | 15467254  | 15467278  | ...   |
## +--------------+-----------+-----------+-------+
## Stranded PyRanges object has 1,000 rows and 6 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 3 hidden columns: Name, Score, Strand
print(gr.summary())
## +-------+------------+--------------------+--------------------+-----------------------+
## |       |    pyrange |   coverage_forward |   coverage_reverse |   coverage_unstranded |
## |-------+------------+--------------------+--------------------+-----------------------|
## | count |   1000     |            452     |            421     |               873     |
## | mean  |    304.292 |            314.066 |            314.458 |               314.255 |
## | std   |    640.013 |            732.655 |            587.486 |               666.23  |
## | min   |      4     |             15     |              4     |                 4     |
## | 25%   |     88     |             84.75  |             94     |                88     |
## | 50%   |    127     |            123     |            138     |               127     |
## | 75%   |    195.5   |            183     |            212     |               199     |
## | max   |   6063     |           6063     |           5322     |              6063     |
## | sum   | 304292     |         141958     |         132387     |            274345     |
## +-------+------------+--------------------+--------------------+-----------------------+
## None

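The pyrange column describes the interval lengths as they are. The columns coverage_forward and coverage_reverse tell you how the data looks after merging all overlapping features on each strand separately (that is, taking strand into account), and coverage_unstranded is the same, but with all features merged independent of their strand.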
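
The difference between merging per strand and merging independent of strand can be sketched in plain Python. This is only a toy illustration of the idea, not how PyRanges implements it: the features are made up, the helper names merge_intervals and total_length are hypothetical, and intervals are assumed to be half-open and to merge only when they strictly overlap.

```python
# Toy illustration in plain Python (not PyRanges' implementation).
# Features are hypothetical (start, end, strand) tuples with half-open coordinates.

def merge_intervals(intervals):
    """Merge strictly overlapping (start, end) pairs; return the merged list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previously merged interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def total_length(intervals):
    """Sum of End - Start over all intervals."""
    return sum(end - start for start, end in intervals)


features = [
    (10, 50, "+"),
    (40, 80, "+"),    # overlaps the first feature on the same strand
    (60, 100, "-"),   # overlaps the second feature, but on the other strand
    (200, 230, "-"),
]

# Merge each strand separately (strand taken into account).
forward = merge_intervals([(s, e) for s, e, strand in features if strand == "+"])
reverse = merge_intervals([(s, e) for s, e, strand in features if strand == "-"])
# Merge everything together, ignoring strand.
unstranded = merge_intervals([(s, e) for s, e, _ in features])

print(forward, total_length(forward))
print(reverse, total_length(reverse))
print(unstranded, total_length(unstranded))
```

In this toy data the two forward features collapse into one interval, while the reverse feature that overlaps them is only absorbed when strand is ignored.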

To get the length in bp of each interval in a PyRanges, use the lengths() function. It takes an argument as_dict, which defaults to False; with that value the lengths are returned as a single vector.

print(gr.lengths())
## 0       262
## 1       478
## 2       181
## 3       147
## 4       584
##        ... 
## 995      64
## 996     142
## 997     127
## 998    1504
## 999      24
## Length: 1000, dtype: int32
print(gr.lengths(as_dict=False))
## 0       262
## 1       478
## 2       181
## 3       147
## 4       584
##        ... 
## 995      64
## 996     142
## 997     127
## 998    1504
## 999      24
## Length: 1000, dtype: int32
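
As a quick sanity check, the first four lengths can be reproduced by hand. The sketch below copies the Start and End values of the first four rows of the printed PyRanges and assumes that each length is simply End minus Start (half-open intervals) and that those printed rows correspond to indices 0 to 3 of the lengths output.

```python
# (Start, End) pairs copied from the first four chrX rows printed above.
# Assumption: length = End - Start, i.e. intervals are half-open.
rows = [
    (135721701, 135721963),
    (135574120, 135574598),
    (47868945, 47869126),
    (77294333, 77294480),
]

lengths = [end - start for start, end in rows]
print(lengths)  # [262, 478, 181, 147]
```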

To get the total length of the PyRanges in bp, use the length property. Overlapping intervals each contribute their full length to this total, so to count every covered nucleotide only once, call merge() first.

print(gr.length)
## 304292
print(gr.merge().length)
## 274345