11 Summarizing PyRanges
The summary() function reports summary statistics for the lengths of the intervals in a PyRanges:
import pyranges as pr
import pandas as pd
from io import StringIO
gr = pr.data.exons()
print(gr)
## +--------------+-----------+-----------+-------+
## | Chromosome | Start | End | +3 |
## | (category) | (int32) | (int32) | ... |
## |--------------+-----------+-----------+-------|
## | chrX | 135721701 | 135721963 | ... |
## | chrX | 135574120 | 135574598 | ... |
## | chrX | 47868945 | 47869126 | ... |
## | chrX | 77294333 | 77294480 | ... |
## | ... | ... | ... | ... |
## | chrY | 15409586 | 15409728 | ... |
## | chrY | 15478146 | 15478273 | ... |
## | chrY | 15360258 | 15361762 | ... |
## | chrY | 15467254 | 15467278 | ... |
## +--------------+-----------+-----------+-------+
## Stranded PyRanges object has 1,000 rows and 6 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 3 hidden columns: Name, Score, Strand
print(gr.summary())
## +-------+------------+--------------------+--------------------+-----------------------+
## | | pyrange | coverage_forward | coverage_reverse | coverage_unstranded |
## |-------+------------+--------------------+--------------------+-----------------------|
## | count | 1000 | 452 | 421 | 873 |
## | mean | 304.292 | 314.066 | 314.458 | 314.255 |
## | std | 640.013 | 732.655 | 587.486 | 666.23 |
## | min | 4 | 15 | 4 | 4 |
## | 25% | 88 | 84.75 | 94 | 88 |
## | 50% | 127 | 123 | 138 | 127 |
## | 75% | 195.5 | 183 | 212 | 199 |
## | max | 6063 | 6063 | 5322 | 6063 |
## | sum | 304292 | 141958 | 132387 | 274345 |
## +-------+------------+--------------------+--------------------+-----------------------+
## None
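The stranded coverage columns, coverage_forward and coverage_reverse, tell you how the data looks after merging all overlapping features on each strand separately. The column coverage_unstranded is the same, but all features are merged independent of their strand. The pyrange column describes the intervals as they are, without any merging.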
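To make the difference between the stranded and unstranded columns concrete, here is a minimal pure-Python sketch of the merging idea. It does not use PyRanges and is not its implementation: the toy intervals, the `merged_lengths` helper, and the `use_strand` parameter are all hypothetical, and the sketch assumes half-open intervals where touching (bookended) features are not merged.

```python
from collections import defaultdict

# Hypothetical toy intervals: (chromosome, start, end, strand), half-open.
intervals = [
    ("chr1", 10, 20, "+"),
    ("chr1", 15, 30, "+"),
    ("chr1", 25, 40, "-"),
    ("chr1", 50, 60, "-"),
]

def merged_lengths(intervals, use_strand):
    """Merge overlapping intervals within each group and return the
    lengths of the merged intervals (illustrative helper only)."""
    groups = defaultdict(list)
    for chrom, start, end, strand in intervals:
        # Group per chromosome and strand, or per chromosome only.
        key = (chrom, strand) if use_strand else chrom
        groups[key].append((start, end))

    lengths = []
    for spans in groups.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start < cur_end:
                # Overlap: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                # Gap (or merely touching): close the current interval.
                lengths.append(cur_end - cur_start)
                cur_start, cur_end = start, end
        lengths.append(cur_end - cur_start)
    return lengths

raw_total = sum(end - start for _, start, end, _ in intervals)
stranded = merged_lengths(intervals, use_strand=True)
unstranded = merged_lengths(intervals, use_strand=False)

print(raw_total)        # 50: overlapping bases are counted more than once
print(sum(stranded))    # 45: overlaps are merged within each strand
print(sum(unstranded))  # 40: overlaps are merged across strands too
```

In this toy data the "+" interval ending at 30 overlaps the "-" interval starting at 25, so ignoring strand merges them and lowers the total further. The sum rows of the summary table can be read in the same way.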
To get the length in bp of each interval in a PyRanges, use the lengths() function. It takes an argument as_dict, which defaults to False; with as_dict=False the lengths are returned as a single vector.
print(gr.lengths())
## 0 262
## 1 478
## 2 181
## 3 147
## 4 584
## ...
## 995 64
## 996 142
## 997 127
## 998 1504
## 999 24
## Length: 1000, dtype: int32
print(gr.lengths(as_dict=False))
## 0 262
## 1 478
## 2 181
## 3 147
## 4 584
## ...
## 995 64
## 996 142
## 997 127
## 998 1504
## 999 24
## Length: 1000, dtype: int32
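The reported lengths are consistent with computing End minus Start for each interval. The short check below uses plain Python rather than PyRanges; the (Start, End) pairs are copied from the first four rows of the printed exons table, on the assumption that those rows correspond to indices 0 to 3 of the lengths output.

```python
# (Start, End) pairs copied from the first four rows of the table above.
rows = [
    (135721701, 135721963),
    (135574120, 135574598),
    (47868945, 47869126),
    (77294333, 77294480),
]

# Interval length in bp, treating End as exclusive.
lengths = [end - start for start, end in rows]
print(lengths)  # [262, 478, 181, 147]
```

These values agree with the first four entries printed by gr.lengths().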
To get the total length of the PyRanges in bp, use the length property. Overlapping intervals are counted once per interval, so to count only the non-overlapping nucleotides, call the merge() function first.
print(gr.length)
## 304292
print(gr.merge().length)
## 274345