PyRanges documentation
Endre Bakken Stovner
2022-01-12
1 Introduction to PyRanges
This is the PyRanges tutorial. For the full documentation, see: https://pyranges.readthedocs.io/en/latest/
PyRanges are collections of intervals that support comparison operations (like overlap and intersect) and other methods useful for genomic analyses. The ranges can have an arbitrary number of metadata fields, i.e. columns associated with them.
The data in PyRanges objects are stored in a pandas dataframe. This means the vast Python ecosystem for high-performance scientific computing is available to manipulate the data in PyRanges objects.
import pyranges as pr
from pyranges import PyRanges
import pandas as pd
from io import StringIO
f1 = """Chromosome Start End Score Strand
chr1 4 7 23.8 +
chr1 6 11 0.13 -
chr2 0 14 42.42 +"""
# Parse the whitespace-separated table, then wrap the dataframe in a PyRanges.
df1 = pd.read_csv(StringIO(f1), sep=r"\s+")
gr1 = PyRanges(df1)
Now we can subset the PyRanges in various ways, by chromosome, strand, and coordinate range, or by column name:
print(gr1)
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 4 | 7 | 23.8 | + |
## | chr1 | 6 | 11 | 0.13 | - |
## | chr2 | 0 | 14 | 42.42 | + |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1["chr1", 0:5])
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 4 | 7 | 23.8 | + |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1["chr1", "-", 6:100])
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 6 | 11 | 0.13 | - |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.Score)
## 0 23.80
## 1 0.13
## 2 42.42
## Name: Score, dtype: float64
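The coordinate subsetting above can be read as an overlap query: a row is kept when it lies on the requested chromosome (and strand, if given) and overlaps the requested region. The plain-Python sketch below mimics that behaviour for the rows of gr1. It assumes half-open intervals (Start included, End excluded), which is consistent with the outputs shown above. The helper `subset` is hypothetical and written only for illustration; it is not part of PyRanges and says nothing about how PyRanges implements the lookup.

```python
# Rows of gr1 as (Chromosome, Start, End, Score, Strand) tuples.
ROWS = [
    ("chr1", 4, 7, 23.8, "+"),
    ("chr1", 6, 11, 0.13, "-"),
    ("chr2", 0, 14, 42.42, "+"),
]


def subset(rows, chromosome, start, end, strand=None):
    """Hypothetical helper: keep rows on `chromosome` (and `strand`, if given)
    that overlap the half-open region [start, end)."""
    kept = []
    for chrom, s, e, score, st in rows:
        if chrom != chromosome:
            continue
        if strand is not None and st != strand:
            continue
        # Two half-open intervals overlap when each starts before the other ends.
        if s < end and start < e:
            kept.append((chrom, s, e, score, st))
    return kept


print(subset(ROWS, "chr1", 0, 5))                # mirrors gr1["chr1", 0:5]
print(subset(ROWS, "chr1", 6, 100, strand="-"))  # mirrors gr1["chr1", "-", 6:100]
```

Under the half-open assumption, the interval 4-7 does not overlap a region that starts at 7, because position 7 itself is excluded from the interval.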
And we can perform comparison operations with two PyRanges:
f2 = """Chromosome Start End Score Strand
chr1 5 6 -0.01 -
chr1 9 12 200 +
chr3 0 14 21.21 -"""
# Build a second PyRanges to compare against gr1.
df2 = pd.read_csv(StringIO(f2), sep=r"\s+")
gr2 = PyRanges(df2)
print(gr2)
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 9 | 12 | 200 | + |
## | chr1 | 5 | 6 | -0.01 | - |
## | chr3 | 0 | 14 | 21.21 | - |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.intersect(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 5 | 6 | 23.8 | + |
## | chr1 | 9 | 11 | 0.13 | - |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.intersect(gr2, strandedness=False))
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome | Start | End | Score | Strand |
## | (category) | (int32) | (int32) | (float64) | (category) |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1 | 5 | 6 | 23.8 | + |
## | chr1 | 9 | 11 | 0.13 | - |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
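To see why both calls above return the same two rows, it can help to spell out the intersection by hand. The sketch below is a naive pairwise version in plain Python, assuming half-open intervals; `intersect` and `strands_compatible` are hypothetical helpers for illustration only and are not how PyRanges computes the result. Metadata columns are omitted to keep the example short.

```python
# Intervals of gr1 and gr2 as (Chromosome, Start, End, Strand) tuples.
GR1 = [("chr1", 4, 7, "+"), ("chr1", 6, 11, "-"), ("chr2", 0, 14, "+")]
GR2 = [("chr1", 5, 6, "-"), ("chr1", 9, 12, "+"), ("chr3", 0, 14, "-")]


def strands_compatible(strand_a, strand_b, strandedness):
    """Decide whether two strands may be compared under `strandedness`."""
    if strandedness == "same":
        return strand_a == strand_b
    if strandedness == "opposite":
        return strand_a != strand_b
    return True  # strandedness=False: strand is ignored


def intersect(a_rows, b_rows, strandedness=False):
    """Naive O(n*m) intersection: return the overlapping part of each pair,
    labelled with the strand of the interval from `a_rows`."""
    out = []
    for chrom_a, start_a, end_a, strand_a in a_rows:
        for chrom_b, start_b, end_b, strand_b in b_rows:
            if chrom_a != chrom_b:
                continue
            if not strands_compatible(strand_a, strand_b, strandedness):
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty half-open overlap
                out.append((chrom_a, start, end, strand_a))
    return out


print(intersect(GR1, GR2, strandedness="opposite"))
print(intersect(GR1, GR2, strandedness=False))
print(intersect(GR1, GR2, strandedness="same"))
```

With this data, every overlapping pair happens to lie on opposite strands, so ignoring strand and requiring opposite strands give the same result, while requiring the same strand yields nothing.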
There are also convenience methods for single PyRanges:
print(gr1.merge())
## +--------------+-----------+-----------+--------------+
## | Chromosome | Start | End | Strand |
## | (category) | (int32) | (int32) | (category) |
## |--------------+-----------+-----------+--------------|
## | chr1 | 4 | 7 | + |
## | chr1 | 6 | 11 | - |
## | chr2 | 0 | 14 | + |
## +--------------+-----------+-----------+--------------+
## Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
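In the output above nothing was merged, even though 4-7 and 6-11 on chr1 overlap: the two intervals are on different strands. The plain-Python sketch below illustrates the idea of merging per chromosome and strand versus per chromosome only. The function `merge` and its `use_strand` parameter are hypothetical names for this illustration, not the PyRanges API, and the sketch assumes that intervals which merely touch (End of one equals Start of the next) are left separate.

```python
from collections import defaultdict

# Intervals of gr1 as (Chromosome, Start, End, Strand) tuples.
ROWS = [("chr1", 4, 7, "+"), ("chr1", 6, 11, "-"), ("chr2", 0, 14, "+")]


def merge(rows, use_strand=True):
    """Hypothetical helper: merge overlapping intervals within each group.
    Groups are (chromosome, strand) pairs, or chromosomes if use_strand is False."""
    groups = defaultdict(list)
    for chrom, start, end, strand in rows:
        key = (chrom, strand) if use_strand else (chrom,)
        groups[key].append((start, end))

    merged = []
    for key in sorted(groups):
        intervals = sorted(groups[key])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:
                # Overlaps the running interval: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap (or touching): emit the running interval, start a new one.
                merged.append((key[0], cur_start, cur_end) + key[1:])
                cur_start, cur_end = start, end
        merged.append((key[0], cur_start, cur_end) + key[1:])
    return merged


print(merge(ROWS))                    # strand-aware: nothing to merge
print(merge(ROWS, use_strand=False))  # strand ignored: chr1 collapses to 4-11
```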
The underlying dataframe can always be accessed:
print(gr1.df)
## Chromosome Start End Score Strand
## 0 chr1 4 7 23.80 +
## 1 chr1 6 11 0.13 -
## 2 chr2 0 14 42.42 +
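Once the data is in a dataframe, ordinary pandas operations apply. The sketch below uses only pandas and rebuilds the same table as gr1 from text, so it runs without PyRanges installed; the variable names (`long_scores`, `score_per_chrom`) and the added `Length` column are illustrative choices, not part of the tutorial's data.

```python
from io import StringIO

import pandas as pd

table = """Chromosome Start End Score Strand
chr1 4 7 23.8 +
chr1 6 11 0.13 -
chr2 0 14 42.42 +"""

# Same parsing step as for gr1 above.
df = pd.read_csv(StringIO(table), sep=r"\s+")

# Interval length as a new metadata column (half-open intervals: End - Start).
df["Length"] = df["End"] - df["Start"]

# Boolean filtering: keep rows whose score exceeds 1.
long_scores = df[df["Score"] > 1]

# Aggregate a metadata column per chromosome.
score_per_chrom = df.groupby("Chromosome")["Score"].sum()

print(df)
print(long_scores)
print(score_per_chrom)
```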