1 Introduction to PyRanges

This is the PyRanges tutorial. For the full documentation, see: https://pyranges.readthedocs.io/en/latest/

A PyRanges object is a collection of intervals that supports comparison operations (such as overlap and intersect) and other methods that are useful for genomic analyses. Each interval can carry an arbitrary number of metadata fields, i.e. additional columns associated with it.

The data in a PyRanges object are stored in a pandas dataframe, so the wide Python ecosystem for high-performance scientific computing can be used to manipulate them.

import pyranges as pr
from pyranges import PyRanges
import pandas as pd
from io import StringIO
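# Example intervals as a whitespace-separated table
f1 = """Chromosome Start End Score Strand
chr1 4 7 23.8 +
chr1 6 11 0.13 -
chr2 0 14 42.42 +"""
# A raw string keeps "\s" from being read as an (invalid) Python escape
df1 = pd.read_csv(StringIO(f1), sep=r"\s+")
# Build the PyRanges from the dataframe
gr1 = PyRanges(df1)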

Now we can print the PyRanges and subset it in various ways, by chromosome, strand, position range, or column name:

print(gr1)
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         4 |         7 |       23.8  | +            |
## | chr1         |         6 |        11 |        0.13 | -            |
## | chr2         |         0 |        14 |       42.42 | +            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1["chr1", 0:5])
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         4 |         7 |        23.8 | +            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1["chr1", "-", 6:100])
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         6 |        11 |        0.13 | -            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.Score)
## 0    23.80
## 1     0.13
## 2    42.42
## Name: Score, dtype: float64
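
The outputs above suggest that a slice such as 0:5 keeps the intervals that overlap the range rather than those fully contained in it: chr1 4-7 is returned even though it extends past position 5. The pure-Python sketch below mimics that selection on plain tuples, assuming half-open [Start, End) coordinates. `select` is a hypothetical helper written for this illustration, not part of the PyRanges API.

```python
# Rows of gr1 as plain tuples: (chromosome, start, end, score, strand)
rows = [
    ("chr1", 4, 7, 23.8, "+"),
    ("chr1", 6, 11, 0.13, "-"),
    ("chr2", 0, 14, 42.42, "+"),
]


def select(rows, chromosome, start, end, strand=None):
    """Keep rows on `chromosome` (and `strand`, if given) overlapping [start, end)."""
    return [
        r for r in rows
        if r[0] == chromosome
        and (strand is None or r[4] == strand)
        and r[1] < end and r[2] > start  # half-open overlap test
    ]


print(select(rows, "chr1", 0, 5))         # mirrors gr1["chr1", 0:5]
print(select(rows, "chr1", 6, 100, "-"))  # mirrors gr1["chr1", "-", 6:100]
```

Both calls return a single row, matching the two subsets printed above.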

We can also perform comparison operations between two PyRanges:

f2 = """Chromosome Start End Score Strand
chr1 5 6 -0.01 -
chr1 9 12 200 +
chr3 0 14 21.21 -"""
df2 = pd.read_csv(StringIO(f2), sep=r"\s+")
gr2 = PyRanges(df2)
print(gr2)
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         9 |        12 |      200    | +            |
## | chr1         |         5 |         6 |       -0.01 | -            |
## | chr3         |         0 |        14 |       21.21 | -            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.intersect(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         5 |         6 |       23.8  | +            |
## | chr1         |         9 |        11 |        0.13 | -            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr1.intersect(gr2, strandedness=False))
## +--------------+-----------+-----------+-------------+--------------+
## | Chromosome   |     Start |       End |       Score | Strand       |
## | (category)   |   (int32) |   (int32) |   (float64) | (category)   |
## |--------------+-----------+-----------+-------------+--------------|
## | chr1         |         5 |         6 |       23.8  | +            |
## | chr1         |         9 |        11 |        0.13 | -            |
## +--------------+-----------+-----------+-------------+--------------+
## Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
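
To see why both calls return the same rows, it can help to compute the intersections by hand. The sketch below is a plain-Python illustration, assuming half-open [Start, End) intervals and assuming that the strandedness setting only decides which pairs of intervals are compared. `intersect` here is a hypothetical helper, not the library's implementation.

```python
# (chromosome, start, end, strand) for the intervals in gr1 and gr2
gr1_rows = [("chr1", 4, 7, "+"), ("chr1", 6, 11, "-"), ("chr2", 0, 14, "+")]
gr2_rows = [("chr1", 5, 6, "-"), ("chr1", 9, 12, "+"), ("chr3", 0, 14, "-")]


def intersect(a_rows, b_rows, strandedness=False):
    """Return the overlapping parts of a_rows, keeping a's chromosome and strand."""
    out = []
    for chrom_a, start_a, end_a, strand_a in a_rows:
        for chrom_b, start_b, end_b, strand_b in b_rows:
            if chrom_a != chrom_b:
                continue
            if strandedness == "same" and strand_a != strand_b:
                continue
            if strandedness == "opposite" and strand_a == strand_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # empty when intervals only touch, e.g. [5, 6) and [6, 11)
                out.append((chrom_a, start, end, strand_a))
    return out


print(intersect(gr1_rows, gr2_rows, strandedness="opposite"))
print(intersect(gr1_rows, gr2_rows, strandedness=False))
print(intersect(gr1_rows, gr2_rows, strandedness="same"))
```

In this data the only overlapping pairs happen to lie on opposite strands, so ignoring strand and requiring opposite strands give the same two rows, while requiring the same strand gives nothing.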

There are also convenience methods for single PyRanges:

print(gr1.merge())
## +--------------+-----------+-----------+--------------+
## | Chromosome   |     Start |       End | Strand       |
## | (category)   |   (int32) |   (int32) | (category)   |
## |--------------+-----------+-----------+--------------|
## | chr1         |         4 |         7 | +            |
## | chr1         |         6 |        11 | -            |
## | chr2         |         0 |        14 | +            |
## +--------------+-----------+-----------+--------------+
## Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
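
The merged result contains the same three intervals as the input: chr1 4-7 and chr1 6-11 overlap, but they lie on different strands and are therefore not fused. A minimal sketch of that idea follows, assuming intervals are grouped by chromosome and strand and that only overlapping intervals within a group are combined. `merge_intervals` is a hypothetical helper, and its treatment of intervals that merely touch is a choice made for this sketch.

```python
from collections import defaultdict


def merge_intervals(rows):
    """Merge overlapping (chromosome, start, end, strand) rows per chromosome and strand."""
    groups = defaultdict(list)
    for chrom, start, end, strand in rows:
        groups[(chrom, strand)].append((start, end))

    merged = []
    for (chrom, strand), intervals in sorted(groups.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:  # overlaps the running interval: extend it
                cur_end = max(cur_end, end)
            else:  # gap or mere touch: kept apart in this sketch (an assumption)
                merged.append((chrom, cur_start, cur_end, strand))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand))
    return merged


gr1_rows = [("chr1", 4, 7, "+"), ("chr1", 6, 11, "-"), ("chr2", 0, 14, "+")]
print(merge_intervals(gr1_rows))

# On a single strand the two chr1 intervals fuse into one
print(merge_intervals([("chr1", 4, 7, "+"), ("chr1", 6, 11, "+")]))
```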

The underlying dataframe can always be accessed:

print(gr1.df)
##   Chromosome  Start  End  Score Strand
## 0       chr1      4    7  23.80      +
## 1       chr1      6   11   0.13      -
## 2       chr2      0   14  42.42      +
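
Once the data are in a dataframe, the usual pandas idioms apply. The sketch below rebuilds the same table as a plain DataFrame, standing in for `gr1.df` so that it runs without PyRanges installed, then adds a column and filters rows. The `Length` column and the score threshold of 1.0 are illustrative choices, not part of the original example.

```python
import pandas as pd

# Stand-in for gr1.df: same columns and values as printed above
df = pd.DataFrame({
    "Chromosome": ["chr1", "chr1", "chr2"],
    "Start": [4, 6, 0],
    "End": [7, 11, 14],
    "Score": [23.8, 0.13, 42.42],
    "Strand": ["+", "-", "+"],
})

# Interval length, assuming half-open [Start, End) coordinates
df["Length"] = df["End"] - df["Start"]

# Keep only the intervals whose score exceeds the (arbitrary) threshold
high = df[df["Score"] > 1.0]
print(high)

# Total interval length per chromosome
length_per_chrom = df.groupby("Chromosome")["Length"].sum()
print(length_per_chrom)
```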