3 Subsetting PyRanges

A PyRanges object can be subset by chromosome, strand, coordinate slice, or any combination of these. Each form returns a new PyRanges object and leaves the original unchanged.

import pyranges as pr
gr = pr.load_dataset("chipseq")
print(gr)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr8         | 28510032  | 28510057  | U0         | 0         | -            |
## | chr7         | 107153363 | 107153388 | U0         | 0         | -            |
## | chr5         | 135821802 | 135821827 | U0         | 0         | -            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr6         | 89296757  | 89296782  | U0         | 0         | -            |
## | chr1         | 194245558 | 194245583 | U0         | 0         | +            |
## | chr8         | 57916061  | 57916086  | U0         | 0         | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 10000 sequences from 24 chromosomes.

Chromosome only

print(gr["chrX"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX         | 41852946  | 41852971  | U0         | 0         | -            |
## | chrX         | 69979838  | 69979863  | U0         | 0         | -            |
## | chrX         | 34824145  | 34824170  | U0         | 0         | -            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chrX         | 5044527   | 5044552   | U0         | 0         | -            |
## | chrX         | 15281263  | 15281288  | U0         | 0         | -            |
## | chrX         | 120273723 | 120273748 | U0         | 0         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 282 sequences from 1 chromosomes.

Chromosome and Strand

print(gr["chrX", "-"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX         | 41852946  | 41852971  | U0         | 0         | -            |
## | chrX         | 69979838  | 69979863  | U0         | 0         | -            |
## | chrX         | 34824145  | 34824170  | U0         | 0         | -            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chrX         | 5044527   | 5044552   | U0         | 0         | -            |
## | chrX         | 15281263  | 15281288  | U0         | 0         | -            |
## | chrX         | 120273723 | 120273748 | U0         | 0         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 151 sequences from 1 chromosomes.

Chromosome and Slice

print(gr["chrX", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX         | 151324943 | 151324968 | U0         | 0         | +            |
## | chrX         | 152902449 | 152902474 | U0         | 0         | +            |
## | chrX         | 153632850 | 153632875 | U0         | 0         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chrX         | 151277790 | 151277815 | U0         | 0         | -            |
## | chrX         | 153037423 | 153037448 | U0         | 0         | -            |
## | chrX         | 153255924 | 153255949 | U0         | 0         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 8 sequences from 1 chromosomes.

Chromosome, Strand and Slice

print(gr["chrX", "-", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX         | 150277236 | 150277261 | U0         |         0 | -            |
## | chrX         | 151277790 | 151277815 | U0         |         0 | -            |
## | chrX         | 153037423 | 153037448 | U0         |         0 | -            |
## | chrX         | 153255924 | 153255949 | U0         |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 4 sequences from 1 chromosomes.

Slice

Using only a slice returns the ranges from every chromosome and both strands that fall within those coordinates.

print(gr[0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr2         |     33241 |     33266 | U0         |         0 | +            |
## | chr2         |     13611 |     13636 | U0         |         0 | -            |
## | chr2         |     32620 |     32645 | U0         |         0 | -            |
## | chr3         |     87179 |     87204 | U0         |         0 | +            |
## | chr4         |     45413 |     45438 | U0         |         0 | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 5 sequences from 3 chromosomes.
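
The word "within" can be read in two ways: an interval may lie entirely inside the window, or it may merely overlap it. The sketch below spells out both readings on plain tuples. It does not call PyRanges, and which reading the library applies is not confirmed here. The helper names `contained` and `overlaps` are made up for illustration, and the last row is a hypothetical interval that straddles the right edge of the window.

```python
# Toy rows of (chromosome, start, end, strand). The first three are copied
# from the output above; the last is hypothetical and not in the dataset.
rows = [
    ("chr2", 33241, 33266, "+"),
    ("chr3", 87179, 87204, "+"),
    ("chr8", 28510032, 28510057, "-"),
    ("chr1", 99990, 100015, "+"),  # hypothetical: crosses position 100000
]


def contained(row, lo, hi):
    """True if the interval lies entirely inside the window [lo, hi)."""
    _, start, end, _ = row
    return lo <= start and end <= hi


def overlaps(row, lo, hi):
    """True if the half-open interval [start, end) shares a base with [lo, hi)."""
    _, start, end, _ = row
    return start < hi and end > lo


# Neither filter looks at chromosome or strand, so rows from any of them pass.
inside = [r for r in rows if contained(r, 0, 100000)]
touching = [r for r in rows if overlaps(r, 0, 100000)]

print(len(inside), len(touching))
```

The two readings differ only for intervals that cross a window boundary, so it is worth checking the behaviour on such a case before relying on either one.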

Strand

print(gr["+"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr21        | 40099618  | 40099643  | U0         | 0         | +            |
## | chr19        | 19571102  | 19571127  | U0         | 0         | +            |
## | chr2         | 19357329  | 19357354  | U0         | 0         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chr9         | 120803448 | 120803473 | U0         | 0         | +            |
## | chr1         | 194245558 | 194245583 | U0         | 0         | +            |
## | chr8         | 57916061  | 57916086  | U0         | 0         | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 5050 sequences from 24 chromosomes.

Strand and Slice

print(gr["+", 0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   |     Start |       End | Name       |     Score | Strand       |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr3         |     87179 |     87204 | U0         |         0 | +            |
## | chr2         |     33241 |     33266 | U0         |         0 | +            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 2 sequences from 2 chromosomes.
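
All of the forms above differ only in which parts of the key are supplied. As a rough mental model, the following sketch shows how such a key could be taken apart in plain Python. `ToyRanges` is a hypothetical stand-in written for this illustration; it is not PyRanges code and makes no claim about how the library is implemented. It assumes that "+" and "-" denote a strand, that any other string names a chromosome, and that a slice keeps intervals lying entirely inside the window.

```python
class ToyRanges:
    """Hypothetical stand-in for a PyRanges object (illustration only)."""

    def __init__(self, rows):
        # Each row is a tuple of (chromosome, start, end, strand).
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        # A single key such as "chrX" arrives bare; several arrive as a tuple.
        parts = key if isinstance(key, tuple) else (key,)
        chromosome = strand = window = None
        for part in parts:
            if isinstance(part, slice):
                window = part
            elif part in ("+", "-"):
                strand = part
            elif isinstance(part, str):
                chromosome = part
            else:
                raise TypeError(f"unsupported key part: {part!r}")

        kept = []
        for chrom, start, end, row_strand in self.rows:
            if chromosome is not None and chrom != chromosome:
                continue
            if strand is not None and row_strand != strand:
                continue
            if window is not None:
                lo = 0 if window.start is None else window.start
                # Assumption: keep only intervals fully inside the window.
                if start < lo:
                    continue
                if window.stop is not None and end > window.stop:
                    continue
            kept.append((chrom, start, end, row_strand))
        # Return a new object so the original is left untouched.
        return ToyRanges(kept)


# Rows copied from the outputs shown in this section.
toy = ToyRanges([
    ("chrX", 151324943, 151324968, "+"),
    ("chrX", 151277790, 151277815, "-"),
    ("chrX", 153037423, 153037448, "-"),
    ("chrX", 41852946, 41852971, "-"),
    ("chr2", 33241, 33266, "+"),
    ("chr2", 13611, 13636, "-"),
])

print(len(toy["chrX"]))
print(len(toy["chrX", "-"]))
print(len(toy["chrX", "-", 150000000:160000000]))
print(len(toy["+", 0:100000]))
print(len(toy))  # the original still holds every row
```

Because each part of the key is recognised by its type and value rather than by its position, the toy class accepts the parts in any order. Whether PyRanges itself is equally permissive is not established by this sketch.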