3 Subsetting PyRanges
A PyRanges object can be subset by chromosome, strand, coordinate slice, or any combination of these. Each form returns a new PyRanges object and leaves the original unchanged.
import pyranges as pr
gr = pr.load_dataset("chipseq")
print(gr)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr8 | 28510032 | 28510057 | U0 | 0 | - |
## | chr7 | 107153363 | 107153388 | U0 | 0 | - |
## | chr5 | 135821802 | 135821827 | U0 | 0 | - |
## | ... | ... | ... | ... | ... | ... |
## | chr6 | 89296757 | 89296782 | U0 | 0 | - |
## | chr1 | 194245558 | 194245583 | U0 | 0 | + |
## | chr8 | 57916061 | 57916086 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 10000 sequences from 24 chromosomes.
Chromosome only
print(gr["chrX"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 41852946 | 41852971 | U0 | 0 | - |
## | chrX | 69979838 | 69979863 | U0 | 0 | - |
## | chrX | 34824145 | 34824170 | U0 | 0 | - |
## | ... | ... | ... | ... | ... | ... |
## | chrX | 5044527 | 5044552 | U0 | 0 | - |
## | chrX | 15281263 | 15281288 | U0 | 0 | - |
## | chrX | 120273723 | 120273748 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 282 sequences from 1 chromosomes.
Chromosome and Strand
print(gr["chrX", "-"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 41852946 | 41852971 | U0 | 0 | - |
## | chrX | 69979838 | 69979863 | U0 | 0 | - |
## | chrX | 34824145 | 34824170 | U0 | 0 | - |
## | ... | ... | ... | ... | ... | ... |
## | chrX | 5044527 | 5044552 | U0 | 0 | - |
## | chrX | 15281263 | 15281288 | U0 | 0 | - |
## | chrX | 120273723 | 120273748 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 151 sequences from 1 chromosomes.
Chromosome and Slice
print(gr["chrX", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 151324943 | 151324968 | U0 | 0 | + |
## | chrX | 152902449 | 152902474 | U0 | 0 | + |
## | chrX | 153632850 | 153632875 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrX | 151277790 | 151277815 | U0 | 0 | - |
## | chrX | 153037423 | 153037448 | U0 | 0 | - |
## | chrX | 153255924 | 153255949 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 8 sequences from 1 chromosomes.
Chromosome, Strand and Slice
print(gr["chrX", "-", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 150277236 | 150277261 | U0 | 0 | - |
## | chrX | 151277790 | 151277815 | U0 | 0 | - |
## | chrX | 153037423 | 153037448 | U0 | 0 | - |
## | chrX | 153255924 | 153255949 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 4 sequences from 1 chromosomes.
Slice
Using only a slice returns the ranges within those coordinates from all chromosomes and both strands.
print(gr[0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr2 | 33241 | 33266 | U0 | 0 | + |
## | chr2 | 13611 | 13636 | U0 | 0 | - |
## | chr2 | 32620 | 32645 | U0 | 0 | - |
## | chr3 | 87179 | 87204 | U0 | 0 | + |
## | chr4 | 45413 | 45438 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 5 sequences from 3 chromosomes.
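The slice-only lookup above can be modelled in a few lines of plain Python. This is a minimal sketch, not PyRanges' own implementation: the `Interval` tuple and the `slice_ranges` helper are hypothetical names, the rows are copied from the outputs shown earlier, and it assumes half-open `[Start, End)` intervals that are selected when they overlap the requested window.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of a PyRanges table.
Interval = namedtuple("Interval", ["chromosome", "start", "end", "strand"])

# Rows copied from the printed outputs above; the last one lies outside the window.
rows = [
    Interval("chr2", 33241, 33266, "+"),
    Interval("chr2", 13611, 13636, "-"),
    Interval("chr2", 32620, 32645, "-"),
    Interval("chr3", 87179, 87204, "+"),
    Interval("chr4", 45413, 45438, "-"),
    Interval("chr8", 28510032, 28510057, "-"),
]


def slice_ranges(intervals, start, stop):
    """Return a new list of the intervals that overlap [start, stop).

    Assumption: intervals are half-open and are selected by overlap.
    Chromosome and strand are ignored, as in a slice-only lookup.
    """
    return [iv for iv in intervals if iv.start < stop and iv.end > start]


selected = slice_ranges(rows, 0, 100000)
print(len(selected))                                # 5
print(sorted({iv.chromosome for iv in selected}))   # ['chr2', 'chr3', 'chr4']
print(len(rows))                                    # 6, the input list is untouched
```

Because the helper builds a new list, the input is left as it was, which mirrors the statement that subsetting does not change the original object.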
Strand
print(gr["+"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr21 | 40099618 | 40099643 | U0 | 0 | + |
## | chr19 | 19571102 | 19571127 | U0 | 0 | + |
## | chr2 | 19357329 | 19357354 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chr9 | 120803448 | 120803473 | U0 | 0 | + |
## | chr1 | 194245558 | 194245583 | U0 | 0 | + |
## | chr8 | 57916061 | 57916086 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 5050 sequences from 24 chromosomes.
Slice and Strand
print(gr["+", 0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int64) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr3 | 87179 | 87204 | U0 | 0 | + |
## | chr2 | 33241 | 33266 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## PyRanges object has 2 sequences from 2 chromosomes.
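The key forms shown in this chapter (chromosome, strand, slice, and tuples of these) can be summarised with a toy model. The sketch below is illustrative only and makes several assumptions: `ToyRanges` and `Interval` are hypothetical names, the data are a handful of rows copied from the outputs above, a string equal to `"+"` or `"-"` is treated as a strand and any other string as a chromosome, and slices select half-open intervals by overlap. It is not how PyRanges is implemented.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of a PyRanges table.
Interval = namedtuple("Interval", ["chromosome", "start", "end", "strand"])

STRANDS = {"+", "-"}


class ToyRanges:
    """Simplified, hypothetical model of key-based subsetting."""

    def __init__(self, intervals):
        self.intervals = list(intervals)

    def __len__(self):
        return len(self.intervals)

    def __getitem__(self, key):
        # Normalise a single key to a tuple so every form is handled alike.
        parts = key if isinstance(key, tuple) else (key,)
        chromosome = strand = window = None
        for part in parts:
            if isinstance(part, slice):
                window = part
            elif isinstance(part, str) and part in STRANDS:
                strand = part
            elif isinstance(part, str):
                chromosome = part
            else:
                raise TypeError(f"unsupported key part: {part!r}")

        kept = self.intervals
        if chromosome is not None:
            kept = [iv for iv in kept if iv.chromosome == chromosome]
        if strand is not None:
            kept = [iv for iv in kept if iv.strand == strand]
        if window is not None:
            start = window.start or 0
            stop = window.stop  # None means no upper bound
            kept = [
                iv for iv in kept
                if iv.end > start and (stop is None or iv.start < stop)
            ]
        # Always wrap the result in a new object; self is never modified.
        return ToyRanges(kept)


# A few rows copied from the outputs above (not the full dataset).
toy = ToyRanges([
    Interval("chrX", 150277236, 150277261, "-"),
    Interval("chrX", 151277790, 151277815, "-"),
    Interval("chrX", 153037423, 153037448, "-"),
    Interval("chrX", 153255924, 153255949, "-"),
    Interval("chrX", 151324943, 151324968, "+"),
    Interval("chrX", 41852946, 41852971, "-"),
    Interval("chr2", 33241, 33266, "+"),
    Interval("chr2", 13611, 13636, "-"),
    Interval("chr3", 87179, 87204, "+"),
])

print(len(toy["chrX", "-", 150000000:160000000]))  # 4
print(len(toy["+", 0:100000]))                     # 2
print(len(toy))                                    # 9, the original is unchanged
```

The two subset sizes agree with the corresponding outputs above only because the relevant rows were copied in. Counts for the other key forms would differ from the real dataset, which has 10000 rows.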