4 Subsetting PyRanges
There are many ways to subset a PyRanges object. Each returns a new PyRanges object and does not change the old one.
For data exploration, the functions head, tail and sample (random choice without replacement) are convenient. Each takes an argument n giving the number of entries to return.
import pyranges as pr
gr = pr.data.chipseq()
print(gr.sample())
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr2 | 223891559 | 223891584 | U0 | 0 | + |
## | chr2 | 205599166 | 205599191 | U0 | 0 | + |
## | chr4 | 141494539 | 141494564 | U0 | 0 | + |
## | chr6 | 40587449 | 40587474 | U0 | 0 | - |
## | chr12 | 61116050 | 61116075 | U0 | 0 | + |
## | chr16 | 77874004 | 77874029 | U0 | 0 | - |
## | chr18 | 17324817 | 17324842 | U0 | 0 | + |
## | chr19 | 41397102 | 41397127 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 8 rows and 6 columns from 7 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr.tail(4))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrY | 15224235 | 15224260 | U0 | 0 | - |
## | chrY | 13517892 | 13517917 | U0 | 0 | - |
## | chrY | 8010951 | 8010976 | U0 | 0 | - |
## | chrY | 7405376 | 7405401 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 4 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
By subsetting with a list you can select one or more columns. The Chromosome, Start, End and Strand columns are kept in addition to the ones you ask for:
import pyranges as pr
gr = pr.data.chipseq()
print(gr)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 212609534 | 212609559 | U0 | 0 | + |
## | chr1 | 169887529 | 169887554 | U0 | 0 | + |
## | chr1 | 216711011 | 216711036 | U0 | 0 | + |
## | chr1 | 144227079 | 144227104 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 15224235 | 15224260 | U0 | 0 | - |
## | chrY | 13517892 | 13517917 | U0 | 0 | - |
## | chrY | 8010951 | 8010976 | U0 | 0 | - |
## | chrY | 7405376 | 7405401 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
print(gr[["Name"]])
## +--------------+-----------+-----------+------------+--------------+
## | Chromosome | Start | End | Name | Strand |
## | (category) | (int32) | (int32) | (object) | (category) |
## |--------------+-----------+-----------+------------+--------------|
## | chr1 | 212609534 | 212609559 | U0 | + |
## | chr1 | 169887529 | 169887554 | U0 | + |
## | chr1 | 216711011 | 216711036 | U0 | + |
## | chr1 | 144227079 | 144227104 | U0 | + |
## | ... | ... | ... | ... | ... |
## | chrY | 15224235 | 15224260 | U0 | - |
## | chrY | 13517892 | 13517917 | U0 | - |
## | chrY | 8010951 | 8010976 | U0 | - |
## | chrY | 7405376 | 7405401 | U0 | - |
## +--------------+-----------+-----------+------------+--------------+
## Stranded PyRanges object has 10,000 rows and 5 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
By subsetting using a boolean vector, you can get specific rows:
import pyranges as pr
cpg = pr.data.cpg()
print(cpg[cpg.CpG > 50])
## +--------------+-----------+-----------+-----------+
## | Chromosome | Start | End | CpG |
## | (category) | (int32) | (int32) | (int64) |
## |--------------+-----------+-----------+-----------|
## | chrX | 64181 | 64793 | 62 |
## | chrX | 69133 | 70029 | 100 |
## | chrX | 148685 | 149461 | 85 |
## | chrX | 166504 | 167721 | 96 |
## | ... | ... | ... | ... |
## | chrY | 21154603 | 21155040 | 61 |
## | chrY | 21238448 | 21240005 | 133 |
## | chrY | 26351343 | 26352316 | 76 |
## | chrY | 27610115 | 27611088 | 76 |
## +--------------+-----------+-----------+-----------+
## Unstranded PyRanges object has 530 rows and 4 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome.
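The row selection above can be pictured in plain Python, without PyRanges: a comparison yields one boolean per row, and only the rows marked True are kept. This is a minimal sketch of the idea, not PyRanges' implementation. The first two rows are copied from the output above; the third row and the names rows, mask and selected are made up for illustration.

```python
from itertools import compress

# Two rows copied from the output above, plus one hypothetical low-CpG row.
rows = [
    {"Chromosome": "chrX", "Start": 64181, "End": 64793, "CpG": 62},
    {"Chromosome": "chrX", "Start": 69133, "End": 70029, "CpG": 100},
    {"Chromosome": "chrX", "Start": 1, "End": 2, "CpG": 10},  # hypothetical
]

# One boolean per row, analogous to the expression cpg.CpG > 50.
mask = [row["CpG"] > 50 for row in rows]

# Keep the rows whose mask entry is True; `rows` itself is not modified.
selected = list(compress(rows, mask))

print(mask)
print([row["CpG"] for row in selected])
```

As with PyRanges subsetting, the original collection is left untouched and the selection is a new object.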
By using strings, tuples and slices, you can subset the PyRanges based on position:
Chromosome only
print(gr["chrX"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 13843759 | 13843784 | U0 | 0 | + |
## | chrX | 114673546 | 114673571 | U0 | 0 | + |
## | chrX | 131816774 | 131816799 | U0 | 0 | + |
## | chrX | 45504745 | 45504770 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrX | 146694149 | 146694174 | U0 | 0 | - |
## | chrX | 5044527 | 5044552 | U0 | 0 | - |
## | chrX | 15281263 | 15281288 | U0 | 0 | - |
## | chrX | 120273723 | 120273748 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 282 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Chromosome and Strand
print(gr["chrX", "-"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 41852946 | 41852971 | U0 | 0 | - |
## | chrX | 69979838 | 69979863 | U0 | 0 | - |
## | chrX | 34824145 | 34824170 | U0 | 0 | - |
## | chrX | 132354117 | 132354142 | U0 | 0 | - |
## | ... | ... | ... | ... | ... | ... |
## | chrX | 146694149 | 146694174 | U0 | 0 | - |
## | chrX | 5044527 | 5044552 | U0 | 0 | - |
## | chrX | 15281263 | 15281288 | U0 | 0 | - |
## | chrX | 120273723 | 120273748 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 151 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Chromosome and Slice
print(gr["chrX", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 151324943 | 151324968 | U0 | 0 | + |
## | chrX | 152902449 | 152902474 | U0 | 0 | + |
## | chrX | 153632850 | 153632875 | U0 | 0 | + |
## | chrX | 153874106 | 153874131 | U0 | 0 | + |
## | chrX | 150277236 | 150277261 | U0 | 0 | - |
## | chrX | 151277790 | 151277815 | U0 | 0 | - |
## | chrX | 153037423 | 153037448 | U0 | 0 | - |
## | chrX | 153255924 | 153255949 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 8 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Chromosome, Strand and Slice
print(gr["chrX", "-", 150000000:160000000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chrX | 150277236 | 150277261 | U0 | 0 | - |
## | chrX | 151277790 | 151277815 | U0 | 0 | - |
## | chrX | 153037423 | 153037448 | U0 | 0 | - |
## | chrX | 153255924 | 153255949 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 4 rows and 6 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Slice
Using only a slice returns the ranges within those coordinates from all chromosomes and strands.
print(gr[0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr2 | 33241 | 33266 | U0 | 0 | + |
## | chr2 | 13611 | 13636 | U0 | 0 | - |
## | chr2 | 32620 | 32645 | U0 | 0 | - |
## | chr3 | 87179 | 87204 | U0 | 0 | + |
## | chr4 | 45413 | 45438 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 5 rows and 6 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Note that although the slice operator seems to return immediately, it is inefficient for repeated use: it rebuilds the interval-overlap data structure on every query. If you have many queries, put them in a second PyRanges and do a single intersect operation instead.
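To see why reusing one lookup structure beats rebuilding it per query, here is a small plain-Python sketch. It is not the data structure PyRanges uses. It swaps in a simple sorted list with binary search, and it assumes half-open [start, end) coordinates. The class StartIndex and its method overlapping are hypothetical names introduced for illustration.

```python
from bisect import bisect_left


class StartIndex:
    """Hypothetical helper: intervals sorted by start, built once and reused."""

    def __init__(self, intervals):
        # The sort is the up-front cost we want to pay only once.
        self.intervals = sorted(intervals)
        self.starts = [start for start, _ in self.intervals]

    def overlapping(self, qstart, qend):
        # Assumes half-open [start, end) coordinates.
        # Only intervals with start < qend can overlap the query...
        hi = bisect_left(self.starts, qend)
        # ...and of those, keep the ones that end after the query starts.
        return [(s, e) for s, e in self.intervals[:hi] if e > qstart]


# chr2 intervals taken from the slice output above.
index = StartIndex([(33241, 33266), (13611, 13636), (32620, 32645)])

# Several queries reuse the same index instead of rebuilding it each time.
queries = [(0, 100000), (13000, 14000), (33266, 40000)]
results = {q: index.overlapping(*q) for q in queries}
for q, hits in results.items():
    print(q, hits)
```

The constructor runs once however many queries follow. That is the saving you get in PyRanges by collecting the queries into one object rather than slicing repeatedly.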
Strand
print(gr["+"])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 212609534 | 212609559 | U0 | 0 | + |
## | chr1 | 169887529 | 169887554 | U0 | 0 | + |
## | chr1 | 216711011 | 216711036 | U0 | 0 | + |
## | chr1 | 144227079 | 144227104 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 21559181 | 21559206 | U0 | 0 | + |
## | chrY | 11942770 | 11942795 | U0 | 0 | + |
## | chrY | 8316773 | 8316798 | U0 | 0 | + |
## | chrY | 7463444 | 7463469 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 5,050 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Slice and Strand
print(gr["+", 0:100000])
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr2 | 33241 | 33266 | U0 | 0 | + |
## | chr3 | 87179 | 87204 | U0 | 0 | + |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 2 rows and 6 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
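All of the positional forms in this section use ordinary Python indexing syntax. The sketch below shows only what that syntax hands to an object's __getitem__ method: a single string, a slice object, or a tuple mixing the two. KeyEcho is a hypothetical class written for this illustration, and it says nothing about how PyRanges interprets the keys it receives.

```python
class KeyEcho:
    """Hypothetical class that returns whatever key Python passes to []."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

chrom_only = echo["chrX"]                                    # a plain string
chrom_strand = echo["chrX", "-"]                             # a tuple of strings
chrom_strand_slice = echo["chrX", "-", 150000000:160000000]  # tuple ending in a slice
slice_only = echo[0:100000]                                  # a bare slice object

print(chrom_only)
print(chrom_strand)
print(chrom_strand_slice)
print(slice_only)
```

Because the comma-separated forms arrive as one tuple, a single indexing call can carry chromosome, strand and coordinates together.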