9 Iterating over PyRanges
PyRanges can be iterated over by a simple for loop, and the data is guaranteed to come back in natsorted order:
import pyranges as pr
import pandas as pd
exons = pr.data.exons()
cpg = pr.data.cpg()
for k, df in cpg:
    print(k)
    print(df.head(3))
## chrX
## Chromosome Start End CpG
## 0 chrX 64181 64793 62
## 1 chrX 69133 70029 100
## 2 chrX 148685 149461 85
## chrY
## Chromosome Start End CpG
## 896 chrY 14181 14793 62
## 897 chrY 19133 20029 100
## 898 chrY 98685 99461 85
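The "natsorted order" mentioned above is natural sort order, in which embedded numbers are compared as numbers rather than character by character. The following is a minimal standard-library sketch of that idea, not the implementation used by PyRanges or by the natsort package; the helper name `natural_key` and the example chromosome names are assumptions made for illustration.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so 'chr2' sorts before 'chr10'."""
    # re.split with a capture group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]

chromosomes = ["chr10", "chrX", "chr2", "chr1", "chrY"]

# Plain string sorting compares character by character.
plain = sorted(chromosomes)
# Natural sorting compares the numeric chunks as integers.
natural = sorted(chromosomes, key=natural_key)

print(plain)    # ['chr1', 'chr10', 'chr2', 'chrX', 'chrY']
print(natural)  # ['chr1', 'chr2', 'chr10', 'chrX', 'chrY']

# Stranded keys are (chromosome, strand) tuples; sort on both parts.
stranded_keys = [("chrY", "-"), ("chrX", "-"), ("chrY", "+"), ("chrX", "+")]
stranded_sorted = sorted(stranded_keys, key=lambda k: (natural_key(k[0]), k[1]))
print(stranded_sorted)
```

With plain sorting, chr10 lands before chr2; natural sorting puts the chromosomes in the order a reader would expect.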
If the data is unstranded, the key is just the chromosome name, but if the data is stranded, the key is a tuple of the chromosome and the strand:
for k, df in exons:
    print(k)
    print(df.head(3))
## ('chrX', '+')
## Chromosome Start ... Score Strand
## 0 chrX 135721701 ... 0 +
## 2 chrX 135574120 ... 0 +
## 3 chrX 47868945 ... 0 +
##
## [3 rows x 6 columns]
## ('chrX', '-')
## Chromosome Start ... Score Strand
## 1 chrX 49069126 ... 0 -
## 4 chrX 154006958 ... 0 -
## 6 chrX 52257919 ... 0 -
##
## [3 rows x 6 columns]
## ('chrY', '+')
## Chromosome Start ... Score Strand
## 14 chrY 1693161 ... 0 +
## 82 chrY 1357411 ... 0 +
## 98 chrY 59233166 ... 0 +
##
## [3 rows x 6 columns]
## ('chrY', '-')
## Chromosome Start ... Score Strand
## 5 chrY 1481624 ... 0 -
## 8 chrY 15526614 ... 0 -
## 9 chrY 15591393 ... 0 -
##
## [3 rows x 6 columns]
If you would like to iterate over the chromosomes in a stranded PyRanges, the idiom is:
for c in exons.chromosomes:
    print(c)
    df = exons[c].df
    print(df.head())
## chrX
## Chromosome Start ... Score Strand
## 0 chrX 135721701 ... 0 +
## 1 chrX 135574120 ... 0 +
## 2 chrX 47868945 ... 0 +
## 3 chrX 77294333 ... 0 +
## 4 chrX 91090459 ... 0 +
##
## [5 rows x 6 columns]
## chrY
## Chromosome Start ... Score Strand
## 0 chrY 1693161 ... 0 +
## 1 chrY 1357411 ... 0 +
## 2 chrY 59233166 ... 0 +
## 3 chrY 1693161 ... 0 +
## 4 chrY 1664276 ... 0 +
##
## [5 rows x 6 columns]
Notice that we need the .df accessor, because subsetting a PyRanges always returns a PyRanges.
There are three more ways to iterate over a PyRanges: the keys, values and items methods. Unlike their Python dict counterparts, which return view objects, these return plain lists.
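To make the practical difference between a list and a dict view concrete, here is a small sketch that uses only a plain Python dict. The dict stands in for a PyRanges, and the string values are hypothetical placeholders for the per-key DataFrames; no PyRanges call is made.

```python
# A plain dict keyed like a stranded PyRanges (values are placeholders).
d = {("chrX", "+"): "df1", ("chrX", "-"): "df2"}

view = d.keys()            # dict view: live, not indexable
snapshot = list(d.keys())  # list: a fixed copy, indexable

# Adding a key changes the view but not the list.
d[("chrY", "+")] = "df3"
print(len(view), len(snapshot))

# A list supports positional access; a dict view does not.
first = snapshot[0]
try:
    view[0]
    view_is_indexable = True
except TypeError:
    view_is_indexable = False
print(first, view_is_indexable)
```

Because the PyRanges methods return lists, their results can be indexed and sliced directly, and they do not change if the object is modified afterwards.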
If you want to create a new PyRanges while iterating over an existing one, the idiom is:
d = {}
for k, df in exons:
    print(k)
    d[k] = df.head(3)
## ('chrX', '+')
## ('chrX', '-')
## ('chrY', '+')
## ('chrY', '-')
new_gr = pr.PyRanges(d)
print(new_gr)
## +--------------+-----------+-----------+-------+
## | Chromosome | Start | End | +3 |
## | (category) | (int32) | (int32) | ... |
## |--------------+-----------+-----------+-------|
## | chrX | 135721701 | 135721963 | ... |
## | chrX | 135574120 | 135574598 | ... |
## | chrX | 47868945 | 47869126 | ... |
## | chrX | 49069126 | 49069255 | ... |
## | ... | ... | ... | ... |
## | chrY | 59233166 | 59233257 | ... |
## | chrY | 1481624 | 1481747 | ... |
## | chrY | 15526614 | 15526673 | ... |
## | chrY | 15591393 | 15592550 | ... |
## +--------------+-----------+-----------+-------+
## Stranded PyRanges object has 12 rows and 6 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 3 hidden columns: Name, Score, Strand
Note that this is essentially the same as using the apply method of the PyRanges, except that a for loop is never multithreaded.
To iterate over several PyRanges at once, use pr.itergrs. At each iteration it returns the dfs belonging to the same Chromosome or Chromosome/Strand pair, one per PyRanges. If a key is missing from one or more of the PyRanges, an empty df is returned in its place.
import pyranges as pr
l = pr.random(25), pr.random(25), pr.random(25)
for key, grs in pr.itergrs(l, keys=True, strand=True):
    print(key)
    print(grs)
## ('chr1', '+')
## [ Chromosome Start End Strand
## 14 chr1 104739630 104739730 +
## 16 chr1 55438624 55438724 +, Chromosome Start End Strand
## 0 chr1 100558151 100558251 +
## 1 chr1 195335888 195335988 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr1', '-')
## [ Chromosome Start End Strand
## 13 chr1 5131353 5131453 -
## 15 chr1 75116590 75116690 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr2', '+')
## [ Chromosome Start End Strand
## 0 chr2 202033606 202033706 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr2', '-')
## [ Chromosome Start End Strand
## 1 chr2 174454128 174454228 -
## 2 chr2 16626066 16626166 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 21 chr2 62607507 62607607 -
## 22 chr2 219390171 219390271 -]
## ('chr3', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 5 chr3 87386407 87386507 -
## 6 chr3 187987458 187987558 -
## 7 chr3 146864114 146864214 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr4', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 2 chr4 190832327 190832427 +
## 3 chr4 188883932 188884032 +
## 4 chr4 50994009 50994109 +, Chromosome Start End Strand
## 17 chr4 103121138 103121238 +
## 18 chr4 114911752 114911852 +]
## ('chr5', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 4 chr5 18398874 18398974 +]
## ('chr5', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 2 chr5 48014611 48014711 -
## 3 chr5 94626865 94626965 -
## 5 chr5 164094296 164094396 -]
## ('chr6', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 0 chr6 101060912 101061012 +
## 1 chr6 109689679 109689779 +]
## ('chr6', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 22 chr6 104472225 104472325 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr7', '+')
## [ Chromosome Start End Strand
## 20 chr7 119742242 119742342 +, Chromosome Start End Strand
## 16 chr7 110037677 110037777 +
## 17 chr7 112954928 112955028 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr7', '-')
## [ Chromosome Start End Strand
## 19 chr7 133398754 133398854 -
## 21 chr7 103511334 103511434 -, Chromosome Start End Strand
## 15 chr7 13757065 13757165 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr8', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 11 chr8 43684448 43684548 +
## 12 chr8 28415271 28415371 +
## 13 chr8 50793593 50793693 +, Chromosome Start End Strand
## 16 chr8 1162008 1162108 +]
## ('chr8', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 14 chr8 75607942 75608042 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr9', '+')
## [ Chromosome Start End Strand
## 7 chr9 26749511 26749611 +, Chromosome Start End Strand
## 20 chr9 92959833 92959933 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr9', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 19 chr9 129987149 129987249 -, Chromosome Start End Strand
## 11 chr9 26541771 26541871 -
## 12 chr9 2135110 2135210 -]
## ('chr10', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 7 chr10 104128684 104128784 -]
## ('chr11', '-')
## [ Chromosome Start End Strand
## 17 chr11 13870268 13870368 -, Chromosome Start End Strand
## 18 chr11 122066118 122066218 -, Chromosome Start End Strand
## 15 chr11 24360357 24360457 -]
## ('chr12', '+')
## [ Chromosome Start End Strand
## 4 chr12 106097457 106097557 +
## 5 chr12 115615734 115615834 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 14 chr12 3498178 3498278 +]
## ('chr12', '-')
## [ Chromosome Start End Strand
## 3 chr12 58581596 58581696 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 13 chr12 118797159 118797259 -]
## ('chr13', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 24 chr13 114169303 114169403 -]
## ('chr14', '-')
## [ Chromosome Start End Strand
## 23 chr14 67421808 67421908 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr15', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 24 chr15 50966188 50966288 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr16', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 19 chr16 83141043 83141143 +
## 20 chr16 18412326 18412426 +]
## ('chr17', '+')
## [ Chromosome Start End Strand
## 10 chr17 14139586 14139686 +
## 11 chr17 17092250 17092350 +, Chromosome Start End Strand
## 23 chr17 62052981 62053081 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr17', '-')
## [ Chromosome Start End Strand
## 12 chr17 72214538 72214638 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr18', '-')
## [ Chromosome Start End Strand
## 24 chr18 19461673 19461773 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr19', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 21 chr19 30223992 30224092 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr20', '-')
## [ Chromosome Start End Strand
## 6 chr20 8292068 8292168 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr21', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 6 chr21 43796351 43796451 +]
## ('chr22', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 9 chr22 21379916 21380016 +]
## ('chr22', '-')
## [ Chromosome Start End Strand
## 8 chr22 6803231 6803331 -
## 9 chr22 33463510 33463610 -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 8 chr22 37341649 37341749 -]
## ('chrX', '+')
## [ Chromosome Start End Strand
## 18 chrX 122711903 122712003 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chrX', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 8 chrX 87500660 87500760 -
## 9 chrX 33393365 33393465 -
## 10 chrX 102220432 102220532 -, Chromosome Start End Strand
## 10 chrX 128685145 128685245 -]
## ('chrY', '+')
## [ Chromosome Start End Strand
## 22 chrY 17676467 17676567 +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Chromosome Start End Strand
## 23 chrY 2440250 2440350 +]
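The pattern in the output above (one entry per input at every key, with an empty placeholder where an input has no data) can be sketched with plain dictionaries. This is only an illustration of the idea, not the implementation of pr.itergrs: the function name `iter_together` is hypothetical, lists of (start, end) tuples stand in for DataFrames, and keys are sorted with plain rather than natural sorting.

```python
def iter_together(collections):
    """Yield (key, values) over the union of keys of several dicts.

    values holds one entry per input dict, in input order; a dict
    that lacks the key contributes an empty list instead.
    """
    all_keys = sorted(set().union(*collections))
    for key in all_keys:
        yield key, [c.get(key, []) for c in collections]

# Three toy inputs keyed by (chromosome, strand).
a = {("chr1", "+"): [(1, 5)], ("chr2", "-"): [(3, 9)]}
b = {("chr1", "+"): [(2, 6)]}
c = {("chr2", "-"): [(4, 8)], ("chr3", "+"): [(0, 2)]}

result = list(iter_together([a, b, c]))
for key, values in result:
    print(key, values)
```

Each yielded list always has as many entries as there are inputs, so the position of an entry identifies which input it came from.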