9 Iterating over PyRanges

A PyRanges object can be iterated over with a simple for loop. Each iteration yields a key together with the pandas DataFrame stored under that key, and the data is guaranteed to come back in natsorted order:

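import pyranges as pr
# Example data shipped with PyRanges: exons are stranded, CpG islands are not
exons = pr.data.exons()
cpg = pr.data.cpg()
# Each iteration yields a key and the DataFrame stored under that key
for k, df in cpg:
    print(k)
    print(df.head(3))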
## chrX
##   Chromosome   Start     End  CpG
## 0       chrX   64181   64793   62
## 1       chrX   69133   70029  100
## 2       chrX  148685  149461   85
## chrY
##     Chromosome  Start    End  CpG
## 896       chrY  14181  14793   62
## 897       chrY  19133  20029  100
## 898       chrY  98685  99461   85
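Natsorted (natural) order means that the numeric parts of chromosome names are compared as numbers, so chr2 comes before chr10, whereas plain string sorting puts chr10 first. The stdlib-only sketch below illustrates such a sort key. The helper `natural_key` is hypothetical and is not claimed to be how PyRanges implements its ordering; it only shows the idea.

```python
import re


def natural_key(name):
    """Split a name into text and digit runs so digit runs compare as integers.

    re.split with a capturing group always alternates text, digits, text, ...
    starting with a (possibly empty) text part, so list positions never mix
    int and str when two keys are compared.
    """
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


chromosomes = ["chr10", "chr2", "chrX", "chr1", "chrY", "chr22"]

print(sorted(chromosomes))
# ['chr1', 'chr10', 'chr2', 'chr22', 'chrX', 'chrY']
print(sorted(chromosomes, key=natural_key))
# ['chr1', 'chr2', 'chr10', 'chr22', 'chrX', 'chrY']
```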

If the data is unstranded, each key is just the chromosome name; if the data is stranded, each key is a tuple of chromosome and strand:

for k, df in exons:
    print(k)
    print(df.head(3))
## ('chrX', '+')
##   Chromosome      Start  ...  Score Strand
## 0       chrX  135721701  ...      0      +
## 2       chrX  135574120  ...      0      +
## 3       chrX   47868945  ...      0      +
## 
## [3 rows x 6 columns]
## ('chrX', '-')
##   Chromosome      Start  ...  Score Strand
## 1       chrX   49069126  ...      0      -
## 4       chrX  154006958  ...      0      -
## 6       chrX   52257919  ...      0      -
## 
## [3 rows x 6 columns]
## ('chrY', '+')
##    Chromosome     Start  ...  Score Strand
## 14       chrY   1693161  ...      0      +
## 82       chrY   1357411  ...      0      +
## 98       chrY  59233166  ...      0      +
## 
## [3 rows x 6 columns]
## ('chrY', '-')
##   Chromosome     Start  ...  Score Strand
## 5       chrY   1481624  ...      0      -
## 8       chrY  15526614  ...      0      -
## 9       chrY  15591393  ...      0      -
## 
## [3 rows x 6 columns]
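To make the two key shapes concrete, the following sketch groups toy interval tuples either by chromosome or by (chromosome, strand) and orders the keys naturally, mirroring the output above. It uses plain tuples with made-up coordinates instead of DataFrames, and `group_intervals` and `natural_key` are hypothetical helpers, not part of PyRanges.

```python
import re
from collections import defaultdict


def natural_key(name):
    # Split into text and digit runs; digit runs compare as integers
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def group_intervals(rows, stranded):
    """Group (chromosome, start, end, strand) tuples by chromosome,
    or by (chromosome, strand) when stranded is True."""
    groups = defaultdict(list)
    for row in rows:
        chrom, _start, _end, strand = row
        key = (chrom, strand) if stranded else chrom
        groups[key].append(row)  # rows keep their input order within a group
    if stranded:
        # Natural order on chromosome, then "+" before "-" ("+" < "-" in ASCII)
        order = sorted(groups, key=lambda k: (natural_key(k[0]), k[1]))
    else:
        order = sorted(groups, key=natural_key)
    return {key: groups[key] for key in order}


# Toy intervals with made-up coordinates
rows = [
    ("chr10", 5, 15, "-"),
    ("chr2", 10, 20, "+"),
    ("chr2", 30, 40, "-"),
    ("chr10", 50, 60, "-"),
    ("chr2", 70, 80, "+"),
]

for key, group in group_intervals(rows, stranded=True).items():
    print(key, len(group))
# ('chr2', '+') 2
# ('chr2', '-') 1
# ('chr10', '-') 2
for key, group in group_intervals(rows, stranded=False).items():
    print(key, len(group))
# chr2 3
# chr10 2
```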

To iterate over whole chromosomes of a stranded PyRanges, rather than over chromosome/strand pairs, the idiom is:

for c in exons.chromosomes:
    print(c)
    df = exons[c].df
    print(df.head())
## chrX
##   Chromosome      Start  ...  Score Strand
## 0       chrX  135721701  ...      0      +
## 1       chrX  135574120  ...      0      +
## 2       chrX   47868945  ...      0      +
## 3       chrX   77294333  ...      0      +
## 4       chrX   91090459  ...      0      +
## 
## [5 rows x 6 columns]
## chrY
##   Chromosome     Start  ...  Score Strand
## 0       chrY   1693161  ...      0      +
## 1       chrY   1357411  ...      0      +
## 2       chrY  59233166  ...      0      +
## 3       chrY   1693161  ...      0      +
## 4       chrY   1664276  ...      0      +
## 
## [5 rows x 6 columns]

Note that the .df accessor is needed here, because subsetting a PyRanges always returns a PyRanges rather than a DataFrame.
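Conceptually, selecting a chromosome from a stranded PyRanges pulls the rows of both strands together, which is why the chrX output above starts with + strand rows from a single DataFrame. The stdlib sketch below mimics that regrouping on toy tuples; `by_chromosome` is a hypothetical helper for illustration only, whereas PyRanges itself works on DataFrames.

```python
def by_chromosome(stranded_groups):
    """Merge (chromosome, strand)-keyed groups into chromosome-keyed groups.

    Assumes the input dict is already in natsorted key order, so chromosomes
    come out in the order they are first seen, "+" rows before "-" rows.
    """
    merged = {}
    for (chrom, _strand), rows in stranded_groups.items():
        merged.setdefault(chrom, []).extend(rows)
    return merged


# Toy data with made-up coordinates
stranded_groups = {
    ("chrX", "+"): [("chrX", 10, 20, "+"), ("chrX", 30, 40, "+")],
    ("chrX", "-"): [("chrX", 50, 60, "-")],
    ("chrY", "+"): [("chrY", 5, 15, "+")],
}

for chrom, rows in by_chromosome(stranded_groups).items():
    print(chrom, len(rows))
# chrX 3
# chrY 1
```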

There are three more ways to iterate over a PyRanges: the keys, values and items methods. Unlike their counterparts on a Python dict, which return view objects, these return plain lists.
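The practical difference between a list and a dict view is that a list can be indexed and is a fixed snapshot, while a view cannot be indexed and tracks later changes to its dict. The snippet below shows this on a plain Python dict, with strings standing in for DataFrames; it does not call PyRanges, it only illustrates the contrast drawn in the text.

```python
# Toy dict standing in for per-key DataFrames
d = {("chrX", "+"): "df1", ("chrX", "-"): "df2"}

view = d.keys()            # a dict_keys view object
snapshot = list(d.keys())  # a plain list snapshot

# Views are not subscriptable, so indexing raises TypeError
try:
    view[0]
except TypeError as err:
    print("view[0] fails:", err)

print(snapshot[0])  # ('chrX', '+')

# A view reflects later changes to the dict; the list snapshot does not
d[("chrY", "+")] = "df3"
print(len(view), len(snapshot))  # 3 2
```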

To create a new PyRanges while iterating over an existing one, collect the DataFrames in a dict under their original keys and pass that dict to the PyRanges constructor:

d = {}
for k, df in exons:
    print(k)
    d[k] = df.head(3)
## ('chrX', '+')
## ('chrX', '-')
## ('chrY', '+')
## ('chrY', '-')
new_gr = pr.PyRanges(d)
print(new_gr)
## +--------------+-----------+-----------+-------+
## | Chromosome   | Start     | End       | +3    |
## | (category)   | (int32)   | (int32)   | ...   |
## |--------------+-----------+-----------+-------|
## | chrX         | 135721701 | 135721963 | ...   |
## | chrX         | 135574120 | 135574598 | ...   |
## | chrX         | 47868945  | 47869126  | ...   |
## | chrX         | 49069126  | 49069255  | ...   |
## | ...          | ...       | ...       | ...   |
## | chrY         | 59233166  | 59233257  | ...   |
## | chrY         | 1481624   | 1481747   | ...   |
## | chrY         | 15526614  | 15526673  | ...   |
## | chrY         | 15591393  | 15592550  | ...   |
## +--------------+-----------+-----------+-------+
## Stranded PyRanges object has 12 rows and 6 columns from 2 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 3 hidden columns: Name, Score, Strand

This is essentially what the apply method of a PyRanges does, except that a plain for loop always runs serially, whereas apply can be parallelized.
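The shared pattern behind the dict-building loop and apply is "call a function on each per-key group and keep the keys". The sketch below shows that pattern serially on plain lists; `apply_sequential` is a hypothetical helper, not the PyRanges implementation. With PyRanges itself the counterpart of the loop above would presumably be `exons.apply(lambda df: df.head(3))`; check the apply documentation of your version for its exact parameters.

```python
def apply_sequential(groups, f):
    """Apply f to every per-key group, one after another, keeping the keys.

    Mirrors the for-loop-and-dict idiom: nothing here runs in parallel.
    """
    return {key: f(value) for key, value in groups.items()}


# Lists stand in for the per-key DataFrames
groups = {
    ("chrX", "+"): [1, 2, 3, 4],
    ("chrX", "-"): [5, 6],
}

# Keep at most the first three entries of each group, like df.head(3)
print(apply_sequential(groups, lambda rows: rows[:3]))
# {('chrX', '+'): [1, 2, 3], ('chrX', '-'): [5, 6]}
```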

To iterate over several PyRanges simultaneously, use pr.itergrs. At each step it returns, from every PyRanges, the DataFrame belonging to the same Chromosome (or Chromosome/Strand pair). When a PyRanges has no entries for that key, an empty DataFrame is returned in its place.

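import pyranges as pr
# Three PyRanges with 25 randomly placed intervals each (output differs between runs)
random_grs = pr.random(25), pr.random(25), pr.random(25)
# keys=True also yields the (Chromosome, Strand) key at each step
for key, grs in pr.itergrs(random_grs, keys=True, strand=True):
    print(key)
    print(grs)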
## ('chr1', '+')
## [   Chromosome      Start        End Strand
## 14       chr1  104739630  104739730      +
## 16       chr1   55438624   55438724      +,   Chromosome      Start        End Strand
## 0       chr1  100558151  100558251      +
## 1       chr1  195335888  195335988      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr1', '-')
## [   Chromosome     Start       End Strand
## 13       chr1   5131353   5131453      -
## 15       chr1  75116590  75116690      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr2', '+')
## [  Chromosome      Start        End Strand
## 0       chr2  202033606  202033706      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr2', '-')
## [  Chromosome      Start        End Strand
## 1       chr2  174454128  174454228      -
## 2       chr2   16626066   16626166      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 21       chr2   62607507   62607607      -
## 22       chr2  219390171  219390271      -]
## ('chr3', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome      Start        End Strand
## 5       chr3   87386407   87386507      -
## 6       chr3  187987458  187987558      -
## 7       chr3  146864114  146864214      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr4', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome      Start        End Strand
## 2       chr4  190832327  190832427      +
## 3       chr4  188883932  188884032      +
## 4       chr4   50994009   50994109      +,    Chromosome      Start        End Strand
## 17       chr4  103121138  103121238      +
## 18       chr4  114911752  114911852      +]
## ('chr5', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome     Start       End Strand
## 4       chr5  18398874  18398974      +]
## ('chr5', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome      Start        End Strand
## 2       chr5   48014611   48014711      -
## 3       chr5   94626865   94626965      -
## 5       chr5  164094296  164094396      -]
## ('chr6', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome      Start        End Strand
## 0       chr6  101060912  101061012      +
## 1       chr6  109689679  109689779      +]
## ('chr6', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 22       chr6  104472225  104472325      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr7', '+')
## [   Chromosome      Start        End Strand
## 20       chr7  119742242  119742342      +,    Chromosome      Start        End Strand
## 16       chr7  110037677  110037777      +
## 17       chr7  112954928  112955028      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr7', '-')
## [   Chromosome      Start        End Strand
## 19       chr7  133398754  133398854      -
## 21       chr7  103511334  103511434      -,    Chromosome     Start       End Strand
## 15       chr7  13757065  13757165      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr8', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome     Start       End Strand
## 11       chr8  43684448  43684548      +
## 12       chr8  28415271  28415371      +
## 13       chr8  50793593  50793693      +,    Chromosome    Start      End Strand
## 16       chr8  1162008  1162108      +]
## ('chr8', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome     Start       End Strand
## 14       chr8  75607942  75608042      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr9', '+')
## [  Chromosome     Start       End Strand
## 7       chr9  26749511  26749611      +,    Chromosome     Start       End Strand
## 20       chr9  92959833  92959933      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr9', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 19       chr9  129987149  129987249      -,    Chromosome     Start       End Strand
## 11       chr9  26541771  26541871      -
## 12       chr9   2135110   2135210      -]
## ('chr10', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome      Start        End Strand
## 7      chr10  104128684  104128784      -]
## ('chr11', '-')
## [   Chromosome     Start       End Strand
## 17      chr11  13870268  13870368      -,    Chromosome      Start        End Strand
## 18      chr11  122066118  122066218      -,    Chromosome     Start       End Strand
## 15      chr11  24360357  24360457      -]
## ('chr12', '+')
## [  Chromosome      Start        End Strand
## 4      chr12  106097457  106097557      +
## 5      chr12  115615734  115615834      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome    Start      End Strand
## 14      chr12  3498178  3498278      +]
## ('chr12', '-')
## [  Chromosome     Start       End Strand
## 3      chr12  58581596  58581696      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 13      chr12  118797159  118797259      -]
## ('chr13', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 24      chr13  114169303  114169403      -]
## ('chr14', '-')
## [   Chromosome     Start       End Strand
## 23      chr14  67421808  67421908      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr15', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome     Start       End Strand
## 24      chr15  50966188  50966288      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr16', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome     Start       End Strand
## 19      chr16  83141043  83141143      +
## 20      chr16  18412326  18412426      +]
## ('chr17', '+')
## [   Chromosome     Start       End Strand
## 10      chr17  14139586  14139686      +
## 11      chr17  17092250  17092350      +,    Chromosome     Start       End Strand
## 23      chr17  62052981  62053081      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr17', '-')
## [   Chromosome     Start       End Strand
## 12      chr17  72214538  72214638      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr18', '-')
## [   Chromosome     Start       End Strand
## 24      chr18  19461673  19461773      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr19', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome     Start       End Strand
## 21      chr19  30223992  30224092      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr20', '-')
## [  Chromosome    Start      End Strand
## 6      chr20  8292068  8292168      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chr21', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome     Start       End Strand
## 6      chr21  43796351  43796451      +]
## ('chr22', '+')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome     Start       End Strand
## 9      chr22  21379916  21380016      +]
## ('chr22', '-')
## [  Chromosome     Start       End Strand
## 8      chr22   6803231   6803331      -
## 9      chr22  33463510  33463610      -, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],   Chromosome     Start       End Strand
## 8      chr22  37341649  37341749      -]
## ('chrX', '+')
## [   Chromosome      Start        End Strand
## 18       chrX  122711903  122712003      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [], Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: []]
## ('chrX', '-')
## [Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome      Start        End Strand
## 8        chrX   87500660   87500760      -
## 9        chrX   33393365   33393465      -
## 10       chrX  102220432  102220532      -,    Chromosome      Start        End Strand
## 10       chrX  128685145  128685245      -]
## ('chrY', '+')
## [   Chromosome     Start       End Strand
## 22       chrY  17676467  17676567      +, Empty DataFrame
## Columns: [Chromosome, Start, End, Strand]
## Index: [],    Chromosome    Start      End Strand
## 23       chrY  2440250  2440350      +]
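The behaviour of pr.itergrs can be spelled out on plain dicts: visit the union of all keys in natsorted order and, for each key, take that key's group from every collection, substituting an empty group where it is missing. The sketch below is an illustration under stated assumptions, not the PyRanges implementation: `iter_groups` and `natural_key` are hypothetical, the collections are dicts keyed by (chromosome, strand) with made-up coordinates, and empty lists stand in for the empty DataFrames printed above.

```python
import re


def natural_key(name):
    # Split into text and digit runs; digit runs compare as integers
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def iter_groups(collections):
    """Yield (key, [group from each collection]) over the union of keys.

    Keys are (chromosome, strand) tuples visited in natsorted order; a
    collection lacking a key contributes an empty list.
    """
    all_keys = set()
    for coll in collections:
        all_keys.update(coll)  # adding a dict to a set adds its keys
    for key in sorted(all_keys, key=lambda k: (natural_key(k[0]), k[1])):
        yield key, [coll.get(key, []) for coll in collections]


a = {("chr1", "+"): [(100, 200)], ("chr10", "-"): [(5, 15)]}
b = {("chr2", "-"): [(30, 40)], ("chr1", "+"): [(150, 250)]}

for key, groups in iter_groups([a, b]):
    print(key, groups)
# ('chr1', '+') [[(100, 200)], [(150, 250)]]
# ('chr2', '-') [[], [(30, 40)]]
# ('chr10', '-') [[(5, 15)], []]
```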