13 Applying custom methods to pairs of PyRanges

With the apply, apply_pair and apply_chunks methods, you can run custom functions on the dataframes in your PyRanges. With apply and apply_chunks, the function you pass receives a single dataframe; with apply_pair, it receives a pair of dataframes, one from each PyRanges.

import pyranges as pr
chipseq = pr.data.chipseq()
chipseq_background = pr.data.chipseq_background()
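# The function receives one dataframe from each PyRanges and returns a dataframe.
def print_strands(df1, df2):
    print(df1.Chromosome.iloc[0], df1.Strand.iloc[0], df2.Strand.iloc[0])
    return df1.head(5)  # keep the first five rows of each dataframe
# strandedness="opposite" pairs each dataframe with the one on the opposite strand.
result = chipseq.apply_pair(chipseq_background, print_strands, strandedness="opposite")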
## chr1 + -
## chr1 - +
## chr2 + -
## chr2 - +
## chr3 + -
## chr3 - +
## chr4 + -
## chr4 - +
## chr5 + -
## chr5 - +
## chr6 + -
## chr6 - +
## chr7 + -
## chr7 - +
## chr8 + -
## chr8 - +
## chr9 + -
## chr9 - +
## chr10 + -
## chr10 - +
## chr11 + -
## chr11 - +
## chr12 + -
## chr12 - +
## chr13 + -
## chr13 - +
## chr14 + -
## chr14 - +
## chr15 + -
## chr15 - +
## chr16 + -
## chr16 - +
## chr17 + -
## chr17 - +
## chr18 + -
## chr18 - +
## chr19 + -
## chr19 - +
## chr20 + -
## chr20 - +
## chr21 + -
## chr21 - +
## chr22 + -
## chr22 - +
## chrX + -
## chrX - +
## chrY + -
## chrY - +
print(result)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 212609534 | 212609559 | U0         | 0         | +            |
## | chr1         | 169887529 | 169887554 | U0         | 0         | +            |
## | chr1         | 216711011 | 216711036 | U0         | 0         | +            |
## | chr1         | 144227079 | 144227104 | U0         | 0         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chrY         | 21751211  | 21751236  | U0         | 0         | -            |
## | chrY         | 7701983   | 7702008   | U0         | 0         | -            |
## | chrY         | 21910706  | 21910731  | U0         | 0         | -            |
## | chrY         | 22054002  | 22054027  | U0         | 0         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 240 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
def set_start_to_zero(df):
    df.Start = 0
    return df
print(chipseq.apply(set_start_to_zero))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome   | Start     | End       | Name       | Score     | Strand       |
## | (category)   | (int64)   | (int32)   | (object)   | (int64)   | (category)   |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1         | 0         | 212609559 | U0         | 0         | +            |
## | chr1         | 0         | 169887554 | U0         | 0         | +            |
## | chr1         | 0         | 216711036 | U0         | 0         | +            |
## | chr1         | 0         | 144227104 | U0         | 0         | +            |
## | ...          | ...       | ...       | ...        | ...       | ...          |
## | chrY         | 0         | 15224260  | U0         | 0         | -            |
## | chrY         | 0         | 13517917  | U0         | 0         | -            |
## | chrY         | 0         | 8010976   | U0         | 0         | -            |
## | chrY         | 0         | 7405401   | U0         | 0         | -            |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
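
The strand pairs printed by print_strands above follow a simple rule, which the sketch below reproduces without using PyRanges at all. The helper name partner_key and the toy key list are hypothetical, and the sketch only covers the "same" and "opposite" settings; it illustrates the pairing rule and is not the library's own code.

```python
# Library-free sketch of how chromosome/strand keys are paired.
# partner_key is a hypothetical helper, not part of PyRanges.
OPPOSITE = {"+": "-", "-": "+"}

def partner_key(chrom, strand, strandedness):
    """Return the key in the second collection that (chrom, strand) is paired with."""
    if strandedness == "opposite":
        return (chrom, OPPOSITE[strand])
    if strandedness == "same":
        return (chrom, strand)
    raise ValueError("this sketch only handles 'same' and 'opposite'")

# Toy keys standing in for a stranded PyRanges with two chromosomes.
keys = [("chr1", "+"), ("chr1", "-"), ("chr2", "+"), ("chr2", "-")]

for chrom, strand in keys:
    _, other_strand = partner_key(chrom, strand, "opposite")
    print(chrom, strand, other_strand)
```

Each line of output has the same shape as the lines printed by print_strands: the chromosome, the strand of the first dataframe, and the strand of the dataframe it was paired with.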

Keyword arguments can be sent to the function:

def _print(df, **kwargs):
    print("My keyword arguments were:", kwargs.get("value"), "and", kwargs.get("whatever"))
    return df
chipseq.apply(_print, value=123, whatever="hi there!")
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!

The message was printed 48 times, since chipseq is stranded and the function is applied once per chromosome/strand pair (24 chromosomes, two strands each).
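
The number of calls follows from how the data is grouped: a stranded PyRanges holds one dataframe per chromosome/strand pair, and the function is called once for each of them with the same keyword arguments. A minimal library-free sketch of that behaviour, assuming a plain dict as a stand-in for the PyRanges (the names tables, apply_per_key and record are hypothetical):

```python
# Hypothetical stand-in for a stranded PyRanges:
# one list of rows per (chromosome, strand) key.
tables = {
    ("chr1", "+"): [{"Start": 5, "End": 10}],
    ("chr1", "-"): [{"Start": 20, "End": 30}],
    ("chr2", "+"): [{"Start": 1, "End": 4}],
    ("chr2", "-"): [{"Start": 7, "End": 9}],
}

def apply_per_key(tables, func, **kwargs):
    """Call func once per key, forwarding the same keyword arguments each time."""
    return {key: func(rows, **kwargs) for key, rows in tables.items()}

calls = []

def record(rows, **kwargs):
    """Remember the keyword arguments of every call and return the rows unchanged."""
    calls.append(kwargs)
    return rows

result = apply_per_key(tables, record, value=123, whatever="hi there!")
print(len(calls), "calls, one per chromosome/strand pair")
```

With two chromosomes and two strands the function is called four times; the same reasoning gives one call per key for the real data.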

apply_chunks splits each chromosome into nb_cpu chunks and processes the chunks in parallel, which is useful for slow row-based operations such as Fisher's exact test.
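
The idea of splitting rows into chunks and processing the chunks concurrently can be sketched with the standard library alone. This is an illustration of the general technique, not how PyRanges implements apply_chunks; the function names are hypothetical, and threads are used only to keep the example short (CPU-bound work in Python would normally need separate processes to see a speed-up).

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional row.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when n_chunks > len(rows)
            chunks.append(rows[start:end])
        start = end
    return chunks

def apply_in_chunks(rows, func, n_chunks):
    """Run func on every chunk concurrently and concatenate the results in order."""
    chunks = split_into_chunks(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(func, chunks))  # map preserves chunk order
    return [item for chunk_result in results for item in chunk_result]

def square_rows(chunk):
    """Stand-in for a slow row-based operation."""
    return [x * x for x in chunk]

print(split_into_chunks(list(range(10)), 3))
print(apply_in_chunks(list(range(10)), square_rows, 3))
```

Because the row-wise function sees only its own chunk, this pattern suits operations where each row can be processed independently of the others.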