13 Applying custom methods to pairs of PyRanges
By using the apply, apply_pair or apply_chunks methods, you can run custom
methods on the dataframes in your PyRanges. The apply and apply_chunks methods
take a single dataframe, while apply_pair takes a pair of dataframes.
import pyranges as pr

chipseq = pr.data.chipseq()
chipseq_background = pr.data.chipseq_background()

def print_strands(df1, df2):
    # Show which chromosome and strands were paired for this call.
    print(df1.Chromosome.iloc[0], df1.Strand.iloc[0], df2.Strand.iloc[0])
    return df1.head(5)

result = chipseq.apply_pair(chipseq_background, print_strands, strandedness="opposite")
## chr1 + -
## chr1 - +
## chr2 + -
## chr2 - +
## chr3 + -
## chr3 - +
## chr4 + -
## chr4 - +
## chr5 + -
## chr5 - +
## chr6 + -
## chr6 - +
## chr7 + -
## chr7 - +
## chr8 + -
## chr8 - +
## chr9 + -
## chr9 - +
## chr10 + -
## chr10 - +
## chr11 + -
## chr11 - +
## chr12 + -
## chr12 - +
## chr13 + -
## chr13 - +
## chr14 + -
## chr14 - +
## chr15 + -
## chr15 - +
## chr16 + -
## chr16 - +
## chr17 + -
## chr17 - +
## chr18 + -
## chr18 - +
## chr19 + -
## chr19 - +
## chr20 + -
## chr20 - +
## chr21 + -
## chr21 - +
## chr22 + -
## chr22 - +
## chrX + -
## chrX - +
## chrY + -
## chrY - +
print(result)
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int32) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 212609534 | 212609559 | U0 | 0 | + |
## | chr1 | 169887529 | 169887554 | U0 | 0 | + |
## | chr1 | 216711011 | 216711036 | U0 | 0 | + |
## | chr1 | 144227079 | 144227104 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 21751211 | 21751236 | U0 | 0 | - |
## | chrY | 7701983 | 7702008 | U0 | 0 | - |
## | chrY | 21910706 | 21910731 | U0 | 0 | - |
## | chrY | 22054002 | 22054027 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 240 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
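The pairing visible in the printed strands above (each + group matched with a - group on the same chromosome, and vice versa) can be illustrated with a small standalone sketch. It does not use pyranges and is not its implementation: the per-key dataframes are replaced by plain dicts keyed by (chromosome, strand), and the names apply_pair_sketch, OPPOSITE and describe are hypothetical.

```python
# Hypothetical stand-ins for the per-key dataframes: plain dicts keyed by
# (chromosome, strand), each holding a list of (start, end) intervals.
OPPOSITE = {"+": "-", "-": "+"}

def apply_pair_sketch(self_groups, other_groups, func, strandedness=None):
    """Call func(group1, group2) once per key in self_groups.

    With strandedness="opposite" the partner is the group on the same
    chromosome but the other strand; otherwise the same key is used.
    """
    results = {}
    for (chrom, strand), group1 in sorted(self_groups.items()):
        partner_strand = OPPOSITE[strand] if strandedness == "opposite" else strand
        # A missing partner group is treated as empty in this sketch.
        group2 = other_groups.get((chrom, partner_strand), [])
        results[(chrom, strand)] = func(group1, group2)
    return results

def describe(group1, group2):
    # Return the sizes of the two groups that were paired.
    return (len(group1), len(group2))

a = {("chr1", "+"): [(1, 5), (10, 20)], ("chr1", "-"): [(3, 8)]}
b = {("chr1", "+"): [(2, 6)], ("chr1", "-"): [(4, 9), (11, 12), (30, 40)]}

paired = apply_pair_sketch(a, b, describe, strandedness="opposite")
for key, counts in paired.items():
    print(key, counts)
```

Here the + group of a is paired with the - group of b, which mirrors the "chr1 + -" and "chr1 - +" lines printed by print_strands.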
def set_start_to_zero(df):
    # Overwrite the Start column in every dataframe.
    df.Start = 0
    return df

print(chipseq.apply(set_start_to_zero))
## +--------------+-----------+-----------+------------+-----------+--------------+
## | Chromosome | Start | End | Name | Score | Strand |
## | (category) | (int64) | (int32) | (object) | (int64) | (category) |
## |--------------+-----------+-----------+------------+-----------+--------------|
## | chr1 | 0 | 212609559 | U0 | 0 | + |
## | chr1 | 0 | 169887554 | U0 | 0 | + |
## | chr1 | 0 | 216711036 | U0 | 0 | + |
## | chr1 | 0 | 144227104 | U0 | 0 | + |
## | ... | ... | ... | ... | ... | ... |
## | chrY | 0 | 15224260 | U0 | 0 | - |
## | chrY | 0 | 13517917 | U0 | 0 | - |
## | chrY | 0 | 8010976 | U0 | 0 | - |
## | chrY | 0 | 7405401 | U0 | 0 | - |
## +--------------+-----------+-----------+------------+-----------+--------------+
## Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
Keyword arguments can be sent to the function:
def _print(df, **kwargs):
    print("My keyword arguments were:", kwargs.get("value"), "and", kwargs.get("whatever"))
    return df

chipseq.apply(_print, value=123, whatever="hi there!")
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
## My keyword arguments were: 123 and hi there!
They were printed 48 times since chipseq is stranded, so the function was applied once per chromosome and strand pair (24 chromosomes, two strands each).
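The number of calls can be reproduced with a small standalone sketch of "apply one function per key and forward the keyword arguments". It is a conceptual model only, not pyranges code: the groups are plain dicts keyed by (chromosome, strand), and apply_sketch and tag are hypothetical names.

```python
def apply_sketch(groups, func, **kwargs):
    """Call func(group, **kwargs) once per key and count the calls."""
    calls = 0
    results = {}
    for key, group in groups.items():
        results[key] = func(group, **kwargs)
        calls += 1
    return results, calls

def tag(group, **kwargs):
    # Attach the forwarded keyword argument to every interval.
    label = kwargs.get("value")
    return [(start, end, label) for start, end in group]

# 24 chromosomes with two strands each, one toy interval per group.
chromosomes = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]
groups = {(c, s): [(0, 10)] for c in chromosomes for s in "+-"}

results, calls = apply_sketch(groups, tag, value=123)
print(calls)
```

With 24 chromosomes and two strands there are 48 keys, which matches the 48 lines of output shown above.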
apply_chunks splits each chromosome into nb_cpu chunks and runs each chunk in
parallel, which is useful for slow row-based operations (like Fisher's exact
test, for example).
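The chunking idea can be sketched without pyranges as follows. This is a minimal illustration under stated assumptions, not the library's implementation: rows are plain (start, end) tuples, threads from the standard library stand in for whatever parallel backend pyranges uses, and split_into_chunks, slow_row_operation and apply_chunks_sketch are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, n_chunks):
    """Split rows into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional row.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks

def slow_row_operation(chunk):
    # Placeholder for an expensive per-row computation: interval length.
    return [end - start for start, end in chunk]

def apply_chunks_sketch(rows, func, nb_cpu=2):
    """Run func on each chunk concurrently and concatenate the results."""
    chunks = split_into_chunks(rows, nb_cpu)
    with ThreadPoolExecutor(max_workers=nb_cpu) as pool:
        per_chunk = list(pool.map(func, chunks))  # map preserves chunk order
    return [value for chunk_result in per_chunk for value in chunk_result]

rows = [(0, 10), (5, 25), (30, 31), (40, 100), (7, 9)]
lengths = apply_chunks_sketch(rows, slow_row_operation, nb_cpu=2)
print(lengths)
```

Because each chunk is processed independently, this pattern only suits operations that work row by row and do not need to see the other rows of the chromosome.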