30 Using multiple cores

Most PyRanges and PyRle operations can be run in parallel. User-defined functions can also be run in parallel through the apply, apply_pair or apply_chunks methods.

import pyranges as pr
gr = pr.data.chipseq()
gr2 = pr.data.chipseq_background()
gr.intersect(gr2, nb_cpu=5)

PyRanges uses Ray, a “flexible, high-performance distributed execution framework”, to run in parallel mode. Because Ray is a rather heavy dependency, it is not installed automatically with pyranges, but it can easily be installed using conda or pip.

All pyranges methods take an nb_cpu argument, which sets the number of cores the method runs on. Because this uses Ray behind the scenes, the call will fail if Ray is already initialized. To use nb_cpu with pyrle methods, call the named method, for example r.add(r2, nb_cpu=48), rather than the operator form r + r2, which has no way to pass the argument.
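Since Ray is an optional install, a script may want to fall back to single-core mode when Ray is missing. The following is a minimal sketch of that idea using only the standard library. The helper name pick_nb_cpu and its module parameter are hypothetical and not part of PyRanges.

```python
import importlib.util


def pick_nb_cpu(requested, module="ray"):
    """Return `requested` if `module` is importable, else 1 (single-core).

    Hypothetical helper for illustration; not part of PyRanges.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    # find_spec returns None when a top-level module cannot be found.
    has_backend = importlib.util.find_spec(module) is not None
    return requested if has_backend else 1


nb_cpu = pick_nb_cpu(4)
print(nb_cpu)  # 4 if Ray is installed, otherwise 1
```

The result could then be passed on, for example as gr.intersect(gr2, nb_cpu=nb_cpu).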

Note: By default, PyRanges uses no extra cores. Unless the data are reasonably large or the functions are very long-running, parallel mode is actually slower than single-core mode. Also, if the PyRanges contains a lot of text data, there may be less to gain from running in parallel, because of how strings are represented in memory in Python and Pandas.
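The point about text data can be illustrated with the standard library alone. This sketch does not use Pandas or PyRanges. It only compares a packed buffer of integers with a list of separate Python string objects, which is roughly how a Pandas object column holds text. The column names and sizes are made up for illustration, and the link to parallel overhead is an assumption: more bytes and more individual objects per value generally mean more work when data is handed to worker processes.

```python
import sys
from array import array

n = 10_000

# Numeric column: one contiguous buffer of 8-byte signed integers.
starts = array("q", range(n))
bytes_per_int = starts.itemsize

# Text column: every value is a separate Python string object,
# each carrying its own object header in addition to its characters.
names = [f"gene_{i:06d}" for i in range(n)]
bytes_per_str = sum(sys.getsizeof(s) for s in names) / n

print(bytes_per_int, bytes_per_str)
```

The exact string size depends on the Python version, but it is several times larger than the 8 bytes of a packed integer.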