16 Joining Ranges

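The join method combines the overlapping intervals of two PyRanges objects into a single PyRanges, pairing each interval with every interval it overlaps in the other object. Columns coming from the second object get a suffix; if you do not pass suffix, the default _b is used.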

import pyranges as pr
gr = pr.data.aorta()
gr2 = pr.data.aorta2()
print(gr.join(gr2, suffix="_2"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   | Start     | End       | Name       | Score     | +6    |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | ...          | ...       | ...       | ...        | ...       | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 49 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_2, End_2, Name_2, Score_2, Strand_2
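
The repeated rows in the output above arise because one interval can overlap several intervals in the other object. The following is a minimal pure-Python sketch of that pairing, not the PyRanges implementation: the Interval type, the overlaps and naive_join helpers, the toy coordinates, and the half-open [start, end) convention are all assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive (half-open convention is an assumption of this sketch)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when a and b share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def naive_join(left, right, suffix="_b"):
    """Pair every left interval with every right interval it overlaps.

    Quadratic on purpose: this only illustrates the shape of the result,
    not how PyRanges computes it.
    """
    rows = []
    for a in left:
        for b in right:
            if overlaps(a, b):
                row = {"Chromosome": a.chrom, "Start": a.start,
                       "End": a.end, "Name": a.name}
                # Columns taken from the right-hand side get the suffix appended.
                row.update({f"Start{suffix}": b.start,
                            f"End{suffix}": b.end,
                            f"Name{suffix}": b.name})
                rows.append(row)
    return rows


# Hypothetical toy data, not one of the pr.data datasets.
left = [Interval("chr1", 3, 6, "interval1"), Interval("chr1", 5, 7, "interval2")]
right = [Interval("chr1", 6, 7, "b"), Interval("chr1", 4, 8, "c")]

joined = naive_join(left, right, suffix="_2")
for row in joined:
    print(row)
```

Here interval2 overlaps both b and c, so it appears twice in the result, once per partner, while interval1 only touches b at position 6 and is therefore paired with c alone.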

The join method also takes a strandedness option, which can be "same", "opposite" or False/None.

print(gr.join(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   | Start     | End       | Name       | Score     | +6    |
## | (category)   | (int32)   | (int32)   | (object)   | (int64)   | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9939      | 10138     | H3K27me3   | 7         | ...   |
## | chr1         | 9953      | 10152     | H3K27me3   | 5         | ...   |
## | ...          | ...       | ...       | ...        | ...       | ...   |
## | chr1         | 10127     | 10326     | H3K27me3   | 1         | ...   |
## | chr1         | 10127     | 10326     | H3K27me3   | 1         | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## | chr1         | 10241     | 10440     | H3K27me3   | 6         | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 22 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
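
One way to read the three strandedness values is as a filter applied to each candidate pair of intervals. The helper below is a hypothetical sketch of that reading (strand_compatible is not a PyRanges function), assuming strands are encoded as "+" and "-".

```python
def strand_compatible(strand_a, strand_b, strandedness=None):
    """Decide whether two intervals may be paired, given their strands.

    Hypothetical helper for illustration only; not part of PyRanges.
    """
    if strandedness == "same":
        # Only pair intervals lying on the same strand.
        return strand_a == strand_b
    if strandedness == "opposite":
        # Only pair a "+" interval with a "-" interval (in either order).
        return {strand_a, strand_b} == {"+", "-"}
    if strandedness is None or strandedness is False:
        # Strand is ignored entirely.
        return True
    raise ValueError(f"unknown strandedness: {strandedness!r}")


for mode in ("same", "opposite", None):
    print(mode, strand_compatible("+", "-", mode))
```

Because "same" and "opposite" each discard some candidate pairs, a stranded join generally returns fewer rows than one that ignores strand.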

The join method also takes a how argument, which currently accepts the options "containment", "inner" (default), "outer", "left" and "right". Containment requires that the intervals in self lie completely within the intervals in other. The other options behave like SQL-style inner, outer, left and right joins.

f1 = pr.data.f1()
f2 = pr.data.f2()
print(f2.join(f1, how="containment"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
print(f1.join(f2, how="outer"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int64) |   (int64) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         3 |         6 | interval1  |         0 | ...   |
## | chr1         |         8 |         9 | interval3  |         0 | ...   |
## | chr1         |        -1 |        -1 | -1         |        -1 | ...   |
## | chr1         |        -1 |        -1 | -1         |        -1 | ...   |
## | chr1         |         5 |         7 | interval2  |         0 | ...   |
## | chr1         |        -1 |        -1 | -1         |        -1 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 6 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
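
The how options can be sketched on bare (start, end) tuples. This is an illustrative toy, not the PyRanges code: toy_join, contained_in, overlaps, the MISSING placeholder and the sample coordinates are assumptions, strand is ignored, and intervals are treated as half-open. The (-1, -1) placeholder simply mirrors the -1 cells printed for unmatched rows in the outer join above.

```python
MISSING = (-1, -1)  # stands in for "no partner", like the -1 cells in the outer-join output


def overlaps(a, b):
    """True when half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def contained_in(a, b):
    """True when interval a lies completely within interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def toy_join(left, right, how="inner"):
    """SQL-style join of two lists of (start, end) tuples on one chromosome."""
    if how not in {"inner", "containment", "left", "right", "outer"}:
        raise ValueError(f"unsupported how: {how!r}")
    match = contained_in if how == "containment" else overlaps
    rows, matched_right = [], set()
    for a in left:
        partners = [b for b in right if match(a, b)]
        for b in partners:
            rows.append((a, b))
            matched_right.add(b)
        # left/outer joins keep left intervals that found no partner.
        if not partners and how in ("left", "outer"):
            rows.append((a, MISSING))
    # right/outer joins keep right intervals that found no partner.
    if how in ("right", "outer"):
        rows.extend((MISSING, b) for b in right if b not in matched_right)
    return rows


# Toy coordinates loosely modeled on the f1/f2 example; (20, 30) is made up.
f1_like = [(3, 6), (5, 7), (8, 9)]
f2_like = [(6, 7), (20, 30)]

print(toy_join(f2_like, f1_like, how="containment"))
print(toy_join(f1_like, f2_like, how="outer"))
```

In this sketch the containment join keeps only (6, 7), which lies inside (5, 7), while the outer join keeps every interval from both sides and fills the missing partner with the placeholder.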

After joining, you can use the new_position() method to compute the union or intersection of the joined ranges. By default it uses the first two columns whose names contain "Start" and the first two whose names contain "End". To use other columns, pass them with the columns argument.

print(f2.join(f1).new_position("intersection"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
# same as:
print(f2.join(f1).new_position("intersection", columns=["Start", "End", "Start_b", "End_b"]))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         |         6 |         7 | b          |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
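
For a single joined row, intersection and union reduce to simple arithmetic on the two start and two end coordinates. The sketch below uses a hypothetical combine_position helper (not the PyRanges method) and assumes the joined row pairs b (6 to 7) with an interval spanning 5 to 7.

```python
def combine_position(start, end, start_b, end_b, mode):
    """Return the (start, end) of the intersection or union of two overlapping ranges.

    Hypothetical helper for illustration; it assumes the two ranges overlap,
    as they do in every row of an inner join.
    """
    if mode == "intersection":
        # The shared part begins at the later start and stops at the earlier end.
        return max(start, start_b), min(end, end_b)
    if mode == "union":
        # The combined span begins at the earlier start and stops at the later end.
        return min(start, start_b), max(end, end_b)
    raise ValueError(f"unknown mode: {mode!r}")


print(combine_position(6, 7, 5, 7, "intersection"))  # (6, 7)
print(combine_position(6, 7, 5, 7, "union"))         # (5, 7)
```

The intersection result agrees with the Start and End values printed above for the joined row.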

If you want to swap which columns are treated as the Start and End, that is, exchange Start and End with Start_b and End_b, pass "swap" to new_position().

gr1, gr2 = pr.data.chipseq(), pr.data.chipseq_background()
j = gr1.join(gr2)
print(j)
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 226987592 | 226987617 | U0         |         0 | ...   |
## | chr8         |  38747226 |  38747251 | U0         |         0 | ...   |
## | chr15        |  26105515 |  26105540 | U0         |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
j2 = j.new_position("intersection")
print(j2)
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 226987603 | 226987617 | U0         |         0 | ...   |
## | chr8         |  38747236 |  38747251 | U0         |         0 | ...   |
## | chr15        |  26105515 |  26105518 | U0         |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
print(j2.new_position("swap"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome   |     Start |       End | Name       |     Score | +6    |
## | (category)   |   (int32) |   (int32) | (object)   |   (int64) | ...   |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1         | 226987603 | 226987628 | U0         |         0 | ...   |
## | chr8         |  38747236 |  38747261 | U0         |         0 | ...   |
## | chr15        |  26105493 |  26105518 | U0         |         0 | ...   |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
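
The effect of "intersection" followed by "swap" can be traced by hand on the chr15 row. In the sketch below the Start_b and End_b values are hidden in the printed tables and are inferred from the intersection and swap outputs above; the intersect and swap helpers are hypothetical stand-ins operating on a plain dict, not PyRanges code.

```python
# chr15 row of the join; Start_b/End_b are inferred from the printed tables.
row = {"Start": 26105515, "End": 26105540,
       "Start_b": 26105493, "End_b": 26105518}


def intersect(r):
    """Replace Start/End by the intersection of the two ranges; the _b columns stay as they are."""
    out = dict(r)
    out["Start"] = max(r["Start"], r["Start_b"])
    out["End"] = min(r["End"], r["End_b"])
    return out


def swap(r):
    """Exchange the (Start, End) pair with the (Start_b, End_b) pair."""
    out = dict(r)
    out["Start"], out["Start_b"] = r["Start_b"], r["Start"]
    out["End"], out["End_b"] = r["End_b"], r["End"]
    return out


j2_row = intersect(row)
swapped = swap(j2_row)
print(j2_row["Start"], j2_row["End"])    # 26105515 26105518
print(swapped["Start"], swapped["End"])  # 26105493 26105518
```

After the swap, Start and End hold the coordinates of the interval from the second object, which is why the last table shows 26105493 to 26105518 for chr15.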