16 Joining Ranges
You can combine all overlapping intervals from two PyRanges objects with the join method. Columns from the second object are given a suffix to tell them apart; if you do not specify a suffix, the default _b is used.
import pyranges as pr
gr = pr.data.aorta()
gr2 = pr.data.aorta2()
print(gr.join(gr2, suffix="_2"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 49 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_2, End_2, Name_2, Score_2, Strand_2
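To make the pairing and the suffix concrete, here is a minimal pure-Python sketch of an overlap join. It is not how PyRanges implements join; the function name `overlap_join`, the dictionary layout, and the toy intervals are all hypothetical, and it assumes half-open intervals (Start included, End excluded).

```python
from typing import Dict, List

Row = Dict[str, object]


def overlap_join(left: List[Row], right: List[Row], suffix: str = "_b") -> List[Row]:
    """Pair each interval in `left` with every overlapping interval in `right`.

    Hypothetical helper for illustration only. Columns taken from `right`
    (except Chromosome) are renamed with `suffix`.
    """
    joined = []
    for a in left:
        for b in right:
            same_chrom = a["Chromosome"] == b["Chromosome"]
            # Assumed half-open overlap test: [Start, End)
            if same_chrom and a["Start"] < b["End"] and b["Start"] < a["End"]:
                row = dict(a)
                for key, value in b.items():
                    if key != "Chromosome":
                        row[key + suffix] = value
                joined.append(row)
    return joined


left = [
    {"Chromosome": "chr1", "Start": 3, "End": 6, "Name": "a1"},
    {"Chromosome": "chr1", "Start": 8, "End": 9, "Name": "a2"},
]
right = [
    {"Chromosome": "chr1", "Start": 1, "End": 4, "Name": "b1"},
    {"Chromosome": "chr1", "Start": 5, "End": 7, "Name": "b2"},
    {"Chromosome": "chr2", "Start": 3, "End": 6, "Name": "b3"},
]

rows = overlap_join(left, right, suffix="_2")
for row in rows:
    print(row)
```

An interval that overlaps several intervals in the other object appears once per overlap, which is why the joined table above contains repeated Start and End values.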
The join method also takes a strandedness option, which can be "same", "opposite" or False/None.
print(gr.join(gr2, strandedness="opposite"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9939 | 10138 | H3K27me3 | 7 | ... |
## | chr1 | 9953 | 10152 | H3K27me3 | 5 | ... |
## | ... | ... | ... | ... | ... | ... |
## | chr1 | 10127 | 10326 | H3K27me3 | 1 | ... |
## | chr1 | 10127 | 10326 | H3K27me3 | 1 | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## | chr1 | 10241 | 10440 | H3K27me3 | 6 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 22 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
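The strandedness option can be thought of as an extra filter applied to each candidate pair of intervals. The sketch below shows one way to express that filter; `strands_compatible` is a hypothetical helper, not part of PyRanges, and it assumes strands are written as "+" and "-".

```python
from typing import Union


def strands_compatible(
    strand_a: str,
    strand_b: str,
    strandedness: Union[str, bool, None] = None,
) -> bool:
    """Decide whether two strands may be paired (illustrative helper only)."""
    if not strandedness:
        # None or False: strand is ignored
        return True
    if strandedness == "same":
        return strand_a == strand_b
    if strandedness == "opposite":
        return {strand_a, strand_b} == {"+", "-"}
    raise ValueError(f"unknown strandedness: {strandedness!r}")


pairs = [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")]
for mode in (None, "same", "opposite"):
    kept = [p for p in pairs if strands_compatible(*p, strandedness=mode)]
    print(mode, kept)
```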
The join method also takes a how argument, which currently accepts the options "containment", "inner" (default), "outer", "left" and "right".
Containment requires that the intervals in self be completely within the
intervals in other. The others are similar to SQL-style inner, outer, left and
right joins.
f1 = pr.data.f1()
f2 = pr.data.f2()
print(f2.join(f1, how="containment"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
print(f1.join(f2, how="outer"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int64) | (int64) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 3 | 6 | interval1 | 0 | ... |
## | chr1 | 8 | 9 | interval3 | 0 | ... |
## | chr1 | -1 | -1 | -1 | -1 | ... |
## | chr1 | -1 | -1 | -1 | -1 | ... |
## | chr1 | 5 | 7 | interval2 | 0 | ... |
## | chr1 | -1 | -1 | -1 | -1 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 6 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
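The difference between the how options can be sketched on bare (start, end) tuples. This is an illustration under stated assumptions, not the PyRanges implementation: `join_pairs`, `overlaps` and `contained` are hypothetical names, intervals are assumed half-open, strand and chromosome are ignored, and a missing partner is represented by None rather than the -1 filler seen in the output above.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]
Pair = Tuple[Optional[Interval], Optional[Interval]]


def overlaps(a: Interval, b: Interval) -> bool:
    # Assumed half-open intervals: [start, end)
    return a[0] < b[1] and b[0] < a[1]


def contained(a: Interval, b: Interval) -> bool:
    # True when a lies completely within b
    return b[0] <= a[0] and a[1] <= b[1]


def join_pairs(left: List[Interval], right: List[Interval], how: str = "inner") -> List[Pair]:
    """Return (left, right) pairs; None marks a missing partner."""
    pairs: List[Pair] = []
    matched_right = set()
    for a in left:
        hits = []
        for j, b in enumerate(right):
            ok = contained(a, b) if how == "containment" else overlaps(a, b)
            if ok:
                hits.append(b)
                matched_right.add(j)
        pairs.extend((a, b) for b in hits)
        # left and outer joins keep left intervals without a partner
        if not hits and how in ("left", "outer"):
            pairs.append((a, None))
    # right and outer joins keep right intervals without a partner
    if how in ("right", "outer"):
        pairs.extend((None, b) for j, b in enumerate(right) if j not in matched_right)
    return pairs


first = [(3, 6), (5, 7), (8, 9)]
second = [(1, 2), (6, 7)]

for how in ("inner", "left", "right", "outer"):
    print(how, join_pairs(first, second, how=how))
print("containment", join_pairs(second, first, how="containment"))
```

In this toy data only (6, 7) lies completely within an interval of the other list, so the containment join returns a single pair, while the outer join keeps every interval from both sides.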
After joining, you can use the new_position() method to compute the union or intersection of the joined ranges. By default it uses the first two columns whose names contain "Start" and the first two whose names contain "End". To use other columns, pass them with the columns argument.
print(f2.join(f1).new_position("intersection"))
# same as:
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
print(f2.join(f1).new_position("intersection", columns=["Start", "End", "Start_b", "End_b"]))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 6 | 7 | b | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 1 rows and 11 columns from 1 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
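The arithmetic behind union and intersection is small enough to write out. The sketch below operates on a single joined row stored as a dictionary; `combine_positions` is a hypothetical helper, not the PyRanges method, and the default column names simply follow the `_b` suffix used in this chapter.

```python
from typing import Dict, Sequence


def combine_positions(
    row: Dict[str, int],
    mode: str,
    columns: Sequence[str] = ("Start", "End", "Start_b", "End_b"),
) -> Dict[str, int]:
    """Return a copy of `row` whose first Start/End hold the combined range."""
    start_col, end_col, start_b_col, end_b_col = columns
    out = dict(row)
    if mode == "intersection":
        # The shared part starts at the later start and ends at the earlier end
        out[start_col] = max(row[start_col], row[start_b_col])
        out[end_col] = min(row[end_col], row[end_b_col])
    elif mode == "union":
        # The covering range starts at the earlier start and ends at the later end
        out[start_col] = min(row[start_col], row[start_b_col])
        out[end_col] = max(row[end_col], row[end_b_col])
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out


joined_row = {"Start": 3, "End": 6, "Start_b": 5, "End_b": 7}
print(combine_positions(joined_row, "intersection"))
print(combine_positions(joined_row, "union"))
```

Only the first Start and End are rewritten in this sketch; the `_b` columns keep the coordinates of the second interval.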
If you want to swap which columns are considered the Start and End, you can pass "swap" to new_position().
gr1, gr2 = pr.data.chipseq(), pr.data.chipseq_background()
j = gr1.join(gr2)
print(j)
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987592 | 226987617 | U0 | 0 | ... |
## | chr8 | 38747226 | 38747251 | U0 | 0 | ... |
## | chr15 | 26105515 | 26105540 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
j2 = j.new_position("intersection")
print(j2)
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987603 | 226987617 | U0 | 0 | ... |
## | chr8 | 38747236 | 38747251 | U0 | 0 | ... |
## | chr15 | 26105515 | 26105518 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
print(j2.new_position("swap"))
## +--------------+-----------+-----------+------------+-----------+-------+
## | Chromosome | Start | End | Name | Score | +6 |
## | (category) | (int32) | (int32) | (object) | (int64) | ... |
## |--------------+-----------+-----------+------------+-----------+-------|
## | chr1 | 226987603 | 226987628 | U0 | 0 | ... |
## | chr8 | 38747236 | 38747261 | U0 | 0 | ... |
## | chr15 | 26105493 | 26105518 | U0 | 0 | ... |
## +--------------+-----------+-----------+------------+-----------+-------+
## Stranded PyRanges object has 3 rows and 11 columns from 3 chromosomes.
## For printing, the PyRanges was sorted on Chromosome and Strand.
## 6 hidden columns: Strand, Start_b, End_b, Name_b, Score_b, Strand_b
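One way to read the three tables above is that the intersection overwrites Start and End while leaving Start_b and End_b untouched, and the swap then exchanges the two coordinate pairs. The sketch below replays that reading for the chr15 row. It is an interpretation, not the PyRanges implementation: `intersect_row` and `swap_row` are hypothetical helpers, and the Start_b and End_b values are inferred from the printed tables rather than read from the dataset.

```python
from typing import Dict


def intersect_row(row: Dict[str, int]) -> Dict[str, int]:
    """Replace Start/End with the overlap of the two intervals (illustrative)."""
    out = dict(row)
    out["Start"] = max(row["Start"], row["Start_b"])
    out["End"] = min(row["End"], row["End_b"])
    return out


def swap_row(row: Dict[str, int]) -> Dict[str, int]:
    """Exchange (Start, End) with (Start_b, End_b) (assumed meaning of swap)."""
    out = dict(row)
    out["Start"], out["Start_b"] = row["Start_b"], row["Start"]
    out["End"], out["End_b"] = row["End_b"], row["End"]
    return out


# chr15 row of the join; the *_b values are inferred, not taken from the data
chr15 = {"Start": 26105515, "End": 26105540, "Start_b": 26105493, "End_b": 26105518}

after_intersection = intersect_row(chr15)
after_swap = swap_row(after_intersection)
print(after_intersection["Start"], after_intersection["End"])
print(after_swap["Start"], after_swap["End"])
```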