Re: feature suggestion: --preserve-blocking-factor
From: Ole Tange
Subject: Re: feature suggestion: --preserve-blocking-factor
Date: Sat, 18 Feb 2017 22:36:25 +0100

On Sat, Feb 18, 2017 at 5:39 AM, Cook, Malcolm <MEC@stowers.org> wrote:
> I don't think my needs were clear.
Your needs were clear and I am really surprised that you did not
understand the solution I proposed.
> I know you are bioinformatics savvy and are familiar with bedtools, so let me
> cast my example in terms of bedtools.
>
> I have a huge sorted bedfile, my.bed, that I want to pipe into bedtools merge
> (http://bedtools.readthedocs.io/en/latest/content/tools/merge.html)
>
> As required, it is sorted already.
>
> I could
>
> cat my.bed | parallel -j10 --pipe --block 50M bedtools merge
>
> but the blocks that parallel breaks my.bed into might not keep each
> chromosome together, and this is required for the merge to be correct.
>
> So I am looking for a means to instruct parallel that some ranges of records
> must stay together within a block.
Yup. You want each chromosome to be treated as a single record. So what
you do is insert a record separator before each chromosome and tell GNU
Parallel to split on it. The chromosome is the first column ($F[0] in
Perl's autosplit array), so whenever it changes we insert '\0', which
will never occur in a normal bedfile. Then we ask GNU Parallel to split
records on \0 (--recend) and to remove the \0 (--rrs) before passing the
data to bedtools:

  cat my.bed | perl -ape '$F[0] ne $old and print "\0"; $old = $F[0]' |
    parallel --recend '\0' --rrs --pipe --block 50M -j10 bedtools merge

The only things I have changed from my previous email are:

  example     -> my.bed
  $F[1]       -> $F[0]
  --block 200 -> --block 50M
  wc          -> bedtools merge

and I added -j10.
I have the feeling you are now saying *DOH*.
/Ole