Parallel with sed group capture


From: Carlos Pérez Cantalapiedra
Subject: Parallel with sed group capture
Date: Wed, 8 May 2013 13:01:47 +0300

Hello everyone,

I am new to this list and to the parallel command. I hope the answer to my question is not too obvious, but I would appreciate some advice :)

I have to process a big file, and I have been reading about the parallel command so that I can use more than one processor core when running sed, sort and so on. As a first step, I want to change the first line of every group of four (because of the naming conventions of this kind of file, the FastQ format).

For example, this is one group of four lines, and I want to modify the first line:

    cat sbcc073_pcm_ill_all.musket_default.fastq | head -4
    
    @HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA
    GCGAGAGAAT
    +
    GHHHHHHHHHH

The following command does the job:

    cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | sed 's#^\(@.*\)_\([12]\).*#\1/\2#'
    
    @HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289/1
    GCGAGAGAAT
    +
    GHHHHHHHHHH
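
Since only the first line of every four should change, one option is to restrict the substitution by line position instead of relying on the leading `@` alone, because a FastQ quality line can also start with `@`. The following is a minimal sketch, assuming GNU sed (the `1~4` address is a GNU extension); the function name `fastq_sample` and the second record are hypothetical and exist only for illustration.

```shell
# Hypothetical sample: two FastQ records. The second record's quality line
# deliberately starts with '@' to show why line position matters.
fastq_sample() {
  printf '%s\n' \
    '@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA' \
    'GCGAGAGAAT' \
    '+' \
    'GHHHHHHHHH' \
    '@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4290_2:N:0:ACTTGA' \
    'GCGAGAGAAT' \
    '+' \
    '@HHH_1HHHH'
}

# 1~4 (GNU sed only) selects lines 1, 5, 9, ... so the substitution is
# applied to header lines and never to sequence or quality lines.
renamed=$(fastq_sample | sed '1~4 s#^\(@.*\)_\([12]\).*#\1/\2#')

printf '%s\n' "$renamed"
```

Line-position addressing only works when sed sees whole records starting at line 1, so this sketch applies to a single sed process reading the complete file.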

However, when using parallel, the capture groups do not seem to be recognized:

    cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | parallel --pipe sed 's#^\(@.*\)_\([12]\).*#\1/\2#'
    
    @HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA
    GCGAGAGAAT
    +
    GHHHHHHHHHH

When I remove the backslashes or use sed -r, the command reports:

    /bin/bash: -c: line 3: syntax error near unexpected token `('
    /bin/bash: -c: line 3: `             (cat /tmp/60xrxvCIRX.chr; rm /tmp/60xrxvCIRX.chr; cat - ) | (sed s#^(@.*)_([12]).*#\1/\2# );'
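
One possible reading of this error, offered as a sketch and not as a confirmed diagnosis: the sed expression appears in the message without its single quotes, which suggests that the command is handed to a second shell and parsed again there. The snippet below reproduces that effect without parallel, using `bash -c` as a stand-in for the second shell; the variable names are hypothetical. The commented-out parallel invocations at the end are assumptions about how the quoting could be preserved (`-q` is GNU parallel's quoting option) and are not executed here.

```shell
# The sed expression from the post, kept in a variable for reuse.
sed_expr='s#^\(@.*\)_\([12]\).*#\1/\2#'
header='@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA'

# Quotes lost: the inner shell re-parses the expression, turning \( into (
# and \1 into 1, so sed no longer sees any capture group and nothing matches.
unquoted=$(printf '%s\n' "$header" | bash -c "sed $sed_expr")

# Quotes kept: the inner shell passes the expression to sed untouched.
quoted=$(printf '%s\n' "$header" | bash -c "sed '$sed_expr'")

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# Possible equivalents with GNU parallel (assumptions, not run here):
#   parallel -q --pipe sed 's#^\(@.*\)_\([12]\).*#\1/\2#'
#   parallel --pipe "sed 's#^\(@.*\)_\([12]\).*#\1/\2#'"
```

The same mechanism would explain the syntax error: without backslashes or quotes, the second shell sees a bare `(` and tries to interpret it as shell syntax.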

Could anyone shed some light on this?

Thank you very much
