bug-coreutils

bug#51792: coreutils - csplit - feature request


From: Pádraig Brady
Subject: bug#51792: coreutils - csplit - feature request
Date: Fri, 12 Nov 2021 18:23:37 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Thunderbird/95.0

On 12/11/2021 17:05, Rodolfo Aramayo wrote:
> Dear Coreutils Maintainers,
>
> First, thank you for your work. I use coreutils daily both for my research
> and teaching. It is a great set of tools.
>
> Second, I recently needed to extract Coding Sequences information from a
> GenBank file. GenBank files are used in Computational
> Genomics/Bioinformatics extensively. I used csplit, and it works like a
> charm.
>
> The command I used is:
>
> csplit -sz -n 5 --prefix=02_ 01_00001
> /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/
> {*};
>
> I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.
>
> My question is: Is csplit POSIX compatible? and if it is not, can we make
> it POSIX compatible?


Well, POSIX defines both BRE and ERE, and csplit supports the former.
From the code we have:

  re_syntax_options =
    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;

Generally, the '+' functionality from ERE can be had in BRE with '\{1,\}'.
So you'd use something like:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}
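As a rough illustration of that pattern in use, here is a minimal sketch run against a made-up input. The file name sample.gb, the part_ prefix, and the sample lines are all hypothetical and only loosely imitate a GenBank feature table; the csplit options are the ones from the original command.

```shell
#!/bin/sh
# Work in a scratch directory so the generated pieces are easy to inspect.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample input: one header line followed by two CDS records.
printf '%s\n' \
  'FEATURES             Location/Qualifiers' \
  '     CDS             1..10' \
  '                     /gene="a"' \
  '     CDS             20..30' \
  '                     /gene="b"' > sample.gb

# The BRE interval \{1,\} ("one or more") stands in for the ERE '+'.
# Single quotes keep the shell from touching the backslashes and braces.
csplit -sz -n 5 --prefix=part_ sample.gb \
  '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/' '{*}'

# Count the pieces that were written.
pieces=$(ls part_* | wc -l)
echo "pieces: $pieces"
```

With this input the header line should land in part_00000, and each later piece should begin at a line matching the CDS pattern.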

We might add an option to use ERE, though I don't think there is
a big need for that in csplit use cases.

cheers,
Pádraig




