bug-sed

bug#39432: Long line issue in sed 4.8


From: Paul Fox
Subject: bug#39432: Long line issue in sed 4.8
Date: Wed, 5 Feb 2020 22:18:49 +0000

Hello Assaf,

Thank you for taking the time to respond to this. I am wondering whether
it "really is a bug". I appreciate that the regexp package may have
limitations; I haven't examined in detail how it "compiles" the regexp to
byte code. Older regexp implementations limit themselves to typical "int"
sizes, so regexps cannot be of arbitrary complexity.

However, the issue here is that the item being searched is a "long line"
(>2GB). While regexp itself may have trouble keeping track of grouping
patterns and backtracking, it "should ideally just work".

I am more than familiar with the pains of 16/32/64-bit-ness, so please
don't assume I am being naive. And I am happy to take your word for it as
maintainer.

However, one thing that is fairly disappointing in sed is the unhelpful
INT_MAX panic. At the very least, the message should say something like:

   * .... is larger than INT_MAX (%lu) <= insert the value

I had to disassemble the code to pick out the 2^32-2 value that was being
used. Better still, the message could say "what" is exceeding the byte
length. The code says or implies that the regexp is too complex, but it's
the search target which is too long (i.e. "line is too long, sorry, we
can't handle that just now. Please try later!").
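
To illustrate the kind of message I mean, here is a rough sketch in C. It
is not sed's actual code: the helper name check_regex_input_len and the
message wording are hypothetical, and I am assuming the check is simply
"input buffer length versus INT_MAX". The point is only that the message
names the thing that is too long and prints both numbers.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not taken from sed): returns 0 when the input
   buffer length fits in an int.  Otherwise it writes a descriptive
   message into MSG (at most MSGSIZE bytes) and returns -1.  */
static int
check_regex_input_len (size_t buflen, char *msg, size_t msgsize)
{
  if (buflen <= (size_t) INT_MAX)
    return 0;

  /* Say what is too long (the input line, not the regexp) and include
     both the offending length and the limit.  */
  snprintf (msg, msgsize,
            "input line is too long for the regex matcher: "
            "%zu bytes exceeds INT_MAX (%d)",
            buflen, INT_MAX);
  return -1;
}
```

A caller would pass the length of the line about to be matched and, on a
-1 return, hand the message to whatever error routine is in use.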

I was fairly amazed at the bug, having lived with GNU sed since probably
the 0.01 days. And it's a shame for it not to be a little more intuitive
to an end user.

(I didn't hit the bug myself. Someone at my org said "this didn't work",
so I was curious how/why/where.)

Many thanks, and I really appreciate your efforts on this.


On Wed, 5 Feb 2020 at 19:15, Assaf Gordon <address@hidden> wrote:

> tag 39432 notabug
> close 39432
> stop
>
> Hello,
>
> On 2020-02-05 1:11 a.m., Paul Fox wrote:
> > Seems there are bugs in sed handling long lines. You can reproduce by:
> >
> > 1. generate file > 2GB - must be a single line.
> > 2. sed -e s/xxx/yyy/ <file
> [...]
> > reg_exp: INT_MAX overflow
>
> This is indeed the intended behavior, as the regular-expression module
> can't handle strings larger than 2GB.
>
> This was reported in https://bugs.gnu.org/30520
> and the error was added in
>
> https://git.savannah.gnu.org/cgit/sed.git/commit/?id=5433dc245b222f6c98ab1436e170fd5e3e6e3907
>
> If in the future gnulib's regex module is improved
> to handle large buffers, we can revisit this issue and
> remove the message.
>
> As such I'm closing this as "not a bug", but discussion
> can continue by replying to this thread.
>
> regards,
>   - assaf

