
From: Stefan Monnier
Subject: Re: Questionable code in handling of wordend in the regexp engine in regex-emacs.c
Date: Fri, 01 Mar 2019 08:41:31 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

> down", other times it "rounds it up" to a character position.  I think
> it should be defined as rounding it down.  It would be a relatively
> simple correction (at least, technically ;-).

When moving forward, rounding it up is more natural ;-)

> But I'm still a little worried about buf_bytepos_to_charpos.  Perhaps it
> should state that the result is undefined when the bytepos is "invalid".

Yes, I think it's the intention.  Even better would be to signal an
error (when built with --enable-checking).
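
To make the rounding question concrete, here is a minimal sketch, assuming
plain UTF-8 text in a flat array rather than Emacs's real buffer
representation.  All the names are hypothetical and none of this is the
actual buf_bytepos_to_charpos code; it only shows how "round down",
"round up" and "assert that the bytepos is valid" differ for a byte
position that falls in the middle of a multibyte character.

```c
#include <assert.h>
#include <stddef.h>

/* A byte starts a character unless it is a UTF-8 continuation byte
   (bit pattern 10xxxxxx).  */
static int is_char_head(unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* A byte position is "valid" when it is the end of the text or sits on
   the first byte of a character.  */
static int bytepos_valid(const unsigned char *text, size_t len, size_t bytepos)
{
  assert(bytepos <= len);
  return bytepos == len || is_char_head(text[bytepos]);
}

/* Round up: count the characters that start before BYTEPOS.  A position
   in the middle of a character maps to the NEXT character.  */
static size_t charpos_round_up(const unsigned char *text, size_t len,
                               size_t bytepos)
{
  size_t heads = 0;
  assert(bytepos <= len);
  for (size_t i = 0; i < bytepos; i++)
    heads += is_char_head(text[i]);
  return heads;
}

/* Round down: a position in the middle of a character maps to the
   character that CONTAINS it.  */
static size_t charpos_round_down(const unsigned char *text, size_t len,
                                 size_t bytepos)
{
  size_t pos = charpos_round_up(text, len, bytepos);
  if (!bytepos_valid(text, len, bytepos))
    pos--;  /* The enclosing character's head was counted; step back.  */
  return pos;
}

/* Checked variant: treat an invalid byte position as a bug instead of
   silently rounding in either direction.  */
static size_t charpos_checked(const unsigned char *text, size_t len,
                              size_t bytepos)
{
  assert(bytepos_valid(text, len, bytepos));
  return charpos_round_up(text, len, bytepos);
}
```

For the four bytes of "aéb" (61 C3 A9 62), byte position 2 is the second
byte of "é": rounding down gives character 1, rounding up gives
character 2, and the checked variant would abort.  For every valid byte
position the three functions agree.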

> For that matter, how many charpos <-> bytepos functions are there in
> Emacs?  Just this one?

I think so, yes.

>> Worse, in notwordbound we do:
>>              ptrdiff_t offset = PTR_TO_OFFSET (d - 1);
>>              ptrdiff_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
>>              UPDATE_SYNTAX_TABLE (charpos);
>> which seems even more broken because `d` might point to the first byte
>> after the gap, so `d - 1` will point in the middle of the gap, so it's
>> simply an invalid argument to PTR_TO_OFFSET.
> I don't think this is right.  Both `d' and `offset' are byte
> measurements, not character measurements, so it shouldn't matter whether
> the "- 1" is inside or outside the parens.  However, it would be less
> confusing if they were both (?all) the same.

The difference between `d` and `offset` is just an offset, indeed, but
it can be 2 different offsets depending on whether `d` is before or
after the gap, so what happens when `d` is within the gap depends on how
the test for "before/after the gap" is implemented.

More specifically, when `d` is N bytes before the end of the gap, the
code could consider it as being N bytes before the beginning of the
second part, or being "gap-size - N" bytes after the end of the
first part.
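
The two readings can be sketched as follows.  This is a toy gap buffer
with hypothetical names, using raw indices into the underlying storage
instead of pointers; it is not the definition of PTR_TO_OFFSET or
POINTER_TO_OFFSET, only an illustration of why the result for a position
inside the gap depends on which boundary the before/after test uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy gap buffer laid out as [part1][gap][part2].  */
struct gapbuf {
  size_t part1_len;  /* bytes of text before the gap */
  size_t gap_size;   /* unused bytes in the gap */
  size_t part2_len;  /* bytes of text after the gap */
};

static size_t storage_len(const struct gapbuf *b)
{
  return b->part1_len + b->gap_size + b->part2_len;
}

/* Reading 1: the test is "RAW is before the START of the gap".
   Positions inside the gap are then treated as belonging to part2, so a
   position N bytes before the end of the gap comes out N bytes before
   the beginning of the second part.  */
static ptrdiff_t offset_gap_counts_as_part2(const struct gapbuf *b, size_t raw)
{
  assert(raw <= storage_len(b));
  if (raw < b->part1_len)
    return (ptrdiff_t) raw;
  return (ptrdiff_t) raw - (ptrdiff_t) b->gap_size;
}

/* Reading 2: the test is "RAW is before the END of the gap".
   Positions inside the gap are then treated as belonging to part1, so
   the same position comes out (gap_size - N) bytes after the end of the
   first part.  */
static ptrdiff_t offset_gap_counts_as_part1(const struct gapbuf *b, size_t raw)
{
  assert(raw <= storage_len(b));
  if (raw < b->part1_len + b->gap_size)
    return (ptrdiff_t) raw;
  return (ptrdiff_t) raw - (ptrdiff_t) b->gap_size;
}
```

With part1_len = 10 and gap_size = 100, the first byte after the gap is
at raw index 110 and both readings map it to offset 10.  One byte
earlier, at raw index 109 (the analogue of `d - 1`), reading 1 yields 9,
the same as decrementing after the conversion, while reading 2 yields
109.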

>> According to the definition of PTR_TO_OFFSET and POINTER_TO_OFFSET,
>> the result may be the same as if we did the decrement after the fact,
>> but it still looks fishy.  WDYT?
> I think it is suboptimal to have both PTR_TO_OFFSET and
> POINTER_TO_OFFSET meaning different things in the same source file.  ;-)

I'm so glad you're volunteering to clean this up.
Thank you, really.

> There are eight occurrences of SYNTAX_TABLE_BYTE_TO_CHAR in
> regex-emacs.c.  I think I will check them all, amending them as in your
> patch.
> What do you say?


