emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Line wrap reconsidered


From: Yuan Fu
Subject: Re: Line wrap reconsidered
Date: Thu, 28 May 2020 15:34:28 -0400


> On May 28, 2020, at 2:05 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 28 May 2020 13:31:37 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> Before answering your questions, this is my understanding of the word 
>> wrapping in redisplay:
>> On every iteration we check if current character allow wrapping after it, if 
>> so, we set may_wrap to true.
> 
> Yes.
> 
>> That basically means the _previous char_ allows wrapping after it.
> 
> No, we never wrap at the character where we set may_wrap = true, we
> wrap _after_ it.  In the current code, may_wrap is set when we see a
> SPC character, and when we wrap, that SPC is left on the line before
> the wrap.

I see, I must have been misinterpreted some part of the logic. 

> 
>> While we are at the same iteration, we may want to wrapping point (wrap_it) 
>> if 1) the previous char allows wrapping after it (from may_wrap’s value set 
>> by _previous iteration_) and 2) the current character allows wrapping before 
>> it.
> 
> Yes.  But I don't understand what you mean by "at the same iteration"
> here: if may_wrap was set, then we will not try to save the wrap point
> until we process the next character.  So I cannot call this "the same
> iteration", it's rather "the next iteration".
> 

In the current code IT_DISPLAYING_WHITESPACE can check for can_wrap_before and 
can_wrap_after in the same time, but in my new code, we have to perform two 
checks in the same iteration, because some char can wrap before but now after, 
and some can wrap after but not before, etc. 


>> When we found ourselves at the end of a line
> 
> You mean, when we reach the edge of the window, right?

Yes.

> 
>> we have two choices: continue to the next line (which I assume is what 
>> “continue” means in the comment), or instead go to a previously saved wrap 
>> point and break the line there.
> 
> Which "continue" are you alluding to here?  Do you mean "lines are
> continued"?  because if lines are not being wrapped, we have only one
> choice: go back to the last wrap point we found, end the screen line
> (a.k.a. "break the line") there, and continue with the text on the
> next screen line.

I think there is another option where we wrap at current point. On line 23356:

                              /* If line-wrap is on, check if a
                                 previous wrap point was found.  */
                              else if (wrap_row_used > 0
                                       /* Even if there is a previous wrap
                                          point, continue the line here as
                                          usual, if (i) the previous character
                                          was a space or tab AND (ii) the
                                          current character is not.  */
                                       && (!may_wrap
                                           || IT_DISPLAYING_WHITESPACE (it)))
                                goto back_to_wrap;

I was alluding to this “continue”: “continue the line here as usual”. What does 
it mean? Does it mean we insert a line break here? I think your response below 
confirms my guess.

> 
>> If there is no previous wrap point, we continue, if there is a previous wrap 
>> point but we actually can wrap at this point (previous char can wrap after & 
>> this char can wrap before), we just continue, since this position is a valid 
>> wrap position. Otherwise we go back to previous wrap point and wrap there 
>> (goto back_to_wrap;).
> 
> You lost me here.  The logic is actually: if there is a wrap point, go
> back to it and break the line there; if there's no wrap point, break
> the line where we are now (i.e. at the last character that still fits
> inside the window). 

I mean I think the logic is that, even if there is a previous wrap point, we 
still break here if here is also a valid wrap point. I got this impression from 
reading the comment mentioned above. If this position is not a valid wrap 
point, go back to previous wrap point and wrap here. If there is no previous 
wrap point, we break here.

> 
>> Although the original code only has one checker (IS_WHITESPACE), it serves a 
>> dual purpose: it is used to determine if we can wrap after—whitespace and 
>> tab allow wrapping after; it is also used to determine if we can wrap 
>> before—they don’t allow wrapping before (otherwise you see whitespace and 
>> tabs on the beginning of the next line).
> 
> That is true.  I suggested to have a single function so that you could
> in that single function perform the same test, just using two
> different categories.  then you could basically keep the rest of the
> logic intact.  If that is somehow not possible, can you explain why?

IT_DISPLAYING_WHITESPACE works because it was only check for two types of 
characters. Now we have four. A Boolean function can’t return more than two 
possibilities. Two Boolean functions combined can express four possibilities.

| Type      | wrap_before? | wrap_after? |
|-----------+--------------+-------------|
| space/tab | no           | yes         |
| other     | yes          | no          |

| Type      | wrap_before? | wrap_after? |
|-----------+--------------+-------------|
| space/tab | no           | yes         |
| CJK       | yes          | yes         |
| other     | yes          | no          |
| ??        | no           | no          |


> 
>> Needless to say, I’m a newbie in Emacs C internals and redisplay, so my 
>> understanding from reading the original code and comments could be wrong. 
>> But the code seems to work right so I think the truth isn’t far away.
> 
> The truth isn't far away, but we need to go all the way so that the
> result doesn't break some use cases (of which there are a gazillion in
> the display code).
> 

Indeed, I’ve heard a lot of urban stories about the redisplay code of Emacs ;-)


>>> Do we really need two separate functions?  And note that each one
>>> calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
>>> testing that twice for the same character -- because you have 2
>>> separate functions.
>> 
>> IT_DISPLAYING_WHITESPACE would run twice, too: one time to check for warp 
>> after and one time for wrap before.
> 
> My point was to bring the two tests together so that it could be just
> one test.  Is that possible?  If not, why not?

(See above)

> 
>>>> +        if (may_wrap && char_can_wrap_before(it))
>>>>            {
>>>>              /* We have reached a glyph that follows one or more
>>>> -               whitespace characters.  If the position is
>>>> -               already found, we are done.  */
>>>> +               whitespace characters (or a character that allows
>>>> +               wrapping after it).  If the position is already
>>>> +               found, we are done.  */
>>> 
>>> The code calls char_can_wrap_before, but the comment says we can wrap
>>> after it.  Which one is right?
>> 
>> The comment says this char _follows_ a char that allows wrapping after, we 
>> still need to check if _this_ char allows wrapping before.
> 
> But the comment is after the char_can_wrap_before test, so we have
> already tested that, no?

We have tested previous char for can_wrap_after, which is represented as 
may_wrap == true. We still need to check this char for can_wrap_before. For a 
position to be a valid wrap point, the char before must allow wrapping after 
and the char after must allow wrapping before. 

Say we have A|B, we can break line at the bar only if A allows wrapping after 
and B allows wrapping before.

> 
>>>> -                        /* If we are at a whitespace character
>>>> -                           that barely fits on this screen line,
>>>> -                           but the next character is also
>>>> -                           whitespace, we cannot wrap here.  */
>>>> +                        /* If the previous character says we can
>>>> +                                 wrap after it, but the current
>>>> +                                 character says we can't wrap before
>>>> +                                 it, then we can't wrap here.  */
>>> 
>>> It sounds like your Emacs is set up to use only spaces for indentation
>>> in C source files, whereas our convention is to use tabs and spaces.
>> 
>> I’m not sure what you mean, do you mean the indent style for the new code, 
>> or are you talking about the word wrapping?
> 
> I mean the indentation: the original code used TABs and spaces, but
> your code uses only spaces.

How do I fix it? Any guideline file?

> 
>>> And finally, one more general comment/question: isn't your code assume
>>> implicitly that buffer text is always processed in the logical order,
>>> i.e. in the increasing order of buffer positions?  I mean, the fact
>>> that you have the "before" and the "after" function seems to imply
>>> that you do assume that, and the logic of processing the categories is
>>> relying on that, expecting that when you see a wrap_after character,
>>> you can wrap on the next character.  Is this so?  because if it is,
>>> then this will break when processing RTL text, since we may process it
>>> in the reverse order of buffer positions.  Please look into these
>>> situations and see that the code does TRT in them.
>> 
>> I tested with some Alaric text made by google translate, and the wrapping 
>> seems not take effect. IIUC bidi.c reorders the line to RTL
> 
> Yes.
> 
>> and the redisplay iterator will still go through them LTR
> 
> Not sure what you mean by "go through them LTR".  The iterator can
> move forward or backward, or even jump to a far-away place.  You
> cannot assume that the next character examined will be the next
> character in the buffer.

Essentially I want to ask “if may_wrap == true, what does that mean? (In bidi 
context)” Did the character to the left of me (on glass) set it or the 
character to the right of me set it?

> 
>> I wonder how does the original wrapping works in that case. Does the old 
>> code handle bidi text?
> 
> Of course it does, you can easily see that if you run the unmodified
> Emacs.  It would be a terrible misfeature if we didn't handle wrappng
> correctly in bidirectional scripts.

Actually, I just tried again and the code works for bidi. Maybe last time I 
didn’t turned on word-wrap while thinking I did.

Yuan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]