Re: Line wrap reconsidered

emacs-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Line wrap reconsidered

From:	Yuan Fu
Subject:	Re: Line wrap reconsidered
Date:	Thu, 28 May 2020 13:31:37 -0400

> On May 27, 2020, at 1:29 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Tue, 26 May 2020 18:29:04 -0400
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
>> emacs-devel@gnu.org
>> 
>> Here is the version that doesn’t use text properties.
> 
> Thanks, few comments below.

Thank you for reviewing.

> 
>> I assume by string you mean display properties? I checked with display 
>> property and it wraps fine in this version.
> 
> Display properties whose values are strings, and also before-string
> and after-string overlay properties.
> 
>> +static bool char_can_wrap_before (struct it *it)
>> +{
>> +  /* We used to only check for whitespace for wrapping, hence this
>> +     macro.  You cannot wrap before a whitespace.  */
>> +  return ((it->what == IT_CHARACTER
>> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_BOL))
>> +          /* There used to be   */
>> +          && !IT_DISPLAYING_WHITESPACE (it));
>> +}
> 
> The order here is wrong: the IT_DISPLAYING_WHITESPACE should be tested
> first, as that is the more frequent situation, so it should be
> processed faster.
> 
Fixed.

Before answering your questions, this is my understanding of the word wrapping 
in redisplay:
On every iteration we check if current character allow wrapping after it, if 
so, we set may_wrap to true. That basically means the _previous char_ allows 
wrapping after it. While we are at the same iteration, we may want to wrapping 
point (wrap_it) if 1) the previous char allows wrapping after it (from 
may_wrap’s value set by _previous iteration_) and 2) the current character 
allows wrapping before it. When we found ourselves at the end of a line, we 
have two choices: continue to the next line (which I assume is what “continue” 
means in the comment), or instead go to a previously saved wrap point and break 
the line there. If there is no previous wrap point, we continue, if there is a 
previous wrap point but we actually can wrap at this point (previous char can 
wrap after & this char can wrap before), we just continue, since this position 
is a valid wrap position. Otherwise we go back to previous wrap point and wrap 
there (goto back_to_wrap;).

Although the original code only has one checker (IS_WHITESPACE), it serves a 
dual purpose: it is used to determine if we can wrap after—whitespace and tab 
allow wrapping after; it is also used to determine if we can wrap before—they 
don’t allow wrapping before (otherwise you see whitespace and tabs on the 
beginning of the next line).

Needless to say, I’m a newbie in Emacs C internals and redisplay, so my 
understanding from reading the original code and comments could be wrong. But 
the code seems to work right so I think the truth isn’t far away.

>> +/* Return true if the current character allows wrapping after it.   */
>> +static bool char_can_wrap_after (struct it *it)
>> +{
>> +  return ((it->what == IT_CHARACTER
>> +           && CHAR_HAS_CATEGORY (it->c, LINE_BREAKABLE)
>> +           && !CHAR_HAS_CATEGORY(it->c, NOT_AT_EOL))
>> +          /* We used to only check for whitespace for wrapping, hence
>> +             this macro.  Obviously you can wrap after a space.  */
>> +          || IT_DISPLAYING_WHITESPACE (it));
>> +}
> 
> Do we really need two separate functions?  And note that each one
> calls IT_DISPLAYING_WHITESPACE, so in some situations you will be
> testing that twice for the same character -- because you have 2
> separate functions.

IT_DISPLAYING_WHITESPACE would run twice, too: one time to check for warp after 
and one time for wrap before.

> 
>> -          if (IT_DISPLAYING_WHITESPACE (it))
>> -            may_wrap = true;
>> -          else if (may_wrap)
>> +              /* Can we wrap here? */
>> +          if (may_wrap && char_can_wrap_before(it))
>>              {
>>                /* We have reached a glyph that follows one or more
>> -                 whitespace characters.  If the position is
>> -                 already found, we are done.  */
>> +                 whitespace characters (or a character that allows
>> +                 wrapping after it).  If the position is already
>> +                 found, we are done.  */
> 
> The code calls char_can_wrap_before, but the comment says we can wrap
> after it.  Which one is right?

The comment says this char _follows_ a char that allows wrapping after, we 
still need to check if _this_ char allows wrapping before.

> 
>> +              /* This has to run after the previous block.  */
> 
> This kind of comment begs the question: "why?"  Please rewrite the
> comment to answer that question up front.

Fixed. Hopefully it’s clear now.

> 
>> -                          /* If we are at a whitespace character
>> -                             that barely fits on this screen line,
>> -                             but the next character is also
>> -                             whitespace, we cannot wrap here.  */
>> +                          /* If the previous character says we can
>> +                                 wrap after it, but the current
>> +                                 character says we can't wrap before
>> +                                 it, then we can't wrap here.  */
> 
> It sounds like your Emacs is set up to use only spaces for indentation
> in C source files, whereas our convention is to use tabs and spaces.
> 

I’m not sure what you mean, do you mean the indent style for the new code, or 
are you talking about the word wrapping?

>> +  DEFSYM (Qword_wrap, "word-wrap");
>> +  DEFSYM (Qonly_before, "only-before");
>> +  DEFSYM (Qonly_after, "only-after");
>> +  DEFSYM (Qno_wrap, "no-wrap");
> 
> These symbols are not used in the code.

Removed.

> 
> And finally, one more general comment/question: isn't your code assume
> implicitly that buffer text is always processed in the logical order,
> i.e. in the increasing order of buffer positions?  I mean, the fact
> that you have the "before" and the "after" function seems to imply
> that you do assume that, and the logic of processing the categories is
> relying on that, expecting that when you see a wrap_after character,
> you can wrap on the next character.  Is this so?  because if it is,
> then this will break when processing RTL text, since we may process it
> in the reverse order of buffer positions.  Please look into these
> situations and see that the code does TRT in them.

I tested with some Alaric text made by google translate, and the wrapping seems 
not take effect. IIUC bidi.c reorders the line to RTL and the redisplay 
iterator will still go through them LTR, is that true? I wonder how does the 
original wrapping works in that case. Does the old code handle bidi text?

Yuan
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Line wrap reconsidered, (continued)
Prev by Date: Re: What is the most useful potential feature which Emacs lacks?
Next by Date: Re: [PATCH] Improve Malayalam language transliteration
Previous by thread: Re: Line wrap reconsidered
Next by thread: Re: Line wrap reconsidered
Index(es):
- Date
- Thread