Re: [emacs-bidi] Mixed L2R and R2L paragraphs and horizontal scroll

From: Davis Herring
Subject: Re: [emacs-bidi] Mixed L2R and R2L paragraphs and horizontal scroll
Date: Wed, 3 Feb 2010 13:02:28 -0800 (PST)
User-agent: SquirrelMail/1.4.8-5.7.lanl7

> What you describe here is the Emacs screen as a rectangular frame moving
> over the visually ordered text. This is technically sound but very wrong
> from a user's point of view. The reason is that a Hebrew reader sees the
> "continuation" lines in reverse order, and has to read them from the
> last (continuation) line (the bottom one) upward.

The difference between an L2R line and an R2L line is not in the visual
order of characters in them (that's determined by the directionality of
the text itself, of course), but in the layout of pieces of text.  The
question is, what is a "piece of text"?  Certainly the (logical) boundary
between L2R and R2L text is also the boundary of such pieces.  Consider
this rendering of an L2R line:

 |the ordinals TSRIF and DNOCES |

The two R2L words appear to be in the wrong order if you read only them or
if you interpret the line as R2L, in which case it's the rendering of the
logical line "SECOND and FIRST the ordinals".
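
To make the two readings concrete, here is a minimal sketch of the reordering. It assumes this thread's convention that capital letters stand for R2L characters and lowercase letters and digits for L2R ones, and it is a heavily simplified stand-in for the Unicode Bidirectional Algorithm: only strong L/R types and neutrals are handled, a run of neutrals takes the direction of its strong neighbours when they agree and the paragraph direction otherwise, and numbers, explicit embeddings and mirroring are ignored. The function names are hypothetical.

```python
def classify(ch: str) -> str:
    """Bidi type under this thread's convention (an assumption, not Unicode data)."""
    if ch.isupper():
        return "R"  # capitals stand for R2L letters
    if ch.isalnum():
        return "L"  # lowercase letters and digits stand for L2R text
    return "N"      # spaces and punctuation are neutral


def resolve(text: str, para: str) -> list[str]:
    """Give each run of neutrals the direction of its strong neighbours when
    they agree, else the paragraph direction (roughly UBA rules N1/N2)."""
    types = [classify(c) for c in text]
    out = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else para
        after = types[j] if j < len(types) else para
        for k in range(i, j):
            out[k] = before if before == after else para
        i = j
    return out


def visual_order(text: str, para: str = "L") -> str:
    """Lay out one logical line in a paragraph of direction para ("L" or "R")."""
    runs: list[tuple[str, list[str]]] = []
    for ch, d in zip(text, resolve(text, para)):
        if runs and runs[-1][0] == d:
            runs[-1][1].append(ch)
        else:
            runs.append((d, [ch]))
    # characters inside an R2L run are displayed right to left
    pieces = ["".join(reversed(chs)) if d == "R" else "".join(chs) for d, chs in runs]
    if para == "R":
        pieces.reverse()  # in an R2L paragraph the runs themselves go right to left
    return "".join(pieces)


print(visual_order("the ordinals FIRST and SECOND", "L"))
print(visual_order("SECOND and FIRST the ordinals", "R"))
```

Both calls print `the ordinals TSRIF and DNOCES`: the same screen line, so the glyphs alone cannot tell the reader which paragraph direction was meant.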

Suppose we narrow the window:

 |the ordinals ???\|
 |?? and DNOCES    |

If we interpret the ??? and ?? as two different pieces of R2L text, we
have split the logical string "FIRST" into "FIR" and "ST", and so we render

 |the ordinals RIF\|
 |TS and DNOCES    |

with the two pieces presented in their logical order (the piece that
occurs earlier in the buffer is on an earlier screen line).  If, however,
we consider the ????? as one piece, presented in two places merely as a
rearrangement of glyphs on the screen, we render

 |the ordinals TSR\|
 |IF and DNOCES    |

Neither of these is perfect: in the first case, in addition to the line
break there are also two breaks (logically in the middle of "RS") that
disrupt the flow of the text more than the usual break between L2R and
R2L.  In the second case, as you say, the continuation runs bottom-to-top,
which is undesirable.
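
The two interpretations can be contrasted in a short sketch, again assuming that capitals are R2L and lowercase letters and digits are L2R, with a simplified rule for neutrals, and assuming that continuation lines break at a fixed number of text columns (the "\" glyph is left out). The first interpretation cuts the logical text and reorders each piece; the second reorders the whole logical line and then cuts the visual result into screen lines. `reorder_l2r`, `wrap_logical` and `wrap_visual` are hypothetical names.

```python
def reorder_l2r(text: str) -> str:
    """Visual order of a line in an L2R paragraph, simplified: capitals are
    R2L, lowercase letters and digits are L2R, and a neutral is R2L only when
    the nearest letters or digits on both sides are R2L."""
    dirs = []
    for i, ch in enumerate(text):
        if ch.isupper():
            dirs.append("R")
        elif ch.isalnum():
            dirs.append("L")
        else:
            left = next((c for c in reversed(text[:i]) if c.isalnum()), "")
            right = next((c for c in text[i + 1:] if c.isalnum()), "")
            dirs.append("R" if left.isupper() and right.isupper() else "L")
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if dirs[i] == "R" else run)  # R2L runs are displayed reversed
        i = j
    return "".join(out)


def wrap_logical(text: str, width: int) -> list[str]:
    """First interpretation: split the logical text, then reorder each piece."""
    return [reorder_l2r(text[i:i + width]) for i in range(0, len(text), width)]


def wrap_visual(text: str, width: int) -> list[str]:
    """Second interpretation: reorder the whole line, then cut the visual result."""
    visual = reorder_l2r(text)
    return [visual[i:i + width] for i in range(0, len(visual), width)]


line = "the ordinals FIRST and SECOND"
print(wrap_logical(line, 16))  # 16 text columns, as in the narrowed window
print(wrap_visual(line, 16))
```

The first call gives `['the ordinals RIF', 'TS and DNOCES']` and the second `['the ordinals TSR', 'IF and DNOCES']`, matching the two renderings of the narrowed window.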

However, both of them are usable.  In the first case, the reader applies
this algorithm upon encountering an R2L character after an L2R stretch:

 1. Scan rightward until an L2R character or the right end of the screen
    line; remember which was encountered.
 2. Read R2L from here to the starting position.
 3. If the end of the screen line was encountered, seek to the left end of
    the next screen line and note it as the new starting position; go to 1.

In the second case, the algorithm is

 1. Scan rightward until an L2R character (and go to step 3) or the right
    end of the screen line.
 2. Seek to the left end of the next screen line, and go to 1.
 3. Read R2L, bottom-to-top, from here to the starting position.
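
Both reading procedures can be simulated on the screen lines of the two narrowed renderings. This is a sketch assuming the same capital-letter convention, screen lines given without the "\" glyph, and a start position at the first R2L character. Both procedures share step 1, the rightward scan across screen lines, and differ only in how the collected pieces are read; the sketch scans first and reads afterwards, which yields the same pieces as interleaving the two. The function names are hypothetical.

```python
def is_l2r(ch: str) -> bool:
    # thread convention (assumed): lowercase letters and digits are L2R,
    # capitals are R2L, everything else is neutral
    return ch.isalnum() and not ch.isupper()


def scan(lines: list[str], row: int, col: int) -> list[str]:
    """Step 1 of both procedures: starting at the first R2L character, scan
    rightward, moving to the left end of the next screen line at each right
    end, until an L2R character is met. Returns one piece per screen line."""
    pieces = []
    while row < len(lines):
        line = lines[row]
        end = col
        while end < len(line) and not is_l2r(line[end]):
            end += 1
        pieces.append(line[col:end])
        if end < len(line):  # stopped at an L2R character
            break
        row, col = row + 1, 0  # stopped at the right end of the screen line
    # neutrals just before the L2R character (here a space) go with the L2R text
    pieces[-1] = pieces[-1].rstrip()
    return pieces


def read_piecewise(lines: list[str], row: int, col: int) -> str:
    """First procedure: read each screen line's piece R2L, top to bottom."""
    return "".join(piece[::-1] for piece in scan(lines, row, col))


def read_rigid(lines: list[str], row: int, col: int) -> str:
    """Second procedure: read the whole stretch R2L, i.e. bottom to top."""
    return "".join(scan(lines, row, col))[::-1]


# screen lines of the two narrowed renderings, without the '\' glyph
print(read_piecewise(["the ordinals RIF", "TS and DNOCES"], 0, 13))
print(read_rigid(["the ordinals TSR", "IF and DNOCES"], 0, 13))
```

Both calls recover `FIRST`; the difference lies only in the reader's eye movement, which is the trade-off weighed next.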

I don't claim to know which technique R2L readers would prefer; I am not
one.  The first has the disadvantage that you must interleave scanning for
the (visual) end of the R2L string with reading it, while the second has
the disadvantage that you must read bottom-to-top.

I tend to prefer the second interpretation, partly because no characters
on the first screen line move or change when the window width changes
(unless they are removed entirely by narrowing).  Another reason is that
the process of finding the point from which to read is the same as the
process involved in reading the surrounding L2R text anyway.  A third is
that it makes the continuation lines a rigid rearrangement of pieces of the
longer screen line we'd have in the ideal case.  (This last point depends
on the question "what are continuation lines, as currently implemented,
meant to mean, anyway?", so there's no automatic right answer.)

> No word processor uses that approach (check OpenOffice with a
> long enough text, or the other OS's word processor).

Word processors don't have continuation lines that are meant to be
interpreted as one long screen line.

>> What this means is that an L2R line that ends with a stretch of R2L
>> text will be continued as follows:
>>        +-------------------------------+
>>        |name2 1234 catag2 NOITPIRCSED-\|
>>        |GNOL-YREV                      |
>>        +-------------------------------+
> See the problem - the user must start reading from the second line.

What is the alternative?  (What you proposed later in your message doesn't
address this case to my understanding.)  Perhaps you would want this?

 |name2 1234 catag2 ED-GNOL-YREV\|
 |                     NOITPIRCS |

Of course, we would need some sort of separator/indicator in the (visual)
"2 E" space to indicate that it was the middle and not the end of the R2L
text.  And it would be really odd to have

 |name2 1234 catag2 OITPIRCSED-GNOL-YREV\|
 |               with some latin after N |

become

 |name2 1234 catag2 NOITPIRCSED-GNOL-YREV |
 |with some latin after                   |

when the window is widened by one column.

>> and truncated thusly:
>>        +-------------------------------+
>>        |name2 1234 catag2 NOITPIRCSED-$|
>>        +-------------------------------+
> The truncation is technically correct but wrong from user perspective.

This is how word processors truncate when they do.  In that mode they show
a (rectangular) subset of the visual layout of the overall document.
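
In that model, a truncated or horizontally scrolled line is simply a fixed-width window onto the visual line. A minimal sketch, assuming the visual line has already been computed and using "$" in the edge columns, as Emacs does on text terminals, to flag hidden text; `window` is a hypothetical helper, not an Emacs function.

```python
def window(visual: str, hscroll: int, width: int) -> str:
    """Show columns [hscroll, hscroll + width) of an already-reordered line,
    putting '$' in the first/last column when text is hidden on that side."""
    shown = list(visual[hscroll:hscroll + width].ljust(width))
    if hscroll > 0:
        shown[0] = "$"   # something is scrolled off to the left
    if len(visual) > hscroll + width:
        shown[-1] = "$"  # something is cut off on the right
    return "".join(shown)


visual = "name2 1234 catag2 NOITPIRCSED-GNOL-YREV"  # visual line from the quoted example
for h in (0, 10):
    print(repr(window(visual, h, 31)))
```

With `hscroll` 0 this reproduces the truncated line quoted below; increasing `hscroll` shifts every glyph left by the same amount, whatever its directionality.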

> I claim the scrolling should follow the "same" rules.
> e.g. 1: wide screen scrolled 20 (half width):
>        +----------------------------------------+
>        |$owd by OF TXET GNOL 2YREV 1YREV WERBEH$|
>        +----------------------------------------+
> e.g. 2: wide screen scrolled 60 (1.5 width):
>        +----------------------------------------+
>        +----------------------------------------+
> e.g. 3: wide screen scrolled 150:
>        +----------------------------------------+
>        |$HTIW H small latin tail                |
>        +----------------------------------------+
> The word "same" is between quotes because it is the same appearance
> even though the rules may be slightly different.

It looks odd to me to have two pieces of text on the same line move in
different directions when you scroll the window.

New sub-topic: what does one do with the (logical) text

 he said, "SHE SAID, 'latin again.' TODAY." yesterday.

?  Stripping the punctuation for simplicity, it would seem to be
rendered as

 he said DIAS EHS latin again YADOT yesterday

which seems confusing because the L2R text brackets its quotation but the
R2L text doesn't.  I bring this up in this thread because, if the right
answer is to render it as

 he said YADOT latin again DIAS EHS yesterday

(with some sort of punctuation or special graphical indication as to how
far the reader must seek to the right before beginning the R2L scan), then
it may have bearing on the present scrolling discussion.
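
Both candidate renderings can be generated with a simplified reordering, as a sketch assuming that capitals stand for R2L text and that the punctuation is already stripped. The second rendering is obtained by laying out the quotation as an R2L unit of its own and then placing that block inside the L2R line, which is roughly what marking the quotation as a right-to-left embedding or isolate would ask of the Unicode Bidirectional Algorithm. All helper names are hypothetical.

```python
def classify(ch: str) -> str:
    # thread convention (assumed): capitals are R2L, lowercase/digits L2R
    if ch.isupper():
        return "R"
    return "L" if ch.isalnum() else "N"


def resolve(text: str, para: str) -> list[str]:
    """Neutral runs take their neighbours' direction if both agree, else para."""
    types = [classify(c) for c in text]
    out = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else para
        after = types[j] if j < len(types) else para
        for k in range(i, j):
            out[k] = before if before == after else para
        i = j
    return out


def visual_order(text: str, para: str = "L") -> str:
    """Reverse each R2L run; in an R2L paragraph also reverse the run order."""
    runs: list[tuple[str, list[str]]] = []
    for ch, d in zip(text, resolve(text, para)):
        if runs and runs[-1][0] == d:
            runs[-1][1].append(ch)
        else:
            runs.append((d, [ch]))
    pieces = ["".join(reversed(c)) if d == "R" else "".join(c) for d, c in runs]
    if para == "R":
        pieces.reverse()
    return "".join(pieces)


flat = visual_order("he said SHE SAID latin again TODAY yesterday")
quoted = visual_order("SHE SAID latin again TODAY", "R")  # the quotation as its own R2L unit
nested = "he said " + quoted + " yesterday"
print(flat)
print(nested)
```

The first line printed is the flat rendering `he said DIAS EHS latin again YADOT yesterday`; the second is the bracketed one, `he said YADOT latin again DIAS EHS yesterday`.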

