Re: [emacs-bidi] Embedding levels of formatting codes


From: Eli Zaretskii
Subject: Re: [emacs-bidi] Embedding levels of formatting codes
Date: Wed, 17 Oct 2001 13:03:06 +0200 (IST)

On Wed, 17 Oct 2001, Behdad Esfahbod wrote:

> >    ``instead of removing the format codes, assign the embedding 
> >      level to each embedding character''
> > 
> > What is ``the embedding level'' which I should assign to those codes?
> 
> It means:
> 
>   ``X9'. With each RLE, LRE, RLO, LRO, PDF, and BN character, set its
>     level to the current embedding level, then turn its type to BN.''

What is ``current embedding level''?  Is this the level _before_ or
_after_ increasing/decreasing it due to these codes?

I have currently implemented this as _after_ the level update, so RLE,
LRE, RLO, and LRO get the higher level, while PDF gets the lower level.
This needs an artificial correction at the final stage, to prevent a
buffer like this:

       abcd{RLO}foo{PDF}xyz

from being displayed like this:

       abcdoof{RLO}{PDF}xyz
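
To make the effect concrete, here is a minimal Python sketch (an
illustration only, not the actual Emacs code).  It assumes a base
level of 0, ignores the maximum-depth and overflow rules, and uses
hypothetical helper names (tokenize, assign_levels, reorder).  Each
formatting code receives the level in force _after_ it is processed,
and the reordering step is a simplified rule L2 with no correction
applied:

```python
import re

def tokenize(text):
    # A "{RLO}"-style marker is one token; anything else is one character.
    return re.findall(r"\{[A-Z]+\}|.", text)

def assign_levels(tokens, base=0):
    # Assumption: codes get the level in force AFTER they are processed.
    stack, level, out = [], base, []
    for tok in tokens:
        if tok in ("{RLE}", "{RLO}"):
            stack.append(level)
            level = (level + 1) | 1    # next odd level above the current one
        elif tok in ("{LRE}", "{LRO}"):
            stack.append(level)
            level = (level + 2) & ~1   # next even level above the current one
        elif tok == "{PDF}" and stack:
            level = stack.pop()        # an unmatched PDF changes nothing
        out.append((tok, level))
    return out

def reorder(items):
    # Simplified rule L2: from the highest level down to the lowest odd
    # level, reverse every contiguous stretch at that level or higher.
    items = list(items)
    odd = [lvl for _, lvl in items if lvl % 2 == 1]
    if not odd:
        return items
    for lvl in range(max(l for _, l in items), min(odd) - 1, -1):
        i = 0
        while i < len(items):
            j = i
            while j < len(items) and items[j][1] >= lvl:
                j += 1
            if j > i:
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1
    return items

leveled = assign_levels(tokenize("abcd{RLO}foo{PDF}xyz"))
print("".join(tok for tok, _ in reorder(leveled)))
```

In this sketch {RLO} lands in the same level-1 run as "foo", so the
reversal carries it to the far side of the text, while {PDF} stays put
at level 0.  That is the unwanted ordering shown above.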

> >    ``In rule X10, assign L or R to the last of a sequence of adjacent BNs 
> >      according to the eor / sor, and set the level to the higher of the
> >      two levels.''
> > 
> > Do you even understand what they are trying to say here?  What does
> > ``according to the eor / sor'' mean in practical terms?  Which ``two
> > levels'' do they mean in the last part of this sentence?
> 
> For each run, the spec defines sor and eor levels: sor is the level
> of the previous run, and eor is the level of the next run.  Now:
> 
>   ``X10'. With each maximal sequence of adjacent BNs, set its level
>     to the higher of sor and eor, name this level x, then if x is
>     even, change the bidi type of the last character in the sequence
>     to ltr, and otherwise, change it to rtl.''

The problem here is that formatting codes in most cases (with the
exception of RLM and LRM) start or end the run.  Since X10 is applied
to a single level run only, what does ``maximal sequence of adjacent
BNs'' mean in practice?  For example, if we have a buffer like this:

   abcd{LRE}{RLE}{RLO}{LRE}{LRO}xyz{PDF}{PDF}{PDF}{PDF}{PDF}

I don't really have any ``adjacent BNs'' here, since each BN in this
example is in another level run, right?

So, with the exception of LRM/RLM, when would we see a ``maximal
sequence of BNs''?
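
The question can be checked mechanically.  The following Python sketch
is an illustration under stated assumptions, not the Emacs
implementation: base level 0, no depth limit, each code assigned the
level in force _after_ it is processed, and hypothetical helper names
throughout.  It splits the buffer into level runs and measures the
longest stretch of adjacent formatting codes inside any one run:

```python
import re
from itertools import groupby

CODES = {"{LRE}", "{RLE}", "{LRO}", "{RLO}", "{PDF}"}

def tokenize(text):
    # A "{RLE}"-style marker is one token; anything else is one character.
    return re.findall(r"\{[A-Z]+\}|.", text)

def assign_levels(tokens, base=0):
    # Assumption: codes get the level in force AFTER they are processed.
    stack, level, out = [], base, []
    for tok in tokens:
        if tok in ("{RLE}", "{RLO}"):
            stack.append(level)
            level = (level + 1) | 1    # next odd level above the current one
        elif tok in ("{LRE}", "{LRO}"):
            stack.append(level)
            level = (level + 2) & ~1   # next even level above the current one
        elif tok == "{PDF}" and stack:
            level = stack.pop()        # an unmatched PDF changes nothing
        out.append((tok, level))
    return out

def level_runs(items):
    # A level run is a maximal stretch of tokens sharing one level.
    return [list(grp) for _, grp in groupby(items, key=lambda it: it[1])]

def longest_code_sequence(run):
    # Length of the longest stretch of adjacent formatting codes in a run.
    best = current = 0
    for tok, _ in run:
        current = current + 1 if tok in CODES else 0
        best = max(best, current)
    return best

def report(text):
    runs = level_runs(assign_levels(tokenize(text)))
    return len(runs), max(longest_code_sequence(r) for r in runs)

print(report("abcd{LRE}{RLE}{RLO}{LRE}{LRO}xyz{PDF}{PDF}{PDF}{PDF}{PDF}"))
print(report("ab{PDF}{PDF}cd"))
```

Under these assumptions the first buffer splits into 11 level runs and
no run holds more than one formatting code, which agrees with the
reading above.  The second buffer is one case where a longer sequence
does appear: a PDF with no matching embedding leaves the level
unchanged, so two such codes sit side by side in a single run.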

> > > I won't recommend implementing something based on this section.
> > 
> > Actually, what I wrote is based on that section, and it seems to work
> > fairly well.  Most of what they say there is not very important
> > anyway, since the algorithm mostly works on each level run separately,
> > and formatting codes almost always (with the exception of LRM and RLM)
> > change the level, i.e. end the current level run and start another.
> 
> OK, but lots of bugs arise at the run boundaries.  Are you trying to
> be strictly UAX#9 conformant, or are some very small exceptions not
> too important to you?

I'm trying to be compliant, but UTR#9 doesn't have any test suite to
test the code against (and their reference implementation could be
buggy, so it is not a very good tool for verifying other
implementations).  What I need is a test suite that has been
hand-verified against the algorithm definition.  I have not found such
a suite.


