bug-gnu-emacs
bug#39799: 28.0.50; Most emoji sequences don’t render correctly


From: Robert Pluim
Subject: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Fri, 28 Feb 2020 17:39:56 +0100

>>>>> On Fri, 28 Feb 2020 18:19:10 +0200, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: Glenn Morris <rgm@gnu.org>,  mfabian@redhat.com,  39799@debbugs.gnu.org
    >> Date: Fri, 28 Feb 2020 15:14:01 +0100
    >> 
    >> >> It matches forward off the first char, so the
    >> >> composition-function-table entries all have '0' as the number of chars
    >> >> to match. Would it be better to match backwards?
    >> 
    Eli> I don't think matching backwards is better in general.  Did you have a
    Eli> reason for thinking it was?
    >> 
    >> I thought I saw a comment in composite.c that says matching is done
    >> backward, but I see that itʼs done forwards as well.

    Eli> Btw, it sometimes _can_ be beneficial to use backward matching: if it
    Eli> makes the size of composition-function-table smaller.  Since
    Eli> composition-function-table is a char-table, and char-tables allocate
    Eli> sub-tables only if needed, you can conserve memory (and thus make
    Eli> Emacs's memory footprint smaller) and speed things up (because 'aref'
    Eli> will look up values in a char-table faster) by setting a smaller number of
    Eli> slots.  For example, if the 2nd character of an Emoji sequence was
    Eli> always one specific character, or a small set of characters, you could
    Eli> set only the slots of those few characters, which would make the
    Eli> char-table smaller.  OTOH, if that would yield many different
    Eli> composition rules in the list of rules for those few characters,
    Eli> redisplay could become slower, because it generally examines the rules
    Eli> one by one until it finds an appropriate one.  So the winning setup of
    Eli> composition-function-table is the one that sets the smallest number of
    Eli> slots, but still keeps the lists of rules for those slots short.  And
    Eli> note that setting the same rule for a range of codepoints generally
    Eli> uses up only one slot in the char-table, so rules that can be
    Eli> generalized to cover many characters are preferable.

I donʼt think that applies in this case. The sequences are all easily
categorised based on the first char in the sequence. It could be done
based on the 2nd, or 3rd or whatever, but I donʼt think that reduces
the number of entries. Plus thereʼs always one rule per character,
since multiple patterns starting with the same character are combined
using regexp-opt.
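
To make the "one rule per character" point concrete, here is a minimal
sketch of the grouping. It is not the actual patch: the sample
sequences, the my-emoji-* names and the choice of
compose-gstring-for-graphic as the shaping function are all assumptions
made for illustration.

```elisp
;; Hypothetical sample data: three ZWJ sequences, two of which share
;; the same first character.
(defvar my-emoji-sequences
  (list (string #x1F468 #x200D #x1F469)
        (string #x1F468 #x200D #x1F467)
        (string #x1F469 #x200D #x1F467))
  "Sample emoji sequences (illustrative only).")

(defun my-emoji-group-by-first-char (sequences)
  "Return an alist (CHAR . LIST-OF-SEQUENCES) keyed on the first char."
  (let (alist)
    (dolist (seq sequences)
      (push seq (alist-get (aref seq 0) alist)))
    alist))

(defun my-emoji-rules (sequences)
  "Return an alist (CHAR . RULE) with one composition rule per first char."
  (mapcar (lambda (entry)
            (cons (car entry)
                  ;; A rule is [PATTERN PREV-CHARS FUNC].  PREV-CHARS is 0
                  ;; because matching starts at the first character.
                  ;; The shaping function here is an assumption.
                  (vector (regexp-opt (cdr entry))
                          0
                          'compose-gstring-for-graphic)))
          (my-emoji-group-by-first-char sequences)))
```

However many sequences start with a given character, they all collapse
into a single regexp, so the rule list for that slot has length one.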

One thing though: the code currently calls set-char-table-range with a
new value, which replaces whatever was there. Is there a chance that an
entry already exists in composition-function-table for a particular
character? If so, Iʼd have to change it to add the new rule after the
existing one (or before?).
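
If existing entries do need to be preserved, the change might look
roughly like the sketch below. The function name and its AFTER and TABLE
arguments are hypothetical. It assumes the existing entry is either nil
or a list of rule vectors; an entry that is a bare function would need
separate handling.

```elisp
(defun my-add-composition-rule (char rule &optional after table)
  "Add RULE for CHAR to TABLE, keeping any rules already there.
TABLE defaults to `composition-function-table'.  With AFTER non-nil,
put RULE after the existing rules; otherwise put it first."
  (let* ((table (or table composition-function-table))
         ;; Existing entry for CHAR: nil, or (by assumption) a list of rules.
         (old (char-table-range table char)))
    (set-char-table-range
     table char
     (if after
         (append old (list rule))
       (cons rule old)))))
```

Since the rules for a slot are examined one by one, whether the new rule
goes before or after the existing ones decides which gets tried first.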

Robert
