bug#16457: 24.3.50; crash rendering Arabic Uthmani script
From: Eli Zaretskii
Subject: bug#16457: 24.3.50; crash rendering Arabic Uthmani script
Date: Thu, 16 Jan 2014 19:33:22 +0200
> Date: Thu, 16 Jan 2014 12:01:04 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: 16457@debbugs.gnu.org
>
> I'm not familiar with composition sequences in detail
The composition code is under-documented. I describe what I know
about it below.
> For the uthmani-test.txt, the following code in set_iterator_to_next:
>
>   /* Composition created while scanning forward.  */
>   /* Update IT's char/byte positions to point to the first
>      character of the next grapheme cluster, or to the
>      character visually after the current composition.  */
>   for (i = 0; i < it->cmp_it.nchars; i++)
>     bidi_move_to_visually_next (&it->bidi_it);
>   IT_BYTEPOS (*it) = it->bidi_it.bytepos;
>   IT_CHARPOS (*it) = it->bidi_it.charpos;
>
> advances IT from charpos:bytepos 11:21 to 13:25. But the following fragment
> from scan_for_column:
>
>   /* Check composition sequence.  */
>   if (cmp_it.id >= 0
>       || (scan == cmp_it.stop_pos
>           && composition_reseat_it (&cmp_it, scan, scan_byte, end,
>                                     w, NULL, Qnil)))
>     composition_update_it (&cmp_it, scan, scan_byte, Qnil);
>   if (cmp_it.id >= 0)
>     {
>       scan += cmp_it.nchars;
>       scan_byte += cmp_it.nbytes;
>
> advances SCAN:SCAN_BYTE from 11:21 to 13:24. So the byte position becomes
> invalid, and FETCH_CHAR_ADVANCE decodes an invalid byte sequence into an
> invalid character C. Finally, CHAR_TABLE_REF (Vcomposition_function_table, C)
> goes out of bounds.
In effect, you are saying that cmp_it.nbytes above is incorrect.
This is really strange. First, I cannot reproduce the crash on
MS-Windows, so the problem might be related to the shaping engine
being used (I presume yours is libotf and libm17n). (I tried on both
Windows XP and on Windows 7, which have very different versions of
Uniscribe, and they both work fine.)
Moreover, set_iterator_to_next uses the same code from composite.c
that scan_for_column does, so it is unclear to me how the former
works, while the latter doesn't.
Specifically, cmp_it.nbytes is computed in composition_update_it as
the sum of byte-widths of all the characters being composed:
    cmp_it->width = 0;
    for (i = cmp_it->nchars - 1; i >= 0; i--)
      {
        c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
        cmp_it->nbytes += CHAR_BYTES (c);
        cmp_it->width += CHAR_WIDTH (c);
      }
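To make the arithmetic concrete, here is a minimal stand-alone sketch of
the same summation. It is not the Emacs code: sketch_char_bytes is a
hypothetical stand-in for CHAR_BYTES that covers only standard UTF-8 up to
U+10FFFF, and sketch_cluster_nbytes is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical stand-in for CHAR_BYTES: the number of bytes needed to
   encode code point C in standard UTF-8.  Only code points up to
   U+10FFFF are handled here.  */
static int
sketch_char_bytes (unsigned int c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Sum the byte-widths of the N characters of one composed cluster.  */
static int
sketch_cluster_nbytes (const unsigned int *chars, size_t n)
{
  int nbytes = 0;

  for (size_t i = 0; i < n; i++)
    nbytes += sketch_char_bytes (chars[i]);
  return nbytes;
}
```

Every character in the Arabic block (U+0600..U+06FF) takes 2 bytes, so a
cluster of two such characters should give nbytes = 4, which moves a byte
position of 21 to 25. An advance to 24 would need one of the two
characters to be counted as a single byte.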
And the characters in the LGSTRING object are simply copied from the
buffer in fill_gstring_header, when LGSTRING is created:
    for (i = 0; i < len; i++)
      {
        int c;
        if (NILP (string))
          FETCH_CHAR_ADVANCE_NO_CHECK (c, from, from_byte);
        else
          FETCH_STRING_CHAR_ADVANCE_NO_CHECK (c, string, from, from_byte);
        ASET (header, i + 1, make_number (c));
      }
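The fetch-and-advance step in this loop can be pictured with the
following sketch. It is only an illustration under assumptions: the
function name is hypothetical, it handles standard UTF-8 only, it expects
a NUL-terminated buffer, and unlike the NO_CHECK macros it validates the
bytes so that a bad starting position is reported instead of decoded.

```c
#include <stddef.h>

/* Decode one UTF-8 character from the NUL-terminated buffer BUF,
   starting at byte offset *BYTEPOS.  On success, advance *CHARPOS by
   one and *BYTEPOS by the encoded length, and return the code point.
   Return -1, leaving both positions unchanged, if *BYTEPOS does not
   point at the first byte of a well-formed sequence.  */
static int
sketch_fetch_char_advance (const unsigned char *buf,
                           size_t *charpos, size_t *bytepos)
{
  const unsigned char *p = buf + *bytepos;
  int c, len;

  if (p[0] < 0x80)
    { c = p[0]; len = 1; }
  else if ((p[0] & 0xE0) == 0xC0)
    { c = p[0] & 0x1F; len = 2; }
  else if ((p[0] & 0xF0) == 0xE0)
    { c = p[0] & 0x0F; len = 3; }
  else if ((p[0] & 0xF8) == 0xF0)
    { c = p[0] & 0x07; len = 4; }
  else
    return -1;                  /* Continuation byte or invalid lead.  */

  for (int i = 1; i < len; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return -1;              /* Truncated or malformed sequence.  */
      c = (c << 6) | (p[i] & 0x3F);
    }

  *charpos += 1;
  *bytepos += len;
  return c;
}
```

Starting this at a byte offset that falls inside a multibyte sequence
makes it return -1; an unchecked fetch in the same situation would
instead produce a garbage character value.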
Could you please trace through these fragments and see what goes wrong
there? Specifically, what characters (which Unicode codepoints) are
being composed, and what are the contents of the cmp_it structure in
scan_for_column when it advances from 11:21 to 13:24. (Granted, here
I see it advance from 11:21 to 13:25, as expected.)
Also, what does "C-u C-x =" report when you put the cursor in column
10?
Some more details:
The LGSTRING object is created when Emacs encounters for the first
time a group of characters that should be composed together. The
structure of LGSTRING is described in the comments to
composition-get-gstring. Emacs recognizes the character compositions
in composition_reseat_it, which calls autocmp_chars, which calls
composition-get-gstring, which collects the characters to be composed
by calling fill_gstring_header, as shown in the fragment above.
The LGSTRING object is then cached, such that later references to it
use the cached data, instead of computing it from scratch. The cmp_it
structure holds an ID of the LGSTRING which can be used to look it up
in the cache. When composition_update_it is called, it simply uses the
information already stored in LGSTRING to advance past the composed
characters.
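The "compute once, then look up by ID" pattern described here can be
sketched as follows. This is a toy model, not how composite.c stores its
data: the fixed-size array, the struct, and all names are hypothetical,
and only the two counts relevant to this bug are kept.

```c
#include <stddef.h>

#define SKETCH_CACHE_MAX 64

/* The per-cluster data kept in this toy cache.  */
struct sketch_gstring
{
  int nchars;                   /* Characters in the cluster.  */
  int nbytes;                   /* Bytes those characters occupy.  */
};

static struct sketch_gstring sketch_cache[SKETCH_CACHE_MAX];
static int sketch_cache_used;

/* Record a newly seen cluster and return its ID, or -1 if the cache
   is full.  */
static int
sketch_cache_put (int nchars, int nbytes)
{
  if (sketch_cache_used >= SKETCH_CACHE_MAX)
    return -1;
  sketch_cache[sketch_cache_used].nchars = nchars;
  sketch_cache[sketch_cache_used].nbytes = nbytes;
  return sketch_cache_used++;
}

/* Advance *CHARPOS and *BYTEPOS past the cluster with the given ID,
   using only the cached counts.  Return 0 on success, -1 if ID is
   not in the cache.  */
static int
sketch_advance_past (int id, long *charpos, long *bytepos)
{
  if (id < 0 || id >= sketch_cache_used)
    return -1;
  *charpos += sketch_cache[id].nchars;
  *bytepos += sketch_cache[id].nbytes;
  return 0;
}
```

The point of the sketch is that the advance never re-reads the buffer: if
the cached counts are wrong, every later advance past that cluster is
wrong in the same way.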
So to understand why it crashes for you, we need to find out why the
nbytes value, which is derived from the characters stored by
fill_gstring_header, somehow became incorrect.
Btw, does the problem go away if you disable cache-long-scans?