bug#10701: 24.0.93; Crash while decoding input with DOS EOLs

From: Eli Zaretskii
Subject: bug#10701: 24.0.93; Crash while decoding input with DOS EOLs
Date: Thu, 02 Feb 2012 20:15:39 +0200

I see this both with today's trunk and in the 24.0.93 pretest, both on
GNU/Linux and on MS-Windows.

To reproduce:

 emacs -Q
 C-x b foo RET
 M-: (set-buffer-multibyte nil) RET
 C-x RET c undecided-dos RET C-u M-! gunzip -c emacs-24.0.93.tar.gz RET

(It must be the tarball of Emacs 24.0.93, because the bug is
data-dependent.  It doesn't have to be .tar.gz, as long as you use the
correct decompressor: bunzip2 for .tar.bz2, xz for .tar.xz, etc.  You
can even do this with an uncompressed tarball and cat.  The important
part is that Emacs gets the byte stream of that tarball, and it gets
it from a subprocess.)

This crashes somewhere in the middle of reading the output from the
subprocess.  The immediate reason for the crash can be seen from this
fragment of the backtrace:

  #0  w32_abort () at w32fns.c:7196
  #1  0x012eea83 in temp_set_point_both (buffer=0x10137600, charpos=45817604,
      bytepos=45817605) at intervals.c:1870
  #2  0x01135816 in Fcall_process (nargs=6, args=0x82f644) at callproc.c:846

As you can see, temp_set_point_both is passed a character position and
a byte position that differ, which cannot happen in a unibyte buffer
(as shown above, the recipe makes the buffer `foo' a unibyte one).
An assertion inside temp_set_point_both aborts because of this.

The call to temp_set_point_both (through the TEMP_SET_PT_BOTH macro)
is in call-process (Fcall_process in callproc.c):

                  TEMP_SET_PT_BOTH (PT + process_coding.produced_char,
                                    PT_BYTE + process_coding.produced);
                  carryover = process_coding.carryover_bytes;
                  if (carryover > 0)
                    memcpy (buf, process_coding.carryover,

The crash happens at the point in the input byte stream where the
last byte of the chunk read from the pipe is \r.  Since the stream is
decoded with the raw-text-dos coding system, this last \r is held back
as "carryover", in case the next chunk begins with \n.  However,
process_coding.produced does not account for this single unprocessed
byte, so its value ends up one larger than it should be.

As far as I could see, the problematic code that sets
process_coding.produced to an incorrect value is in decode_coding
(coding.c), around line 7176:

          /* Record unprocessed bytes in coding->carryover.  We are
             sure that the number of data is less than the size of
             coding->carryover.  */
          unsigned char *p = coding->carryover;

          if (nbytes > sizeof coding->carryover)
            nbytes = sizeof coding->carryover;
          coding->carryover_bytes = nbytes;
          while (nbytes-- > 0)
            *p++ = *src++;
      coding->consumed = coding->src_bytes; <<<<<<<<<<<<<<<<<<<

This last assignment then causes produce_chars to set
coding->produced to an incorrect value:

      /* Source characters are at coding->source.  */
      const unsigned char *src = coding->source;
      const unsigned char *src_end = src + coding->consumed; <<<<<<<<<<<<
          produced_chars = coding->consumed_char;
          while (src < src_end)
            *dst++ = *src++;

  produced = dst - (coding->destination + coding->produced);  <<<<<<<<<<<
  if (BUFFERP (coding->dst_object) && produced_chars > 0)
    insert_from_gap (produced_chars, produced);
  coding->produced += produced; <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  coding->produced_char += produced_chars;

I don't understand the logic of "carryover" in decode_coding well
enough to decide how to fix it.

In GNU Emacs (i386-mingw-nt5.1.2600)
 of 2012-02-02 on HOME-C4E4A596F7
Windowing system distributor `Microsoft Corp.', version 5.1.2600
Configured using:
 `configure --with-gcc (3.4) --no-opt'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: ENU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1255
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o r t - e m <tab> <return>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
None found.

(shadow sort gnus-util mail-extr message format-spec rfc822 mml easymenu
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mailabbrev mail-utils gmm-utils mailheader
emacsbug time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 disp-table ls-lisp w32-win w32-vars tool-bar dnd fontset image
fringe lisp-mode register page menu-bar rfn-eshadow timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese hebrew greek romanian slovak czech european ethiopic
indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple
abbrev minibuffer loaddefs button faces cus-face files text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process multi-tty emacs)
