bug-gnu-emacs

bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark


From: Eli Zaretskii
Subject: bug#4047: 23.1.1: hexl-mode doesn't like UTF8 files with a byte-order mark
Date: Sat, 08 Aug 2009 15:20:10 +0300

> From: "Pierre Bogossian" <bogossian@mail.com>
> Date: Fri, 7 Aug 2009 09:50:54 +0100
> 
> >[...] does it help to say
> >"C-x RET f utf-8-with-signature RET" before entering hexl-mode?
> 
> No, but forcing the coding system of any buffer to utf-8-with-signature
> using this command and then entering hexl-mode is enough to trigger
> the error. I can even reproduce it with a blank scratch buffer.
> 
> >> Unfortunately I can't test a unix version at the moment.
> >
> >Which means your OS is what?
> 
> Windows XP SP3.

The problem happens on GNU/Linux as well.

I think I've identified why the problem happens, but I need help in
finding the right solution.  Handa-san, can you please comment on
what's below?  Of course, others are welcome to comment as well.

The cause of the problem is this: hexlify-buffer must bind
coding-system-for-write to the buffer's encoding, to force
call-process-region to use that encoding when writing the text to
the temporary file.  OTOH, it must avoid encoding the arguments
passed to the `hexl' program with the buffer's encoding, because that
could be inappropriate for encoding command lines on the underlying
system.  However, call-process-region normally uses
coding-system-for-write, if it is non-nil, to encode the arguments as
well.  To resolve this contradiction, hexlify-buffer encodes the
arguments manually (with locale-coding-system), assuming that, being
unibyte strings after that encoding, they will not be encoded again by
call-process-region.

But call-process (called by call-process-region) does this:

    /* If arguments are supplied, we may have to encode them.  */
    if (nargs >= 5)
      {
        int must_encode = 0;
        Lisp_Object coding_attrs;

        for (i = 4; i < nargs; i++)
          CHECK_STRING (args[i]);

        for (i = 4; i < nargs; i++)
          if (STRING_MULTIBYTE (args[i]))
            must_encode = 1;

        if (!NILP (Vcoding_system_for_write))
          val = Vcoding_system_for_write;
        else if (! must_encode)
          val = Qnil;
        else
          {
            args2 = (Lisp_Object *) alloca ((nargs + 1) * sizeof *args2);
            args2[0] = Qcall_process;
            for (i = 0; i < nargs; i++) args2[i + 1] = args[i];
            coding_systems = Ffind_operation_coding_system (nargs + 1, args2);

First, if coding-system-for-write is non-nil, it is used, even if none
of the argument strings is a multibyte string.  (This particular bug
can easily be solved by testing must_encode before testing whether
coding-system-for-write is non-nil, but I'm not sure this is the
right solution, because other arguments could be multibyte strings,
which would still cause us to use coding-system-for-write for _all_
arguments.)
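
To make the ordering issue concrete, here is a minimal standalone
sketch in C.  It is a model only, not Emacs code: struct model_arg,
enum coding_choice and the two choose_* functions are hypothetical
names invented for illustration, and the "lookup" case merely stands
in for the Ffind_operation_coding_system branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string argument; only the
   multibyte flag matters for this model.  */
struct model_arg { const char *text; bool multibyte; };

/* Which coding system the model would pick for the arguments.  */
enum coding_choice { CHOICE_NONE, CHOICE_FOR_WRITE, CHOICE_LOOKUP };

/* Return true if at least one argument is a multibyte string.  */
bool any_multibyte (const struct model_arg *args, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (args[i].multibyte)
      return true;
  return false;
}

/* Order of tests as in the quoted fragment: a non-nil
   coding-system-for-write wins even when no argument is multibyte.  */
enum coding_choice choose_current (const struct model_arg *args, size_t n,
                                   bool for_write_set)
{
  bool must_encode = any_multibyte (args, n);
  if (for_write_set)
    return CHOICE_FOR_WRITE;
  if (!must_encode)
    return CHOICE_NONE;
  return CHOICE_LOOKUP;
}

/* Reordered variant: test must_encode first.  */
enum coding_choice choose_reordered (const struct model_arg *args, size_t n,
                                     bool for_write_set)
{
  if (!any_multibyte (args, n))
    return CHOICE_NONE;
  if (for_write_set)
    return CHOICE_FOR_WRITE;
  return CHOICE_LOOKUP;
}
```

With only unibyte arguments and coding-system-for-write bound, the
model's current order selects coding-system-for-write while the
reordered one selects nothing.  As soon as one argument is multibyte,
both orders select coding-system-for-write for every argument, which
is the remaining concern described above.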

And second, this fragment, which actually encodes the arguments,
further down in call-process:

  if (nargs > 4)
    {
      register int i;
      struct gcpro gcpro1, gcpro2, gcpro3, gcpro4, gcpro5;

      GCPRO5 (infile, buffer, current_dir, path, error_file);
      argument_coding.dst_multibyte = 0;
      for (i = 4; i < nargs; i++)
        {
          argument_coding.src_multibyte = STRING_MULTIBYTE (args[i]);
          if (CODING_REQUIRE_ENCODING (&argument_coding))
            /* We must encode this argument.  */
            args[i] = encode_coding_string (&argument_coding, args[i], 1);
        }

encodes the argument even though argument_coding.src_multibyte is set
to zero, i.e. even though the argument is a unibyte string.  Is
encode_coding_string supposed to encode unibyte strings?
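
For discussion, here is a small standalone model in C of one possible
behavior: encode an argument only if it is multibyte, and pass
unibyte arguments through untouched.  This is a sketch under
assumptions, not a proposed patch; struct cmd_arg and
encode_multibyte_only are hypothetical names, and the `encoded' flag
merely stands in for the call to encode_coding_string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a command-line argument.  `encoded'
   records whether the model would have called the encoder on it.  */
struct cmd_arg { const char *text; bool multibyte; bool encoded; };

/* Mark for encoding only those arguments that are multibyte, and
   only when the coding system requires encoding at all.  Returns the
   number of arguments that were marked.  */
size_t encode_multibyte_only (struct cmd_arg *args, size_t n,
                              bool coding_requires_encoding)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (coding_requires_encoding && args[i].multibyte)
        {
          /* Stand-in for encoding: the result is a unibyte string.  */
          args[i].encoded = true;
          args[i].multibyte = false;
          count++;
        }
      /* Unibyte arguments are assumed to be already encoded by the
         caller (as hexlify-buffer does) and are left alone.  */
    }
  return count;
}
```

Under this model, the arguments that hexlify-buffer has already
encoded by hand would reach the `hexl' program unchanged, whatever
the value of coding-system-for-write.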
