RE: please use ?\u2014 instead of the unicode character in buff-menu.el

From: Drew Adams
Subject: RE: please use ?\u2014 instead of the unicode character in buff-menu.el
Date: Sun, 18 Feb 2007 14:15:14 -0800

> Please tell the details about how you downloaded and saved the file to
> disk.  It is impossible to know what went wrong without these details.

What went wrong is not the point. However it is that the characters got
messed up (Web site, browser, user error, cosmic rays, CIA, Al Qaeda), there
is no reason not to use the escape sequence, for portability and better code
legibility.

> I did it twice with two different methods:
>   . Clicked the "download" link and saved the file to disk.
>   . Clicked the "view" link; then, after seeing that the Unicode
>     characters are displayed incorrectly, clicked View->Encoding from
>     the menu bar, selected "Unicode UTF-8", which fixed the display;
>     then File->Save As, selected "Text files" and made sure the
>     encoding is set to UTF-8; clicked OK.
> Both methods gave me a valid UTF-8 encoded file that displayed
> correctly in Emacs 22.

I used the "view" link, clicking mouse-1 on it, because I wanted to look at
the code before saving it. I did not scan the entire file to notice that two
of the characters were displayed incorrectly, so I did not change my browser
encoding - after all, this is code, which displays as plain text.

And how would one know that those two characters were in fact displayed
incorrectly? How would you know what they were supposed to be? Did you read
all of the code comments, and analyze the code, to come to the conclusion
that the browser encoding for those two characters was incorrect? Or did you
in fact know just what to look for, because you had read my bug report?
That's cheating ;-).

Or did you notice the -*- coding: utf-8 -*- in the header, and realize that
your current browser encoding didn't correspond to that? You said, however,
that you noticed that the (two) Unicode characters were displayed
incorrectly - a much harder thing to spot.

Some other methods a user might use to try to retrieve the code:

- Right-click the "download" link, and use "Save As" (as I assume you meant
by "clicked the 'download' link"). Here, you can Save as type All Files.
This works.

- Right-click the "view" link, use "Save Target As", and Save as type All
Files, changing the suffix to "el". For some reason, this does nothing, for
me - no file is saved.

- Click mouse-1 on the "download" link, and use "Save As". This does default
to the Unicode encoding, but, at least in my IE6 browser, there is no filter
option for All Files at that point, and you must choose Save as type Text
File (*.txt) (the other options involve saving as HTML pages). When I open
the resulting file in Emacs 22, C-h C shows raw-text-unix, not Unicode, and
the buffer is filled with null bytes (^@) - every other byte. C-x RET r
utf-8 does not change what I see. The -*- coding never takes effect because
each of its characters is preceded by a null character.
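
The null-byte symptom in the last case is consistent with the file having been
written as UTF-16 rather than UTF-8. That is an assumption on my part, not
something verified against the actual file. A minimal Python sketch (Python
only because it makes the bytes easy to inspect; the cookie string is just an
example header line) shows why the coding cookie would then go unrecognized:

```python
# Assumption: the browser wrote the file as UTF-16 little-endian,
# which Windows "Save As" dialogs commonly labelled "Unicode".
cookie = ";; -*- coding: utf-8 -*-"

utf8_bytes = cookie.encode("utf-8")
utf16_bytes = cookie.encode("utf-16-le")

# In UTF-16-LE every ASCII character is paired with a null byte,
# so every second byte of the file is 0x00 (shown as ^@ in Emacs).
print(utf16_bytes[:8])

# The cookie is a contiguous ASCII byte run only in the UTF-8 file.
print(b"coding: utf-8" in utf8_bytes)
print(b"coding: utf-8" in utf16_bytes)
```

If that is what happened, rereading the file as utf-8 cannot help, since the
bytes on disk are not UTF-8 in the first place.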

There are multiple ways a user might try to retrieve this code from that
site, and there will be other sites that also offer the code, perhaps in
other ways.

As I mentioned, I first ran into this problem on the Emacs Wiki (with the
same em-dash character, in a library that is derived from buff-menu.el).
Simply uploading or downloading the code on the Wiki changes the characters
(in the same way, BTW). Here, the downloading user has no choice. If the
normal page-edit means of uploading is used, then the characters are messed
up in the file on the wiki, so regardless of how you download it, you get
garbage. AFAIK, this has nothing to do with the browser. You might not care
about the Emacs Wiki, but you might care that such a problem exists there,
because other sites might present similar problems.

The real point is that there is no good reason *not* to use the escape
sequence in this case, and there are good reasons to use it: easier file
exchange using email and Internet, and better code legibility.
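
The file-exchange half of that argument can be illustrated outside Emacs. The
following Python sketch is only an illustration: the `mangle` helper is
hypothetical, and reading UTF-8 bytes as Windows-1252 is one assumed failure
mode, not necessarily what the wiki or any particular browser does. It
compares source text that spells the character as an escape sequence with
source text that contains the character itself:

```python
# Source text written with the escape sequence: six plain ASCII characters.
escaped_source = r"?\u2014"
# Source text containing the literal character U+2014 after the "?".
literal_source = "?\u2014"


def mangle(text):
    """Simulate one assumed failure: UTF-8 bytes re-read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# The ASCII-only form survives the round trip unchanged.
print(mangle(escaped_source) == escaped_source)   # True
# The literal character turns into three unrelated characters.
print(mangle(literal_source) == literal_source)   # False
print(len(mangle(literal_source)))                # 4
```

Any transfer step that preserves ASCII leaves the escaped form intact, whereas
the literal form depends on every step agreeing on the encoding.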

The only reason given so far not to use the escape sequence was code
legibility, and I pointed out that the code is in fact less legible without
the escape sequence, because the em-dash and hyphen characters are
indistinguishable in a fixed font. They both appear as ?-, making it
impossible to tell which is which (without a comment).

This seems a no-brainer, to me. Further resistance to using the escape
sequence in this case seems to me to reflect only unwillingness to see the

If, on the other hand, your concern was the Web site and how to ensure that
users download Unicode code correctly, then I share that concern. You might
want to include explicit instructions for how to download, and explicit
mention that "view" of code that includes Unicode characters might require
that you change your browser encoding to Unicode. Or something like that.
