Re: address@hidden: Coding problem with Euro sign]
From: Ralf Angeli
Subject: Re: address@hidden: Coding problem with Euro sign]
Date: Fri, 16 Dec 2005 12:55:47 +0100
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/22.0.50 (gnu/linux)
* Kevin Rodgers (2005-12-15) writes:
> Ralf Angeli wrote:
> > * Kevin Rodgers (2005-12-15) writes:
> >
> >>You could try something like this:
> >>
> >>(setq auto-coding-regexp-alist
> >> (cons '("[\040-\177][\200-\237]" . cp1252)
> >> auto-coding-regexp-alist))
> >
> > This doesn't seem to work here. I still see the byte codes of the
> > 8-bit characters when opening the file after evaluating the above
> > form.
[...]
> I assume those display problems are because I haven't configured an
> Emacs fontset for the cp850 coding system. But the
> auto-coding-regexp-alist entry worked as intended, and you're on
> Windows so your fontset should be properly configured for that.
Currently I am on GNU/Linux. Anyway, with the development version of
Emacs I did not have the problems with cp1252 you described when
loading the file. But when trying to write the file I got this
warning:
,----
| Warning (:warning): Invalid coding system `cp1252' is specified
| for the current buffer/file by the variable `auto-coding-regexp-alist'.
| It is highly recommended to fix it before writing to a file.
`----
I didn't do `M-x codepage-setup RET' before trying all of this.
Interestingly loading and writing the file worked fine if I used
windows-1252 instead of cp1252.
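For reference, a minimal sketch of the variant that loaded and wrote
without the warning: the quoted form with only the coding system name
changed. Whether `cp1252' becomes valid after `M-x codepage-setup RET'
is not covered here.

```elisp
;; Same entry as in the quoted form, but naming `windows-1252'
;; instead of `cp1252' as the coding system.
(setq auto-coding-regexp-alist
      (cons '("[\040-\177][\200-\237]" . windows-1252)
            auto-coding-regexp-alist))
```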
> One other detail: that entry only sets the coding system if the euro
> is immediately preceded by an ASCII character. Is that the case in
> your file?
No. On emacs-pretest-bug I already explained that the original (test)
file doesn't include the A circumflex, which means the euro is preceded
by a newline. (Maybe it would be better to continue the discussion in
the thread on emacs-pretest-bug in order to avoid repetition?)
If I insert a space or a random ASCII character before the Euro sign
and evaluate the form above (using windows-1252 as the coding system),
the encoding is identified correctly and both the u umlaut and the
Euro sign are displayed correctly.
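A possible, untested variant of the entry would also accept a byte in
the range 0x80-0x9F at the beginning of a line, which is the situation
in the test file. The regexp below is an assumption and not something
proposed in this thread; it relies on `^' matching after a newline and
at the start of the text being examined.

```elisp
;; Sketch only: match a byte in 0x80-0x9F either at the beginning of a
;; line or directly after an ASCII character in the range 040-177 (octal).
;; In valid UTF-8 such a byte only follows a byte >= 0x80, so UTF-8
;; text should not match.
(setq auto-coding-regexp-alist
      (cons '("\\(?:^\\|[\040-\177]\\)[\200-\237]" . windows-1252)
            auto-coding-regexp-alist))
```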
> What does `C-h C RET' say after visiting the file?
In case the encoding is not identified correctly:
,----
| Coding system for saving this buffer:
| t -- raw-text-dos
|
| Default coding system (for new files):
| 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| Coding system for keyboard input:
| 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| Coding system for terminal output:
| 1 -- iso-8859-1 (alias of iso-latin-1)
|
| Defaults for subprocess I/O:
| decoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| encoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
|
| Priority order for recognizing coding systems when reading files:
| 1. iso-latin-1 (alias: iso-8859-1 latin-1)
| 2. mule-utf-8 (alias: utf-8)
| 3. mule-utf-16be-with-signature (alias: utf-16be-with-signature mule-utf-16-be utf-16-be)
| 4. mule-utf-16le-with-signature (alias: utf-16le-with-signature mule-utf-16-le utf-16-le)
| 5. iso-2022-jp (alias: junet)
| 6. iso-2022-7bit
| 7. iso-2022-7bit-lock (alias: iso-2022-int-1)
| 8. iso-2022-8bit-ss2
| 9. emacs-mule
| 10. raw-text
| 11. japanese-shift-jis (alias: shift_jis sjis cp932)
| 12. chinese-big5 (alias: big5 cn-big5 cp950)
| 13. no-conversion
|
| Other coding systems cannot be distinguished automatically
| from these, and therefore cannot be recognized automatically
| with the present coding system priorities.
|
| The following are decoded correctly but recognized as iso-2022-7bit-lock:
| iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
| iso-2022-jp-2 iso-2022-kr
| [...]
`----
In case the coding is identified correctly:
,----
| Coding system for saving this buffer:
| * -- windows-1252-dos
|
| Default coding system (for new files):
| 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| Coding system for keyboard input:
| 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| Coding system for terminal output:
| 1 -- iso-8859-1 (alias of iso-latin-1)
|
| Defaults for subprocess I/O:
| decoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
|
| encoding: 1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
| [...]
`----
> I assume you're running with multibyte characters enabled.
Yes. The relevant setting should be included in the original bug
report.
> > And a customization is actually not what I am interested in; I'd like
> > Emacs to figure this out by itself, out of the box.
>
> How is Emacs supposed to infer the coding system from the contents of
> that file? If you can come up with a suitable customization, perhaps
> it will be incorporated into Emacs as the default behavior.
If I knew how to do that I would have sent a patch already. My naive
approach would be to look for bytes which are characteristic of
Windows codepages in order to identify the encoding as a Windows
codepage. Maybe looking at line endings can help to make the right
decision. Once the encoding has been identified as a Windows codepage,
the exact codepage could be chosen based on the language environment.
But this suggestion is just guesswork on my side because I know close
to nothing about the processes involved in identifying an encoding.
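To make the idea concrete, here is a rough sketch of such a heuristic
as a function for `auto-coding-functions'. It is not how Emacs
currently detects encodings. The names `my-windows-codepage-alist' and
`my-guess-windows-codepage' and the mapping from language environments
to codepages are made up for illustration, and the byte test reuses
the idea of the regexp quoted earlier.

```elisp
;; Hypothetical mapping from language environment names to Windows
;; codepages; the entries are illustrative, not complete.
(defvar my-windows-codepage-alist
  '(("Latin-1" . windows-1252)
    ("German" . windows-1252)
    ("Latin-2" . windows-1250)
    ("Cyrillic-ISO" . windows-1251))
  "Alist of (LANGUAGE-ENVIRONMENT . CODING-SYSTEM) used for guessing.")

(defun my-guess-windows-codepage (size)
  "Guess a Windows codepage for the SIZE characters following point.
Return a coding system symbol, or nil to leave the decision to Emacs."
  (let ((end (min (point-max) (+ (point) size))))
    (when (and
           ;; CRLF line endings hint at a file written on Windows.
           (save-excursion (search-forward "\r\n" end t))
           ;; Bytes 0x80-0x9F are C1 control codes in ISO 8859-x and
           ;; cannot directly follow an ASCII character or a newline
           ;; in valid UTF-8.
           (save-excursion
             (re-search-forward "\\(?:^\\|[\040-\177]\\)[\200-\237]"
                                end t)))
      ;; Choose the codepage from the language environment, but only
      ;; if this Emacs actually knows that coding system.
      (let ((cs (cdr (assoc current-language-environment
                            my-windows-codepage-alist))))
        (and cs (coding-system-p cs) cs)))))

(add-to-list 'auto-coding-functions 'my-guess-windows-codepage)
```

The function returns nil whenever either hint is missing, so files
with Unix line endings or without such bytes would be handled as
before.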
> Can Notepad display files in anything besides CP850/Windows-1252 and
> probably UTF-8 w/BOM? E.g. can it distinguish ISO 8859-1 from ISO
> 8859-2 from ISO 8859-15?
As far as I understood Reiner on emacs-pretest-bug this is impossible
anyway.
> Yes, Windows applications simply assume you're using a proprietary
> Microsoft character set, and GNU/Linux apps prioritize support for
> standard character encodings. Maybe all you need is
> (prefer-coding-system 'cp850)
Wouldn't that be a bit too restricted as a general solution for Emacs?
--
Ralf