documentation bug: Mule and MSDOS

From: dirk janssen
Subject: documentation bug: Mule and MSDOS
Date: Tue, 27 Mar 2001 19:03:34 +0200

This bug report will be sent to the Free Software Foundation,
 not to your local site managers!!
Please write in English, because the Emacs maintainers do not have
translators to read other languages for them.

In GNU Emacs 20.6.1 (i386-suse-linux, X toolkit)
 of Sat Mar 11 2000 on Hahn
configured using `configure  --with-gcc --with-pop --with-system-malloc 
--prefix=/usr --exec-prefix=/usr --infodir=/usr/share/info 
--mandir=/usr/share/man --sharedstatedir=/var/state --libexecdir=/usr/lib 
--with-x --with-x-toolkit=lucid --x-includes=/usr/X11R6/include 
--x-libraries=/usr/X11R6/lib i386-suse-linux'

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

The documentation on the topic of reading msdos/windows files with
`european' characters on a unix box is unclear. I am a LISP
programmer, but it took me several hours to find the (very simple)
solution to this problem.

Here is a backtrack of my mental states :-)
1. I assumed I had to convert the buffer *after* it was read in.
2. I could find info on disabling multibyte, but not much on enabling it.
3. The MULE docs do not mention code pages at all; one has to go to the
Emacs on DOS section. That section then lists the `dos-codepage-setup'
command, which is not available to me.
4. The other command, `codepage-setup', does not change the display at
all, not even when I then choose the new encoding in the
problematic buffer. Hence, I have no way to check what I am doing.

Scope of the problem:
Although code pages are a completely broken way to `support'
international characters, they are in common use. Plain text files
generated on Windows use a code page, and not iso-latin.
Emacs should support them better, especially because all the machinery
is there.

1. Make the MULE doc more `hands-on'. Currently, it tells me a whole
lot about various options and possibilities, but too little about how
I put it to use. 
2. In the mule docs, insert a section on `Reading international files
from MS-DOS or Windows (codepages)'.  Suggestion:

 Applications on the MS-DOS and Windows platform commonly write files
 that are not in any ISO encoding, but use a so-called `code page'. Emacs
 has no way to determine the code page from the file, because
 different code pages use the same numbers to represent things. 
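
 As a small illustration of that ambiguity (a sketch in Python rather
 than in Emacs, using only Python's standard codecs; the variable names
 are made up for the example), the same byte decodes to a different
 character under each assumed encoding:

```python
# One byte, three interpretations. Nothing in the byte itself says
# which code page the writer had in mind.
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")  # ISO Latin-1
as_cp850 = raw.decode("cp850")     # DOS code page 850
as_cp437 = raw.decode("cp437")     # DOS code page 437

for name, char in [("latin-1", as_latin1), ("cp850", as_cp850), ("cp437", as_cp437)]:
    print(f"byte 0xE9 read as {name}: {char}")

# Conversely, the same character is stored as different bytes.
e_acute_latin1 = "é".encode("latin-1")
e_acute_cp850 = "é".encode("cp850")
```

 This is why the reader has to be told the code page: a file that looks
 right under one assumption shows wrong accented letters under another.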

 To read these files, tell Emacs which code page was used to
 encode them when opening the file. This is something you need to
 note down when the file is saved. MS-DOS and the Windows console
 commonly use code page 850 for the iso-latin-1 character repertoire.

 When opening the file, Emacs will convert the text to its internal
 format and editing will proceed as usual. Upon saving, the file will
 be converted back to its code page encoding.

 Opening a file with a code page takes three steps:

 M-x codepage-setup
  Extend Emacs' built-in encodings with one for the specified code page.
  Normal encodings are automatically available; code pages have to be
  set up first with this command.
  This command asks for the number of the code page, e.g. 850.

 C-x RET c
  The encoding prefix. Use this to specify the code page
  to use with the next command (which will be `open file'). For each code
  page set up above, three encodings are created that represent the
  unix, dos, or mac end-of-line conventions. For code page 850, these
  are named `cp850-unix', `cp850-dos', and `cp850-mac'.

  Usually your DOS files will adhere to the DOS end-of-line
  convention, so specify `cp850-dos' (inserting the correct code page
  number for your file).

C-x C-f 
  Open file, using the code page. It will automatically be saved
  using the same code page.
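
To make the naming scheme and the read/save round trip concrete, here
is a rough sketch in Python. It is an analogy, not how Emacs implements
it: the helpers `split_coding_system', `read_with' and `write_with' are
hypothetical names invented for this example, and only the `cpNNN-eol'
naming is taken from the description above.

```python
import os
import tempfile

# Assumed mapping from the end-of-line suffix to the newline sequence.
EOL = {"unix": "\n", "dos": "\r\n", "mac": "\r"}

def split_coding_system(name):
    """Split a name such as 'cp850-dos' into (codec, newline)."""
    codec, _, eol = name.rpartition("-")
    return codec, EOL[eol]

def read_with(path, coding_system):
    """Decode the file and normalise its line endings to '\\n'."""
    codec, newline = split_coding_system(coding_system)
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(codec).replace(newline, "\n")

def write_with(path, text, coding_system):
    """Convert the text back to the code page and its line endings."""
    codec, newline = split_coding_system(coding_system)
    with open(path, "wb") as f:
        f.write(text.replace("\n", newline).encode(codec))

# A two-line DOS file in code page 850.
original = b"caf\x82\r\nna\x8bve\r\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dosfile.txt")
    with open(path, "wb") as f:
        f.write(original)
    text = read_with(path, "cp850-dos")   # edit this as ordinary text
    write_with(path, text, "cp850-dos")   # save in the same encoding
    with open(path, "rb") as f:
        saved = f.read()
```

The point of the sketch is that the code page and the end-of-line
convention are two independent choices packed into one name, and that
saving with the same name reproduces the original bytes.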

While editing a file encoded with a code page, the mode line will show
something like `-D:--'. The `D' stands for DOS code page, and there
are two characters before the `:' to show that multibyte support is
enabled.

If your buffer contains escape characters of the form `\213' and the
mode line shows only one character before the colon, you have read in
the file without specifying the code page. Close the file and read it
in again using the procedure above.


I know this repeats some information that is also available elsewhere
in the MULE docs, and some notes and links would be useful. But the
perspective of someone trying to convert the odd `broken platform'
file is VERY different from that of someone trying to use Emacs for
Korean in her/his daily life.  Therefore a problem-oriented Info
section seems warranted to me.

Virtually yours,


Dirk Janssen
University of Leipzig, Germany
