bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in em

From: Emmanuel Bigler
Subject: bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
Date: Fri, 04 Feb 2011 18:08:51 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Thunderbird/3.1.7

I see this:
   buffer-file-coding-system is a variable defined in `C source code'.
   Its value is iso-latin-1-dos

See "M-: (coding-system-priority-list) RET".

The highest-priority encoding is set from your locale, but look what
is the next one.

Hello again.
I think I'm starting to understand what is going on.

A long time ago I created a unibyte file containing the 1-byte characters I want to test within emacs. The file was created with a program over which I have total byte-by-byte control, so I know exactly what is inside it. I have attached the file to this mail; I am not sure this is allowed on the gnu-debug mailing list, but it is a simple and very short .txt file that reads as follows (this mail itself is typeset and displayed here as iso-8859-1):

------- mytestchars-224-255-iso-8859.txt ---------------------

  224 \340  à   225 \341  á   226 \342  â   227 \343  ã
  228 \344  ä   229 \345  å   230 \346  æ   231 \347  ç
  232 \350  è   233 \351  é   234 \352  ê   235 \353  ë
  236 \354  ì   237 \355  í   238 \356  î   239 \357  ï
  240 \360  ð   241 \361  ñ   242 \362  ò   243 \363  ó
  244 \364  ô   245 \365  õ   246 \366  ö   247 \367  ÷
  248 \370  ø   249 \371  ù   250 \372  ú   251 \373  û
  252 \374  ü   253 \375  ý   254 \376  þ   255 \377  ÿ

éèçàù  < test strings to see how they behave
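A table like the one above can be regenerated with a short script. The following is only a minimal sketch in Python, not the program that was originally used to create the file; the function names `build_table` and `write_table` are hypothetical, and the row layout is simply copied from the listing above.

```python
def build_table(first=224, last=255, per_row=4):
    """Return lines listing decimal code, octal escape, and Latin-1 character."""
    lines = []
    for start in range(first, last + 1, per_row):
        cells = []
        for code in range(start, min(start + per_row, last + 1)):
            # Decode the single byte as ISO-8859-1 to get the displayed character.
            char = bytes([code]).decode("iso-8859-1")
            cells.append(f"{code} \\{code:03o}  {char}")
        lines.append("  " + "   ".join(cells))
    return lines


def write_table(path):
    """Write the table as single-byte ISO-8859-1 text (one byte per character)."""
    text = "\n".join(build_table()) + "\n"
    with open(path, "w", encoding="iso-8859-1", newline="\n") as handle:
        handle.write(text)
```

Calling `write_table("mytestchars-224-255-iso-8859.txt")` would produce the eight table rows; the test string line would have to be appended separately.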


I started /usr/local/bin/emacs -Q mytestchars-224-255-iso-8859.txt
under emacs (i686-pc-linux-gnu)

The file displays perfectly correctly. (describe-char (point)) gives me exactly what I want, i.e. an extended ASCII decimal code between 224 and 255. Almost all operations (except capitalize, see below) work exactly as I wish and exactly as in older emacs versions. This is no mystery, since the priority list given by
M-: (coding-system-priority-list) RET reads as:
(iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature ...)

Again I'm perfectly happy, since I see that iso-latin-1 comes first. But is this what I want? Certainly yes;
my locale environment variables look like:

However, in this emacs -Q session, with a correct unibyte display of
a unibyte file, *capitalize does not work*.
At the beginning of this discussion, Sven explained that capitalize would only work on 2-byte characters. I tested that, of course, and it does work, but I simply wish I could continue to capitalize unibyte words with M-c like in the good old iso-8859 days!!

Additional info: when applying the M-c command to a letter above
decimal ASCII 224, nothing happens on the display, as reported, *although the buffer is marked as being changed.*
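To picture the difference between changing case on raw bytes and on decoded characters, here is a small illustration. It is written in Python purely as an analogy and makes no claim about how emacs implements M-c; the variable names are mine, and the test string is the one from the file above.

```python
# The test string from the file, as five single ISO-8859-1 bytes: e9 e8 e7 e0 f9.
raw = "éèçàù".encode("iso-8859-1")

# Python's bytes.upper() only maps ASCII letters, so these bytes come back unchanged.
unchanged = raw.upper()

# Once the bytes are decoded to characters, upper() knows the Latin-1 letters.
decoded = raw.decode("iso-8859-1")
upper = decoded.upper()

# For these letters the capital sits 32 code points below the lowercase form,
# so the result still fits in one byte per character.
offsets = {ord(lo) - ord(up) for lo, up in zip(decoded, upper)}
upper_bytes = upper.encode("iso-8859-1")
```

The fixed offset of 32 holds for the letters in the 224-254 range; 247 (÷) is not a letter, and the capital of 255 (ÿ) lies outside ISO-8859-1.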

Incidentally, in a good ol' xterm window (fitted with gnu readline and
obeying my LOCALE preferences as listed above), M-c works perfectly as
it should, and if I cut-paste from the xterm to the emacs buffer,
everything looks fine & unibyte ... except that I can no longer change
the case of the pasted string with 'capitalize' or a similar 'case'
command.
Bug, or UTF-8 emacs 23.2 feature ?


