emacs-pretest-bug
Re: default value of terminal-coding-system


From: Peter Dyballa
Subject: Re: default value of terminal-coding-system
Date: Sat, 26 Mar 2005 12:46:41 +0100


On 26.03.2005 at 03:18, Richard Stallman wrote:

    Emacs needs to learn more about combining characters before it can
    handle such things correctly.

Emacs knows quite a bit about combining characters;
what precisely is it doing wrong?


In xterm's tcsh these are set:

        LANG=de_DE.UTF-8
        LC_ALL=de_DE.UTF-8
        LC_CTYPE=de_DE.UTF-8
        TERM=xterm-color
        TERMPATH=/Users/pete/.termcap:/usr/share/misc/termcap:/usr/X11R6/lib/X11/etc/xterm.termcap
        nokanji
        version tcsh 6.12.00 (Astron) 2002-07-23 (powerpc-apple-darwin) options 8b,nls,dl,al,kan,sm,rh,color,dspm,filec
        
ls -lw shows these file names correctly:
        
        -rwxrwxr-x    1 pete  pete      32216 17 Nov  2002 RGB äöüæÆÜÖÄ.txt
        -rw-r--r--    1 pete  pete         62 25 Mär 01:38 áÛïŬà.txt
        -rw-r--r--    1 pete  pete        107  2 Dez 21:29 äöüßÜÖÄ€

as do Finder and dired-mode in X11.

This is with GNU Emacs 22.0.50.1 (powerpc-apple-darwin7.8.0, X toolkit, Xaw3d scroll bars) of 2005-03-25 on localhost (last CVS update on 2005-03-19, patches from Stefan Monnier), configured using `configure '--without-carbon' '--with-x' '--without-pop' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-png' '--with-gif' '--with-x-toolkit=lucid' 'CFLAGS=-I/sw/include' 'CPPFLAGS=-I/sw/include' 'LDFLAGS=-L/sw/lib''

Important settings:
  value of $LC_ALL: de_DE.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: de_DE.UTF-8
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Calendar

Minor modes in effect:
  auto-compression-mode: t
  display-time-mode: t
  mouse-sel-mode: t
  show-paren-mode: t
  encoded-kbd-mode: t
  menu-bar-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  unify-8859-on-decoding-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

In its dired-mode (-uuu:%% in the modeline) in xterm I can see the file names mentioned above correctly, except that the rightmost position the cursor can reach is a few columns past the end of the file's name. For RGB äöüæÆÜÖÄ.txt the name ends in column 69, but C-e moves the cursor to column 76 (as reported by column-number-mode). 'Spelling' the file's name with C-u C-x =, I get the following (starting with <SPC> at column 57):

  character: SPC (040, 32, 0x20, U+0020)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 32
     syntax:    which means: whitespace
   category: a:ASCII   l:Latin
buffer code: 0x20
  file code: 0x20 (encoded by coding system mule-utf-8)
    display: terminal code 0x20

  character: a (0141, 97, 0x61, U+0061)         ; *Help* in -uuu:-- and minibuffer
    charset: ascii (ASCII (ISO646 IRV))         ; show an 'a', column 58, should
 code point: 97                                 ; be 'ä'
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x61
  file code: 0x61 (encoded by coding system mule-utf-8)
    display: terminal code 0x61

  character:  (01211310, 332488, 0x512c8, U+0308)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 37 72                              ; minibuffer shows a dieresis on :
     syntax: w  which means: word               ; after Char, *Help* shows nothing,
   category: ^:Combining diacritic or mark      ; column is 59, should be 'ö' now
buffer code: 0x9C 0xF4 0xA5 0xC8
  file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
    display: terminal code 0xCC 0x88

  character: o (0157, 111, 0x6f, U+006F)                ; at column 60 is an 'ü'
    charset: ascii (ASCII (ISO646 IRV))
 code point: 111
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x6F
  file code: 0x6F (encoded by coding system mule-utf-8)
    display: terminal code 0x6F

  character:  (01211310, 332488, 0x512c8, U+0308)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 37 72                              ; minibuffer shows a dieresis on :
     syntax: w  which means: word               ; after Char, *Help* shows nothing,
   category: ^:Combining diacritic or mark      ; should be 'æ' in column 61
buffer code: 0x9C 0xF4 0xA5 0xC8
  file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
    display: terminal code 0xCC 0x88

Now, in the first column in dired after RGB äöüæÆÜÖÄ.txt:

  character:  (01211310, 332488, 0x512c8, U+0308)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 37 72                              ; minibuffer shows and describes A,
     syntax: w  which means: word               ; *Help* shows nothing as character
   category: ^:Combining diacritic or mark      ; column is 70, should be linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
  file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
    display: terminal code 0xCC 0x88

  character:  (01211310, 332488, 0x512c8, U+0308)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 37 72                              ; minibuffer shows a dieresis on :
     syntax: w  which means: word               ; after Char, *Help* shows nothing,
   category: ^:Combining diacritic or mark      ; column is 71, one after linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
  file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
    display: terminal code 0xCC 0x88

The next characters in the file's name are '.', 't', 'x', 't', and 'C-j' at column 76. The next <right> moves the cursor to the next line. This buffer also has a line like this:

  -rw-r--r--    1 pete   pete      10992 13 Mär 19:31 RefTeX-inst.txt

Here the cursor's rightmost position is one after 'txt', and the month's name 'Mär' is spelled like this:

  character: M (0115, 77, 0x4d, U+004D)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 77
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x4D
  file code: 0x4D (encoded by coding system mule-utf-8)
    display: terminal code 0x4D

  character: ä (04344, 2276, 0x8e4, U+00E4)
    charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
 code point: 100
     syntax: w  which means: word
   category: l:Latin
buffer code: 0x81 0xE4
  file code: 0xC3 0xA4 (encoded by coding system mule-utf-8)
    display: terminal code 0xC3 0xA4

  character: r (0162, 114, 0x72, U+0072)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 114
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x72
  file code: 0x72 (encoded by coding system mule-utf-8)
    display: terminal code 0x72

Incremental search for 'ä' lets me find the month Mär, but not as part of a file's name!
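
A possible reading of this, offered as an assumption rather than something the output above proves: the month name contains the precomposed U+00E4, while the file name stores 'ä' as 'a' followed by U+0308, so a literal search for U+00E4 cannot match inside the file name. The following is a minimal Python sketch (not Emacs code) of that mismatch; normalizing to NFC is shown only as one way the two spellings could be made comparable.

```python
import unicodedata

# File name in decomposed form, as the C-u C-x = output suggests:
# base letters followed by U+0308 COMBINING DIAERESIS.
file_name = unicodedata.normalize("NFD", "RGB äöüæÆÜÖÄ.txt")
month = "M\u00e4r"   # precomposed U+00E4, as in the date column
needle = "\u00e4"    # the character searched for

found_in_month = needle in month
found_in_name = needle in file_name
found_after_nfc = needle in unicodedata.normalize("NFC", file_name)

# UTF-8 bytes, for comparison with the "file code" lines above.
precomposed_bytes = "\u00e4".encode("utf-8")  # 0xC3 0xA4
combining_bytes = "\u0308".encode("utf-8")    # 0xCC 0x88

print(found_in_month, found_in_name, found_after_nfc)
print(precomposed_bytes.hex(), combining_bytes.hex())
```

The search succeeds only where both the needle and the text use the same form.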

Here is some mule-diag output:

        Multibyte characters awareness:
          default: t
          current-buffer: t
        
        Current language environment: German
        
        ########################################
        # Section 2.  Display
        ########################################
        
        Terminal: xterm-color
        
        Coding system of the terminal: utf-8
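
The column numbers reported for dired (name ends in column 69, C-e goes to column 76) fit the arithmetic below. This is a rough Python sketch under two assumptions: the file name starts in column 54 (inferred from the space being at column 57), and each combining mark takes no cell on the terminal while still being counted as one column in the buffer.

```python
import unicodedata

name = unicodedata.normalize("NFD", "RGB äöüæÆÜÖÄ.txt")
START_COLUMN = 54  # assumption: 'R' is at column 54, so the space is at 57

code_points = len(name)
# unicodedata.combining() returns a non-zero class for combining marks.
combining = sum(1 for ch in name if unicodedata.combining(ch))
cells = code_points - combining  # combining marks occupy no terminal cell

# Column of the last visible character of the name on screen.
last_visible_column = START_COLUMN + cells - 1
# Column of the end of line if every code point counts as one column.
end_of_line_column = START_COLUMN + code_points

print(code_points, combining, cells)
print(last_visible_column, end_of_line_column)
```

The gap between the two columns grows by one for every combining mark in the name.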


Anything more?

--
Greetings

  Pete




