I've just checked this page: ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT and found that only the following characters are not yet supported by Emacs. 0xD7 0x25CA # LOZENGE 0xDE 0xFB
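As a rough cross-check of the kind of scan described above, the following Python sketch lists the Mac Roman bytes whose Unicode value lies above U+24FF, i.e. beyond what mule-unicode-0100-24ff can hold. It relies on Python's built-in `mac_roman` codec rather than on ROMAN.TXT itself, so the table may differ in detail from the Apple file; the helper name `bytes_above` is made up for illustration.

```python
def bytes_above(limit=0x24FF, codec="mac_roman"):
    """Return {byte: code point} for high bytes that decode above `limit`."""
    found = {}
    for b in range(0x80, 0x100):
        ch = bytes([b]).decode(codec)
        if ord(ch) > limit:
            found[b] = ord(ch)
    return found


if __name__ == "__main__":
    # Print each byte that falls outside the U+0000..U+24FF range.
    for b, cp in sorted(bytes_above().items()):
        print(f"0x{b:02X} -> U+{cp:04X}")
```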
Ok. As it seems that there's no objection, I'll soon install code to support mule-unicode-2500-33ff and mule-unicode-e000-ffff, then send a patch for lisp/term/mac-win to Andrew so that he can test
I've just installed these changes for those new charsets. 2000-10-30 Kenichi Handa <address@hidden> * international/mule-conf.el: New charsets mule-unicode-2500-33ff and mule-unicode-e000-ffff. * int
I thought the mac-roman font contained glyphs for those iso8859-1 characters (those decoded from the mac-roman encoding). Isn't that correct? I found a bug in my code. It was the reason why those characters are
Right. By the way, Takahashi-san has just given me a very short UTF-8 encoder/decoder. It decodes UTF-8 into ascii, latin-1, mule-unicode-0100-24ff, mule-unicode-2500-33ff, and mule-unicode-e000-ffff. Wit
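To make the split into charsets concrete, here is a minimal Python sketch that decodes UTF-8 and labels each character with one of the charset names mentioned in the message. The range boundaries are simply read off the charset names; treating all of U+0080..U+00FF as "latin-1" is a simplifying assumption, and `charset_for` and `decode_with_charsets` are hypothetical names, not anything from the actual decoder.

```python
# (low, high, name) triples; boundaries are inferred from the charset names.
RANGES = [
    (0x0000, 0x007F, "ascii"),
    (0x0080, 0x00FF, "latin-1"),  # simplification, see the note above
    (0x0100, 0x24FF, "mule-unicode-0100-24ff"),
    (0x2500, 0x33FF, "mule-unicode-2500-33ff"),
    (0xE000, 0xFFFF, "mule-unicode-e000-ffff"),
]


def charset_for(code_point):
    """Return the charset name covering code_point, or None if uncovered."""
    for low, high, name in RANGES:
        if low <= code_point <= high:
            return name
    return None


def decode_with_charsets(data):
    """Decode UTF-8 bytes and pair every character with its charset name."""
    return [(ch, charset_for(ord(ch))) for ch in data.decode("utf-8")]
```

Characters in the gaps (for example most CJK ideographs, U+3400 and up) come back as `None`, which mirrors the fact that these five charsets do not cover the whole BMP.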
Sorry for not joining this important topic much earlier. But, we have font-lock (jit-lock), and a Lisp program is called while redisplaying. I think jit-lock/font-lock is very fast even if we run Ema
I agree with the idea of autoloading coding system or at least a translation table or CCL codes used in a coding system. Yes. Perhaps, coding-system-p is better because setup_coding_system is called
It's possible but not worth doing, because the remaining part would then be very small, as I wrote below. Yes. And if we set up all information in char-coding-system-table, loading of coding systems is almo
Ah, yes. For instance, we can gradually shrink the set of coding systems to check. For Emacs 21.1, we already preload all the chinese-based coding systems. And, that preloading doesn't require much m
No. It uses less space, but isn't it time-consuming to check whether a specific character is included with such a format? No. I checked it for Big5, GB2312, and CNS. Less than 10% of chinese char
I've just read the code added by Dave (sorry for not doing that earlier). It seems that it doesn't have a major problem, but I found one problem related to handling the unibyte case. If unify-8859-on-de
I may have written various ideas but please don't assume that they all work fine. :-p C-q (quoted-insert) assumes that a code in the range 0240..0377 is a code for a single-byte charset, and converts i
Why? Using hex escapes means that we embed internal character codes. They will be changed in the future (i.e. in Unicode-based Emacs). But, emacs-mule and iso-2022-7bit-no-trans are decoded correctly
I think show-trailing-whitespace should not be affected by the syntax of characters, because what it should be concerned with is the glyph. If the glyph of a character is just a space, even if the syntax is `symbol' or w
Sure. How about putting this change in the current emacs-unicode? Could you please improve the English text? ** startup.el.~1.290.~ Fri May 10 11:13:48 2002 -- startup.el Wed Jul 17 21:09:59 2002 **
It seems that this problem was already fixed. As I also found one unnecessary mule-unicode-0100-24ff char, I deleted it. At least, (find-charset-region 1 (point-max)) will give you some information. I
Oops... Then, it's a general problem of desktop. If a user defines a new charset in a session, and there's a string containing that charset in kill-ring, he always fails in the next session while reco
The reported escape sequence is "ESC % G ... ESC % @", which is not an extended segment of CTEXT (described in section 6 of the ctext document), but the special sequence for utf-8 (described in
Oops, I forgot to attach it. Here it is. -- Ken'ichi HANDA address@hidden 7. The UTF-8 encoding Unicode characters that are not contained in one of the approved standard encodings can be encoded usin
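The "ESC % G ... ESC % @" bracketing discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the byte layout, assuming the section is delimited exactly by those two escape sequences; `wrap_utf8` and `split_utf8_sections` are hypothetical helpers and do not reflect how Emacs or Xlib actually parse compound text.

```python
UTF8_START = b"\x1b%G"  # ESC % G: switch to UTF-8
UTF8_END = b"\x1b%@"    # ESC % @: return to ISO 2022


def wrap_utf8(text):
    """Encode text as UTF-8 and bracket it with the two escape sequences."""
    return UTF8_START + text.encode("utf-8") + UTF8_END


def split_utf8_sections(data):
    """Split data into (is_utf8, bytes) pieces at the UTF-8 brackets."""
    pieces = []
    pos = 0
    while pos < len(data):
        start = data.find(UTF8_START, pos)
        if start < 0:
            pieces.append((False, data[pos:]))
            break
        if start > pos:
            pieces.append((False, data[pos:start]))
        body = start + len(UTF8_START)
        end = data.find(UTF8_END, body)
        if end < 0:
            # Unterminated section: treat the rest of the data as UTF-8.
            pieces.append((True, data[body:]))
            break
        pieces.append((True, data[body:end]))
        pos = end + len(UTF8_END)
    return pieces
```

Because well-formed UTF-8 for non-control characters never contains the byte 0x1B, searching for the closing escape inside the section is safe in this simplified setting.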
This is because Emacs received this byte sequence: ESC $ ( B ! H "ESC $ ( B" is a designation sequence for jisx0208, and the following two bytes "! H" specify the above Japanese symbol. This is a p
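For readers who want to see how the two bytes after the designation map to a JIS X 0208 position, here is a small Python sketch. It assumes the usual 94x94 layout, where each byte in 0x21..0x7E minus 0x20 gives the row and cell, and it borrows Python's `euc_jp` codec (the same two bytes with the high bit set) to look up the character; `kuten` and `to_unicode` are made-up names.

```python
JISX0208_DESIGNATION = b"\x1b$(B"  # ESC $ ( B, as quoted in the message


def kuten(pair):
    """Return the (row, cell) position for a two-byte 7-bit JIS X 0208 code."""
    b1, b2 = pair
    if not (0x21 <= b1 <= 0x7E and 0x21 <= b2 <= 0x7E):
        raise ValueError("bytes outside the 94x94 graphic range")
    return b1 - 0x20, b2 - 0x20


def to_unicode(pair):
    """Look the pair up via EUC-JP, which sets the high bit of both bytes."""
    return bytes(b | 0x80 for b in pair).decode("euc_jp")


sequence = JISX0208_DESIGNATION + b"!H"
payload = sequence[len(JISX0208_DESIGNATION):]

if __name__ == "__main__":
    row, cell = kuten(payload)
    print(f"row {row}, cell {cell}: U+{ord(to_unicode(payload)):04X}")
```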
I checked the contents of the html file itself and found this: &#132;Die Familie Schroffenstein&#147; I thought that the notation &#NUMBER is for transmitting Unicode character of code NUMBER. But, 13
Ah, I see. I found that windows-125X maps 132 and 147 to U+201E and U+201C. So, perhaps those systems (galeon and lynx) parse them as U+201E and U+201C. Anyway, how to encode them in X selection is t
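The windows-1252 reading of those two numbers is easy to check with Python's `cp1252` codec. The sketch below reinterprets a numeric reference in the C1 range (128..159) as a windows-1252 byte, which is presumably what lenient browsers do; the function name is hypothetical, and falling back to the raw code point for the few bytes windows-1252 leaves undefined is my own choice.

```python
def c1_as_windows_1252(code):
    """Map a numeric character reference to a character.

    Codes 128..159 are reinterpreted as windows-1252 bytes; everything
    else is taken as a Unicode code point.
    """
    if not 128 <= code <= 159:
        return chr(code)
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        # A few bytes (e.g. 0x81) are undefined in windows-1252.
        return chr(code)


if __name__ == "__main__":
    for code in (132, 147):
        print(code, f"U+{ord(c1_as_windows_1252(code)):04X}")
```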
I'm very sorry for not reacting to this matter more quickly. I've been busy with other matters. Could you please apply any necessary changes to the unicode branch? As I'm currently working on the
While working on emacs-unicode, I noticed a very difficult problem which also exists in the current emacs. (let ((case-fold-search nil)) (string-match "[Þ-ß]" "Þ")) => 0 (let ((case-fold-search ni
I agree. Ideally, the range "[A-_]" must be converted to "[a-z[-_]". But, it seems that your idea is to convert "[A-_]" to "[_-z]", correct? I agree that it results in less counter-intuitive behaviou
Of course, it's not impossible. It's just not easy. It makes sense only when we assume some character set (or locale). For instance, in Emacs 21, Cyrillic characters have the same code order as that o
People know the character codes that are based on their familiar charset. So, they can take advantage only when Emacs internally uses the character representation in which character code order is the
Why aren't you using that code? Does it mean that you changed some of them locally? I noticed those `fixme's. Yes, it is better to solve all of them, but, for the moment, I want to concentrate on fix
It defines two CCL codes to decode and encode utf-8 byte sequence, and makes the coding system mule-utf-8 by using those CCL codes. I'll attach the necessary change to enable RC's utf-8 to encode lat
Yes. On reading and writing iso-2022 files, Emacs-unicode may designate different charsets. I still can't find time to fix it. But, is it a big problem? Even in Emacs 21, iso-2022 files may be chan
Emacs-unicode unifies, for instance, character C1 of charset CS1 and character C2 of CS2. So, even if an original iso-2022-7bit file uses different byte sequences to represent them, when ema
Yes. But, that mode is on by default in RC too. How about adding this paragraph? See also the documentation of: `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode', `utf-8-fragment-on-decod
Not necessarily. If we put the text property `charset' (the value is a charset) to a text on decoding, and check it on encoding, we can preserve the same byte sequence. Putting that text property to
But, `diff' is another program. Emacs' ediff-files should work well. And, please note that even Emacs 21 doesn't preserve the same byte sequence for an iso-2022-7bit file in rare cases. This problem can ac
If you specify an XLFD field other than FOUNDRY, FAMILY, CHARSET_REGISTRY, CHARSET_ENCODING in each font name, Emacs tries to use the specified font name as is (i.e. the same font X server will find
Each charset has a different reason. IPA: Some characters are not in Unicode. Korean: It contains Chinese characters too. But, by default, it is unified with Unicode. (unify-charset 'korean-ksc5601 nil
I noticed that some combinations of unify-8859-on-encoding-mode, utf-8-fragment-on-decoding, and utf-8-translate-cjk don't work in HEAD. So, I made a fairly comprehensive test suite for testing them (att
I tend to agree with getting rid of it. But, I have not yet considered that possibility deeply. If we are going to remove it, I think we should do that in 21.3. Introducing something in 21.3 and remo
Sorry for the late response. Ah, I see. You actually have courier iso10646-1 fonts. Hmmm, currently, Emacs tries at first the family specified by face, and if you have that font, Emacs uses it. Perha
I've just installed fixes in HEAD. Could people please test these customizable variables? unify-8859-on-encoding-mode unify-8859-on-decoding-mode utf-fragment-on-decoding utf-translate-cjk Then, why
It seems that the phrase "get rid of" is confusing. Emacs-unicode still keeps all latin-iso8859-X charsets. We can't get rid of them. Those charsets carry such information as how to map their code poin
(1) I'm working on fixing ps-print and ps-mule so that non-ASCII characters in a buffer name can also be converted to correct PostScript code. I think I can find time to finish it by the end of this
Yes, it's better for Emacs, but not for the other programs (e.g. ispell). And Emacs Lisp applications that use the wrong coding system should be fixed anyway. I was also thinking about that for emac
Sure. Though, I see this mail in linux-utf8 mailing list. But, I think it's not that difficult to fix this behaviour so that it breaks a line at an unencodable word. If that is possible (i.e. users a
It is useful for instance in this scenario. I don't remember a concrete example, but some package is doing this kind of thing. Read a file of weird encoding in a unibyte buffer. Parse the contents an
I agree that those names are not that intuitive, but the first two were there before I noticed it. :-p But in what sense are the concepts confusing? Please note that decode-coding-string also does
Being a little bit tired of the discussion in emacs-devel, I tried to fix this bug. At first sight, it seems easy ... but it's not ... of course. :-( That bug can be reproduced as below. (insert "abc
Ah, thank you! That explains the strange behaviour I noticed after I applied my patch, and is fixed by adjusting buf_charpos in increment_row_positions. But... Oops, I didn't know about string_buffer
Thank you. If you can't, perhaps no one can. :-) Ok, I've just installed that change. Let's see the effect. By the way, I found that current-column, move-to-column, etc. don't take display properties
Of course we can't reuse the existing glyph matrices, but was it the conclusion that we can't use `struct it' without a glyph matrix? For instance, it seems to me that doing this is possible. (1) ca
I've got a personal mail from him. It seems that he is now extremely busy, and can't dig out the clear permission documentation. He at least found the attached mail. As the mail date is three year ag
That's very good news. Thank you for taking time for it. I checked those fonts. As I wrote, it seems that it's not difficult to use those fonts. With emacs-unicode, as I have implemented auto-compo
Ah! Sure. I forgot that I set scalable-fonts-allowed to "-morisawa-.*". So, the font selector doesn't fall back to the other scalable fonts. Hmmm, even in that case, the font selector should try all t
I've just installed that change in HEAD. -- Ken'ichi HANDA address@hidden 2003-02-19 Kenichi Handa <address@hidden> * xfaces.c (try_alternative_families): Try all scalable fonts if Vscalable_fonts_al
I think you can simply do this: (if (featurep 'fontset) (set-fontset-font "fontset-default" (cons (decode-char 'ucs #x0D00) (decode-char 'ucs #x0D7f)) (cons "misc-malayalam" "iso10646-1"))) There are
I see. Apart from the correctness of that non-standard way, if the Unicode standard requires that such a sequence should be rendered correctly, Emacs should support it. Do you know what Unicode requi
As for memory, such optimization may be worth considering except for CJK users, but as for speed, not that much. And in emacs-unicode, it gets worse. And, memory is not a big problem nowadays. On the
Sorry for the late reply on this matter. I have no idea, I know nothing about Macintosh. But as the coding system mac-roman doesn't have `mime-charset' property, I think Gnus should prefer some other
Those characters belong to the charset japanese-jisx0208, and the current Emacs still can't encode them into UTF-8. How did you get such characters? -- Ken'ichi HANDA address@hidden
From which part of the manual did you get that impression? It doesn't work. Please follow what is described in the "Defining Fontsets" node of the Emacs info manual. -- Ken'ichi HANDA address@hidden
Unfortunately, the current Emacs doesn't have a facility to detect UTF-8 byte sequences. So, if we give UTF-8 the higher priority, all files are detected as UTF-8. :-( The UTF-8 support was surely impr
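The kind of detection that is missing here can be illustrated outside Emacs. The Python sketch below accepts data as UTF-8 only when every byte sequence is well-formed, which is enough to tell typical Latin-1 text apart from UTF-8; it is a sketch of the idea, not a description of how any Emacs version detects coding systems, and `looks_like_utf8` is a made-up name.

```python
def looks_like_utf8(data):
    """Return True if data is entirely well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    samples = {
        "utf-8": "\u00e9t\u00e9".encode("utf-8"),
        "latin-1": "\u00e9t\u00e9".encode("latin-1"),
        "ascii": b"plain text",
    }
    for label, raw in samples.items():
        print(label, looks_like_utf8(raw))
```

Pure ASCII passes the check as well, so a detector built this way still needs a priority order among the ASCII-compatible candidates.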
I fully agree with that idea. [...] We connect charsets to font registries via fontsets. And in the emacs-unicode version, we have enhanced it so that we can connect scripts, charsets, range of charac
[...] Private Use Area in U+E000..U+F8FF are supported. Perhaps, it is better to mention utf-translate-cjk mode as this. * Encoding some characters as Unicode (UTF-8) is rejected by Emacs. Emacs curr
Unfortunately, a fontset is not a variable, thus can't be customized easily. Another way to modify a fontset is to do something like this in .emacs. (set-fontset-font "fontset-default" 'mule-unicode-
Dave has compiled the current problems in the file emacs-unicode/README.unicode. Some of them (especially serious ones) are already fixed. Dave, do you have anything else to add to that file? I think
It is documented as the docstring of set-fontset-font (as attached at the tail). Internally, a fontset is implemented by a char-table of a special format. I don't know about "the specifiers interface
As far as I remember, the redisplay problem is because of a bug of the original display routine which is already fixed in HEAD, and thus, once emacs-unicode is merged with HEAD, the problem will disa
I see no problem with such a patch as long as there's no license problem. As UnicodeData.txt is less than 1M-byte, the above method will be ok, but Unihan.dat is about 26M-byte which, I think, is too
The current Emacs still doesn't unify Unicode and the other legacy charsets (e.g. iso-8859-2, jisx0208, gb2312) automatically. So, for instance, if iso-8859-2 characters arrive at Emacs with UTF8_STRIN
It's surely Emacs' problem that the same iso-8859-2 character is represented in two ways internally. But, incomplete support of COMPOUND_TEXT is GTK's (or some other X client's) problem. As far as th
How about this? Please note that I don't know if this is the responsibility of GTK, of the underlying GDK, or of each client program. Could you send it (after polishing the English) to a proper perso
XmbTextPropertyToTextList can handle only such compound-text that contains characters supported in the current X locale. So, in your way, if you are in GBK locale, Emacs can't receive, for instance,
The reason why you can't input Hangul characters is that your input method doesn't support them, or that it generates data only in the chinese-gbk encoding, which can't contain Hangul characters. Provided that yo
Are there any other big changes scheduled? If not, I'll start the work of synchronizing emacs-unicode to HEAD next week. -- Ken'ichi HANDA address@hidden
No. For the moment, I'll just modify the emacs-unicode branch so that it becomes easier to actually integrate emacs-unicode into HEAD in the near future. -- Ken'ichi HANDA address@hidden
I've found that at least we must modify the iso-2022 decoder so that it retains ctext extended segments. Otherwise, we can't handle this kind of sequence: ESC $ A --GB-SEQ-- ESC % / --EX-SEGMENT-- --GB-S
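To show why a decoder has to recognise an extended segment as a unit before it can resume the surrounding GB text, here is a hedged Python sketch of a parser. It assumes the layout I understand from section 6 of the Compound Text specification: ESC % / F M L, where F is 0x30..0x34, M and L each have the high bit set and give the length as (M - 128) * 128 + (L - 128), followed by the encoding name, an STX byte (0x02), and the data. Please verify those details against the specification; the function names are invented for this example.

```python
EXT_PREFIX = b"\x1b%/"  # ESC % /
STX = b"\x02"


def build_extended_segment(name, payload, octets_per_char=0):
    """Build an extended segment (octets_per_char 0 means variable)."""
    body = name.encode("ascii") + STX + payload
    if len(body) >= 128 * 128:
        raise ValueError("segment too long for a two-byte length")
    m, l = divmod(len(body), 128)
    return EXT_PREFIX + bytes([0x30 + octets_per_char, m + 128, l + 128]) + body


def parse_extended_segment(data, pos=0):
    """Parse one segment at data[pos].

    Returns (octets_per_char, encoding_name, payload, next_pos).
    """
    if data[pos:pos + 3] != EXT_PREFIX or len(data) < pos + 6:
        raise ValueError("not an extended segment")
    f, m, l = data[pos + 3], data[pos + 4], data[pos + 5]
    if not 0x30 <= f <= 0x34 or m < 0x80 or l < 0x80:
        raise ValueError("malformed segment header")
    length = (m - 128) * 128 + (l - 128)
    end = pos + 6 + length
    body = data[pos + 6:end]
    if len(body) != length:
        raise ValueError("truncated segment")
    name, sep, payload = body.partition(STX)
    if not sep:
        raise ValueError("missing STX after encoding name")
    return f - 0x30, name.decode("ascii"), payload, end
```

The returned `next_pos` is where a caller would resume ordinary ISO 2022 decoding, which is the point of the message: the bytes inside the segment must not be interpreted under the previously designated charset.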
The ctext coding system at least should not treat extended segments as a part of "Standard Character Set Encodings". So, I committed a change to coding.c. But, ctext itself doesn't have to support it,
Currently, even if we customize utf-fragment-on-decoding to t, iso-8859-2 chars encoded in utf-8 can't be decoded into the latin-iso8859-2 charset because utf-fragmentation-table contains only Greek and C
At least composition property should not be stripped until I install on-the-fly-composition feature in HEAD (emacs-unicode already has that). I think display property should be retained too. -- Ken'i
I'm now working to synchronize the emacs-unicode branch to the latest HEAD code. As there were many cosmetic and restructuring changes in HEAD, it's not easy to adjust the code of emacs-unicode. I th
Synchronizing emacs-unicode to HEAD will take a few more weeks. After that, I'll take a look at Eli's code and decide what to do. In any case, I will install the changes for BIDI in the branch emacs
To reflect the changes in HEAD to the Unicode version of emacs, I made a branch emacs-unicode-2. When I finish the work, emacs-unicode branch will be an obsolete branch. -- Ken'ichi HANDA address@hid
About a month ago, I made a branch emacs-unicode-2 from HEAD and started to work on synchronizing the code of the emacs-unicode branch to HEAD in that new branch. I've just finished the work and committed t
Oops, as I've been testing emacs-unicode on GNU/Linux which defines GC_MARK_STACK to GC_MAKE_GCPROS_NOOPS, I have not paid much attention to GC cleanness. I think we can test if c_functions is GC cl
Almost all changes are about character and fontset handling; charset.[ch], coding.[ch], fontset.c are mostly re-written, character.[ch] and chartab.c are newly created. So, if your change depends on
No. I'm now working on improving it. Please wait for a while. I think `' should not be displayed by U+2018 and U+2019. Unicode defines them not as balanced quotes. Using them as balanced quotes is ab
The first three letters are "FULL WIDTH LATIN ?? LETTER" (U+FF??). Yes, they are representable in utf-8. But, in subst-jis.el, we have this code: (mapc (lambda (pair) (let ((unicode (car pair)) (char
Sorry, I don't understand the meaning of the last sentence. Ah, right, they have double-width glyphs for those chars. But, I think there are still many people who are not using the recent XFree86, or
No. In emacs-unicode, we can assign multiple fonts for each script, charset, or a range of character codes, and Emacs selects one that has a requested glyph and has the highest priority depending on
Yes. I see. "using utf-8" means many things. If they are using utf-8 locale, I think they surely have those fonts. But, as far as I know, ja_JP.UTF-8 is still not that popular in Japan. And, even in
It's not easy to see the effect of preloading code-pages from this output. But, when I put this in loadup.el (load "international/code-pages") the resulting Emacs is 585K-byte bigger. I think we can'
I'm sorry for the late response on this thread. I first want to clarify these things: (1) windows-1252 This is actually not a charset but a coding system in Emacs. When Emacs reads a file by this
Yes, emacs-unicode still can read/write mule-unicode charsets. Actually it supports all current charsets, and the coding system emacs-mule is also still supported. -- Ken'ichi HANDA address@hidden
Sorry, mine was a typo, ESC $ - 1 is correct because mule-unicode-0100-24ff is treated as a 96x96 charset. [...] That's a good idea. I'll work on it. The latest emacs-unicode is available in the "ema
I think so. Currently we encode mule-unicode-0100-24ff by ESC $ - 1 ... which is also an invalid code, and only Emacs can decode it. If we use UTF-8 encoding, more clients can decode it. Emacs decode