From: Philipp Stephani
Subject: Re: Character literals for Unicode (control) characters
Date: Sun, 06 Mar 2016 19:16:37 +0000

Paul Eggert <address@hidden> wrote on Sun, 6 Mar 2016 at 20:03:

Philipp Stephani wrote:
> Initially I used ucs-names, but then decided against it because it lacks
> most characters.

Can you describe in general terms the difference between what's in ucs-names and
what's in the new hash table? Should the two things be unified?

ucs-names uses a whitelist of ranges to consider:

    '((#x0000 . #x33FF)
      ;; (#x3400 . #x4DBF) CJK Ideographs Extension A
      (#x4DC0 . #x4DFF)
      ;; (#x4E00 . #x9FFF) CJK Unified Ideographs
      (#xA000 . #xD7FF)
      ;; (#xD800 . #xFAFF) Surrogate/Private
      (#xFB00 . #x134FF)
      ;; (#x13500 . #x167FF) unused
      (#x16800 . #x16A3F)
      ;; (#x16A40 . #x1AFFF) unused
      (#x1B000 . #x1B0FF)
      ;; (#x1B100 . #x1CFFF) unused
      (#x1D000 . #x1FFFF)
      ;; (#x20000 . #xDFFFF) CJK Ideograph Extension A, B, etc, unused
      (#xE0000 . #xE01FF))

This is probably for practical reasons: there is no point in offering thousands of "CJK UNIFIED IDEOGRAPH-xyz" completions. For a character escape these considerations don't apply, and it would be very surprising and confusing if the escape did not accept all characters.
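
To make the effect of the whitelist concrete, here is a minimal sketch of a membership test over the ranges quoted above. The names my-ucs-names-ranges and my-ucs-names-covers-p are hypothetical and exist only for this illustration; this is not how ucs-names itself is implemented.

```elisp
;; Sketch only: the helper names are made up for this example.
;; The range list is copied from the whitelist quoted above.
(defconst my-ucs-names-ranges
  '((#x0000 . #x33FF)
    (#x4DC0 . #x4DFF)
    (#xA000 . #xD7FF)
    (#xFB00 . #x134FF)
    (#x16800 . #x16A3F)
    (#x1B000 . #x1B0FF)
    (#x1D000 . #x1FFFF)
    (#xE0000 . #xE01FF))
  "Code point ranges (inclusive) taken from the whitelist quoted above.")

(defun my-ucs-names-covers-p (char)
  "Return t if CHAR lies inside one of `my-ucs-names-ranges', else nil."
  (let ((found nil))
    (dolist (range my-ucs-names-ranges found)
      (when (<= (car range) char (cdr range))
        (setq found t)))))

(my-ucs-names-covers-p #x0041)  ; LATIN CAPITAL LETTER A => t
(my-ucs-names-covers-p #x4E2D)  ; a CJK Unified Ideograph => nil
```

Under these assumptions, any code point in the commented-out ranges (for example the whole CJK Unified Ideographs block) is simply absent from the table, which is acceptable for completion but not for a character escape.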