emacs-devel


From: Eli Zaretskii
Subject: Re: master e39cb515a10 1/4: Correctly handle non-BMP characters in Android content file names
Date: Sat, 23 Mar 2024 12:24:03 +0200

> diff --git a/lisp/term/android-win.el b/lisp/term/android-win.el
> index 8d262e5da98..6512ef81ff7 100644
> --- a/lisp/term/android-win.el
> +++ b/lisp/term/android-win.el
> @@ -529,5 +529,94 @@ accessible to other programs."
>    (android-browse-url-internal url send))
>  
>  
> +;; Coding systems used by androidvfs.c.
> +
> +(define-ccl-program android-encode-jni
> +  `(2 ((loop
> +     (read r0)
> +     (if (r0 < #x1) ; 0x0 is encoded specially in JNI environments.
> +         ((write #xc0)
> +          (write #x80))
> +       ((if (r0 < #x80) ; ASCII
> +            ((write r0))
> +          (if (r0 < #x800) ; \u0080 - \u07ff
> +              ((write ((r0 >> 6) | #xC0))
> +               (write ((r0 & #x3F) | #x80)))
> +            ;; \u0800 - \uFFFF
> +            (if (r0 < #x10000)
> +                ((write ((r0 >> 12) | #xE0))
> +                 (write (((r0 >> 6) & #x3F) | #x80))
> +                 (write ((r0 & #x3F) | #x80)))
> +              ;; Supplementary characters must be converted into
> +              ;; surrogate pairs before encoding.
> +              (;; High surrogate
> +               (r1 = ((((r0 - #x10000) >> 10) & #x3ff) + #xD800))
> +               ;; Low surrogate.
> +               (r2 = (((r0 - #x10000) & #x3ff) + #xDC00))
> +               ;; Write both surrogate characters.
> +               (write ((r1 >> 12) | #xE0))
> +               (write (((r1 >> 6) & #x3F) | #x80))
> +               (write ((r1 & #x3F) | #x80))
> +               (write ((r2 >> 12) | #xE0))
> +               (write (((r2 >> 6) & #x3F) | #x80))
> +               (write ((r2 & #x3F) | #x80))))))))
> +     (repeat))))
> +  "Encode characters from the input buffer for Java virtual machines.")

AFAIU, this is because Java uses UTF-16 encoded strings to support
Unicode, is that right?  If so, why not use encode-coding and
decode-coding to en/decode between UTF-16 and the internal
representation?  AFAIR, we want to deprecate CCL, and thus using it in
new code should be avoided.
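Along the lines of that suggestion, here is a minimal sketch of the same transformation without CCL. It is not from the patch, and the `my-jni-*' names are hypothetical. It lets `encode-coding-string' with `utf-16be' produce the UTF-16 code units, so non-BMP characters arrive already split into surrogate pairs. It then packs each 16-bit unit into the 1-, 2- or 3-byte form that the quoted CCL program writes. The sketch assumes the target is JNI's "modified UTF-8" (NUL as C0 80, each surrogate encoded separately in three bytes), and it ignores characters outside Unicode, such as raw bytes.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-jni-encode-unit (unit)
  "Return a list of modified UTF-8 bytes for the UTF-16 code UNIT.
Hypothetical helper, for illustration only."
  (cond
   ;; JNI never emits a raw zero byte: U+0000 becomes C0 80.
   ((= unit 0) (list #xC0 #x80))
   ;; ASCII passes through unchanged.
   ((< unit #x80) (list unit))
   ;; U+0080..U+07FF: two bytes.
   ((< unit #x800)
    (list (logior #xC0 (ash unit -6))
          (logior #x80 (logand unit #x3F))))
   ;; Everything else, including each half of a surrogate pair: three bytes.
   (t
    (list (logior #xE0 (ash unit -12))
          (logior #x80 (logand (ash unit -6) #x3F))
          (logior #x80 (logand unit #x3F))))))

(defun my-jni-encode-string (string)
  "Encode STRING as JNI modified UTF-8 and return a unibyte string.
Hypothetical helper, for illustration only."
  ;; utf-16be writes no BOM, so every 2 bytes form one big-endian
  ;; code unit; supplementary characters become surrogate pairs here.
  (let ((utf16 (encode-coding-string string 'utf-16be))
        (units nil))
    (dotimes (i (/ (length utf16) 2))
      (push (logior (ash (aref utf16 (* 2 i)) 8)
                    (aref utf16 (1+ (* 2 i))))
            units))
    ;; Re-encode each code unit and flatten the byte lists.
    (apply #'unibyte-string
           (apply #'append (mapcar #'my-jni-encode-unit (nreverse units))))))
```

For example, U+1F600 comes out of `utf-16be' as the surrogate pair D83D DE00, and the two units are then written as the six bytes ED A0 BD ED B8 80.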


