bug-gnu-emacs


From: Richard Hansen
Subject: bug#55777: [PATCH] Improve documentation of `string-to-multibyte', `string-to-unibyte'
Date: Sun, 5 Jun 2022 22:00:35 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1

On 6/5/22 01:37, Eli Zaretskii wrote:
> Could you please state what is confusing in the current wording?

  * "Raw 8-bit bytes" isn't really defined. It's mentioned earlier in
    the chapter -- the term is even in a @dfn{} -- but there's no
    definition there.

  * The term "raw 8-bit bytes" is misleading. It suggests binary data
    (bytes with values 0-255) but it's actually meant to cover only
    128-255.

  * The term "raw 8-bit bytes" is not used consistently. Sometimes "8"
    is spelled out as "eight", sometimes "raw" comes after "8-bit",
    and sometimes it refers to all byte values 0-255 (see the first
    sentence under `@cindex unibyte text`).

  * It's not clear whether "raw 8-bit bytes" is meant to refer to
    bytes with values 128-255, or to the *characters* that map to
    those byte values.

  * The following phrasing is weird: "The function assumes that
    @var{string} includes ASCII characters and raw 8-bit bytes". The
    purpose of "raw 8-bit bytes" is to cover non-ASCII byte values, so
    by definition that assumption is always true. By saying "the
    function assumes", the reader is left wondering about the cases
    where that assumption is not true, which in turn causes the reader
    to question whether "raw 8-bit bytes" fully covers non-ASCII byte
    values, which in turn causes the reader to wonder how to handle
    those non-covered values (whatever they are).

    Maybe something like this:

        By definition, unibyte strings contain only @acronym{ASCII}
        characters (bytes with values 0-127) and raw 8-bit bytes
        (bytes with values 128-255); the latter are converted to their
        corresponding multibyte representations in the
        @code{eight-bit} character set (@pxref{Text Representations,
        codepoints}).
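
To make the intended mapping concrete, here is a rough Python model of the conversion the proposed wording describes. It is only a sketch, not Emacs code: the function name `to_multibyte_codepoints` and the constant `RAW_BYTE_BASE` are made up for illustration, and the one fact it relies on is that Emacs reserves codepoints #x3FFF80..#x3FFFFF for raw bytes 128-255.

```python
# Illustrative model only; Emacs implements this conversion in C.
# Assumption: raw byte b (128-255) maps to codepoint 0x3FFF00 + b,
# i.e. the reserved range 0x3FFF80..0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00  # hypothetical name, chosen for this sketch


def to_multibyte_codepoints(data: bytes) -> list:
    """Return the codepoints a unibyte string would have after conversion.

    ASCII bytes (0-127) keep their value; bytes 128-255 are moved into
    the raw-byte codepoint range.
    """
    return [b if b < 0x80 else RAW_BYTE_BASE + b for b in data]


if __name__ == "__main__":
    # One ASCII character followed by two non-ASCII bytes.
    print([hex(c) for c in to_multibyte_codepoints(b"a\xc0\xff")])
    # → ['0x61', '0x3fffc0', '0x3fffff']
```

Every byte value falls into one of the two branches, which is why "the function assumes" reads oddly: there is no input for which the assumption fails.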




