bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.
From: Eli Zaretskii
Subject: bug#28179: Fwd: Re: bug#28179: Fix use of string-to-multibyte in ispell.el
Date: Thu, 24 Aug 2017 21:20:46 +0300

> Cc: 28179@debbugs.gnu.org
> From: Reuben Thomas <rrt@sc3d.org>
> Date: Thu, 24 Aug 2017 18:45:33 +0100
>
> The reason I am asking again is because you first said:
>
> > What if decode-coding-string returns a pure ASCII string, which is
> > therefore unibyte?
>
> and then later you said:
>
> > The way I meant it, it has to do with the internal flag marking a
> > string either unibyte or multibyte. Observe:
> > (multibyte-string-p "abcd") => nil
> >
> > but
> >
> > (multibyte-string-p (decode-coding-string "abcd" 'utf-8)) => t

That example may be conclusive for UTF-8, but is it conclusive for
_any_ encoding? I don't know. E.g., what about the ISO-2022 based
encodings, where all the bytes are (AFAIR) pure ASCII?

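One way to probe this empirically, rather than argue from the UTF-8
case alone, is to decode the same pure-ASCII unibyte string with every
base coding system and collect those whose result is not flagged
multibyte. The sketch below assumes nothing beyond standard primitives;
the helper name `my-unibyte-decoders' is hypothetical and is not part of
Emacs or ispell.el. An empty result would only be evidence for the
Emacs build it is run on, not a proof.

```elisp
;; Hypothetical helper (not part of Emacs or ispell.el): list the base
;; coding systems for which decoding a pure-ASCII unibyte string does
;; NOT produce a string with the multibyte flag set.
(defun my-unibyte-decoders (&optional input)
  "Return coding systems whose decoding of INPUT is not multibyte.
INPUT defaults to the unibyte ASCII string \"abcd\"."
  (let ((input (or input "abcd"))
        (result '()))
    ;; A non-nil argument restricts the list to base coding systems.
    (dolist (cs (coding-system-list t))
      (condition-case nil
          (unless (multibyte-string-p (decode-coding-string input cs))
            (push cs result))
        ;; Ignore coding systems that signal an error on this input.
        (error nil)))
    (nreverse result)))

;; Evaluate this to see which coding systems, if any, are exceptions:
;; (my-unibyte-decoders)
```
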
> 1. As far as I can tell from the above (and my own confirmatory
> experiments and reading of the documentation), a pure ASCII string can
> be multibyte (it's a matter of the multibyte flag, not the number of
> bytes used to store each character).
>
> 2. decode-coding-string always returns a multibyte string.

Can you show me why 2 is always correct? It might be; I simply don't
know. All I know is that, in general, relying on plain-ASCII strings
always being multibyte in any given situation is risky; we were bitten
by that a few times. But maybe it's not an issue in this case. That is
why I was asking whether you have a sufficient basis to believe this
to be so in this case.

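For illustration only: code that does not want to depend on whether
observation 2 holds can set the flag explicitly instead of assuming it.
This is a minimal sketch using `string-to-multibyte', which is
documented to return its argument unchanged when that argument is
already multibyte. It makes no claim about what the proposed ispell.el
change does or should do.

```elisp
;; Illustration only, not taken from ispell.el: make the result
;; multibyte regardless of what the decoder returned.
(let* ((decoded (decode-coding-string "abcd" 'utf-8))
       ;; Returns DECODED itself if it is already multibyte,
       ;; otherwise a new multibyte string with the same characters.
       (safe (string-to-multibyte decoded)))
  (multibyte-string-p safe))
```

The cost is one possibly redundant call; the benefit is that later code
no longer depends on the decoder's behavior for pure-ASCII input.
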
> Since these two observations seemed to mean that you contradicted
> yourself, I was checking whether in fact I had misunderstood (so that
> for example one of my two observations above is wrong), or if your
> original understanding was incomplete (so that in fact your question
> about decode-coding-string is therefore misguided, because it can return
> a pure ASCII unibyte string (in the coding sense) which is nonetheless a
> multibyte string (in the sense that multibyte-string-p on it returns t)).

I only used decode-coding-string because I remembered it as an easy
way of creating a multibyte ASCII string when the coding system is
UTF-8; that's all. There was no contradiction in what I said, at
least not an intended one.