emacs-devel

Re: [PATCH] add 'string-distance' to calculate Levenshtein distance


From: Eli Zaretskii
Subject: Re: [PATCH] add 'string-distance' to calculate Levenshtein distance
Date: Sat, 14 Apr 2018 20:08:51 +0300

> From: Chen Bin <address@hidden>
> Cc: address@hidden
> Date: Sun, 15 Apr 2018 02:40:18 +1000
> 
> Correct me if I'm wrong.
> 
> I read the code and found the definition of Lisp_String:
>   struct GCALIGNED Lisp_String
>   {
>     ptrdiff_t size;
>     ptrdiff_t size_byte;
>     INTERVAL intervals;               /* Text properties in this string.  */
>     unsigned char *data;
>   };
> 
> I understand that string text is encoded in UTF-8 and stored in
> 'Lisp_String::data'. There is actually no difference between unibyte
> and multibyte text, since UTF-8 is compatible with ASCII and we only deal
> with the 'data' field.

No, that's incorrect.  The difference does exist; it just all but
disappears for unibyte strings encoded in UTF-8.  But if you encode a
string in some other encoding, like Latin-1, you will see a very
different stream of bytes.
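To make the point concrete, here is a small C sketch (not Emacs code) that takes the same four characters, "café", and produces the two byte streams in question.  The helper names encode_utf8 and encode_latin1 are hypothetical; in Emacs Lisp the corresponding step would be encode-coding-string with the utf-8 or latin-1 coding system.  The sketch handles only plain Unicode code points, whereas Emacs's internal multibyte representation is an extension of UTF-8 that also covers raw bytes and characters beyond Unicode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode N Unicode code points as UTF-8 into OUT
   (caller provides at least 4*N bytes).  Returns the byte count.  */
static size_t encode_utf8(const unsigned long *cps, size_t n, unsigned char *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned long cp = cps[i];
    assert(cp <= 0x10FFFF);  /* plain Unicode only, unlike Emacs's wider range */
    if (cp < 0x80) {                      /* ASCII: one byte, same as Latin-1 */
      out[len++] = (unsigned char) cp;
    } else if (cp < 0x800) {              /* two-byte sequence */
      out[len++] = (unsigned char) (0xC0 | (cp >> 6));
      out[len++] = (unsigned char) (0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            /* three-byte sequence (e.g. Hanzi) */
      out[len++] = (unsigned char) (0xE0 | (cp >> 12));
      out[len++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[len++] = (unsigned char) (0x80 | (cp & 0x3F));
    } else {                              /* four-byte sequence */
      out[len++] = (unsigned char) (0xF0 | (cp >> 18));
      out[len++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[len++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[len++] = (unsigned char) (0x80 | (cp & 0x3F));
    }
  }
  return len;
}

/* Hypothetical helper: encode as Latin-1, one byte per character.
   Returns (size_t) -1 if some character has no Latin-1 byte.  */
static size_t encode_latin1(const unsigned long *cps, size_t n, unsigned char *out)
{
  for (size_t i = 0; i < n; i++) {
    if (cps[i] > 0xFF)
      return (size_t) -1;
    out[i] = (unsigned char) cps[i];
  }
  return n;
}
```

For "café" (code points 0x63 0x61 0x66 0xE9), encode_utf8 yields the five bytes 63 61 66 C3 A9, while encode_latin1 yields the four bytes 63 61 66 E9: the same four characters, but different byte streams and different byte counts.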

> I attached the latest patch.

Thanks.

> +  ;; string containing unicode character (Hanzi)
> +  (should (equal 6 (string-distance "ab" "ab我她")))
> +  (should (equal 3 (string-distance "我" "她"))))

Should the distance be measured in bytes or in characters?  I think
it's the latter, in which case the implementation should work in
characters, not bytes.
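Working in characters changes the expected results of the two tests quoted above.  The following C sketch is not the patch's implementation; sketch_distance, utf8_decode and levenshtein are hypothetical names, and it assumes well-formed UTF-8 input.  It runs the same two-row Levenshtein dynamic program over either raw bytes or decoded code points, so the only difference between the two modes is the unit being compared.  Inside Emacs the decoding step would instead go through Emacs's own multibyte-text facilities, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Decode well-formed UTF-8 into code points; returns the count.
   Malformed input trips the assertion in debug builds.  */
static size_t utf8_decode(const char *s, unsigned long *out)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;
  while (*p) {
    unsigned long cp;
    int extra;                                /* continuation bytes to read */
    if (*p < 0x80)      { cp = *p;        extra = 0; }
    else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
    else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
    else                { cp = *p & 0x07; extra = 3; }
    p++;
    for (; extra > 0; extra--, p++) {
      assert((*p & 0xC0) == 0x80);            /* must be a continuation byte */
      cp = (cp << 6) | (*p & 0x3F);
    }
    out[n++] = cp;
  }
  return n;
}

/* Classic Levenshtein distance (insert, delete, substitute, each cost 1)
   over two symbol arrays, keeping only two rows of the DP table.  */
static size_t levenshtein(const unsigned long *a, size_t m,
                          const unsigned long *b, size_t n)
{
  size_t *prev = malloc((n + 1) * sizeof *prev);
  size_t *cur = malloc((n + 1) * sizeof *cur);
  if (!prev || !cur)
    abort();
  for (size_t j = 0; j <= n; j++)
    prev[j] = j;                              /* from the empty prefix of A */
  for (size_t i = 1; i <= m; i++) {
    cur[0] = i;
    for (size_t j = 1; j <= n; j++) {
      size_t del = prev[j] + 1;
      size_t ins = cur[j - 1] + 1;
      size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
      size_t best = del < ins ? del : ins;
      cur[j] = best < sub ? best : sub;
    }
    size_t *tmp = prev; prev = cur; cur = tmp; /* row I becomes "previous" */
  }
  size_t result = prev[n];
  free(prev);
  free(cur);
  return result;
}

/* Hypothetical wrapper: compare S1 and S2 byte by byte (BY_CHAR == 0)
   or character by character (BY_CHAR != 0).  */
static size_t sketch_distance(const char *s1, const char *s2, int by_char)
{
  size_t len1 = strlen(s1), len2 = strlen(s2), m, n;
  /* UTF-8 never yields more code points than bytes, so LEN+1 slots suffice.  */
  unsigned long *a = malloc((len1 + 1) * sizeof *a);
  unsigned long *b = malloc((len2 + 1) * sizeof *b);
  if (!a || !b)
    abort();
  if (by_char) {
    m = utf8_decode(s1, a);
    n = utf8_decode(s2, b);
  } else {
    for (m = 0; m < len1; m++) a[m] = (unsigned char) s1[m];
    for (n = 0; n < len2; n++) b[n] = (unsigned char) s2[n];
  }
  size_t d = levenshtein(a, m, b, n);
  free(a);
  free(b);
  return d;
}
```

Comparing bytes, "ab" vs "ab我她" gives 6 and "我" vs "她" gives 3, which is what the quoted tests expect, because each of these Hanzi is three bytes in UTF-8 and all three bytes differ between them.  Comparing characters, the same pairs give 2 and 1.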


