[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] add 'string-distance' to calculate Levenshtein distance
From: |
Eli Zaretskii |
Subject: |
Re: [PATCH] add 'string-distance' to calculate Levenshtein distance |
Date: |
Sat, 14 Apr 2018 16:24:41 +0300 |
[Please CC the mailing list when you respond, so others could see your
messages.]
> From: Chen Bin <address@hidden>
> Date: Sat, 14 Apr 2018 21:01:46 +1000
>
> Hi, Eli,
> Thanks for the review.
>
> I fixed most issues except two things.
>
> 1. In Asia, it's possible to cacluate distance between one unibyte and
> one multibyte string. As a Chinese, I might create a document containing
> Hanzi characters whose file name is obviously multibyte string. I may
> need get the distance of this document to a file named "README.txt".
If you mean unibyte pure-ASCII strings, then I agree. But that
doesn't mean we should avoid the text altogether, because we might
compute non-zero distance between a string and its encoded unibyte
variant, which will confuse users. At the very least the doc string
should say something about this.
> 2. Algorithm is based on
> https://en.wikibooks.org/w/index.php?title=Algorithm_Implementation/Strings/Levenshtein_distance&stable=0#C
> It's optimized to use O(min(m,n)) space instead of O(mn).
> Say you compare two string whose string length is 512 bytes.
> You only need allocate 512 bytes instead of 262K (512*512)
> in memory.
>
> Please check attached patch for latest code.
>
> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -463,6 +463,8 @@ x-lost-selection-hooks, x-sent-selection-hooks
> +++
> ** New function assoc-delete-all.
>
> +** New function string-distance.
This should mention Levenshtein distance.
> +DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2, 2, 0,
> + doc: /* Return Levenshtein distance of STRING1 and STRING2.
^^^^^^^^^^^^^^^^^^^^^^
"between STRING1 and STRING2"
> + unsigned int s1len, s2len, x, y, lastdiag, olddiag;
These variables should be declared EMACS_INT, not unsigned int,
because the size of Emacs strings can be larger than UINT_MAX,
especially on 64-bit systems.
> + unsigned int *column = SAFE_ALLOCA ((s1len + 1) * sizeof (unsigned int));
Likewise here.
> + char *s1 = SSDATA (string1);
> + char *s2 = SSDATA (string2);
> +
> + unsigned int s1len, s2len, x, y, lastdiag, olddiag;
> + s1len = strlen(s1);
> + s2len = strlen(s2);
You could optimize the code by using SCHARS and SBYTES, instead of
calling strlen.
> +(ert-deftest subr-tests--string-distance ()
> + "Test `string-distance' behavior."
> + (should (equal 1 (string-distance "heelo" "hello")))
> + (should (equal 2 (string-distance "aeelo" "hello")))
> + (should (equal 0 (string-distance "ab" "ab")))
> + (should (equal 1 (string-distance "ab" "abc"))))
Could you please add a test or two with non-ASCII characters?
Thanks.
- [PATCH] add 'string-distance' to calculate Levenshtein distance, Chen Bin, 2018/04/13
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Eli Zaretskii, 2018/04/14
- Message not available
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance,
Eli Zaretskii <=
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Chen Bin, 2018/04/14
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Eli Zaretskii, 2018/04/14
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Chen Bin, 2018/04/15
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Eli Zaretskii, 2018/04/15
- Message not available
- Message not available
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, chen bin, 2018/04/16
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Eli Zaretskii, 2018/04/17
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, chen bin, 2018/04/18
- Message not available
- Message not available
- Message not available
- Message not available
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, chen bin, 2018/04/17
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, Eli Zaretskii, 2018/04/19
- Re: [PATCH] add 'string-distance' to calculate Levenshtein distance, chen bin, 2018/04/19