guile-commits

From: Rob Browning
Subject: [Guile-commits] 02/02: scm_i_utf8_string_hash: don't overrun when len is zero
Date: Wed, 11 Dec 2024 12:49:53 -0500 (EST)

rlb pushed a commit to branch main
in repository guile.

commit 35f13806af653ef9ed656708dddcd1d2c8f8da9e
Author: Rob Browning <rlb@defaultvalue.org>
AuthorDate: Sun Jun 30 22:41:40 2024 -0500

    scm_i_utf8_string_hash: don't overrun when len is zero
    
    When the length is zero, the previous code would include the byte after
    the end of the string in the hash.  Fix that (the wide and narrow
    hashers also guard against it via "case 0"), and don't bother mutating
    length for the trailing bytes.
    
    Since we already compute the char length, use that to detect all-ASCII
    strings and follow the same narrow string path that we do for latin-1.
    
    libguile/hash.c (scm_i_utf8_string_hash): avoid overrun when len == 0.
---
 libguile/hash.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/libguile/hash.c b/libguile/hash.c
index ba2a1207d..b7ad03309 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -169,25 +169,26 @@ scm_i_latin1_string_hash (const char *str, size_t len)
 unsigned long 
 scm_i_utf8_string_hash (const char *str, size_t len)
 {
-  const uint8_t *end, *ustr = (const uint8_t *) str;
-  unsigned long ret;
-
-  /* The length of the string in characters.  This name corresponds to
-     Jenkins' original name.  */
-  size_t length;
-
-  uint32_t a, b, c, u32;
-
   if (len == (size_t) -1)
     len = strlen (str);
 
-  end = ustr + len;
+  // FIXME: eventually make fewer passes over str
 
+  const uint8_t *ustr = (const uint8_t *) str;
   if (u8_check (ustr, len) != NULL)
     /* Invalid UTF-8; punt.  */
     return scm_i_string_hash (scm_from_utf8_stringn (str, len));
 
-  length = u8_mbsnlen (ustr, len);
+  /* The length of the string in characters.  This name corresponds to
+     Jenkins' original name.  */
+  size_t length = u8_mbsnlen (ustr, len);
+
+  if (len == length) // ascii, same as narrow_string_hash above
+    return narrow_string_hash ((uint8_t *) str, len);
+
+  const uint8_t * const end = ustr + len;
+  uint32_t a, b, c, u32;
+  unsigned long ret;
 
   /* Set up the internal state.  */
   a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
@@ -205,14 +206,15 @@ scm_i_utf8_string_hash (const char *str, size_t len)
       length -= 3;
     }
 
-  /* Handle the last 3 elements's.  */
+  // Similar to narrow_string_hash().  Handle the last 3 chars; length
+  // cannot be zero because len != length above.
   ustr += u8_mbtouc (&u32, ustr, end - ustr);
   a += u32;
-  if (--length)
+  if (length > 1)
     {
       ustr += u8_mbtouc (&u32, ustr, end - ustr);
       b += u32;
-      if (--length)
+      if (length > 2)
         {
           ustr += u8_mbtouc (&u32, ustr, end - ustr);
           c += u32;
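
The two ideas in the commit message can be illustrated with a small, standalone sketch. It is not the code in libguile/hash.c: the helper names (count_utf8_chars, is_all_ascii, old_tail_reads, new_tail_reads) are hypothetical, and count_utf8_chars is a simplified stand-in for u8_mbsnlen that assumes its input has already been validated as UTF-8. The first pair of functions shows why "len == length" identifies an ASCII-only string; the second pair counts how many tail characters each version of the branch chain would decode for a given remaining length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count code points in a buffer that is ASSUMED to be valid UTF-8 by
   counting every byte that is not a continuation byte (10xxxxxx).
   Hypothetical stand-in for u8_mbsnlen.  */
static size_t
count_utf8_chars (const uint8_t *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      n++;
  return n;
}

/* In valid UTF-8 only ASCII characters occupy a single byte, so the
   byte count equals the character count exactly when the string is
   all ASCII.  The empty string also satisfies this.  */
static int
is_all_ascii (const uint8_t *s, size_t len)
{
  return count_utf8_chars (s, len) == len;
}

/* Number of characters the old tail would decode.  The first read is
   unconditional; with length == 0 the unsigned decrement wraps to
   SIZE_MAX, so both tests succeed.  */
static int
old_tail_reads (size_t length)
{
  int reads = 1;
  if (--length)
    {
      reads++;
      if (--length)
        reads++;
    }
  return reads;
}

/* Number of characters the new tail would decode.  length is not
   mutated; a zero length never gets here in the patched function
   because the empty string takes the ASCII early return.  */
static int
new_tail_reads (size_t length)
{
  int reads = 1;
  if (length > 1)
    {
      reads++;
      if (length > 2)
        reads++;
    }
  return reads;
}
```

Under these assumptions, a remaining length of zero makes the old chain decode three characters from an empty buffer, while lengths one through three give the same count in both versions. Because the empty string compares equal in is_all_ascii, the early return in the patch is what keeps a zero length from reaching the unconditional first read.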


