bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not byte

bug-guile

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not byte

From:	Ludovic Courtès
Subject:	bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Date:	Sat, 05 Nov 2022 23:18:26 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux)

Hi,

Rob Browning <rlb@defaultvalue.org> skribis:

> Noticed while investigating a migration to utf-8 strings.  After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
>
> Change the scm_i_str2symbol internal hash type to unsigned long to
> explicitly match the hashing result type.

Oh, good catch.

For the final patch please add a ChangeLog-style entry.

> +  // Make sure a utf-8 symbol has the expected hash.  In addition to
> +  // catching algorithmic regressions, this would have caught a
> +  // long-standing buffer overflow.
> +
> +  // περί
> +  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
> +  SCM sym = scm_from_utf8_symbol (about_u8);
> +
> +  const unsigned long expect = 4029223418961680680;
> +  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));

Is this a documented example of Jenkins?  Or did you use a reference
implementation?

> Hmm.  I suppose the current test could be handled on the scheme side
> instead.  (I'd started off attempting some more direct, elaborate tests
> that didn't pan out.)  Happy to rework that if desired.

Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.

AFAICS this will only change the hash of UTF-8 symbols and won’t have
any effect on the output of ‘string-hash’, right?  If not that would be
an incompatibility.

Thanks and sorry for the delay!

Ludo’.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Ludovic Courtès <=
- bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Rob Browning, 2022/11/06
  - bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Rob Browning, 2022/11/06
  - bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Ludovic Courtès, 2022/11/07
- bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Rob Browning, 2022/11/06
  - bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Ludovic Courtès, 2022/11/07
  - bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Rob Browning, 2022/11/08
    - bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes, Ludovic Courtès, 2022/11/08

Prev by Date: bug#59021: Unbounded heap growth when combining dynamic states & delimited continuation
Next by Date: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Previous by thread: bug#59055: [PATCH] Fix possible deadlock.
Next by thread: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Index(es):
- Date
- Thread