bug-gnulib

RFC: add a string-desc module


From: Bruno Haible
Subject: RFC: add a string-desc module
Date: Fri, 24 Mar 2023 22:50:17 +0100

In most application areas, it is not a problem if strings cannot contain NUL
bytes, and thus the C type 'char *' with its NUL terminator works well.

In areas where strings with embedded NUL bytes need to be handled, the common
approach is to use a 'char * data' pointer together with a 'size_t nbytes'
size. This works fine in code that constructs or manipulates strings with
embedded NUL bytes. But when it comes to *storing* them, for example in an
array or as key or value of a hash table, one needs a type that combines these
two fields:

  struct
  {
    size_t nbytes;
    char * data;
  }

I propose to add a module that adds such a type, together with elementary
functions that work on them.
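
To make the proposal concrete, here is a minimal sketch of such a type with
two elementary functions. The names sd_t, sd_new_addr and sd_equals are
hypothetical, chosen for this illustration only; the attached string-desc.h
may name and define things differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor type: a byte count plus a pointer.
   The attached string-desc.h may differ.  */
typedef struct
{
  size_t nbytes;
  char *data;
} sd_t;

/* Wrap NBYTES bytes starting at DATA.  No copy is made.  */
static sd_t
sd_new_addr (size_t nbytes, char *data)
{
  sd_t s;
  s.nbytes = nbytes;
  s.data = data;
  return s;
}

/* Compare two descriptors byte by byte.
   Embedded NUL bytes are treated like any other byte.  */
static bool
sd_equals (sd_t a, sd_t b)
{
  return a.nbytes == b.nbytes
         && (a.nbytes == 0 || memcmp (a.data, b.data, a.nbytes) == 0);
}
```

With this, an array element or a hash table key can simply be an sd_t, and
two keys that differ only after an embedded NUL byte compare as different,
which strcmp on the raw 'char *' would not detect.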

Such a type has long been known as a "string descriptor" in VMS. It's also
known as basic_string_view<char> in C++, or as String in Java.

The type that I'm proposing does not always and automatically have a NUL
byte appended to the data, because I think it is more important to have a
string_desc_substring function that does not cause memory allocation
than to have a string_desc_c function (conversion to 'char *') that does
not cause memory allocation.
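
The trade-off can be sketched as follows. This is an illustration under
assumptions, not the code from the attachments: sd_t, sd_substring and
sd_to_c are hypothetical stand-ins for the descriptor type and for the
string_desc_substring and string_desc_c functions mentioned above, whose
real signatures are not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor type, as in the struct shown earlier.  */
typedef struct
{
  size_t nbytes;
  char *data;
} sd_t;

/* Return the bytes [START, END) of S.
   The result points into S's storage: no allocation, no copy.
   This only works because no NUL terminator is required after the data.  */
static sd_t
sd_substring (sd_t s, size_t start, size_t end)
{
  sd_t sub;
  assert (start <= end && end <= s.nbytes);
  sub.nbytes = end - start;
  sub.data = s.data + start;
  return sub;
}

/* Return a freshly allocated, NUL-terminated copy of S, or NULL if
   memory allocation fails.  The caller must free() the result.
   This is the operation that pays for the missing terminator.  */
static char *
sd_to_c (sd_t s)
{
  char *copy = malloc (s.nbytes + 1);
  if (copy != NULL)
    {
      if (s.nbytes > 0)
        memcpy (copy, s.data, s.nbytes);
      copy[s.nbytes] = '\0';
    }
  return copy;
}
```

Had the type guaranteed a trailing NUL byte, sd_to_c could return s.data
directly, but sd_substring would have to copy, since the byte following a
substring generally belongs to the enclosing string.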

The type that I'm proposing does not have two distinct fields
nbytes_used and nbytes_allocated. Such a type, e.g. [1], attempts to
cover the use-case of accumulating a string as well. But
  - The Java experience with String vs. StringBuffer/StringBuilder
    shows that it is cleaner to separate the two use cases.
  - For the use-case of accumulating a string, C programmers have been using
    ad-hoc code with n_used and n_allocated for a long time; there is
    no need for anything else (except for lazy people who want C to be
    a scripting language).
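
For reference, the ad-hoc accumulation idiom referred to above typically
looks like the following sketch. The struct accum and the function
accum_append are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, deliberately separate from the
   string descriptor type.  Initialize with { NULL, 0, 0 }.  */
struct accum
{
  char *buf;
  size_t n_used;
  size_t n_allocated;
};

/* Append N bytes starting at BYTES to *A, growing the buffer
   geometrically when needed.
   Return 0 on success, -1 on failure (*A is then unchanged).  */
static int
accum_append (struct accum *a, const char *bytes, size_t n)
{
  if (n > a->n_allocated - a->n_used)
    {
      size_t needed = a->n_used + n;
      size_t new_alloc = 2 * a->n_allocated;
      char *new_buf;

      if (needed < a->n_used)
        return -1;              /* size_t overflow */
      if (new_alloc < needed)
        new_alloc = needed;     /* also covers overflow of the doubling */
      new_buf = realloc (a->buf, new_alloc);
      if (new_buf == NULL)
        return -1;
      a->buf = new_buf;
      a->n_allocated = new_alloc;
    }
  if (n > 0)
    memcpy (a->buf + a->n_used, bytes, n);
  a->n_used += n;
  return 0;
}
```

Once accumulation is finished, the pair (n_used, buf) is exactly what a
string descriptor stores; n_allocated is no longer of interest.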

The type that I'm proposing also does not have fields for heap management,
such as a 'bool heap' [2] or a reference count. That's because I think that
  - managing the allocated memory of a data structure is a different
    problem than that of representing a string, and it can be achieved
    with data outside the string descriptor,
  - such a field would make it wrong to simply assign a string descriptor
    to a variable.
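
One way to keep memory management outside the descriptor is sketched
below. All names here (sd_t, struct record, record_init, record_free) are
hypothetical; the point is only that ownership lives in a separate field,
so that descriptors can be copied by plain assignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor type, as in the struct shown earlier.  */
typedef struct
{
  size_t nbytes;
  char *data;
} sd_t;

/* Hypothetical container: 'storage' is the single owner of the bytes,
   while 'key' and 'value' merely point into it.  */
struct record
{
  char *storage;
  sd_t key;
  sd_t value;
};

/* Copy KEY and VAL into one heap block owned by *R.
   Return 0 on success, -1 on failure.  */
static int
record_init (struct record *r,
             const char *key, size_t klen,
             const char *val, size_t vlen)
{
  size_t total;

  if (vlen > SIZE_MAX - klen)
    return -1;                  /* size_t overflow */
  total = klen + vlen;
  r->storage = malloc (total > 0 ? total : 1);
  if (r->storage == NULL)
    return -1;
  if (klen > 0)
    memcpy (r->storage, key, klen);
  if (vlen > 0)
    memcpy (r->storage + klen, val, vlen);
  r->key.nbytes = klen;
  r->key.data = r->storage;
  r->value.nbytes = vlen;
  r->value.data = r->storage + klen;
  return 0;
}

/* Release the storage exactly once; the descriptors become invalid.  */
static void
record_free (struct record *r)
{
  free (r->storage);
  r->storage = NULL;
}
```

Code that writes 'sd_t k = r.key;' gets a second view of the same bytes
without touching any ownership state, which would not be safe if the
descriptor itself carried a heap flag or a reference count.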

Please let me know what you think: Does this have a place in Gnulib? (Or
should it stay in GNU gettext, where I need it for the Perl parser?)

Bruno

[1] https://github.com/websnarf/bstrlib/blob/master/bstrlib.txt
[2] https://github.com/maxim2266/str

Attachment: string-desc.h
Description: Text Data

Attachment: string-desc.c
Description: Text Data

