Re: string types

From: Bruno Haible
Subject: Re: string types
Date: Fri, 27 Dec 2019 11:51:18 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-170-generic; KDE/5.18.0; x86_64; ; )

Aga wrote:
> I do not know if
> you can (or if it is possible, how it can be done), extract with a way a specific
> a functionality from gnulib, with the absolute necessary code and only that.

gnulib-tool does this. With its --avoid option, the developer can even customize
their notion of "absolutely necessary".

> In a myriad of codebases a string type is implemented at least as:
>   size_t mem_size;
>   size_t num_bytes;
>   char *bytes;

This is actually a string-buffer type. A string type does not need two size_t
members. Long-term experience has shown that using different types for string
and string-buffer is a win, because
  - a string can be put in a read-only virtual memory area, thus enforcing
    immutability (-> reducing multithread problems),
  - providing primitives for string allocation reduces the number of buffer
    overflow bugs that otherwise occur in this area. [1]
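
To make the distinction concrete, here is a minimal sketch of the two types
side by side. All names (string_buffer, string, sb_append, sb_freeze) are
hypothetical and chosen for illustration; this is not a gnulib API. The const
qualifier only gives compile-time immutability; placing the bytes in a
read-only memory area is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types, for illustration only.  */

/* Mutable string-buffer: allocated size plus used size, as in the quoted
   struct.  Invariant: num_bytes <= mem_size.  */
struct string_buffer {
  size_t mem_size;   /* bytes allocated */
  size_t num_bytes;  /* bytes in use */
  char *bytes;
};

/* Immutable string: a single length, contents not modified after creation.  */
struct string {
  size_t num_bytes;
  const char *bytes;
};

/* Append n bytes from src, growing the allocation when needed.
   Returns 0 on success, -1 on allocation failure or size overflow.  */
static int sb_append (struct string_buffer *sb, const char *src, size_t n)
{
  assert (sb->num_bytes <= sb->mem_size);
  if (n == 0)
    return 0;
  if (n > sb->mem_size - sb->num_bytes)
    {
      size_t new_size = sb->mem_size ? sb->mem_size : 16;
      while (new_size - sb->num_bytes < n)
        {
          if (new_size > SIZE_MAX / 2)
            return -1;
          new_size *= 2;
        }
      char *p = realloc (sb->bytes, new_size);
      if (p == NULL)
        return -1;
      sb->bytes = p;
      sb->mem_size = new_size;
    }
  memcpy (sb->bytes + sb->num_bytes, src, n);
  sb->num_bytes += n;
  return 0;
}

/* Hand the accumulated bytes over to an immutable string and reset the
   buffer.  The caller later releases them with free ((char *) s.bytes).  */
static struct string sb_freeze (struct string_buffer *sb)
{
  struct string s = { sb->num_bytes, sb->bytes };
  sb->mem_size = 0;
  sb->num_bytes = 0;
  sb->bytes = NULL;
  return s;
}
```

All bounds checking sits in sb_append, so callers never compute sizes by
hand; once sb_freeze has been called, the result carries no capacity field
and offers no way to append.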

Unfortunately, the common string type in C is 'char *' with NUL termination,
and a different type is hard to establish
  - because developers already know how to use 'char *',
  - because existing functions like printf consume 'char *' strings,
  - because few programs have had the need to correctly handle strings with
    embedded NULs.
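
A small sketch of what goes wrong with embedded NULs, assuming a hypothetical
counted_string type and cs_count_byte helper (neither exists in gnulib or
libc): strlen stops at the first NUL, while a type that carries its length
sees all the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length is stored, not inferred from a
   terminating NUL.  */
struct counted_string {
  size_t num_bytes;
  const char *bytes;
};

/* Count occurrences of byte c, including those after an embedded NUL.  */
static size_t cs_count_byte (struct counted_string s, char c)
{
  assert (s.bytes != NULL || s.num_bytes == 0);
  size_t count = 0;
  for (size_t i = 0; i < s.num_bytes; i++)
    if (s.bytes[i] == c)
      count++;
  return count;
}
```

For the 5 bytes 'a' 'b' NUL 'a' 'b', strlen reports 2, whereas
cs_count_byte finds both occurrences of 'a'.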

> An extended ustring (unicode|utf8) type can include information for its bytes with
> character semantics, like:
>  (utf8 typedef'ed as signed int)
>   utf8 code;   // the integer representation
>   int len;     // the number of the needed bytes
>   int width;   // the number of the occupied cells
>   char buf[5]; // and probably the character representation

Such a type would have a niche use, IMO, because
  - 99% of the processing would not need to access the width (screen columns) - 
    why spend CPU time and RAM to store it and keep it up-to-date?
  - 80% of the processing does not care about the Unicode code points either,
    and libraries like libunistring can do the Unicode-aware processing.
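
As an illustration of computing such information on demand instead of storing
it per character, here is a hand-rolled sketch. The function names are
hypothetical, utf8_count assumes its input is already valid UTF-8, and real
code would more likely call into libunistring.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence introduced by lead byte b, or 0 if
   b cannot start a sequence (continuation byte or invalid lead byte).  */
static int utf8_seq_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (b >= 0xC2 && b <= 0xDF)
    return 2;
  if (b >= 0xE0 && b <= 0xEF)
    return 3;
  if (b >= 0xF0 && b <= 0xF4)
    return 4;
  return 0;
}

/* Number of code points in a byte string, computed only when asked for.
   Counts every byte that is not a continuation byte (10xxxxxx), so the
   result is only meaningful for valid UTF-8 input.  */
static size_t utf8_count (const char *bytes, size_t num_bytes)
{
  assert (bytes != NULL || num_bytes == 0);
  size_t count = 0;
  for (size_t i = 0; i < num_bytes; i++)
    if (((unsigned char) bytes[i] & 0xC0) != 0x80)
      count++;
  return count;
}
```

Code that only copies, compares or concatenates the bytes never pays for
either computation.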

> But the programmer mind would be probably best
> if could concentrate to how to express the thought (with whatever meaning of what we
> are calling "thought") and follow this flow, or if could concentrate the energy to
> understand the intentions (while reading) of the code (instead of wasting self with
> the "details" of the code) and finally to the actual algorithm (usually conditions
> that can or can't be met).

That is the idea behind the container types (list, map) in gnulib. However, I
don't see how to reasonably transpose this principle to string types.


[1] https://lists.gnu.org/archive/html/bug-gnulib/2019-09/msg00031.html
