Re: Inconsistent HIGH values with 'ARRAY OF CHAR'


From: Benjamin Kowarsch
Subject: Re: Inconsistent HIGH values with 'ARRAY OF CHAR'
Date: Thu, 11 May 2023 21:13:41 +0900


On Thu, 11 May 2023 at 18:40, Rudolf Schubert wrote:

        I find handling of C-strings somehow 'difficult' or even ugly
        because of this terminating 0C. The idea of terminating a string
        (or as we say an ARRAY OF CHAR) with a 0C immediately shows 2
        disadvantages:

        1. we do not know the length of the string until we search the
        terminating 0C. Wouldn't it be much simpler just to have another
        variable which tells us the length?
        2. with 0C having this special meaning, 0C itself can never be
        part of our ARRAY. But sometimes this might be useful!

        But of course we now must live with this situation. So if we
        want to deal with 'real' C-strings we need this terminating 0C.
        Either as a 'real' CHAR which we put somewhere into memory or
        as some 'virtual' thing which we 'imagine' would be located
        beyond the very end of our ARRAY OF CHAR.

        And thus at different occasions we have no other chance than
        generating this 0C and putting it at the end of the ARRAY.

We solved this problem in M2 R10 by looking back at Pascal for inspiration.

All of our arrays, not only character arrays, are always represented
internally by a record with two fields: one that contains the element count,
which for a character array is the string length, and one that holds the
payload, that is the actual array of values, characters in the case of a
character array.

Thus, a declaration of the form

TYPE String = ARRAY 80 OF CHAR;

will internally create a record of the form

TYPE String = RECORD
  length : LONGCARD;
  data : ARRAY 80 OF CHAR
END; (* String *)

A pervasive function LENGTH() then returns the value of the length field,
and a pervasive function CAPACITY() returns the allocated maximum.

This way, you do not have to scan the content of the string for any terminator
in order to determine the length of the string.

Nevertheless, we still add an ASCII NUL terminator at the end of each
character array in order to make interfacing with C safe. If anybody were to
pass such a character array to a C function that expects a char pointer or a
char array, it would be safe as long as the function respects the NUL
terminator convention.
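
To illustrate the idea from the C side, here is a minimal sketch of a length
field combined with a NUL-terminated payload. It is not the M2 R10 compiler's
actual layout. The names String80, STRING80_CAPACITY, string80_assign and
string80_length are hypothetical, and reserving one extra byte for the
terminator is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRING80_CAPACITY 80   /* hypothetical capacity, mirrors ARRAY 80 OF CHAR */

/* Hypothetical C view of the record: a length field plus the payload.
   One extra byte is reserved (an assumption of this sketch) so that the
   payload can always hold a NUL terminator. */
typedef struct {
    size_t length;                       /* number of characters stored */
    char   data[STRING80_CAPACITY + 1];  /* payload plus terminator */
} String80;

/* Copy a C string into the record, truncating to the capacity.
   The terminator is always written, so data remains a valid C string. */
static void string80_assign(String80 *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > STRING80_CAPACITY) {
        n = STRING80_CAPACITY;
    }
    memcpy(s->data, src, n);
    s->data[n] = '\0';
    s->length = n;
}

/* Constant time: read the stored length instead of scanning for NUL. */
static size_t string80_length(const String80 *s)
{
    return s->length;
}
```

With this layout, string80_length() never scans the payload, while s.data can
still be handed to strlen(), puts() or any other C function that relies on
the terminator.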

As a nice side effect, we found that we could relax the type regime and
make all arrays follow what we call value type equivalence: any two arrays
are compatible for copying and for parameter passing as long as their value
types are compatible.

And as a result of this, it is possible to do array slicing such as

arrayA := arrayB[n..m];

and array concatenation such as

arrayA := arrayB & arrayC;

as long as the value types of the arrays are compatible.

And this then encouraged us to take it one step further and invent
syntax for insertion

arrayA[n..] := { value1, value2, value3, ... };

Of course, none of this will do you any good when using PIM or ISO Modula-2.

HOWEVER, there is nothing that stops you from implementing a string library
that uses the same record representation we do internally. It would just be
in the open and user defined.

TYPE MyString80 = RECORD
  length : CARDINAL;
  data : ARRAY [0..79] OF CHAR (* PIM and ISO require an index range *)
END; (* MyString80 *)

You could either do this as a statically allocated type, or you could do it as
a dynamically allocated and possibly opaque type.

I think I had already posted a link for you to a string library I had written in
classic Modula-2 with both a version for PIM and another for ISO.

https://github.com/m2sf/m2bsk/blob/master/src/lib/String.def
https://github.com/m2sf/m2bsk/blob/master/src/lib/imp/String.pim.mod
https://github.com/m2sf/m2bsk/blob/master/src/lib/imp/String.iso.mod

This library not only implements variable-length, dynamically allocated strings
but also maintains a dictionary of key/value pairs in which the keys are the
strings and the values are pointers to the records that store each string's
length and data fields. This technique is called string interning: each string
is stored only once, and a lookup for a string always returns the same pointer.
An additional benefit is that two strings can be compared by simple pointer
comparison, thus

IF str1 = str2 THEN ...
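
The interning idea can be sketched in a few lines of C. This is not the code
of the library linked above. The names IString, POOL_MAX and intern are
hypothetical, and a small fixed-size table with a linear search stands in for
a real dictionary in order to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned string record: length plus NUL-terminated data. */
typedef struct {
    size_t length;
    char  *data;
} IString;

#define POOL_MAX 64   /* hypothetical limit; a real library would use a hash table */

static IString *pool[POOL_MAX];
static size_t   pool_count = 0;

/* Return the unique record for the given text, creating it on first use.
   Returns NULL if the pool is full or memory allocation fails. */
static const IString *intern(const char *text)
{
    assert(text != NULL);
    size_t n = strlen(text);

    /* Look for an existing record with identical content. */
    for (size_t i = 0; i < pool_count; i++) {
        if (pool[i]->length == n && memcmp(pool[i]->data, text, n) == 0) {
            return pool[i];
        }
    }
    if (pool_count == POOL_MAX) {
        return NULL;
    }

    /* Not found: store a private copy exactly once. */
    IString *s = malloc(sizeof *s);
    if (s == NULL) {
        return NULL;
    }
    s->data = malloc(n + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, text, n + 1);   /* copies the terminator as well */
    s->length = n;
    pool[pool_count++] = s;
    return s;
}
```

Because intern() hands out one record per distinct string, comparing two
interned strings for equality reduces to comparing the two pointers.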

You may neither need nor want the interning of strings, but since you seem to
be struggling with string handling, why not just use the library? You can of
course modify it and remove the dictionary and interning component from it.

regards
benjamin

PS for Gaius: Maybe it is time for you to take another look at the completed
specification and implement it as an additional dialect in GM2 ;-)

https://github.com/m2sf/m2c/wiki/Language-Specification

I may be wrong, but it seems to me that you can work on GM2 on university
paid time, which would be a serious advantage over me. I can only work very
sporadically on my compiler. These days I am no longer even able to work in
IT, where I could take a fairly well paid contract and then take a break for
a few months to work on something I like doing, such as my compiler. I am
"too old" to be recruitable even as a contractor now. Instead, I run a small
artisan bakery, which takes up quite a bit of my time and doesn't leave much
for private interests.

