Thu, 1 Mar 2001 12:19:20 +0100 (MET)
On Wed, 28 Feb 2001, Bruce Korb wrote:
> Dirk Herrmann wrote:
> > Hello everybody,
> > some time ago I had promised to provide a function scm_c_make_string as a
> > replacement for scm_makstr. This should, in principle, be easy to do.
> > However, I am still thinking about our recent discussion about string
> > representations, and thus realized, that with a variable character length
> > string representation, there is no possibility to deliver something like
> > an 'uninitialized' string:
> I know I'm late to the party, but is it too late to ask for
> both a 'used length' and an 'allocated length'? That's part
> of the reason I have avoided some of your string stuff already.
> I know maximum amount space I am going to need, but the final
> count isn't determinable until I am done formatting the string.
> Is there a convenient way of dealing with this situation without
> an extra copy of the data? If so, it did not jump out at me. :-)
Currently, there isn't. However, if we switch to a shared substring
implementation, this is easily solved: You allocate a 'working' string
with the maximum required capacity. As soon as you are done formatting,
you create a substring of the 'working' string that covers only the part
actually used. Keep in mind that the substring, even if it is very short,
will still keep the whole memory of the 'working' string alive.
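The 'working' string idea might look roughly like the following in C. This
is only a sketch: the types buffer and strview and the functions
make_working, substring and release are hypothetical names invented for
illustration, not part of Guile's API, and plain reference counting stands
in for whatever the garbage collector would actually do.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage shared by several string views. */
typedef struct {
    char  *data;      /* backing memory */
    size_t capacity;  /* allocated length in bytes */
    int    refs;      /* number of views that keep the memory alive */
} buffer;

/* Hypothetical string object: a window into a shared buffer. */
typedef struct {
    buffer *buf;
    size_t  start;    /* offset of the first byte in buf->data */
    size_t  len;      /* used length in bytes */
} strview;

/* Allocate a 'working' string with the maximum required capacity. */
static strview make_working(size_t capacity)
{
    buffer *b = malloc(sizeof *b);
    assert(b != NULL);
    b->data = calloc(capacity ? capacity : 1, 1);
    assert(b->data != NULL);
    b->capacity = capacity;
    b->refs = 1;
    return (strview){ b, 0, capacity };
}

/* Create a substring that shares storage with s; no bytes are copied. */
static strview substring(strview s, size_t start, size_t len)
{
    assert(start <= s.len && len <= s.len - start);
    s.buf->refs++;
    return (strview){ s.buf, s.start + start, len };
}

/* Drop one view; the memory is freed only when the last view is gone. */
static void release(strview s)
{
    if (--s.buf->refs == 0) {
        free(s.buf->data);
        free(s.buf);
    }
}
```

With this layout, a caller would format into the working string, take
substring(work, 0, used) and release the working string. The short result
still pins the full capacity, which is the cost mentioned above.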
Unfortunately, with a variable character width string representation,
things get more complicated again: Every string object holds two length
values, the length in bytes and the length in characters. When you
allocate your 'working' string, it will be initialized somehow, say with n
single-byte characters. What happens if you want to insert a character
that requires more than a single byte (say b bytes)?
One solution is to allocate a new string that has a byte length of
n + b - 1, but still a character length of n. The disadvantage is that
this is time consuming, and it has to happen every time you insert a
character longer than one byte. If you primarily work with ASCII
characters, this does not happen very often, though.
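As a rough illustration of this first solution, the sketch below copies the
string into a new allocation of n + b - 1 bytes. The type vstring and the
functions vstring_fill and replace_by_copy are hypothetical and only model
the two length fields discussed here; they are not taken from Guile.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string with separate byte and character lengths. */
typedef struct {
    unsigned char *bytes;
    size_t byte_len;   /* length in bytes */
    size_t char_len;   /* length in characters */
} vstring;

/* Create a string of n single-byte characters, all equal to c. */
static vstring vstring_fill(size_t n, unsigned char c)
{
    vstring s;
    s.bytes = malloc(n ? n : 1);
    assert(s.bytes != NULL);
    memset(s.bytes, c, n);
    s.byte_len = n;
    s.char_len = n;
    return s;
}

/* Replace the single-byte character at byte offset pos with the b-byte
   sequence mb. Returns a newly allocated string; s is left untouched. */
static vstring replace_by_copy(const vstring *s, size_t pos,
                               const unsigned char *mb, size_t b)
{
    vstring r;
    assert(b >= 1 && pos < s->byte_len);
    r.byte_len = s->byte_len + b - 1;   /* n + b - 1 bytes */
    r.char_len = s->char_len;           /* still n characters */
    r.bytes = malloc(r.byte_len);
    assert(r.bytes != NULL);
    memcpy(r.bytes, s->bytes, pos);                  /* bytes before pos */
    memcpy(r.bytes + pos, mb, b);                    /* the new character */
    memcpy(r.bytes + pos + b, s->bytes + pos + 1,    /* bytes after pos */
           s->byte_len - pos - 1);
    return r;
}
```

Every such replacement copies the whole string, which is where the time
cost comes from.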
A different solution is to use the space of b single-byte characters
for the longer character. This means keeping the byte length of the
string, but decreasing the character length to n - b + 1. This is less
time consuming, but must be done very carefully: If the string is used
somewhere else, code may break. First, it violates the guarantee that
strings in Scheme have a fixed character length. If there is some code
(for example, a closure) out there that has extracted the old length of
the string, that code will break. Second, such a change turns a char*
that formerly pointed at the valid beginning of a single-byte character
into a char* that points into the middle of a multi-byte character. If
there is code (for example in some outer loop) that uses such a char*,
it will break.
Thus, the second solution can only be used if the corresponding string
object is not visible to code anywhere else. For general string objects
it is wise to follow the rule: "Don't overwrite characters of length l1
with characters of a different length l2".
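To make the second solution and its hazards concrete, here is a minimal
sketch. The type packed_string and the functions packed_fill and
overwrite_in_place are hypothetical names for illustration only; the
sketch assumes the b bytes being overwritten currently hold b single-byte
characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string with separate byte and character lengths. */
typedef struct {
    unsigned char *bytes;
    size_t byte_len;   /* length in bytes */
    size_t char_len;   /* length in characters */
} packed_string;

/* Create a string of n single-byte characters, all equal to c. */
static packed_string packed_fill(size_t n, unsigned char c)
{
    packed_string s;
    s.bytes = malloc(n ? n : 1);
    assert(s.bytes != NULL);
    memset(s.bytes, c, n);
    s.byte_len = n;
    s.char_len = n;
    return s;
}

/* Overwrite the b single-byte characters starting at byte offset pos
   with one b-byte character. The byte length stays the same; the
   character length shrinks by b - 1. */
static void overwrite_in_place(packed_string *s, size_t pos,
                               const unsigned char *mb, size_t b)
{
    assert(b >= 1 && pos <= s->byte_len && b <= s->byte_len - pos);
    memcpy(s->bytes + pos, mb, b);
    s->char_len -= b - 1;   /* n becomes n - b + 1 */
}
```

Any code that cached char_len before the call now holds a stale value, and
a pointer to bytes + pos + 1 no longer points at the start of a character.
Both are the breakages described above, so this is only safe while the
string is private to the code that builds it.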
- Re: scm_c_make_string,
Dirk Herrmann <=