Re: review/merge request: wip-array-refactor

guile-devel

From:	Andy Wingo
Subject:	Re: review/merge request: wip-array-refactor
Date:	Tue, 04 Aug 2009 14:21:08 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.0.92 (gnu/linux)

Hi Neil,

On Thu 30 Jul 2009 23:10, Neil Jerram <address@hidden> writes:

> Andy Wingo <address@hidden> writes:
>
>> On Wed 22 Jul 2009 23:48, Neil Jerram <address@hidden> writes:
>>
>>> I have two overall questions in mind.
>>>
>>> - What do you have in mind as regards releasing this?  Even though it
>>>   looks good, I think it would be better to let it mature for a while,
>>>   and hence not to put it into 1.9.x/2.0.  (And we're not short of new
>>>   stuff in 1.9.x/2.0!)
>>
>> Personally I would prefer that it come out in 2.0. I'm fairly (but not
>> entirely) confident of its consistency as it is, and quite confident
>> that it is a more maintainable, and hence fixable, codebase.
>
> I could be wrong, but I don't intuitively feel comfortable with that.
> It just feels too quick/early.
>
> On the other hand, I think this is really valuable work, and
> absolutely don't want an interval of years or months before it gets
> out there.
>
> What is our release plan after 2.0?  I don't know.  I'd like something
> more dynamic than the very long intervals between major releases that
> we've had in the past.  But it seems there is a conflict between
>
> - major releases being the points at which we can break the API/ABI
>   (with accompanying documentation)
>
> - wanting to have such releases more frequently than in the past, so
>   that good new stuff gets out quicker
>
> - wanting not to create grief for Guile users by changing the API/ABI
>   frequently.
>
> Is there a solution?

I don't know of one, no.

I know of two models that work. The first is when you're just starting
to develop a library, downstream users are making the first cuts at
their software too, everything is in froth, and people are willing to
adapt to API or ABI changes. The library is not widely distributed, so
changes affect few people other than the developers, such as
distributors or users.

The second model is when you already have a wide deployed base. You can
make additions to your API and ABI, and deprecate old API or ABI, but
you can't remove old API or change the ABI. Incompatible breaks are
painful, and the switching-over time is somewhere between a year and
three years. The right length of a stable series seems to be about 4 or
5 years.

So in the second condition, where Guile seems to be, we need to mostly
preserve API and ABI, though we can remove the deprecated bits every few
years. But new API or ABI has to be accompanied with lots of thought,
because you have to support it for 5 years or more.

Dunno, I'm babbling, but the thing is that I feel like if there are
changes that need making, we should make them now. Like Mark's %nil
work. My perception is that we won't have another chance for another few
years.

Unless of course, distros miss 2.0 altogether, like Python has done with
3.0 and 3.1... We could do that. Seems like needless churn, but perhaps
it's necessary to get the wider exposure.

>> The reason that I want it in is that bytevectors are a nice api for I/O,
>> but once you have data in memory, often it's best to treat it as a
>> uniform array -- be it of u8 color components, or s32 audio samples.
>>
>> Uniform vectors are almost by nature "in flight" between two places.
>
> (Not sure I agree.  I'd say uniform vectors are mostly holding numbers
> in a computation, or for plotting on a graph.)

But how do you plot? If you use some sort of external software, you have
two options: code your plotting in C, and loop over the data with the C
API. Or do it in Scheme, and... loop over the s16vector, writing each
sample individually? How do you get at the bits of the s16vector so they
can be written to a port? Use the impoverished uniform-vector-write?

(rnrs bytevectors) combined with (rnrs io ports) is the best way to get
numeric data into and out of a process, from Scheme. But -- the uniform
vector API is the best API for dealing with that data from Scheme.
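
As a rough illustration of the I/O half of that claim, here is a minimal sketch, assuming a Guile that provides (rnrs bytevectors) and (rnrs io ports). The sample values and the names `samples' and `written' are made up for the example, and an in-memory port stands in for a real file or socket.

```scheme
(use-modules (rnrs bytevectors)
             (rnrs io ports))

;; Pack three s16 samples into a bytevector, in native byte order.
;; The values are illustrative only.
(define samples (make-bytevector 6 0))
(bytevector-s16-native-set! samples 0 100)
(bytevector-s16-native-set! samples 2 -200)
(bytevector-s16-native-set! samples 4 300)

;; Write the whole block to a binary port in one call, rather than
;; looping over the samples one at a time.
(define written
  (call-with-bytevector-output-port
    (lambda (port)
      (put-bytevector port samples))))

(bytevector-length written)   ; => 6
```

The point of the sketch is only that the bytes go out in a single put-bytevector call; no per-sample loop is needed.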

> That sounds to me like motivation for adding a richer API to
> bytevectors (and which could all be in Scheme), not necessarily for
> the deep unification of uniform and byte vectors that you have coded.

There is e.g.:

   (bytevector-s16-native-ref bv n)

or

   (bytevector-s16-ref bv n (endianness big))

But that `n' is in bytes, not in elements. If you really want to treat
the bytevector as a numeric array, you're better off with the SRFI-4
API. It is a better API. There's no reason why the SRFI-4 API could not
apply to bytevectors:

   (s16vector-ref bv n) == (bytevector-s16-native-ref bv (* n 2))

Also, only srfi-4 vectors have a read syntax like #s16(1 2 3). You can't
express that with bytevectors, because you would have to encode the
endianness into your source file.
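
The byte-offset versus element-index distinction above can be sketched as follows. This assumes Guile with (rnrs bytevectors) and (srfi srfi-4); `s16-element-ref' is a hypothetical helper written for this example, not part of either library.

```scheme
(use-modules (rnrs bytevectors)
             (srfi srfi-4))

;; Hypothetical helper: fetch the N-th s16 element of BV by scaling
;; the element index to a byte offset (2 bytes per element).
(define (s16-element-ref bv n)
  (bytevector-s16-native-ref bv (* n 2)))

(define bv (make-bytevector 6 0))
(bytevector-s16-native-set! bv 2 -200)    ; byte offset 2 is element 1

(s16-element-ref bv 1)                    ; => -200
(s16vector-ref (s16vector 0 -200 0) 1)    ; => -200
```

With the SRFI-4 accessor the scaling by the element size disappears from user code, which is the convenience being argued for.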

> TBH, with your refactoring up to this point, I still don't have the
> overall picture (arrays, uniform vectors, bitvectors, strings etc)
> firmly in my head.  I'd like to do that and then reconsider your
> points above.

There are two things. One is a generic API for accessing arrays, using
array handles. The second is a rebase of srfi-4 vectors on top of
bytevectors.

>>>>   (u8vector-ref #u32(#xffffffff) 0) => 255
>
> Note that using #xffffffff here glosses over the endianness problem.

Of course. Fortunately there is a sensible interpretation -- that the
u32vector is in native-endianness. The alternative is this:

   (let ((bv (make-bytevector 4)))
     (bytevector-u32-native-set! bv 0 #xffffffff)
     (bytevector-u8-ref bv 0))

Which is actually less efficient. You could of course do:

   (bytevector-u8-ref #u32(#xffffffff) 0) => 255
   (bytevector-u8-ref #u32(#x01234567) 0) => ?

if that would be your preference; the latter answer is just as
endianness-dependent as if you used the `let' idiom above to ref the
value.
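
To make the `?' above concrete, here is a small sketch, assuming only (rnrs bytevectors). It uses the `let'-style bytevector setup rather than the #u32 literal, so it does not depend on the behavior proposed in this branch; the names `first-byte' and `expected-first-byte' are invented for the example.

```scheme
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 4 0))
(bytevector-u32-native-set! bv 0 #x01234567)

;; The first byte in memory depends on the host byte order.
(define first-byte (bytevector-u8-ref bv 0))

;; #x67 on a little-endian host, #x01 on a big-endian one.
(define expected-first-byte
  (if (eq? (native-endianness) (endianness little))
      #x67
      #x01))

(= first-byte expected-first-byte)   ; => #t
```

With #xffffffff every byte is 255, which is why that particular value hides the question.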

> (I think my inclination at this point is that I'd prefer explicit
> conversions.)

When it matters, I would think that the bytevector API is sufficiently
explicit for anyone. Note that the accessors for values more than 8
bits wide come in two flavors:

  bytevector-s16-ref bv n endianness
  bytevector-s16-native-ref bv n

So you have all the power available to you.
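
A minimal sketch of the two flavors, assuming only (rnrs bytevectors); the byte values and the `as-...' names are chosen just for illustration:

```scheme
(use-modules (rnrs bytevectors))

;; Two bytes in memory: #x01 followed by #x02.
(define bv (make-bytevector 2 0))
(bytevector-u8-set! bv 0 #x01)
(bytevector-u8-set! bv 1 #x02)

;; Explicit endianness: the caller decides how the bytes are read.
(define as-big    (bytevector-s16-ref bv 0 (endianness big)))     ; => 258 (#x0102)
(define as-little (bytevector-s16-ref bv 0 (endianness little)))  ; => 513 (#x0201)

;; Native flavor: matches whichever of the two the host uses.
(define as-native (bytevector-s16-native-ref bv 0))
```

The explicit form is what you would reach for with a wire format; the native form is the one that lines up with SRFI-4 style access.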

Or... we could indeed prohibit (u8vector-ref #u32(0) 0). But there
doesn't seem to be a point. Why bother?

>> I ought to be able to get at the bits of a packed (uniform) vector. The
>> whole point of being a uniform vector is to specify a certain bit layout
>> of your data.
>
> Huh?  I would say it is to be able to store numbers with a given range
> (or precision) efficiently, and to be able to access them efficiently
> from both Scheme and C.

Note that for access, an f64vector is almost certainly less efficient
than a Scheme vector of reals, from Scheme, due to the need to
heap-allocate the f64 values as you ref them.

I've written lots of code that deals with srfi-4 vectors. I have three
kinds of use cases. First is data being shoved around in a
dynamically-typed system: dbus messages, gconf values, a system we use
at work, etc. Second, but related, is dealing with chunks of data that
come from elsewhere, like GDK pixbufs, or GStreamer buffers. Third is
hacking compilers, as in Guile itself, or emitting machine code for
other machines.

In all of these cases, the data doesn't just stay in Guile. It either
comes from somewhere else or ends up going somewhere else. The semantics
that are implemented in this patch set actually help all of these cases,
and make Scheme more powerful -- it's not just C any more that can get
at the bits of an array. It allows me to code less in C and more in
Scheme.

>>>>   (u8vector? #u32(#xffffffff)) => #f
>>
>> However, we need to preserve type dispatch:
>>
>>   (cond
>>     ((u8vector? x) ...)
>>     ((s8vector? x) ...)
>>     ...)
>>
>>>>   (bytevector? #u32(#xffffffff)) => #t
>>
>> This to me is like allowing (integer? 1) and (integer? 1.0); in
>> /essence/ a #u32 is an array of bytes, interpreted in a certain way.
>
> I think you have in mind that all uniform vectors are filled by
> reading from a port, or are destined for writing out to a port.
>
> That is an important use, but there is another one: preparing
> numerical data for handling in both C and Scheme.  In this use, the
> concept of an underlying array of bytes plays no part.

You are correct. But the other use cases I mentioned are no less valid.

In summary... I don't mean to be a bore, but I really don't like the
existing unif.c and srfi-4.c. They are painful to understand and to hack
on. I think those bits should be merged.

I also think that srfi-4 vectors should be implemented in terms of
bytevectors, for the reasons above. If you really want to, we can
prohibit u8vector-ref from operating on u32vectors, but that seems
unnecessary to me. I also think that the behavior as implemented in
wip-array-refactor should go in, for 2.0 -- because we just won't have
another chance in the next few years. Insufficient testing isn't really a
valid concern IMO, because how else is it going to get tested?

But I do appreciate your input, and decisions.

Cheers,

Andy
-- 
http://wingolog.org/
