guile-user

The future: accessing vectors, arrays, etc from C


From: Marius Vollmer
Subject: The future: accessing vectors, arrays, etc from C
Date: 30 Dec 2004 15:56:06 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Hi,

after some procrastination, I have more or less convinced myself to
make accessing vectors and arrays (including uniform numeric vectors
and arrays) more difficult from C.  Here is how and why.  Please
comment!


Right now, Guile's implementation of vectors and arrays is a bit
upside down: arrays are built on top of vectors, with the consequence
that not all one-dimensional arrays can be treated as vectors.  While
fixing this, we should also allow future improvements like
copy-on-write sub-arrays, growable vectors etc.

The first step is to come up with a C API that supports rich array
(and vector) implementations.  (We already have such an API for
strings, but it is not exported since the string implementation needs
to change significantly once more for Unicode support.)

One observation is that we cannot have an API that locks arrays while
the locker is allowed to run arbitrary code.  That would probably lead
very quickly to unmanageable deadlocks.

However, we do want to allow C code access to the raw storage block of an
array, so that it can pass it to external code such as an image
processing library or linear algebra routines.

This can be achieved with a two-level scheme where an array points to
a storage object.  The storage object of an array can change over time
(when the array is copied-on-write, say), but storage objects
themselves always point to the same raw memory.

When accessing an array from C, one extracts the storage object and
then only works with that object.  The raw memory of the storage
object is guaranteed to stay in place as long as the storage object
itself is protected.
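
To make the two-level scheme concrete, here is a minimal sketch in
plain C.  It is only a model: the names 'storage', 'array',
storage_new and array_copy_on_write are hypothetical and are not
Guile's data structures or API.  It shows that once a caller holds
the storage object, its raw memory stays put even when the array
switches to a new storage object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not Guile's real data structures.  */
typedef struct {
  size_t len;
  double *mem;      /* never changes for the lifetime of the storage */
} storage;

typedef struct {
  storage *store;   /* may be replaced over time, e.g. by copy-on-write */
} array;

static storage *
storage_new (size_t len)
{
  storage *s = malloc (sizeof *s);
  assert (s != NULL && len > 0);
  s->len = len;
  s->mem = calloc (len, sizeof *s->mem);
  assert (s->mem != NULL);
  return s;
}

/* Give ARR a private copy of its storage.  The old storage object is
   left untouched, so pointers obtained from it remain valid.  In Guile
   the collector would reclaim it once nothing protects it; this sketch
   simply leaks it.  */
static void
array_copy_on_write (array *arr)
{
  storage *fresh = storage_new (arr->store->len);
  memcpy (fresh->mem, arr->store->mem, fresh->len * sizeof *fresh->mem);
  arr->store = fresh;
}
```

A caller that extracted a.store before the copy keeps a pointer to
valid memory, but writes made through the array afterwards land in
the new storage and are not visible through the old pointer.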

This mechanism is abstracted away via an 'array handle'.  You first
get an array handle and then access the array memory through it.  The
handle needs to be protected: this is most easily done by just placing
it on the stack, and then you don't need to 'release' it, which is
very comfortable.

Here are procedures for dealing with such handles.

  - void scm_array_get_handle (SCM array, scm_t_array_handle *h);

  Fill the array handle H so that it can be used with the procedures
  below.  The handle H must be visible to the garbage collector,
  therefore H must point to a struct on the stack.  See also
  scm_array_handle_copy.

  When ARRAY is not an array, an error is signalled.  All kinds of
  arrays are acceptable, including uniform numeric arrays, strings,
  and bitvectors.  To restrict the type of the array, use one of the
  scm_array_handle_*_elements functions.


  - void scm_vector_get_handle (SCM vec, scm_t_array_handle *);

  Like scm_array_get_handle, but only accepts one-dimensional arrays.


  - scm_t_array_handle *scm_array_handle_copy (scm_t_array_handle *H);

  Make a copy of H and return it.  The copy must be freed with
  scm_array_handle_free.  This function might be useful in
  situations where you cannot allocate your handle on the stack.


  - void scm_array_handle_free (scm_t_array_handle *H);

  Free the array handle H, which _must_ have been created with
  scm_array_handle_copy.  Normal handles that are allocated on the
  stack _must_ _not_ be freed with this procedure.


  - size_t scm_array_handle_rank (scm_t_array_handle *);
  - scm_t_array_dimension *scm_array_handle_dims (scm_t_array_handle *);

  These procedures deliver information about the storage layout of the
  array, to be detailed elsewhere.


  - SCM scm_array_handle_ref (scm_t_array_handle *, size_t pos);

  Return the value at position POS in the storage vector of the
  handle.  POS can be computed from the layout information above and
  must be valid; no range checking is done.

  This function works for all kinds of arrays.


  - void scm_array_handle_set (scm_t_array_handle *, size_t pos, SCM val);

  Set the value at position POS in the storage vector of the handle to
  VAL.


  - const SCM *scm_array_handle_elements (scm_t_array_handle *);

  Return a pointer to the raw memory of a generic (non-uniform) array
  for reading.  When the array is not a generic one, signal an error.

  This pointer is valid as long as the handle is protected.  It is
  possible that the representation of the original array changes (in a
  copy-on-write operation, say) and in that case the pointer returned
  by this function will still be valid, but will no longer belong to
  the original array.  Thus, you might miss modifications to the
  array.  It is therefore best to refresh the pointer by a new call to
  this function from time to time.  Exactly how often is up to you.


  - SCM *scm_array_handle_writable_elements (scm_t_array_handle *);
  
  Like scm_array_handle_elements, but returns a pointer that is good
  for reading and writing.

  - size_t scm_array_handle_element_size (scm_t_array_handle *);
  - const void *scm_array_handle_untyped_elements (scm_t_array_handle *);
  - void *scm_array_handle_untyped_writable_elements (scm_t_array_handle *);

  Like the above, but these work with any kind of array.  You are not
  allowed to interpret the values, but you can copy them around with
  memcpy, say.


  - const scm_t_uint8 *scm_array_handle_u8_elements (scm_t_array_handle *);
  - scm_t_uint8 *scm_array_handle_u8_writable_elements (scm_t_array_handle *);
  - const scm_t_int8 *scm_array_handle_s8_elements (scm_t_array_handle *);
  - scm_t_int8 *scm_array_handle_s8_writable_elements (scm_t_array_handle *);
  - ETC

  Like scm_array_handle_elements and scm_array_handle_writable_elements,
  but for uniform numeric arrays.


  - const scm_t_uint32 *scm_array_handle_bit_elements (scm_t_array_handle *);
  - scm_t_uint32 *scm_array_handle_bit_writable_elements (scm_t_array_handle *);

  For bitvectors.
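
As an illustration of what the untyped accessors allow, here is a
small sketch in plain C.  It does not call Guile: the function
gather_elements is hypothetical, and its parameters merely stand in
for what scm_array_handle_untyped_elements,
scm_array_handle_element_size and the layout information would
provide.  The stride is assumed to be counted in elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN elements of ELT_SIZE bytes each from SRC to DST.  SRC is
   walked with a stride of INC elements; DST is filled contiguously.
   The element values are never interpreted, only moved.  */
static void
gather_elements (void *dst, const void *src,
                 size_t elt_size, size_t len, size_t inc)
{
  unsigned char *out = dst;
  const unsigned char *in = src;
  size_t i;

  assert (dst != NULL && src != NULL);
  for (i = 0; i < len; i++)
    memcpy (out + i * elt_size, in + i * inc * elt_size, elt_size);
}
```

Because only the element size is needed, the same loop serves
generic, uniform numeric and other arrays alike.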


A typical function that optimizes for f64vectors:

  #include <math.h>
  #include <libguile.h>

  double
  vector_norm (SCM vec)
  {
    scm_t_array_handle h;
    scm_t_array_dimension *dim;
    size_t i;
    double sum = 0;

    /* The handle lives on the stack, so it needs no explicit release.  */
    scm_vector_get_handle (vec, &h);
    dim = scm_array_handle_dims (&h);

    if (scm_is_true (scm_f64vector_p (vec)))
      {
        /* Fast path: walk the raw doubles, stepping by the stride.  */
        const double *elts;

        elts = scm_array_handle_f64_elements (&h);
        for (i = 0; i < dim->len; i++, elts += dim->inc)
          sum += elts[0]*elts[0];
      }
    else
      {
        /* General path: works for any kind of one-dimensional array.  */
        size_t pos = 0;
        for (i = 0; i < dim->len; i++, pos += dim->inc)
          {
            double elt = scm_to_double (scm_array_handle_ref (&h, pos));
            sum += elt*elt;
          }
      }

    return sqrt (sum);
  }
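
The example above only deals with rank 1.  For higher ranks, the
position passed to scm_array_handle_ref would have to be computed
from all dimensions.  The following plain C sketch shows one way this
could look.  It is an assumption, not the final layout: the struct
'dimension' is a hypothetical stand-in for scm_t_array_dimension with
just the 'len' and 'inc' fields used above, indices are taken to
start at zero, and no base offset is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for scm_t_array_dimension; only the two
   fields used by the vector_norm example are modelled.  */
typedef struct {
  size_t len;   /* number of elements along this dimension */
  size_t inc;   /* distance, in elements, between neighbours */
} dimension;

/* Return the storage position of the element with indices IDX in an
   array of rank RANK described by DIMS.  */
static size_t
storage_position (const dimension *dims, size_t rank, const size_t *idx)
{
  size_t pos = 0;
  size_t k;

  for (k = 0; k < rank; k++)
    {
      assert (idx[k] < dims[k].len);
      pos += idx[k] * dims[k].inc;
    }
  return pos;
}
```

For a 2x3 array stored row by row, the dimensions would be
{ 2, 3 } and { 3, 1 }, and element (1, 2) would sit at position
1*3 + 2*1 = 5.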


There are and will be alternative and simpler ways to access vectors.
The first is just to use scm_c_vector_ref and scm_c_vector_set_x.  A
second is to only work with 'simple' vectors.  A simple vector is what
we have now: a simple, non-changing pointer to memory.  You can use
the macros SCM_SIMPLE_VECTOR_REF and SCM_SIMPLE_VECTOR_SET with them
but you don't get the full generality.


So what do you say?  Is something like the above acceptable?  Too
involved?  Are there holes in the thinking?

Again, the point is to make it relatively easy to write very general
code when dealing with vectors and arrays, to allow for future
improvements to the array implementation (maybe to the point of
unifying it with the string implementation) and to be thread-safe.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405



