guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Add 'bytevector-slice'.


From: Maxime Devos
Subject: Re: [PATCH] Add 'bytevector-slice'.
Date: Thu, 12 Jan 2023 01:10:57 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1

This looks useful to me, especially once the optimiser recognises 'bytevector-slice'. (In Scheme-GNUnet, I defined a wrapper around bytevectors called 'bytevectors slices' to implement such a thing.)

The only thing missing for me, is a procedure 'bytevector-slice/read-only' and 'bytevector-slice/write-only', then I could throw the module implementing the wrapping in Scheme-GNUnet and the overhead incurred by wrapping away.

On 11-01-2023 16:00, Ludovic Courtès wrote:
+@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
+@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
+Return the slice of @var{bv} starting at @var{offset} and counting
+@var{size} bytes.  When @var{size} is omitted, the slice covers all
+of @var{bv} starting from @var{offset}.  The returned slice shares
+storage with @var{bv}: changes to the slice are visible in @var{bv}
+and vice-versa.
+
+When @var{bv} is actually a SRFI-4 uniform vector, its element
+type is preserved unless @var{offset} and @var{size} are not aligned
+on its element type size.
+@end deffn
I wrote stuff here about 'What if offset or size unaligned? This is undocumented.', and only later noticed this paragraph documenting exactly that.

+
+Here is an example showing how to use it:
+
+@lisp
+(use-modules (rnrs bytevectors)
+             (rnrs bytevectors gnu))

I thought that R6RS reserved (rnrs ...) to the RnRS process, so I would think that naming it (rnrs bytevectors gnu) would be standards-incompliant, though I cannot find in the specification, so maybe it isn't actually reserved.

(SRFI looks a bit looser to me, so I find the (srfi ... gnu) acceptable, but (rnrs ... gnu) looks weird to me, I would propose (ice-9 bytevector-extensions) or such.).

+
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 6a81a0893..8414c3e2d 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -13,7 +13,7 @@
  @copying
  This manual documents Guile version @value{VERSION}.
-Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
+Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,


Where does this year 2022 come from? Does a previous version of this patch predate the new year?


diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index bbc23f449..6b920c88a 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -1,4 +1,4 @@
-/* Copyright 2009-2015,2018-2019
+/* Copyright 2009-2015,2018-2019,2022-2023
       Free Software Foundation, Inc.
Ditto.

     This file is part of Guile.
@@ -325,6 +325,73 @@ scm_c_take_typed_bytevector (signed char *contents, size_t 
len,
    return ret;
  }
+SCM_DEFINE (scm_bytevector_slice, "bytevector-slice", 2, 1, 0,
+            (SCM bv, SCM offset, SCM size),
+            "Return the slice of @var{bv} starting at @var{offset} and 
counting\n"
+            "@var{size} bytes.  When @var{size} is omitted, the slice covers 
all\n"
+            "of @var{bv} starting from @var{offset}.  The returned slice 
shares\n"
+            "storage with @var{bv}: changes to the slice are visible in 
@var{bv}\n"
+            "and vice-versa.\n"
+            "\n"
+            "When @var{bv} is actually a SRFI-4 uniform vector, its element\n"
+            "type is preserved unless @var{offset} and @var{size} are not 
aligned\n"
+            "on its element type size.\n")
+#define FUNC_NAME s_scm_bytevector_slice
+{
+  SCM ret;
+  size_t c_offset, c_size;
+  scm_t_array_element_type element_type;
+
+  SCM_VALIDATE_BYTEVECTOR (1, bv);
+
+  /* FIXME: Until 3.0.8 included, the assembler would not set the
+     SCM_F_BYTEVECTOR_CONTIGUOUS flag on literals.  Thus, ignore it and
+     assume BV is contiguous (how could it not be anyway?).  */
+#if 0
+  if (!SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, bv, "contiguous bytevector");
+#endif


I don't know what's up with that, I'm leaving this fragment of code for other to review.

+
+  c_offset = scm_to_size_t (offset);
+
+  if (SCM_UNBNDP (size))
+    {
+      if (c_offset < SCM_BYTEVECTOR_LENGTH (bv))
+        c_size = SCM_BYTEVECTOR_LENGTH (bv) - c_offset;
+      else
+        c_size = 0; > +    }
+  else
+    c_size = scm_to_size_t (size);
+
+  if (c_offset + c_size > SCM_BYTEVECTOR_LENGTH (bv))
+    scm_out_of_range (FUNC_NAME, offset);


If offset=SIZE_MAX-1 and size=1, this will overflow to 0 and hence not trigger the error reporting. This bounds check needs to be rewritten, with corresponding additional tests.

+
+  /* Preserve the element type of BV, unless we're not slicing on type
+     boundaries.  */
+  element_type = SCM_BYTEVECTOR_ELEMENT_TYPE (bv);
+  if ((c_offset % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0)
+      || (c_size % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0))
+    element_type = SCM_ARRAY_ELEMENT_TYPE_VU8;
+  else
+    c_size /= (scm_i_array_element_type_sizes[element_type] / 8);

I was worried about the alignment but it looks like it is handled.

+  ret = make_bytevector_from_buffer (c_size,
+                                     SCM_BYTEVECTOR_CONTENTS (bv) + c_offset,
+                                     element_type);
+  if (!SCM_MUTABLE_BYTEVECTOR_P (bv))
+    {
+      /* Preserve the immutability property.  */
+      scm_t_bits flags = SCM_BYTEVECTOR_FLAGS (ret);
+      SCM_SET_BYTEVECTOR_FLAGS (ret, flags | SCM_F_BYTEVECTOR_IMMUTABLE);
+    }
+
+  SCM_BYTEVECTOR_SET_PARENT (ret, bv);


IIUC, if you use bytevector-slice iteratively, say:

(let loop ((bv some-initial-value)
           (n large-number))
  (if (> n 0)
      (loop (bytevector-slice bv 0 (bytevector-length bv))
            (- n 1))
      bv))

you will end up with a bytevector containing a reference to a bytevector containing a reference to ... containing a reference to the original reference, in a chain of length ≃ large-number.

This sounds rather inefficient to me (memory-wise). Instead, I propose setting the parent of 'ret' to the parent of 'bv'. (Its not really the 'parent' anymore then, but AFAIK nothing cares about that, though you may want to rename the field then).

  (--)

Do you know what this 'parent' field even is for? Going by some comments in 'libguile/bytevectors.c', it is for GC reasons, but going by the existence of the 'bytevector->pointer' + 'pointer->bytevector' trick which destroys 'parent' information, it seems unnecessary to me.

It appears to be added in '059a588fedf377ffd32cc1f1fee7ed829b263890', but it doesn't say why it is needed. My guess is that the reason the 'pointer->bytevector' tricks works at all is:

    * libguile/foreign.c (scm_pointer_to_bytevector): Use the parent field
      instead of registering a weak reference from bytevector to foreign
      pointer.

and hence 'parent' is needed after all, but this stuff isn't well-documented in the implementation.

+
+  return ret;
+}
+#undef FUNC_NAME
+
  /* Shrink BV to C_NEW_LEN (which is assumed to be smaller than its current
     size) and return the new bytevector (possibly different from BV).  */
  SCM
diff --git a/libguile/bytevectors.h b/libguile/bytevectors.h
index 980d6e267..6179bfd86 100644
--- a/libguile/bytevectors.h
+++ b/libguile/bytevectors.h
@@ -1,7 +1,7 @@
  #ifndef SCM_BYTEVECTORS_H
  #define SCM_BYTEVECTORS_H
-/* Copyright 2009,2011,2018
+/* Copyright 2009,2011,2018,2022
       Free Software Foundation, Inc.
This file is part of Guile.
@@ -52,6 +52,7 @@ SCM_API uint8_t scm_c_bytevector_ref (SCM, size_t);
  SCM_API void scm_c_bytevector_set_x (SCM, size_t, uint8_t);
SCM_API SCM scm_make_bytevector (SCM, SCM);
+SCM_API SCM scm_bytevector_slice (SCM, SCM, SCM);
  SCM_API SCM scm_native_endianness (void);
  SCM_API SCM scm_bytevector_p (SCM);
  SCM_API SCM scm_bytevector_length (SCM);
diff --git a/module/rnrs/bytevectors/gnu.scm b/module/rnrs/bytevectors/gnu.scm
new file mode 100644
index 000000000..ce97535a8
--- /dev/null
+++ b/module/rnrs/bytevectors/gnu.scm
@@ -0,0 +1,24 @@
+;;;; gnu.scm --- GNU extensions to the bytevector API.
+
+;;;;   Copyright (C) 2022 Free Software Foundation, Inc.
+;;;;
+;;;; This library is free software; you can redistribute it and/or
+;;;; modify it under the terms of the GNU Lesser General Public
+;;;; License as published by the Free Software Foundation; either
+;;;; version 3 of the License, or (at your option) any later version.
+;;;;
+;;;; This library is distributed in the hope that it will be useful,
+;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;;;; Lesser General Public License for more details.
+;;;;
+;;;; You should have received a copy of the GNU Lesser General Public
+;;;; License along with this library; if not, write to the Free Software
+;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 
USA

Nowadays a https://.../ instead of a mail address, is more conventional and useful, I'd think. Its even used in old files, e.g. libguile/r6rs-ports.c.

+
+(define-module (rnrs bytevectors gnu)
+  #:version (6)
+  #:export (bytevector-slice))
+
+(define bytevector-slice
+  (@@ (rnrs bytevectors) bytevector-slice))
diff --git a/test-suite/tests/bytevectors.test 
b/test-suite/tests/bytevectors.test
index 732aadb3e..dc4b32370 100644
--- a/test-suite/tests/bytevectors.test
+++ b/test-suite/tests/bytevectors.test
@@ -1,6 +1,6 @@
  ;;;; bytevectors.test --- R6RS bytevectors. -*- mode: scheme; coding: utf-8; 
-*-
  ;;;;
-;;;; Copyright (C) 2009-2015, 2018, 2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009-2015, 2018, 2021, 2023 Free Software Foundation, Inc.
  ;;;;
  ;;;; Ludovic Courtès
  ;;;;
@@ -22,6 +22,7 @@
    #:use-module (test-suite lib)
    #:use-module (system base compile)
    #:use-module (rnrs bytevectors)
+  #:use-module (rnrs bytevectors gnu)
    #:use-module (srfi srfi-1)
    #:use-module (srfi srfi-4))
@@ -666,6 +667,56 @@
      exception:out-of-range
      (with-input-from-string "#vu8(0 256)" read)))
+
+(with-test-prefix "bytevector-slice"
+
+  (pass-if-exception "wrong size"
+      exception:out-of-range
+    (let ((b #vu8(1 2 3)))
+      (bytevector-slice b 1 3)))


+
+  (pass-if-equal "slices"
+      (list #vu8(1 2) #vu8(2 3)
+            #vu8(1)   #vu8(2) #vu8(3))
+    (let ((b #vu8(1 2 3)))
+      (list (bytevector-slice b 0 2)
+            (bytevector-slice b 1)
+            (bytevector-slice b 0 1)
+            (bytevector-slice b 1 1)
+            (bytevector-slice b 2))))
+
+  (pass-if-exception "immutable flag preserved"
+      exception:wrong-type-arg
+    (compile '(begin
+                (use-modules (rnrs bytevectors)
+                             (rnrs bytevectors gnu))
+
+                ;; The literal bytevector below is immutable.
+                (let ((bv #vu8(1 2 3)))
+                  (bytevector-u8-set! (bytevector-slice bv 1) 0 0)))
+
+             ;; Disable optimizations to invoke the full-blown
+             ;; 'scm_bytevector_u8_set_x' procedure, which checks for
+             ;; the SCM_F_BYTEVECTOR_IMMUTABLE flag.
+             #:optimization-level 0
+             #:to 'value))
+
+  (pass-if-equal "slice of f32vector"
+      '(8 2)
+    (let* ((v #f32(1.1 1.2 3.14))
+           (s (bytevector-slice v 4)))
+      (and (= (f32vector-ref s 0)
+              (f32vector-ref v 1))
+           (list (bytevector-length s)
+                 (f32vector-length s))))) > +
+  (pass-if-equal "unaligned slice of f32vector"
+      10
+    (let* ((v #f32(1.1 1.2 3.14))
+           (s (bytevector-slice v 2)))
+      (and (not (f32vector? s))
+           (bytevector-length s)))))

A test is missing for the case where the size is unaligned instead of the offset.

  
  (with-test-prefix "Arrays"

Attachment: OpenPGP_0x49E3EE22191725EE.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]