guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities


From: Philip McGrath
Subject: [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.
Date: Fri, 7 Jan 2022 23:13:54 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1

Hi,

On 12/31/21 05:18, Liliana Marie Prikler wrote:
Am Freitag, dem 31.12.2021 um 00:22 -0500 schrieb Philip McGrath:
+(define (alist-pop alist key)
+  "Return two values: the first pair in ALIST with the given KEY
in its
+'car' (or #f, if no such pair exists) and an assosciation list
like (and
+potentially sharing storage with) ALIST, but with no entry for
KEY."
+  (match (assoc key alist)
+    ;; If key isn't present, we don't need to do any allocation
+    (#f
+     (values #f alist))
+    (found
+     (values found
+             ;; Because we have `found`, we can find it more
+             ;; efficiently this time with `eq?`. We avoid using
+             ;; `delq` because it would copy pairs in a shared
+             ;; tail. We assume a sufficiently smart compiler to
+             ;; handle "tail recursion modulo cons" (vid. e.g.
Indiana
+             ;; University Technical Report No. 19, Friedman &
Wise
+             ;; 1975) at least as efficiently as a hand-written
+             ;; tail-recursive implementation with an
accumulator.
+             (let loop ((alist alist))
+               (match alist
+                 ;; We know that `found` is present,
+                 ;; so no need to check for '()
+                 ((this . alist)
+                  (if (eq? this found)
+                      alist
+                      (cons this (loop alist))))))))))
I think this can be more efficiently be done in a "single" loop.

    (let loop ((rest alist)
               (previous '()))
      (match rest
        (() (values #f alist))
        ((first . rest)
         (if (eq? (car first) key)
             (values first (reverse! previous rest))
             (loop rest (cons first previous))))))


I'll admit to a Racket bias, but, having just eliminated the use of
'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
quote a bit from the SRFI-1 spec for 'append-reverse!', "note that
this pattern of iterative computation followed by a reverse can
frequently be rewritten as a recursion, dispensing with the reverse
and append-reverse steps, and shifting temporary, intermediate
storage from the heap to the stack, which is typically a win for
reasons of cache locality and eager storage reclamation." (See how
'set-cdr!' can crash safe Chez Scheme!
<https://github.com/cisco/ChezScheme/issues/599>)

IIUC, using SRFI-1's 'span' would lead to the same situation.
For the record, we can use the non-destructive append and reverse here
at the expense of more copying.  If done in terms of SRFI-1 span, we
would not need reverse as far as I understand.

Also, I don't think your version is tail-recursive.  (loop alist)
is not in tail position from what I can tell.

Yes, "tail recursion modulo cons" refers to a compiler optimization
for functions which are _not_ tail recursive. For full details, see
the Friedman & Wise 1975 tech report I cited at
<https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various
other articles), but, as briefly as I can: The optimization rests on
the observation that many recursive functions, like the classic
definition of 'map':

      (define (map f lst)
        (match lst
          (()
           '())
          ((this . lst)
           (cons (f this)
                 (map f lst)))))

are nearly tail-recursive, and the only real work remaining to be
done in the continuation of the recursive call is to fill in the cdr
of the pair. Thus, a compiler can safely transform this code into a
truly tail-recursive implementation:

      (define (map f lst)
        (match lst
          (()
           '())
          ((this . lst)
           (define ret (list (f this)))
           (let loop ((dest ret)
                      (lst lst))
             (match lst
               ((this . lst)
                (define new (list (f this)))
                (set-cdr! dest new)
                (loop new lst))
               (()
                ret))))))

Unlike the Proper Implementation of Tail Calls (so-called "tail-call
optimization"), handling "tail recursion modulo cons" truly is an
optimization: it does not change the space complexity of the
function. But it can allow the compiler to generate whatever code it
thinks will work best with its collector/allocator and
continuation/"call stack" implementation.

(The optimizations applies to constructors in general, not just
'cons', and a compiler can safely apply it to values that are
immutable from the perspective of the source language.)
I'm not aware to which extent Guile implements tail recursion modulo
cons and I'd argue neither are you until you dig down into disassembly.
I think it's better here to avoid patterns from Racket that would feel
foreign to Guilers, particularly if you have to explain them with
reference to a paper (we already get hate for referring to Wingo's fold
for XML handling).

In a sense, "tail recursion modulo cons" was a red herring here. The essential requirement for implementing 'alist-pop' or 'map' as I did is that the language implementation be "safe for space", i.e. not have "stack overflow"s: Guile meets that requirement. [1]

In a safe-for-space language, the naturally recursive implementations and the implementations with explicit, non-destructive accumulators both allocate O(n) temporary storage. The difference is that the explicit accumulator versions allocate temporary pairs on the heap, while the naturally recursive version allocates its temporary space on the "stack" (i.e. additional frames of the (non-reified) continuation), which is generally, and specifically for Guile per [1], much better (though a sufficiently smart generational garbage collector with bump-pointer allocation in the nursery could mitigate the difference somewhat).

All of that relies just on the guarantees of Guile as a safe-for-space language. The optimization in "tail recursion modulo cons" is that a compiler could, if it chose to expend its effort this way, make the naturally recursive implementations work without the O(n) temporary "stack" storage by transforming transforming the non-tail recursion into tail recursion. In essence, it could achieve a similar effect to an explicit accumulator plus 'reverse!' without the many downsides (some of which [1] discusses).

But the naturally recursive implementation is preferable even if the optimization does not apply.


In principle, what you're supposing is that a sufficiently smart
compiler could rewrite

   (let ((before after (span PRED mylist))) (append before after))

to (list-copy mylist), which as far as I'm aware Guile currently
doesn't.  It could be argued that it would start doing so once I cast
some magic incantations, but I wouldn't count on it without reading the
disassembly.

In some sense that's true, but your example would require a lot of interprocedural analysis, not just a directly visible pattern with well-known primitives using analysis that has been well known since the '70s. But, again, the optimization isn't really relevant.
Is order relevant here?  Because we could just as well reimplement
our alist-delete* loop and cons the replacement onto the rest.
WDYT?

Relying on order for JSON objects is non-interoperable, per RFC 8259
§4. I'm not intending for these alist procedures to be exported, so
I'm not trying to handle any more general case than that, as I
explain in the comments at the top of the file.

I'm not sure what the advantage would be to reimplementing the
'alist-delete' loop here.
Fair enough, the question was however not so much what is required per
RFC, but rather if there is a natural feel of order to package.json
that we ought not disturb.  Particularly, putting dependencies before
name and version could be confusing to whoever needs to debug a delete-
dependencies phase gone wrong.

I haven't noticed a consistent convention in "package.json" files (which IIUC may not be entirely hand-written).

For debugging, the biggest problem is that (guix build json) doesn't add any linebreaks or indentation.

If I were changing it, I'd want it to write object keys in 'string<?' order and to raise an exception if given duplicate keys.

-Philip

[1]: https://www.gnu.org/software/guile/docs/docs-2.2/guile-ref/Stack-Overflow.html





reply via email to

[Prev in Thread] Current Thread [Next in Thread]