chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: New egg: CHICKEN Transducers


From: Jeremy Steward
Subject: Re: New egg: CHICKEN Transducers
Date: Tue, 10 Jan 2023 18:22:26 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.6.1

Hey! These are some interesting changes, let me go through them bit-by-bit:

On 1/9/23 11:45, siiky wrote:
[^1]: This can be easily remied if the sentinel is given to the collect
instead of the transduce, in which case the current def of transduce is
a single case-lambda case instead of two:

(transduce fold (map *) (collect 'sentinel-value) data)

I was "forced" to procrastinate yesterday by this incredible force... ^^'

Before jumping into the remaining bits, I saw that there was a commit in your fork of the repo regarding this. It's an interesting idea and I'm inclined to actually merge this in proper, but there is a trade-off (as there always is).

Namely, as transducers currently are one can use variable argument procedures as "collectors." I somewhat weighed on this back and forth when writing 0.1.0: is it worth having separate sum / product procedures or should I allow for collectors to be any procedure that is defined with 0-arity to 2-arity?

    (transduce list-fold
               values
               +
               (list 1 2 3 4 5))

See, if you want to be consistent and make it so that collectors always define their sentinel (which makes the API a bit more consistent), then you cannot do the above, since there's no way to call |(+ 100)| as a collector. You need a separate "sum" or "collect-sum" to function for you here.

I left it as-is because the broader Clojure community uses a similar strategy for their transducers. I think a difference here isn't necessarily /bad/ but it may mean that certain combinations collector + sentinel aren't as easy to express (without doing your own case-lambda, which means writing collectors yourself).

Thoughts? I don't think giving every collector an optional argument is necessarily wrong in terms of API but the trade-off does exist. Given that there's little code written with transducers today I'm not sure I have any argument-from-inertia to say which way is better or more useful but I think this is a good question to solve before a 1.0 release.

I've been thinking for the past couple of days about this and couldn't
come up with any way to make a generic "zip" work with transducers other
than (a) rewriting the basic algorithm of the egg (which is simple and I
prefer that way), or (b) introducing the concept of an "iterator" (which
is, I think, how Rust does it, and is equivalent to C++ Ranges (cf.
"Functional Programming in C++", Ivan Čukić)). The problem is that
type-fold does all the work from start to finish and can't be
"suspended" to look to the side for a second.

I personally think a fully generic zip / chain / interleave / flatten aren't that appealing. You (should) almost always know the type of something, and if you're doing a partial application, e.g.:

    (define my-process
      (compose
        (zip-list (list 'a 'b 'c))
        (map ...)
        (filter ...)))

to later call |my-process| in a transducer, then the "zip" is well-defined by this point in most scenarios (in my experience, although that might be me conflating how I write code based on the API restrictions I have presented for myself). You have to monomorphize your function at some point, else you're using an intermediate type (which is just monomorphizing by indirection).

Because (a) goes against extending this egg, replacing its
implementation instead, and because it's a lot more work than (and not
as obvious as) (b), I went with (b).

An iterator is some object that keeps internal state about a collection
and how to access it, that can be used to incrementally[^0] traverse
this collection's elements in some (iterator-defined) order. Just like
streams![^1] So after a few hours hacking at it, it's as simple as this:

I do like your process of discovery; In fact, I actually started writing this egg with this exact thinking in mind: why not try to define a common iterator type for everything and then only provide one |iterable-fold| to rule them all? Well, it ended up being pretty slow, which forced me to re-think my assumptions about performance and how I should think about types in Scheme. Trying too hard to avoid any type-specification ends up being a problem in its own right if you take it too far.

Nevertheless, Rust's iterators are lazy (and very similar to streams). SRFI-41 streams have some performance questions compared to e.g. generators though, which is why I opted to make sure that reader-fold works with generators off the bat.

Generators are the more appropriate solution to laziness in most cases. Delay and Force seem to come with too much performance baggage for my taste, although I could absolutely see providing (transducers streams) and doing basically what you've done with "iterators" within transducers in a future version of the library.

After all, if you had (transducers streams) you could use the base and streams sub-modules and write whatever extension you wanted extant to that. This to me is why I came to settle on transducers in the first place - it allows this extension without requiring it in the core library itself. ;)

You actually do reference this later in your reply, but I wanted to underscore it here. It's an important point!

By implementing iterator-fold and the type->iterator functions
everything works OOB with the transducers egg. With some small changes
to the egg[0] (especially [1]), it can be improved (IMO) to this:

So I looked at the changes required here - regardless of the point on collect & sentinel vs. (collect #!optional sentinel), I am unsure that I want to change |transduce| too much.

I think the transduce you've provided, i.e.:

    (define (transduce folder xform collector iterable . iterables)
      (let* ((xf (xform collector))
             (sentinel (collector))
             (result (if (null? iterables)
                       (folder xf sentinel iterable)
                       (apply folder
                              xf
                              sentinel
                              (cons iterable iterables)))))
        (xf result)))

is probably better implemented as syntax (to avoid the apply, as was mentioned in the TODO comment), and probably best to call |transduce*|. That section for (if (null? iterables) ...) can easily distinguish between single collection and multi-collection variants, assuming they all use the same folder.

From an API perspective that to me seems more uniform / less special-case-y. I have to experiment with this though, because I'm biased towards believing that such code isn't really worth the extra complexity. With |chain| and |zip| transducers being available I think you can accomplish this in a more generic way (not requiring that all collections have the same folder), and I seldom find myself writing code where I need to collate / work across multiple collections.

I know Scheme is dynamic and that enables this, but 99% of my code will always tend towards using a single collection kind. In CHICKEN's type terminology, that's usually (list-of T) or (vector-of T). The example of:

    (transduce iterator-fold
               (map (cute apply * <>))
               (collect-list)
               (iterator-zip (list->iterator '(0 1 2 3 4))
                             (vector->iterator #(5 6 7 8 9))
                             (range->iterator 5))))
    ;=> (0 6 28 72 144)

seems a bit strange to me. I cannot rightly remember a time where I'd do something like this across a lot of types. Even thinking of code I've written in Rust -- it would be strange to do it this way. I struggle to think of a scenario where I'd have 3 different collection types that needed to be all zipped together like this. Enumeration is an obvious use-case, but |enumerate| is already a transducer.

I'm not fully convinced that this code is what I'd steer people towards, but I do appreciate that you've gone through the trouble / learning process on this one though. It has definitely helped me think about it more deeply. I still have to give it more consideration but there may be something here for a future version of the library.

[0]: https://git.sr.ht/~siiky/chicken-transducers/log/multiple-iterables
[1]:
https://git.sr.ht/~siiky/chicken-transducers/commit/68f9d6d3faaf80e1b57c7bda5dec888b34c670dd
[2]:
https://git.sr.ht/~siiky/experiments/tree/main/item/iterators/transducers-example.scm

If it's alright with you, I will probably go ahead and grab at least the (test-exit) patch from above and apply it to transducers today. I can't believe I forgot that.

PS: great work on the docs, Jeremy, they look very comprehensive!

Thank you. This is probably the best compliment I could have received.

Cheers,
--
Jeremy Steward



reply via email to

[Prev in Thread] Current Thread [Next in Thread]