bug-datamash

Re: New program: rand(1)


From: Tim Rice
Subject: Re: New program: rand(1)
Date: Sun, 21 Aug 2022 19:38:39 +0000

Hey Erik,

I do not like the idea of copying lots of code from an actively
maintained library instead of using the library.  Especially when
the intent is to just use that functionality, as opposed to using
it as a base for modification to implement some different, but
related, functionality.

...

I concur that a simple copy & paste approach does not seem to
lead to maintainable code.

Thanks, that makes sense to me.


I think this would be fine if done as an optional dependency to
add more functionality.  Without GSL, rand(1) could IMHO still
function, but be limited to functionality not implemented with
GSL.

Hmm, yeah, good thought. The --version flag could indicate whether the extra 
functionality was compiled in, like in Vim?
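
As a rough sketch of what I mean (nothing here exists in rand(1) yet;
`HAVE_GSL` and `format_features` are hypothetical names, and I am
assuming `configure` would define the macro when GSL is found), the
feature line could be built at compile time:

```c
#include <assert.h>
#include <stdio.h>

/* HAVE_GSL is a hypothetical macro; the assumption is that ./configure
   defines it when the GSL headers and library are detected. */
#ifdef HAVE_GSL
# define GSL_FEATURE "+gsl"
#else
# define GSL_FEATURE "-gsl"
#endif

/* Write a Vim-style feature line into buf.
   Returns the value of snprintf, i.e. the length of the full line. */
int
format_features (char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "Features: %s", GSL_FEATURE);
}
```

The --version handler would print this line after the usual version
and copyright text, so a bug report shows at a glance which kind of
build the reporter has.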


There might be a twist here, in that it seems possible that you
would have used GSL for the existing rand(1) functionality, if
you had intended to use GSL anyway.

Haha, true :) I may have to revise what was already done.


With rand(1) being a new addition not yet included in a formal
GNU Datamash release it would seem OK to change course and require
GSL for rand(1), but neither datamash(1) nor decorate(1).  It
could be determined during ./configure if rand(1) can be built
or not, and only rand(1) excluded without GSL.

That sounds fair, with one caveat. Part of the motivation for including rand(1) 
in GNU Datamash is that it allows benchmarking to assess new features or 
optimizations, as well as testing some functionality like the jarque operation.

I am okay with not enabling such tests by default, so long as we spell out in 
HACKING.md that benchmarking is recommended for certain types of development. 
So, anyone who is thinking about contributing to GNU Datamash would be 
encouraged to also build rand(1). Especially if their proposed changes may 
affect GNU Datamash's performance.

This all started when I was thinking about adding more compiler hints to 
increase vectorization. I realized I had no good way to assess what difference 
it makes :)


The addition of rand(1) has already placed some burden on packagers,
since at least Ubuntu already has a "rand" package[1] containing
a rand(1) binary[2].  The Debian packaging system has provisions
to handle this and they probably need to be used for the next
GNU Datamash release.

[1] https://packages.ubuntu.com/kinetic/rand
[2] https://launchpad.net/rand

Ah, my distro doesn't have that package. Thanks for the heads up.

Do you know if there is a process for giving Ubuntu packagers a heads up about 
the potential conflict? I'm not sure if I can count on them to read the NEWS 
file in every GNU package :)

Thinking about Ubuntu packaging is on my radar anyway. We probably want GNU 
Datamash v1.8 to go into 22.10, and the next version after that should go in 
23.04. It is unclear to me how automatic that process is.

I was going to wait until 22.10 is released and then check whether its GNU 
Datamash version has bumped appropriately. I figure we have 18 months to get 
this right before the next Ubuntu LTS release in 2024.


* Implement something from scratch? I am not completely averse to
  this, but it increases duplication of effort between different GNU
  projects. I am also worried that with fewer eyes on GNU Datamash
  than GSL, I will introduce bugs that are not an issue in other
  implementations.

It seems to me as if doing this just to avoid a dependency on a
widely available library would not be worth it.

I consider the copy & paste approach as something similar to this.
Copying existing code into GNU Datamash would require taking ownership
of this code copy.  As such I would view it as a starting point for
a new implementation, because it is quite likely that it would diverge
over time from GSL anyway.

I too think that this risks introducing bugs that would have been
avoided by using GSL as a library.

Yes, you are right.


I would not mind if GSL were added as an optional dependency.
IMHO datamash(1) and decorate(1) should not require GSL.

Thanks for your thoughts! I now have clearer direction: attempt to get 
`configure` to detect whether gsl-devel is installed; and/or add a new flag to 
the Makefile to enable/disable GSL support.

The current implementation could remain more-or-less as-is for people who don't 
have GSL.
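
To make the split concrete, here is a minimal sketch of how the
built-in and GSL-backed parts could coexist. The macro `HAVE_GSL`, the
function `distribution_supported`, and the distribution names are all
placeholders for illustration, not rand(1)'s actual code or options:

```c
#include <assert.h>
#include <string.h>

/* HAVE_GSL: hypothetical macro, assumed to be defined by ./configure
   when GSL is available. */

/* Return 1 if the named distribution can be used in this build,
   0 otherwise.  The names are placeholders. */
int
distribution_supported (const char *name)
{
  assert (name != NULL);

  /* Built-in sampler: available in every build. */
  if (strcmp (name, "uniform") == 0)
    return 1;

#ifdef HAVE_GSL
  /* Sampler that would be delegated to GSL: only in GSL builds. */
  if (strcmp (name, "gamma") == 0)
    return 1;
#endif

  return 0;
}
```

A build without GSL would then reject the GSL-only names with a clear
error message instead of failing to compile.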

~ Tim


