guix-devel

Re: Concerns/questions around Software Heritage Archive


From: Ian Eure
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Wed, 01 May 2024 08:29:29 -0700
User-agent: mu4e 1.12.2; emacs 29.3

Hello Guixers,

It’s been another week with no response or movement on this. I’m disappointed that this situation seems to be getting treated so lightly. Adhering to the terms of software licenses is fundamental to the operation of the free software ecosystem; there is no software freedom without it. It’s surprising that a pretty clear-cut situation of creating derivative works of free software in violation of their licenses would be shrugged off so easily.

Whatever the Guix organization’s position is, I’m reaching my personal limit, and need to see some kind of positive movement on this[1]. If Guix is going to continue to facilitate license violations, I will have no choice but to remove my software from it to defend them.

 — Ian

[1]: Personally, I would be satisfied with a per-package setting which disables scheduling source for archiving by SWH. Seeing this, or a commitment to build this within a reasonable timeframe, would allay my concerns.

Ian Eure <ian@retrospec.tv> writes:

Hello,

I’m following up on this discussion since it’s been a month and
I haven’t heard any updates.

Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for
  handling name changes.  I’d like to stress again that this is
  *not* strictly a transgender issue (though it likely affects
  transgender people more, or in worse/different ways) -- it is a
  human respect issue.  Many, many more cisgender people change
  their name than transgender people.

- SHF gave their archive to HuggingFace, an "AI" company which is
  generating derived works with no attribution or provenance, in
  ways which violate both the licenses of the projects used to train
  their model and the SHF principles for LLMs.

- HuggingFace wasn’t respecting requests to opt-out of their model.


On the first point, it sounds like SHF has made concrete progress to improve[1], which is very good to hear. If SHF continues on this
course, I think the concern is resolved.

On the third point, HuggingFace has begun honoring opt-out requests, but is still very far behind. Also, they don’t remove code from the older versions of their model -- it remains there forever. This is
progress, but still, not great.

On the second point, I have not seen any public statements indicating that either SHF or HuggingFace even acknowledges the problem. SHF’s
most recent newsletter[2], published in April 2024 (after these
concerns came to light), continues to tout that StarCoder2 is "the
first AI model aligned with our principles," which appears to be
false. StarCoder2 includes both licensed and unlicensed code, and HuggingFace’s own StarChat2 playground produces works derivative of this code, with no attribution or licensing information. There is
also no statement or position on the SHF news blog.  Nor has
HuggingFace either fixed their tools or made a statement. This is
still very much a live concern.

I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a
  response?
- Whether a public or private response, what would Guix consider to
  be an acceptable response?  An unacceptable response?
- How long is Guix willing to wait for a response?

Thanks,

 — Ian

[1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf

Ian Eure <ian@retrospec.tv> writes:

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, and
it struck me as rather a good idea. However, I’ve seen some things
lately which have soured me on them.

They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer who
wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I assume
means it’s been included in SWH. While I’m dealing with their (IMO:
unethical) opt-out process, I likely also need to stop new copies from
being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should *never*
be included in SWH?

Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

 — Ian
