Hello,
I’m following up on this discussion since it’s been a month and I
haven’t heard any updates.
Summarizing the situation:
- SHF has an opaque, difficult, and undocumented process for
handling name changes. I’d like to stress again that this is
*not* strictly a transgender issue (though it likely affects
transgender people more, or in worse/different ways) -- it is a
human respect issue. Many, many more cisgender people change
their name than transgender people.
- SHF gave their archive to HuggingFace, an "AI" company which is
generating derived works with no attribution or provenance, in
ways which violate both the licenses of the projects used to
train their model and the SHF principles for LLMs.
- HuggingFace wasn’t respecting requests to opt out of their
model.
On the first point, it sounds like SHF has made concrete progress
to improve[1], which is very good to hear. If SHF continues on
this course, I think the concern is resolved.
On the third point, HuggingFace has begun honoring opt-out
requests, but is still very far behind. Also, they don’t remove
code from the older versions of their model -- it remains there
forever. This is progress, but still not great.
On the second point, I have not seen any public statements
indicating that either SHF or HuggingFace even acknowledges the
problem. SHF’s most recent newsletter[2], published in April 2024
(after these concerns came to light), continues to tout that
StarCoder2 is "the first AI model aligned with our principles,"
which appears to be false. StarCoder2 includes both licensed and
unlicensed code, and HuggingFace’s own StarChat2 playground
produces works derivative of this code, with no attribution or
licensing information. There is also no statement or position on
the SHF news blog. Nor has HuggingFace either fixed their tools
or made a statement. This is still very much a live concern.
I have a few questions:
- Has Guix reached out to SHF to express these concerns / get a
response?
- Whether public or private, what would Guix consider to be an
acceptable response? An unacceptable response?
- How long is Guix willing to wait for a response?
Thanks,
— Ian
[1]: https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf
Ian Eure <ian@retrospec.tv> writes:
Hi Guixy people,
I’d never heard of SWH before I started hacking on Guix last
fall, and it struck me as rather a good idea. However, I’ve seen
some things lately which have soured me on them.
They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
I was also distressed to see how poorly they treated a developer
who wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag
GPL’d software I’ve created has been packaged for Guix, which I
assume means it’s been included in SWH. While I’m dealing with
their (IMO: unethical) opt-out process, I likely also need to
stop new copies from being uploaded again in the future.
Is there a way to indicate, in a Guix package, that it should
*never* be included in SWH?
Is there a way to tell Guix to never download source from SWH?
I want absolutely nothing to do with them.
Thanks,
— Ian