guix-devel

Re: Concerns/questions around Software Heritage Archive


From: Maxim Cournoyer
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Thu, 09 May 2024 12:00:10 -0400
User-agent: Gnus/5.13 (Gnus v5.13)

Hi Ian, Ludovic.

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Ian,
>
> Ian Eure <ian@retrospec.tv> skribis:
>
>> Summarizing the situation:
>>
>> - SHF has an opaque, difficult, and undocumented process for
>>   handling name changes.  I’d like to stress again that this is
>>   *not* strictly a transgender issue (though it likely affects them
>>   more, or in worse/different ways) -- it is a human respect issue.
>>   Many, many more cisgender people change their name than
>>   transgender people.
>
> It is also not strictly an SWH issue: how does Internet Archive handle
> name changes?  What about append-only storage in general?  We’ve
> discussed this already.

>> - SHF gave their archive to HuggingFace, an "AI" company which is
>>   generating derived works with no attribution or provenance, in
>>   ways which violate both the licenses of the projects used to train
>>   their model, and the SHF principles for LLMs.
>
> [...]
>
>> - Has Guix reached out to SHF to express these concerns / get a
>>   response?
>
> I’ve seen and participated in informal discussions, but that’s all I
> know.  Maintainers?

We haven't.  Given that some improvements were apparently already made by
SWH in response to the concerns raised, it seems the dialogue should continue.

>> - Whether a public or private response, what would Guix consider to
>>   be an acceptable response?  An unacceptable response?
>> - How long is Guix willing to wait for a response?
>
> Free software people, myself included, have expressed disappointment
> regarding the use of code harvested by SWH for HuggingFace’s training.
> Stefano Zacchiroli of SWH responded to these concerns on Mastodon back
> in March, as you probably saw.
>
> One important point is that copyleft code is excluded from the training
> dataset; I was able to anecdotally check that for GPL code such as Guix
> using their interface (there was a thread on Mastodon but I can’t find
> it): <https://huggingface.co/spaces/bigcode/in-the-stack>.  That
> addresses my main concern.
>
> Remaining concerns include the weak wording of the principles put
> forward by SWH in its statement on LLMs:
> <https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/>.
> I think this is something worth discussing further with them (it’s
> already been brought up notably on Mastodon).  It’s not clear to me
> whether this is a task for Guix as a project.

I don't think it is a task for Guix specifically, but rather for all
users of SWH or interested parties.

-- 
Thanks,
Maxim


