Re: File search
From: Ryan Prior
Subject: Re: File search
Date: Tue, 25 Jan 2022 23:45:35 +0000
On Friday, January 21st, 2022 at 9:03 AM, Ludovic Courtès <ludo@gnu.org> wrote:
> The database for 18K packages is quite big:
>
> --8<---------------cut here---------------start------------->8---
> $ du -h /tmp/db*
> 389M /tmp/db
> 82M /tmp/db.gz
> 61M /tmp/db.zst
> --8<---------------cut here---------------end--------------->8---
> [snip]
> In terms of privacy, I think it’s better if we can avoid making
> one request per file searched for. Off-line operation would be
> sweet, and it comes with responsiveness; fast off-line search is
> necessary for things like ‘command-not-found’ (where the shell
> tells you what package to install when a command is not found).
Offline operation is crucial, and I don't think it's desirable to download tens
or hundreds of megabytes. What about creating and distributing one Bloom filter
per package, whose members are that package's file names? This would let us
dramatically reduce the size of the data we distribute, at the cost of
occasional false positives: a filter may claim a package contains a file it
does not, but it never misses a file the package does contain. We've
established, though, that some information is better than none, and the
uncertainty can be resolved by querying a web service or by building the
package locally and searching its directory.
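
To make the proposal concrete, here is a minimal sketch of per-package Bloom
filters in Python. Everything in it is an assumption made for illustration:
the BloomFilter class, the build_index and candidate_packages helpers, the
1% false-positive target, and the sample package contents are hypothetical
and are not part of Guix or of any existing file-search tool.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative, not a Guix API)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_index(package_files):
    """Map each package name to a Bloom filter of its file names."""
    index = {}
    for package, files in package_files.items():
        bloom = BloomFilter(len(files))
        for name in files:
            bloom.add(name)
        index[package] = bloom
    return index


def candidate_packages(index, file_name):
    """Return packages that may contain file_name (false positives possible)."""
    return sorted(pkg for pkg, bloom in index.items() if file_name in bloom)


if __name__ == "__main__":
    # Hypothetical package contents, for demonstration only.
    index = build_index({
        "coreutils": ["bin/ls", "bin/cat"],
        "git": ["bin/git"],
    })
    print(candidate_packages(index, "bin/ls"))
```

With this sizing, a 1% false-positive target costs roughly 9.6 bits per file
name no matter how long the name is, which is where the size reduction would
come from. Every package returned by candidate_packages is only a candidate,
so a caller such as 'command-not-found' would still need the web service or a
local build to confirm a hit.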