bug-gnu-emacs

bug#61394: 30.0.50; [PATCH] Image-dired thumb name based on content


From: Manuel Giraud
Subject: bug#61394: 30.0.50; [PATCH] Image-dired thumb name based on content
Date: Sat, 11 Feb 2023 13:30:48 +0100
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 61394@debbugs.gnu.org
>> Date: Fri, 10 Feb 2023 19:46:02 +0100
>> From:  Manuel Giraud via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> +(defun image-dired-content-sha1 (filename)
>> +  "Compute the SHA-1 of a part of FILENAME."
>
> Not "part of FILENAME", but "the first 4KiB of FILENAME's contents".

Yes, I'll fix that.

> Btw, using only the first 4KiB would mean a collision is still
> possible, albeit rarely, right?  So your use case of having all the
> thumbnails in the same directory could sometimes fail, right?

The 4KiB figure was a "quite large, but not too large" guess.  I've run
tests with the following code:

--8<---------------cut here---------------start------------->8---
(defun sha1-test (filename size)
  (with-temp-buffer
    (insert-file-contents-literally filename nil 0 size)
    (sha1 (current-buffer))))

;; From 1KiB to 64KiB
(list
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 10)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 11)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 12)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 13)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 14)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 15)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 16))))
--8<---------------cut here---------------end--------------->8---

And here are the results on my machine, as (ELAPSED GC-COUNT GC-ELAPSED)
in seconds for 1000 runs, from 1KiB to 64KiB:
((0.664336771 1 0.14466495299998883)          ;; 1KiB
 (0.707937024 2 0.28811983400001395)          ;; 2KiB
 (0.940229304 3 0.44037704100000497)          ;; <- 4KiB
 (1.672118528 4 0.7672738199999856)           ;; 8KiB
 (2.6194289370000003 6 1.046699996000001)     ;; 16KiB
 (3.169999951 11 1.5916382949999957)          ;; 32KiB
 (6.547043287 21 3.195145416999992))          ;; 64KiB

So 4KiB seems practical: about 1 second for one thousand runs, i.e.
just under 1 ms per file.
WDYT?
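To make the per-file cost explicit: `benchmark-run-compiled' returns
(ELAPSED GC-COUNT GC-ELAPSED) for all repetitions together, so dividing
by the repetition count gives the cost of a single hash.  The helper
below is a hypothetical illustration (its name is not part of
image-dired).  Applied to the 4KiB row above, it gives roughly 0.94 ms
per call in total, of which about 0.44 ms is garbage collection.

```elisp
;; Hypothetical helper, not part of image-dired: convert a result of
;; `benchmark-run-compiled' over RUNS repetitions into per-call costs.
(defun sha1-bench-per-call-ms (result runs)
  "Return per-call costs in milliseconds from benchmark RESULT over RUNS.
RESULT has the form (ELAPSED GC-COUNT GC-ELAPSED), times in seconds."
  (let ((elapsed (nth 0 result))
        (gc-elapsed (nth 2 result)))
    (list :total (/ (* 1000.0 elapsed) runs)
          :gc (/ (* 1000.0 gc-elapsed) runs)
          ;; Time spent hashing and reading, excluding garbage collection.
          :without-gc (/ (* 1000.0 (- elapsed gc-elapsed)) runs))))

;; The 4KiB row from the results above, measured over 1000 runs.
(sha1-bench-per-call-ms '(0.940229304 3 0.44037704100000497) 1000)
```

Since GC accounts for almost half of the 4KiB time, the pure hashing
cost per file is closer to 0.5 ms.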

About collisions, my wild guess is that, since we are dealing with
images, most modifications to them will affect those first 4KiB
anyway.  But you're right that a collision is still possible and the
thumbnail could then be wrong.  I'll try to find out the probability of
a SHA-1 collision on 4KiB of data.
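Note that hashing only a prefix means any two files sharing their first
4096 bytes get the same digest, whatever follows; no SHA-1 weakness is
involved.  By contrast, leaving aside deliberately crafted inputs, the
birthday bound puts the chance of an accidental SHA-1 collision among n
distinct inputs at roughly n^2/2^161, which is negligible.  So the
prefix case is the one that matters here.  The sketch below (the helper
names `prefix-sha1-demo' and `prefix-sha1-collision-demo' are
hypothetical) writes two temporary files that differ only after byte
4096 and compares their prefix and full digests.

```elisp
;; Hypothetical helper mirroring the patch: SHA-1 of the first SIZE
;; bytes of FILENAME, or of the whole file when SIZE is nil.
(defun prefix-sha1-demo (filename &optional size)
  "Return the SHA-1 of the first SIZE bytes of FILENAME (all if nil)."
  (with-temp-buffer
    (insert-file-contents-literally filename nil (and size 0) size)
    (sha1 (current-buffer))))

(defun prefix-sha1-collision-demo ()
  "Compare digests of two files that share their first 4096 bytes.
Return (PREFIX-DIGESTS-EQUAL FULL-DIGESTS-EQUAL)."
  (let ((a (make-temp-file "prefix-a"))
        (b (make-temp-file "prefix-b"))
        (common (make-string 4096 ?x)))
    (unwind-protect
        (progn
          ;; Identical 4KiB prefix, different tails.
          (with-temp-file a (insert common "tail one"))
          (with-temp-file b (insert common "tail two"))
          (list (equal (prefix-sha1-demo a 4096) (prefix-sha1-demo b 4096))
                (equal (prefix-sha1-demo a) (prefix-sha1-demo b))))
      (delete-file a)
      (delete-file b))))

(prefix-sha1-collision-demo)  ; => (t nil)
```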

>> +  (with-temp-buffer
>> +    (insert-file-contents filename nil 0 4096)
>
> Please use insert-file-contents-literally here.  It should be much
> faster, and we only care about the file's bytestream anyway.

Thanks, I'll do that too.
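With both review comments applied, the helper from the patch might look
roughly like the following.  This is a sketch, not the committed code:
the 4096-byte limit comes from the patch, and reading with
`insert-file-contents-literally' means the digest covers the file's raw
bytes with no decoding or format conversion.

```elisp
(defun image-dired-content-sha1 (filename)
  "Compute the SHA-1 of the first 4KiB of FILENAME's contents."
  (with-temp-buffer
    ;; Read raw bytes only: no coding-system decoding, no format conversion.
    (insert-file-contents-literally filename nil 0 4096)
    (sha1 (current-buffer))))
```

For example, (image-dired-content-sha1 "/tmp/a-5MiB-photo.jpg") returns
a 40-character hexadecimal string that depends only on the first 4096
bytes of the file.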

>>  (defun image-dired-thumb-name (file)
>>    "Return absolute file name for thumbnail FILE.
>>  Depending on the value of `image-dired-thumbnail-storage', the
>>  file name of the thumbnail will vary:
>> -- For `use-image-dired-dir', make a SHA1-hash of the image file's
>> -  directory name and add that to make the thumbnail file name
>> -  unique.
>> +- For `image-dired', make a SHA1-hash of some of the image file.
>>  - For `per-directory' storage, just add a subdirectory.
>>  - For `standard' storage, produce the file name according to the
>>    Thumbnail Managing Standard.  Among other things, an MD5-hash
>
> This doc string "needs work".  Could you please fix it as part of the
> patch, even though most of the problems are not due to this patch?  In
> any case, please either say here that only the first 4KiB of the file's
> contents are SHA1-hashed or include a link to the new function.

You're right; it is not even complete for `per-directory'.  I could try
to come up with a fix.  Thanks.
-- 
Manuel Giraud




